How Agent-WeChat Is Architected

How agent-wechat runs WeChat in a container, automates its UI with a state machine, and exposes it all through four client libraries.

In the previous post, I covered the reverse-engineering story: using Claude Code and Frida to crack open WeChat’s encryption, extract database keys from memory, and build a programmatic interface to the official client. This post is about how the system actually works, and how you’d use it.

Architecture: one API, many clients

Two problems drive the architecture:

WeChat needs a controlled environment. As we established in the previous post, UI automation, database reads, and memory instrumentation all need to run alongside the WeChat binary. A Docker container gives us that: one container = one isolated WeChat instance with everything it needs.

Different use cases need different interfaces. A CLI is great for quick control. OpenClaw needs a channel plugin. Wechaty users want to connect their existing bots. Rather than building each of these into the container, we put a REST + WebSocket API in front of everything and let each client talk to it in its own idiomatic way.

graph LR
    subgraph Clients
        CLI["CLI (wx)"]
        WP["Wechaty Puppet"]
        OC["OpenClaw Plugin"]
        WC2["Wechaty Client"]
    end

    subgraph Gateway Container
        GW["Wechaty Gateway<br/>(gRPC)"]
    end

    subgraph Agent Container
        AS["agent-server<br/>(Rust/Axum)"]
        WC["WeChat Binary"]
        XV["Xvfb<br/>(Virtual Display)"]
        DB["SQLCipher DBs"]
    end

    CLI -->|REST + WS| AS
    WP -->|REST + WS| AS
    OC -->|REST + WS| AS
    WC2 -->|gRPC| GW
    GW -->|REST + WS| AS

    AS -->|AT-SPI| WC
    AS -->|Frida| WC
    AS -->|SQLCipher| DB
    WC --> XV
    WC -->|Read/Write| DB

One container = one WeChat instance. The design puts all the intelligence in the container: the agent-server handles database reads, memory instrumentation, and UI automation, so clients don’t need to know anything about WeChat internals. They just make API calls. This means you can swap clients freely, run the container locally or in the cloud, and the same server handles everything.

The agent-server is written in Rust to keep resource usage low, since it shares a container with WeChat itself. The Wechaty Gateway is a separate container that bridges existing Wechaty clients (over gRPC) to the agent-server’s REST API, so you can plug agent-wechat into a Wechaty codebase without changing anything.

A few examples of what the API looks like:

# Check login status
GET /api/status/auth

# List recent chats
GET /api/chats?limit=20

# Send a message
POST /api/messages/send  { "chatId": "wxid_abc123", "text": "hello" }

# Get messages from a chat
GET /api/messages/wxid_abc123?limit=50

# Download media from a message
GET /api/messages/wxid_abc123/media/12345
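
To make the shape of these calls concrete, here is a minimal TypeScript sketch that builds request descriptions for the endpoints above. The helper names (`makeRequests`, `ApiRequest`) are hypothetical, and the `Authorization: Bearer` header is an assumption: the examples above do not say how the token is transmitted, nor what the responses look like.

```typescript
// Sketch only: builds request descriptions for the endpoints listed above.
// ASSUMPTION: the token is sent as a Bearer header; the real scheme may differ.
interface ApiRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function makeRequests(baseUrl: string, token?: string) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes

  function build(method: "GET" | "POST", path: string, body?: unknown): ApiRequest {
    const headers: Record<string, string> = {};
    if (token) headers["Authorization"] = `Bearer ${token}`; // assumed scheme
    const req: ApiRequest = { url: root + path, method, headers };
    if (body !== undefined) {
      headers["Content-Type"] = "application/json";
      req.body = JSON.stringify(body);
    }
    return req;
  }

  return {
    authStatus: () => build("GET", "/api/status/auth"),
    listChats: (limit: number) => build("GET", `/api/chats?limit=${limit}`),
    sendMessage: (chatId: string, text: string) =>
      build("POST", "/api/messages/send", { chatId, text }),
    listMessages: (chatId: string, limit: number) =>
      build("GET", `/api/messages/${encodeURIComponent(chatId)}?limit=${limit}`),
    media: (chatId: string, messageId: number) =>
      build("GET", `/api/messages/${encodeURIComponent(chatId)}/media/${messageId}`),
  };
}
```

Each returned object can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping request construction separate from sending makes the paths easy to check without a running container.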

Lifecycle

When a client issues a command, the agent-server coordinates between the client and WeChat:

sequenceDiagram
    participant C as Client
    participant S as agent-server
    participant W as WeChat

    Note over S,W: Container starts: Xvfb, D-Bus, WeChat boot up
    C->>S: Command (e.g. login, send message)
    S->>W: Read DB / instrument memory / drive UI
    W-->>S: State changes, data
    S-->>C: Result (or timeout)

UI state automation: react to what you see, not what you expect

Two problems make UI-based RPA hard, and both need to be solved for automation to be reliable:

UI state is non-deterministic. You can’t predict what screen you’ll see next. Network errors cause popups. The user scans a QR code but doesn’t confirm on their phone. The app is already logged in from a previous session. These are all external factors outside the automation’s control, and any of them can derail a script that assumes a fixed sequence of screens.

Commands span multiple UI states. A single operation like “login” touches several screens: QR code, phone confirmation, possibly an error dialog, then the main chat window. The automation needs to track where it is in this multi-step sequence separately from what’s currently on screen, because the two don’t always align.

The solution borrows directly from Redux: a reducer folds each identified UI state into an abstract AppState, and a plan picks the next action from that state.

The plan selects actions based on the current state, but never modifies state itself. This separation means the plan doesn’t assume a fixed sequence of screens; it just responds to whatever state the reducer produces.

graph TD
    subgraph s1 [" "]
        A["OBSERVE<br/>a11y tree + screenshot"] --> B["IDENTIFY<br/>match known UI state"]
        subgraph s2 [" "]
            B --> C["REDUCE<br/>update abstract AppState"]
            C --> D["SELECT<br/>plan picks next action"]
            D --> E["EXECUTE<br/>click / type / key / scroll"]
            E --> F{"Goal<br/>reached?"}
            F -->|Yes| G["Return result"]
        end
    end
    F -->|No| A
    classDef invisible fill:none,stroke:none
    class s1,s2 invisible

One iteration during login, when the QR code is on screen:

| Step | Inputs | Outputs | Side effects |
| --- | --- | --- | --- |
| Observe | Virtual display | A11y tree with QR image and “Scan to Log In” label | none |
| Identify | A11y tree | Matched state: QR login screen | none |
| Reduce | Previous state + identified state | LoginQr | none |
| Select | Current state + login plan | Extract QR action | none |
| Execute | Extract QR action | none | Send QR to client via WebSocket |

On the next iteration, if the user has scanned the code, the screen will have changed, so the reducer produces a new state and the plan adjusts. If a popup appeared instead, the loop handles it the same way.

The same pattern handles sending messages, opening chats, and logging out.
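
The observe, identify, reduce, select, execute loop can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the agent-server's code (which is written in Rust): every name here (`UiState`, `AppState`, `Action`, `reduce`, `loginPlan`, `runLogin`) is hypothetical, and the observations are passed in as a pre-recorded list rather than read from an accessibility tree.

```typescript
// Hypothetical types: what IDENTIFY might report for each screen.
type UiState =
  | { kind: "qrLogin"; qr: string }
  | { kind: "confirmOnPhone" }
  | { kind: "popup"; message: string }
  | { kind: "mainWindow" };

// Abstract state tracked across screens, separate from what is on screen now.
interface AppState {
  phase: "Unknown" | "LoginQr" | "AwaitingConfirm" | "Blocked" | "LoggedIn";
  qr: string | null;
  popupsSeen: number;
}

type Action =
  | { type: "extractQr"; qr: string }
  | { type: "dismissPopup" }
  | { type: "wait" }
  | { type: "done" };

// REDUCE: pure function of (previous state, identified UI state).
function reduce(prev: AppState, ui: UiState): AppState {
  switch (ui.kind) {
    case "qrLogin":
      return { ...prev, phase: "LoginQr", qr: ui.qr };
    case "confirmOnPhone":
      return { ...prev, phase: "AwaitingConfirm" };
    case "popup":
      return { ...prev, phase: "Blocked", popupsSeen: prev.popupsSeen + 1 };
    case "mainWindow":
      return { ...prev, phase: "LoggedIn", qr: null };
  }
}

// SELECT: the plan reads state and returns an action; it never mutates state.
function loginPlan(state: AppState): Action {
  if (state.phase === "Blocked") return { type: "dismissPopup" };
  if (state.phase === "LoggedIn") return { type: "done" };
  if (state.phase === "LoginQr" && state.qr !== null) {
    return { type: "extractQr", qr: state.qr };
  }
  return { type: "wait" };
}

// Drives the loop over a list of observations and records the chosen actions.
function runLogin(observations: UiState[]): Action[] {
  let state: AppState = { phase: "Unknown", qr: null, popupsSeen: 0 };
  const actions: Action[] = [];
  for (const ui of observations) {
    state = reduce(state, ui);
    const action = loginPlan(state);
    actions.push(action); // EXECUTE would click, type, or send the QR here
    if (action.type === "done") break;
  }
  return actions;
}
```

Because the plan only looks at the reduced state, the same code copes with a popup in the middle of login or with a session that is already logged in: `runLogin([{ kind: "mainWindow" }])` yields a single `done` action.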

Using agent-wechat: four ways in

There are four ways to talk to agent-wechat. All of them end up hitting the same REST + WebSocket API on the container.

CLI

The wx command gives you direct access from your terminal. Start by pulling and running the container (requires Docker Desktop or Colima):

$ wx up
Pulling ghcr.io/thisnick/agent-wechat:latest...
Starting agent-wechat container...
Container running on http://localhost:6174

$ wx auth login
Scan the QR code with WeChat:
█████████████████████████████
█████████████████████████████
█████████████████████████████

Login successful. User: Nick

$ wx chats list --limit 3
wxid_abc123   Nick           Hey! What's up?    22:53
wxid_def456   Workflowly     OpenClaw's bug..   21:58
wxid_ghi789   File Transfer                     21:30

$ wx messages send wxid_abc123 "Meeting at 3pm tomorrow?"
Sent.

$ wx messages list wxid_abc123 --limit 3
[22:54] → Meeting at 3pm tomorrow?
[22:53] ← Hey! What's up?
[22:50] ← Here?

Install with npm install -g @agent-wechat/cli. By default, the CLI connects to http://localhost:6174 and reads your token from ~/.config/agent-wechat/token. To point it at a remote instance:

export AGENT_WECHAT_URL=https://your-instance.agent-wx.app
export AGENT_WECHAT_TOKEN=your-token-here
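
If you write your own client, the same defaults are easy to mirror. The sketch below is an assumption about precedence (environment variables first, then the token file), not a description of the CLI's internals; `resolveConfig` and its parameters are hypothetical names, and the file reader is injected so the function stays free of I/O.

```typescript
interface ClientConfig {
  url: string;
  token: string | undefined;
}

const DEFAULT_URL = "http://localhost:6174";

// ASSUMED precedence: env vars override the defaults described above.
// `home` is the user's home directory; `readToken` returns the file's
// contents, or undefined if the file does not exist.
function resolveConfig(
  env: Record<string, string | undefined>,
  home: string,
  readToken: (path: string) => string | undefined,
): ClientConfig {
  const tokenPath = `${home}/.config/agent-wechat/token`;
  return {
    url: env["AGENT_WECHAT_URL"] || DEFAULT_URL,
    token: env["AGENT_WECHAT_TOKEN"] || readToken(tokenPath)?.trim(),
  };
}
```

In a Node program you would typically pass `process.env`, `os.homedir()`, and a small wrapper around `fs.readFileSync` as the three arguments.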

Wechaty Puppet

If you’re building a bot, the Wechaty puppet gives you an event-driven API. It implements Wechaty’s standard puppet interface, so you get message handlers, contact management, and room support.

import { WechatyBuilder } from 'wechaty'
import { PuppetAgentWeChat } from '@agent-wechat/wechaty-puppet'

const bot = WechatyBuilder.build({
  puppet: new PuppetAgentWeChat({
    serverUrl: 'http://localhost:6174',
    token: process.env.AGENT_WECHAT_TOKEN,
  }),
})

bot.on('scan', (qrcode, status) => {
  console.log(`Scan QR: https://wechaty.js.org/qrcode/${encodeURIComponent(qrcode)}`)
})

bot.on('login', (user) => {
  console.log(`Logged in as ${user.name()}`)
})

bot.on('message', async (msg) => {
  if (msg.text() === 'ping') {
    await msg.say('pong')
  }
})

await bot.start()

The puppet polls for new messages every 2 seconds and streams login events over WebSocket. It works against a local container or a remote hosted instance; just change the serverUrl.
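
Polling on a fixed interval means consecutive polls usually overlap, so a client has to avoid emitting the same message twice. One simple way to do that is to remember which IDs have been seen. The sketch below illustrates the idea only; it is not taken from the puppet's source, and `PolledMessage` and `createMessageTracker` are hypothetical names with an assumed message shape.

```typescript
// Assumed minimal message shape; the real API payload may differ.
interface PolledMessage {
  id: string;
  chatId: string;
  text: string;
  timestamp: number;
}

// Returns a function that filters each polled batch down to unseen messages.
function createMessageTracker() {
  const seen = new Set<string>();
  return function takeNew(batch: PolledMessage[]): PolledMessage[] {
    const fresh: PolledMessage[] = [];
    for (const m of batch) {
      if (seen.has(m.id)) continue; // already emitted in an earlier poll
      seen.add(m.id);
      fresh.push(m);
    }
    // Emit oldest first so handlers see messages in conversation order.
    return fresh.sort((a, b) => a.timestamp - b.timestamp);
  };
}
```

A caller would run `takeNew` on the result of each poll and fire a `message` event for every item returned. A long-running process would also want to cap or expire the `seen` set, which this sketch does not do.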

Wechaty Gateway

The gateway wraps the puppet in a gRPC service. If you don’t want to add @agent-wechat/wechaty-puppet as a dependency in your code, you can run the gateway as a sidecar and connect to it using the standard wechaty-puppet-service client. This is especially useful if you already have a Wechaty codebase and just want to point it at a hosted agent-wechat instance.

Connect to a remote gateway using wechaty-puppet-service:

import { existsSync, readFileSync } from 'fs'
import { PuppetService } from 'wechaty-puppet-service'

// Use system CAs for TLS verification
for (const p of ['/etc/ssl/cert.pem', '/etc/ssl/certs/ca-certificates.crt']) {
  if (existsSync(p)) {
    process.env.WECHATY_PUPPET_SERVICE_TLS_CA_CERT = readFileSync(p, 'utf-8')
    break
  }
}

const endpoint = 'your-instance.agent-wx.app:8443'
const token = process.env.WECHATY_TOKEN // provided as-is, includes SNI prefix

const puppet = new PuppetService({
  endpoint,
  token,
  tls: { serverName: endpoint.split(':')[0] },
})

puppet.on('message', async (payload) => {
  const msg = await puppet.messagePayload(payload.messageId)
  console.log(`[${msg.talkerId}] ${msg.text}`)
})

await puppet.start()

If you’re using a hosted instance, the token we provide already includes the SNI prefix, so you can pass it directly.

OpenClaw

OpenClaw is an open-source AI personal assistant you interact with through messaging platforms. The agent-wechat plugin adds WeChat as a channel, so your assistant can send and receive WeChat messages, handle images, and manage group conversations.

Install and configure it:

# Install the extension
openclaw plugins install @agent-wechat/wechat

# Add WeChat as a channel (defaults to localhost:6174)
openclaw channels add --channel wechat

# Or with a remote server
openclaw channels add --channel wechat --url <url> --token <token>

# Restart the gateway to pick up the new channel
openclaw gateway restart

Once running, tell your agent “Log in to WeChat” in whatever channel you’ve set up (Slack, Telegram, etc.). The agent will generate a QR code image right in the chat. Scan it with WeChat on your phone, confirm, and the session is live. You only need to do this once; the session persists across container restarts.

The plugin supports DM and group chat policies (open, allowlist, or disabled), and per-group mention requirements.
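
To show how these policies might combine, here is a small TypeScript sketch of a message gate. The configuration shape and every name in it (`ChannelPolicy`, `Incoming`, `shouldHandle`, `requireMention`) are hypothetical and chosen for illustration; the plugin's actual configuration keys may differ.

```typescript
type Policy = "open" | "allowlist" | "disabled";

// Hypothetical config shape, not the plugin's real schema.
interface ChannelPolicy {
  dm: Policy;
  group: Policy;
  allowlist: string[]; // chat IDs accepted when a policy is "allowlist"
  requireMentionDefault: boolean;
  requireMention: Record<string, boolean>; // per-group override
}

interface Incoming {
  chatId: string;
  isGroup: boolean;
  mentionsBot: boolean;
}

// Decides whether the assistant should respond to an incoming message.
function shouldHandle(policy: ChannelPolicy, msg: Incoming): boolean {
  const mode = msg.isGroup ? policy.group : policy.dm;
  if (mode === "disabled") return false;
  if (mode === "allowlist" && !policy.allowlist.includes(msg.chatId)) return false;
  if (msg.isGroup) {
    const needsMention = policy.requireMention[msg.chatId] ?? policy.requireMentionDefault;
    if (needsMention && !msg.mentionsBot) return false;
  }
  return true;
}
```

With DMs open, groups on an allowlist, and mentions required by default, the assistant answers any direct message but only speaks in an approved group when it is mentioned, unless that group's override turns the mention requirement off.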

Hosting: run it yourself or let me run it for you

Self-hosting

The simplest setup is Docker Compose:

# Generate a token first:
#   mkdir -p ~/.config/agent-wechat
#   openssl rand -hex 32 > ~/.config/agent-wechat/token

services:
  agent-wechat:
    image: ghcr.io/thisnick/agent-wechat:latest
    security_opt:
      - seccomp=unconfined
    cap_add:
      - SYS_PTRACE
      - NET_ADMIN
    ports:
      - "6174:6174"
    volumes:
      - agent-wechat-data:/data
      - agent-wechat-home:/home/wechat
      - ~/.config/agent-wechat/token:/data/auth-token:ro
    environment:
      - PROXY=${PROXY:-}
    restart: unless-stopped

volumes:
  agent-wechat-data:
  agent-wechat-home:

SYS_PTRACE and seccomp=unconfined mean you need a real VM or a container runtime that allows privileged capabilities. Cloud Run, Fargate, and similar serverless container platforms won’t work.

Avoiding datacenter IP detection

Cloud providers use datacenter IPs, and WeChat may flag them. If you’re hosting in the cloud, route your outgoing traffic through a residential proxy. Set the PROXY environment variable and the container uses redsocks to transparently route all traffic through it. A residential proxy with sticky sessions works best, since you want WeChat to see a consistent, residential IP.

Don’t want to self-host?

I can set up a hosted instance for you, running on GCE with residential proxy routing already configured. You get a URL and a token. Plug them into the CLI, Wechaty Puppet, or OpenClaw and you’re up. The Wechaty Gateway is available on a separate port for existing Wechaty users.

The hosting fee supports ongoing development. WeChat ships new binaries regularly, and each update means finding new memory offsets for key extraction.

Interested? Reach out on GitHub or at nick@thisnick.me.


The code is open source at thisnick/agent-wechat. The previous post covers how the reverse engineering worked. Questions? nick@thisnick.me.