How Agent-WeChat Is Architected

How agent-wechat runs WeChat in a container, automates its UI with a state machine, and exposes it all through four client libraries.

In the previous post, I covered the reverse-engineering story: using Claude Code and Frida to crack open WeChat’s encryption, extract database keys from memory, and build a programmatic interface to the official client. This post is about how the system actually works, and how you’d use it.

Architecture: one API, many clients

Two problems drive the architecture:

WeChat needs a controlled environment. As we established in the previous post, UI automation, database reads, and memory instrumentation all need to run alongside the WeChat binary. A Docker container gives us that: one container = one isolated WeChat instance with everything it needs.

Different use cases need different interfaces. A CLI is great for quick control. OpenClaw needs a channel plugin. Wechaty users want to connect their existing bots. Rather than building each of these into the container, we put a REST + WebSocket API in front of everything and let each client talk to it in its own idiomatic way.

graph LR
    subgraph Clients
        CLI["CLI (wx)"]
        WP["Wechaty Puppet"]
        OC["OpenClaw Plugin"]
        WC2["Wechaty Client"]
    end

    subgraph Gateway Container
        GW["Wechaty Gateway<br/>(gRPC)"]
    end

    subgraph Agent Container
        AS["agent-server<br/>(Rust/Axum)"]
        WC["WeChat Binary"]
        XV["Xvfb<br/>(Virtual Display)"]
        DB["SQLCipher DBs"]
    end

    CLI -->|REST + WS| AS
    WP -->|REST + WS| AS
    OC -->|REST + WS| AS
    WC2 -->|gRPC| GW
    GW -->|REST + WS| AS

    AS -->|AT-SPI| WC
    AS -->|Frida| WC
    AS -->|SQLCipher| DB
    WC --> XV
    WC -->|Read/Write| DB

One container = one WeChat instance. The design puts all the intelligence in the container: the agent-server handles database reads, memory instrumentation, and UI automation, so clients don’t need to know anything about WeChat internals. They just make API calls. This means you can swap clients freely, run the container locally or in the cloud, and the same server handles everything.

The agent-server is written in Rust to keep resource usage low, since it shares a container with WeChat itself. The Wechaty Gateway is a separate container that bridges existing Wechaty clients (over gRPC) to the agent-server’s REST API, so you can plug agent-wechat into a Wechaty codebase without changing anything.

A few examples of what the API looks like:

# Check login status
GET /api/status/auth

# List recent chats
GET /api/chats?limit=20

# Send a message
POST /api/messages/send  { "chatId": "wxid_abc123", "text": "hello" }

# Get messages from a chat
GET /api/messages/wxid_abc123?limit=50

# Download media from a message
GET /api/messages/wxid_abc123/media/12345

When a client issues a command, the agent-server coordinates the work: it reads the database, instruments memory with Frida, or drives the UI, and then returns the result (or a timeout error) to the client. All the complexity stays inside the container.
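
To make the request shapes concrete, here is a minimal TypeScript sketch that builds three of the calls listed above. Only the paths and the send payload come from the examples. The `ApiRequest` type, the helper names, and the `Authorization: Bearer` header are assumptions for illustration, so check the server’s documentation for the real auth scheme.

```typescript
// Hypothetical request description; not part of any agent-wechat package.
interface ApiRequest {
  method: 'GET' | 'POST'
  url: string
  headers: Record<string, string>
  body?: string
}

// Assumption: the token travels as a Bearer token.
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` }
}

// GET /api/chats?limit=N
function listChats(baseUrl: string, token: string, limit: number): ApiRequest {
  return {
    method: 'GET',
    url: `${baseUrl}/api/chats?limit=${limit}`,
    headers: authHeaders(token),
  }
}

// GET /api/messages/<chatId>?limit=N
function listMessages(baseUrl: string, token: string, chatId: string, limit: number): ApiRequest {
  return {
    method: 'GET',
    url: `${baseUrl}/api/messages/${encodeURIComponent(chatId)}?limit=${limit}`,
    headers: authHeaders(token),
  }
}

// POST /api/messages/send with a { chatId, text } JSON body
function sendMessage(baseUrl: string, token: string, chatId: string, text: string): ApiRequest {
  return {
    method: 'POST',
    url: `${baseUrl}/api/messages/send`,
    headers: { ...authHeaders(token), 'Content-Type': 'application/json' },
    body: JSON.stringify({ chatId, text }),
  }
}
```

Each returned object maps directly onto a `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` call. Keeping request construction separate from the network call makes the shapes easy to inspect and test.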

UI state automation: react to what you see, not what you expect

Two problems make UI-based RPA hard, and both need to be solved for automation to be reliable:

UI state is non-deterministic. You can’t predict what screen you’ll see next. Network errors cause popups. The user scans a QR code but doesn’t confirm on their phone. The app is already logged in from a previous session. These are all external factors outside the automation’s control, and any of them can derail a script that assumes a fixed sequence of screens.

Commands span multiple UI states. A single operation like “login” touches several screens: QR code, phone confirmation, possibly an error dialog, then the main chat window. The automation needs to track where it is in this multi-step sequence separately from what’s currently on screen, because the two don’t always align.

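The solution borrows directly from Redux: state changes and action selection are kept strictly apart. A pure reducer folds each observation of the screen into an abstract AppState, and a plan reads that state to decide what to do next.

The plan selects actions based on the current state, but never modifies state itself. This separation means the plan doesn’t assume a fixed sequence of screens; it just responds to whatever state the reducer produces.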

graph TD
    subgraph s1 [" "]
        A["OBSERVE<br/>a11y tree + screenshot"] --> B["IDENTIFY<br/>match known UI state"]
        subgraph s2 [" "]
            B --> C["REDUCE<br/>update abstract AppState"]
            C --> D["SELECT<br/>plan picks next action"]
            D --> E["EXECUTE<br/>click / type / key / scroll"]
            E --> F{"Goal<br/>reached?"}
            F -->|Yes| G["Return result"]
        end
    end
    F -->|No| A
    classDef invisible fill:none,stroke:none
    class s1,s2 invisible

Each pass through the loop, the server captures the accessibility tree via AT-SPI (and optionally a screenshot), then pattern-matches it against known UI states such as the QR login screen, the chat window, or a popup dialog. A pure reducer function takes the previous AppState plus this observation and produces a new state, just like a Redux reducer. The active plan then picks the next action based on that state and its goal, and the execution engine carries it out: click a button, type text, press a key, scroll.

During login, for instance, the loop observes a QR code on screen, reduces state to LoginQr, and the plan extracts the QR image and sends it to the client over WebSocket. On the next iteration, if the user has scanned the code, the screen has changed, so the reducer produces a new state and the plan adjusts. If a popup appeared instead, the same thing happens: new observation, new state, new action. The same pattern handles sending messages, opening chats, and logging out.
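
The reducer/plan split can be sketched in a few lines. This is an illustration only: the real agent-server is written in Rust, and every name here (`UiState`, `AppState`, `Action`, `loginPlan`, `runLoop`) is hypothetical apart from `LoginQr`. The sketch replaces the OBSERVE and IDENTIFY steps with a scripted list of screens so the loop can run without a UI.

```typescript
// Hypothetical UI states the IDENTIFY step might recognise.
type UiState = 'LoginQr' | 'AwaitingConfirm' | 'Popup' | 'ChatWindow'

interface AppState {
  screen: UiState | 'Unknown'
  qrSent: boolean
  loggedIn: boolean
}

type Action =
  | { kind: 'sendQrToClient' }
  | { kind: 'click'; target: string }
  | { kind: 'wait' }
  | { kind: 'done' }

interface Observation {
  screen: UiState
  lastAction: Action['kind'] | null
}

// REDUCE: a pure function of (previous state, observation). No I/O, no mutation.
function reduce(prev: AppState, obs: Observation): AppState {
  return {
    screen: obs.screen,
    qrSent: prev.qrSent || obs.lastAction === 'sendQrToClient',
    loggedIn: obs.screen === 'ChatWindow',
  }
}

// SELECT: the plan reads state and returns an action; it never writes state.
function loginPlan(state: AppState): Action {
  if (state.loggedIn) return { kind: 'done' }
  switch (state.screen) {
    case 'Popup':
      return { kind: 'click', target: 'dismiss' }
    case 'LoginQr':
      return state.qrSent ? { kind: 'wait' } : { kind: 'sendQrToClient' }
    default:
      return { kind: 'wait' }
  }
}

// Drives the loop over scripted screens and records which actions were chosen.
function runLoop(screens: UiState[]): Action['kind'][] {
  let state: AppState = { screen: 'Unknown', qrSent: false, loggedIn: false }
  let lastAction: Action['kind'] | null = null
  const trace: Action['kind'][] = []
  for (const screen of screens) {
    state = reduce(state, { screen, lastAction })
    const action = loginPlan(state)
    trace.push(action.kind)
    if (action.kind === 'done') break
    lastAction = action.kind
  }
  return trace
}
```

Feeding it `['LoginQr', 'LoginQr', 'Popup', 'AwaitingConfirm', 'ChatWindow']` yields `sendQrToClient`, `wait`, `click`, `wait`, `done`. The popup in the middle needs no special casing: the plan never assumed an order of screens, so an unexpected one is just another state to respond to.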

Using agent-wechat: four ways in

There are four client libraries, all hitting the same REST + WebSocket API on the container. Use the CLI for quick terminal control, the Wechaty Puppet for building bots, the Wechaty Gateway to plug into an existing Wechaty codebase without new dependencies, or the OpenClaw plugin if you already run an OpenClaw agent.

CLI

The wx command gives you direct access from your terminal. Start by pulling and running the container (requires Docker Desktop or Colima):

$ wx up
Pulling ghcr.io/thisnick/agent-wechat:latest...
Starting agent-wechat container...
Container running on http://localhost:6174

$ wx auth login
Scan the QR code with WeChat:
█████████████████████████████
█████████████████████████████
█████████████████████████████

Login successful. User: Nick

$ wx chats list --limit 3
wxid_abc123   Nick           Hey! What's up?    22:53
wxid_def456   Workflowly     OpenClaw's bug..   21:58
wxid_ghi789   File Transfer                     21:30

$ wx messages send wxid_abc123 "Meeting at 3pm tomorrow?"
Sent.

$ wx messages list wxid_abc123 --limit 3
[22:54] → Meeting at 3pm tomorrow?
[22:53] ← Hey! What's up?
[22:50] ← Here?

Install with npm install -g @agent-wechat/cli. By default, the CLI connects to http://localhost:6174 and reads your token from ~/.config/agent-wechat/token. To point it at a remote instance:

export AGENT_WECHAT_URL=https://your-instance.agent-wx.app
export AGENT_WECHAT_TOKEN=your-token-here
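
If you are writing your own client and want the same defaults, the lookup could be sketched as below. The precedence (environment variables first, then the token file and the localhost URL) is an assumption about how the CLI behaves, and `resolveConfig` and `ClientConfig` are hypothetical names, not part of the CLI package.

```typescript
interface ClientConfig {
  url: string
  token: string | undefined
}

const DEFAULT_URL = 'http://localhost:6174'

// env: the process environment; tokenFile: contents of ~/.config/agent-wechat/token, if it exists.
// Assumption: environment variables override the defaults.
function resolveConfig(
  env: Record<string, string | undefined>,
  tokenFile: string | undefined,
): ClientConfig {
  // Drop trailing slashes so paths can be appended as `${url}/api/...`.
  const url = (env['AGENT_WECHAT_URL'] ?? DEFAULT_URL).replace(/\/+$/, '')
  // Token files written with `openssl rand -hex 32 > token` end in a newline, so trim.
  const token = env['AGENT_WECHAT_TOKEN'] ?? tokenFile?.trim()
  return { url, token }
}
```

In Node you would call it with `process.env` and the file contents read via `fs.readFileSync`, passing `undefined` when the file is missing.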

Wechaty Puppet

If you’re building a bot, the Wechaty puppet gives you an event-driven API. It implements Wechaty’s standard puppet interface, so you get message handlers, contact management, and room support.

import { WechatyBuilder } from 'wechaty'
import { PuppetAgentWeChat } from '@agent-wechat/wechaty-puppet'

const bot = WechatyBuilder.build({
  puppet: new PuppetAgentWeChat({
    serverUrl: 'http://localhost:6174',
    token: process.env.AGENT_WECHAT_TOKEN,
  }),
})

bot.on('scan', (qrcode, status) => {
  console.log(`Scan QR: https://wechaty.js.org/qrcode/${encodeURIComponent(qrcode)}`)
})

bot.on('login', (user) => {
  console.log(`Logged in as ${user.name()}`)
})

bot.on('message', async (msg) => {
  if (msg.text() === 'ping') {
    await msg.say('pong')
  }
})

await bot.start()

The puppet polls for new messages every 2 seconds and streams login events over WebSocket. It works against a local container or a remote hosted instance; just change the serverUrl.
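
Polling on a fixed interval means consecutive polls can return overlapping results, so a poller has to avoid delivering the same message twice. The sketch below shows one way to do that by tracking message IDs. It is not the puppet’s actual implementation: `PolledMessage`, `MessagePoller`, and `startPolling` are hypothetical, and only the 2-second interval comes from the text above.

```typescript
// Hypothetical message shape; the real API response may differ.
interface PolledMessage {
  id: string
  text: string
}

class MessagePoller {
  private seen = new Set<string>()

  // Returns only the messages that no earlier call has already returned.
  accept(batch: PolledMessage[]): PolledMessage[] {
    const fresh: PolledMessage[] = []
    for (const m of batch) {
      if (this.seen.has(m.id)) continue
      this.seen.add(m.id)
      fresh.push(m)
    }
    return fresh
  }
}

// Calls fetchBatch every intervalMs and forwards unseen messages to onMessage.
// Returns a function that stops the polling.
function startPolling(
  fetchBatch: () => Promise<PolledMessage[]>,
  onMessage: (m: PolledMessage) => void,
  intervalMs = 2000,
): () => void {
  const poller = new MessagePoller()
  const timer = setInterval(async () => {
    try {
      for (const m of poller.accept(await fetchBatch())) onMessage(m)
    } catch (err) {
      console.error('poll failed, will retry on the next tick', err)
    }
  }, intervalMs)
  return () => clearInterval(timer)
}
```

A long-running bot would also want to cap the size of the `seen` set, for example by keeping only recent IDs.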

Wechaty Gateway

The gateway wraps the puppet in a gRPC service. If you don’t want to add @agent-wechat/wechaty-puppet as a dependency in your code, you can run the gateway as a sidecar and connect to it using the standard wechaty-puppet-service client. This is especially useful if you already have a Wechaty codebase and just want to point it at a hosted agent-wechat instance.

Connect to a remote gateway using wechaty-puppet-service:

import { existsSync, readFileSync } from 'fs'
import { PuppetService } from 'wechaty-puppet-service'

// Use system CAs for TLS verification
for (const p of ['/etc/ssl/cert.pem', '/etc/ssl/certs/ca-certificates.crt']) {
  if (existsSync(p)) {
    process.env.WECHATY_PUPPET_SERVICE_TLS_CA_CERT = readFileSync(p, 'utf-8')
    break
  }
}

const endpoint = 'your-instance.agent-wx.app:8443'
const token = process.env.WECHATY_TOKEN // provided as-is, includes SNI prefix

const puppet = new PuppetService({
  endpoint,
  token
})

puppet.on('message', async (payload) => {
  const msg = await puppet.messagePayload(payload.messageId)
  console.log(`[${msg.talkerId}] ${msg.text}`)
})

await puppet.start()

If you’re using a hosted instance, the token we provide already includes the SNI prefix, so you can pass it directly.

OpenClaw Plugin

OpenClaw is an open-source AI personal assistant you interact with through messaging platforms. The agent-wechat plugin adds WeChat as a channel, so your assistant can send and receive WeChat messages, handle images, and manage group conversations.

Install and configure it:

# Install the extension
openclaw plugins install @agent-wechat/wechat

# Add WeChat as a channel (defaults to localhost:6174)
openclaw channels add --channel wechat

# Or with a remote server
openclaw channels add --channel wechat --url <url> --token <token>

# Restart the gateway to pick up the new channel
openclaw gateway restart

Once running, tell your agent “Log in to WeChat” in whatever channel you’ve set up (Slack, Telegram, etc.). The agent will generate a QR code image right in the chat. Scan it with WeChat on your phone, confirm, and the session is live. You only need to do this once; the session persists across container restarts.

The plugin supports DM and group chat policies (open, allowlist, or disabled), and per-group mention requirements.
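
As a rough illustration of how these policies combine, here is a sketch of the decision for a single incoming message. The config shape and the `shouldHandle` function are hypothetical; only the three policy values and the per-group mention requirement come from the plugin description above.

```typescript
type Policy = 'open' | 'allowlist' | 'disabled'

// Hypothetical config shape; the plugin's real configuration keys may differ.
interface ChannelPolicy {
  dm: Policy
  group: Policy
  allowlist: string[] // chat IDs accepted under the 'allowlist' policy
  requireMention: Record<string, boolean> // group ID -> bot must be mentioned
}

interface Incoming {
  chatId: string
  isGroup: boolean
  mentionsBot: boolean
}

// Decides whether the assistant should respond to this message.
function shouldHandle(cfg: ChannelPolicy, msg: Incoming): boolean {
  const policy = msg.isGroup ? cfg.group : cfg.dm
  if (policy === 'disabled') return false
  if (policy === 'allowlist' && !cfg.allowlist.includes(msg.chatId)) return false
  // The mention requirement only applies to groups that opted in.
  if (msg.isGroup && cfg.requireMention[msg.chatId] && !msg.mentionsBot) return false
  return true
}
```

With `dm: 'open'`, `group: 'allowlist'`, and one allowlisted group that requires mentions, every DM is handled, while group messages are handled only in that group and only when the bot is mentioned.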

Hosting

Self-hosting

The simplest setup is Docker Compose:

# Generate a token first:
#   mkdir -p ~/.config/agent-wechat
#   openssl rand -hex 32 > ~/.config/agent-wechat/token

services:
  agent-wechat:
    image: ghcr.io/thisnick/agent-wechat:latest
    security_opt:
      - seccomp=unconfined
    cap_add:
      - SYS_PTRACE
      - NET_ADMIN
    ports:
      - "6174:6174"
    volumes:
      - agent-wechat-data:/data
      - agent-wechat-home:/home/wechat
      - ~/.config/agent-wechat/token:/data/auth-token:ro
    environment:
      - PROXY=${PROXY:-}
    restart: unless-stopped

volumes:
  agent-wechat-data:
  agent-wechat-home:

SYS_PTRACE and seccomp=unconfined mean you need a real VM or a container runtime that allows privileged capabilities. Cloud Run, Fargate, and similar serverless container platforms won’t work.

Avoiding datacenter IP detection

Cloud providers use datacenter IPs, and WeChat may flag them. If you’re hosting in the cloud, route your outgoing traffic through a residential proxy. Set the PROXY environment variable and the container uses redsocks to transparently route all traffic through it. A residential proxy with sticky sessions works best, since you want WeChat to see a consistent, residential IP.

Hosted option

If you’d rather skip the ops, we can set up a hosted instance on GCE with residential proxy routing already configured. You get a URL and a token; plug them into the CLI, Wechaty Puppet, or OpenClaw and you’re up. Wechaty Gateway is available on a separate port for existing Wechaty users. The hosting fee supports ongoing development: WeChat ships new binaries regularly, and each update means finding new memory offsets for key extraction.


The code is open source at thisnick/agent-wechat. Questions or interested in hosting? nick@thisnick.me.