Wire Protocol
Typed NDJSON messages. Every interaction is a standardized request/response pair over a single transport.
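As a rough illustration of what a typed NDJSON request/response pair could look like, here is a minimal sketch. The envelope fields (`type`, `req_id`, `payload`) and the `/res` suffix on the response type are assumptions made for this example; only the `env/step/req` type name and the existence of a `req_id` come from the text on this page.

```python
import io
import json
import uuid


def encode_message(msg_type, payload, req_id=None):
    """Serialize one message as a single NDJSON line (JSON object + newline)."""
    envelope = {
        "type": msg_type,                      # e.g. "env/step/req"
        "req_id": req_id or uuid.uuid4().hex,  # assumed correlation field
        "payload": payload,                    # assumed body field
    }
    # json.dumps escapes newlines inside strings, so a message never spans lines.
    return json.dumps(envelope, separators=(",", ":")) + "\n"


def decode_stream(stream):
    """Yield one dict per non-empty line of an NDJSON text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# A request and its response share a req_id; the "/res" suffix is an assumption.
request_line = encode_message(
    "env/step/req", {"episode_id": "ep-1", "action": 2}, req_id="r1"
)
response_line = encode_message(
    "env/step/res", {"reward": 1.0, "done": False}, req_id="r1"
)

messages = list(decode_stream(io.StringIO(request_line + response_line)))
```

Because every message is one line, a reader can split the stream on newlines and route each object by its `type` without any extra framing.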
Plugin Runtime
Thin execution kernel that loads plugins, routes messages by type, and manages session lifecycle.
Capability Namespaces
10 standard namespaces — env, tools, memory, learn, proc, skills, toolkits, chains, mcp, subagents.
Like POSIX for agents — the protocol is the interface. The implementations are swappable plugins.
type_to_plugins — every plugin declares what it handles, and the executor dispatches.
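The dispatch idea can be sketched in a few lines. This is not the project's executor: the `handles` attribute, the `register` and `dispatch` method names, and the "first registered plugin wins" policy are all assumptions chosen for illustration. Only the `type_to_plugins` name and the message types come from this page.

```python
from collections import defaultdict


class EchoToolsPlugin:
    """Hypothetical plugin that declares the message types it handles."""

    handles = ("tool/list/req", "tool/call/req")

    def handle(self, message):
        if message["type"] == "tool/list/req":
            return {"tools": ["echo"]}
        # tool/call/req: echo the arguments back
        return {"output": message["payload"]["args"]}


class Executor:
    """Minimal router: message type -> list of plugins that declared it."""

    def __init__(self):
        self.type_to_plugins = defaultdict(list)

    def register(self, plugin):
        for msg_type in plugin.handles:
            self.type_to_plugins[msg_type].append(plugin)

    def dispatch(self, message):
        plugins = self.type_to_plugins.get(message["type"])
        if not plugins:
            return {"error": f"no plugin handles {message['type']}"}
        # Assumed policy: the first registered plugin answers.
        return plugins[0].handle(message)


executor = Executor()
executor.register(EchoToolsPlugin())
listing = executor.dispatch({"type": "tool/list/req", "payload": {}})
unrouted = executor.dispatch({"type": "env/step/req", "payload": {}})
```

Adding a capability in this sketch means registering one more plugin; the executor itself does not change.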
Without a protocol, agent code is coupled to every backend it talks to. With one, the protocol absorbs the churn: move the memory plugin from in-memory to SQLite to Pinecone and the agent code does not change. The only cost is swapping the plugin.
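A minimal sketch of that swap, assuming a two-method `store`/`recall` interface named after the `memory/store/req` and `memory/recall/req` messages. The class names and the table schema are hypothetical, and the SQLite variant uses only Python's standard `sqlite3` module.

```python
import json
import sqlite3


class InMemoryStore:
    """Hypothetical memory backend backed by a dict."""

    def __init__(self):
        self._data = {}

    def store(self, key, value):
        self._data[key] = value

    def recall(self, key):
        return self._data.get(key)


class SQLiteStore:
    """Hypothetical memory backend backed by SQLite; same two methods."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)"
        )

    def store(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._db.commit()

    def recall(self, key):
        row = self._db.execute(
            "SELECT value FROM memory WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def agent_turn(memory):
    """Agent code: identical regardless of which backend is plugged in."""
    memory.store("last_action", {"tool": "read_file"})
    return memory.recall("last_action")


from_dict = agent_turn(InMemoryStore())
from_sqlite = agent_turn(SQLiteStore())
```

`agent_turn` never imports `sqlite3` and never learns which backend answered, which is the property the paragraph above describes.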
learn/adapt/req runs in the same message flow as env/step/req. No separate training pipeline. No offline batch job. No vendor support ticket. The agent improves between turns.
Every plugin is a Python class you control. The env backend is yours. The adaptation strategy is yours. The audit log is yours. Because training data is just learn/feedback/req messages in your own JSONL, you own it outright instead of renting access to it.
| Namespace | What It Does | Key Messages |
|---|---|---|
| env | RL environments: step, reset, observe, reward | env/reset/req, env/step/req, env/observe/req |
| tools | Named callable functions with structured I/O | tool/list/req, tool/call/req |
| memory | Three-tier: working, episodic, semantic | memory/store/req, memory/recall/req |
| learn | Feedback, experience replay, adaptation | learn/feedback/req, learn/adapt/req |
| proc | Long-running subprocess management | proc/spawn/req, proc/signal/req |
| skills | Named, versioned, sandboxed execution | skill/call/req, skill/list/req |
| chains | DAG pipelines with branching | chain/execute/req, chain/list/req |
| mcp / subagents / toolkits | MCP bridge, multi-agent orchestration, tool bundles | Extension namespaces via plugin registration |
Every capability shares the same transport. One NDJSON connection, one A2EClient, N capability wrappers. No separate connections to a tool server, a memory server, and an env server.
Swap the transport config (HTTP → Direct → Subprocess). Or swap the plugin on the host. The agent code never changes.
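One way to picture "one client, N capability wrappers" is the sketch below. `SketchClient` is a stand-in, not the real A2EClient, and the transport classes, the `request` method, and the envelope fields are assumptions for illustration. The second transport round-trips each message through a JSON line to mimic a wire without opening a socket.

```python
import json


class DirectTransport:
    """In-process transport: hands the message dict straight to a handler."""

    def __init__(self, handler):
        self._handler = handler

    def send(self, message):
        return self._handler(message)


class NdjsonLoopbackTransport:
    """Serializes each message to one JSON line and back, as a wire would."""

    def __init__(self, handler):
        self._handler = handler

    def send(self, message):
        line = json.dumps(message) + "\n"
        reply = self._handler(json.loads(line))
        return json.loads(json.dumps(reply))


class Capability:
    """Thin wrapper that builds '<prefix>/<verb>/req' message types."""

    def __init__(self, client, prefix):
        self._client, self._prefix = client, prefix

    def request(self, verb, **payload):
        return self._client.request(f"{self._prefix}/{verb}/req", payload)


class SketchClient:
    """Hypothetical client: one transport shared by every wrapper."""

    def __init__(self, transport):
        self._transport = transport
        self._next_id = 0
        self.tools = Capability(self, "tool")
        self.memory = Capability(self, "memory")
        self.env = Capability(self, "env")

    def request(self, msg_type, payload):
        self._next_id += 1
        message = {"type": msg_type, "req_id": self._next_id, "payload": payload}
        return self._transport.send(message)


def fake_host(message):
    """Stand-in for the host: reports which message type it received."""
    return {"req_id": message["req_id"], "handled": message["type"]}


def agent(client):
    """Agent code: unchanged no matter which transport is configured."""
    return [
        client.tools.request("call", name="read_file")["handled"],
        client.env.request("step", episode_id="ep-1", action=0)["handled"],
    ]


direct_run = agent(SketchClient(DirectTransport(fake_host)))
wire_run = agent(SketchClient(NdjsonLoopbackTransport(fake_host)))
```

Both runs produce the same result because the agent only ever talks to the wrappers, never to the transport.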
tools.call("read_file", {path})
Call named functions with structured I/O. Results flow back as ToolResult with success/failure, output, and timing.
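A result object carrying success/failure, output, and timing might look roughly like this. `ToolResultSketch` and its field names are assumptions, not the project's ToolResult, and `call_tool` with its registry dict is a hypothetical helper.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResultSketch:
    """Assumed shape: success flag, output or error, and elapsed time."""

    success: bool
    output: Any = None
    error: Optional[str] = None
    elapsed_ms: float = 0.0


def call_tool(registry, name, args):
    """Look up a named tool, run it, and wrap the outcome with timing."""
    start = time.perf_counter()
    try:
        result = ToolResultSketch(success=True, output=registry[name](**args))
    except Exception as exc:
        # Unknown tool names and tool failures both become failed results.
        result = ToolResultSketch(
            success=False, error=f"{type(exc).__name__}: {exc}"
        )
    result.elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result


registry = {"word_count": lambda text: len(text.split())}
ok = call_tool(registry, "word_count", {"text": "one two three"})
missing = call_tool(registry, "read_file", {"path": "notes.txt"})
```

Returning a failed result instead of raising keeps the response a plain message that can travel back over the same connection.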
env.step(episode_id, action)
Send an action, receive next_state, reward, and done. This is the core RL primitive: the same message serves both evaluation and training.
skills.call("analyze", {input})
Sandboxed execution units. Versioned, tagged, with streaming event support for long-running operations.
chains.run(nodes, entry_node)
DAG pipelines with branching, fan-out, and node-level event streaming. For complex multi-step reasoning.
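To make branching and fan-out concrete, here is a small sketch of a chain runner. The node format (a dict with `fn` and an optional `next` callable) and the `run_chain` function are assumptions for this example and say nothing about the actual `chains.run` payload. Each executed node emits one `(node, output)` event.

```python
from collections import deque


def run_chain(nodes, entry_node, value):
    """Walk the graph from entry_node; return a list of (node, output) events."""
    events = []
    queue = deque([(entry_node, value)])
    while queue:
        name, inp = queue.popleft()
        node = nodes[name]
        out = node["fn"](inp)
        events.append((name, out))  # node-level event
        # "next" maps this node's output to successor names.
        # One name is a branch; several names are a fan-out.
        choose_next = node.get("next", lambda _out: [])
        for successor in choose_next(out):
            queue.append((successor, out))
    return events


nodes = {
    "classify": {
        "fn": lambda text: "long" if len(text) > 10 else "short",
        "next": lambda label: ["summarize", "log"] if label == "long" else ["log"],
    },
    "summarize": {"fn": lambda label: f"summary for {label} input"},
    "log": {"fn": lambda label: f"logged {label}"},
}

long_events = run_chain(nodes, "classify", "a fairly long input string")
short_events = run_chain(nodes, "classify", "hi")
```

This sketch does not join branches: a node reachable by two paths would run once per path, so a real runner would need to track completed nodes.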
Feedback — learn.feedback(polarity, score, dimension, source) records a structured evaluation signal tied to the action just taken.
Memory — memory.remember(key, value, tier="episodic") stores the turn's trajectory context for future retrieval.
Adapt — learn.adapt(strategy="ucb1") updates routing weights every 5 steps. The agent improves between turns.
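The adapt step names UCB1, a standard bandit rule that scores each option as its mean reward plus an exploration bonus of sqrt(2 ln N / n). The sketch below applies it to routing weights under a few assumptions: feedback scores lie in [0, 1], the options being routed between are called "arms", and the class and method names are hypothetical. The five-step interval mirrors the text above.

```python
import math


class Ucb1Router:
    """Hypothetical router: recomputes UCB1 weights every `adapt_every` steps."""

    def __init__(self, arms, adapt_every=5):
        self.counts = {arm: 0 for arm in arms}
        self.totals = {arm: 0.0 for arm in arms}
        # Untried arms get infinite weight so each is tried at least once.
        self.weights = {arm: float("inf") for arm in arms}
        self.adapt_every = adapt_every
        self.steps = 0

    def feedback(self, arm, score):
        """Record one score in [0, 1] for the arm that was just used."""
        self.counts[arm] += 1
        self.totals[arm] += score
        self.steps += 1
        if self.steps % self.adapt_every == 0:
            self.adapt()

    def adapt(self):
        """UCB1: mean reward + sqrt(2 * ln(total pulls) / pulls of this arm)."""
        total = sum(self.counts.values())
        for arm, count in self.counts.items():
            if count == 0:
                self.weights[arm] = float("inf")
            else:
                mean = self.totals[arm] / count
                bonus = math.sqrt(2.0 * math.log(total) / count)
                self.weights[arm] = mean + bonus

    def choose(self):
        return max(self.weights, key=self.weights.get)


router = Ucb1Router(["fast_model", "careful_model"])
for score in (1.0, 1.0, 1.0):
    router.feedback("fast_model", score)
for score in (0.0, 0.0):
    router.feedback("careful_model", score)  # fifth step triggers adapt()
best = router.choose()
```

The exploration bonus shrinks as an arm accumulates feedback, so a rarely used option keeps getting occasional traffic even when its average score is lower.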
Subclass EnvPlugin, implement on_reset() and on_step(). The base class handles protocol dispatch, req_id injection, error wrapping, and audit logging.
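A sketch of that division of labor, with a stand-in base class. `EnvPluginSketch` is not the project's EnvPlugin: the `handle` method, the reply fields, and the payload shapes are assumptions. Only `on_reset`, `on_step`, and the four responsibilities listed above come from this page.

```python
class EnvPluginSketch:
    """Stand-in base class: dispatch, req_id injection, error wrapping, audit."""

    def __init__(self):
        self.audit_log = []

    def handle(self, message):
        handlers = {
            "env/reset/req": self.on_reset,
            "env/step/req": self.on_step,
        }
        msg_type = message.get("type")
        reply = {"req_id": message.get("req_id")}  # req_id injection
        try:
            reply["payload"] = handlers[msg_type](message.get("payload", {}))
            reply["ok"] = True
        except Exception as exc:  # error wrapping
            reply["ok"] = False
            reply["error"] = f"{type(exc).__name__}: {exc}"
        self.audit_log.append((msg_type, reply["ok"]))  # audit logging
        return reply

    def on_reset(self, payload):
        raise NotImplementedError

    def on_step(self, payload):
        raise NotImplementedError


class CounterEnv(EnvPluginSketch):
    """Toy environment: the episode ends once the counter reaches 3."""

    def on_reset(self, payload):
        self.count = 0
        return {"state": self.count}

    def on_step(self, payload):
        self.count += payload["action"]
        done = self.count >= 3
        return {
            "next_state": self.count,
            "reward": 1.0 if done else 0.0,
            "done": done,
        }


env = CounterEnv()
reset_reply = env.handle({"type": "env/reset/req", "req_id": "r1"})
step_reply = env.handle(
    {"type": "env/step/req", "req_id": "r2", "payload": {"action": 3}}
)
bad_reply = env.handle({"type": "env/step/req", "req_id": "r3", "payload": {}})
```

The subclass contains only environment logic; a malformed step comes back as an error reply rather than an exception.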
Manages episode → env mapping. Each env/step auto-records an (s, a, r, s', done) experience tuple via the optional learning hook.
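The episode bookkeeping and the learning hook could be sketched as follows. `EpisodeManager`, the `env_factory` argument, and the `reset`/`step` signatures of the toy environment are hypothetical. The tuple order (s, a, r, s', done) is the one given above.

```python
class EpisodeManager:
    """Hypothetical harness piece: maps episode ids to live env instances."""

    def __init__(self, env_factory, learning_hook=None):
        self._env_factory = env_factory
        self._episodes = {}  # episode_id -> {"env": ..., "state": ...}
        self._hook = learning_hook  # optional; called once per step

    def reset(self, episode_id):
        env = self._env_factory()
        state = env.reset()
        self._episodes[episode_id] = {"env": env, "state": state}
        return state

    def step(self, episode_id, action):
        episode = self._episodes[episode_id]
        next_state, reward, done = episode["env"].step(action)
        if self._hook is not None:
            # Experience tuple: (s, a, r, s', done)
            self._hook((episode["state"], action, reward, next_state, done))
        episode["state"] = next_state
        if done:
            del self._episodes[episode_id]  # finished episodes are dropped
        return next_state, reward, done


class WalkEnv:
    """Toy environment: walk right until position 2."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        done = self.pos >= 2
        return self.pos, (1.0 if done else 0.0), done


replay = []
manager = EpisodeManager(WalkEnv, learning_hook=replay.append)
manager.reset("ep-1")
manager.step("ep-1", 1)
final = manager.step("ep-1", 1)
```

The agent only sends reset and step; the replay buffer fills as a side effect inside the harness.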
Load plugins via config.yaml. Each plugin registers its message types, and the agent negotiates capabilities at handshake.
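Handshake negotiation can be thought of as set intersection between what the agent wants and what the loaded plugins registered. The sketch below assumes that model; the plugin dict layout and the `granted`/`missing` reply fields are hypothetical, and the real handshake may carry more than message types.

```python
def advertised_types(plugins):
    """Union of every message type the loaded plugins registered."""
    types = set()
    for plugin in plugins:
        types.update(plugin["handles"])
    return types


def negotiate(agent_wants, plugins):
    """Split the agent's wish list into granted and missing message types."""
    available = advertised_types(plugins)
    wanted = set(agent_wants)
    return {
        "granted": sorted(wanted & available),
        "missing": sorted(wanted - available),
    }


# Hypothetical result of loading two plugins from a config file.
loaded_plugins = [
    {"name": "env", "handles": ["env/reset/req", "env/step/req", "env/observe/req"]},
    {"name": "memory", "handles": ["memory/store/req", "memory/recall/req"]},
]

result = negotiate(
    ["env/step/req", "memory/recall/req", "learn/adapt/req"], loaded_plugins
)
```

Here the agent learns up front that `learn/adapt/req` has no handler, instead of discovering it on the first failed request.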
type_to_plugins dispatch table. Operations shown per plugin:
reset, step, observe, close
list, call, search, events
store, recall, forget, merge
feedback, experience, adapt
Not a framework, not an SDK you build on top of — a standard wire format between agent and environment. Like POSIX, the interface is the product; implementations are interchangeable.
The agent calls tools, steps environments, invokes skills, runs chains, and records learning — all through the same session. The executor routes by message type. Adding a capability means adding a plugin.
Feedback, experience, and adaptation are protocol messages. The same message bus carries env/step/req and learn/feedback/req. The loop isn't a hack — it's architecture.
Auto-learning hooks, episode-to-env tracking, reward computation — these live in the harness, not the agent. The agent sends simple messages. The harness does the orchestration.