A2E PROTOCOL
Building an
A2E-Integrated Agent & Harness
From environment plugin to RL loop — a complete walkthrough of the Agent-to-Environment protocol
env tools learn memory NDJSON
01 / 12
The A2E Design Framework
Three layers that define how agents talk to environments

Layer 1

Wire Protocol
Typed NDJSON messages. Every interaction is a standardized request/response pair over a single transport.

Layer 2

Plugin Runtime
Thin execution kernel that loads plugins, routes messages by type, and manages session lifecycle.

Layer 3

Capability Namespaces
10 standard namespaces — env, tools, memory, learn, proc, skills, toolkits, chains, mcp, subagents.

Key Idea

Like POSIX for agents — the protocol is the interface. The implementations are swappable plugins.

Design principle: The host has no global tool registry. It routes solely through the type_to_plugins table: every plugin declares the message types it handles, and the executor dispatches each message to the plugin registered for its type.
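To make the routing principle concrete, here is a minimal sketch of a type_to_plugins dispatch table. The plugin classes, the `handles` attribute, the `handle()` method, and the reply shapes are hypothetical stand-ins, not the A2E SDK's API; the point is that the executor only knows which plugin declared which message type.

```python
from collections import defaultdict
from typing import Any


class MemoryPlugin:
    """Toy plugin: declares the message types it handles."""
    name = "memory"
    handles = ("memory/store/req", "memory/recall/req")

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}

    def handle(self, msg: dict) -> dict:
        if msg["type"] == "memory/store/req":
            self.data[msg["key"]] = msg["value"]
            return {"type": "memory/store/res", "ok": True}
        return {"type": "memory/recall/res", "value": self.data.get(msg["key"])}


class CounterEnvPlugin:
    """Toy env plugin: a counter incremented by each action."""
    name = "counter_env"
    handles = ("env/reset/req", "env/step/req")

    def __init__(self) -> None:
        self.count = 0

    def handle(self, msg: dict) -> dict:
        if msg["type"] == "env/reset/req":
            self.count = 0
            return {"type": "env/reset/res", "count": self.count}
        self.count += msg["action"]
        return {"type": "env/step/res", "count": self.count}


class Executor:
    """Routes each message by type via type_to_plugins; no global tool registry."""

    def __init__(self, plugins: list) -> None:
        self.type_to_plugins: dict[str, list] = defaultdict(list)
        for plugin in plugins:
            for msg_type in plugin.handles:
                self.type_to_plugins[msg_type].append(plugin)

    def dispatch(self, msg: dict) -> dict:
        handlers = self.type_to_plugins.get(msg["type"], [])
        if not handlers:
            return {"type": "error", "error": f"no plugin handles {msg['type']!r}"}
        # Sketch policy: the first plugin registered for a type handles it.
        return handlers[0].handle(msg)


executor = Executor([MemoryPlugin(), CounterEnvPlugin()])
executor.dispatch({"type": "env/reset/req"})
print(executor.dispatch({"type": "env/step/req", "action": 2}))
# {'type': 'env/step/res', 'count': 2}
```

Adding a capability is then a matter of appending another plugin to the list; the executor itself never changes.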
02 / 12
Strategic Value
Three reasons the framework changes the economics of agent infrastructure
01

Decoupling

Without a shared protocol, agent code is coupled to every backend it talks to. A2E absorbs that churn: swap the memory plugin from in-memory to SQLite to Pinecone, and the agent never notices. The only cost is the plugin swap itself.

02

Programmable Loop

learn/adapt/req runs in the same message flow as env/step/req. No separate training pipeline. No offline batch job. No vendor support ticket. The agent improves between turns.

03

Ownership

Every plugin is a Python class you control. The env backend is yours. The adaptation strategy is yours. The audit log is yours. When training data is just learn/feedback/req messages in your JSONL, you're not a tenant.

The argument: as the cost of producing software trends toward zero, the cost of churn (swapping a memory backend, migrating off a deprecated platform) trends up. A2E inverts this tradeoff by confining churn to a plugin swap behind a stable protocol.
03 / 12
10 Capability Namespaces
Every interaction is a typed message. Every capability is a plugin.
Namespace | What It Does | Key Messages
env | RL environments: step, reset, observe, reward | env/reset/req, env/step/req, env/observe/req
tools | Named callable functions with structured I/O | tool/list/req, tool/call/req
memory | Three-tier: working, episodic, semantic | memory/store/req, memory/recall/req
learn | Feedback, experience replay, adaptation | learn/feedback/req, learn/adapt/req
proc | Long-running subprocess management | proc/spawn/req, proc/signal/req
skills | Named, versioned, sandboxed execution | skill/call/req, skill/list/req
chains | DAG pipelines with branching | chain/execute/req, chain/list/req
mcp / subagents / toolkits | MCP bridge, multi-agent orchestration, tool bundles | Extension namespaces via plugin registration
This post focuses on env + tools + memory + learn — the four pillars of a complete RL agent loop.
04 / 12
The Agent Client
Single session. Single transport. Multiple capabilities.
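# A2EClient, build_transport, HTTPTransportConfig, the *API wrappers, and the
# capability constants (ENV, PROC, TOOLKITS) come from the A2E Python SDK
# (github.com/a2eprotocol/python-sdk); import paths are not shown on this slide.
class EnvClient:
    def __init__(self, logger):
        self.logger = logger  # kept for the agent's own logging
        # One transport for every capability; swap this config to change backends.
        transport = build_transport(HTTPTransportConfig(
            base_url="http://localhost:8765"
        ))
        # A single A2EClient session; agent_caps are declared at the handshake.
        self.client = A2EClient(
            transport=transport,
            agent_id="react-agent",
            agent_caps=[ENV, PROC, TOOLKITS]
        )

        # N thin capability wrappers sharing the same client and transport.
        self.env = EnvAPI(self.client)
        self.memory = MemoryAPI(self.client)
        self.learn = LearnAPI(self.client)
        self.tools = ToolAPI(self.client)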

Key Pattern

Every capability shares the same transport. One NDJSON connection, one A2EClient, N capability wrappers. No separate connections to a tool server, a memory server, and an env server.

To change backends

Swap the transport config (HTTP → Direct → Subprocess). Or swap the plugin on the host. The agent code never changes.
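The shared-transport pattern and the transport swap can be sketched in a few lines. This assumes a message is a JSON object with `type` and `req_id` fields, framed as one object per line (NDJSON); `Transport`, `DirectTransport`, `MiniClient`, the `Mini*API` wrappers, and `toy_host` are hypothetical names, not the SDK's classes.

```python
import itertools
import json
from typing import Callable, Protocol


class Transport(Protocol):
    """Anything that can carry one NDJSON request line and return one reply line."""
    def send_line(self, line: str) -> str: ...


class DirectTransport:
    """In-process transport: passes each decoded line straight to a host handler."""

    def __init__(self, host: Callable[[dict], dict]) -> None:
        self.host = host

    def send_line(self, line: str) -> str:
        reply = self.host(json.loads(line))
        return json.dumps(reply) + "\n"  # NDJSON: one object per line


class MiniClient:
    """One transport and one req_id counter shared by every capability wrapper."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport
        self._req_ids = itertools.count(1)

    def request(self, msg_type: str, **body) -> dict:
        msg = {"type": msg_type, "req_id": next(self._req_ids), **body}
        reply = json.loads(self.transport.send_line(json.dumps(msg) + "\n"))
        if reply.get("req_id") != msg["req_id"]:
            raise RuntimeError("reply does not match request req_id")
        return reply


class MiniEnvAPI:
    def __init__(self, client: MiniClient) -> None:
        self.client = client

    def step(self, episode_id: str, action) -> dict:
        return self.client.request("env/step/req", episode_id=episode_id, action=action)


class MiniMemoryAPI:
    def __init__(self, client: MiniClient) -> None:
        self.client = client

    def recall(self, key: str) -> dict:
        return self.client.request("memory/recall/req", key=key)


def toy_host(req: dict) -> dict:
    """Stand-in host: echoes req_id and maps '<ns>/<op>/req' to '<ns>/<op>/res'."""
    return {"type": req["type"].rsplit("/", 1)[0] + "/res", "req_id": req["req_id"]}


client = MiniClient(DirectTransport(toy_host))
env, memory = MiniEnvAPI(client), MiniMemoryAPI(client)
print(env.step("ep-1", 1))        # {'type': 'env/step/res', 'req_id': 1}
print(memory.recall("last_obs"))  # {'type': 'memory/recall/res', 'req_id': 2}
```

Replacing DirectTransport with an HTTP or subprocess transport that implements the same send_line() method would leave MiniEnvAPI and MiniMemoryAPI untouched, which is the property the slide describes.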

05 / 12
The Multi-Capability ReAct Loop
Four dispatch branches, one protocol session
observe → recall → plan → act → learn

Tool Action

tools.call("read_file", {path})
Call named functions with structured I/O. Results flow back as ToolResult with success/failure, output, and timing.

Env Action

env.step(episode_id, action)
Send action, receive next_state + reward + done. The core RL primitive — test and training in one message.

Skill Action

skills.call("analyze", {input})
Sandboxed execution units. Versioned, tagged, with streaming event support for long-running operations.

Chain Action

chains.run(nodes, entry_node)
DAG pipelines with branching, fan-out, and node-level event streaming. For complex multi-step reasoning.
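A sketch of the act step of this loop, routing one planned action to one of the four branches. The action dictionary format (`kind`, `name`, `args`, and so on) and the `RecordingAPI` stand-in are assumptions for illustration; in a real agent the four arguments would be capability wrappers sharing one client.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingAPI:
    """Stand-in capability wrapper that records calls instead of sending messages."""
    name: str
    calls: list = field(default_factory=list)

    def call(self, *args: Any) -> dict:
        self.calls.append(args)
        return {"via": self.name, "args": args}

    # In this sketch env.step and chains.run behave like call().
    step = call
    run = call


def act(action: dict, tools, env, skills, chains) -> dict:
    """Route one planned action to the matching capability branch."""
    kind = action["kind"]
    if kind == "tool":
        return tools.call(action["name"], action["args"])
    if kind == "env":
        return env.step(action["episode_id"], action["action"])
    if kind == "skill":
        return skills.call(action["name"], action["args"])
    if kind == "chain":
        return chains.run(action["nodes"], action["entry_node"])
    raise ValueError(f"unknown action kind: {kind!r}")


apis = {name: RecordingAPI(name) for name in ("tools", "env", "skills", "chains")}
print(act({"kind": "tool", "name": "read_file", "args": {"path": "notes.txt"}}, **apis))
# {'via': 'tools', 'args': ('read_file', {'path': 'notes.txt'})}
```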

06 / 12
Learning from Each Step
On-policy RL over a standard protocol
1. Feedback: learn.feedback(polarity, score, dimension, source) records a structured evaluation signal tied to the action just taken.

2. Memory: memory.remember(key, value, tier="episodic") stores the turn's trajectory context for future retrieval.

3. Adapt: learn.adapt(strategy="ucb1") updates routing weights every 5 steps. The agent improves between turns.

On-policy by design: the policy that generated the experience is the same policy that gets updated. Each feedback signal is attached to the action that produced it, and updates take effect within the same session rather than in a later offline job.
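The slide does not define what "routing weights" means for learn.adapt(strategy="ucb1"). Assuming it means choosing which action branch to prefer, a standard UCB1 bandit that re-selects every 5 feedback steps could look like the sketch below. `UCB1Router` and its parameters are hypothetical; only the UCB1 rule itself (mean reward plus an exploration bonus of sqrt(2 ln N / n)) is the textbook algorithm.

```python
import math


class UCB1Router:
    """UCB1 bandit over a fixed set of arms (here: action branches).

    Hypothetical reading of learn.adapt(strategy="ucb1"): feedback scores in
    [0, 1] accumulate per arm, and the preferred arm is recomputed only every
    `adapt_every` feedback steps.
    """

    def __init__(self, arms: list[str], adapt_every: int = 5, c: float = 2.0) -> None:
        self.arms = list(arms)
        self.adapt_every = adapt_every
        self.c = c  # c = 2 gives the classic sqrt(2 ln N / n) exploration bonus
        self.counts = {arm: 0 for arm in self.arms}
        self.totals = {arm: 0.0 for arm in self.arms}
        self.steps = 0
        self.preferred = self.arms[0]

    def ucb(self, arm: str) -> float:
        n = self.counts[arm]
        if n == 0:
            return math.inf  # every arm is tried at least once
        mean = self.totals[arm] / n
        return mean + math.sqrt(self.c * math.log(self.steps) / n)

    def feedback(self, arm: str, score: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += score
        self.steps += 1
        if self.steps % self.adapt_every == 0:
            self.adapt()

    def adapt(self) -> str:
        self.preferred = max(self.arms, key=self.ucb)
        return self.preferred


router = UCB1Router(["env", "tool"])
for _ in range(3):
    router.feedback("tool", 1.0)
router.feedback("env", 0.0)
router.feedback("env", 0.0)  # the 5th step triggers adapt()
print(router.preferred)      # tool
```

Batching the update every 5 steps keeps the preferred branch stable between adaptations while still reacting within the session.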

Feedback Dimensions

correctness, helpfulness, safety, plan_quality, tone

Feedback Sources

HUMAN, ENV, SELF
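One way to type the feedback signal before it becomes a learn/feedback/req message, using the dimensions and sources listed above. The field encodings (polarity as +1/-1, score in [0, 1]) and the flat message layout are assumptions, not the protocol's published schema.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Dimension(str, Enum):
    CORRECTNESS = "correctness"
    HELPFULNESS = "helpfulness"
    SAFETY = "safety"
    PLAN_QUALITY = "plan_quality"
    TONE = "tone"


class Source(str, Enum):
    HUMAN = "HUMAN"
    ENV = "ENV"
    SELF = "SELF"


@dataclass(frozen=True)
class Feedback:
    polarity: int   # assumed encoding: +1 positive, -1 negative
    score: float    # assumed range: [0, 1]
    dimension: Dimension
    source: Source

    def __post_init__(self) -> None:
        if self.polarity not in (-1, 1):
            raise ValueError("polarity must be +1 or -1")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be in [0, 1]")

    def to_ndjson(self, req_id: int) -> str:
        """Serialize as one learn/feedback/req line (flat layout is an assumption)."""
        body = {k: (v.value if isinstance(v, Enum) else v) for k, v in asdict(self).items()}
        return json.dumps({"type": "learn/feedback/req", "req_id": req_id, **body}) + "\n"


fb = Feedback(polarity=1, score=0.8, dimension=Dimension.CORRECTNESS, source=Source.ENV)
print(fb.to_ndjson(req_id=7), end="")
```

Validating on construction means a malformed signal fails in the agent, before it reaches the learning plugin.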
07 / 12
Building the Harness
Three layers: Environment Plugin → Host Adapter → A2E Server

Layer 1: EnvPlugin

Subclass EnvPlugin, implement on_reset() and on_step(). The base class handles protocol dispatch, req_id injection, error wrapping, and audit logging.

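# EnvPlugin and EnvState come from the A2E Python SDK; on_step() is
# implemented alongside on_reset() but is not shown on this slide.
class CounterEnv(EnvPlugin):
    name = "counter"
    def on_reset(self, seed, options):
        # Every episode starts with the counter at zero.
        self.state = EnvState(count=0)
        return self.state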
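The base class's responsibilities (dispatch, req_id injection, error wrapping, audit logging) can be illustrated with a self-contained stand-in. `EnvPluginSketch`, its `handle()` method, the `on_step()` return tuple, and the done-at-3 rule are hypothetical; the real EnvPlugin base class in the A2E SDK may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class EnvState:
    """Stand-in for the SDK's EnvState."""
    count: int = 0


class EnvPluginSketch:
    """Hypothetical base class: dispatch, req_id echo, error wrapping, audit log."""
    name = "base"

    def __init__(self) -> None:
        self.audit_log: list[dict] = []

    def handle(self, msg: dict) -> dict:
        try:
            if msg["type"] == "env/reset/req":
                state = self.on_reset(msg.get("seed"), msg.get("options", {}))
                body = {"state": asdict(state)}
            elif msg["type"] == "env/step/req":
                state, reward, done = self.on_step(msg["action"])
                body = {"state": asdict(state), "reward": reward, "done": done}
            else:
                raise ValueError(f"unsupported message type {msg['type']!r}")
            reply = {"type": msg["type"].rsplit("/", 1)[0] + "/res", **body}
        except Exception as exc:  # any subclass failure becomes a protocol error
            reply = {"type": "error", "error": str(exc)}
        reply["req_id"] = msg.get("req_id")  # echo req_id so the client can match
        self.audit_log.append({"plugin": self.name, "request": msg, "reply": reply})
        return reply

    def on_reset(self, seed: Optional[int], options: dict) -> EnvState:
        raise NotImplementedError

    def on_step(self, action: Any) -> tuple[EnvState, float, bool]:
        raise NotImplementedError


class CounterEnv(EnvPluginSketch):
    name = "counter"
    target = 3  # hypothetical: the episode ends once the count reaches 3

    def on_reset(self, seed, options):
        self.state = EnvState(count=0)
        return self.state

    def on_step(self, action):
        self.state = EnvState(count=self.state.count + int(action))
        done = self.state.count >= self.target
        return self.state, (1.0 if done else 0.0), done


env = CounterEnv()
env.handle({"type": "env/reset/req", "req_id": 1})
print(env.handle({"type": "env/step/req", "req_id": 2, "action": 3}))
# {'type': 'env/step/res', 'state': {'count': 3}, 'reward': 1.0, 'done': True, 'req_id': 2}
```

The subclass only implements on_reset() and on_step(); a step sent before any reset comes back as an error reply rather than an exception.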

Layer 2: Host Adapter

Manages episode → env mapping. Each env/step auto-records an (s, a, r, s', done) experience tuple via the optional learning hook.

Auto-learning: Every step produces a training sample. The agent doesn't have to call an additional API.
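A minimal sketch of the host adapter's two jobs: mapping episode IDs to env instances and emitting an (s, a, r, s', done) tuple on every step. `HostAdapter`, `Experience`, `StubCounterEnv`, and the `learn_hook` callback signature are assumptions, not the A2E host API.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Experience:
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class StubCounterEnv:
    """Hypothetical env: the state is an int counter; the episode ends at 3."""

    def reset(self) -> int:
        self.count = 0
        return self.count

    def step(self, action: int) -> tuple[int, float, bool]:
        self.count += action
        done = self.count >= 3
        return self.count, (1.0 if done else 0.0), done


class HostAdapter:
    """Maps episode_id -> env and records an Experience on every step."""

    def __init__(self, env_factory: Callable[[], Any],
                 learn_hook: Optional[Callable[[str, Experience], None]] = None) -> None:
        self.env_factory = env_factory
        self.learn_hook = learn_hook  # optional: no recording when None
        self.episodes: dict[str, Any] = {}
        self.last_state: dict[str, Any] = {}
        self._ids = itertools.count(1)

    def reset(self) -> tuple[str, Any]:
        episode_id = f"ep-{next(self._ids)}"
        env = self.env_factory()
        self.episodes[episode_id] = env
        self.last_state[episode_id] = env.reset()
        return episode_id, self.last_state[episode_id]

    def step(self, episode_id: str, action: Any) -> tuple[Any, float, bool]:
        env = self.episodes[episode_id]
        state = self.last_state[episode_id]
        next_state, reward, done = env.step(action)
        self.last_state[episode_id] = next_state
        if self.learn_hook is not None:
            # The agent never asks for this; the harness records (s, a, r, s', done).
            self.learn_hook(episode_id, Experience(state, action, reward, next_state, done))
        if done:  # finished episodes release their env
            del self.episodes[episode_id], self.last_state[episode_id]
        return next_state, reward, done


buffer: list = []
host = HostAdapter(StubCounterEnv, learn_hook=lambda ep, exp: buffer.append((ep, exp)))
episode_id, _ = host.reset()
host.step(episode_id, 1)
host.step(episode_id, 2)
print(len(buffer))  # 2
```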

Layer 3: Server Wiring

Load plugins via config.yaml. Each plugin registers its message types. Agent negotiates capabilities at handshake.

plugins:
  - name: counter_env
    cls: .counter_env.CounterEnv
  - name: mymemory
    cls: .inmemory.InMemoryPlugin
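How the server turns the `cls:` entries into plugin instances is not shown. Assuming each value is a dotted class path, with a leading dot meaning "relative to the host package", Python's `importlib.import_module` can resolve it. `resolve_class`, `load_plugins`, and the package name `myhost` are hypothetical; the config dict mirrors the YAML above as PyYAML's `yaml.safe_load` would return it.

```python
import importlib
from typing import Any, Optional


def resolve_class(path: str, package: Optional[str] = None) -> type:
    """Resolve 'pkg.module.Class', or '.module.Class' relative to `package`."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Class', got {path!r}")
    # import_module resolves leading dots against `package`.
    module = importlib.import_module(module_name, package=package)
    return getattr(module, class_name)


def load_plugins(config: dict, package: Optional[str] = None) -> dict[str, Any]:
    """Instantiate each entry under `plugins:`, keyed by its configured name."""
    plugins: dict[str, Any] = {}
    for entry in config["plugins"]:
        cls = resolve_class(entry["cls"], package)
        plugins[entry["name"]] = cls()  # assumes no-argument constructors
    return plugins


# The YAML above after parsing.
config = {
    "plugins": [
        {"name": "counter_env", "cls": ".counter_env.CounterEnv"},
        {"name": "mymemory", "cls": ".inmemory.InMemoryPlugin"},
    ]
}
# plugins = load_plugins(config, package="myhost")  # "myhost" is hypothetical
```

Each instantiated plugin would then register its message types, which is what the executor's dispatch table is built from.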
08 / 12
End-to-End Data Flow
One episode, four plugin handoffs
AGENT → env/reset/req → EnvPlugin
AGENT → memory/recall/req → MemoryPlugin
AGENT → env/step/req → EnvPlugin (auto-learn records the step)
AGENT → learn/feedback/req → LearnPlugin
AGENT → learn/adapt/req → LearnPlugin → updated weights back to AGENT
One protocol session. Four plugins. Zero shared state. The executor routes by message type — every message finds its handler through the type_to_plugins dispatch table.
09 / 12
Architecture Diagram
The three-layer stack with the protocol as the interface
YOUR AGENT: ReAct or DeepAgent loop; any model (Claude, GPT, etc.); tool-aware planning; sends typed NDJSON messages

A2E PROTOCOL (the interface you own): NDJSON messages (env/*, tool/*, learn/*); capability negotiation; audit logging

YOUR HOST (swappable plugins):
EnvPlugin: reset, step, observe, close
ToolPlugin: list, call, search, events
MemoryPlugin: store, recall, forget, merge
LearnPlugin: feedback, experience, adapt

Protocol-first, not SDK-first. Agent and host communicate over typed NDJSON. They don't share imports, class hierarchies, or runtime state. The protocol is the contract.
10 / 12
Key Design Principles
What makes this architecture different

Protocol is the Interface

Not a framework, not an SDK you build on top of — a standard wire format between agent and environment. Like POSIX, the interface is the product; implementations are interchangeable.

Multi-Capability Dispatch

The agent calls tools, steps environments, invokes skills, runs chains, and records learning — all through the same session. The executor routes by message type. Adding a capability means adding a plugin.

Learning is First-Class

Feedback, experience, and adaptation are protocol messages. The same message bus carries env/step/req and learn/feedback/req. The loop isn't a hack — it's architecture.

Harness Owns the Integration

Auto-learning hooks, episode-to-env tracking, reward computation — these live in the harness, not the agent. The agent sends simple messages. The harness does the orchestration.

Build your agents against the protocol. Own your harness. Swap your plugins. The agent never notices.
11 / 12
Own the Interface.
Swap the Plugins.
The agent never notices.
a2eprotocol.github.io/docs
github.com/a2eprotocol/python-sdk
env + tool + memory + learn
agent.py → A2E messages → host adapter → learn
12 / 12
A2E Protocol — Agent-to-Environment — docs