A2E-Integrated Agent and Harness

A2E PROTOCOL

Building an A2E-Integrated Agent & Harness

From environment plugin to RL loop — a complete walkthrough of the Agent-to-Environment protocol

env tools learn memory NDJSON

01 / 12

The A2E Design Framework

Three layers that define how agents talk to environments

Layer 1

Wire Protocol
Typed NDJSON messages. Every interaction is a standardized request/response pair over a single transport.
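As a sketch of what Layer 1 framing can look like, the Python below writes each message as one compact JSON object per line and correlates a reply with its request. The envelope fields `type`, `req_id`, and `payload`, and the `/res` suffix on replies, are assumptions for illustration. The deck itself only confirms the `…/req` type strings and that a `req_id` exists.

```python
import json
import uuid


def encode(msg: dict) -> bytes:
    """Serialize one message as a single NDJSON line (compact JSON + newline)."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def decode_stream(data: bytes) -> list:
    """Split an NDJSON byte stream back into messages, skipping blank lines."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line.strip()]


def make_request(msg_type: str, payload: dict) -> dict:
    # Hypothetical envelope: a typed message plus a correlation id.
    return {"type": msg_type, "req_id": uuid.uuid4().hex, "payload": payload}


def make_response(request: dict, payload: dict) -> dict:
    # Assumed convention: swap the "/req" suffix for "/res" and echo the req_id.
    base = request["type"].removesuffix("/req")
    return {"type": base + "/res", "req_id": request["req_id"], "payload": payload}


req = make_request("env/step/req", {"episode_id": "ep-1", "action": "increment"})
res = make_response(req, {"reward": 1.0, "done": False})
wire = encode(req) + encode(res)
for msg in decode_stream(wire):
    print(msg["type"], msg["req_id"] == req["req_id"])
# env/step/req True
# env/step/res True
```

Because `json.dumps` escapes newlines inside strings, every message is guaranteed to occupy exactly one line, which is what makes line-based framing safe.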

Layer 2

Plugin Runtime
Thin execution kernel that loads plugins, routes messages by type, and manages session lifecycle.

Layer 3

Capability Namespaces
10 standard namespaces — env, tools, memory, learn, proc, skills, toolkits, chains, mcp, subagents.

Key Idea

Like POSIX for agents — the protocol is the interface. The implementations are swappable plugins.

Design principle: The host has no global tool registry. It routes solely by type_to_plugins — every plugin declares what it handles, and the executor dispatches.
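The routing rule above fits in a few lines. This is a minimal sketch, not the SDK's executor: `Plugin`, `Executor`, the `handles` attribute, and the "first registered plugin wins" tie-break are all hypothetical. What it shows is the key property: the host builds `type_to_plugins` from what each plugin declares and never consults a global tool registry.

```python
from collections import defaultdict


class Plugin:
    """Hypothetical minimal plugin: declares the message types it handles."""

    name = "base"
    handles: tuple = ()

    def handle(self, msg: dict) -> dict:
        raise NotImplementedError


class Executor:
    """Routes purely by message type; there is no global tool registry."""

    def __init__(self, plugins):
        # type_to_plugins: message type -> plugins that declared it, in load order.
        self.type_to_plugins = defaultdict(list)
        for plugin in plugins:
            for msg_type in plugin.handles:
                self.type_to_plugins[msg_type].append(plugin)

    def dispatch(self, msg: dict) -> dict:
        handlers = self.type_to_plugins.get(msg["type"])
        if not handlers:
            return {"type": "error", "req_id": msg.get("req_id"),
                    "error": f"no plugin handles {msg['type']!r}"}
        # Assumption: the first plugin registered for a type handles it.
        return handlers[0].handle(msg)


class DictMemory(Plugin):
    """Toy memory plugin used only to exercise the executor."""

    name = "memory"
    handles = ("memory/store/req", "memory/recall/req")

    def __init__(self):
        self.data = {}

    def handle(self, msg):
        p = msg["payload"]
        if msg["type"] == "memory/store/req":
            self.data[p["key"]] = p["value"]
            return {"type": "memory/store/res", "ok": True}
        return {"type": "memory/recall/res", "value": self.data.get(p["key"])}


executor = Executor([DictMemory()])
executor.dispatch({"type": "memory/store/req", "payload": {"key": "k", "value": 42}})
print(executor.dispatch({"type": "memory/recall/req", "payload": {"key": "k"}}))
# {'type': 'memory/recall/res', 'value': 42}
print(executor.dispatch({"type": "env/step/req", "req_id": "r1"})["error"])
# no plugin handles 'env/step/req'
```

Adding a capability means appending another plugin to the list; the executor itself never changes.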

02 / 12

Strategic Value

Three reasons the framework changes the economics of agent infrastructure

01

Decoupling

Agent code is coupled to every backend it talks to, until the protocol absorbs the churn. Swap the memory plugin from in-memory to SQLite to Pinecone and the agent never notices; the only cost is the plugin swap itself.

02

Programmable Loop

learn/adapt/req runs in the same message flow as env/step/req. No separate training pipeline. No offline batch job. No vendor support ticket. The agent improves between turns.

03

Ownership

Every plugin is a Python class you control. The env backend is yours. The adaptation strategy is yours. The audit log is yours. When training data is just learn/feedback/req messages in your JSONL, you're not a tenant.

The argument: In an era where software production cost trends toward zero, the cost of churn (swapping a memory backend, migrating off a deprecated platform) trends up. A2E inverts this tradeoff by confining churn to a plugin swap on the host instead of a rewrite of the agent.

03 / 12

10 Capability Namespaces

Every interaction is a typed message. Every capability is a plugin.

Namespace	What It Does	Key Messages
env	RL environments: step, reset, observe, reward	`env/reset/req`, `env/step/req`, `env/observe/req`
tools	Named callable functions with structured I/O	`tool/list/req`, `tool/call/req`
memory	Three-tier: working, episodic, semantic	`memory/store/req`, `memory/recall/req`
learn	Feedback, experience replay, adaptation	`learn/feedback/req`, `learn/adapt/req`
proc	Long-running subprocess management	`proc/spawn/req`, `proc/signal/req`
skills	Named, versioned, sandboxed execution	`skill/call/req`, `skill/list/req`
chains	DAG pipelines with branching	`chain/execute/req`, `chain/list/req`
mcp / subagents / toolkits	MCP bridge, multi-agent orchestration, tool bundles	Extension namespaces via plugin registration

This post focuses on env + tools + memory + learn — the four pillars of a complete RL agent loop.

04 / 12

The Agent Client

Single session. Single transport. Multiple capabilities.

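    class EnvClient:
        """Agent-side client: one transport, one A2EClient, N capability wrappers."""

        def __init__(self, logger):
            self.logger = logger

            # One HTTP transport to the A2E host; swap this config to change transports.
            transport = build_transport(HTTPTransportConfig(
                base_url="http://localhost:8765"
            ))

            # A single protocol session; capabilities are declared at handshake.
            self.client = A2EClient(
                transport=transport,
                agent_id="react-agent",
                agent_caps=[ENV, PROC, TOOLKITS],
            )

            # Capability wrappers: each sends its namespace's messages over self.client.
            self.env = EnvAPI(self.client)
            self.memory = MemoryAPI(self.client)
            self.learn = LearnAPI(self.client)
            self.tools = ToolAPI(self.client)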

Key Pattern

Every capability shares the same transport. One NDJSON connection, one A2EClient, N capability wrappers. No separate connections to a tool server, a memory server, and an env server.

To change backends

Swap the transport config (HTTP → Direct → Subprocess). Or swap the plugin on the host. The agent code never changes.
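To see why the agent code never changes, here is a self-contained sketch of the "one client, N wrappers" pattern with a pluggable transport. Every name in it (`Transport`, `DirectTransport`, `MiniClient`, `EnvWrapper`, `MemoryWrapper`, `fake_host`) is hypothetical and stands in for the SDK's `A2EClient` and `*API` classes; the transport here is an in-process "direct" one.

```python
import itertools
import json
from typing import Callable, Protocol


class Transport(Protocol):
    def send(self, line: str) -> str: ...


class DirectTransport:
    """In-process transport: hands each NDJSON line straight to a host handler."""

    def __init__(self, host: Callable[[dict], dict]):
        self.host = host

    def send(self, line: str) -> str:
        return json.dumps(self.host(json.loads(line))) + "\n"


class MiniClient:
    """One session over one transport; every wrapper goes through request()."""

    def __init__(self, transport: Transport, agent_id: str):
        self.transport = transport
        self.agent_id = agent_id
        self._ids = itertools.count(1)

    def request(self, msg_type: str, payload: dict) -> dict:
        msg = {"type": msg_type, "req_id": next(self._ids),
               "agent_id": self.agent_id, "payload": payload}
        return json.loads(self.transport.send(json.dumps(msg) + "\n"))


class EnvWrapper:
    def __init__(self, client: MiniClient):
        self.client = client

    def reset(self) -> dict:
        return self.client.request("env/reset/req", {})


class MemoryWrapper:
    def __init__(self, client: MiniClient):
        self.client = client

    def recall(self, key: str) -> dict:
        return self.client.request("memory/recall/req", {"key": key})


def fake_host(msg: dict) -> dict:
    # Stand-in host: replies with the matching "/res" type and echoes req_id.
    return {"type": msg["type"].replace("/req", "/res"), "req_id": msg["req_id"]}


client = MiniClient(DirectTransport(fake_host), agent_id="react-agent")
env, memory = EnvWrapper(client), MemoryWrapper(client)
print(env.reset())         # {'type': 'env/reset/res', 'req_id': 1}
print(memory.recall("k"))  # {'type': 'memory/recall/res', 'req_id': 2}
```

Moving to an HTTP or subprocess transport would mean passing a different object with the same `send()` method; `EnvWrapper` and `MemoryWrapper` stay untouched, and both share one request counter because they share one client.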

05 / 12

The Multi-Capability ReAct Loop

Four dispatch branches, one protocol session

observe → recall → plan → act → learn

Tool Action

tools.call("read_file", {path})
Call named functions with structured I/O. Results flow back as ToolResult with success/failure, output, and timing.

Env Action

env.step(episode_id, action)
Send an action, receive next_state, reward, and done. The core RL primitive: the same message that evaluates the action also yields the training signal.

Skill Action

skills.call("analyze", {input})
Sandboxed execution units. Versioned, tagged, with streaming event support for long-running operations.

Chain Action

chains.run(nodes, entry_node)
DAG pipelines with branching, fan-out, and node-level event streaming. For complex multi-step reasoning.
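The "act" step of the loop reduces to a four-way dispatch. The sketch below assumes the planner emits a small `Action` record (a hypothetical type) and that the agent exposes `tools`, `env`, `skills`, and `chains` wrappers with the call shapes shown on this slide. `RecordingAPI` is a stand-in that logs calls instead of sending A2E messages, so the routing can be checked offline.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class Action:
    kind: str                # "tool" | "env" | "skill" | "chain"
    name: str = ""           # tool/skill name, or the env action itself
    args: dict = field(default_factory=dict)


def act(agent, action: Action, episode_id: str) -> dict:
    """Route one planned action to the matching capability wrapper."""
    if action.kind == "tool":
        return agent.tools.call(action.name, action.args)
    if action.kind == "env":
        return agent.env.step(episode_id, action.name)
    if action.kind == "skill":
        return agent.skills.call(action.name, action.args)
    if action.kind == "chain":
        return agent.chains.run(action.args["nodes"], action.args["entry_node"])
    raise ValueError(f"unknown action kind: {action.kind!r}")


class RecordingAPI:
    """Stand-in wrapper that records every method call instead of sending messages."""

    def __init__(self, label: str, log: list):
        self.label, self.log = label, log

    def __getattr__(self, method):
        def call(*args):
            self.log.append((self.label, method, args))
            return {"ok": True}
        return call


log = []
agent = SimpleNamespace(**{k: RecordingAPI(k, log) for k in ("tools", "env", "skills", "chains")})
act(agent, Action("tool", "read_file", {"path": "notes.txt"}), "ep-1")
act(agent, Action("env", "increment"), "ep-1")
act(agent, Action("chain", args={"nodes": ["a", "b"], "entry_node": "a"}), "ep-1")
for entry in log:
    print(entry)
```

Every branch goes through the same agent object and, in the real client, the same session; only the message type differs.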

06 / 12

Learning from Each Step

On-policy RL over a standard protocol

1

Feedback — learn.feedback(polarity, score, dimension, source) records a structured evaluation signal tied to the action just taken.

2

Memory — memory.remember(key, value, tier="episodic") stores the turn's trajectory context for future retrieval.

3

Adapt — learn.adapt(strategy="ucb1") updates routing weights every 5 steps. The agent improves between turns.

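On-policy by design: The policy that generated the experience is the same policy that gets updated. Each feedback signal is tied to the action just taken, so credit assignment is direct, and updates land within the same session instead of waiting for an offline training run.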
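The deck names `strategy="ucb1"` but does not show the LearnPlugin's internals. For reference, this is the standard UCB1 rule applied to a hypothetical set of routes: try each route once, then pick the route maximizing mean reward plus `sqrt(2 ln N / n_i)`. Treating routes as bandit arms and normalized feedback scores in [0, 1] as rewards are assumptions about how routing weights could be derived.

```python
import math


class UCB1Router:
    """Textbook UCB1 over a fixed set of routes (bandit arms)."""

    def __init__(self, routes):
        self.routes = list(routes)
        self.counts = {r: 0 for r in self.routes}
        self.totals = {r: 0.0 for r in self.routes}

    def select(self) -> str:
        # Try every route once before trusting the confidence bound.
        for r in self.routes:
            if self.counts[r] == 0:
                return r
        n = sum(self.counts.values())
        return max(self.routes, key=lambda r: self.totals[r] / self.counts[r]
                   + math.sqrt(2 * math.log(n) / self.counts[r]))

    def update(self, route: str, reward: float) -> None:
        # reward is assumed to lie in [0, 1], e.g. a normalized feedback score.
        self.counts[route] += 1
        self.totals[route] += reward

    def weights(self) -> dict:
        # Empirical mean reward per route; unvisited routes report 0.0.
        return {r: (self.totals[r] / self.counts[r] if self.counts[r] else 0.0)
                for r in self.routes}


router = UCB1Router(["a", "b"])
for _ in range(20):
    route = router.select()
    router.update(route, 1.0 if route == "a" else 0.0)  # route "a" always succeeds
print(router.counts, router.weights())
```

In the deck's cadence the update would be batched (adapt every 5 steps) rather than per step; the selection rule is the same either way.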

Feedback Dimensions

correctness, helpfulness, safety, plan_quality, tone

Feedback Sources

HUMAN, ENV, SELF
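One way to keep feedback structured is to validate it before it becomes a `learn/feedback/req` payload. This sketch encodes the dimensions and sources listed above as enums. The lowercase wire values, the `"positive"`/`"negative"` polarity strings, and the [0, 1] score range are assumptions, not documented protocol values.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Dimension(str, Enum):
    CORRECTNESS = "correctness"
    HELPFULNESS = "helpfulness"
    SAFETY = "safety"
    PLAN_QUALITY = "plan_quality"
    TONE = "tone"


class Source(str, Enum):
    HUMAN = "human"
    ENV = "env"
    SELF = "self"


@dataclass(frozen=True)
class Feedback:
    polarity: str          # assumed values: "positive" | "negative"
    score: float           # assumed range: [0, 1]
    dimension: Dimension
    source: Source

    def __post_init__(self):
        if self.polarity not in ("positive", "negative"):
            raise ValueError(f"bad polarity: {self.polarity!r}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")
        # Coerce plain strings; unknown values raise ValueError from the enum.
        object.__setattr__(self, "dimension", Dimension(self.dimension))
        object.__setattr__(self, "source", Source(self.source))

    def to_payload(self) -> dict:
        # JSON-friendly payload for a learn/feedback/req message.
        d = asdict(self)
        d["dimension"] = self.dimension.value
        d["source"] = self.source.value
        return d


fb = Feedback("positive", 0.9, "correctness", "env")
print(fb.to_payload())
```

Rejecting a malformed signal at the agent keeps the training log clean, since every stored record already matches a known dimension and source.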

07 / 12

Building the Harness

Three layers: Environment Plugin → Host Adapter → A2E Server

Layer 1: EnvPlugin

Subclass EnvPlugin, implement on_reset() and on_step(). The base class handles protocol dispatch, req_id injection, error wrapping, and audit logging.

    class CounterEnv(EnvPlugin):
        name = "counter"

        def on_reset(self, seed, options):
            # Every episode starts from a zero count.
            self.state = EnvState(count=0)
            return self.state
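The slide shows only `on_reset()`. To illustrate the division of labor, here is a self-contained sketch with a stand-in base class, `MiniEnvPlugin` (hypothetical, not the SDK's `EnvPlugin`), that does protocol dispatch and error wrapping, plus a counter that implements both hooks. The local `EnvState` dataclass, the `on_step` return shape `(state, reward, done)`, the payload field names, and the "done at 3" rule are all assumptions; audit logging is omitted.

```python
from dataclasses import dataclass


@dataclass
class EnvState:
    count: int = 0  # local stand-in for the SDK's EnvState


class MiniEnvPlugin:
    """Hypothetical stand-in for an EnvPlugin base class."""

    name = "base"

    def handle(self, msg: dict) -> dict:
        try:
            # Protocol dispatch: map the message type onto a subclass hook.
            p = msg.get("payload", {})
            if msg["type"] == "env/reset/req":
                state = self.on_reset(p.get("seed"), p.get("options", {}))
                body = {"state": vars(state)}
            elif msg["type"] == "env/step/req":
                state, reward, done = self.on_step(p["action"])
                body = {"state": vars(state), "reward": reward, "done": done}
            else:
                raise ValueError(f"unsupported type {msg['type']}")
            return {"type": msg["type"][:-len("req")] + "res",
                    "req_id": msg.get("req_id"), "payload": body}
        except Exception as exc:
            # Error wrapping: failures become protocol errors, not crashes.
            return {"type": "error", "req_id": msg.get("req_id"), "error": str(exc)}

    def on_reset(self, seed, options):
        raise NotImplementedError

    def on_step(self, action):
        raise NotImplementedError


class CounterEnv(MiniEnvPlugin):
    name = "counter"
    target = 3  # hypothetical: the episode ends when the count reaches 3

    def on_reset(self, seed, options):
        self.state = EnvState(count=0)
        return self.state

    def on_step(self, action):
        delta = 1 if action == "increment" else -1
        self.state = EnvState(count=self.state.count + delta)
        done = self.state.count >= self.target
        return self.state, (1.0 if done else 0.0), done


env = CounterEnv()
env.handle({"type": "env/reset/req", "req_id": 1})
for i in range(3):
    reply = env.handle({"type": "env/step/req", "req_id": i + 2,
                        "payload": {"action": "increment"}})
print(reply)
```

The subclass never sees envelopes or `req_id`s; it only maps an action to the next state, which is the point of putting dispatch in the base class.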

Layer 2: Host Adapter

Manages episode → env mapping. Each env/step auto-records an (s, a, r, s', done) experience tuple via the optional learning hook.

Auto-learning: Every step produces a training sample. The agent doesn't have to call an additional API.
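A host adapter with this behavior can be sketched in plain Python. `HostAdapter`, `TinyCounter`, the `ep-N` episode ids, and the `on_experience` callback are hypothetical names. The sketch keeps the episode-to-env mapping and emits an `(s, a, r, s', done)` tuple on every step without the agent making any extra call.

```python
import itertools
from typing import NamedTuple


class Experience(NamedTuple):
    state: dict
    action: str
    reward: float
    next_state: dict
    done: bool


class TinyCounter:
    """Toy environment: the episode ends once the count reaches 2."""

    def reset(self) -> dict:
        self.count = 0
        return {"count": 0}

    def step(self, action: str):
        self.count += 1 if action == "increment" else -1
        done = self.count >= 2
        return {"count": self.count}, (1.0 if done else 0.0), done


class HostAdapter:
    """Hypothetical host adapter: episode -> env mapping plus an auto-learning hook."""

    def __init__(self, env_factories, on_experience=None):
        self.env_factories = env_factories  # env name -> constructor
        self.on_experience = on_experience  # optional learning hook
        self.episodes = {}                  # episode_id -> (env, last_state)
        self._ids = itertools.count(1)

    def reset(self, env_name: str):
        env = self.env_factories[env_name]()
        episode_id = f"ep-{next(self._ids)}"
        state = env.reset()
        self.episodes[episode_id] = (env, state)
        return episode_id, state

    def step(self, episode_id: str, action: str):
        env, state = self.episodes[episode_id]
        next_state, reward, done = env.step(action)
        if self.on_experience is not None:
            # Auto-learning: every step yields an (s, a, r, s', done) tuple.
            self.on_experience(Experience(state, action, reward, next_state, done))
        if done:
            del self.episodes[episode_id]  # episode over; drop the mapping
        else:
            self.episodes[episode_id] = (env, next_state)
        return next_state, reward, done


replay = []
host = HostAdapter({"counter": TinyCounter}, on_experience=replay.append)
ep, _ = host.reset("counter")
host.step(ep, "increment")
host.step(ep, "increment")
for exp in replay:
    print(exp)
```

Because the hook lives in the adapter, swapping the learner (a list, a replay buffer, a LearnPlugin call) changes one constructor argument and nothing in the agent.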

Layer 3: Server Wiring

Load plugins via config.yaml. Each plugin registers its message types. Agent negotiates capabilities at handshake.

    plugins:
      - name: counter_env
        cls: .counter_env.CounterEnv
      - name: mymemory
        cls: .inmemory.InMemoryPlugin
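How a host turns `cls:` strings into plugin instances is not shown in the deck. A plausible sketch uses `importlib.import_module`, whose `package` argument resolves a leading-dot module name such as `.counter_env` relative to a host package. `load_class` and `load_plugins` are hypothetical helpers. The config is a dict literal standing in for the parsed `config.yaml` (for example via PyYAML's `yaml.safe_load`), and stdlib classes stand in for real plugins so the sketch runs anywhere.

```python
import importlib
from typing import Optional


def load_class(path: str, package: Optional[str] = None):
    """Resolve 'module.Class'; a leading dot makes the module relative to `package`."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name, package=package)
    return getattr(module, class_name)


def load_plugins(config: dict, package: Optional[str] = None) -> dict:
    """Instantiate every entry under config['plugins'], keyed by its name."""
    plugins = {}
    for entry in config["plugins"]:
        if entry["name"] in plugins:
            raise ValueError(f"duplicate plugin name: {entry['name']}")
        plugins[entry["name"]] = load_class(entry["cls"], package)()
    return plugins


# Same shape as the YAML above, with stdlib classes standing in for plugins.
config = {"plugins": [
    {"name": "decoder", "cls": ".decoder.JSONDecoder"},   # relative to package "json"
    {"name": "ordered", "cls": "collections.OrderedDict"},  # absolute path
]}
loaded = load_plugins(config, package="json")
print({name: type(obj).__name__ for name, obj in loaded.items()})
# {'decoder': 'JSONDecoder', 'ordered': 'OrderedDict'}
```

In a real host, each loaded plugin would then contribute its declared message types to the dispatch table.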

08 / 12

End-to-End Data Flow

One episode, four plugin handoffs

AGENT → env/reset/req → EnvPlugin
AGENT → memory/recall/req → MemoryPlugin
AGENT → env/step/req → EnvPlugin → auto-learn
AGENT → learn/feedback/req → LearnPlugin
AGENT → learn/adapt/req → ↑ updated weights

One protocol session. Three plugins. Zero shared state. The executor routes by message type: every message finds its handler through the type_to_plugins dispatch table.
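The episode above can be replayed end to end in-process. In this sketch, `Host`, `ToyEnv`, `DictMemory`, and `LearnStub` are hypothetical, and `learn/experience/req` is an assumed message type for the auto-learn hop (the architecture slide lists "experience" among LearnPlugin operations). The plugins never reference each other. Only the host routes messages and forwards the step's experience.

```python
class ToyEnv:
    handles = ("env/reset/req", "env/step/req")
    count = 0

    def handle(self, msg_type, p):
        if msg_type == "env/reset/req":
            self.count = 0
            return {"state": 0}
        prev, self.count = self.count, self.count + 1
        return {"state": self.count, "reward": 1.0, "done": False,
                "experience": (prev, p["action"], 1.0, self.count, False)}


class DictMemory:
    handles = ("memory/store/req", "memory/recall/req")

    def __init__(self):
        self.data = {}

    def handle(self, msg_type, p):
        if msg_type == "memory/store/req":
            self.data[p["key"]] = p["value"]
            return {"ok": True}
        return {"value": self.data.get(p["key"])}


class LearnStub:
    handles = ("learn/experience/req", "learn/feedback/req", "learn/adapt/req")

    def __init__(self):
        self.experiences, self.feedback = [], []

    def handle(self, msg_type, p):
        if msg_type == "learn/experience/req":
            self.experiences.append(p["experience"])
        elif msg_type == "learn/feedback/req":
            self.feedback.append(p["score"])
        else:  # learn/adapt/req: toy "weight" = mean feedback score
            return {"weights": {"default": sum(self.feedback) / max(len(self.feedback), 1)}}
        return {"ok": True}


class Host:
    """Hypothetical in-process host: routes by type and runs the auto-learn hook."""

    def __init__(self, plugins):
        # Assumption: exactly one plugin per message type in this sketch.
        self.type_to_plugins = {t: p for p in plugins for t in p.handles}
        self.trace = []

    def send(self, msg_type, payload=None):
        plugin = self.type_to_plugins[msg_type]
        self.trace.append((msg_type, type(plugin).__name__))
        result = plugin.handle(msg_type, payload or {})
        if msg_type == "env/step/req":
            # Auto-learn hook: the host forwards the step's experience to the learner.
            self.send("learn/experience/req", {"experience": result["experience"]})
        return result


learn = LearnStub()
host = Host([ToyEnv(), DictMemory(), learn])
host.send("env/reset/req")
host.send("memory/recall/req", {"key": "last_plan"})
host.send("env/step/req", {"action": "increment"})
host.send("learn/feedback/req", {"score": 0.8})
print(host.send("learn/adapt/req"))  # {'weights': {'default': 0.8}}
for msg_type, plugin_name in host.trace:
    print(f"AGENT -> {msg_type} -> {plugin_name}")
```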

09 / 12

Architecture Diagram

The three-layer stack with the protocol as the interface

YOUR AGENT
  ReAct or DeepAgent loop
  Any model (Claude, GPT, etc.)
  Tool-aware planning
  Sends typed NDJSON messages

A2E PROTOCOL
  NDJSON messages
  env/*, tool/*, memory/*, learn/*
  Capability negotiation
  Audit logging
  The interface you own

YOUR HOST (swappable plugins)
  EnvPlugin: reset, step, observe, close
  ToolPlugin: list, call, search, events
  MemoryPlugin: store, recall, forget, merge
  LearnPlugin: feedback, experience, adapt

Protocol-first, not SDK-first. Agent and host communicate over typed NDJSON. They don't share imports, class hierarchies, or runtime state. The protocol is the contract.
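Because the host side is only a plugin behind the protocol, a memory backend can start as a few dictionaries. This is a hypothetical in-memory sketch of the three tiers (working, episodic, semantic) from the namespace table, covering `store`, `recall`, and `forget`. The FIFO cap on working memory and the tier search order are assumptions, and `merge` is omitted because the deck does not define its semantics.

```python
from typing import Any, Optional

TIERS = ("working", "episodic", "semantic")


class InMemoryTieredStore:
    """Hypothetical in-memory backend with the three tiers named in the deck."""

    def __init__(self, working_capacity: int = 8):
        self.tiers = {t: {} for t in TIERS}
        self.working_capacity = working_capacity

    def store(self, key: str, value: Any, tier: str = "working") -> None:
        if tier not in self.tiers:
            raise ValueError(f"unknown tier: {tier}")
        entries = self.tiers[tier]
        entries.pop(key, None)  # re-storing a key moves it to the newest position
        entries[key] = value
        # Assumption: working memory is a small FIFO buffer; the oldest entry falls out.
        if tier == "working" and len(entries) > self.working_capacity:
            del entries[next(iter(entries))]

    def recall(self, key: str, tier: Optional[str] = None) -> Any:
        # Search one tier, or all tiers in order: working, episodic, semantic.
        for t in ([tier] if tier else TIERS):
            if key in self.tiers[t]:
                return self.tiers[t][key]
        return None

    def forget(self, key: str, tier: Optional[str] = None) -> int:
        # Returns how many tiers held the key.
        removed = 0
        for t in ([tier] if tier else TIERS):
            if key in self.tiers[t]:
                del self.tiers[t][key]
                removed += 1
        return removed


mem = InMemoryTieredStore(working_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    mem.store(k, v)
mem.store("turn-1", {"action": "increment", "reward": 1.0}, tier="episodic")
print(list(mem.tiers["working"]), mem.recall("a"), mem.recall("turn-1"))
```

Swapping this for SQLite or a vector store would change the plugin class named in `config.yaml`; the agent keeps sending the same `memory/*` messages.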

10 / 12

Key Design Principles

What makes this architecture different

Protocol is the Interface

Not a framework, not an SDK you build on top of — a standard wire format between agent and environment. Like POSIX, the interface is the product; implementations are interchangeable.

Multi-Capability Dispatch

The agent calls tools, steps environments, invokes skills, runs chains, and records learning — all through the same session. The executor routes by message type. Adding a capability means adding a plugin.

Learning is First-Class

Feedback, experience, and adaptation are protocol messages. The same message bus carries env/step/req and learn/feedback/req. The loop isn't a hack — it's architecture.

Harness Owns the Integration

Auto-learning hooks, episode-to-env tracking, reward computation — these live in the harness, not the agent. The agent sends simple messages. The harness does the orchestration.

Build your agents against the protocol. Own your harness. Swap your plugins. The agent never notices.

11 / 12

Own the Interface.
Swap the Plugins.

The agent never notices.

a2eprotocol.github.io/docs
github.com/a2eprotocol/python-sdk
env+tool+memory+learn

agent.py → A2E messages → host adapter → learn

12 / 12