AXME
Multi-agent orchestration: how to coordinate AI agents without losing state, retry, or audit trail
A complete guide to multi-agent orchestration — sequential chains, parallel fan-out, hierarchical coordination, cross-framework agent communication, and durable state across agent boundaries.
Multi-agent orchestration coordinates specialized agents without losing shared state, retry semantics, or audit across framework boundaries.
What multi-agent orchestration is
Multi-agent orchestration means more than one agent (a researcher, writer, reviewer, or cross-service specialist) participates in a business outcome, with explicit handoffs and failure handling.
In production, this shows up when a prototype works in a notebook but breaks the first time a deploy restarts mid-run, a manager takes a day to approve, or a second agent needs the same state. The failure mode is almost never the model — it is missing lifecycle infrastructure.
AXME models work as durable intents: submit once, wait in a known state, resume with audit. That lets you keep LangGraph, CrewAI, OpenAI Agents, or your own stack while shipping the same patterns operations and compliance teams expect.
When you evaluate build-vs-buy, ask three questions: does state survive process restarts, can humans approve without a bespoke webhook stack, and is audit intent-level rather than log archaeology? Teams that can answer yes to all three tend to move through incidents faster, because one ID ties together model output, tools, approvals, and retries.
The patterns below are framework-agnostic. Wire AXME at boundaries — after a graph node, before a cross-service call, or when Mesh policy must enforce spend and tool scope — rather than rewriting agent logic you already trust.
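To make the durable-intent idea concrete, here is a minimal sketch of "submit once, wait in a known state, resume with audit". It is not the AXME API: the `IntentStore` class, the state names, and the one-JSON-file-per-intent layout are all assumptions chosen so the example runs with only the standard library. The point it illustrates is that state and audit live outside the process, so a restarted worker can pick up where the previous one stopped.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical lifecycle states and allowed transitions (not AXME's real names).
ALLOWED = {
    "SUBMITTED": {"RUNNING"},
    "RUNNING": {"WAITING", "COMPLETED", "FAILED"},
    "WAITING": {"RUNNING", "FAILED"},
    "COMPLETED": set(),
    "FAILED": set(),
}


class IntentStore:
    """Toy file-backed store: one JSON file per intent."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, intent_id: str) -> Path:
        return self.root / f"{intent_id}.json"

    def submit(self, payload: dict) -> str:
        intent_id = uuid.uuid4().hex
        record = {"id": intent_id, "state": "SUBMITTED", "payload": payload, "audit": []}
        self._path(intent_id).write_text(json.dumps(record))
        return intent_id

    def load(self, intent_id: str) -> dict:
        return json.loads(self._path(intent_id).read_text())

    def transition(self, intent_id: str, new_state: str, actor: str) -> dict:
        record = self.load(intent_id)
        if new_state not in ALLOWED[record["state"]]:
            raise ValueError(f"illegal transition {record['state']} -> {new_state}")
        # Every state change is appended to the audit list before it is persisted.
        record["audit"].append({"from": record["state"], "to": new_state, "actor": actor})
        record["state"] = new_state
        self._path(intent_id).write_text(json.dumps(record))
        return record


def demo() -> dict:
    root = Path(tempfile.mkdtemp())
    store = IntentStore(root)
    intent_id = store.submit({"task": "draft report"})
    store.transition(intent_id, "RUNNING", actor="researcher")
    store.transition(intent_id, "WAITING", actor="researcher")
    # Simulate a process restart: a brand-new store object reads the same files.
    restarted = IntentStore(root)
    return restarted.transition(intent_id, "RUNNING", actor="writer")
```

A production store would need atomic writes and concurrency control; the sketch skips both to keep the lifecycle visible.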
Where multi-agent systems break
AutoGen agents work well inside one process; put them on different hosts and you are building a message broker. LangGraph hands off to CrewAI via S3 paths and prayer. Partial failures orphan work with no shared timeout or audit.
The protocol layer: why you need AXP
AXP gives a shared intent ID across LangGraph, CrewAI, AutoGen, and backend services — so every participant reads the same lifecycle and audit trail.
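As a rough illustration of what a shared intent ID buys you, the sketch below has two participants that exchange nothing but the ID. Everything here is hypothetical: the `Intent` dataclass, the in-memory `REGISTRY`, the state names, and the two step functions are stand-ins (the comments mark where a LangGraph node or a CrewAI task would sit), and none of it is AXP's actual wire format or client API.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    intent_id: str
    state: str = "SUBMITTED"
    data: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)


# Stand-in for a shared, durable registry; a dict keeps the sketch runnable.
REGISTRY: dict[str, Intent] = {}


def record(intent_id: str, participant: str, state: str, **updates) -> None:
    """Update lifecycle state and append to the audit trail in one place."""
    intent = REGISTRY[intent_id]
    intent.state = state
    intent.data.update(updates)
    intent.audit.append((participant, state))


def research_step(intent_id: str) -> None:
    # Imagine this body running inside a LangGraph node.
    record(intent_id, "langgraph:researcher", "RESEARCHED", notes="three sources found")


def writing_step(intent_id: str) -> None:
    # Imagine this body running inside a CrewAI task; it receives only the ID.
    notes = REGISTRY[intent_id].data["notes"]
    record(intent_id, "crewai:writer", "DRAFTED", draft=f"Summary based on: {notes}")


def run_handoff() -> Intent:
    REGISTRY["intent-001"] = Intent("intent-001")
    research_step("intent-001")
    writing_step("intent-001")
    return REGISTRY["intent-001"]
```

Because both steps go through `record`, the second participant reads the first one's output and both show up in the same audit list, with no S3 path passed between them.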
Human-in-the-loop across agent boundaries
Approvals often sit between agents: legal before publish, manager before spend. Durable waits attach to the intent, not to a single framework run.
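One way to picture a wait that attaches to the intent rather than to a framework run is the sketch below. The `ApprovalGate` class, the role check, and the returned status strings are assumptions for illustration, not an AXME interface. The key property is that `publish_step` never blocks: it inspects the gate, returns a waiting status, and can be called again hours or days later by any process.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApprovalGate:
    intent_id: str
    required_role: str
    decision: Optional[str] = None  # None means the intent is still waiting
    history: list = field(default_factory=list)

    def decide(self, role: str, approved: bool, note: str = "") -> None:
        if role != self.required_role:
            raise PermissionError(f"{role} may not decide; {self.required_role} required")
        if self.decision is not None:
            raise RuntimeError("gate already decided")
        self.decision = "approved" if approved else "rejected"
        self.history.append({"role": role, "decision": self.decision, "note": note})


def publish_step(gate: ApprovalGate, draft: str) -> str:
    """Called each time the intent is resumed; it never blocks a thread."""
    if gate.decision is None:
        return "WAITING_FOR_APPROVAL"
    if gate.decision == "rejected":
        return "REJECTED"
    return f"PUBLISHED: {draft}"
```

In a real deployment the gate would be persisted alongside the intent so the decision survives restarts; here it is a plain object to keep the control flow readable.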
Failure handling: partial completion, retry, compensation
Define which agent steps retry, which escalate to humans, and which compensate (rollback side effects) when downstream agents fail.
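The retry, escalate, and compensate split can be sketched as a small saga runner. This is a generic pattern, not AXME's implementation: `TransientError`, `run_with_retry`, `run_saga`, and the `"ESCALATED"` status are hypothetical names. Transient errors are retried, any other failure stops the run, completed steps are compensated in reverse order, and the result is flagged for a human.

```python
from typing import Callable, List, Tuple


class TransientError(Exception):
    """Failure that is worth retrying (timeouts, rate limits)."""


def run_with_retry(step: Callable[[], None], attempts: int = 3) -> None:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            step()
            return
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the saga handle it


Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensate)


def run_saga(steps: List[Step]) -> Tuple[str, List[str]]:
    log: List[str] = []
    done: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            run_with_retry(action)
        except Exception:
            log.append(f"{name}:failed")
            # Roll back side effects of earlier steps, newest first.
            for prev_name, undo in reversed(done):
                undo()
                log.append(f"{prev_name}:compensated")
            return "ESCALATED", log
        log.append(f"{name}:ok")
        done.append((name, compensate))
    return "COMPLETED", log


def demo():
    calls = {"reserve": 0}
    side_effects: List[str] = []

    def reserve() -> None:
        calls["reserve"] += 1
        if calls["reserve"] == 1:
            raise TransientError("timeout")  # succeeds on the second attempt
        side_effects.append("reserved")

    def unreserve() -> None:
        side_effects.remove("reserved")

    def charge() -> None:
        raise ValueError("card declined")  # not transient, so no retry

    status, log = run_saga([("reserve", reserve, unreserve), ("charge", charge, lambda: None)])
    return status, log, side_effects, calls["reserve"]
```

The sketch retries immediately; a real system would add backoff and would persist the `done` list so compensation can still run after a crash.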
Observability: tracking state across agent boundaries
Fleet visibility shows every intent and agent class. The audit trail answers what each agent did before the handoff, which is critical for enterprise debugging.
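The question "what did each agent do before the handoff" becomes a simple query once events carry the intent ID. The sketch below assumes a hypothetical event shape (`AuditEvent` with a sequence number, intent ID, agent, and action) and a hypothetical `handoff:<agent>` action convention; neither comes from AXME's schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEvent:
    seq: int
    intent_id: str
    agent: str
    action: str


def before_handoff(events: Iterable[AuditEvent], intent_id: str, to_agent: str) -> List[AuditEvent]:
    """Events for one intent that precede its first handoff to `to_agent`.

    If no such handoff exists, every event for the intent is returned.
    """
    result: List[AuditEvent] = []
    for event in sorted(events, key=lambda e: e.seq):
        if event.intent_id != intent_id:
            continue
        if event.action == f"handoff:{to_agent}":
            break
        result.append(event)
    return result
```

Filtering by intent ID first is what replaces log archaeology: events from other intents and from after the handoff never enter the result.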
Versioning agents without breaking in-flight intents
Deploy new agent versions while intents reference stable intent IDs. Mesh can route new submissions to v2 while v1 completes open work — avoiding big-bang cutovers.
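A minimal sketch of that routing rule: each intent is pinned to whichever agent version was the default when it was submitted, so promoting v2 affects only new submissions. The `VersionRouter` class and its method names are hypothetical and stand in for whatever Mesh does internally.

```python
from typing import Dict, List


class VersionRouter:
    """Pins each intent to the agent version that was default at submit time."""

    def __init__(self, default_version: str) -> None:
        self.default_version = default_version
        self._pinned: Dict[str, str] = {}

    def submit(self, intent_id: str) -> str:
        self._pinned[intent_id] = self.default_version
        return self.default_version

    def route(self, intent_id: str) -> str:
        # In-flight intents keep their original version, even after a promote.
        return self._pinned[intent_id]

    def complete(self, intent_id: str) -> None:
        del self._pinned[intent_id]

    def promote(self, version: str) -> None:
        self.default_version = version

    def draining(self, version: str) -> List[str]:
        """Open intents still pinned to `version`; empty means it can be retired."""
        return sorted(i for i, v in self._pinned.items() if v == version)
```

A typical rollout calls `promote("v2")`, then watches `draining("v1")` until it returns an empty list before decommissioning the old version.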
Testing multi-agent flows before production
Use sandbox namespaces, synthetic intents, and policy dry-run to validate handoffs. Record expected state transitions so regressions show up in CI, not in customer traffic.
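Recording expected state transitions can be as simple as a list compared in a unit test. In the sketch below, `run_synthetic_intent` is a hypothetical stub for the flow under test (in practice it would drive your agents in a sandbox namespace and collect the states they report), and the state names are invented for illustration.

```python
import unittest

# The recorded, expected path for an approved intent (hypothetical state names).
EXPECTED = ["SUBMITTED", "RESEARCHED", "WAITING_FOR_APPROVAL", "APPROVED", "PUBLISHED"]


def run_synthetic_intent(approve: bool) -> list:
    """Stub for the flow under test; returns the states the intent passed through."""
    trace = ["SUBMITTED", "RESEARCHED", "WAITING_FOR_APPROVAL"]
    trace += ["APPROVED", "PUBLISHED"] if approve else ["REJECTED"]
    return trace


class HandoffTransitionsTest(unittest.TestCase):
    def test_happy_path_matches_recording(self) -> None:
        self.assertEqual(run_synthetic_intent(approve=True), EXPECTED)

    def test_rejection_never_publishes(self) -> None:
        self.assertNotIn("PUBLISHED", run_synthetic_intent(approve=False))
```

Running this file with `python -m unittest` in CI means a changed or missing transition fails the build instead of surfacing in customer traffic.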
PATTERNS
Three coordination shapes.
Sequential chain
A → B → C with durable waits between.
Parallel fan-out
Split work; join on completion.
Hierarchical
Supervisor delegates to workers.
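The three shapes above can be sketched with plain `asyncio`. The `agent` coroutine is a hypothetical stand-in for a real agent call, and the sketch shows only the control flow: the durable waits between steps are not modeled here.

```python
import asyncio
from typing import List


async def agent(name: str, task: str) -> str:
    await asyncio.sleep(0)  # stand-in for model and tool calls
    return f"{name}({task})"


async def sequential(task: str) -> str:
    """A -> B -> C: each agent consumes the previous agent's output."""
    out = task
    for name in ("A", "B", "C"):
        out = await agent(name, out)
    return out


async def fan_out(task: str, workers: List[str]) -> List[str]:
    """Split work across workers, then join when all have completed."""
    results = await asyncio.gather(*(agent(w, task) for w in workers))
    return list(results)


async def hierarchical(task: str) -> str:
    """A supervisor delegates subtasks to workers, then combines the results."""
    subtasks = {"researcher": f"{task}:facts", "writer": f"{task}:draft"}
    results = await asyncio.gather(*(agent(w, t) for w, t in subtasks.items()))
    return await agent("supervisor", " + ".join(results))
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why the join step can rely on positions rather than completion order.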
Cross-framework handoff
Ad hoc: custom message bus + state DB
AXP intent: langgraph_run → axme.intent → autogen_run
Frequently asked questions
- How many agents can share one intent?
- Multiple participants attach to the same intent via AXP. Each framework run updates lifecycle state; Mesh enforces fleet policy across participants.
- Do I have to replace LangGraph, CrewAI, or Temporal?
- No. AXME complements orchestration frameworks and workflow engines. You keep agent graphs and workers; intents add durability, HITL, audit, and fleet controls where those tools stop.
- How is this different from observability alone?
- Dashboards show symptoms after the fact. Intents carry lifecycle state, waiting semantics, and policy enforcement so you can pause, approve, cap spend, or kill one agent without redeploying the fleet.
Related reading
Deeper dives from the AXME blog.
Your AutoGen Agents Can't Talk Across Machines. Here's the Missing Piece.
AutoGen handles multi-agent conversations beautifully - inside one process. Put agents on different machines and you're back to building message brokers from scratch.
A2A Tells Agents How to Talk. It Doesn't Tell Them What Happens When Things Break.
Google's A2A protocol handles agent communication. But crash recovery, retries, timeouts, and human approval gates? That's still on you. Unless you add a lifecycle layer.
Subagents Without Context: Claude Code's Silent Bug
When Claude Code spawns a subagent to run a task in parallel, the subagent starts fresh. It doesn't inherit the parent's memory. Work gets done, but context is lost. Here's the fix, and why it's fragile.
Ship your first durable agent — in under 10 minutes.
Free tier. No credit card. Self-host or hosted — your choice.