AXME
Stop agent cost surprises — a complete guide to tracking, capping, and optimizing AI agent spend
How to track LLM token costs per agent, set hard budget caps, attribute spend across teams, and prevent runaway cost events.
Agent cost surprises come from unbounded tool loops, wrong model routing, and missing attribution — fix with per-intent metering, caps, and fleet-level policy.
Why agent cost surprises happen
A ticket agent expected to spend $8/day hits $500 over a weekend retry loop. Nobody gets paged because OpenAI bills one line item — engineering finds out in standup, not in a dashboard.
In production, the underlying gap shows up when a prototype works in a notebook but breaks the first time a deploy restarts mid-run, a manager takes a day to approve, or a second agent needs the same state. The failure mode is almost never the model; it is missing lifecycle infrastructure.
AXME models work as durable intents: submit once, wait in a known state, resume with audit. That lets you keep LangGraph, CrewAI, OpenAI Agents, or your own stack while shipping the same patterns operations and compliance teams expect.
When you evaluate build-vs-buy, ask three questions: does state survive process restarts, can humans approve without a bespoke webhook stack, and is audit intent-level rather than log archaeology? Teams that answer yes ship faster through incidents because one ID ties model output, tools, approvals, and retries.
The patterns below are framework-agnostic. Wire AXME at boundaries — after a graph node, before a cross-service call, or when Mesh policy must enforce spend and tool scope — rather than rewriting agent logic you already trust.
Cost components: tokens, API calls, compute, third-party services
Attribute tokens, metered APIs, compute, and SaaS tools to an intent and agent class so finance can charge back — and so Mesh can hard-stop the one agent burning budget, not the whole deployment.
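As a rough illustration of attributing the four components to one intent, the sketch below keeps a running total per component. The `IntentCost` class, the model names, and the per-million-token prices are all hypothetical placeholders, not AXME or provider values; substitute your provider's published rates.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (assumption, not real rates).
PRICE_PER_MTOK = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


@dataclass
class IntentCost:
    """Accumulates the four cost components for a single intent."""
    intent_id: str
    agent_class: str
    tokens_usd: float = 0.0
    api_calls_usd: float = 0.0
    compute_usd: float = 0.0
    third_party_usd: float = 0.0

    def add_llm_call(self, model, input_tokens, output_tokens):
        # Token cost = tokens * price per token, summed over input and output.
        price = PRICE_PER_MTOK[model]
        self.tokens_usd += (
            input_tokens * price["input"] + output_tokens * price["output"]
        ) / 1_000_000

    def total_usd(self):
        return (
            self.tokens_usd
            + self.api_calls_usd
            + self.compute_usd
            + self.third_party_usd
        )


cost = IntentCost(intent_id="intent-1", agent_class="researcher")
cost.add_llm_call("large-model", input_tokens=200_000, output_tokens=40_000)
cost.api_calls_usd += 0.25     # metered search API
cost.compute_usd += 0.10       # sandbox runtime
cost.third_party_usd += 0.05   # SaaS tool usage
print(f"{cost.intent_id} ({cost.agent_class}): ${cost.total_usd():.2f}")
```

Keeping the components separate, rather than a single total, is what lets finance see whether tokens or a metered tool is driving the bill.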
Per-agent cost attribution: the technical approach
Tag intents with team, environment, and agent class. Mesh aggregates token and API spend in near real time.
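The tagging idea can be sketched without any particular backend: each spend event carries its tags, and aggregation is a group-by over those tags. The event shape (`tags`, `cost_usd`) and the `aggregate_spend` helper are assumptions made for illustration, not the Mesh data model.

```python
from collections import defaultdict


def aggregate_spend(events, keys=("team", "environment", "agent_class")):
    """Sum cost_usd per unique combination of the given tag keys."""
    totals = defaultdict(float)
    for event in events:
        group = tuple(event["tags"][key] for key in keys)
        totals[group] += event["cost_usd"]
    return dict(totals)


# Hypothetical spend events, one per LLM or tool call.
events = [
    {"tags": {"team": "support", "environment": "prod", "agent_class": "ticket"},
     "cost_usd": 0.5},
    {"tags": {"team": "support", "environment": "prod", "agent_class": "ticket"},
     "cost_usd": 0.25},
    {"tags": {"team": "growth", "environment": "staging", "agent_class": "researcher"},
     "cost_usd": 1.0},
]

for group, total in sorted(aggregate_spend(events).items()):
    print(group, f"${total:.2f}")
```

Passing a shorter `keys` tuple, for example `("team",)`, rolls the same events up to a coarser view.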
Budget cap types: hard stop vs soft alert vs rate limit
Hard stop halts the intent. Soft alert pages on-call. Rate limit throttles calls per minute — pick per risk profile.
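To make the three cap types concrete, here is a minimal in-process sketch. `BudgetGuard` and `BudgetExceeded` are hypothetical names, and a real deployment would enforce these limits centrally rather than inside the agent process; the point is only to show how the three behaviors differ.

```python
import time
from collections import deque


class BudgetExceeded(Exception):
    """Raised when cumulative spend reaches the hard cap."""


class BudgetGuard:
    def __init__(self, hard_cap_usd, alert_at_usd, max_calls_per_minute,
                 on_alert, clock=time.monotonic):
        self.hard_cap_usd = hard_cap_usd
        self.alert_at_usd = alert_at_usd
        self.max_calls_per_minute = max_calls_per_minute
        self.on_alert = on_alert   # callback, e.g. a pager integration
        self.clock = clock         # injectable so tests can control time
        self.spent_usd = 0.0
        self.alerted = False
        self.call_times = deque()

    def allow_call(self):
        """Rate limit: permit a call only if the last 60 seconds has room."""
        now = self.clock()
        while self.call_times and now - self.call_times[0] >= 60:
            self.call_times.popleft()
        if len(self.call_times) >= self.max_calls_per_minute:
            return False
        self.call_times.append(now)
        return True

    def record_spend(self, cost_usd):
        """Soft alert fires once at the threshold; hard stop raises."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.hard_cap_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f}")
        if self.spent_usd >= self.alert_at_usd and not self.alerted:
            self.alerted = True
            self.on_alert(self.spent_usd)


guard = BudgetGuard(
    hard_cap_usd=50,
    alert_at_usd=40,
    max_calls_per_minute=60,
    on_alert=lambda spent: print(f"ALERT: ${spent:.2f} spent"),
)
if guard.allow_call():
    guard.record_spend(0.02)
```

The caller would check `allow_call()` before each model or tool call and call `record_spend()` after it; an uncaught `BudgetExceeded` is what halts the run.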
Chargeback models for enterprise multi-team deployments
Export monthly attribution by team. Align caps with product P&L owners so agents stay within approved spend.
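A monthly export can be as simple as a per-team sum written to CSV. The sketch below assumes each spend record has `date` (ISO format), `team`, and `cost_usd` fields; those field names and the `monthly_chargeback_csv` helper are illustrative assumptions, not an AXME export format.

```python
import csv
import io
from collections import defaultdict


def monthly_chargeback_csv(records, month):
    """Return CSV text with one row per team for the given YYYY-MM month."""
    totals = defaultdict(float)
    for record in records:
        if record["date"].startswith(month):
            totals[record["team"]] += record["cost_usd"]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "team", "spend_usd"])
    for team in sorted(totals):
        writer.writerow([month, team, f"{totals[team]:.2f}"])
    return buffer.getvalue()


# Hypothetical spend records.
records = [
    {"date": "2025-01-03", "team": "support", "cost_usd": 30.0},
    {"date": "2025-01-17", "team": "platform", "cost_usd": 10.0},
    {"date": "2025-01-29", "team": "platform", "cost_usd": 2.5},
    {"date": "2025-02-01", "team": "support", "cost_usd": 99.0},
]

print(monthly_chargeback_csv(records, "2025-01"))
```

Comparing each row against the cap approved by that team's P&L owner is then a spreadsheet lookup rather than an engineering task.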
Cost optimization strategies: model routing, caching, batching
Route simple steps to smaller models. Cache tool results on intents. Batch non-urgent work off peak.
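The three strategies can each be sketched in a few lines. Everything here is an assumption for illustration: the model names, the 500-character routing threshold, and the `ToolCache` class (one instance per intent) are placeholders to adapt, not recommended values.

```python
import json


def pick_model(step):
    """Route short, tool-free steps to the smaller model."""
    if step.get("needs_tools") or len(step["prompt"]) > 500:
        return "large-model"
    return "small-model"


class ToolCache:
    """Caches tool results for one intent, keyed by tool name and arguments."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def call(self, tool_name, args, tool_fn):
        # sort_keys makes the key independent of argument order.
        key = (tool_name, json.dumps(args, sort_keys=True))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = tool_fn(**args)
        self._store[key] = result
        return result


def batch(items, size):
    """Split non-urgent work into fixed-size batches for off-peak runs."""
    return [items[i:i + size] for i in range(0, len(items), size)]


cache = ToolCache()
lookup = lambda ticket_id: {"ticket_id": ticket_id, "status": "open"}
cache.call("get_ticket", {"ticket_id": 7}, lookup)
cache.call("get_ticket", {"ticket_id": 7}, lookup)  # served from cache
print(pick_model({"prompt": "Classify this ticket"}), cache.hits)
```

Caching is only safe for tools whose results do not change during the intent; anything time-sensitive should bypass the cache.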
When to automate the stop vs alert a human
Automate hard stops for runaway loops. Alert humans for borderline policy violations that need judgment.
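One way to encode that split is a small decision function: clear-cut signals trigger an automatic stop, and spend that is merely approaching the cap goes to a human. The `decide_action` function, the repeated-call loop signal, and both thresholds are hypothetical choices for this sketch.

```python
def decide_action(spent_usd, hard_cap_usd, repeated_call_count,
                  loop_threshold=20, alert_fraction=0.8):
    """Return 'hard_stop', 'alert_human', or 'continue' for one agent run."""
    # Unambiguous cases: the cap is reached, or the agent keeps repeating
    # the same call, which is the signature of a runaway loop.
    if spent_usd >= hard_cap_usd or repeated_call_count >= loop_threshold:
        return "hard_stop"
    # Borderline case: spend is high but the run may be legitimate,
    # so a person decides whether to raise the cap or stop the agent.
    if spent_usd >= alert_fraction * hard_cap_usd:
        return "alert_human"
    return "continue"


print(decide_action(spent_usd=42, hard_cap_usd=50, repeated_call_count=3))
```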
AXME Mesh cost controls — quickstart
# Define budget policy per agent class
policy = mesh.cost_policy(
    agent_class="researcher",
    hard_cap_usd=50,
    alert_at_usd=40,
)
mesh.apply(policy)
Frequently asked questions
- Can caps apply per customer or tenant?
- Yes. Tag intents with tenant and agent class; Mesh aggregates spend and enforces caps per namespace or customer ID.
- What is the fastest win for cost control?
- Hard caps on high-risk agent classes plus alerts at 80% of budget. Pair with kill switch rules when error rates spike alongside token burn.
- Do I have to replace LangGraph, CrewAI, or Temporal?
- No. AXME complements orchestration frameworks and workflow engines. You keep agent graphs and workers; intents add durability, HITL, audit, and fleet controls where those tools stop.
- How is this different from observability alone?
- Dashboards show symptoms after the fact. Intents carry lifecycle state, waiting semantics, and policy enforcement so you can pause, approve, cap spend, or kill one agent without redeploying the fleet.
Related reading
Deeper dives from the AXME blog.
Your AI Agent Spent $500 Overnight and Nobody Noticed
AI agents call LLMs. LLMs cost money per token. Nobody tracks it per agent. One runaway loop and your OpenAI bill is a disaster.
Your AI Agent Made 10,000 API Calls in an Hour. Here's How to Stop That.
One runaway retry loop. 10,000 API calls. $130 in LLM costs. No rate limit fired because you never built one. Here's how to add centralized rate and cost limiting to AI agents.
How to Stop a Rogue AI Agent in Production
Your AI agent went rogue at 3am. It's running on multiple instances across regions. There's no terminal to Ctrl+C. You need a kill switch that works in under 1 second, enforced at the infrastructure level.