A AgentReplay Console

Agent side-effect regression

Freeze. Diff. Gate.

Side-effect regression testing for AI agents.

Trace captured
FAILED 3/6 PASSED 6/6

Trace timeline

11 sec replay
00:00 00:02 00:04 00:07 00:09 00:11

Tool calls

search_orders200 OK
get_customer200 OK
create_refund500 ERR
send_email200 OK
update_crm500 ERR
log_event200 OK

Side-effect diff

Resource Actual Diff
refunds id: rf_123 + CREATED
emails to: customer@email.com + SENT
orders status: refunded CHANGED
crm.contacts last_contacted: now CHANGED
audit_logs +1 entry + CREATED

The problem

Production failures should not become folklore.

Freeze the exact run, preserve the state boundary, and turn the incident into a gate.

unreproducible

Wrong refund

Tool
stripe.refund
Target
in_original
State
changed
side effect

Email sent

Tool
gmail.send
Approval
missing
Action
delivered
missed approval

CRM drift

System
HubSpot
Order
too early
Replay
missing

3 / 8 workflow

A release gate built from real incidents.

Capture the production run, compare the candidate, and block regressions before release.

1

Freeze

Capture the run

agentreplay freeze
2

Diff

Compare side effects

agentreplay diff
3

Gate

Block regressions

agentreplay gate

Bad trace

FAILED 3/6
  • No PII in logs
  • Email sent
  • Refund within 24h

Fixed trace

PASSED 6/6
  • No PII in logs
  • Email drafted
  • Refund within 24h
Tool diffSide-effect diffStatus
gmail.send → gmail.draftemail side effect removedchanged
stripe.refundrefund target changedverified
hubspot.updateorder preservedpass
agentreplay diff bad_trace.json fixed_trace.json

Section 4 / 8

The incident becomes the test.

Replay the bad run, compare the candidate, and block the regression before release.

3/6 failed checks 2 changes detected 6/6 passing checks

5 / 8

Wrap the tools your agents already use.

Record inputs, responses, approvals, and side effects without replacing your agent framework.

AgentReplay
Harness
Stripe
Gmail
HubSpot
GitHub
Postgres
Slack
OpenAPI
harness.wrapTool('stripe.refund', refundCustomer)

6 / 8

Built for agents that act.

Billing, support, RevOps, and platform teams need proof before the next release.

$

Billing Ops

refund gate

invoice targetPASS amount checkPASS email actionFIXED

Support Automation

draft approval

Approve Reject
{ }

AI Platform

CI release gate

Agent Agencies

client proof bundle

ClientStatusTests
Northwinddelivered21/24
Atlasreviewed18/18
SDK CLI READY
import { createHarness } from 'agentreplay'

const harness = createHarness({
  projectKey: process.env.AGENTREPLAY_PROJECT_KEY,
  redact: ['pii', 'raw_keys']
})

const refund = harness.wrapTool(
  'stripe.refund',
  refundCustomer
)
agentreplay gate traces/billing-bad-run.json PASSED 6/6 agentreplay diff bad.json fixed.json DIFF 3

Developer experience

Drop it around the tools. Keep your agent.

AgentReplay works beside your framework: OpenAI Agents, LangGraph, custom MCP servers, or your own loop.

  • No framework rewrite
  • Deterministic gates
  • Redaction by default
  • CI-ready
PASSED 6/6

Ship agents with evidence.

Every side effect frozen. Every fix compared. Every release gated.