Stress-test your AI agents before production does
Point Fabrik at an agent. It builds a simulation environment around it, generates scenarios from your code and real production traces, runs everything in parallel, and tells you exactly what broke.
Runs your agent on the framework you already use
01/Why simulation, not evals
Evals score outputs against rubrics. They miss the failures that actually break agents.
These aren't output-quality failures — they're integration failures. They only show up when the agent runs against a realistic environment with realistic state.
stripe.refunds.create
Wrong-amount tool calls
The agent calls a tool with the wrong arguments because the user's request was ambiguous and the agent guessed.
langgraph.node
Loops on changed shapes
A LangGraph node loops three times because the OpenAI tool-call response shape changed underneath it.
auth.proxy
Unhandled null state
The auth proxy returns null because the customer is a returning user with a different ID format than the test fixtures.
02/How Fabrik works
From a connected agent to a failure report — automatically
Your agent
Point Fabrik at a connected sandbox.
Discover
Fabrik learns how the agent works.
Build environment
Mocked services, seeded world state, personas.
Generate scenarios
From your code and real production traces.
Run in parallel
Hundreds of rollouts at once.
Report
Failure clusters and exactly what broke.
03/The product
Five things you can do in sixty seconds
Watch parallel rollouts
Pick a scenario set, hit run, and see N scenarios execute simultaneously — each with its persona, current turn, and assertion status emerging live.
Inspect a rollout's trace
Actor messages, agent messages, tool calls, mock hits/misses, DB reads/writes, assertion results, and grader output — one timeline, color-coded by lane.
Compare two runs
Pick a baseline. Every scenario is grouped by delta: fixed, regressed, or unchanged. Failure clusters sit side-by-side with deltas.
Production traces → scenarios
Drop a JSONL / Langfuse / OpenTelemetry export. Fabrik normalizes it, redacts PII, and seeds scenario generation grounded in real user phrasing.
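As a rough sketch, one line of such a JSONL export might look like the following. The field names here are illustrative only, not a required schema:

```jsonl
{"trace_id":"t_8x1","input":"refund my last order","output":"Refunded $42.00 to your card.","tool_calls":[{"name":"stripe.refunds.create","args":{"amount":4200,"currency":"usd"}}]}
```

Fabrik normalizes whatever shape you drop in, so exact field names do not need to match.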
Export training data
Filter to passing rollouts with high behavioral scores. Download as JSONL in OpenAI fine-tuning shape, aggregated across runs into one re-fetchable snapshot.
{"messages":[
{"role":"user","content":"refund order_1001"},
{"role":"assistant","content":"Refunded $42.00 …"}
],"metadata":{"score":0.91,"run":"8f3a"}}
04/The output
Three things at once, every run
Bugs you would have shipped
Specific scenarios where the agent regressed, fabricated answers, called the wrong tool, or violated your policies — with reproducible inputs.
Comparison data across versions
"v2 fixed 47% of v1's failures, here are 3 new regressions" — with one click.
Training data
Every passing rollout becomes a JSONL line in OpenAI fine-tuning shape. The longer Fabrik runs, the larger the high-signal corpus you accumulate.
Most teams come for the bug-finding. The training data quietly compounds in the background.
05/Quick start
Three ways in — pick how much access you want to give
Full SDK injection
Zero code — Fabrik writes the wrapping
Best when you control the agent's repo. Fabrik creates a fabrik-prep branch, analyzes your code, and proposes wraps for your DB / auth / API / notification / payment calls one group at a time. You approve each plan; Fabrik commits it.
Environment-only
~3 lines in your handler
Best when you can't or don't want Fabrik to edit your code. Fabrik runs discovery, builds the mock catalog, detects personas + framework, and publishes an environment version. You add a few lines to your request handler.
Bring your own framework
Native trace enrichment
Best when you're already running OpenAI Agents SDK / Vercel AI SDK / LangGraph / Google ADK. Fabrik's framework-detection skill identifies the framework and sets up the right adapter automatically.
Environment-only — that's the whole integration:
import {
  refreshFabrikRuntimeFromRequest,
  withFabrikRuntimeResponse,
} from '@fabrik-evals/core';

export async function POST(req: Request) {
  const body = await req.json();
  refreshFabrikRuntimeFromRequest(body); // pulls the runtime envelope
  const reply = await myAgent(body.messages);
  return Response.json(withFabrikRuntimeResponse({ text: reply }));
}
06/See it run
One happy-path test vs. hundreds of parallel rollouts
Traditional evals run the workflow once and call it green. Fabrik runs it against every persona, edge case, and service failure — and shows you exactly what broke.
Find the bugs before your users do
Get early access and updates on AI agent simulation and reliability.
No spam. Only valuable updates.