Test Runs

A Test Run executes a test case and captures the agent’s output, performance metrics (latency, tokens, cost), and execution trace.

Quick Start

# Python
run = client.runs.create(test_case_id="tc_xyz789")

print(f"Run ID: {run.id}")
print(f"Success: {run.success}")
print(f"Output: {run.output}")

// TypeScript
const run = await client.runs.create({ testCaseId: "tc_xyz789" });

console.log(`Run ID: ${run.id}`);
console.log(`Success: ${run.success}`);
console.log(`Output: ${run.output}`);

Run Results

Each run returns:

success - Whether the test passed
output - The agent’s response
metrics - Latency, token usage, and cost
trace_id - ID linking to the execution trace

run = client.runs.create(test_case_id="tc_xyz789")
print(f"Success: {run.success}")
print(f"Latency: {run.metrics.latency_ms}ms")
print(f"Cost: ${run.metrics.cost_usd:.4f}")
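
The fields above can also be checked against a budget, for example in a CI job. The following is a minimal sketch, not part of the SDK: `check_budget` and its thresholds are hypothetical, and a `SimpleNamespace` stands in for a real run so the example executes without a client. It relies only on `run.success`, `run.metrics.latency_ms`, and `run.metrics.cost_usd` as shown above.

```python
from types import SimpleNamespace


def check_budget(run, max_latency_ms, max_cost_usd):
    """Return a list of problems; an empty list means the run is within budget.

    Hypothetical helper for illustration, not an SDK function.
    """
    problems = []
    if not run.success:
        problems.append("test failed")
    if run.metrics.latency_ms > max_latency_ms:
        problems.append(
            f"latency {run.metrics.latency_ms}ms exceeds {max_latency_ms}ms"
        )
    if run.metrics.cost_usd > max_cost_usd:
        problems.append(
            f"cost ${run.metrics.cost_usd:.4f} exceeds ${max_cost_usd:.4f}"
        )
    return problems


# Stand-in with the same field names as a run (values are made up).
sample_run = SimpleNamespace(
    success=True,
    metrics=SimpleNamespace(latency_ms=1850, cost_usd=0.0042),
)

problems = check_budget(sample_run, max_latency_ms=1500, max_cost_usd=0.01)
print(problems)
```

With a real client, the same helper would presumably take the object returned by `client.runs.create(...)` directly.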

Running Multiple Tests

By IDs or Tags

# Multiple test cases
run = client.runs.create(test_case_ids=["tc_001", "tc_002", "tc_003"])

# By tags
run = client.runs.create(tags=["refunds"])

Async Mode

For long-running tests, start the run with async_mode=True and fetch the result later with client.runs.get:

run = client.runs.create(test_case_id="tc_xyz789", async_mode=True)
# Check status later
run_result = client.runs.get(run.id)
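
Checking status later usually means polling. The sketch below shows one way to do that with a timeout. It makes assumptions that this page does not confirm: that a run exposes a `status` field, and that `"pending"` and `"running"` are its unfinished values. `wait_for_run` is a hypothetical helper, and `fake_get` stands in for `client.runs.get` so the example runs on its own.

```python
import time
from types import SimpleNamespace


def wait_for_run(get_run, run_id, poll_interval_s=2.0, timeout_s=300.0):
    """Poll get_run(run_id) until the run is no longer in progress.

    ASSUMPTION: the run object has a `status` attribute, and the values
    "pending" and "running" mean it has not finished yet.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        run = get_run(run_id)
        if run.status not in ("pending", "running"):
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"run {run_id} still {run.status} after {timeout_s}s"
            )
        time.sleep(poll_interval_s)


# Stand-in for client.runs.get: reports "running" twice, then "completed".
_statuses = iter(["running", "running", "completed"])


def fake_get(run_id):
    return SimpleNamespace(id=run_id, status=next(_statuses))


result = wait_for_run(fake_get, "run_123", poll_interval_s=0.01, timeout_s=5)
print(result.status)
```

Against the real client, the call would look like `wait_for_run(client.runs.get, run.id)`, provided the status assumption holds.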

Compare Agent Versions

run_v1 = client.runs.create(test_case_id="tc_xyz789")
run_v2 = client.runs.create(test_case_id="tc_xyz789", agent_id="agent_v2")
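
Once both runs have finished, their results can be compared field by field. This is a minimal sketch that uses only the fields shown under Run Results. `compare_runs` is a hypothetical helper, and the two `SimpleNamespace` objects are made-up stand-ins for `run_v1` and `run_v2`.

```python
from types import SimpleNamespace


def compare_runs(baseline, candidate):
    """Summarize how the candidate run differs from the baseline run.

    Hypothetical helper; negative deltas mean the candidate is faster or cheaper.
    """
    return {
        "success": (baseline.success, candidate.success),
        "latency_ms_delta": candidate.metrics.latency_ms
        - baseline.metrics.latency_ms,
        "cost_usd_delta": round(
            candidate.metrics.cost_usd - baseline.metrics.cost_usd, 6
        ),
    }


# Stand-ins for run_v1 and run_v2 (values are illustrative only).
v1 = SimpleNamespace(
    success=True, metrics=SimpleNamespace(latency_ms=1200, cost_usd=0.0050)
)
v2 = SimpleNamespace(
    success=True, metrics=SimpleNamespace(latency_ms=950, cost_usd=0.0065)
)

diff = compare_runs(v1, v2)
print(diff)
```

A single test case is a small sample; comparing across several test cases gives a more reliable picture of whether one agent version is actually better.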

Next Steps

Evaluate runs with metrics:

run = client.runs.create(test_case_id="tc_xyz789")

evaluation = client.evaluate(
    run_id=run.id,
    scorers=["answer_relevancy", "faithfulness", "bias"]
)

Evaluations - Score runs with 29 built-in metrics
API Reference - Full API documentation
