Core Concepts

Playgent uses a simple pipeline for testing and evaluating AI agents:

Agent → Test Case → Test Run → Evaluation

Agent

The configuration of your AI system: provider, credentials, and system prompt.

Test Case

Input scenarios with expected behaviors, context, and ground truth.

Test Run

Execution of test cases with captured outputs, metrics, and traces.

Evaluation

Scoring of a run's outputs with any of 29 built-in metrics, covering RAG, safety, agentic, and multi-turn behavior.

How They Connect

1. Agent: Register your AI system with Playgent.
2. Test Case: Define scenarios to test against that agent.
3. Test Run: Execute test cases and capture results.
4. Evaluation: Score the run’s output with chosen metrics.
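
The chain of references between the four concepts can be pictured as plain data records. The sketch below is only an illustration of that relationship, not Playgent's SDK or data model: the class names (`AgentRecord`, `CaseRecord`, `RunRecord`, `EvaluationRecord`), their fields, and the `lineage` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical records, for illustration only (not Playgent's actual schema).
@dataclass
class AgentRecord:
    id: str
    name: str
    provider: str
    system_prompt: str


@dataclass
class CaseRecord:
    id: str
    agent_id: str  # a test case is defined against one agent
    turns: List[dict] = field(default_factory=list)


@dataclass
class RunRecord:
    id: str
    test_case_id: str  # a run executes one test case
    outputs: List[str] = field(default_factory=list)


@dataclass
class EvaluationRecord:
    run_id: str  # an evaluation scores one run
    scores: Dict[str, float] = field(default_factory=dict)


def lineage(evaluation: EvaluationRecord,
            runs: Dict[str, RunRecord],
            cases: Dict[str, CaseRecord]) -> Tuple[str, str, str]:
    """Walk back from an evaluation to the run, test case, and agent ids."""
    run = runs[evaluation.run_id]
    case = cases[run.test_case_id]
    return (run.id, case.id, case.agent_id)


agent = AgentRecord("agent-1", "Support Agent", "openai", "You are a helpful support agent...")
case = CaseRecord("case-1", agent.id, [{"input": {"text": "I want a refund"}}])
run = RunRecord("run-1", case.id, ["Could you share your order number?"])
evaluation = EvaluationRecord(run.id, {"answer_relevancy": 0.9})

result = lineage(evaluation, {run.id: run}, {case.id: case})
print(result)
```

Each record holds only the id of the step before it, so any evaluation can be traced back through its run and test case to the agent that produced the output.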

Quick Example

from playgent import Playgent

client = Playgent(api_key="your-api-key")

# 1. Create agent
agent = client.agents.create(
    name="Support Agent",
    provider="openai",
    system_prompt="You are a helpful support agent..."
)

# 2. Create test case
test_case = client.test_cases.create(
    name="Refund Request",
    agent_id=agent.id,
    turns=[{
        "input": {"text": "I want a refund"},
        "expected_behavior": "Ask for order details",
        "context": ["Refunds allowed within 30 days"]
    }]
)

# 3. Run test
run = client.runs.create(test_case_id=test_case.id)
print(f"Run passed: {run.success}")

# 4. Evaluate
evaluation = client.evaluate(
    run_id=run.id,
    scorers=["answer_relevancy", "faithfulness", "bias"]
)
print(f"Evaluation passed: {evaluation.overall_pass}")
