Batch Evaluate

curl -X POST https://api.playgent.com/v1/evaluate/batch \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "input": "What is your refund policy?",
        "output": "Returns accepted within 30 days.",
        "context": ["Policy: 30 day returns"]
      },
      {
        "input": "How do I track my order?",
        "output": "You can track your order at tracking.example.com",
        "context": ["Tracking available at tracking.example.com"]
      }
    ],
    "scorers": ["faithfulness", "relevance"],
    "config": {
      "parallel": true,
      "fail_fast": false
    }
  }'

{
  "batch_id": "batch_stu678",
  "status": "completed",
  "results": [
    {
      "item_index": 0,
      "overall_pass": true,
      "scores": { "faithfulness": 0.95, "relevance": 0.92 }
    },
    {
      "item_index": 1,
      "overall_pass": true,
      "scores": { "faithfulness": 0.88, "relevance": 0.90 }
    }
  ],
  "summary": {
    "total": 2,
    "passed": 2,
    "failed": 0,
    "avg_scores": { "faithfulness": 0.915, "relevance": 0.91 }
  }
}

POST

evaluate

batch

curl -X POST https://api.playgent.com/v1/evaluate/batch \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "input": "What is your refund policy?",
        "output": "Returns accepted within 30 days.",
        "context": ["Policy: 30 day returns"]
      },
      {
        "input": "How do I track my order?",
        "output": "You can track your order at tracking.example.com",
        "context": ["Tracking available at tracking.example.com"]
      }
    ],
    "scorers": ["faithfulness", "relevance"],
    "config": {
      "parallel": true,
      "fail_fast": false
    }
  }'

{
  "batch_id": "batch_stu678",
  "status": "completed",
  "results": [
    {
      "item_index": 0,
      "overall_pass": true,
      "scores": { "faithfulness": 0.95, "relevance": 0.92 }
    },
    {
      "item_index": 1,
      "overall_pass": true,
      "scores": { "faithfulness": 0.88, "relevance": 0.90 }
    }
  ],
  "summary": {
    "total": 2,
    "passed": 2,
    "failed": 0,
    "avg_scores": { "faithfulness": 0.915, "relevance": 0.91 }
  }
}

Run evaluation on multiple outputs in a single request. Ideal for batch processing, dataset evaluation, and regression testing. Uses the same 27 built-in metrics as the single evaluation endpoint.

array

required

Array of items to evaluate

Show item object

string

required

User input

string

required

Agent output

array

Context documents

string

Ground truth

array

required

Scorers to apply to all items. Choose from 27 built-in metrics: Custom: playval RAG: answer_relevancy, faithfulness, contextual_precision, contextual_recall, contextual_relevancy Safety: bias, toxicity, non_advice, misuse, pii_leakage, role_violation Agentic: task_completion, tool_correctness, argument_correctness, step_efficiency, plan_adherence, plan_quality Multi-Turn: turn_relevancy, role_adherence, knowledge_retention, conversation_completeness, goal_accuracy, tool_use, topic_adherence, turn_faithfulness, turn_contextual_precision, turn_contextual_recall Or use custom scorer IDs from Create Custom Scorer

object

Batch configuration

Show properties

boolean

Run evaluations in parallel (default: true)

boolean

Stop on first failure (default: false)

string

required

Batch identifier

string

required

Batch status: running, completed, failed

array

required

Per-item evaluation results

object

required

Batch summary statistics

Show properties

integer

Total items

integer

Items that passed all scorers

integer

Items that failed

object

Average scores per scorer

curl -X POST https://api.playgent.com/v1/evaluate/batch \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "input": "What is your refund policy?",
        "output": "Returns accepted within 30 days.",
        "context": ["Policy: 30 day returns"]
      },
      {
        "input": "How do I track my order?",
        "output": "You can track your order at tracking.example.com",
        "context": ["Tracking available at tracking.example.com"]
      }
    ],
    "scorers": ["faithfulness", "relevance"],
    "config": {
      "parallel": true,
      "fail_fast": false
    }
  }'

{
  "batch_id": "batch_stu678",
  "status": "completed",
  "results": [
    {
      "item_index": 0,
      "overall_pass": true,
      "scores": { "faithfulness": 0.95, "relevance": 0.92 }
    },
    {
      "item_index": 1,
      "overall_pass": true,
      "scores": { "faithfulness": 0.88, "relevance": 0.90 }
    }
  ],
  "summary": {
    "total": 2,
    "passed": 2,
    "failed": 0,
    "avg_scores": { "faithfulness": 0.915, "relevance": 0.91 }
  }
}

Evaluate Create Custom Scorer

Overview

Agents

Test Cases

Test Runs

Tracing

Evaluation

Optimization

Webhooks

Analytics