POST /v1/evaluate/batch
curl -X POST https://api.playgent.com/v1/evaluate/batch \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "input": "What is your refund policy?",
        "output": "Returns accepted within 30 days.",
        "context": ["Policy: 30 day returns"]
      },
      {
        "input": "How do I track my order?",
        "output": "You can track your order at tracking.example.com",
        "context": ["Tracking available at tracking.example.com"]
      }
    ],
    "scorers": ["faithfulness", "relevance"],
    "config": {
      "parallel": true,
      "fail_fast": false
    }
  }'
{
  "batch_id": "batch_stu678",
  "status": "completed",
  "results": [
    {
      "item_index": 0,
      "overall_pass": true,
      "scores": { "faithfulness": 0.95, "relevance": 0.92 }
    },
    {
      "item_index": 1,
      "overall_pass": true,
      "scores": { "faithfulness": 0.88, "relevance": 0.90 }
    }
  ],
  "summary": {
    "total": 2,
    "passed": 2,
    "failed": 0,
    "avg_scores": { "faithfulness": 0.915, "relevance": 0.91 }
  }
}
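
In the example response above, the `summary` values are consistent with a simple aggregation of the per-item `results`: each `avg_scores` entry equals the arithmetic mean of that scorer's per-item scores. The Python sketch below recomputes the summary under that assumption, which the page itself does not state. `summarize` and `matches_summary` are illustrative helper names, not part of the Playgent API.

```python
import math

# Example response copied from the documentation above.
EXAMPLE_RESPONSE = {
    "batch_id": "batch_stu678",
    "status": "completed",
    "results": [
        {"item_index": 0, "overall_pass": True,
         "scores": {"faithfulness": 0.95, "relevance": 0.92}},
        {"item_index": 1, "overall_pass": True,
         "scores": {"faithfulness": 0.88, "relevance": 0.90}},
    ],
    "summary": {
        "total": 2,
        "passed": 2,
        "failed": 0,
        "avg_scores": {"faithfulness": 0.915, "relevance": 0.91},
    },
}


def summarize(results):
    """Recompute counts and per-scorer means from per-item results.

    Assumption: avg_scores is the arithmetic mean over all items.
    """
    total = len(results)
    passed = sum(1 for r in results if r["overall_pass"])
    per_scorer = {}
    for r in results:
        for name, score in r["scores"].items():
            per_scorer.setdefault(name, []).append(score)
    avg_scores = {name: sum(v) / len(v) for name, v in per_scorer.items()}
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "avg_scores": avg_scores,
    }


def matches_summary(response):
    """Return True if the reported summary agrees with the recomputed one."""
    reported = response["summary"]
    recomputed = summarize(response["results"])
    if any(reported[k] != recomputed[k] for k in ("total", "passed", "failed")):
        return False
    if set(reported["avg_scores"]) != set(recomputed["avg_scores"]):
        return False
    # Compare floats with a tolerance rather than exact equality.
    return all(
        math.isclose(reported["avg_scores"][k], recomputed["avg_scores"][k])
        for k in reported["avg_scores"]
    )
```

For the example, the faithfulness mean is (0.95 + 0.88) / 2 = 0.915 and the relevance mean is (0.92 + 0.90) / 2 = 0.91, matching the reported values.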


Run evaluation on multiple outputs in a single request. Ideal for batch processing, dataset evaluation, and regression testing. Uses the same 27 built-in metrics as the single evaluation endpoint.
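
For dataset evaluation, the request body can be assembled from rows in code instead of written by hand. The following Python sketch uses only the standard library and mirrors the curl example (same URL, headers, and body shape). The helper names `build_batch_payload`, `build_request`, and `evaluate_batch` are hypothetical, as is the assumption that each row carries `input`, `output`, and `context` keys exactly as in the example.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.playgent.com/v1/evaluate/batch"


def build_batch_payload(rows, scorers, parallel=True, fail_fast=False):
    """Build the JSON body for a batch request from dataset rows.

    Assumption: every row has "input", "output", and "context" keys,
    matching the items shown in the example request.
    """
    items = [
        {
            "input": row["input"],
            "output": row["output"],
            "context": list(row["context"]),
        }
        for row in rows
    ]
    return {
        "items": items,
        "scorers": list(scorers),
        "config": {"parallel": parallel, "fail_fast": fail_fast},
    }


def build_request(payload, api_key):
    """Create the POST request with the headers used in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def evaluate_batch(payload, api_key, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `evaluate_batch(build_batch_payload(rows, ["faithfulness"]), "your-api-key")` would then return the response as a Python dict. Keeping payload construction separate from the network call makes the body easy to inspect or unit test without contacting the API.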
items (array, required): Array of items to evaluate
scorers (array, required): Scorers to apply to all items. Choose from 27 built-in metrics:
  Custom: playval
  RAG: answer_relevancy, faithfulness, contextual_precision, contextual_recall, contextual_relevancy
  Safety: bias, toxicity, non_advice, misuse, pii_leakage, role_violation
  Agentic: task_completion, tool_correctness, argument_correctness, step_efficiency, plan_adherence, plan_quality
  Multi-Turn: turn_relevancy, role_adherence, knowledge_retention, conversation_completeness, goal_accuracy, tool_use, topic_adherence, turn_faithfulness, turn_contextual_precision, turn_contextual_recall
  Or use custom scorer IDs from Create Custom Scorer.
config (object): Batch configuration

batch_id (string, required): Batch identifier
status (string, required): Batch status: running, completed, failed
results (array, required): Per-item evaluation results
summary (object, required): Batch summary statistics
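
For regression testing, the response fields described above can drive a simple pass/fail gate. The Python sketch below is one possible approach, not a documented Playgent feature: `regression_failures` is a hypothetical helper, and the `min_avg` thresholds are values the caller would choose. It relies only on fields shown in the example response (`status`, `results[].overall_pass`, `summary.avg_scores`).

```python
def regression_failures(response, min_avg=None):
    """Return a list of problems found in a batch response.

    An empty list means the gate passes. `min_avg` optionally maps a
    scorer name to the lowest acceptable average score (caller-chosen).
    """
    problems = []

    # Results are only meaningful once the batch has completed.
    status = response["status"]
    if status != "completed":
        problems.append(f"batch {response['batch_id']} has status {status!r}")
        return problems

    # Flag every item that did not pass overall.
    for result in response["results"]:
        if not result["overall_pass"]:
            problems.append(
                f"item {result['item_index']} failed: {result['scores']}"
            )

    # Compare batch-level averages against the caller's thresholds.
    avg_scores = response["summary"]["avg_scores"]
    for scorer, floor in (min_avg or {}).items():
        if scorer not in avg_scores:
            problems.append(f"no average reported for scorer {scorer!r}")
        elif avg_scores[scorer] < floor:
            problems.append(
                f"average {scorer} {avg_scores[scorer]} is below {floor}"
            )
    return problems


# Trimmed copy of the example response from this page.
EXAMPLE = {
    "batch_id": "batch_stu678",
    "status": "completed",
    "results": [
        {"item_index": 0, "overall_pass": True,
         "scores": {"faithfulness": 0.95, "relevance": 0.92}},
        {"item_index": 1, "overall_pass": True,
         "scores": {"faithfulness": 0.88, "relevance": 0.90}},
    ],
    "summary": {
        "total": 2, "passed": 2, "failed": 0,
        "avg_scores": {"faithfulness": 0.915, "relevance": 0.91},
    },
}
```

In a CI job, a non-empty return value from `regression_failures(EXAMPLE, {"faithfulness": 0.9})` could be printed and turned into a non-zero exit code. With the example data and that threshold the list is empty.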