Evaluations
Evaluation scores agent outputs using metrics. Playgent provides 29 built-in evaluation metrics across Custom, RAG, Agentic, Multi-Turn, and Safety categories.
Available Metrics (29 Total)
| Type | Metric | Description |
|---|---|---|
| Custom | playval | General-purpose LLM-as-judge with custom criteria |
| RAG | answer_relevancy | How relevant is the answer to the question? |
| RAG | faithfulness | Is the answer grounded in the provided context? |
| RAG | contextual_precision | Are relevant context chunks ranked higher? |
| RAG | contextual_recall | Does the context contain all needed information? |
| RAG | contextual_relevancy | Is the retrieved context relevant to the query? |
| Agentic | task_completion | Did the agent complete the requested task? |
| Agentic | tool_correctness | Were the right tools selected? |
| Agentic | argument_correctness | Were tool arguments correct? |
| Agentic | step_efficiency | Were unnecessary steps avoided? |
| Agentic | plan_adherence | Did the agent follow its stated plan? |
| Agentic | plan_quality | Was the plan logical and effective? |
| Multi-Turn | turn_relevancy | Is each response relevant to its turn? |
| Multi-Turn | role_adherence | Does the agent maintain its role throughout? |
| Multi-Turn | knowledge_retention | Does the agent remember earlier context? |
| Multi-Turn | conversation_completeness | Was the conversation goal achieved? |
| Multi-Turn | goal_accuracy | How well did the agent achieve the user’s goal? |
| Multi-Turn | tool_use | Were tools used appropriately across turns? |
| Multi-Turn | topic_adherence | Did the agent stay on topic? |
| Multi-Turn | turn_faithfulness | Is each turn grounded in provided context? |
| Multi-Turn | turn_contextual_precision | Context precision per turn |
| Multi-Turn | turn_contextual_recall | Context recall per turn |
| Multi-Turn | turn_contextual_relevancy | Context relevancy per turn |
| Safety | bias | Detects biased or discriminatory content |
| Safety | toxicity | Detects harmful, offensive, or toxic language |
| Safety | non_advice | Ensures no professional advice (legal, medical, financial) |
| Safety | misuse | Detects potential misuse or harmful instructions |
| Safety | pii_leakage | Checks for personally identifiable information leaks |
| Safety | role_violation | Detects when agent breaks character or role boundaries |
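The table above can also be kept as plain data, which makes it easy to check a metric name before using it. The following is a minimal sketch, not part of the Playgent API: `METRICS_BY_TYPE` and `metric_type` are hypothetical names introduced here for illustration, and only the metric names and categories come from the table.

```python
# Hypothetical lookup built from the metrics table; not a Playgent API.
METRICS_BY_TYPE = {
    "Custom": ["playval"],
    "RAG": [
        "answer_relevancy", "faithfulness", "contextual_precision",
        "contextual_recall", "contextual_relevancy",
    ],
    "Agentic": [
        "task_completion", "tool_correctness", "argument_correctness",
        "step_efficiency", "plan_adherence", "plan_quality",
    ],
    "Multi-Turn": [
        "turn_relevancy", "role_adherence", "knowledge_retention",
        "conversation_completeness", "goal_accuracy", "tool_use",
        "topic_adherence", "turn_faithfulness", "turn_contextual_precision",
        "turn_contextual_recall", "turn_contextual_relevancy",
    ],
    "Safety": [
        "bias", "toxicity", "non_advice", "misuse",
        "pii_leakage", "role_violation",
    ],
}


def metric_type(name: str) -> str:
    """Return the category a metric belongs to, or raise for an unknown name."""
    for category, names in METRICS_BY_TYPE.items():
        if name in names:
            return category
    raise ValueError(f"unknown metric: {name!r}")


# The per-category lists add up to the 29 metrics listed in the table.
print(sum(len(names) for names in METRICS_BY_TYPE.values()))  # 29
print(metric_type("faithfulness"))  # RAG
```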
Metric Requirements
| Type | Required Parameters |
|---|---|
| Custom | input, output, expected_behavior |
| RAG | input, output, context |
| Agentic | tools_called, expected_tools |
| Multi-Turn | conversation |
| Safety | output only |
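One way to use the requirements table is to check a payload before sending it for evaluation. This is a sketch under stated assumptions: `REQUIRED_PARAMS` and `missing_params` are hypothetical helpers written for this page, not Playgent functions, and the payload is assumed to be a plain dictionary keyed by the parameter names in the table.

```python
# Hypothetical pre-flight check based on the requirements table; not a Playgent API.
REQUIRED_PARAMS = {
    "Custom": ("input", "output", "expected_behavior"),
    "RAG": ("input", "output", "context"),
    "Agentic": ("tools_called", "expected_tools"),
    "Multi-Turn": ("conversation",),
    "Safety": ("output",),
}


def missing_params(metric_type, payload):
    """Return the required parameter names that are absent or None in payload."""
    if metric_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown metric type: {metric_type!r}")
    return [name for name in REQUIRED_PARAMS[metric_type]
            if payload.get(name) is None]


# Example payload with made-up values, for illustration only.
payload = {"input": "What is the refund window?", "output": "30 days."}
print(missing_params("RAG", payload))     # ['context']
print(missing_params("Safety", payload))  # []
```

Returning the list of missing names, rather than a boolean, lets the caller report exactly which fields to supply before a RAG or Agentic metric is run.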
Quick Start
Common Use Cases
Ad-hoc Evaluation
Evaluate without running a test.
Batch Evaluation
Evaluate multiple runs.
Custom Thresholds
Override the default threshold of 0.7.
Automatic Evaluation
Set default scorers on the agent.
Next Steps
API Reference
Full evaluation API with all 29 metrics
Custom Scorers
Create domain-specific metrics

