POST /v1/scorers
curl -X POST https://api.playgent.com/v1/scorers \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "policy_compliance",
    "description": "Checks if the response adheres to company policies",
    "type": "llm_judge",
    "config": {
      "model": "gpt-4-turbo",
      "rubric": "You are evaluating whether an AI agent response complies with company policies.\n\nPolicies:\n{{policies}}\n\nAgent Response:\n{{output}}\n\nScore from 0-1 where 1 is fully compliant. List any violations. Explain your reasoning.",
      "output_schema": {
        "score": "number",
        "reasoning": "string",
        "violations": "array"
      }
    },
    "variables": {
      "policies": "1. Never promise refunds over $500 without manager approval\n2. Always verify customer identity before discussing order details\n3. Do not discuss competitor products"
    }
  }'
{
  "scorer_id": "scorer_abc123",
  "name": "policy_compliance"
}

Create a custom scorer for domain-specific evaluation criteria. Playgent provides 27 built-in metrics (RAG, safety, agentic, multi-turn), but use this endpoint when you need evaluation logic specific to your use case.
Before creating a custom scorer, check the built-in scorers. Most use cases are covered by playval, faithfulness, answer_relevancy, bias, toxicity, and other out-of-the-box metrics.

When to Create Custom Scorers

  • Domain-specific quality: e.g., “Does the response follow medical compliance rules?”
  • Business logic: e.g., “Did the agent offer the correct discount tier?”
  • Custom rubrics: Your own evaluation criteria not covered by built-ins
  • Code-based evaluation: Regex, exact matching, or programmatic checks

Scorer Types

  • llm_judge: Use an LLM to evaluate the response against your custom rubric. The most flexible option. Best for: subjective quality assessment, custom criteria
  • code: Execute Python code to evaluate the response programmatically. Best for: exact matching, calculations, format validation
  • regex: Use regex patterns to validate response format or content. Best for: format checking, required keyword presence

Parameters

  • name (string, required): Scorer name (e.g., policy_compliance, discount_accuracy)
  • description (string): Human-readable description of what this scorer evaluates
  • type (string, required): Scorer type: llm_judge, code, or regex
  • config (object, required): Scorer configuration
  • variables (object): Custom variables to inject into the rubric template (e.g., company policies, reference data)

Response

  • scorer_id (string, required): Unique scorer identifier (use in evaluate requests)
  • name (string, required): Scorer name
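
The request body described by the parameters above can also be assembled from Python. The following is a minimal sketch using only the standard library; the helper name build_create_scorer_request is hypothetical and is not part of any Playgent SDK. It builds the request without sending it, so the payload can be inspected first.

```python
import json
import urllib.request

API_URL = "https://api.playgent.com/v1/scorers"  # endpoint documented on this page


def build_create_scorer_request(api_key, name, scorer_type, config,
                                description=None, variables=None):
    """Build (but do not send) a POST request that creates a scorer.

    Hypothetical helper for illustration; not a Playgent SDK function.
    """
    if scorer_type not in ("llm_judge", "code", "regex"):
        raise ValueError(f"unsupported scorer type: {scorer_type}")

    # name, type, and config are required; the rest are optional.
    payload = {"name": name, "type": scorer_type, "config": config}
    if description is not None:
        payload["description"] = description
    if variables is not None:
        payload["variables"] = variables

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_scorer_request(
    api_key="your-api-key",
    name="order_id_format",
    scorer_type="regex",
    config={"pattern": "ORD-[0-9]{5}"},
    description="Validates order ID format (ORD-XXXXX)",
)
```

Passing the request to urllib.request.urlopen would send it; the scorer_id can then be read from the JSON response body. Letting json.dumps serialize the payload also avoids hand-escaping newlines and quotes in long rubric strings.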

Example: Code-Based Scorer

For programmatic evaluation:
curl -X POST https://api.playgent.com/v1/scorers \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "required_keywords",
    "description": "Checks if response contains required keywords",
    "type": "code",
    "config": {
      "rubric": "required = [\"refund\", \"policy\", \"30 days\"]\nfound = sum(1 for keyword in required if keyword.lower() in output.lower())\nscore = found / len(required)\nreasoning = f\"Found {found}/{len(required)} required keywords\"\nreturn {\"score\": score, \"reasoning\": reasoning}"
    }
  }'
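
The scoring logic in the request above is hard to read as an escaped JSON string. Here is the same logic as a standalone Python function that can be run locally before it is embedded in a request. This is a sketch: the function name is hypothetical, and it assumes the platform supplies the response text as output, as the embedded snippet implies.

```python
def score_required_keywords(output, required=("refund", "policy", "30 days")):
    """Return the fraction of required keywords present in output.

    Matching is case-insensitive and substring-based, mirroring the
    embedded snippet. The function name is illustrative only.
    """
    text = output.lower()
    found = sum(1 for keyword in required if keyword.lower() in text)
    score = found / len(required)
    reasoning = f"Found {found}/{len(required)} required keywords"
    return {"score": score, "reasoning": reasoning}


# All three keywords are present.
full = score_required_keywords("Our refund policy covers returns within 30 days.")

# Only "refund" is present.
partial = score_required_keywords("You can request a refund at any time.")
```

Because matching is by substring, "policy" would also match inside "policyholder"; use word-boundary checks if that distinction matters for your use case.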

Example: Regex Scorer

For format validation:
curl -X POST https://api.playgent.com/v1/scorers \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "order_id_format",
    "description": "Validates order ID format (ORD-XXXXX)",
    "type": "regex",
    "config": {
      "pattern": "ORD-[0-9]{5}"
    }
  }'
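
This page does not state whether a regex scorer searches for the pattern anywhere in the response or requires the whole response to match. The sketch below uses Python's re module to show how the two interpretations differ for the pattern above; it illustrates regex behavior in general and is not Playgent's implementation.

```python
import re

ORDER_ID = re.compile(r"ORD-[0-9]{5}")


def contains_order_id(output):
    """True if the pattern occurs anywhere in the text."""
    return ORDER_ID.search(output) is not None


def is_exact_order_id(output):
    """True only if the entire text is exactly one order ID."""
    return ORDER_ID.fullmatch(output) is not None


in_sentence = contains_order_id("Your order ORD-12345 has shipped.")
too_short = contains_order_id("Your order ORD-1234 has shipped.")
# An unanchored search accepts a six-digit ID because its first five digits match.
six_digits_search = contains_order_id("ORD-123456")
six_digits_exact = is_exact_order_id("ORD-123456")
```

If over-long IDs such as ORD-123456 should be rejected inside longer text, a pattern with word boundaries such as \bORD-[0-9]{5}\b is one option.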

Using Custom Scorers

Once created, use your custom scorer in evaluation requests:
# Assumes `client` is an already-initialized Playgent SDK client (setup is not shown on this page).
evaluation = client.evaluate(
    input="I want a refund",
    output="I can process that $600 refund for you right away",
    scorers=[
        "scorer_abc123",  # Your custom policy_compliance scorer
        "answer_relevancy",  # Built-in scorer
        "faithfulness"  # Built-in scorer
    ]
)

Tips

  • Start with built-ins: Try playval with a custom expected_behavior or use safety metrics like bias/toxicity before creating a full scorer
  • Template variables: Use {{output}}, {{input}}, {{context}} to reference evaluation data
  • Combine with built-ins: Mix custom scorers with RAG/safety/agentic metrics for comprehensive evaluation
  • Version control: Create new scorers for significant rubric changes rather than modifying existing ones
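
To preview how a rubric reads once its {{...}} placeholders are filled in, the substitution can be approximated locally. This is a minimal sketch of double-brace templating under the assumption that placeholders are simple names; render_rubric is a hypothetical helper, and Playgent's actual rendering rules may differ.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_rubric(template, values):
    """Replace each {{name}} in template with values[name].

    Raises KeyError for a placeholder with no value, so typos in
    variable names surface before a scorer is created.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing template variable: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


rubric = "Policies:\n{{policies}}\n\nAgent Response:\n{{output}}"
rendered = render_rubric(
    rubric,
    {
        "policies": "1. Do not discuss competitor products",
        "output": "I can help with that.",
    },
)
```

Rendering the rubric with sample values is a quick way to catch a misspelled placeholder or an unintended line break before running evaluations.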