Create Custom Scorer

Create a custom scorer for domain-specific evaluation criteria. Playgent provides 27 built-in metrics (RAG, safety, agentic, multi-turn); use this endpoint when you need evaluation logic specific to your use case.

Check the built-in scorers before creating custom ones. Most use cases are covered by playval, faithfulness, answer_relevancy, bias, toxicity, and other out-of-the-box metrics.

When to Create Custom Scorers

Domain-specific quality: e.g., “Does the response follow medical compliance rules?”
Business logic: e.g., “Did the agent offer the correct discount tier?”
Custom rubrics: Your own evaluation criteria not covered by built-ins
Code-based evaluation: Regex, exact matching, or programmatic checks

Scorer Types

llm_judge

Use an LLM to evaluate the response against your custom rubric. This is the most flexible option. Best for: Subjective quality assessment, custom criteria

code

Execute Python code to evaluate the response programmatically. Best for: Exact matching, calculations, format validation

regex

Use regex patterns to validate response format or content. Best for: Format checking, required keyword presence

Parameters

name (string, required): Scorer name (e.g., policy_compliance, discount_accuracy)
description (string): Human-readable description of what this scorer evaluates
type (string, required): One of llm_judge, code, or regex
config (object, required): Scorer configuration
    model (string): LLM model for llm_judge type (default: gpt-4-turbo)
    rubric (string, required): Evaluation rubric (for llm_judge) or code (for code type). Available template variables: {{ input }}, {{ output }}, {{ context }}, {{ ground_truth }}, {{ conversation_history }}, plus any custom variables
    pattern (string): Regex pattern (for regex type)
    output_schema (object): Expected output structure for llm_judge
        score (string): Score field type (usually number)
        reasoning (string): Reasoning field type (usually string)
        [custom] (string): Additional custom fields (e.g., violations, suggestions)
variables (object): Custom variables to inject into the rubric template (e.g., company policies, reference data)
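
To make the interaction between the rubric template and the variables object concrete, here is a minimal local sketch of placeholder substitution. It is an illustration only: how Playgent renders templates server-side is not documented on this page, and the helpers render_rubric and placeholders are hypothetical names. The sketch accepts both spellings used on this page, {{ output }} and {{output}}.

```python
import re

# Matches {{ name }} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def placeholders(template: str) -> set:
    """Return the set of variable names referenced by the template."""
    return set(_PLACEHOLDER.findall(template))


def render_rubric(template: str, values: dict) -> str:
    """Substitute every placeholder; fail loudly on an undefined variable.

    Hypothetical local helper, not part of any Playgent SDK.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"rubric references undefined variable: {key}")
        return str(values[key])

    return _PLACEHOLDER.sub(substitute, template)


rubric = "Policies:\n{{policies}}\n\nAgent Response:\n{{ output }}"
rendered = render_rubric(
    rubric,
    {"policies": "1. No refunds over $500", "output": "Refund issued."},
)
```

Checking placeholders(rubric) against the keys you plan to send in variables is a cheap way to catch a misspelled variable name before creating the scorer.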

Response

scorer_id (string, required): Unique scorer identifier (use in evaluate requests)
name (string, required): Scorer name

curl -X POST https://api.playgent.com/v1/scorers \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "policy_compliance",
    "description": "Checks if the response adheres to company policies",
    "type": "llm_judge",
    "config": {
      "model": "gpt-4-turbo",
      "rubric": "You are evaluating whether an AI agent response complies with company policies.\n\nPolicies:\n{{policies}}\n\nAgent Response:\n{{output}}\n\nScore from 0-1 where 1 is fully compliant. List any violations. Explain your reasoning.",
      "output_schema": {
        "score": "number",
        "reasoning": "string",
        "violations": "array"
      }
    },
    "variables": {
      "policies": "1. Never promise refunds over $500 without manager approval\n2. Always verify customer identity before discussing order details\n3. Do not discuss competitor products"
    }
  }'

{
  "scorer_id": "scorer_abc123",
  "name": "policy_compliance"
}
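
The same request can also be assembled from Python. The sketch below uses only the standard library and builds the request without sending it; the helper name build_create_scorer_request is hypothetical, while the endpoint, headers, and field names are copied from the curl example above. The rubric is shortened for readability.

```python
import json
import urllib.request

API_URL = "https://api.playgent.com/v1/scorers"  # endpoint from the curl example


def build_create_scorer_request(api_key, name, scorer_type, config,
                                description=None, variables=None):
    """Build (but do not send) the POST request that creates a scorer.

    Hypothetical helper; optional fields are omitted when not provided.
    """
    payload = {"name": name, "type": scorer_type, "config": config}
    if description is not None:
        payload["description"] = description
    if variables is not None:
        payload["variables"] = variables
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_scorer_request(
    api_key="your-api-key",
    name="policy_compliance",
    scorer_type="llm_judge",
    config={
        "model": "gpt-4-turbo",
        "rubric": "Policies:\n{{policies}}\n\nAgent Response:\n{{output}}\n\n"
                  "Score from 0-1 where 1 is fully compliant.",
        "output_schema": {"score": "number", "reasoning": "string"},
    },
    variables={"policies": "1. Do not discuss competitor products"},
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     scorer_id = json.load(response)["scorer_id"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything reaches the API.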

Example: Code-Based Scorer

For programmatic evaluation:

curl -X POST https://api.playgent.com/v1/scorers \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "required_keywords",
    "description": "Checks if response contains required keywords",
    "type": "code",
    "config": {
      "rubric": "required = [\"refund\", \"policy\", \"30 days\"]\nfound = sum(1 for keyword in required if keyword.lower() in output.lower())\nscore = found / len(required)\nreasoning = f\"Found {found}/{len(required)} required keywords\"\nreturn {\"score\": score, \"reasoning\": reasoning}"
    }
  }'
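
Before uploading code as a rubric, it can help to run the same logic locally. The sketch below wraps the rubric body from the request above in an ordinary function; the wrapper name score_required_keywords is hypothetical, and passing output as a function argument is an assumption about how Playgent exposes that variable to the code.

```python
def score_required_keywords(output: str) -> dict:
    """Local copy of the rubric logic: fraction of required keywords present."""
    required = ["refund", "policy", "30 days"]
    # Case-insensitive substring match for each required keyword.
    found = sum(1 for keyword in required if keyword.lower() in output.lower())
    score = found / len(required)
    reasoning = f"Found {found}/{len(required)} required keywords"
    return {"score": score, "reasoning": reasoning}


full = score_required_keywords("Our refund policy allows returns within 30 days.")
partial = score_required_keywords("I can process that refund for you.")
```

Here full contains all three keywords and partial contains only "refund", so the two results exercise both ends of the scoring logic.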

Example: Regex Scorer

For format validation:

curl -X POST https://api.playgent.com/v1/scorers \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "order_id_format",
    "description": "Validates order ID format (ORD-XXXXX)",
    "type": "regex",
    "config": {
      "pattern": "ORD-[0-9]{5}"
    }
  }'
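
This page does not say whether a regex scorer searches for the pattern anywhere in the response or requires the whole response to match. The sketch below uses Python's re module to show how the pattern above behaves under each interpretation, so you can decide whether to add anchors or word boundaries. The function names are hypothetical.

```python
import re

ORDER_ID = re.compile(r"ORD-[0-9]{5}")


def contains_order_id(text: str) -> bool:
    """True if the pattern occurs anywhere in the text (search semantics)."""
    return ORDER_ID.search(text) is not None


def is_order_id(text: str) -> bool:
    """True only if the entire text is an order ID (full-match semantics)."""
    return ORDER_ID.fullmatch(text) is not None


in_sentence = contains_order_id("Your order ORD-12345 has shipped")
exact = is_order_id("ORD-12345")
too_long_search = contains_order_id("ORD-123456")  # six digits
too_long_full = is_order_id("ORD-123456")
```

Under search semantics the unanchored pattern also accepts a six-digit ID, because the first five digits satisfy it. If that matters for your format check, a stricter pattern such as \bORD-[0-9]{5}\b is one option.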

Using Custom Scorers

Once created, use your custom scorer in evaluation requests:

# `client` is assumed to be an already-initialized Playgent SDK client.
evaluation = client.evaluate(
    input="I want a refund",
    output="I can process that $600 refund for you right away",
    scorers=[
        "scorer_abc123",  # Your custom policy_compliance scorer
        "answer_relevancy",  # Built-in scorer
        "faithfulness"  # Built-in scorer
    ]
)

Tips

Start with built-ins: Try playval with a custom expected_behavior or use safety metrics like bias/toxicity before creating a full scorer
Template variables: Use {{output}}, {{input}}, {{context}} to reference evaluation data
Combine with built-ins: Mix custom scorers with RAG/safety/agentic metrics for comprehensive evaluation
Version control: Create new scorers for significant rubric changes rather than modifying existing ones
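
One way to follow the version-control tip is to derive the scorer name from the rubric itself, so that any rubric change yields a new name. The sketch below is a hypothetical naming convention, not a Playgent feature, and it assumes scorer names may contain underscores and hexadecimal characters.

```python
import hashlib


def versioned_scorer_name(base_name: str, rubric: str) -> str:
    """Append a short content hash of the rubric to the base name."""
    digest = hashlib.sha256(rubric.encode("utf-8")).hexdigest()[:8]
    return f"{base_name}_{digest}"


v1 = versioned_scorer_name("policy_compliance", "Score from 0-1.")
v2 = versioned_scorer_name("policy_compliance", "Score from 0-1. List any violations.")
```

Because the suffix depends only on the rubric text, re-running with an unchanged rubric gives the same name, which makes accidental duplicates easy to spot.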
