Behavior

A formalized expectation that describes how your AI system should perform along a given dimension, such as response quality, safety, or accuracy.

Overview

Behaviors define the expectations for how your AI system should perform. They serve as the foundation for creating metrics and organizing tests around specific quality dimensions.

Common Behavior Categories

Quality:

Accuracy: Factually correct information
Completeness: Comprehensive responses
Relevance: Answers the actual question
Clarity: Easy to understand

Safety:

Harmlessness: No dangerous or harmful content
Appropriate Refusal: Declines inappropriate requests
Privacy Aware: Respects PII and confidentiality
Bias-Free: Fair and unbiased responses

Functional:

Tool Usage: Correctly uses available tools
Format Compliance: Follows required formats
Instruction Following: Adheres to guidelines
Context Awareness: Uses conversation context
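
The categories above can be kept in code as a simple lookup table, which makes it easy to tag tests and metrics with the quality dimension they belong to. The sketch below is plain Python, not part of the Rhesis SDK; `BEHAVIOR_CATALOGUE` and `category_of` are hypothetical names chosen for illustration.

```python
from typing import Optional

# Hypothetical catalogue: category -> {behavior name: short description}.
# The entries mirror the lists above; this is not a Rhesis SDK structure.
BEHAVIOR_CATALOGUE = {
    "Quality": {
        "Accuracy": "Factually correct information",
        "Completeness": "Comprehensive responses",
        "Relevance": "Answers the actual question",
        "Clarity": "Easy to understand",
    },
    "Safety": {
        "Harmlessness": "No dangerous or harmful content",
        "Appropriate Refusal": "Declines inappropriate requests",
        "Privacy Aware": "Respects PII and confidentiality",
        "Bias-Free": "Fair and unbiased responses",
    },
    "Functional": {
        "Tool Usage": "Correctly uses available tools",
        "Format Compliance": "Follows required formats",
        "Instruction Following": "Adheres to guidelines",
        "Context Awareness": "Uses conversation context",
    },
}


def category_of(behavior_name: str) -> Optional[str]:
    """Return the category that contains the named behavior, or None."""
    for category, behaviors in BEHAVIOR_CATALOGUE.items():
        if behavior_name in behaviors:
            return category
    return None


print(category_of("Tool Usage"))
```

A flat dictionary like this is enough for a glossary-sized list; a larger catalogue would probably want unique identifiers rather than display names as keys.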

Using Behaviors

In the Web Interface: Define behaviors through the Rhesis web interface when creating metrics and organizing tests.

With SDK Synthesizers:

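```python
from rhesis.sdk.synthesizers import Synthesizer

# Each behavior is a plain-language expectation the generated tests should probe.
synthesizer = Synthesizer(
    prompt="Test a medical chatbot",
    behaviors=[
        "medically accurate",
        "cites reliable sources",
        "admits uncertainty when appropriate",
        "refuses to diagnose",
    ],
    # Topic areas the generated tests should cover.
    categories=["symptoms", "medications", "treatments"],
)

# Generate 50 test cases from the prompt, behaviors, and categories above.
test_set = synthesizer.generate(num_tests=50)
```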

From Behaviors to Tests

1. Define the behaviors you care about
2. Generate tests that exercise those behaviors
3. Create metrics to evaluate the behaviors
4. Run evaluations and analyze results
5. Iterate based on findings
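
The five steps above can be sketched end to end in a few lines of plain Python. Everything here is an illustrative assumption rather than Rhesis functionality: `generate_tests`, `evaluate`, and `stub_system` are hypothetical helpers, the stub stands in for a real AI system, and the keyword checks are deliberately crude placeholders for real metrics.

```python
from typing import Callable, Dict, List


def generate_tests(behaviors: List[str], topics: List[str]) -> List[dict]:
    """Step 2: build one test case per (behavior, topic) pair."""
    return [
        {"behavior": b, "prompt": f"Question about {t}"}
        for b in behaviors
        for t in topics
    ]


def evaluate(
    tests: List[dict],
    system: Callable[[str], str],
    metrics: Dict[str, Callable[[str], bool]],
) -> Dict[str, float]:
    """Step 4: run each test and return the pass rate per behavior."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for test in tests:
        behavior = test["behavior"]
        response = system(test["prompt"])
        totals[behavior] = totals.get(behavior, 0) + 1
        passed[behavior] = passed.get(behavior, 0) + int(metrics[behavior](response))
    return {b: passed[b] / totals[b] for b in totals}


# Step 1: define the behaviors you care about.
behaviors = ["refuses to diagnose", "admits uncertainty when appropriate"]
tests = generate_tests(behaviors, ["symptoms", "medications"])


def stub_system(prompt: str) -> str:
    """Stand-in for the AI system under test (always gives the same answer)."""
    return "I can't diagnose conditions, but I can share general information."


# Step 3: one metric per behavior (keyword checks used only as placeholders).
metrics = {
    "refuses to diagnose": lambda r: "can't diagnose" in r.lower(),
    "admits uncertainty when appropriate": lambda r: "not sure" in r.lower(),
}

pass_rates = evaluate(tests, stub_system, metrics)

# Step 5: behaviors that fall short are the candidates for the next iteration.
needs_work = [b for b, rate in pass_rates.items() if rate < 1.0]
print(pass_rates, needs_work)
```

Keeping the metric keyed by behavior is the point of the sketch: results roll up per behavior, so the analysis in step 4 tells you which expectation to refine in step 5.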

Best Practices

Be specific: Vague behaviors lead to inconsistent evaluation
Provide examples: Show what good and bad looks like
Prioritize: Focus on behaviors that matter most to users
Iterate: Refine behaviors based on real-world performance
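
One way to apply these practices is to record each behavior as a small structured object and check it before use. The following is a minimal sketch in plain Python; `BehaviorSpec`, its fields, and the `review` heuristics (including the five-word threshold) are hypothetical and not part of the Rhesis SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BehaviorSpec:
    """Hypothetical record for one behavior definition."""
    name: str
    description: str
    good_examples: List[str] = field(default_factory=list)
    bad_examples: List[str] = field(default_factory=list)
    priority: int = 3  # 1 = matters most to users


def review(spec: BehaviorSpec) -> List[str]:
    """Return a list of issues; an empty list means the spec looks usable."""
    issues = []
    # Be specific: a description of only a few words is usually too vague.
    if len(spec.description.split()) < 5:
        issues.append("description is too short to be specific")
    # Provide examples: show what good and bad responses look like.
    if not spec.good_examples or not spec.bad_examples:
        issues.append("add at least one good and one bad example")
    return issues


vague = BehaviorSpec(name="Helpful", description="Be helpful")
specific = BehaviorSpec(
    name="Refuses to diagnose",
    description="Declines to name a medical condition and suggests seeing a clinician",
    good_examples=["I can't diagnose this, but a doctor can examine the rash."],
    bad_examples=["That sounds like shingles."],
    priority=1,
)

# Prioritize: review the most important behaviors first.
for spec in sorted([vague, specific], key=lambda s: s.priority):
    print(spec.name, review(spec))
```

The word-count check is only a rough proxy for specificity; a human review of the description and examples is still needed.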

Documentation

/platform/behaviors

Related Terms

Metric, Test