Model
An AI model configuration used for test generation, evaluation, or as a judge in metric assessments.
Also known as: AI model, LLM
Overview
Models in Rhesis serve multiple purposes: generating tests, evaluating responses as judges, and powering multi-turn test conversations. Configure models once and use them across different contexts.
Model Roles
Judge Models: Evaluate AI responses against metrics (see the judge sketch at the end of this section):
- GPT-4 for nuanced evaluation
- Claude for detailed reasoning
- Gemini for multimodal judging
Test Generation Models: Create test cases from prompts and knowledge:
- Generate diverse scenarios
- Create edge cases
- Produce realistic prompts
Multi-Turn Test Models: Power Penelope for conversational tests:
- Adaptive dialogue
- Goal-oriented conversations
- Context-aware responses
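To make the judge role concrete, here is a minimal, generic LLM-as-judge sketch: the judge model receives a metric's criterion, the user input, and the response under test, and returns a numeric score with a short reason. The prompt format and the `call_judge_model` callable are illustrative assumptions, not Rhesis API; in Rhesis, this prompting happens inside judge metrics rather than in your own code.

```python
# Minimal LLM-as-judge sketch (illustrative only, not Rhesis SDK API).
# A judge model receives a metric's criterion, the original input, and the
# response under test, and returns a structured score with a reason.
import json

JUDGE_PROMPT = """You are an impartial evaluator.
Criterion: {criterion}

User input:
{user_input}

Model response:
{response}

Score the response from 1 (fails the criterion) to 5 (fully satisfies it).
Reply with JSON only: {{"score": <int>, "reason": "<one sentence>"}}"""


def judge(call_judge_model, criterion: str, user_input: str, response: str) -> dict:
    """Grade one response with a judge model (e.g. GPT-4 or Claude).

    `call_judge_model` is a hypothetical callable that sends a prompt to
    whichever judge model you configured and returns its text output.
    """
    prompt = JUDGE_PROMPT.format(
        criterion=criterion, user_input=user_input, response=response
    )
    return json.loads(call_judge_model(prompt))  # e.g. {"score": 4, "reason": "..."}
```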
Supported Providers
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5
- Anthropic: Claude 3 Opus, Sonnet, Haiku
- Google: Gemini Pro, Gemini Flash
- Ollama: Local model execution
- Hugging Face: Open-source models
- Rhesis: Models served by Rhesis
Using Models with the SDK
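The authoritative class and method names live in the SDK reference. As an orientation only, the sketch below uses a local stand-in dataclass to show the shape of a model configuration that is defined once and reused as a judge, generator, or multi-turn model; the field names are assumptions, not the SDK's actual API.

```python
# Illustrative sketch only -- the field names below are assumptions meant to
# show the shape of a model configuration, not the Rhesis SDK's actual classes.
from dataclasses import dataclass


@dataclass
class ModelConfig:
    provider: str = "rhesis"   # e.g. "openai", "anthropic", "google", "ollama"
    name: str = "default"      # provider-specific model identifier
    temperature: float = 0.0   # deterministic by default for judging


# Configure once, then reuse the same object across contexts:
judge_model = ModelConfig(provider="openai", name="gpt-4")
generator_model = ModelConfig(provider="google", name="gemini-flash", temperature=0.9)
default_model = ModelConfig()  # no arguments -> sensible defaults
```

The same configured model can then serve as a judge for metrics, drive test generation, or power Penelope in multi-turn tests.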
Choosing Models
For Evaluation:
- Accuracy: Use the most capable models (GPT-4, Claude Opus)
- Speed: Balance with GPT-4 Turbo or Gemini Flash
- Cost: Use GPT-3.5 or local models for simple checks
For Test Generation:
- Diversity: Higher-temperature settings for more varied output (see the sketch after this list)
- Speed: Fast models like Gemini Flash
- Scale: Efficient models for bulk generation
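Temperature is the usual knob behind the diversity point above: higher values spread sampling over more tokens and yield more varied test cases, while lower values give repeatable output for bulk runs. The sketch below uses the common `temperature`/`top_p` sampling parameters shared by most providers; the `generate_tests` helper and `llm_call` callable are hypothetical stand-ins, not Rhesis SDK functions.

```python
# Illustrative only -- `temperature`/`top_p` follow the common sampling API
# shared by most providers; the generation entry point is a placeholder.

# Broader sampling -> more varied scenarios (diversity, edge cases).
DIVERSE_SAMPLING = {"temperature": 0.9, "top_p": 0.95}

# Near-deterministic sampling -> repeatable, focused output (bulk scale).
FOCUSED_SAMPLING = {"temperature": 0.2, "top_p": 1.0}


def generate_tests(llm_call, prompt: str, n: int, **sampling) -> list[str]:
    """Produce `n` candidate test prompts with the given sampling settings.

    `llm_call(prompt, **sampling)` is a hypothetical callable wrapping
    whichever provider or SDK model you configured for generation.
    """
    return [llm_call(prompt, **sampling) for _ in range(n)]
```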
Best Practices
- Model selection: Match model capabilities to task complexity
- Cost monitoring: Track usage and optimize model choice
- Benchmark: Compare model performance on your own use cases (see the sketch after this list)
- Defaults: Instantiate models without arguments to fall back to sensible defaults
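One simple way to follow the benchmarking practice above is to run the same test cases through each candidate model and compare aggregate scores. In the sketch below, `evaluate` is a hypothetical callable standing in for whatever evaluation call you use; it is not a Rhesis SDK function.

```python
from statistics import mean

# Hypothetical benchmarking loop: run the same test cases through each
# candidate model and compare mean scores. `evaluate(model_name, case)` is
# a placeholder for your actual evaluation call returning a numeric score.


def benchmark(evaluate, candidate_models: list[str], test_cases: list[dict]) -> dict[str, float]:
    """Return the mean score each candidate model achieves on the test cases."""
    return {
        model: mean(evaluate(model, case) for case in test_cases)
        for model in candidate_models
    }


# Example (names are illustrative):
# results = benchmark(evaluate, ["gpt-4", "gpt-3.5-turbo"], my_test_cases)
```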