Containment Rate

The percentage of user interactions the AI system handles successfully on its own, without escalation, human intervention, or outright failure.

Also known as: resolution rate, self-service rate, automation rate

Overview

Containment rate measures how effectively your AI system resolves user queries independently. High containment rates indicate the system successfully handles most interactions, while low rates suggest frequent escalations, failures to understand, or inability to complete tasks.

Why Containment Rate Matters

Containment rate directly impacts costs: each successfully contained interaction removes the need for expensive human intervention. High containment enables scalability by allowing the system to handle more users simultaneously. Quick, complete resolutions improve user satisfaction, and the metric provides a clear way to measure your AI system's return on investment.

As a quality measure, containment rate reveals your system's capability across different scenarios. It shows whether your training and configuration are effective, exposes gaps in handling edge cases, and indicates how well the system aligns with actual user needs in practice.

Measuring Containment Rate

Using Penelope, you can measure goal achievement as a proxy for containment. When Penelope successfully completes defined goals without hitting dead ends or requiring fallbacks, that interaction counts as contained. Task completion metrics track whether conversations reach their intended outcome versus requiring human takeover. The key is defining clear success criteria for what constitutes "contained" versus "escalated" in your specific context.
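
Once outcomes are labeled, the calculation itself is simple. Here is a minimal sketch; the outcome labels are placeholders, not Penelope's API, and should follow whatever success criteria you define:

```python
from dataclasses import dataclass

# Placeholder outcome labels; substitute your own success criteria.
CONTAINED_OUTCOMES = {"goal_achieved", "self_service_resolution"}

@dataclass
class Conversation:
    id: str
    outcome: str  # e.g. "goal_achieved", "human_takeover", "dead_end"

def containment_rate(conversations: list[Conversation]) -> float:
    """Fraction of conversations resolved without escalation or failure."""
    if not conversations:
        return 0.0
    contained = sum(1 for c in conversations if c.outcome in CONTAINED_OUTCOMES)
    return contained / len(conversations)
```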

Improving Containment Rate

Start by identifying escalation patterns—which types of requests most often require human intervention? Common escalation causes might include knowledge gaps, unclear user intent, edge cases your system hasn't been trained for, or policy situations requiring human judgment. Analyzing these patterns reveals where to focus improvement efforts.
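
A simple frequency analysis is often enough to surface these patterns. The sketch below assumes escalation logs tagged with a reason at handoff time; the records and field names are illustrative:

```python
from collections import Counter

# Hypothetical escalation records; in practice these come from your
# conversation logs, tagged with a reason when the handoff happens.
escalations = [
    {"query_type": "billing", "reason": "knowledge_gap"},
    {"query_type": "returns", "reason": "policy_judgment"},
    {"query_type": "billing", "reason": "knowledge_gap"},
    {"query_type": "setup",   "reason": "unclear_intent"},
]

# Rank escalation causes by frequency to decide where to focus first.
by_reason = Counter(e["reason"] for e in escalations)
for reason, count in by_reason.most_common():
    print(f"{reason}: {count}")
```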

Expand capabilities systematically by addressing the most frequent escalation causes first. This might mean adding knowledge about common topics, improving your system's ability to ask clarifying questions, or training on previously unseen scenarios. Test each improvement to verify it actually increases containment rates before moving to the next.
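
One lightweight way to verify a change is a two-proportion z-test comparing containment before and after. This is a rough sanity check under made-up numbers, not a substitute for proper experiment design:

```python
from math import sqrt

def containment_lift_z(before_contained: int, before_total: int,
                       after_contained: int, after_total: int) -> float:
    """Two-proportion z-statistic for a change in containment rate."""
    p1 = before_contained / before_total
    p2 = after_contained / after_total
    pooled = (before_contained + after_contained) / (before_total + after_total)
    se = sqrt(pooled * (1 - pooled) * (1 / before_total + 1 / after_total))
    return (p2 - p1) / se

# e.g. 620/1000 contained before a knowledge-base update, 680/1000 after
z = containment_lift_z(620, 1000, 680, 1000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests the lift is unlikely to be noise
```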

Design graceful handoffs for situations where containment isn't possible. Rather than abruptly failing, your system should recognize its limitations, gather relevant context, and pass that context to human agents efficiently. Good handoffs maintain user trust even when full automation isn't achievable.
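
A handoff is easier to keep graceful when the context it carries is structured. The schema below is illustrative, not a standard; the point is that the human agent receives everything the system already learned:

```python
from dataclasses import dataclass, field

@dataclass
class HandoffPacket:
    """Context bundle passed to a human agent; field names are illustrative."""
    user_id: str
    summary: str                       # what the user is trying to accomplish
    transcript: list[str]              # conversation so far
    attempted_actions: list[str] = field(default_factory=list)
    escalation_reason: str = "unknown"

def escalate(packet: HandoffPacket) -> None:
    # In production this would enqueue the packet for a human agent;
    # here we only acknowledge the handoff so the user isn't left hanging.
    print("Connecting you with a specialist who can see your conversation "
          f"so far. Reason: {packet.escalation_reason}")
```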

Containment Rate Targets

Expected containment rates vary significantly by use case complexity. Simple FAQ systems handling straightforward questions might achieve 80-90% containment. E-commerce support dealing with order status, returns, and product questions typically sees 60-75% containment. Technical support involving troubleshooting and diagnostics often achieves 40-60% containment. Complex services requiring nuanced judgment might target 30-50% containment, with human expertise handling more sophisticated cases.

Multi-Turn Containment Testing

Multi-turn conversations reveal containment patterns that single-turn tests miss. Does your system maintain containment across extended dialogues, or does it eventually need human help? Testing multi-turn containment with Penelope helps identify where conversations break down, whether the system can recover from clarification requests, and how well it handles complex multi-step scenarios.
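
Structurally, a multi-turn containment test is a loop that drives the conversation until the goal is reached, a handoff is triggered, or a turn budget runs out. The sketch below uses stand-in `agent` and `user` interfaces rather than any real Penelope API:

```python
# Stand-in interfaces: `user` simulates the end user, `agent` is the
# system under test. Substitute your own test harness for both.

def run_multi_turn_test(agent, user, max_turns: int = 10) -> dict:
    """Drive a conversation turn by turn and record where containment breaks."""
    history: list[tuple[str, str]] = []
    for turn in range(1, max_turns + 1):
        user_msg = user.next_message(history)     # simulated user's next turn
        reply = agent.respond(user_msg, history)  # agent under test
        history.append((user_msg, reply))
        if user.goal_achieved(history):
            return {"contained": True, "turns": turn}
        if agent.requested_handoff(reply):        # e.g. a handoff intent fired
            return {"contained": False, "failed_at_turn": turn}
    # Ran out of turns without reaching the goal: not contained.
    return {"contained": False, "failed_at_turn": max_turns}
```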

Monitoring Containment in Production

Production monitoring provides real-world containment data beyond what test scenarios reveal. Track containment rates over time to detect degradation or improvement. Segment by query type, time of day, user cohort, and other dimensions to identify specific areas needing attention. Correlate containment with user satisfaction metrics to understand whether high containment actually translates to happy users, or whether you're technically containing interactions while leaving users frustrated.
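
Segmentation reduces to grouping conversations by a dimension and computing the rate per group. A minimal sketch over hypothetical log records:

```python
from collections import defaultdict

# Hypothetical production records: each conversation tagged with the
# dimensions you want to slice by and whether it was contained.
records = [
    {"query_type": "order_status", "hour": 14, "contained": True},
    {"query_type": "returns",      "hour": 2,  "contained": False},
    {"query_type": "order_status", "hour": 9,  "contained": True},
]

def rate_by(records: list[dict], dimension: str) -> dict:
    """Containment rate per value of one segmentation dimension."""
    totals = defaultdict(lambda: [0, 0])  # value -> [contained, total]
    for r in records:
        totals[r[dimension]][0] += int(r["contained"])
        totals[r[dimension]][1] += 1
    return {k: c / t for k, (c, t) in totals.items()}

print(rate_by(records, "query_type"))  # e.g. {'order_status': 1.0, 'returns': 0.0}
```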

Best Practices

Define containment clearly in your specific context. What counts as successful resolution? Does partial help count, or must the user's problem be completely solved? Track trends over time rather than obsessing over daily snapshots, as containment rates naturally vary. Segment by scenario type, recognizing that different interaction types have different expected containment rates.
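
Encoding the definition makes it unambiguous, so every conversation gets labeled the same way. The rules below are one possible policy, not the policy:

```python
from enum import Enum

class Outcome(Enum):
    CONTAINED = "contained"   # problem fully solved, no human involved
    PARTIAL = "partial"       # some help given, problem not fully solved
    ESCALATED = "escalated"   # handed to a human, or the interaction failed

def classify(resolved: bool, human_involved: bool) -> Outcome:
    """Label an interaction consistently. Whether PARTIAL counts toward
    your containment rate is a policy decision to make once, up front."""
    if human_involved:
        return Outcome.ESCALATED
    return Outcome.CONTAINED if resolved else Outcome.PARTIAL
```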

When identifying improvement opportunities, analyze which scenarios cause escalation most frequently. Prioritize fixing the most common escalation causes rather than rare edge cases. Expand capabilities gradually and systematically, verifying each change actually improves containment before adding more. Test improvements with realistic scenarios including edge cases, not just happy path interactions.

For measurement and communication, connect containment to business metrics by showing cost savings and efficiency gains. Correlate containment with user satisfaction scores to ensure automation isn't sacrificing quality for quantity. Track improvements over time to demonstrate progress and justify continued investment. Analyze why containment fails in specific cases to guide future enhancements rather than just counting successes and failures.
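
As a sketch of the satisfaction check, a simple correlation over periodic aggregates can flag when containment gains and satisfaction diverge. The numbers below are invented for illustration:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical weekly aggregates: containment rate and average CSAT score.
weekly_containment = [0.61, 0.64, 0.66, 0.70, 0.72]
weekly_csat = [4.1, 4.2, 4.0, 4.3, 4.4]

# Pearson correlation: a positive value suggests containment gains are not
# coming at the expense of satisfaction; near-zero or negative warrants a
# closer look at how interactions are being "contained".
r = correlation(weekly_containment, weekly_csat)
print(f"containment-CSAT correlation: r = {r:.2f}")
```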
