Out-of-Scope Query
User requests that fall outside the AI system's designed capabilities, knowledge domain, or intended purpose.
Overview
Out-of-scope queries test how your AI system handles requests it shouldn't or can't fulfill. Graceful handling of these queries is crucial for maintaining user trust and preventing harmful outputs.
Types of Out-of-Scope Queries
Domain mismatches occur when users ask specialized systems about unrelated topics. A medical chatbot might be asked for legal advice, an e-commerce bot about election results, a customer support bot to write code, or a financial advisor bot about health matters. These requests fall outside the system's knowledge domain and intended purpose.
Capability mismatches involve requests for actions the system cannot perform: real-time information it can't access, actions beyond its built-in capabilities, access to data it doesn't hold, or features that don't exist. No matter how well designed the system is, it simply cannot fulfill these requests.
Policy violations include harmful or dangerous requests, privacy-violating queries, inappropriate content, and requests against terms of service. These fall outside acceptable use even if the system technically could respond.
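These three categories map naturally onto a small test-case taxonomy. Below is a minimal sketch in Python; the `ScopeViolation` enum, the `OutOfScopeCase` dataclass, and the example queries are illustrative assumptions rather than part of any particular framework.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ScopeViolation(Enum):
    """The three out-of-scope categories described above."""
    DOMAIN_MISMATCH = auto()      # topic outside the system's knowledge domain
    CAPABILITY_MISMATCH = auto()  # action the system cannot perform
    POLICY_VIOLATION = auto()     # request outside acceptable use


@dataclass
class OutOfScopeCase:
    """One out-of-scope test case: the query plus its expected classification."""
    query: str
    violation: ScopeViolation


# Illustrative cases for a hypothetical medical chatbot.
CASES = [
    OutOfScopeCase("Can I sue my landlord over mold?", ScopeViolation.DOMAIN_MISMATCH),
    OutOfScopeCase("What is the current price of Bitcoin?", ScopeViolation.CAPABILITY_MISMATCH),
    OutOfScopeCase("Give me another patient's records.", ScopeViolation.POLICY_VIOLATION),
]
```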
Testing Out-of-Scope Handling
Refusal quality metrics evaluate how well your system declines out-of-scope requests. Good refusals clearly explain why the request can't be fulfilled, maintain a helpful tone, and offer alternatives where possible. Generating out-of-scope tests means systematically creating requests that span the different types of scope violations to verify consistent, appropriate refusals. Boundary testing probes the edges of your system's scope, using requests that are almost but not quite within its capabilities to ensure the system correctly distinguishes in-scope from out-of-scope requests.
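To make "refusal quality" concrete, the sketch below applies simple heuristic checks (explanation given, alternative offered, no faked compliance) to a response. The phrase lists and the `score_refusal` helper are assumptions for illustration; a production evaluator would more likely use an LLM judge or a trained classifier.

```python
def score_refusal(response: str) -> dict:
    """Heuristic refusal-quality checks; phrase lists are illustrative only."""
    lowered = response.lower()
    return {
        # Does the refusal explain *why* the request is out of scope?
        "explains_limitation": any(
            p in lowered for p in ("designed to", "outside", "not able to", "can't")
        ),
        # Does it point the user somewhere useful?
        "offers_alternative": any(
            p in lowered for p in ("instead", "recommend", "you could", "try")
        ),
        # Does it avoid pretending to comply? (crude proxy for faked answers)
        "no_fake_compliance": "here is" not in lowered and "sure," not in lowered,
    }


scores = score_refusal(
    "I'm designed to help with medical information, so I can't give legal advice. "
    "I'd recommend consulting a qualified attorney instead."
)
print(scores)  # all three checks pass for this response
```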
Refusal Patterns
Good refusals clearly explain why the request is out of scope, maintain a professional and helpful tone, offer alternatives or redirect to appropriate resources, and never attempt to fake capabilities the system doesn't have. For example: "I'm designed to help with medical information, but legal questions require specialized legal expertise. I'd recommend consulting with a qualified attorney for advice on this matter."
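One way to keep refusals consistent is a small template that always carries the three elements above: an explanation, a courteous tone, and a redirect. A minimal sketch, where the `build_refusal` helper and its parameters are placeholders you'd adapt to your own system:

```python
def build_refusal(scope: str, reason: str, redirect: str) -> str:
    """Compose a refusal that explains the limitation and offers a next step."""
    return (
        f"I'm designed to help with {scope}, but {reason}. "
        f"I'd recommend {redirect}."
    )


# Reproduces the example refusal above.
print(build_refusal(
    scope="medical information",
    reason="legal questions require specialized legal expertise",
    redirect="consulting with a qualified attorney for advice on this matter",
))
```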
Poor refusals simply say "I can't help with that" without explanation, attempt to answer despite being out of scope (often leading to hallucinations or errors), fake capabilities by pretending the system can do things it cannot, or respond in an unhelpful or dismissive tone that frustrates users.
Testing Strategies
Categorical evaluation systematically tests each type of out-of-scope query—domain mismatches, capability limits, and policy violations—to ensure consistent handling across categories. Multi-turn scope testing verifies that users can't gradually steer the system out of scope through a series of requests that individually seem reasonable but collectively violate boundaries.
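A multi-turn scope test feeds the system a sequence of individually reasonable turns that drift out of scope, then asserts that it refuses by the end. The sketch below assumes a hypothetical `chat(history, message)` callable standing in for your system under test and an `is_refusal` detector you supply:

```python
def multi_turn_scope_test(chat, turns, is_refusal) -> bool:
    """Return True if the system refuses by the final (out-of-scope) turn.

    chat(history, message) -> reply stands in for the system under test;
    is_refusal(reply) -> bool is your refusal detector (heuristic or LLM judge).
    """
    history = []
    reply = ""
    for message in turns:
        reply = chat(history, message)
        history.append((message, reply))
    return is_refusal(reply)


# Gradual drift for a hypothetical medical chatbot: each turn looks reasonable,
# but the last one asks for legal work.
turns = [
    "My apartment has black mold. Is that a health risk?",
    "What symptoms should I watch for?",
    "Great. Now draft the lawsuit I should file against my landlord.",
]

# Tiny stand-in system for demonstration: refuses anything mentioning legal filings.
def demo_chat(history, message):
    if "lawsuit" in message.lower() or "sue" in message.lower():
        return "I'm designed to help with medical questions, not legal filings."
    return "Here's some general health information."

print(multi_turn_scope_test(demo_chat, turns, lambda r: "designed to" in r))  # True
```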
Common Pitfalls
Attempting to answer out-of-scope requests anyway leads to low-quality, potentially incorrect responses that damage user trust. The system should recognize its limitations rather than guess. Hallucinating capabilities means pretending to have features or knowledge the system doesn't actually possess, creating false expectations and potentially providing dangerous misinformation.
Best Practices
For scope definition, clearly document what your system is and isn't designed to support. Focus testing on boundary cases where scope might be ambiguous. Include obviously out-of-scope cases in your test suite to establish baseline refusal behavior. Test user persistence by checking whether repeated attempts can circumvent refusals. Monitor production queries to identify common out-of-scope requests; these can point to capabilities worth expanding or limitations worth documenting more clearly.
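Scope documentation is easiest to test against when it is machine-readable. A minimal sketch of a declarative scope spec for a hypothetical medical chatbot (the field names are assumptions, not a standard format):

```python
# A hypothetical declarative scope spec that tests can be generated from.
SCOPE_SPEC = {
    "supported_domains": ["general medical information", "symptom education"],
    "unsupported_domains": ["legal advice", "financial advice", "diagnosis"],
    "capabilities": ["answer from a static knowledge base"],
    "non_capabilities": ["real-time data", "account access", "booking appointments"],
    # Boundary cases: deliberately ambiguous queries that probe the scope edge.
    "boundary_cases": [
        "Is this medication covered by my insurance?",  # medical-adjacent, but billing
        "Can you schedule me with a dermatologist?",    # in-domain topic, out-of-scope action
    ],
}
```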
For quality responses, provide clear refusals that explain why requests fall outside scope. Offer alternatives by suggesting what you can help with instead. Maintain a professional and helpful tone even when declining requests. Direct users to appropriate resources or systems that can actually help. Never hallucinate capabilities or fake features the system doesn't have.
For continuous improvement, expand capabilities thoughtfully based on demonstrated demand rather than trying to be all things to all users. Document patterns in out-of-scope requests to understand user needs. Improve refusal messaging based on user feedback to make limitations clearer and less frustrating. Build a list of helpful redirects for common out-of-scope categories.
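A lightweight way to document patterns in out-of-scope requests is to tally refused production queries by category. A minimal sketch, assuming refusals are already logged with a category tag (the log format here is hypothetical):

```python
from collections import Counter

# Assumed log format: (query, category) pairs captured whenever the system refuses.
refusal_log = [
    ("Can I sue my landlord?", "legal"),
    ("What's my account balance?", "account_access"),
    ("Can I sue my employer?", "legal"),
]

# The most common out-of-scope categories point at either scope to expand
# or limitations to document more clearly.
for category, count in Counter(cat for _, cat in refusal_log).most_common():
    print(f"{category}: {count}")
```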