Skip to Content
GlossarySmoke Testing - Glossary

Smoke Testing

Back to GlossaryTesting

Quick, high-level validation tests that check basic functionality and critical features to determine if a system is stable enough for more detailed testing.

Also known as: sanity testing, build verification test

Overview

Smoke testing provides rapid validation that core functionality works before investing time in comprehensive testing. In AI systems, smoke tests verify that the system responds appropriately to basic scenarios and that critical paths function correctly.

Purpose of Smoke Testing

Smoke tests enable quick validation, running in minutes rather than hours, making them practical to execute frequently. They provide early detection by catching major issues immediately before deeper testing begins. They serve as gate keeping, blocking deployment of obviously broken systems before they reach production. Smoke tests offer confidence checks that verify basics work correctly before investing significant effort in detailed testing.

The use cases for smoke testing span multiple scenarios. Run smoke tests pre-production before promoting code to production environments. Execute them post-deployment to verify that deployment succeeded and the system is functioning. Use them for continuous monitoring as regular health checks of production systems. Smoke tests inform rollback decisions by quickly identifying when changes need to be reverted.

Creating Smoke Tests

Critical path coverage ensures your smoke tests check the most important user journeys and essential functionality. Identify the core features users rely on and create simple tests that verify these work at a basic level. A manual smoke test suite can supplement automated tests, providing a checklist humans can quickly run through when automated testing isn't available or as a secondary verification.

Running Smoke Tests

Integrating smoke tests into CI/CD pipelines ensures they run automatically on every build or deployment. Configure your continuous integration system to execute smoke tests before allowing promotion to the next environment. Post-deployment verification runs smoke tests immediately after deploying to production, confirming the deployment succeeded and core functionality remains intact.

Smoke Test Characteristics

What makes a good smoke test? Speed is essential—each individual test should complete in seconds, with the entire suite finishing in under five minutes. Tests should target critical functionality, focusing only on features that are absolutely essential for basic operation. Good smoke tests are representative, covering different capability areas rather than clustering around one type of functionality. Stability matters—smoke tests should have minimal false negatives, consistently passing when the system is healthy and consistently failing when there are real issues.

Smoke Tests vs. Full Testing

Smoke tests differ from comprehensive test suites in several key dimensions. For speed, smoke tests complete in under five minutes while full test suites may take hours. Coverage-wise, smoke tests focus on critical paths only, while full suites provide comprehensive coverage of all features and edge cases. Depth of validation is shallow for smoke tests—they verify basic functionality works—versus deep evaluation in full suites that examine quality, edge cases, and detailed behavior. The purpose differs: smoke tests make a go/no-go decision about whether to proceed, while full testing provides detailed quality assessment. Frequency reflects this—smoke tests run on every deploy, pull request, or commit, while full testing runs less frequently due to time requirements. Finally, thresholds are lenient for smoke tests to catch only major issues, while full test suites use strict thresholds to catch all issues including subtle problems.

Monitoring with Smoke Tests

Using smoke tests for continuous health checks means running them periodically in production to verify the system remains functional. This catches issues that might emerge from external factors like API changes or resource constraints. SLA monitoring can incorporate smoke tests as part of measuring whether your system meets service level agreements for availability and basic functionality.

Best Practices

For test selection, prioritize critical features first, covering the most important functionality users rely on. Ensure tests are representative of different capability areas rather than redundant checks of similar features. Design each test to be fast, completing in seconds. Maintain stability by removing flaky tests that fail intermittently without real issues. Make pass/fail criteria clear and obvious so it's immediately apparent when something is wrong.

For execution timing, run smoke tests early before deeper testing to avoid wasting time on detailed analysis of a broken system. Run them often—on every deploy, pull request, and commit—since their speed makes frequent execution practical. Implement fail-fast behavior where testing stops immediately on critical failures rather than continuing through the suite. Provide clear reporting so results are easy to understand at a glance. Enable quick action by configuring smoke test failures to trigger alerts or automatic rollbacks.

For maintenance, keep the smoke test suite minimal with 15-30 tests maximum to maintain speed. Update tests regularly as core features change but resist the temptation to expand unnecessarily. Remove flaky tests immediately to maintain reliability and trust in results. Review failures to distinguish true issues from false negatives. Expand carefully, ensuring new additions don't slow down the suite to the point where frequent execution becomes impractical.

Example Smoke Test Suite

A typical smoke test suite for an AI system might include: basic response generation testing that the system responds to simple queries, safety filter verification ensuring harmful content is blocked, core feature checks for the three most critical capabilities, latency validation that responses complete within acceptable timeframes, and error handling tests verifying graceful degradation when external dependencies fail. This covers essentials without attempting comprehensive validation, enabling quick go/no-go decisions.

Documentation

Related Terms