Glossary
Test flakiness is the rate at which tests produce inconsistent results - passing sometimes and failing others without code changes. It’s measured as the percentage of test transitions from pass to fail that are non-deterministic.
Over 800 million seconds (25 years) of flaky tests are manually re-run by developers each month on Buildkite’s platform alone. At Slack, flaky tests were the number one factor impacting mobile developer happiness. Flakiness compounds: as flake rate rises, trust falls, engineers disable checks, and real bugs escape.
The industry standard is to track flake rate per test over a 30-day window. Tests flaking above a threshold (e.g., >10% of runs) are quarantined - removed from the blocking gate but kept visible. Google, Uber, and Slack have all published their flake management approaches.
Zerocheck records run history, failure cause, screenshots, recordings, and step traces so teams can see what failed and whether a pattern looks noisy over time.