How to fix flaky tests in CI

In Google's CI data, 84% of test transitions from pass to fail were flaky, not real bugs. Here’s how to identify, classify, and fix flaky tests - without re-running and hoping.

Why this is hard to fix

  • Flaky tests have multiple root causes: timing issues, data non-determinism, environment drift, and third-party dependency changes
  • Re-running hides the problem - teams lose trust in CI and start ignoring failures
  • Quarantining without governance hides real bugs alongside flaky ones
  • At scale, flake investigation consumes 20–40% of engineering time in QA-heavy organizations

Approach 1: Manual flake management

  1. Track flake rate per test over 30 days - use CI analytics (Buildkite Test Analytics, CircleCI Insights)
  2. Classify root causes: timing (add explicit waits), data (make fixtures deterministic), environment (check staging parity)
  3. Quarantine high-flake tests - remove them from the blocking gate and add them to a “repair queue” with assigned owners
  4. Set an SLA: quarantined tests must be fixed or deleted within 2 sprints
  5. Monitor flake rate as a team metric - target <5% of total test transitions
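Steps 3 and 4 above are easiest to enforce with a small amount of tooling. Here is a minimal sketch of a repair queue with a 2-sprint SLA; the class, field names, and 14-day sprint length are illustrative assumptions, not any CI platform's API:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed sprint length; adjust to your team's cadence.
SPRINT_DAYS = 14
SLA_SPRINTS = 2

@dataclass
class QuarantinedTest:
    name: str
    owner: str            # every quarantined test gets an assigned owner
    quarantined_on: date

    def sla_deadline(self) -> date:
        # Fixed or deleted within 2 sprints of quarantine.
        return self.quarantined_on + timedelta(days=SPRINT_DAYS * SLA_SPRINTS)

    def overdue(self, today: date) -> bool:
        return today > self.sla_deadline()

def repair_queue_report(queue, today):
    """Split the repair queue into within-SLA and overdue tests."""
    on_track = [t for t in queue if not t.overdue(today)]
    overdue = [t for t in queue if t.overdue(today)]
    return on_track, overdue
```

Run the report in a scheduled CI job and ping owners of overdue tests; anything still overdue at the next check is a candidate for deletion.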

Approach 2: Zerocheck run evidence

  1. Approved tests run on GitHub PRs and record failure details, screenshots, recordings, and step traces
  2. PR comments show which approved failures are blocking and which are non-blocking
  3. Run history and trend data help engineers spot noisy tests without guessing from red/green alone
  4. Suggested regression tests can be reviewed and approved after repeated failures or incidents

Common pitfalls

  • Don’t just re-run - every re-run that passes is a hidden flaky test you’re ignoring
  • Don’t quarantine without governance - quarantined tests need owners and deadlines
  • Don’t add sleep() as a fix - use explicit wait conditions (waitForSelector, waitForResponse)
  • Don’t blame the framework - most flakiness comes from test design (shared state, non-deterministic data), not the tool
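To make the sleep() pitfall concrete: the pattern behind waitForSelector-style APIs is a condition-based wait with a timeout, which you can apply anywhere, not just in browser tests. A minimal generic version (the helper name and defaults are our own, not a library API):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` expires.

    Unlike a fixed sleep(), this returns as soon as the condition holds,
    and fails loudly with a TimeoutError when it never does - so a slow
    environment makes the test slower, not flaky.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```

Usage: replace `time.sleep(3)` followed by an assertion with `wait_for(lambda: job.status == "done")`. In browser tests, prefer the framework's built-in equivalents (e.g. Playwright's waitForSelector/waitForResponse), which wait on real page events rather than polling.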

FAQ

What causes flaky tests?

The top causes are: timing/race conditions (tests interact before the page is ready), non-deterministic test data (random IDs, ordering), environment drift (staging differs from production), and third-party dependency changes (Stripe, OAuth providers).
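The non-deterministic-data cause has a mechanical fix: derive all "random" fixture values from a fixed seed so every run produces identical data. A sketch, assuming a hypothetical user fixture (field names are illustrative):

```python
import random
import uuid

def make_user_fixture(seed=1234):
    """Build a user fixture from a fixed seed: identical data on every run."""
    rng = random.Random(seed)  # seeded, isolated RNG - not the global one
    return {
        # Deterministic UUID derived from the seeded RNG, instead of uuid4().
        "id": str(uuid.UUID(int=rng.getrandbits(128))),
        "name": f"user-{rng.randrange(10_000)}",
    }
```

Passing a different seed per test still gives you varied, non-colliding data, while a failure remains reproducible from the seed alone.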

How do I measure flake rate?

Track the percentage of test transitions from pass-to-fail that revert on re-run. Google found that 84% of such transitions are flaky. Most CI platforms (Buildkite, CircleCI) provide this data.
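If your CI platform doesn't surface this metric, it is straightforward to compute from run history. A sketch of the transition-counting logic described above; the input format (a chronological list of per-commit attempt results) is an assumption about how you export your CI data:

```python
def count_transitions(history):
    """Classify pass->fail transitions in a chronological run history.

    `history` is a list of per-commit attempt lists, e.g. ["fail", "pass"]
    means the first attempt failed and the re-run passed. Returns a
    (flaky, real) pair: transitions that reverted on re-run vs. persisted.
    """
    flaky = real = 0
    prev_pass = True  # assumption: the suite started green
    for attempts in history:
        first, final = attempts[0], attempts[-1]
        if prev_pass and first == "fail":
            if final == "pass":
                flaky += 1   # reverted on re-run: flaky
            else:
                real += 1    # kept failing: real regression
        prev_pass = final == "pass"
    return flaky, real

def flake_rate(history):
    """Flaky transitions as a percentage of all pass->fail transitions."""
    flaky, real = count_transitions(history)
    total = flaky + real
    return 0.0 if total == 0 else 100.0 * flaky / total
```

A history of `[["pass"], ["fail", "pass"], ["pass"], ["fail", "fail"]]` contains one flaky and one real transition, for a 50% flake rate.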

Should I delete flaky tests?

If a test has been quarantined for 2+ sprints with no fix, delete it. A permanently quarantined test provides zero value and clutters your suite. Better to have no test than a test everyone ignores.

How to fix flaky tests in CI

Skip the setup. Zerocheck handles it in plain English.

Get a demo