Flaky failures and unrelated passes both make PR review harder. Zerocheck reports a confidence score based on what changed, what ran, and the reliability of the results.
“At Google, we found that 84% of pass-to-fail transitions are caused by flaky tests, not real bugs.”
Google Testing Blog
“Test failures are ignored, builds are rerun blindly, and every green pipeline is suspect.”
Dmytro Huz, DEV Community
84% of pass-to-fail transitions at Google are caused by flaky tests, not real bugs
Most testing tools do not produce a calibrated per-PR confidence score
Binary pass/fail gates are overridden 10-20% of the time at most companies
Most testing tools output binary pass/fail. A green check can come from irrelevant tests, and a red check can come from flaky noise.
Testim and Mabl have element-level confidence for self-healing, but not PR-level confidence. Datadog monitors report status, not risk per change.
Teams still need to know which specific user flow a PR could break and how much signal the test run provides.
Developer pushes PR. CI runs 200 tests. 6 fail. Are they real? Developer investigates for 45 minutes. 5 are known flakes, 1 is a real issue in an unrelated module. Developer fixes, re-runs, waits again. Total time wasted: 2 hours. Confidence in the merge: uncertain.
Same PR. Zerocheck analyzes the diff, runs targeted tests, and reports: 'Confidence: 94%. 4 tests ran (all relevant to checkout changes). 1 informational warning on settings page (nightly). 0 flakes.' Developer merges in 5 minutes with evidence attached.
PR diff analyzed to identify affected user flows
Tests run with flake-vs-real classification per failure
Run confidence reflects execution results and step-level resolution confidence
PR comment shows score, evidence, and pass/fail status
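The steps above can be sketched in a few lines. This is an illustrative sketch only, not Zerocheck's actual scoring method: the `Outcome` fields, the `FLAKE_THRESHOLD` cutoff, and the weighting rule are all assumptions made for the example. It labels each failure as a flake or a real failure from a historical flake rate, scores the run using only tests relevant to the diff, and renders a one-line PR comment.

```python
from dataclasses import dataclass

# Hypothetical cutoff: a failing test that historically fails at least this
# often on unchanged code is treated as a flake rather than a real failure.
FLAKE_THRESHOLD = 0.3


@dataclass
class Outcome:
    name: str
    passed: bool
    relevant: bool      # covers a user flow touched by the PR diff
    flake_rate: float   # historical failure rate on unchanged code, 0.0 to 1.0


def classify(outcome: Outcome) -> str:
    """Label one result as 'pass', 'flake', or 'real'."""
    if outcome.passed:
        return "pass"
    return "flake" if outcome.flake_rate >= FLAKE_THRESHOLD else "real"


def run_confidence(outcomes: list[Outcome]) -> float:
    """Score a run from 0.0 to 1.0 using only tests relevant to the diff."""
    weights = []
    for o in outcomes:
        if not o.relevant:
            continue  # unrelated tests add no signal about this change
        label = classify(o)
        if label == "pass":
            weights.append(1.0 - o.flake_rate)  # reliable passes count more
        elif label == "real":
            weights.append(0.0)
        # Flaky failures are skipped: they neither raise nor lower the score.
    return sum(weights) / len(weights) if weights else 0.0


def pr_comment(outcomes: list[Outcome]) -> str:
    """Render the score, evidence counts, and pass/fail status as one line."""
    relevant = [o for o in outcomes if o.relevant]
    flakes = sum(classify(o) == "flake" for o in relevant)
    real = sum(classify(o) == "real" for o in relevant)
    status = "fail" if real else "pass"
    score = round(run_confidence(outcomes) * 100)
    return (
        f"Confidence: {score}%. {len(relevant)} relevant tests ran. "
        f"{flakes} flakes. Status: {status}."
    )
```

In this sketch a green result from an unrelated test contributes nothing, and a pass from a historically unreliable test contributes less than a pass from a stable one, which is the difference between a confidence score and a binary gate.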
Other tools document their own platform controls. Zerocheck produces JSON evidence from your executed application tests.
Get coverage on the flows customers will notice when they break, without turning testing into a quarter-long infrastructure project.
Guard the only code path where a bug is measured in lost dollars per minute.