Blog

Data-driven insights on flaky tests, AI testing, test maintenance, and the future of QA.

3 min read

GPT-4o fixes made code 37% MORE vulnerable

53% of AI-generated code contains security vulnerabilities. Iterative AI revision makes it worse. If AI can't write secure code, can it write meaningful tests?

Read post →
4 min read

"Self-healing tests" are just backup XPaths

Most self-healing tools are just backup selectors in a priority queue. When a selector breaks because the feature broke, the test "heals" anyway and hides the bug.

Read post →
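A minimal sketch of the mechanism that card describes: a locator backed by an ordered list of fallback selectors, tried in priority order. The names (`HealingLocator`, `resolve`, the selectors) are illustrative, not any vendor's actual API.

```typescript
// Sketch of a typical "self-healing" locator: fallback selectors
// tried in priority order. Names here are hypothetical.

type Finder = (selector: string) => boolean; // true if selector matches the page

class HealingLocator {
  constructor(private selectors: string[]) {}

  // Walk the priority queue; the first selector that matches wins.
  // If the primary selector broke because the feature broke, a stale
  // fallback can still match something and silently mask the regression.
  resolve(find: Finder): string | null {
    for (const s of this.selectors) {
      if (find(s)) return s;
    }
    return null;
  }
}

// Example: the primary data-testid is gone (feature removed), but a
// generic class fallback still matches a leftover node, so the test "heals".
const locator = new HealingLocator([
  '[data-testid="checkout-button"]',
  '//button[contains(text(), "Checkout")]',
  '.btn-primary',
]);
const pageHas = (sel: string) => sel === '.btn-primary'; // toy page state
console.log(locator.resolve(pageHas)); // → '.btn-primary'
```

The failure mode is visible in the example: the locator returns a match even though the element the test was written for no longer exists.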
3 min read

Same incident 3x. Fix was in postmortem

Average postmortem: 3-7 action items. Under 40% completed in 90 days. 60% never completed. 'Add a regression test' always loses to feature work.

Read post →
3 min read

Playwright's shift away from CSS selectors

Multiple tools are moving from CSS/XPath selectors to the browser's accessibility tree. MCP is becoming the standard way AI agents interact with browsers.

Read post →
3 min read

3 teams deleted most of their e2e tests

Hasura deleted 95%, Nubank deleted all, another team deleted half. Same pattern: flaky tests weren't catching bugs. Deleting without safety nets in place is a different story.

Read post →
3 min read

42% of testers can't script. AI writes 41%

The scripting part of QA is exactly what AI automates. The judgment and strategy part isn't. But most QA roles are still defined around execution, not strategy.

Read post →
3 min read

Your e2e results are SOC 2 evidence

Compliance platforms automate 80% of infra monitoring, but app-level testing evidence is still manual screenshots. 2 engineers, 2 weeks, every audit.

Read post →
3 min read

Cypress to Playwright: ~67% flakiness drop

Currents.dev analyzed 400M test records: Playwright flake rate 0.72% vs Cypress 0.83%. But Cypress masks race conditions, so the first week after migrating is terrifying.

Read post →
3 min read

60-70% of test budgets go to maintenance

55% of teams spend 20+ hrs/week maintaining e2e tests. One fintech: 23 hrs/week just updating tests for UI changes. Hasura deleted 95% of theirs.

Read post →
3 min read

41% of code is AI-generated. Who tests it?

53% of AI-generated code has security vulns. After 5 rounds of GPT-4o fixes, code had 37% MORE vulns. Creation is accelerating faster than verification.

Read post →
4 min read

Every "pass locally, fail in CI" cause

Why tests pass locally but fail in CI: timing/race (40%), environment (30%), resource contention (15%), test isolation (15%). Full taxonomy.

Read post →
3 min read

46% of devs distrust AI testing accuracy

79% cite AI as the most impactful testing tech. 46% distrust AI testing accuracy. 30% of AI testing projects get abandoned after POC.

Read post →
3 min read

84% of CI failures are flaky, not bugs

Google: 84% of CI pass-to-fail transitions are flaky. Buildkite: 800M seconds of flaky re-runs per month. One team deleted half their tests, and bugs went down.

Read post →