Every AI testing vendor markets "self-healing tests," so I got curious about what's actually happening under the hood.
Most of them work like this: record a test with CSS selectors, and when the primary selector breaks, try a ranked list of fallbacks. XPath, text content, element position, data-testid. If a fallback matches, mark the test as "healed." That's it. Backup selectors with a priority queue.
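Here's a toy sketch of that mechanism. This is not any vendor's actual code; the locator shapes and the fake page lookup are invented for illustration:

```typescript
// Hypothetical sketch of selector "healing": a ranked list of fallbacks,
// tried in priority order. Not any real tool's implementation.
type Locator = { strategy: string; value: string };

// Primary CSS selector first, then progressively looser alternatives.
const fallbacks: Locator[] = [
  { strategy: "css", value: "#submit-btn" },
  { strategy: "testid", value: "submit" },
  { strategy: "text", value: "Submit" },
  { strategy: "xpath", value: "//form/button[1]" },
];

// Stand-in for a real DOM query; here it's just a lookup over a fake page.
function query(page: Map<string, string>, loc: Locator): string | null {
  return page.get(`${loc.strategy}:${loc.value}`) ?? null;
}

// Walk the ranked list; the first hit wins, and any non-primary hit
// gets reported as a "heal".
function heal(page: Map<string, string>, locators: Locator[]) {
  for (let rank = 0; rank < locators.length; rank++) {
    const loc = locators[rank];
    const el = query(page, loc);
    if (el) return { element: el, healed: rank > 0, usedStrategy: loc.strategy };
  }
  return null;
}

// Simulate a deploy where the CSS id changed but the visible text survived:
// the primary selector misses, the text fallback matches, test is "healed".
const page = new Map([
  ["text:Submit", "<button>Submit</button>"],
  ["xpath://form/button[1]", "<button>Submit</button>"],
]);

const result = heal(page, fallbacks);
console.log(result);
```

Note there's nothing in this loop that asks whether the fallback match is the *same* element the test author meant, which is exactly where things go wrong below.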
The dev reactions I keep seeing track with this: "Self-healing = five backup XPaths." "Great in demos, fails on real products." "Works fine as long as your entire test suite fits on a login page." lol
The real problem shows up when the selector breaks because the feature broke. The test "heals" by finding an alternative element, passes, and now you have a green test sitting on top of broken behavior. "What happens when the AI heals away a real bug?"
I think this is why around 30% of AI testing POCs get abandoned. Teams try it, see promising results in the demo, then in production can't tell the difference between "healed correctly" and "healed incorrectly, now passing on a bug." That ambiguity kills trust fast.
What makes more sense to me is testing based on what the user sees: "click the submit button" defined by what the button looks like and does, not where it sits in the HTML tree. Playwright's getByRole/getByText is a step in this direction. Selector healing reacts to breakage; intent-based approaches don't create the breakage in the first place.
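To make the contrast concrete, here's a toy role-based lookup in the spirit of getByRole, resolving an element by role and accessible name over a simplified accessibility tree. The tree shape is invented for illustration; real tools compute this from the live page:

```typescript
// Toy model of intent-based lookup: find elements by user-visible intent
// (role + accessible name), not by DOM position. Simplified node shape.
type AXNode = { role: string; name: string; children?: AXNode[] };

// Depth-first search for the first node matching role and name.
function getByRole(tree: AXNode, role: string, name: string): AXNode | null {
  if (tree.role === role && tree.name === name) return tree;
  for (const child of tree.children ?? []) {
    const hit = getByRole(child, role, name);
    if (hit) return hit;
  }
  return null;
}

// Two renders of the same page: a refactor changed the DOM structure,
// but the user-facing intent (a Submit button) is unchanged.
const before: AXNode = {
  role: "form", name: "checkout",
  children: [{ role: "button", name: "Submit" }],
};
const after: AXNode = {
  role: "form", name: "checkout",
  children: [
    { role: "group", name: "actions",
      children: [{ role: "button", name: "Submit" }] },
  ],
};

// Both lookups succeed with no "healing" step involved.
console.log(getByRole(before, "button", "Submit") !== null); // true
console.log(getByRole(after, "button", "Submit") !== null);  // true
```

The design point: when the locator encodes intent, a structural refactor isn't breakage to recover from, and a missing button fails loudly instead of being papered over by a fallback.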
Anyone running self-healing tools in production for more than 6 months? Does trust hold up or degrade over time?
Zerocheck runs E2E tests on every PR with recordings, screenshots, and step traces.
Get a demo