Glossary

Agentic Testing: What It Is and How It Differs from Self-Healing

Definition

Agentic testing is a testing approach where an AI agent generates, executes, or maintains end-to-end tests by interacting with the rendered application instead of relying only on code-level selectors. Traditional test automation (Playwright, Cypress, Selenium) locates DOM elements through CSS selectors, XPath expressions, roles, or data-testid attributes, then performs actions on those elements programmatically. Self-healing tools (Mabl, Testim, Healenium) still capture selectors, but use ML models to guess replacement selectors when the originals break.

The interaction model differs. A Playwright test may click the element matching 'button.primary-cta[data-testid="submit-order"]'. If a designer renames that class or a developer removes the data-testid, the test can break. A self-healing tool may recover by finding a nearby button with similar attributes, but it is still searching for a DOM match. An agentic test describes the intended action: click the Submit Order button. The agent renders the page, identifies the button by label, position, and context, and clicks it.

The failure mode also differs. Selector-based tools fail when the locator no longer matches. Self-healing tools can fail silently when they guess wrong and report a pass. Agentic tools should fail when they cannot confidently identify the intended element. A button whose label changed from "Submit" to "Place Order" should trigger a confidence check, not an unreviewed guess.

Why it matters

Self-healing was marketed as the answer to test maintenance: selectors break, ML repairs them. The track record is mixed. A 2023 Tricentis survey found that 46% of developers distrust AI-driven testing accuracy, often because some tools silently modify test behavior and report green when they should report red.

The economic argument is maintenance cost. The World Quality Report consistently finds that teams spend 60-70% of their test automation budgets on maintenance, not new test creation. Much of that work is selector repair: updating CSS paths after a UI refactor, adding new data-testid attributes, or rewriting XPath expressions when the DOM restructures.

Agentic testing reduces the selector repair workload by avoiding customer-maintained selectors for browser steps. It also changes who can review tests. Intent-based steps describe product behavior: go to the pricing page, choose Upgrade, fill in the payment form, confirm the payment. Product managers, designers, QA, and engineers can all review those steps for intent.

How teams handle it today

The E2E testing landscape in 2025 sits across four tiers.

Manual selector-based tools (Playwright, Cypress, Selenium) remain the default choice for many engineering teams. These tools give you full control. The tradeoff is that each test is a hand-coded artifact that must be updated when the UI, data setup, or environment changes.

Self-healing via ML locators (Mabl, Testim, Healenium) attempts to reduce selector brittleness by finding replacement selectors when originals break. These tools can reduce maintenance for simple UI changes like class name renames. The failure mode is false confidence: when the tool "heals" to the wrong element, the test passes but validates the wrong thing.

Intent-based and vision-based tools (testRigor, Momentic, Spur) represent the newer agentic category. testRigor uses natural-language specs and interacts with the app through visual and accessibility-tree analysis. Momentic uses a vision model to identify elements on rendered pages. Spur takes a similar screen-based approach. The common concern is transparency: when the agent makes a decision, engineers need to see why.

Managed testing services (QA Wolf) use human QA engineers, sometimes augmented by AI tooling, to write and maintain tests as a service. The tradeoff is cost and dependency on an external team's turnaround time.

How Zerocheck approaches it

Zerocheck interacts with your application visually, reads rendered pages, and executes approved browser-step specs. The customer does not maintain CSS selectors or data-testid attributes for those steps.

Every interaction includes a confidence signal. When the agent clicks a button, it reports how confident it is that it found the right element. When confidence drops below threshold, the test fails instead of guessing.

Each run records what the agent saw, what it identified, what it clicked, and where it failed. Engineers can review the screenshots, recording, and step trace from the PR comment or dashboard.

Agentic testing vs self-healing: the architecture difference

The terms "agentic testing" and "self-healing testing" are sometimes used interchangeably in marketing copy, but they describe different architectures.

Self-healing operates as a repair layer on top of selector-based automation. The underlying test still defines interactions in terms of DOM elements: click the element at #checkout-form > button.submit, fill the input at [data-testid='email-field']. The self-healing layer monitors these selectors and, when one fails to match, searches for a replacement using heuristics: nearby elements with similar attributes, elements with matching text content, elements in similar DOM positions. Tools like Healenium maintain a database of selector histories and use cosine similarity between old and new DOM snapshots. Mabl uses a multi-locator strategy, trying CSS, XPath, visual position, and accessibility attributes in sequence.

This architecture handles simple changes well: a class rename from .btn-primary to .cta-button, an added wrapper div that shifts element depth. It handles complex changes poorly: a form that gets split into a multi-step wizard, a navigation that moves from a sidebar to a top bar, a checkout flow that replaces inline fields with a third-party iframe. In these cases, the self-healing model either fails to find a match or matches the wrong element.

Agentic testing has no selector layer to repair. The agent receives an instruction like "fill in the email field with [email protected]" and interprets the current page to find the email field. It uses field labels, placeholder text, position relative to other elements, and visual context. If the email field moves from a single-page form to the second step of a wizard, the agent can navigate to that step and fill it in because it follows intent instead of a DOM path.

The workflow difference shows up during UI refactors. A team that redesigns checkout with a self-healing tool will see a mix of healed tests and broken tests, then manually verify healed steps. A team using an agentic tool should see low-confidence steps flagged for review where the redesigned flow is ambiguous.

The trust problem in AI testing

The 46% developer distrust statistic from Tricentis lines up with how self-healing is often deployed. The core complaint, which shows up consistently in developer forums and team retrospectives, is: "How do I know the AI didn't heal away a real bug?"

Here is the concrete scenario. A developer changes a form validation rule, accidentally breaking the error message display. The submit button still exists, the form still submits, but error messages no longer appear. A selector-based test that checks for the error message element would catch this, assuming the selector still matches. A self-healing test that was checking for the error message might "heal" by finding a different text element on the page and reporting a pass. The real bug ships to production.

Teams that adopt self-healing tools without strong review processes report discovering production bugs that their passing test suites should have caught. Each false pass reduces attention paid to test results, which lets more bugs escape.

Transparency is the control. An agent can be wrong, but an agent that shows its work lets engineers catch mistakes. This means three things in practice. First, every agent decision should be logged with enough context to understand why: what the agent saw on the page, which elements it considered, which one it chose, and its confidence level. Second, low-confidence decisions should be surfaced proactively. If the agent is 73% confident it found the right element, that should be a yellow flag in the test report. Third, engineers need the ability to pin specific interactions when they want deterministic behavior. If a test step confirms a payment amount, the engineer should be able to say "this field must contain exactly $49.99" rather than relying on interpretation.

Teams tend to keep tools that make AI decisions visible and give engineers control over the confidence threshold. This is why the "fails closed" pattern matters more than model accuracy: a tool that admits uncertainty is easier to review than a tool that always claims confidence.

When agentic testing fails closed

"Fails closed" is a concept borrowed from security engineering. A firewall that fails closed blocks all traffic when it encounters an error, rather than allowing all traffic through. Applied to testing, failing closed means: when the agent is not confident in its action, the test reports a failure rather than guessing and reporting a pass.

This is the opposite of how most self-healing tools operate. Self-healing tools fail open: when the primary selector breaks, they try alternatives and, if any alternative matches with reasonable similarity, they proceed and report success. The assumption is that healing is usually correct, so it is better to keep tests green than to generate false failures. In practice, this optimizes for CI dashboard aesthetics over actual test reliability.

Failing closed has a concrete workflow. When an agentic test encounters a step where confidence drops below threshold (say, below 0.85), several things happen. The test step is marked as uncertain. The test run reports a warning or failure, depending on team configuration. The visual trace shows exactly what the agent saw and why it was uncertain: maybe the button label changed, maybe two elements matched the description, maybe the element was partially obscured. An engineer reviews the flagged step, confirms or corrects the agent's interpretation, and the test resumes with updated understanding.

The objection to failing closed is: "Won't this create too many false failures after a big UI change?" In practice, the answer depends on the confidence threshold. A threshold of 0.85 will flag genuinely ambiguous situations (icon-only buttons, duplicate labels, significantly restructured layouts) while passing straightforward adaptations (moved elements, renamed classes, added containers). Teams typically see 5-15% of test steps flagged after a major redesign, with most requiring 30 seconds of review to confirm.

The alternative, failing open, has a hidden cost that is harder to measure: steady erosion of trust. When a tool silently heals and reports green, engineers stop looking at test results carefully. When they stop looking carefully, they miss the one time the healing was wrong. Failing closed keeps engineers in the loop for ambiguous situations while handling the clear cases automatically. The goal is not zero human involvement. The goal is human involvement only where it adds value.

Where the industry is heading

The term "agentic testing" barely registered on Google Trends twelve months ago. As of early 2025, it has measurable search volume and shows up in 10 out of 10 Google Autocomplete suggestions for related queries. This is not organic developer curiosity alone. Enterprise vendors are actively investing in the category.

UiPath, which built its business on robotic process automation (RPA), has been expanding into AI-driven test automation with agentic capabilities. Their pitch connects process automation agents to test automation agents, a natural extension of their platform. Mabl has been repositioning from "self-healing" to "intelligent test automation" with agent-based features. Perfecto (now part of Perforce) is integrating agentic capabilities into their mobile and web testing platform. Tricentis, the largest pure-play testing vendor, has been making acquisitions in the AI testing space.

Gartner's 2024 Hype Cycle for Software Testing placed AI-augmented testing in the "Slope of Enlightenment" phase, meaning the initial hype has faded and practical, production-viable implementations are emerging. Their recommendation: teams should evaluate agentic tools for new projects starting in 2025, rather than waiting for the technology to fully mature.

The developer adoption pattern is following a predictable path. Early adopters (2023-2024) were startups with small teams, no existing test infrastructure, and a willingness to try new tools. Early majority (2025-2026) will be mid-market companies with painful selector maintenance burdens and enough test suite complexity to justify switching. Late majority adoption will follow once agentic tools demonstrate reliability at enterprise scale, with SOC 2 evidence trails, audit logs, and integration with existing CI/CD platforms like Jenkins, CircleCI, and GitHub Actions.

The interesting competitive dynamic is that selector-based tools are not going away. Playwright, in particular, continues to improve and has a loyal community. The likely equilibrium is that Playwright becomes the "control" option for teams that want full determinism, while agentic tools handle the 80% of tests where maintenance cost outweighs the value of explicit selector control. Teams will use both, the same way they use unit tests and integration tests for different purposes.

Related terms