How to fix flaky Playwright tests in CI

    Your Playwright tests pass locally but fail in CI. Here are the most common causes and the exact code to fix each one.

    Why this is hard to test

    • CI runners have slower CPUs and less memory than your laptop, so timing-sensitive assertions fail under load
    • Headless mode renders differently than headed browsers, causing layout-dependent locators to miss elements
    • Network requests to external services (APIs, CDNs, auth providers) add variable latency that localhost never sees
    • Playwright's auto-wait handles visibility and enabled state, but not animations, debounced inputs, or SPA route transitions
    • Shared test data and parallel execution create ordering dependencies that only surface when tests run together

    Approach 1: Debug and fix each flake

    1. Enable trace capture on first retry to collect actionability logs, DOM snapshots, and network timelines
    2. Run the failing test 50 times locally with --repeat-each=50 to reproduce intermittent failures
    3. Classify each flake: race condition, data dependency, environment difference, or external service timeout
    4. Replace implicit waits with explicit conditions: waitForLoadState, waitForFunction, or expect().toBeVisible()
    5. Isolate test data per worker using unique IDs or beforeEach cleanup to eliminate ordering dependencies
    6. Pin CI config: set workers to 1 for initial stability, then increase once flakes are resolved
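    Steps 1 and 2 can also be driven from the config rather than the CLI. A sketch of a temporary flake-hunting configuration, using Playwright's standard repeatEach, workers, retries, and trace options:

```typescript
// playwright.config.ts - temporary flake-hunting settings (run locally)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  repeatEach: 50, // step 2: run every test 50 times to surface flakes
  workers: 1,     // remove parallel interference while diagnosing
  retries: 0,     // fail loudly; retries would hide the flake
  use: {
    trace: "on",  // step 1: capture a trace for every run while debugging
  },
});
```

    Revert these settings once the flake is diagnosed; they exist to make intermittent failures reproducible, not to run in CI.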

    Approach 2: Skip selectors entirely with Zerocheck

    1. Zerocheck interacts with your app visually, the way a real user does, so there are no selectors to go stale
    2. No frameLocator chains, no CSS selectors, no data-testid attributes to maintain
    3. Auto-wait is based on visual readiness, not DOM state, which eliminates animation and transition flakes
    4. Tests run on managed infrastructure with consistent resources, removing the CI runner lottery

    Race conditions: when auto-wait isn't enough

    Playwright's auto-wait is good. It checks that an element is visible, enabled, and stable before interacting with it. But "stable" in Playwright's definition means the element's bounding box hasn't changed between two animation frames. That is not the same as "ready for interaction."

    Three categories of race conditions survive auto-wait. First, CSS animations and transitions: a modal that slides in over 300ms is technically visible and stable midway through the animation, but clicking a button inside it while it is still moving can miss the target. Second, debounced inputs: if your search field waits 500ms after the last keystroke before firing a request, Playwright will type and immediately try to assert results, long before the debounce fires. Third, SPA route transitions: client-side navigation with React Router, Next.js, or similar frameworks can update the URL before the destination component has mounted and rendered data.

    The fix is different for each case. For animations, wait for the animation to complete with a waitForFunction that checks getComputedStyle or a specific CSS class. For debounced inputs, wait for the network request that the debounce triggers. For SPA transitions, wait for the actual content to appear on the destination page, not just the URL change. The code below shows common flaky patterns and their fixed versions.

    import { test, expect } from "@playwright/test";
    
    // BAD: Clicking a button inside an animating modal
    test("flaky modal interaction", async ({ page }) => {
      await page.goto("/dashboard");
      await page.getByRole("button", { name: "New project" }).click();
      // Modal is animating in - click may miss
      await page.getByRole("button", { name: "Create" }).click();
    });
    
    // FIXED: Wait for animation to complete
    test("stable modal interaction", async ({ page }) => {
      await page.goto("/dashboard");
      await page.getByRole("button", { name: "New project" }).click();
    
      // Wait for the modal to finish its CSS transition
      const modal = page.getByRole("dialog");
      await modal.waitFor({ state: "visible" });
      await page.waitForFunction(() => {
        const el = document.querySelector('[role="dialog"]');
        if (!el) return false;
        return getComputedStyle(el).opacity === "1";
      });
    
      await modal.getByRole("button", { name: "Create" }).click();
    });
    
    // BAD: Asserting search results before debounce fires
    test("flaky search", async ({ page }) => {
      await page.goto("/users");
      await page.getByPlaceholder("Search").fill("alice");
      // Debounce hasn't fired yet - results are stale
      await expect(page.getByText("alice@example.com")).toBeVisible();
    });
    
    // FIXED: Wait for the search request to complete
    test("stable search", async ({ page }) => {
      await page.goto("/users");
      const searchResponse = page.waitForResponse(
        (resp) => resp.url().includes("/api/users") && resp.status() === 200
      );
      await page.getByPlaceholder("Search").fill("alice");
      await searchResponse;
      await expect(page.getByText("alice@example.com")).toBeVisible();
    });
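    For the third category, SPA route transitions, the same discipline applies: anchor on destination content, never on the URL. A sketch, assuming a hypothetical /orders route whose table renders after a data fetch:

```typescript
import { test, expect } from "@playwright/test";

// BAD: The URL updates before the destination component renders its data
test("flaky SPA navigation", async ({ page }) => {
  await page.goto("/dashboard");
  await page.getByRole("link", { name: "Orders" }).click();
  await expect(page).toHaveURL(/\/orders/);
  // count() does not auto-retry - it reads whatever is in the DOM right
  // now, which may still be an empty loading skeleton with zero rows
  const rows = await page.getByRole("row").count();
  expect(rows).toBeGreaterThan(0);
});

// FIXED: Wait for real content on the destination page, not the URL
test("stable SPA navigation", async ({ page }) => {
  await page.goto("/dashboard");
  await page.getByRole("link", { name: "Orders" }).click();
  // toBeVisible auto-retries until the first data row actually renders
  await expect(page.getByRole("row").first()).toBeVisible({ timeout: 10_000 });
  const rows = await page.getByRole("row").count();
  expect(rows).toBeGreaterThan(0);
});
```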

    Iframe and cross-origin timing

    Playwright's auto-wait operates within a single frame. When your app embeds cross-origin iframes (Stripe Elements, OAuth consent screens, embedded analytics, chat widgets), the auto-wait boundary stops at the iframe edge. Your test needs to explicitly cross into the iframe context and wait for content inside it independently.

    The most common flake pattern is calling frameLocator() and immediately trying to interact with an element inside the iframe before that iframe has finished loading its own content. The iframe element itself may be present in the parent DOM, but the content inside it is still fetching scripts, rendering components, or waiting on its own network requests. Stripe Elements is the classic example: the iframe appears in the DOM almost instantly, but the card input field inside it takes 1 to 3 seconds to become interactive, depending on network speed and Stripe's CDN latency. In CI, this latency is often higher than on your local machine.

    OAuth popups add another layer. Some providers (Google, GitHub) open a new browser context rather than an iframe. Playwright can handle this with page.waitForEvent('popup'), but the popup's content has its own load lifecycle that you need to wait for independently.

    The fix for all iframe timing issues is the same pattern: locate the frame, then wait for a specific element inside it to reach the state you need before interacting.

    import { test, expect } from "@playwright/test";
    
    // BAD: Interacting with iframe content too early
    test("flaky iframe interaction", async ({ page }) => {
      await page.goto("/checkout");
      const stripeFrame = page.frameLocator(
        'iframe[src*="js.stripe.com"]'
      );
      // iframe DOM exists but Stripe JS hasn't rendered the input yet
      await stripeFrame
        .locator('[placeholder="Card number"]')
        .fill("4242424242424242");
    });
    
    // FIXED: Wait for iframe content to be ready
    test("stable iframe interaction", async ({ page }) => {
      await page.goto("/checkout");
      const stripeFrame = page.frameLocator(
        'iframe[src*="js.stripe.com"]'
      );
    
      // Wait for the input to be visible inside the iframe
      const cardInput = stripeFrame.locator('[placeholder="Card number"]');
      await expect(cardInput).toBeVisible({ timeout: 10000 });
      await cardInput.fill("4242424242424242");
    });
    
    // Handling OAuth popup windows
    test("OAuth login via popup", async ({ page }) => {
      await page.goto("/login");
    
      // Listen for the popup before triggering it
      const popupPromise = page.waitForEvent("popup");
      await page.getByRole("button", { name: "Sign in with Google" }).click();
      const popup = await popupPromise;
    
      // Wait for the popup to finish loading
      await popup.waitForLoadState("domcontentloaded");
      await popup.getByLabel("Email").fill("test-user@example.com");
      await popup.getByRole("button", { name: "Next" }).click();
    
      // Popup closes after auth, control returns to main page
      await expect(page.getByText("Welcome back")).toBeVisible({
        timeout: 15000,
      });
    });

    Shared test data and ordering dependencies

    This is the flake that makes you question your career choices. Every test passes when run alone. Every test passes when run in a specific order. But run them all in parallel, and 3 out of 40 fail randomly.

    The root cause is shared mutable state. Tests that read from or write to the same database rows, the same browser storage, or the same API resources will interfere with each other when execution order changes. Playwright runs test files in parallel by default (with workers equal to half your CPU cores), and while the order within a file is sequential, the order across files is not deterministic. Common patterns that cause this: test A creates a user with email alice@example.com, test B creates a user with the same email, and whichever runs second gets a unique constraint violation. Or test A deletes all items from a list, test B asserts the list has 3 items, and B fails because A ran first.

    The fix is test isolation. Each test should create its own data and not depend on data created by other tests. Use unique identifiers (timestamps, UUIDs) for test entities. Use beforeEach to set up fresh state and afterEach to clean up. For tests that genuinely depend on a sequence (like a multi-step workflow), use test.describe.serial() to force sequential execution within that describe block.

    Playwright's storageState feature is also useful for isolation: you can capture authenticated state in a setup step and reuse it across tests without them sharing a live session.

    import { test, expect } from "@playwright/test";
    
    // BAD: Tests share the same user data
    test("create project", async ({ page }) => {
      await page.goto("/projects");
      await page.getByRole("button", { name: "New" }).click();
      // This name collides when tests run in parallel
      await page.getByLabel("Name").fill("My Project");
      await page.getByRole("button", { name: "Create" }).click();
    });
    
    // FIXED: Unique data per test run
    test("create project (isolated)", async ({ page }) => {
      const uniqueName = `Project-${Date.now()}-${Math.random()
        .toString(36)
        .slice(2, 7)}`;
    
      await page.goto("/projects");
      await page.getByRole("button", { name: "New" }).click();
      await page.getByLabel("Name").fill(uniqueName);
      await page.getByRole("button", { name: "Create" }).click();
      await expect(page.getByText(uniqueName)).toBeVisible();
    });
    
    // Sequential tests for dependent workflows
    test.describe.serial("checkout workflow", () => {
      test("add item to cart", async ({ page }) => {
        await page.goto("/products/widget-1");
        await page.getByRole("button", { name: "Add to cart" }).click();
        await expect(page.getByTestId("cart-count")).toHaveText("1");
      });
    
      test("complete checkout", async ({ page }) => {
        await page.goto("/cart");
        await page.getByRole("button", { name: "Checkout" }).click();
        await expect(page).toHaveURL(/\/confirmation/);
      });
    });
    
    // Shared auth state without shared sessions
    // In playwright.config.ts:
    // { storageState: '.auth/user.json' }
    // In auth.setup.ts:
    test("authenticate", async ({ page }) => {
      await page.goto("/login");
      await page.getByLabel("Email").fill("ci-user@example.com");
      await page.getByLabel("Password").fill(process.env.CI_TEST_PASSWORD!);
      await page.getByRole("button", { name: "Log in" }).click();
      await page.waitForURL("/dashboard");
      // Save signed-in state for other tests to reuse
      await page.context().storageState({ path: ".auth/user.json" });
    });
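    The unique-name idea above can be factored into a small helper. This is a hypothetical utility, not a Playwright API; it leans on the TEST_WORKER_INDEX environment variable, which Playwright sets for each worker process:

```typescript
// Hypothetical helper: per-worker unique names for test entities.
// TEST_WORKER_INDEX is set by Playwright in each worker process; outside
// a Playwright run it falls back to "0".
function uniqueName(prefix: string): string {
  const worker = process.env.TEST_WORKER_INDEX ?? "0";
  const rand = Math.random().toString(36).slice(2, 7);
  return `${prefix}-w${worker}-${Date.now()}-${rand}`;
}

// Usage inside a test:
//   const name = uniqueName("Project");
//   await page.getByLabel("Name").fill(name);
```

    Because the worker index, a timestamp, and a random suffix all feed the name, two tests can never collide on the same entity, even across parallel workers.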

    CI environment differences

    Your laptop has 16GB of RAM, an 8-core CPU, and a fast SSD. A standard GitHub Actions runner may have as little as 2 vCPUs, 7GB of RAM, and a network-attached disk. That difference alone explains why a test that takes 2 seconds locally takes 8 seconds in CI, and why an assertion with a 5-second timeout passes locally but fails in CI.

    Beyond raw performance, specific environment differences cause flakes. Headless Chromium in CI may render fonts differently if the system font set differs from your local machine, which affects screenshot comparisons and any test that depends on text measurement or layout calculations. Timezone differences can break date-related assertions: a test that asserts "Today" shows "March 29" will fail in a CI runner set to UTC if you wrote it in UTC-5.

    Container memory limits are another silent killer. If your Playwright test spawns a browser that consumes more memory than the container allows, the Linux OOM killer will terminate the browser process mid-test. The error message is usually cryptic: "browser has been closed" or "target closed." You will spend hours looking for a bug in your test code when the real problem is a 512MB memory limit in your Docker config.

    The Playwright config below shows settings tuned for CI stability. The key changes: reduce workers to limit memory pressure, increase timeouts to account for slower execution, enable retries as a safety net (with the caveat that retries mask flakes rather than fixing them), and set an explicit viewport and timezone.
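    The timezone pitfall is easy to demonstrate outside the browser: the same instant maps to different calendar dates in different zones. A small Node sketch using Intl.DateTimeFormat (the instant and zones here are illustrative):

```typescript
// One instant, two calendar dates depending on the zone it is viewed in.
const instant = new Date("2024-03-30T02:00:00Z");

function dayLabel(timeZone: string): string {
  return new Intl.DateTimeFormat("en-US", {
    timeZone,
    month: "long",
    day: "numeric",
  }).format(instant);
}

// In UTC this instant is March 30; in New York (UTC-4 during DST)
// it is still the evening of March 29.
console.log(dayLabel("UTC"));              // March 30
console.log(dayLabel("America/New_York")); // March 29
```

    Any assertion that formats "today" without pinning a timezone inherits whatever zone the runner happens to use, which is why the config below sets timezoneId explicitly.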

    // playwright.config.ts - CI-optimized settings
    import { defineConfig, devices } from "@playwright/test";
    
    export default defineConfig({
      // Increase global timeout for slower CI runners
      timeout: 60_000,
      expect: {
        timeout: 10_000,
      },
    
      // Retries catch genuine flakes but mask root causes.
      // Use 2 in CI, 0 locally so you feel the pain.
      retries: process.env.CI ? 2 : 0,
    
      // Reduce workers in CI to lower memory pressure.
      // Default is 50% of CPU cores.
      workers: process.env.CI ? 1 : undefined,
    
      use: {
        headless: true,
        viewport: { width: 1280, height: 720 },
    
        // Consistent timezone across all environments
        timezoneId: "America/New_York",
        locale: "en-US",
    
        // Capture trace on first retry for debugging CI flakes
        trace: "on-first-retry",
    
        // Capture screenshot on failure
        screenshot: "only-on-failure",
    
        // Increase navigation timeout for slow CI networks
        navigationTimeout: 30_000,
        actionTimeout: 15_000,
      },
    
      projects: [
        {
          name: "chromium",
          use: { ...devices["Desktop Chrome"] },
        },
      ],
    
      // CI-specific web server config
      webServer: {
        command: "npm run start",
        port: 3000,
        // Give the server more time to start in CI
        timeout: 120_000,
        reuseExistingServer: !process.env.CI,
      },
    });

    The nuclear option: Playwright trace viewer for CI

    When a test only flakes in CI and you cannot reproduce it locally, the trace viewer is your best diagnostic tool. A Playwright trace captures a complete recording of the test execution: every DOM snapshot, every network request, every console message, and every action your test performed with exact timing.

    The recommended config is trace: 'on-first-retry'. Playwright runs the test normally on the first attempt (no trace overhead), and if it fails, the retry captures a full trace. This keeps your CI fast on green runs and gives you detailed diagnostics when something flakes.

    Once the trace is captured, you need to get it off the CI runner. Traces are saved as .zip files in the test-results directory. Configure your CI to upload this directory as a build artifact: in GitHub Actions, use the actions/upload-artifact step; in other CI systems, configure equivalent artifact storage.

    To analyze the trace, download the artifact and open it with npx playwright show-trace trace.zip. The trace viewer shows a timeline of every action, a DOM snapshot at each point, network requests with timing, and console logs. You can step through the test frame by frame and see exactly what the page looked like when the assertion failed.

    The key insight traces reveal for CI-specific flakes is timing. You can see that a click happened 50ms before an element finished animating, or that a network request took 4 seconds in CI versus 200ms locally. This concrete timing data turns a "works on my machine" problem into a specific, fixable issue.

    # .github/workflows/e2e.yml
    name: E2E Tests
    on: [push, pull_request]
    
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
    
          - name: Install dependencies
            run: npm ci
    
          - name: Install Playwright browsers
            run: npx playwright install --with-deps chromium
    
          - name: Run Playwright tests
            run: npx playwright test
            env:
              CI: true
    
          # Upload traces and screenshots when tests fail
          - name: Upload test artifacts
            if: failure()
            uses: actions/upload-artifact@v4
            with:
              name: playwright-traces
              path: test-results/
              retention-days: 7
    
    # After downloading the artifact, analyze the trace:
    # npx playwright show-trace test-results/my-test/trace.zip
    #
    # Or view traces in the HTML report:
    # npx playwright show-report

    Common pitfalls

    • Don't add retries as a first response to flakes. Retries are a safety net, not a fix. A test that needs 3 retries to pass is a test that will eventually fail at the worst possible time.
    • Don't use page.waitForTimeout() (hard-coded sleeps) to fix timing issues. It makes tests slow and still flaky because the timing is a guess. Wait for a specific condition instead.
    • Don't disable parallelism permanently. Setting workers: 1 stops data collision flakes but quadruples your CI time. Fix the shared data problem, then re-enable parallelism.
    • Don't ignore the first flake. The first time a test flakes is the easiest time to diagnose it. By the time it has flaked 20 times, everyone has learned to ignore it, and a real regression will hide behind it.
    • Don't blame Playwright. Most flakes come from test design (shared state, missing waits) or app behavior (animations, race conditions), not from the test framework itself.
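    The "wait for a specific condition" advice also applies outside Playwright actions, for example when test setup code polls an API for a resource to become ready. Inside the browser you would use waitForFunction; for plain Node setup code, a minimal generic polling helper (entirely hypothetical) looks like this:

```typescript
// Poll an async condition until it passes or a deadline is hit.
// Unlike a fixed sleep, this returns as soon as the condition is true
// and fails loudly with a clear error when it never becomes true.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5_000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Usage sketch: wait for a seeded test user to exist before tests run.
// await pollUntil(async () => (await fetch("/api/users/alice")).ok, 10_000);
```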

    FAQ

    Why do Playwright tests pass locally but fail in CI?

    Three main reasons. First, CI runners have less CPU and memory, so operations take longer and timeout-sensitive assertions fail. Second, headless mode can render differently, causing layout-dependent locators to miss. Third, network latency to external services is higher and more variable in CI than on localhost.

    Should I use retries to handle flaky tests?

    Retries are a reasonable safety net, not a fix. Setting retries: 2 in your Playwright config will get your CI green, but every retried test is a flake you are choosing to ignore. Use retries to unblock the team while you investigate, but track retry rate as a metric and fix the root causes. If your retry rate climbs above 5%, you have a test quality problem.

    How do I debug a Playwright test that only flakes in CI?

    Enable trace capture with trace: 'on-first-retry' in your Playwright config. When a test flakes, the retry records a full trace with DOM snapshots, network requests, and timing data. Upload the test-results directory as a CI artifact, download it, and open with npx playwright show-trace. The trace shows you exactly what the page looked like at the moment of failure.

    Does Playwright have built-in flaky test detection?

    Not natively. Playwright has retries, which re-run failed tests, but it does not track flake history or classify failures as flaky vs genuine. You can build basic detection by parsing the JSON reporter output over multiple runs. For automatic classification, tools like Zerocheck track per-test flake rates across runs and auto-classify failures.
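    The basic detection idea can be sketched in a few lines. The result shape below is a simplified assumption loosely mirroring the status and retry fields in Playwright's JSON reporter output, not the full schema:

```typescript
// Simplified shape of one attempt of one test (assumed, not the full
// Playwright JSON reporter schema).
interface TestOutcome {
  status: "passed" | "failed";
  retry: number; // 0 for the first attempt, 1+ for retries
}

// A test is "flaky" when it failed at least once but eventually passed.
function isFlaky(attempts: TestOutcome[]): boolean {
  const failedOnce = attempts.some((a) => a.status === "failed");
  const finallyPassed = attempts[attempts.length - 1]?.status === "passed";
  return failedOnce && finallyPassed;
}

// Fraction of tests in a run that were flaky (each inner array is one
// test's sequence of attempts).
function flakeRate(run: TestOutcome[][]): number {
  if (run.length === 0) return 0;
  return run.filter(isFlaky).length / run.length;
}
```

    Feeding this per-run data into a small store over time gives you the retry-rate metric mentioned above.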

    When should I give up on fixing a flaky test?

    If you have spent more than 4 hours debugging a single flake without progress, step back and ask whether the test is covering something critical. If it is, rewrite the test from scratch with better isolation and explicit waits. If it is not, delete it. A flaky test that nobody trusts is worse than no test, because it trains the team to ignore CI failures.


    Skip the setup. Zerocheck handles it in plain English.

    See it run on your app