Your Playwright tests pass locally but fail in CI. Here are the 6 most common causes and the exact code to fix each one.
Playwright's auto-wait is good. It checks that an element is visible, enabled, and stable before interacting with it. But "stable" in Playwright's definition means the element's bounding box hasn't changed between two consecutive animation frames. That is not the same as "ready for interaction." Three categories of race conditions survive auto-wait.

First, CSS animations and transitions. A modal that slides in over 300ms is technically visible and stable midway through the animation, but clicking a button inside it while it is still moving can miss the target.

Second, debounced inputs. If your search field waits 500ms after the last keystroke before firing a request, Playwright will type and immediately try to assert results, long before the debounce fires.

Third, SPA route transitions. Client-side navigation with React Router, Next.js, or similar frameworks can update the URL before the destination component has mounted and rendered data.

The fix is different for each case. For animations, wait for the animation to complete using waitForFunction to check getComputedStyle or a specific CSS class. For debounced inputs, wait for the network request that the debounce triggers. For SPA transitions, wait for the actual content to appear on the destination page, not just the URL change. The code below shows a common flaky pattern and the fixed version for the first two cases; the SPA fix is the same content-based wait you would use for search results.
import { test, expect } from "@playwright/test";

// BAD: Clicking a button inside an animating modal
test("flaky modal interaction", async ({ page }) => {
  await page.goto("/dashboard");
  await page.getByRole("button", { name: "New project" }).click();
  // Modal is animating in - click may miss
  await page.getByRole("button", { name: "Create" }).click();
});

// FIXED: Wait for animation to complete
test("stable modal interaction", async ({ page }) => {
  await page.goto("/dashboard");
  await page.getByRole("button", { name: "New project" }).click();
  // Wait for the modal to finish its CSS transition
  const modal = page.getByRole("dialog");
  await modal.waitFor({ state: "visible" });
  await page.waitForFunction(() => {
    const el = document.querySelector('[role="dialog"]');
    if (!el) return false;
    return getComputedStyle(el).opacity === "1";
  });
  await modal.getByRole("button", { name: "Create" }).click();
});
// BAD: Asserting search results before debounce fires
test("flaky search", async ({ page }) => {
  await page.goto("/users");
  await page.getByPlaceholder("Search").fill("alice");
  // Debounce hasn't fired yet - results are stale
  await expect(page.getByText("[email protected]")).toBeVisible();
});

// FIXED: Wait for the search request to complete
test("stable search", async ({ page }) => {
  await page.goto("/users");
  const searchResponse = page.waitForResponse(
    (resp) => resp.url().includes("/api/users") && resp.status() === 200
  );
  await page.getByPlaceholder("Search").fill("alice");
  await searchResponse;
  await expect(page.getByText("[email protected]")).toBeVisible();
});

Playwright's auto-wait operates within a single frame. When your app embeds cross-origin iframes (Stripe Elements, OAuth consent screens, embedded analytics, chat widgets), the auto-wait boundary stops at the iframe edge. Your test needs to explicitly cross into the iframe context and wait for content inside it independently.

The most common flake pattern is calling frameLocator() and immediately trying to interact with an element inside the iframe before that iframe has finished loading its own content. The iframe element itself may be present in the parent DOM, but the content inside it is still fetching scripts, rendering components, or waiting on its own network requests.

Stripe Elements is the classic example. The iframe appears in the DOM almost instantly, but the card input field inside it takes 1 to 3 seconds to become interactive, depending on network speed and Stripe's CDN latency. In CI, this latency is often higher than on your local machine.

OAuth popups add another layer. Some providers (Google, GitHub) open a new browser context rather than an iframe. Playwright can handle this with page.waitForEvent('popup'), but the popup's content has its own load lifecycle that you need to wait for independently.

The fix for all iframe timing issues is the same pattern: locate the frame, then wait for a specific element inside it to reach the state you need before interacting.
import { test, expect } from "@playwright/test";

// BAD: Interacting with iframe content too early
test("flaky iframe interaction", async ({ page }) => {
  await page.goto("/checkout");
  const stripeFrame = page.frameLocator('iframe[src*="js.stripe.com"]');
  // iframe DOM exists but Stripe JS hasn't rendered the input yet
  await stripeFrame
    .locator('[placeholder="Card number"]')
    .fill("4242424242424242");
});

// FIXED: Wait for iframe content to be ready
test("stable iframe interaction", async ({ page }) => {
  await page.goto("/checkout");
  const stripeFrame = page.frameLocator('iframe[src*="js.stripe.com"]');
  // Wait for the input to be visible inside the iframe
  const cardInput = stripeFrame.locator('[placeholder="Card number"]');
  await expect(cardInput).toBeVisible({ timeout: 10000 });
  await cardInput.fill("4242424242424242");
});
// Handling OAuth popup windows
test("OAuth login via popup", async ({ page }) => {
  await page.goto("/login");
  // Listen for the popup before triggering it
  const popupPromise = page.waitForEvent("popup");
  await page.getByRole("button", { name: "Sign in with Google" }).click();
  const popup = await popupPromise;
  // Wait for the popup to finish loading
  await popup.waitForLoadState("domcontentloaded");
  await popup.getByLabel("Email").fill("[email protected]");
  await popup.getByRole("button", { name: "Next" }).click();
  // Popup closes after auth, control returns to main page
  await expect(page.getByText("Welcome back")).toBeVisible({ timeout: 15000 });
});

This is the flake that makes you question your career choices. Every test passes when run alone. Every test passes when run in a specific order. But run them all in parallel, and 3 out of 40 fail randomly.

The root cause is shared mutable state. Tests that read from or write to the same database rows, the same browser storage, or the same API resources will interfere with each other when execution order changes. Playwright runs test files in parallel by default (one worker per CPU core), and while the order within a file is sequential, the order across files is not deterministic.

Common patterns that cause this: test A creates a user with email [email protected], test B creates a user with the same email, and whichever runs second gets a unique constraint violation. Or test A deletes all items from a list, test B asserts the list has 3 items, and B fails because A ran first.

The fix is test isolation. Each test should create its own data and not depend on data created by other tests. Use unique identifiers (timestamps, UUIDs) for test entities. Use beforeEach to set up fresh state and afterEach to clean up. For tests that genuinely depend on a sequence (like a multi-step workflow), use test.describe.serial() to force sequential execution within that describe block.

Playwright's storageState feature is also useful for isolation. You can capture authenticated state in a setup step and reuse it across tests without them sharing a live session.
import { test, expect } from "@playwright/test";

// BAD: Tests share the same user data
test("create project", async ({ page }) => {
  await page.goto("/projects");
  await page.getByRole("button", { name: "New" }).click();
  // This name collides when tests run in parallel
  await page.getByLabel("Name").fill("My Project");
  await page.getByRole("button", { name: "Create" }).click();
});

// FIXED: Unique data per test run
test("create project (isolated)", async ({ page }) => {
  const uniqueName = `Project-${Date.now()}-${Math.random()
    .toString(36)
    .slice(2, 7)}`;
  await page.goto("/projects");
  await page.getByRole("button", { name: "New" }).click();
  await page.getByLabel("Name").fill(uniqueName);
  await page.getByRole("button", { name: "Create" }).click();
  await expect(page.getByText(uniqueName)).toBeVisible();
});
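If you generate names inline in every test, the pattern drifts. A small shared helper keeps it consistent; this is a sketch (the helper name is mine, not part of any Playwright API):

```typescript
// Sketch of a reusable unique-name helper for test data isolation.
// randomUUID is available from node:crypto in modern Node versions.
import { randomUUID } from "node:crypto";

export function uniqueName(prefix: string): string {
  // A UUID slice gives enough entropy to avoid collisions across
  // parallel workers and repeated CI runs.
  return `${prefix}-${randomUUID().slice(0, 8)}`;
}
```

Usage in the test above would then be `await page.getByLabel("Name").fill(uniqueName("Project"));`.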
// Sequential tests for dependent workflows
test.describe.serial("checkout workflow", () => {
  test("add item to cart", async ({ page }) => {
    await page.goto("/products/widget-1");
    await page.getByRole("button", { name: "Add to cart" }).click();
    await expect(page.getByTestId("cart-count")).toHaveText("1");
  });

  test("complete checkout", async ({ page }) => {
    await page.goto("/cart");
    await page.getByRole("button", { name: "Checkout" }).click();
    await expect(page).toHaveURL(/\/confirmation/);
  });
});

// Shared auth state without shared sessions
// In playwright.config.ts:
// { storageState: '.auth/user.json' }
// In auth.setup.ts:
test("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill("[email protected]");
  await page.getByLabel("Password").fill(process.env.CI_TEST_PASSWORD!);
  await page.getByRole("button", { name: "Log in" }).click();
  await page.waitForURL("/dashboard");
  // Save signed-in state for other tests to reuse
  await page.context().storageState({ path: ".auth/user.json" });
});

Your laptop has 16GB of RAM, an 8-core CPU, and a fast SSD. A standard GitHub Actions runner has 7GB of RAM, 2 vCPUs, and a network-attached disk. That difference alone explains why a test that takes 2 seconds locally takes 8 seconds in CI, and why an assertion with a 5-second timeout passes locally but fails in CI.

Beyond raw performance, there are specific environment differences that cause flakes. Headless Chromium in CI may render fonts differently if the system font set differs from your local machine. This affects screenshot comparisons and any test that depends on text measurement or layout calculations. Timezone differences can break date-related assertions. A test that asserts "Today" shows "March 29" will fail in a CI runner set to UTC if you wrote it in UTC-5.

Container memory limits are another silent killer. If your Playwright test spawns a browser that consumes more memory than the container allows, the Linux OOM killer will terminate the browser process mid-test. The error message is usually cryptic: "browser has been closed" or "target closed." You will spend hours looking for a bug in your test code when the real problem is a 512MB memory limit in your Docker config.

The Playwright config below shows settings tuned for CI stability. The key changes: reduce workers to limit memory pressure, increase timeouts to account for slower execution, enable retries as a safety net (with the caveat that retries mask flakes rather than fixing them), and set an explicit viewport and timezone.
// playwright.config.ts - CI-optimized settings
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  // Increase global timeout for slower CI runners
  timeout: 60_000,
  expect: {
    timeout: 10_000,
  },
  // Retries catch genuine flakes but mask root causes.
  // Use 2 in CI, 0 locally so you feel the pain.
  retries: process.env.CI ? 2 : 0,
  // Reduce workers in CI to lower memory pressure.
  // Default is 50% of CPU cores.
  workers: process.env.CI ? 1 : undefined,
  use: {
    headless: true,
    viewport: { width: 1280, height: 720 },
    // Consistent timezone across all environments
    timezoneId: "America/New_York",
    locale: "en-US",
    // Capture trace on first retry for debugging CI flakes
    trace: "on-first-retry",
    // Capture screenshot on failure
    screenshot: "only-on-failure",
    // Increase navigation timeout for slow CI networks
    navigationTimeout: 30_000,
    actionTimeout: 15_000,
  },
  projects: [
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"] },
    },
  ],
  // CI-specific web server config
  webServer: {
    command: "npm run start",
    port: 3000,
    // Give the server more time to start in CI
    timeout: 120_000,
    reuseExistingServer: !process.env.CI,
  },
});

When a test only flakes in CI and you cannot reproduce it locally, the trace viewer is your best diagnostic tool. A Playwright trace captures a complete recording of the test execution: every DOM snapshot, every network request, every console message, every action your test performed, with exact timing.

The recommended config is trace: 'on-first-retry'. Playwright runs the test normally on the first attempt (no trace overhead), and if it fails, the retry captures a full trace. This keeps your CI fast on green runs and gives you detailed diagnostics when something flakes.

Once the trace is captured, you need to get it off the CI runner. Traces are saved as .zip files in the test-results directory. Configure your CI to upload this directory as a build artifact. In GitHub Actions, use the actions/upload-artifact step. In other CI systems, configure equivalent artifact storage.

To analyze the trace, download the artifact and open it with npx playwright show-trace trace.zip. The trace viewer shows a timeline of every action, a DOM snapshot at each point, network requests with timing, and console logs. You can step through the test frame by frame and see exactly what the page looked like when the assertion failed.

The key insight traces reveal for CI-specific flakes is timing. You can see that a click happened 50ms before an element finished animating, or that a network request took 4 seconds in CI versus 200ms locally. This concrete timing data turns a "works on my machine" problem into a specific, fixable issue.
# .github/workflows/e2e.yml
name: E2E Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run Playwright tests
        run: npx playwright test
        env:
          CI: true
      # Upload traces and screenshots when tests fail
      - name: Upload test artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-traces
          path: test-results/
          retention-days: 7

# After downloading the artifact, analyze the trace:
# npx playwright show-trace test-results/my-test/trace.zip
#
# Or view traces in the HTML report:
# npx playwright show-report

Why do tests pass locally but fail in CI? Three main reasons. First, CI runners have less CPU and memory, so operations take longer and timeout-sensitive assertions fail. Second, headless mode can render differently, causing layout-dependent locators to miss. Third, network latency to external services is higher and more variable in CI than on localhost.
Should you just enable retries? Retries are a reasonable safety net, not a fix. Setting retries: 2 in your Playwright config will get your CI green, but every retried test is a flake you are choosing to ignore. Use retries to unblock the team while you investigate, but track retry rate as a metric and fix the root causes. If your retry rate climbs above 5%, you have a test quality problem.
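To make "track retry rate as a metric" concrete, here is a sketch. The TestOutcome shape is my simplification, not Playwright's actual JSON reporter schema; in practice you would derive one entry per test from the reporter's per-test result list:

```typescript
// Sketch: treat retry rate as a CI health metric.
// TestOutcome is a simplified view: one entry per test, recording how
// many attempts it took and whether it ultimately passed.
type TestOutcome = { title: string; attempts: number; passed: boolean };

// A test that needed more than one attempt but eventually passed is a
// flake the team is currently tolerating.
export function retryRate(outcomes: TestOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const flaky = outcomes.filter((o) => o.attempts > 1 && o.passed).length;
  return flaky / outcomes.length;
}

// Example policy: flag the suite when flakes exceed the 5% threshold
// mentioned above.
export function retryBudgetExceeded(outcomes: TestOutcome[]): boolean {
  return retryRate(outcomes) > 0.05;
}
```

Run this over your nightly results and chart the number; the trend matters more than any single run.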
How do you debug a flake you cannot reproduce locally? Enable trace capture with trace: 'on-first-retry' in your Playwright config. When a test flakes, the retry records a full trace with DOM snapshots, network requests, and timing data. Upload the test-results directory as a CI artifact, download it, and open it with npx playwright show-trace. The trace shows you exactly what the page looked like at the moment of failure.
Does Playwright detect flaky tests automatically? Not natively. Playwright has retries, which re-run failed tests, but it does not track flake history or classify failures as flaky versus genuine. You can build basic detection by parsing the JSON reporter output over multiple runs. For automatic classification, tools like Zerocheck track per-test flake rates across runs and auto-classify failures.
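The "parse reporter output over multiple runs" approach boils down to a per-test classification rule. A minimal sketch, assuming you have already extracted each test's pass/fail history keyed by title:

```typescript
// Sketch: classify a test from its pass/fail history across CI runs
// on the same code. "Flaky" means it both passed and failed; a test
// that never passes is genuinely broken, not flaky.
type RunResult = "passed" | "failed";

export function classify(
  history: RunResult[]
): "stable" | "flaky" | "broken" {
  const passes = history.filter((r) => r === "passed").length;
  if (passes === history.length) return "stable";
  if (passes === 0) return "broken";
  return "flaky"; // mixed results on identical code
}
```

Feed it histories built from several reporter JSON files (one per run), and surface the "flaky" bucket to the team before it grows.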
When should you give up on a flaky test? If you have spent more than 4 hours debugging a single flake without progress, step back and ask whether the test covers something critical. If it does, rewrite it from scratch with better isolation and explicit waits. If it does not, delete it. A flaky test that nobody trusts is worse than no test, because it trains the team to ignore CI failures.
Skip the setup. Zerocheck handles it in plain English.
See it run on your app