E2E Testing in CI/CD: The Complete Setup Guide

From GitHub Actions to GitLab CI, here's how to run E2E tests on every PR without slowing down your pipeline.

Why this is hard to test

  • E2E tests are slow - a full suite takes 20-45 minutes, blocking deploys and frustrating engineers
  • Browser tests in CI need headless browsers, display servers, and specific OS-level dependencies
  • Flaky tests in CI erode trust - 84% of CI failures are flaky, not real bugs
  • Parallelization is hard to configure correctly across shards, workers, and CI runners
  • Auth and test data management differ between local and CI environments in ways that cause silent failures

GitHub Actions + Playwright setup

  1. Add a playwright.yml workflow triggered on pull_request events
  2. Install browsers with npx playwright install --with-deps for headless CI compatibility
  3. Configure sharding for parallelization using --shard=1/4 across multiple CI jobs
  4. Use Playwright's built-in retry for flakiness with --retries=1 (one retry, not infinite)
  5. Store auth state with storageState to skip login in every test and save minutes per run
  6. Upload trace files and screenshots as artifacts on failure for debugging without re-running
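
Steps 4 to 6 can also live in the Playwright config rather than on the command line. The sketch below is one possible playwright.config.ts, not a canonical setup: it assumes the @playwright/test package, a hypothetical setup test (for example tests/auth.setup.ts) that logs in once and writes playwright/.auth/user.json, and a BASE_URL environment variable supplied by CI.

```typescript
// playwright.config.ts (a sketch; file paths and project names are assumptions)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  // One retry in CI only, so local runs surface flakiness immediately.
  retries: process.env.CI ? 1 : 0,
  // Fail the CI run if a stray test.only was committed.
  forbidOnly: !!process.env.CI,
  // Write an HTML report to playwright-report/ without trying to open it.
  reporter: [['html', { open: 'never' }]],
  use: {
    baseURL: process.env.BASE_URL,
    // Record a trace only when a test is retried, and a screenshot only on failure.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  projects: [
    // Hypothetical setup project: logs in once and saves the session to disk.
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
        // Reuse the saved session instead of logging in inside every test.
        storageState: 'playwright/.auth/user.json',
      },
      dependencies: ['setup'],
    },
  ],
});
```

A --retries value passed on the command line overrides the retries setting in the config. Traces and screenshots land in test-results/ and the HTML report in playwright-report/ by default, which are the directories a CI artifact step would upload.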

Zerocheck CI integration

  1. Install the Zerocheck GitHub App with no workflow YAML
  2. Tests run automatically on every PR - no CI configuration or browser setup required
  3. PR-diff-aware: generates test suggestions from your PR diff
  4. Results posted as PR comments with screenshots and traces - no artifact digging
  5. JSON run evidence attached to executed browser runs

GitHub Actions configuration

This workflow runs Playwright E2E tests on every pull request. It uses sharding to split the suite across 4 parallel jobs, caches browser binaries to avoid re-downloading on every run, and uploads traces and screenshots as artifacts when tests fail. The key details: the shardIndex matrix creates 4 jobs that each run a quarter of the suite, --retries=1 gives each test one retry to handle transient flakiness, and the artifacts step only triggers on failure to keep storage costs low.

name: Playwright E2E Tests

on:
  pull_request:
    branches: [main]

jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      - name: Run E2E tests
        run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }} --retries=1
        env:
          BASE_URL: ${{ vars.STAGING_URL }}

      - name: Upload test artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report-${{ matrix.shardIndex }}
          path: |
            playwright-report/
            test-results/
          retention-days: 7

GitLab CI configuration

This GitLab CI configuration achieves the same result: parallel E2E tests on merge requests with artifact upload on failure. The parallel keyword splits the job into 4 instances automatically (GitLab assigns CI_NODE_INDEX and CI_NODE_TOTAL), and Playwright's --shard flag uses those variables. Browser caching uses GitLab's cache mechanism keyed on the lock file. Artifacts are configured with when: on_failure so they only upload when tests fail, and expire_in: 7 days keeps storage costs manageable.

e2e-tests:
  stage: test
  image: mcr.microsoft.com/playwright:v1.49.0-noble
  parallel: 4
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
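  variables:
    BASE_URL: $STAGING_URL
    # GitLab only caches paths inside the project directory, so keep browsers there.
    PLAYWRIGHT_BROWSERS_PATH: $CI_PROJECT_DIR/.cache/ms-playwright
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
      - .cache/ms-playwright/
  script:
    # npm ci deletes node_modules, so cache npm's download cache instead.
    - npm ci --cache .npm --prefer-offline
    - npx playwright install --with-deps chromium
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL --retries=1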
  artifacts:
    when: on_failure
    paths:
      - playwright-report/
      - test-results/
    expire_in: 7 days
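
To make the mapping between GitLab's variables and Playwright's flag concrete, here is a minimal sketch of a helper that builds the --shard argument from an environment object. The function name shardArg is hypothetical, and the behavior when CI_NODE_INDEX is unset (the job does not use parallel, so the whole suite runs) is an assumption about how you would want a wrapper script to behave.

```typescript
// Builds Playwright's --shard argument from GitLab's parallel-job variables.
// CI_NODE_INDEX is 1-based, which matches Playwright's 1-based shard index.
function shardArg(env: Record<string, string | undefined>): string | null {
  // Without both variables there is nothing to shard: run the whole suite.
  if (env.CI_NODE_INDEX === undefined || env.CI_NODE_TOTAL === undefined) {
    return null;
  }
  const index = Number(env.CI_NODE_INDEX);
  const total = Number(env.CI_NODE_TOTAL);
  const valid =
    Number.isInteger(index) && Number.isInteger(total) && index >= 1 && index <= total;
  if (!valid) {
    throw new Error(`Invalid shard ${env.CI_NODE_INDEX}/${env.CI_NODE_TOTAL}`);
  }
  return `--shard=${index}/${total}`;
}
```

In a Node script you would call shardArg(process.env) and append the result to the playwright test command when it is not null. Validating the pair up front turns a misconfigured parallel setting into an immediate error instead of a shard that silently runs nothing.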

Common pitfalls

  • Don't run the full suite in a single job on every PR - use sharding or test selection to keep feedback under 10 minutes
  • Don't ignore flaky tests - quarantine them with a tracking issue and an owner assigned
  • Don't hard-code secrets in workflow files - use CI environment variables or secret managers
  • Don't skip browser installation caching - it saves 2-3 minutes per run and adds up fast
  • Don't run tests against production - use staging or preview deployments to avoid polluting real data
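
For the secrets pitfall, one common pattern is to read every required value from the environment in one place and fail fast when something is missing, so a forgotten CI variable produces a clear error rather than tests running against an undefined URL. The sketch below illustrates the idea; readCiConfig, TEST_USER_EMAIL, and TEST_USER_PASSWORD are hypothetical names, and only BASE_URL appears in the configurations in this guide.

```typescript
// Reads required settings from an environment object and fails fast
// when any are missing or blank, listing all of them in one error.
function readCiConfig(env: Record<string, string | undefined>) {
  const missing: string[] = [];
  const get = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === '') {
      missing.push(name);
      return '';
    }
    return value;
  };
  const config = {
    baseUrl: get('BASE_URL'),
    // Hypothetical credential names; use whatever your CI secret store defines.
    userEmail: get('TEST_USER_EMAIL'),
    userPassword: get('TEST_USER_PASSWORD'),
  };
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return config;
}
```

Calling readCiConfig(process.env) from a global setup step means local and CI runs share the same validation, which helps catch the environment differences that otherwise show up as confusing test failures.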

FAQ

How long should E2E tests take in CI?

Under 10 minutes for PR-level feedback. If your full suite takes 30+ minutes, use sharding to split it across parallel jobs: with evenly balanced shards, 4 shards cut a 40-minute suite to roughly 10 minutes of test time per job, plus each job's setup overhead. For even faster feedback, use PR-diff-aware test generation to create targeted tests from your PR diff.
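
The sharding arithmetic is simple enough to sketch. The helpers below assume the suite splits evenly across shards and that every job pays the same fixed setup cost (checkout, npm ci, browser install); both are simplifications, and the function names are hypothetical.

```typescript
// Estimated wall-clock minutes per job when a suite is split evenly across shards.
function estimateShardedMinutes(suiteMinutes: number, shards: number, setupMinutes = 0): number {
  if (!Number.isInteger(shards) || shards < 1) {
    throw new RangeError('shards must be a positive integer');
  }
  return suiteMinutes / shards + setupMinutes;
}

// Smallest shard count that keeps each job at or under the target time.
function shardsNeeded(suiteMinutes: number, targetMinutes: number, setupMinutes = 0): number {
  const budget = targetMinutes - setupMinutes;
  if (budget <= 0) {
    throw new RangeError('target must exceed per-job setup time');
  }
  return Math.ceil(suiteMinutes / budget);
}

console.log(estimateShardedMinutes(40, 4));    // 10
console.log(estimateShardedMinutes(40, 4, 2)); // 12
console.log(shardsNeeded(40, 10, 2));          // 5
```

The last line shows why setup overhead matters: with 2 minutes of setup per job, a 40-minute suite needs 5 shards, not 4, to stay under a 10-minute target.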

Should E2E tests block merges?

Yes, but only if your suite is reliable. A suite with less than 5% flake rate should be a required check on PRs. If flakiness is higher, fix the flaky tests first - a gate that everyone ignores or overrides is worse than no gate at all.

How do you handle flaky tests in CI?

Use Playwright's --retries=1 flag to give each test one retry. Track tests that only pass on retry - those are your flaky tests. Quarantine any test above 5% flake rate: remove it from the required gate, assign an owner, and set a 2-sprint deadline to fix or delete it.
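
The quarantine rule can be expressed as a small calculation over run history. This is a sketch under stated assumptions: the TestHistory shape is hypothetical (you would populate it from your own CI results), and a run counts as flaky when the test failed first and then passed on retry.

```typescript
// Outcome of one test in one CI run. 'flaky' means it failed, then passed on retry.
type Outcome = 'passed' | 'flaky' | 'failed';

// Hypothetical record of a test's recent outcomes, oldest to newest.
interface TestHistory {
  title: string;
  outcomes: Outcome[];
}

// Fraction of runs in which the test only passed on retry.
function flakeRate(outcomes: Outcome[]): number {
  if (outcomes.length === 0) return 0;
  const flakyRuns = outcomes.filter((o) => o === 'flaky').length;
  return flakyRuns / outcomes.length;
}

// Titles of tests whose flake rate is strictly above the threshold (5% by default).
function testsToQuarantine(history: TestHistory[], threshold = 0.05): string[] {
  return history
    .filter((t) => flakeRate(t.outcomes) > threshold)
    .map((t) => t.title);
}
```

With 20 recorded runs per test, a test that passed on retry twice has a 10% flake rate and would be quarantined, while one that never needed a retry stays in the required gate.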

Can you run E2E tests in parallel in CI?

Yes. Playwright supports --shard=N/M to split tests across M parallel CI jobs. In GitHub Actions, use a matrix strategy. In GitLab CI, use the parallel keyword. Each shard runs a slice of the suite, and the CI platform reports the combined result. Start with 4 shards and adjust based on suite size.
