Story: the accessibility regression our visual tests missed
Visual regression tests said everything was fine. Keyboard users and screen readers disagreed. This is how we found and fixed the gap.
What happened
A release changed the layout of a critical settings page.
It wasn’t glamorous work: we were reworking a long, scrolling form into something more modular.
We had modern tooling in place:
- visual regression tests for the main flows
- automated checks for basic contrast and ARIA attributes
- a design system with accessible components
The screenshots looked fine.
The automated checks passed.
The page shipped.
A few days later, support started forwarding a specific pattern of complaints:
- "I can’t tell what field I’m in."
- "When I tab, the focus jumps in a weird order."
- "My screen reader is reading parts of the form out of order."
All from people using keyboard navigation or assistive tech.
Our visual tests had passed because, visually, the page looked better.
The regression lived in the interaction model.
Where the tests stopped
Our visual tests compared before/after screenshots.
They verified:
- labels were present
- fields aligned roughly the same way
- major elements didn’t disappear
Our automated accessibility checks verified:
- required ARIA attributes existed
- color contrast stayed above thresholds
What neither set of checks verified:
- focus order
- keyboard traps
- the reading order for screen readers
We had assumed that using the design system correctly was enough.
We learned it wasn’t, especially once we started composing components into more complex layouts.
What we changed
1. Add focus order to the definition of done
We updated our design and engineering checklists for any form or multi-step flow:
- tab order matches visual and logical order
- focus indicators are visible and consistent
- modals and overlays trap focus correctly and restore it when closed
Designers started annotating focus paths in their specs for complex screens.
Engineers added simple tests where possible:
- unit or integration tests that simulate key presses and assert which element is focused (see the sketch below)
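A minimal sketch of such a test, assuming React, React Testing Library, user-event, and jest-dom; the SettingsForm component and its field labels are hypothetical stand-ins for whatever form is under test:

```tsx
import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import "@testing-library/jest-dom";
// Hypothetical component under test.
import { SettingsForm } from "./SettingsForm";

test("tab order follows the visual order of the settings form", async () => {
  const user = userEvent.setup();
  render(<SettingsForm />);

  // Walk the form with Tab and assert where focus lands at each step.
  await user.tab();
  expect(screen.getByLabelText(/display name/i)).toHaveFocus();

  await user.tab();
  expect(screen.getByLabelText(/email/i)).toHaveFocus();

  await user.tab();
  expect(screen.getByRole("button", { name: /save/i })).toHaveFocus();
});
```

Because user-event moves focus through the actual DOM order, a refactor that reorders the markup or introduces a stray tabindex fails this test even when the screenshot is pixel-identical.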
2. Include screen reader passes in critical flows
We don’t have the capacity to run full manual audits on every screen.
We do have the capacity to run focused checks on critical flows (like settings, billing, and recovery):
- use a screen reader to navigate the page
- listen for confusing or out-of-order announcements
- verify that grouped fields (like address sections) are announced in a way that makes sense
We treat this like performance spot-checks: targeted, repeatable, and part of the definition of done for those flows.
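For the grouped-fields check in particular, the markup pattern we listen for is roughly the one below (a sketch with hypothetical field names, not our actual component): wrapping related inputs in a fieldset with a legend so screen readers announce the group name alongside each field label.

```tsx
// Sketch only: the key parts are the <fieldset> and <legend>, which screen
// readers generally announce as context when focus enters the group.
export function ShippingAddressFields() {
  return (
    <fieldset>
      <legend>Shipping address</legend>
      <label>
        Street
        <input name="street" autoComplete="street-address" />
      </label>
      <label>
        City
        <input name="city" autoComplete="address-level2" />
      </label>
    </fieldset>
  );
}
```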
3. Extend our automation just enough
We added a small layer on top of our existing tests:
- smoke tests that move focus through a page and assert that it doesn’t get trapped
- checks that headings and landmarks appear in an expected order
These don’t replace manual checks, but they catch the most obvious regressions; one such smoke test is sketched below.
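A sketch of that layer, assuming Playwright; the /settings route, the Tab budget, the distinct-stop threshold, and the heading names are all placeholders to adjust per page:

```ts
import { test, expect } from "@playwright/test";

test("settings page: focus is not trapped and headings read in order", async ({ page }) => {
  await page.goto("/settings"); // hypothetical route

  // Press Tab a bounded number of times and record each focus stop.
  const stops = new Set<string>();
  for (let i = 0; i < 40; i++) {
    await page.keyboard.press("Tab");
    const stop = await page.evaluate(() => {
      const el = document.activeElement;
      if (!el || el === document.body) return "body";
      return `${el.tagName}:${el.id || el.textContent?.trim().slice(0, 20) || ""}`;
    });
    stops.add(stop);
  }

  // A keyboard trap shows up as focus cycling among a handful of elements.
  expect(stops.size).toBeGreaterThan(5);

  // Headings should appear in the order the page is meant to be read.
  const headings = await page.locator("h1, h2").allTextContents();
  expect(headings).toEqual(["Settings", "Profile", "Notifications"]); // hypothetical headings
});
```

The distinct-stop threshold is a crude heuristic, but it turns a keyboard trap from a support ticket into a failing CI check.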
4. Capture accessibility bugs as operational issues
In our incident and bug triage, we:
- started tagging accessibility regressions explicitly
- looked at their impact in terms of support load and task completion, not just "UI correctness"
For this incident, we measured:
- increased support contacts from keyboard-only users
- completion rates for the settings change before and after the fix
This gave the issue the same weight we’d give to a performance regression.
5. Feed the lessons back into the design system
The specific bug involved a component used across multiple pages.
We:
- fixed the component to enforce a safer default focus order
- updated its documentation with explicit accessibility guidance
- added one example focused on keyboard-only navigation
The goal was to make the easiest way to use the component also the safest; a sketch of that kind of default follows.
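The component itself isn’t reproduced here, but the flavor of the change looks something like this sketch (React, with illustrative names, not our design system’s actual API): the overlay moves focus into itself when it opens and hands focus back when it closes, so every caller gets that behavior without writing any focus code.

```tsx
import { useEffect, useRef, type ReactNode } from "react";

// Illustrative sketch: the overlay owns the focus bookkeeping so callers
// can't forget it. A production version would also keep Tab cycling inside
// the panel while it is open.
export function Overlay({ children, onClose }: { children: ReactNode; onClose: () => void }) {
  const panelRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    // Remember what had focus before the overlay opened, then move focus in.
    const previouslyFocused = document.activeElement as HTMLElement | null;
    panelRef.current?.focus();

    // On unmount (close), hand focus back to where the user was.
    return () => previouslyFocused?.focus();
  }, []);

  return (
    <div role="dialog" aria-modal="true" tabIndex={-1} ref={panelRef}>
      {children}
      <button onClick={onClose}>Close</button>
    </div>
  );
}
```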
Takeaways
- Visual correctness is not interaction correctness.
- Automated accessibility checks are useful, but they don’t guarantee a good experience for keyboard and screen reader users.
- Adding a small number of focused manual checks in critical flows can catch high-impact regressions.
- Design systems need to encode not just how things look, but how they behave for different input and assistive technologies.