Testing and Continuous Improvement
Build a comprehensive accessibility testing program — combining AI-powered automated scans, manual expert reviews, and user testing with assistive technology users into a continuous improvement cycle.
Beyond the Green Checkmark
🔄 Quick Recall: In the previous lesson, you applied inclusive design to real-world scenarios — forms, navigation, error states, and onboarding flows. You designed interactions that work for diverse abilities. Now you’ll build the testing and measurement systems that ensure accessibility improves continuously, not just during an annual audit.
Most organizations treat accessibility as a project: audit, fix, done. But websites change constantly — new features, new content, new team members. Without continuous testing, accessibility degrades over time. AI-powered testing tools make continuous monitoring practical.
The Three Layers of Accessibility Testing
| Layer | What It Tests | How Often | Who Does It |
|---|---|---|---|
| Automated scanning | Structural WCAG compliance (~30% of criteria) | Every PR/deploy | CI/CD pipeline |
| Manual expert review | Interaction quality, keyboard flow, screen reader experience (~70% of criteria) | Quarterly | Accessibility specialist |
| User testing | Real-world usability with assistive technology | Twice per year | People with disabilities |
All three layers are necessary. None is sufficient alone.
Layer 1: Automated Testing
Help me set up continuous automated accessibility testing.
Our environment:
- Website/app framework: [React/Vue/WordPress/etc.]
- CI/CD system: [GitHub Actions/GitLab/Jenkins]
- Current testing: [what automated tests do you run now?]
Design the automated testing layer:
PER-COMMIT CHECKS (fast, blocks deployment on failure):
- axe-core linting in development environment
- HTML validation (semantic elements, proper nesting)
- Color contrast check on changed components
PER-PR CHECKS (comprehensive, runs before merge):
- Full axe-core scan on changed pages/components
- Heading hierarchy validation
- Image alt attribute presence check
- Form label association check
- Focus indicator visibility check
- ARIA attribute validation
WEEKLY FULL-SITE SCAN:
- Crawl all public pages
- Generate trend report (improving, declining, stable?)
- Flag new issues introduced since last scan
- Track issue count over time
DASHBOARDS AND ALERTS:
- Accessibility score by page/section
- Regression alerts (new issues on previously clean pages)
- Trend charts for leadership reporting
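As a concrete starting point for the per-PR scan described above, here is a minimal sketch using Playwright and the @axe-core/playwright package. The route list is a placeholder for the pages a PR actually touches, and it assumes `baseURL` is configured in playwright.config.ts.

```typescript
// Minimal per-PR accessibility scan with Playwright + @axe-core/playwright.
// PAGES lists hypothetical routes; adapt to the pages touched by the PR.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

const PAGES = ['/', '/signup', '/checkout'];

for (const path of PAGES) {
  test(`axe-core scan: ${path}`, async ({ page }) => {
    await page.goto(path);
    const results = await new AxeBuilder({ page })
      .withTags(['wcag2a', 'wcag2aa', 'wcag21aa']) // restrict to WCAG A/AA rules
      .analyze();
    // An empty violations array passes; any violation fails the PR check, and
    // the report lists the offending selectors and the rule each one breaks.
    expect(results.violations).toEqual([]);
  });
}
```

Wiring this into the pipeline is then just a matter of running the Playwright suite in your CI system and marking the check as required before merge.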
Automated Testing Tools
| Tool | Best For | Integration |
|---|---|---|
| axe DevTools | Most comprehensive rule set, AI remediation suggestions | Browser extension, CI via axe-core |
| WAVE | Visual error overlay, good for manual spot-checks | Browser extension |
| Lighthouse | Quick scoring, built into Chrome | CLI for CI, Chrome DevTools |
| Pa11y | Full-site crawling with CI integration | CLI, CI/CD pipelines |
| TestSprite | AI mapping of WCAG criteria to code | SaaS platform |
✅ Quick Check: Why should automated accessibility tests run on every pull request, not just weekly? Because accessibility regressions are cheapest to fix when they’re caught during development. A developer who introduced a contrast issue in a PR they submitted 10 minutes ago can fix it in seconds. The same issue caught in a weekly scan requires context-switching, investigation, and potentially a new PR — 10x the effort. Shift-left accessibility testing treats accessibility like any other code quality metric.
Layer 2: Manual Expert Review
Automated tools can’t evaluate:
- Whether the tab order is logical
- Whether screen reader announcements make sense in context
- Whether content is cognitively clear
- Whether animations respect user preferences in all states
- Whether error messages are actually helpful
Help me create a manual accessibility audit checklist.
For each check, specify:
- What to test
- How to test it
- Pass/fail criteria
- Which WCAG criterion it maps to
KEYBOARD TESTING:
- Can every interactive element be reached by Tab?
- Is the tab order logical (matches visual flow)?
- Can dropdowns, modals, and tooltips be operated
without a mouse?
- Are there keyboard traps? (Tab in, can't Tab out)
- Are focus indicators visible on every focusable element?
SCREEN READER TESTING:
- Does the page make sense read linearly?
- Are headings descriptive and properly nested?
- Do images have meaningful alt text (not just present)?
- Are form instructions announced before the input?
- Are dynamic updates announced (aria-live regions)?
COGNITIVE REVIEW:
- Is language clear and concise?
- Are instructions unambiguous?
- Is the navigation predictable and consistent?
- Can the user recover from errors easily?
- Is there unnecessary cognitive load (animations,
complex layouts, information overload)?
MOTION AND SENSORY:
- Does prefers-reduced-motion disable all animations?
- Is there auto-playing media? Can it be paused?
- Is color used with redundant indicators (shape, text)?
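Manual review stays manual, but a reviewer can script parts of the setup. The sketch below, assuming a Playwright environment and a hypothetical local dev server at BASE_URL, records the Tab order for a human to compare against the visual flow and spot-checks the reduced-motion item from the checklist above; whether the order is "logical" remains a human judgment.

```typescript
// Reviewer helper sketch (Playwright). BASE_URL is a hypothetical dev server.
import { test, expect } from '@playwright/test';

const BASE_URL = 'http://localhost:3000';

test('record tab order and spot-check reduced motion', async ({ page }) => {
  await page.goto(BASE_URL);

  // Walk the Tab sequence and record each focused element so the reviewer can
  // compare the order against the visual flow of the page.
  const tabOrder: string[] = [];
  for (let i = 0; i < 100; i++) {
    await page.keyboard.press('Tab');
    const label = await page.evaluate(() => {
      const el = document.activeElement as HTMLElement | null;
      if (!el || el === document.body) return null; // focus left the page content
      const text = (el.getAttribute('aria-label') ?? el.innerText ?? '').trim();
      return `${el.tagName.toLowerCase()} "${text.slice(0, 40)}"`;
    });
    if (label === null) break;
    tabOrder.push(label);
  }
  console.log(tabOrder.join('\n'));

  // One motion-checklist item that can be scripted: with reduced motion
  // requested, nothing should still be animating after load.
  await page.emulateMedia({ reducedMotion: 'reduce' });
  await page.reload();
  const running = await page.evaluate(
    () => document.getAnimations().filter((a) => a.playState === 'running').length,
  );
  expect(running).toBe(0);
});
```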
Layer 3: User Testing
Help me plan an accessibility user testing session.
Product: [what we're testing]
Testing goals: [what we want to learn]
RECRUITMENT:
- 5-8 participants across disability types
- At least: 2 screen reader users, 1 keyboard-only user,
1 magnification user, 1 cognitive disability user
- Sources: [disability organizations, testing panels,
accessibility communities]
- Compensation: [market rate for their time + expertise]
TEST TASKS (5-7 real-world tasks):
1. [Task that tests navigation: "Find X on the site"]
2. [Task that tests forms: "Complete the signup process"]
3. [Task that tests content: "Find the return policy"]
4. [Task that tests interaction: "Add item to cart and checkout"]
5. [Task that tests error handling: "Try submitting
with an intentional error"]
SESSION FORMAT:
- 45-60 minutes per participant
- Remote (participants use own devices and AT setup)
- Think-aloud protocol
- Observe navigation patterns and workarounds
- Note where they pause, backtrack, or express frustration
ANALYSIS:
- Task completion rate by participant type
- Time-to-complete vs. benchmark (non-AT users)
- Severity-weighted issue list
- Prioritized recommendations
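To make the analysis step concrete, here is a hypothetical sketch of the calculations behind it. The Session shape and the severity weights are illustrative assumptions, not part of any standard.

```typescript
// Hypothetical analysis helpers for accessibility user-testing sessions.
type Session = {
  participant: string;
  assistiveTech: 'screen reader' | 'keyboard only' | 'magnification' | 'none';
  task: string;
  completed: boolean;
  seconds: number;
};

// Assumed severity weights for ranking the issue list.
const SEVERITY_WEIGHTS = { critical: 5, serious: 3, moderate: 2, minor: 1 } as const;

// Task completion rate for one assistive-technology group.
function completionRate(sessions: Session[], tech: Session['assistiveTech']): number {
  const group = sessions.filter((s) => s.assistiveTech === tech);
  return group.length === 0 ? 0 : group.filter((s) => s.completed).length / group.length;
}

// Time-to-complete ratio: median AT time over median non-AT time for a task.
function timeRatio(sessions: Session[], task: string): number {
  const median = (xs: number[]) => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];
  const times = (isAT: boolean) =>
    sessions
      .filter((s) => s.task === task && s.completed && (s.assistiveTech !== 'none') === isAT)
      .map((s) => s.seconds);
  return median(times(true)) / median(times(false));
}

// Severity-weighted score for prioritizing recommendations.
function weightedScore(issues: { severity: keyof typeof SEVERITY_WEIGHTS }[]): number {
  return issues.reduce((sum, issue) => sum + SEVERITY_WEIGHTS[issue.severity], 0);
}
```

A ratio well above 2 on any core task is usually the clearest signal of where to focus the next design iteration.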
Measuring Progress
Track accessibility across all three layers:
| Metric | Source | Target | Frequency |
|---|---|---|---|
| Automated scan score | axe/Pa11y | 95%+ | Per deploy |
| Issues per page | Automated scan | <2 average | Weekly |
| Regression rate | CI/CD | 0 new critical issues/month | Per PR |
| Keyboard task completion | Manual audit | 100% | Quarterly |
| Screen reader task completion | User testing | 90%+ | Twice yearly |
| Time-to-complete ratio | User testing | AT users <2x non-AT | Twice yearly |
| User satisfaction | Post-test survey | 4/5+ average | Twice yearly |
✅ Quick Check: Why is the “time-to-complete ratio” (assistive technology users vs. non-AT users) one of the most important accessibility metrics? Because it measures real-world usability, not just compliance. If a sighted mouse user completes checkout in 2 minutes but a screen reader user takes 20 minutes, the site is technically accessible but practically unusable. The target of <2x means AT users should be able to complete the same task in no more than twice the time — aspirational but achievable for well-designed interfaces.
Key Takeaways
- Accessibility testing requires three layers: automated scanning (structural compliance), manual expert review (interaction quality), and user testing with assistive technology users (real-world usability)
- Automated scan scores cover only the roughly 30% of WCAG criteria that tools can check, so they should not be the sole success metric; the remaining 70% of criteria require human evaluation
- Run automated accessibility tests on every pull request (not just weekly) to catch regressions when they’re cheapest to fix
- User testing with 5-8 participants across disability types, using their own devices and settings, reveals experiential issues that no compliance check can identify
- The time-to-complete ratio (AT users vs. non-AT users) is the ultimate accessibility metric — targeting <2x for core tasks
Up Next: You’ll assemble everything into a complete accessibility program — with organizational structures, team practices, and continuous improvement processes that keep accessibility integrated into every stage of development.