Testing and Continuous Improvement
Build a comprehensive accessibility testing program — combining AI-powered automated scans, manual expert reviews, and user testing with assistive technology users into a continuous improvement cycle.
Beyond the Green Checkmark
🔄 Quick Recall: In the previous lesson, you applied inclusive design to real-world scenarios — forms, navigation, error states, and onboarding flows. You designed interactions that work for diverse abilities. Now you’ll build the testing and measurement systems that ensure accessibility improves continuously, not just during an annual audit.
Most organizations treat accessibility as a project: audit, fix, done. But websites change constantly — new features, new content, new team members. Without continuous testing, accessibility degrades over time. AI-powered testing tools make continuous monitoring practical.
The Three Layers of Accessibility Testing
| Layer | What It Tests | How Often | Who Does It |
|---|---|---|---|
| Automated scanning | Structural WCAG compliance (~30% of criteria) | Every PR/deploy | CI/CD pipeline |
| Manual expert review | Interaction quality, keyboard flow, screen reader experience (~70% of criteria) | Quarterly | Accessibility specialist |
| User testing | Real-world usability with assistive technology | Twice per year | People with disabilities |
All three layers are necessary. None is sufficient alone.
Layer 1: Automated Testing
Help me set up continuous automated accessibility testing.
Our environment:
- Website/app framework: [React/Vue/WordPress/etc.]
- CI/CD system: [GitHub Actions/GitLab/Jenkins]
- Current testing: [what automated tests do you run now?]
Design the automated testing layer:
PER-COMMIT CHECKS (fast, blocks deployment on failure):
- axe-core linting in development environment
- HTML validation (semantic elements, proper nesting)
- Color contrast check on changed components
PER-PR CHECKS (comprehensive, runs before merge):
- Full axe-core scan on changed pages/components
- Heading hierarchy validation
- Image alt attribute presence check
- Form label association check
- Focus indicator visibility check
- ARIA attribute validation
WEEKLY FULL-SITE SCAN:
- Crawl all public pages
- Generate trend report (improving, declining, stable?)
- Flag new issues introduced since last scan
- Track issue count over time
DASHBOARDS AND ALERTS:
- Accessibility score by page/section
- Regression alerts (new issues on previously clean pages)
- Trend charts for leadership reporting
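As a concrete starting point for the per-PR scan described above, here is a minimal sketch using Playwright and the @axe-core/playwright package. The route list is a placeholder for the pages a PR actually touches, and it assumes `baseURL` is configured in playwright.config.ts.

```typescript
// Minimal per-PR accessibility scan with Playwright + @axe-core/playwright.
// PAGES lists hypothetical routes; adapt to the pages touched by the PR.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

const PAGES = ['/', '/signup', '/checkout'];

for (const path of PAGES) {
  test(`axe-core scan: ${path}`, async ({ page }) => {
    await page.goto(path);
    const results = await new AxeBuilder({ page })
      .withTags(['wcag2a', 'wcag2aa', 'wcag21aa']) // restrict to WCAG A/AA rules
      .analyze();
    // An empty violations array passes; any violation fails the PR check, and
    // the report lists the offending selectors and the rule each one breaks.
    expect(results.violations).toEqual([]);
  });
}
```

Wiring this into the pipeline is then just a matter of running the Playwright suite in your CI system and marking the check as required before merge.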
Automated Testing Tools
| Tool | Best For | Integration |
|---|---|---|
| axe DevTools | Most comprehensive rule set, AI remediation suggestions | Browser extension, CI via axe-core |
| WAVE | Visual error overlay, good for manual spot-checks | Browser extension |
| Lighthouse | Quick scoring, built into Chrome | CLI for CI, Chrome DevTools |
| Pa11y | Full-site crawling with CI integration | CLI, CI/CD pipelines |
| TestSprite | AI mapping of WCAG criteria to code | SaaS platform |
✅ Quick Check: Why should automated accessibility tests run on every pull request, not just weekly? Because accessibility regressions are cheapest to fix when they’re caught during development. A developer who introduced a contrast issue in a PR they submitted 10 minutes ago can fix it in seconds. The same issue caught in a weekly scan requires context-switching, investigation, and potentially a new PR — 10x the effort. Shift-left accessibility testing treats accessibility like any other code quality metric.
Layer 2: Manual Expert Review
Automated tools can’t evaluate:
- Whether the tab order is logical
- Whether screen reader announcements make sense in context
- Whether content is cognitively clear
- Whether animations respect user preferences in all states
- Whether error messages are actually helpful
Help me create a manual accessibility audit checklist.
For each check, specify:
- What to test
- How to test it
- Pass/fail criteria
- Which WCAG criterion it maps to
KEYBOARD TESTING:
- Can every interactive element be reached by Tab?
- Is the tab order logical (matches visual flow)?
- Can dropdowns, modals, and tooltips be operated
without a mouse?
- Are there keyboard traps? (Tab in, can't Tab out)
- Are focus indicators visible on every focusable element?
SCREEN READER TESTING:
- Does the page make sense read linearly?
- Are headings descriptive and properly nested?
- Do images have meaningful alt text (not just present)?
- Are form instructions announced before the input?
- Are dynamic updates announced (aria-live regions)?
COGNITIVE REVIEW:
- Is language clear and concise?
- Are instructions unambiguous?
- Is the navigation predictable and consistent?
- Can the user recover from errors easily?
- Is there unnecessary cognitive load (animations,
complex layouts, information overload)?
MOTION AND SENSORY:
- Does prefers-reduced-motion disable all animations?
- Is there auto-playing media? Can it be paused?
- Is color used with redundant indicators (shape, text)?
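Manual review stays manual, but a reviewer can script parts of the setup. The sketch below, assuming a Playwright environment and a hypothetical local dev server at BASE_URL, records the Tab order for a human to compare against the visual flow and spot-checks the reduced-motion item from the checklist above; whether the order is "logical" remains a human judgment.

```typescript
// Reviewer helper sketch (Playwright). BASE_URL is a hypothetical dev server.
import { test, expect } from '@playwright/test';

const BASE_URL = 'http://localhost:3000';

test('record tab order and spot-check reduced motion', async ({ page }) => {
  await page.goto(BASE_URL);

  // Walk the Tab sequence and record each focused element so the reviewer can
  // compare the order against the visual flow of the page.
  const tabOrder: string[] = [];
  for (let i = 0; i < 100; i++) {
    await page.keyboard.press('Tab');
    const label = await page.evaluate(() => {
      const el = document.activeElement as HTMLElement | null;
      if (!el || el === document.body) return null; // focus left the page content
      const text = (el.getAttribute('aria-label') ?? el.innerText ?? '').trim();
      return `${el.tagName.toLowerCase()} "${text.slice(0, 40)}"`;
    });
    if (label === null) break;
    tabOrder.push(label);
  }
  console.log(tabOrder.join('\n'));

  // One motion-checklist item that can be scripted: with reduced motion
  // requested, nothing should still be animating after load.
  await page.emulateMedia({ reducedMotion: 'reduce' });
  await page.reload();
  const running = await page.evaluate(
    () => document.getAnimations().filter((a) => a.playState === 'running').length,
  );
  expect(running).toBe(0);
});
```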
Layer 3: User Testing
Help me plan an accessibility user testing session.
Product: [what we're testing]
Testing goals: [what we want to learn]
RECRUITMENT:
- 5-8 participants across disability types
- At least: 2 screen reader users, 1 keyboard-only user,
1 magnification user, 1 cognitive disability user
- Sources: [disability organizations, testing panels,
accessibility communities]
- Compensation: [market rate for their time + expertise]
TEST TASKS (5-7 real-world tasks):
1. [Task that tests navigation: "Find X on the site"]
2. [Task that tests forms: "Complete the signup process"]
3. [Task that tests content: "Find the return policy"]
4. [Task that tests interaction: "Add item to cart and checkout"]
5. [Task that tests error handling: "Try submitting
with an intentional error"]
SESSION FORMAT:
- 45-60 minutes per participant
- Remote (participants use own devices and AT setup)
- Think-aloud protocol
- Observe navigation patterns and workarounds
- Note where they pause, backtrack, or express frustration
ANALYSIS:
- Task completion rate by participant type
- Time-to-complete vs. benchmark (non-AT users)
- Severity-weighted issue list
- Prioritized recommendations
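To make the analysis step concrete, here is a hypothetical sketch of the calculations behind it. The Session shape and the severity weights are illustrative assumptions, not part of any standard.

```typescript
// Hypothetical analysis helpers for accessibility user-testing sessions.
type Session = {
  participant: string;
  assistiveTech: 'screen reader' | 'keyboard only' | 'magnification' | 'none';
  task: string;
  completed: boolean;
  seconds: number;
};

// Assumed severity weights for ranking the issue list.
const SEVERITY_WEIGHTS = { critical: 5, serious: 3, moderate: 2, minor: 1 } as const;

// Task completion rate for one assistive-technology group.
function completionRate(sessions: Session[], tech: Session['assistiveTech']): number {
  const group = sessions.filter((s) => s.assistiveTech === tech);
  return group.length === 0 ? 0 : group.filter((s) => s.completed).length / group.length;
}

// Time-to-complete ratio: median AT time over median non-AT time for a task.
function timeRatio(sessions: Session[], task: string): number {
  const median = (xs: number[]) => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];
  const times = (isAT: boolean) =>
    sessions
      .filter((s) => s.task === task && s.completed && (s.assistiveTech !== 'none') === isAT)
      .map((s) => s.seconds);
  return median(times(true)) / median(times(false));
}

// Severity-weighted score for prioritizing recommendations.
function weightedScore(issues: { severity: keyof typeof SEVERITY_WEIGHTS }[]): number {
  return issues.reduce((sum, issue) => sum + SEVERITY_WEIGHTS[issue.severity], 0);
}
```

A ratio well above 2 on any core task is usually the clearest signal of where to focus the next design iteration.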
Measuring Progress
Track accessibility across all three layers:
| Metric | Source | Target | Frequency |
|---|---|---|---|
| Automated scan score | axe/Pa11y | 95%+ | Per deploy |
| Issues per page | Automated scan | <2 average | Weekly |
| Regression rate | CI/CD | 0 new critical issues/month | Per PR |
| Keyboard task completion | Manual audit | 100% | Quarterly |
| Screen reader task completion | User testing | 90%+ | Twice yearly |
| Time-to-complete ratio | User testing | AT users <2x non-AT | Twice yearly |
| User satisfaction | Post-test survey | 4/5+ average | Twice yearly |
✅ Quick Check: Why is the “time-to-complete ratio” (assistive technology users vs. non-AT users) one of the most important accessibility metrics? Because it measures real-world usability, not just compliance. If a sighted mouse user completes checkout in 2 minutes but a screen reader user takes 20 minutes, the site is technically accessible but practically unusable. The target of <2x means AT users should be able to complete the same task in no more than twice the time — aspirational but achievable for well-designed interfaces.
Key Takeaways
- Accessibility testing requires three layers: automated scanning (structural compliance), manual expert review (interaction quality), and user testing with assistive technology users (real-world usability)
- Automated scan scores cover only the roughly 30% of WCAG criteria that tools can check, so they should not be the sole success metric; the remaining 70% of criteria require human evaluation
- Run automated accessibility tests on every pull request (not just weekly) to catch regressions when they’re cheapest to fix
- User testing with 5-8 participants across disability types, using their own devices and settings, reveals experiential issues that no compliance check can identify
- The time-to-complete ratio (AT users vs. non-AT users) is the ultimate accessibility metric — targeting <2x for core tasks
Up Next: You’ll assemble everything into a complete accessibility program — with organizational structures, team practices, and continuous improvement processes that keep accessibility integrated into every stage of development.