Researchers map accent bias in speech recognition to specific neural subspaces
A new audit technique called ACES reveals that accent-discriminative information in speech recognition models concentrates in low-dimensional subspaces at early layers. Testing Wav2Vec2-base on five English accents, researchers found that accent information clusters at layer 3 in just 8 dimensions, yet attempting to remove it paradoxically worsens the very disparities removal was meant to fix.
A new analysis technique reveals why accent bias persists in automatic speech recognition (ASR) models and why simple debiasing approaches fail: accent information is deeply entangled with the acoustic features models need to understand speech at all.
The research, published on arXiv as "ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition," introduces a representation-centric audit method that pinpoints exactly where and how accent information flows through ASR systems.
Key Findings
Testing on Wav2Vec2-base with five English accents, the researchers made three critical discoveries:
Accent information concentrates in a compact subspace. Accent-discriminative information doesn't scatter across the entire model. Instead, it clusters in a low-dimensional subspace at layer 3 with just k=8 dimensions, less than 1% of the layer's total representation space.
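The paper does not spell out how the subspace is extracted, but one common way to recover a k-dimensional accent-discriminative subspace from hidden states is iterative nullspace projection: train a linear accent probe, keep the top singular directions of its weights, project them out, and repeat. The sketch below is illustrative only, using synthetic 64-dimensional features in place of real layer-3 Wav2Vec2 activations (which are 768-dimensional):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for layer-3 hidden states: 600 utterances, 64-dim
# features (d=64 keeps the demo fast), 5 accent labels as in the paper.
n, d, n_accents, k = 600, 64, 5, 8
labels = rng.integers(0, n_accents, n)
# Give each accent a class-specific offset so a discriminative
# subspace exists to find.
offsets = rng.normal(0.0, 1.0, (n_accents, d))
X = rng.normal(0.0, 1.0, (n, d)) + offsets[labels]

# INLP-style collection of accent-discriminative directions: probe,
# take top singular directions of the probe weights, remove, repeat.
basis = np.zeros((d, 0))
Xp = X.copy()
while basis.shape[1] < k:
    probe = LogisticRegression(max_iter=1000).fit(Xp, labels)
    _, _, Vt = np.linalg.svd(probe.coef_, full_matrices=False)
    take = min(k - basis.shape[1], Vt.shape[0])
    new = Vt[:take].T.copy()
    # Orthogonalize new directions against the ones already collected.
    new -= basis @ (basis.T @ new)
    new, _ = np.linalg.qr(new)
    basis = np.hstack([basis, new])
    # Remove the collected subspace before the next probe round.
    Xp = Xp - Xp @ basis @ basis.T

print(basis.shape)  # orthonormal accent-subspace basis, (64, 8)
```

With 5 accent classes each probe contributes at most 5 directions per round, so reaching k=8 takes two rounds here; the real method in the paper may differ.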
The subspace correlates with performance gaps. The magnitude of projection onto this accent subspace correlates with per-utterance word error rate (WER) at r=0.26. More importantly, when researchers applied perturbations constrained to this subspace, the coupling between representation shift and WER degradation strengthened to r=0.32, compared with just r=0.15 for random-subspace controls. This suggests accent information doesn't exist in isolation but is mechanistically tied to recognition performance.
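The per-utterance quantity being correlated is straightforward: the norm of each utterance's feature vector after projection onto the accent subspace, compared against that utterance's WER. A minimal sketch, with a randomly chosen stand-in basis and synthetic WERs constructed to depend weakly on the projection (the paper measures this on real decoding output):

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, k = 400, 64, 8
V, _ = np.linalg.qr(rng.normal(size=(d, k)))  # stand-in accent-subspace basis
H = rng.normal(size=(n, d))                   # utterance-level features

# Projection magnitude onto the accent subspace, per utterance.
proj_norm = np.linalg.norm(H @ V, axis=1)

# Synthetic WERs that depend weakly on the projection magnitude,
# loosely mimicking the paper's reported coupling.
wer = 0.15 + 0.02 * proj_norm + rng.normal(0.0, 0.05, n)

# Pearson correlation between projection magnitude and WER.
r = np.corrcoef(proj_norm, wer)[0, 1]
print(f"r = {r:.2f}")
```

The paper's stress test goes one step further, perturbing features only within span(V) versus within a random k-dimensional control subspace and comparing how strongly each shift couples to WER.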
Removing it makes things worse. The most striking finding: linear attenuation (zeroing out) of the accent subspace fails to reduce performance disparities and in fact slightly worsens them. This indicates that accent-relevant features are not redundant noise but are entangled with recognition-critical acoustic cues that the model actually needs.
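"Linear attenuation" here is the standard projection-removal operation: subtract from each hidden state its component in the accent subspace, h' = (I − VVᵀ)h. The sketch below (with an arbitrary orthonormal basis standing in for the learned one) shows why erasure is not free: the accent component becomes exactly zero, but the hidden state itself changes, so any recognition-critical cue entangled with that subspace changes with it:

```python
import numpy as np

rng = np.random.default_rng(2)

d, k = 64, 8
V, _ = np.linalg.qr(rng.normal(size=(d, k)))  # stand-in accent-subspace basis
h = rng.normal(size=d)                        # one utterance's hidden state

# Linear attenuation: zero out the component of h lying in span(V).
h_erased = h - V @ (V.T @ h)

# The accent-subspace component is now exactly zero ...
print(np.allclose(V.T @ h_erased, 0.0))   # True
# ... but the representation itself has shifted, which is where
# entangled, recognition-critical cues can be damaged.
print(np.linalg.norm(h - h_erased) > 0)   # True
```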
What This Means
The ACES audit reframes the accent bias problem in ASR. Accent disparities aren't a debiasing challenge that can be solved through simple feature removal or regularization. Instead, they reflect a fundamental architectural limitation: models learn accent and acoustic phonetics together because they're inherently related in speech signals.
This finding has practical implications for fairness work in ASR. Rather than attempting erasure-based debiasing, which the research shows backfires, developers may need to focus on training data diversity, model capacity for accent variation, or architectural changes that learn accent-invariant representations from the ground up.
The work also establishes accent subspaces as a diagnostic tool. By analyzing how models distribute accent information across layers and dimensions, researchers can identify where and why specific accents underperform—moving beyond black-box fairness metrics toward mechanistic understanding.
The research uses the publicly available Wav2Vec2-base model and standard English-accent datasets, making the audit method reproducible. The paper does not propose a complete solution to accent bias; instead, it provides the analytical framework needed to understand why previous solutions have been insufficient.