Acoustical Society of America volume:11:1562-1573.
Do listeners process noisy speech by taking advantage of "glimpses"—spectrotemporal regions in
which the target signal is least affected by the background? This study used an automatic speech
recognition system, adapted for use with partially specified inputs, to identify consonants in noise.
Twelve masking conditions were chosen to create a range of glimpse sizes. Several different
glimpsing models were employed, differing in the local signal-to-noise ratio (SNR) used for
detection, the minimum glimpse size, and the use of information in the masked regions. Recognition
results were compared with behavioral data. A quantitative analysis demonstrated that the proportion
of the time–frequency plane glimpsed is a good predictor of intelligibility. Recognition scores in
each noise condition confirmed that sufficient information exists in glimpses to support consonant
identification. Close fits to listeners’ performance were obtained at two local SNR thresholds: one
at around 8 dB and another in the range −5 to −2 dB. A transmitted information analysis revealed
that cues to voicing are degraded more in the model than in human auditory processing.
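
The glimpse measure described above can be illustrated with a minimal sketch. It is not the study's implementation: the energy grids, the 4-connected definition of a glimpse region, the function names, and the example threshold and minimum size are all assumptions made for illustration. The sketch marks the time-frequency cells whose local SNR exceeds a threshold, discards regions smaller than a minimum size, and reports the proportion of the plane that remains glimpsed.

```python
import math


def glimpse_mask(target, masker, threshold_db):
    """Mark cells whose local SNR (dB) exceeds threshold_db.

    target and masker are equally sized 2-D lists (frequency x time)
    of positive energies. This representation is an assumption.
    """
    mask = []
    for t_row, m_row in zip(target, masker):
        row = []
        for t, m in zip(t_row, m_row):
            local_snr_db = 10.0 * math.log10(t / m)
            row.append(local_snr_db > threshold_db)
        mask.append(row)
    return mask


def drop_small_glimpses(mask, min_size):
    """Keep only 4-connected regions with at least min_size cells.

    4-connectivity is an assumed definition of a glimpse region.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    kept = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill to collect one connected region.
            region, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:
                i, j = stack.pop()
                region.append((i, j))
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    inside = 0 <= ni < n_rows and 0 <= nj < n_cols
                    if inside and mask[ni][nj] and not seen[ni][nj]:
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            if len(region) >= min_size:
                for i, j in region:
                    kept[i][j] = True
    return kept


def glimpse_proportion(mask):
    """Fraction of time-frequency cells marked as glimpsed."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


# Toy 3 x 3 grid: target energy is 10 dB above the masker in three cells.
target = [[10.0, 10.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
masker = [[1.0] * 3 for _ in range(3)]

raw = glimpse_mask(target, masker, threshold_db=3.0)
filtered = drop_small_glimpses(raw, min_size=2)
raw_proportion = glimpse_proportion(raw)
filtered_proportion = glimpse_proportion(filtered)
```

In this toy grid, three of the nine cells exceed the threshold, and the isolated cell is removed once a minimum region size of two cells is required. Varying `threshold_db` and `min_size` corresponds to the model variants that differ in local SNR criterion and minimum glimpse size.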