This study compared behavioural performance on a multispeaker speech-in-noise
task with that of a model inspired by automatic speech recognition techniques.
Listeners identified 3 keywords in simple 6-word sentences, each spoken by one of 18 male
or 16 female speakers and presented in speech-shaped noise. An across-speaker analysis of a
number of acoustic parameters (vocal tract length, mean fundamental frequency
and speaking rate) found none to be consistently good predictors of relative intelligibility.
A simple measure of degree of energetic masking was a good predictor of
female speech intelligibility, especially in high noise conditions, but failed to account
for interspeaker differences for the male group. A glimpsing model, which combined
a simulation of energetic masking with speaker-dependent statistical models, produced
recognition scores which were fitted to the behavioural data pooled across
all speakers. Using a single set of speaker-independent, noise-level-independent parameters,
the model not only predicted the intelligibility of individual
speakers to a remarkable degree, but also accounted for most of the token-wise
intelligibilities of the letter keywords. The fit was particularly good in high noise