Traditional intelligibility models are concerned with predicting the average number of words heard correctly in given noise conditions and can be readily tested by comparison with listener data. In contrast, recent ‘microscopic’ intelligibility models, which attempt to make precise predictions about a listener’s perception or misperception of specific utterances, have less well-defined
goals and are hard to compare. This paper presents a novel evaluation framework that standardises the procedure for microscopic model evaluation. The key problem is how to compare a model prediction made at a sub-lexical granularity with a set of listeners’ lexical responses, especially since not all listeners will hear a given noisy target word in the same way. Our approach is to phonetically align the target word
to each listener’s response and then, for each position in the target word, calculate both the probability of a phonetic error occurring and the probability distribution of possible phonetic substitutions. Models can then be evaluated according to their ability to estimate these probabilities using likelihood as a measure. This approach has been built into a framework for model evaluation and demonstrated using a recently released corpus of consistent speech misperceptions and a set of naive intelligibility models.
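The per-position statistics described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper’s released framework): it assumes each listener response has already been phonetically aligned to the target so that all sequences share the target’s positions, and it scores a model’s predicted substitution distributions by log-likelihood.

```python
from collections import Counter
import math

# Toy aligned data: a target word and four listeners' aligned responses.
# All phone labels and responses here are invented for illustration.
target = ["k", "ae", "t"]          # target word /k ae t/
responses = [                      # one aligned phone sequence per listener
    ["k", "ae", "t"],
    ["p", "ae", "t"],
    ["k", "ae", "p"],
    ["p", "ae", "t"],
]

n = len(responses)
error_prob = []    # probability of a phonetic error at each target position
subst_dist = []    # distribution over phones heard at each target position
for i, ref in enumerate(target):
    heard = Counter(r[i] for r in responses)
    error_prob.append(1 - heard[ref] / n)
    subst_dist.append({ph: c / n for ph, c in heard.items()})

def log_likelihood(predicted, responses):
    """Score a model's per-position phone distributions against the
    listeners' responses; higher (closer to zero) is better."""
    ll = 0.0
    for r in responses:
        for i, ph in enumerate(r):
            ll += math.log(predicted[i].get(ph, 1e-9))  # floor unseen phones
    return ll

print(error_prob)  # → [0.5, 0.0, 0.25]
```

Here a “model” is simply any set of per-position phone distributions; a real evaluation would compare distributions predicted by an intelligibility model against the empirical listener distributions computed above.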