Models that describe the perception of foreign-language sounds typically do so via qualitative relations with sounds in the first language, but a more fine-grained account of learners’ perceptual difficulties might be obtained with techniques from automatic speech recognition. Here, we employ the generative statistical models of speech that underlie most work in speech recognition, hidden Markov models, to learn the sound systems of English and Mandarin Chinese, and use these as the basis for a quantitative model of the perception of English intervocalic consonants by Chinese listeners. This approach allows both the prediction of consonant identification preferences and the construction of a complete cross-language consonant distance matrix, which can then be compared to listeners’ consonant categorisations and goodness ratings, respectively. To evaluate the model, 30 native Chinese listeners with moderate English competence and residing in China categorised and rated English intervocalic consonants. Categorisation showed a high degree of consistency between listeners and the computer model, and a clear, significant correlation was found between goodness ratings and model distances, suggesting that the technique has potential for predicting sound-level difficulties experienced by language learners. One key difference between listeners and the model lies in the relative influence of acoustic and orthographic factors in sound categorisation: while model decisions and distances are based solely on acoustic information, listener responses show clear evidence of orthographic interference. We explored this issue by comparing the use of Chinese characters versus Pinyin graphemes in the task. Listeners presented with Pinyin showed some orthographic influences, while a dialect influence appeared in the character group.
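The HMM-based categorisation idea can be illustrated with a toy sketch: each sound category is represented by its own hidden Markov model, and a stimulus is assigned to the category whose model gives it the highest likelihood under the forward algorithm. Everything below is hypothetical (invented parameters and a crude discrete symbol alphabet standing in for real acoustic features), not the actual system described in the paper:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm to avoid underflow."""
    alpha = pi * B[:, obs[0]]          # initial forward probabilities
    loglik = 0.0
    for o in obs[1:]:
        c = alpha.sum()                # scaling constant at this step
        loglik += np.log(c)
        alpha = (alpha / c) @ A * B[:, o]
    return loglik + np.log(alpha.sum())

# Two hypothetical two-state consonant models over a toy 3-symbol
# acoustic alphabet (0 = silence, 1 = burst, 2 = frication).
pi = np.array([1.0, 0.0])
A = np.array([[0.6, 0.4],
              [0.0, 1.0]])            # left-to-right topology
B_stop = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])  # second state favours bursts
B_fricative = np.array([[0.7, 0.1, 0.2],
                        [0.1, 0.1, 0.8]])  # favours frication

obs = [0, 1, 1]                       # a burst-like stimulus
scores = {"stop": forward_loglik(obs, pi, A, B_stop),
          "fricative": forward_loglik(obs, pi, A, B_fricative)}
print(max(scores, key=scores.get))    # → stop
```

In the same spirit, a cross-language distance between two sounds can be read off from how much better each sound's own model scores it than the competing model does; the paper's distance matrix generalises this comparison across all consonant pairs.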