Journal of the Phonetic Society of Japan, volume 19: 1-12.
The extent to which language learners hear non-native sounds in terms of native categories depends in part on acoustic and auditory similarities between the two sets of sounds. One unresolved issue is the choice of parameter space in which similarity should be measured. The current paper demonstrates an unsupervised, corpus-based, data-driven mapping technique that permits the use of rich, high-dimensional data representations, obviating the need for prior commitment to specific low-order speech parameters such as formant frequencies. The approach, known as generative topographic mapping, preserves the structure of the high-dimensional space while mapping to a lower-dimensional space. We show how this low-dimensional latent space can be used for tasks such as visualising the location of L2 consonants in an existing L1 space and measuring the effect of L2 exposure on the representation of both L2 and L1 consonants. The results are compared with data from a behavioural study in which Chinese listeners underwent an intensive training regime on Spanish consonants.
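For readers unfamiliar with generative topographic mapping, the following is a minimal NumPy sketch of the general technique: a regular 2-D latent grid is mapped into data space through radial basis functions and fitted with EM, after which new points are projected to their posterior-mean latent position. It is not the implementation used in the paper. The function names (`fit_gtm`, `project`), the grid sizes, the regularisation value, and the PCA-based initialisation are all illustrative assumptions, and the "L1" and "L2" arrays are random placeholders rather than speech features.

```python
import numpy as np

def sq_dists(a, b):
    """Squared Euclidean distances between rows of a (n, d) and b (m, d)."""
    d = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)

def grid2d(n_side):
    """Regular n_side x n_side grid on [-1, 1]^2, one point per row."""
    g = np.linspace(-1.0, 1.0, n_side)
    return np.array([(x, y) for x in g for y in g])

def responsibilities(Y, T, beta):
    """Posterior probability of each latent grid point given each data point."""
    log_r = -0.5 * beta * sq_dists(Y, T)           # shape (K, N)
    log_r -= log_r.max(axis=0, keepdims=True)      # numerical stability
    R = np.exp(log_r)
    return R / R.sum(axis=0, keepdims=True)

def fit_gtm(T, latent_side=10, rbf_side=4, rbf_width=1.0, reg=1e-3, n_iter=30):
    """Fit a 2-D GTM to data T (N, D) with EM. All defaults are assumptions."""
    N, D = T.shape
    X = grid2d(latent_side)                        # latent grid points (K, 2)
    C = grid2d(rbf_side)                           # RBF centres in latent space
    sigma = rbf_width * 2.0 / (rbf_side - 1)       # width = spacing of centres
    Phi = np.exp(-sq_dists(X, C) / (2.0 * sigma**2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])   # bias column

    # Initialise the mapping so the grid spans the first two principal axes.
    mu = T.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(T - mu, rowvar=False))
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    Y = mu + X @ (evecs[:, :2] * np.sqrt(np.maximum(evals[:2], 0.0))).T
    W = np.linalg.lstsq(Phi, Y, rcond=None)[0]
    Y = Phi @ W

    # Initial noise precision from residual variance or grid spacing.
    dyy = sq_dists(Y, Y)
    np.fill_diagonal(dyy, np.inf)
    resid = evals[2] if D > 2 else 0.0
    beta = 1.0 / max(resid, 0.5 * dyy.min(axis=1).mean(), 1e-12)

    for _ in range(n_iter):
        R = responsibilities(Y, T, beta)           # E-step
        G = R.sum(axis=1)
        A = Phi.T @ (G[:, None] * Phi) + (reg / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ (R @ T))    # M-step for the weights
        Y = Phi @ W
        beta = N * D / max((R * sq_dists(Y, T)).sum(), 1e-12)
    return {"X": X, "Phi": Phi, "W": W, "beta": beta}

def project(model, T):
    """Posterior-mean latent coordinates (N, 2) for data T under a fitted model."""
    Y = model["Phi"] @ model["W"]
    R = responsibilities(Y, T, model["beta"])
    return R.T @ model["X"]

# Placeholder data only: random vectors standing in for L1 and L2 tokens.
rng = np.random.default_rng(0)
l1_tokens = rng.normal(size=(200, 12))
l2_tokens = rng.normal(loc=0.5, size=(50, 12))

l1_model = fit_gtm(l1_tokens)                      # build the "L1 space"
l2_latent = project(l1_model, l2_tokens)           # place L2 tokens in it
print(l2_latent.shape)                             # (50, 2)
```

Fitting on one set of tokens and projecting another mirrors the idea of locating L2 consonants in an existing L1 space. Refitting after adding L2 tokens and comparing the projections would be one way to approximate the effect of exposure, though the procedure actually used in the study may differ.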