Publications

Model-driven detection of clean speech patches in noise

Authors
Jonny Laidler, Martin Cooke, Neil Lawrence.
Conference
Interspeech
Year
2007
Location
Antwerp

Listeners may be able to recognise speech in adverse conditions by “glimpsing” time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A “cleanness” measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
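The patch representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of retained coefficients (`keep`), the diagonal-covariance form of the mixture and the idea of comparing a speech model against a background model by log-likelihood are all assumptions made for illustration. A patch is taken to be a small grid of spectro-temporal energies (rows as frequency channels, columns as time frames).

```python
import math


def dct2(patch):
    """Orthonormal 2D DCT-II of a patch given as a list of equal-length rows."""
    n_rows, n_cols = len(patch), len(patch[0])

    def basis(k, n, size):
        # Orthonormal DCT-II basis function k evaluated at sample n.
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        return scale * math.cos(math.pi * (n + 0.5) * k / size)

    return [
        [
            sum(
                patch[r][c] * basis(u, r, n_rows) * basis(v, c, n_cols)
                for r in range(n_rows)
                for c in range(n_cols)
            )
            for v in range(n_cols)
        ]
        for u in range(n_rows)
    ]


def compact_features(patch, keep=3):
    """Flatten the low-order keep x keep block of DCT coefficients (assumed size)."""
    coeffs = dct2(patch)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance Gaussian mixture."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def looks_like_speech(patch, speech_gmm, background_gmm, keep=3):
    """Assumed decision rule: the speech model scores higher than the background model."""
    x = compact_features(patch, keep)
    return gmm_loglik(x, *speech_gmm) > gmm_loglik(x, *background_gmm)
```

Each model here is a hypothetical `(weights, means, variances)` tuple. In practice the parameters would be estimated from training patches, for example with expectation-maximisation, which this sketch does not cover.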