Model-driven detection of clean speech patches in noise
Listeners may be able to recognise speech in adverse conditions by “glimpsing” time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, relying on primitive cues. This paper describes a model-driven approach in which a generative model gives the likelihood that a spectro-temporal patch of a noisy mixture represents speech. The focus is on patch size and patch modelling. Small patches lack discriminative power, while large patches are more likely to contain contributions from other sources. A “cleanness” measure reveals that a good patch size is one which spans a quarter of the speech frequency range and lasts 40 ms. Patches are modelled with Gaussian mixture models, and a compact representation based on a 2D discrete cosine transform yields reasonable speech/background discrimination.
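The abstract names three ingredients: fixed-size spectro-temporal patches, a compact 2D DCT representation of each patch, and Gaussian mixture models that score how speech-like a patch is. The NumPy sketch below strings them together under several assumptions that the abstract does not state. It assumes a 64-channel spectrogram with a 10 ms frame shift, so a quarter-range, 40 ms patch is 16 channels by 4 frames. Patches are tiled without overlap, only the 4 × 2 lowest-order DCT coefficients are kept, and the GMMs use diagonal covariances. As an added assumption, a second GMM is trained on background so that each patch can be labelled by a speech-versus-background log-likelihood ratio. All names (`extract_patches`, `fit_diag_gmm`, `detect_speech_patches`, and so on) are hypothetical, and the cleanness measure used to pick the patch size is not reproduced.

```python
import numpy as np

# Hypothetical analysis settings (assumptions, not taken from the paper).
N_CHANNELS = 64                        # spectral channels spanning the speech range
FRAME_SHIFT_MS = 10                    # hop between successive frames
PATCH_CHANNELS = N_CHANNELS // 4       # a quarter of the frequency range -> 16 channels
PATCH_FRAMES = 40 // FRAME_SHIFT_MS    # 40 ms -> 4 frames
N_KEEP = (4, 2)                        # low-order DCT coefficients kept (freq, time)


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m


def patch_features(patch, keep=N_KEEP):
    """2D DCT of a (channels x frames) patch, truncated to low-order terms."""
    coeffs = dct_matrix(patch.shape[0]) @ patch @ dct_matrix(patch.shape[1]).T
    return coeffs[: keep[0], : keep[1]].ravel()


def extract_patches(spec, n_ch=PATCH_CHANNELS, n_fr=PATCH_FRAMES):
    """Tile a (channels x frames) spectrogram into non-overlapping patches."""
    for c in range(0, spec.shape[0] - n_ch + 1, n_ch):
        for t in range(0, spec.shape[1] - n_fr + 1, n_fr):
            yield c, t, patch_features(spec[c : c + n_ch, t : t + n_fr])


def _logsumexp(a):
    """Numerically stable log-sum-exp over the last axis (keeps that axis)."""
    m = a.max(axis=-1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=-1, keepdims=True))


def _weighted_loglik(x, weights, means, var):
    """log w_k + log N(x | mean_k, diag(var_k)), shape (n_samples, k)."""
    diff = x[:, None, :] - means[None, :, :]
    ll = -0.5 * (np.log(2 * np.pi * var)[None] + diff**2 / var[None]).sum(axis=2)
    return ll + np.log(weights)[None]


def fit_diag_gmm(x, k, n_iter=50, var_floor=1e-6, seed=0):
    """Fit a diagonal-covariance GMM with plain EM; returns (weights, means, var)."""
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), size=k, replace=False)]
    var = np.tile(x.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        log_r = _weighted_loglik(x, weights, means, var)
        resp = np.exp(log_r - _logsumexp(log_r))           # E-step: responsibilities
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()                             # M-step updates
        means = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x**2) / nk[:, None] - means**2, var_floor)
    return weights, means, var


def gmm_loglik(x, gmm):
    """Per-sample log-likelihood under a fitted GMM."""
    return _logsumexp(_weighted_loglik(x, *gmm))[:, 0]


def detect_speech_patches(spec, speech_gmm, background_gmm, threshold=0.0):
    """Return (channel, frame) origins of patches the speech model explains better."""
    hits = []
    for c, t, feat in extract_patches(spec):
        llr = gmm_loglik(feat[None], speech_gmm)[0] - gmm_loglik(feat[None], background_gmm)[0]
        if llr > threshold:
            hits.append((c, t))
    return hits
```

Truncating the 2D DCT stands in here for the compact representation: the low-order coefficients summarise the coarse spectral shape and temporal trend of a patch in 8 numbers rather than the raw 16 × 4 = 64 values, which keeps diagonal-covariance GMM training tractable. The choice of how many coefficients to keep is an assumption and would need tuning on real data.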