Supervised Noise Reduction for Multichannel Keyword Spotting


This paper presents a robust, small-footprint, far-field keyword spotting (KWS) algorithm, which was inspired by the human auditory system’s ability to achieve the so-called cocktail party effect in adverse acoustic environments. It introduces the idea of combining microphone-array speech enhancement with machine learning, by incorporating a feedback path from the neural network (NN) KWS classifier to its signal preprocessing frontend so that frontend noise reduction can benefit from, and in turn, better serve backend machine intelligence. We find that the new system can significantly improve KWS performance for Google Home when there is strong music or TV noise in the background. While this innovative and successfully validated strategy of combining signal processing and machine learning is developed for KWS, its technical feasibility is presumably extensible to many other applications, including noise robust speaker identification and automatic speech recognition.