AI

A subband-based stationary-component suppression method using harmanics and power ratio for reverberant speech recognition

Abstract

This letter describes a preprocessing method called subband-based stationary-component suppression method using harmonics and power ratio (SHARP) processing for reverberant speech recognition. SHARP processing extends a previous algorithm called Suppression of Slowly varying components and the Falling edge (SSF), which suppresses the steady-state portions of subband spectral envelopes. The SSF algorithm tends to over-subtract these envelopes in highly reverberant environments when there are high levels of power in previous analysis frames. The proposed SHARP method prevents excessive suppression both by boosting the floor value using the harmonics in voiced speech segments and by inhibiting the subtraction for unvoiced speech by detecting frames in which power is concentrated in high-frequency channels. These modifications enable the SHARP algorithm to improve recognition accuracy by further reducing the mismatch between power contours of clean and reverberated speech. Experimental results indicate that the SHARP method provides better recognition accuracy in highly reverberant environments compared to the SSF algorithm. It is also shown that the performance of the SHARP method can be further improved by combining it with feature-space maximum likelihood linear regression (fMLLR).