
A FREQUENCY-WEIGHTED POST-FILTERING TRANSFORM FOR COMPENSATION OF THE OVER-SMOOTHING EFFECT IN HMM-BASED SPEECH SYNTHESIS

Abstract

Over-smoothing is one of the major sources of quality degradation in statistical parametric speech synthesis. Many methods have been proposed to compensate for over-smoothing, with the speech parameter generation algorithm considering Global Variance (GV) being one of the most successful. This paper models over-smoothing as a radial relocation of the poles and zeros of the spectral envelope towards the origin of the z-plane and uses radial scaling to enhance spectral peaks and to deepen spectral valleys. The radial scaling technique is improved by introducing over-emphasis, spectral-tilt compensation and frequency weighting. Listening test results indicate that the proposed method is preferred over GV by a margin of 11%-13%, while it has less algorithmic delay (only 5 ms) and lower computational complexity.
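The basic idea of radial scaling can be illustrated with a minimal sketch. It assumes an all-pole envelope 1/A(z) with A(z) = sum_k a_k z^-k, and uses the standard identity that replacing a_k by a_k * gamma^k moves every root of A(z) from p to gamma * p. The function names, the example coefficients and the value gamma = 1.1 are hypothetical choices for illustration only. The sketch does not include the over-emphasis, spectral-tilt compensation or frequency weighting mentioned above, and it is not the paper's implementation.

```python
import numpy as np


def radial_scale(coeffs, gamma):
    """Scale the roots of A(z) = sum_k coeffs[k] * z**-k radially by gamma.

    Multiplying coeffs[k] by gamma**k yields A(z / gamma), whose roots are
    gamma times the roots of A(z). gamma > 1 pushes roots away from the
    origin (sharper peaks for an all-pole envelope); gamma < 1 pulls them in.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    return coeffs * gamma ** np.arange(len(coeffs))


def allpole_envelope(coeffs, n_fft=512):
    """Magnitude of 1/A(e^{jw}) on n_fft//2 + 1 frequencies in [0, pi]."""
    return 1.0 / np.abs(np.fft.rfft(coeffs, n_fft))


# Hypothetical "over-smoothed" second-order envelope: poles at 0.6 +/- 0.6j.
a_smooth = np.array([1.0, -1.2, 0.72])
gamma = 1.1  # illustrative value, assumed here

a_sharp = radial_scale(a_smooth, gamma)

r_before = np.abs(np.roots(a_smooth))
r_after = np.abs(np.roots(a_sharp))

# The scaled filter must remain stable: all poles inside the unit circle.
is_stable = bool(np.all(r_after < 1.0))

peak_before = allpole_envelope(a_smooth).max()
peak_after = allpole_envelope(a_sharp).max()
```

In this sketch the pole radii grow by exactly the factor gamma, so the spectral peak of the envelope becomes higher and narrower. The scaling factor has to be small enough that no pole reaches the unit circle. Applying the same coefficient scaling to a numerator polynomial would move zeros outward in the same way, which deepens spectral valleys.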