
Kernel Based Text-Independent Speaker Verification

Abstract

The goal of a person authentication system is to authenticate the claimed identity of a user. When this authentication is based on the voice of the user, regardless of what the user actually said, the system is called a text-independent speaker verification system. Speaker verification systems are increasingly used to secure personal information, particularly in mobile-phone-based applications. Furthermore, text-independent versions of speaker verification systems are the most widely used because of their simplicity: they do not require complex speech recognition modules. The most common approach to this task is based on Gaussian Mixture Models (GMMs), which do not take any temporal information into account. GMMs have been used extensively thanks to their good performance, especially in combination with the Maximum A Posteriori (MAP) adaptation algorithm. This approach is based on estimating the density of an impostor data distribution and then adapting it to a specific client data set. Note that estimating these densities is not the final goal of a speaker verification system, which is rather to discriminate between the client and impostor classes; hence discriminative approaches appear to be good candidates for this task as well. As a matter of fact, Support Vector Machine (SVM) based systems have been the subject of several recent publications in the speaker verification community, in which they obtain performance similar to or even better than that of GMMs on several text-independent speaker verification tasks. In order to use SVMs or any other discriminant approach for speaker verification, several modifications of the classical techniques need to be made. The purpose of this chapter is to present an overview of discriminant approaches that have been used successfully for text-independent speaker verification, and to analyze their differences and similarities with each other and with classical generative approaches based on GMMs.
An open-source version of the C++ source code used to perform all experiments described in this chapter can be found at http://speaker.abracadoudou.com.