Using a Cascade of Asymmetric Resonators with Fast-Acting Compression as a Cochlear Model for Machine-Hearing Applications


Every day, machines process many thousands of hours of audio signals through a realistic cochlear model. They extract features, inform classifiers and recommenders, and identify copyrighted material. The machine-hearing approach to such tasks has taken root in recent years because hearing-based methods outperform more conventional sound-analysis techniques. We use a bio-mimetic "cascade of asymmetric resonators with fast-acting compression" (CAR-FAC), an efficient sound analyzer that incorporates the hearing research community's findings on nonlinear auditory filter models and cochlear wave mechanics. The CAR-FAC is based on a pole–zero filter cascade (PZFC) model of auditory filtering, combined with a multi-time-scale coupled automatic-gain-control (AGC) network. It uses simple nonlinear extensions of conventional digital filter stages, and its low complexity makes it fast to run. Together, the PZFC and the AGC network mimic features of auditory physiology such as masking, compressive traveling-wave response, and the stability of zero-crossing times with signal level. The CAR-FAC's output "neural activity pattern" is converted to a "stabilized auditory image" to capture pitch, melody, and other temporal and spectral features of the sound.
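To make the architecture concrete, the following is a minimal toy sketch of the idea, not the CAR-FAC implementation itself. It chains two-pole, two-zero stages from high to low center frequency, taps the output of every stage as one channel, and lets a smoothed, rectified copy of each channel's output raise that stage's pole damping. The function name `toy_filter_cascade`, every parameter value, and the gain control (uncoupled, with a single time constant, unlike the coupled multi-time-scale AGC network described above) are assumptions chosen purely for illustration.

```python
import math
import numpy as np

def toy_filter_cascade(x, fs, n_stages=20, f_hi=4000.0, f_lo=200.0,
                       base_damping=0.2, max_damping=0.5,
                       agc_strength=5.0, agc_tau=0.005):
    """Toy pole-zero filter cascade with level-dependent pole damping.

    Hypothetical illustration only. Returns (center_freqs, outputs), where
    outputs has shape (n_stages, len(x)) and row k is the tap after stage k.
    """
    cfs = np.geomspace(f_hi, f_lo, n_stages)        # high to low frequency
    theta_p = 2.0 * np.pi * cfs / fs                # pole angles
    theta_z = np.minimum(1.4 * theta_p, 0.95 * np.pi)  # zeros above the poles
    r_z = np.exp(-0.1 * theta_z)                    # fixed zero radius
    b1 = -2.0 * r_z * np.cos(theta_z)
    b2 = r_z ** 2
    sum_b = 1.0 + b1 + b2                           # numerator value at DC
    cos_p = np.cos(theta_p)
    eps = 1.0 - math.exp(-1.0 / (agc_tau * fs))     # one-pole smoother step

    x1 = np.zeros(n_stages); x2 = np.zeros(n_stages)   # past stage inputs
    y1 = np.zeros(n_stages); y2 = np.zeros(n_stages)   # past stage outputs
    agc = np.zeros(n_stages)                           # per-stage level state
    out = np.zeros((n_stages, len(x)))

    for n in range(len(x)):
        s = float(x[n])
        for k in range(n_stages):
            # More output level means more damping, hence less resonant gain.
            d = min(base_damping * (1.0 + agc_strength * agc[k]), max_damping)
            r = math.exp(-d * theta_p[k])           # pole radius, always < 1
            a1 = -2.0 * r * cos_p[k]
            a2 = r * r
            g = (1.0 + a1 + a2) / sum_b[k]          # keep unity gain at DC
            y = g * (s + b1[k] * x1[k] + b2[k] * x2[k]) - a1 * y1[k] - a2 * y2[k]
            x2[k], x1[k] = x1[k], s
            y2[k], y1[k] = y1[k], y
            # Smooth the half-wave-rectified output to track signal level.
            agc[k] += eps * (max(y, 0.0) - agc[k])
            out[k, n] = y
            s = y                                   # feed the next stage
    return cfs, out

# Usage: the same 1 kHz tone at two levels 40 dB apart.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)
cfs, quiet = toy_filter_cascade(0.01 * tone, fs)
_, loud = toy_filter_cascade(1.0 * tone, fs)

half = len(t) // 2                                  # skip the onset transient
peak_quiet = np.abs(quiet[:, half:]).max(axis=1)
peak_loud = np.abs(loud[:, half:]).max(axis=1)
best = int(np.argmax(peak_quiet))                   # most responsive channel
level_ratio = peak_loud[best] / peak_quiet[best]
```

In this sketch the input levels differ by a factor of 100, and `level_ratio` shows how much of that difference survives at the most responsive channel once the damping has adapted. A value below 100 indicates compression. The rows of `quiet` and `loud` play the role of a crude multichannel output. The full model's hair-cell stage and stabilized auditory image are deliberately left out.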