Abstract
Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled from a domain of choice. The resulting weights are then applied to a coupled output dictionary containing exemplars sampled in the short-time Fourier transform (STFT) domain to directly obtain the speech and noise estimates for speech enhancement. In this work, settings using input dictionaries of exemplars sampled from the STFT, Mel-integrated magnitude STFT and modulation envelope spectra are evaluated. Experiments performed on the AURORA-4 database revealed that these pre-processing stages can improve the performance of DNN-HMM-based ASR systems with both clean and multi-condition training.
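The coupled-dictionary step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the solver, the cost function or the final filtering step, so the following are assumptions made only for illustration: non-negative weights are found with multiplicative updates for the generalised Kullback-Leibler divergence, each exemplar is a single frame, and the speech and noise estimates are combined into a Wiener-style mask. All function and variable names (`estimate_activations`, `enhance`, `A_in_speech`, and so on) are hypothetical.

```python
import numpy as np


def estimate_activations(Y, A, n_iter=200, eps=1e-12):
    """Find non-negative weights X with Y ~= A @ X, keeping the dictionary A fixed.

    Assumption: multiplicative updates for the generalised KL divergence.
    Y: (features, frames) non-negative observations in the input domain.
    A: (features, atoms) non-negative input dictionary.
    """
    rng = np.random.default_rng(0)
    X = rng.random((A.shape[1], Y.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None] + eps  # A^T 1, constant across iterations
    for _ in range(n_iter):
        ratio = Y / (A @ X + eps)
        X *= (A.T @ ratio) / col_sums
    return X


def enhance(Y_in, Y_stft_mag, A_in_speech, A_in_noise,
            A_out_speech, A_out_noise, eps=1e-12):
    """Decompose in the input domain, reconstruct in the STFT domain.

    The input and output dictionaries are coupled: atom k of an input
    dictionary and atom k of the matching output dictionary are assumed
    to come from the same piece of training audio.
    """
    A_in = np.hstack([A_in_speech, A_in_noise])
    X = estimate_activations(Y_in, A_in)
    n_speech = A_in_speech.shape[1]
    # Apply the weights to the coupled STFT-domain dictionaries.
    S_hat = A_out_speech @ X[:n_speech]
    N_hat = A_out_noise @ X[n_speech:]
    # Assumption: Wiener-style mask applied to the noisy STFT magnitude.
    mask = S_hat / (S_hat + N_hat + eps)
    return mask * Y_stft_mag, S_hat, N_hat
```

Here `Y_in` would hold the noisy features in the chosen input domain (STFT, Mel-integrated magnitude STFT or modulation envelope spectra), while `Y_stft_mag` holds the noisy STFT magnitudes of the same frames. The enhanced magnitudes would then be combined with the noisy phase and inverted before being passed to the recogniser.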
Original language | English |
---|---|
Title of host publication | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publisher | IEEE |
Pages | 4485-4489 |
Number of pages | 5 |
ISBN (Print) | 9781467369978 |
Publication status | Published - 4 Aug 2015 |
Publication type | A4 Article in conference proceedings |
Event | IEEE International Conference on Acoustics, Speech and Signal Processing |
Keywords
- coupled dictionaries
- deep neural networks
- modulation envelope
- non-negative matrix factorisation
- speech enhancement
Publication forum classification
- Publication forum level 1
ASJC Scopus subject areas
- Signal Processing
- Software
- Electrical and Electronic Engineering