Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition

Deepak Baby, Tuomas Virtanen, Jort F. Gemmeke, Hugo Van hamme

    Research output: Contribution to journal › Article › Scientific › peer-review

    20 Citations (Scopus)

    Abstract

    Exemplar-based speech enhancement systems decompose noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary, and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain that enhances the noisy speech. To obtain the decomposition, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain because of their reduced computational complexity and their ability to generalize better to unseen cases. However, the resulting filter may be sub-optimal, because mapping the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space, used to obtain the decomposition, and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for speech recognition tasks using HMM-GMM and deep neural network (DNN) based systems.
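
The coupled-dictionary idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a KL-divergence non-negative sparse coding step with multiplicative updates, a Wiener-like gain, and dictionaries whose first `n_speech` columns are speech atoms followed by noise atoms. All names (`sparse_activations`, `coupled_enhance`, `A_in`, `A_out`, `sparsity`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def sparse_activations(y, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Non-negative sparse coding of y (d,) on dictionary A (d, k).

    Assumption: KL-divergence cost with an L1 penalty, minimized with
    multiplicative updates. The paper's exact solver may differ.
    """
    x = np.ones(A.shape[1])
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)
        x *= (A.T @ ratio) / (col_sums + sparsity + eps)
    return x


def coupled_enhance(y_in, y_full, A_in, A_out, n_speech, **kwargs):
    """Decompose in the input space, reconstruct with the coupled output dictionary.

    A_in  : (d_in, k)   input dictionary (e.g. low-dimensional exemplars)
    A_out : (d_full, k) coupled full-resolution dictionary, same atom order
    Assumption: columns [0, n_speech) are speech atoms, the rest are noise.
    """
    # Activations are estimated only in the (cheaper) input exemplar space.
    x = sparse_activations(y_in, A_in, **kwargs)
    # The same activations weight the full-resolution atoms directly.
    speech_hat = A_out[:, :n_speech] @ x[:n_speech]
    noise_hat = A_out[:, n_speech:] @ x[n_speech:]
    # Wiener-like gain in the full-resolution frequency domain.
    gain = speech_hat / (speech_hat + noise_hat + 1e-12)
    return gain * y_full, gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_full, d_in, n_speech, n_noise = 64, 16, 20, 10
    A_out = rng.random((d_full, n_speech + n_noise))
    # Hypothetical non-negative pooling matrix standing in for a mel-like mapping.
    M = rng.random((d_in, d_full))
    A_in = M @ A_out
    y_full = A_out @ rng.random(n_speech + n_noise)
    y_in = M @ y_full
    enhanced, gain = coupled_enhance(y_in, y_full, A_in, A_out, n_speech)
    print(enhanced.shape, float(gain.min()), float(gain.max()))
```

Because the activations weight full-resolution atoms directly, the speech and noise estimates are not restricted to what can be mapped back from the lower-dimensional input space, which is the limitation the abstract attributes to the conventional approach.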

    Original language: English
    Pages (from-to): 1788-1799
    Number of pages: 12
    Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
    Volume: 23
    Issue number: 11
    DOIs
    Publication status: Published - 1 Nov 2015
    Publication type: A1 Journal article-refereed

    Keywords

    • Exemplar-based
    • Modulation envelope
    • Noise robust automatic speech recognition
    • Non-negative sparse coding

    Publication forum classification

    • Publication forum level 1

    ASJC Scopus subject areas

    • Signal Processing
    • Electrical and Electronic Engineering
    • Media Technology
    • Acoustics and Ultrasonics
    • Instrumentation
    • Linguistics and Language
    • Speech and Hearing
