Noise Robust Speaker Recognition with Convolutive Sparse Coding

Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

    13 Citations (Scopus)


    Recognition and classification of speech content in everyday environments is challenging due to the large diversity of real-world noise sources, which may also include competing speech. At signal-to-noise ratios (SNRs) below 0 dB, a majority of features may become corrupted, severely degrading the performance of classifiers built upon clean observations of a target class. As the energy and complexity of competing sources increase, their explicit modelling becomes integral for successful detection and classification of target speech. We have previously demonstrated how non-negative compositional modelling in a spectrogram space is suitable for robust recognition of speech and speakers even at low SNRs. In this work, the sparse coding approach is extended to cover the whole separation and classification chain to recognise the speaker of short utterances in difficult noise environments. A convolutive matrix factorisation and coding system is evaluated on 2nd CHiME Track 1 data. Over 98% average speaker recognition accuracy is achieved for utterances shorter than three seconds at SNRs from +9 to -6 dB, illustrating the system's performance in challenging conditions.
    Original language: English
    Title of host publication: INTERSPEECH 2015
    Subtitle of host publication: 16th Annual Conference of the International Speech Communication Association
    Publisher: International Speech Communication Association (ISCA)
    Number of pages: 5
    Publication status: Published - 2015
    Publication type: A4 Article in conference proceedings
    Event: Interspeech
    Duration: 1 Jan 1900 → …

    Publication series

    Name: Annual Conference of the International Speech Communication Association
    ISSN (Print): 1990-9772


    Period: 1/01/00 → …

    Publication forum classification

    • Publication forum level 1

