Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms

Tom Barker, Tuomas Virtanen

    Research output: Contribution to journal › Article › Scientific › peer-review

    8 Citations (Scopus)


    This paper presents an algorithm for unsupervised single-channel source separation of audio mixtures. The approach specifically addresses the challenging case of separation where no training data are available. By representing mixtures in the modulation spectrogram (MS) domain, we exploit underlying similarities in patterns present across frequency. A three-dimensional tensor factorization is able to take advantage of these redundant patterns, and is used to separate a mixture into an approximated sum of components by minimizing a divergence cost. Furthermore, we show that the basic tensor factorization can be extended with convolution in time to improve separation results, and we provide update rules for learning components under this convolutive model. Following factorization, sources are reconstructed in the audio domain from estimated components using a novel approach based on reconstruction masks that are learned using MS activations and then applied to a mixture spectrogram. We demonstrate that the proposed method achieves better separation performance than a spectrally based nonnegative matrix factorization approach, in terms of source-to-distortion ratio. We also compare separation with the perceptually motivated interference-related perceptual score metric and identify cases with higher performance.
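
    The factorization step described above can be illustrated with a minimal sketch. The abstract does not state which divergence is minimized or give the update rules, so the code below assumes the Kullback-Leibler divergence and the standard multiplicative updates for a three-way CP (PARAFAC) model. The function names (`cp_model`, `ntf_kl`, `kl_divergence`), the axis ordering, and the random initialization are hypothetical choices for illustration, not the authors' implementation. The convolutive extension and the mask-based reconstruction are not covered.

```python
import numpy as np


def cp_model(G, A, S):
    """Sum of K rank-one components: Xhat[r, n, m] = sum_k G[r,k] A[n,k] S[m,k]."""
    return np.einsum("rk,nk,mk->rnm", G, A, S)


def kl_divergence(X, Y, eps=1e-12):
    """Generalized Kullback-Leibler divergence between nonnegative tensors."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


def ntf_kl(X, n_components, n_iter=200, eps=1e-12, seed=0):
    """Nonnegative CP factorization of a 3-way tensor X (assumed KL cost).

    Returns factor matrices G (R x K), A (N x K), S (M x K), all nonnegative.
    """
    rng = np.random.default_rng(seed)
    R, N, M = X.shape
    # Random nonnegative initialization (an assumption of this sketch).
    G = rng.random((R, n_components)) + eps
    A = rng.random((N, n_components)) + eps
    S = rng.random((M, n_components)) + eps

    for _ in range(n_iter):
        # Each update multiplies a factor by a nonnegative ratio,
        # so nonnegativity is preserved throughout.
        Q = X / (cp_model(G, A, S) + eps)
        G *= np.einsum("rnm,nk,mk->rk", Q, A, S) / (A.sum(0) * S.sum(0) + eps)
        Q = X / (cp_model(G, A, S) + eps)
        A *= np.einsum("rnm,rk,mk->nk", Q, G, S) / (G.sum(0) * S.sum(0) + eps)
        Q = X / (cp_model(G, A, S) + eps)
        S *= np.einsum("rnm,rk,nk->mk", Q, G, A) / (G.sum(0) * A.sum(0) + eps)
    return G, A, S
```

    In this sketch, `X` would hold a nonnegative modulation-spectrogram tensor, and each column index `k` across `G`, `A`, and `S` describes one component. How components are grouped into sources and turned into reconstruction masks is specific to the paper and is not attempted here.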

    Original language: English
    Pages (from-to): 2377-2389
    Number of pages: 13
    Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
    Issue number: 12
    Publication status: Published - 1 Dec 2016
    Publication type: A1 Journal article-refereed


    Keywords

    • Factorization
    • nonnegative matrix factorization (NMF)
    • source separation
    • speech enhancement

    Publication forum classification

    • Publication forum level 2

    ASJC Scopus subject areas

    • Signal Processing
    • Media Technology
    • Instrumentation
    • Acoustics and Ultrasonics
    • Linguistics and Language
    • Speech and Hearing
    • Electrical and Electronic Engineering

