Semi-supervised non-negative tensor factorisation of modulation spectrograms for monaural speech separation

T. Barker, T. Virtanen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    6 Citations (Scopus)

    Abstract

    This paper details the use of a semi-supervised approach to audio source separation. Where only a single source model is available, the model for an unknown source must be estimated. A mixture signal is separated through factorisation of a feature-tensor representation based on the modulation spectrogram. Harmonically related components tend to modulate in a similar fashion, and this redundancy of patterns can be isolated. This feature representation requires fewer parameters than spectrally based methods and so reduces overfitting. Following the tensor factorisation, the separated signals are reconstructed by learning appropriate Wiener-filter spectral parameters that are constrained by the activation parameters learned in the first stage. Strong results were obtained for two-speaker mixtures, where source separation performance exceeded that of the benchmark methods. Specifically, the proposed semi-supervised method outperformed both semi-supervised non-negative matrix factorisation and blind non-negative modulation spectrum tensor factorisation.
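
The factorisation step described in the abstract can be illustrated with a minimal sketch of a semi-supervised non-negative tensor factorisation in Python with NumPy. Several things here are assumptions rather than details taken from the paper: the three-way tensor layout (channel, modulation frequency, time frame), the rank-K CP model, the Euclidean-loss multiplicative updates, and every function and variable name. The abstract does not state the cost function or update rules used. The ratio mask at the end is only a simplified stand-in for the Wiener-filter reconstruction stage, which according to the abstract involves learning spectral parameters.

```python
import numpy as np


def cp_reconstruct(G, A, S):
    """Rank-K CP model: X_hat[i, j, t] = sum_k G[i, k] * A[j, k] * S[t, k]."""
    return np.einsum("ik,jk,tk->ijt", G, A, S)


def semi_supervised_ntf(X, G_known, A_known, n_unknown, n_iter=200, eps=1e-12, seed=0):
    """Factorise a non-negative tensor X while keeping the known source's bases fixed.

    G_known, A_known: pre-trained bases for the known source (assumed layout:
    channels x K_known and modulation frequencies x K_known). They are never updated.
    n_unknown: number of extra components learned for the unknown source.
    Activations S are learned for all components.
    """
    rng = np.random.default_rng(seed)
    n_i, n_j, n_t = X.shape
    n_known = G_known.shape[1]
    K = n_known + n_unknown

    # Known columns first, randomly initialised unknown columns after them.
    G = np.hstack([G_known, rng.random((n_i, n_unknown)) + eps])
    A = np.hstack([A_known, rng.random((n_j, n_unknown)) + eps])
    S = rng.random((n_t, K)) + eps
    free = slice(n_known, K)  # only these basis columns are updated

    errors = []
    for _ in range(n_iter):
        # Euclidean-loss multiplicative updates (an assumed choice of cost).
        X_hat = cp_reconstruct(G, A, S)
        G[:, free] *= np.einsum("ijt,jk,tk->ik", X, A[:, free], S[:, free]) / (
            np.einsum("ijt,jk,tk->ik", X_hat, A[:, free], S[:, free]) + eps
        )
        X_hat = cp_reconstruct(G, A, S)
        A[:, free] *= np.einsum("ijt,ik,tk->jk", X, G[:, free], S[:, free]) / (
            np.einsum("ijt,ik,tk->jk", X_hat, G[:, free], S[:, free]) + eps
        )
        X_hat = cp_reconstruct(G, A, S)
        S *= np.einsum("ijt,ik,jk->tk", X, G, A) / (
            np.einsum("ijt,ik,jk->tk", X_hat, G, A) + eps
        )
        errors.append(float(np.sum((X - cp_reconstruct(G, A, S)) ** 2)))
    return G, A, S, errors


def soft_masks(G, A, S, n_known, eps=1e-12):
    """Wiener-style ratio masks in the feature domain (simplified illustration)."""
    known = cp_reconstruct(G[:, :n_known], A[:, :n_known], S[:, :n_known])
    unknown = cp_reconstruct(G[:, n_known:], A[:, n_known:], S[:, n_known:])
    total = known + unknown + eps
    return known / total, unknown / total
```

In use, `G_known` and `A_known` would come from factorising training data for the known speaker alone; the mixture tensor is then passed to `semi_supervised_ntf`, and multiplying the mixture features by each mask from `soft_masks` gives a per-source estimate. Updating only the free columns keeps the known speaker's model intact while the extra components absorb the unknown source.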
    Original language: English
    Title of host publication: Neural Networks (IJCNN), 2014 International Joint Conference on
    Pages: 3556-3561
    Number of pages: 6
    Publication status: Published - 1 Jul 2014
    Publication type: A4 Article in conference proceedings
    Event: International Joint Conference on Neural Networks
    Duration: 1 Jan 1900 → …

    Keywords

    • Wiener filters
    • audio signal processing
    • matrix decomposition
    • signal reconstruction
    • source separation
    • speech processing
    • tensors
    • Wiener-filter spectral parameters
    • activation parameters
    • audio source separation
    • blind nonnegative modulation spectrum tensor factorisation
    • feature-tensor representation factorisation
    • harmonically-related component
    • mixture signal separation
    • modulation spectrograms
    • monaural speech separation
    • semisupervised nonnegative matrix factorisation
    • semisupervised nonnegative tensor factorisation
    • signal separation reconstruction
    • single-source model
    • source separation performance
    • spectrally-based method
    • two-speaker mixtures
    • Equations
    • Mathematical model
    • Modulation
    • Source separation
    • Spectrogram
    • Tensile stress
    • Training

    Publication forum classification

    • Publication forum level 1
