MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    10 Citations (Scopus)

    Abstract

    The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work, we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique for regularizing a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an improvement in signal-to-distortion ratio (SDR) of 0.37 dB and in signal-to-interference ratio (SIR) of 0.23 dB compared to previous SOTA results.
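
To give a sense of what a gain expressed in dB means for a metric such as SDR, the following is a minimal sketch of a simplified signal-to-distortion ratio. It assumes the plain energy-ratio form, 10·log10(‖s‖² / ‖s − ŝ‖²); evaluation toolkits typically decompose the error further into interference and artifact terms, so this is not the evaluation procedure used for the reported figures. The function name `sdr_db` and the toy signals are hypothetical and chosen only for illustration.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: plain energy ratio, without the error decomposition
    that full separation-evaluation toolkits perform.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy data (hypothetical): a reference "voice" and two estimates of it.
reference = [1.0, -1.0, 1.0, -1.0]
estimate_a = [0.9, -0.9, 0.9, -0.9]  # closer to the reference
estimate_b = [0.8, -0.8, 0.8, -0.8]  # further from the reference

sdr_a = sdr_db(reference, estimate_a)
sdr_b = sdr_db(reference, estimate_b)

# An improvement "in dB" is the difference between two SDR values.
gain = sdr_a - sdr_b
print(f"SDR A: {sdr_a:.2f} dB, SDR B: {sdr_b:.2f} dB, gain: {gain:.2f} dB")
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed multiplicative reduction in error energy relative to the signal energy, regardless of the absolute SDR level.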
    Original language: English
    Title of host publication: 2018 International Joint Conference on Neural Networks (IJCNN)
    Publisher: IEEE
    ISBN (Electronic): 978-1-5090-6014-6
    ISBN (Print): 978-1-5090-6015-3
    Publication status: Published - 10 Jul 2018
    Publication type: A4 Article in a conference publication
    Event: International Joint Conference on Neural Networks
    Duration: 1 Jan 1900 → …

    Publication series

    ISSN (Electronic): 2161-4407

    Publication forum classification

    • Publication forum level 1
