Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CNMF

Antonio-Jesús Muñoz-Montoro, Julio J. Carabias-Orti, Archontis Politis, Konstantinos Drossos

Research output: Other conference contribution (Scientific)

Abstract

This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at the different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), which is able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are then used within a spatial covariance mixing model based on Complex Non-Negative Matrix Factorization (CNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that the joint DL+CNMF method outperforms both the individual monophonic DL-based separation and the multichannel CNMF baseline methods.
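As an illustrative sketch only (not the paper's implementation), the following shows the simpler idea that such monophonic spectrogram estimates enable: building soft Wiener-style masks from per-source magnitude estimates and applying them to each channel of a multichannel mixture STFT. All array names and sizes here are hypothetical; the actual method instead fits a DOA-constrained CNMF spatial covariance model to the channels.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, C, S = 257, 100, 2, 2  # freq bins, frames, channels, sources

# Hypothetical data: complex multichannel mixture STFT and per-source
# magnitude-spectrogram estimates (e.g. from a network such as MaD TwinNet).
mix_stft = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
src_mags = rng.random((S, F, T))

def wiener_masks(mags, eps=1e-12):
    """Soft masks from squared magnitudes; they sum to ~1 over sources."""
    power = mags ** 2
    return power / (power.sum(axis=0, keepdims=True) + eps)

masks = wiener_masks(src_mags)                        # shape (S, F, T)
sep = masks[:, None, :, :] * mix_stft[None, :, :, :]  # shape (S, C, F, T)

# The masked source estimates reconstruct the mixture (up to eps).
print(np.allclose(sep.sum(axis=0), mix_stft, atol=1e-6))
```

Per-channel masking like this ignores inter-channel spatial structure, which is exactly the gap the CNMF spatial covariance model in the paper is designed to fill.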
Original language: English
Publication status: Published - 2021
Publication type: Not Eligible
Event: IEEE International Workshop on Multimedia Signal Processing - Tampere, Finland
Duration: 6 Oct 2021 - 8 Oct 2021
https://attend.ieee.org/mmsp-2021/

Conference

Conference: IEEE International Workshop on Multimedia Signal Processing
Abbreviated title: IEEE MMSP 2021
Country/Territory: Finland
City: Tampere
Period: 6/10/21 - 8/10/21
Internet address: https://attend.ieee.org/mmsp-2021/

