Abstract
This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at the different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), which is able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are then used within a spatial covariance mixing model based on Complex Non-Negative Matrix Factorization (CNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CNMF method outperforms both the individual monophonic DL-based separation and the multichannel CNMF baseline methods.
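To illustrate how monophonic magnitude estimates can be combined with per-source spatial covariance information, the following minimal NumPy sketch applies a multichannel Wiener filter built from per-source power spectrograms and frequency-dependent spatial covariance matrices. This is a sketch under our own assumptions (the function name, array shapes, and regularization constant are illustrative), not the paper's actual DL+CNMF implementation.

```python
import numpy as np

def multichannel_wiener_filter(mix_stft, mag_estimates, spatial_cov):
    """Hypothetical illustration of spatial-covariance-based separation.

    mix_stft      : (F, T, C) complex mixture STFT
    mag_estimates : list of (F, T) non-negative magnitude estimates, one per source
    spatial_cov   : list of (F, C, C) Hermitian PSD spatial covariance matrices
    Returns (n_src, F, T, C) complex multichannel source images.
    """
    F, T, C = mix_stft.shape
    n_src = len(mag_estimates)

    # Per-source covariance model: power spectrogram times spatial covariance.
    cov = np.zeros((n_src, F, T, C, C), dtype=complex)
    for j in range(n_src):
        power = mag_estimates[j] ** 2                                  # (F, T)
        cov[j] = power[..., None, None] * spatial_cov[j][:, None]      # (F, T, C, C)

    # Mixture covariance is the sum over sources, plus a small regularizer
    # (illustrative value) to keep the inversion well conditioned.
    mix_cov = cov.sum(axis=0) + 1e-9 * np.eye(C)

    # Wiener gain and multichannel image for each source.
    inv_mix = np.linalg.inv(mix_cov)                                   # (F, T, C, C)
    images = np.zeros((n_src, F, T, C), dtype=complex)
    for j in range(n_src):
        gain = cov[j] @ inv_mix                                        # (F, T, C, C)
        images[j] = np.einsum('ftcd,ftd->ftc', gain, mix_stft)
    return images
```

In the setting described by the abstract, the magnitude estimates would come from the DL model (MaD TwinNet) and the spatial covariances from the CNMF model; in this sketch both are simply treated as given inputs.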
Original language | English |
---|---|
Publication status | Published - 2021 |
Publication type | Not Eligible |
Event | IEEE International Workshop on Multimedia Signal Processing, Tampere, Finland. Duration: 6 Oct 2021 → 8 Oct 2021. https://attend.ieee.org/mmsp-2021/ |
Conference
Conference | IEEE International Workshop on Multimedia Signal Processing |
---|---|
Abbreviated title | IEEE MMSP 2021 |
Country/Territory | Finland |
City | Tampere |
Period | 6/10/21 → 8/10/21 |
Internet address | https://attend.ieee.org/mmsp-2021/ |