Unsupervised Interpretable Representation Learning for Singing Voice Separation

Stylianos Ioannis Mimilakis, Konstantinos Drossos, Gerald Schuller

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Abstract

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives. It relies on a denoising auto-encoder that uses a simple sinusoidal model as its decoding function to reconstruct the singing voice. To demonstrate the benefits of our method, we apply the obtained representations to the task of informed singing voice separation via binary masking, and we measure the resulting separation quality with the scale-invariant signal-to-distortion ratio (SI-SDR). Our findings suggest that our method is capable of learning meaningful representations for singing voice separation. These representations also preserve conveniences of the short-time Fourier transform (STFT) that are desired in audio and music source separation, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking.
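The abstract evaluates the learned representations with two standard building blocks: binary masking of a non-negative representation and the scale-invariant signal-to-distortion ratio (SI-SDR). A minimal pure-Python sketch of both follows. It is not the authors' implementation, and the paper's encoder and sinusoidal decoder are not reproduced. The sketch assumes an ideal (oracle) binary mask computed from the true sources' representations, which is one common reading of "informed" separation. All function names are hypothetical. The SI-SDR follows the usual definition, which projects the estimate onto the reference. Some toolkits additionally remove the mean of both signals first, and that step is omitted here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def ideal_binary_mask(voice_rep: Matrix, accomp_rep: Matrix) -> Matrix:
    """Return 1.0 where the voice representation dominates the accompaniment, else 0.0.

    Assumption: both inputs are non-negative representations of equal shape
    (e.g. frames x basis functions) computed from the true, isolated sources.
    """
    return [
        [1.0 if v > a else 0.0 for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(voice_rep, accomp_rep)
    ]


def apply_mask(mix_rep: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product of the mixture representation and the mask."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mix_rep, mask)
    ]


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio in dB.

    The reference is rescaled by the optimal gain alpha = <est, ref> / ||ref||^2,
    so multiplying the estimate by a non-zero constant leaves the score unchanged
    (up to the small eps used for numerical safety).
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference) + eps
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]               # scaled projection onto the reference
    noise = [e - t for e, t in zip(estimate, target)]     # everything not explained by the reference
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


if __name__ == "__main__":
    # Toy 2 x 2 "representations"; real ones would come from the learned encoder.
    voice = [[0.9, 0.1], [0.2, 0.7]]
    accomp = [[0.1, 0.8], [0.6, 0.1]]
    mix = [[v + a for v, a in zip(vr, ar)] for vr, ar in zip(voice, accomp)]
    mask = ideal_binary_mask(voice, accomp)  # [[1.0, 0.0], [0.0, 1.0]]
    print(apply_mask(mix, mask))

    ref = [1.0, 2.0, 3.0]
    est = [1.0, 2.0, 2.5]
    print(si_sdr(est, ref), si_sdr([3.0 * x for x in est], ref))  # equal scores
```

Because SI-SDR rescales the reference before comparing, a separated voice that differs from the true voice only by an overall gain is not penalized. This suits representations whose decoder may not preserve absolute level.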
Original language: English
Title of host publication: 28th European Signal Processing Conference
Publisher: EUSIPCO
Pages: 1412-1416
Number of pages: 5
ISBN (Electronic): 978-9-0827-9705-3
Publication status: Published - 2020
Publication type: A4 Article in a conference publication
Event: European Signal Processing Conference - Beurs van Berlage, Amsterdam, Netherlands
Duration: 18 Jan 2021 to 22 Jan 2021
Conference number: 28
https://eusipco2020.org

Publication series

Name: European Signal Processing Conference
Number: 2021-January
ISSN (Print): 2219-5491

Conference

Conference: European Signal Processing Conference
Abbreviated title: EUSIPCO2020
Country: Netherlands
City: Amsterdam
Period: 18/01/21 to 22/01/21

Publication forum classification

  • Publication forum level 1
