Abstract
Audio representation learning based on deep neural networks (DNNs) has emerged as an alternative to hand-crafted features. To achieve high performance, DNNs often need a large amount of annotated data, which can be difficult and costly to obtain. In this paper, we propose a method for learning audio representations by aligning the learned latent representations of audio and associated tags. The alignment is done by maximizing the agreement of the latent representations of audio and tags, using a contrastive loss. The result is an audio embedding model that reflects acoustic and semantic characteristics of sounds. We evaluate the quality of our embedding model by measuring its performance as a feature extractor on three different tasks (namely, sound event recognition, music genre classification, and musical instrument classification), and investigate what type of characteristics the model captures. Our results are promising, sometimes on par with the state of the art in the considered tasks, and the embeddings produced with our method are well correlated with some acoustic descriptors.
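The alignment step described in the abstract can be illustrated with a generic contrastive objective. The following is a minimal sketch, not the authors' implementation: the abstract does not give the exact loss, so an InfoNCE-style cross-entropy over cosine similarities is assumed here, and the function names, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(audio_emb, tag_emb, temperature=0.1):
    """Mean cross-entropy of picking the matching tag embedding per audio embedding.

    Assumption: audio_emb[i] and tag_emb[i] come from the same sound (positive
    pair); every other tag embedding in the batch acts as a negative.
    """
    n = len(audio_emb)
    total = 0.0
    for i in range(n):
        # Similarity of audio item i to every tag embedding in the batch.
        logits = [cosine(audio_emb[i], tag_emb[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the matching pair.
        total += log_denom - logits[i]
    return total / n


# Toy 2-D embeddings (hypothetical values, for illustration only).
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned_tags = [[0.9, 0.1], [0.1, 0.9]]   # each tag vector close to its audio vector
swapped_tags = [[0.1, 0.9], [0.9, 0.1]]   # each tag vector close to the wrong audio vector

loss_aligned = contrastive_loss(audio, aligned_tags)
loss_swapped = contrastive_loss(audio, swapped_tags)
```

In this sketch the loss is small when each audio embedding is most similar to its own tag embedding and large when the pairs are swapped, which is the sense in which minimizing it maximizes agreement between the two latent spaces.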
Original language | English |
---|---|
Title of host publication | International Conference on Machine Learning (ICML) |
Subtitle of host publication | Workshop on Self-supervision in Audio and Speech |
Publication status | Published - 2020 |
Publication type | D3 Professional conference proceedings |
Event | International Conference on Machine Learning - Virtual; Duration: 13 Jul 2020 → 18 Jul 2020; Conference number: 37; https://icml.cc |
Conference
Conference | International Conference on Machine Learning |
---|---|
Abbreviated title | ICML |
Period | 13/07/20 → 18/07/20 |
Internet address |
Datasets
-
Dataset used in COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations
Favory, X. (Creator), Drossos, K. (Creator), Virtanen, T. (Creator) & Serra, X. (Creator), 9 Jun 2020
DOI: 10.5281/zenodo.3887261, https://github.com/xavierfav/coala
Dataset
-
Music Technology Group (MTG), Department of Information and Communication Technologies, Universitat Pompeu Fabra
Konstantinos Drossos (Visitor)
2 Sept 2019 → 29 Nov 2019
Activity: Visiting an external institution › Visit abroad