Self-labeling sounds using optimal transport

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Self-labeling is a method to simultaneously learn representations and classes using unlabeled data. The naive approach to self-labeling leads to a degenerate solution, and the model-generated labels require regularization to serve as useful training targets. In this work, we adapt a self-labeling method using optimal transport to the audio domain using the FSD50K dataset. We analyze the structure of the learned representations and compare the emergent classes with the reference annotations. We compare the learned representations with the ones produced using Bootstrap Your Own Latent for Audio (BYOL-A) across several downstream tasks. Our findings indicate that the method learns to group perceptually similar sounds without supervision. The results show that the method is a viable approach for audio representation learning, and that the learned embeddings are as effective for downstream tasks as the ones obtained with the benchmark method. As an additional outcome, the generated classifications give valuable insight into what the model learns, promoting explainability in feature learning.
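
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the idea it describes. It assumes the common equal-partition formulation of optimal-transport self-labeling, solved with Sinkhorn-Knopp iterations. The function name `sinkhorn_labels`, the parameters `n_iters` and `lam`, and the synthetic scores are hypothetical and not taken from the paper. The sketch shows how naive argmax labels collapse onto one class, while the regularized assignment spreads samples evenly across classes.

```python
import numpy as np


def sinkhorn_labels(logits, n_iters=100, lam=3.0):
    """Balanced soft label assignment via Sinkhorn-Knopp iterations (sketch).

    logits: (N, K) array of model scores for N clips and K candidate classes.
    Returns Q of shape (N, K): each row sums to 1 and, after convergence,
    each column sums to approximately N / K.
    """
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    z = lam * (logits - logits.max(axis=1, keepdims=True))
    Q = np.exp(z)
    n, k = Q.shape
    for _ in range(n_iters):
        # Rescale columns so every class receives the same total mass N / K.
        Q *= (n / k) / Q.sum(axis=0, keepdims=True)
        # Rescale rows so every sample distributes exactly one unit of mass.
        Q /= Q.sum(axis=1, keepdims=True)
    return Q


# Synthetic scores (assumption): class 0 is strongly favored for every sample,
# which mimics the degenerate solution of naive self-labeling.
rng = np.random.default_rng(0)
logits = rng.normal(size=(600, 3))
logits[:, 0] += 3.0

naive_counts = np.bincount(logits.argmax(axis=1), minlength=3)
Q = sinkhorn_labels(logits)
balanced_counts = np.bincount(Q.argmax(axis=1), minlength=3)

print("naive label counts:   ", naive_counts)
print("balanced label counts:", balanced_counts)
```

In a training loop of this kind, the rows of `Q` (or their argmax) would serve as targets for the next round of classifier updates. The balance constraint is what prevents all clips from being assigned to a single class.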
Original language: English
Journal: IEEE Open Journal of Signal Processing
Volume: 7
Publication status: Published - 2026
Publication type: A1 Journal article-refereed

Funding

Funder: Research Council of Finland
Funder number: 332063

Publication forum classification

Publication forum level 1
