Leveraging Category Information for Single-Frame Visual Sound Source Separation

Lingyu Zhu, Esa Rahtu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

6 Citations (Scopus)
14 Downloads (Pure)

Abstract

Visual sound source separation aims at identifying sound components from a given sound mixture in the presence of visual cues. Prior works have demonstrated impressive results, but at the expense of large multi-stage architectures and complex data representations (e.g. optical flow trajectories). In contrast, we study simple yet efficient models for visual sound separation using only a single video frame. Furthermore, our models are able to exploit the information of the sound source category in the separation process. To this end, we propose two models where we assume that i) the category labels are available at training time, or ii) we know whether the training sample pairs are from the same or different categories. The experiments with the MUSIC dataset show that our model obtains comparable or better performance compared to several recent baseline methods. The code is available at https://github.com/ly-zhu/Leveraging-Category-Information-for-Single-Frame-Visual-Sound-Source-Separation.

Original language: English
Title of host publication: Proceedings of the 2021 9th European Workshop on Visual Information Processing, EUVIP 2021
Editors: A. Beghdadi, F. Alaya Cheikh, J.M.R.S. Tavares, A. Mokraoui, G. Valenzise, L. Oudre, M.A. Qureshi
Publisher: IEEE
Number of pages: 6
ISBN (Electronic): 9781665432306
ISBN (Print): 9781665432313
DOIs
Publication status: Published - 20 Jul 2021
Publication type: A4 Article in conference proceedings
Event: European Workshop on Visual Information Processing - Paris, France
Duration: 23 Jun 2021 to 25 Jun 2021

Publication series

Name: European Workshop on Visual Information Processing
ISSN (Print): 2164-974X
ISSN (Electronic): 2471-8963

Conference

Conference: European Workshop on Visual Information Processing
Country/Territory: France
City: Paris
Period: 23/06/21 to 25/06/21

Keywords

  • attention mechanism
  • self-supervised learning
  • sound source localization
  • visual sound separation

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Information Systems
  • Signal Processing
