Visually Guided Sound Source Separation Using Cascaded Opponent Filter Network

Lingyu Zhu, Esa Rahtu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

3 Citations (Scopus)
8 Downloads (Pure)

Abstract

The objective of this paper is to recover the original component signals from an audio mixture with the aid of visual cues of the sound sources. This task is usually referred to as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages that recursively refine the source separation. A key element of COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the sources, and for this purpose we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL).
Original language: English
Title of host publication: Computer Vision – ACCV 2020
Subtitle of host publication: 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30 – December 4, 2020, Revised Selected Papers, Part VI
Editors: Hiroshi Ishikawa, Cheng-Lin Liu, Tomas Pajdla, Jianbo Shi
Publisher: Springer
Pages: 409-426
Number of pages: 18
ISBN (Electronic): 978-3-030-69544-6
ISBN (Print): 978-3-030-69543-9
DOIs
Publication status: Published - 26 Feb 2021
Publication type: A4 Article in conference proceedings
Event: Asian Conference on Computer Vision - Virtual, Online
Duration: 30 Nov 2020 – 4 Dec 2020
Conference number: 15

Publication series

Name: Lecture Notes in Computer Science
Volume: 12627
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Asian Conference on Computer Vision
Period: 30/11/20 – 4/12/20

Keywords

  • audio-visual learning
  • sound source separation
  • sound source localization
  • opponent filter
  • dynamic image

Publication forum classification

  • Publication forum level 1
