Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly posed as a classification or a regression problem. Regression-based approaches have certain advantages over classification-based ones, such as continuous direction-of-arrival estimation of static and moving sources. However, multi-source scenarios require multiple regressors, and to date there is no clear strategy for training them that does not rely on auxiliary information such as simultaneous sound classification. We investigate end-to-end training of such methods with a technique recently proposed for video object detectors, adapted to the SSL setting. A differentiable network is constructed that can be plugged into the output of the localizer to solve the optimal assignment between predictions and references, directly optimizing the popular CLEAR-MOT tracking metrics. Results indicate large improvements over directly optimizing mean squared errors, in terms of localization error, detection metrics, and tracking capabilities.
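
To make the assignment step in the abstract concrete, the following is a minimal sketch of matching predicted directions of arrival to reference directions in a single frame. It uses the classical, non-differentiable Hungarian algorithm from SciPy as a stand-in; it is not the differentiable network the paper constructs. The function names, the unit-vector representation of directions, and the 20-degree matching threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def angular_distance_deg(pred, ref):
    """Pairwise angle in degrees between rows of pred (P, 3) and ref (R, 3)."""
    # Normalize so that the dot product equals the cosine of the angle.
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    cos = np.clip(pred @ ref.T, -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def match_frame(pred, ref, threshold_deg=20.0):
    """Assign predictions to references for one frame (hypothetical helper).

    threshold_deg is an assumed value: pairs farther apart than this are
    not counted as matches.
    """
    dist = angular_distance_deg(pred, ref)
    # Minimum-cost one-to-one assignment; rectangular matrices are allowed.
    rows, cols = linear_sum_assignment(dist)
    keep = dist[rows, cols] <= threshold_deg
    matches = list(zip(rows[keep].tolist(), cols[keep].tolist()))
    # Per-frame counts of the kind that tracking metrics accumulate.
    false_positives = pred.shape[0] - len(matches)
    misses = ref.shape[0] - len(matches)
    return matches, dist, false_positives, misses
```

Calling `match_frame` with three predicted directions and two references returns the matched index pairs plus one false positive. Because the hard assignment here has no useful gradient, a sketch like this can only evaluate a localizer, which is the limitation that motivates a differentiable replacement for training.
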
Original language: English
Title of host publication: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Publisher: IEEE
Pages: 211-215
Number of pages: 5
ISBN (Electronic): 978-1-6654-4870-3
Publication status: Published - 2021
Publication type: A4 Article in a conference publication
Event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, United States
Duration: 17 Oct 2021 to 20 Oct 2021

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Country/Territory: United States
Period: 17/10/21 to 20/10/21

Keywords

  • Training
  • Location awareness
  • Measurement
  • Deep learning
  • Direction-of-arrival estimation
  • Conferences
  • Training data
  • sound source localization
  • deep-learning acoustic processing
  • multi-target tracking

Publication forum classification

  • Publication forum level 1
