Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review



Sound event detection (SED) and acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis. Given the information shared between sound events and acoustic scenes, performing both tasks jointly is a natural part of a complex machine listening system. In this paper, we investigate the usefulness of several spatial audio features for training a joint deep neural network (DNN) model that performs SED and ASC. Experiments are performed on two datasets containing binaural recordings with synchronous sound event and acoustic scene labels, to analyse the differences between performing SED and ASC separately and jointly. The results show that specific binaural features, mainly the generalized cross-correlation with phase transform (GCC-PHAT) and the sines and cosines of phase differences, yield a better-performing model in both the separate and joint tasks than baseline methods based on log-mel energies only.
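A minimal NumPy sketch of the two binaural features named above, computed from the STFTs of the left and right channels: the sine and cosine of the per-bin phase difference, and a frame-wise GCC-PHAT. The STFT settings (`n_fft=1024`, `hop=512`, Hann window), the number of GCC-PHAT lags kept, and all function names are illustrative assumptions, not the configuration or code used in the paper; the log-mel baseline features are not shown.

```python
import numpy as np


def stft(x, n_fft=1024, hop=512):
    """One-sided STFT of a mono signal with a Hann window.

    Returns a complex array of shape (n_frames, n_fft // 2 + 1).
    n_fft and hop are illustrative defaults (assumed), not the paper's settings.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def phase_difference_features(X_left, X_right):
    """Sine and cosine of the per-bin phase difference between the two channels.

    Using sin/cos instead of the raw angle avoids the 2*pi wrap-around discontinuity.
    """
    phase_diff = np.angle(X_left * np.conj(X_right))
    return np.sin(phase_diff), np.cos(phase_diff)


def gcc_phat(X_left, X_right, n_lags=32, eps=1e-12):
    """Frame-wise GCC-PHAT, keeping lags -n_lags..n_lags (2 * n_lags + 1 values).

    With this convention, a right channel delayed by d samples peaks at lag -d.
    n_lags is a hypothetical choice; it should cover the largest expected interaural delay.
    """
    n_fft = 2 * (X_left.shape[1] - 1)
    if not 0 < n_lags < n_fft // 2:
        raise ValueError("n_lags must be between 1 and n_fft // 2 - 1")
    cross = X_left * np.conj(X_right)
    # Phase transform: normalise the magnitude so only phase information remains.
    cross = cross / (np.abs(cross) + eps)
    cc = np.fft.irfft(cross, n=n_fft, axis=1)
    # irfft stores negative lags at the end of each frame; move them before lag 0.
    return np.concatenate([cc[:, -n_lags:], cc[:, : n_lags + 1]], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(16000)
    right = np.roll(left, 5)  # right channel delayed by 5 samples
    XL, XR = stft(left), stft(right)
    sin_pd, cos_pd = phase_difference_features(XL, XR)
    gcc = gcc_phat(XL, XR, n_lags=32)
    print(sin_pd.shape, gcc.shape)
    print("estimated lag:", int(np.argmax(gcc.mean(axis=0))) - 32)
```

Every feature map has one row per STFT frame, so in a setup like this they could be stacked with log-mel energies as extra input channels for a DNN; how the paper actually combines them is not specified here.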
Original language: English
Title of host publication: 2022 30th European Signal Processing Conference (EUSIPCO)
Number of pages: 5
ISBN (Electronic): 978-90-827970-9-1
Publication status: Published - 1 Sept 2022
Publication type: A4 Article in conference proceedings
Event: European Signal Processing Conference, Belgrade, Serbia
Duration: 29 Aug 2022 to 2 Sept 2022

Publication series

Name: European Signal Processing Conference
ISSN (Electronic): 2076-1465




  • sound event detection
  • acoustic scene classification
  • deep neural networks
  • binaural audio

Publication forum classification

  • Publication forum level 1

