TY - GEN
T1 - Sound Event Detection and Localization with Distance Estimation
AU - Krause, Daniel Aleksander
AU - Politis, Archontis
AU - Mesaros, Annamaria
N1 - Publisher Copyright: © 2024 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Sound Event Detection and Localization (SELD) is the combined task of identifying sound events and estimating their corresponding direction of arrival (DOA). While this task has numerous applications and has been extensively researched in recent years, it does not provide full information about the sound source position. In this paper, we overcome this problem by extending the task to Sound Event Detection and Localization with Distance Estimation (3D SELD). We study two ways of integrating distance estimation within the SELD core: a multi-task approach, in which distance is estimated by a separate model output, and a single-task approach obtained by extending the multi-ACCDOA method to include distance information. We investigate both methods on the Ambisonic and binaural versions of STARSS23: Sony-TAU Realistic Spatial Soundscapes 2023. Moreover, our study includes experiments on the loss function used for distance estimation. Our results show that it is possible to perform 3D SELD without any degradation of performance in sound event detection and DOA estimation.
KW - Ambisonics
KW - binaural recordings
KW - sound distance estimation
KW - sound event detection
KW - sound source localization
U2 - 10.23919/EUSIPCO63174.2024.10715220
DO - 10.23919/EUSIPCO63174.2024.10715220
M3 - Conference contribution
AN - SCOPUS:85208414361
T3 - European Signal Processing Conference
SP - 286
EP - 290
BT - 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings
PB - IEEE
T2 - European Signal Processing Conference
Y2 - 26 August 2024 through 30 August 2024
ER -