Time Difference of Arrival Estimation of Multiple Simultaneous Speakers Using Deep Clustering Neural Networks

Mikko Parviainen, Pasi Pertilä

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

A novel multiple acoustic source localization approach is presented that provides spatial information about concurrently active speakers from a mixture signal captured by a microphone array. The proposed method first separates the observed array mixture signal into single-speaker array signals using deep clustering (DC), a deep neural network (DNN) based method that maps source signal information into an embedding space in which a clustering algorithm can then be used to separate the sources. Spatial information in terms of time difference of arrival (TDoA) can then be extracted from each separated signal. This approach is novel for TDoA estimation of multiple sources, since the state-of-the-art method first localizes multiple sources and then performs the separation. The inherent advantage of the proposed approach is that there is no need for data association between the measurements and the sources, because each TDoA is extracted from the signal of a single separated speaker. The results with data from an actual room show that the proposed approach outperforms the current state-of-the-art in extracting spatial information from a mixture signal of two concurrent speakers.
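The abstract does not specify the clustering algorithm or the TDoA estimator. As a rough illustration of the separate-then-localize pipeline, the Python/NumPy sketch below assumes that a trained DC network has already produced one embedding vector per time-frequency (TF) bin. It clusters the embeddings with k-means (the clustering commonly paired with DC) into binary TF masks, then estimates one TDoA per mask with GCC-PHAT evaluated on a grid of candidate delays. GCC-PHAT, the binary masks, the synthetic demo data, and the function names `kmeans_labels`, `dc_binary_masks` and `gcc_phat_tdoa` are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np


def kmeans_labels(x, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns one label per row of x."""
    rng = np.random.default_rng(seed)
    # Farthest-point init: start from a random row, then repeatedly add the row
    # farthest from all centres chosen so far (avoids two centres in one cluster).
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every embedding to its nearest centre, then move centres to cluster means.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def dc_binary_masks(embeddings, n_src, tf_shape):
    """Hypothetical helper: cluster per-bin embeddings (T*F, D) into n_src binary TF masks."""
    labels = kmeans_labels(embeddings, n_src)
    return np.stack([(labels == j).reshape(tf_shape) for j in range(n_src)]).astype(float)


def gcc_phat_tdoa(X1, X2, mask, freqs, max_tau, n_grid=801):
    """TDoA (seconds) of mic 2 relative to mic 1 for the source selected by `mask`.

    X1, X2: (T, F) STFTs of two microphones; mask: (T, F) binary mask.
    Masking the cross-spectrum equals masking both channels when the mask is binary.
    """
    cross = X2 * np.conj(X1)
    cross = mask * cross / (np.abs(cross) + 1e-12)  # PHAT weighting: keep phase only
    g = cross.sum(axis=0)                           # accumulate over frames -> (F,)
    taus = np.linspace(-max_tau, max_tau, n_grid)   # candidate delays in seconds
    # Steered GCC: R(tau) = Re sum_f g(f) exp(j 2 pi f tau), maximal at the true delay.
    r = np.real(np.exp(2j * np.pi * np.outer(taus, freqs)) @ g)
    return taus[np.argmax(r)]


if __name__ == "__main__":
    # Synthetic demo: two sources own disjoint TF bins, and noisy one-hot vectors
    # stand in for the embeddings of a trained DC network (assumption).
    rng = np.random.default_rng(1)
    fs, n_fft, T = 16000, 512, 100
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    true_taus = np.array([0.25e-3, -0.5e-3])
    owner = rng.integers(0, 2, size=(T, freqs.size))
    S = rng.standard_normal(owner.shape) + 1j * rng.standard_normal(owner.shape)
    X1 = S
    X2 = S * np.exp(-2j * np.pi * freqs * true_taus[owner])  # mic 2 hears a delayed copy
    emb = np.eye(2)[owner.ravel()] + 0.05 * rng.standard_normal((owner.size, 2))
    masks = dc_binary_masks(emb, 2, owner.shape)
    for m in masks:
        print(f"estimated TDoA: {gcc_phat_tdoa(X1, X2, m, freqs, 1e-3) * 1e3:.3f} ms")
```

Because each TDoA is computed from one speaker's mask, every estimate is already tied to a source, which mirrors the data-association advantage described in the abstract. The demo replaces the DNN with idealised embeddings, so it only exercises the clustering and TDoA plumbing, not separation quality on real recordings.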
Original language: English
Title of host publication: IEEE MMSP 2021 - 23rd Workshop on Multimedia Signal Processing
Publisher: IEEE
Number of pages: 6
ISBN (Electronic): 978-1-6654-3288-7
Publication status: Published - 2022
Publication type: A4 Article in conference proceedings
Event: IEEE International Workshop on Multimedia Signal Processing - Tampere, Finland
Duration: 6 Oct 2021 to 8 Oct 2021
https://attend.ieee.org/mmsp-2021/

Publication series

Name: IEEE International Workshop on Multimedia Signal Processing
ISSN (Electronic): 2473-3628

Conference

Conference: IEEE International Workshop on Multimedia Signal Processing
Abbreviated title: IEEE MMSP 2021
Country/Territory: Finland
City: Tampere
Period: 6/10/21 to 8/10/21

Publication forum classification

  • Publication forum level 1
