Time Difference of Arrival Estimation of Speech Signals Using Deep Neural Networks with Integrated Time-frequency Masking

Pasi Pertilä, Mikko Parviainen

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

30 Citations (Scopus)
46 Downloads (Pure)

Abstract

The Time Difference of Arrival (TDoA) of a sound wavefront impinging on a microphone pair carries spatial information about the source. However, captured speech typically contains dynamic non-speech interference sources and noise. Therefore, the TDoA estimates fluctuate between speech and interference. Deep Neural Networks (DNNs) have been applied for Time-Frequency (TF) masking for Acoustic Source Localization (ASL) to filter out non-speech components from a speaker location likelihood function. However, the type of TF mask for this task is not obvious. Secondly, the DNN should estimate the TDoA values, but existing solutions estimate the TF mask instead. To overcome these issues, a direct formulation of the TF masking as a part of a DNN-based ASL structure is proposed. Furthermore, the proposed network operates in an online manner, i.e., producing estimates frame-by-frame. Combined with the use of recurrent layers it exploits the sequential progression of speaker related TDoAs. Training with different microphone spacings allows model re-use for different microphone pair geometries in inference. Real-data experiments with smartphone recordings of speech in interference demonstrate the network's generalization capability.

Original languageEnglish
Title of host publication2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PublisherIEEE
Pages436-440
Number of pages5
ISBN (Electronic)9781479981311
DOIs
Publication statusPublished - 1 May 2019
Publication typeA4 Article in conference proceedings
EventIEEE International Conference on Acoustics, Speech, and Signal Processing - Brighton, United Kingdom
Duration: 12 May 201917 May 2019

Conference

ConferenceIEEE International Conference on Acoustics, Speech, and Signal Processing
Country/TerritoryUnited Kingdom
CityBrighton
Period12/05/1917/05/19

Keywords

  • Acoustic Source Localization
  • Microphone Arrays
  • Recurrent Neural Networks
  • Time-Frequency Masking

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Time Difference of Arrival Estimation of Speech Signals Using Deep Neural Networks with Integrated Time-frequency Masking'. Together they form a unique fingerprint.

Cite this