Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

7 Citations (Scopus)
11 Downloads (Pure)

Abstract

Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training on targets with better frequency resolution while retaining the low latency during inference that real-time speech enhancement and assisted hearing applications require. To assess our approach across model types and datasets, we evaluate it with a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
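
As an illustration of the idea in the abstract, the following Python sketch builds one possible asymmetric analysis-synthesis window pair and checks that overlap-add reconstructs the input when no processing is applied. It is not the authors' window design: the 8 kHz sample rate, the 256-sample analysis window, the 64-sample synthesis window, and the helper names `asymmetric_window_pair` and `stft_roundtrip` are assumptions, chosen so that the synthesis window spans 8 ms.

```python
import numpy as np


def asymmetric_window_pair(n_long, n_short):
    """Hypothetical helper: long analysis window, short synthesis window.

    The product of the two windows is a periodic Hann window of length
    n_short placed at the end of the frame, so frames shifted by
    n_short // 2 overlap-add to one.
    """
    if n_short % 2 or n_long <= n_short:
        raise ValueError("need even n_short and n_long > n_short")
    hop = n_short // 2
    n = np.arange(n_short)
    target = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_short)  # periodic Hann

    # Analysis window: slow sine rise over most of the frame, then a
    # short square-root-Hann fall over the last `hop` samples.
    rise_len = n_long - hop
    rise = np.sin(0.5 * np.pi * (np.arange(rise_len) + 1) / rise_len)
    fall = np.sqrt(target[hop:])
    analysis = np.concatenate([rise, fall])

    # Synthesis window: zero except on the last n_short samples, where it
    # is chosen so that analysis * synthesis equals the Hann target.
    synthesis = np.zeros(n_long)
    synthesis[-n_short:] = target / analysis[-n_short:]
    return analysis, synthesis, hop


def stft_roundtrip(x, analysis, synthesis, hop):
    """Analysis, (identity) processing, synthesis, and overlap-add."""
    n_long = len(analysis)
    y = np.zeros_like(x)
    for start in range(0, len(x) - n_long + 1, hop):
        frame = x[start:start + n_long] * analysis
        spec = np.fft.rfft(frame)  # a separation mask would be applied here
        out = np.fft.irfft(spec, n=n_long) * synthesis
        y[start:start + n_long] += out
    return y


fs = 8000                       # assumed sample rate in Hz
n_long, n_short = 256, 64       # assumed window lengths in samples
analysis, synthesis, hop = asymmetric_window_pair(n_long, n_short)

rng = np.random.default_rng(0)
x = rng.standard_normal(fs)     # one second of noise as a test signal
y = stft_roundtrip(x, analysis, synthesis, hop)

# Latency implied by the synthesis window length alone.
latency_ms = 1000 * n_short / fs
```

Because the synthesis window is nonzero only over the last 64 samples of each frame, every output sample is complete at most 64 samples after it arrives, which is 8 ms at the assumed 8 kHz rate, while each spectrum is still computed from a 256-sample frame.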
Original language: English
Title of host publication: European Signal Processing Conference 2021
Publisher: IEEE
Pages: 301-305
Number of pages: 5
ISBN (Electronic): 978-9-0827-9706-0
Publication status: Published - 2021
Publication type: A4 Article in conference proceedings
Event: European Signal Processing Conference - Dublin, Ireland
Duration: 23 Aug 2021 to 27 Aug 2021
https://eusipco2021.org

Publication series

Name: European Signal Processing Conference
ISSN (Electronic): 2076-1465

Conference

Conference: European Signal Processing Conference
Abbreviated title: EUSIPCO
Country/Territory: Ireland
City: Dublin
Period: 23/08/21 to 27/08/21

Keywords

  • Deep learning
  • Training
  • Time-frequency analysis
  • Source separation
  • Signal processing algorithms
  • Europe
  • Speech enhancement
  • Monaural speaker separation
  • Low latency
  • Asymmetric windows
  • Deep clustering

Publication forum classification

  • Publication forum level 1
