Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review



Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining, during inference, the low latency suitable for real-time speech enhancement or assisted hearing applications. To assess our approach across various model types and datasets, we evaluate it with a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
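The idea of an asymmetric analysis-synthesis window pair can be illustrated with a minimal sketch. It is not the authors' implementation: the 8 kHz sampling rate, the 32 ms analysis length, the square-root-Hann construction, and the names `asymmetric_window_pair` and `analysis_synthesis` are all assumptions made for illustration. Only the 8 ms latency figure comes from the abstract. The sketch builds a long analysis window and a synthesis window that is non-zero only over the last 8 ms of each frame, so that their product is a Hann window that overlap-adds to one.

```python
import numpy as np


def asymmetric_window_pair(n_analysis, n_synthesis):
    """Build one possible asymmetric pair (illustrative construction only).

    The product analysis * synthesis is a periodic Hann window of length
    n_synthesis placed at the end of the frame, which sums to 1 when
    overlap-added with a hop of n_synthesis // 2.
    """
    N, M = n_analysis, n_synthesis
    half = M // 2
    # Periodic Hann window: the target product of the two windows.
    p = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(M) / M)

    # Long, slowly rising analysis window with a short sqrt-Hann decay.
    rise_len = N - half
    analysis = np.empty(N)
    analysis[:rise_len] = np.sqrt(
        0.5 - 0.5 * np.cos(np.pi * np.arange(rise_len) / rise_len)
    )
    analysis[rise_len:] = np.sqrt(p[half:])

    # Synthesis window: zero except on the last M samples of the frame.
    synthesis = np.zeros(N)
    tail = analysis[N - M:]
    synthesis[N - M:] = np.divide(p, tail, out=np.zeros(M), where=tail > 0)
    return analysis, synthesis


def analysis_synthesis(x, analysis, synthesis, hop):
    """Windowed FFT, inverse FFT, and overlap-add with the given pair."""
    N = len(analysis)
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, hop):
        frame = x[start:start + N] * analysis
        spectrum = np.fft.rfft(frame)  # a separation model would act here
        y[start:start + N] += np.fft.irfft(spectrum, n=N) * synthesis
    return y


fs = 8000          # assumed sampling rate in Hz
N = 256            # assumed analysis window: 32 ms at 8 kHz
M = 64             # synthesis window support: 8 ms at 8 kHz
hop = M // 2

analysis, synthesis = asymmetric_window_pair(N, M)
latency_ms = 1000 * M / fs      # 8.0 under these assumptions
bin_spacing_long = fs / N       # 31.25 Hz with the 32 ms analysis window
bin_spacing_short = fs / M      # 125.0 Hz with a symmetric 8 ms window

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
y = analysis_synthesis(x, analysis, synthesis, hop)
# Samples covered by two frame tails are reconstructed exactly.
max_error = np.max(np.abs(y[N - hop:len(x) - hop] - x[N - hop:len(x) - hop]))
```

In this sketch each output sample depends only on frames that end at most `M` samples after it, which is why the latency is set by the short synthesis window while the spectrum is still computed over the longer analysis window.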
Original language: English
Title of host publication: European Signal Processing Conference 2021
Number of pages: 5
ISBN (Electronic): 978-9-0827-9706-0
Publication status: Published - 2021
Publication type: A4 Article in conference proceedings
Event: European Signal Processing Conference - Dublin, Ireland
Duration: 23 Aug 2021 - 27 Aug 2021

Publication series

Name: European Signal Processing Conference
ISSN (Electronic): 2076-1465


Conference: European Signal Processing Conference
Abbreviated title: EUSIPCO


  • Deep learning
  • Training
  • Time-frequency analysis
  • Source separation
  • Signal processing algorithms
  • Europe
  • Speech enhancement
  • Monaural speaker separation
  • Low latency
  • Asymmetric windows
  • Deep clustering

Publication forum classification

  • Publication forum level 1


