Abstract
Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows for training with targets that have better frequency resolution while retaining the low latency during inference that real-time speech enhancement or assisted hearing applications require. To assess our approach across various model types and datasets, we evaluate it with a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
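The idea of an asymmetric analysis-synthesis window pair can be illustrated with a minimal NumPy sketch. It is not the paper's exact design: the 8 kHz sampling rate, the 32 ms analysis window, the square-root Hann shapes, and the helper names `asymmetric_window_pair` and `analysis_synthesis` are all assumptions made for illustration. The sketch also assumes that algorithmic latency is governed by the length of the synthesis window (8 ms here). The synthesis window is chosen so that the product of the two windows is a Hann window that overlap-adds to one.

```python
import numpy as np


def asymmetric_window_pair(n_analysis, n_synthesis):
    """Return (analysis, synthesis) windows, both of length n_analysis.

    Hypothetical helper: the analysis window has a long rise and a short
    fall; the synthesis window is non-zero only on the last n_synthesis
    samples of the frame.
    """
    hop = n_synthesis // 2
    n_rise = n_analysis - hop
    # Long square-root Hann rise followed by a short square-root Hann fall.
    rise = np.sqrt(0.5 - 0.5 * np.cos(np.pi * (np.arange(n_rise) + 0.5) / n_rise))
    fall = np.sqrt(0.5 + 0.5 * np.cos(np.pi * (np.arange(hop) + 0.5) / hop))
    analysis = np.concatenate([rise, fall])
    # Target product: a Hann window of length n_synthesis, which sums to 1
    # when shifted copies are overlap-added with hop = n_synthesis / 2.
    k = np.arange(n_synthesis)
    product = 0.5 - 0.5 * np.cos(2 * np.pi * (k + 0.5) / n_synthesis)
    synthesis = np.zeros(n_analysis)
    synthesis[-n_synthesis:] = product / analysis[-n_synthesis:]
    return analysis, synthesis


def analysis_synthesis(x, analysis, synthesis, hop):
    """Frame, transform, inverse-transform, and overlap-add a signal."""
    n = len(analysis)
    n_frames = 1 + (len(x) - n) // hop
    y = np.zeros(len(x))
    for m in range(n_frames):
        start = m * hop
        spectrum = np.fft.rfft(x[start:start + n] * analysis)
        # A separation model would modify `spectrum` here (e.g. apply a mask).
        y[start:start + n] += np.fft.irfft(spectrum, n) * synthesis
    return y


FS = 8000            # assumed sampling rate in Hz
N_ANALYSIS = 256     # assumed 32 ms analysis window
N_SYNTHESIS = 64     # 8 ms synthesis window
HOP = N_SYNTHESIS // 2

analysis, synthesis = asymmetric_window_pair(N_ANALYSIS, N_SYNTHESIS)
x = np.random.default_rng(0).standard_normal(FS)
y = analysis_synthesis(x, analysis, synthesis, HOP)

# Compare only samples covered by two overlapping synthesis windows.
n_frames = 1 + (len(x) - N_ANALYSIS) // HOP
lo = N_ANALYSIS - N_SYNTHESIS + HOP
hi = N_ANALYSIS - N_SYNTHESIS + n_frames * HOP
max_error = np.max(np.abs(y[lo:hi] - x[lo:hi]))

print("frequency bin spacing (Hz):", FS / N_ANALYSIS)
print("synthesis window length (ms):", 1000 * N_SYNTHESIS / FS)
print("max reconstruction error:", max_error)
```

Under these assumptions, the spectrum is computed over 256 samples, giving a bin spacing of 31.25 Hz instead of the 125 Hz of a symmetric 64-sample window, while only the last 64 samples of each frame contribute to the output.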
Original language | English |
---|---|
Title of host publication | European Signal Processing Conference 2021 |
Publisher | IEEE |
Pages | 301-305 |
Number of pages | 5 |
ISBN (Electronic) | 978-9-0827-9706-0 |
Publication status | Published - 2021 |
Publication type | A4 Article in conference proceedings |
Event | European Signal Processing Conference, Dublin, Ireland. Duration: 23 Aug 2021 → 27 Aug 2021. https://eusipco2021.org |
Publication series
Name | European Signal Processing Conference |
---|---|
ISSN (Electronic) | 2076-1465 |
Conference
Conference | European Signal Processing Conference |
---|---|
Abbreviated title | EUSIPCO |
Country/Territory | Ireland |
City | Dublin |
Period | 23/08/21 → 27/08/21 |
Keywords
- Deep learning
- Training
- Time-frequency analysis
- Source separation
- Signal processing algorithms
- Europe
- Speech enhancement
- Monaural speaker separation
- Low latency
- Asymmetric windows
- Deep clustering
Publication forum classification
- Publication forum level 1