Abstract
Steered response power (SRP) methods can be used to build a map of sound direction likelihood. In the presence of interference and reverberation, the map exhibits multiple peaks whose heights are related to the spectral content of the corresponding sounds. In realistic use cases, the target of interest (such as speech) often produces a lower peak than an interfering source. This corrupts any direction-dependent method, such as beamforming.
Regression has been used to predict time-frequency (TF) regions corrupted by reverberation, and static broadband noise can be efficiently estimated for TF points. TF regions dominated by noise or reverberation can then be de-emphasized to obtain more reliable source direction estimates. In this work, we propose the use of convolutional neural networks (CNNs) for the prediction of a TF mask for emphasizing the direct path speech signal in time-varying interference. SRP with phase transform (SRP-PHAT) combined with the CNN-based masking is shown to be capable of reducing the impact of time-varying interference for speaker direction estimation using real speech sources in reverberation.
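The effect of TF masking on an SRP-PHAT map can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single microphone pair, far-field delays, synthetic unit-magnitude spectra, and an oracle binary mask that stands in for the CNN-predicted mask. The function name `masked_srp_phat` and all parameter values are hypothetical.

```python
import numpy as np

def masked_srp_phat(X1, X2, mask, freqs, delays):
    """Mask-weighted SRP-PHAT for one microphone pair (illustrative sketch).

    X1, X2 : complex STFTs, shape (n_freq, n_frames)
    mask   : TF weights in [0, 1], same shape (here an oracle, not a CNN output)
    freqs  : bin centre frequencies in Hz, shape (n_freq,)
    delays : candidate inter-microphone delays in seconds, shape (n_delays,)
    """
    cross = X1 * np.conj(X2)
    # Phase transform: keep only the phase of the cross-spectrum.
    phat = cross / np.maximum(np.abs(cross), 1e-12)
    # De-emphasize TF points flagged as interference, then pool over frames.
    pooled = (mask * phat).sum(axis=1)
    # Steer to each candidate delay and sum over frequency.
    steering = np.exp(-2j * np.pi * np.outer(delays, freqs))
    return np.real(steering @ pooled)

# Synthetic scene (assumed values): a weak target and a stronger interferer
# that is active in about 70% of the TF points.
rng = np.random.default_rng(0)
n_freq, n_frames = 128, 50
freqs = np.linspace(100.0, 4000.0, n_freq)
delays = np.linspace(-5e-4, 5e-4, 41)
tau_target, tau_interf = 2.5e-4, -2.5e-4

def unit_phasors():
    return np.exp(2j * np.pi * rng.random((n_freq, n_frames)))

def shift(tau):
    # Frequency-domain delay of tau seconds, broadcast over frames.
    return np.exp(-2j * np.pi * freqs * tau)[:, None]

active = rng.random((n_freq, n_frames)) < 0.7
S = unit_phasors()
I = 3.0 * unit_phasors() * active

X1 = S + I
X2 = S * shift(tau_target) + I * shift(tau_interf)

oracle_mask = (~active).astype(float)
srp_plain = masked_srp_phat(X1, X2, np.ones_like(oracle_mask), freqs, delays)
srp_masked = masked_srp_phat(X1, X2, oracle_mask, freqs, delays)

est_plain = delays[np.argmax(srp_plain)]
est_masked = delays[np.argmax(srp_masked)]
print("unmasked estimate (s):", est_plain)
print("masked estimate (s):  ", est_masked)
```

In this toy setup the unmasked map is dominated by the interferer, while the masked map peaks at the target delay. A full SRP-PHAT would additionally sum such maps over all microphone pairs and map delays to directions.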
Original language | English |
---|---|
Title of host publication | 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | IEEE |
Pages | 6125-6129 |
ISBN (Electronic) | 978-1-5090-4117-6 |
Publication status | Published - 2017 |
Publication type | A4 Article in conference proceedings |
Event | IEEE International Conference on Acoustics, Speech and Signal Processing - Duration: 1 Jan 1900 → 1 Jan 2000 |
Publication series
Name | |
---|---|
ISSN (Electronic) | 2379-190X |
Conference
Conference | IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Period | 1/01/00 → 1/01/00 |
Publication forum classification
- Publication forum level 1