TY - GEN
T1 - Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
AU - Dai, Wang
AU - Li, Xiaofei
AU - Politis, Archontis
AU - Virtanen, Tuomas
N1 - Publisher Copyright: © 2024 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2024
Y1 - 2024
N2 - In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly evident in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances, or with compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while the reference clean speech is adjusted based on the highest scale-invariant signal-to-distortion ratio (SI-SDR). The experimental results on the Spear challenge simulated dataset D4 demonstrate the superiority of our proposed method over the conventional approach of using a fixed reference channel with single-channel masking.
AB - In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly evident in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances, or with compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while the reference clean speech is adjusted based on the highest scale-invariant signal-to-distortion ratio (SI-SDR). The experimental results on the Spear challenge simulated dataset D4 demonstrate the superiority of our proposed method over the conventional approach of using a fixed reference channel with single-channel masking.
KW - end-to-end multi-channel speech enhancement
KW - multi-channel masking
KW - reference channel selection
U2 - 10.23919/EUSIPCO63174.2024.10715275
DO - 10.23919/EUSIPCO63174.2024.10715275
M3 - Conference contribution
AN - SCOPUS:85208445220
T3 - European Signal Processing Conference
SP - 241
EP - 245
BT - 2024 32nd European Signal Processing Conference (EUSIPCO)
PB - IEEE
T2 - European Signal Processing Conference
Y2 - 26 August 2024 through 30 August 2024
ER -