Abstrakti
Multi-channel speech separation in dynamic environments is challenging as time-varying spatial and spectral features evolve at different temporal scales. Existing methods typically employ sequential architectures, forcing a single network stream to simultaneously model both feature types, creating an inherent modeling conflict. In this paper, we propose a dual-branch parallel spectral-spatial (PS2) architecture that separately processes spectral and spatial features through parallel streams. The spectral branch uses a bi-directional long short-term memory (BLSTM)-based frequency module, a Mamba-based temporal module, and a self-attention module to model spectral features. The spatial branch employs bi-directional gated recurrent unit (BGRU) networks to process spatial features that encode the evolving geometric relationships between sources and microphones. Features from both branches are integrated through a cross-attention fusion mechanism that adaptively weights their contributions. Experimental results demonstrate that the PS2 outperforms existing state-of-the-art (SOTA) methods by 1.6-2.2 dB in scale-invariant signal-to-distortion ratio (SI-SDR) for moving speaker scenarios, with robust separation quality under different reverberation times (RT60), noise levels, and source movement speeds. Even with fast source movements, the proposed model maintains SI-SDR improvements of over 13 dB. These improvements are consistently observed across multiple datasets, including WHAMR! and our generated WSJ0-Demand-6ch-Move dataset.
| Alkuperäiskieli | Englanti |
|---|---|
| Sivut | 1524-1537 |
| Julkaisu | IEEE Transactions on Audio, Speech and Language Processing |
| Vuosikerta | 34 |
| Varhainen verkossa julkaisun päivämäärä | 2026 |
| DOI - pysyväislinkit | |
| Tila | Julkaistu - 2026 |
| OKM-julkaisutyyppi | A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä |
Julkaisufoorumi-taso
- Jufo-taso 3
!!ASJC Scopus subject areas
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering
Sormenjälki
Sukella tutkimusaiheisiin 'Moving Speaker Separation Via Parallel Spectral-Spatial Processing'. Ne muodostavat yhdessä ainutlaatuisen sormenjäljen.Siteeraa tätä
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver