Joint speaker separation and recognition using non-negative matrix deconvolution with adaptive dictionary

Szymon Drgas, Tuomas Virtanen

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

In this article, we propose a new method for joint cochannel speaker separation and recognition, called adaptive-dictionary non-negative matrix deconvolution (DANMD). The method extends non-negative matrix deconvolution (NMD), which models a spectrogram matrix as a linear combination of dictionary elements (atoms). We propose a dictionary that is a linear combination of a speaker-independent component and components representing speaker variability. The dictionary is parametric, and all atoms depend on a small number of parameters. The speaker-independent component and the components representing speaker variability are learned from recordings of tens or hundreds of speakers. We show that the proposed method can be applied to the single-channel speech separation task in which two speakers of unknown identity are to be separated. In a scenario where recordings of the unknown speakers are in the training dataset together with recordings of many other speakers, the proposed method outperforms stacked NMD (NMD with a dictionary that contains the atoms of all speakers in the dataset) in terms of signal-to-distortion ratio (SDR). DANMD was also tested in a scenario where recordings of the recognized speakers were not in the training dataset; in this case, it still yielded clearly positive SDRs. The proposed model was also tested on a cochannel speaker identification task, in which the parameters of the adapted model form the basis for deciding the identity of the speakers in the mixture. In this case, the accuracy was 81.2, compared with 84.1 for stacked NMD. Although the speaker recognition accuracy is lower for the new approach, we find its primary value in the improved SDR.
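
The model structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it covers only the forward model: an adapted dictionary built as a speaker-independent component plus a weighted sum of speaker-variability components, and the NMD reconstruction of a spectrogram from that dictionary and an activation matrix. The function names (`shift_right`, `adapted_dictionary`, `nmd_reconstruct`), the array layouts, and the use of a plain weighted sum for adaptation are assumptions made for illustration. Parameter estimation and any non-negativity constraints on the weights are not shown.

```python
import numpy as np

def shift_right(H, tau):
    """Shift the columns (frames) of H right by tau, zero-filling on the left."""
    if tau == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, tau:] = H[:, :-tau]
    return out

def adapted_dictionary(W_mean, W_var, weights):
    """Hypothetical adaptive dictionary: W = W_mean + sum_j weights[j] * W_var[j].

    W_mean:  (T, F, K) speaker-independent component
             (T atom frames, F frequency bins, K atoms)
    W_var:   (J, T, F, K) components representing speaker variability
    weights: (J,) the small set of speaker parameters (assumed layout)
    """
    return W_mean + np.tensordot(weights, W_var, axes=1)

def nmd_reconstruct(W, H):
    """NMD model: V_hat = sum over tau of W[tau] @ (H shifted right by tau frames).

    W: (T, F, K) dictionary, H: (K, N) activations, result: (F, N).
    """
    return sum(W[tau] @ shift_right(H, tau) for tau in range(W.shape[0]))

# Toy two-speaker mixture: each speaker has its own weights and activations.
rng = np.random.default_rng(0)
T, F, K, J, N = 4, 16, 5, 3, 20
W_mean = rng.random((T, F, K))
W_var = rng.random((J, T, F, K))
mixture_hat = sum(
    nmd_reconstruct(adapted_dictionary(W_mean, W_var, rng.random(J)),
                    rng.random((K, N)))
    for _ in range(2)
)
```

In this sketch the mixture spectrogram is the sum of the two per-speaker reconstructions, so each speaker is described by only `J` dictionary weights plus an activation matrix, rather than by a separate dictionary as in stacked NMD.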

Original language: English
Article number: 101223
Number of pages: 14
Journal: Computer Speech and Language
Volume: 70
Publication status: Published - Nov 2021
Publication type: A1 Journal article-refereed

Keywords

  • Cochannel speaker identification
  • Non-negative matrix deconvolution
  • Speech separation

Publication forum classification

  • Publication forum level 2

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
