Nonnegative matrix factorization (NMF) is a powerful model for representing speech and music data. In this chapter, we present its mathematical foundations and describe several probabilistic frameworks and algorithms for computing an NMF. We also describe advanced NMF models that represent audio signals more accurately by enforcing properties such as sparsity, harmonicity and spectral smoothness, and by taking the non‐stationarity of the data into account. We show that coupled factorizations make it possible to exploit extra information that may be available about the observed signal, such as the musical score. Finally, we present several methods that perform dictionary learning for NMF, and we conclude with a discussion of the main benefits and drawbacks of NMF models.
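As a rough illustration of what "computing an NMF" involves, the sketch below factorizes a small nonnegative matrix V into nonnegative factors W (spectral templates) and H (activations) using the classic multiplicative updates for the Euclidean cost. This is only one well-known algorithm and is not necessarily among those presented in the chapter; the function name `nmf_multiplicative`, its parameters, and the toy data are assumptions made for this example.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (F x N) as W @ H with W, H >= 0.

    Hypothetical helper for illustration: multiplicative updates for the
    Euclidean (Frobenius) cost, not the chapter's own algorithms.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Strictly positive random initialisation keeps the ratios well defined.
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so no entry
        # of W or H can become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "spectrogram" (assumed data): two spectral templates mixed with
# time-varying gains, so an exact rank-2 nonnegative factorization exists.
templates = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.7]])
gains = np.array([[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.5, 0.8]])
V = templates @ gains

W, H = nmf_multiplicative(V, rank=2)
print(f"reconstruction error: {np.linalg.norm(V - W @ H):.4f}")
```

In an audio setting, V would typically be a magnitude or power spectrogram, with the columns of W acting as a learned dictionary of spectra and the rows of H as their gains over time.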
|Title of host publication|Audio Source Separation and Speech Enhancement|
|Editors|Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot|
|Publication status|Published - 3 Aug 2018|
|Publication type|B2 Book chapter|