Vocal Effort Based Speaking Style Conversion Using Vocoder Features and Parallel Learning

Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku

Research output: Contribution to journal › Article › Scientific › peer-review

15 Citations (Scopus)


Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we aim to provide a general SSC system for converting styles with varying vocal effort and focus on normal-to-Lombard conversion as a case study of this problem. We propose a parametric approach that uses a vocoder to extract speech features. These features are mapped using parallel machine learning models from utterances spoken in normal style to the corresponding features of Lombard speech. Finally, the mapped features are converted to a Lombard speech waveform with the vocoder. A total of three vocoders (GlottDNN, STRAIGHT, and Pulse model in log domain (PML)) and three machine learning mapping methods (standard GMM, Bayesian GMM, and feed-forward DNN) were compared in the proposed normal-to-Lombard style conversion system. The conversion was evaluated using two subjective listening tests measuring perceived Lombardness and quality of the converted speech signals, and by using an instrumental measure called Speech Intelligibility in Bits (SIIB) for speech intelligibility evaluation under various noise levels. The results of the subjective tests show that the system is able to convert normal speech into Lombard speech and that there is a trade-off between quality and Lombardness of the mapped utterances. The GlottDNN and PML stand out as the best vocoders in terms of quality and Lombardness, respectively, whereas the DNN is the best mapping method in terms of Lombardness. PML with the standard GMM seems to give a good compromise between the two attributes. The SIIB experiments indicate that intelligibility of converted speech compared to that of normal speech improved in noisy conditions most effectively when DNN mapping was used with STRAIGHT and PML.
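The parallel mapping step described above (features of normal speech mapped to the corresponding features of Lombard speech) can be illustrated with a minimal sketch. It is not the method evaluated in the article: the GMM, Bayesian GMM, and DNN mappers are replaced here by a simple per-dimension least-squares affine map, and vocoder analysis and synthesis are omitted entirely. The function names `fit_affine_map` and `convert`, the two-dimensional feature layout, and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def fit_affine_map(source_frames, target_frames):
    """Fit y = a*x + b independently for each feature dimension.

    Both inputs are lists of time-aligned (parallel) feature frames.
    This is a stand-in for the GMM/DNN mappers, not a reimplementation.
    """
    n_dims = len(source_frames[0])
    params = []
    for d in range(n_dims):
        xs = [frame[d] for frame in source_frames]
        ys = [frame[d] for frame in target_frames]
        mean_x, mean_y = fmean(xs), fmean(ys)
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # Fall back to a pure shift if the source dimension is constant.
        a = cov_xy / var_x if var_x > 0 else 1.0
        params.append((a, mean_y - a * mean_x))
    return params


def convert(frames, params):
    """Apply the fitted per-dimension affine map to each frame."""
    return [[a * x + b for x, (a, b) in zip(frame, params)] for frame in frames]


# Made-up parallel frames: [log-F0, spectral tilt in dB] (hypothetical features).
normal = [[4.8, -12.0], [4.9, -11.0], [5.0, -10.0], [5.1, -9.0]]
lombard = [[5.0, -9.0], [5.1, -8.5], [5.2, -8.0], [5.3, -7.5]]

params = fit_affine_map(normal, lombard)
converted = convert(normal, params)
```

In a full system of the kind the abstract describes, the converted feature frames would be passed back to the vocoder to synthesize the Lombard waveform.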

Original language: English
Pages (from-to): 17230-17246
Number of pages: 17
Journal: IEEE Access
Publication status: Published - 2019
Publication type: A1 Journal article-refereed


Keywords

  • Bayesian GMM
  • DNN
  • GlottDNN
  • Lombard speech
  • pulse model in log domain
  • speaking style conversion
  • vocal effort

Publication forum classification

  • Publication forum level 2

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering

