Abstract
Affective expression plays a major role in everyday spoken and written language.
To study how affect is expressed by Finnish language users in day-to-day life, data consisting of samples from naturalistic and unscripted contexts is required. The present work describes the first spontaneous speech corpus for Finnish with affect-related annotations, containing 12,000 transcribed samples of unscripted speech paired with continuous-valued scores of valence and arousal marked by five native Finnish speakers. We first describe the creation of the corpus, which combines speech samples from three large-scale Finnish speech corpora, from which we chose samples for annotation using an active learning-based affect mining approach. We then report characteristics of the resulting corpus and annotation consistency, followed by speech emotion recognition (SER) experiments with several classifiers and regression models to test the feasibility of the corpus for SER system development and evaluation. Annotation analyses reveal mean Pearson correlations between annotator scores and the mean of all annotators to be rho = 0.856 for valence and rho = 0.898 for arousal. The SER experiments on discretized labels result in an average unweighted average recall (UAR) of 0.458 for ternary valence classification and 0.719 for binary arousal classification, using a fine-tuned ExHuBERT model for valence prediction and a support vector machine (SVM) classifier for arousal prediction; these levels are comparable to those reported earlier for spontaneous speech. For the regression task, concordance correlation coefficients of 0.270 and 0.689 were obtained for valence and arousal, respectively, when using a WavLM-based model trained on the MSP-Podcast corpus and fine-tuned on the target data. Overall, the analyses suggest that the corpus provides a feasible basis for further study of affective expression in spontaneous Finnish.
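For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how unweighted average recall, the concordance correlation coefficient, and the mean annotator-to-group Pearson correlation are commonly computed. It is not the authors' implementation: the function names are hypothetical, the concordance correlation uses population (biased) variances, and the group mean is assumed to include the annotator being compared, which the abstract does not specify.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall of class c: fraction of true-c samples that were predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient (assumption: population variances)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    covariance = np.mean((x - mean_x) * (y - mean_y))
    # Unlike Pearson r, the denominator penalizes differences in mean and scale.
    return float(2 * covariance / (x.var() + y.var() + (mean_x - mean_y) ** 2))


def mean_annotator_correlation(scores):
    """Average Pearson r between each annotator (one row each) and the group mean.

    Assumption: the group mean includes the annotator being compared.
    """
    scores = np.asarray(scores, dtype=float)
    group_mean = scores.mean(axis=0)
    correlations = [np.corrcoef(row, group_mean)[0, 1] for row in scores]
    return float(np.mean(correlations))
```

Because unweighted average recall averages over classes rather than samples, a classifier that always predicts the majority class of a binary task scores 0.5 however imbalanced the data is. The concordance correlation coefficient equals 1 only when predictions match the targets exactly, so a constant offset lowers it even when the Pearson correlation stays at 1.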
| Original language | English |
|---|---|
| Article number | 103327 |
| Journal | Speech Communication |
| Volume | 175 |
| DOIs | |
| Publication status | Published - Nov 2025 |
| Publication type | A1 Journal article-refereed |
Keywords
- affective expression
- speech analysis
- spontaneous speech
- speech emotion recognition
- perception of affect
- active learning
Publication forum classification
- Publication forum level 2