Building a Naturalistic and Representative Affective Speech Corpus for the Finnish Language

Research output: Other conference contribution › Abstract › Scientific

Abstract

Spoken language contains affective (emotional) information, which is conveyed by suprasegmental variation in speech (e.g., prosody and phonation) as well as by other situational variation, such as word choice and dialectal, syntactic and semantic variation. The information is ultimately perceived by the listener as a subjective interpretation. While affect is part of everyday conversational communication, there is little existing research on the expression and perception of affect in spoken Finnish [8][7], let alone across different idiolectal subgroups, such as speakers of different ages or dialectal backgrounds. Since the expression and interpretation of affect in language is known to depend on cultural and social conventions, a better understanding of the expression of affect in Finnish would be desirable. The goal of our work is to research how affect is expressed in everyday spoken Finnish using large-scale data. A prerequisite for our research on affective language is a speech corpus containing unscripted audio recordings of speech paired with metadata (or annotations) describing the affective expression. However, we are aware of only two Finnish speech corpora related to affect, both consisting of acted emotional expressions produced while reading a pre-defined script and both containing only a small amount of speech in total [1][5]. In contrast, several large-scale datasets containing unscripted speech in Finnish exist [4][2][6], but they lack affect-related metadata. An affective speech corpus can be built in several ways, typically by recording acted speech in a controlled setting or by using publicly available free-speech audio from media such as podcasts, radio or television. The trade-off when building these types of datasets is typically between the richness and balance of the affective expression present in the data and the level of information the dataset contains about that expression [3].
In this presentation, we will describe our approach to compiling a spoken Finnish dataset for the study of affective expression by combining the LahjoitaPuhetta, HelPuhe and TamPuhe datasets. The dataset will be built by aligning the audio recordings with their respective text transcriptions and splitting them into individual utterance samples (each consisting of audio and text). Each utterance sample in the dataset will be augmented with text sentiment, speech-to-noise ratio and audio-based emotion estimates, first by using automated tools and finally by manually annotating a subset of the samples. The final dataset can be used to build better tools for automated affect-related annotation, providing more options for researching affect and idiolectal variation using large-scale data. The work is part of the CONVERGENCE project at Tampere University, funded by the Jane and Aatos Erkko Foundation.
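The utterance-level structure described above can be illustrated with a minimal sketch. It is not the project's actual tooling: the record type `UtteranceSample`, its field names, the sentiment scale and the helper functions are all hypothetical, and the power-ratio formula is only one common way to express a speech-to-noise ratio, assuming that separate speech and noise segments are already available.

```python
from dataclasses import dataclass
from math import log10
from typing import List, Optional, Sequence, Tuple


@dataclass
class UtteranceSample:
    """One utterance: an audio span, its transcript and optional annotations.

    All field names are illustrative assumptions, not the corpus schema.
    """
    recording_id: str
    start_s: float
    end_s: float
    text: str
    text_sentiment: Optional[float] = None  # hypothetical scale, e.g. -1.0 to 1.0
    snr_db: Optional[float] = None          # speech-to-noise ratio estimate in dB
    audio_emotion: Optional[str] = None     # hypothetical emotion label
    manually_annotated: bool = False


def split_into_utterances(
    recording_id: str, segments: Sequence[Tuple[float, float, str]]
) -> List[UtteranceSample]:
    """Turn aligned (start_s, end_s, text) segments into utterance samples.

    Segments with no text or a non-positive duration are skipped.
    """
    samples = []
    for start_s, end_s, text in segments:
        text = text.strip()
        if end_s <= start_s or not text:
            continue
        samples.append(UtteranceSample(recording_id, start_s, end_s, text))
    return samples


def mean_power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a non-empty sequence of samples."""
    if not signal:
        raise ValueError("signal must not be empty")
    return sum(x * x for x in signal) / len(signal)


def estimate_snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Power ratio in dB: 10 * log10(P_speech / P_noise)."""
    noise_power = mean_power(noise)
    speech_power = mean_power(speech)
    if noise_power == 0.0 or speech_power == 0.0:
        raise ValueError("signal power must be non-zero")
    return 10.0 * log10(speech_power / noise_power)
```

In such a layout, automated estimates would fill the optional fields first, and `manually_annotated` would mark the subset of samples later checked by human annotators.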
Original language: English
Pages: 1-1
Number of pages: 1
Publication status: Published - Apr 2024
Publication type: Not Eligible
Event: Fonetiikan Päivät 2024 - Tallinn, Estonia
Duration: 25 Apr 2024 - 26 Apr 2024
https://cs.ttu.ee/events/fp2024/

Conference

Conference: Fonetiikan Päivät 2024
City: Tallinn
Period: 25/04/24 - 26/04/24