Audio Captioning in Finnish and English with Task-Dependent Output

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Abstract

Describing audio content is a complex task for an annotator; the resulting caption depends on the annotator’s language, culture, and expertise. In addition, physiological factors such as visual impairment may affect how a sound is perceived and interpreted. In this work, we explore bilingual audio captioning in Finnish and English. In connection with this study, we release the SiVi-CAFE dataset, a small-scale dataset of Sighted and Visually-impaired Captions for Audio in Finnish and English, containing parallel annotations for the same clips. We briefly analyze the differences between captions produced by sighted and visually impaired annotators, and train a system that produces captions in both languages while mimicking the style of the different annotator groups. The system obtains CIDEr scores of 34.75% and 28.75% on the English and Finnish data, respectively. Furthermore, it is able to perform a tagging task, obtaining an F-score of 79.73%.
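As an aside, the tagging F-score reported above is a standard multi-label classification metric. The sketch below is purely illustrative and not taken from the paper: it shows one plausible way such a score could be computed, using scikit-learn's micro-averaged F-score over hypothetical binary tag matrices (the tags and predictions are invented for the example).

```python
# Illustrative sketch (assumption, not the paper's evaluation code):
# micro-averaged F-score for a multi-label audio tagging task.
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical binary tag matrices: rows = audio clips, columns = tags.
y_true = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]])

# Micro averaging pools true/false positives and negatives over all tags,
# which is common when tag frequencies are imbalanced.
score = f1_score(y_true, y_pred, average="micro")
print(f"micro F-score: {score:.4f}")  # → micro F-score: 0.8000
```

Here all four predicted positives are correct (precision 1.0) but two true tags are missed (recall 2/3), giving a micro F-score of 0.8.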
Original language: English
Title of host publication: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)
Publisher: DCASE
Pages: 76-80
ISBN (Electronic): 978-952-03-3171-9
Publication status: Published - 2024
Publication type: A4 Article in conference proceedings
Event: Workshop on Detection and Classification of Acoustic Scenes and Events - Tokyo, Japan
Duration: 23 Oct 2024 - 25 Oct 2024
https://dcase.community/workshop2024/

Workshop

Workshop: Workshop on Detection and Classification of Acoustic Scenes and Events
Abbreviated title: DCASE2024
Country/Territory: Japan
City: Tokyo
Period: 23/10/24 - 25/10/24
Internet address: https://dcase.community/workshop2024/

Publication forum classification

  • Publication forum level 1

