A summarization approach to evaluating audio captioning

Research output: Conference article, scientific, peer-reviewed


Abstract

Audio captioning is currently evaluated with metrics originating from machine translation and image captioning, but their suitability for audio has recently been questioned. This work proposes content-based scoring of audio captions, an approach that considers the specific sound event content of the captions. Inspired by text summarization, the proposed measure assigns relevance scores to the sound events present in the references and scores candidates based on the relevance of the sounds they retrieve. In this work we use a simple, consensus-based definition of relevance, but other weighting schemes can easily be incorporated to change the importance of terms accordingly. Our experiments use two datasets and three different audio captioning systems and show that the proposed measure behaves consistently with the data: captions that correctly capture the most relevant sounds obtain a score of 1, while those containing less relevant sounds score lower. While the proposed content-based score is not concerned with the fluency or semantic content of the captions, it can be incorporated into a compound metric, similar to SPIDEr being a linear combination of a semantic score and a syntactic fluency score.
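The abstract describes the scoring only in outline, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes sound events have already been extracted from each caption as sets of labels (the extraction step is not shown). It reads "consensus-based relevance" as the number of reference captions that mention an event, and it normalizes the candidate's total relevance by the best total achievable with the same number of events, as in the Pyramid method for summarization evaluation. The function names `relevance_weights` and `content_score` and the example labels are hypothetical.

```python
from collections import Counter


def relevance_weights(reference_events):
    """Assumed consensus relevance: number of reference captions mentioning each event."""
    counts = Counter()
    for events in reference_events:
        counts.update(set(events))  # count each event at most once per caption
    return dict(counts)


def content_score(candidate_events, reference_events):
    """Pyramid-style score: relevance of the candidate's events divided by the
    highest relevance achievable with the same number of events."""
    weights = relevance_weights(reference_events)
    candidate = set(candidate_events)
    if not candidate:
        return 0.0
    # Events absent from every reference contribute zero relevance.
    obtained = sum(weights.get(event, 0) for event in candidate)
    # Ideal caption: the len(candidate) most relevant reference events.
    ideal = sum(sorted(weights.values(), reverse=True)[: len(candidate)])
    return obtained / ideal if ideal else 0.0


# Hypothetical example: sound events extracted from five reference captions.
references = [
    {"dog", "car"},
    {"dog", "rain"},
    {"dog", "car"},
    {"dog"},
    {"car", "bird"},
]
print(content_score({"dog"}, references))          # 1.0
print(content_score({"rain"}, references))         # 0.25
print(content_score({"dog", "rain"}, references))  # 5/7, about 0.714
```

A candidate naming only the most agreed-upon sound reaches 1, while one naming a rarely mentioned sound scores lower, matching the behaviour the abstract reports. An event that no reference mentions adds nothing to the numerator but still raises the ideal total, so it lowers the score. Swapping in a different `relevance_weights` function is one way to try the alternative weighting schemes the abstract mentions, and the resulting score could then be averaged with a fluency score, roughly the way SPIDEr averages SPICE and CIDEr.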
Original language: English
Title: Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
Editors: Mathieu Lagrange, Annamaria Mesaros, Thomas Pellegrini, Gaël Richard, Romain Serizel, Dan Stowell
Publisher: DCASE
Pages: 116-120
ISBN (electronic): 978-952-03-2677-7
Status: Published - 3 Nov 2022
OKM publication type: A4 Article in conference proceedings
Event: Workshop on Detection and Classification of Acoustic Scenes and Events - Nancy, France
Duration: 3 Nov 2022 to 4 Nov 2022
https://dcase.community/workshop2022/

Conference

Conference: Workshop on Detection and Classification of Acoustic Scenes and Events
Abbreviated title: DCASE
Country/Territory: France
City: Nancy
Period: 3/11/22 to 4/11/22

Publication Forum level

  • Jufo level 1
