Abstract
Audio captioning is currently evaluated with metrics originating from machine translation and image captioning, but their suitability for audio has recently been questioned. This work proposes content-based scoring of audio captions, an approach that considers the specific sound events mentioned in the captions. Inspired by text summarization, the proposed measure assigns relevance scores to the sound events present in the reference captions and scores candidates based on the relevance of the sounds they retrieve. In this work we use a simple, consensus-based definition of relevance, but different weighting schemes can easily be incorporated to change the importance of terms. Our experiments use two datasets and three different audio captioning systems and show that the proposed measure behaves consistently with the data: captions that correctly capture the most relevant sounds obtain a score of 1, while those containing less relevant sounds score lower. Although the proposed content-based score is not concerned with the fluency or semantic content of the captions, it can be incorporated into a compound metric, in the same way that SPIDEr is a linear combination of a semantic score and a syntactic fluency score.
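The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sound events have already been extracted from each caption as plain strings, takes "consensus-based relevance" to mean the fraction of reference captions that mention an event, and assumes a pyramid-style normalization (dividing by the best relevance attainable with the same number of retrieved events) so that a candidate naming only the most relevant sounds scores 1. The function names `relevance_weights` and `content_score` are hypothetical.

```python
from collections import Counter


def relevance_weights(reference_events):
    """Consensus relevance (assumed definition): the fraction of
    reference captions that mention each sound event."""
    n_refs = len(reference_events)
    counts = Counter(e for events in reference_events for e in set(events))
    return {event: count / n_refs for event, count in counts.items()}


def content_score(candidate_events, reference_events):
    """Score a candidate by the relevance of the reference sound events
    it retrieves, normalized by the best total relevance reachable with
    the same number of events (assumed normalization)."""
    weights = relevance_weights(reference_events)
    retrieved = set(candidate_events) & set(weights)
    if not retrieved:
        return 0.0
    gained = sum(weights[e] for e in retrieved)
    best = sum(sorted(weights.values(), reverse=True)[:len(retrieved)])
    return gained / best


# Hypothetical sound events extracted from three reference captions.
references = [["dog", "rain"], ["dog"], ["dog", "car"]]

print(content_score(["dog"], references))   # most relevant sound only
print(content_score(["rain"], references))  # a less relevant sound
print(content_score(["bird"], references))  # nothing retrieved
```

Under these assumptions, a candidate that mentions only the event named by every reference receives the maximum score, while a candidate that mentions only an event named by a single reference scores lower. A different weighting scheme could be swapped in by replacing `relevance_weights`.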
Original language | English |
---|---|
Title | Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022) |
Editors | Mathieu Lagrange, Annamaria Mesaros, Thomas Pellegrini, Gaël Richard, Romain Serizel, Dan Stowell |
Publisher | DCASE |
Pages | 116-120 |
ISBN (electronic) | 978-952-03-2677-7 |
Status | Published - 3 Nov 2022 |
OKM publication type | A4 Article in conference proceedings |
Event | Workshop on Detection and Classification of Acoustic Scenes and Events - Nancy, France. Duration: 3 Nov 2022 → 4 Nov 2022 https://dcase.community/workshop2022/ |
Conference
Conference | Workshop on Detection and Classification of Acoustic Scenes and Events |
---|---|
Abbreviated title | DCASE |
Country/Territory | France |
City | Nancy |
Period | 3/11/22 → 4/11/22 |
Web address |
Publication Forum level
- Jufo level 1