MACS - Multi-Annotator Captioned Soundscapes

Dataset

Description

This dataset contains audio captions and corresponding audio tags for 3930 audio files of the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool. Each file was annotated by multiple annotators, who provided tags and a one-sentence description of the audio content. The data also includes annotator competence estimated using MACE (Multi-Annotator Competence Estimation).

The annotation procedure, processing and analysis of the data are presented in the following papers:

  • Irene Martin-Morato, Annamaria Mesaros. What is the ground truth? Reliability of multi-annotator data for audio tagging, 29th European Signal Processing Conference, EUSIPCO 2021
  • Irene Martin-Morato, Annamaria Mesaros. Diversity and bias in audio captioning datasets, submitted to DCASE 2021 Workshop (to be updated with arxiv link)

The data is provided as two files:

  • MACS.yaml - containing the complete annotations in the following format:

    - filename: file1.wav
      annotations:
        - annotator_id: ann_1
          sentence: caption text
          tags:
            - tag1
            - tag2
        - annotator_id: ann_2
          sentence: caption text
          tags:
            - tag1

  • MACS_competence.csv - containing the estimated annotator competence; for each annotator_id in the yaml file, competence is a number between 0 (considered as annotating at random) and 1, in the format:

    id [tab] competence

The audio files can be downloaded from https://zenodo.org/record/2589280 and are covered by their own license.
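The two-file layout described above can be combined in a few lines of Python. The following is a minimal sketch, not the procedure used in the papers: the `records` literal is a hypothetical stand-in for what a YAML parser (for example PyYAML's `yaml.safe_load`) would be expected to return for MACS.yaml, the competence values are made up, and the competence-weighted tag score is only one illustrative way to use the competence estimates. Whether MACS_competence.csv starts with a header row is an assumption, so the reader skips any row whose second field is not a number.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the parsed content of MACS.yaml:
# a list of files, each with a list of per-annotator annotations.
records = [
    {
        "filename": "file1.wav",
        "annotations": [
            {"annotator_id": "ann_1", "sentence": "caption text", "tags": ["tag1", "tag2"]},
            {"annotator_id": "ann_2", "sentence": "caption text", "tags": ["tag1"]},
        ],
    }
]

# Hypothetical tab-separated competence data (values are invented).
competence_tsv = "id\tcompetence\nann_1\t0.9\nann_2\t0.5\n"


def read_competence(handle):
    """Read 'id [tab] competence' rows into a dict, skipping non-numeric rows."""
    competence = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            competence[row[0]] = float(row[1])
        except ValueError:
            continue  # assumed header row or malformed line
    return competence


def weighted_tag_scores(record, competence):
    """Sum annotator competence per tag for one file (illustrative weighting)."""
    scores = defaultdict(float)
    for annotation in record["annotations"]:
        weight = competence.get(annotation["annotator_id"], 0.0)
        for tag in annotation.get("tags", []):
            scores[tag] += weight
    return dict(scores)


competence = read_competence(io.StringIO(competence_tsv))
for record in records:
    print(record["filename"], weighted_tag_scores(record, competence))
```

To run this on the real data, the `records` literal would be replaced by the parsed MACS.yaml and the in-memory string by the opened MACS_competence.csv file.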
Date made available: 22 Jul 2021
Publisher: Zenodo

Field of science, Statistics Finland

  • 113 Computer and information sciences
  • Diversity and bias in audio captioning datasets

    Martin Morato, I. & Mesaros, A., 15 Nov 2021, Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021). Font, F., Mesaros, A., P.W. Ellis, D., Fonseca, E., Fuentes, M. & Elizalde, B. (eds.). DCASE, p. 90-94

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

    Open Access
  • What is the ground truth? Reliability of multi-annotator data for audio tagging

    Martin Morato, I. & Mesaros, A., 2021, 29th European Signal Processing Conference EUSIPCO 2021. IEEE, 5 p. 1173

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

    Open Access
