MAESTRO Real - Multi-Annotator Estimated Strong Labels



The dataset was created for studying the estimation of strong labels using crowdsourcing. It contains 49 real-life audio files from 5 different acoustic scenes, together with the annotation outcome. Annotation was performed using Amazon Mechanical Turk. The total duration of the dataset is 97 minutes and 4 seconds.

The audio files are a subset of the TUT Acoustic Scenes 2016 dataset and belong to five acoustic scenes: cafe/restaurant, city center, grocery store, metro station and residential area. Each scene has 6 classes, some of which are common to all the scenes, resulting in 17 classes in total.

The dataset contains:

  • audio: the 49 real-life recordings, each 3 to 5 min long.
  • soft labels: estimated strong labels from the crowdsourced data; values between 0 and 1 indicate the uncertainty of the annotators.

For more details about the real-life recordings, please see the following paper: A. Mesaros, T. Heittola and T. Virtanen, "TUT database for acoustic scene classification and sound event detection," 2016 24th European Signal Processing Conference (EUSIPCO), 2016, pp. 1128-1132.
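As an illustration of how soft labels in the range 0 to 1 might be turned into binary activity labels, the following is a minimal sketch only. The description above does not specify the file format of the soft labels, so the example works on made-up in-memory values. The class names `class_a` and `class_b`, the function name `binarize_soft_labels` and the 0.5 threshold are all assumptions for illustration, not part of the dataset.

```python
from typing import Dict, List


def binarize_soft_labels(
    soft: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, List[int]]:
    """Map each soft label value to 1 if it reaches the threshold, else 0.

    `soft` maps a class name to a list of per-segment soft label values.
    The 0.5 default threshold is an assumption, not a dataset convention.
    """
    hard: Dict[str, List[int]] = {}
    for event_class, values in soft.items():
        for value in values:
            # Soft labels are described as lying between 0 and 1.
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"soft label out of range: {value}")
        hard[event_class] = [1 if value >= threshold else 0 for value in values]
    return hard


# Hypothetical values for two placeholder classes over four segments.
example = {
    "class_a": [0.0, 0.2, 0.8, 1.0],
    "class_b": [0.5, 0.4, 0.0, 0.0],
}
hard_labels = binarize_soft_labels(example)
```

Thresholding discards the annotator uncertainty that the soft labels encode, so it is only one possible way of using them.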
Date made available: 28 Feb 2023

Field of science, Statistics Finland

  • 113 Computer and information sciences
