Abstract
Paid crowdsourcing has emerged as a popular method for annotating diverse data types such as images, text, and audio. However, as the platforms have grown in popularity, the number of careless annotators has increased, leading to an influx of spam workers who answer at random and render the platforms unusable. This paper documents our attempt to annotate the DESED dataset using Amazon’s Mechanical Turk and our failure to obtain any useful data after two attempts. Our observations reveal that while the number of workers performing the tasks has increased since 2021, the quality of the obtained labels has declined considerably. After successful trials annotating audio data in 2021 and 2022, the same annotation setup and user interface predominantly attracted spammers in 2024. Given the consistent task setup and its similarity to previous attempts, it remains unclear whether the workers are inherently subpar or are intentionally exploiting the platform. The bottom line is that despite spending a considerable amount of money, we obtained no usable data.
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024) |
| Place of Publication | Tokyo, Japan |
| Publisher | DCASE |
| Pages | 56-60 |
| ISBN (Electronic) | 978-952-03-3171-9 |
| Publication status | Published - Oct 2024 |
| Publication type | A4 Article in conference proceedings |
| Event | Workshop on Detection and Classification of Acoustic Scenes and Events, Tokyo, Japan, 23 Oct 2024 → 25 Oct 2024, https://dcase.community/workshop2024/ |
Workshop
| Workshop | Workshop on Detection and Classification of Acoustic Scenes and Events |
| --- | --- |
| Abbreviated title | DCASE2024 |
| Country/Territory | Japan |
| City | Tokyo |
| Period | 23/10/24 → 25/10/24 |
| Internet address | https://dcase.community/workshop2024/ |
Publication forum classification
- Publication forum level 1