Clotho Analysis Set

  • Felix Gontier (Creator)
  • Romain Serizel (Creator)
  • Huang Xie (Creator)
  • Samuel Lipping (Creator)
  • Tuomas Virtanen (Creator)
  • Konstantinos Drossos (Wolt Enterprises Oy) (Creator)

Dataset

Description

This dataset is derived from the evaluation subset of the Clotho dataset (https://zenodo.org/doi/10.5281/zenodo.3490683). It is designed to analyze the behavior of a captioning system under certain perturbations, in order to identify open challenges in automated audio captioning. The original audio clips are transformed with audio_degrader. The following transformations are applied:

  • Microphone response simulation
  • Mixup with another clip from the dataset (ratios of -6 dB, -3 dB and 0 dB)
  • Additive noise from DESED (ratios of -12 dB, -6 dB and 0 dB)
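
The mixup and additive-noise transformations both add a second signal to the original clip at a stated ratio in dB. As a rough illustration only, the sketch below shows one way such a mix could be computed with NumPy. It does not use audio_degrader and is not the code used to build this dataset. It assumes that the ratio is the RMS level of the added signal relative to the original clip, which the description does not state explicitly; the names rms and mix_at_ratio are hypothetical.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_ratio(clip, interferer, ratio_db):
    """Add `interferer` to `clip` at `ratio_db` relative to the clip level.

    Assumption: the ratio is the RMS level of the added signal relative
    to the original clip, so 0 dB means equal levels and -6 dB means the
    added signal is about half the amplitude of the clip.
    """
    # Truncate both signals to a common length before mixing.
    n = min(len(clip), len(interferer))
    clip = np.asarray(clip[:n], dtype=np.float64)
    interferer = np.asarray(interferer[:n], dtype=np.float64)

    interferer_rms = rms(interferer)
    if interferer_rms == 0.0:
        # A silent interferer cannot be rescaled; return the clip unchanged.
        return clip.copy()

    # Convert the dB ratio to a linear amplitude gain for the interferer.
    target_rms = rms(clip) * 10.0 ** (ratio_db / 20.0)
    gain = target_rms / interferer_rms
    return clip + gain * interferer
```

Under this assumption, calling mix_at_ratio(clip, other_clip, -6.0) would correspond to the -6 dB mixup condition, and passing a noise recording instead of another clip would correspond to the additive-noise conditions. The sketch does not guard against clipping, so the result may need to be rescaled before being written to an audio file.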
Date made available: 2 Jun 2022
Publisher: Zenodo

Field of science, Statistics Finland

  • 113 Computer and information sciences
