Clotho dataset

Dataset

Description

Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions, for a total of 24 905 captions. The audio samples are 15 to 30 s long, and the captions are 8 to 20 words long.
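
As a minimal sketch of the figures in the description above, the following Python snippet recomputes the caption total and checks a single entry against the stated duration and caption-length ranges. The function names, the inclusive treatment of the range limits, and the whitespace-based word count are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
# Figures quoted in the dataset description.
NUM_AUDIO_SAMPLES = 4981
CAPTIONS_PER_SAMPLE = 5
MIN_DURATION_S, MAX_DURATION_S = 15.0, 30.0
MIN_WORDS, MAX_WORDS = 8, 20


def total_captions(num_samples=NUM_AUDIO_SAMPLES, per_sample=CAPTIONS_PER_SAMPLE):
    """Return the total caption count implied by the description."""
    return num_samples * per_sample


def is_valid_entry(duration_s, captions):
    """Hypothetical check of one audio sample and its captions.

    Assumptions: range limits are inclusive, and words are counted
    by splitting each caption on whitespace.
    """
    if not (MIN_DURATION_S <= duration_s <= MAX_DURATION_S):
        return False
    if len(captions) != CAPTIONS_PER_SAMPLE:
        return False
    return all(MIN_WORDS <= len(c.split()) <= MAX_WORDS for c in captions)
```

For example, `total_captions()` gives 4981 × 5 = 24 905, which matches the total stated in the description.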
Date made available: 15 Oct 2019
Publisher: Tampere University of Technology
Date of data production: 2019 -
  • Clotho: an Audio Captioning Dataset

    Drossos, K., Lipping, S. & Virtanen, T., 2020, IEEE 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, p. 736-740 5 p. (Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing).

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

    Open Access
    4 Citations (Scopus)
  • Crowdsourcing a Dataset of Audio Captions

    Lipping, S., Drossos, K. & Virtanen, T., 26 Oct 2019, Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019).

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

    Open Access
