Description
The AVCaps dataset is an audio-visual captioning resource designed to advance research in multimodal machine perception. Derived from the VidOR dataset, it features 2061 video clips spanning a total of 28.8 hours.
Date made available | 20 Dec 2024 |
---|---|
Publisher | Zenodo |
Funding
Funders | Funder number |
---|---|
Jane and Aatos Erkko Foundation | |
Field of science, Statistics Finland
- 113 Computer and information sciences