Realistic-Scale Speech and Audiovisual Naming Events to Study Language Development in Infants

Dataset

Description

This dataset contains the file names of the audio and audiovisual data used to simulate infant language exposure from 0 to 12 months. It does not include the audio or image files themselves; instead, it provides references to files in the SpokenCOCO and LibriSpeech datasets. The data is divided into two stages of learning:

  • 0–6 months: pure auditory learning, comprising 1049 hours of read speech sampled from a mixture of the SpokenCOCO and LibriSpeech datasets.
  • 6–12 months: combined auditory and audiovisual learning, using samples from the SpokenCOCO dataset (images paired with spoken descriptions of their captions).

The audiovisual naming events are tailored to simulate the realistic scale of data that infants are typically exposed to. Subsets are provided for 2, 4, and 6 months of audiovisual exposure.

Note: The actual audio and image files must be obtained separately from the respective datasets (SpokenCOCO and LibriSpeech).
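Because the dataset ships only file names, each name has to be matched against a local copy of SpokenCOCO or LibriSpeech before use. The sketch below shows one way to do that check. It assumes, hypothetically, that a name list is a plain-text file with one path per line, relative to a single local dataset root; the actual file format and directory layout of the release may differ. The helper names `read_name_list` and `resolve_file_names` are illustrative and not part of the dataset.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def read_name_list(list_file: Path) -> List[str]:
    """Read a name list, assuming (hypothetically) one relative path per line."""
    return list_file.read_text(encoding="utf-8").splitlines()


def resolve_file_names(names: Iterable[str], root: Path) -> Tuple[List[Path], List[str]]:
    """Match listed names to files under a local dataset root.

    Assumes each name is a path relative to `root` (hypothetical layout).
    Returns the paths that exist and the names that could not be found.
    """
    found: List[Path] = []
    missing: List[str] = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # ignore blank lines in the list
        candidate = root / name
        if candidate.is_file():
            found.append(candidate)
        else:
            missing.append(name)  # not present in the local copy
    return found, missing
```

Reporting missing names rather than raising on the first one makes it easy to see how much of a subset (for example, the 2-, 4-, or 6-month audiovisual subsets) is covered by a given local download.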

If you use this dataset in your work, please cite the following paper:

Khorrami, K., & Räsänen, O. (2024). A model of early word acquisition based on realistic-scale audiovisual naming events. Speech Communication. DOI: 10.1016/j.specom.2024.103169
Date made available: 20 Dec 2024
Publisher: Zenodo

Field of science, Statistics Finland

  • 113 Computer and information sciences
