Description
Tampere University of Technology (TUT) Sound Events 2018 - Ambisonic, Reverberant and Real-life Impulse Response Dataset

This dataset consists of real-life first-order Ambisonic (FOA) format recordings with stationary point sources, each associated with a spatial coordinate. The dataset was generated by collecting impulse responses (IRs) from a real environment using the Eigenmike spherical microphone array. The measurement was done by slowly moving a Genelec G Two loudspeaker, continuously playing a maximum length sequence, around the array in a circular trajectory at one elevation at a time. The playback volume was set 30 dB above the ambient sound level. The recording was done in a corridor inside the university, surrounded by classrooms, during work hours. The IRs were collected at elevations from −40° to 40° in 10° increments at 1 m from the Eigenmike, and at elevations from −20° to 20° in 10° increments at 2 m.

The dataset consists of three sub-datasets with (a) at most one, (b) at most two, and (c) at most three temporally overlapping sound events. Each sub-dataset has three cross-validation splits, each consisting of 240 recordings of about 30 seconds for the training split and 60 recordings of the same length for the testing split. For each recording, the metadata file with the same name lists the sound event name, the temporal onset and offset times (in seconds), the spatial location as azimuth and elevation angles (in degrees), and the distance from the microphone (in meters).

The isolated sound events were taken from the urbansound8k dataset, which consists of 10 sound event classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. We do not consider the air_conditioner and children_playing sound events.
Further, we only include the sound event examples marked as foreground in the dataset. We used splits 1, 8, and 9 provided in urbansound8k as the three CV splits; these were chosen because they had a good number of examples for all the chosen sound event classes after selecting only the foreground examples. During sound scene synthesis, we randomly chose a sound event example and associated it with a random distance (among the collected ones), azimuth, and elevation angle. The sound event example was then convolved with the IR for the given distance, azimuth, and elevation to position it spatially.

The metadata.zip file contains the license and the metadata for the complete dataset. The remaining nine zip files each contain the dataset for a given split and overlap. For example, wav_ov3_split1_30db.zip contains the training and testing recordings for the case of at most three temporally overlapping sound events (ov3) in the first cross-validation split (split1). Within each audio folder, filenames for the training split have the 'train' prefix, while testing split filenames have the 'test' prefix.

This dataset was collected as part of the 'Sound event localization and detection of overlapping sources using convolutional recurrent neural network' work.

Data collectors: Fagerlund, Eemi; Koskimies, Aino
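The spatialization step described above (convolving a mono sound event with the measured FOA impulse response for its assigned azimuth, elevation, and distance) can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: the function name, array shapes, and random signals are assumptions for demonstration.

```python
import numpy as np

def spatialize_event(event: np.ndarray, ir_foa: np.ndarray) -> np.ndarray:
    """Convolve a mono sound event with a 4-channel first-order
    Ambisonic (FOA) impulse response.

    event  : shape (n_event,)  -- mono event signal
    ir_foa : shape (n_ir, 4)   -- one measured IR per FOA channel
    returns: shape (n_event + n_ir - 1, 4) -- spatialized FOA signal
    """
    return np.stack(
        [np.convolve(event, ir_foa[:, ch]) for ch in range(ir_foa.shape[1])],
        axis=1,
    )

# Illustrative use with random signals; real use would load a wav event
# and the IR measured for the chosen azimuth/elevation/distance.
rng = np.random.default_rng(0)
event = rng.standard_normal(8000)    # hypothetical short mono event
ir = rng.standard_normal((2048, 4))  # hypothetical 4-channel FOA IR
foa = spatialize_event(event, ir)
print(foa.shape)                     # (10047, 4)
```

Per-channel convolution like this preserves the spatial cues encoded in the measured IRs, so the synthesized event appears at the direction and distance at which the IR was captured.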
Date made available | 30 Apr 2018
---|---
Publisher | Zenodo
Field of science (Statistics Finland)
- 113 Computer and information sciences