TAU Moving Sound Events 2019 - Ambisonic, Reverberant, Real-life IR and Moving Source Dataset

Dataset

Description

This dataset consists of real-life first-order Ambisonic (FOA) format recordings with moving point sources, whose positions are represented in 2D spherical coordinates by azimuth and elevation angles.

The dataset was generated by collecting impulse responses (IRs) from a real environment using the Eigenmike spherical microphone array. The measurement was done by slowly moving a Genelec G Two loudspeaker, continuously playing a maximum length sequence, around the array in a circular trajectory at one elevation at a time. The playback volume was set 30 dB above the ambient sound level. The recording was done in a corridor inside the university, with classrooms around it, during work hours. The IRs were collected at elevations from −40° to +40° in 10° increments at 1 m from the Eigenmike, and at elevations from −20° to +20° in 10° increments at 2 m.

The dataset consists of three sub-datasets with (a) at most one temporally overlapping sound event, (b) at most two temporally overlapping sound events, and (c) at most three temporally overlapping sound events. Each sub-dataset has three cross-validation (CV) splits, each consisting of 240 recordings of about 30 seconds for the training split and 60 recordings of the same length for the testing split.

All sound events in this dataset move only along azimuth, with a constant angular velocity in the range [−90, 90]°/s in 10°/s steps. For each recording, a metadata file with the same name lists, for every event, the sound event name, the temporal onset and offset times (in seconds), the starting azimuth and elevation angles (in degrees), the angular velocity of motion (in degrees per second), and the distance from the microphone (in meters); a parsing sketch follows this description.

The isolated sound events were taken from the UrbanSound8K dataset, which contains 10 sound event classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. We do not consider the air_conditioner and children_playing sound events, and we only include the sound event examples marked as foreground in the dataset. We used splits 1, 8, and 9 provided in UrbanSound8K as the three CV splits. These splits were chosen because they had a good number of examples for all the chosen sound event classes after selecting only the foreground examples. During sound scene synthesis, every sound event is assigned a spatial trajectory on an arc at a constant distance from the microphone, moving with a constant angular velocity for its duration (see the trajectory sketch below).

Other than the license file, there are nine zip files containing the dataset and corresponding metadata for a given split and overlap. For example, the ov3_split1.zip file contains the training and testing recordings and metadata for the case of at most three temporally overlapping sound events (ov3) for the first cross-validation split (split1). Within each folder, filenames for the training split have the 'train' prefix, while the testing split filenames have the 'test' prefix.

This dataset was collected as part of the 'Localization, Detection and Tracking of Multiple Moving Sound Sources with Convolutional Recurrent Neural Networks' work.

Data collector(s): Fagerlund, Eemi; Koskimies, Aino; Hakala, Aapo
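
As a usage aid, here is a minimal Python sketch for reading one recording's metadata file. It assumes the file is comma-separated text whose columns follow the order listed above, with one event per row and no header; the delimiter and all names in the code (EventRecord, read_metadata) are illustrative assumptions, not part of the dataset specification.

    import csv
    from dataclasses import dataclass

    @dataclass
    class EventRecord:
        name: str          # sound event class, e.g. 'car_horn'
        onset: float       # onset time, seconds
        offset: float      # offset time, seconds
        azimuth: float     # starting azimuth, degrees
        elevation: float   # elevation, degrees
        velocity: float    # angular velocity, degrees/s
        distance: float    # distance from the microphone, meters

    def read_metadata(path):
        # Assumed CSV layout: name, onset, offset, azimuth,
        # elevation, angular velocity, distance (one event per row).
        with open(path, newline='') as f:
            rows = list(csv.reader(f))
        return [EventRecord(r[0], *map(float, r[1:7])) for r in rows]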
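
Because each event moves only along azimuth at a constant angular velocity, its direction at any time within its onset-to-offset interval follows directly from the metadata. A minimal trajectory sketch, assuming the field meanings above (the wrap-around to [−180°, 180°) is an assumed convention):

    def azimuth_at(event, t):
        # Constant-angular-velocity motion along azimuth only;
        # elevation and distance stay fixed for the event's duration.
        az = event.azimuth + event.velocity * (t - event.onset)
        return (az + 180.0) % 360.0 - 180.0  # assumed wrap to [-180, 180)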
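
Finally, a small helper for separating an extracted split folder into training and testing file lists by the 'train'/'test' filename prefixes described above; the flat folder layout after extraction is an assumption.

    from pathlib import Path

    def split_files(folder):
        # Partition extracted files by the documented filename prefixes.
        files = sorted(Path(folder).iterdir())
        train = [p for p in files if p.name.startswith('train')]
        test = [p for p in files if p.name.startswith('test')]
        return train, test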
Date made available: 11 Apr 2019

Field of science, Statistics Finland

  • 113 Computer and information sciences

Cite this