Description
Tampere University (TAU) Moving Sound Events 2019 - Ambisonic, Anechoic and Synthetic Impulse Response (IR) and Moving Source Dataset

This dataset consists of simulated anechoic first-order Ambisonics (FOA) recordings with moving point sources, each located on the 2D sphere and represented with azimuth and elevation angles. The dataset comprises three sub-datasets with a) at most one temporally overlapping sound event, b) at most two temporally overlapping sound events, and c) at most three temporally overlapping sound events. Each sub-dataset has three cross-validation splits, each consisting of 240 recordings of about 30 seconds for the training split and 60 recordings of the same length for the testing split. For each recording, a metadata file with the same name lists the sound event name, the temporal onset and offset times (in seconds), the start and end spatial locations as azimuth and elevation angles (in degrees), the angular velocity of the motion, and the distance from the microphone (in meters).

The isolated sound events were taken from the DCASE 2016 Task 2 dataset, which covers 11 sound event classes: clearing throat, coughing, door knock, door slam, drawer, human laughter, keyboard, keys (put on a table), page turning, phone ringing, and speech. Every event is assigned a spatial trajectory on an arc at a constant distance from the microphone (in the range 1-10 m), moving with a constant angular velocity for its duration.

Because of the choice of the Ambisonic spatial recording format, the steering vectors for a plane-wave source or a point source in the far field are frequency-independent. Hence, no time-variant convolution or impulse-response interpolation scheme is needed while the source moves; the monophonic signal was spatially encoded sample by sample using the instantaneous Ambisonic encoding vectors for the respective direction of arrival (DOA) of the moving source.
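The sample-by-sample encoding described above can be sketched as follows. This is a minimal illustration, not the dataset's actual synthesis code; the channel ordering (ACN: W, Y, Z, X) and SN3D normalization are assumptions, and the function name is hypothetical.

```python
import numpy as np

def foa_encode(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order Ambisonics (FOA) sample by sample.

    azimuth_deg / elevation_deg are per-sample DOA trajectories (degrees),
    so a moving source only needs time-varying angles: for a far-field
    plane-wave model, no convolution or IR interpolation is required.
    ACN channel order with SN3D normalization is assumed here.
    """
    az = np.radians(np.asarray(azimuth_deg, dtype=float))
    el = np.radians(np.asarray(elevation_deg, dtype=float))
    w = np.ones_like(az)            # omnidirectional component
    y = np.sin(az) * np.cos(el)     # left-right dipole
    z = np.sin(el)                  # up-down dipole
    x = np.cos(az) * np.cos(el)     # front-back dipole
    # Multiply the per-sample gains by the mono signal: shape (4, n_samples)
    return np.stack([w, y, z, x], axis=0) * np.asarray(mono, dtype=float)
```

Because the encoding gains are evaluated independently at every sample, an arbitrary DOA trajectory costs the same as a static source.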
The synthesized trajectories in the dataset vary in both azimuth and elevation and are simulated with a constant angular velocity in the range [-90, 90] degrees/s, in steps of 10 degrees/s. The license of the dataset can be found in the LICENSE file. The remaining nine zip files contain the datasets for a given split and overlap. For example, ov3_split1.zip contains the audio and metadata folders for the case of at most three temporally overlapping sound events (ov3) and the first cross-validation split (split1). Within each audio/metadata folder, the filenames of the training split have the 'train' prefix, while the filenames of the testing split have the 'test' prefix. This dataset was collected as part of the work 'Localization, Detection and Tracking of Multiple Moving Sound Sources with Convolutional Recurrent Neural Networks'.
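A constant-angular-velocity trajectory of the kind described above can be sketched as below. The function and parameter names are hypothetical; the dataset draws velocities from [-90, 90] degrees/s in 10-degree/s steps, and this sketch simply integrates a chosen velocity over the event duration.

```python
import numpy as np

def constant_velocity_trajectory(az0_deg, el0_deg, az_vel_dps, el_vel_dps,
                                 duration_s, fs=44100):
    """Per-sample azimuth/elevation (degrees) for a source moving on an arc
    at constant angular velocity (degrees per second)."""
    t = np.arange(int(round(duration_s * fs))) / fs
    az = az0_deg + az_vel_dps * t
    el = el0_deg + el_vel_dps * t
    # Wrap azimuth to [-180, 180) and clip elevation to its valid range.
    az = (az + 180.0) % 360.0 - 180.0
    el = np.clip(el, -90.0, 90.0)
    return az, el
```

These per-sample angle arrays are exactly what a sample-by-sample Ambisonic encoder consumes, so the trajectory and the spatial encoding compose directly.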
Date available: 11 April 2019
Field of science, Statistics Finland
- 113 Computer and information sciences