DESCRIPTION The TAU Spatial Room Impulse Response Database (TAU-SRIR DB) database contains spatial room impulse responses (SRIRs) captured in various spaces of Tampere University (TAU), Finland, for a fixed receiver position and multiple source positions per room, along with separate recordings of spatial ambient noise captured at the same recording point. The dataset is intended for emulation of spatial multichannel recordings for evaluation and/or training of multichannel processing algorithms in realistic reverberant conditions and over multiple rooms. The major distinct properties of the database compared to other databases of room impulse responses are: Capturing in a high resolution multichannel format (32 channels) from which multiple more limited application-specific formats can be derived (e.g. tetrahedral array, circular array, first-order Ambisonics, higher-order Ambisonics, binaural). Extraction of densely spaced SRIRs along measurement trajectories, allowing emulation of moving source scenarios. Multiple source distances, azimuths, and elevations from the receiver per room, allowing emulation of complex configurations for multi-source methods. Multiple rooms, allowing evaluation of methods at various acoustic conditions, and training of methods with the aim of generalization on different rooms. The RIRs were collected by staff of TAU between 12/2017 - 06/2018, and between 11/2019 - 1/2020. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND. NOTE: This database is a work-in-progress. We intend to publish additional rooms, additional formats, and potentially higher-fidelity versions of the captured responses in the near future, as new versions of the database in this repository. REPORT AND REFERENCE A compact description of the dataset, recording setup, recording procedure, and extraction can be found in: Politis., Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan. available here. A more detailed report specifically focusing on the dataset collection and properties will follow. AIM The dataset can be used for generating multichannel or monophonic mixtures for testing or training of methods under realistic reverberation conditions, related to e.g. multichannel speech enhancement, acoustic scene analysis, and machine listening, among others. It is especially suitable for the follow application scenarios: monophonic and multichannal reverberant single- or multi-source speech in multi-room reverberant conditions monophonic and multichannel polyphonic sound events in multi-room reverberant conditions single-source and multi-source localization in multi-room reverberant conditions, in static or dynamic scenarios single-source and multi-source tracking in multi-room reverberant conditions, in static or dynamic scenarios sound event localization and detection in multi-room reverberant conditions, in static or dynamic scenarios SPECIFICATIONS The SRIRs were captured using an [Eigenmike](https://mhacoustics.com/products) spherical microphone array. A [Genelec G Three loudspeaker](https://www.genelec.com/g-three) was used to playback a maximum length sequence (MLS) around the Eigenmike. The SRIRs were obtained in the STFT domain using a least-squares regression between the known measurement signal (MLS) and far-field recording independently at each frequency. In this version of the dataset the SRIRs and ambient noise are downsampled to 24kHz for compactness. The currently published SRIR set was recorded at nine different indoor locations inside the Tampere University campus at Hervanta, Finland. Additionally, 30 minutes of ambient noise recordings were collected at the same locations with the IR recording setup unchanged. SRIR directions and distances differ with the room. Possible azimuths span the whole range of $\phi\in[-180,180)$, while the elevations span approximately a range between $\theta\in[-45,45]$ degrees. The currently shared measured spaces are as follows: Large open space in underground bomb shelter, with plastic-coated floor and rock walls. Ventilation noise. Circular source trajectory. Large open gym space. Ambience of people using weights and gym equipment in adjacent rooms. Circular source trajectory. Small classroom (PB132) with group work tables and carpet flooring. Ventilation noise. Circular source trajectory. Meeting room (PC226) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory. Lecture hall (SA203) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory. Small classroom (SC203) with group work tables and carpet flooring. Ventilation noise. Linear source trajectory. Large classroom (SE203) with hard floor and rows of desks. Ventilation noise. Linear source trajectory. Lecture hall (TB103) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory. Meeting room (TC352) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory. The measurement trajectories were organised in groups, with each group being specified by a circular or linear trace at the floor at a certain distance from the z-axis of the microphone. For circular trajectories two ranges were measured, a close and a far one, except room TC352, where the same range was measured twice, but with different furniture configuration and open or closed doors. For linear trajectories also two ranges were measured, close and far, but with linear paths at either side of the array, resulting in 4 unique trajectory groups, with the exception of room SA203 where 3 ranges were measured resulting on 6 trajectory groups. Linear trajectory groups are always parallel to each other, in the same room. Each trajectory group had multiple measurement trajectories, following the same floor path, but with the source at different heights. The SRIRs are extracted from the noise recordings of the slowly moving source across those trajectories, at an angular spacing of approximately every 1 degree from the microphone. Instead of extracting SRIRs at equally spaced points along the path (e.g. every 20cm), this extraction scheme was found more practical for synthesis purposes, making emulation of moving sources at an approximately constant angular speed easier. More details on the trajectory geometries can be found in the README file and the measinfo.mat file. RECORDING FORMATS As with the DCASE2019-2021 datasets, currently the database is provided in two formats, first-order Ambisonics, and a tetrahedral microphone array - both derived from the Eigenmike 32-channel recordings. For more details on the format specifications, check the README. We intend to add additional formats of the database, of both higher resolution (e.g. higher-order Ambisonics), or lower resolution (e.g. binaural). REFERENCE DOAs For each extracted RIR across a measurement trajectory there is a direction-of-arrival (DOA) associated with it, which can be used as the reference direction for sound source spatialized using this RIR, for training or evaluation purposes. The DOAs were determined acoustically from the extracted RIRs, by windowing the direct sound part and applying a broadband version of the MUSIC localization algorithm on the windowed multichannel signal. The DOAs are provided as Cartesian components [x, y, z] of unit length vectors. SCENE GENERATOR A set of routines is shared, here termed scene generator, that can spatialize a bank of sound samples using the SRIRs and noise recordings of this library, to emulate scenes for the two target formats. The code is similar to the one used to generate the TAU-NIGENS Spatial Sound Events 2021 dataset, and has been ported to Python from the original version written in Matlab. The generator can be found [**here**](https://github.com/danielkrause/DCASE2022-data-generator), along with more details on its use. The generator at the moment is set to work with the NIGENS sound event sample database, and the FSD50K sound event database, but additional sample banks can be added with small modifications. The dataset together with the generator has been used by the authors in the following public challenges: - DCASE 2019 Challenge Task 3, to generate the TAU Spatial Sound Events 2019 dataset (development/evaluation) - DCASE 2020 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2020 dataset - DCASE2021 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2021 dataset - DCASE2022 Challenge Task 3, to generate additional SELD synthetic mixtures for training the task baseline NOTE: The current version of the generator is work-in-progress, with some code being quite "rough". If something does not work as intended or it is not clear what certain parts do, please contact us. DATASET STRUCTURE The dataset contains a folder of the SRIRs (TAU-SRIR_DB), with all the SRIRs per room in a single MAT file. The file rirdata.mat contains some general information such as sample rate, format specifications, and most importantly the DOAs of every extracted SRIR. The file measinfo.mat contains measurement and recording information in each room. Finally, the dataset contains a folder of spatial ambient noise recordings (TAU-SNoise_DB), with one subfolder per room having two audio recordings fo the spatial ambience, one for each format, FOA or MIC. For more information on how to SRIRs and DOAs are organized, check the README. DOWNLOAD The files TAU-SRIR_DB.z01, ..., TAU-SRIR_DB.zip contain the SRIRs and measurement info files. The files TAU-SNoise_DB.z01, ..., TAU-SNoise_DB.zip contain the ambient noise recordings. Download the zip files and use your preferred compression tool to unzip these split zip files. To extract a split zip archive (named as zip, z01, z02, ...), you could use, for example, the following syntax in Linux or OSX terminal: Combine the split archive to a single archive: zip -s 0 split.zip --out single.zip Extract the single archive using unzip: unzip single.zip LICENSE The database is published under a custom **open non-commercial with attribution** license. It can be found in the `LICENSE.txt` file that accompanies the data.
Koska saatavilla | 4 huhtik. 2022 |
---|