Abstract
This paper targets a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary source. However, previous studies on sound classification are commonly based on cross-validation within a single dataset, without considering cases in which training and testing data are recorded under mismatched conditions. Experiments using a new dataset, TUT-vocal-2016, revealed a large difference between the homogeneous and heterogeneous recognition scenarios. In the homogeneous scenario, the classification accuracy obtained with cross-validation on TUT-vocal-2016 was 95.5%. In the heterogeneous scenario, where seven existing datasets were used as training material and TUT-vocal-2016 was used for testing, the classification accuracy dropped to 69.6%. Several feature normalization methods were tested to improve performance in the heterogeneous scenario; the best performance (96.8%) was obtained using the proposed subdataset-wise normalization.
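The abstract does not spell out the subdataset-wise normalization procedure; the sketch below shows one plausible reading, in which each source dataset is z-score normalized with its own statistics before the data are pooled for training. The corpus names, feature dimensionality, and the choice of plain mean/variance normalization are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of subdataset-wise feature normalization (assumed z-score
# per source dataset, applied before pooling data from mismatched corpora).
import numpy as np


def zscore_per_subdataset(features_by_dataset):
    """Normalize each sub-dataset with its own mean and std, then pool.

    features_by_dataset: dict mapping dataset name -> array of shape
    (n_frames, n_features). Returns a single pooled feature matrix.
    """
    normalized = []
    for feats in features_by_dataset.values():
        mean = feats.mean(axis=0)
        std = feats.std(axis=0) + 1e-8  # guard against zero variance
        normalized.append((feats - mean) / std)
    return np.vstack(normalized)


# Hypothetical example: two training corpora recorded in different
# conditions (different scales/offsets) are brought to a common range
# before training any frame-level speech/singing classifier on top.
train_features = zscore_per_subdataset({
    "corpus_a": np.random.randn(1000, 20) * 3.0 + 5.0,
    "corpus_b": np.random.randn(800, 20) * 0.5 - 1.0,
})
```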
Original language | English |
---|---|
Title of host publication | 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) |
Publisher | IEEE Computer Society |
Pages | 16–20 |
ISBN (Print) | 978-1-5386-1631-4 |
DOIs | |
Publication status | Published - 2017 |
Publication type | A4 Article in a conference publication |
Event | IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
Keywords
- sound classification
- vocal mode
- heterogeneous data sources
- feature normalization
Publication forum classification
- Publication forum level 1