Learning vocal mode classifiers from heterogeneous data sources

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


    Abstract

    This paper targets a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary data source. Previous studies on sound classification, however, are commonly based on cross-validation within a single dataset and do not consider cases where training and testing data are recorded under mismatched conditions. Experiments using a new dataset, TUT-vocal-2016, revealed a large difference between the homogeneous and heterogeneous recognition scenarios. In the homogeneous scenario, cross-validation on TUT-vocal-2016 gave a classification accuracy of 95.5%. In the heterogeneous scenario, where seven existing datasets were used as training material and TUT-vocal-2016 was used for testing, the classification accuracy was only 69.6%. Several feature normalization methods were tested to improve performance in the heterogeneous scenario. The best performance (96.8%) was obtained using the proposed subdataset-wise normalization.
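    The abstract does not define subdataset-wise normalization in detail. The following is a minimal sketch, assuming it means standardizing each feature dimension to zero mean and unit variance using statistics computed separately within each subdataset (for example, each source dataset), rather than over the pooled training data. The function name `subdataset_wise_normalize`, the `eps` guard, and the use of the population standard deviation are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence


def subdataset_wise_normalize(
    features: Sequence[Sequence[float]],
    subdataset_ids: Sequence[str],
    eps: float = 1e-8,
) -> List[List[float]]:
    """Hypothetical sketch: z-score each feature dimension using mean and
    standard deviation computed only over rows from the same subdataset."""
    if len(features) != len(subdataset_ids):
        raise ValueError("features and subdataset_ids must have the same length")

    # Group row indices by their subdataset label.
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, sid in enumerate(subdataset_ids):
        groups[sid].append(i)

    normalized: List[List[float]] = [[] for _ in features]
    for rows in groups.values():
        dim = len(features[rows[0]])
        n = len(rows)
        # Per-dimension mean over this subdataset only.
        means = [sum(features[r][d] for r in rows) / n for d in range(dim)]
        # Per-dimension population standard deviation over this subdataset only.
        stds = [
            sqrt(sum((features[r][d] - means[d]) ** 2 for r in rows) / n)
            for d in range(dim)
        ]
        # eps avoids division by zero when a feature is constant within a subdataset.
        for r in rows:
            normalized[r] = [
                (features[r][d] - means[d]) / (stds[d] + eps) for d in range(dim)
            ]
    return normalized
```

    For example, two hypothetical subdatasets "A" and "B" whose first feature differs only by a constant offset of 100 map to the same normalized values, which is the kind of source-dependent bias such a scheme is meant to remove. Under this assumption, a test source would likewise be normalized with its own statistics, which needs several recordings from that source rather than a single clip.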
    Original language: English
    Title of host publication: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
    Publisher: IEEE Computer Society
    Pages: 16–20
    ISBN (Print): 978-1-5386-1631-4
    Publication status: Published - 2017
    Publication type: A4 Article in a conference publication
    Event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics


    Keywords

    • sound classification
    • vocal mode
    • heterogeneous data sources
    • feature normalization

    Publication forum classification

    • Publication forum level 1

