Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks

Emre Cakir, Toni Heittola, Heikki Huttunen, Tuomas Virtanen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    154 Citations (Scopus)


    In this paper, the use of multi-label neural networks is proposed for the detection of temporally overlapping sound events in realistic environments. Real-life sound recordings typically contain many overlapping sound events, making it hard to recognize each event with standard sound event detection methods. In this work, frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification. The model is evaluated with recordings from realistic everyday environments, and the obtained overall accuracy is 63.8%. The method is compared against a state-of-the-art method that uses non-negative matrix factorization as a pre-processing stage and hidden Markov models as a classifier. The proposed method improves the overall accuracy by 19 percentage points.
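    The multi-label formulation described in the abstract can be illustrated with a minimal sketch: a feed-forward network whose output layer uses independent sigmoid units, one per event class, so several classes can be active in the same frame. This is an assumption-laden toy (random weights in place of a trained model; the feature dimension, hidden width, class count, and 0.5 threshold are all illustrative choices, not the paper's exact architecture):

    ```python
    import numpy as np

    # Hedged sketch of frame-wise multi-label sound event detection.
    # Sigmoid outputs give an independent activity probability per event
    # class; thresholding them lets several (polyphonic) events be
    # detected in the same frame, unlike single-label softmax.

    rng = np.random.default_rng(0)

    N_FEATURES = 40   # assumed number of spectral features per frame
    N_HIDDEN = 64     # assumed hidden-layer width
    N_CLASSES = 6     # assumed number of sound event classes

    # Randomly initialised weights stand in for a trained model.
    W1 = rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN))
    b1 = np.zeros(N_HIDDEN)
    W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_CLASSES))
    b2 = np.zeros(N_CLASSES)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def detect_events(frames, threshold=0.5):
        """Frame-wise prediction: (T, N_FEATURES) -> (T, N_CLASSES) booleans."""
        h = np.maximum(frames @ W1 + b1, 0.0)   # ReLU hidden layer
        probs = sigmoid(h @ W2 + b2)            # independent per-class probabilities
        return probs >= threshold               # several classes may be True at once

    frames = rng.normal(size=(100, N_FEATURES))  # 100 synthetic feature frames
    activity = detect_events(frames)
    print(activity.shape)  # one boolean activity row per frame
    ```

    Training such a model would use a per-class binary cross-entropy loss rather than categorical cross-entropy, which is what makes the classification genuinely multi-label.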
    Original language: English
    Title of host publication: 2015 International Joint Conference on Neural Networks (IJCNN)
    ISBN (Print): 978-1-4799-1959-8
    Publication status: Published - Jul 2015
    Publication type: A4 Article in a conference publication
    Event: International Joint Conference on Neural Networks

    Publication forum classification

    • Publication forum level 1


