Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings

Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    160 Citations (Scopus)

    Abstract

    In this paper we present an approach to polyphonic sound event detection in real-life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal, consisting of sounds from multiple classes, to binary activity indicators for each event class. Our method is tested on a large database of real-life recordings with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The proposed method outperforms previous approaches by a large margin, and the results are further improved using data augmentation techniques. Overall, our system reports an average F1-score of 65.5% on 1-second blocks and 64.7% on single frames, a relative improvement of 6.8% and 15.1%, respectively, over the previous state-of-the-art approach.
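
    The abstract describes the core idea: a single bi-directional LSTM reads a sequence of acoustic feature frames and emits, for every frame, one independent sigmoid activation per event class, so several classes can be active at once (polyphony). The record here carries no code, so the sketch below is only a minimal PyTorch illustration of that idea; the feature dimensionality, hidden size, and layer count are placeholder assumptions, not the authors' configuration.

    import torch
    import torch.nn as nn

    class BLSTMEventDetector(nn.Module):
        """Bidirectional LSTM mapping acoustic feature frames to
        per-frame, per-class event activity probabilities."""

        def __init__(self, n_features=40, hidden_size=128, n_layers=2, n_classes=61):
            # n_features/hidden_size/n_layers are illustrative assumptions;
            # n_classes=61 matches the database described in the abstract.
            super().__init__()
            self.blstm = nn.LSTM(
                input_size=n_features,
                hidden_size=hidden_size,
                num_layers=n_layers,
                batch_first=True,
                bidirectional=True,
            )
            # One sigmoid output per class -> independent binary activity
            # indicators, so multiple events can overlap in the same frame.
            self.out = nn.Linear(2 * hidden_size, n_classes)

        def forward(self, x):
            # x: (batch, time, n_features)
            h, _ = self.blstm(x)             # (batch, time, 2 * hidden_size)
            return torch.sigmoid(self.out(h))  # (batch, time, n_classes)

    model = BLSTMEventDetector()
    x = torch.randn(8, 100, 40)                           # dummy feature frames
    targets = torch.randint(0, 2, (8, 100, 61)).float()   # binary activity labels
    loss = nn.BCELoss()(model(x), targets)                # multilabel cross-entropy
    loss.backward()

    At test time the per-frame probabilities would be thresholded (e.g. at 0.5) to obtain the binary activity indicators from which frame-wise and 1-second-block F1-scores, as reported in the abstract, can be computed.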
    Original language: English
    Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Pages: 6440-6444
    Number of pages: 5
    DOIs
    Publication status: Published - Mar 2016
    Publication type: A4 Article in a conference publication
    Event: IEEE International Conference on Acoustics, Speech and Signal Processing

    Publication series

    ISSN (Electronic): 2379-190X

    Conference

    Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

    Publication forum classification

    • Publication forum level 1
