Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Emre Cakir, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen

    Research output: Contribution to journal › Article › Scientific › peer-review

    203 Citations (Scopus)
    439 Downloads (Pure)


    Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events.
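
The data flow described in the abstract (convolution for local time-frequency features, recurrence for temporal context, and one output per event class so that several events can be active in the same frame) can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the single 3x3 filter, pooling along frequency only, the plain tanh recurrent layer, the random untrained weights, the 0.5 threshold, and every function and parameter name (`conv2d_same`, `max_pool_freq`, `recurrent_layer`, `crnn_forward`, `n_hidden`, `n_classes`, `pool`) are hypothetical choices made for this example.

```python
import math
import random


def conv2d_same(x, kernel):
    """Zero-padded 2-D cross-correlation followed by ReLU; x is [time][freq]."""
    n_t, n_f = len(x), len(x[0])
    k_t, k_f = len(kernel), len(kernel[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            acc = 0.0
            for i in range(k_t):
                for j in range(k_f):
                    tt, ff = t + i - k_t // 2, f + j - k_f // 2
                    if 0 <= tt < n_t and 0 <= ff < n_f:
                        acc += kernel[i][j] * x[tt][ff]
            out[t][f] = max(acc, 0.0)
    return out


def max_pool_freq(x, size):
    """Max pooling along frequency only, so the number of time frames is kept."""
    return [[max(row[f:f + size]) for f in range(0, len(row) - size + 1, size)]
            for row in x]


def recurrent_layer(frames, w_in, w_rec):
    """Plain tanh recurrent layer over time; frames is [time][features]."""
    hidden = [0.0] * len(w_rec)
    states = []
    for frame in frames:
        hidden = [
            math.tanh(sum(w * v for w, v in zip(w_in[h], frame))
                      + sum(w * s for w, s in zip(w_rec[h], hidden)))
            for h in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def crnn_forward(spectrogram, n_hidden=4, n_classes=3, pool=2, seed=0):
    """Return (frame-wise event activity, frame-wise class probabilities)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        # Random, untrained weights: the outputs are meaningless as predictions.
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # Convolutional stage: local spectro-temporal features.
    features = max_pool_freq(conv2d_same(spectrogram, rand(3, 3)), pool)
    # Recurrent stage: temporal context across frames.
    states = recurrent_layer(features,
                             rand(n_hidden, len(features[0])),
                             rand(n_hidden, n_hidden))
    # Independent sigmoid per class, so several classes can be active at once.
    w_out = rand(n_classes, n_hidden)
    probs = [[1.0 / (1.0 + math.exp(-sum(w * s for w, s in zip(w_out[c], state))))
              for c in range(n_classes)]
             for state in states]
    active = [[p > 0.5 for p in frame] for frame in probs]
    return active, probs
```

Calling `crnn_forward` on a list of frames (each a list of band energies) returns one boolean per class per frame. Because each class has its own sigmoid rather than sharing a softmax, overlapping events are representable, which is what makes the detection polyphonic.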

    Original language: English
    Pages (from-to): 1291-1303
    Number of pages: 13
    Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
    Issue number: 6
    Publication status: Published - Jun 2017
    Publication type: A1 Journal article-refereed


    • Convolutional neural networks (CNNs)
    • deep neural networks
    • recurrent neural networks (RNNs)
    • sound event detection

    Publication forum classification

    • Publication forum level 2

