Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection

Emre Cakir, Ezgi Can Ozan, Tuomas Virtanen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

    Abstract

    Deep learning techniques such as deep feedforward neural networks and deep convolutional neural networks have recently been shown to improve performance in sound event detection compared to traditional methods such as Gaussian mixture models. One of the key factors in this improvement is the capability of deep architectures to automatically learn higher levels of acoustic features in each layer. In this work, we aim to combine the feature learning capabilities of deep architectures with empirical knowledge of human perception. We use the first layer of a deep neural network to learn a mapping from a high-resolution magnitude spectrum to a smaller number of frequency bands, which effectively learns a filterbank for the sound event detection task. We initialize the first hidden layer weights to match the magnitude response of the perceptually motivated mel filterbank. We also integrate this initialization scheme with context windowing by using an appropriately constrained deep convolutional neural network. The proposed method not only results in better detection accuracy, but also provides insight into the frequencies deemed essential for better discrimination of the given sound events.
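
The initialization described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the HTK-style mel formula, the triangular filter shape, the sizes (`N_BINS`, `N_BANDS`, `SR`), the ReLU activation, and all function names are hypothetical choices that the abstract does not specify. The sketch builds a mel filterbank matrix and uses it as the starting value of a first-layer weight matrix that training would then be free to adjust.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the abstract does not name a variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_fft_bins, n_bands, sample_rate):
    """Return an (n_fft_bins, n_bands) matrix of triangular mel filters."""
    # Centre frequency of each magnitude-spectrum bin, from 0 Hz to Nyquist.
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft_bins)
    # n_bands triangles need n_bands + 2 edge points, equally spaced in mel.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_bands + 2)
    hz_edges = mel_to_hz(mel_edges)
    weights = np.zeros((n_fft_bins, n_bands))
    for b in range(n_bands):
        lo, mid, hi = hz_edges[b], hz_edges[b + 1], hz_edges[b + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle: rises from lo to mid, falls from mid to hi, zero elsewhere.
        weights[:, b] = np.maximum(0.0, np.minimum(rising, falling))
    return weights

# Hypothetical sizes: 513 spectrum bins mapped to 40 bands at 44.1 kHz.
N_BINS, N_BANDS, SR = 513, 40, 44100

# First-layer parameters start at the mel filterbank response;
# a training loop (not shown) would update W1 and b1 from here.
W1 = mel_filterbank(N_BINS, N_BANDS, SR)
b1 = np.zeros(N_BANDS)

def first_layer(magnitude_spectrum):
    # ReLU is an assumed activation for this sketch.
    return np.maximum(0.0, magnitude_spectrum @ W1 + b1)

# Usage: one random non-negative frame standing in for a magnitude spectrum.
frame = np.abs(np.random.default_rng(0).standard_normal(N_BINS))
bands = first_layer(frame)
```

Before any training, the layer output equals the usual mel band energies of the input frame, so the network begins from the perceptually motivated representation and any learned deviation in `W1` can be inspected band by band.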
    Original language: English
    Title of host publication: 2016 International Joint Conference on Neural Networks (IJCNN)
    Publisher: IEEE
    ISBN (Electronic): 978-1-5090-0620-5
    Publication status: Published - 3 Nov 2016
    Publication type: A4 Article in a conference publication
    Event: International Joint Conference on Neural Networks
    Duration: 1 Jan 1900 → …

    Publication series

    ISSN (Electronic): 2161-4407

    Publication forum classification

    • Publication forum level 1
