Sound Event Detection with Depthwise Separable and Dilated Convolutions

Konstantinos Drossos, Stylianos Ioannis Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

5 Citations (Scopus)

Abstract

State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal, followed by recurrent neural networks (RNNs) that model longer temporal context in the extracted features. The number of channels of the CNNs and the size of the weight matrices of the RNNs directly affect the total number of parameters of the SED method, which can amount to a couple of million. Additionally, the long input sequences that SED methods usually process, combined with the use of an RNN, lead to increased training time, difficulties with gradient flow, and limited parallelization of the SED method. To tackle all these problems, we propose replacing the CNNs with depthwise separable convolutions and the RNNs with dilated convolutions. We compare the proposed method to a baseline convolutional neural network on an SED task, and achieve a reduction in the number of parameters by 85% and in the average training time per epoch by 78%, as well as an increase in the average frame-wise F1 score by 4.6% and a reduction in the average error rate by 3.8%.
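The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: all function names are hypothetical, biases are ignored, padding is "valid" with stride 1, and the dilated convolution is simplified to a single-channel 1D filter along time. It shows why a depthwise separable layer needs far fewer weights than a standard convolution, and how stacking dilated convolutions widens the temporal context without recurrence.

```python
import numpy as np


def standard_conv_params(c_in: int, c_out: int, kh: int, kw: int) -> int:
    # One kh x kw x c_in kernel per output channel (biases ignored).
    return c_out * c_in * kh * kw


def depthwise_separable_params(c_in: int, c_out: int, kh: int, kw: int) -> int:
    # Depthwise: one kh x kw kernel per input channel.
    # Pointwise: a 1x1 convolution mapping c_in channels to c_out channels.
    return c_in * kh * kw + c_out * c_in


def depthwise_separable_conv2d(x: np.ndarray, dw: np.ndarray, pw: np.ndarray) -> np.ndarray:
    """x: (C_in, H, W), dw: (C_in, kh, kw), pw: (C_out, C_in); 'valid' padding, stride 1."""
    c_in, h, w = x.shape
    _, kh, kw = dw.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    depth = np.zeros((c_in, out_h, out_w))
    # Depthwise step: each channel is filtered independently by its own kernel.
    for c in range(c_in):
        for i in range(out_h):
            for j in range(out_w):
                depth[c, i, j] = np.sum(x[c, i:i + kh, j:j + kw] * dw[c])
    # Pointwise step: a 1x1 convolution that mixes channels at every position.
    return np.tensordot(pw, depth, axes=([1], [0]))


def dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel 1D dilated convolution (cross-correlation), 'valid' padding."""
    k = len(w)
    span = (k - 1) * dilation + 1  # number of input frames one output frame sees
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[t + j * dilation] for j in range(k)) for t in range(out_len)
    ])


def receptive_field(kernel_size: int, dilations: list) -> int:
    # Each layer with dilation d widens the temporal context by (k - 1) * d frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical layer size, not taken from the paper.
    print(standard_conv_params(256, 256, 5, 5))        # 1638400
    print(depthwise_separable_params(256, 256, 5, 5))  # 71936
    # Four dilated layers with kernel size 3 see 31 consecutive frames.
    print(receptive_field(3, [1, 2, 4, 8]))            # 31
```

For a single layer the weight ratio is 1/C_out + 1/(k_h k_w), so the hypothetical 256-channel, 5x5 layer above keeps under 5% of its weights. This per-layer figure is not the 85% whole-model reduction reported in the abstract, which also depends on the rest of the architecture. Unlike an RNN, every output frame of a dilated convolution is computed independently of the others, which is what allows the time axis to be processed in parallel.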
Original language: English
Title of host publication: 2020 International Joint Conference on Neural Networks (IJCNN)
Publisher: IEEE
ISBN (Electronic): 978-1-7281-6926-2
DOIs
Publication status: Published - 2020
Publication type: A4 Article in conference proceedings
Event: International Joint Conference on Neural Networks (IJCNN) - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 to 24 Jul 2020
https://wcci2020.org

Publication series

Name
ISSN (Electronic): 2161-4407

Conference

Conference: International Joint Conference on Neural Networks (IJCNN)
Abbreviated title: WCCI/IJCNN
Country/Territory: United Kingdom
City: Glasgow
Period: 19/07/20 to 24/07/20
Internet address

Publication forum classification

  • Publication forum level 1
