Comparison of Convolution Types in CNN-based Feature Extraction for Sound Source Localization

Daniel Krause, Archontis Politis, Konrad Kowalczyk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


This paper presents an overview of several approaches to convolutional feature extraction in the context of deep neural network (DNN) based sound source localization. Different ways of processing multichannel audio data in the time-frequency domain using convolutional neural networks (CNNs) are described and tested with the aim of providing a comparative study of their performance. In most of the considered approaches, models are trained with the phase and magnitude components of the Short-Time Fourier Transform (STFT). In addition to state-of-the-art 2D convolutional layers, we investigate several solutions for processing 3D matrices containing a multichannel complex representation of the microphone signals. The first two proposed approaches are 3D convolutions and depthwise separable convolutions, in which two types of filters are used to exploit information within and between the channels. Note that this paper presents the first application of depthwise separable convolutions to the task of sound source localization. The third approach is based on complex-valued neural networks, which allow convolutions to be performed directly on complex signal representations. Experiments are conducted using two synthetic datasets containing noise and speech signals recorded using a tetrahedral microphone array. The paper presents the results obtained using all investigated model types and discusses the resulting accuracy and computational complexity in DNN-based source localization.
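To make the distinction between the convolution types above more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It compares the weight count of a standard 2D convolution with that of a depthwise separable one, and checks that a complex-valued convolution can be written as four real-valued convolutions. All layer sizes (8 input channels, taken here as 4 microphones times magnitude and phase, 64 output filters, 3 x 3 kernels) and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
# Illustrative sketch only: sizes and names are assumptions, not the paper's setup.

def conv2d_params(c_in, c_out, kh, kw):
    """Weight count of a standard 2D convolution (bias omitted)."""
    return c_in * c_out * kh * kw


def depthwise_separable_params(c_in, c_out, kh, kw):
    """Weight count of a depthwise separable 2D convolution (bias omitted).

    Depthwise stage: one kh x kw filter per input channel (within-channel).
    Pointwise stage: 1 x 1 filters that mix channels (between-channel).
    """
    depthwise = c_in * kh * kw
    pointwise = c_in * c_out
    return depthwise + pointwise


def correlate_valid(x, w):
    """1D 'valid' cross-correlation, as computed by CNN layers.

    Works for real or complex Python numbers.
    """
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def complex_correlate_via_real(x, w):
    """Complex correlation built from four real-valued correlations.

    Uses (xr + j*xi)(wr + j*wi) = (xr*wr - xi*wi) + j*(xr*wi + xi*wr).
    """
    xr, xi = [v.real for v in x], [v.imag for v in x]
    wr, wi = [v.real for v in w], [v.imag for v in w]
    rr = correlate_valid(xr, wr)
    ii = correlate_valid(xi, wi)
    ri = correlate_valid(xr, wi)
    ir = correlate_valid(xi, wr)
    return [complex(a - b, c + d) for a, b, c, d in zip(rr, ii, ri, ir)]


# Hypothetical layer: 8 input channels, 64 filters, 3 x 3 kernels.
standard = conv2d_params(8, 64, 3, 3)
separable = depthwise_separable_params(8, 64, 3, 3)

# Hypothetical complex-valued signal and filter.
signal = [1 + 2j, 3 - 1j, 0 + 4j, 2 + 0j]
kernel = [2 - 1j, 0 + 1j]
direct = correlate_valid(signal, kernel)
split = complex_correlate_via_real(signal, kernel)
```

With these assumed sizes, the standard layer has 4608 weights and the depthwise separable layer has 584, which illustrates why the choice of convolution type affects computational complexity as well as accuracy. The lists `direct` and `split` agree, showing that a complex-valued layer costs roughly four real-valued convolutions of the same shape.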
Original language: English
Title of host publication: 28th European Signal Processing Conference (EUSIPCO 2020)
Number of pages: 5
ISBN (Electronic): 978-9-0827-9705-3
Publication status: Published - 1 Nov 2020
Publication type: A4 Article in a conference publication
Event: European Signal Processing Conference
Duration: 24 Aug 2020 - 28 Aug 2020

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491


Conference: European Signal Processing Conference

Publication forum classification

  • Publication forum level 1
