Acoustic Scene Classification using Deep Fisher network

Spoorthy Venkatesh, Manjunath Mulimani, Shashidhar G. Koolagudi

Research output: Contribution to journal › Article › Scientific › peer-review

3 Citations (Scopus)

Abstract

Acoustic Scene Classification (ASC) is the task of assigning a semantic label to an audio recording based on the environment in which it was recorded. In this work, a Fisher network is introduced for ASC. The proposed method mimics the working mechanism of a feed-forward Convolutional Neural Network (CNN), in which the output of each layer is fed as input to the succeeding layer. The Fisher network consists of a feature extraction step followed by a Fisher layer. The Fisher layer has three sub-layers: a Fisher Vector (FV) encoder, a temporal pyramid, and a normalization sub-layer combined with feature reduction. Gammatone Time Cepstral Coefficients (GTCCs) and Mel-spectrograms are the features encoded as Fisher vectors in the FV encoder sub-layer. The temporal pyramid sub-layer retains the temporal information of the Fisher vectors, and the resulting temporal pyramids serve as a feature vector. The normalization and PCA sub-layer then normalizes this vector and removes its irrelevant dimensions using Principal Component Analysis (PCA). The proposed model is evaluated on five DCASE datasets: TUT Urban Acoustic Scenes 2018, TUT Urban Acoustic Scenes 2018 Mobile, DCASE 2019 Acoustic Scene Classification Task 1(a) and Task 1(b), and TAU Urban Acoustic Scenes 2020. The overall classification accuracy is 93%, 91%, 92%, 91% and 89% for the TUT 2018, TUT Mobile 2018, DCASE Task 1(a) 2019, DCASE Task 1(b) 2019, and TAU Urban Acoustic Scenes 2020 datasets, respectively. The proposed model outperforms state-of-the-art ASC systems on these datasets.
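The Fisher layer described in the abstract (FV encoding, temporal pyramid, normalization, PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no hyperparameters, so the number of GMM components, the pyramid levels (1, 2, 4), the signed-square-root and L2 normalization, and all function names below are assumptions chosen for illustration. Random arrays stand in for GTCC or Mel-spectrogram frames, and the GMM parameters are drawn at random rather than fitted.

```python
import numpy as np

def fisher_vector(frames, weights, means, variances):
    """Fisher vector of `frames` (T, D) under a diagonal-covariance GMM.

    Uses the common mean/variance-gradient form; returns a vector of length 2*K*D.
    """
    T = frames.shape[0]
    # Standardized deviation of every frame from every component: (T, K, D)
    diff = (frames[:, None, :] - means[None, :, :]) / np.sqrt(variances)[None, :, :]
    # Log-density of each frame under each component: (T, K)
    log_prob = -0.5 * (np.sum(diff ** 2, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)          # soft assignments
    g_mu = np.einsum("tk,tkd->kd", post, diff) / (T * np.sqrt(weights)[:, None])
    g_var = np.einsum("tk,tkd->kd", post, diff ** 2 - 1) / (T * np.sqrt(2 * weights)[:, None])
    return np.concatenate([g_mu.ravel(), g_var.ravel()])

def temporal_pyramid(frames, gmm, levels=(1, 2, 4)):
    """Concatenate Fisher vectors of equal-length time segments (assumed levels)."""
    parts = [fisher_vector(seg, *gmm)
             for n in levels
             for seg in np.array_split(frames, n, axis=0)]
    return np.concatenate(parts)

def normalize(v, eps=1e-12):
    """Signed square-root (power) normalization followed by L2 normalization."""
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + eps)

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Toy run: 6 "recordings" of 200 frames with 13 coefficients each (placeholder data).
rng = np.random.default_rng(0)
K, D = 4, 13
gmm = (np.full(K, 1.0 / K),                 # mixture weights
       rng.normal(size=(K, D)),             # component means
       rng.uniform(0.5, 1.5, size=(K, D)))  # diagonal variances
recordings = [rng.normal(size=(200, D)) for _ in range(6)]
X = np.stack([normalize(temporal_pyramid(r, gmm)) for r in recordings])
reduced = pca_reduce(X, n_components=3)
print(X.shape, reduced.shape)
```

In a real system the GMM would be fitted on training frames and the reduced vectors passed to a classifier; both steps are omitted here because the abstract does not specify them.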

Original language: English
Article number: 104062
Journal: Digital Signal Processing: A Review Journal
Volume: 139
Publication status: Published - Jul 2023
Publication type: A1 Journal article-refereed

Keywords

  • Acoustic Scene Classification (ASC)
  • Fisher layer
  • Fisher network
  • Fisher vector encoding
  • Principal Component Analysis (PCA)

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Statistics, Probability and Uncertainty
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
  • Electrical and Electronic Engineering
