Zero-Shot Audio Classification Based On Class Label Embeddings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

32 Citations (Scopus)

Abstract

This paper proposes a zero-shot learning approach for audio classification that relies on textual information about the class labels, without any audio samples from the target classes. We propose an audio classification system built on a bilinear model, which takes audio feature embeddings and semantic class label embeddings as input and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information about the audio classes and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. Its accuracy exceeds random guessing (10%) in every audio category, averaging 26%. In particular, it reaches up to 39.7% for the category of natural audio classes.
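The abstract describes the classifier only at a high level: a bilinear model scores how compatible an audio embedding x is with a class label embedding y, F(x, y) = x^T W y, and a clip is assigned to the unseen class whose label embedding scores highest. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The embedding sizes (128-d, as produced by VGGish; 300-d, as in the commonly used pretrained Word2Vec vectors), the structured hinge ranking loss, the SGD update, the `lr` and `margin` values, the 40/10 seen/unseen split in the demo, and the helper names `compatibility`, `sgd_step` and `zero_shot_predict` are all hypothetical. Random vectors stand in for real embeddings.

```python
import numpy as np

# Assumed embedding sizes: VGGish emits 128-d embeddings; pretrained
# Word2Vec vectors are commonly 300-d. Both are assumptions here.
AUDIO_DIM = 128
LABEL_DIM = 300


def compatibility(x, W, Y):
    """Bilinear compatibility F(x, y) = x^T W y of one audio embedding x
    (shape [AUDIO_DIM]) against every class label embedding row of Y
    (shape [num_classes, LABEL_DIM]). Returns shape [num_classes]."""
    return Y @ (W.T @ x)


def sgd_step(W, x, label, Y, lr=0.01, margin=1.0):
    """One SGD step on an assumed structured hinge (ranking) loss:
    max over classes c != label of [margin + F(x, y_c) - F(x, y_label)]_+.
    Updates W in place and also returns it."""
    scores = compatibility(x, W, Y)
    augmented = scores + margin
    augmented[label] = scores[label]  # the true class gets no margin
    worst = int(np.argmax(augmented))
    if worst != label and augmented[worst] > scores[label]:
        # Gradient of F(x, y_worst) - F(x, y_label) w.r.t. W is
        # x (y_worst - y_label)^T; step against it.
        W -= lr * np.outer(x, Y[worst] - Y[label])
    return W


def zero_shot_predict(x, W, Y_unseen):
    """Index of the unseen class whose label embedding is most compatible with x."""
    return int(np.argmax(compatibility(x, W, Y_unseen)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical split: 40 seen classes for training, 10 unseen for testing.
    Y_seen = rng.normal(size=(40, LABEL_DIM))
    Y_unseen = rng.normal(size=(10, LABEL_DIM))
    X = rng.normal(size=(200, AUDIO_DIM))  # stand-in clip embeddings
    labels = rng.integers(0, 40, size=200)
    W = np.zeros((AUDIO_DIM, LABEL_DIM))
    for _ in range(5):  # a few epochs over the seen-class data
        for x, c in zip(X, labels):
            W = sgd_step(W, x, int(c), Y_seen)
    print(zero_shot_predict(X[0], W, Y_unseen))
```

Because W is learned only on seen classes yet applied to any label embedding, new classes can be scored without audio examples, which is what makes the setup zero-shot. Two further details are left open by the abstract and would need choosing in practice: VGGish yields one embedding per roughly 0.96 s frame, so a clip-level vector needs pooling (e.g. averaging), and multi-word labels such as "crying baby" need their word vectors combined (e.g. averaged). Both are assumptions, not details stated in the abstract.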
Original language: English
Title of host publication: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Publisher: IEEE
Pages: 264-267
Number of pages: 4
ISBN (Electronic): 978-1-7281-1123-0
ISBN (Print): 978-1-7281-1124-7
Publication status: Published - Oct 2019
Publication type: A4 Article in conference proceedings
Event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Keywords

  • zero-shot learning
  • audio classification
  • class label embedding

Publication forum classification

  • Publication forum level 1
