Abstract
This article proposes an active learning system for sound event detection (SED). It aims to maximize the accuracy of a learned SED model with limited annotation effort. The proposed system analyzes an initially unlabeled audio dataset and selects sound segments from it for manual annotation. The candidate segments are generated by a proposed change point detection approach, and the selection follows the principle of mismatch-first farthest-traversal. When training SED models, whole recordings are used as inputs, preserving the long-term context of the annotated segments. The proposed system clearly outperforms reference methods on the two evaluation datasets (TUT Rare Sound 2017 and TAU Spatial Sound 2019). Training with recordings as context outperforms training with only the annotated segments. Mismatch-first farthest-traversal outperforms reference sample selection methods based on random sampling and uncertainty sampling. Remarkably, the required annotation effort can be greatly reduced on the dataset where target sound events are rare: annotating only 2% of the training data yields SED performance similar to that obtained by annotating all of it.
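The two selection stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every detail below is an assumption for illustration. The change score (Euclidean distance between the mean feature vectors of the `window` frames before and after each frame, with local maxima above `threshold` taken as change points) is a generic sliding-window stand-in for the proposed change point detector. The `mismatch` flags are assumed to be computed elsewhere, for example by checking whether two predictors disagree on a candidate. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(frames: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def change_points(frames: Sequence[Vector], window: int, threshold: float) -> List[int]:
    """Hypothetical generic detector: score frame t by the distance between the
    mean of the `window` frames before t and the `window` frames after t, then
    keep local maxima whose score reaches `threshold`."""
    scores = [0.0] * len(frames)
    for t in range(window, len(frames) - window + 1):
        scores[t] = euclidean(mean_vector(frames[t - window:t]),
                              mean_vector(frames[t:t + window]))
    return [t for t in range(1, len(frames) - 1)
            if scores[t] >= threshold
            and scores[t] >= scores[t - 1] and scores[t] > scores[t + 1]]


def segments(n_frames: int, points: List[int]) -> List[Tuple[int, int]]:
    """Split [0, n_frames) into (start, end) candidate segments at the change points."""
    bounds = [0] + points + [n_frames]
    return list(zip(bounds[:-1], bounds[1:]))


def mismatch_first_farthest_traversal(features: Sequence[Vector], labeled: List[int],
                                      mismatch: Sequence[bool], budget: int) -> List[int]:
    """Select up to `budget` unlabeled candidates. Candidates flagged in `mismatch`
    are considered first; within the active pool, pick the candidate farthest from
    its nearest already-labeled or already-selected candidate."""
    # Distance from each candidate to its nearest reference (inf if none yet).
    dist = [min((euclidean(f, features[r]) for r in labeled), default=math.inf)
            for f in features]
    taken = set(labeled)
    selected: List[int] = []
    for _ in range(budget):
        pool = [i for i in range(len(features)) if i not in taken and mismatch[i]]
        if not pool:  # no mismatched candidates left: fall back to all unlabeled
            pool = [i for i in range(len(features)) if i not in taken]
        if not pool:
            break
        best = max(pool, key=lambda i: dist[i])
        selected.append(best)
        taken.add(best)
        # Update nearest-reference distances with the newly selected candidate.
        for i in range(len(features)):
            dist[i] = min(dist[i], euclidean(features[i], features[best]))
    return selected


if __name__ == "__main__":
    frames = [[0.0]] * 5 + [[1.0]] * 5 + [[0.0]] * 5
    print(segments(len(frames), change_points(frames, window=2, threshold=0.5)))
    # [(0, 5), (5, 10), (10, 15)]
    feats = [[0, 0], [0, 2], [5, 5], [6, 5], [10, 0]]
    print(mismatch_first_farthest_traversal(
        feats, labeled=[0], mismatch=[False, False, False, True, False], budget=3))
    # [3, 4, 1]
```

In the usage example, the only mismatched candidate (index 3) is chosen first. After that, plain farthest-traversal picks the candidate that is least covered by what has already been annotated. This makes the selected batch both informative and diverse.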
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 2895-2905 |
| Number of pages | 11 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Volume | 28 |
| DOIs | 10.1109/TASLP.2020.3029652 |
| Publication status | Published - 2020 |
| Publication type | A1 Journal article-refereed |
Funding
Manuscript received February 12, 2020; revised July 3, 2020 and August 6, 2020; accepted September 3, 2020. Date of publication October 8, 2020; date of current version November 5, 2020. This work was supported by the European Research Council under the European Union's H2020 Framework Programme through ERC Grant Agreement 637422 EVERYSOUND. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Isabel Barbancho. (Corresponding author: Shuyang Zhao.) The authors are with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720 Tampere, Finland. Digital Object Identifier 10.1109/TASLP.2020.3029652
Keywords
- Active learning
- change point detection
- mismatch-first farthest-traversal
- sound event detection
- weakly supervised learning
Publication forum classification
- Publication forum level 3
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering