Abstract
Detecting the class, start time, and end time of sound events in real-world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempt to bridge this gap by explicitly incorporating sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use sequential loss to train neural networks. Our experiments on a corpus of real-world recordings show that the N-grams can smooth the spiky output of a state-of-the-art neural network system and improve both frame-wise and event-wise metrics.
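The N-gram smoothing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a first-order (bigram) Markov prior over a single event class's activity, with a hypothetical `trans` matrix of learned transition probabilities, and interpolates it with the network's spiky frame-wise output:

```python
import numpy as np

def bigram_smooth(frame_probs, trans, alpha=0.5):
    """Smooth spiky frame-wise event-activity probabilities with a
    bigram (first-order Markov) prior.

    frame_probs : (T,) network output probabilities for one event class
    trans       : 2x2 matrix, trans[i, j] = P(active_t = j | active_{t-1} = i)
                  (hypothetical, e.g. estimated from training annotations)
    alpha       : interpolation weight between network output and bigram prior
    """
    T = len(frame_probs)
    smoothed = np.empty(T)
    smoothed[0] = frame_probs[0]
    for t in range(1, T):
        prev = smoothed[t - 1]
        # Bigram prediction of activity given the previous (soft) state
        prior = (1.0 - prev) * trans[0, 1] + prev * trans[1, 1]
        # Interpolate the spiky network output with the sequential prior
        smoothed[t] = alpha * frame_probs[t] + (1.0 - alpha) * prior
    return smoothed
```

With a self-transition-heavy `trans` (events tend to persist), an isolated one-frame spike in `frame_probs` is attenuated, which is the smoothing effect the abstract attributes to the N-grams.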
Original language | English |
---|---|
Title of host publication | 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 |
Publisher | IEEE |
Pages | 291-295 |
Number of pages | 5 |
ISBN (Electronic) | 9781538681510 |
DOIs | |
Publication status | Published - 2 Nov 2018 |
Publication type | A4 Article in conference proceedings |
Event | International Workshop on Acoustic Signal Enhancement - Tokyo, Japan Duration: 17 Sept 2018 → 20 Sept 2018 |
Conference
Conference | International Workshop on Acoustic Signal Enhancement |
---|---|
Country/Territory | Japan |
City | Tokyo |
Period | 17/09/18 → 20/09/18 |
Keywords
- Language modelling
- Polyphonic sound event detection
- Sequential information
Publication forum classification
- Publication forum level 1
ASJC Scopus subject areas
- Signal Processing
- Acoustics and Ultrasonics