Using sequential information in polyphonic sound event detection

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    5 Citations (Scopus)
    12 Downloads (Pure)

    Abstract

    Detecting the class, start time, and end time of sound events in real-world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempt to bridge this gap by explicitly including sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use a sequential loss to train neural networks. Our experiments on a corpus of real-world recordings show that the N-grams can smooth the spiky output of a state-of-the-art neural network system and improve both the frame-wise and the event-wise metrics.
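
    As a concrete illustration of the N-gram idea, the sketch below smooths spiky frame-wise event activity predictions using per-event bigram transition probabilities estimated from reference annotations. This is a minimal, hypothetical example rather than the authors' implementation: the function names, the binary activity representation, and the greedy frame-by-frame combination of network output with the bigram prior are assumptions made purely for illustration.

    import numpy as np

    def bigram_transition_probs(reference, n_events, alpha=1.0):
        """Estimate per-event bigram probabilities P(state_t | state_{t-1})
        from binary reference annotations (n_frames, n_events), with
        additive smoothing alpha."""
        counts = np.full((n_events, 2, 2), alpha)  # [event, prev_state, cur_state]
        for prev, cur in zip(reference[:-1], reference[1:]):
            for e in range(n_events):
                counts[e, int(prev[e]), int(cur[e])] += 1
        return counts / counts.sum(axis=2, keepdims=True)

    def smooth_predictions(probs, trans, threshold=0.5):
        """Greedy frame-by-frame smoothing: combine the network's frame-wise
        activity probabilities with the bigram prior so that isolated
        one-frame flips are suppressed."""
        n_frames, n_events = probs.shape
        smoothed = np.zeros_like(probs, dtype=int)
        smoothed[0] = (probs[0] >= threshold).astype(int)
        for t in range(1, n_frames):
            for e in range(n_events):
                prev = smoothed[t - 1, e]
                score_on = probs[t, e] * trans[e, prev, 1]
                score_off = (1.0 - probs[t, e]) * trans[e, prev, 0]
                smoothed[t, e] = int(score_on >= score_off)
        return smoothed

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        ref = (rng.random((500, 6)) > 0.7).astype(int)  # toy reference annotations
        net_out = np.clip(ref + 0.3 * rng.standard_normal(ref.shape), 0, 1)  # spiky network output
        trans = bigram_transition_probs(ref, n_events=6)
        print(smooth_predictions(net_out, trans).shape)  # (500, 6)

    In this toy setting the bigram prior penalizes state changes that are rare in the reference data, which is one simple way a co-occurrence model can reduce spiky frame-wise output before event-wise evaluation.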

    Original language: English
    Title of host publication: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
    Publisher: IEEE
    Pages: 291-295
    Number of pages: 5
    ISBN (Electronic): 9781538681510
    DOIs
    Publication status: Published - 2 Nov 2018
    Publication type: A4 Article in conference proceedings
    Event: International Workshop on Acoustic Signal Enhancement - Tokyo, Japan
    Duration: 17 Sept 2018 - 20 Sept 2018

    Conference

    Conference: International Workshop on Acoustic Signal Enhancement
    Country/Territory: Japan
    City: Tokyo
    Period: 17/09/18 - 20/09/18

    Keywords

    • Language modelling
    • Polyphonic sound event detection
    • Sequential information

    Publication forum classification

    • Publication forum level 1

    ASJC Scopus subject areas

    • Signal Processing
    • Acoustics and Ultrasonics
