Metrics for polyphonic sound event detection

    Research output: Contribution to journal, Article, Scientific, peer-review

    202 Citations (Scopus)
    1181 Downloads (Pure)

    Abstract

    This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations, where multiple sound sources are typically active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of the presented metrics.
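
    As a rough illustration of why the averaging choice matters, the sketch below computes a segment-based F-score on a small toy example, once with instance-based (micro) averaging and once with class-based (macro) averaging. The class names, the activity matrices, and the helper functions are hypothetical and chosen only for this example; this is not the paper's case study and not the API of the accompanying toolbox.

```python
# Hypothetical toy data: each row is one fixed-length segment,
# each column is one event class; 1 means the class is active.
CLASSES = ["speech", "car", "dog"]
reference = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]
estimated = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]


def segment_counts(ref, est, class_index):
    """Count true positives, false positives and false negatives
    for one class over all segments."""
    tp = fp = fn = 0
    for ref_row, est_row in zip(ref, est):
        r, e = ref_row[class_index], est_row[class_index]
        if r and e:
            tp += 1
        elif e and not r:
            fp += 1
        elif r and not e:
            fn += 1
    return tp, fp, fn


def f_score(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when nothing is active
    or detected (an assumption made for this sketch)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


per_class = [segment_counts(reference, estimated, i) for i in range(len(CLASSES))]

# Instance-based (micro) averaging: pool the counts, then compute F1 once.
total_tp = sum(c[0] for c in per_class)
total_fp = sum(c[1] for c in per_class)
total_fn = sum(c[2] for c in per_class)
instance_based_f = f_score(total_tp, total_fp, total_fn)

# Class-based (macro) averaging: compute F1 per class, then average.
class_based_f = sum(f_score(*c) for c in per_class) / len(CLASSES)

print(f"instance-based F1: {instance_based_f:.3f}")
print(f"class-based F1:    {class_based_f:.3f}")
```

    In this toy example the frequent class ("speech") is detected perfectly and the rare class ("dog") is missed entirely, so the pooled instance-based score stays high while the class-based score drops, because every class contributes equally to it.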

    Original language: English
    Article number: 162
    Journal: Applied Sciences
    Volume: 6
    Issue number: 6
    DOIs
    Publication status: Published - 2016
    Publication type: A1 Journal article-refereed

    Keywords

    • Audio content analysis
    • Audio signal processing
    • Computational auditory scene analysis
    • Evaluation of sound event detection
    • Everyday sounds
    • Pattern recognition
    • Polyphonic sound event detection
    • Sound events

    Publication forum classification

    • Publication forum level 1

    ASJC Scopus subject areas

    • Fluid Flow and Transfer Processes
    • Process Chemistry and Technology
    • Computer Science Applications
    • Engineering(all)
    • Materials Science(all)
    • Instrumentation
