Self-attention fusion for audiovisual emotion recognition with incomplete data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

59 Citations (Scopus)
28 Downloads (Pure)

Abstract

In this paper, we consider the problem of multi-modal data analysis with a use case of audiovisual emotion recognition. We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms. While most previous works consider the ideal scenario in which both modalities are present at all times during inference, we evaluate the robustness of the model in unconstrained settings where one modality is absent or noisy, and propose a method to mitigate these limitations in the form of modality dropout. Most importantly, we find that this approach not only improves performance drastically when one modality is absent or its representation is noisy, but also improves performance in the standard ideal setting, outperforming the competing methods.
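The modality dropout described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual code: the function name, drop probability, and the choice of zeroing features are assumptions made for the sketch.

```python
import numpy as np

def modality_dropout(audio_feats, visual_feats, p_drop=0.25, rng=None):
    """Randomly suppress one modality during training.

    Illustrative sketch: with probability p_drop the audio features are
    zeroed, with probability p_drop the visual features are zeroed, and
    otherwise both modalities are kept intact. Inputs are not modified.
    """
    rng = rng or np.random.default_rng()
    audio, visual = audio_feats.copy(), visual_feats.copy()
    r = rng.random()
    if r < p_drop:          # drop the audio modality
        audio[:] = 0.0
    elif r < 2 * p_drop:    # drop the visual modality
        visual[:] = 0.0
    # else: keep both modalities
    return audio, visual
```

Training with such random suppression forces the fusion mechanism to produce useful predictions from either modality alone, which is what makes the model robust when one input stream is missing or corrupted at inference time.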

Original language: English
Title of host publication: 2022 26th International Conference on Pattern Recognition, ICPR 2022
Publisher: IEEE
Pages: 2822-2828
Number of pages: 7
ISBN (Electronic): 9781665490627
DOIs
Publication status: Published - 2022
Publication type: A4 Article in conference proceedings
Event: International Conference on Pattern Recognition - Montreal, Canada
Duration: 21 Aug 2022 - 25 Aug 2022

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 2022-August
ISSN (Print): 1051-4651

Conference

Conference: International Conference on Pattern Recognition
Country/Territory: Canada
City: Montreal
Period: 21/08/22 - 25/08/22

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
