Self-attention fusion for audiovisual emotion recognition with incomplete data

Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj

Research output: Conference article, Scientific, peer-reviewed

Abstract

In this paper, we consider the problem of multi-modal data analysis, with audiovisual emotion recognition as the use case. We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms. While most previous works consider the ideal scenario in which both modalities are present at all times during inference, we evaluate the robustness of the model in unconstrained settings where one modality is absent or noisy, and propose a method to mitigate these limitations in the form of modality dropout. Most importantly, we find that this approach not only improves performance drastically when one modality is absent or its representation is noisy, but also improves performance in the standard ideal setting, outperforming the competing methods.
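
The abstract names modality dropout but does not say how it is implemented. The following is a minimal sketch of one common form of the idea, not the authors' method: during training, one modality of a sample is occasionally replaced by zeros so that the model cannot rely on both always being present. The function name `modality_dropout`, the parameter `p_drop`, and the choice of zero-filling are all assumptions made for illustration.

```python
import random


def modality_dropout(audio, video, p_drop=0.5, rng=None):
    """Hypothetical helper: zero out at most one modality of a sample.

    audio, video: lists of floats (feature vectors for one sample).
    p_drop: probability that one of the two modalities is dropped
            (an assumed hyperparameter, not taken from the paper).
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < p_drop:
        # Choose which modality to drop; never drop both,
        # so every training sample keeps some usable signal.
        if rng.random() < 0.5:
            audio = [0.0] * len(audio)
        else:
            video = [0.0] * len(video)
    return audio, video


# Usage: with p_drop=1.0 exactly one modality is zeroed on every call.
rng = random.Random(0)
a, v = modality_dropout([0.2, 0.7], [0.5, 0.1, 0.9], p_drop=1.0, rng=rng)
```

Dropping at most one modality per sample is a design choice in this sketch; it mirrors the inference scenario the abstract describes, where one modality is absent while the other is still available.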

Original language: English
Title of host publication: 2022 26th International Conference on Pattern Recognition, ICPR 2022
Publisher: IEEE
Pages: 2822-2828
Number of pages: 7
ISBN (electronic): 9781665490627
Status: Published - 2022
OKM publication type: A4 Article in conference proceedings
Event: International Conference on Pattern Recognition - Montreal, Canada
Duration: 21 Aug 2022 to 25 Aug 2022

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 2022-August
ISSN (print): 1051-4651

Conference

Conference: International Conference on Pattern Recognition
Country/Territory: Canada
City: Montreal
Period: 21/08/22 to 25/08/22

Publication Forum level

  • Jufo level 1

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
