SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Cheng, Zebang; Tu, Shuyuan; Huang, Dawei; Li, Minghan; Peng, Xiaojiang; Cheng, Zhi-Qi; Hauptmann, Alexander G.

doi:10.1145/3689092.3689404

Computer Science > Multimedia

arXiv:2408.10500 (cs)

[Submitted on 20 Aug 2024 (v1), last revised 21 Aug 2024 (this version, v2)]

Title:SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Authors:Zebang Cheng, Shuyuan Tu, Dawei Huang, Minghan Li, Xiaojiang Peng, Zhi-Qi Cheng, Alexander G. Hauptmann

View PDF HTML (experimental)

Abstract:This paper presents our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition. Our system leverages the advanced emotional understanding capabilities of Emotion-LLaMA to generate high-quality annotations for unlabeled samples, addressing the challenge of limited labeled data. To enhance multimodal fusion while mitigating modality-specific noise, we introduce Conv-Attention, a lightweight and efficient hybrid framework. Extensive experimentation vali-dates the effectiveness of our approach. In the MER-NOISE track, our system achieves a state-of-the-art weighted average F-score of 85.30%, surpassing the second and third-place teams by 1.47% and 1.65%, respectively. For the MER-OV track, our utilization of Emotion-LLaMA for open-vocabulary annotation yields an 8.52% improvement in average accuracy and recall compared to GPT-4V, securing the highest score among all participating large multimodal models. The code and model for Emotion-LLaMA are available at this https URL.

Comments:	Ranked 1st in MER24@IJCAI and MRAC24@ACM MM (MER-NOISE & MER-OV (self-evaluated))
Subjects:	Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2408.10500 [cs.MM]
	(or arXiv:2408.10500v2 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2408.10500
Related DOI:	https://doi.org/10.1145/3689092.3689404

Submission history

From: Zhi-Qi Cheng [view email]
[v1] Tue, 20 Aug 2024 02:46:03 UTC (2,409 KB)
[v2] Wed, 21 Aug 2024 18:58:26 UTC (2,409 KB)

Computer Science > Multimedia

Title:SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Multimedia

Title:SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators