M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database

Paper title

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database

Authors

Zhao, Jinming; Zhang, Tenggan; Hu, Jingwen; Liu, Yuchen; Jin, Qin; Wang, Xinchao; Li, Haizhou

Abstract

The emotional state of a speaker can be influenced by many different factors in a dialogue, such as the dialogue scene, the dialogue topic, and interlocutor stimulus. However, the data resources currently available to support such multimodal affective analysis in dialogues are limited in scale and diversity. In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, with a total of 9,082 turns and 24,449 utterances. M3ED is annotated with 7 emotion categories (happy, surprise, sad, disgust, anger, fear, and neutral) at the utterance level, and encompasses acoustic, visual, and textual modalities. To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese. It is valuable for cross-culture emotion analysis and recognition. We apply several state-of-the-art methods to the M3ED dataset to verify its validity and quality. We also propose a general Multimodal Dialogue-aware Interaction framework, MDI, to model the dialogue context for emotion recognition, which achieves performance comparable to the state-of-the-art methods on M3ED. The full dataset and code are available.

