DeepSafety: Multi-level Audio-Text Feature Extraction and Fusion Approach for Violence Detection in Conversations

Paper Title

DeepSafety: Multi-level Audio-Text Feature Extraction and Fusion Approach for Violence Detection in Conversations

Authors

Amna Anwar, Eiman Kanjo, Dario Ortega Anderez

Abstract

Natural Language Processing has recently made understanding human interaction easier, leading to improved sentiment analysis and behaviour prediction. However, the choice of words and vocal cues in conversations remains an underexplored, rich source of natural language data for personal safety and crime prevention. When accompanied by audio analysis, it becomes possible to understand the context of a conversation, including the level of tension or rift between people. Building on existing work, we introduce a new information fusion approach that extracts and fuses multi-level features, including verbal, vocal, and text, as heterogeneous sources of information to detect the extent of violent behaviours in conversations. Our multilevel multimodel fusion framework integrates four types of information from raw audio signals: embeddings generated from both BERT and bidirectional long short-term memory (Bi-LSTM) models, the output of a 2D CNN applied to Mel-frequency cepstral coefficients (MFCC), and the output of an audio time-domain dense layer. The embeddings are then passed to three-layer FC networks, which serve as the concatenation step. Our experimental setup revealed that combining the multi-level features from different modalities achieves better performance than using a single one, with an F1 score of 0.85. We expect that the findings derived from our method provide new approaches for violence detection in conversations.
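The fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it shows one possible reading, in which the four branch outputs are concatenated and passed through three fully connected layers. The branch dimensions, hidden sizes, number of output classes, random weights, and the helper names `build_fusion_head` and `fuse` are all hypothetical, and the branch outputs are stand-in random vectors rather than real BERT, Bi-LSTM, MFCC-CNN, or time-domain features.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for one fully connected layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, bias = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_fusion_head(input_dim, hidden_dims=(64, 32), n_classes=3, seed=0):
    """Build three FC layers (two hidden + one output). All sizes are assumptions."""
    rng = random.Random(seed)
    dims = [input_dim, *hidden_dims, n_classes]
    return [init_layer(a, b, rng) for a, b in zip(dims, dims[1:])]


def fuse(branches, head):
    """Concatenate the per-branch feature vectors and run them through the FC head."""
    x = [v for branch in branches for v in branch]  # concatenation step
    for i, layer in enumerate(head):
        x = dense(x, layer, relu=i < len(head) - 1)  # no ReLU on the output layer
    return softmax(x)


# Stand-in branch outputs with hypothetical sizes (not taken from the paper).
rng = random.Random(1)
branch_dims = {"bert": 16, "bilstm": 8, "mfcc_cnn": 8, "time_dense": 4}
branches = [[rng.gauss(0, 1) for _ in range(d)] for d in branch_dims.values()]

head = build_fusion_head(sum(branch_dims.values()))
probs = fuse(branches, head)
print(probs)  # one probability per hypothetical violence level
```

Concatenating before the fully connected layers lets the classifier weigh text and audio cues jointly; a trained model would learn the weights rather than draw them at random as this sketch does.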
