Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition

Paper Title

Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition

Authors

Sofia Kanwal, Sohail Asghar, Hazrat Ali

Abstract

Robust speech emotion recognition relies on the quality of the speech features. We present a speech feature enhancement strategy that improves speech emotion recognition. We used the INTERSPEECH 2010 challenge feature set. We identified subsets of the feature set and applied Principal Component Analysis (PCA) to each subset. Finally, the resulting features are fused horizontally. The fused feature set is analyzed using t-distributed stochastic neighbor embedding (t-SNE) before the features are used for emotion recognition. The method is compared with state-of-the-art methods from the literature. The empirical evidence is drawn from two well-known datasets: Emotional Speech Dataset (EMO-DB) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), covering two languages, German and English, respectively. Compared to the baseline study, our method achieved an average recognition gain of 11.5% for six out of seven emotions on the EMO-DB dataset, and 13.8% for seven out of eight emotions on the RAVDESS dataset.
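
The subset-wise PCA and horizontal fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the three equal-sized column subsets, the choice of four components per subset, and the helper names `pca_reduce` and `enhance_features` are all assumptions made for illustration. PCA is computed here with a plain NumPy SVD.

```python
import numpy as np

def pca_reduce(block, n_components):
    """Project one feature subset onto its top principal components."""
    centered = block - block.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def enhance_features(features, subsets, n_components):
    """Apply PCA to each column subset, then concatenate the results column-wise."""
    reduced = [pca_reduce(features[:, cols], n_components) for cols in subsets]
    return np.hstack(reduced)  # horizontal (column-wise) fusion

# Synthetic stand-in for an utterance-by-feature matrix (assumption, not real data).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 30))  # 50 utterances, 30 features

# Hypothetical grouping of feature columns into three subsets.
subsets = [slice(0, 10), slice(10, 20), slice(20, 30)]

fused = enhance_features(features, subsets, n_components=4)
print(fused.shape)  # (50, 12)
```

The fused matrix could then be passed to a t-SNE implementation for visual inspection and to a classifier for emotion recognition; both steps are omitted here because the abstract does not specify their settings.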
