Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Paper title

Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Authors

Tilak Purohit, Imen Ben Mahmoud, Bogdan Vlasenko, Mathew Magimai.-Doss

Abstract

The ICML Expressive Vocalizations (ExVo) Multi-Task Challenge 2022 focuses on understanding the emotional facets of non-linguistic vocalizations (vocal bursts, VB). The objective of the challenge is to predict emotional intensities for VBs; as a multi-task challenge, it also requires predicting the speaker's age and native country. For this challenge, we study and compare two distinct embedding spaces, namely self-supervised learning (SSL) based embeddings and task-specific supervised learning based embeddings. To that end, we investigate feature representations obtained from several pre-trained SSL neural networks and from task-specific supervised classification neural networks. Our studies show that the best performance is obtained with a hybrid approach, in which predictions derived via both SSL and task-specific supervised learning are used. Our best system on the test set surpasses the ComParE baseline (harmonic mean of all sub-task scores, i.e., $S_{MTL}$) by a relative margin of $13\%$.
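As a rough illustration of the metric named in the abstract, the sketch below computes a harmonic mean over sub-task scores and a relative margin over a baseline. It assumes each sub-task score is a strictly positive number where higher is better; how each score is defined is not stated here. The function names and the numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import Sequence


def harmonic_mean(scores: Sequence[float]) -> float:
    """Harmonic mean of strictly positive scores (the form S_MTL takes)."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")
    return len(scores) / sum(1.0 / s for s in scores)


def relative_gain(system: float, baseline: float) -> float:
    """Relative improvement of `system` over `baseline` (0.13 means 13%)."""
    return (system - baseline) / baseline


# Placeholder sub-task scores for emotion, age, and country (illustrative only).
emotion_score, age_score, country_score = 0.6, 0.4, 0.5
s_mtl = harmonic_mean([emotion_score, age_score, country_score])
```

Because the harmonic mean is dominated by the smallest term, a system has to do reasonably well on all three sub-tasks to score well overall; a strong result on one sub-task cannot compensate for a weak one.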
