Paper Title
Understanding the Predictability of Gesture Parameters from Speech and their Perceptual Importance
Paper Authors
Paper Abstract
Gesture behavior is a natural part of human conversation. Much work has focused on designing speech-driven gesture generators, which remove the need for tedious hand animation when creating embodied conversational agents. However, these generators often work in a black-box manner, assuming a general relationship between input speech and output motion. As their success remains limited, we investigate in more detail how speech may relate to different aspects of gesture motion. We determine a number of parameters characterizing gesture, such as speed and gesture size, and explore their relationship to the speech signal in two ways. First, we train multiple recurrent networks to predict the gesture parameters from speech to understand how well gesture attributes can be modeled from speech alone. We find that gesture parameters can be partially predicted from speech, with some parameters, such as path length, predicted more accurately than others, such as velocity. Second, we design a perceptual study to assess the importance of each gesture parameter for producing motion that people perceive as appropriate for the speech. Results show that a degradation in any parameter is viewed negatively, but some changes, such as hand shape, are more impactful than others. A video summary can be found at https://youtu.be/aw6-_5kmLjY.