Paper Title

Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings

Paper Authors

Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow

Paper Abstract

To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is generation of appropriate non-verbal behavior for the agent, for example facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synthesizing non-verbal behavior. Those that do, typically use deterministic methods that risk producing repetitive and non-vivid motions. In this paper, we introduce a probabilistic method to synthesize interlocutor-aware facial gestures - represented by highly expressive FLAME parameters - in dyadic conversations. Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently output interlocutor-aware facial gestures; and c) a subjective evaluation assessing the use and relative importance of the input modalities. The results show that the model successfully leverages the input from the interlocutor to generate more appropriate behavior. Videos, data, and code available at: https://jonepatr.github.io/lets_face_it.
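
The core technical idea in the abstract is that MoGlow, a normalizing-flow motion model, is extended so that its conditioning input includes the interlocutor's multi-modal signals alongside the agent's own speech. The sketch below is not the authors' implementation; it is a minimal, hypothetical PyTorch illustration of a single conditional affine coupling step of a normalizing flow, where the conditioning vector concatenates assumed agent-speech features with assumed interlocutor features. All dimensions and names (POSE_DIM, AGENT_SPEECH_DIM, INTERLOC_DIM, ConditionalAffineCoupling) are illustrative assumptions, not values or identifiers from the paper.

```python
# Illustrative sketch only -- NOT the authors' MoGlow implementation.
# Shows one affine coupling step of a normalizing flow whose conditioner
# receives agent speech features concatenated with interlocutor
# speech + facial-gesture features (all dimensions are assumptions).
import torch
import torch.nn as nn

POSE_DIM = 50          # assumed FLAME expression + head-pose parameters
AGENT_SPEECH_DIM = 26  # assumed per-frame agent acoustic features
INTERLOC_DIM = 76      # assumed interlocutor speech + gesture features


class ConditionalAffineCoupling(nn.Module):
    """One flow step: half of the pose vector is transformed with a
    scale/shift predicted from the other half plus the conditioning."""

    def __init__(self, pose_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)),
        )

    def forward(self, x: torch.Tensor, cond: torch.Tensor):
        x_a, x_b = x[..., : self.half], x[..., self.half :]
        log_s, t = self.net(torch.cat([x_a, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)         # keep scales well-behaved
        y_b = x_b * torch.exp(log_s) + t  # affine transform of one half
        log_det = log_s.sum(dim=-1)       # log-determinant of the Jacobian
        return torch.cat([x_a, y_b], dim=-1), log_det


# Interlocutor-aware conditioning: the key idea is simply that the
# conditioning vector includes the interlocutor's signals as well.
batch = 8
agent_speech = torch.randn(batch, AGENT_SPEECH_DIM)
interlocutor = torch.randn(batch, INTERLOC_DIM)
cond = torch.cat([agent_speech, interlocutor], dim=-1)

flow_step = ConditionalAffineCoupling(POSE_DIM, cond.shape[-1])
pose = torch.randn(batch, POSE_DIM)
z, log_det = flow_step(pose, cond)
print(z.shape, log_det.shape)  # torch.Size([8, 50]) torch.Size([8])
```

In MoGlow-style models the conditioner is recurrent over time (e.g., an LSTM over a sliding window of control inputs); a plain MLP stands in for it here to keep the sketch short.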
