Paper Title
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Paper Authors
Paper Abstract
Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to new tasks, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language-level supervision. By factorizing intention and language, our model minimizes linguistic drift after fine-tuning for new tasks. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans. Code has been made available at https://github.com/mcogswell/dialog_without_dialog
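
The abstract's phrase "factorizing intention and language" refers to separating what the agent wants to ask about (a discrete latent intent) from how that intent is phrased (a language decoder). The sketch below is only an illustration of this idea under assumptions, not the authors' implementation: the module names, sizes, the Gumbel-softmax relaxation, and the greedy decoding loop are all hypothetical choices made for exposition.

```python
# Illustrative sketch only (not the authors' released code): factorize
# "intention" (a discrete latent z) from "language" (a decoder that
# verbalizes z as a question). Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedQuestioner(nn.Module):
    def __init__(self, vocab_size=1000, hidden=256, num_intents=32):
        super().__init__()
        # Task-facing intent policy: maps a dialog/image context vector
        # to a categorical distribution over latent intents.
        self.intent_head = nn.Linear(hidden, num_intents)
        # Language side: embeds the chosen intent and decodes a question.
        # Keeping this part fixed during task transfer limits drift.
        self.intent_embed = nn.Embedding(num_intents, hidden)
        self.word_embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.word_out = nn.Linear(hidden, vocab_size)

    def forward(self, context, max_len=10, tau=1.0):
        # Sample a discrete intent with straight-through Gumbel-softmax so
        # the intent policy stays trainable from a downstream task signal.
        logits = self.intent_head(context)                # (B, num_intents)
        z = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot (B, num_intents)
        h = (z @ self.intent_embed.weight).unsqueeze(0)   # (1, B, hidden)

        # Greedily decode a question conditioned only on the intent.
        token = torch.zeros(context.size(0), 1, dtype=torch.long)  # <start> id 0
        words = []
        for _ in range(max_len):
            out, h = self.decoder(self.word_embed(token), h)
            token = self.word_out(out[:, -1]).argmax(-1, keepdim=True)
            words.append(token)
        return z, torch.cat(words, dim=1)                 # (B, max_len) token ids


# Adapting to a new task: update only the intent policy and freeze the
# language modules so generated questions stay human-readable.
model = FactorizedQuestioner()
for module in (model.intent_embed, model.word_embed, model.decoder, model.word_out):
    for p in module.parameters():
        p.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```

In this reading, new-task fine-tuning only reshapes which intents get selected, while the frozen decoder continues to map intents to the same human-readable questions, which is one plausible way the abstract's claim of reduced linguistic drift could be realized.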