Paper Title
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning
Paper Authors
Paper Abstract
Iterative Language-Based Image Editing (ILBIE) tasks follow iterative instructions to edit images step by step. Data scarcity is a significant issue for ILBIE because it is challenging to collect large-scale examples of images before and after instruction-based changes. However, humans can still accomplish these editing tasks even when presented with an unfamiliar image-instruction pair. This capability stems from counterfactual thinking, the ability to consider alternatives to events that have already happened. In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity. SSCR allows the model to consider out-of-distribution instructions paired with previous images. With the help of cross-task consistency (CTC), we train on these counterfactual instructions in a self-supervised manner. Extensive results show that SSCR improves the correctness of ILBIE in terms of both object identity and position, establishing a new state of the art (SOTA) on two ILBIE datasets (i-CLEVR and CoDraw). Even with only 50% of the training data, SSCR achieves results comparable to using the complete data.
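To make the idea concrete, below is a minimal sketch (not the authors' code) of the training loop the abstract describes: a counterfactual instruction has no ground-truth target image, so a cross-task consistency (CTC) module reconstructs the instruction from the (previous image, edited image) pair and supplies a self-supervised loss. All names here (Editor, InstructionReconstructor, counterfactual_step) and the use of feature vectors in place of real image/text encoders are hypothetical simplifications.

```python
# Illustrative sketch of self-supervised counterfactual training with
# cross-task consistency (CTC). Real encoders, GAN-based editors, and the
# exact losses of the SSCR paper are abstracted away.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Editor(nn.Module):
    """Hypothetical editor: produces an edited-image feature from the
    previous-image feature and an instruction feature."""
    def __init__(self, img_dim=256, txt_dim=128):
        super().__init__()
        self.fuse = nn.Linear(img_dim + txt_dim, img_dim)

    def forward(self, prev_img_feat, instr_feat):
        return torch.tanh(self.fuse(torch.cat([prev_img_feat, instr_feat], dim=-1)))

class InstructionReconstructor(nn.Module):
    """Hypothetical CTC module: recovers the instruction feature from the
    (previous, edited) image-feature pair."""
    def __init__(self, img_dim=256, txt_dim=128):
        super().__init__()
        self.decode = nn.Linear(2 * img_dim, txt_dim)

    def forward(self, prev_img_feat, edited_img_feat):
        return self.decode(torch.cat([prev_img_feat, edited_img_feat], dim=-1))

def counterfactual_step(editor, reconstructor, prev_img_feat, cf_instr_feat):
    """One self-supervised step: edit with a counterfactual instruction,
    then ask the CTC module to recover that instruction. No paired
    ground-truth image is needed."""
    edited = editor(prev_img_feat, cf_instr_feat)
    instr_hat = reconstructor(prev_img_feat, edited)
    return F.mse_loss(instr_hat, cf_instr_feat)

# Usage with random features standing in for real encoder outputs.
editor, ctc = Editor(), InstructionReconstructor()
prev_img = torch.randn(4, 256)   # batch of previous-image features
cf_instr = torch.randn(4, 128)   # sampled counterfactual instruction features
loss = counterfactual_step(editor, ctc, prev_img, cf_instr)
loss.backward()
```

The design point the sketch tries to capture is that the counterfactual branch never touches a target image: supervision comes entirely from whether the instruction can be read back off the edit, which is what lets SSCR exploit instructions outside the training distribution.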