Paper Title
Compositional Law Parsing with Latent Random Functions
Authors
Abstract
Human cognition is compositional. We understand a scene by decomposing it into different concepts (e.g., the shape and position of an object) and learning the respective laws of these concepts, which may be either natural (e.g., laws of motion) or man-made (e.g., laws of a game). The ability to parse these laws automatically indicates that a model understands the scene, which gives law parsing a central role in many visual tasks. This paper proposes a deep latent variable model for Compositional LAw Parsing (CLAP). CLAP achieves human-like compositional ability through an encoding-decoding architecture that represents the concepts in a scene as latent variables, and it employs concept-specific latent random functions, instantiated with Neural Processes, to capture the law of each concept. Our experimental results demonstrate that CLAP outperforms the baseline methods in multiple visual tasks such as intuitive physics, abstract visual reasoning, and scene representation. Law manipulation experiments illustrate CLAP's interpretability by modifying specific latent random functions on samples. For example, CLAP learns the laws of position change and appearance constancy from the moving balls in a scene, making it possible to exchange laws between samples or compose existing laws into novel laws.
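The idea of exchanging laws between samples can be illustrated with a small toy sketch. This is not CLAP's implementation: a deterministic least-squares line fit stands in for the Neural Process, each concept is reduced to a single scalar per frame, and every name here (`infer_law`, `apply_law`, `parse_laws`, `exchange_law`, the sample values) is hypothetical. It only shows the structure the abstract describes: one law per concept, inferred from context frames, which can be swapped independently of the other concepts.

```python
import numpy as np


def infer_law(t_context, z_context):
    """Toy stand-in for law inference: fit z = a + b * t by least squares.

    The pair (a, b) plays the role of a concept's law latent. A Neural
    Process would infer a distribution over functions instead.
    """
    design = np.stack([np.ones_like(t_context), t_context], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, z_context, rcond=None)
    return coeffs


def apply_law(law, t_target):
    """Predict a concept's latent value at the target time steps."""
    return law[0] + law[1] * t_target


def parse_laws(t_context, concept_latents):
    """Infer one law per concept from that concept's context latents."""
    return {name: infer_law(t_context, z) for name, z in concept_latents.items()}


def exchange_law(laws_a, laws_b, concept):
    """Swap the law of a single concept between two samples."""
    new_a, new_b = dict(laws_a), dict(laws_b)
    new_a[concept], new_b[concept] = laws_b[concept], laws_a[concept]
    return new_a, new_b


# Hypothetical per-concept latents for three context frames of two samples.
t_context = np.array([0.0, 1.0, 2.0])
sample_a = {"position": np.array([0.0, 1.0, 2.0]),    # moves right
            "appearance": np.array([5.0, 5.0, 5.0])}  # constant
sample_b = {"position": np.array([4.0, 2.0, 0.0]),    # moves left
            "appearance": np.array([1.0, 1.0, 1.0])}  # constant

laws_a = parse_laws(t_context, sample_a)
laws_b = parse_laws(t_context, sample_b)

# Give sample A the position law of sample B; its appearance law is untouched.
swapped_a, swapped_b = exchange_law(laws_a, laws_b, "position")

t_target = np.array([3.0])
pred = {name: apply_law(law, t_target) for name, law in swapped_a.items()}
```

After the swap, sample A keeps its own appearance law but follows sample B's motion, which mirrors the law-exchange experiment at toy scale. In the actual model the laws are latent random functions rather than fitted lines, so predictions would be sampled rather than computed in closed form.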