Paper Title
Human-Object Interaction Detection via Disentangled Transformer
Authors
Abstract
Human-Object Interaction (HOI) detection tackles the joint localization and classification of human-object interactions. Existing HOI transformers either adopt a single decoder for triplet prediction, or use two parallel decoders to detect individual objects and interactions separately and then compose triplets through a matching process. In contrast, we decouple triplet prediction into human-object pair detection and interaction classification. Our main motivation is that detecting human-object instances and classifying interactions accurately require learning representations that focus on different regions. To this end, we present Disentangled Transformer, in which both the encoder and the decoder are disentangled to facilitate learning of the two sub-tasks. To associate the predictions of the disentangled decoders, we first generate a unified representation for HOI triplets with a base decoder, and then use it as the input feature of each disentangled decoder. Extensive experiments show that our method outperforms prior work on two public HOI benchmarks by a sizeable margin. Code will be available.