Paper Title
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Paper Authors
Paper Abstract
Text-to-shape retrieval is an increasingly relevant problem with the growth of 3D shape data. Recent work on contrastive losses for learning joint embeddings over multimodal data has been successful at tasks such as retrieval and classification. Thus far, work on joint representation learning for 3D shapes and text has focused on improving embeddings through modeling of complex attention between representations, or multi-task learning. We propose a trimodal learning scheme over text, multi-view images and 3D shape voxels, and show that with large batch contrastive learning we achieve good performance on text-to-shape retrieval without complex attention mechanisms or losses. Our experiments serve as a foundation for follow-up work on building trimodal embeddings for text-image-shape.
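The core idea of the abstract, contrasting matched (text, image, voxel) triplets against the other items in a large batch without extra attention machinery, can be sketched as a sum of pairwise NT-Xent (InfoNCE) losses over the three modality pairs. This is a minimal NumPy sketch under our own assumptions (function names, a shared temperature, and pre-computed per-modality embeddings are hypothetical, not taken from the paper):

```python
import numpy as np

def nt_xent(a, b, temperature=0.07):
    """Pairwise NT-Xent / InfoNCE loss between two batches of embeddings
    a, b of shape (batch, dim). Row i of a and row i of b form the
    positive pair; every other row in the batch is a negative."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalize
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                     # (batch, batch) similarities
    idx = np.arange(len(a))

    def cross_entropy(l):
        # stable log-softmax over each row, then pick the diagonal (positives)
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # symmetric: a->b retrieval and b->a retrieval
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def trimodal_loss(text_emb, image_emb, voxel_emb, temperature=0.07):
    """Hypothetical trimodal objective: sum of the three pairwise
    contrastive losses over text, multi-view image, and voxel embeddings."""
    return (nt_xent(text_emb, image_emb, temperature)
            + nt_xent(text_emb, voxel_emb, temperature)
            + nt_xent(image_emb, voxel_emb, temperature))
```

Because every in-batch item serves as a negative for every other, larger batches supply more (and harder) negatives, which is consistent with the abstract's observation that large-batch contrastive training alone yields strong text-to-shape retrieval.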