Paper Title

Boosting Point-BERT by Multi-choice Tokens

Authors

Kexue Fu, Mingzhi Yuan, Manning Wang

Abstract

Masked language modeling (MLM) has become one of the most successful self-supervised pre-training tasks. Inspired by its success, Point-BERT, a pioneering work on point clouds, proposed masked point modeling (MPM) to pre-train a point cloud transformer on a large-scale unannotated dataset. Despite its strong performance, we find that the inherent differences between language and point clouds tend to cause ambiguous tokenization for point clouds: there is no gold standard for point cloud tokenization. Point-BERT uses a discrete Variational AutoEncoder (dVAE) as its tokenizer, but it may generate different token ids for semantically similar patches and the same token id for semantically dissimilar patches. To tackle this problem, we propose McP-BERT, a pre-training framework with multi-choice tokens. Specifically, we relax the single-choice constraint on patch token ids in Point-BERT and provide multi-choice token ids for each patch as supervision. Moreover, we utilize the high-level semantics learned by the transformer to further refine our supervision signals. Extensive experiments on point cloud classification, few-shot classification, and part segmentation tasks demonstrate the superiority of our method; e.g., the pre-trained transformer achieves 94.1% accuracy on ModelNet40, 84.28% accuracy on the hardest setting of ScanObjectNN, and new state-of-the-art performance on few-shot learning. We also show that our method not only improves the performance of Point-BERT on all downstream tasks, but also incurs almost no extra computational overhead. The code will be released at https://github.com/fukexue/McP-BERT.
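
To make the core idea concrete, below is a minimal sketch of what multi-choice token supervision could look like in PyTorch. It is an illustration under our own assumptions, not the authors' released implementation: the function names (`multi_choice_targets`, `soft_mpm_loss`), the `top_k` truncation, and the tensor shapes are hypothetical. The point it demonstrates is replacing the single-choice argmax label from the dVAE tokenizer with a soft distribution over several candidate token ids per patch.

```python
# Hypothetical sketch of multi-choice token supervision for masked point
# modeling (PyTorch). All names and the top-k scheme are illustrative
# assumptions, not the paper's actual API.
import torch
import torch.nn.functional as F

def multi_choice_targets(tokenizer_logits: torch.Tensor,
                         top_k: int = 5) -> torch.Tensor:
    """Turn dVAE tokenizer logits of shape (B, P, V) into soft targets.

    Instead of the single-choice argmax token id, keep the top-k candidate
    ids per patch and renormalize their probabilities, so semantically
    similar patches are not forced onto a single hard label.
    """
    probs = F.softmax(tokenizer_logits, dim=-1)          # (B, P, V)
    topk_vals, topk_ids = probs.topk(top_k, dim=-1)      # k candidates/patch
    targets = torch.zeros_like(probs)
    targets.scatter_(-1, topk_ids, topk_vals)            # zero out the rest
    return targets / targets.sum(dim=-1, keepdim=True)   # renormalize

def soft_mpm_loss(pred_logits: torch.Tensor,
                  targets: torch.Tensor,
                  mask: torch.Tensor) -> torch.Tensor:
    """Soft cross-entropy over masked patches only.

    pred_logits: (B, P, V) transformer predictions over the token vocabulary.
    targets:     (B, P, V) multi-choice soft targets from the tokenizer.
    mask:        (B, P) boolean mask marking the masked patches.
    """
    log_probs = F.log_softmax(pred_logits, dim=-1)
    loss = -(targets * log_probs).sum(dim=-1)            # (B, P)
    return loss[mask].mean()
```

A single-choice baseline corresponds to `top_k = 1`, which recovers a hard one-hot label; larger `top_k` lets supervision mass spread across the ambiguous token ids the abstract describes. The refinement step the paper mentions (using the transformer's own high-level semantics to adjust the targets) is omitted here for brevity.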
