Paper Title
K-LITE: Learning Transferable Visual Models with External Knowledge
Paper Authors
Paper Abstract
The new generation of state-of-the-art computer vision systems is trained on natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, due to the broad concept coverage achieved via a large-scale data collection process. Alternatively, we argue that learning with external knowledge is a promising alternative that leverages a much more structured source of supervision and offers sample efficiency. We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: in training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts. In evaluation, the text is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is available at https://github.com/microsoft/klite.
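The entity-enrichment step described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the toy knowledge base, the `enrich` helper, and the prompt template below are hypothetical placeholders standing in for the paper's actual WordNet/Wiktionary lookups, not its real pipeline.

```python
# Toy stand-in for an external knowledge source (WordNet/Wiktionary glosses).
# The entries here are illustrative, not pulled from the paper.
KNOWLEDGE = {
    "tench": "a freshwater fish of the carp family",
    "beagle": "a small hound with a smooth coat and drooping ears",
}


def enrich(concept: str, template: str = "a photo of a {}.") -> str:
    """Build a text prompt for a visual concept and, when the external
    knowledge base has an entry for it, append that gloss so the text
    encoder sees a richer description of the concept."""
    prompt = template.format(concept)
    gloss = KNOWLEDGE.get(concept)
    return f"{prompt} {concept}: {gloss}" if gloss else prompt
```

For example, `enrich("tench")` yields a prompt that includes the gloss, while an unknown concept such as `enrich("zebra")` falls back to the plain template, mirroring the idea that knowledge augmentation degrades gracefully when no entry is found.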