Paper Title

Revisiting Code Search in a Two-Stage Paradigm

Authors

Fan Hu, Yanlin Wang, Lun Du, Xirong Li, Hongyu Zhang, Shi Han, Dongmei Zhang

Abstract

With a good code search engine, developers can reuse existing code snippets and accelerate the software development process. Current code search methods fall into two categories: traditional information retrieval (IR) based approaches and deep learning (DL) based approaches. DL-based approaches include the cross-encoder paradigm and the bi-encoder paradigm. However, each of these approaches has limitations. Inference with IR-based and bi-encoder models is fast, but they are not accurate enough, while cross-encoder models achieve higher search accuracy but consume more time. In this work, we propose TOSS, a two-stage fusion code search framework that combines the advantages of different code search methods. TOSS first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates, and then uses fine-grained cross-encoders for finer ranking. Furthermore, we conduct extensive experiments on different code candidate volumes and multiple programming languages to verify the effectiveness of TOSS. We also compare TOSS with six data fusion methods. Experimental results show that TOSS is not only efficient but also achieves state-of-the-art accuracy, with an overall mean reciprocal rank (MRR) score of 0.763 on the CodeSearchNet benchmark, compared to the best baseline result of 0.713. Our source code and experimental data are available at: https://github.com/fly-dragon211/TOSS.
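
The recall-then-rerank idea described in the abstract can be illustrated with a minimal Python sketch. This is not the TOSS implementation: the scorers `keyword_overlap` and `jaccard` are toy, hypothetical stand-ins for the IR-based/bi-encoder recallers and the cross-encoder reranker, and merging the recallers' top-k lists by simple union is an assumption made only for illustration. The helper `mean_reciprocal_rank` shows how the MRR metric mentioned above is computed.

```python
import re
from typing import Callable, List, Sequence, Set

# A scorer maps (query, code) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(query: str, code: str) -> float:
    """Toy cheap scorer (stand-in for a stage-1 recaller):
    fraction of query tokens that also appear in the code."""
    q = tokens(query)
    return len(q & tokens(code)) / len(q) if q else 0.0


def jaccard(query: str, code: str) -> float:
    """Toy finer scorer (stand-in for a stage-2 cross-encoder)."""
    q, c = tokens(query), tokens(code)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def recall_top_k(query: str, corpus: Sequence[str],
                 recallers: Sequence[Scorer], k: int) -> List[int]:
    """Stage 1: collect the top-k corpus indices from each cheap scorer.
    Assumption: candidate lists are merged by union, without duplicates."""
    candidates: List[int] = []
    for scorer in recallers:
        ranked = sorted(range(len(corpus)),
                        key=lambda i: scorer(query, corpus[i]),
                        reverse=True)
        for i in ranked[:k]:
            if i not in candidates:
                candidates.append(i)
    return candidates


def two_stage_search(query: str, corpus: Sequence[str],
                     recallers: Sequence[Scorer], reranker: Scorer,
                     k: int) -> List[int]:
    """Stage 2: apply the expensive scorer only to the recalled candidates."""
    candidates = recall_top_k(query, corpus, recallers, k)
    return sorted(candidates,
                  key=lambda i: reranker(query, corpus[i]),
                  reverse=True)


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]],
                         gold: Sequence[int]) -> float:
    """MRR: average of 1 / rank of the correct item (0 if it is missing)."""
    if not gold:
        return 0.0
    total = 0.0
    for ranking, target in zip(rankings, gold):
        ranking = list(ranking)
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(gold)
```

The point of the design is that the expensive scorer runs on at most `k` candidates per recaller rather than on the whole corpus, so its cost per query stays bounded while the cheap scorers handle the full scan.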
