SLACC: Simion-based Language Agnostic Code Clones

Paper Title

SLACC: Simion-based Language Agnostic Code Clones

Authors

George Mathew, Chris Parnin, Kathryn T. Stolee

Abstract

Successful cross-language clone detection could enable researchers and developers to create robust language migration tools, facilitate learning additional programming languages once one is mastered, and promote reuse of code snippets over a broader codebase. However, identifying cross-language clones presents special challenges to the clone detection problem. A lack of common underlying representation between arbitrary languages means detecting clones requires one of the following solutions: 1) a static analysis framework replicated across each targeted language with annotations matching language features across all languages, or 2) a dynamic analysis framework that detects clones based on runtime behavior. In this work, we demonstrate the feasibility of the latter solution, a dynamic analysis approach called SLACC for cross-language clone detection. Like prior clone detection techniques, we use input/output behavior to match clones, though we overcome limitations of prior work by amplifying the number of inputs and covering more data types; and as a result, achieve better clusters than prior attempts. Since clusters are generated based on input/output behavior, SLACC supports cross-language clone detection. As an added challenge, we target a static typed language, Java, and a dynamic typed language, Python. Compared to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times as many clusters and has higher precision (86.7% vs. 30.7%). This is the first work to perform clone detection for dynamic typed languages (precision = 87.3%) and the first to perform clone detection across languages that lack a common underlying representation (precision = 94.1%). It provides a first step towards the larger goal of scalable language migration tools.
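
The idea of matching clones by input/output behavior can be illustrated with a minimal sketch. This is not SLACC's implementation: the function names, the sample inputs, and the exact-match grouping rule are all assumptions made for illustration, and everything runs in a single Python process rather than across Java and Python runtimes. Each candidate function is run on the same set of inputs, and functions that produce identical outputs are placed in the same cluster.

```python
from collections import defaultdict


def behavior_signature(func, inputs):
    """Run func on every input tuple and record its observable behavior."""
    signature = []
    for args in inputs:
        try:
            signature.append(("ok", repr(func(*args))))
        except Exception as exc:  # a raised exception counts as behavior too
            signature.append(("error", type(exc).__name__))
    return tuple(signature)


def cluster_by_io(funcs, inputs):
    """Group function names whose signatures match on all inputs (hypothetical rule)."""
    clusters = defaultdict(list)
    for name, func in funcs.items():
        clusters[behavior_signature(func, inputs)].append(name)
    return [sorted(names) for names in clusters.values()]


# Hypothetical candidate snippets: two behave identically, one does not.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def max_item(xs):
    return max(xs)


# Shared inputs; using more and more varied inputs makes accidental matches less likely.
INPUTS = [([1, 2, 3],), ([],), ([5],), ([-1, 4],)]
FUNCS = {"sum_loop": sum_loop, "sum_builtin": sum_builtin, "max_item": max_item}
CLUSTERS = cluster_by_io(FUNCS, INPUTS)

if __name__ == "__main__":
    print(CLUSTERS)
```

Because the grouping depends only on observed outputs, nothing in it is tied to the source language of a snippet. A cross-language setting would additionally need each snippet to run in its own runtime and the outputs to be normalized to a comparable form, which this sketch does not attempt.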
