Paper Title

On Using Stack Overflow Comment-Edit Pairs to Recommend Code Maintenance Changes

Authors

Henry Tang, Sarah Nadi

Abstract

Code maintenance data sets typically consist of a before and an after version of a piece of code, where the after version contains an improvement or fix. Such data sets are important for software engineering support tools related to code maintenance, such as program repair, code recommender systems, or Application Programming Interface (API) misuse detection. Most current data sets are constructed by mining commit histories in version-control systems or issues in issue-tracking systems. In this paper, we investigate whether Stack Overflow can be used as an additional data source. Comments on Stack Overflow provide an effective way for developers to point out problems with existing answers, alternative solutions, or pitfalls. We mine comment-edit pairs from Stack Overflow and investigate their potential usefulness. These pairs have the added benefit of carrying a concrete description of why the change is needed, and they potentially involve fewer tangled changes. We first design a technique to extract related comment-edit pairs and then investigate the nature of these pairs. We find that the majority of comment-edit pairs are not tangled, but only 27% of the studied pairs are potentially useful for the above applications. We categorize the types of mined pairs and find that the highest ratios of useful pairs come from the categories Correction, Obsolete, Flaw, and Extension. To demonstrate the effectiveness of our extracted pairs, we submitted 15 pull requests on GitHub, 10 of which have been accepted into widely used repositories such as Apache Beam and nltk. Our work is the first to investigate Stack Overflow comment-edit pairs and opens the door for future work in this direction. Based on our findings and observations, we provide concrete suggestions on how to potentially identify a larger set of useful comment-edit pairs, which can also be facilitated by our shared data.
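
The abstract does not spell out how related comment-edit pairs are extracted. As a rough illustration only, the sketch below pairs a comment with a later edit when the comment mentions an identifier that the edit adds or removes. This heuristic, and the names `Comment`, `Edit`, `changed_terms`, and `match_pairs`, are assumptions made for this example and should not be read as the authors' actual technique.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Comment:
    text: str
    created: datetime


@dataclass
class Edit:
    before: str        # code in the answer before the edit
    after: str         # code in the answer after the edit
    created: datetime


# Identifier-like tokens; a deliberately crude notion of a "code term".
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def terms(text: str) -> set:
    """Return the set of identifier-like tokens in a string."""
    return set(TOKEN.findall(text))


def changed_terms(edit: Edit) -> set:
    """Terms that appear in exactly one of the two versions (added or removed)."""
    return terms(edit.before) ^ terms(edit.after)


def match_pairs(comments, edits):
    """Pair each comment with every later edit that touches a term it mentions."""
    pairs = []
    for comment in comments:
        mentioned = terms(comment.text)
        for edit in edits:
            if edit.created <= comment.created:
                continue  # the edit cannot be a response to a later comment
            if mentioned & changed_terms(edit):
                pairs.append((comment, edit))
    return pairs


if __name__ == "__main__":
    comment = Comment("You should use close() instead of dispose()",
                      datetime(2020, 1, 1))
    edit = Edit("stream.dispose()", "stream.close()", datetime(2020, 1, 2))
    print(len(match_pairs([comment], [edit])))
```

A heuristic this simple will also match on ordinary English words that happen to appear in code, so any real pipeline would need additional filtering before the resulting pairs could be treated as related.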
