Paper Title

An Annotated Dataset of Stack Overflow Post Edits

Authors

Sebastian Baltes, Markus Wagner

Abstract

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.
