Paper Title
An Annotated Dataset of Stack Overflow Post Edits
Paper Authors
Paper Abstract
To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.