Paper Title
Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
Paper Authors
Paper Abstract
Source code summarization aims to generate natural language summaries from structured code snippets so that readers can better understand what the code does. However, automatic code summarization is challenging because of the complexity of source code and the language gap between source code and natural language summaries. Most previous approaches rely on either retrieval-based methods (which can take advantage of similar examples in the retrieval database, but generalize poorly) or generation-based methods (which generalize better, but cannot take advantage of similar examples). This paper proposes a novel retrieval-augmented mechanism that combines the benefits of both. Furthermore, to mitigate the limitations of Graph Neural Networks (GNNs) in capturing the global graph structure of source code, we propose a novel attention-based dynamic graph to complement the static graph representation of the source code, and design a hybrid message passing GNN that captures both local and global structural information. To evaluate the proposed approach, we release a new, challenging benchmark crawled from diverse large-scale open-source C projects (95k+ unique functions in total). Our method achieves state-of-the-art performance, improving on existing methods by 1.42, 2.44, and 1.29 in BLEU-4, ROUGE-L, and METEOR, respectively.
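The hybrid message passing idea in the abstract can be illustrated with a small sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration: the dynamic graph is built with scaled dot-product attention over node features, the static graph is a row-normalized adjacency matrix, and the two aggregations are fused with a fixed weight `mix` (a hypothetical parameter). It is not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def row_normalize(adj):
    """Scale each row to sum to 1 (rows of all zeros stay zero)."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def dynamic_adjacency(feats):
    """Assumed dynamic graph: dense attention weights between all node pairs.

    Row i is a softmax over scaled dot products of node i with every node,
    so each node can receive messages from the whole graph (global view).
    """
    scale = math.sqrt(len(feats[0]))
    return [softmax([dot(fi, fj) / scale for fj in feats]) for fi in feats]


def aggregate(adj, feats):
    """One round of message passing: weighted sum of neighbor features."""
    n, dim = len(feats), len(feats[0])
    return [
        [sum(adj[i][j] * feats[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


def hybrid_step(static_adj, feats, mix=0.5):
    """Fuse local (static graph) and global (dynamic graph) messages.

    `mix` is a hypothetical fusion weight; the paper's fusion may differ.
    """
    local = aggregate(row_normalize(static_adj), feats)
    global_ = aggregate(dynamic_adjacency(feats), feats)
    return [
        [mix * l + (1.0 - mix) * g for l, g in zip(l_row, g_row)]
        for l_row, g_row in zip(local, global_)
    ]


# Toy graph: a 3-node chain 0-1-2 with self-loops, 2-dimensional features.
static_adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = hybrid_step(static_adj, feats)
```

In this toy graph, node 0 has no static edge to node 2, so the static aggregation alone cannot pass information between them in one step. The attention-based graph connects every pair of nodes, which is the sense in which it complements the static graph with global structure.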