Paper Title

Long Document Re-ranking with Modular Re-ranker

Authors

Luyu Gao, Jamie Callan

Abstract

Long document re-ranking has been a challenging problem for neural re-rankers based on deep language models like BERT. Early work breaks documents into short, passage-like chunks. These chunks are independently mapped to scalar scores or latent vectors, which are then pooled into a final relevance score. However, these encode-and-pool methods inevitably introduce an information bottleneck: the low-dimensional representations. In this paper, we propose instead to model full query-to-document interaction, leveraging the attention operation and a modular Transformer re-ranker framework. First, document chunks are encoded independently with an encoder module. An interaction module then encodes the query and performs joint attention from the query to all document chunk representations. We demonstrate that the model can use this new degree of freedom to aggregate important information from the entire document. Our experiments show that this design produces effective re-ranking on two classical IR collections, Robust04 and ClueWeb09, as well as on the large-scale supervised MS-MARCO document ranking collection.
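
To make the two-module design concrete, below is a minimal PyTorch sketch of the architecture the abstract describes, not the authors' released implementation. All module sizes, layer counts, and the class name ModularReranker are illustrative assumptions; a standard Transformer decoder stack stands in for the interaction module, since its cross-attention lets every query position attend to every document chunk token.

```python
# Minimal sketch (assumptions, not the paper's code) of a modular re-ranker:
# chunks are encoded independently, then an interaction module encodes the
# query and jointly attends from the query to all chunk representations.
import torch
import torch.nn as nn

class ModularReranker(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, n_heads=12,
                 n_enc_layers=6, n_int_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Encoder module: contextualizes each chunk on its own.
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.chunk_encoder = nn.TransformerEncoder(enc_layer, n_enc_layers)
        # Interaction module: self-attention over the query plus
        # cross-attention from the query to the document memory.
        int_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.interaction = nn.TransformerDecoder(int_layer, n_int_layers)
        self.score = nn.Linear(d_model, 1)  # relevance head (assumed)

    def forward(self, query_ids, chunk_ids):
        # chunk_ids: (n_chunks, chunk_len) token ids for one document;
        # treating chunks as a batch encodes them independently.
        chunk_reps = self.chunk_encoder(self.embed(chunk_ids))
        # Concatenate all chunk token vectors into one document memory,
        # so attention reaches the whole document, not a pooled vector.
        memory = chunk_reps.reshape(1, -1, chunk_reps.size(-1))
        q = self.embed(query_ids).unsqueeze(0)      # (1, query_len, d_model)
        q_out = self.interaction(q, memory)          # joint query-to-document attention
        # Score from the first query position (a [CLS]-like token, assumed).
        return self.score(q_out[:, 0]).squeeze(-1)
```

The key design point this sketch captures is that the only aggregation step is attention itself: the interaction module sees every chunk token representation, rather than a per-chunk scalar score or pooled vector, which is how the paper argues the encode-and-pool bottleneck is avoided.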
