Paper Title

How does Feedback Signal Quality Impact Effectiveness of Pseudo Relevance Feedback for Passage Retrieval?

Authors

Hang Li, Ahmed Mourad, Bevan Koopman, Guido Zuccon

Abstract

Pseudo-Relevance Feedback (PRF) assumes that the top results retrieved by a first-stage ranker are relevant to the original query and uses them to improve the query representation for a second round of retrieval. This assumption, however, is often incorrect: some or even all of the feedback documents may be irrelevant. Indeed, the effectiveness of PRF methods may well depend on the quality of the feedback signal, and thus on the effectiveness of the first-stage ranker. This aspect, however, has received little attention so far. In this paper, we control the quality of the feedback signal and measure its impact on a range of PRF methods, including traditional bag-of-words methods (Rocchio) and dense vector-based methods (learnt and not learnt). Our results show the important role that the quality of the feedback signal plays in the effectiveness of PRF methods. Importantly, and surprisingly, our analysis reveals that not all PRF methods behave the same when dealing with feedback signals of varying quality. These findings are critical to gaining a better understanding of PRF methods, and of which methods should be used and when, depending on the feedback signal quality; they also set the basis for future research in this area.
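
To make the role of feedback quality concrete, the following is a minimal sketch of Rocchio-style bag-of-words PRF on toy data. It is not the authors' implementation: the function names (`rocchio_expand`, `score`), the weights `alpha` and `beta`, and the toy passages are all assumptions chosen for illustration, and the negative-feedback term of the full Rocchio formula is omitted.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75):
    """Return a term -> weight dict for the expanded query.

    query_terms: tokens of the original query.
    feedback_docs: list of token lists, the top-k passages assumed relevant.
    alpha, beta: illustrative interpolation weights (assumed values).
    """
    expanded = Counter()
    # Original query contribution.
    for term, tf in Counter(query_terms).items():
        expanded[term] += alpha * tf
    # Centroid of the feedback passages, scaled by beta.
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            expanded[term] += beta * tf / len(feedback_docs)
    return dict(expanded)


def score(expanded_query, passage):
    """Dot product between the expanded query and a passage's term counts."""
    counts = Counter(passage)
    return sum(weight * counts[term] for term, weight in expanded_query.items())


# Toy data (hypothetical): the user wants passages about the animal.
query = ["jaguar", "speed"]
relevant_feedback = [["jaguar", "cat", "speed", "run"]]  # high-quality signal
noisy_feedback = [["jaguar", "car", "engine"]]           # low-quality signal
target_passage = ["cat", "run", "fast"]                  # relevant, no query terms

score_without_prf = score(rocchio_expand(query, []), target_passage)
score_good_signal = score(rocchio_expand(query, relevant_feedback), target_passage)
score_noisy_signal = score(rocchio_expand(query, noisy_feedback), target_passage)
```

In this toy setting the relevant passage only gains score when the feedback passage is itself relevant; with the off-topic feedback passage the expanded query drifts toward "car" and "engine" and the target passage scores no better than without PRF. This mirrors, in a very simplified form, the dependence on first-stage quality that the abstract describes.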
