Paper Title


LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback

Authors

Yunchang Zhu, Liang Pang, Yanyan Lan, Huawei Shen, Xueqi Cheng

Abstract


Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique for improving retrieval accuracy. It aims to alleviate the mismatch of linguistic expressions between a query and its potentially relevant documents. Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents, resulting in severe query drift. Without comparing the effects of two different revisions of the same query, a PRF model may incorrectly focus on the additional irrelevant information introduced by the extra feedback, and thus reformulate a query that is less effective than the revision using less feedback. Ideally, if a PRF model can distinguish between irrelevant and relevant information in the feedback, the more feedback documents there are, the better the revised query will be. To bridge this gap, we propose the Loss-over-Loss (LoL) framework, which compares the reformulation losses between different revisions of the same query during training. Concretely, we revise an original query multiple times in parallel using different amounts of feedback and compute their reformulation losses. Then, we introduce an additional regularization loss on these reformulation losses to penalize revisions that use more feedback but incur larger losses. With such comparative regularization, the PRF model is expected to learn to suppress the extra irrelevant information by comparing the effects of the different revised queries. Further, we present a differentiable query reformulation method to implement this framework. This method revises queries in the vector space and directly optimizes the retrieval performance of query vectors, and it is applicable to both sparse and dense retrieval models. Empirical evaluation demonstrates the effectiveness and robustness of our method for two typical sparse and dense retrieval models.
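The comparative regularization described above can be illustrated with a minimal sketch. The abstract specifies only that revisions using more feedback but incurring larger reformulation losses are penalized; a pairwise hinge penalty over the ordered losses is one plausible realization. The function name and the exact penalty form below are assumptions for illustration, not the paper's published formulation.

```python
def comparative_regularization(reform_losses):
    """Pairwise hinge penalty over reformulation losses.

    `reform_losses[k]` is the reformulation loss of the revision that
    used the k-th (increasing) amount of feedback.  A revision that
    uses more feedback (a later index) yet incurs a larger loss than a
    revision with less feedback is penalized by the gap between the
    two losses.  (Hypothetical form; the paper's exact regularizer
    may differ.)
    """
    reg = 0.0
    n = len(reform_losses)
    for i in range(n):
        for j in range(i + 1, n):
            # Penalize only when the revision with more feedback (j)
            # ends up with a larger loss than the one with less (i).
            reg += max(0.0, reform_losses[j] - reform_losses[i])
    return reg


# Example: the middle revision improved on the first, but adding even
# more feedback made the third revision worse again, so only the
# (second, third) pair contributes a penalty of 0.8 - 0.7 = 0.1.
penalty = comparative_regularization([0.9, 0.7, 0.8])
```

In training, this penalty would be added to the sum of the per-revision reformulation losses, weighted by a hyperparameter; in a differentiable framework the same `max(0, ...)` comparison would be expressed with a clamp so gradients flow through both losses.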
