Paper Title

Establishing Strong Baselines for TripClick Health Retrieval

Authors

Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

Abstract

We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the originally too noisy training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which the original baselines did not achieve. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.
