TL;DR
Current information retrieval systems often use a two-stage process: a fast retriever first selects candidate documents, and a more accurate but computationally expensive reranker then refines their ranking. It is widely assumed that rerankers consistently improve retrieval quality, and that letting them score more documents helps further. This paper tests that assumption and finds that existing rerankers show diminishing returns as they score progressively more documents, and that beyond a certain number of documents they actually degrade quality. This happens because rerankers are often distracted by documents that have minimal lexical or semantic overlap with the query.
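The retrieve-then-rerank pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: `two_stage_rank`, `lexical_overlap`, and the fixed `toy_reranker` scores are hypothetical stand-ins (a real system would use something like BM25 or a dense retriever in the first stage and a cross-encoder or LLM reranker in the second). The parameter `k` is the reranking depth, the quantity whose scaling the paper studies.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: (query, document text) -> relevance score.
Scorer = Callable[[str, str], float]


def two_stage_rank(query: str, corpus: Dict[str, str],
                   retriever_score: Scorer, reranker_score: Scorer,
                   k: int) -> List[str]:
    """Rank doc ids with a cheap retriever, then re-score only the top-k with a reranker.

    Documents outside the top-k keep their first-stage order. Setting
    k = len(corpus) corresponds to reranking the full collection.
    """
    # Stage 1: cheap scoring of every document (stable sort keeps ties in corpus order).
    first_stage = sorted(corpus, key=lambda d: retriever_score(query, corpus[d]), reverse=True)
    candidates, tail = first_stage[:k], first_stage[k:]
    # Stage 2: expensive scoring, applied only to the k candidates.
    reranked = sorted(candidates, key=lambda d: reranker_score(query, corpus[d]), reverse=True)
    return reranked + tail


def lexical_overlap(query: str, text: str) -> float:
    """Toy first-stage scorer (hypothetical): number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


corpus = {
    "d1": "rerankers refine retrieval results",
    "d2": "a fast retriever selects candidates",
    "d3": "cooking pasta at home",
}
# Made-up reranker scores, chosen so the off-topic d3 scores highest,
# mimicking the "distraction" failure mode described above.
_fake_scores = {corpus["d1"]: 0.9, corpus["d2"]: 0.7, corpus["d3"]: 0.95}


def toy_reranker(query: str, text: str) -> float:
    """Hypothetical reranker stand-in: looks up a fixed score, ignoring the query."""
    return _fake_scores[text]


query = "fast retriever rerankers"
print(two_stage_rank(query, corpus, lexical_overlap, toy_reranker, k=2))  # ['d1', 'd2', 'd3']
print(two_stage_rank(query, corpus, lexical_overlap, toy_reranker, k=3))  # ['d3', 'd1', 'd2']
```

With `k=2` the reranker only reorders the two retriever candidates. With `k=3` (the full collection) the off-topic `d3` is exposed to the reranker and jumps to first place, which is the kind of degradation the paper reports at large reranking depths.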
To test this, the researchers ran experiments on several academic and enterprise datasets with multiple state-of-the-art rerankers, including a full-retrieval setting in which the reranker ranks the entire document collection. The results confirmed that reranker gains diminish as the number of scored documents grows, and the reranked output frequently had lower recall than the retriever alone. The authors further propose listwise reranking with large language models as a more robust alternative. This research has significant implications for how we build and evaluate large-scale retrieval systems.
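The comparison above is made in terms of recall. One way to measure it is to compute recall@k while sweeping the reranking depth, from the retriever alone (depth 0) up to the full-retrieval case where the depth equals the collection size. In this minimal sketch, the function names `recall_at_k` and `depth_sweep`, the cutoff, and the scores are all hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import Dict, Iterable, List, Set


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant doc ids that appear in the first k positions."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / len(relevant)


def depth_sweep(retriever_ranking: List[str], reranker_scores: Dict[str, float],
                relevant: Set[str], depths: Iterable[int], cutoff: int) -> Dict[int, float]:
    """Recall@cutoff after reranking the top-`depth` retriever results, for each depth.

    depth = 0 is the retriever alone; depth = len(retriever_ranking) is full retrieval.
    """
    results = {}
    for depth in depths:
        # Rerank the head; the tail keeps its retriever order.
        head = sorted(retriever_ranking[:depth], key=lambda d: reranker_scores[d], reverse=True)
        results[depth] = recall_at_k(head + retriever_ranking[depth:], relevant, cutoff)
    return results


# Made-up example: d3 is irrelevant but receives the highest reranker score.
ranking = ["d2", "d1", "d3", "d4"]
scores = {"d1": 0.9, "d2": 0.7, "d3": 0.95, "d4": 0.2}
print(depth_sweep(ranking, scores, relevant={"d1", "d2"}, depths=[0, 2, 4], cutoff=2))
# {0: 1.0, 2: 1.0, 4: 0.5}
```

In this toy case, shallow reranking preserves the retriever's recall, while full-collection reranking lets the distractor `d3` displace a relevant document and recall@2 drops below the retriever-only value.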
Key Takeaways
Why does it matter?
This paper matters because it challenges the common assumption that rerankers always improve information retrieval, especially when they are scaled to score more documents. This affects how we design and optimize large-scale retrieval systems and motivates research into more robust methods. The findings are likely to influence future IR system development and evaluation practices.