arXiv:2405.02048v1 [cs.IR] 3 May 2024

Related Highlights

Sounds fancy. Why do we care? GAR involves taking the source documents and having an LLM enrich them prior to indexing. For example, the LLM might...
* Generate titles for documents that are missing them
* Standardize author names/formats
* Extract dates, URLs, citations and other elements that might be valuable to search as separate fields
* Create...

Feed | LinkedIn

Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.

Matt Rickard • Improving RAG: Strategies
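
The multiple-indices idea can be sketched in a few lines. This is a minimal illustration only: the index names, the keyword routing rule, and the helper names `route` and `search` are all hypothetical, and a real system would more likely route with a classifier or an LLM and search each index with a proper retriever.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-topic indices: each maps a document id to its text.
INDICES: Dict[str, Dict[str, str]] = {
    "billing": {
        "b1": "how to update a credit card",
        "b2": "refund policy for annual plans",
    },
    "engineering": {
        "e1": "deploying the api server",
        "e2": "rotating database credentials",
    },
}

# Hypothetical routing criteria: keywords that suggest which index to use.
ROUTING_KEYWORDS: Dict[str, Set[str]] = {
    "billing": {"refund", "invoice", "card", "payment"},
    "engineering": {"deploy", "deploying", "api", "database", "server"},
}


def route(query: str) -> str:
    """Pick the index whose keywords overlap most with the query.

    Ties (including zero overlap) fall back to the first index listed.
    """
    tokens = set(query.lower().split())
    return max(ROUTING_KEYWORDS, key=lambda name: len(tokens & ROUTING_KEYWORDS[name]))


def search(query: str) -> Tuple[str, List[str]]:
    """Route the query, then score only the documents in the chosen index."""
    index_name = route(query)
    tokens = set(query.lower().split())
    scored = [
        (len(tokens & set(text.split())), doc_id)
        for doc_id, text in INDICES[index_name].items()
    ]
    # Highest term overlap first; drop documents that share no terms.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return index_name, [doc_id for score, doc_id in ranked if score > 0]


if __name__ == "__main__":
    print(search("refund policy"))
    print(search("deploying the api"))
```

The point of the design is that `search` only ever scans one small index per query, so the cost of scoring grows with the size of the routed index rather than the whole corpus.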

ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.

Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.

As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encod...

stanford-futuredata • GitHub - stanford-futuredata/ColBERT: Stanford ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22)
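
As a rough illustration of late-interaction scoring, the sketch below assumes the query and the passage have already been encoded into one vector per token, then sums, for each query token, its best match among the passage tokens (the MaxSim operation). The toy two-dimensional vectors and the function names are made up for illustration; this is not the ColBERT library's API, and real embeddings come from a BERT encoder.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(
    query_vecs: Sequence[Vector], passage_vecs: Sequence[Vector]
) -> float:
    """Sum over query tokens of the maximum similarity to any passage token."""
    return sum(max(dot(q, p) for p in passage_vecs) for q in query_vecs)


if __name__ == "__main__":
    # Toy per-token embeddings (hypothetical values, already normalized).
    query = [[1.0, 0.0], [0.0, 1.0]]
    passage_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
    passage_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token
    print(late_interaction_score(query, passage_a))
    print(late_interaction_score(query, passage_b))
```

Because passage token vectors do not depend on the query, they can be computed and indexed ahead of time, leaving only the cheap max-and-sum step for query time.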

Better RAG Results With Reciprocal Rank Fusion and Hybrid Search

John Wang • assembled.com
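
For readers unfamiliar with the technique named in the title, reciprocal rank fusion is commonly defined as giving each document a score of 1 / (k + rank) in every ranked list it appears in and summing those scores across lists. The sketch below is a generic version under that definition, not code from the linked article; the constant k = 60 is a commonly used default, and the two input rankings are invented examples standing in for keyword and vector search results.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each appearance contributes 1 / (k + rank), with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # hypothetical keyword search results
    vector_hits = ["b", "c", "d"]   # hypothetical vector search results
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Because the fusion uses only rank positions, it does not require the keyword and vector scores to be on comparable scales.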