arXiv:2405.02048v1 [cs.IR] 3 May 2024
Sounds fancy. Why do we care? GAR involves taking the source documents and having an LLM enrich them, prior to indexing. For example, the LLM might... * Generate titles for documents that are missing them * Standardize author names/formats* Extract dates, URLs, citations and other elements that might be valuable to search as separate fields* Create... See more
Feed | LinkedIn
- Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.
- Cu
Matt Rickard • Improving RAG: Strategies
ColBERT is a
fast
and
accurate
retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a queries and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction : it encod... See more
fast
and
accurate
retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a queries and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction : it encod... See more