Data Loading

Bill Mill notes.billmill.org

GitHub - VikParuchuri/surya: OCR, layout analysis, reading order, line detection in 90+ languages

Stability and scalability for search

tensorlakeai GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications

GitHub - Stirling-Tools/Stirling-PDF: #1 Locally hosted web application that allows you to perform various operations on PDF files

Bap Our 5 favourite open-source customer data platforms

jina-ai jina-ai/reader: Convert any URL to an LLM-friendly input ... - GitHub

Filimoa GitHub - Filimoa/open-parse: Improved file parsing for LLM’s