GitHub - Filimoa/open-parse: Improved file parsing for LLM’s

Tyler Maran GitHub - getomni-ai/zerox: Zero shot pdf OCR with gpt-4o-mini

GitHub - varunshenoy/super-json-mode: Low latency JSON generation using LLMs ⚡️

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.