
Data Machina #222

Why Chat With PDF Is Hard And How ChatLLM Gets It Right
Chatting on long docs is hard because most LLMs other than Gemini don't have a large context.
However, even with Gemini's 1M context length, in-context learning is hard, and if you stuff the doc in the context, it doesn't do a good job.... See more
Bindu Reddyx.com⚡ LitGPT
Pretrain, finetune, evaluate, and deploy 20+ LLMs on your own data
Uses the latest state-of-the-art techniques:
✅ flash attention ✅ fp4/8/16/32 ✅ LoRA, QLoRA, Adapter (v1, v2) ✅ FSDP ✅ 1-1000+ GPUs/TPUs
Lightning AI • Models • Quick start • Inference • Finetune • Pretrain • Deploy • Features • Training recipes (YAML)
Finetune, pretrain and d... See more
Pretrain, finetune, evaluate, and deploy 20+ LLMs on your own data
Uses the latest state-of-the-art techniques:
✅ flash attention ✅ fp4/8/16/32 ✅ LoRA, QLoRA, Adapter (v1, v2) ✅ FSDP ✅ 1-1000+ GPUs/TPUs
Lightning AI • Models • Quick start • Inference • Finetune • Pretrain • Deploy • Features • Training recipes (YAML)
Finetune, pretrain and d... See more
Lightning-AI • GitHub - Lightning-AI/litgpt: Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
uniflow provides a unified LLM interface to extract and transform and raw documents.
- Document types: Uniflow enables data extraction from PDFs, HTMLs and TXTs.
- LLM agnostic: Uniflow supports most common-used LLMs for text tranformation, including
- OpenAI models (GPT3.5 and GPT4),
- Google Gemini models (Gemini 1.5, MultiModal),
- AWS BedRock models,
- Huggingf
CambioML • GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...
GitHub - arthur-ai/bench: A tool for evaluating LLMs