
Data Machina #222

Amershi et al. [3] state that software teams “flight” changes or updates to ML models, often by testing them on a few cases prior to live deployment. Our work provides further context into the evaluation and deployment process for production ML pipelines: we found that several organizations, particularly those with many customers, employed a multi…
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines, but evaluating them and quantifying pipeline performance can be hard. This is wh…
explodinggradients • GitHub - explodinggradients/ragas: Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
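To make the idea of RAG evaluation concrete, here is a minimal, self-contained sketch of the kind of metric such a framework computes. This is a toy illustration, not the ragas API: it scores an answer by the fraction of its tokens that appear in the retrieved contexts, a crude proxy for what ragas calls "faithfulness" (ragas itself uses LLM-based judgments rather than token overlap).

```python
# Toy faithfulness proxy for a RAG pipeline: what fraction of the
# answer's tokens are grounded in the retrieved contexts?
# Illustrative only; NOT the ragas implementation.

def faithfulness_proxy(answer: str, contexts: list[str]) -> float:
    context_words = set(" ".join(contexts).lower().split())
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    grounded = sum(1 for w in answer_words if w in context_words)
    return grounded / len(answer_words)

sample = {
    "question": "What does Ragas evaluate?",
    "contexts": ["Ragas evaluates retrieval augmented generation pipelines."],
    "answer": "Ragas evaluates retrieval augmented generation pipelines.",
}
score = faithfulness_proxy(sample["answer"], sample["contexts"])
print(f"faithfulness proxy: {score:.2f}")  # 1.00: every answer token is grounded
```

A real framework replaces the token-overlap heuristic with model-based scoring and adds complementary metrics (e.g. answer relevancy, context precision), but the shape is the same: per-sample scores over (question, contexts, answer) triples, aggregated across a test set.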

Today Google released Gemini with a 60-page report in which they repeatedly say the training data is key ("We find that data quality is critical to a highly-performing model"), while providing almost no information about how it was made, how it was filtered, or its contents.
Creative AI Lab
creative-ai.org