GitHub - Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Today, I'm announcing Alexandria, an open-source initiative to embed the internet.
To start, we're releasing the embeddings for every research paper on arXiv. That's over 4m items, 600m tokens, and 3.07 billion vector dimensions.
We're not stopping here.
GitHub - comfyanonymous/ComfyUI: The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.
github.com
Hugging Face – The AI community building the future.
huggingface.co
uniflow provides a unified LLM interface to extract and transform raw documents.
- Document types: Uniflow enables data extraction from PDF, HTML, and TXT files.
- LLM agnostic: Uniflow supports the most commonly used LLMs for text transformation, including
- OpenAI models (GPT3.5 and GPT4),
- Google Gemini models (Gemini 1.5, MultiModal),
- AWS BedRock models,
- Huggingf