GitHub - databonsai/databonsai: clean & curate your data with LLMs.

Datasets as Imagination

Lila Shroffjoinreboot.org
Thumbnail of Datasets as Imagination

alibaba GitHub - alibaba/data-juicer: A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷为大语言模型提供更高质量、更丰富、更易”消化“的数据!

huggingface GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.