ehartford/dolphin · Datasets at Hugging Face

GitHub - ghimiresunil/LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing: LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.

GitHub - lyuchenyang/Macaw-LLM: Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining

Luca Soldaini · blog.allenai.org