togethercomputer/RedPajama-Data-V2 · Datasets at Hugging Face

huggingface.co

Summing columns in remote Parquet files using DuckDB

Simon Willison · til.simonwillison.net

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models