Data Processing

Instill AI

WebDataset

GitHub - google/magika: Detect file content types with deep learning

Why you should move your ETL stack to Modal

GitHub - tobymao/sqlglot: Python SQL Parser and Transpiler

GitHub - spiceai/spiceai: A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

GitHub - dgarnitz/vectorflow: VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes it to a vector DB of your choice.

Bap Our 5 favourite open-source customer data platforms