GitHub - google/magika: Detect file content types with deep learning

Data Collection Experimentation Evaluation and Deployment Monitoring and Response Metadata Data catalogs, Amundsen, AWS Glue, Hive metastores Weights & Biases, MLFlow, train/test set parameter configs, A/B test tracking tools Dashboards, SQL, metric functions and window sizes Unit Data cleaning tools Tensorflow, ML-lib, PyTorch, Scikit-learn, X...

Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.

NFX’s Generative Tech Open Source Market Map

docs.google.com

Meta AI released LLaMA ... and they included a paper which described exactly what it was trained on. It was 5TB of data.

2/3 of it was from Common Crawl. It had content from GitHub, Wikipedia, ArXiv, StackExchange and something called “Books”.

What’s Books? 4.5% of the training data was books. Part of this was Project Gutenberg, which is public dom

Creative AI Lab

creative-ai.org