GitHub - google/magika: Detect file content types with deep learning
Data Collection | Experimentation | Evaluation and Deployment | Monitoring and Response
Metadata: Data catalogs, Amundsen, AWS Glue, Hive metastores | Weights & Biases, MLFlow, train/test set parameter configs | A/B test tracking tools | Dashboards, SQL, metric functions and window sizes
Unit: Data cleaning tools | Tensorflow, ML-lib, PyTorch, Scikit-learn, X...
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
NFX’s Generative Tech Open Source Market Map
docs.google.com
Meta AI released LLaMA ... and they included a paper which described exactly what it was trained on. It was 5TB of data.
2/3 of it was from Common Crawl. It had content from GitHub, Wikipedia, ArXiv, StackExchange and something called “Books”.
What’s Books? 4.5% of the training data was books. Part of this was Project Gutenberg, which is public dom
Creative AI Lab
creative-ai.org