GitHub - lyuchenyang/Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
🥤 Cola [NeurIPS 2023]
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*,†,♥ Bo Li*,♥ Sheng Shen♣ Jingkang Yang♥
Chunyuan Li♠ Kurt Keutzer♣ Trevor Darrell♣ Ziwei Liu✉,♥
♥S-Lab, Nanyang Technological University
♣University of California, Berkeley ♠Microsoft Research, Redmond
*Equal Contribution †Project Lead ✉Corresponding Author
GitHub - cliangyu/Cola: [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
TorchMultimodal (Beta Release)
Introduction
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:
- A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
- A repository of examples that show how to combine these building blocks.
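To make the "modular and composable building blocks" idea concrete, here is a minimal sketch in plain Python. It is not TorchMultimodal's actual API; the encoder and fusion functions are hypothetical stand-ins that illustrate late fusion, where each modality is encoded separately and the features are then combined by a fusion layer.

```python
# Hypothetical sketch of composable multimodal building blocks
# (stand-in functions, NOT TorchMultimodal's real API).

def encode_image(pixels):
    # Stand-in image encoder: reduce raw pixel values to a 2-d feature.
    return [sum(pixels) / len(pixels), max(pixels)]

def encode_text(token_ids):
    # Stand-in text encoder: token count and mean token id as features.
    return [len(token_ids), sum(token_ids) / len(token_ids)]

def concat_fusion(*features):
    # Fusion layer: late fusion by concatenating per-modality features.
    fused = []
    for feat in features:
        fused.extend(feat)
    return fused

# Compose the blocks: encode each modality, then fuse.
image_feat = encode_image([0.1, 0.5, 0.9])
text_feat = encode_text([7, 3, 2])
fused = concat_fusion(image_feat, text_feat)
print(len(fused))  # 4-dimensional fused representation
```

Because each block has a narrow interface (features in, features out), encoders and fusion layers can be swapped independently, which is the composability the library description refers to.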