GitHub - lyuchenyang/Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
🥤 Cola [NeurIPS 2023]
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*,†,♥ Bo Li*,♥ Sheng Shen♣ Jingkang Yang♥
Chunyuan Li♠ Kurt Keutzer♣ Trevor Darrell♣ Ziwei Liu✉,♥
♥S-Lab, Nanyang Technological University
♣University of California, Berkeley ♠Microsoft Research, Redmond
*Equal Contribution †Project Lead ✉Corresponding Author
GitHub - cliangyu/Cola: [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
TorchMultimodal (Beta Release)
Introduction
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:
- A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
- A repository of examples that show how to combine these building blocks.
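To make the "modular and composable building blocks" idea concrete, here is a minimal sketch in plain Python. It is not TorchMultimodal's actual API; the encoder and fusion functions are hypothetical stand-ins that illustrate late fusion, where each modality is encoded separately and the features are then combined by a fusion layer.

```python
# Hypothetical sketch of composable multimodal building blocks
# (stand-in functions, NOT TorchMultimodal's real API).

def encode_image(pixels):
    # Stand-in image encoder: reduce raw pixel values to a 2-d feature.
    return [sum(pixels) / len(pixels), max(pixels)]

def encode_text(token_ids):
    # Stand-in text encoder: token count and mean token id as features.
    return [len(token_ids), sum(token_ids) / len(token_ids)]

def concat_fusion(*features):
    # Fusion layer: late fusion by concatenating per-modality features.
    fused = []
    for feat in features:
        fused.extend(feat)
    return fused

# Compose the blocks: encode each modality, then fuse.
image_feat = encode_image([0.1, 0.5, 0.9])
text_feat = encode_text([7, 3, 2])
fused = concat_fusion(image_feat, text_feat)
print(len(fused))  # 4-dimensional fused representation
```

Because each block has a narrow interface (features in, features out), encoders and fusion layers can be swapped independently, which is the composability the library description refers to.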