DeepSpeed-FastGen

Developing Rapidly with Generative AI

turboderp. GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

sgl-project. GitHub - sgl-project/sglang: SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Ben Auffarth. Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs