r/LocalLLaMA (Reddit)

turboderp/exllamav2 (GitHub): A fast inference library for running LLMs locally on modern consumer-class GPUs

Moyi: 10 Ways To Run LLMs Locally And Which One Works Best For You

okuvshynov/slowllama (GitHub): Finetune llama2-70b and codellama on MacBook Air without quantization

Ben Auffarth: Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs