GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization

okuvshynov github.com

Related Highlights

Ollama

ollama.com

2-5x faster 50% less memory local LLM finetuning

Manual autograd engine - hand derived backprop steps.

2x to 5x faster than QLoRA. 50% less memory usage.

All kernels written in OpenAI's Triton language.

0% loss in accuracy - no approximation methods - all exact.

No change of hardware necessary. Supports NVIDIA GPUs from 2018 onward. Minimum CUDA Compute Cap

unslothai • GitHub - unslothai/unsloth: 5X faster 50% less memory LLM finetuning

ExLlamaV2

ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.

Overview of differences compared to V1

Faster, better kernels

Cleaner and more versatile codebase

Support for a new quant format (see below)

turboderp • GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

Mistral-finetune

mistral-finetune is a light-weight codebase that enables memory-efficient and performant finetuning of Mistral's models. It is based on LoRA, a training paradigm where most weights are frozen and only 1-2% additional weights in the form of low-rank matrix perturbations are trained.
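The LoRA idea described above can be illustrated with a minimal sketch in plain Python. This is not mistral-finetune's code: the helper names (matvec, lora_forward, trainable_fraction), the 4096x4096 layer size and the rank of 32 are assumptions chosen only for illustration. The frozen weight W is left untouched, and the trainable part is the low-rank pair A (rank x d_in) and B (d_out x rank), whose product is added to the output.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x).

    W is the frozen pretrained weight; only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def trainable_fraction(d_out, d_in, rank):
    """Size of A and B relative to the frozen d_out x d_in weight."""
    return rank * (d_in + d_out) / (d_out * d_in)


# Hypothetical toy sizes, small enough to run instantly.
random.seed(0)
d_out, d_in, rank = 8, 8, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
# B starts at zero, so the adapted layer initially matches the frozen one.
B = [[0.0] * rank for _ in range(d_out)]
x = [1.0] * d_in

y = lora_forward(W, A, B, x)
print(y == matvec(W, x))                   # True: zero-initialised B adds nothing
print(trainable_fraction(4096, 4096, 32))  # 0.015625, i.e. about 1.6%
```

With these assumed sizes the adapter adds roughly 1.6% extra weights for that one matrix, which falls in the 1-2% range quoted above. The exact share in a real run depends on the chosen rank and on which layers receive adapters.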

For maximum efficiency it is recommended to use a A...

GitHub - mistralai/mistral-finetune