GitHub - unslothai/unsloth: 5X faster, 50% less memory LLM finetuning
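Unsloth's speed and memory claims come from fused kernels plus 4-bit QLoRA-style fine-tuning. A rough sketch of the setup, based on its documented FastLanguageModel API (the checkpoint name and LoRA hyperparameters here are illustrative, and a CUDA GPU is assumed):

```python
from unsloth import FastLanguageModel

# Load a pre-quantized 4-bit base model; the exact repo name is an assumption.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small matrices are trained, which is where
# most of the memory savings over full fine-tuning come from.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```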
Ollama
ollama.com
slowllama
Fine-tune Llama2 and CodeLLama models, including 70B/35B, on Apple M1/M2 devices (for example, MacBook Air or Mac Mini) or consumer nVidia GPUs.
slowllama does not use any quantization. Instead, it offloads parts of the model to SSD or main memory on both the forward and backward passes. In contrast with training large models from scratch (unattainable…)
okuvshynov • GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
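The offloading idea above can be illustrated with a toy PyTorch loop (this is not slowllama's actual code, and it shows only the forward pass for brevity): stream one transformer block at a time from disk, run it, then free it, so the full model never has to fit in RAM or VRAM at once.

```python
import torch
import torch.nn as nn

device = "mps" if torch.backends.mps.is_available() else "cpu"

# Pretend each block was saved to its own file by an earlier preprocessing step.
num_blocks, dim = 4, 64
for i in range(num_blocks):
    block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    torch.save(block.state_dict(), f"block_{i}.pt")

x = torch.randn(1, 8, dim)                # activations flow block to block
for i in range(num_blocks):
    block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    block.load_state_dict(torch.load(f"block_{i}.pt"))   # pull weights from SSD
    block = block.to(device)
    x = block(x.to(device)).cpu()         # keep activations on CPU between blocks
    del block                             # free device memory before the next block
print(x.shape)
```

slowllama applies the same streaming trick on the backward pass as well, which is why it can fine-tune 70B-class models on a single consumer machine, at the cost of training speed.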
Deep-ML
deep-ml.com
ExLlamaV2
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
Overview of differences compared to V1
- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quantization format (EXL2)
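A minimal usage sketch, loosely following the example scripts in the ExLlamaV2 repository; class and method names are taken from those examples and may differ between library versions, and the model directory is a placeholder for an EXL2-quantized checkpoint:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quantized-model"   # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)               # split layers across available GPU memory

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate up to 64 new tokens from a prompt.
print(generator.generate_simple("Local LLMs are", settings, 64))
```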