GitHub - predibase/lorax: Multi-LoRA inference server that s...

GitHub - predibase/lorax: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

RelatedHighlights

TL;DR

LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.

LLMLingua: Compressing Prompts for Accelerated Inference of La

microsoft • GitHub - microsoft/LLMLingua: To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

GPT4All: An ecosystem of open-source on-edge large language models.

Important

GPT4All v2.5.0 and newer only supports models in GGUF format (.gguf). Models used with a previous version of GPT4All (.bin extension) will no longer work.

GPT4All is an ecosystem to run powerful and customized large language models that work locally on consumer grade CPUs a... See more

nomic-ai • GitHub - nomic-ai/gpt4all: gpt4all: open-source LLM chatbots that you can run anywhere

Luminal is a deep learning library that uses composable compilers to achieve high performance.

Current ML libraries tend to be large and complex because they try to map high level operations directly on to low level handwritten kernels, and focus on eager execution. Libraries like PyTorch contain hundreds of thousands of lines of code, making it ne... See more

r/MachineLearning - Reddit

Llama | Governance for Onchain Organizations

llama.xyz