GitHub - predibase/lorax: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
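For context on what "multi-LoRA" means in practice, here is a minimal sketch of a request against a LoRAX-style `/generate` endpoint, where each request can select a different fine-tuned adapter. The URL, payload fields, and adapter name are assumptions for illustration, not taken from the repo's docs:

```python
from typing import Optional

import requests

# Hypothetical LoRAX-style deployment; host/port and adapter name are placeholders.
LORAX_URL = "http://localhost:8080/generate"

def generate(prompt: str, adapter_id: Optional[str] = None) -> str:
    """Send one generation request, optionally routed to a specific LoRA adapter."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    if adapter_id is not None:
        # Picking a different adapter per request is what lets a single server
        # multiplex thousands of fine-tuned variants of one base model.
        payload["parameters"]["adapter_id"] = adapter_id
    resp = requests.post(LORAX_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["generated_text"]

if __name__ == "__main__":
    print(generate("Summarize LoRA in one sentence.", adapter_id="my-org/summarizer-lora"))
```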
Ollama
ollama.com
MLServer aims to provide an easy way to start serving your machine learning models through a REST and gRPC interface, fully compliant with KFServing's V2 Dataplane spec.
- Multi-model serving, letting users run multiple models within the same process.
- Ability to run inference in parallel for vertical scaling across multiple models through a pool of inference workers.
GitHub - SeldonIO/MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
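The features above come together in MLServer's custom-runtime API: subclass `MLModel`, implement async `load` and `predict`, and the server exposes the model over REST and gRPC. A minimal sketch, assuming the standard `MLModel` plus `NumpyCodec` interface; `SumModel` and the output name `total` are placeholders:

```python
import numpy as np

from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse

class SumModel(MLModel):
    """Toy custom runtime: returns the per-row sum of the first input tensor."""

    async def load(self) -> bool:
        # Real runtimes would load weights or other artifacts here.
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        x = NumpyCodec.decode_input(payload.inputs[0])
        total = np.sum(x, axis=-1, keepdims=True)
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="total", payload=total)],
        )
```

Pointing a `model-settings.json` at this class and running `mlserver start .` serves it over both protocols; registering several such models side by side is what the multi-model serving bullet refers to.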
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. EasyLM can scale up LLM training to hundreds of TPU/GPU accelerators by leveraging JAX's pjit functionality.
Building on top of Hugging Face's transformers and datasets, this repo provides an easy to use and easy...
young-geng • GitHub - young-geng/EasyLM: Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
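The scaling claim hinges on JAX's pjit, which in recent JAX releases is folded into `jax.jit` plus explicit shardings: you declare how arrays are partitioned over a device mesh and the compiler handles the cross-device communication. A minimal, generic sketch (not EasyLM code; the mesh axis name and shapes are arbitrary):

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh over all visible accelerators; the axis name "data" is arbitrary.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

@jax.jit
def matmul(x, w):
    return jnp.dot(x, w)

x = jnp.ones((8, 512))   # assumes the device count divides the batch size
w = jnp.ones((512, 512))

# Shard the batch dimension of x across the mesh; replicate w on every device.
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))
w = jax.device_put(w, NamedSharding(mesh, P(None, None)))

y = matmul(x, w)          # XLA inserts any needed cross-device communication
print(y.shape, y.sharding)
```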
Welcome to RADAR
radardao.xyz