GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
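
The title names the core idea. In the StreamingLLM paper, the key/value entries of the first few tokens (the "attention sinks") are kept permanently, along with a sliding window of the most recent tokens. Everything in between is evicted, and positional indices are assigned by slot within the cache rather than by position in the original text. Below is a minimal, hypothetical sketch of that eviction policy. Strings stand in for KV tensors, and `SinkWindowCache`, `num_sinks`, and `window` are illustrative names, not the repository's API.

```python
from collections import deque
from typing import Deque, List, Tuple


class SinkWindowCache:
    """Hypothetical toy KV cache: pins the first `num_sinks` entries as
    attention sinks and keeps a rolling window of the `window` most recent
    entries (StreamingLLM-style eviction sketch, not the repo's code)."""

    def __init__(self, num_sinks: int, window: int) -> None:
        self.num_sinks = num_sinks
        self.sinks: List[Tuple[int, str]] = []
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.recent: Deque[Tuple[int, str]] = deque(maxlen=window)

    def append(self, pos: int, kv: str) -> None:
        if len(self.sinks) < self.num_sinks:
            # The very first tokens are never evicted.
            self.sinks.append((pos, kv))
        else:
            # Middle tokens fall out of the window as new ones arrive.
            self.recent.append((pos, kv))

    def entries(self) -> List[Tuple[int, str]]:
        return self.sinks + list(self.recent)

    def cache_positions(self) -> List[int]:
        # Positions are assigned by slot within the cache,
        # not by the token's index in the original stream.
        return list(range(len(self.entries())))
```

For example, streaming ten tokens through `SinkWindowCache(num_sinks=2, window=3)` leaves the tokens originally at positions 0, 1, 7, 8, and 9, which are then re-indexed as cache positions 0 through 4.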

txtai

neuml.github.io

Langfuse is an open source observability & analytics solution for LLM-based applications. It is mostly geared towards production usage but some users also use it for local development of their LLM applications.

Langfuse is focused on applications built on top of LLMs. Many new abstractions and common best practices have evolved recently, e.g. agents, ...

langfuse • GitHub - langfuse/langfuse: Open source observability and analytics for LLM applications

slowllama

Fine-tune Llama2 and CodeLlama models, including 70B/35B, on Apple M1/M2 devices (for example, MacBook Air or Mac Mini) or on consumer NVIDIA GPUs.

slowllama does not use any quantization. Instead, it offloads parts of the model to SSD or main memory during both the forward and backward passes. In contrast with training large models from scratch (unattainable...

okuvshynov • GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
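
The offloading idea can be illustrated with a toy model. slowllama itself works on real transformer layers. The sketch below is a hypothetical illustration of the same principle, not slowllama's code. A chain of scalar "layers" `y = w_n * ... * w_1 * x` keeps every weight on disk. Each weight is loaded one at a time during the forward pass, then loaded again during the backward pass, where its gradient is computed and the updated value is written back. `OffloadedChain`, the JSON file layout, and the plain SGD update are assumptions made for the illustration.

```python
import json
import os
import tempfile
from typing import List


class OffloadedChain:
    """Hypothetical toy model whose per-layer weights live on disk;
    only one weight is held in memory at a time."""

    def __init__(self, weights: List[float], directory: str) -> None:
        self.dir = directory
        self.n = len(weights)
        for i, w in enumerate(weights):
            self._save(i, w)

    def _path(self, i: int) -> str:
        return os.path.join(self.dir, f"layer_{i}.json")

    def _save(self, i: int, w: float) -> None:
        with open(self._path(i), "w") as f:
            json.dump(w, f)

    def _load(self, i: int) -> float:
        with open(self._path(i)) as f:
            return json.load(f)

    def forward(self, x: float) -> List[float]:
        # Load each weight from disk, apply it, and keep the activations
        # (acts[i] is layer i's input, acts[i + 1] its output).
        acts = [x]
        for i in range(self.n):
            w = self._load(i)
            acts.append(w * acts[-1])
        return acts

    def backward_and_step(self, acts: List[float], target: float, lr: float) -> float:
        # Squared-error loss; walk the layers in reverse, reloading each weight.
        y = acts[-1]
        loss = (y - target) ** 2
        grad_out = 2.0 * (y - target)       # dL/dy
        for i in reversed(range(self.n)):
            w = self._load(i)
            grad_w = grad_out * acts[i]     # dL/dw_i = dL/d(out_i) * in_i
            grad_out = grad_out * w         # dL/d(in_i), using the old weight
            self._save(i, w - lr * grad_w)  # SGD update persisted to disk
        return loss


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        model = OffloadedChain([0.5, 0.5], d)
        for step in range(3):
            print(step, model.backward_and_step(model.forward(1.0), target=1.0, lr=0.1))
```

The trade-off in this sketch is the same one the description implies. Peak memory holds only one layer's weights plus the activations, at the cost of extra reads and writes on every pass.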

[1hr Talk] Intro to Large Language Models

youtube.com