StreamingLLM enables Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling on streams of 4 million tokens and more.
mit-han-lab • GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
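The mechanism behind this claim is the attention-sink KV cache named in the repository title: key/value entries for the first few tokens are kept permanently, while all later tokens live in a fixed-size rolling window, so the cache never grows with stream length. A minimal Python sketch of that eviction policy, assuming a hypothetical per-token kv object (this is not the mit-han-lab API):

from collections import deque

class AttentionSinkCache:
    # Keep the first n_sink tokens' KV entries forever ("attention sinks"),
    # plus a rolling window of the most recent tokens.
    def __init__(self, n_sink=4, window=1024):
        self.n_sink = n_sink
        self.sink = []                       # never evicted
        self.recent = deque(maxlen=window)   # oldest entries drop off automatically

    def append(self, kv):
        if len(self.sink) < self.n_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)

    def kv_for_attention(self):
        # Each decoding step attends over sinks + recent window,
        # so memory and per-token compute stay bounded.
        return self.sink + list(self.recent)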
two sorted sets
Markus Winand • Improving join-performance of SQL databases
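The fragment presumably points at the article's discussion of the sort-merge join, which combines two inputs pre-sorted on the join key so that each side is scanned only once. A rough Python sketch, with hypothetical left and right inputs given as (key, row) lists sorted by key:

def merge_join(left, right):
    # Yield matching (left_row, right_row) pairs from two key-sorted lists.
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lrow = left[i]
        rk = right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of the equal-key groups on both sides.
            j_start = j
            while j < len(right) and right[j][0] == lk:
                yield (lrow, right[j][1])
                j += 1
            i += 1
            if i < len(left) and left[i][0] == lk:
                j = j_start  # re-scan the right group for a duplicate left key

print(list(merge_join([(1, "a"), (2, "b")], [(1, "x"), (2, "y"), (3, "z")])))
# [('a', 'x'), ('b', 'y')]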
In streaming settings, StreamingLLM achieves up to a 22.2x speedup over the sliding window recomputation baseline.
mit-han-lab • GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
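The baseline being compared against re-encodes the entire attention window for every newly generated token, while a persistent KV cache with eviction only runs a forward pass over the single newest token. A back-of-the-envelope Python sketch of why that gap is large (an illustration of the asymptotics, not a reproduction of the reported 22.2x number, which depends on model, hardware, and window size):

def tokens_processed(stream_len, window=1024, recompute=True):
    # Recomputation: every step re-runs attention over min(step, window) tokens.
    # Cached KV: every step processes only the one new token.
    if recompute:
        return sum(min(t, window) for t in range(1, stream_len + 1))
    return stream_len

n = 100_000
print(tokens_processed(n) / tokens_processed(n, recompute=False))  # roughly window-sized ratio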
