arXiv:2405.02048v1 [cs.IR] 3 May 2024
In streaming settings, StreamingLLM achieves up to a 22.2× speedup over the sliding-window-with-recomputation baseline.
GitHub: mit-han-lab/streaming-llm, "Efficient Streaming Language Models with Attention Sinks"
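The attention-sink idea named in the repository title can be illustrated with a small cache-eviction sketch. It assumes a policy that always keeps the key/value states of the first few tokens (the "sinks") plus a rolling window of the most recent tokens, and evicts everything in between. Because each new token then reuses cached states instead of re-encoding the whole window, this plausibly accounts for the speedup over sliding-window recomputation. The helper name `sink_cache_positions` and the defaults `n_sink=4` and `window=8` are hypothetical choices for illustration, not the repository's API.

```python
from typing import List


def sink_cache_positions(seq_len: int, n_sink: int = 4, window: int = 8) -> List[int]:
    """Return the token positions whose KV states stay cached after seq_len tokens.

    Hypothetical helper (not the streaming-llm API): keeps the first
    `n_sink` tokens as attention sinks plus the `window` most recent
    tokens; every position in between is evicted.
    """
    # While the stream is short enough, nothing needs to be evicted.
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    # Attention sinks: the initial tokens are always retained.
    sinks = list(range(n_sink))
    # Rolling window: only the most recent `window` tokens are retained.
    recent = list(range(seq_len - window, seq_len))
    return sinks + recent


if __name__ == "__main__":
    # After 20 tokens: sinks 0-3 are kept, plus the recent tokens 12-19.
    print(sink_cache_positions(20))
```

Under these assumptions the cache size stays bounded at `n_sink + window` entries no matter how long the stream grows, which is what makes the approach suitable for streaming.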
Gorilla (gorilla.cs.berkeley.edu)