Sublime

An inspiration engine for ideas

GitHub - arthur-ai/bench: A tool for evaluating LLMs

Testing framework for LLM Part

dstackai GitHub - dstackai/dstack: dstack is an open-source toolkit for running GPU workloads on any cloud. It works seamlessly with any cloud GPU provider. Discord: https://discord.gg/u8SmfwPpMd

GitHub - mem0ai/mem0: The memory layer for Personalized AI

GitHub - deepseek-ai/DeepSeek-R1

github.com

Carlos Data Machina #222

langgenius GitHub - langgenius/dify: An Open-Source Assistants API and GPTs alternative. Dify.AI is an LLM application development platform. It integrates the concepts of Backend as a Service and LLMOps, covering the...

AgentBench: Evaluating LLMs as Agents

Evaluates large language models (LLMs) as agents in interactive environments, highlights the performance gap between API-based and open-source models, and introduces the AgentBench benchmark.

arxiv.org