AgentBench: Evaluating LLMs as Agents

Evaluating Large Language Models (LLMs) as agents in interactive environments, highlighting the performance gap between API-based and open-source models, and introducing the AgentBench benchmark.

arxiv.org

GitHub - arthur-ai/bench: A tool for evaluating LLMs

Embracing Agentic AI: Empowering Artificial Intelligence as Active Agents

Eva Kaushik · medium.com

GitHub - BrunoScaglione/langtest: Deliver safe & effective language models

What I learned from looking at 900 most popular open source AI tools

Chip Huyen · huyenchip.com