AgentBench: Evaluating LLMs as Agents

Introduces AgentBench, a benchmark for evaluating large language models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.

arxiv.org

Saved by Darren LI

LLM Powered Autonomous Agents

Lilian Weng, lilianweng.github.io

LLMs Are Not All You Need

James Briggs, Pinecone

What We Learned From a Year of Building With LLMs

Bryan Bischof, oreilly.com

A Survey on Large Language Model based Autonomous Agents

A comprehensive survey on large language model (LLM)-based autonomous agents, including their architecture design, application domains, and evaluation strategies.

arxiv.org