AgentBench: Evaluating LLMs as Agents

Evaluating Large Language Models (LLMs) as agents in interactive environments, highlighting the performance gap between API-based and open-source models, and introducing the AgentBench benchmark.

arxiv.org

Saved by Darren LI


LLM Powered Autonomous Agents

Lilian Weng lilianweng.github.io

An agent can be thought of as a logical wrapper around an LLM, allowing us to add several features to our AI systems, primarily:

Tool usage, such as calling APIs for information or executing code

Internal thoughts over multiple generation steps

Ability to use various tools and reasoning steps to answer more complex queries.

Parallel agents can go and complete

James Briggs • LLMs Are Not All You Need | Pinecone
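The highlight above describes an agent as a wrapper around an LLM that adds tool calls and internal thoughts across several generation steps. The Python sketch below shows that loop. It rests on several assumptions. `fake_llm` is a hypothetical stub standing in for a real model. `calculator` is a hypothetical toy tool. The JSON step format (`thought`, `action`, `input`, `final_answer`) is invented for illustration and does not come from the Pinecone article or any vendor API.

```python
import json
from typing import Callable

# Hypothetical tool; a real agent might call a web API or a code sandbox instead.
def calculator(expression: str) -> str:
    # Restrict input to digits and basic arithmetic before evaluating it.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported characters in expression")
    return str(eval(expression))

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (assumption, not a vendor API).
    It requests a tool call first, then answers once an observation exists."""
    if "Observation:" not in prompt:
        return json.dumps({"thought": "I should compute this.",
                           "action": "calculator", "input": "17 * 23"})
    observation = prompt.rsplit("Observation:", 1)[1].strip()
    return json.dumps({"thought": "I have the result.", "final_answer": observation})

def run_agent(question: str, llm: Callable[[str], str] = fake_llm,
              max_steps: int = 5) -> str:
    # The scratchpad carries thoughts, actions, and observations between steps.
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = json.loads(llm(scratchpad))
        scratchpad += f"Thought: {step['thought']}\n"
        if "final_answer" in step:
            return step["final_answer"]
        # Dispatch the requested tool and feed its result back as an observation.
        result = TOOLS[step["action"]](step["input"])
        scratchpad += f"Action: {step['action']}({step['input']})\nObservation: {result}\n"
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What is 17 times 23?"))  # prints 391
```

The `max_steps` cap is one simple guard that keeps a multi-step agent from looping forever. Swapping `fake_llm` for a real model call would also require parsing and validating the model's output, which this sketch does not do.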

What We Learned From a Year of Building With LLMs

Bryan Bischof oreilly.com

A Survey on Large Language Model based Autonomous Agents

A comprehensive survey on large language model (LLM)-based autonomous agents, including their architecture design, application domains, and evaluation strategies.

arxiv.org