AgentBench: Evaluating LLMs as Agents

arxiv.org

Saved by Darren LI

GitHub - microsoft/TinyTroupe: LLM-powered multiagent persona simulation for imagination enhancement and business insights.

github.com

Zhaofeng Wu: Reasoning skills of large language models are often overestimated

GitHub - confident-ai/deepeval: The LLM Evaluation Framework

Testing framework for LLM Part