GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.

GitHub - arthur-ai/bench: A tool for evaluating LLMs

researchgate.net (page title not captured)

GitHub - confident-ai/deepeval: The LLM Evaluation Framework
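
To make the first link concrete, here is a minimal sketch of a few-shot run with lm-evaluation-harness via its Python entry point, `lm_eval.simple_evaluate`. The model, task, shot count, and batch size below are illustrative placeholders, not values taken from these links.

```python
# Minimal sketch: few-shot evaluation with lm-evaluation-harness.
# Assumes `pip install lm-eval`; model and task names are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # placeholder model
    tasks=["hellaswag"],           # placeholder task
    num_fewshot=5,                 # the few-shot setting
    batch_size=8,
)
print(results["results"])          # per-task metric dictionary
```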
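
And for the last link, a minimal sketch of a single test case in deepeval, assuming its `LLMTestCase` and `evaluate` API; the input/output strings and the 0.7 threshold are assumptions chosen for illustration.

```python
# Minimal sketch: one metric-based test case in confident-ai/deepeval.
# Assumes `pip install deepeval` and a judge-model API key configured;
# the strings and threshold below are illustrative assumptions.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does lm-evaluation-harness do?",
    actual_output="It runs few-shot benchmark tasks against language models.",
)
metric = AnswerRelevancyMetric(threshold=0.7)  # pass if relevancy >= 0.7
evaluate(test_cases=[test_case], metrics=[metric])
```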