evaluating-llms-harness
maintained by Orchestra-Research
Stars: 4.7k
Forks: 380
License: MIT License
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace models, vLLM, and API-based backends.
Key Features
- Comprehensive skill evaluation and performance tracking
- Community-driven ratings and reviews
- Easy integration with Claude Code
- Regular updates and maintenance
Quick Start
TopRank Skills install Orchestra-Research/lm-evaluation-harness
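Once the skill is installed, a benchmark run might be assembled along the lines below. This is a minimal sketch, assuming the skill wraps EleutherAI's lm-evaluation-harness and its `lm_eval` command-line tool, which this page does not state explicitly. The helper `build_lm_eval_command` is hypothetical and only builds the argument list without running anything; the model name `EleutherAI/pythia-160m` and the chosen tasks are illustrative placeholders.

```python
import shlex


def build_lm_eval_command(pretrained, tasks, num_fewshot=None,
                          batch_size="auto", output_path=None):
    """Build an argv list for the lm_eval CLI (hypothetical helper).

    Nothing is executed here; the caller decides how to run the command.
    Assumes the HuggingFace backend ("--model hf") of lm-evaluation-harness.
    """
    if not tasks:
        raise ValueError("at least one task name is required")

    argv = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),  # the CLI takes a comma-separated list
        "--batch_size", str(batch_size),
    ]
    if num_fewshot is not None:
        argv += ["--num_fewshot", str(num_fewshot)]
    if output_path is not None:
        argv += ["--output_path", output_path]
    return argv


if __name__ == "__main__":
    # Placeholder model and tasks, chosen only for illustration.
    cmd = build_lm_eval_command(
        "EleutherAI/pythia-160m",
        ["hellaswag", "gsm8k"],
        num_fewshot=5,
    )
    print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run` when the harness is installed locally. Keeping command construction separate from execution makes it easy to log the exact settings alongside reported scores.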
Skill Details
Created
Mar 2026
Last Updated
3 months ago
Tags: tools, debugging