evaluating-llms-harness | Skill Performance & Reviews | TopRankSkills

evaluating-llms-harness

maintained by Orchestra-Research

4.7k stars · 380 forks · MIT License

Evaluates LLMs across 60+ academic benchmarks, including MMLU, HumanEval, GSM8K, TruthfulQA, and HellaSwag. Use it when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. An industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, and API-based models.

Key Features

  • Comprehensive skill evaluation and performance tracking
  • Community-driven ratings and reviews
  • Easy integration with Claude Code
  • Regular updates and maintenance

Quick Start

TopRank Skills install Orchestra-Research/lm-evaluation-harness
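
Once installed, a benchmark run might be assembled along the following lines. This is a minimal sketch, assuming the skill wraps EleutherAI's lm-evaluation-harness and its `lm_eval` command-line tool. The helper `build_lm_eval_command` is hypothetical (it is not part of the skill or the harness), and the model `EleutherAI/pythia-160m` and the two task names are illustrative choices only. The sketch builds the argument list without running anything, so it can be inspected before a long evaluation is launched.

```python
import shlex


def build_lm_eval_command(model, model_args, tasks, batch_size=8,
                          device=None, output_path=None):
    """Assemble an argv list for the `lm_eval` CLI without executing it.

    Hypothetical helper for illustration; it only formats arguments.
    """
    if not tasks:
        raise ValueError("at least one task name is required")
    cmd = [
        "lm_eval",
        "--model", model,  # backend name, e.g. "hf" or "vllm"
        # Backend options are passed as comma-separated key=value pairs.
        "--model_args", ",".join(f"{k}={v}" for k, v in model_args.items()),
        # Several benchmarks can be requested as one comma-separated list.
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]
    if device is not None:
        cmd += ["--device", device]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


# Illustrative values only (assumptions, not taken from the listing).
command = build_lm_eval_command(
    model="hf",
    model_args={"pretrained": "EleutherAI/pythia-160m"},
    tasks=["hellaswag", "gsm8k"],
    device="cuda:0",
    output_path="results",
)
print(shlex.join(command))
```

With the harness installed, the resulting list could be handed to `subprocess.run(command, check=True)`. Passing `model="vllm"` instead of `"hf"` would select the vLLM backend mentioned in the description, under the same assumptions.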

Skill Details

GitHub Stars: 4.7k
GitHub Forks: 380
Created: Mar 2026
Last Updated: 3 months ago
Tags: tools, debugging
