agent-evaluation | Skill Performance & Reviews | TopRankSkills

TopRank Skills

Home / Skills / tools / agent-evaluation

agent-evaluation

maintained by sickn33

star 7.5k account_tree 1.6k verified_user MIT License
bolt View GitHub

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

Key Features

  • Comprehensive skill evaluation and performance tracking
  • Community-driven ratings and reviews
  • Easy integration with Claude Code
  • Regular updates and maintenance

Quick Start

TopRank Skills install sickn33/agent-evaluation

chat Comments (0)

chat_bubble_outline

No comments yet. Be the first to share your thoughts!

Skill Details

GitHub Stars 7.5k
GitHub Forks 1.6k
Created Jan 2026
Last Updated 4个月前
tools tools llm ai

Related Skills

ai-sdk

ai-sdk

vercel
star 22.3k
chevron_right
planning-with-files
chevron_right
ui-skills
chevron_right
biomni
chevron_right
building-agents
chevron_right

Build your own?

Join 12,000+ developers contributing to the Claude ecosystem.