agent-evaluation

maintained by sickn33

7.5k stars · 1.6k forks · MIT License

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a setting where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
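As an illustration of what a reliability metric can look like, here is a minimal sketch that runs each test case several times and reports a per-case pass rate. It is not taken from the skill itself: the names `evaluate_reliability`, `pass_rate`, and `echo_agent` are hypothetical, and the agent is assumed to be a plain function from prompt string to output string.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an agent is any callable mapping a prompt to a text response.
Agent = Callable[[str], str]
# Assumption: a case pairs a prompt with a check on the agent's output.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of trials that passed; 0.0 when there are no trials."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def evaluate_reliability(agent: Agent, cases: Dict[str, Case], trials: int = 5) -> Dict[str, dict]:
    """Run every case `trials` times and summarize how often it passes."""
    report = {}
    for name, (prompt, check) in cases.items():
        outcomes = [check(agent(prompt)) for _ in range(trials)]
        report[name] = {
            "pass_rate": pass_rate(outcomes),
            # A case counts as reliable only if every trial passed.
            "all_passed": bool(outcomes) and all(outcomes),
        }
    return report


def echo_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call."""
    return prompt.upper()


cases = {
    "uppercases": ("hello", lambda out: out == "HELLO"),
    "adds_punctuation": ("hello", lambda out: out.endswith("!")),
}
report = evaluate_reliability(echo_agent, cases, trials=3)
```

Repeating each case matters because agent output is usually non-deterministic; a single passing run says little about how often the behavior holds.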

Key Features

Comprehensive skill evaluation and performance tracking
Community-driven ratings and reviews
Easy integration with Claude Code
Regular updates and maintenance

Quick Start

TopRank Skills install sickn33/agent-evaluation

Skill Details

GitHub Stars 7.5k

GitHub Forks 1.6k

Created Jan 2026

Last Updated 4 months ago

Tags: tools, llm, ai

Related Skills

ai-sdk (vercel), 22.3k stars
planning-with-files (OthmanAdi), 13.5k stars
ui-skills (baptisteArno), 9.7k stars
biomni (K-Dense-AI), 8.6k stars
building-agents (adenhq), 8.6k stars
