llm-evaluator

LLM-as-a-Judge evaluator via Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness using GPT-5-nano as judge. Supports single trace scoring, batch backfill, and test mode. Integrates with Langfuse dashboard for observability. Triggers: evaluate trace, score quality, check accuracy, backfill scores, test evaluator, LLM judge.

View Source SKILL.md

Stars

Installs

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词，发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店，若未安装，请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店，然后安装 llm-evaluator 技能。若已安装，则直接安装 llm-evaluator 技能。

Overview

Skill Key: aiwithabidi/llm-evaluator-pro
Author: aiwithabidi
Source Repo: openclaw/skills
Version: 1.0.0
Source Path: skills/aiwithabidi/llm-evaluator-pro
Latest Commit SHA: b949d889520dfc2053605ef1f4c14304a9c575cd

Extracted Content

SKILL.md excerpt

# LLM Evaluator ⚖️

LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.

## When to Use

- Evaluating quality of search results or AI responses
- Scoring traces for relevance, accuracy, hallucination detection
- Batch scoring recent unscored traces
- Quality assurance on agent outputs

## Usage

```bash
# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test

# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>

# Score with specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance

# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
```

## Evaluators

| Evaluator | Measures | Scale |
|-----------|----------|-------|
| relevance | Response relevance to query | 0–1 |
| accuracy | Factual correctness | 0–1 |
| hallucination | Made-up information detection | 0–1 |
| helpfulness | Overall usefulness | 0–1 |

## Credits

Built by [M. Abidi](https://www.linkedin.com/in/mohammad-ali-abidi) | [agxntsix.ai](https://www.agxntsix.ai)
[YouTube](https://youtube.com/@aiwithabidi) | [GitHub](https://github.com/aiwithabidi)
Part of the **AgxntSix Skill Suite** for OpenClaw agents.

📅 **Need help setting up OpenClaw for your business?** [Book a free consultation](https://cal.com/agxntsix/abidi-openclaw)

Related Claw Skills

youmind-openlab

nano-banana-pro-prompts-recommend-skill

★ 1,218

AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart search by use case, content remix, sample images.

23blocks-os

ai-maestro

★ 525

AI Agent Orchestrator with Skills System - Give AI Agents superpowers: memory search, code graph queries, agent-to-agent messaging. Manage Claude, Codex or any AI Agent from one dashboard. Move Agents between computers and locations

hashgraph-online

registry-broker-skills

★ 118

AI agent skills for the Universal Registry - search, chat, and register 72,000+ agents across 14+ protocols. Works with Claude, Codex, Cursor, OpenClaw, and any AI assistant.

rito-w

skills-manager

★ 90

A cross-platform skills manager for AI IDEs. Search marketplace, download locally, and install to Claude, Cursor, Windsurf, and more with one click.

besoeasy

open-skills

★ 87

Battle-tested skill library for AI agents. Save 98% of API costs with ready-to-use code for crypto, PDFs, search, web scraping & more. No trial-and-error, no expensive APIs.

zeropointrepo

youtube-skills

★ 73

YouTube Transcript API skills for AI agents. Get transcripts, search videos, browse channels. Works with OpenClaw, ClawdBot, Claude Code, Cursor, Windsurf.