TopRank Skills


ml-model-eval-benchmark

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

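Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the ml-model-eval-benchmark skill. If it is already installed, install the ml-model-eval-benchmark skill directly.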

Overview

Skill Key: 0x-professor/ml-model-eval-benchmark
Author: 0x-professor
Source Repo: openclaw/skills
Version: -
Source Path: skills/0x-professor/ml-model-eval-benchmark
Latest Commit SHA: 1726463af6394f1f1a7ec414b5e0d9170ee5c6b1

Extracted Content

SKILL.md excerpt

# ML Model Eval Benchmark

## Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

## Workflow

1. Define metric weights and accepted metric ranges.
2. Ingest model metrics for each candidate.
3. Compute weighted score and ranking.
4. Export leaderboard and promotion recommendation.
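
The scoring and ranking steps above can be sketched as follows. This is a minimal illustration, not the logic of the bundled `scripts/benchmark_models.py`, which is not shown in this excerpt. The metric names, weights, ranges, and the functions `weighted_score` and `rank_models` are all hypothetical. The sketch assumes every metric is higher-is-better, rescales each value to [0, 1] using its accepted range, and breaks ties by model name so that the ordering is deterministic.

```python
from typing import Dict, List, Tuple

# Hypothetical weights and accepted ranges, chosen only for illustration.
WEIGHTS: Dict[str, float] = {"accuracy": 0.6, "f1": 0.4}
RANGES: Dict[str, Tuple[float, float]] = {"accuracy": (0.0, 1.0), "f1": (0.0, 1.0)}


def weighted_score(
    metrics: Dict[str, float],
    weights: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
) -> float:
    """Return the weighted mean of range-normalized metrics.

    Assumes every metric is higher-is-better.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        low, high = ranges[name]
        value = metrics[name]
        # Reject values outside the accepted range instead of clipping them.
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        score += weight * (value - low) / (high - low)
    return score / total_weight


def rank_models(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float]]:
    """Rank by descending score; ties are broken by model name."""
    scored = [
        (name, weighted_score(metrics, weights, ranges))
        for name, metrics in candidates.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Sorting on the pair `(-score, name)` means two candidates with identical scores always appear in the same order, regardless of the order in which they were ingested.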

## Use Bundled Resources

- Run `scripts/benchmark_models.py` to generate benchmark outputs.
- Read `references/benchmarking-guide.md` for weighting and tie-break guidance.

## Guardrails

- Keep metric names and scales consistent across candidates.
- Record weighting assumptions in the output.
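
Both guardrails can be enforced in code. The sketch below is one possible approach, not the behaviour of the bundled script: `check_consistent_metrics` and `build_report` are hypothetical helpers, and the JSON layout is an assumption. The first function fails fast when candidates report different metric names. The second embeds the weights in the exported report so the assumptions travel with the leaderboard.

```python
import json
from typing import Dict, List, Tuple


def check_consistent_metrics(candidates: Dict[str, Dict[str, float]]) -> None:
    """Raise ValueError unless all candidates report the same metric names."""
    expected = None
    # Iterate in sorted order so the error message is reproducible.
    for model in sorted(candidates):
        names = set(candidates[model])
        if expected is None:
            expected = names
        elif names != expected:
            raise ValueError(
                f"{model} reports {sorted(names)}, expected {sorted(expected)}"
            )


def build_report(
    leaderboard: List[Tuple[str, float]],
    weights: Dict[str, float],
) -> str:
    """Serialize the ranking together with the weights that produced it."""
    report = {
        "weights": weights,
        "leaderboard": [
            {"rank": position, "model": name, "score": score}
            for position, (name, score) in enumerate(leaderboard, start=1)
        ],
    }
    # sort_keys keeps the serialized output stable across runs.
    return json.dumps(report, indent=2, sort_keys=True)
```

Note that this check covers metric names only. Scale consistency (for example, accuracy as 0.93 versus 93) still needs a range check or manual review.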
