Overview
- Skill Key: globalcaos/memory-bench-pioneer
- Author: globalcaos
- Source Repo: openclaw/skills
- Version: -
- Source Path: skills/globalcaos/memory-bench-pioneer
- Latest Commit SHA: 56b71efac48fd8f23a169a5284fa6d2fe10a29d4
Be one of the first to benchmark your agent's memory — and help shape how AI remembers. Runs a peer-review-grade evaluation suite (LLM-as-judge, nDCG/MAP/MRR with 95% CIs, ablation studies) against your live memory system and submits anonymized results to the ENGRAM/CORTEX research papers. Your data stays private; only aggregate stats leave. Works with agent-memory-ultimate. For the bold few who believe AI memory should be measured, not guessed at.
Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC
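Copy the prompt below and send it to your AI assistant to complete the installation.
First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the memory-bench-pioneer skill. If it is already installed, install the memory-bench-pioneer skill directly.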
# Memory Bench

Collect, assess, and submit anonymized memory system statistics for the ENGRAM and CORTEX research papers.

## Three-Step Pipeline

### 1. Assess Retrieval Quality

Run the standard test set (30 queries across 4 types × 3 difficulty levels) with LLM-as-judge:

```bash
# Full assessment with GPT-4o-mini judge + ablation (recommended)
python3 scripts/rate.py --queries 30 --judge openai --ablation

# Without OpenAI key: local embedding judge (weaker, marked in output)
python3 scripts/rate.py --queries 30 --judge local --ablation

# Custom test set
python3 scripts/rate.py --testset path/to/queries.json --judge openai
```

**What it measures:**

- **RAR** (Recall Accuracy Ratio), **MRR** (Mean Reciprocal Rank)
- **nDCG@5**, **MAP@5**, **Precision@5**, **Hit Rate**
- All metrics include **95% bootstrap confidence intervals**
- **Ablation**: runs with AND without spreading activation to isolate its contribution

**Judge methods:**

- `openai`: GPT-4o-mini rates each (query, result) pair 1-5. Independent from retrieval system. ~$0.01 per run.
- `local`: Embedding cosine similarity. Weaker, marked as such in output. Zero cost.

**Standard test set** (`scripts/testset.json`): 30 queries stratified across semantic/episodic/procedural/strategic types and easy/medium/hard difficulty. No lexical overlap with stored memories. All deployments run the same queries for cross-site comparability.

### 2. Collect Statistics

```bash
python3 scripts/collect.py --contributor GITHUB_USER --days 14 --output /tmp/memory-bench-report.json
```

**Collected (anonymized):** Memory counts/types/ages, strength/importance histograms, association graph size, hierarchy levels, consolidation history, retrieval metrics (RAR/MRR/nDCG/MAP with CIs), ablation results, judge method, algorithm version, embedding coverage. Instance ID is a random UUID (not reversible).

**Never collected:** Memory content, queries, file paths, usernames, hostnames.

### 3. Submit as PR

```bash
scripts/subm...
```
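To make the metrics in step 1 concrete, here is a minimal sketch of how MRR, nDCG@5, and a 95% percentile bootstrap confidence interval could be computed from judge ratings on the 1-5 scale. It is not the code in `scripts/rate.py`. The function names, the relevance threshold of 4, the `2^rating - 1` gain, and computing the ideal ordering from the retrieved results only are all assumptions made for illustration.

```python
import math
import random


def reciprocal_rank(ratings, threshold=4):
    """Return 1/rank of the first result rated >= threshold, else 0.0.

    The threshold of 4 is an assumed cutoff for "relevant".
    """
    for rank, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ratings, k=5):
    """nDCG@k with graded gain (2^rating - 1) and a log2 rank discount.

    Simplification: the ideal ranking is the retrieved ratings sorted
    in descending order, not the ideal over the whole memory store.
    """
    def dcg(values):
        return sum((2 ** v - 1) / math.log2(i + 2) for i, v in enumerate(values))

    ideal = dcg(sorted(ratings, reverse=True)[:k])
    return dcg(ratings[:k]) / ideal if ideal > 0 else 0.0


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical judge ratings: one list per query, in retrieval order.
ratings_per_query = [
    [5, 3, 1, 2, 1],
    [2, 4, 1, 1, 3],
    [1, 1, 2, 1, 1],
]

rr_scores = [reciprocal_rank(r) for r in ratings_per_query]
ndcg_scores = [ndcg_at_k(r) for r in ratings_per_query]

mrr = sum(rr_scores) / len(rr_scores)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
mrr_ci = bootstrap_ci(rr_scores)
mean_ndcg = sum(ndcg_scores) / len(ndcg_scores)

print(f"MRR = {mrr:.3f}, 95% CI = {mrr_ci}")
print(f"mean nDCG@5 = {mean_ndcg:.3f}")
```

With only three queries the interval is very wide. The standard test set uses 30 queries, which tightens the interval considerably, though per-query scores still vary enough that reporting the CI alongside the point estimate is worthwhile.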
youmind-openlab
AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart search by use case, content remix, sample images.
23blocks-os
AI Agent Orchestrator with Skills System - Give AI Agents superpowers: memory search, code graph queries, agent-to-agent messaging. Manage Claude, Codex or any AI Agent from one dashboard. Move Agents between computers and locations
hashgraph-online
AI agent skills for the Universal Registry - search, chat, and register 72,000+ agents across 14+ protocols. Works with Claude, Codex, Cursor, OpenClaw, and any AI assistant.
rito-w
A cross-platform skills manager for AI IDEs. Search marketplace, download locally, and install to Claude, Cursor, Windsurf, and more with one click.
besoeasy
Battle-tested skill library for AI agents. Save 98% of API costs with ready-to-use code for crypto, PDFs, search, web scraping & more. No trial-and-error, no expensive APIs.
zeropointrepo
YouTube Transcript API skills for AI agents. Get transcripts, search videos, browse channels. Works with OpenClaw, ClawdBot, Claude Code, Cursor, Windsurf.