domain-retrospective

maintained by Erland366

License: MIT

---
name: domain-retrospective
description: >
  Turn experiment reports and development notes into summaries and reusable skills.
  Adapts behavior based on project domain (research, unsloth, cuda) by reading registry.json.
  Triggers on <retrospective> or requests for lessons learned.
metadata:
  short-description: "Summarize findings and distill them into skills"
  tags:
    - documentation
    - retrospective
    - knowledge-capture
---

Skill: domain-retrospective

When to use

Use this skill when:

  • The user message starts with <retrospective>, or
  • The user requests a summary or lessons-learned across experiments/development.

Initialization

  1. Read .codex/skills/registry.json to determine:

    • domain: research | unsloth | cuda
    • paths.reports: where to find experiment/benchmark reports
    • paths.experiment_log: path to experiment log
    • paths.troubleshooting: path to troubleshooting guide
    • paths.templates: path to templates directory
  2. Adapt behavior based on domain (see Domain-Specific Behavior below).
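
A minimal sketch of the initialization step, assuming registry.json is a JSON object with a `domain` string and a `paths` object holding the four keys listed above. The schema is not spelled out here, so the function name `load_registry` and its validation rules are illustrative assumptions rather than part of the skill.

```python
import json
from pathlib import Path

# Assumed values, taken from the domain list above.
KNOWN_DOMAINS = {"research", "unsloth", "cuda"}
# Assumed key names under "paths", mirroring the bullets above.
REQUIRED_PATHS = ("reports", "experiment_log", "troubleshooting", "templates")


def load_registry(registry_file=".codex/skills/registry.json"):
    """Load the registry and check the fields this skill relies on."""
    data = json.loads(Path(registry_file).read_text(encoding="utf-8"))

    domain = data.get("domain")
    if domain not in KNOWN_DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")

    paths = data.get("paths", {})
    missing = [key for key in REQUIRED_PATHS if key not in paths]
    if missing:
        raise ValueError(f"registry is missing paths: {missing}")

    # Return Path objects so later steps can join and glob them directly.
    return domain, {key: Path(paths[key]) for key in REQUIRED_PATHS}
```

Failing early on an unknown domain or a missing path keeps the later steps from silently reading the wrong directory.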

Behavior

  1. Select inputs

    • Use the user's description to identify relevant:
      • Reports from paths.reports directory
      • Sections of paths.experiment_log
    • If ambiguous, list candidate reports and ask the user to choose.
  2. Summarize findings

    • For each report, extract:
      • Setup and configuration
      • Key parameters/settings
      • Metrics and results
      • What worked (successes)
      • What failed (with reasons)
    • Write a markdown summary with:
      • "What we tried"
      • "Key findings"
      • "What failed"
      • "Open questions"
  3. Update troubleshooting (if needed)

    • If experiments reveal new error patterns and fixes:
      • Propose new entries for paths.troubleshooting
      • Use the template at templates/references/troubleshooting-entry-template.md
      • Ask user for confirmation before editing.
  4. Propose or update result skills

    • Decide which result skills should capture these findings.
    • For each skill:
      • If new: start from templates/skills/result-skill-template.md
      • If existing: identify which sections to update
    • Draft SKILL.md content including:
      • General description and context
      • When to apply this knowledge
      • Results summary with concrete numbers
      • Recommended practice
      • Failure modes to avoid
    • Use domain-appropriate terminology and focus areas.
  5. Ask before writing

    • Present the proposed skill changes.
    • Only create or modify files under .codex/skills/ with user approval.
  6. Log the retrospective

    • Append a summarized entry to paths.experiment_log
    • Example: "2025-01-12 – Retrospective on LoRA rank experiments"
    • Include a short "General description" line for context.
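
Steps 2 and 6 above could be sketched roughly as follows. The helper names `render_summary` and `log_entry`, and the choice to skip empty sections, are assumptions made for illustration; only the four section headings and the log line format come from the steps above.

```python
from datetime import date

# Section headings taken from step 2 above, in the order listed there.
SECTIONS = ("What we tried", "Key findings", "What failed", "Open questions")


def render_summary(title, notes):
    """Render a markdown retrospective; `notes` maps heading -> list of bullets."""
    lines = [f"## Retrospective: {title}", ""]
    for heading in SECTIONS:
        bullets = notes.get(heading, [])
        if not bullets:
            continue  # skip empty sections rather than emit a bare heading
        lines.append(f"### {heading}")
        lines.extend(f"- {item}" for item in bullets)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def log_entry(topic, description, when=None):
    """Format the entry appended to the experiment log in step 6."""
    when = when or date.today()
    return (
        f"{when.isoformat()} – Retrospective on {topic}\n"
        f"General description: {description}\n"
    )
```

Both functions only build strings; writing them to paths.experiment_log or anywhere under .codex/skills/ would still wait for user approval, as step 5 requires.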

Domain-Specific Behavior

Research Domain

When domain: research:

What to extract from reports:

  • Model architecture details
  • Training hyperparameters (lr, batch_size, epochs, warmup)
  • Dataset configurations and mixtures
  • Evaluation metrics (accuracy, loss, perplexity)
  • Training dynamics (convergence speed, stability)

Result skill focus:

  • Hyperparameter recommendations for specific tasks
  • Dataset mixture recipes
  • Model architecture insights
  • Training tips and tricks

Skill naming convention:

  • {task}-{finding} e.g., colbert-chunking-optimal, gpt2-lr-schedule

Unsloth Domain

When domain: unsloth:

What to extract from reports:

  • LoRA configuration (rank, alpha, target_modules)
  • Quantization settings
  • Memory usage and batch sizes achieved
  • Fine-tuning duration and throughput
  • Model-specific quirks

Result skill focus:

  • Optimal LoRA configurations for model families
  • Memory-efficient training recipes
  • Quantization tradeoffs
  • Common fine-tuning pitfalls

Skill naming convention:

  • {model}-{config} e.g., llama3-lora-optimal, mistral-4bit-recipe

CUDA Domain

When domain: cuda:

What to extract from reports:

  • Kernel configurations (block sizes, grid dims)
  • Memory access patterns
  • Bandwidth and FLOPS achieved
  • Occupancy and register usage
  • Profiling metrics (from nsight/ncu)

Result skill focus:

  • Optimal tiling strategies for operations
  • Memory coalescing patterns
  • Warp-level optimization techniques
  • Triton autotuning configurations

Skill naming convention:

  • {operation}-{optimization} e.g., softmax-online, matmul-tiled, attention-flash
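
The three naming conventions share a two-part shape, so one helper can cover them. This is a sketch under that assumption; `slugify` and `skill_name` are hypothetical names, and the lowercase-and-hyphen normalization is inferred from the examples above rather than stated as a rule.

```python
import re

# Two-part patterns from the "Skill naming convention" bullets above.
NAME_PARTS = {
    "research": ("task", "finding"),
    "unsloth": ("model", "config"),
    "cuda": ("operation", "optimization"),
}


def slugify(text):
    """Lowercase and collapse anything that is not a letter or digit into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def skill_name(domain, first, second):
    """Build a skill name such as 'llama3-lora-optimal' for the given domain."""
    if domain not in NAME_PARTS:
        raise ValueError(f"unknown domain: {domain!r}")
    parts = [slugify(first), slugify(second)]
    if not all(parts):
        labels = " and ".join(NAME_PARTS[domain])
        raise ValueError(f"{domain} skill names need a non-empty {labels}")
    return "-".join(parts)
```

For example, `skill_name("unsloth", "Llama3", "LoRA optimal")` yields `llama3-lora-optimal`.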

Result Skill Template

The generated result skill should follow this structure:

---
name: {skill-name}
description: >
  {One-line description with trigger conditions}
  Use when: {specific scenarios}
metadata:
  short-description: "{Brief tagline}"
  tags:
    - {tag1}
    - {tag2}
  domain: {research|unsloth|cuda}
  created: {YYYY-MM-DD}
  author: {name}
---

# {Skill Name}

## General Description

{2-3 sentences on what this skill captures and why it matters}

## When to Apply

Use this knowledge when:
- {Condition 1}
- {Condition 2}

## Results Summary

| Metric | Value | Notes |
|--------|-------|-------|
| {metric1} | {value1} | {notes1} |

## Recommended Practice

{Concrete, actionable recommendations with specific values}

## Failure Modes

| What Failed | Why | Lesson |
|-------------|-----|--------|
| {attempt1} | {reason1} | {lesson1} |

## Configuration

{Copy-paste ready configuration, if applicable}
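
The Results Summary and Failure Modes tables in the template can be filled programmatically. The helper below is a sketch; `markdown_table` is a hypothetical name, and escaping pipe characters inside cells is an added safeguard, not something the template asks for.

```python
def markdown_table(headers, rows):
    """Render a pipe table like the ones in the template above."""
    def fmt(cells):
        # Escape literal pipes so a cell cannot break the row.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}")

    # Match the template's separator style: dashes spanning each header cell.
    separator = "|" + "|".join("-" * (len(h) + 2) for h in headers) + "|"
    return "\n".join([fmt(headers), separator, *(fmt(r) for r in rows)])
```

Calling it with the headers `["Metric", "Value", "Notes"]` reproduces the header and separator rows shown in the template, followed by one line per result.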

Example Output

Research Retrospective

## Retrospective: Attention Head Experiments (Jan 2025)

### What we tried
- Varied attention heads from 4 to 12 on GPT-2 small architecture
- Fixed: lr=1e-4, batch_size=32, 10 epochs

### Key findings
- 6 heads achieved 91.5% accuracy (vs 92% baseline with 8 heads)
- 4 heads dropped to 87% - too aggressive
- Wider FFN (4096) partially compensated for fewer heads

### What failed
- 4 heads without FFN compensation: 87% accuracy
- 12 heads: no improvement, just slower training

### Open questions
- Would 6 heads + deeper network work better?
- Do these results hold at larger model scales?

---

**Proposed skill:** `attention-head-scaling`

Unsloth Retrospective

## Retrospective: Llama-3 Fine-tuning (Jan 2025)

### What we tried
- LoRA ranks: 8, 16, 32 on Llama-3 8B
- Quantization: 4-bit vs 8-bit
- Gradient checkpointing variations

### Key findings
- rank=16 + 4-bit optimal for A100 40GB
- rank=32 needed CPU offload, 2x slower
- 8-bit gave marginal quality improvement, not worth memory cost

### What failed
- rank=8: underfitting on complex tasks
- Full fine-tune: OOM even with offload

---

**Proposed skill:** `llama3-lora-optimal`

CUDA Retrospective

## Retrospective: Softmax Kernel Optimization (Jan 2025)

### What we tried
- 1D tiling (baseline)
- 2D tiling with various block sizes
- Warp-level reduction
- Online softmax algorithm

### Key findings
- 2D tiling (64x64) achieved 95% bandwidth utilization
- Online softmax 1.5x faster for attention fusion
- Warp shuffles eliminated shared memory bank conflicts

### What failed
- BLOCK_M=128: register spilling, 30% slowdown
- Naive reduction: bank conflicts killed performance

---

**Proposed skill:** `softmax-online`

Skill Details

GitHub Stars 0
GitHub Forks 0
Created Jan 2026
Last Updated 5 months ago