name: domain-retrospective
description: >
  Turn experiment reports and development notes into summaries and reusable skills.
  Adapts behavior based on project domain (research, unsloth, cuda) by reading registry.json.
  Triggers on <retrospective> or requests for lessons learned.
metadata:
  short-description: "Summarize findings and distill them into skills"
tags:
- documentation
- retrospective
- knowledge-capture
Skill: domain-retrospective
When to use
Use this skill when:
- The user message starts with `<retrospective>`, or
- The user requests a summary or lessons learned across experiments/development.
Initialization
- Read `.codex/skills/registry.json` to determine:
  - `domain`: research | unsloth | cuda
  - `paths.reports`: where to find experiment/benchmark reports
  - `paths.experiment_log`: path to the experiment log
  - `paths.troubleshooting`: path to the troubleshooting guide
  - `paths.templates`: path to the templates directory
- Adapt behavior based on domain (see Domain-Specific Behavior below).
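
The initialization step can be sketched in Python as follows. This is only a sketch: it assumes `registry.json` is a JSON object with a top-level `domain` string and a nested `paths` object, which this skill does not spell out, and `Registry`, `parse_registry`, and `load_registry` are hypothetical names.

```python
import json
from dataclasses import dataclass
from pathlib import Path

KNOWN_DOMAINS = {"research", "unsloth", "cuda"}
PATH_KEYS = ("reports", "experiment_log", "troubleshooting", "templates")


@dataclass
class Registry:
    domain: str
    paths: dict[str, Path]


def parse_registry(raw: dict) -> Registry:
    """Validate the assumed schema: {"domain": ..., "paths": {...}}."""
    domain = raw.get("domain")
    if domain not in KNOWN_DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    paths = raw.get("paths", {})
    missing = [key for key in PATH_KEYS if key not in paths]
    if missing:
        raise ValueError(f"registry is missing paths: {missing}")
    return Registry(domain=domain, paths={key: Path(paths[key]) for key in PATH_KEYS})


def load_registry(path: str = ".codex/skills/registry.json") -> Registry:
    """Read the registry file and return the validated settings."""
    with open(path, encoding="utf-8") as handle:
        return parse_registry(json.load(handle))
```

Failing early on an unknown domain or a missing path keeps later steps from silently falling back to the wrong domain-specific behavior.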
Behavior
- Select inputs
  - Use the user's description to identify relevant:
    - Reports from the `paths.reports` directory
    - Sections of `paths.experiment_log`
  - If ambiguous, list candidate reports and ask the user to choose.
- Summarize findings
  - For each report, extract:
    - Setup and configuration
    - Key parameters/settings
    - Metrics and results
    - What worked (successes)
    - What failed (with reasons)
  - Write a markdown summary with:
    - "What we tried"
    - "Key findings"
    - "What failed"
    - "Open questions"
- Update troubleshooting (if needed)
  - If experiments reveal new error patterns and fixes:
    - Propose new entries for `paths.troubleshooting`
    - Use the template from `templates/references/troubleshooting-entry-template.md`
    - Ask the user for confirmation before editing.
- Propose or update result skills
  - Decide which result skills should capture these findings.
  - For each skill:
    - If new: start from `templates/skills/result-skill-template.md`
    - If existing: identify which sections to update
  - Draft SKILL.md content including:
    - General description and context
    - When to apply this knowledge
    - Results summary with concrete numbers
    - Recommended practice
    - Failure modes to avoid
  - Use domain-appropriate terminology and focus areas.
- Ask before writing
  - Present the proposed skill changes.
  - Only create or modify files under `.codex/skills/` with user approval.
- Log the retrospective
  - Append a summarized entry to `paths.experiment_log`
  - Example: "2025-01-12 – Retrospective on LoRA rank experiments"
  - Include a short "General description" line for context.
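
The summary layout, the write guard, and the logging step can be sketched in Python as follows. This is an illustrative sketch, not part of the skill itself: `render_summary`, `may_write`, and `append_log_entry` are hypothetical helpers, and the log entry format is an assumption modeled on the example entry in the logging step.

```python
from datetime import date
from pathlib import Path

SECTIONS = ("What we tried", "Key findings", "What failed", "Open questions")


def render_summary(title: str, notes: dict[str, list[str]]) -> str:
    """Render the four summary sections as markdown, skipping empty ones."""
    lines = [f"## Retrospective: {title}"]
    for section in SECTIONS:
        items = notes.get(section, [])
        if not items:
            continue
        lines.append(f"### {section}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


def may_write(target: str, approved: bool, skills_dir: str = ".codex/skills") -> bool:
    """Allow a write only with user approval and only inside skills_dir."""
    root = Path(skills_dir).resolve()
    resolved = Path(target).resolve()  # collapses any ".." segments first
    return approved and root in resolved.parents


def append_log_entry(log_path: str, topic: str, description: str, when: date) -> str:
    """Append a dated entry to the experiment log and return the text written."""
    entry = (
        f"{when.isoformat()} – Retrospective on {topic}\n"
        f"General description: {description}\n"
    )
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Resolving the target before the containment check means a path such as `.codex/skills/../other` is rejected, even though it starts with the skills directory.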
Domain-Specific Behavior
Research Domain
When domain: research:
What to extract from reports:
- Model architecture details
- Training hyperparameters (lr, batch_size, epochs, warmup)
- Dataset configurations and mixtures
- Evaluation metrics (accuracy, loss, perplexity)
- Training dynamics (convergence speed, stability)
Result skill focus:
- Hyperparameter recommendations for specific tasks
- Dataset mixture recipes
- Model architecture insights
- Training tips and tricks
Skill naming convention:
- `{task}-{finding}`, e.g., `colbert-chunking-optimal`, `gpt2-lr-schedule`
Unsloth Domain
When domain: unsloth:
What to extract from reports:
- LoRA configuration (rank, alpha, target_modules)
- Quantization settings
- Memory usage and batch sizes achieved
- Fine-tuning duration and throughput
- Model-specific quirks
Result skill focus:
- Optimal LoRA configurations for model families
- Memory-efficient training recipes
- Quantization tradeoffs
- Common fine-tuning pitfalls
Skill naming convention:
- `{model}-{config}`, e.g., `llama3-lora-optimal`, `mistral-4bit-recipe`
CUDA Domain
When domain: cuda:
What to extract from reports:
- Kernel configurations (block sizes, grid dims)
- Memory access patterns
- Bandwidth and FLOPS achieved
- Occupancy and register usage
- Profiling metrics (from nsight/ncu)
Result skill focus:
- Optimal tiling strategies for operations
- Memory coalescing patterns
- Warp-level optimization techniques
- Triton autotuning configurations
Skill naming convention:
- `{operation}-{optimization}`, e.g., `softmax-online`, `matmul-tiled`, `attention-flash`
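
The three naming conventions share one shape: lowercase tokens joined by hyphens. A small Python sketch of a name builder under that assumption follows. `slugify` and `skill_name` are hypothetical helpers, and the normalization rules (lowercasing, collapsing other characters into hyphens) are assumptions that the conventions do not spell out.

```python
import re


def slugify(part: str) -> str:
    """Lowercase a name part and collapse runs of other characters into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")


def skill_name(subject: str, finding: str) -> str:
    """Join the two halves of a convention such as {task}-{finding} into one slug."""
    parts = [slugify(subject), slugify(finding)]
    if not all(parts):
        raise ValueError("both parts of the skill name must be non-empty")
    return "-".join(parts)


print(skill_name("Llama3", "LoRA optimal"))  # llama3-lora-optimal
print(skill_name("softmax", "online"))       # softmax-online
```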
Result Skill Template
The generated result skill should follow this structure:
---
name: {skill-name}
description: >
  {One-line description with trigger conditions}
  Use when: {specific scenarios}
metadata:
  short-description: "{Brief tagline}"
tags:
- {tag1}
- {tag2}
domain: {research|unsloth|cuda}
created: {YYYY-MM-DD}
author: {name}
---
# {Skill Name}
## General Description
{2-3 sentences on what this skill captures and why it matters}
## When to Apply
Use this knowledge when:
- {Condition 1}
- {Condition 2}
## Results Summary
| Metric | Value | Notes |
|--------|-------|-------|
| {metric1} | {value1} | {notes1} |
## Recommended Practice
{Concrete, actionable recommendations with specific values}
## Failure Modes
| What Failed | Why | Lesson |
|-------------|-----|--------|
| {attempt1} | {reason1} | {lesson1} |
## Configuration
{Copy-paste ready configuration, if applicable}
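
The Results Summary and Failure Modes tables in this template can be generated from structured rows. A minimal Python sketch follows, assuming each row arrives as a tuple of strings. `markdown_table` is a hypothetical helper, not part of the template.

```python
def markdown_table(headers: list[str], rows: list[tuple[str, ...]]) -> str:
    """Render a pipe-delimited markdown table; every row must match the header width."""
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}: {row}")

    def line(cells) -> str:
        return "| " + " | ".join(cells) + " |"

    # Separator dashes span each header plus its two padding spaces.
    separator = "|" + "|".join("-" * (len(header) + 2) for header in headers) + "|"
    return "\n".join([line(headers), separator, *(line(row) for row in rows)])


failure_modes = markdown_table(
    ["What Failed", "Why", "Lesson"],
    [("{attempt1}", "{reason1}", "{lesson1}")],
)
print(failure_modes)
```

Rejecting rows with the wrong number of cells catches a missing "Lesson" or "Notes" column before the skill file is drafted.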
Example Output
Research Retrospective
## Retrospective: Attention Head Experiments (Jan 2025)
### What we tried
- Varied attention heads from 4 to 12 on GPT-2 small architecture
- Fixed: lr=1e-4, batch_size=32, 10 epochs
### Key findings
- 6 heads achieved 91.5% accuracy (vs 92% baseline with 8 heads)
- 4 heads dropped to 87% - too aggressive
- Wider FFN (4096) partially compensated for fewer heads
### What failed
- 4 heads without FFN compensation: 87% accuracy
- 12 heads: no improvement, just slower training
### Open questions
- Would 6 heads + deeper network work better?
- Test on larger model scales
---
**Proposed skill:** `attention-head-scaling`
Unsloth Retrospective
## Retrospective: Llama-3 Fine-tuning (Jan 2025)
### What we tried
- LoRA ranks: 8, 16, 32 on Llama-3 8B
- Quantization: 4-bit vs 8-bit
- Gradient checkpointing variations
### Key findings
- rank=16 + 4-bit optimal for A100 40GB
- rank=32 needed CPU offload, 2x slower
- 8-bit gave marginal quality improvement, not worth memory cost
### What failed
- rank=8: underfitting on complex tasks
- Full fine-tune: OOM even with offload
---
**Proposed skill:** `llama3-lora-optimal`
CUDA Retrospective
## Retrospective: Softmax Kernel Optimization (Jan 2025)
### What we tried
- 1D tiling (baseline)
- 2D tiling with various block sizes
- Warp-level reduction
- Online softmax algorithm
### Key findings
- 2D tiling (64x64) achieved 95% bandwidth utilization
- Online softmax 1.5x faster for attention fusion
- Warp shuffles eliminated shared memory bank conflicts
### What failed
- BLOCK_M=128: register spilling, 30% slowdown
- Naive reduction: bank conflicts killed performance
---
**Proposed skill:** `softmax-online`