name: domain-retrospective description: > Turn experiment reports and development notes into summaries and reusable skills. Adapts behavior based on project domain (research, unsloth, cuda) by reading registry.json. Triggers on or requests for lessons learned. metadata: short-description: "Summarize findings and distill them into skills" tags: - documentation - retrospective - knowledge-capture

Skill: domain-retrospective

When to use

Use this skill when:

The user message starts with <retrospective>, or
The user requests a summary or lessons-learned across experiments/development.

Initialization

Read .codex/skills/registry.json to determine:
- domain: research | unsloth | cuda
- paths.reports: where to find experiment/benchmark reports
- paths.experiment_log: path to experiment log
- paths.troubleshooting: path to troubleshooting guide
- paths.templates: path to templates directory
Adapt behavior based on domain (see Domain-Specific Behavior below).

Behavior

Select inputs
- Use the user's description to identify relevant:
  - Reports from paths.reports directory
  - Sections of paths.experiment_log
- If ambiguous, list candidate reports and ask the user to choose.
Summarize findings
- For each report, extract:
  - Setup and configuration
  - Key parameters/settings
  - Metrics and results
  - What worked (successes)
  - What failed (with reasons)
- Write a markdown summary with:
  - "What we tried"
  - "Key findings"
  - "What failed"
  - "Open questions"
Update troubleshooting (if needed)
- If experiments reveal new error patterns and fixes:
  - Propose new entries for paths.troubleshooting
  - Use template from templates/references/troubleshooting-entry-template.md
  - Ask user for confirmation before editing.
Propose or update result skills
- Decide what result skills should capture these findings.
- For each skill:
  - If new: start from templates/skills/result-skill-template.md
  - If existing: identify which sections to update
- Draft SKILL.md content including:
  - General description and context
  - When to apply this knowledge
  - Results summary with concrete numbers
  - Recommended practice
  - Failure modes to avoid
- Use domain-appropriate terminology and focus areas.
Ask before writing
- Present the proposed skill changes.
- Only create or modify files under .codex/skills/ with user approval.
Log the retrospective
- Append a summarized entry to paths.experiment_log
- Example: "2025-01-12 – Retrospective on LoRA rank experiments"
- Include a short "General description" line for context.

Domain-Specific Behavior

Research Domain

When domain: research:

What to extract from reports:

Model architecture details
Training hyperparameters (lr, batch_size, epochs, warmup)
Dataset configurations and mixtures
Evaluation metrics (accuracy, loss, perplexity)
Training dynamics (convergence speed, stability)

Result skill focus:

Hyperparameter recommendations for specific tasks
Dataset mixture recipes
Model architecture insights
Training tips and tricks

Skill naming convention:

{task}-{finding} e.g., colbert-chunking-optimal, gpt2-lr-schedule

Unsloth Domain

When domain: unsloth:

What to extract from reports:

LoRA configuration (rank, alpha, target_modules)
Quantization settings
Memory usage and batch sizes achieved
Fine-tuning duration and throughput
Model-specific quirks

Result skill focus:

Optimal LoRA configurations for model families
Memory-efficient training recipes
Quantization tradeoffs
Common fine-tuning pitfalls

Skill naming convention:

{model}-{config} e.g., llama3-lora-optimal, mistral-4bit-recipe

CUDA Domain

When domain: cuda:

What to extract from reports:

Kernel configurations (block sizes, grid dims)
Memory access patterns
Bandwidth and FLOPS achieved
Occupancy and register usage
Profiling metrics (from nsight/ncu)

Result skill focus:

Optimal tiling strategies for operations
Memory coalescing patterns
Warp-level optimization techniques
Triton autotuning configurations

Skill naming convention:

{operation}-{optimization} e.g., softmax-online, matmul-tiled, attention-flash

Result Skill Template

The generated result skill should follow this structure:

---
name: {skill-name}
description: >
  {One-line description with trigger conditions}
  Use when: {specific scenarios}
metadata:
  short-description: "{Brief tagline}"
  tags:
    - {tag1}
    - {tag2}
  domain: {research|unsloth|cuda}
  created: {YYYY-MM-DD}
  author: {name}
---

# {Skill Name}

## General Description

{2-3 sentences on what this skill captures and why it matters}

## When to Apply

Use this knowledge when:
- {Condition 1}
- {Condition 2}

## Results Summary

| Metric | Value | Notes |
|--------|-------|-------|
| {metric1} | {value1} | {notes1} |

## Recommended Practice

{Concrete, actionable recommendations with specific values}

## Failure Modes

| What Failed | Why | Lesson |
|-------------|-----|--------|
| {attempt1} | {reason1} | {lesson1} |

## Configuration

{Copy-paste ready configuration, if applicable}

Example Output

Research Retrospective

## Retrospective: Attention Head Experiments (Jan 2025)

### What we tried
- Varied attention heads from 4 to 12 on GPT-2 small architecture
- Fixed: lr=1e-4, batch_size=32, 10 epochs

### Key findings
- 6 heads achieved 91.5% accuracy (vs 92% baseline with 8 heads)
- 4 heads dropped to 87% - too aggressive
- Wider FFN (4096) partially compensated for fewer heads

### What failed
- 4 heads without FFN compensation: 87% accuracy
- 12 heads: no improvement, just slower training

### Open questions
- Would 6 heads + deeper network work better?
- Test on larger model scales

---

**Proposed skill:** `attention-head-scaling`

Unsloth Retrospective

## Retrospective: Llama-3 Fine-tuning (Jan 2025)

### What we tried
- LoRA ranks: 8, 16, 32 on Llama-3 8B
- Quantization: 4-bit vs 8-bit
- Gradient checkpointing variations

### Key findings
- rank=16 + 4-bit optimal for A100 40GB
- rank=32 needed CPU offload, 2x slower
- 8-bit gave marginal quality improvement, not worth memory cost

### What failed
- rank=8: underfitting on complex tasks
- Full fine-tune: OOM even with offload

---

**Proposed skill:** `llama3-lora-optimal`

CUDA Retrospective

## Retrospective: Softmax Kernel Optimization (Jan 2025)

### What we tried
- 1D tiling (baseline)
- 2D tiling with various block sizes
- Warp-level reduction
- Online softmax algorithm

### Key findings
- 2D tiling (64x64) achieved 95% bandwidth utilization
- Online softmax 1.5x faster for attention fusion
- Warp shuffles eliminated shared memory bank conflicts

### What failed
- BLOCK_M=128: register spilling, 30% slowdown
- Naive reduction: bank conflicts killed performance

---

**Proposed skill:** `softmax-online`

TopRank Skills

domain-retrospective

Skill: domain-retrospective

When to use

Initialization

Behavior

Domain-Specific Behavior

Research Domain

Unsloth Domain

CUDA Domain

Result Skill Template

Example Output

Research Retrospective

Unsloth Retrospective

CUDA Retrospective

chat Comments (0)

Skill Details

Related Skills

ui-ux-pro-max

ai-sdk

planning-with-files

agent-browser

content-prd

Build your own?

Sign in to Comment

domain-retrospective

Skill: domain-retrospective

When to use

Initialization

Behavior

Domain-Specific Behavior

Research Domain

Unsloth Domain

CUDA Domain

Result Skill Template

Example Output

Research Retrospective

Unsloth Retrospective

CUDA Retrospective

chat Comments (0)

Skill Details

Related Skills

ui-ux-pro-max

ai-sdk

planning-with-files

agent-browser

content-prd

Build your own?