Incident Response

Overview

Skill Key: chunhualiao/incident-response
Author: chunhualiao
Source Repo: openclaw/skills
Version: -
Source Path: skills/chunhualiao/incident-response
Latest Commit SHA: aeb692040f61d1a3bc147ff5e404432d65b3fbba

Extracted Content

SKILL.md excerpt

# Incident Response

Seven phases, in order. Never skip. Never assume — follow the evidence.

**Outputs produced by this skill:**
- Root cause statement (5 Whys chain with evidence citations)
- Restore confirmation (what was restored, verified working)
- Prevention commit (git commit hash of guard/rule added)
- Monitoring cron (job ID + schedule)
- Learning entry (appended to `~/.openclaw/learnings/rules.md`)
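The first output, a 5 Whys chain in which every link cites evidence, can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the `Why` record, the `uncited_links` helper, and the sample chain are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    """One link in the chain: a question, its answer, and the evidence behind it."""
    question: str
    answer: str
    # Hypothetical free-form citations, e.g. a commit hash or a backup filename.
    evidence: list[str] = field(default_factory=list)


def uncited_links(chain: list[Why]) -> list[int]:
    """Return the 1-based positions of links that cite no evidence."""
    return [i for i, why in enumerate(chain, start=1) if not why.evidence]


# Made-up example chain; the second link has no citation yet.
chain = [
    Why("Why did routing fail?", "Two bindings were missing.",
        ["backup timeline: count fell from 5 to 3"]),
    Why("Why were they missing?", "A session rewrote the config.", []),
]
print(uncited_links(chain))  # [2]
```

A root cause statement would only be considered complete once `uncited_links` returns an empty list.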

## Phase 0: Triage (2 min)

**Check current state FIRST before investigating history.**

```bash
# Is it actually broken right now?
openclaw status
ssh "<remote-host>" "launchctl list | grep openclaw"
# Test the endpoint with the correct protocol (check the source: is it HTTP or HTTPS?)
```

If currently working → report "recovered, investigating cause." If still broken → proceed.
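The triage decision above can be sketched as a small helper that runs each check and reports which ones failed. This is only an illustration under assumptions: it treats a zero exit status as "healthy", which the source does not confirm for `openclaw status`, and `command_ok` and `triage_report` are hypothetical names. The demo uses the Python interpreter as a stand-in command so it runs anywhere.

```python
import subprocess
import sys


def command_ok(cmd: list[str], timeout: float = 10.0) -> bool:
    """Return True if the command runs and exits with status 0 (assumed to mean healthy)."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung command both count as a failed check.
        return False


def triage_report(checks: dict[str, bool]) -> str:
    """Map the results of the Phase 0 checks to the next action."""
    failed = [name for name, ok in checks.items() if not ok]
    if not failed:
        return "recovered, investigating cause"
    return "still broken: " + ", ".join(failed)


# Stand-in commands; in real use these might be ["openclaw", "status"] and similar.
checks = {
    "status": command_ok([sys.executable, "-c", "raise SystemExit(0)"]),
    "service": command_ok([sys.executable, "-c", "raise SystemExit(1)"]),
}
print(triage_report(checks))  # still broken: service
```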

## Phase 1: Evidence Collection

Gather hard evidence from four sources:

### 1a. Config backups timeline
```bash
# See binding/setting counts over time
ssh "<remote-host>" "python3 << 'EOF'
import datetime, glob, json, os
# glob does not expand '~', so expand it explicitly
pattern = os.path.expanduser('~/.openclaw/config-backups/openclaw-*.json')
for f in sorted(glob.glob(pattern), key=os.path.getmtime):
    with open(f) as fh:
        d = json.load(fh)
    dt = datetime.datetime.fromtimestamp(os.path.getmtime(f)).strftime('%Y-%m-%d %H:%M')
    # Customize: bindings, agents, channels, etc.
    bindings = d.get('bindings', [])
    ids = [b.get('agentId') for b in bindings]
    print(f'{dt} [{len(bindings)}] {ids}')
EOF"
```
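Once the timeline is printed, the interesting rows are the ones where the count goes down. A minimal sketch of that step follows, assuming the timeline has been collected as `(timestamp, count)` pairs in chronological order. `find_drops` is a hypothetical helper and the sample timestamps are made up.

```python
def find_drops(timeline: list[tuple[str, int]]) -> list[tuple[str, str, int, int]]:
    """Return (before_ts, after_ts, before_count, after_count) for every decrease."""
    drops = []
    # Compare each backup with the one immediately before it.
    for (prev_ts, prev_n), (ts, n) in zip(timeline, timeline[1:]):
        if n < prev_n:
            drops.append((prev_ts, ts, prev_n, n))
    return drops


# Made-up timeline: the count falls from 5 to 3 between the second and third backup.
timeline = [
    ("2025-01-01 09:00", 5),
    ("2025-01-01 12:00", 5),
    ("2025-01-02 08:30", 3),
    ("2025-01-02 10:00", 3),
]
print(find_drops(timeline))
# [('2025-01-01 12:00', '2025-01-02 08:30', 5, 3)]
```

The two timestamps in each result bracket the window in which the loss happened, which narrows the search in the git history and session logs.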

### 1b. Git audit trail
```bash
ssh "<remote-host>" "cd ~/.openclaw && git log --oneline -20"
ssh "<remote-host>" "cd ~/.openclaw && git diff <commit-a> <commit-b> -- openclaw.json | grep '^[+-]' | grep -v '^---\|^+++'"
```
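A raw line diff can be noisy when the JSON is reformatted. As a complement, the two versions of the config can be compared by binding identity. This sketch assumes the same `bindings` / `agentId` layout used in the timeline script and takes the two file contents as strings (for example, the output of `git show <commit>:openclaw.json`). `binding_ids` and `diff_bindings` are hypothetical helpers, and the agent names are made up.

```python
import json


def binding_ids(config_text: str) -> set[str]:
    """Collect the agentId of every binding in one version of the config."""
    config = json.loads(config_text)
    return {b["agentId"] for b in config.get("bindings", []) if "agentId" in b}


def diff_bindings(old_text: str, new_text: str) -> dict[str, list[str]]:
    """Report which agentIds were removed or added between two versions."""
    old, new = binding_ids(old_text), binding_ids(new_text)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


# Made-up config contents for two commits.
old = '{"bindings": [{"agentId": "alpha"}, {"agentId": "beta"}]}'
new = '{"bindings": [{"agentId": "alpha"}]}'
print(diff_bindings(old, new))  # {'removed': ['beta'], 'added': []}
```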

### 1c. Session logs (who did what)
```bash
# Find sessions that touched the broken config key
# (-l lists matching files; rg recurses by default, and -r would mean --replace)
ssh "<remote-host>" "rg -l 'keyword' ~/.openclaw/agents/*/sessions/*.jsonl | head -5"

# Extract tool calls from a session
ssh "<remote-host>" "python3 << 'EOF'
import json
for line in open('SESSION.jsonl'):
    obj = json.loads(line)
    if obj.get('type') != 'messag...
```

README excerpt

# Incident Response

Structured 7-phase incident response workflow for OpenClaw system failures.

Built from real production investigations — binding loss events, gateway crashes, config regressions, and root cause traces through backup timelines, git diffs, and session JSONL logs.

---

## What it does

When something breaks, this skill walks you through seven phases in strict order:

| Phase | Name | Purpose |
|-------|------|---------|
| 0 | Triage | Check current state — is it actually still broken? |
| 1 | Evidence | Gather hard evidence from 4 sources (backups, git, session logs, diffs) |
| 2 | 5 Whys | Root cause analysis — every "why" must cite specific evidence |
| 3 | Restore | Merge from known-good backup, verify, restart |
| 4 | Prevent | Add guards proportional to severity (config guard, SOUL.md rule, chmod) |
| 5 | Monitor | Schedule a cron check (7–30 days depending on severity) |
| 6 | Document | Write to `~/.openclaw/learnings/rules.md` and MEMORY.md |

**Rule: Never skip a phase. Never assume — follow the evidence.**
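The "never skip a phase" rule can be made explicit with a small tracker that rejects out-of-order completion. This is a sketch of one way to enforce the ordering, not something the skill is known to ship. `PhaseTracker` is a hypothetical name, and the phase names are taken from the table above.

```python
PHASES = ["Triage", "Evidence", "5 Whys", "Restore", "Prevent", "Monitor", "Document"]


class PhaseTracker:
    """Records completed phases and refuses to skip ahead."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def complete(self, name: str) -> None:
        # The next expected phase is the first one not yet completed.
        done = len(self.completed)
        expected = PHASES[done] if done < len(PHASES) else None
        if name != expected:
            raise ValueError(f"expected phase {expected!r}, got {name!r}")
        self.completed.append(name)


tracker = PhaseTracker()
tracker.complete("Triage")
tracker.complete("Evidence")
try:
    tracker.complete("Restore")  # skips "5 Whys"
except ValueError as err:
    print(err)  # expected phase '5 Whys', got 'Restore'
```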

---

## Install

```bash
clawhub install incident-response
```

---

## Trigger phrases

```
investigate binding loss
investigate gateway crash
why did X stop working
gateway down
gateway crashed
bindings lost
agent not responding
root cause
who changed X
audit X
something disappeared
```

---

## What's included

| File | Purpose |
|------|---------|
| `SKILL.md` | Full 7-phase workflow with runnable commands |
| `references/checklists.md` | Quick diagnosis checklists for 6 common failure types |
| `references/prevention-patterns.md` | 6 prevention patterns with code templates |
| `references/cron-template.md` | Post-incident monitoring cron template |
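
The contents of `references/prevention-patterns.md` are not reproduced in this excerpt, so the following is only a guess at what a config guard for Phase 4 might look like: a check that refuses a config write if it would reduce the number of bindings. `BindingCountError`, `check_binding_count`, and the `allow_decrease` flag are hypothetical, and the `bindings` / `agentId` layout is assumed from the SKILL.md scripts.

```python
class BindingCountError(ValueError):
    """Raised when a proposed config would drop bindings."""


def check_binding_count(current: dict, proposed: dict, allow_decrease: bool = False) -> None:
    """Raise if the proposed config has fewer bindings than the current one."""
    before = len(current.get("bindings", []))
    after = len(proposed.get("bindings", []))
    if after < before and not allow_decrease:
        raise BindingCountError(
            f"bindings would drop from {before} to {after}; refusing to write"
        )


# Made-up configs: the proposed version loses one binding.
current = {"bindings": [{"agentId": "alpha"}, {"agentId": "beta"}]}
proposed = {"bindings": [{"agentId": "alpha"}]}
try:
    check_binding_count(current, proposed)
except BindingCountError as err:
    print(err)  # bindings would drop from 2 to 1; refusing to write
```

An intentional removal would pass `allow_decrease=True`, so that the guard blocks accidental loss without blocking deliberate changes.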

---

## Failure types covered

- **Gateway crash** — invalid config key, launchctl exit code, doctor/fix flow
- **Binding loss** — backup timeline, count guard, restore from good state
- **Config key disappeared** — grep backups, git log, patch restore
- **Agent routing wrong** — bind...
