Overview
- Skill Key: 1kalin/afrexai-sre-platform
- Author: 1kalin
- Source Repo: openclaw/skills
- Version: -
- Source Path: skills/1kalin/afrexai-sre-platform
- Latest Commit SHA: 8a8fb230d97c6ce01c72529f2faca16fa07ea746
SRE & Incident Management Platform
- Stars: 0
- Installs: 0
- Status: ACTIVE
- Visibility: PUBLIC
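Copy the prompt below and send it to your AI assistant to complete the installation.
First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the Afrexai Sre Platform skill. If it is already installed, install the Afrexai Sre Platform skill directly.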
# SRE & Incident Management Platform
Complete Site Reliability Engineering system — from SLO definition through incident response, chaos engineering, and operational excellence. Zero dependencies.
---
## Phase 1: Reliability Assessment
Before building anything, assess where you are.
### Service Catalog Entry
```yaml
service:
  name: ""
  tier: ""                   # critical | important | standard | experimental
  owner_team: ""
  oncall_rotation: ""
  dependencies:
    upstream: []             # services we call
    downstream: []           # services that call us
  data_classification: ""    # public | internal | confidential | restricted
  deployment_frequency: ""   # daily | weekly | biweekly | monthly
  architecture: ""           # monolith | microservice | serverless | hybrid
  language: ""
  infra: ""                  # k8s | ECS | Lambda | VM | bare-metal
  traffic_pattern: ""        # steady | diurnal | spiky | seasonal
  peak_rps: 0
  storage_gb: 0
  monthly_cost_usd: 0
```
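A catalog entry is only useful if its enumerated fields stay within the values listed in the template comments. The following is a minimal sketch of such a check, assuming the `service` mapping has already been loaded into a plain Python dict. The function name `validate_service` and the choice of which text fields count as required are illustrative assumptions, not part of the skill.

```python
# Allowed values copied from the comments in the catalog template.
ALLOWED = {
    "tier": {"critical", "important", "standard", "experimental"},
    "data_classification": {"public", "internal", "confidential", "restricted"},
    "deployment_frequency": {"daily", "weekly", "biweekly", "monthly"},
    "architecture": {"monolith", "microservice", "serverless", "hybrid"},
    "infra": {"k8s", "ECS", "Lambda", "VM", "bare-metal"},
    "traffic_pattern": {"steady", "diurnal", "spiky", "seasonal"},
}

# Assumption: these free-text fields must be filled in for an entry to be usable.
REQUIRED_TEXT = ("name", "owner_team", "oncall_rotation")

NUMERIC = ("peak_rps", "storage_gb", "monthly_cost_usd")


def validate_service(entry: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry passes."""
    problems = []
    for field in REQUIRED_TEXT:
        if not str(entry.get(field, "")).strip():
            problems.append(f"{field} is empty")
    for field, allowed in ALLOWED.items():
        value = entry.get(field, "")
        if value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")
    for field in NUMERIC:
        value = entry.get(field, 0)
        if not isinstance(value, (int, float)) or value < 0:
            problems.append(f"{field} must be a non-negative number")
    return problems


# Hypothetical entry used only to exercise the check.
example = {
    "name": "payment-service",
    "tier": "critical",
    "owner_team": "payments",
    "oncall_rotation": "payments-primary",
    "data_classification": "restricted",
    "deployment_frequency": "daily",
    "architecture": "microservice",
    "infra": "k8s",
    "traffic_pattern": "diurnal",
    "peak_rps": 1200,
    "storage_gb": 500,
    "monthly_cost_usd": 8000,
}

for problem in validate_service(example):
    print(problem)
```

A check like this could run in CI so that a blank template never lands in the catalog unnoticed.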
### Maturity Assessment (Score 1-5 per dimension)
| Dimension | 1 (Ad-hoc) | 3 (Defined) | 5 (Optimized) | Score |
|-----------|-----------|-------------|---------------|-------|
| SLOs | No SLOs defined | SLOs exist, reviewed quarterly | Data-driven SLOs, auto error budgets | |
| Monitoring | Basic health checks | Golden signals + dashboards | Full observability, anomaly detection | |
| Incident Response | No runbooks, hero culture | Documented process, postmortems | Automated detection, structured ICS | |
| Automation | Manual deployments | CI/CD pipeline, some automation | Self-healing, auto-scaling, GitOps | |
| Chaos Engineering | No testing | Basic failure injection | Continuous chaos in production | |
| Capacity Planning | Reactive scaling | Quarterly forecasting | Predictive auto-scaling | |
| Toil Management | >50% toil | Toil tracked, reduction plans | <25% toil, systematic elimination | |
| On-Call Health | Burnout, 24/7 individuals | Rotation exists, escalation paths | Balanced load, <2 pages/shift | |
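The scoring in the table can be tallied with a few lines of code. This is a rough sketch under stated assumptions: the helper name `summarize_maturity` and the idea of reporting the three weakest dimensions are illustrative and not taken from the skill. It reports only the raw total out of 40 and does not map the total to an interpretation band.

```python
# The eight dimensions, in the order they appear in the maturity table.
DIMENSIONS = (
    "SLOs",
    "Monitoring",
    "Incident Response",
    "Automation",
    "Chaos Engineering",
    "Capacity Planning",
    "Toil Management",
    "On-Call Health",
)


def summarize_maturity(scores: dict[str, int]) -> dict:
    """Validate 1-5 scores for every dimension and return the total plus weakest areas."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} score must be between 1 and 5, got {scores[dim]}")
    total = sum(scores[d] for d in DIMENSIONS)
    # sorted() is stable, so ties keep the table order.
    weakest = sorted(DIMENSIONS, key=lambda d: scores[d])[:3]
    return {"total": total, "max": 5 * len(DIMENSIONS), "weakest": weakest}


# Hypothetical scores for illustration only.
sample = {d: 3 for d in DIMENSIONS}
sample["SLOs"] = 1
sample["Toil Management"] = 2

summary = summarize_maturity(sample)
print(f"{summary['total']}/{summary['max']}, weakest: {', '.join(summary['weakest'])}")
```

Listing the weakest dimensions first gives a natural starting order for improvement work, whatever the total turns out to be.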
**Score interpretation:**
- 8-1...
# AfrexAI SRE & Incident Management Platform ⚡

The most comprehensive SRE skill on ClawHub. Complete system from SLO definition through incident response, chaos engineering, toil management, and operational excellence.

## Install

```bash
clawhub install afrexai-sre-platform
```

## What's Inside

- **Reliability Assessment**: 8-dimension maturity model with scoring
- **SLI/SLO Framework**: Selection guides, burn rate alerts, 28-day rolling windows
- **Error Budget Management**: 4-state policy with automated escalation rules
- **Monitoring Architecture**: Golden Signals + USE + RED methods, alert design rules
- **Incident Response**: Full ICS framework, severity matrix, communication templates
- **Postmortem Framework**: Blameless template, Five Whys, Fishbone analysis, action tracking
- **Chaos Engineering**: 12 experiment templates, Game Day runbook, maturity model
- **Toil Management**: Inventory, priority matrix, automation targets
- **Capacity Planning**: Growth modeling, load testing benchmarks, scaling strategies
- **On-Call Excellence**: Health metrics, rotation practices, handoff templates, runbook template
- **Production Readiness Review**: 28-point checklist before any service goes live
- **Self-Healing Patterns**: Auto-remediation templates, multi-region strategies

## Quick Start

```
"Assess reliability for payment-service"
"Define SLOs for our API gateway"
"Start incident for elevated 5xx errors on checkout"
"Plan chaos experiment for database failover"
```

## ⚡ Level Up

This free skill gives you the methodology. For industry-specific reliability patterns:

- **[SaaS Context Pack ($47)](https://afrexai-cto.github.io/context-packs/)**: SaaS-specific SLOs, multi-tenant reliability, PLG scaling
- **[Fintech Context Pack ($47)](https://afrexai-cto.github.io/context-packs/)**: Financial system SLOs, compliance monitoring, payment reliability
- **[Healthcare Context Pack ($47)](https://afrexai-cto.github.io/context-packs/)**: HIPAA-awar...