incident-response

@korallis/incident-response

10 forks

Updated 4/1/2026

Respond to production incidents systematically with triage, investigation, resolution, and post-mortem analysis to minimize downtime and prevent recurrence. Use when handling production outages, triaging incidents, investigating critical bugs, coordinating incident response, implementing hotfixes, conducting post-mortems, or establishing incident response procedures.

Installation

npx agent-skills-cli install @korallis/incident-response

Details

Repository: korallis/Droidz

Path: droidz_installer/payloads/claude/default/skills/incident-response/SKILL.md

Branch: main

Scoped Name: @korallis/incident-response

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions

name: incident-response
description: Respond to production incidents systematically with triage, investigation, resolution, and post-mortem analysis to minimize downtime and prevent recurrence. Use when handling production outages, triaging incidents, investigating critical bugs, coordinating incident response, implementing hotfixes, conducting post-mortems, or establishing incident response procedures.

Incident Response - Production Issue Management

When to use this skill

Responding to production outages
Triaging critical incidents
Investigating high-severity bugs
Coordinating incident response teams
Implementing emergency hotfixes
Conducting post-mortem analyses
Establishing incident response procedures
Communicating status during incidents
Creating runbooks for common issues
Implementing rollback strategies
Documenting incident timelines
Preventing incident recurrence

Use when: Responding to outages, managing incidents, conducting postmortems.

Incident Response Process

1. Detect

Monitoring alerts
User reports
Automated checks

2. Triage

Assess severity (P0-P4)
Page on-call engineer
Create incident channel

3. Mitigate

Roll back to the last known good state
Scale resources
Apply hotfix
Communicate status

4. Resolve

Verify fix
Monitor metrics
Update status page
Close incident

5. Postmortem

Timeline of events
Root cause analysis
Action items
Follow-up tasks
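
The five stages above can be modeled as a small state machine that also collects the timeline needed for the postmortem. This is a minimal sketch, not a prescribed implementation: the `Incident` class, its fields, and the rule that stages advance strictly in order are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    DETECT = 1
    TRIAGE = 2
    MITIGATE = 3
    RESOLVE = 4
    POSTMORTEM = 5


@dataclass
class Incident:
    """Hypothetical incident record; names are illustrative only."""
    title: str
    stage: Stage = Stage.DETECT
    timeline: list = field(default_factory=list)

    def log(self, note: str) -> None:
        # Record a timestamped entry so the postmortem timeline builds itself.
        self.timeline.append((datetime.now(timezone.utc), self.stage, note))

    def advance(self, note: str) -> Stage:
        # Assumption: stages are visited in order and never skipped.
        if self.stage is Stage.POSTMORTEM:
            raise ValueError("incident is already in its final stage")
        self.stage = Stage(self.stage.value + 1)
        self.log(note)
        return self.stage
```

For example, `Incident("Checkout latency spike").advance("Severity assessed, on-call paged")` moves the incident from Detect to Triage and records the note with a UTC timestamp.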

Severity Levels

P0 (Critical): Complete outage, data loss
P1 (High): Major feature broken, revenue impact
P2 (Medium): Degraded performance, workaround exists
P3 (Low): Minor bug, cosmetic issue
P4 (Informational): Enhancement request
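
One way to make the severity assessment repeatable is to encode the levels above as an ordered enum and map observed impact to a level. This is a sketch under assumptions: the `classify` function and its boolean flags are hypothetical names chosen to mirror the descriptions in the list, and a real triage would weigh more signals.

```python
from enum import IntEnum


class Severity(IntEnum):
    P0 = 0  # Critical: complete outage, data loss
    P1 = 1  # High: major feature broken, revenue impact
    P2 = 2  # Medium: degraded performance, workaround exists
    P3 = 3  # Low: minor bug, cosmetic issue
    P4 = 4  # Informational: enhancement request


def classify(complete_outage: bool = False, data_loss: bool = False,
             major_feature_broken: bool = False, revenue_impact: bool = False,
             degraded: bool = False, is_bug: bool = False) -> Severity:
    """Return the most severe level whose condition is met (illustrative rules)."""
    if complete_outage or data_loss:
        return Severity.P0
    if major_feature_broken or revenue_impact:
        return Severity.P1
    if degraded:
        return Severity.P2
    if is_bug:
        return Severity.P3
    # Nothing is broken, so treat the report as an enhancement request.
    return Severity.P4
```

Because `Severity` is an `IntEnum`, levels compare naturally: `Severity.P0 < Severity.P2` is true, so a lower number means a more urgent incident.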

Example Runbook

```markdown

High CPU Usage Runbook

Symptoms

Server CPU > 90%
Slow response times
Request timeouts

Investigation

Check top processes: `top`
Check memory: `free -h`
Check logs: `tail -f app.log`

Mitigation

Scale horizontally: Add servers
Restart service: `systemctl restart app`
Rate limit: Enable aggressive rate limiting

Resolution

Identify root cause (N+1 query, memory leak, etc.)
Deploy fix
Monitor for 1 hour
```
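
The "Server CPU > 90%" symptom in the runbook could also be checked by a script instead of reading `top` by hand. The sketch below assumes a Linux host, where the first line of `/proc/stat` holds cumulative CPU counters; the function names are hypothetical, and the 90% threshold is taken from the runbook.

```python
def read_cpu_sample(path: str = "/proc/stat") -> list[int]:
    """Read the aggregate `cpu` counters (Linux only).

    Only the first eight counters are kept (user, nice, system, idle,
    iowait, irq, softirq, steal); guest time is already included in user.
    """
    with open(path) as f:
        fields = f.readline().split()
    return [int(value) for value in fields[1:9]]


def cpu_busy_percent(before: list[int], after: list[int]) -> float:
    """Percentage of non-idle CPU time between two samples."""
    total = sum(after) - sum(before)
    if total <= 0:
        return 0.0
    # Counters at index 3 (idle) and 4 (iowait) count as not busy.
    idle = (after[3] + after[4]) - (before[3] + before[4])
    return 100.0 * (total - idle) / total


def is_cpu_saturated(busy_percent: float, threshold: float = 90.0) -> bool:
    # Threshold mirrors the runbook symptom "Server CPU > 90%".
    return busy_percent > threshold
```

Taking two samples with `read_cpu_sample()` about a second apart and passing them to `cpu_busy_percent` gives the utilization over that interval, which `is_cpu_saturated` then compares with the runbook threshold.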

Communication Template

```
[INCIDENT] Service X degraded

Status: Investigating
Impact: 20% of users seeing slow load times
ETA: 30 minutes

Updates:

10:00 AM: Issue detected
10:05 AM: On-call paged, investigation started
10:15 AM: Root cause identified (database bottleneck)
10:30 AM: Fix deployed, monitoring

Next update: 11:00 AM
```
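
To keep updates consistent under pressure, the template above can be filled from structured data rather than typed by hand. This is a minimal sketch: `StatusUpdate` and its field names are hypothetical, chosen to match the labels in the template.

```python
from dataclasses import dataclass, field


@dataclass
class StatusUpdate:
    """Hypothetical container for one incident status message."""
    service: str
    summary: str
    status: str
    impact: str
    eta: str
    next_update: str
    # Each entry is (time, note), e.g. ("10:00 AM", "Issue detected").
    updates: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        # Lay the fields out in the same order as the communication template.
        lines = [
            f"[INCIDENT] {self.service} {self.summary}",
            "",
            f"Status: {self.status}",
            f"Impact: {self.impact}",
            f"ETA: {self.eta}",
            "",
            "Updates:",
            "",
        ]
        lines += [f"{time}: {note}" for time, note in self.updates]
        lines += ["", f"Next update: {self.next_update}"]
        return "\n".join(lines)
```

Appending a new `(time, note)` pair to `updates` and calling `render()` again produces the next message with the full history intact.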
