deploying-monitoring-stacks

@jeremylongshore/deploying-monitoring-stacks
jeremylongshore
2,103
284 forks
Updated 5/5/2026

Use this skill when deploying monitoring stacks, including Prometheus, Grafana, and Datadog. Trigger it with phrases like "deploy monitoring stack", "setup prometheus", "configure grafana", or "install datadog agent". It generates production-ready configurations with metric collection, visualization dashboards, and alerting rules.

Installation

$ npx agent-skills-cli install @jeremylongshore/deploying-monitoring-stacks

Details

Path: plugins/devops/monitoring-stack-deployer/skills/deploying-monitoring-stacks/SKILL.md
Branch: main
Scoped Name: @jeremylongshore/deploying-monitoring-stacks

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: deploying-monitoring-stacks
description: 'Use this skill when deploying monitoring stacks, including Prometheus, Grafana, and Datadog. Trigger it with phrases like "deploy monitoring stack", "setup prometheus", "configure grafana", or "install datadog agent". It generates production-ready configurations with metric collection, visualization dashboards, and alerting rules.'
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(docker:), Bash(kubectl:)
version: 1.0.0
author: Jeremy Longshore jeremy@intentsolutions.io
license: MIT
tags:
  • devops
  • deployment
  • monitoring
  • dashboard
compatibility: Designed for Claude Code, also compatible with Codex and OpenClaw

Deploying Monitoring Stacks

Overview

Deploy production monitoring stacks (Prometheus + Grafana, Datadog, or VictoriaMetrics) with metric collection, custom dashboards, and alerting rules. Configure exporters, scrape targets, recording rules, and notification channels for comprehensive infrastructure and application observability.

Prerequisites

  • Target infrastructure identified: Kubernetes cluster, Docker hosts, or bare-metal servers
  • Metric endpoints accessible from the monitoring platform (application /metrics, node exporters)
  • Storage backend capacity planned for time-series data (Prometheus TSDB, Thanos, or Cortex for long-term)
  • Alert notification channels defined: Slack webhook, PagerDuty integration key, or email SMTP
  • Helm 3+ for Kubernetes deployments using kube-prometheus-stack or similar charts

Instructions

  1. Select the monitoring platform: Prometheus + Grafana for open-source self-hosted, Datadog for managed SaaS, VictoriaMetrics for high-cardinality workloads
  2. Deploy the monitoring stack: on Kubernetes, add the chart repository (helm repo add prometheus-community https://prometheus-community.github.io/helm-charts) and run helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack; on non-Kubernetes hosts, use Docker Compose
  3. Install exporters on monitored systems: node-exporter for host metrics, kube-state-metrics for Kubernetes object states, application-specific exporters
  4. Configure scrape targets in prometheus.yml: define job names, scrape intervals, and relabeling rules for service discovery
  5. Create recording rules for frequently queried aggregations to reduce dashboard query load
  6. Define alerting rules with meaningful thresholds: high CPU (>80% for 5m), high memory (>90%), error rate (>1%), latency P99 (>500ms)
  7. Configure Alertmanager with routing, grouping, and notification channels (Slack, PagerDuty, email)
  8. Build Grafana dashboards: RED metrics (Rate, Errors, Duration) for services, USE metrics (Utilization, Saturation, Errors) for resources
  9. Set up data retention: configure TSDB retention period (15-30 days local), set up Thanos/Cortex for long-term storage if needed
  10. Test the full pipeline: trigger a test alert and verify notification delivery
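
Steps 5 and 6 could be sketched as the Prometheus rule file below. It is a minimal example under stated assumptions, not the skill's generated output: the CPU and memory expressions assume standard node-exporter metrics, while http_requests_total (with a status label) and http_request_duration_seconds_bucket are hypothetical application metrics. The group names, rule names, severity labels, and the 5m hold time on the memory, error-rate, and latency alerts are illustrative choices; only the thresholds come from step 6.

```yaml
# rules.yml: referenced from prometheus.yml via `rule_files`
groups:
  - name: recording-rules
    rules:
      # Precompute per-instance CPU utilisation (0 to 1) so dashboards and alerts reuse it.
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # ASSUMPTION: http_requests_total is a hypothetical application counter.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      - record: job:http_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))

  - name: alerting-rules
    rules:
      - alert: HighCpuUsage
        expr: instance:node_cpu_utilisation:rate5m > 0.80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU above 80% on {{ $labels.instance }} for 5 minutes"
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory above 90% on {{ $labels.instance }}"
      - alert: HighErrorRate
        expr: job:http_errors:rate5m / job:http_requests:rate5m > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 1% for job {{ $labels.job }}"
      - alert: HighLatencyP99
        # ASSUMPTION: hypothetical histogram with bucket bounds in seconds.
        expr: histogram_quantile(0.99, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 500ms for job {{ $labels.job }}"
```

A file like this can be validated with promtool check rules rules.yml before Prometheus loads it.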

Output

  • Helm values file or Docker Compose for the monitoring stack
  • Prometheus configuration with scrape targets, recording rules, and alerting rules
  • Alertmanager configuration with routing tree and notification receivers
  • Grafana dashboard JSON files for infrastructure and application metrics
  • Exporter deployment manifests (node-exporter DaemonSet, application ServiceMonitor)
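
As an illustration of the Alertmanager output, a routing tree with two receivers might look like the sketch below. The receiver names, the #alerts channel, the timing values, and the severity label used for routing are assumptions for this example; the webhook URL and integration key are placeholders that must be replaced with real values.

```yaml
# alertmanager.yml
route:
  receiver: slack-default            # fallback for alerts that match no child route
  group_by: ['alertname', 'job']     # batch related alerts into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # ASSUMPTION: alerts carry a `severity` label, set in the alerting rules.
    - matchers:
        - 'severity="critical"'
      receiver: pagerduty-oncall

receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK   # placeholder
        channel: '#alerts'
        send_resolved: true
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_WITH_INTEGRATION_KEY   # placeholder, Events API v2 key
```

Running amtool check-config alertmanager.yml against the file catches syntax and schema errors before deployment.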

Error Handling

| Error | Cause | Solution |
|---|---|---|
| No data points in dashboard | Scrape target not reachable or metric name wrong | Check the Targets page in the Prometheus UI; verify service discovery and the metric name |
| Too many time series (high cardinality) | Labels with unbounded values (user IDs, request IDs) | Remove high-cardinality labels with metric_relabel_configs; use recording rules for aggregation |
| Alert condition met but no notification | Alertmanager routing or receiver misconfigured | Verify the Alertmanager config with amtool check-config; test which receiver a label set routes to with amtool config routes test |
| Prometheus OOMKilled | Insufficient memory for series count | Increase memory limits; reduce the number of active series by dropping unused metrics or scrape targets |
| Grafana datasource connection failed | Wrong Prometheus URL or network policy blocking access | Verify the datasource URL in Grafana; check the Kubernetes service name and port; review network policies |
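
For the high-cardinality case, a scrape job with metric_relabel_configs could look like this fragment. The job name, target address, label names (user_id, request_id), and the app_debug_ metric prefix are hypothetical and stand in for whatever the affected application actually exposes.

```yaml
# Fragment of prometheus.yml
scrape_configs:
  - job_name: app                                  # hypothetical job name
    scrape_interval: 30s
    static_configs:
      - targets: ['app.example.internal:8080']     # hypothetical target
    metric_relabel_configs:
      # Remove unbounded labels from every scraped series before ingestion.
      - action: labeldrop
        regex: user_id|request_id
      # Drop whole metrics that no dashboard or alert uses.
      - action: drop
        source_labels: [__name__]
        regex: app_debug_.*
```

Dropping a label only works when the remaining labels still identify each series uniquely; if they do not, the scrape produces duplicate series, and the label is better removed in the application's instrumentation.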

Examples

  • "Deploy kube-prometheus-stack on Kubernetes with alerts for node CPU > 80%, pod restart count > 5, and API error rate > 1%, sending to Slack."
  • "Set up Prometheus + Grafana on Docker Compose for monitoring 10 application servers with node-exporter and custom application metrics."
  • "Create Grafana dashboards for the four golden signals (latency, traffic, errors, saturation) for a microservices application."
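
The second prompt might yield a Compose file along these lines for the monitoring host. This is a sketch, not the skill's actual output: it assumes a prometheus.yml next to the Compose file that lists the application servers as scrape targets, with node-exporter installed separately on each server. The volume names are illustrative, and the 15d retention follows the range given in step 9. The image tags are left unpinned for brevity; a production setup would pin specific versions.

```yaml
# docker-compose.yml for the monitoring host
services:
  prometheus:
    image: prom/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=15d          # local retention
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus                # persist the TSDB across restarts
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    volumes:
      - grafana-data:/var/lib/grafana              # persist dashboards and settings
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

volumes:
  prometheus-data:
  grafana-data:
```

Inside this Compose network, Grafana reaches Prometheus at http://prometheus:9090, which is the URL to use for the datasource.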
