Build monitoring dashboards with Grafana. Covers panel types, queries, variables, alerting, provisioning, and data sources like Prometheus and InfluxDB. Use for infrastructure monitoring, observability, and metrics visualization.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli listSkill Instructions
name: grafana-dashboards description: Build monitoring dashboards with Grafana. Covers panel types, queries, variables, alerting, provisioning, and data sources like Prometheus and InfluxDB. Use for infrastructure monitoring, observability, and metrics visualization.
Grafana Dashboards
Build powerful monitoring and observability dashboards.
Instructions
- Start with key metrics - CPU, memory, latency, error rates
- Use consistent time ranges - All panels should sync
- Add context with variables - Filter by environment, service, host
- Set up alerts - Proactive monitoring, not reactive
- Use templates - Consistent dashboard styling
Dashboard Structure
Dashboard JSON
{
"dashboard": {
"id": null,
"uid": "my-dashboard",
"title": "Service Overview",
"tags": ["production", "service-name"],
"timezone": "browser",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"time": {
"from": "now-1h",
"to": "now"
},
"templating": {
"list": []
},
"panels": [],
"annotations": {
"list": []
}
}
}
Panel Types
// Time series (line chart)
{
"type": "timeseries",
"title": "Request Rate",
"gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
"fieldConfig": {
"defaults": {
"unit": "reqps",
"custom": {
"lineWidth": 2,
"fillOpacity": 10,
"gradientMode": "opacity"
}
}
},
"targets": [
{
"expr": "rate(http_requests_total{job=\"$job\"}[5m])",
"legendFormat": "{{method}} {{status}}"
}
]
}
// Stat panel (single value)
{
"type": "stat",
"title": "Total Requests",
"gridPos": { "x": 0, "y": 0, "w": 4, "h": 4 },
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": ["lastNotNull"],
"fields": "",
"values": false
}
},
"targets": [
{
"expr": "sum(http_requests_total{job=\"$job\"})",
"legendFormat": ""
}
]
}
// Gauge
{
"type": "gauge",
"title": "CPU Usage",
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 70 },
{ "color": "red", "value": 90 }
]
}
}
}
}
// Table
{
"type": "table",
"title": "Top Endpoints",
"transformations": [
{
"id": "sortBy",
"options": {
"fields": {},
"sort": [{ "field": "Value", "desc": true }]
}
}
]
}
Prometheus Queries (PromQL)
Basic Queries
# Instant rate (requests per second)
rate(http_requests_total[5m])
# Sum by label
sum by (status_code) (rate(http_requests_total[5m]))
# Average latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m])) * 100
# CPU usage percentage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
node_memory_MemTotal_bytes * 100
# Disk usage
(node_filesystem_size_bytes - node_filesystem_avail_bytes) /
node_filesystem_size_bytes * 100
Aggregation & Filtering
# Filter by label
http_requests_total{job="api", environment="production"}
# Regex match
http_requests_total{path=~"/api/v[0-9]+/.*"}
# Not equal
http_requests_total{status!="200"}
# Rate over time window
rate(metric[5m]) # 5 minute rate
irate(metric[5m]) # Instant rate (last 2 points)
# Aggregations
sum(metric) # Total
avg(metric) # Average
max(metric) # Maximum
min(metric) # Minimum
count(metric) # Count of series
topk(5, metric) # Top 5 series
bottomk(5, metric) # Bottom 5 series
# Group by label
sum by (instance) (metric)
avg without (instance) (metric)
Variables (Templating)
{
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"current": {},
"hide": 0
},
{
"name": "environment",
"type": "query",
"datasource": "${datasource}",
"query": "label_values(up, environment)",
"refresh": 1,
"multi": false,
"includeAll": true,
"allValue": ".*"
},
{
"name": "instance",
"type": "query",
"datasource": "${datasource}",
"query": "label_values(up{environment=\"$environment\"}, instance)",
"refresh": 2,
"multi": true,
"includeAll": true
},
{
"name": "interval",
"type": "interval",
"options": [
{ "selected": false, "text": "1m", "value": "1m" },
{ "selected": true, "text": "5m", "value": "5m" },
{ "selected": false, "text": "15m", "value": "15m" }
]
}
]
}
}
Usage in queries:
rate(http_requests_total{environment=~"$environment", instance=~"$instance"}[$interval])
Alerting
Alert Rule
{
"alert": "HighErrorRate",
"expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05",
"for": "5m",
"labels": {
"severity": "critical"
},
"annotations": {
"summary": "High error rate detected",
"description": "Error rate is {{ $value | humanizePercentage }} for the last 5 minutes"
}
}
Grafana Alerting (v8+)
# provisioning/alerting/alerts.yaml
apigroups:
- name: service-alerts
folder: Alerts
interval: 1m
rules:
- uid: high-error-rate
title: High Error Rate
condition: C
data:
- refId: A
datasourceUid: prometheus
model:
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
- refId: C
datasourceUid: __expr__
model:
type: threshold
conditions:
- evaluator:
type: gt
params: [5]
for: 5m
labels:
severity: critical
annotations:
summary: Error rate above 5%
Dashboard Provisioning
File Structure
grafana/
βββ provisioning/
β βββ dashboards/
β β βββ dashboards.yaml
β βββ datasources/
β β βββ datasources.yaml
β βββ alerting/
β βββ alerts.yaml
βββ dashboards/
βββ overview.json
βββ service-details.json
Datasources Config
# provisioning/datasources/datasources.yaml
apidatasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: false
- name: InfluxDB
type: influxdb
access: proxy
url: http://influxdb:8086
database: metrics
user: admin
secureJsonData:
password: ${INFLUXDB_PASSWORD}
Dashboard Provider
# provisioning/dashboards/dashboards.yaml
apiproviders:
- name: default
orgId: 1
folder: ''
type: file
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards
Common Dashboard Patterns
RED Method (Request, Error, Duration)
# Request Rate
sum(rate(http_requests_total[5m]))
# Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m]))
# Duration (95th percentile)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
USE Method (Utilization, Saturation, Errors)
# CPU Utilization
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory Saturation
node_memory_SwapCached_bytes / node_memory_SwapTotal_bytes
# Network Errors
rate(node_network_receive_errs_total[5m])
Best Practices
- Use consistent colors - Red for errors, green for success
- Add descriptions - Panel descriptions explain what's shown
- Set meaningful thresholds - Color changes at important values
- Link related dashboards - Drill-down from overview to details
- Version control dashboards - Store JSON in git
- Use dashboard folders - Organize by team or service
When to Use
- Infrastructure monitoring
- Application performance monitoring
- Business metrics dashboards
- Real-time operational dashboards
- SLA/SLO tracking
Notes
- Grafana Cloud offers managed hosting
- Use Terraform provider for IaC
- Consider Grafana Loki for logs
- Grafana Tempo for distributed tracing
More by fgarofalo56
View allEvent routing with Azure Event Grid. Configure topics, subscriptions, event handlers, and dead-lettering. Use for event-driven architectures, serverless triggers, and reactive systems on Azure.
Build accessible web applications meeting WCAG 2.1 AA standards. Covers ARIA attributes, keyboard navigation, screen reader optimization, focus management, color contrast, semantic HTML, and accessible components. Use for a11y audits, accessible UI development, and inclusive design.
Expert guidance for converting files to Markdown using Microsoft's MarkItDown utility. Convert PDF, Word, PowerPoint, Excel, images, audio, HTML, CSV, JSON, XML, ZIP, and EPub files to LLM-friendly Markdown format. Use when processing documents for AI analysis, extracting content from files, or preparing data for language models.
Structured outputs with Instructor. Extract typed data from LLMs using Pydantic models and validation. Use for data extraction, structured generation, and type-safe LLM responses.
