Skill Analysis
Conflict detection, context budgeting, diffing, and benchmarking
doctor --deep
Deep conflict analysis across all installed skills.
skills doctor --deep
Extends the existing doctor command with a --deep flag that runs three conflict detection strategies:
| Strategy | Description |
|---|---|
| Keyword Contradiction | Detects conflicting instructions (e.g., "use tabs" vs "use spaces") |
| Topic Overlap | Jaccard similarity on section headings to find duplicate coverage |
| Rule Extraction | Extracts imperative instructions and compares across skills |
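To illustrate the Topic Overlap strategy, here is a minimal sketch of Jaccard similarity over section headings. The helper names (`heading_set`, `jaccard`) and the lowercase normalization are assumptions for illustration, not the tool's actual implementation. The sketch also does not skip `#` lines inside fenced code blocks.

```python
import re


def heading_set(markdown: str) -> set[str]:
    """Collect lowercase heading titles from a Markdown document."""
    headings = set()
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}[ \t]+(.*\S)[ \t]*$", line)
        if match:
            headings.add(match.group(1).lower())
    return headings


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


skill_a = "# Setup\n## Code Style\n## Testing\n"
skill_b = "# Setup\n## Testing\n## Deployment\n"
overlap = jaccard(heading_set(skill_a), heading_set(skill_b))
print(f"{overlap:.2f}")  # 2 shared headings out of 4 distinct ones -> 0.50
```

A score near 1.0 suggests two skills cover the same topics. The threshold at which the tool reports an overlap is not documented here.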
Output:
🩺 Agent Skills Doctor
✓ Installed skills: 16/16 valid
Deep Conflict Analysis
✓ No conflicting instructions found.
✓ No topic overlaps found.
When conflicts exist, each one is reported with a severity level and an estimate of the tokens it wastes.
budget
Smart context budget manager: loads only the most relevant skills that fit within a token limit.
skills budget -b <tokens> [options]
| Option | Description |
|---|---|
| -b, --budget <tokens> | Token budget (required, e.g. 8000) |
| -f, --format <format> | Output format: text, xml, json (default: text) |
| -m, --min-relevance <score> | Minimum relevance score (0–100, default: 10) |
| -p, --project <dir> | Project directory to analyze (default: cwd) |
| --list-only | Show ranked list without selecting |
Relevance scoring (no LLM required):
- File extension matching (project file types against skill language keywords)
- Dependency matching (package.json, requirements.txt, Cargo.toml)
- Keyword density (skill body vs project file/directory names)
- Description match against all project signals
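The signals above can be combined without an LLM using plain string matching. The following is a rough sketch only: the extension table, the equal weighting, the substring matching, and the function names are all assumptions, and the description signal is left out for brevity. The real scoring formula is not documented here.

```python
import re
from pathlib import PurePosixPath

# Hypothetical extension-to-keyword table (assumption, not the tool's own).
EXTENSION_KEYWORDS = {".py": "python", ".ts": "typescript", ".rs": "rust"}


def fraction_mentioned(terms: set[str], text: str) -> float:
    """Share of terms that occur as substrings of text; 0.0 when there are no terms."""
    if not terms:
        return 0.0
    return sum(term in text for term in terms) / len(terms)


def relevance(skill_text: str, project_files: list[str], dependencies: list[str]) -> int:
    """Average three signals (equal weights assumed) into a 0-100 score."""
    text = skill_text.lower()
    paths = [PurePosixPath(p) for p in project_files]
    # Signal 1: languages implied by the project's file extensions.
    languages = {
        EXTENSION_KEYWORDS[p.suffix] for p in paths if p.suffix in EXTENSION_KEYWORDS
    }
    # Signal 3: words of 3+ letters taken from file and directory names.
    name_tokens = {
        token
        for p in paths
        for part in p.parts
        for token in re.findall(r"[a-z]{3,}", PurePosixPath(part).stem.lower())
    }
    signals = [
        fraction_mentioned(languages, text),
        fraction_mentioned({d.lower() for d in dependencies}, text),  # Signal 2
        fraction_mentioned(name_tokens, text),
    ]
    return round(100 * sum(signals) / len(signals))


score = relevance(
    "Guidelines for Python services built with FastAPI. Covers routes and tests.",
    project_files=["app/routes.py", "tests/test_routes.py"],
    dependencies=["fastapi", "sqlalchemy"],
)
print(score)  # signals are 1.0, 0.5 and 0.75 -> 75
```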
Examples:
skills budget -b 8000 # Text output with relevance bars
skills budget -b 4000 --format xml # Agent-ready XML
skills budget -b 10000 --format json # Machine-readable
skills budget -b 6000 --min-relevance 30 # Only high-relevance skills
Output:
Context Budget Plan
Budget: 8000 tokens | Skills found: 21

Loading 4 skill(s):
[bar] skill-creator (4426 tokens, 67% relevant)
[bar] docx (2538 tokens, 64% relevant)
[bar] skill-installer (704 tokens, 23% relevant)
[bar] test-skill (106 tokens, 10% relevant)
Summary: Used 7774 / 8000 tokens
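One selection strategy consistent with the sample plan above is greedy: walk the skills from most to least relevant and keep each one that still fits. This is a sketch under that assumption. The `Skill` type, the `plan_budget` function, and the `big-skill` entry are hypothetical, and the tool may select differently.

```python
from typing import NamedTuple


class Skill(NamedTuple):
    name: str
    tokens: int
    relevance: int  # 0-100


def plan_budget(skills: list[Skill], budget: int, min_relevance: int = 10):
    """Greedy selection: most relevant first, skipping skills that no longer fit."""
    selected, used = [], 0
    for skill in sorted(skills, key=lambda s: s.relevance, reverse=True):
        if skill.relevance < min_relevance:
            break  # every remaining skill is even less relevant
        if used + skill.tokens <= budget:
            selected.append(skill)
            used += skill.tokens
    return selected, used


skills = [
    Skill("test-skill", 106, 10),
    Skill("docx", 2538, 64),
    Skill("skill-installer", 704, 23),
    Skill("skill-creator", 4426, 67),
    Skill("big-skill", 3000, 40),  # hypothetical: relevant, but too large to fit
]
selected, used = plan_budget(skills, budget=8000)
print([s.name for s in selected])
# ['skill-creator', 'docx', 'skill-installer', 'test-skill']
print(f"Used {used} / 8000 tokens")  # Used 7774 / 8000 tokens
```

Greedy selection is simple and predictable, but it is not guaranteed to maximize total relevance. An exact solution would treat this as a knapsack problem.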
diff
Section-aware comparison between two skills.
skills diff <skill-a> <skill-b> [options]
| Option | Description |
|---|---|
| --json | Output as JSON |
Parses each SKILL.md by headings and compares:
- Added sections: only in skill B
- Removed sections: only in skill A
- Changed sections: same heading, different content (with line delta)
- Token delta: how many tokens differ
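The comparison above can be sketched as a split on headings followed by set operations on the heading names. The function names and the returned structure are assumptions for illustration. The token delta is omitted because the tokenizer is not documented here, and a repeated heading overwrites the earlier section in this sketch.

```python
import re


def split_sections(markdown: str) -> dict[str, list[str]]:
    """Map each heading to the non-blank lines that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}[ \t]+(.*\S)[ \t]*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def diff_skills(skill_a: str, skill_b: str) -> dict:
    """Report headings added in B, removed from A, and changed (with line delta)."""
    a, b = split_sections(skill_a), split_sections(skill_b)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": {
            heading: len(b[heading]) - len(a[heading])
            for heading in a.keys() & b.keys()
            if a[heading] != b[heading]
        },
    }


skill_a = "# Setup\nInstall deps.\n# Style\nUse tabs.\n"
skill_b = "# Setup\nInstall deps.\nRun the linter.\n# Testing\nRun pytest.\n"
result = diff_skills(skill_a, skill_b)
print(result)
# {'added': ['Testing'], 'removed': ['Style'], 'changed': {'Setup': 1}}
```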
Examples:
skills diff frontend-design frontend-code-review
skills diff ./skill-a ./skill-b --json
Output:
Skill Diff: frontend-design vs frontend-code-review
Added: 11 sections
Removed: 2 sections
Changed: 1
Token delta: -358
bench
Benchmark and compare skills by quality, size, and coverage.
skills bench [skills...] [options]
| Option | Description |
|---|---|
| -a, --all | Benchmark all installed skills |
| --sort <field> | Sort by: quality, tokens, name (default: quality) |
| --json | Output as JSON |
| --min-quality <n> | Hide skills below this quality score (0–100) |
Quality scoring (0β100):
- Frontmatter with name (+10) and description (+10)
- Section headings (+15/+5), code blocks (+15/+5)
- Has examples (+10), has instructions (+10)
- Appropriate token range (+10), no TODOs (+10)
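The rubric above adds up to 100 if each "+15/+5" pair is read as a base award plus a bonus, which is an assumption. The sketch below applies that reading. Every threshold and heuristic in it is hypothetical: the bonus cutoffs, the token range, the four-characters-per-token estimate, and treating a list item as an "instruction".

```python
import re


def quality_score(skill_md: str, token_range: tuple[int, int] = (20, 5000)) -> int:
    """Score a SKILL.md from 0 to 100 using assumed thresholds."""
    score = 0
    frontmatter = re.match(r"\A---\n(.*?)\n---\n", skill_md, re.DOTALL)
    meta = frontmatter.group(1) if frontmatter else ""
    score += 10 if re.search(r"^name:[ \t]*\S", meta, re.MULTILINE) else 0
    score += 10 if re.search(r"^description:[ \t]*\S", meta, re.MULTILINE) else 0

    headings = len(re.findall(r"^#{1,6}[ \t]+\S", skill_md, re.MULTILINE))
    score += (15 if headings >= 1 else 0) + (5 if headings >= 3 else 0)

    # Two fence lines (open and close) make one code block.
    code_blocks = len(re.findall(r"^`{3}", skill_md, re.MULTILINE)) // 2
    score += (15 if code_blocks >= 1 else 0) + (5 if code_blocks >= 2 else 0)

    score += 10 if "example" in skill_md.lower() else 0
    has_list_item = re.search(r"^[ \t]*(?:[-*]|\d+\.)[ \t]+\S", skill_md, re.MULTILINE)
    score += 10 if has_list_item else 0

    approx_tokens = len(skill_md) // 4  # crude estimate, not a real tokenizer
    score += 10 if token_range[0] <= approx_tokens <= token_range[1] else 0
    score += 10 if "TODO" not in skill_md else 0
    return score


FENCE = "`" * 3  # built at runtime so this example contains no nested fence
sample = "\n".join([
    "---",
    "name: demo-skill",
    "description: Shows the scoring rubric.",
    "---",
    "# Demo",
    "## Instructions",
    "- Run the formatter before committing.",
    "## Example",
    FENCE + "sh",
    "make fmt",
    FENCE,
])
print(quality_score(sample))  # 95: everything except the second-code-block bonus
```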
Examples:
skills bench --all # All skills, sorted by quality
skills bench --all --sort tokens # Sorted by size
skills bench --min-quality 80 # Only high-quality skills
Output:
Skill Benchmark Results
Skill                 Quality  Tokens  Sections  Code  Features
─────────────────────────────────────────────────────────────────────────
add-uint-support      [bar]    2311    24        13    frontmatter, examples, instructions
frontend-code-review  [bar]    711     11        2     frontmatter, examples, instructions
frontend-design       [bar]    1069    2         0     frontmatter, examples, instructions
Summary: 17 skills | Avg quality: 90% | Total tokens: 41279
Legend (Features): frontmatter, examples, instructions