cm

by Dicklesworthstone · @Dicklesworthstone/cm · 0 forks · Updated 1/6/2026 · View on GitHub

CASS Memory System - procedural memory for AI coding agents. Three-layer cognitive architecture transforms scattered sessions into persistent, cross-agent learnings with confidence decay, anti-pattern learning, and scientific validation.

Installation

$ skills install @Dicklesworthstone/cm

Supported agents: Claude Code, Cursor, Copilot, Codex, Antigravity

Details

Path          skills/cm/SKILL.md
Branch        main
Scoped Name   @Dicklesworthstone/cm

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions


name: cm
description: "CASS Memory System - procedural memory for AI coding agents. Three-layer cognitive architecture transforms scattered sessions into persistent, cross-agent learnings with confidence decay, anti-pattern learning, and scientific validation."

CM - CASS Memory System

Procedural memory for AI coding agents. Transforms scattered agent sessions into persistent, cross-agent memory so every agent learns from every other agent's experience.


Critical Concepts for AI Agents

The Three-Layer Cognitive Architecture

Layer               Role                      Storage               Tool
Episodic Memory     Raw session transcripts   ~/.local/share/cass   cass
Working Memory      Session summaries         Diary entries         cm reflect
Procedural Memory   Distilled action rules    Playbook              cm

Flow: Sessions → Diary summaries → Playbook rules

Why This Matters

Without cm, each agent session starts from zero. With cm:

  • Rules that helped in past sessions get reinforced
  • Anti-patterns that caused failures become explicit warnings
  • Cross-agent learning means Agent B benefits from Agent A's mistakes
  • Confidence decay naturally retires stale guidance

Quick Reference for AI Agents

Start of Session

# Get relevant rules and history for your task
cm context "implementing OAuth authentication" --json

# Output includes:
# - Relevant playbook rules with scores
# - Related diary entries
# - Gap analysis (uncovered areas)

During Work

# Find rules about a topic
cm similar "error handling" --json

# Check if a pattern is validated
cm validate "Always use prepared statements for SQL"

End of Session

# Record which rules helped
cm outcome success "RULE-123,RULE-456"

# Record which rules caused problems
cm outcome failure "RULE-789"

# Apply recorded outcomes
cm outcome-apply

Periodic Maintenance

# Extract new rules from recent sessions
cm reflect

# Find stale rules needing re-validation
cm stale --days 30

# System health
cm doctor

The ACE Pipeline

CM uses a four-stage pipeline to extract and curate rules:

Sessions → Generator → Reflector → Validator → Curator → Playbook
              ↓            ↓            ↓           ↓
           Diary      Candidates    Evidence    Final Rules
          Entries       (LLM)        (LLM)      (NO LLM!)

Stage Details

Stage       Uses LLM   Purpose
Generator   Yes        Summarize sessions into diary entries
Reflector   Yes        Propose candidate rules from patterns
Validator   Yes        Check rules against historical evidence
Curator     NO         Deterministic merge into playbook

CRITICAL: The Curator is intentionally LLM-free to prevent hallucinated provenance. All rule additions must trace to actual session evidence.
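
To make the division of labor concrete, here is a hypothetical Python skeleton of the four stages. The llm.* helpers are illustrative stand-ins, not cm's actual API; only the shape of the flow is taken from the documentation above.

# Hypothetical skeleton of the ACE flow; llm.* helpers are stand-ins.
def ace_pipeline(sessions: list, llm) -> list:
    diary = [llm.summarize(s) for s in sessions]        # Generator (LLM)
    candidates = llm.propose_rules(diary)               # Reflector (LLM)
    evidenced = [c for c in candidates
                 if llm.check_evidence(c, diary)]       # Validator (LLM)
    return curate(evidenced)                            # Curator (no LLM)

def curate(candidates: list) -> list:
    # Deterministic: keep only rules that cite real session evidence,
    # so no provenance can be hallucinated at this stage.
    return [c for c in candidates if c.get("source_sessions")]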


Confidence & Decay System

The Scoring Algorithm

Every rule has a confidence score that decays over time:

score = base_confidence × decay_factor × feedback_modifier

Where:

  • base_confidence: Initial confidence (0.0-1.0)
  • decay_factor: 0.5^(days_since_feedback / 90) (90-day half-life)
  • feedback_modifier: Accumulated helpful/harmful signals
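
A minimal Python sketch of this formula (illustrative; cm's internal implementation may differ):

from datetime import datetime, timezone

def rule_score(base_confidence: float, last_feedback: datetime,
               feedback_modifier: float, half_life_days: float = 90.0) -> float:
    # score = base_confidence x decay_factor x feedback_modifier
    days = (datetime.now(timezone.utc) - last_feedback).total_seconds() / 86400
    decay_factor = 0.5 ** (days / half_life_days)  # 90-day half-life
    return base_confidence * decay_factor * feedback_modifier

At 90 days without feedback the decay factor is exactly 0.5, matching the visualization below.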

Decay Visualization

Day 0:   ████████████████████ 1.00
Day 45:  ██████████████       0.71
Day 90:  ██████████           0.50  ← Half-life
Day 180: █████                0.25
Day 270: ██                   0.125

Feedback Multipliers

Feedback   Multiplier   Effect
Helpful    1.0x         Standard positive reinforcement
Harmful    4.0x         Aggressive penalty (asymmetric by design)

Why asymmetric? Bad advice is more damaging than good advice is helpful. A harmful rule should decay 4x faster than a helpful one recovers.
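
One way this asymmetry could be realized as an update rule (the step size and exact accumulation are assumptions; the docs specify only the 1.0x/4.0x weighting):

def apply_feedback(modifier: float, helpful: bool,
                   step: float = 0.05, harmful_multiplier: float = 4.0) -> float:
    # Assumed update rule: harmful feedback moves the modifier 4x as far
    # in the negative direction as helpful feedback moves it positively.
    delta = step if helpful else -step * harmful_multiplier
    return max(0.0, modifier + delta)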

Anti-Pattern Learning

When a rule accumulates too much harmful feedback, it doesn't just disappear—it inverts:

Original:  "Always use global state for configuration"
           ↓ (harmful feedback accumulates)
Inverted:  "⚠️ ANTI-PATTERN: Avoid global state for configuration"

The inverted anti-pattern becomes a warning that prevents future agents from making the same mistake.
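
A sketch of what the inversion trigger might look like (the threshold and the prefix-only rewrite are simplifying assumptions; cm rewrites the rule text itself, as in the example above):

def maybe_invert(rule: dict, harmful_threshold: int = 3) -> dict:
    # Assumed threshold; the real trigger is cm's accumulated-feedback logic.
    if rule["harmful_count"] >= harmful_threshold and not rule["is_anti_pattern"]:
        rule["is_anti_pattern"] = True
        rule["content"] = "⚠️ ANTI-PATTERN: " + rule["content"]
    return rule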


Command Reference

cm context — Get Task-Relevant Memory

The primary command for starting any task.

# Basic context retrieval
cm context "implementing user authentication"

# JSON output for programmatic use
cm context "database migration" --json

# Deeper historical context
cm context "API refactoring" --depth deep

# Include gap analysis
cm context "payment processing" --gaps

Output includes:

  • Top relevant playbook rules (ranked by score × relevance)
  • Related diary entries from past sessions
  • Gap analysis (categories with thin coverage)
  • Suggested starter rules for uncovered areas
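
A minimal sketch of consuming this output from an agent harness (the field names are assumptions; inspect the real schema from a live cm context ... --json call):

import json
import subprocess

result = subprocess.run(
    ["cm", "context", "implementing OAuth authentication", "--json"],
    capture_output=True, text=True, check=True,
)
ctx = json.loads(result.stdout)
# "rules", "id", "score", "content" are assumed field names, not a documented schema.
for rule in ctx.get("rules", []):
    print(rule.get("id"), rule.get("score"), rule.get("content"))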

cm top — Highest-Scoring Rules

# Top 10 rules by confidence score
cm top 10

# JSON output
cm top 20 --json

# Filter by category
cm top 10 --category testing

cm similar — Find Related Rules

# Semantic search over playbook
cm similar "error handling patterns"

# With scores
cm similar "authentication flow" --scores

# JSON output
cm similar "database queries" --json

cm playbook — Manage the Playbook

# List all rules
cm playbook list

# Statistics
cm playbook stats

# Export for documentation
cm playbook export --format md > PLAYBOOK.md
cm playbook export --format json > playbook.json

# Import rules
cm playbook import rules.json

cm why — Rule Provenance

# Show evidence chain for a rule
cm why RULE-123

# Output shows:
# - Original session(s) that generated the rule
# - Diary entries that led to extraction
# - Feedback history
# - Confidence trajectory

cm mark — Provide Feedback

# Mark as helpful (reinforces rule)
cm mark RULE-123 --helpful

# Mark as harmful (penalizes rule, may trigger inversion)
cm mark RULE-123 --harmful

# With context
cm mark RULE-123 --helpful --reason "Prevented auth vulnerability"

# Undo feedback
cm undo RULE-123

cm reflect — Extract Rules from Sessions

# Process recent sessions
cm reflect

# Specific time range
cm reflect --since "7d"
cm reflect --since "2024-01-01"

# Dry run (show what would be extracted)
cm reflect --dry-run

# Force re-processing of already-reflected sessions
cm reflect --force

cm audit — Check Sessions Against Rules

# Audit recent sessions for rule violations
cm audit

# Specific time range
cm audit --since "24h"

# JSON output
cm audit --json

cm validate — Test a Proposed Rule

# Check if a rule has historical support
cm validate "Always use transactions for multi-table updates"

# Output shows:
# - Supporting evidence (sessions where this helped)
# - Contradicting evidence (sessions where this hurt)
# - Recommendation (add/skip/needs-more-data)

cm outcome — Record Session Results

# Record which rules helped
cm outcome success "RULE-123,RULE-456,RULE-789"

# Record which rules hurt
cm outcome failure "RULE-999"

# Apply all pending outcomes
cm outcome-apply

# Clear pending outcomes without applying
cm outcome-clear

cm stale — Find Stale Rules

# Find rules without recent feedback
cm stale

# Custom threshold
cm stale --days 60

# JSON output
cm stale --json

# Include decay projection
cm stale --project

cm forget — Deprecate Rules

# Soft-delete a rule
cm forget RULE-123

# With reason
cm forget RULE-123 --reason "No longer relevant after framework change"

# Force (skip confirmation)
cm forget RULE-123 --force

cm doctor — System Health

# Run diagnostics
cm doctor

# Auto-fix issues
cm doctor --fix

# JSON output
cm doctor --json

Checks:

  • cass installation and accessibility
  • Playbook integrity
  • Diary consistency
  • Configuration validity
  • Session index freshness

cm usage — Usage Statistics

# Show usage stats
cm usage

# JSON output
cm usage --json

cm stats — Playbook Health Metrics

# Show playbook health
cm stats

# Output includes:
# - Total rules
# - Average confidence
# - Category distribution
# - Stale rule count
# - Anti-pattern count

Agent-Native Onboarding

CM includes a guided onboarding system that requires zero API calls:

cm onboard

The onboarding wizard:

  1. Explains the three-layer architecture
  2. Walks through basic commands
  3. Seeds initial rules from session history
  4. Sets up appropriate starter playbook
  5. Configures privacy preferences

No LLM required — onboarding works offline.


Starter Playbooks

Pre-built playbooks for common tech stacks:

# List available starters
cm starters

# Initialize with a starter
cm init --starter typescript
cm init --starter python
cm init --starter go
cm init --starter rust

Available starters:

  • typescript - TS/JS patterns, npm, testing
  • python - Python idioms, pip, pytest
  • go - Go conventions, modules, testing
  • rust - Rust patterns, cargo, clippy
  • general - Language-agnostic best practices

Gap Analysis

CM tracks which categories have thin coverage:

cm context "some task" --gaps

Gap analysis shows:

  • Categories with few rules
  • Categories with low-confidence rules
  • Suggested areas for rule extraction

This helps agents identify blind spots in the collective memory.


Batch Rule Addition

For bulk importing rules:

# From JSON
cm playbook import rules.json

# From markdown
cm playbook import rules.md

# With validation
cm playbook import rules.json --validate

JSON format:

[
  {
    "content": "Always validate user input at API boundaries",
    "category": "security",
    "confidence": 0.8,
    "source": "manual"
  }
]
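
A small script can emit this file using only the documented fields, then hand it to cm playbook import:

import json

rules = [
    {
        "content": "Always validate user input at API boundaries",
        "category": "security",
        "confidence": 0.8,
        "source": "manual",
    },
]
with open("rules.json", "w") as f:
    json.dump(rules, f, indent=2)
# Then: cm playbook import rules.json --validate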

Data Models

PlaybookBullet

{
  "id": "RULE-abc123",
  "content": "Use parameterized queries for all database access",
  "category": "security",
  "confidence": 0.85,
  "created_at": "2024-01-15T10:30:00Z",
  "last_feedback": "2024-03-20T14:22:00Z",
  "helpful_count": 12,
  "harmful_count": 1,
  "source_sessions": ["session-xyz", "session-abc"],
  "is_anti_pattern": false,
  "maturity": "validated"
}
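
For typed handling on the agent side, the documented shape maps onto a simple dataclass (types inferred from the example values above):

from dataclasses import dataclass, field

@dataclass
class PlaybookBullet:
    # Mirrors the documented JSON; timestamps stay as ISO 8601 strings.
    id: str
    content: str
    category: str
    confidence: float
    created_at: str
    last_feedback: str
    helpful_count: int
    harmful_count: int
    source_sessions: list[str] = field(default_factory=list)
    is_anti_pattern: bool = False
    maturity: str = "proposed"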

FeedbackEvent

{
  "rule_id": "RULE-abc123",
  "type": "helpful",
  "reason": "Prevented SQL injection in auth flow",
  "session_id": "session-current",
  "timestamp": "2024-03-20T14:22:00Z"
}

DiaryEntry

{
  "id": "diary-xyz789",
  "session_id": "session-abc",
  "summary": "Implemented OAuth2 flow with PKCE",
  "patterns_observed": ["token-refresh", "secure-storage"],
  "issues_encountered": ["redirect-uri-mismatch"],
  "candidate_rules": ["Always use state parameter in OAuth"],
  "created_at": "2024-03-19T16:45:00Z"
}

SessionOutcome

{
  "session_id": "session-current",
  "helpful_rules": ["RULE-123", "RULE-456"],
  "harmful_rules": ["RULE-789"],
  "recorded_at": "2024-03-20T17:00:00Z",
  "applied": false
}

Rule Maturity States

Rules progress through maturity stages:

proposed → validated → mature → stale → deprecated
    ↓           ↓          ↓        ↓         ↓
  Needs      Evidence    Proven   Needs     Soft-
 evidence    confirmed   helpful  refresh   deleted

State        Meaning                          Action
proposed     Newly extracted, unvalidated     Await evidence
validated    Has supporting evidence          Monitor feedback
mature       Consistently helpful over time   Trust highly
stale        No recent feedback (>90 days)    Seek re-validation
deprecated   Marked for removal               Will be purged

MCP Server

Run cm as an MCP server for direct agent integration:

# Start server
cm serve

# Custom port
cm serve --port 9000

# With logging
cm serve --verbose

MCP Tools

Tool              Description
get_context       Retrieve task-relevant rules and history
search_rules      Semantic search over playbook
record_feedback   Mark rules as helpful/harmful
record_outcome    Record session outcome with rule attribution
get_stats         Get playbook health metrics
validate_rule     Check proposed rule against evidence

MCP Resources

Resource             Description
playbook://rules     Full playbook as JSON
playbook://top/{n}   Top N rules by score
playbook://stale     Rules needing re-validation
diary://recent/{n}   Recent diary entries
stats://health       Playbook health metrics

Configuration

Directory Structure

~/.config/cm/
├── config.toml           # Main configuration
├── playbook.json         # Rule storage
└── diary/                # Session summaries
    └── *.json

.cm/                      # Project-local config
├── config.toml           # Project overrides
└── playbook.json         # Project-specific rules

Config File Reference

# ~/.config/cm/config.toml

[general]
# LLM model for Generator/Reflector/Validator
model = "claude-sonnet-4-20250514"

# Auto-apply outcomes after session
auto_apply_outcomes = false

# Check for updates
check_updates = true

[decay]
# Half-life in days
half_life_days = 90

# Harmful feedback multiplier
harmful_multiplier = 4.0

# Minimum score before deprecation
min_score = 0.1

[reflection]
# Minimum sessions before reflecting
min_sessions = 3

# Auto-reflect on session end
auto_reflect = false

[privacy]
# Enable cross-agent learning
cross_agent_enrichment = true

# Anonymize session data in rules
anonymize_sources = false

[mcp]
# MCP server port
port = 8080

# Enable MCP server
enabled = false
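
If external tooling needs these settings, here is a sketch of loading them with project-local overrides (a shallow top-level merge, assumed for illustration; cm's actual precedence may differ):

import tomllib  # Python 3.11+
from pathlib import Path

def load_cm_config() -> dict:
    # Global config first, then project-local .cm/config.toml on top.
    cfg: dict = {}
    for path in (Path.home() / ".config/cm/config.toml", Path(".cm/config.toml")):
        if path.exists():
            with open(path, "rb") as f:
                cfg.update(tomllib.load(f))  # shallow merge of top-level tables
    return cfg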

Environment Variables

Variable        Description                  Default
CM_CONFIG_DIR   Configuration directory      ~/.config/cm
CM_DATA_DIR     Data storage directory       ~/.local/share/cm
CM_MODEL        LLM model for ACE pipeline   claude-sonnet-4-20250514
CM_HALF_LIFE    Decay half-life in days      90
CM_MCP_PORT     MCP server port              8080

Privacy Controls

# View current privacy settings
cm privacy

# Disable cross-agent learning
cm privacy --disable-enrichment

# Enable cross-agent learning
cm privacy --enable-enrichment

# Anonymize sources in exported playbooks
cm privacy --anonymize-export

What Cross-Agent Enrichment Means

When enabled:

  • Rules extracted from Agent A's sessions can help Agent B
  • Diary entries reference sessions across agents
  • Collective learning improves all agents

When disabled:

  • Each agent's playbook is isolated
  • No session data shared between identities
  • Rules only come from your own sessions

Project Integration

Export playbook for project documentation:

# Generate project patterns doc
cm project --output docs/PATTERNS.md

# Include confidence scores
cm project --output docs/PATTERNS.md --scores

# Only mature rules
cm project --output docs/PATTERNS.md --mature-only

This creates a human-readable document of learned patterns for team reference.


Graceful Degradation

CM degrades gracefully when components are unavailable:

Scenario         Behavior
No cass index    Works from diary only
No LLM access    Curator still works, reflection paused
Stale sessions   Uses cached diary entries
Empty playbook   Returns starter suggestions

Performance Characteristics

Operation     Typical Time   Notes
cm context    50-200ms       Depends on playbook size
cm similar    100-300ms      Semantic search overhead
cm reflect    2-10s          LLM calls for Generator/Reflector
cm validate   1-3s           LLM call for Validator
cm mark       <50ms          Pure database operation

Integration with CASS

CM builds on top of cass (Coding Agent Session Search):

# cass provides raw session search
cass search "authentication" --robot

# cm transforms that into procedural memory
cm context "authentication"

The typical workflow:

  1. Work happens in agent sessions (stored by various tools)
  2. cass indexes and searches those sessions
  3. cm reflect extracts patterns into diary/playbook
  4. cm context retrieves relevant knowledge for new tasks

Exit Codes

Code   Meaning
0      Success
1      General error
2      Configuration error
3      Validation failed
4      LLM error (reflection/validation)
5      No data (empty results)

Troubleshooting

Common Issues

Problem                       Solution
"No sessions found"           Run cass reindex to rebuild session index
"Reflection failed"           Check LLM API key and model availability
"Stale playbook"              Run cm reflect to process recent sessions
"Low confidence everywhere"   Natural decay; use cm mark --helpful to reinforce

Debug Mode

# Verbose output
cm context "task" --verbose

# Show scoring details
cm top 10 --debug

# Trace decay calculations
cm stats --trace-decay

Ready-to-Paste AGENTS.md Blurb

## cm - CASS Memory System

Procedural memory for AI coding agents. Transforms scattered sessions into
persistent cross-agent learnings with confidence decay and anti-pattern detection.

### Quick Start
cm context "your task" --json       # Get relevant rules
cm mark RULE-ID --helpful           # Reinforce good rules
cm outcome success "RULE-1,RULE-2"  # Record session results
cm reflect                          # Extract new rules

### Three Layers
- Episodic: Raw sessions (via cass)
- Working: Diary summaries
- Procedural: Playbook rules (this tool)

### Key Features
- 90-day confidence half-life (stale rules decay)
- 4x penalty for harmful rules (asymmetric by design)
- Anti-pattern auto-inversion (bad rules become warnings)
- Cross-agent learning (everyone benefits)
- LLM-free Curator (no hallucinated provenance)

### Essential Commands
cm context "task" --json    # Start of session
cm similar "pattern"        # Find related rules
cm mark ID --helpful/harmful # Give feedback
cm outcome success "IDs"    # End of session
cm reflect                  # Periodic maintenance

Exit codes: 0=success, 1=error, 2=config, 3=validation, 4=LLM, 5=no-data

Workflow Example: Complete Session

# 1. Start task - get relevant context
cm context "implementing rate limiting for API" --json
# → Returns rules about rate limiting, caching, API design

# 2. Note which rules you're applying
# (mental note: RULE-123 about token buckets, RULE-456 about Redis)

# 3. During work, if you find a useful pattern
cm validate "Use sliding window for rate limit precision"
# → Shows if this has historical support

# 4. End of session - record what helped
cm outcome success "RULE-123,RULE-456"

# 5. If something hurt (caused bugs/issues)
cm outcome failure "RULE-789"

# 6. Apply the feedback
cm outcome-apply

# 7. Periodically, extract new rules
cm reflect --since "7d"

Philosophy: Why Procedural Memory?

Episodic memory (raw sessions) is too noisy for real-time use. Working memory (summaries) lacks actionability. Procedural memory distills both into executable rules that directly guide behavior.

The key insight: Rules should be testable hypotheses, not static commandments. The confidence decay system treats every rule as a hypothesis that requires ongoing validation. Rules that stop being useful naturally fade; rules that keep helping get reinforced.

This mirrors how human expertise works: you don't remember every project you've done, but you retain the patterns that proved useful across many projects.