episode

@Motium-AI/episode

Updated 4/7/2026

Generate educational video episodes with Minecraft-style graphics. Orchestrates fal.ai (Kling video, Flux images), ElevenLabs TTS, and FFmpeg assembly into complete episodes. Use when asked to "generate an episode", "create educational video", "produce an episode", or "/episode".

Installation

npx agent-skills-cli install @Motium-AI/episode

Details

Repository: Motium-AI/claude-code-toolkit

Path: config/skills/episode/SKILL.md

Branch: main

Scoped Name: @Motium-AI/episode

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions

name: episode
description: >
  Generate educational video episodes with Minecraft-style graphics. Orchestrates fal.ai (Kling video, Flux images), ElevenLabs TTS, and FFmpeg assembly into complete episodes. Use when asked to "generate an episode", "create educational video", "produce an episode", or "/episode".

Educational Video Episode Generator (/episode)

Autonomous skill that generates complete educational video episodes with Minecraft-style graphics. Claude acts as Creative Director, writing the script and orchestrating AI generation APIs to produce the final video.

Architecture

/episode "How photosynthesis works"
    │
    ├── Phase 0: Activation (state file, dependency check)
    ├── Phase 1: Script Generation (Claude writes episode script as JSON manifest)
    ├── Phase 2: Media Generation (pipeline.py calls APIs)
    │   ├── Images: Flux via fal.ai
    │   ├── Video clips: Kling I2V via fal.ai
    │   ├── Audio: ElevenLabs TTS
    │   └── Assembly: FFmpeg
    ├── Phase 3: Review (human watches output)
    └── Phase 4: Complete (checkpoint validation)

Triggers

/episode <topic>
"generate an episode about..."
"create educational video on..."
"produce an episode explaining..."

Phase 0: Activation

State File (Automatic)

Create .claude/autonomous-state.json with "mode": "episode" at activation.
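A minimal sketch of the activation step, assuming the state file is plain JSON. Only the "mode" field is documented here; any other fields the toolkit expects are unknown, and the helper name write_state is hypothetical.

```python
import json
from pathlib import Path

STATE_PATH = Path(".claude/autonomous-state.json")


def write_state(path: Path = STATE_PATH, mode: str = "episode") -> dict:
    """Create the state file, making the parent directory if needed."""
    # Only "mode" is documented; other fields are not known and are omitted.
    state = {"mode": mode}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return state
```

Calling write_state() from the project root produces the file at the path named above.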

Dependency Check

Before proceeding, verify dependencies are installed:

python3 -c "import fal_client; import elevenlabs" 2>/dev/null || {
    echo "Installing dependencies..."
    pip3 install fal-client elevenlabs
}

API Keys Check

Required environment variables:

FAL_KEY - fal.ai API key
ELEVENLABS_API_KEY - ElevenLabs API key

[ -z "$FAL_KEY" ] && echo "ERROR: FAL_KEY not set" && exit 1
[ -z "$ELEVENLABS_API_KEY" ] && echo "ERROR: ELEVENLABS_API_KEY not set" && exit 1

If either key is missing, ask the user ONCE at the start, then proceed autonomously.
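To ask only once, it can help to collect every missing key before prompting. The sketch below does that in Python; the function name missing_keys is hypothetical and not part of the skill.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("FAL_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        # One message listing everything, so the user is asked a single time.
        print("Missing API keys: " + ", ".join(missing))
```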

Phase 1: Script Generation (Claude as Creative Director)

You ARE the creative director. Write the episode script as a JSON manifest.

Episode Structure: The 7-Act Framework

Every episode follows this learning-science-backed narrative structure:

Act	Scenes	Purpose	Minecraft Metaphor
SPARK	1 scene	Hook with wonder/mystery	Discovering a glowing ore
QUEST	2 scenes	Frame learning as journey	Setting out from spawn
MAP	2 scenes	Overview before deep dive	Crafting a map
MINE	4-5 scenes	Core concept exploration	Mining at different depths
CRAFT	3 scenes	Synthesis, the "aha" moment	Crafting table assembly
BUILD	3 scenes	Application, abstract→concrete	Building a structure
PORTAL	2 scenes	Climax + "what's next" hook	Activating a portal
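One way to check a drafted script against the act sequence in the table above is sketched here. It takes the list of "act" values in scene order; the helper names are hypothetical, and per-act scene counts are deliberately not enforced.

```python
ACT_ORDER = ["SPARK", "QUEST", "MAP", "MINE", "CRAFT", "BUILD", "PORTAL"]


def acts_in_order(scene_acts: list[str]) -> bool:
    """True if every act is known and the sequence never goes backwards."""
    try:
        indices = [ACT_ORDER.index(act) for act in scene_acts]
    except ValueError:
        return False  # an act name that is not part of the framework
    return all(a <= b for a, b in zip(indices, indices[1:]))


def missing_acts(scene_acts: list[str]) -> list[str]:
    """Acts from the framework that the script does not cover yet."""
    return [act for act in ACT_ORDER if act not in scene_acts]
```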

Scene Count by Duration

Episode Duration	Total Scenes	Clips per Scene	Total Clips
3 minutes	12-15 scenes	1-2 clips	15-20 clips
10 minutes	18-22 scenes	2-3 clips	40-50 clips

Minecraft Visual Style Guide

ALWAYS include this prefix in visual prompts:

"Minecraft-style 3D voxel world. Blocky cubic geometry, pixel-art textures, warm ambient lighting. Characters are blocky humanoid figures with square heads. Bright saturated colors, soft shadows. Low-poly aesthetic."

Visual Metaphor Dictionary:

Abstract Concept	Minecraft Visual
Variable	Chest with name tag
Function	Redstone circuit
Data flow	Minecart on rails
Memory	Storage room with chests
Error/bug	Creeper hiding in build
Process	Furnace smelting
Input/Output	Hopper feeding items
Hierarchy	Stacked blocks, scaffolding
Connection	Redstone wire linking blocks
Transformation	Crafting animation
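The style prefix and the metaphor dictionary can be kept as data so every prompt is built the same way. This is an illustrative sketch only; build_visual_prompt and metaphor_for are hypothetical helpers, not functions from pipeline.py.

```python
from typing import Optional

STYLE_PREFIX = (
    "Minecraft-style 3D voxel world. Blocky cubic geometry, pixel-art textures, "
    "warm ambient lighting. Characters are blocky humanoid figures with square heads. "
    "Bright saturated colors, soft shadows. Low-poly aesthetic."
)

# Abstract concept -> Minecraft visual, copied from the dictionary above.
METAPHORS = {
    "variable": "Chest with name tag",
    "function": "Redstone circuit",
    "data flow": "Minecart on rails",
    "memory": "Storage room with chests",
    "error/bug": "Creeper hiding in build",
    "process": "Furnace smelting",
    "input/output": "Hopper feeding items",
    "hierarchy": "Stacked blocks, scaffolding",
    "connection": "Redstone wire linking blocks",
    "transformation": "Crafting animation",
}


def build_visual_prompt(description: str) -> str:
    """Prepend the style prefix unless the prompt already starts with it."""
    description = description.strip()
    if description.startswith(STYLE_PREFIX):
        return description
    return f"{STYLE_PREFIX} {description}"


def metaphor_for(concept: str) -> Optional[str]:
    """Look up a concept case-insensitively; None if it is not listed."""
    return METAPHORS.get(concept.strip().lower())
```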

Manifest Schema

Create the manifest at episodes/EP{NNN}/manifest.json:

{
  "episode_id": "EP001",
  "title": "How Photosynthesis Works",
  "topic": "photosynthesis",
  "target_duration_seconds": 180,
  "style": "minecraft",
  "created_at": "2026-01-30T21:00:00Z",
  "cost_budget_usd": 25.00,
  "cost_spent_usd": 0.00,

  "scenes": [
    {
      "scene_id": "scene-001",
      "sequence": 1,
      "act": "SPARK",
      "duration_seconds": 10,
      "narration": "What if I told you that every leaf is a tiny factory?",
      "visual_prompt": "Minecraft-style forest clearing at sunrise. Sunbeams pierce through blocky oak leaves. Golden light particles float in the air. A blocky character looks up in wonder at the glowing canopy.",
      "camera": "slow_pan_up",
      "music_mood": "wonder",

      "image": {
        "status": "pending",
        "fal_request_id": null,
        "asset_path": null,
        "cost_usd": 0
      },
      "clip": {
        "status": "pending",
        "fal_request_id": null,
        "asset_path": null,
        "cost_usd": 0
      },
      "audio": {
        "status": "pending",
        "asset_path": null,
        "cost_usd": 0
      }
    }
  ],

  "assembly": {
    "status": "pending",
    "asset_path": null
  }
}
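When writing many scenes, a small builder keeps each entry consistent with the schema above. The sketch below is an assumption about how one might do this; new_scene and pending_asset are hypothetical names, and the default camera and music_mood values are simply taken from this document's examples.

```python
import json


def pending_asset(via_fal: bool) -> dict:
    """A not-yet-generated asset entry; fal.ai assets also carry a request id."""
    asset = {"status": "pending"}
    if via_fal:
        asset["fal_request_id"] = None
    asset.update({"asset_path": None, "cost_usd": 0})
    return asset


def new_scene(sequence: int, act: str, narration: str, visual_prompt: str,
              duration_seconds: int = 10, camera: str = "static",
              music_mood: str = "wonder") -> dict:
    """Build one scene entry with all media marked as pending."""
    return {
        "scene_id": f"scene-{sequence:03d}",
        "sequence": sequence,
        "act": act,
        "duration_seconds": duration_seconds,
        "narration": narration,
        "visual_prompt": visual_prompt,
        "camera": camera,
        "music_mood": music_mood,
        "image": pending_asset(via_fal=True),
        "clip": pending_asset(via_fal=True),
        "audio": pending_asset(via_fal=False),
    }


if __name__ == "__main__":
    scene = new_scene(1, "SPARK", "What if every leaf is a tiny factory?",
                      "Minecraft-style forest clearing at sunrise.")
    print(json.dumps(scene, indent=2))
```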

Writing the Script

Create episode directory:

mkdir -p episodes/EP001/assets/{images,clips,audio}

Write manifest.json with all scenes, following the 7-act structure.
Each scene needs:
- narration: What the narrator says (10-20 words per 10 seconds)
- visual_prompt: Detailed Minecraft-style visual description (include the style prefix)
- camera: Camera movement (e.g. static, slow_pan, zoom_in, tracking)
- duration_seconds: 8-15 seconds per scene
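The pacing rules in the list above can be checked mechanically before spending money on generation. This is a rough sketch, assuming "10-20 words per 10 seconds" scales linearly with scene length (1 to 2 words per second); scene_warnings is a hypothetical helper.

```python
def scene_warnings(scene: dict) -> list[str]:
    """Return human-readable warnings for one scene; empty if it looks fine."""
    warnings = []
    duration = scene["duration_seconds"]
    if not 8 <= duration <= 15:
        warnings.append(f"duration {duration}s is outside 8-15s")

    words = len(scene["narration"].split())
    # Assumption: 10-20 words per 10 s means 1 to 2 words per second.
    low, high = duration, 2 * duration
    if not low <= words <= high:
        warnings.append(
            f"narration has {words} words, expected {low}-{high} for {duration}s"
        )
    return warnings
```

Whitespace-separated tokens are used as the word count, which is only an approximation of spoken length.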

Phase 2: Media Generation

Run the pipeline script to generate all media:

python3 ~/.claude/skills/episode/scripts/pipeline.py episodes/EP001/manifest.json

The script handles:

Images → Flux via fal.ai (~3-10s per image)
Video clips → Kling I2V via fal.ai (~60-180s per clip)
Audio → ElevenLabs TTS (~2-5s per scene)
Assembly → FFmpeg concat with crossfades

Monitoring Progress

The manifest is updated after each operation. Check progress:

jq '.scenes[] | {scene_id, image: .image.status, clip: .clip.status, audio: .audio.status}' episodes/EP001/manifest.json

Resume from Checkpoint

If the pipeline crashes or times out, simply re-run:

python3 ~/.claude/skills/episode/scripts/pipeline.py episodes/EP001/manifest.json

The script reads the manifest and skips completed work.
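The resume behaviour can be pictured as a filter over the manifest. The sketch below is not the actual pipeline.py logic, which is not shown in this document; it assumes a finished asset has status "completed", and that value is a guess because only "pending" appears in the schema.

```python
DONE_STATUS = "completed"  # assumption: the real status string is not documented
ASSET_KINDS = ("image", "clip", "audio")


def pending_work(manifest: dict) -> list[tuple[str, str]]:
    """List (scene_id, asset_kind) pairs that still need to be generated."""
    todo = []
    for scene in manifest["scenes"]:
        for kind in ASSET_KINDS:
            if scene[kind]["status"] != DONE_STATUS:
                todo.append((scene["scene_id"], kind))
    return todo
```

Because each asset records its own status, a re-run only has to process the pairs this function returns.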

Cost Tracking

The manifest tracks costs per operation. Check total:

jq '.cost_spent_usd' episodes/EP001/manifest.json

Expected costs:

3-minute episode: $15-22
10-minute episode: $48-68
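A budget check against the manifest might look like the sketch below. It assumes the total is the sum of the per-asset cost_usd fields and that cost_budget_usd is a hard cap; neither is stated explicitly here, and both function names are hypothetical.

```python
ASSET_KINDS = ("image", "clip", "audio")


def total_spent(manifest: dict) -> float:
    """Sum cost_usd over every image, clip, and audio entry."""
    total = sum(
        scene[kind]["cost_usd"]
        for scene in manifest["scenes"]
        for kind in ASSET_KINDS
    )
    return round(total, 2)  # round to cents to hide float noise


def over_budget(manifest: dict) -> bool:
    """True if the summed spend exceeds the manifest's budget."""
    return total_spent(manifest) > manifest["cost_budget_usd"]
```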

Phase 3: Review

After the pipeline completes, the final video is at:

episodes/EP001/episode.mp4

Human review is required. Watch the video and verify:

Narration is clear and educational
Visuals match the Minecraft style
Pacing feels natural
No jarring cuts or artifacts
Audio levels are balanced

Phase 4: Complete

Completion Checkpoint Schema

{
  "self_report": {
    "is_job_complete": true,
    "code_changes_made": true,
    "linters_pass": true,
    "category": "pattern"
  },
  "reflection": {
    "what_was_done": "Generated 3-min episode 'How Photosynthesis Works' with 15 scenes",
    "what_remains": "none",
    "key_insight": "Reusable lesson about episode generation approach (>50 chars)",
    "search_terms": ["episode", "video-generation", "minecraft"],
    "memory_that_helped": []
  },
  "evidence": {
    "episode_path": "episodes/EP001/episode.mp4",
    "manifest_path": "episodes/EP001/manifest.json",
    "cost_usd": 18.50,
    "duration_seconds": 182,
    "scene_count": 15
  }
}
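Before writing the checkpoint, the few constraints visible in the schema above can be verified in code. This sketch checks only what the schema states (the job is complete, key_insight is longer than 50 characters, evidence paths are present); what the toolkit's own validator enforces is not documented here, and checkpoint_problems is a hypothetical name.

```python
def checkpoint_problems(checkpoint: dict) -> list[str]:
    """Return a list of problems; an empty list means the checkpoint looks valid."""
    problems = []
    if checkpoint["self_report"]["is_job_complete"] is not True:
        problems.append("is_job_complete must be true")

    insight = checkpoint["reflection"]["key_insight"]
    if len(insight) <= 50:
        problems.append(f"key_insight has {len(insight)} chars, needs more than 50")

    for key in ("episode_path", "manifest_path"):
        if not checkpoint["evidence"].get(key):
            problems.append(f"evidence.{key} is missing")
    return problems
```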

Troubleshooting

fal.ai Queue Timeout

If Kling takes too long (>5 min per clip), the script may time out. Re-run only the clip phase:

python3 ~/.claude/skills/episode/scripts/pipeline.py episodes/EP001/manifest.json --phase clips

Rate Limits

The script caps concurrent video submissions at 3. If you hit rate limits, wait 60 seconds and re-run.
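A cap like this is commonly implemented with a semaphore. The sketch below shows the general technique with asyncio; it is not taken from pipeline.py, and run_capped is a hypothetical helper that accepts any zero-argument coroutine functions standing in for clip submissions.

```python
import asyncio

MAX_CONCURRENT_CLIPS = 3


async def run_capped(jobs, limit: int = MAX_CONCURRENT_CLIPS) -> list:
    """Run coroutine functions with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Usage would be along the lines of asyncio.run(run_capped(list_of_submit_functions)).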

Style Drift

If later clips look less "Minecraft-like", the I2V model may be drifting. Regenerate the keyframe image with a stronger style prompt and re-run the clip generation.

FFmpeg Errors

Ensure FFmpeg is installed (the command below assumes Homebrew is available):

command -v ffmpeg >/dev/null || brew install ffmpeg

API Reference

fal.ai Endpoints

Model	Endpoint	Use
Flux Dev	`fal-ai/flux/dev`	Keyframe images
Kling 2.1	`fal-ai/kling-video/v2.1/pro/image-to-video`	Video clips

ElevenLabs

Model	Voice ID	Use
Eleven Multilingual v2	`JBFqnCBsd6RMkjVDRZzb` (George)	Narration

Example: 3-Minute Episode on Photosynthesis

# 1. Invoke the skill
/episode "How photosynthesis works"

# 2. Claude writes the script (manifest.json with 15 scenes)

# 3. Run the pipeline
python3 ~/.claude/skills/episode/scripts/pipeline.py episodes/EP001/manifest.json

# 4. Watch the output
open episodes/EP001/episode.mp4

# 5. Update checkpoint and complete

Skill Fluidity

You may use techniques from any skill for sub-problems without switching modes. Your autonomous state and checkpoint remain governed by /episode.

Philosophy

This skill embodies the Namshub principle: Claude is the intelligence, APIs are the tools.

Claude doesn't just call APIs. Claude is the creative director who:

Writes compelling educational narratives
Designs visual metaphors that make abstract concepts tangible
Paces the episode for engagement and retention
Makes intentional creative decisions at every step

The APIs generate pixels and audio. Claude provides the vision.
