Transcribes video audio using WhisperX, preserving original timestamps. Creates JSON transcript with word-level timing. Use when you need to generate audio transcripts for videos.
After installing, this skill will be available to your AI coding assistant.
Verify installation:
skills list

Skill Instructions
name: transcribe-audio
description: Transcribes video audio using WhisperX, preserving original timestamps. Creates JSON transcript with word-level timing. Use when you need to generate audio transcripts for videos.
Skill: Transcribe Audio
Transcribes video audio using WhisperX and creates clean JSON transcripts with word-level timing data.
When to Use
- Videos need audio transcripts before visual analysis
Critical Requirements
Use WhisperX, NOT standard Whisper. WhisperX preserves the original video timeline, including leading silence, so transcript timestamps match the actual video timestamps. Run WhisperX directly on the video files. Do not extract the audio separately; running on the video itself keeps the timestamps aligned.
Workflow
1. Read Language from Library File
Read the library's library.yaml to get the language code:
# Library metadata
library_name: [library-name]
language: en # Language code stored here
...
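A minimal Ruby sketch of this step. It assumes library.yaml lives at libraries/[library-name]/library.yaml (inferred from the transcript output directory, not stated above) and that `language` is a top-level key as in the example. The helper names `language_from_yaml` and `read_library_language` are hypothetical.

```ruby
require "yaml"

# Hypothetical helper: pull the language code out of library.yaml contents.
def language_from_yaml(yaml_text)
  data = YAML.safe_load(yaml_text)
  data = {} unless data.is_a?(Hash)
  language = data["language"]
  # Under YAML 1.1 rules an unquoted `no` (Norwegian) loads as `false`,
  # so insist on a non-empty String instead of accepting any value.
  unless language.is_a?(String) && !language.strip.empty?
    raise ArgumentError, "library.yaml needs a string 'language' key, got #{language.inspect}"
  end
  language.strip
end

# Hypothetical helper; the libraries/<name>/library.yaml layout is an assumption.
def read_library_language(library_name, root: "libraries")
  language_from_yaml(File.read(File.join(root, library_name, "library.yaml")))
end
```

The string check matters in practice: a library stored as `language: no` would otherwise pass `false` to WhisperX, so such codes should be quoted (`language: "no"`).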
2. Run WhisperX
whisperx "/full/path/to/video.mov" \
--language en \
--model medium \
--compute_type float32 \
--device cpu \
--output_format json \
--output_dir libraries/[library-name]/transcripts
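The same invocation can be built from Ruby as an argument array, which avoids shell quoting problems when video paths contain spaces. This is a sketch: the helper names are hypothetical, the flags are exactly those in the command above, and the output filename (`<video basename>.json`) is assumed from the path used in step 3.

```ruby
# Hypothetical helper: the WhisperX argv from the command above.
def whisperx_command(video_path, language:, output_dir:, model: "medium")
  [
    "whisperx", File.expand_path(video_path), # run on the video itself
    "--language", language,
    "--model", model,
    "--compute_type", "float32",
    "--device", "cpu",
    "--output_format", "json",
    "--output_dir", output_dir
  ]
end

# Assumed naming convention: video_name.mov -> <output_dir>/video_name.json
def expected_transcript_path(video_path, output_dir)
  File.join(output_dir, File.basename(video_path, ".*") + ".json")
end

def run_whisperx(video_path, language:, output_dir:)
  # Array form bypasses the shell; exception: true raises if whisperx is
  # missing or exits with a non-zero status.
  system(*whisperx_command(video_path, language: language, output_dir: output_dir), exception: true)
  expected_transcript_path(video_path, output_dir)
end
```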
3. Prepare Audio Transcript
After WhisperX completes, format the JSON using our prepare_audio_script:
ruby .claude/skills/transcribe-audio/prepare_audio_script.rb \
libraries/[library-name]/transcripts/video_name.json \
/full/path/to/original/video_name.mov
This script:
- Adds the source video path as metadata
- Removes unnecessary fields to reduce file size
- Pretty-prints the JSON
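The real prepare_audio_script.rb is not shown here, so the following is only an illustrative Ruby sketch of the three listed steps, not that script's implementation. It assumes the WhisperX JSON is a top-level object with a `segments` array whose entries carry `words`; the metadata key `video_path` and the dropped keys (`word_segments`, per-word `score`) are hypothetical choices.

```ruby
require "json"

# Illustrative only: mirrors the three listed steps, not the real script.
def prepare_transcript(json_text, video_path,
                       drop_top_level: ["word_segments"], # assumed redundant
                       drop_word_keys: ["score"])         # assumed unneeded
  data = JSON.parse(json_text)
  # 1. Record the source video so the transcript can be traced back to it.
  data["video_path"] = video_path
  # 2. Drop fields assumed to be unnecessary, at top level and per word.
  drop_top_level.each { |key| data.delete(key) }
  Array(data["segments"]).each do |segment|
    Array(segment["words"]).each do |word|
      drop_word_keys.each { |key| word.delete(key) }
    end
  end
  # 3. Pretty-print the result.
  JSON.pretty_generate(data)
end
```

Word-level `start`/`end` values are left untouched, since downstream skills depend on that timing.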
4. Return Success Response
After audio preparation completes, return this structured response to the parent agent:
✓ [video_filename.mov] transcribed successfully
Audio transcript: libraries/[library-name]/transcripts/video_name.json
Video path: /full/path/to/video_filename.mov
DO NOT update library.yaml. The parent agent handles this to avoid race conditions when multiple transcriptions run in parallel.
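A small Ruby helper (the name `success_response` is hypothetical) that renders the response in exactly the format above, so every parallel agent reports the same shape to the parent:

```ruby
# Hypothetical helper: build the success message shown above.
def success_response(video_path, transcript_path)
  <<~MSG
    ✓ #{File.basename(video_path)} transcribed successfully
    Audio transcript: #{transcript_path}
    Video path: #{video_path}
  MSG
end

puts success_response("/videos/clip.mov", "libraries/demo/transcripts/clip.json")
```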
Running in Parallel
This skill is designed to run inside a Task agent for parallel execution:
- Each agent handles ONE video file
- Multiple agents can run simultaneously
- Parent thread updates library.yaml sequentially after each agent completes
- No race conditions on the shared YAML file
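The coordination pattern above can be sketched in Ruby, with threads standing in for Task agents and a Hash standing in for library.yaml. This is an analogy under assumptions, not how the parent agent is implemented: `transcribe_all` and the video-to-transcript mapping are hypothetical. Workers only push results onto a queue; the calling thread is the single writer and applies each result as it arrives.

```ruby
# Hypothetical sketch: parallel workers, one sequential writer.
def transcribe_all(videos, library, &transcribe)
  results = Queue.new
  workers = videos.map do |video|
    Thread.new do
      outcome = begin
        transcribe.call(video)
      rescue StandardError => e
        e # report failures instead of dying, so the writer never waits forever
      end
      results << [video, outcome]
    end
  end

  failures = {}
  # The parent applies one result at a time, in completion order.
  videos.size.times do
    video, outcome = results.pop
    if outcome.is_a?(StandardError)
      failures[video] = outcome.message
    else
      library[video] = outcome # stand-in for updating library.yaml
    end
  end
  workers.each(&:join)
  failures
end
```

Because only the calling thread writes to `library`, no lock around the shared data is needed.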
Next Step
After audio transcription, use the analyze-video skill to add visual descriptions and create the visual transcript.
Installation
Ensure WhisperX is installed. Use the setup skill to verify dependencies.
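A hypothetical pre-flight check in Ruby that only confirms an executable named `whisperx` is on PATH. It does not verify the Python environment or model downloads, which remains the setup skill's job, and it ignores Windows executable extensions.

```ruby
# Hypothetical helper: true if an executable file called `name` is on PATH.
def executable_on_path?(name, path_env: ENV.fetch("PATH", ""))
  path_env.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if $PROGRAM_NAME == __FILE__ && !executable_on_path?("whisperx")
  warn "whisperx not found on PATH; use the setup skill to install dependencies."
end
```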
More by barefootford