Dataset versioning skill using DVC for tracking data changes, managing data pipelines, and ensuring reproducibility.
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli list
Skill Instructions
name: dvc-dataset-versioning
description: Dataset versioning skill using DVC for tracking data changes, managing data pipelines, and ensuring reproducibility.
allowed-tools:
- Read
- Write
- Bash
- Glob
- Grep
dvc-dataset-versioning
Overview
Dataset versioning skill using DVC (Data Version Control) for tracking data changes, managing data pipelines, and ensuring reproducibility in ML workflows.
Capabilities
- Dataset version tracking
- Data pipeline definition and execution
- Remote storage management (S3, GCS, Azure, etc.)
- Reproducibility enforcement
- Data lineage tracking
- Experiment comparison with data versions
- Cache management for large datasets
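The version-tracking and cache-management capabilities above rest on content addressing: DVC identifies each tracked file by an MD5 hash, records that hash in a small metadata file that Git versions, and stores the data itself in a cache keyed by the hash. The sketch below illustrates the idea only. The helper names `file_md5` and `cache_file` are hypothetical, and the `<cache>/<hash[:2]>/<hash[2:]>` layout is a simplification, not DVC's exact on-disk format.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_file(path: Path, cache_dir: Path) -> str:
    """Copy `path` into a content-addressed cache and return its hash.

    Simplified layout (an assumption for illustration): <cache>/<hash[:2]>/<hash[2:]>.
    Identical content maps to the same entry, so duplicate files are stored once.
    """
    md5 = file_md5(path)
    target = cache_dir / md5[:2] / md5[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return md5
```

Because the cache key depends only on content, re-adding an unchanged dataset costs a hash computation but no extra storage, and any earlier version can be restored as long as its hash is known.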
Target Processes
- Data Collection and Validation Pipeline
- ML Model Retraining Pipeline
- Feature Store Implementation
Tools and Libraries
- DVC
- Git
- Remote storage SDKs (boto3, google-cloud-storage, etc.)
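With DVC, the remote storage type follows from the URL scheme passed to `dvc remote add`, and each cloud backend needs its optional dependency, installed as a pip extra such as `dvc[s3]`. The hypothetical helper `remote_setup` below picks the extra from the URL and builds the `dvc remote add` command. The scheme table is a short, non-exhaustive sample, and the bucket URL in the usage note is a placeholder.

```python
from __future__ import annotations

from urllib.parse import urlparse

# A few common DVC remote URL schemes and the pip extra that enables each backend.
# Not exhaustive; DVC supports other backends as well.
EXTRAS = {
    "s3": "dvc[s3]",
    "gs": "dvc[gs]",
    "azure": "dvc[azure]",
    "ssh": "dvc[ssh]",
}


def remote_setup(name: str, url: str, default: bool = True) -> tuple[str | None, list[str]]:
    """Return (pip extra to install, `dvc remote add` argv) for a remote URL."""
    scheme = urlparse(url).scheme
    # None means a plain local/network path, or a backend this short table omits.
    extra = EXTRAS.get(scheme)
    argv = ["dvc", "remote", "add"]
    if default:
        argv.append("-d")  # make this the default remote for push/pull
    argv += [name, url]
    return extra, argv
```

For example, `remote_setup("s3-bucket", "s3://my-bucket/dvc-store")` suggests installing `dvc[s3]` and then running `dvc remote add -d s3-bucket s3://my-bucket/dvc-store`.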
Input Schema
{
  "type": "object",
  "required": ["action"],
  "properties": {
    "action": {
      "type": "string",
      "enum": ["init", "add", "push", "pull", "diff", "checkout", "run", "repro"],
      "description": "DVC action to perform"
    },
    "paths": {
      "type": "array",
      "items": { "type": "string" },
      "description": "File or directory paths to track"
    },
    "remote": {
      "type": "string",
      "description": "Remote storage name"
    },
    "revision": {
      "type": "string",
      "description": "Git revision for checkout/diff"
    },
    "pipeline": {
      "type": "object",
      "description": "Pipeline stage definition for run action"
    }
  }
}
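One plausible way to turn a request matching this schema into DVC and Git commands is sketched below. The mapping is an assumption, not the skill's documented behavior. Two choices need flagging: `checkout` with a `revision` is modeled as `git checkout` followed by `dvc checkout`, because DVC restores data to match the metadata files of the current Git commit; and `run` is mapped to `dvc stage add` plus `dvc repro`, the stage-definition commands in current DVC releases. The `pipeline` object shape (`name`, `cmd`, `deps`, `outs`) and the function name `plan_commands` are hypothetical.

```python
from __future__ import annotations

ACTIONS = {"init", "add", "push", "pull", "diff", "checkout", "run", "repro"}


def plan_commands(request: dict) -> list[list[str]]:
    """Translate a skill request (see Input Schema) into argv lists to execute in order."""
    action = request.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown or missing action: {action!r}")
    paths = list(request.get("paths") or [])
    remote = request.get("remote")
    revision = request.get("revision")

    if action == "init":
        return [["dvc", "init"]]
    if action == "add":
        if not paths:
            raise ValueError("'add' requires at least one path")
        return [["dvc", "add", *paths]]
    if action in ("push", "pull"):
        cmd = ["dvc", action]
        if remote:
            cmd += ["-r", remote]  # otherwise DVC uses the default remote
        return [cmd + paths]
    if action == "diff":
        # Compares the workspace against `revision`; DVC defaults to HEAD when omitted.
        return [["dvc", "diff", "--json"] + ([revision] if revision else [])]
    if action == "checkout":
        # Switch the Git revision first, then sync data to its .dvc/dvc.lock files.
        cmds = [["git", "checkout", revision]] if revision else []
        return cmds + [["dvc", "checkout", *paths]]
    if action == "run":
        # Assumed pipeline shape: {"name": ..., "cmd": ..., "deps": [...], "outs": [...]}.
        stage = request.get("pipeline") or {}
        if "name" not in stage or "cmd" not in stage:
            raise ValueError("'run' requires pipeline.name and pipeline.cmd")
        cmd = ["dvc", "stage", "add", "-n", stage["name"]]
        for dep in stage.get("deps", []):
            cmd += ["-d", dep]
        for out in stage.get("outs", []):
            cmd += ["-o", out]
        cmd.append(stage["cmd"])  # the stage command, passed as a single string
        return [cmd, ["dvc", "repro", stage["name"]]]
    # action == "repro": paths are passed through as targets.
    return [["dvc", "repro", *paths]]
```

Returning the commands instead of executing them keeps the mapping easy to inspect and test; each argv could then be run with `subprocess.run(cmd, check=True)`.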
Output Schema
{
  "type": "object",
  "required": ["status", "action"],
  "properties": {
    "status": {
      "type": "string",
      "enum": ["success", "error"]
    },
    "action": {
      "type": "string"
    },
    "trackedFiles": {
      "type": "array",
      "items": { "type": "string" }
    },
    "changes": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "path": { "type": "string" },
          "status": { "type": "string" },
          "hash": { "type": "string" }
        }
      }
    },
    "remote": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "url": { "type": "string" },
        "syncStatus": { "type": "string" }
      }
    }
  }
}
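The `changes` array maps naturally onto the JSON printed by `dvc diff --json --show-hash`, which groups entries under keys such as `added`, `deleted`, `modified`, `renamed`, and `not in cache`. The sketch below flattens that structure into `{path, status, hash}` objects. The input shape is an assumption based on DVC's documented output (modified entries carry `{"old", "new"}` hashes, renamed entries carry `{"old", "new"}` paths), so verify it against your DVC version. The function name `to_changes` and the hashes in the test are hypothetical.

```python
from __future__ import annotations


def to_changes(diff: dict) -> list[dict]:
    """Flatten `dvc diff --json --show-hash` output into the `changes` array."""
    changes = []
    for status, entries in diff.items():
        for entry in entries:
            path, digest = entry["path"], entry.get("hash", "")  # no hash without --show-hash
            if isinstance(path, dict):    # renamed: {"old": ..., "new": ...}
                path = path["new"]
            if isinstance(digest, dict):  # modified: {"old": ..., "new": ...}
                digest = digest["new"]
            changes.append({"path": path, "status": status, "hash": digest})
    return changes
```

Reporting the new hash for modified files lets a caller compare it with the hash stored in the corresponding `.dvc` file on another revision.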
Usage Example
{
  kind: 'skill',
  title: 'Version training dataset',
  skill: {
    name: 'dvc-dataset-versioning',
    context: {
      action: 'add',
      paths: ['data/train.csv', 'data/test.csv'],
      remote: 's3-bucket'
    }
  }
}
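In this example, `dvc add` keeps the CSV files out of Git and leaves two kinds of small files to commit: a `<path>.dvc` pointer for each dataset and a `.gitignore` in the same directory listing the data. The hypothetical helper below derives that list so it can be passed to `git add` before `dvc push -r s3-bucket` uploads the data. Whether the skill itself commits or pushes after `add` is not stated in the source.

```python
from __future__ import annotations

import posixpath


def git_files_after_add(paths: list[str]) -> list[str]:
    """List the small files Git should commit after `dvc add <paths>`.

    For each target, DVC writes `<path>.dvc` and adds the data to a `.gitignore`
    in the same directory; the data itself stays out of Git.
    """
    files = []
    for path in paths:
        path = path.rstrip("/")  # a directory is tracked as a single target
        files.append(f"{path}.dvc")
        ignore = posixpath.join(posixpath.dirname(path), ".gitignore")
        if ignore not in files:  # several datasets can share one .gitignore
            files.append(ignore)
    return files
```

For the usage example, this yields `data/train.csv.dvc`, `data/.gitignore`, and `data/test.csv.dvc`.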