Guide for using vision utilities in speedy_utils, including fast GPU image loading, memory-mapped datasets, and notebook visualization.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli list
Skill Instructions
name: 'vision-utilities'
description: 'Guide for using vision utilities in speedy_utils, including fast GPU image loading, memory-mapped datasets, and notebook visualization.'
Vision Utilities Guide
This skill provides comprehensive guidance for using the vision utilities in speedy_utils.
When to Use This Skill
Use this skill when you need to:
- Load images efficiently, leveraging GPU acceleration (NVIDIA DALI) when available.
- Create memory-mapped datasets (ImageMmap) for extremely fast random-access training loops.
- Visualize batches of images in Jupyter notebooks with automatic grid layout.
- Handle various image formats (numpy, torch, file paths) uniformly.
Prerequisites
- speedy_utils installed.
- Pillow and numpy (required).
- matplotlib (for plotting).
- nvidia-dali-cuda110 or similar (optional, for GPU loading).
- torch (optional, for tensor support).
Core Capabilities
Fast Image Loading (read_images)
- Tries GPU (DALI) first, falls back to CPU (Pillow).
- Supports batch processing and resizing.
- Validates images to skip corrupted files.
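The GPU-first, CPU-fallback behaviour listed above can be pictured with a small sketch. This is not the speedy_utils implementation: pick_backend is a hypothetical helper, and the only assumption is that DALI, when installed, is importable as nvidia.dali.

```python
import importlib


def pick_backend() -> str:
    """Return "dali" if NVIDIA DALI can be imported, otherwise "pillow".

    Hypothetical helper for illustration; read_images may decide differently.
    """
    try:
        importlib.import_module("nvidia.dali")
    except (ImportError, OSError):
        # DALI is missing, or its native libraries failed to load
        return "pillow"
    return "dali"


backend = pick_backend()
print(f"decoding backend: {backend}")
```

Probing once and reusing the result keeps the failed import from being retried for every batch.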
Memory-Mapped Datasets (ImageMmap, ImageMmapDynamic)
- ImageMmap: For fixed-size images. Pre-processes and resizes images once, then stores them in a single binary file for zero-copy access.
- ImageMmapDynamic: For variable-size images. Stores flattened images and metadata.
- Both support multi-process safe building with file locks.
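To make "flattened images and metadata" concrete, here is a minimal sketch of one way variable-size images could share a single binary file. The layout (one flat uint8 buffer plus per-image shapes and offsets) is an assumption for illustration, not the actual ImageMmapDynamic file format; get_image and all variable names are hypothetical.

```python
import os
import tempfile

import numpy as np

# Three images of different sizes, all (H, W, C) uint8
images = [
    np.full((2, 3, 3), 10, dtype=np.uint8),
    np.full((4, 4, 3), 20, dtype=np.uint8),
    np.full((1, 5, 3), 30, dtype=np.uint8),
]

# Metadata: shape, pixel count, and start offset of each image in the flat buffer
shapes = [img.shape for img in images]
sizes = [img.size for img in images]
offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))

path = os.path.join(tempfile.mkdtemp(), "flat.bin")

# Write all pixels back to back into one binary file
flat = np.memmap(path, dtype=np.uint8, mode="w+", shape=(sum(sizes),))
for img, start in zip(images, offsets):
    flat[start:start + img.size] = img.ravel()
flat.flush()
del flat

# Re-open read-only; slicing returns views, so nothing is copied up front
store = np.memmap(path, dtype=np.uint8, mode="r")


def get_image(i: int) -> np.ndarray:
    """Rebuild image i from its offset and shape."""
    start = int(offsets[i])
    return store[start:start + sizes[i]].reshape(shapes[i])


second = get_image(1)
print(second.shape)  # (4, 4, 3)
```

The fixed-size case is the special case where every image has the same shape, so the offsets reduce to a multiple of the per-image size and no metadata table is needed.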
Notebook Visualization (plot_images_notebook)
- Automatically arranges images in a grid.
- Handles mixed inputs: paths, numpy arrays, torch tensors.
- Supports (H, W, C), (C, H, W), and (B, ...) formats.
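The layout handling and grid arrangement can be sketched as follows. The channel-axis heuristic (an axis of length 1, 3 or 4 is the channel axis) and the helper names to_hwc, split_batch and grid_shape are assumptions for illustration; plot_images_notebook may use different rules.

```python
import math

import numpy as np


def to_hwc(img: np.ndarray) -> np.ndarray:
    """Return a single image in (H, W, C) layout."""
    if img.ndim == 2:  # grayscale (H, W) -> (H, W, 1)
        return img[:, :, None]
    if img.ndim != 3:
        raise ValueError(f"expected 2 or 3 dimensions, got {img.ndim}")
    channels = (1, 3, 4)
    if img.shape[-1] in channels:  # already (H, W, C)
        return img
    if img.shape[0] in channels:  # (C, H, W) -> (H, W, C)
        return np.transpose(img, (1, 2, 0))
    raise ValueError(f"cannot find a channel axis in shape {img.shape}")


def split_batch(arr: np.ndarray) -> list:
    """Split a (B, ...) batch into B single images; wrap a single image in a list."""
    if arr.ndim == 4:
        return [to_hwc(a) for a in arr]
    return [to_hwc(arr)]


def grid_shape(n_images: int, ncols: int) -> tuple:
    """Rows and columns needed to place n_images with at most ncols per row."""
    ncols = max(1, min(ncols, n_images))
    return math.ceil(n_images / ncols), ncols


batch = np.zeros((5, 3, 64, 48))  # (B, C, H, W)
tiles = split_batch(batch)
print(tiles[0].shape, grid_shape(len(tiles), ncols=3))  # (64, 48, 3) (2, 3)
```

A heuristic like this is ambiguous for tiny images such as (3, 3, 3); the sketch resolves that case in favour of (H, W, C).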
Usage Examples
Example 1: Fast Image Loading
Load a batch of images, resizing them to 224x224.
from vision_utils.io_utils import read_images
paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
# Returns dict: {path: np.ndarray}
images = read_images(paths, hw=(224, 224))
Example 2: Creating a Mmap Dataset
Create a dataset that loads instantly on subsequent runs.
from vision_utils.io_utils import ImageMmap

paths = ["img1.jpg", "img2.jpg", "img3.jpg"]  # same paths as in Example 1

# First run: reads files, resizes, writes .cache/mmap_dataset_...
# Next runs: maps file directly
dataset = ImageMmap(paths, size=(224, 224))

# Access like a list/array
img = dataset[0]  # np.ndarray (224, 224, 3)
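The "build once, map afterwards" pattern behind this example can be sketched with numpy.memmap. Everything here is an illustrative assumption rather than the ImageMmap internals: cache_file, load_image and open_dataset are hypothetical names, the cache key is simply a hash of the path strings and target size, and load_image returns blank pixels in place of real decoding and resizing.

```python
import hashlib
import os
import tempfile

import numpy as np


def cache_file(paths, size, cache_dir):
    """Hypothetical cache name: hash of the path strings plus the target size."""
    raw = "\n".join(paths).encode() + repr(tuple(size)).encode()
    key = hashlib.sha256(raw).hexdigest()[:16]
    return os.path.join(cache_dir, f"mmap_dataset_{key}.bin")


def load_image(path, size):
    """Stand-in for decode + resize; returns a blank (H, W, 3) uint8 image."""
    return np.zeros((*size, 3), dtype=np.uint8)


def open_dataset(paths, size, cache_dir):
    shape = (len(paths), *size, 3)
    target = cache_file(paths, size, cache_dir)
    if not os.path.exists(target):
        # First run: process every image once and write raw pixels to disk
        os.makedirs(cache_dir, exist_ok=True)
        tmp = target + ".tmp"
        out = np.memmap(tmp, dtype=np.uint8, mode="w+", shape=shape)
        for i, p in enumerate(paths):
            out[i] = load_image(p, size)
        out.flush()
        del out
        os.replace(tmp, target)  # publish the finished file in one step
    # Every run: map the file; pixels are paged in lazily on access
    return np.memmap(target, dtype=np.uint8, mode="r", shape=shape)


cache_dir = os.path.join(tempfile.mkdtemp(), ".cache")
paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
dataset = open_dataset(paths, (224, 224), cache_dir)
print(dataset.shape, dataset[0].shape)  # (3, 224, 224, 3) (224, 224, 3)
```

Because the key in this sketch depends only on the path strings and the size, overwriting img1.jpg with new content would still hit the old cache file, which is the situation the cache-clearing guideline in this guide warns about.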
Example 3: Visualizing Images
Plot a mix of tensors and paths in a notebook.
from vision_utils.plot import plot_images_notebook
import torch
import numpy as np

images = [
    "img1.jpg",                   # Path
    np.random.rand(100, 100, 3),  # Numpy
    torch.rand(3, 64, 64),        # Tensor (C, H, W)
]
plot_images_notebook(images, ncols=3, titles=["File", "Random", "Tensor"])
Guidelines
- GPU Loading:
  - read_images is most effective for large batches. For single images, the CPU path has lower overhead.
  - Ensure DALI is installed for GPU speedup.
- Mmap Datasets:
  - Use ImageMmap for training pipelines where fixed size is required (e.g., ResNet).
  - Use ImageMmapDynamic if you need original resolutions (e.g., for object detection with variable size inputs).
  - The cache is stored in .cache/ by default. Clear it if your source images change content but keep the same filenames (hashing is based on paths).
- Plotting:
  - plot_images_notebook is designed for notebooks. It uses plt.show().
  - It automatically handles normalization (0-1 vs 0-255) for display.
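As a rough picture of what handling 0-1 versus 0-255 data involves, here is a minimal sketch. The rule it uses (float data whose maximum is at most 1.0 is treated as a 0-1 image) is an assumed heuristic, and to_display is a hypothetical helper, not the function plot_images_notebook calls.

```python
import numpy as np


def to_display(img) -> np.ndarray:
    """Convert an image to uint8 in [0, 255] for display."""
    arr = np.asarray(img)
    if arr.dtype == np.uint8:
        return arr  # already in display range
    is_unit_range = (
        np.issubdtype(arr.dtype, np.floating) and arr.size > 0 and arr.max() <= 1.0
    )
    scaled = arr.astype(np.float64) * (255.0 if is_unit_range else 1.0)
    # Round, then clip anything outside the displayable range
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


print(to_display(np.array([0.0, 0.2, 1.0])))      # [  0  51 255]
print(to_display(np.array([0.0, 128.0, 300.0])))  # [  0 128 255]
```

A very dark 0-255 float image whose values never exceed 1.0 would be misread by this rule, so passing uint8 data is the unambiguous option.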
Limitations
- DALI Installation: Installing DALI can be complex depending on CUDA version. The code gracefully falls back to CPU if DALI is missing.
- Disk Space: Mmap datasets duplicate image data in uncompressed format (raw pixels). This takes significantly more disk space than JPEGs but offers much faster read speeds.
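The disk cost of raw pixels is easy to estimate before building a dataset. The sketch below assumes uint8 RGB storage with no header or padding; raw_bytes is a hypothetical helper.

```python
def raw_bytes(n_images: int, height: int, width: int, channels: int = 3) -> int:
    """Bytes needed to store n_images as uncompressed uint8 pixels."""
    return n_images * height * width * channels


per_image = raw_bytes(1, 224, 224)
total = raw_bytes(100_000, 224, 224)
print(per_image)                   # 150528 bytes, i.e. 147 KiB per image
print(round(total / 1024**3, 2))   # 14.02 GiB for 100,000 images
```

Comparing that figure with the size of the source image folder shows how much extra space the cache will need.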
