---
name: kueue
description: "Kueue — ClusterQueues, ResourceFlavors, fair sharing, preemption, TAS, MultiKueue. Use when managing batch workload queuing and GPU quotas on K8s. NOT for Volcano."
---

# Kueue
Kubernetes-native job queueing system. Manages quotas and decides when workloads should wait, start, or be preempted.
Docs: https://kueue.sigs.k8s.io/docs/
GitHub: https://github.com/kubernetes-sigs/kueue
Version: v0.16.x | Requires: Kubernetes ≥ 1.29
## Core API Objects
| Object | Scope | Purpose |
|---|---|---|
| ResourceFlavor | Cluster | Maps to node types (GPU models, spot/on-demand, architectures). Optional topologyName for TAS. |
| ClusterQueue | Cluster | Defines resource quotas per flavor, fair sharing, preemption, admission checks |
| LocalQueue | Namespace | Tenant-facing queue pointing to a ClusterQueue |
| WorkloadPriorityClass | Cluster | Priority for queue ordering (independent of pod priority) |
| Workload | Namespace | Unit of admission — auto-created for each job |
| Topology | Cluster | Hierarchical node topology for TAS (block → rack → node) |
| Cohort | Cluster | Hierarchical cohort tree node — shared resource pools and nested borrowing/lending between ClusterQueues (v0.11+) |
| AdmissionCheck | Cluster | Gate admission on external signals (provisioning, MultiKueue) |
Flow: Job → LocalQueue → ClusterQueue → quota reservation → admission checks → admission → pods created.
## Installation & Operations
See references/operations.md for comprehensive deployment, configuration, Helm values, metrics, upgrades, MultiKueue setup, and feature gates.
Quick install:

```bash
# kubectl
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.16.2/manifests.yaml

# Helm
helm install kueue oci://registry.k8s.io/kueue/charts/kueue \
  --version=0.16.2 --namespace kueue-system --create-namespace --wait
```
## Minimal Setup
```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: user-queue
  namespace: default
spec:
  clusterQueue: cluster-queue
```
## Submitting Workloads

Label any supported job with `kueue.x-k8s.io/queue-name`:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/priority-class: high-priority
spec:
  suspend: true  # Kueue unsuspends on admission
  parallelism: 4
  completions: 4
  template:
    spec:
      containers:
      - name: trainer
        image: training:latest
        resources:
          requests:
            cpu: "2"
            memory: 8Gi
            nvidia.com/gpu: "1"
      restartPolicy: Never
```
## Supported Integrations

**Batch:** Job, JobSet, RayJob, RayCluster, PyTorchJob, TFJob, MPIJob, PaddleJob, XGBoostJob, TrainJob (Trainer v2), plain Pods, pod groups

**Serving:** Deployment, StatefulSet, LeaderWorkerSet, RayService
Enable integrations in the KueueConfiguration:
```yaml
integrations:
  frameworks:
  - "batch/job"
  - "jobset.x-k8s.io/jobset"
  - "ray.io/rayjob"
  - "ray.io/raycluster"
  - "kubeflow.org/pytorchjob"
  - "kubeflow.org/mpijob"
  - "pod"
```
## Pod Groups (Plain Pods)

Group multiple pods into a single workload using the pod-group metadata:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  labels:
    kueue.x-k8s.io/queue-name: user-queue
  annotations:
    kueue.x-k8s.io/pod-group-name: my-training
    kueue.x-k8s.io/pod-group-total-count: "4"
spec:
  containers:
  - name: worker
    image: training:latest
    resources:
      requests:
        nvidia.com/gpu: "1"
```
## ClusterQueue Configuration

### Resource Groups
Resources in the same group are assigned the same flavor (e.g., GPU + CPU + memory on the same node type):
```yaml
spec:
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: gpu-a100
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 512Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        borrowingLimit: 4  # max borrow from cohort
        lendingLimit: 2    # max lend to cohort
    - name: gpu-t4         # fallback flavor
      resources:
      - name: "cpu"
        nominalQuota: 32
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
```
Kueue tries flavors in order — A100 first, T4 as fallback.
### Namespace Selector
Restrict which namespaces can submit to a ClusterQueue:
```yaml
spec:
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-a  # single namespace
  # OR matchLabels with a custom label shared by multiple namespaces
  # OR namespaceSelector: {} for all namespaces
```
### Queueing Strategy

| Strategy | Behavior |
|---|---|
| `BestEffortFIFO` (default) | Priority-ordered, but smaller jobs can skip ahead if larger ones don't fit |
| `StrictFIFO` | Strict ordering — head-of-line blocks even if smaller jobs fit |
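The strategy is set per ClusterQueue under `spec.queueingStrategy`; a minimal sketch (the queue name is illustrative):

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: strict-queue          # illustrative name
spec:
  queueingStrategy: StrictFIFO  # default is BestEffortFIFO
  namespaceSelector: {}
```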
### Cohorts and Borrowing
ClusterQueues in the same cohort share unused quota. See references/multi-tenant.md for full examples.
- `nominalQuota` — guaranteed resources
- `borrowingLimit` — max resources this queue can borrow
- `lendingLimit` — max resources this queue lends out
**Hierarchical Cohorts (v0.11+):** Cohorts can be explicitly created as `Cohort` resources to form tree structures, enabling multi-level borrowing across teams/orgs. A Cohort can define `resourceGroups` (a shared pool for its members) and reference a `parentName` to nest cohorts. ClusterQueues still use `.spec.cohort: <name>` to join a cohort.
```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: Cohort
metadata:
  name: org-wide
spec:
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 16  # shared pool for all child queues
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: Cohort
metadata:
  name: team-ml
spec:
  parentName: org-wide  # can borrow from org-wide pool
```
### Flavor Fungibility
Controls behavior when preferred flavor is full:
```yaml
spec:
  flavorFungibility:
    whenCanBorrow: Borrow          # or TryNextFlavor
    whenCanPreempt: TryNextFlavor  # or Preempt
```
### Stop Policy

Pause a ClusterQueue: `Hold` (stop admitting) or `HoldAndDrain` (stop admitting + evict running).
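A sketch of pausing a queue in place via the ClusterQueue's `spec.stopPolicy` field (set it back to `None` to resume admission):

```yaml
spec:
  stopPolicy: HoldAndDrain  # Hold = stop admitting; HoldAndDrain also evicts running workloads
```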
## Preemption

Configure in `.spec.preemption`:
```yaml
spec:
  preemption:
    withinClusterQueue: LowerPriority  # Never | LowerPriority | LowerOrNewerEqualPriority
    reclaimWithinCohort: Any           # Never | Any | LowerPriority | LowerOrNewerEqualPriority
    borrowWithinCohort:
      policy: LowerPriority            # Never | LowerPriority | LowerOrNewerEqualPriority
      maxPriorityThreshold: 100        # only preempt workloads at or below this priority
```
**Preemption order:** borrowing workloads in cohort → lowest priority → most recently admitted.
## Fair Sharing

**Preemption-based (DRF):** Enable globally in the KueueConfiguration with `fairSharing.enable: true`. Per-queue weight via `spec.fairSharing.weight`.

**Admission-based (usage history):** Orders workloads by historical LocalQueue resource consumption. Enable via feature gate `AdmissionFairSharing: true` (beta, on by default since v0.15). Configure `usageHalfLifeTime` and `usageSamplingInterval` in the `admissionFairSharing:` block of the KueueConfiguration (see references/operations.md).
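A sketch of the corresponding KueueConfiguration fragment — the duration values are illustrative; see references/operations.md for the authoritative fields:

```yaml
# KueueConfiguration fragment — values are illustrative
fairSharing:
  enable: true
admissionFairSharing:
  usageHalfLifeTime: 168h      # how quickly past usage decays
  usageSamplingInterval: 5m    # how often usage is sampled
```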
## Topology Aware Scheduling (TAS)

Optimizes pod placement for network throughput. Critical for distributed training.

### Setup

1. Define a `Topology` with hierarchy levels (zone → hostname)
2. Reference `topologyName` from a `ResourceFlavor`
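The two steps above, sketched as manifests — the node label keys and flavor/topology names are illustrative, and your nodes must actually carry the labels you list:

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: Topology
metadata:
  name: default
spec:
  levels:                                           # ordered widest → narrowest
  - nodeLabel: "cloud.provider.com/topology-block"  # assumed label on nodes
  - nodeLabel: "cloud.provider.com/topology-rack"
  - nodeLabel: "kubernetes.io/hostname"             # enables hot-swap (see below)
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: tas-flavor
spec:
  nodeLabels:
    node-group: tas   # assumed label selecting the TAS-capable node pool
  topologyName: default
```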
### User Annotations (PodTemplate level)

- `kueue.x-k8s.io/podset-required-topology: <level>` — all pods MUST be in the same topology domain
- `kueue.x-k8s.io/podset-preferred-topology: <level>` — best-effort, falls back to wider domains
- `kueue.x-k8s.io/podset-unconstrained-topology: ""` — TAS capacity accounting, no placement constraint
**Hot-swap:** When the lowest topology level is `hostname`, TAS supports automatic node replacement on failure. See assets/tas-training-job.yaml for a complete example.
## AdmissionChecks
External gates that must pass before a workload is admitted. Primary use: ProvisioningRequest for cluster autoscaler integration.
Flow: quota reserved → ProvisioningRequest created → cluster-autoscaler scales nodes → Provisioned=true → workload admitted.
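A sketch of wiring this up — the object names are illustrative, and `provisioningClassName` must match a class your autoscaler actually supports:

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: AdmissionCheck
metadata:
  name: provisioning-check        # illustrative name
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: prov-config
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ProvisioningRequestConfig
metadata:
  name: prov-config
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
```

Reference the check from a ClusterQueue via `spec.admissionChecks: ["provisioning-check"]`.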
## Elastic Workloads & Dynamic Reclaim

- **Elastic:** Alpha. Feature gate `ElasticJobsViaWorkloadSlices: true`. Annotate with `kueue.x-k8s.io/elastic-job: "true"`.
- **Dynamic Reclaim:** Admitted workloads release unused quota early via `status.reclaimablePods`.
## WorkloadPriorityClass

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: high-priority
value: 10000
description: "Production training jobs"
```
Apply via label: `kueue.x-k8s.io/priority-class: high-priority`
## Workload Lifecycle

- `spec.active: false` — deactivates the workload (evicts if admitted)
- `spec.maximumExecutionTimeSeconds: N` — auto-evict after N seconds
## Key kubectl Commands

```bash
# List all Kueue objects
kubectl get clusterqueues,localqueues,resourceflavors,workloads,workloadpriorityclasses -A

# ClusterQueue status and usage
kubectl describe clusterqueue <name>

# Workload for a job
JOB_UID=$(kubectl get job -n <ns> <name> -o jsonpath='{.metadata.uid}')
kubectl get workloads -n <ns> -l "kueue.x-k8s.io/job-uid=$JOB_UID"

# Pending workloads (visibility API)
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/clusterqueues/<cq>/pendingworkloads"

# Controller logs
kubectl logs -n kueue-system deploy/kueue-controller-manager --tail=200
```
**kueuectl plugin:** For streamlined management (create, resume, stop), see references/kueuectl.md.
## Feature Compatibility
Not all features compose freely. Before combining TAS, elastic workloads, MultiKueue, or fair sharing, check references/compatibility.md for known incompatibilities and upgrade gotchas.
## Diagnostic Script

Run `scripts/kueue-diag.sh` for automated diagnostics:

```bash
# Cluster-wide overview: queues, quotas, pending workloads
bash scripts/kueue-diag.sh

# Per-workload diagnostics: admission status, checks, events
bash scripts/kueue-diag.sh <workload-name> [namespace]
```
Checks controller health, ClusterQueue quota usage, cohort configuration, pending workloads, ResourceFlavor status, and per-workload admission conditions.
## References

- `compatibility.md` — Feature incompatibilities and version-specific gotchas
- `kueuectl.md` — kueuectl CLI plugin installation and usage
- `multi-tenant.md` — Fair sharing, cohorts, borrowing, and preemption for multi-team GPU clusters
- `operations.md` — Installation, configuration, monitoring, and production operations
- `provisioning-request.md` — ProvisioningRequest terminal states, naming, retry behavior, and debugging
- `troubleshooting.md` — Pending workloads, admission failures, and preemption debugging
## Cross-References
- kuberay — Gang scheduling RayJob/RayCluster workloads with Kueue
- aws-fsx — FSx storage for queued training jobs
- volcano — Alternative batch scheduler with gang scheduling
- gpu-operator — GPU resource management for queued workloads
- karpenter — Node provisioning via NodePools/NodeClaims (does NOT use the ProvisioningRequest API — that requires cluster-autoscaler)
- kubeflow-trainer — Queue Kubeflow training jobs
- leaderworkerset — Queue multi-node LWS workloads
- prometheus-grafana — Monitor Kueue queue depth and admission metrics