---
name: kueue
description: "Kueue — ClusterQueues, ResourceFlavors, fair sharing, preemption, TAS, MultiKueue. Use when managing batch workload queuing and GPU quotas on K8s. NOT for Volcano."
---

# Kueue
Kubernetes-native job queueing system. Manages quotas and decides when workloads should wait, start, or be preempted.
Docs: https://kueue.sigs.k8s.io/docs/
GitHub: https://github.com/kubernetes-sigs/kueue
Version: v0.16.x | Requires: Kubernetes ≥ 1.29
## Core API Objects
| Object | Scope | Purpose |
|---|---|---|
| ResourceFlavor | Cluster | Maps to node types (GPU models, spot/on-demand, architectures). Optional topologyName for TAS. |
| ClusterQueue | Cluster | Defines resource quotas per flavor, fair sharing, preemption, admission checks |
| LocalQueue | Namespace | Tenant-facing queue pointing to a ClusterQueue |
| WorkloadPriorityClass | Cluster | Priority for queue ordering (independent of pod priority) |
| Workload | Namespace | Unit of admission — auto-created for each job |
| Topology | Cluster | Hierarchical node topology for TAS (block → rack → node) |
| Cohort | Cluster | Hierarchical cohort tree node — shared resource pools and nested borrowing/lending between ClusterQueues (v0.11+) |
| AdmissionCheck | Cluster | Gate admission on external signals (provisioning, MultiKueue) |
Flow: Job → LocalQueue → ClusterQueue → quota reservation → admission checks → admission → pods created.
## Installation & Operations
See references/operations.md for comprehensive deployment, configuration, Helm values, metrics, upgrades, MultiKueue setup, and feature gates.
Quick install:

```bash
# kubectl
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.16.2/manifests.yaml

# Helm
helm install kueue oci://registry.k8s.io/kueue/charts/kueue \
  --version=0.16.2 --namespace kueue-system --create-namespace --wait
```
## Minimal Setup
```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: user-queue
  namespace: default
spec:
  clusterQueue: cluster-queue
```
## Submitting Workloads

Label any supported job with `kueue.x-k8s.io/queue-name`:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/priority-class: high-priority
spec:
  suspend: true  # Kueue unsuspends on admission
  parallelism: 4
  completions: 4
  template:
    spec:
      containers:
      - name: trainer
        image: training:latest
        resources:
          requests:
            cpu: "2"
            memory: 8Gi
            nvidia.com/gpu: "1"
      restartPolicy: Never
```
## Supported Integrations

**Batch:** Job, JobSet, RayJob, RayCluster, PyTorchJob, TFJob, MPIJob, PaddleJob, XGBoostJob, TrainJob (Trainer v2), plain Pods, pod groups

**Serving:** Deployment, StatefulSet, LeaderWorkerSet, RayService
Enable integrations in the KueueConfiguration:
```yaml
integrations:
  frameworks:
  - "batch/job"
  - "jobset.x-k8s.io/jobset"
  - "ray.io/rayjob"
  - "ray.io/raycluster"
  - "kubeflow.org/pytorchjob"
  - "kubeflow.org/mpijob"
  - "pod"
```
## Pod Groups (Plain Pods)

Group multiple pods into a single workload using the pod-group metadata:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  labels:
    kueue.x-k8s.io/queue-name: user-queue
  annotations:
    kueue.x-k8s.io/pod-group-name: my-training
    kueue.x-k8s.io/pod-group-total-count: "4"
spec:
  containers:
  - name: worker
    image: training:latest
    resources:
      requests:
        nvidia.com/gpu: "1"
```
## ClusterQueue Configuration

### Resource Groups
Resources in the same group are assigned the same flavor (e.g., GPU + CPU + memory on the same node type):
```yaml
spec:
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: gpu-a100
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 512Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        borrowingLimit: 4  # max borrow from cohort
        lendingLimit: 2    # max lend to cohort
    - name: gpu-t4         # fallback flavor
      resources:
      - name: "cpu"
        nominalQuota: 32
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
```
Kueue tries flavors in order — A100 first, T4 as fallback.
### Namespace Selector
Restrict which namespaces can submit to a ClusterQueue:
```yaml
spec:
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-a  # single namespace
  # OR matchLabels with a custom label shared by multiple namespaces
  # OR namespaceSelector: {} for all namespaces
```
### Queueing Strategy

| Strategy | Behavior |
|---|---|
| `BestEffortFIFO` (default) | Priority-ordered, but smaller jobs can skip ahead if larger ones don't fit |
| `StrictFIFO` | Strict ordering — head-of-line blocks even if smaller jobs fit |
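The strategy is set per ClusterQueue under `spec.queueingStrategy`; a minimal sketch (the queue name is illustrative):

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: strict-queue          # illustrative name
spec:
  queueingStrategy: StrictFIFO  # default is BestEffortFIFO
  namespaceSelector: {}
```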
### Cohorts and Borrowing
ClusterQueues in the same cohort share unused quota. See references/multi-tenant.md for full examples.
- `nominalQuota` — guaranteed resources
- `borrowingLimit` — max resources this queue can borrow
- `lendingLimit` — max resources this queue lends out
**Hierarchical Cohorts (v0.11+):** Cohorts can be explicitly created as `Cohort` resources to form tree structures, enabling multi-level borrowing across teams/orgs. A Cohort can define `resourceGroups` (a shared pool for its members) and reference a `parentName` to nest cohorts. ClusterQueues still use `.spec.cohort: <name>` to join a cohort.
```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: Cohort
metadata:
  name: org-wide
spec:
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 16  # shared pool for all child queues
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: Cohort
metadata:
  name: team-ml
spec:
  parentName: org-wide  # can borrow from org-wide pool
```
### Flavor Fungibility
Controls behavior when preferred flavor is full:
```yaml
spec:
  flavorFungibility:
    whenCanBorrow: Borrow          # or TryNextFlavor
    whenCanPreempt: TryNextFlavor  # or Preempt
```
### Stop Policy

Pause a ClusterQueue: `Hold` (stop admitting) or `HoldAndDrain` (stop admitting + evict running).
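A sketch of pausing a queue in place via the ClusterQueue's `spec.stopPolicy` field (set it back to `None` to resume admission):

```yaml
spec:
  stopPolicy: HoldAndDrain  # Hold = stop admitting; HoldAndDrain also evicts running workloads
```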
## Preemption

Configure in `.spec.preemption`:
```yaml
spec:
  preemption:
    withinClusterQueue: LowerPriority  # Never | LowerPriority | LowerOrNewerEqualPriority
    reclaimWithinCohort: Any           # Never | Any | LowerPriority | LowerOrNewerEqualPriority
    borrowWithinCohort:
      policy: LowerPriority            # Never | LowerPriority | LowerOrNewerEqualPriority
      maxPriorityThreshold: 100        # only preempt workloads at or below this priority
```
**Preemption order:** borrowing workloads in cohort → lowest priority → most recently admitted.
## Fair Sharing

**Preemption-based (DRF):** Enable globally in the KueueConfiguration with `fairSharing.enable: true`. Per-queue weight via `spec.fairSharing.weight`.

**Admission-based (usage history):** Orders workloads by historical LocalQueue resource consumption. Enable via feature gate `AdmissionFairSharing: true` (beta, on by default since v0.15). Configure `usageHalfLifeTime` and `usageSamplingInterval` in the `admissionFairSharing:` block of the KueueConfiguration (see references/operations.md).
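A sketch of the corresponding KueueConfiguration fragment — the duration values are illustrative; see references/operations.md for the authoritative fields:

```yaml
# KueueConfiguration fragment — values are illustrative
fairSharing:
  enable: true
admissionFairSharing:
  usageHalfLifeTime: 168h      # how quickly past usage decays
  usageSamplingInterval: 5m    # how often usage is sampled
```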
## Topology Aware Scheduling (TAS)

Optimizes pod placement for network throughput. Critical for distributed training.

### Setup

1. Define a `Topology` with hierarchy levels (zone → hostname)
2. Reference `topologyName` from a `ResourceFlavor`
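The two steps above, sketched as manifests — the node label keys and flavor/topology names are illustrative, and your nodes must actually carry the labels you list:

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: Topology
metadata:
  name: default
spec:
  levels:                                           # ordered widest → narrowest
  - nodeLabel: "cloud.provider.com/topology-block"  # assumed label on nodes
  - nodeLabel: "cloud.provider.com/topology-rack"
  - nodeLabel: "kubernetes.io/hostname"             # enables hot-swap (see below)
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: tas-flavor
spec:
  nodeLabels:
    node-group: tas   # assumed label selecting the TAS-capable node pool
  topologyName: default
```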
### User Annotations (PodTemplate level)

- `kueue.x-k8s.io/podset-required-topology: <level>` — all pods MUST be in the same topology domain
- `kueue.x-k8s.io/podset-preferred-topology: <level>` — best-effort, falls back to wider domains
- `kueue.x-k8s.io/podset-unconstrained-topology: ""` — TAS capacity accounting, no placement constraint
**Hot-swap:** When the lowest topology level is `hostname`, TAS supports automatic node replacement on failure. See assets/tas-training-job.yaml for a complete example.
## AdmissionChecks
External gates that must pass before a workload is admitted. Primary use: ProvisioningRequest for cluster autoscaler integration.
Flow: quota reserved → ProvisioningRequest created → cluster-autoscaler scales nodes → Provisioned=true → workload admitted.
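A sketch of wiring this up — the object names are illustrative, and `provisioningClassName` must match a class your autoscaler actually supports:

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: AdmissionCheck
metadata:
  name: provisioning-check        # illustrative name
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: prov-config
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ProvisioningRequestConfig
metadata:
  name: prov-config
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
```

Reference the check from a ClusterQueue via `spec.admissionChecks: ["provisioning-check"]`.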
## Elastic Workloads & Dynamic Reclaim

- **Elastic:** Alpha. Feature gate `ElasticJobsViaWorkloadSlices: true`. Annotate with `kueue.x-k8s.io/elastic-job: "true"`.
- **Dynamic Reclaim:** Admitted workloads release unused quota early via `status.reclaimablePods`.
## WorkloadPriorityClass

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: high-priority
value: 10000
description: "Production training jobs"
```
Apply via label: `kueue.x-k8s.io/priority-class: high-priority`
## Workload Lifecycle

- `spec.active: false` — deactivates the workload (evicts if admitted)
- `spec.maximumExecutionTimeSeconds: N` — auto-evict after N seconds
## Key kubectl Commands

```bash
# List all Kueue objects
kubectl get clusterqueues,localqueues,resourceflavors,workloads,workloadpriorityclasses -A

# ClusterQueue status and usage
kubectl describe clusterqueue <name>

# Workload for a job
JOB_UID=$(kubectl get job -n <ns> <name> -o jsonpath='{.metadata.uid}')
kubectl get workloads -n <ns> -l "kueue.x-k8s.io/job-uid=$JOB_UID"

# Pending workloads (visibility API)
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/clusterqueues/<cq>/pendingworkloads"

# Controller logs
kubectl logs -n kueue-system deploy/kueue-controller-manager --tail=200
```
**kueuectl plugin:** For streamlined management (create, resume, stop), see references/kueuectl.md.
## Feature Compatibility
Not all features compose freely. Before combining TAS, elastic workloads, MultiKueue, or fair sharing, check references/compatibility.md for known incompatibilities and upgrade gotchas.
## Diagnostic Script

Run `scripts/kueue-diag.sh` for automated diagnostics:

```bash
# Cluster-wide overview: queues, quotas, pending workloads
bash scripts/kueue-diag.sh

# Per-workload diagnostics: admission status, checks, events
bash scripts/kueue-diag.sh <workload-name> [namespace]
```
Checks controller health, ClusterQueue quota usage, cohort configuration, pending workloads, ResourceFlavor status, and per-workload admission conditions.
## References

- `compatibility.md` — Feature incompatibilities and version-specific gotchas
- `kueuectl.md` — kueuectl CLI plugin installation and usage
- `multi-tenant.md` — Fair sharing, cohorts, borrowing, and preemption for multi-team GPU clusters
- `operations.md` — Installation, configuration, monitoring, and production operations
- `provisioning-request.md` — ProvisioningRequest terminal states, naming, retry behavior, and debugging
- `troubleshooting.md` — Pending workloads, admission failures, and preemption debugging
## Cross-References
- kuberay — Gang scheduling RayJob/RayCluster workloads with Kueue
- aws-fsx — FSx storage for queued training jobs
- volcano — Alternative batch scheduler with gang scheduling
- gpu-operator — GPU resource management for queued workloads
- karpenter — Node provisioning via NodePools/NodeClaims (does NOT use the ProvisioningRequest API — that requires cluster-autoscaler)
- kubeflow-trainer — Queue Kubeflow training jobs
- leaderworkerset — Queue multi-node LWS workloads
- prometheus-grafana — Monitor Kueue queue depth and admission metrics