Queues
flyteplugins-union pluginThe queue CLI commands and Python objects on this page are provided by the
flyteplugins-union package. Install it with pip install flyteplugins-union.
A queue is a named scheduling lane. It does two jobs at once: it routes work to a cluster pool (and, optionally, specific clusters within it), and it governs that work with concurrency, depth, priority, and fairness limits.
This page covers creating and managing queues administratively, from either the CLI or Python. For how workflow authors target a queue from task code, see Queues in Configure tasks.
How a queue routes
A queue lives inside one cluster pool and routes work to one or more clusters
within that pool. By default (the * selector) it spreads across every healthy
cluster in the pool; you can also pin it to specific clusters. It can never reach a
cluster in another pool — pools are isolation boundaries.
flowchart TD
R(["Runs & actions"])
subgraph Pdef["Cluster pool: default"]
direction TB
QD["Queue: default<br/>selector: *"]
CA["Cluster A"]
CB["Cluster B"]
QD --> CA
QD --> CB
end
subgraph Pprod["Cluster pool: prod"]
direction TB
QP["Queue: prod-queue<br/>selector: *"]
QG["Queue: gpu-queue<br/>pinned: Cluster C"]
CC["Cluster C"]
CD["Cluster D"]
QP --> CC
QP --> CD
QG --> CC
end
R --> QD
R --> QP
R --> QG
Users submit to a queue, never to a pool or a cluster directly. Each queue sits inside exactly one pool:
defaultspreads across all clusters in thedefaultpool.prod-queuespreads across all clusters in theprodpool.gpu-queuelives in the sameprodpool but is pinned to a single cluster.
The selector (which clusters within the pool) is mutable. The pool a queue lives in is fixed at creation — an update that changes it is rejected, because moving a queue to another pool would cross an isolation boundary. To move workloads to another pool, see Move work to another pool.
Queues are currently organization-scoped. Some CLI and Python surfaces expose
project and domain parameters for future scoped queues, but current
deployments reject project/domain-scoped queue creation.
Create a queue
run_concurrency and action_concurrency are required; everything else has a
sensible default. With no cluster selector, a queue spreads work across all
healthy clusters in its pool.
flyte create queue my-queue \
--run-concurrency 100 \
--action-concurrency 1000Create a higher-priority queue in a specific pool:
flyte create queue gpu-queue \
--cluster-pool prod \
--cluster prod-us-east-1 \
--run-concurrency 50 \
--action-concurrency 500 \
--depth 5000 \
--priority max \
--fairness round_robinfrom flyteplugins.union.remote import Queue
queue = Queue.create(
"my-queue",
run_concurrency=100,
action_concurrency=1000,
)
print(queue.to_dict())Create a higher-priority queue in a specific pool:
queue = Queue.create(
"gpu-queue",
cluster_pool="prod",
clusters=["prod-us-east-1"],
run_concurrency=50,
action_concurrency=500,
depth=5000,
priority="max",
fairness="round_robin",
)Every queue is bound to a cluster pool, chosen at creation time with
cluster_pool in Python or --cluster-pool in the CLI. If you omit it, the
queue is bound to the default cluster pool.
What each setting controls
cluster_pool/--cluster-pool— the pool this queue lives in. A queue can only route to clusters in its own pool. Omit to bind the queue to thedefaultpool.clusters/--cluster— pin the queue to one or more clusters in the pool. Omit to use all clusters in the pool. In the API,["*"]means all enabled and healthy clusters in the pool, and*must be the only entry if used.run_concurrency/--run-concurrency— maximum number of runs active on the queue at once. Children of an active run aren’t counted; use this to stop a job from overlapping with a previous invocation of itself.0means no limit.action_concurrency/--action-concurrency— maximum number of actions (tasks) running at once. A cap of 1 serializes the queue; higher values bound the burst rate.0means no limit.depth/--depth— total in-flight plus waiting items the queue will hold (default10000).0means no limit.priority/--priority—min,medium(default), ormax. Among queues contending for the same pool’s capacity, higher-priority work is scheduled first. Under the hood these map to enum values 1, 50, and 100; usemaxfor a priority higher than 50. Priority controls ordering, not preemption.fairness/--fairness—round_robin(default) orshuffle_interleave. This controls how actions from different projects sharing the queue are interleaved.
Inspect queues
# List all queues
flyte get queue
# Inspect one queue's settings and status
flyte get queue gpu-queue
# Stream live metrics — runs in-flight, actions in-flight, queue depth
flyte get queue gpu-queue --watchfrom flyteplugins.union.remote import Queue
for queue in Queue.listall(limit=100):
print(queue.name, queue.status, queue.priority, queue.cluster_pool, queue.clusters)
queue = Queue.get("gpu-queue")
print(queue.to_dict())
metrics = Queue.details("gpu-queue")
print(metrics)To stream metrics:
for metrics in Queue.watch("gpu-queue"):
print(metrics)--watch renders live progress bars for run concurrency, action concurrency, and
depth, so you can see a queue filling up or draining in real time.
Change a queue’s settings
You can update limits, priority, fairness, or cluster pinning. The update API replaces the full queue spec; the Python wrapper handles this by reading the current queue first, changing only the fields you pass, and writing the complete spec back.
flyte update queue gpu-queue --editThis opens the queue in your $EDITOR so you can adjust the mutable settings.
from flyteplugins.union.remote import Queue
Queue.update(
"gpu-queue",
run_concurrency=75,
action_concurrency=750,
priority="max",
clusters=["prod-us-east-1"],
)Changing the cluster selector within the same pool (which clusters the queue pins to) takes effect immediately because every cluster in the pool shares the same data plane.
Drain and reactivate a queue
Draining takes a queue out of rotation without losing in-flight work: the
queue stops admitting new submissions, work already in flight runs to
completion, and once nothing is left the queue settles into the drained state.
Draining is how you quiesce a queue — before deleting the cluster behind it,
before maintenance, or as part of
moving work to another pool.
A queue is in one of three states:
active --[drain]--> draining --[in-flight work completes]--> drained
^ | |
+----[activate]------+----------------[activate]-------------+The drain operation is currently disabled — the control plane rejects drain requests. Support is coming in a future release.
flyte update queue gpu-queue --drain # stop new submissions; let in-flight work finish
flyte update queue gpu-queue --activate # put the queue back in rotationfrom flyteplugins.union.remote import Queue
Queue.drain("gpu-queue") # stop new submissions; let in-flight work finish
Queue.activate("gpu-queue") # put the queue back in rotationThe default queue is always active — its state cannot be changed. Note also
that queues cannot be deleted: draining is how a queue is retired, and an idle
queue costs nothing.
Move work to another pool
Moving work to a different pool crosses an isolation boundary. In-flight runs have already landed their data, containers, code, and secrets in the old pool’s data plane, and a different pool’s clusters cannot read them. So a queue can never change pools in place; moving work is a drain-and-replace migration:
- Create a new queue in the destination pool.
- Update workflows, launch plans, triggers, or run overrides to target the new queue.
- Let in-flight work finish on the old queue. Once draining is available, drain it to also shut out any straggler submissions.
- Leave the old queue idle. Queues cannot be deleted; an idle queue costs nothing.
A task can override its queue at runtime
(
task.override(queue=...)),
but only to another queue in the same pool as the run’s original queue. A
cross-pool override is rejected, for the same data plane reason that moving
work between pools requires a drain-and-replace migration.
See also
- Queues in Configure tasks — routing work to a queue from task code, triggers, and per-run context.
- Cluster pools and Clusters — the routing targets a queue points at.