Task Plugins
Flyte tasks are pluggable by design, allowing you to extend task execution beyond simple containers to support specialized compute frameworks and integrations.
Default Execution: Containers
By default, Flyte tasks execute as single containers in Kubernetes. When you decorate a function with @env.task, Flyte packages your code into a container and runs it on the cluster. For more advanced scenarios requiring multiple containers in a single pod (such as sidecars for logging or data mounting), you can use pod templates, which allow you to customize the entire Kubernetes pod specification.
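To make the multi-container case concrete, here is a minimal sketch of the kind of pod specification a pod template customizes: a primary container plus a logging sidecar sharing a volume. It is written as a plain Python dict that mirrors the standard Kubernetes Pod manifest fields. The container names, image references, and mount path are hypothetical placeholders, and how the specification is attached to a task environment is not shown here; consult the pod templates documentation for that.

```python
# Illustrative only: a Kubernetes Pod manifest expressed as a plain Python dict.
# Container names, images, and paths are hypothetical placeholders.
shared_logs = {"name": "logs", "mountPath": "/var/log/task"}

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "task-with-sidecar"},
    "spec": {
        "containers": [
            # The container that runs the task code.
            {
                "name": "primary",
                "image": "example.com/my-task:latest",
                "volumeMounts": [shared_logs],
            },
            # A sidecar that reads the logs written by the primary container.
            {
                "name": "log-shipper",
                "image": "example.com/log-shipper:latest",
                "volumeMounts": [shared_logs],
            },
        ],
        # An emptyDir volume lives only as long as the pod does.
        "volumes": [{"name": "logs", "emptyDir": {}}],
    },
}

container_names = [c["name"] for c in pod_manifest["spec"]["containers"]]
print(container_names)
```

Both containers mount the same emptyDir volume, which is the usual way a sidecar gains access to files produced by the main container.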
Compute Plugins
Beyond native container execution, Flyte provides compute plugins that enable you to run distributed computing frameworks directly on Kubernetes. These plugins create ephemeral clusters specifically for your task execution, spinning them up on demand and tearing them down when the task completes.
Available Compute Plugins
Flyte supports several popular distributed computing frameworks through compute plugins:
- Spark: Run Apache Spark jobs using the Spark operator
- Ray: Execute Ray workloads for distributed Python applications and ML training
- Dask: Scale Python workflows with Dask distributed
- PyTorch: Run distributed training jobs using PyTorch and Kubeflow’s training operator
How Compute Plugins Work
Compute plugins create temporary, isolated clusters within the same Kubernetes environment as Flyte:
- Ephemeral clusters: Each task execution gets its own cluster, spun up on-demand
- Kubernetes operators: Flyte leverages specialized Kubernetes operators (Spark operator, Ray operator, etc.) to manage cluster lifecycle
- Native containerization: The same container image system used for regular tasks works seamlessly with compute plugins
- Per-environment configuration: You can define the cluster shape (number of workers, resources, etc.) using plugin_config in your TaskEnvironment
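The lifecycle described above (create a cluster for one execution, run the task, always tear the cluster down) can be modeled with a context manager. This is a conceptual sketch in plain Python, not Flyte's or any operator's implementation; `ephemeral_cluster`, `run_task`, and the cluster naming scheme are hypothetical.

```python
from contextlib import contextmanager

events = []  # records lifecycle steps so the ordering is visible


@contextmanager
def ephemeral_cluster(name: str, workers: int):
    """Hypothetical stand-in for an operator-managed cluster."""
    events.append(f"create {name} ({workers} workers)")
    try:
        yield {"name": name, "workers": workers}
    finally:
        # Teardown runs even if the task body raises.
        events.append(f"delete {name}")


def run_task(execution_id: str) -> int:
    # Each execution gets its own cluster, named after the execution.
    with ephemeral_cluster(f"cluster-{execution_id}", workers=2) as cluster:
        events.append(f"run task on {cluster['name']}")
        return cluster["workers"] + 1  # one head node plus the workers


nodes = run_task("a1")
print(events)
```

Placing the teardown in a `finally` clause captures the key property of an ephemeral cluster: it never outlives the task execution that requested it, whether the task succeeds or fails.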
Using Compute Plugins
To use a compute plugin, you need to:
- Install the plugin package: Each plugin has a corresponding Python package (e.g., flyteplugins-ray for Ray)
- Configure the TaskEnvironment: Set the plugin_config parameter with the plugin-specific configuration
- Write your task: Use the framework’s native APIs within your task function
Example: Ray Plugin
Here’s how to run a distributed Ray task:
import ray
from flyteplugins.ray.task import HeadNodeConfig, RayJobConfig, WorkerNodeConfig
import flyte

# Define your Ray computation
@ray.remote
def compute_square(x):
    return x * x

# Configure the Ray cluster
ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(ray_start_params={"log-color": "True"}),
    worker_node_config=[WorkerNodeConfig(group_name="ray-workers", replicas=2)],
    runtime_env={"pip": ["numpy", "pandas"]},
    enable_autoscaling=False,
    shutdown_after_job_finishes=True,
    ttl_seconds_after_finished=300,
)

# Create a task environment with Ray plugin configuration
image = (
    flyte.Image.from_debian_base(name="ray")
    .with_pip_packages("ray[default]==2.46.0", "flyteplugins-ray")
)

ray_env = flyte.TaskEnvironment(
    name="ray_env",
    plugin_config=ray_config,
    image=image,
    resources=flyte.Resources(cpu=(3, 4), memory=("3000Mi", "5000Mi")),
)

# Use the Ray cluster in your task
@ray_env.task
async def distributed_compute(n: int = 10) -> list[int]:
    futures = [compute_square.remote(i) for i in range(n)]
    return ray.get(futures)

When this task runs, Flyte will:
- Spin up a Ray cluster with 1 head node and 2 worker nodes
- Execute your task code in the Ray cluster
- Tear down the cluster after completion
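The task body follows a fan-out/gather pattern: submit one unit of work per input, then collect the results in order. To try that logic without a cluster, the sketch below swaps Ray for the standard library's `ThreadPoolExecutor`. This is a local stand-in for illustration only, and `distributed_compute_local` is a hypothetical name; it does not use Flyte or Ray.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_square(x: int) -> int:
    return x * x


def distributed_compute_local(n: int = 10) -> list[int]:
    # Fan out one unit of work per input, then gather results in input order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(compute_square, i) for i in range(n)]
        return [f.result() for f in futures]


result = distributed_compute_local(5)
print(result)  # [0, 1, 4, 9, 16]
```

Collecting results by iterating over the futures in submission order, rather than in completion order, keeps the output aligned with the inputs.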
Using Plugins on Union
Most compute plugins are enabled by default on Union or can be enabled upon request. Contact your Account Manager to confirm plugin availability or request specific plugins for your deployment.
Backend Integrations
Beyond compute plugins, Flyte also supports integrations with external SaaS services and internal systems through connectors. These allow you to seamlessly interact with:
- Data warehouses: Snowflake, BigQuery, Redshift
- Data platforms: Databricks
- Custom services: Your internal APIs and services
Connectors enable Flyte to delegate task execution to these external systems while maintaining Flyte’s orchestration, observability, and data lineage capabilities. See the connectors documentation for more details on available integrations.
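The idea of delegating execution to an external system can be sketched as a submit-then-poll loop: the orchestrator hands off the work, tracks its status, and stops when the remote job finishes. The code below is a conceptual illustration with an in-memory fake service. `FakeWarehouse`, `run_delegated`, and the status strings are hypothetical and are not Flyte's connector API.

```python
import itertools


class FakeWarehouse:
    """In-memory stand-in for an external service such as a data warehouse."""

    def __init__(self, polls_until_done: int = 2):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def submit(self, query: str) -> str:
        # Register the job and return an identifier the caller can poll.
        job_id = f"job-{next(self._ids)}"
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def status(self, job_id: str) -> str:
        # Report RUNNING a fixed number of times, then SUCCEEDED.
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return "RUNNING"
        return "SUCCEEDED"


def run_delegated(service: FakeWarehouse, query: str, max_polls: int = 10) -> list[str]:
    # Hand the work to the external system, then poll until it finishes.
    job_id = service.submit(query)
    history = []
    for _ in range(max_polls):
        state = service.status(job_id)
        history.append(state)
        if state == "SUCCEEDED":
            return history
    raise TimeoutError(f"{job_id} did not finish within {max_polls} polls")


history = run_delegated(FakeWarehouse(), "SELECT 1")
print(history)  # ['RUNNING', 'RUNNING', 'SUCCEEDED']
```

Because the orchestrator only holds a job identifier and a status, the heavy computation stays in the external system while the recorded status history remains available for observability.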
Next Steps
For detailed guides on each compute plugin, including configuration options, best practices, and advanced examples, see the Plugins section of the documentation. Each plugin guide covers:
- Installation and setup
- Configuration options
- Resource management
- Advanced use cases
- Troubleshooting tips