Running Ray Jobs using Airflow

Airflow

Apache Airflow is an open-source platform for authoring, scheduling, and monitoring data and compute workflows. It was first developed at Airbnb and is now a project of the Apache Software Foundation. Workflows are written in Python as directed acyclic graphs (DAGs) of tasks, which Airflow then schedules and monitors. Typical tasks include moving data from a source to a destination, filtering and transforming datasets, applying data policies, and calling microservices to trigger database management work. Companies use Airflow to run batch workloads such as ETL pipelines, machine learning workflows, and data warehousing jobs.

Ray

Ray is an open-source distributed computing framework for building scalable applications. It was developed at UC Berkeley's RISELab. Ray provides a simple Python API, built around remote tasks and actors, that lets developers scale an application from a laptop to a cluster without managing the underlying infrastructure themselves. It is used for machine learning, reinforcement learning, data processing, and more, and has been adopted by companies including Amazon, NVIDIA, and Uber. Its popularity comes from its ease of use, scalability, and flexibility.

Open-source orchestrators like Airflow are one of the main ways companies run Ray in production. Airflow schedules and monitors Ray jobs as steps in larger workflow graphs. Kaspian has a native operator for Airflow that makes it easy to migrate existing Ray jobs, or to start running new ones, on Kaspian's flexible compute layer.
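To make the pattern concrete, here is a minimal sketch of an Airflow task that submits a Ray job and waits for it to finish. It does not use Kaspian's operator, whose interface is not described here. Instead it assumes Airflow's generic PythonOperator (Airflow 2.4 or later) together with Ray's job submission SDK. The DAG id, the script name my_script.py, and the Ray address http://127.0.0.1:8265 are placeholders chosen for illustration.

```python
import time
from datetime import datetime

# States after which a Ray job will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, job_id, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll get_status(job_id) until the job reaches a terminal state.

    Returns the final state as a string; raises TimeoutError if the job is
    still running once timeout_seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        # Ray returns an enum member; plain strings are accepted as well.
        state = str(getattr(status, "value", status))
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Ray job {job_id} is still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


def run_ray_job(ray_address, entrypoint):
    """Task callable: submit a Ray job and fail the Airflow task if it fails."""
    # Imported inside the function so Ray is only needed where the task runs.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(ray_address)
    job_id = client.submit_job(entrypoint=entrypoint)
    state = wait_for_job(client.get_job_status, job_id)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Ray job {job_id} finished as {state}")
    return job_id


def build_dag():
    """Build a one-task DAG. Assumes Airflow 2.4+ for the `schedule` argument."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="ray_job_example",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="submit_ray_job",
            python_callable=run_ray_job,
            op_kwargs={
                "ray_address": "http://127.0.0.1:8265",  # assumed Ray dashboard address
                "entrypoint": "python my_script.py",  # hypothetical job script
            },
        )
    return dag
```

In a real DAG file you would add dag = build_dag() at module level so the Airflow scheduler can discover the DAG. The Airflow and Ray imports are kept inside functions here only so that the polling helper can be exercised on its own. Raising an exception when the job ends in any state other than SUCCEEDED is what lets Airflow mark the task as failed and apply its usual retry and alerting behavior.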
Learn more about Kaspian and see how our flexible compute layer for the modern data cloud is changing the way companies in retail, manufacturing, and logistics approach data engineering and analytics.
