
Simplifying Workflow Management: A Better Way to Use Airflow

technology
5 minutes
Ayush Garg

Airflow is a powerful and versatile tool for managing workflows and monitoring tasks in data processing pipelines. However, even seasoned engineers can find setting up and using Airflow confusing. For data scientists in particular, Airflow's UX can be a steep hurdle to something as simple as deploying workflows that run on a regular schedule and connect to their data sources. Users don't want to spend days reading blog posts and watching YouTube tutorials on common Airflow practices.

As a data science platform, we wanted to remove the hassle of deploying your own workflows and optimize the user experience. We used the Airflow API to build on Airflow's platform and create a user-friendly workflow experience based around Airflow DAGs. To achieve this, we had to develop our own representation of users' workflows, build remote file control on our servers, and provide ready-made tools for workflow control. Most importantly, the resulting UI had to be intuitive and simple while still letting users customize workflows as much as needed.

Building upon Airflow's UI

The two views we set out to improve are the Workflows homepage and details page. We’ve seen that data science teams want to monitor individual workflows, run them, and read debug logs. All the extra bells and whistles are actively distracting. We show the same essential information as Airflow but keep the focus on the core objectives for our users. 

Airflow Homepage

Ludis Workflows Homepage

This view shows users the information they care about and gives them quick action buttons to edit, delete, or manually trigger a workflow. A key feature to note here is the project button, which takes users to our code editor, where their workflow file lives, so they can quickly iterate on code and debug. This makes developing and deploying workflows much easier than coding locally and manually uploading files to a remote box.

Airflow Details Page

Airflow's UI

Ludis Workflows Detail Page

We followed the same philosophy for the details page: highlight the essential information and give the user intuitive action buttons.

Connecting Ludis Workflows to Airflow

Since we power our workflows with Airflow, we need to connect each Ludis workflow to a corresponding Airflow DAG. The first step is maintaining an internal representation of workflow objects, which is relatively straightforward: most workflow data can be extracted from the Airflow API. We only need to track which Ludis project the workflow is in, which user it belongs to, and whether it has been published from our platform yet.
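A hedged sketch of what that internal representation might look like: most fields map directly from a DAG entry returned by Airflow's stable REST API (GET /api/v1/dags), with the three Ludis-side fields layered on top. The class and function names here are illustrative, not our exact production code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LudisWorkflow:
    # fields taken straight from the Airflow API payload
    dag_id: str
    schedule: Optional[str]
    is_paused: bool
    # the three extras we track ourselves
    project_id: str   # which Ludis project the workflow lives in
    owner: str        # which user it belongs to
    published: bool   # whether it has been published from our platform yet

def from_airflow_dag(dag: dict, project_id: str, owner: str, published: bool) -> LudisWorkflow:
    """Build our workflow record from one DAG entry in the API response."""
    # schedule_interval arrives as an object, e.g.
    # {"__type": "CronExpression", "value": "@daily"}
    interval = dag.get("schedule_interval")
    return LudisWorkflow(
        dag_id=dag["dag_id"],
        schedule=interval["value"] if interval else None,
        is_paused=dag["is_paused"],
        project_id=project_id,
        owner=owner,
        published=published,
    )
```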

File management gets more interesting. We publish users' workflows and related files from our platform to a dedicated Airflow cluster using GitHub's API. To do this, we developed a GitHub service that runs alongside Airflow. The service listens for publish requests; on each request, it pulls the user's workflow repository onto the Airflow server, then moves the workflow file and other related files from the repository into the directory Airflow scans for DAGs. As the scheduler parses the directory, the workflow becomes accessible in Airflow. From the user's perspective, they just select the files and workflows they want published, and we do the rest.
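The pull-then-move step above can be sketched as two small functions: one refreshes the checkout, the other copies the selected files into the scheduler's directory. This is a minimal sketch under assumed paths and names (the checkout location, the DAGs directory, and both function names are illustrative), not our production service.

```python
import shutil
import subprocess
from pathlib import Path

def refresh_checkout(repo_url: str, checkout: Path) -> None:
    """Clone the user's workflow repository, or pull if it already exists."""
    if (checkout / ".git").exists():
        subprocess.run(["git", "-C", str(checkout), "pull"], check=True)
    else:
        subprocess.run(["git", "clone", repo_url, str(checkout)], check=True)

def sync_selected_files(checkout: Path, selected: "list[str]", dags_dir: Path) -> "list[Path]":
    """Copy the user-selected workflow file and related files into the
    directory the scheduler parses; the DAG appears on its next parse."""
    copied = []
    for rel_path in selected:
        dest = dags_dir / Path(rel_path).name
        shutil.copy2(checkout / rel_path, dest)
        copied.append(dest)
    return copied
```

Keeping the git step and the file-sync step separate means a failed pull never leaves a half-copied workflow in the DAGs directory.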

Simplifying Workflow Development

The last step is creating a more straightforward workflow development experience. Developing workflows can be difficult, so we pre-populate a basic template that gives users example tasks, Airflow task ordering and generation, workflow variables, and module imports. To create a workflow, the user only has to provide its name and the schedule it should run on; we take care of the rest. This gives users a framework to import and run their scripts with ease. We also added ready-made code snippets for connecting data, like CSV and Excel files or a user's external database. The user just has to connect the dots.
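One way the template step could work, sketched here with an assumed generator function and a simplified template body (neither is our exact implementation): the user's two inputs, name and schedule, are substituted into a starter DAG file that already carries the imports and an example task.

```python
# Illustrative starter-file generator: only `name` and `schedule`
# come from the user; everything else is pre-populated boilerplate.
TEMPLATE = '''\
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def example_task():
    # import and call your own scripts here
    print("running {name}")

with DAG(
    dag_id="{name}",
    schedule_interval="{schedule}",
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    task = PythonOperator(task_id="example_task", python_callable=example_task)
'''

def render_workflow_template(name: str, schedule: str) -> str:
    """Render a ready-to-edit DAG file from the user's name and schedule."""
    return TEMPLATE.format(name=name, schedule=schedule)
```

A user who asks for a workflow called `sales_etl` on a `@daily` schedule gets back a complete, schedulable DAG file they can immediately extend with their own tasks.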


Conclusion 

By building Airflow-powered Workflows into the Ludis platform, we have opened workflow management to more than just engineers. Less technical users can now build powerful Airflow-based workflows without the stress of setting up Airflow and learning its ins and outs.

At Ludis, we hope to empower data scientists and teams to focus on their core objectives without spending valuable time and mental energy on workflow management. Together with our projects, datasets, and app-based insights, Ludis Workflows connect our system into a streamlined end-to-end data science analytics platform that helps teams organize and make sense of their data.