Google Cloud Composer vs Astronomer: what to choose?
Selecting the right approach to building distributed data pipelines requires finding a good managed cloud computing solution, so we compare Google Cloud Composer with Astronomer.
Big Data processing was cloud platform-specific before the introduction of Airflow from Airbnb. A platform built on aggregating venue booking offers from multiple providers across the globe obviously needed a system for orchestrating workflows holistically across many infrastructure providers. After the Airflow project was built and donated to Apache, a huge and passionate community invested lots of effort into turning it into the best available data pipeline orchestration tool around.
However, handling complex data processing workflows is daunting enough without having to worry about the underlying infrastructure performance at the same time. This is why the need for managed Airflow services became obvious, and around 2018 two main competitors entered the field: Google Cloud Composer and Astronomer, which are microservice-architected hosted solutions that use Directed Acyclic Graphs, or DAGs, to manage data processing pipelines. Let’s dive deeper and compare these two alternatives, so you will be able to make an informed decision when selecting between them.
Of course, nobody forces your hand to go for paid hosting platforms, and you are perfectly allowed to download the latest stable Airflow build, master its documentation and configure the underlying infrastructure and processes yourself. However, this approach is not cost-efficient, as it is a time-consuming process of reinventing the wheel and following in the footsteps of either Astronomer or Cloud Composer, without having access to their wealth of technical expertise.
Here are some things to consider:
The platforms themselves have evolved. Google Cloud Composer remains Google's managed service built on Apache Airflow and has since gained a second major version (Cloud Composer 2), and Astronomer has released a number of new features, such as support for Kubernetes and Airflow 2.0.
The landscape has become more crowded. There are now a number of other managed Airflow offerings, as well as alternative workflow orchestrators such as Prefect and Luigi, each with its own strengths and weaknesses.
The needs of users have changed. The popularity of Airflow has grown, and it is now used for a wider range of use cases, including data engineering, machine learning and ETL.
Overall, the information in the article is still relevant in 2023, but it’s important to be aware of the changes that have taken place since it was written.
Cloud Composer vs Astronomer
We will compare Google Cloud Composer to Astronomer by several parameters:
- Type of infrastructure used
- Type of operators applied
- DAG architecture and usage
- Usage of code templates
- Usage of RESTful APIs
These are the most distinguishing features, but Cloud Composer and Astronomer have lots in common:
- Both have pre-configured deployment scenarios, so instead of spending days and weeks to build and configure the needed infrastructure, you get a working Airflow environment in minutes.
- Both are managed services, so dedicated DevOps teams (Google Cloud or Astronomer Cloud respectively) handle the infrastructure maintenance tasks
- Both enable horizontal scaling out of the box, so you can add new Airflow environments with ease and never worry about scaling and load balancing
- Both provide CLI tools for DAG handling, though Cloud Composer also provides a web UI dashboard for managing the Airflow webserver and DAGs with ease.
- Both let you install packages from PyPI (the Python Package Index), so you can leverage all the libraries needed in your data processing workflows.
- Both work with a wide list of plugins to augment the operations you might need to perform
- Both support email alerting and multiple monitoring features to help you keep a finger on the pulse of your systems, though Cloud Composer benefits from direct integration with Stackdriver dashboards.
- Both provide detailed developer documentation on the usage of their solutions, as well as paid support on subscription basis.
That said, let’s take a look at the differences between Cloud Composer and Astronomer.
- Airflow infrastructure
Google Cloud Composer deploys Airflow projects to its Kubernetes clusters using the Celery Executor. The Airflow webserver, the Redis message broker, Postgres for metadata, Flower for monitoring, and the Airflow scheduler and workers all run on a Kubernetes cluster. After the infrastructure is designed and all connectors are configured, the same scheme can be used with Google, AWS, Azure, DigitalOcean or any on-prem Kubernetes cluster.
By default, Astronomer deploys Airflow projects to GKE running on Astronomer cloud, but it has step-by-step guides to deploying your Airflow environments to any of the major cloud providers or on-prem infrastructure. Astronomer uses Mesos or Kubernetes Executors as alternatives to Celery.
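Whichever platform hosts the environment, the executor is an ordinary Airflow configuration option. The following is a minimal sketch, assuming Airflow's documented `AIRFLOW__{SECTION}__{KEY}` environment-variable convention. The helper names `airflow_env_var` and `executor_setting` are hypothetical, the list covers only some of the executors mentioned here, and a managed service may restrict which options you can actually override.

```python
# Some of the executor names discussed in this article, as Airflow spells them.
EXECUTORS = ("LocalExecutor", "CeleryExecutor", "KubernetesExecutor")


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def executor_setting(executor: str) -> dict:
    """Return the environment override that selects the given executor."""
    if executor not in EXECUTORS:
        raise ValueError(f"unknown executor: {executor!r}")
    return {airflow_env_var("core", "executor"): executor}


print(executor_setting("CeleryExecutor"))
# {'AIRFLOW__CORE__EXECUTOR': 'CeleryExecutor'}
```

On a self-hosted deployment such a variable would typically be set on the scheduler and worker containers; on a managed platform the equivalent setting is usually made through the provider's own configuration interface.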
- Operators applied
Airflow supports a wide range of common operators and most of these are supported by Google. Cloud Composer also works with a wide range of plugins and allows configuring any webhooks you need to trigger the Airflow data pipeline execution.
Astronomer supports the common plugins and custom operators, but the chance of you facing the need to develop another custom operator for your project is much higher with Astronomer. For example, while with Google 100% of DevOps work will be handled by the GCP team, working with the Astronomer team requires your in-house team to have a good understanding of DevOps workflows and tools. Otherwise (like when you need Airflow purely for data processing needs and have no in-house DevOps expertise), you will need to opt for the Astronomer Enterprise Cloud solution.
- DAG architecture
Cloud Composer offers a convenient DAG management dashboard, where you can combine various modules into DAG Runs and build workflow pipelines. Each of the individual DAG components is self-contained, meaning it has all its connectors, hooks and dependencies stored with it (and should ideally also be idempotent, so that re-running it produces the same result). As a result, connecting two modules in the dashboard and dropping a ready file into a DAG folder on your Google Cloud Storage automatically applies all the configurations. All DAGs are kept as simple as possible to minimize the risk of misconfigured interdependencies slowing or halting the performance of your Airflow pipelines.
With Astronomer, you have a similar dashboard and a library of ready images, but there is no drag-and-drop option and all the configuration must be performed via Python scripts (R is announced but not implemented yet).
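To make the DAG idea concrete, here is a plain-Python sketch of what a DAG encodes: tasks plus "runs after" dependencies, from which a valid execution order follows. It uses the standard library's `graphlib` (Python 3.9+) rather than Airflow itself, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(dag: dict) -> list:
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


print(execution_order(pipeline))
# ['extract', 'transform', 'load', 'notify']
```

Keeping each pipeline a simple chain or shallow tree like this one is what makes misconfigured interdependencies easy to spot, which is the point made above about keeping DAGs as simple as possible.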
- Code templates
The default template engine for Airflow is Jinja, well-known to most Python developers working with the Flask framework. It allows building neat and flexible templates that reduce the hurdle of writing new code for each operation. However, using code templates adds another layer of complexity to software engineering, although it can be a stepping stone for pure web developers transitioning into data processing operations.
With Google Cloud Composer, you get a library of templates to use, but the need for them is minimal, as it is a 100% managed service.
With Astronomer, you are free to build the templates you need, and the Astronomer team (which includes two of the initial Airflow developers and other tech talents) works on constantly increasing the number of custom code templates, webhooks and connectors available to the customers.
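As a rough illustration of the templating idea on either platform, the sketch below renders a Jinja template directly with the `jinja2` package. It assumes `jinja2` is installed; the SQL text and table name are hypothetical, and `ds` here is just a variable passed in by hand, mirroring the date-stamp variable that Airflow supplies to templated operator fields.

```python
from jinja2 import Template

# One template can serve every daily run; only the rendered values change.
query_template = Template(
    "SELECT * FROM bookings WHERE booking_date = '{{ ds }}' LIMIT {{ limit }}"
)

# Inside Airflow the rendering context is filled in by the scheduler;
# here it is supplied manually for illustration.
rendered = query_template.render(ds="2023-01-01", limit=100)
print(rendered)
# SELECT * FROM bookings WHERE booking_date = '2023-01-01' LIMIT 100
```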
- RESTful API handling
Airflow uses RESTful APIs for interacting with external system modules. With Google Cloud, this means Google’s AI and ML products and system components, first and foremost. However, there are Google guides on moving your Airflow environments to external destinations or replacing
With Astronomer, you are free to use these APIs from the get-go to deploy your Airflow projects to on-prem Kubernetes clusters, AWS, Azure, etc. — or include components from these cloud platforms into your infrastructure.
Use cases for RESTful APIs with Airflow include the following scenarios:
- Spinning up Kubernetes clusters for data processing based on external HTTPS requests.
- Launching a workflow based on a message appearing in your message broker or data storage.
- Building full-scale Machine Learning platforms for processing data on demand
That said, using the REST API turns Airflow into a highly flexible solution that can serve multiple business needs in a wide variety of scenarios.
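As one hedged example of launching a workflow from an external event, the sketch below builds (but does not send) a request that triggers a DAG run through the stable REST API available in Airflow 2.x. The base URL, DAG id and payload are hypothetical, authentication is omitted, and a managed platform may expose this API behind its own authentication layer.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url: str, dag_id: str, conf: dict) -> Request:
    """Prepare a POST to Airflow's /api/v1/dags/{dag_id}/dagRuns endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    # The optional "conf" object is passed through to the triggered DAG run.
    body = json.dumps({"conf": conf}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, DAG id and payload, for illustration only.
req = build_trigger_request(
    "https://airflow.example.com",
    "process_new_file",
    {"path": "incoming/data.csv"},
)
print(req.get_method(), req.full_url)
# POST https://airflow.example.com/api/v1/dags/process_new_file/dagRuns
```

Once credentials are added, the request could be sent with `urllib.request.urlopen(req)`, for instance from a small service that listens to a message broker or a storage notification.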
Conclusions: when to use Astronomer or Cloud Composer?
To wrap it up, let’s talk about what matters most for many businesses – costs. While Google showcases Cloud Composer pricing openly, the scheme of price formation is not quite transparent, as data storage and some other expenses are added to your overall monthly bill. However, various sources indicate the average price of a single Airflow environment to be around $300/mo. with Google. Of course, for this price, you get an end-to-end solution with in-depth help documentation and top-notch Google support.
Astronomer Cloud is essentially a Google Cloud reseller, as GKE is its default destination for Airflow environments. However, Astronomer charges only $110/mo. to start an Airflow project with a Local Executor. The price is nearly three times lower, but the level of user convenience is not quite as high with Astronomer, both in terms of DAG configuration and in terms of availability of plugins, connectors and API integrations with other projects.
Therefore, you can either go for a rock-solid customer experience at quite an affordable price with GCP or select a much more affordable solution with more configuration overhead with Astronomer. Keep in mind, though, that both of these costs can multiply quite quickly should you configure Airflow incorrectly.
But what to do if your team does not have the DevOps expertise required to plan and execute complex distributed workflows and spending time waiting for Google Cloud support response is too costly? Contact IT Svit, one of the leaders of the worldwide Managed DevOps Services market! We would be glad to help!