Configuration of Apache Airflow for the Climate Action Secretariat projects.
This repository contains the docker images, helm charts, and DAGs required to automate various workflows for the CAS team.
The dags directory contains the various workflows (Directed Acyclic Graphs).
- The Kubernetes Executor allows us to run tasks on Kubernetes as Pods.
- It lets us run each script in its own container, within its own quota, and schedule it on the least-congested node in the cluster.
- The KubernetesPodOperator allows us to create Pods on Kubernetes.
- It gives us the freedom to run the command in any arbitrary image, sandboxing the job run inside a docker container.
- It allows us to select in which namespace to run the job.
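To make the points above concrete, here is a minimal sketch of the kind of Pod description such a task boils down to: one image, one command, one namespace, and its own resource limits. It is not the KubernetesPodOperator API, only a plain Kubernetes Pod manifest built as a Python dict. The function name, pod name, namespace, image and limits are all hypothetical values chosen for illustration.

```python
import json


def build_pod_manifest(name, namespace, image, command,
                       cpu_limit="500m", memory_limit="512Mi"):
    """Build a minimal Kubernetes Pod manifest for one sandboxed task.

    All argument values are supplied by the caller; nothing here is taken
    from the CAS helm charts or DAGs.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # The namespace is chosen per job.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # A task runs once; do not restart it when it exits.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "task",
                    # Any arbitrary image can be used for the job.
                    "image": image,
                    "command": list(command),
                    # Each container gets its own resource quota.
                    "resources": {
                        "limits": {"cpu": cpu_limit, "memory": memory_limit}
                    },
                }
            ],
        },
    }


# Hypothetical example values, for illustration only.
example_pod = build_pod_manifest(
    name="hello-task",
    namespace="example-namespace",
    image="python:3.8-slim",
    command=["python", "-c", "print('hello')"],
)
example_pod_json = json.dumps(example_pod, indent=2)
```

The resulting JSON could be inspected or fed to a Kubernetes client; in the DAGs themselves the operator takes care of producing and submitting the Pod.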
Airflow Kubernetes
The docker images are built on CircleCI for every commit, and pushed to CAS' google cloud registry if the build occurs on the develop or master branch, or if the commit is tagged.
Deployment is done with Shipit, using the helm charts defined in the helm folder.
- The helm install command should specify namespaces for the different CAS applications:
helm install --set namespaces.airflow=<< >> --set namespaces.ggircs=<< >> --set namespaces.ciip=<< >> --set namespaces.cif=<< >>
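As a rough illustration of how the namespace flags fit together, the sketch below assembles a helm install argument list from a mapping of application to namespace. The release name, chart path and namespace values are hypothetical placeholders, not the values used by the CAS deployments, and the function only builds the command without running it.

```python
import shlex


def helm_install_command(release, chart, namespaces):
    """Return the argv for a helm install with one --set per namespace.

    `namespaces` maps an application key (airflow, ggircs, ciip, cif)
    to the namespace it is deployed in.
    """
    args = ["helm", "install", release, chart]
    for app, namespace in namespaces.items():
        args += ["--set", f"namespaces.{app}={namespace}"]
    return args


# Hypothetical release name, chart path and namespaces.
example_command = helm_install_command(
    release="cas-airflow",
    chart="./helm/cas-airflow",
    namespaces={
        "airflow": "airflow-dev",
        "ggircs": "ggircs-dev",
        "ciip": "ciip-dev",
        "cif": "cif-dev",
    },
)
# A shell-safe, single-line rendering of the same command.
example_command_line = shlex.join(example_command)
```

Building the list first and joining it with shlex.join keeps the values safely quoted if a namespace ever contains unusual characters.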
There are a couple of manual steps required for installation (the first deployment) at the moment:
- Prior to deploying, the namespace where airflow is deployed should have:
  - A "github-registry" secret, containing the pull secrets to the docker registry. This should be taken care of by cas-pipeline's make provision.
  - An "airflow-default-user-password" secret. This will have airflow create a 'cas-aiflow-admin' user with this password.
- Deploy with shipit
- The connections required in the various dags need to be manually created
- stream-minio should be replaced to use gcs client
- the docker images should be imported in the cluster instead of pulling from GH every time we spin up a pod
- authentication should be done with GitHub (allowing members of https://github.com/orgs/bcgov/teams/cas-developers)
- automate the creation of connections on installation
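One possible way to automate connection creation, offered only as a sketch and not as the approach this project has chosen, relies on Airflow reading connections from environment variables named AIRFLOW_CONN_<CONN_ID> whose value is a connection URI. The helper below builds such a name and URI pair. The connection id, host, credentials and schema are hypothetical.

```python
from urllib.parse import quote


def connection_env_var(conn_id, conn_type, host,
                       login="", password="", port=None, schema=""):
    """Return (variable name, URI) describing an Airflow connection.

    Login, password and schema are percent-encoded so that characters
    such as '@' or '/' do not break the URI.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    auth = ""
    if login:
        auth = quote(login, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    netloc = auth + host + (f":{port}" if port else "")
    uri = f"{conn_type}://{netloc}/{quote(schema, safe='')}"
    return name, uri


# Hypothetical connection, for illustration only.
example_name, example_uri = connection_env_var(
    "example_postgres", "postgres", "db.example.com",
    login="airflow", password="p@ss/word", port=5432, schema="example",
)
```

Such variables could, for instance, be set on the airflow pods from a Kubernetes secret, so that no manual step is needed after installation.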
git clone git@github.com:bcgov/cas-airflow.git ~/cas-airflow && cd $_
git submodule update --init
This repository contains the DAGs as well as the helm chart. It includes airflow as a git submodule, through the cas-airflow-upstream repository, in order to use its helm chart as a dependency. It will eventually reference the official airflow instead.
Use asdf to install the correct version of python.
asdf install
Use pip to install the correct version of airflow.
pip install -r requirements.txt
Then reshim asdf to ensure the correct version of airflow is in your path.
asdf reshim
Be sure to set the $AIRFLOW_HOME environment variable if this repository was cloned to a path other than ~/airflow.
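The check described above can be sketched as follows, assuming Airflow falls back to ~/airflow when AIRFLOW_HOME is unset. The function names are hypothetical helpers, not part of Airflow.

```python
from pathlib import Path

# Airflow's default home directory when AIRFLOW_HOME is not set.
DEFAULT_AIRFLOW_HOME = "~/airflow"


def effective_airflow_home(env):
    """Return the directory Airflow would use for the given environment."""
    return Path(env.get("AIRFLOW_HOME", DEFAULT_AIRFLOW_HOME)).expanduser()


def airflow_home_matches(repo_path, env):
    """True when Airflow's home already points at the cloned repository."""
    return effective_airflow_home(env) == Path(repo_path).expanduser()
```

Calling airflow_home_matches("~/cas-airflow", os.environ) after importing os tells you whether AIRFLOW_HOME still needs to be exported before running the commands that follow.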
airflow db init
airflow users create -r Admin -u <<username>> -e <<email>> -f <<first name>> -l <<last name>> -p <<password>>
Start airflow locally (optional).
airflow webserver --daemon
airflow scheduler --daemon
Run a specific task in a specific dag.
airflow tasks test hello_world_dag hello_task $(date -u +"%Y-%m-%dT%H:%M:%SZ")
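The last argument of the command above is a UTC timestamp produced by date. For scripting the same call from Python, a minimal equivalent might look like this; the function name is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def execution_date_stamp(when=None):
    """Format a datetime like `date -u +"%Y-%m-%dT%H:%M:%SZ"`.

    Defaults to the current time. Timezone-aware values are converted to
    UTC; naive values are interpreted as local time by astimezone().
    """
    when = when or datetime.now(timezone.utc)
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A fixed, timezone-aware example two hours ahead of UTC.
example_stamp = execution_date_stamp(
    datetime(2021, 3, 4, 7, 6, 7, tzinfo=timezone(timedelta(hours=2)))
)
```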