Building a robust data pipeline with the dAG stack: dbt, Airflow, and Great Expectations

To Nha Notes | Jan. 2, 2022, 2:12 p.m.

Running dbt in Production

If your organization is using Airflow, there are a number of ways you can run your dbt jobs, including:

  • Invoking dbt through the BashOperator. In this case, be sure to install dbt into its own virtual environment to avoid conflicting dependencies between Airflow and dbt.
  • Installing the airflow-dbt Python package. This package builds on Airflow's operator and hook concepts — the source code can be found on GitHub.


References:

https://www.getdbt.com/coalesce-2020/building-a-robust-data-pipeline-with-dbt-airflow-and-great-expectations/

https://docs.getdbt.com/docs/running-a-dbt-project/running-dbt-in-production

https://pypi.org/project/airflow-dbt/

https://github.com/gocardless/airflow-dbt

https://airflowsummit.org/slides/2021/d6-dAGStack.pdf

https://github.com/astronomer/airflow-dbt-demo

https://github.com/konosp/dbt-on-airflow

https://analyticsmayhem.com/dbt/schedule-dbt-models-with-apache-airflow/

https://github.com/spbail/dag-stack

https://legacy.docs.greatexpectations.io/en/latest/guides/workflows_patterns/deployment_airflow.html

http://mamykin.com/posts/fast-data-load-snowflake-dbt/

https://www.entechlog.com/blog/kafka/exploring-dbt-with-snowflake/

https://www.startdataengineering.com/post/cicd-dbt/