Why I Run dbt Inside the Airflow Docker Container
In modern data engineering pipelines, dbt and Airflow often work side by side. A common design decision is where dbt should run:
- Should dbt run in its own container, orchestrated via an API or CLI call?
- Or should dbt run directly inside Airflow’s Docker container as part of the DAG?
After experimenting with both, I prefer running dbt inside Airflow’s Docker container.
Here’s why.
✅ One Container = One Environment
- Airflow DAGs directly execute dbt commands inside the same container.
- This guarantees:
  - Same Python version
  - Same dbt version
  - Same dependency versions (dbt packages, adapters)
  - No cross-container networking issues
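Because dbt shares the exact environment the DAGs run in, verifying that environment is trivial. A minimal sanity-check sketch (the task id is hypothetical) that prints the dbt and Python versions from inside the Airflow container:

from airflow.operators.bash import BashOperator

# Illustrative sanity-check task: confirms which dbt and Python versions
# the DAG actually runs with, from inside the same container as Airflow.
check_dbt_env = BashOperator(
    task_id="check_dbt_env",
    bash_command="dbt --version && python --version",
)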
With separate containers, you often need to:
- Maintain two separate Docker images
- Sync dbt versions manually
- Deal with volume mounts or cross-container file access
✅ Simplified Dependency Management
- Airflow has full control over dbt’s environment.
- No need to expose dbt via API or CLI calls across containers.
- Package upgrades (dbt, dbt-bigquery, dbt-postgres, etc.) are synchronized automatically.
In CI/CD pipelines, you just build one Docker image with both Airflow and dbt installed — much simpler to maintain.
✅ Clean DAG Code
Your Airflow tasks can simply run:
from airflow.operators.bash import BashOperator

run_dbt = BashOperator(
    task_id="run_dbt",
    # dbt is on the PATH because it is installed in the same image as Airflow
    bash_command="dbt run --project-dir /opt/airflow/dbt",
)
No need for custom dbt API operators or cross-container shell hacks. Simple is stable.
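To make that concrete, here is a minimal DAG sketch chaining dbt run and dbt test. The dag id, schedule, and start date are illustrative assumptions; the project path follows the /opt/airflow/dbt convention used above:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal sketch: dag id, schedule, and start date are illustrative assumptions.
DBT_PROJECT_DIR = "/opt/airflow/dbt"

with DAG(
    dag_id="dbt_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_PROJECT_DIR}",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"dbt test --project-dir {DBT_PROJECT_DIR}",
    )

    # Tests only run after the models build successfully
    dbt_run >> dbt_test

Because the tasks are plain BashOperators, nothing dbt-specific is needed on the Airflow side beyond the image itself.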
✅ Easier Local Development
- One docker-compose.yml runs everything.
- No complex networking or shared volumes between dbt and Airflow containers.
- You don’t need to figure out: “Is dbt installed correctly inside Airflow? Does this volume mount exist?”
- It simply works.
✅ Easier Debugging
When dbt fails inside Airflow:
- Logs are fully captured by Airflow.
- You can see dbt error messages directly in the Airflow UI.
- No need to jump between separate containers or log files.
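Failure handling stays inside Airflow too: a non-zero dbt exit code fails the BashOperator task, so standard retries and alerting apply without extra plumbing. A small sketch with illustrative retry settings:

from datetime import timedelta

from airflow.operators.bash import BashOperator

# Illustrative: dbt's non-zero exit code fails the task, and Airflow's normal
# retry machinery takes over; the retry values here are arbitrary examples.
dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command="dbt run --project-dir /opt/airflow/dbt",
    retries=2,
    retry_delay=timedelta(minutes=5),
)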
✅ Consistent Data Engineering Philosophy: Orchestration Owns Execution
- Airflow orchestrates dbt — it should own dbt’s runtime too.
- Docker images should reflect the entire unit of work, not fragmented micro-containers.
🔧 My Typical Setup
The Airflow Docker image is extended to install dbt:
FROM apache/airflow:2.9.0

# Install dbt as the airflow user (the official image recommends running
# pip installs as "airflow", not root)
RUN pip install --no-cache-dir dbt-core dbt-bigquery dbt-postgres dbt-snowflake
- Both DAGs and dbt projects live inside the shared Airflow volume (/opt/airflow/).
- One consistent environment across dev, staging, and production.
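As one possible refinement (a sketch, assuming a DBT_TARGET environment variable that is not part of the setup above), the dbt target can be selected per deployment so the same DAG code runs unchanged in dev, staging, and production:

import os

from airflow.operators.bash import BashOperator

# Illustrative: DBT_TARGET is an assumed per-deployment environment variable;
# the matching targets are defined in the dbt project's profiles.yml.
DBT_TARGET = os.environ.get("DBT_TARGET", "dev")

dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command=(
        "dbt run "
        "--project-dir /opt/airflow/dbt "
        "--profiles-dir /opt/airflow/dbt "
        f"--target {DBT_TARGET}"
    ),
)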
🚀 When Would I Use Separate Containers?
- Extremely large dbt runs that require autoscaling worker pods (e.g. Kubernetes Executor).
- Fully managed dbt Cloud orchestrations.
- Multi-tenant dbt service architecture.
For 95% of typical data engineering pipelines, running dbt inside Airflow’s container is simpler, faster, and much more maintainable.
🔥 Conclusion
“I run dbt directly inside Airflow containers to guarantee full environment consistency, simplify orchestration, and minimize failure points. This makes DAGs highly portable and much easier to debug.”