💯 From Basic to Intermediate: Understanding dbt Tests

If you’re using dbt to transform data, you’re already winning. But did you know dbt has powerful testing features to keep your data clean, reliable, and trustworthy? In this post, we’ll walk through:

✅ Basic dbt tests: the quick wins
🚀 Intermediate tests: custom logic and reusable macros

✅ Basic dbt Tests (Built-in)

dbt has out-of-the-box tests you can define in your .yml files under your models. Here’s an example: ...

June 10, 2025

How PostgreSQL Surprises You: Booleans, Text I/O, and ETL Gotchas

PostgreSQL is a powerful, standards-compliant database, but it has its quirks. One of those is how it handles boolean values, especially when exporting data in text format.

🧠 PostgreSQL Boolean Behavior: It’s Not What You Think

Internally, PostgreSQL stores each boolean value compactly, in a single byte. But when you convert those values to text, say in a query or an export via COPY, things look… different: ...
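As a rough illustration of the ETL side of this: in COPY text format PostgreSQL writes booleans as `t` and `f` (and NULL as `\N` by default), while a cast to text yields `true` and `false`. The Python sketch below normalizes those spellings on the consuming side. The function name `parse_pg_bool` and the sample row are assumptions made up for this example, not part of the original post.

```python
# Sketch: normalize PostgreSQL boolean text values in an ETL step.
# parse_pg_bool is a hypothetical helper name, not a library function.

# Common spellings PostgreSQL accepts or emits for booleans.
_TRUE = {"t", "true", "y", "yes", "on", "1"}
_FALSE = {"f", "false", "n", "no", "off", "0"}


def parse_pg_bool(value, null_marker="\\N"):
    """Convert a PostgreSQL text-format boolean to a Python bool (or None)."""
    # COPY in text format writes NULL as \N unless configured otherwise.
    if value is None or value == null_marker:
        return None
    text = value.strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError(f"not a PostgreSQL boolean literal: {value!r}")


# A made-up tab-separated line, shaped like COPY ... TO STDOUT output.
row = "42\tt\tf\t\\N".split("\t")
flags = [parse_pg_bool(v) for v in row[1:]]
print(flags)  # [True, False, None]
```

Raising on unknown values, rather than silently treating them as false, keeps a bad export from turning into wrong data downstream.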

June 10, 2025

🧚 Why Run dbt Inside Airflow Docker Container

In modern data engineering pipelines, dbt and Airflow often work side by side. One common design decision is how to run dbt alongside Airflow:

Should dbt run in its own container, orchestrated via API or CLI call?
Or should dbt run directly inside Airflow’s Docker container as part of the DAG?

After experimenting with both, I prefer running dbt inside Airflow’s Docker container. ...

June 4, 2025

🧹 Data Cleansing: Why You Should Always Clean at the Staging Layer

In real-world data engineering pipelines, one of the most common mistakes is postponing data cleansing until too late in the pipeline. The cleaner your upstream data is, the simpler and more maintainable your downstream models will be. Let’s break it down.

✅ The Principle

Cleanse your data as early as possible, ideally at the staging layer.

✅ The Why

1️⃣ Clear Separation of Responsibilities

Staging models are responsible for: ...

June 4, 2025

🔧 ARM Mac + Docker + dbt: Troubleshooting Startup Issues

While setting up Airflow + dbt projects with Docker, you may run into a few common startup errors. Here are the problems and their solutions.

🔍 Problem 1: Platform Architecture Mismatch

Error message: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)

My Mac runs on ARM (Apple Silicon: M1/M2/M3). The official dbt Docker image is built for amd64 (x86-based). As a result, Docker runs the image cross-architecture under QEMU emulation, which sometimes leads to internal Python path issues that surface as an error from dbt --version. This is not a simple dbt bug; the root cause is the platform mismatch. ...
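To check for this kind of mismatch before starting containers, a small Python sketch along these lines may help. It compares the architecture part of an image platform string (as it appears in the error message) with the host architecture reported by `platform.machine()`. The helper names `docker_arch` and `platform_mismatch` are hypothetical and exist only for this example.

```python
import platform

# Sketch: detect an image/host architecture mismatch.
# Maps common platform.machine() values to Docker architecture names.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def docker_arch(machine):
    """Normalize an architecture name to the spelling Docker uses."""
    name = machine.lower()
    return _ARCH_ALIASES.get(name, name)


def platform_mismatch(image_platform, host_machine):
    """Return True when the image architecture differs from the host's.

    image_platform looks like "linux/amd64" or "linux/arm64/v8";
    the variant suffix (such as "v8") is ignored.
    """
    parts = image_platform.split("/")
    image_arch = parts[1] if len(parts) > 1 else parts[0]
    return docker_arch(image_arch) != docker_arch(host_machine)


# Result depends on the machine this runs on.
print(platform_mismatch("linux/amd64", platform.machine()))
```

On an Apple Silicon Mac, `platform.machine()` typically reports `arm64`, so an amd64-only image would be flagged as a mismatch.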

May 30, 2025

📊 What dbt Does Well vs What Python Does Better

Role | dbt Does Well | Python Does Better
Structured data cleaning (staging) | ✅ | Possible, but inconvenient
Designing mart table structures | ✅ | Also possible
User-specific calculations | ❌ Inconvenient | ✅ Super flexible
Scoring, conditional matching, if-else logic | ❌ Very cumbersome | ✅ Ideal
Filtering based on user input | ❌ Not possible | ✅ Core feature
Explaining recommendations, tuning logic | ❌ | ✅ Fully customizable

For example:

-- This kind of logic is painful in dbt...
SELECT
  CASE WHEN user.age BETWEEN policy.min_age AND policy.max_age THEN 30 ELSE 0 END
  + CASE WHEN user.income < policy.income_ceiling THEN ... ELSE 0 END
  + ...

In dbt, the concept of a “user” doesn’t even exist. dbt is built for models that apply the same logic to everyone. Python, on the other hand, can generate different recommendations per user based on input.

👉 dbt is great for static modeling, but dynamic, user-input-driven recommender systems are better suited for Python. ...
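For comparison, here is a minimal Python sketch of the same scoring idea applied per user. The 30-point age weight is taken from the SQL snippet. The income weight, the `User` and `Policy` classes, the function names, and the sample policies are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    income: float


@dataclass
class Policy:
    name: str
    min_age: int
    max_age: int
    income_ceiling: float


AGE_POINTS = 30     # from the SQL snippet
INCOME_POINTS = 20  # hypothetical weight; the original value is not shown


def score(user, policy):
    """Score one policy for one user with plain if-statements."""
    points = 0
    if policy.min_age <= user.age <= policy.max_age:
        points += AGE_POINTS
    if user.income < policy.income_ceiling:
        points += INCOME_POINTS
    return points


def recommend(user, policies, top_n=3):
    """Rank policies for a single user, highest score first."""
    ranked = sorted(policies, key=lambda p: score(user, p), reverse=True)
    return [(p.name, score(user, p)) for p in ranked[:top_n]]


# Made-up sample data.
policies = [
    Policy("youth_grant", 19, 34, 50_000),
    Policy("senior_support", 65, 120, 30_000),
]
print(recommend(User(age=28, income=42_000), policies))
```

Because the user is an ordinary function argument, the same code can run once per request with whatever input arrives, which is the part that is awkward to express as a dbt model.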

May 12, 2025