🧚 Why Run dbt Inside the Airflow Docker Container

In modern data engineering pipelines, dbt and Airflow often work side by side. One common design decision is how to run them together: should dbt run in its own container, orchestrated via an API or CLI call, or should dbt run directly inside Airflow's Docker container as part of the DAG? After experimenting with both, I prefer running dbt inside Airflow's Docker container. ...

June 4, 2025
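
When dbt lives inside the Airflow image, a DAG task only has to shell out to the dbt CLI. Here is a minimal sketch of how that call might be assembled, assuming the dbt project and profiles.yml are mounted at a hypothetical /opt/airflow/dbt path. The --project-dir, --profiles-dir, and --select flags are standard dbt CLI options; the helper names are my own, not part of dbt or Airflow.

```python
import shlex
from typing import Optional

# Hypothetical paths: adjust to wherever the dbt project and profiles.yml
# are mounted inside your Airflow image.
DBT_PROJECT_DIR = "/opt/airflow/dbt"
DBT_PROFILES_DIR = "/opt/airflow/dbt"


def build_dbt_command(
    subcommand: str,
    select: Optional[str] = None,
    project_dir: str = DBT_PROJECT_DIR,
    profiles_dir: str = DBT_PROFILES_DIR,
) -> list[str]:
    """Return the argv for a dbt CLI call made from inside the Airflow container."""
    cmd = ["dbt", subcommand, "--project-dir", project_dir, "--profiles-dir", profiles_dir]
    if select:
        # Limit the run to specific models or tags.
        cmd += ["--select", select]
    return cmd


def as_bash_command(argv: list[str]) -> str:
    """Quote the argv into one shell string, e.g. for a BashOperator's bash_command."""
    return shlex.join(argv)


if __name__ == "__main__":
    print(as_bash_command(build_dbt_command("run", select="staging")))
```

The resulting string could be passed as bash_command to Airflow's BashOperator. Since dbt shares the container, there is no separate dbt service or API to reach; the task runs the CLI directly.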

🧹 Data Cleansing: Why You Should Always Clean at the Staging Layer

In real-world data engineering pipelines, one of the most common mistakes is postponing data cleansing until late in the pipeline. The cleaner your upstream data is, the simpler and more maintainable your downstream models will be. Let's break it down.
✅ The Principle
Cleanse your data as early as possible, ideally at the staging layer.
✅ The Why
1️⃣ Clear Separation of Responsibilities
Staging models are responsible for: ...

June 4, 2025
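
The post applies these rules in dbt staging models, which are SQL. To make the idea concrete, here is a rough Python sketch of the same kind of staging-layer cleansing: trim whitespace, turn empty strings into NULL (None), normalize case, and cast types. The record fields (customer_id, email, country, signup_date) are hypothetical.

```python
from datetime import date
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Trim whitespace and treat empty strings as missing (NULL)."""
    if value is None:
        return None
    stripped = value.strip()
    return stripped or None


def stage_customer(raw: dict) -> dict:
    """Standardize one hypothetical raw customer record, staging-layer style."""
    email = clean_text(raw.get("email"))
    signup = clean_text(raw.get("signup_date"))
    return {
        "customer_id": int(raw["customer_id"]),  # cast the id to an integer
        "email": email.lower() if email else None,  # normalize case
        "country": (clean_text(raw.get("country")) or "UNKNOWN").upper(),
        "signup_date": date.fromisoformat(signup) if signup else None,  # parse ISO dates
    }
```

Because every record leaves staging with the same types and conventions, downstream models can skip defensive checks such as re-trimming or re-casting.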

🐳 How I Dockerized My GitHub Pages Jekyll Site: The Clean Setup That Works

😩 The Problem
Setting up Jekyll with Docker sounds easy, but I ran into:
- platform issues (arm64 vs amd64), since I use an Apple Silicon MacBook (M1)
- bundle install headaches
Since I was building this for my personal GitHub Pages site, I also had to make sure it stayed compatible with the GitHub Pages gem versions while remaining easy to develop locally.
🛠 My Clean Solution
I ended up building this Docker setup, and it finally works for me. ...

June 3, 2025

🔧 ARM Mac + Docker + dbt: Troubleshooting Startup Issues

While setting up Airflow + dbt projects with Docker, you may run into this common error. Here is what causes it and how to solve it.
🔍 Problem 1: Platform Architecture Mismatch
Error message:
The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)
My Mac runs on ARM (Apple Silicon: M1/M2/M3), but the official dbt Docker image is built for amd64 (x86-based). As a result, Docker runs the image cross-architecture using QEMU emulation, which sometimes leads to internal Python path issues that surface as an error when running dbt --version. This is not a simple dbt bug; the root cause is the platform mismatch. ...

May 30, 2025
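
One way to spot this mismatch before pulling an image is to compare the host architecture with the image's platform. Here is a small sketch using Python's platform.machine(); the mapping table covers common values and is my assumption, not an exhaustive list.

```python
import platform
from typing import Optional

# Common platform.machine() values mapped to Docker platform strings (not exhaustive).
_MACHINE_TO_DOCKER = {
    "arm64": "linux/arm64",    # macOS on Apple Silicon
    "aarch64": "linux/arm64",  # Linux on ARM
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",    # Windows reports "AMD64"
}


def host_docker_platform(machine: Optional[str] = None) -> str:
    """Return the Docker platform matching this host (or the given machine string)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _MACHINE_TO_DOCKER[machine]
    except KeyError:
        raise ValueError(f"unrecognized machine type: {machine!r}") from None


def needs_emulation(image_platform: str, machine: Optional[str] = None) -> bool:
    """True if an image built for image_platform would run under emulation on this host."""
    # Compare only os/arch, so "linux/arm64/v8" matches "linux/arm64".
    image_os_arch = "/".join(image_platform.split("/")[:2])
    return image_os_arch != host_docker_platform(machine)
```

If needs_emulation("linux/amd64") returns True on your Mac, the usual options are to force the platform with docker run --platform linux/amd64 and accept QEMU emulation, or to use an image (or base image) built for linux/arm64.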

🔧 Solving Airflow Docker Startup Issues

Common issues you are likely to encounter when running Airflow with Docker.
❗ Issue 1: .env file is not visible inside the Airflow container
🔍 Symptom Summary
The .env file exists at the project root, but inside the Airflow container, load_dotenv() fails to read it.
The reason: Docker Compose reads .env and can pass its values to the container as environment variables, but it does not copy or mount the file itself into the container. Therefore, load_dotenv() has no file to read.
✅ Solution
1️⃣ Add a volume mount for .env in docker-compose.yml
This way, the .env file becomes available inside the container at the correct path. ...

May 30, 2025
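
Once the file is mounted, settings can come from either place. Here is a minimal sketch that prefers real environment variables (what Compose injects) and falls back to the mounted file, assuming a hypothetical mount path of /opt/airflow/.env. In practice you would likely call python-dotenv's load_dotenv(path), which by default also leaves existing variables untouched; the tiny parser below only exists to keep the sketch dependency-free.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical in-container path where docker-compose.yml mounts the project's .env.
DEFAULT_ENV_PATH = Path("/opt/airflow/.env")


def read_env_file(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser (no 'export' keyword, no multi-line values)."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values  # file not mounted: nothing to read
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_setting(name: str, env_path: Path = DEFAULT_ENV_PATH) -> Optional[str]:
    """Prefer real environment variables; fall back to the mounted .env file."""
    if name in os.environ:
        return os.environ[name]
    return read_env_file(env_path).get(name)
```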

🔧 Why Do We Split Airflow into init, scheduler, and webserver?

If you start working with Airflow more seriously, you'll quickly notice that it's usually split into multiple services:
- airflow-init
- airflow-scheduler
- airflow-webserver
At first, you may wonder: “Why do we need to split them up like this?” This is actually the standard production architecture. Let's break it down in simple, practical terms.
1️⃣ airflow-init: Preparation Step
Also sometimes called airflow-db-migrate or airflow-bootstrap. This runs only once, when you initialize Airflow. ...

May 30, 2025
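
To make the split concrete, here is a simplified sketch that describes the three services as data, with airflow-init as a dependency of the other two, and uses a topological sort (Python's graphlib) to derive the start order. The commands are Airflow 2.x CLI names (airflow db migrate replaced airflow db init in 2.7); the dictionary layout is hypothetical and is not a compose file.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified view of the three services and what each one runs.
# Real compose files also define healthchecks, volumes, and a shared environment.
SERVICES = {
    "airflow-init": {"command": "airflow db migrate", "depends_on": []},
    "airflow-scheduler": {"command": "airflow scheduler", "depends_on": ["airflow-init"]},
    "airflow-webserver": {"command": "airflow webserver", "depends_on": ["airflow-init"]},
}


def start_order(services: dict) -> list[str]:
    """Order services so each starts only after everything it depends on."""
    # TopologicalSorter expects {node: predecessors}.
    graph = {name: set(spec["depends_on"]) for name, spec in services.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in start_order(SERVICES):
        print(f"{name}: {SERVICES[name]['command']}")
```

In docker-compose.yml, the same ordering is typically expressed with depends_on, using condition: service_completed_successfully on airflow-init so that the long-running scheduler and webserver start only after the one-off init job has finished.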

🌱 Making a Potting Soil Calculator – React + Vite + Netlify

✨ Try It
👉 Launch the Potting Soil Calculator
📬 Source Code
GitHub: https://github.com/namikimlab/potting-soil-calculator
🪴 Why I Built This
When planting in pots, figuring out how much soil you need can be surprisingly tricky. The volume depends on the pot's shape, size, height, and quantity, and beginner gardeners often don't have a clear way to calculate it. So I decided to create a tool that lets users quickly and intuitively calculate the soil volume needed for repotting. ...

May 21, 2025
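
The calculator itself is written in React, and its exact formulas aren't shown in this excerpt. As a rough Python sketch of the underlying math, assuming three hypothetical pot shapes: a round pot is a cylinder (πr²h), a tapered pot is a truncated cone (πh(R² + Rr + r²)/3), and a rectangular pot is length × width × height. Dividing cm³ by 1000 gives liters.

```python
import math


def pot_volume_liters(shape: str, height_cm: float, **dims: float) -> float:
    """Volume of one pot in liters. Shape names and dimension keys are hypothetical."""
    if shape == "round":
        r = dims["diameter_cm"] / 2
        cubic_cm = math.pi * r ** 2 * height_cm
    elif shape == "tapered":
        # Truncated cone: the top and bottom diameters differ.
        big_r = dims["top_diameter_cm"] / 2
        small_r = dims["bottom_diameter_cm"] / 2
        cubic_cm = math.pi * height_cm * (big_r ** 2 + big_r * small_r + small_r ** 2) / 3
    elif shape == "rectangular":
        cubic_cm = dims["length_cm"] * dims["width_cm"] * height_cm
    else:
        raise ValueError(f"unsupported shape: {shape!r}")
    return cubic_cm / 1000  # 1 liter = 1000 cm^3


def soil_needed_liters(quantity: int, shape: str, height_cm: float, **dims: float) -> float:
    """Total soil for `quantity` identical pots, assuming each is filled to the top."""
    return quantity * pot_volume_liters(shape, height_cm, **dims)
```

For example, soil_needed_liters(2, "rectangular", 10, length_cm=10, width_cm=10) gives 2.0 liters. When the top and bottom diameters are equal, the tapered formula reduces to the cylinder one, which is a handy sanity check.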

📊 What dbt Does Well vs What Python Does Better

| Role | dbt Does Well | Python Does Better |
|---|---|---|
| Structured data cleaning (staging) | ✅ | Possible, but inconvenient |
| Designing mart table structures | ✅ | Also possible |
| User-specific calculations | ❌ Inconvenient | ✅ Super flexible |
| Scoring, conditional matching, if-else logic | ❌ Very cumbersome | ✅ Ideal |
| Filtering based on user input | ❌ Not possible | ✅ Core feature |
| Explaining recommendations, tuning logic | ❌ | ✅ Fully customizable |

For example:

```sql
-- This kind of logic is painful in dbt...
SELECT
  CASE WHEN user.age BETWEEN policy.min_age AND policy.max_age THEN 30 ELSE 0 END
  + CASE WHEN user.income < policy.income_ceiling THEN ... ELSE 0 END
  + ...
```

- In dbt, the concept of a “user” doesn't even exist.
- dbt is built for models that apply the same logic to everyone.
- Python, on the other hand, can generate different recommendations per user based on their input.
👉 dbt is great for static modeling, but dynamic, user-input-driven recommender systems are better suited to Python. ...

May 12, 2025
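
Here is the same scoring idea written in Python, as a minimal sketch: the user is just a function argument, so each call can produce a different ranking. The User and Policy fields follow the SQL above; the point values are hypothetical (the excerpt doesn't show the second score value), and the policy names in any usage are made up.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    income: int


@dataclass
class Policy:
    name: str
    min_age: int
    max_age: int
    income_ceiling: int


# Hypothetical weights for each matching rule.
AGE_POINTS = 30
INCOME_POINTS = 20


def score(user: User, policy: Policy) -> int:
    """Add points for each rule the user satisfies (the CASE WHEN logic, as plain ifs)."""
    points = 0
    if policy.min_age <= user.age <= policy.max_age:
        points += AGE_POINTS
    if user.income < policy.income_ceiling:
        points += INCOME_POINTS
    return points


def recommend(user: User, policies: list[Policy], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank policies for one user's input; sorted() is stable, so ties keep input order."""
    ranked = sorted(policies, key=lambda p: score(user, p), reverse=True)
    return [(p.name, score(user, p)) for p in ranked[:top_n]]
```

Adding a new rule, or returning the reason each rule matched, is one more if block here, whereas in SQL it means another CASE expression in every model that needs it.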

🚀 Building a Batch Data Pipeline with AWS, Airflow, and Spark

✨ Project Summary
Assuming I work at a fintech company, I built a batch pipeline that automatically aggregates → transforms → analyzes credit card data. Since I couldn't use real data, I used synthetic transaction data generated with Faker, which I believe was sufficient for designing the overall data flow and structure.
🎯 Goal
“Build an Airflow pipeline that processes realistic financial data with Spark, analyzes it, and stores the results.” ...

May 1, 2025
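
To show the generate → aggregate shape of that pipeline in a few lines, here is a stdlib-only sketch. It stands in for both tools the project actually used: random instead of Faker for the synthetic data, and a plain Python group-by instead of Spark. The card IDs, categories, and amount range are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date, timedelta

CATEGORIES = ["grocery", "travel", "dining", "online"]  # hypothetical categories


def generate_transactions(n: int, seed: int = 42) -> list[dict]:
    """Create n synthetic card transactions; seeded so runs are reproducible."""
    rng = random.Random(seed)
    start = date(2025, 1, 1)
    return [
        {
            "card_id": f"card-{rng.randint(1, 5)}",
            "tx_date": start + timedelta(days=rng.randint(0, 30)),
            "category": rng.choice(CATEGORIES),
            "amount": round(rng.uniform(1, 500), 2),
        }
        for _ in range(n)
    ]


def daily_spend_by_card(transactions: list[dict]) -> dict[tuple[str, date], float]:
    """Sum amounts per (card_id, tx_date), like a GROUP BY in the transform step."""
    totals: dict = defaultdict(float)
    for tx in transactions:
        totals[(tx["card_id"], tx["tx_date"])] += tx["amount"]
    return {key: round(total, 2) for key, total in totals.items()}
```

In PySpark, the aggregation step would be roughly df.groupBy("card_id", "tx_date").agg(F.sum("amount")), with F being pyspark.sql.functions; the logic is the same, but Spark distributes it across a cluster.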

Hugo First Post Testing

Introduction
This is bold text, and this is emphasized text. Visit the Hugo website!

April 30, 2025