๐Ÿ›ก๏ธ Solving the Kerberos User Authentication Issue in Spark Docker Streaming

While building my real-time streaming pipeline with Spark, Kafka, and Docker, I ran into a Spark error related to Kerberos authentication, even though I wasn't using Kerberos at all.

org.apache.hadoop.security.KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name

❓ What triggered the problem?

I was using the official apache/spark:3.5.0 Docker image. Spark inside Docker was trying to resolve Hadoop's default authentication mechanism, and Hadoop tries to retrieve the current OS user via UnixPrincipal(name). Inside the Docker container, my app was running as a UID/GID that had no username mapping, so UnixPrincipal() received null and failed with "invalid null input: name". ...
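The condition described above, a process UID with no username mapping, can be checked from inside the container before Spark starts. The following is a minimal Python sketch and not part of the original setup: the helper name `username_for_uid` is hypothetical, and it relies only on the standard `pwd` module to look the UID up in the container's user database.

```python
import os
import pwd
from typing import Callable, Optional


def username_for_uid(uid: int, lookup: Callable = pwd.getpwuid) -> Optional[str]:
    """Return the username mapped to `uid`, or None when no mapping exists.

    `lookup` defaults to pwd.getpwuid, which raises KeyError for an unknown UID.
    It is a parameter only so the function can be exercised without a real
    user database (hypothetical design choice for this sketch).
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        return None


if __name__ == "__main__":
    uid = os.getuid()
    name = username_for_uid(uid)
    if name is None:
        # This is the situation in which the OS user name comes back empty.
        print(f"UID {uid} has no username mapping in this container")
    else:
        print(f"UID {uid} maps to user {name!r}")
```

If the script reports that no mapping exists, the container is in the state that leads to the "invalid null input: name" failure described here.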

June 7, 2025

🧚 Why Run dbt Inside Airflow Docker Container

In modern data engineering pipelines, dbt and Airflow often work side by side. One common design decision is how to run dbt alongside Airflow: should dbt run in its own container, orchestrated via an API or CLI call, or should it run directly inside Airflow's Docker container as part of the DAG? After experimenting with both, I prefer running dbt inside Airflow's Docker container. ...

June 4, 2025

๐Ÿณ How I Dockerized My GitHub Pages Jekyll Site โ€” The Clean Setup That Works

😩 The Problem

Setting up Jekyll with Docker sounds easy, but I ran into platform issues (arm64 vs amd64, since I use an Apple Silicon MacBook with an M1) and bundle install headaches. Since I was building this for my personal GitHub Pages site, I also had to make sure it stayed compatible with the GitHub Pages gem versions while remaining easy to develop locally.

🛠 My Clean Solution

I ended up building this Docker setup, which finally works for me. ...

June 3, 2025

🔧 ARM Mac + Docker + dbt: Troubleshooting Startup Issues

While setting up Airflow + dbt projects with Docker, you may run into the following common error. Here is the cause and how to solve it.

🔍 Problem 1: Platform Architecture Mismatch

Error message: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)

My Mac runs on ARM (Apple Silicon: M1/M2/M3), while the official dbt Docker image is built for amd64 (x86-based). As a result, Docker runs the image cross-architecture using QEMU emulation, which sometimes leads to internal Python path issues that surface as the dbt --version error. This is not a simple dbt bug: the root cause is the platform mismatch. ...
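To make the mismatch concrete, here is a small Python sketch that compares a host architecture with an image platform string such as the ones in the error message. The function names `docker_arch` and `needs_emulation` are hypothetical, and the mapping table only covers the two architectures discussed here; it is an illustration of the comparison, not how Docker itself performs the check.

```python
import platform

# Common `platform.machine()` values mapped to Docker's architecture names.
# Only the two architectures discussed in this post are covered.
_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine: str) -> str:
    """Translate a machine name (e.g. 'aarch64') into Docker's naming."""
    key = machine.lower()
    return _DOCKER_ARCH.get(key, key)


def needs_emulation(image_platform: str, machine: str) -> bool:
    """Return True when the image architecture differs from the host's.

    `image_platform` looks like 'linux/amd64' or 'linux/arm64/v8';
    the architecture is the second component.
    """
    parts = image_platform.split("/")
    image_arch = parts[1] if len(parts) > 1 else parts[0]
    return image_arch != docker_arch(machine)


if __name__ == "__main__":
    host = platform.machine()
    print(f"host={host} emulated={needs_emulation('linux/amd64', host)}")
```

Run it on the host rather than inside a container, since an emulated container reports the emulated architecture.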

May 30, 2025

🔧 Solving Airflow Docker Startup Issues

These are common issues you are likely to encounter when running Airflow with Docker.

❗ Issue 1: .env file is not visible inside the Airflow container

🔍 Symptom Summary

The .env file exists at the project root, but inside the Airflow container load_dotenv() fails to read it. The reason: Docker Compose reads .env from the project directory to substitute variables in docker-compose.yml, but it does not copy or mount the file itself into the container. Therefore, load_dotenv() has no file to read.

✅ Solution

1️⃣ Add a volume mount for .env in docker-compose.yml. This way, the .env file becomes available inside the container at the correct path. ...
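The difference between "the variable is set" and "the file is present" can be checked from inside the container with a few lines of standard-library Python. This is a minimal diagnostic sketch: the function name `dotenv_visibility`, the path `/opt/airflow/.env`, and the variable name `MY_API_KEY` are all hypothetical placeholders, so substitute the values from your own project.

```python
import os
from pathlib import Path
from typing import Mapping


def dotenv_visibility(
    path: str, variable: str, environ: Mapping[str, str] = os.environ
) -> dict:
    """Report whether the .env file and a given variable are visible.

    A variable can be present in the process environment even when the
    .env file itself does not exist inside the container.
    """
    return {
        "file_present": Path(path).is_file(),
        "variable_set": variable in environ,
    }


if __name__ == "__main__":
    # Hypothetical path and variable name; adjust to your setup.
    status = dotenv_visibility("/opt/airflow/.env", "MY_API_KEY")
    print(status)
```

If `file_present` is false, there is no file at that path for load_dotenv() to read, which matches the symptom described here.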

May 30, 2025