UUID Pitfalls in Spark β†’ Kafka β†’ Postgres Pipelines

I was building a data pipeline using Kafka and Spark structured streaming. Fully containerized. The stack: Kafka for streaming transaction data Spark Structured Streaming for real-time processing and fraud detection Postgres as the data warehouse Everything was smooth. Until one tiny villain showed up: UUID fields. Yes β€” UUIDs. Here’s exactly what happened (so you can avoid the same headache). βœ… The Original Design I designed the tables in Postgres like this: ...

June 7, 2025

πŸ”§ Solving Airflow Docker Startup Issues

Common issues you will often encounter when running Airflow with Docker. ❗ Issue 1 β€” .env file is not visible inside Airflow container πŸ” Symptom Summary The .env file exists at the project root. But inside the Airflow container, load_dotenv() fails to read it. The reason: Docker automatically passes .env as environment variables. But Docker does not copy or mount the file itself into the container. Therefore, load_dotenv() has no file to read. βœ… Solution 1️⃣ Add volume mount for .env in docker-compose.yml This way, the .env file becomes available inside the container at the correct path. ...

May 30, 2025