UUID Pitfalls in Spark → Kafka → Postgres Pipelines

I was building a data pipeline with Kafka and Spark Structured Streaming. Fully containerized.

The stack:
- Kafka for streaming transaction data
- Spark Structured Streaming for real-time processing and fraud detection
- Postgres as the data warehouse

Everything ran smoothly. Until one tiny villain showed up: UUID fields. Yes, UUIDs. Here's exactly what happened, so you can avoid the same headache.

✅ The Original Design

I designed the tables in Postgres like this: ...

June 7, 2025