📚 Building a 25-Year Backfill Pipeline for the National Library of Korea API

How I Designed a Reliable, Auto-Resuming ETL to Collect Decades of Book Data, Without Airflow

1. Why I Built This

The National Library of Korea (NLK) provides a public API called Seoji, a bibliographic catalog of all registered books in Korea. I wanted to collect the entire dataset, from January 2000 to December 2024, and store it in my PostgreSQL database (Supabase). It sounded simple at first: just a loop over API pages. But in practice, I had to solve: ...

October 22, 2025

🚀 Building a Fintech Batch ETL Pipeline: the Modular Way

👉 Code, Portfolio, Blog, and LinkedIn

🎯 Batch Pipeline for Transaction Data

Imagine: the K-pop demon hunters launch a fintech startup for their fans. Now they have to deal with millions of credit card transactions every day, and they need to make sense of them. ...

September 12, 2025

How PostgreSQL Surprises You: Booleans, Text I/O, and ETL Gotchas

PostgreSQL is a powerful, standards-compliant database, but it has its quirks. One of them is how it handles boolean values, especially when exporting data in text format.

🧠 PostgreSQL Boolean Behavior: It's Not What You Think

Internally, PostgreSQL stores boolean values compactly, in a single byte, as you'd expect. But when you convert those values to text, say in a query or an export via COPY, things look… different: ...

June 10, 2025

🔧 Why Do We Split Airflow into init, scheduler, and webserver?

If you start working with Airflow a bit more seriously, you'll quickly notice that it's usually split into multiple services: airflow-init, airflow-scheduler, and airflow-webserver. At first, you may wonder: "Why do we need to split them up like this?" Well, this is actually the standard production architecture. Let's break it down in simple, practical terms.

1️⃣ airflow-init: Preparation Step

Also sometimes called airflow-db-migrate or airflow-bootstrap. This runs only once, when you initialize Airflow. ...

May 30, 2025