📚 Building a 25-Year Backfill Pipeline for the National Library of Korea API

How I Designed a Reliable, Auto-Resuming ETL to Collect Decades of Book Data — Without Airflow

1. Why I Built This

The National Library of Korea (NLK) provides a public API called Seoji — a bibliographic catalog of all registered books in Korea. I wanted to collect the entire dataset, from January 2000 to December 2024, and store it in my PostgreSQL database (Supabase). It sounded simple at first — just a loop over API pages. But in practice, I had to solve: ...
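The "auto-resuming" part of that loop can be sketched as a checkpointed month-by-month walk over the API. Everything below is an assumption for illustration — `fetch_page`, `store_rows`, and the checkpoint filename are hypothetical stand-ins for the real NLK client and Supabase writer, not the post's actual code:

```python
import json
import os

CHECKPOINT = "backfill_state.json"  # hypothetical checkpoint file

def load_state():
    """Resume from the last saved (year, month, page); start fresh otherwise."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"year": 2000, "month": 1, "page": 1}

def save_state(state):
    """Persist progress after every page, so a crash loses at most one page."""
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def backfill(fetch_page, store_rows):
    """Walk every month from Jan 2000 to Dec 2024, page by page.

    fetch_page(year, month, page) -> list of records ([] when exhausted)
    store_rows(rows)              -> writes the batch to PostgreSQL
    Both callables are assumed interfaces, not the post's real ones.
    """
    state = load_state()
    while (state["year"], state["month"]) <= (2024, 12):
        rows = fetch_page(state["year"], state["month"], state["page"])
        if rows:
            store_rows(rows)
            state["page"] += 1
        else:
            # Month exhausted: advance to the next month, reset the page counter.
            state["month"] += 1
            if state["month"] > 12:
                state["month"], state["year"] = 1, state["year"] + 1
            state["page"] = 1
        save_state(state)  # checkpoint after every page
```

Because the checkpoint is written after every page, killing the process at any point and rerunning `backfill` picks up where it left off — no orchestrator required.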

October 22, 2025

📊 What dbt Does Well vs What Python Does Better

| Role | dbt Does Well | Python Does Better |
| --- | --- | --- |
| Structured data cleaning (staging) | ✅ | Possible, but inconvenient |
| Designing mart table structures | ✅ | Also possible |
| User-specific calculations | ❌ Inconvenient | ✅ Super flexible |
| Scoring, conditional matching, if-else logic | ❌ Very cumbersome | ✅ Ideal |
| Filtering based on user input | ❌ Not possible | ✅ Core feature |
| Explaining recommendations, tuning logic | ❌ | ✅ Fully customizable |

For example:

```sql
-- This kind of logic is painful in dbt...
SELECT
  CASE WHEN user.age BETWEEN policy.min_age AND policy.max_age THEN 30 ELSE 0 END
  + CASE WHEN user.income < policy.income_ceiling THEN ... ELSE 0 END
  + ...
```

In dbt, the concept of a "user" doesn't even exist: dbt is built for models that apply the same logic to everyone. Python, on the other hand, can generate different recommendations per user based on their input. 👉 dbt is great for static modeling, but dynamic, user-input-driven recommender systems are better suited to Python. ...
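The per-user scoring that the post argues belongs in Python can be sketched like this. Field names mirror the SQL fragment above, but the weights and helper names (`score_policy`, `recommend`) are illustrative assumptions, not the post's actual implementation:

```python
def score_policy(user, policy):
    """Score one policy for one user — the same CASE logic as the SQL,
    but evaluated per request instead of once for everyone.

    Weights (30, 20) are illustrative assumptions.
    """
    score = 0
    if policy["min_age"] <= user["age"] <= policy["max_age"]:
        score += 30
    if user["income"] < policy["income_ceiling"]:
        score += 20
    return score

def recommend(user, policies, top_n=3):
    """Rank policies for a single user's input — the dynamic part dbt can't do."""
    ranked = sorted(policies, key=lambda p: score_policy(user, p), reverse=True)
    return ranked[:top_n]
```

Because the scoring runs per call, the same function can take filters or tuned weights from user input — exactly the kind of branching that is cumbersome to express in a dbt model.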

May 12, 2025

🚀 Building a Batch Data Pipeline with AWS, Airflow, and Spark

✨ Project Summary

Assuming I am working for a fintech company, I built a batch pipeline that automatically aggregates → transforms → analyzes credit card data. Since I couldn't use real data, I generated synthetic transaction data with Faker, which I believe was sufficient for designing the overall data flow and structure.

🎯 Goal

"Build an Airflow pipeline that processes realistic financial data with Spark, then analyzes and stores the results." ...
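A minimal sketch of the synthetic-data step. The post uses Faker; to keep this self-contained I substitute the standard library's `random` module, and every field name, range, and the CSV landing format are illustrative assumptions rather than the post's actual schema:

```python
import csv
import random
from datetime import datetime, timedelta

def generate_transactions(n, seed=42):
    """Generate n fake credit-card transactions (stdlib stand-in for Faker).

    All field names and value ranges are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducible batches
    categories = ["grocery", "dining", "travel", "online", "fuel"]
    start = datetime(2025, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "txn_id": f"TXN{i:06d}",
            "card_id": f"CARD{rng.randint(1, 500):04d}",
            "category": rng.choice(categories),
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "ts": (start + timedelta(minutes=rng.randint(0, 60 * 24 * 90))).isoformat(),
        })
    return rows

def write_csv(rows, path):
    """Land the batch as CSV — e.g. for upload to S3 and pickup by a Spark job."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

In an Airflow DAG, a generator task like this would land the file, and downstream Spark tasks would read, transform, and aggregate it — the real pipeline would simply swap the stdlib generator for Faker.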

May 1, 2025