📚 Building a 25-Year Backfill Pipeline for the National Library of Korea API

How I Designed a Reliable, Auto-Resuming ETL to Collect Decades of Book Data — Without Airflow

1. Why I Built This

The National Library of Korea (NLK) provides a public API called Seoji — a bibliographic catalog of all registered books in Korea. I wanted to collect the entire dataset, from January 2000 to December 2024, and store it in my PostgreSQL database (Supabase). It sounded simple at first — just a loop over API pages. But in practice, I had to solve: ...
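The "auto-resuming" part of that loop can be sketched as a checkpointed month-by-month walk over the API. Everything below is an assumption for illustration — `fetch_page`, `store_rows`, and the checkpoint filename are hypothetical stand-ins for the real NLK client and Supabase writer, not the post's actual code:

```python
import json
import os

CHECKPOINT = "backfill_state.json"  # hypothetical checkpoint file

def load_state():
    """Resume from the last saved (year, month, page); start fresh otherwise."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"year": 2000, "month": 1, "page": 1}

def save_state(state):
    """Persist progress after every page, so a crash loses at most one page."""
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def backfill(fetch_page, store_rows):
    """Walk every month from Jan 2000 to Dec 2024, page by page.

    fetch_page(year, month, page) -> list of records ([] when exhausted)
    store_rows(rows)              -> writes the batch to PostgreSQL
    Both callables are assumed interfaces, not the post's real ones.
    """
    state = load_state()
    while (state["year"], state["month"]) <= (2024, 12):
        rows = fetch_page(state["year"], state["month"], state["page"])
        if rows:
            store_rows(rows)
            state["page"] += 1
        else:
            # Month exhausted: advance to the next month, reset the page counter.
            state["month"] += 1
            if state["month"] > 12:
                state["month"], state["year"] = 1, state["year"] + 1
            state["page"] = 1
        save_state(state)  # checkpoint after every page
```

Because the checkpoint is written after every page, killing the process at any point and rerunning `backfill` picks up where it left off — no orchestrator required.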

October 22, 2025

📊 What dbt Does Well vs What Python Does Better

| Role | dbt Does Well | Python Does Better |
| --- | --- | --- |
| Structured data cleaning (staging) | ✅ | Possible, but inconvenient |
| Designing mart table structures | ✅ | Also possible |
| User-specific calculations | ❌ Inconvenient | ✅ Super flexible |
| Scoring, conditional matching, if-else logic | ❌ Very cumbersome | ✅ Ideal |
| Filtering based on user input | ❌ Not possible | ✅ Core feature |
| Explaining recommendations, tuning logic | ❌ | ✅ Fully customizable |

For example:

```sql
-- This kind of logic is painful in dbt...
SELECT
  CASE WHEN user.age BETWEEN policy.min_age AND policy.max_age THEN 30 ELSE 0 END
  + CASE WHEN user.income < policy.income_ceiling THEN ... ELSE 0 END
  + ...
```

In dbt, the concept of a "user" doesn't even exist: dbt is built for models that apply the same logic to everyone. Python, on the other hand, can generate different recommendations per user based on their input. 👉 dbt is great for static modeling, but dynamic, user-input-driven recommender systems are better suited to Python. ...
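The per-user scoring that the post argues belongs in Python can be sketched like this. Field names mirror the SQL fragment above, but the weights and helper names (`score_policy`, `recommend`) are illustrative assumptions, not the post's actual implementation:

```python
def score_policy(user, policy):
    """Score one policy for one user — the same CASE logic as the SQL,
    but evaluated per request instead of once for everyone.

    Weights (30, 20) are illustrative assumptions.
    """
    score = 0
    if policy["min_age"] <= user["age"] <= policy["max_age"]:
        score += 30
    if user["income"] < policy["income_ceiling"]:
        score += 20
    return score

def recommend(user, policies, top_n=3):
    """Rank policies for a single user's input — the dynamic part dbt can't do."""
    ranked = sorted(policies, key=lambda p: score_policy(user, p), reverse=True)
    return ranked[:top_n]
```

Because the scoring runs per call, the same function can take filters or tuned weights from user input — exactly the kind of branching that is cumbersome to express in a dbt model.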

May 12, 2025

🚀 Building a Batch Data Pipeline with AWS, Airflow, and Spark

✨ Project Summary

Assuming I am working for a fintech company, I built a batch pipeline that automatically aggregates → transforms → analyzes credit card data. Since I couldn't use real data, I generated synthetic transaction data with Faker, which I believe was sufficient for designing the overall data flow and structure.

🎯 Goal

"Build an Airflow pipeline that processes realistic financial data with Spark, then analyzes and stores the results." ...
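A minimal sketch of the synthetic-data step. The post uses Faker; to keep this self-contained I substitute the standard library's `random` module, and every field name, range, and the CSV landing format are illustrative assumptions rather than the post's actual schema:

```python
import csv
import random
from datetime import datetime, timedelta

def generate_transactions(n, seed=42):
    """Generate n fake credit-card transactions (stdlib stand-in for Faker).

    All field names and value ranges are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducible batches
    categories = ["grocery", "dining", "travel", "online", "fuel"]
    start = datetime(2025, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "txn_id": f"TXN{i:06d}",
            "card_id": f"CARD{rng.randint(1, 500):04d}",
            "category": rng.choice(categories),
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "ts": (start + timedelta(minutes=rng.randint(0, 60 * 24 * 90))).isoformat(),
        })
    return rows

def write_csv(rows, path):
    """Land the batch as CSV — e.g. for upload to S3 and pickup by a Spark job."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

In an Airflow DAG, a generator task like this would land the file, and downstream Spark tasks would read, transform, and aggregate it — the real pipeline would simply swap the stdlib generator for Faker.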

May 1, 2025