Python · FastAPI · PostgreSQL · Docker · Pandas

ETL Data Pipeline

Robust ETL pipeline that extracts data from files, cleans and transforms it, then loads into a PostgreSQL data warehouse. Fully containerized with Docker.

A containerized ETL pipeline with a FastAPI control plane: trigger jobs via API, track their status, and load processed data into a PostgreSQL data warehouse. Built for reliability: idempotent steps, job-level tracking, and restartability from any point of failure.
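The restartability idea can be sketched in plain Python (the names `STEPS` and `run_pipeline` are illustrative, not taken from the project): each step records its completion, and a rerun skips steps that are already done, so the pipeline resumes from the point of failure.

```python
# Illustrative sketch of restartable, idempotent pipeline steps.
# All names here are hypothetical, not the project's actual code.
STEPS = ["extract", "transform", "load"]

def run_pipeline(state: dict) -> dict:
    """Run each step at most once. Steps already marked done are
    skipped, so re-running after a crash resumes where it stopped."""
    for step in STEPS:
        if state.get(step) == "done":
            continue  # idempotent: completed work is never redone
        # ... the real extract/transform/load work would happen here ...
        state[step] = "done"
    return state
```

Resuming after a crash mid-transform then just means calling `run_pipeline` again with the persisted state; running it a second time on a finished job is a no-op.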

Problem

Ad-hoc ETL scripts are hard to monitor, impossible to restart safely, and brittle when data arrives late or malformed. This project brings engineering discipline to the ETL layer.

Solution

FastAPI endpoints trigger pipeline jobs stored in a PostgreSQL job tracking table. Each step (extract → transform → load) is idempotent: re-running it produces the same result. Background tasks handle heavy processing without blocking the API.
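A minimal sketch of the job-tracking flow described above, using the stdlib `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere (the table name, columns, and function names are illustrative assumptions, not the project's actual schema or API):

```python
# Stand-in for the PostgreSQL job-tracking table; schema and names
# are hypothetical, chosen only to illustrate the flow.
import sqlite3
import uuid

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id TEXT PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'pending')""")

def trigger_job(conn: sqlite3.Connection) -> str:
    """What a POST /jobs endpoint would do: record the job, return its id."""
    job_id = str(uuid.uuid4())
    conn.execute("INSERT INTO jobs (id, status) VALUES (?, 'pending')", (job_id,))
    return job_id

def run_job(conn: sqlite3.Connection, job_id: str) -> None:
    """What the background task would do: mark running, work, mark done."""
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    # ... extract -> transform -> load would run here ...
    conn.execute("UPDATE jobs SET status = 'succeeded' WHERE id = ?", (job_id,))

def job_status(conn: sqlite3.Connection, job_id: str) -> str:
    """What a GET /jobs/{id} endpoint would return."""
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0]
```

In the real service the trigger function would be a FastAPI route handing `run_job` off to a background task, so the API responds immediately while the heavy processing runs out of band.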