beginner

Build a Weather Data Pipeline

Fetch real weather data from a public API, clean it with pandas, load it into SQLite, and automate the whole thing — the exact pipeline pattern used in real data teams.

12 steps · free

Data teams at startups and large tech companies alike run pipelines: code that fetches data from somewhere, cleans it, loads it into a database, and runs on a schedule. Building or fixing one is a common "first ticket" for new data hires.

In this project you'll build one for real weather data, end to end:

Open-Meteo API  →  fetch.py  →  raw JSON  →  transform.py  →  tidy table  →  load.py  →  SQLite  →  daily cron
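To make the "raw JSON → tidy table" arrow concrete, here is a minimal sketch of that one stage. It assumes the API returns hourly data as parallel lists under an `hourly` key (the shape Open-Meteo uses for hourly variables such as `temperature_2m`); the sample values, the `transform` name, and the `temperature_c` column are illustrative choices, not the course's reference solution. It uses plain Python rather than pandas so it runs with the standard library alone.

```python
# Made-up sample that imitates the assumed shape of an hourly response:
# one list of timestamps and one parallel list of readings.
SAMPLE_RAW = {
    "latitude": 52.52,
    "longitude": 13.41,
    "hourly": {
        "time": ["2024-01-01T00:00", "2024-01-01T01:00", "2024-01-01T02:00"],
        "temperature_2m": [1.5, None, 0.9],
    },
}


def transform(raw, city):
    """Turn one city's column-oriented payload into one dict per hour."""
    hourly = raw["hourly"]
    rows = []
    for time, temp in zip(hourly["time"], hourly["temperature_2m"]):
        if temp is None:  # skip hours with a missing reading
            continue
        rows.append({"city": city, "time": time, "temperature_c": temp})
    return rows


tidy = transform(SAMPLE_RAW, "Berlin")
for row in tidy:
    print(row)
```

Each output row holds one observation (one city, one hour), which is the "tidy" layout the later load step can insert directly.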

What you'll build

A Python pipeline that pulls hourly weather for three cities every day, cleans it into a tidy table, and appends it to a local SQLite database. It is safe to re-run, logged, and documented well enough to put on your GitHub profile.
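"Safely re-runnable" is the part that is easiest to get wrong, so here is one possible way to do it with the standard-library `sqlite3` module. The sketch assumes a hypothetical `weather` table keyed on `(city, time)` and uses `INSERT OR REPLACE`, so loading the same rows twice leaves one copy of each. The table name, columns, and sample rows are assumptions for illustration.

```python
import sqlite3


def load(conn, rows):
    """Insert rows so that re-running the load never creates duplicates."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS weather (
            city          TEXT NOT NULL,
            time          TEXT NOT NULL,
            temperature_c REAL,
            PRIMARY KEY (city, time)  -- one row per city per hour
        )
        """
    )
    # A row whose (city, time) already exists is replaced, not duplicated.
    conn.executemany(
        "INSERT OR REPLACE INTO weather (city, time, temperature_c) "
        "VALUES (:city, :time, :temperature_c)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
rows = [
    {"city": "Berlin", "time": "2024-01-01T00:00", "temperature_c": 1.5},
    {"city": "Berlin", "time": "2024-01-01T01:00", "temperature_c": 1.2},
]

load(conn, rows)
load(conn, rows)  # simulate an accidental second run

row_count = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
print(row_count)  # 2
```

The primary key does the real work here: without it, the second call would append two more rows and the count would be 4.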

How this works

Work through the steps in order. Lessons teach the concept. Quizzes check you got it. Milestones are where you build — you'll paste your code and output, and your AI mentor reviews it like a senior engineer would in a pull request: what's good, what's wrong, and hints (never the answer).

Prerequisites

Basic Python (variables, functions, loops, pip install). Nothing else — every tool is introduced when you need it.

What you'll learn

  • Call a real REST API and handle its responses safely
  • Clean and reshape messy JSON into tidy tables with pandas
  • Design a small database schema and load data idempotently
  • Automate and log a pipeline so it runs without you
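As a rough preview of the first and last points, the sketch below builds a request URL, checks the response shape before trusting it, and logs what it does. It assumes the Open-Meteo forecast endpoint with `latitude`, `longitude`, and `hourly` query parameters, and it uses the standard-library `urllib` instead of a third-party HTTP client. The helper names (`build_url`, `validate`, `fetch`) are hypothetical.

```python
import json
import logging
import urllib.parse
import urllib.request

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s"
)
log = logging.getLogger("pipeline")

BASE_URL = "https://api.open-meteo.com/v1/forecast"  # assumed endpoint


def build_url(latitude, longitude):
    """Build the request URL for hourly temperature at one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m",
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def validate(payload):
    """Raise ValueError unless the payload has matching hourly columns."""
    hourly = payload.get("hourly")
    if not isinstance(hourly, dict):
        raise ValueError("response has no 'hourly' object")
    times = hourly.get("time")
    temps = hourly.get("temperature_2m")
    if not isinstance(times, list) or not isinstance(temps, list):
        raise ValueError("'time' and 'temperature_2m' must both be lists")
    if len(times) != len(temps):
        raise ValueError("'time' and 'temperature_2m' differ in length")
    return payload


def fetch(latitude, longitude, timeout=10):
    """Request one location; urlopen raises on HTTP errors and timeouts."""
    url = build_url(latitude, longitude)
    log.info("GET %s", url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return validate(payload)


url = build_url(52.52, 13.41)
print(url)
```

Calling `fetch(52.52, 13.41)` would perform the real request; it is left out of the demo so the example runs without network access.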

Steps

  1. Project kickoff & setup
  2. The data source: APIs and JSON
  3. Quiz: APIs & JSON
  4. Milestone: build fetch.py (AI review)
  5. Cleaning data with pandas
  6. Quiz: pandas cleaning
  7. Milestone: build transform.py (AI review)
  8. Loading into SQLite
  9. Quiz: databases & idempotency
  10. Automation: logging and scheduling
  11. Milestone: the full pipeline (AI review)
  12. Wrap-up: make it a portfolio piece