advanced

Build a RAG Chatbot Over Your Own Docs

Build a Retrieval-Augmented Generation chatbot in Python that answers questions over your own project docs — embeddings from scratch, cosine-similarity search, grounded prompts with citations, and a real eval set. The exact architecture behind every "chat with your docs" product.

12 steps · free

Every "chat with your docs" product (internal knowledge bots, support assistants, code Q&A tools) is the same architecture underneath: retrieve the relevant chunks, stuff them into a prompt, generate a grounded answer. It's called RAG (Retrieval-Augmented Generation), and it's one of the LLM patterns you're most likely to be asked about in engineering interviews right now.

In this project you'll build one over docs you already own — the READMEs and design notes from your weather pipeline (Project 1) and your ML API (Project 2), plus a small corpus you assemble:

docs/*.md  →  chunk  →  embed (all-MiniLM-L6-v2)  →  numpy matrix
                                                          │
question  →  embed  →  cosine top-k  →  grounded prompt  →  Claude  →  cited answer (or refusal)
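The retrieval half of the diagram above can be sketched in a few lines of numpy. This is a minimal illustration, not the course's reference solution: the function names `chunk_text` and `cosine_top_k`, the word-count chunking rule, and the default `max_words=120` are all assumptions made for the example, and the toy 2-D vectors stand in for real all-MiniLM-L6-v2 embeddings.

```python
import numpy as np


def chunk_text(text: str, max_words: int = 120) -> list[str]:
    """Split text into consecutive chunks of at most max_words words (hypothetical rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine_top_k(query_vec: np.ndarray, matrix: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row_index, similarity) for the k rows of matrix closest to query_vec."""
    # Normalize both sides so a plain dot product equals cosine similarity.
    # The small epsilon guards against division by zero for all-zero vectors.
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    m = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12)
    scores = m @ q
    # Sort by descending score and keep the first k row indices.
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


if __name__ == "__main__":
    # Toy "embedding matrix": one row per chunk.
    matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(cosine_top_k(np.array([1.0, 0.1]), matrix, k=2))
```

In the real pipeline, each row of the matrix would be the embedding of one chunk, and the query vector would come from embedding the question with the same model.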

What you'll build

A command-line chatbot that answers questions about your own projects, cites which doc each claim came from, refuses to answer when the corpus doesn't contain the answer, and ships with an eval set proving it works. No vector database, no framework — you'll build the retrieval layer yourself with sentence-transformers and numpy so you actually understand what Chroma and pgvector do for you later.
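To make "cites which doc" and "refuses to answer" concrete, here is one possible shape for the prompt-building step. Everything in it is an assumption for illustration: the function name `build_grounded_prompt`, the `(source, text, score)` tuple layout, the refusal wording, the `min_score=0.3` threshold, and the file names in the usage example. The course milestones may ask for something different.

```python
REFUSAL = "I don't know based on the provided docs."


def build_grounded_prompt(question, hits, min_score=0.3):
    """Build a prompt from retrieved chunks, or return None if nothing is relevant.

    hits is a list of (source, text, score) tuples; min_score is a
    hypothetical similarity cutoff, not a recommended value.
    """
    relevant = [(src, text) for src, text, score in hits if score >= min_score]
    if not relevant:
        # Nothing retrieved clears the bar: the caller can print REFUSAL
        # without spending tokens on a model call.
        return None
    # Number each source so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{n}] ({src})\n{text}" for n, (src, text) in enumerate(relevant, start=1)
    )
    return (
        "Answer the question using ONLY the numbered sources below.\n"
        "Cite sources like [1] after each claim.\n"
        f'If the sources do not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    hits = [
        ("weather/README.md", "Runs hourly via cron.", 0.82),  # hypothetical doc
        ("ml-api/README.md", "Serves predictions.", 0.05),     # hypothetical doc
    ]
    print(build_grounded_prompt("How often does the pipeline run?", hits))
```

Refusing in two places (before the call when retrieval is empty, and inside the prompt when the sources don't cover the question) is a design choice of this sketch, not a requirement stated by the course.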

How this works

Work through the steps in order. Lessons teach the concept. Quizzes check you got it. Milestones are where you build — you'll paste your code and output, and your AI mentor reviews it like a senior engineer would in a pull request: what's good, what's wrong, and hints (never the answer).

Prerequisites

Projects 1 and 2 completed. You'll reuse their READMEs as part of your corpus, and you'll need the Python habits they built (env handling, clean functions, main guards).
An Anthropic API key with about $2 of credit — that comfortably covers every milestone plus experimentation. Sign up at console.anthropic.com. You'll learn how to handle the key safely in step 2; never paste it into code.
Install the dependencies: pip install anthropic sentence-transformers numpy. The embedding model (all-MiniLM-L6-v2) is a one-time download of about 90 MB and runs locally, free.
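As a preview of "never paste it into code", the key can be read from an environment variable and checked before any API call. This is a minimal sketch: the helper name `load_api_key` is made up for the example. `ANTHROPIC_API_KEY` is the variable name the official `anthropic` Python SDK looks for by default when no key is passed explicitly.

```python
import os


def load_api_key(env=None) -> str:
    """Return the API key from the environment, failing early if it is missing.

    env defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it in your shell "
            "instead of hardcoding it in the script."
        )
    return key
```

Failing early with a clear message is friendlier than letting the first API request fail with an authentication error.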

What you'll learn

Call the Anthropic API safely — keys in env vars, typed error handling, token and cost logging
Chunk and embed a documentation corpus, then search it with cosine similarity built from scratch in numpy
Wire a full RAG pipeline that answers with citations and refuses when retrieval comes up empty
Evaluate a RAG system against a Q/A eval set and explain the production upgrade path (vector DBs, reranking, streaming)
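The evaluation outcome above can be pictured as a small loop over question/expectation pairs. The sketch below is one possible shape, under stated assumptions: the case fields `expected_source` and `expect_refusal`, the `REFUSAL_MARKER` string, and the pass rule (the expected file name appears in the answer, or the answer contains the refusal marker) are all hypothetical and simpler than a real eval would need.

```python
REFUSAL_MARKER = "I don't know"  # hypothetical refusal phrase


def run_eval(answer_fn, cases):
    """Run answer_fn over eval cases and return (accuracy, per-case results).

    Each case is a dict with "question" plus either "expected_source"
    (a doc name the answer should cite) or "expect_refusal": True.
    """
    results = []
    for case in cases:
        answer = answer_fn(case["question"])
        if case.get("expect_refusal"):
            # Out-of-corpus question: the only correct behavior is to refuse.
            passed = REFUSAL_MARKER.lower() in answer.lower()
        else:
            # In-corpus question: the answer should cite the expected doc.
            passed = case["expected_source"] in answer
        results.append((case["question"], passed))
    accuracy = sum(passed for _, passed in results) / len(results) if results else 0.0
    return accuracy, results


if __name__ == "__main__":
    # A fake answer function stands in for the real RAG pipeline here.
    def fake_answer(question):
        if "cron" in question:
            return "It runs hourly [weather/README.md]."
        return "I don't know based on the provided docs."

    cases = [
        {"question": "What is the cron schedule?", "expected_source": "weather/README.md"},
        {"question": "Who won the 1998 World Cup?", "expect_refusal": True},
    ]
    print(run_eval(fake_answer, cases))
```

Including questions the corpus cannot answer matters as much as the answerable ones: they are what checks the refusal path.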

Steps

1. How LLMs actually work (for builders)
2. Calling the Anthropic API safely
3. Quiz: LLMs & the API
4. Milestone: build ask_claude.py (AI review)
5. Why RAG: chunking, embeddings, and cosine similarity
6. Quiz: RAG, chunking & embeddings
7. Milestone: build retriever.py (AI review)
8. Wiring RAG: grounded prompts, citations, and refusal
9. Quiz: grounding, refusal & evaluation
10. Milestone: build rag.py (AI review)
11. The chat loop: conversation history done right
12. Wrap-up: you finished the track