RAG LLM Website Chatbot
On this website, I built a Retrieval-Augmented Generation (RAG) assistant that answers questions using my own portfolio content: project descriptions, tags, README files, and a public "About Max" knowledge file. Its purpose is practical: it helps visitors, especially recruiters and engineers, understand each project quickly, and it also supports people who try to recreate the work, for example when they run into issues installing dependencies, running the code, reproducing results, or following setup instructions.

The system is scope-aware. On the home page it can retrieve from the global knowledge base (e.g., "About Max" plus general portfolio content), but on each individual project page retrieval is hard-filtered to that project only: the chatbot searches only the embeddings/chunks attached to that specific project (a project_id filter), so content from other projects cannot bleed into the answer.

When you ask a question, the system first runs a quality gate that detects short or generic inputs (e.g., "hello") and routes them to a normal LLM response instead of forcing retrieval. For meaningful queries, it generates an embedding for the question, retrieves the most relevant chunks using a hybrid search (dense semantic similarity plus BM25 keyword scoring), and applies a second gate that rejects weak matches to avoid injecting irrelevant context. If retrieval is strong, only the selected snippets are injected into the prompt, and the model answers strictly from the provided context, explicitly saying so when the context does not contain the answer. The UI also exposes a "top match" score so you can see retrieval confidence in real time.

To validate this behavior, I created an automated test suite (rag_run_tests) that checks both bypass cases (generic greetings must not trigger RAG) and grounded Q&A (project-specific questions must retrieve only that project's sources, and "About Max" questions must retrieve only the public bio knowledge).
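The first quality gate can be sketched as a simple heuristic. This is a minimal illustration, not the site's actual implementation: the greeting list, token threshold, and function name are all assumptions.

```python
import re

# Hypothetical list of generic openers that should bypass retrieval
GENERIC_OPENERS = {"hello", "hi", "hey", "thanks", "ok"}
MIN_MEANINGFUL_TOKENS = 3  # hypothetical threshold

def should_use_rag(query: str) -> bool:
    """Return False for short/generic inputs so they get a plain LLM reply
    instead of forcing retrieval."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return False
    # A greeting with at most one trailing word (e.g., "hi there") is generic.
    if tokens[0] in GENERIC_OPENERS and len(tokens) <= 2:
        return False
    return len(tokens) >= MIN_MEANINGFUL_TOKENS
```

A query like "How do I install the dependencies?" passes the gate, while "hello" is routed straight to the base model.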
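The retrieval core, combining the hard project_id filter, hybrid dense + BM25 scoring, and the second gate that rejects weak matches, could look roughly like the sketch below. The `Chunk` dataclass, the score-mixing weight `alpha`, and the `min_score` threshold are assumptions for illustration; the real system presumably runs this in a vector database rather than in memory.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    project_id: str
    embedding: list[float]  # precomputed dense vector

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Minimal Okapi BM25 over pre-tokenized documents."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df: dict[str, int] = {}
    for d in docs_tokens:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    scores = []
    for d in docs_tokens:
        s = 0.0
        for t in query_tokens:
            if t not in df:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            tf = d.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def hybrid_retrieve(query_emb, query_tokens, chunks, project_id=None,
                    alpha=0.6, min_score=0.35, top_k=3):
    """Dense + BM25 retrieval with an optional hard project filter and a
    second gate that drops weak matches (alpha/min_score are assumptions)."""
    # Hard scope filter: on a project page, other projects never enter the pool.
    pool = [c for c in chunks if project_id is None or c.project_id == project_id]
    if not pool:
        return []
    dense = [cosine(query_emb, c.embedding) for c in pool]
    sparse = bm25_scores(query_tokens, [c.text.lower().split() for c in pool])
    smax = max(sparse) or 1.0  # normalize BM25 into [0, 1]
    combined = [alpha * d + (1 - alpha) * (s / smax) for d, s in zip(dense, sparse)]
    ranked = sorted(zip(combined, pool), key=lambda p: p[0], reverse=True)
    # Second gate: only matches above the threshold reach the prompt.
    return [(score, c) for score, c in ranked[:top_k] if score >= min_score]
```

The top combined score is also what a "top match" indicator in the UI would surface as retrieval confidence.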
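Once retrieval passes both gates, injecting only the selected snippets with a strict grounding instruction might be done like this. The function name and prompt wording are illustrative assumptions, not the site's exact prompt.

```python
def build_prompt(question: str, snippets: list[str]) -> str:
    """Inject only the retrieved snippets and instruct the model to answer
    strictly from them, saying explicitly when they lack the answer."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say so explicitly.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the snippets makes it easy for the model (and the tests) to refer back to specific sources.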
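The shape of the rag_run_tests checks can be sketched as a small harness. Here `answer_fn` is a hypothetical hook into the chatbot pipeline, assumed to return a `(used_rag, sources, text)` tuple; the real suite's interface is not shown in this write-up.

```python
def run_rag_tests(answer_fn) -> str:
    """Hypothetical harness mirroring rag_run_tests.
    answer_fn(question, page) is assumed to return
    (used_rag: bool, sources: list[str], text: str)."""
    # Bypass case: a generic greeting must not trigger retrieval.
    used_rag, sources, _ = answer_fn("hello", page="home")
    assert used_rag is False and sources == []

    # Grounded case: a project question must cite only that project's sources.
    used_rag, sources, _ = answer_fn("How do I install it?", page="proj-a")
    assert used_rag is True
    assert all(src.startswith("proj-a") for src in sources)
    return "all tests passed"
```

The same pattern extends to the "About Max" check: a bio question on the home page must retrieve only public bio sources.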
The result is a portfolio chatbot that is transparent, testable, and designed to avoid hallucinations: it defaults to general LLM behavior when retrieval evidence is weak, and it enforces strict per-page retrieval boundaries on project pages. You can download a more detailed README file here.