RAG Workflows: AI That Knows Your Data
Build a RAG Knowledge Base Bot — embed your documents in a vector store, then ask questions and get answers grounded in your actual data.
🔄 Your agent has tools (Lesson 4) and memory (Lesson 5). But when a customer asks “What’s your refund policy?”, the agent has to guess — because it’s never read your documentation. RAG changes that. Instead of hoping the LLM knows the answer, you give it your actual documents and let it find the relevant information before responding.
What Is RAG?
Retrieval-Augmented Generation is a three-step process:
- Embed — Convert your documents into numerical vectors (embeddings) and store them in a vector database
- Retrieve — When a question arrives, find the most relevant document chunks by comparing the question’s embedding to stored embeddings
- Generate — Feed the retrieved chunks + the question to an LLM, which generates an answer grounded in your actual data
The result: an AI that answers questions from your documents instead of making things up. The LLM’s response is grounded in your data — and it can cite where the information came from.
```
User Question → Embed Question → Search Vector Store → Top 3 Chunks
                                                           ↓
LLM: "Based on these chunks, here's the answer..."
```
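The three steps can be sketched in a few lines of Python. The keyword-count "embedding" below is a toy stand-in for a real embedding model such as `text-embedding-3-small`; the vocabulary and documents are made up for illustration, but the embed → retrieve → generate shape is the same:

```python
import math

# Toy "embedding": counts of a few vocabulary words. A real pipeline
# would call an embedding model instead.
VOCAB = ["refund", "password", "plan", "support", "workflow"]

def embed(text: str) -> list[float]:
    return [float(text.lower().count(word)) for word in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Embed: vectorise each document chunk and store the pair.
docs = [
    "Refunds are available within 30 days of purchase.",
    "The Pro plan includes unlimited workflows and priority support.",
    "Reset your password from the account settings page.",
]
store = [(doc, embed(doc)) for doc in docs]

# 2. Retrieve: embed the question, rank stored chunks by similarity.
question = "What is the refund policy?"
q_vec = embed(question)
best = max(store, key=lambda row: cosine(q_vec, row[1]))

# 3. Generate: a real workflow sends chunk + question to the LLM;
# here we only assemble the grounded prompt.
prompt = f"Context: {best[0]}\nQuestion: {question}\nAnswer from the context only."
```

The retrieval step picks the refund document because its vector is closest to the question's vector, which is exactly what the vector store does at scale.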
Why RAG instead of stuffing documents into the prompt? Token limits. GPT-4o has a 128K context window — big, but a mid-size company’s documentation easily exceeds that. RAG lets you search millions of documents and only feed the relevant chunks to the LLM.
✅ Quick Check: A company has 500 pages of product documentation. Why can’t they just paste all of it into the LLM prompt? (Answer: Token limits and cost. Even 128K token models can’t hold 500 pages of text. And even if they could, processing all that context for every question would be extremely expensive and slow. RAG retrieves only the 3-5 most relevant passages, keeping costs low and responses fast.)
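The arithmetic behind that answer, assuming a rough 500 tokens per documentation page:

```python
# Back-of-the-envelope token budget (all figures are rough assumptions).
pages = 500
tokens_per_page = 500               # dense pages run higher
doc_tokens = pages * tokens_per_page
context_window = 128_000            # GPT-4o class models

print(doc_tokens)                   # 250000, roughly double the window
print(doc_tokens > context_window)  # True: the full docs cannot fit

# A RAG query instead sends only a few retrieved chunks:
retrieved_tokens = 4 * 800          # e.g. Top K = 4 chunks of ~800 tokens
print(retrieved_tokens)             # 3200 tokens per question
```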
The RAG Pipeline in n8n
n8n’s RAG pipeline uses four types of nodes:
1. Document Loaders — Ingest your source material
- PDF Loader, Google Drive Loader, Notion Loader, Web Scraper
- Converts documents into text that can be chunked and embedded
2. Text Splitters — Break documents into chunks
- Recursive Character Text Splitter (default, works well for most text)
- Token Text Splitter (splits by token count — more precise for LLM input)
- Configure chunk size (typically 500-1000 tokens) and overlap (10-20%)
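The splitter settings above can be sketched as a function. n8n's Recursive Character Text Splitter additionally prefers to break on paragraph and sentence boundaries, so treat this as a simplified fixed-window version:

```python
def split_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-window splitter: each chunk starts (chunk_size - overlap)
    characters after the previous one, so neighbouring chunks share
    `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 2,000-character document yields three chunks, and the last 100 characters of each chunk reappear at the start of the next.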
3. Embeddings — Convert text chunks into vectors
- OpenAI Embeddings (`text-embedding-3-small` is cheap and effective)
- Cohere, HuggingFace, or local models via Ollama
4. Vector Stores — Store and search embeddings
- Supabase (pgvector) — free tier available, persistent, SQL-queryable
- Pinecone — managed service, high performance, free tier
- Qdrant — open source, self-hostable, great for privacy
- In-Memory — testing only (resets on restart)
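Under the hood, every option on this list does the same job the in-memory store does, just persistently and at scale. A minimal sketch of that job, brute-force cosine search over stored (text, vector) pairs:

```python
import math

class InMemoryVectorStore:
    """Minimal vector store for testing: keeps (text, vector) pairs in a
    list and does brute-force cosine search. Like n8n's in-memory option,
    everything is lost when the process exits."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def insert(self, text: str, vector: list[float]) -> None:
        self._rows.append((text, vector))

    def search(self, query: list[float], top_k: int = 4) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self._rows, key=lambda r: cosine(query, r[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]
```

Production stores (pgvector, Pinecone, Qdrant) replace the linear scan with approximate nearest-neighbour indexes, but the interface, insert vectors and search by similarity, is the same.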
Build: RAG Knowledge Base Bot
You’ll build a bot that answers questions from a set of documents. We’ll use Supabase as the vector store because it’s free, persistent, and pairs well with n8n.
Part A: Document Ingestion Pipeline
This workflow embeds your documents into the vector store. You run it once (or whenever documents change).
Step 1: Create the ingestion workflow
- New workflow → add a Manual Trigger
- Add a Document Loader — for this example, use the Default Data Loader and paste some test text, or use the PDF Loader if you have a PDF to upload
- Add a Recursive Character Text Splitter:
- Chunk Size: 800 characters
- Chunk Overlap: 100 characters
- Add a Supabase Vector Store node in Insert mode:
- Connect to your Supabase instance (create a free project at supabase.com)
- Table: `documents` (you'll need to enable the pgvector extension and create this table; Supabase has a one-click setup for this)
- Add OpenAI Embeddings sub-node to the vector store:
- Model: `text-embedding-3-small`
- This converts each text chunk into a 1536-dimensional vector
Connect them: Trigger → Loader → Splitter → Vector Store (with Embeddings attached)
Step 2: Run the ingestion
Click “Test workflow.” Watch as your documents are chunked, embedded, and stored. Check your Supabase dashboard — you should see rows in the documents table, each with a text chunk and its vector embedding.
Part B: Question-Answering Workflow
This workflow takes user questions and retrieves answers from your stored documents.
Step 1: Create the Q&A workflow
- New workflow → add a Chat Trigger
- Add a Q&A Chain node (not the AI Agent — Q&A Chain is optimized for document retrieval)
- Attach an OpenAI Chat Model sub-node (gpt-4o-mini is fine for Q&A)
- Attach a Supabase Vector Store sub-node in Retrieve mode:
- Connect to the same Supabase instance and table
- Top K: 4 (retrieves the 4 most relevant chunks)
- Attach OpenAI Embeddings (same model you used for ingestion — this is important)
Step 2: Test it
Click “Test workflow.” Ask questions about your documents:
- “What’s the refund policy?”
- “How do I reset my password?”
- “What features are included in the Pro plan?”
The Q&A Chain will search the vector store, retrieve the most relevant chunks, and generate an answer grounded in your actual documents.
✅ Quick Check: You ingested documents using OpenAI's `text-embedding-3-small` model. For the retrieval workflow, can you use a different embedding model? (Answer: No. You must use the same embedding model for both ingestion and retrieval. Different models produce different vector representations — searching with mismatched embeddings returns irrelevant results. This is a common mistake when switching models mid-project.)
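A guard like the following, a sketch, not an n8n feature, makes the failure mode concrete. The dimensions are the documented defaults for OpenAI's models; note that even two models with matching dimensions are incompatible, because they map text into different vector spaces:

```python
# Documented default output dimensions for OpenAI embedding models.
DIMS = {"text-embedding-3-small": 1536, "text-embedding-3-large": 3072}

def check_compatibility(ingest_model: str, query_model: str) -> None:
    """Refuse to query a store with a different model than the one
    that produced its vectors."""
    if ingest_model != query_model:
        raise ValueError(
            f"Stored vectors came from {ingest_model} ({DIMS[ingest_model]} dims); "
            f"querying with {query_model} ({DIMS[query_model]} dims) would "
            f"compare vectors from incompatible embedding spaces."
        )
```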
Chunking: The Make-or-Break Decision
Chunking strategy determines your RAG quality more than model selection. Bad chunks = bad answers.
| Chunk Size | Pros | Cons | Good For |
|---|---|---|---|
| Small (200-400 tokens) | Precise retrieval | May split important context | FAQ-style knowledge bases |
| Medium (500-1000 tokens) | Balanced precision/context | Standard tradeoff | Most documentation |
| Large (1000-2000 tokens) | Full context preserved | May include irrelevant text | Long-form articles, reports |
Overlap matters too. Without overlap, the boundary between chunks can split a sentence — and the critical information lives in neither chunk. A 10-20% overlap ensures sentences at chunk boundaries appear in both adjacent chunks.
Rule of thumb: Start with 800-token chunks and 100-token overlap. Test with real questions. If answers miss relevant context, increase chunk size. If answers include too much irrelevant text, decrease it.
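The boundary problem is easy to demonstrate. Using a simplified fixed-window splitter (real splitters also respect sentence boundaries), the phrase "within 30 days" straddles the chunk boundary and survives in no chunk — unless overlap is added:

```python
def split(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

text = ("Our support team replies in one business day. "
        "Refunds are available within 30 days of purchase.")

no_overlap = split(text, size=75, overlap=0)
with_overlap = split(text, size=75, overlap=20)

# Without overlap, the boundary at character 75 cuts "within 30 days"
# in half, so neither chunk contains the phrase whole.
print(any("within 30 days" in c for c in no_overlap))    # False
print(any("within 30 days" in c for c in with_overlap))  # True
```

A question like "How long do I have to request a refund?" would retrieve nothing useful from the no-overlap chunks, even though the answer is in the source text.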
RAG vs. Fine-Tuning vs. Long Context
Three approaches to giving an LLM your knowledge:
| Approach | When to Use | Tradeoff |
|---|---|---|
| RAG | Dynamic knowledge that changes (docs, policies, product info) | Depends on retrieval quality — garbage in, garbage out |
| Fine-Tuning | Teaching the model a specific style or pattern | Expensive, slow to update, requires training data |
| Long Context | Small, fixed knowledge base (<100 pages) | Expensive per query, no scaling beyond context limit |
For most business workflows, RAG is the right choice. Your docs change, your policies update, your product evolves — and you don’t want to retrain a model every time.
Advanced: RAG with the AI Agent
The Q&A Chain is great for straightforward document retrieval. But what if you want an agent that can both search your documents and use other tools (web search, code execution)?
Connect the vector store as a Vector Store Tool to the AI Agent node instead of using the Q&A Chain. The agent can then decide when to check your documents vs. when to search the web — combining RAG with the tool-use patterns from Lesson 4.
Update the system prompt:
```
You have access to:
- Company Knowledge Base (vector store): Use for internal policies, product docs, procedures
- Web Search: Use for external information, industry benchmarks, competitor data

Always check the knowledge base FIRST for internal questions.
Only use web search if the knowledge base doesn't have the answer.
```
Key Takeaways
- RAG lets your AI answer questions from your actual documents — not from the LLM’s training data
- The pipeline is: ingest (load → chunk → embed → store) then query (question → embed → search → generate)
- Chunking is the most impactful decision — start with 800 tokens and 100 overlap, then tune
- Use the same embedding model for both ingestion and retrieval — mismatched models produce bad results
- Supabase (pgvector) is a solid free option for vector storage; Pinecone and Qdrant are alternatives
- Combine RAG with the AI Agent to build assistants that search your docs and use external tools
Up Next
You’ve now built AI workflows with classification, agents, memory, and RAG. But none of them are production-ready. What happens when the LLM API times out? When your vector store is down? When credentials expire? In Lesson 7, you’ll learn production patterns — error handling, retry strategies, credential management, and monitoring — that turn your prototypes into workflows you can trust.