Retrieval Strategies: From Basic to Advanced
Go beyond basic vector search with hybrid search, reranking, query rewriting, HyDE, and metadata filtering. Learn which strategies to combine for maximum retrieval quality.
Basic vector search gets you 70% of the way. Advanced retrieval strategies push you to 90%+. The difference determines whether users trust your RAG system or abandon it.
🔄 Quick Recall: In the previous lesson, you learned how embeddings convert text to vectors and how vector databases enable similarity search. Now you’ll learn techniques that dramatically improve what that search actually finds.
Strategy 1: Hybrid Search
Combine vector (semantic) search with keyword (lexical) search.
Why Both Matter
Vector search finds: “return policy” when the user asks “how do I get my money back”
Keyword search finds: “PCI-DSS” when the user asks about “PCI-DSS compliance” (exact term match)
Neither alone is sufficient. Vector search misses exact terms. Keyword search misses synonyms and paraphrases.
How Hybrid Search Works
```
User query: "PCI DSS compliance requirements"
                     ↓
           ┌─────────┴─────────┐
           ↓                   ↓
    Vector Search        Keyword Search (BM25)
    Top 10 results       Top 10 results
    (semantic matches)   (exact term matches)
           ↓                   ↓
           └─────────┬─────────┘
                     ↓
      Reciprocal Rank Fusion (RRF)
       Merge and re-score results
                     ↓
       Final ranked list (Top 10)
```
Reciprocal Rank Fusion (RRF) merges the two result lists by combining their ranks. A document that ranks high in both lists gets the top combined score.
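The merge step above can be sketched in a few lines. This is a minimal RRF implementation, assuming plain Python lists of document IDs as input; the document names and the constant `k=60` (the smoothing value commonly used with RRF) are illustrative, not from this lesson:

```python
def rrf_merge(vector_results, keyword_results, k=60, top_n=10):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for results in (vector_results, keyword_results):
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1 / (k + rank); high ranks in
            # both lists add up to the highest combined score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical semantic matches
keyword_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical BM25 matches
print(rrf_merge(vector_hits, keyword_hits))  # doc_a and doc_c rise to the top
```

Note that `doc_a` and `doc_c`, which appear in both lists, outrank documents found by only one search.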
When to Use Hybrid Search
| Scenario | Use Hybrid? |
|---|---|
| Technical documents with acronyms | Yes — critical for exact terms |
| Legal documents with clause numbers | Yes — “Section 4.2” needs keyword match |
| General Q&A about common topics | Vector-only may suffice |
| Multi-language knowledge base | Yes — keyword catches proper nouns |
✅ Quick Check: Your RAG system uses vector-only search. Users complain that searching for “error code E-4021” returns results about “error handling” and “troubleshooting” but not the specific error code documentation. What’s happening? (Answer: The embedding model treats “E-4021” as just another token without understanding it’s a specific identifier. Vector search finds semantically related content about errors in general, but misses the exact code. Hybrid search fixes this: the BM25 keyword component matches “E-4021” exactly, while vector search still finds related troubleshooting content. The combined result gives the specific error doc plus helpful context.)
Strategy 2: Reranking
After initial retrieval, use a more powerful model to re-score results for true relevance.
The Two-Stage Pattern
```
Stage 1: Cheap retrieval (vector search)
  → Returns 20-50 candidate results (fast, broad)

Stage 2: Expensive reranking (cross-encoder)
  → Scores each candidate against the query (slow, precise)
  → Returns top 5-10 truly relevant results
```
Why Reranking Works
Vector search embeds query and documents independently. It answers: “Are these texts generally about the same topic?”
Cross-encoder reranking processes query and document together. It answers: “Does this specific document answer this specific question?”
That distinction matters. A document about “company holidays” and a query about “When is the office closed?” are semantically distant but functionally perfect. A reranker catches this. Vector search might miss it.
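The two-stage pattern is easy to express as a small orchestration function. In this sketch, `vector_search` and `cross_encoder_score` are hypothetical callables standing in for your vector store client and your reranking model (e.g. a Cohere or BGE reranker in practice):

```python
def retrieve_and_rerank(query, vector_search, cross_encoder_score,
                        n_candidates=50, top_k=5):
    """Stage 1: broad cheap retrieval; Stage 2: precise reranking."""
    # Stage 1: fetch a wide candidate pool quickly.
    candidates = vector_search(query, limit=n_candidates)
    # Stage 2: score each (query, document) pair *together*,
    # which is what lets a cross-encoder judge true relevance.
    scored = [(cross_encoder_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The design point: stage 1 optimizes for recall (don't miss the answer), stage 2 for precision (put the answer first).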
Reranking Models
| Model | Type | Speed | Quality |
|---|---|---|---|
| Cohere Rerank | API-based | Fast | Excellent |
| Jina Reranker | API-based | Fast | Good |
| BGE Reranker | Open-source | Medium | Good |
| LLM-as-reranker | Prompt-based | Slow | Highest |
Strategy 3: Query Rewriting
Transform the user’s query into a better retrieval query before searching.
Common Rewrites
| Original Query | Rewritten Query | Why Better |
|---|---|---|
| “What did we decide?” | “Meeting decisions January 2026 marketing strategy” | Adds specific context |
| “Is it any good?” | “Product review quality assessment [product name]” | Resolves vague pronouns |
| “Tell me about our PTO” | “Paid time off vacation policy employee handbook” | Expands abbreviation, adds doc context |
Implementation
Use the LLM to rewrite before retrieval:
```
<system>
Rewrite the user's question into a search query that will find
the answer in our company knowledge base. Add specific context,
expand abbreviations, and replace pronouns with specific terms.
Return ONLY the rewritten query.
</system>

<query>What did we decide about the budget thing?</query>
```

Output: “Budget decision Q4 2025 marketing allocation meeting notes”
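Wired into code, the rewrite step is a thin wrapper around one LLM call. Here `llm` is a hypothetical callable that takes a prompt string and returns the model's text; swap in your actual client:

```python
REWRITE_INSTRUCTIONS = (
    "Rewrite the user's question into a search query that will find the "
    "answer in our company knowledge base. Add specific context, expand "
    "abbreviations, and replace pronouns with specific terms. "
    "Return ONLY the rewritten query.\n\nQuestion: "
)

def rewrite_query(question, llm):
    """Rewrite a casual question into a retrieval-friendly query."""
    rewritten = llm(REWRITE_INSTRUCTIONS + question).strip()
    # Fall back to the original question if the model returns nothing,
    # so a bad rewrite never blocks retrieval entirely.
    return rewritten or question
```

A fallback like this matters in production: a failed rewrite should degrade to a normal search, not an error.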
Strategy 4: Hypothetical Document Embeddings (HyDE)
For vague or underspecified queries, generate a hypothetical answer first, then search for real documents similar to that answer.
```
User query: "How do we handle international shipping?"

Step 1: LLM generates hypothetical answer:
  "Our international shipping policy covers delivery to
   42 countries. Standard shipping takes 7-14 business days.
   Customs fees are the customer's responsibility..."

Step 2: Embed the hypothetical answer (not the query)

Step 3: Search for real documents similar to the hypothesis
  → Finds actual shipping policy documents
```
Why it works: The hypothetical answer contains the vocabulary and concepts that real documents use, making the embedding much more aligned with what’s actually in your knowledge base.
When to use: Vague queries, short queries, queries where the user doesn’t know the terminology.
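The three HyDE steps compose into one short function. This sketch assumes three hypothetical callables — `llm` (prompt → text), `embed` (text → vector), and `search` (vector, k → documents) — standing in for your actual model clients and vector store:

```python
def hyde_search(query, llm, embed, search, top_k=5):
    """HyDE: search with the embedding of a hypothetical answer."""
    # Step 1: have the LLM draft a plausible (possibly wrong) answer.
    # Factual errors are fine; only the vocabulary needs to match.
    hypothesis = llm(f"Write a short, plausible answer to: {query}")
    # Step 2: embed the hypothetical answer instead of the raw query.
    vector = embed(hypothesis)
    # Step 3: retrieve real documents near that embedding.
    return search(vector, top_k)
```

Note the key property: the hypothesis is never shown to the user, so hallucinations in it are harmless — they only shape where the search lands.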
Strategy 5: Metadata Filtering
Narrow the search space before vector similarity:
```
Query:  "What is the current return policy?"
Filter: document_type = "policy" AND last_updated > "2025-01-01"
Vector search: find most similar within filtered results
```
Metadata filters are pre-retrieval — they run before vector search, reducing the candidate set. This improves both speed (fewer vectors to search) and quality (irrelevant categories eliminated).
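The filter-then-search order can be shown with a tiny in-memory index. This is a minimal sketch, not a real vector database client: documents are dicts with hypothetical `id`, `meta`, and `vec` fields, and `keep` is any predicate over the metadata:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filtered_search(query_vec, docs, keep, top_k=3):
    """Pre-retrieval filtering: discard docs before any vector math."""
    pool = [d for d in docs if keep(d["meta"])]        # structured filter first
    pool.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in pool[:top_k]]
```

Real vector databases expose the same idea through filter parameters on the query, but the ordering is the point: the similarity computation only ever sees documents that passed the filter.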
✅ Quick Check: Your knowledge base contains documents from 5 departments. A user in the Sales department asks “What’s our commission structure?” Without metadata filtering, results include HR’s compensation philosophy and Finance’s revenue projections. With a filter on department=“Sales”, you get the actual Sales commission guide. Why is filtering better than relying on vector similarity alone? (Answer: Vector similarity finds topically related content — “commission” appears in HR, Finance, and Sales documents. Filtering uses structured metadata to restrict to the right context before semantic search begins. The result: higher precision with lower latency, because the vector search operates on a smaller, more relevant set.)
Combining Strategies
The most effective RAG systems layer multiple strategies:
```
1. Query Rewriting      (improve the query)
          ↓
2. Metadata Filtering   (narrow the search space)
          ↓
3. Hybrid Search        (vector + keyword)
          ↓
4. Reranking            (score for true relevance)
          ↓
5. Top-K Selection      (pass best results to generation)
```
You don’t need all five for every system. Start with hybrid search + reranking — that covers most use cases.
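The layered pipeline reduces to composing the stages in order. In this sketch each argument is a hypothetical callable standing in for one strategy, so you can start with just two stages and add the rest later without changing the shape:

```python
def rag_retrieve(question, rewrite, search, rerank, top_k=5):
    """Compose the retrieval strategies into one pipeline.

    rewrite: question -> improved query          (strategy 3)
    search:  query -> candidate docs             (strategies 5 + 1:
             metadata filter, then hybrid search)
    rerank:  (query, docs) -> docs by relevance  (strategy 2)
    """
    query = rewrite(question)            # 1. improve the query
    candidates = search(query)           # 2-3. filter + hybrid search
    ranked = rerank(query, candidates)   # 4. score for true relevance
    return ranked[:top_k]                # 5. top-k selection
```

To run only hybrid search + reranking, pass `rewrite=lambda q: q` and keep the other stages — the minimal setup the paragraph above recommends.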
Practice Exercise
- Take a search query from your domain
- Rewrite it as a better retrieval query (add context, expand terms)
- Identify: would hybrid search help? What exact terms might keyword search catch that vector search would miss?
Key Takeaways
- Hybrid search (vector + keyword via RRF) catches both semantic matches and exact term matches — essential for technical content
- Reranking with cross-encoders dramatically improves relevance by processing query-document pairs together
- Query rewriting bridges the gap between casual user language and formal document vocabulary
- HyDE generates hypothetical answers to improve embedding alignment for vague queries
- Metadata filtering narrows the search space before vector similarity, improving both speed and precision
- Layer strategies: query rewriting → metadata filtering → hybrid search → reranking for maximum quality
Up Next
In the next lesson, you’ll learn the generation side — how to prompt the LLM to stay grounded in retrieved context, provide citations, and avoid hallucination even when the context is incomplete.