Retrieval Strategies: From Basic to Advanced
Go beyond basic vector search with hybrid search, reranking, query rewriting, HyDE, and metadata filtering. Learn which strategies to combine for maximum retrieval quality.
Basic vector search gets you 70% of the way. Advanced retrieval strategies push you to 90%+. The difference determines whether users trust your RAG system or abandon it.
🔄 Quick Recall: In the previous lesson, you learned how embeddings convert text to vectors and how vector databases enable similarity search. Now you’ll learn techniques that dramatically improve what that search actually finds.
Strategy 1: Hybrid Search
Combine vector (semantic) search with keyword (lexical) search.
Why Both Matter
Vector search finds: “return policy” when the user asks “how do I get my money back”
Keyword search finds: “PCI-DSS” when the user asks about “PCI-DSS compliance” (exact term match)
Neither alone is sufficient. Vector search misses exact terms. Keyword search misses synonyms and paraphrases.
How Hybrid Search Works
```
User query: "PCI DSS compliance requirements"
                     ↓
           ┌─────────┴─────────┐
           ↓                   ↓
    Vector Search        Keyword Search (BM25)
    Top 10 results       Top 10 results
    (semantic matches)   (exact term matches)
           ↓                   ↓
           └─────────┬─────────┘
                     ↓
      Reciprocal Rank Fusion (RRF)
       Merge and re-score results
                     ↓
       Final ranked list (Top 10)
```
Reciprocal Rank Fusion (RRF) merges the two result lists by combining their ranks. A document that ranks high in both lists gets the top combined score.
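The merge step above can be sketched in a few lines. This is a minimal RRF implementation, assuming plain Python lists of document IDs as input; the document names and the constant `k=60` (the smoothing value commonly used with RRF) are illustrative, not from this lesson:

```python
def rrf_merge(vector_results, keyword_results, k=60, top_n=10):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for results in (vector_results, keyword_results):
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1 / (k + rank); high ranks in
            # both lists add up to the highest combined score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical semantic matches
keyword_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical BM25 matches
print(rrf_merge(vector_hits, keyword_hits))  # doc_a and doc_c rise to the top
```

Note that `doc_a` and `doc_c`, which appear in both lists, outrank documents found by only one search.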
When to Use Hybrid Search
| Scenario | Use Hybrid? |
|---|---|
| Technical documents with acronyms | Yes — critical for exact terms |
| Legal documents with clause numbers | Yes — “Section 4.2” needs keyword match |
| General Q&A about common topics | Vector-only may suffice |
| Multi-language knowledge base | Yes — keyword catches proper nouns |
✅ Quick Check: Your RAG system uses vector-only search. Users complain that searching for “error code E-4021” returns results about “error handling” and “troubleshooting” but not the specific error code documentation. What’s happening? (Answer: The embedding model treats “E-4021” as just another token without understanding it’s a specific identifier. Vector search finds semantically related content about errors in general, but misses the exact code. Hybrid search fixes this: the BM25 keyword component matches “E-4021” exactly, while vector search still finds related troubleshooting content. The combined result gives the specific error doc plus helpful context.)
Strategy 2: Reranking
After initial retrieval, use a more powerful model to re-score results for true relevance.
The Two-Stage Pattern
```
Stage 1: Cheap retrieval (vector search)
  → Returns 20-50 candidate results (fast, broad)

Stage 2: Expensive reranking (cross-encoder)
  → Scores each candidate against the query (slow, precise)
  → Returns top 5-10 truly relevant results
```
Why Reranking Works
Vector search embeds query and documents independently. It answers: “Are these texts generally about the same topic?”
Cross-encoder reranking processes query and document together. It answers: “Does this specific document answer this specific question?”
That distinction matters. A document about “company holidays” and a query about “When is the office closed?” are semantically distant but functionally perfect. A reranker catches this. Vector search might miss it.
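The two-stage pattern is easy to express as a small orchestration function. In this sketch, `vector_search` and `cross_encoder_score` are hypothetical callables standing in for your vector store client and your reranking model (e.g. a Cohere or BGE reranker in practice):

```python
def retrieve_and_rerank(query, vector_search, cross_encoder_score,
                        n_candidates=50, top_k=5):
    """Stage 1: broad cheap retrieval; Stage 2: precise reranking."""
    # Stage 1: fetch a wide candidate pool quickly.
    candidates = vector_search(query, limit=n_candidates)
    # Stage 2: score each (query, document) pair *together*,
    # which is what lets a cross-encoder judge true relevance.
    scored = [(cross_encoder_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The design point: stage 1 optimizes for recall (don't miss the answer), stage 2 for precision (put the answer first).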
Reranking Models
| Model | Type | Speed | Quality |
|---|---|---|---|
| Cohere Rerank | API-based | Fast | Excellent |
| Jina Reranker | API-based | Fast | Good |
| BGE Reranker | Open-source | Medium | Good |
| LLM-as-reranker | Prompt-based | Slow | Highest |
Strategy 3: Query Rewriting
Transform the user’s query into a better retrieval query before searching.
Common Rewrites
| Original Query | Rewritten Query | Why Better |
|---|---|---|
| “What did we decide?” | “Meeting decisions January 2026 marketing strategy” | Adds specific context |
| “Is it any good?” | “Product review quality assessment [product name]” | Resolves vague pronouns |
| “Tell me about our PTO” | “Paid time off vacation policy employee handbook” | Expands abbreviation, adds doc context |
Implementation
Use the LLM to rewrite before retrieval:
```
<system>
Rewrite the user's question into a search query that will find
the answer in our company knowledge base. Add specific context,
expand abbreviations, and replace pronouns with specific terms.
Return ONLY the rewritten query.
</system>

<query>What did we decide about the budget thing?</query>
```

Output: “Budget decision Q4 2025 marketing allocation meeting notes”
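Wired into code, the rewrite step is a thin wrapper around one LLM call. Here `llm` is a hypothetical callable that takes a prompt string and returns the model's text; swap in your actual client:

```python
REWRITE_INSTRUCTIONS = (
    "Rewrite the user's question into a search query that will find the "
    "answer in our company knowledge base. Add specific context, expand "
    "abbreviations, and replace pronouns with specific terms. "
    "Return ONLY the rewritten query.\n\nQuestion: "
)

def rewrite_query(question, llm):
    """Rewrite a casual question into a retrieval-friendly query."""
    rewritten = llm(REWRITE_INSTRUCTIONS + question).strip()
    # Fall back to the original question if the model returns nothing,
    # so a bad rewrite never blocks retrieval entirely.
    return rewritten or question
```

A fallback like this matters in production: a failed rewrite should degrade to a normal search, not an error.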
Strategy 4: Hypothetical Document Embeddings (HyDE)
For vague or underspecified queries, generate a hypothetical answer first, then search for real documents similar to that answer.
```
User query: "How do we handle international shipping?"

Step 1: LLM generates hypothetical answer:
  "Our international shipping policy covers delivery to
   42 countries. Standard shipping takes 7-14 business days.
   Customs fees are the customer's responsibility..."

Step 2: Embed the hypothetical answer (not the query)

Step 3: Search for real documents similar to the hypothesis
  → Finds actual shipping policy documents
```
Why it works: The hypothetical answer contains the vocabulary and concepts that real documents use, making the embedding much more aligned with what’s actually in your knowledge base.
When to use: Vague queries, short queries, queries where the user doesn’t know the terminology.
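The three HyDE steps compose into one short function. This sketch assumes three hypothetical callables — `llm` (prompt → text), `embed` (text → vector), and `search` (vector, k → documents) — standing in for your actual model clients and vector store:

```python
def hyde_search(query, llm, embed, search, top_k=5):
    """HyDE: search with the embedding of a hypothetical answer."""
    # Step 1: have the LLM draft a plausible (possibly wrong) answer.
    # Factual errors are fine; only the vocabulary needs to match.
    hypothesis = llm(f"Write a short, plausible answer to: {query}")
    # Step 2: embed the hypothetical answer instead of the raw query.
    vector = embed(hypothesis)
    # Step 3: retrieve real documents near that embedding.
    return search(vector, top_k)
```

Note the key property: the hypothesis is never shown to the user, so hallucinations in it are harmless — they only shape where the search lands.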
Strategy 5: Metadata Filtering
Narrow the search space before vector similarity:
```
Query:  "What is the current return policy?"
Filter: document_type = "policy" AND last_updated > "2025-01-01"
Vector search: find most similar within filtered results
```
Metadata filters are pre-retrieval — they run before vector search, reducing the candidate set. This improves both speed (fewer vectors to search) and quality (irrelevant categories eliminated).
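The filter-then-search order can be shown with a tiny in-memory index. This is a minimal sketch, not a real vector database client: documents are dicts with hypothetical `id`, `meta`, and `vec` fields, and `keep` is any predicate over the metadata:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filtered_search(query_vec, docs, keep, top_k=3):
    """Pre-retrieval filtering: discard docs before any vector math."""
    pool = [d for d in docs if keep(d["meta"])]        # structured filter first
    pool.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in pool[:top_k]]
```

Real vector databases expose the same idea through filter parameters on the query, but the ordering is the point: the similarity computation only ever sees documents that passed the filter.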
✅ Quick Check: Your knowledge base contains documents from 5 departments. A user in the Sales department asks “What’s our commission structure?” Without metadata filtering, results include HR’s compensation philosophy and Finance’s revenue projections. With a filter on department=“Sales”, you get the actual Sales commission guide. Why is filtering better than relying on vector similarity alone? (Answer: Vector similarity finds topically related content — “commission” appears in HR, Finance, and Sales documents. Filtering uses structured metadata to restrict to the right context before semantic search begins. The result: higher precision with lower latency, because the vector search operates on a smaller, more relevant set.)
Combining Strategies
The most effective RAG systems layer multiple strategies:
```
1. Query Rewriting      (improve the query)
          ↓
2. Metadata Filtering   (narrow the search space)
          ↓
3. Hybrid Search        (vector + keyword)
          ↓
4. Reranking            (score for true relevance)
          ↓
5. Top-K Selection      (pass best results to generation)
```
You don’t need all five for every system. Start with hybrid search + reranking — that covers most use cases.
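The layered pipeline reduces to composing the stages in order. In this sketch each argument is a hypothetical callable standing in for one strategy, so you can start with just two stages and add the rest later without changing the shape:

```python
def rag_retrieve(question, rewrite, search, rerank, top_k=5):
    """Compose the retrieval strategies into one pipeline.

    rewrite: question -> improved query          (strategy 3)
    search:  query -> candidate docs             (strategies 5 + 1:
             metadata filter, then hybrid search)
    rerank:  (query, docs) -> docs by relevance  (strategy 2)
    """
    query = rewrite(question)            # 1. improve the query
    candidates = search(query)           # 2-3. filter + hybrid search
    ranked = rerank(query, candidates)   # 4. score for true relevance
    return ranked[:top_k]                # 5. top-k selection
```

To run only hybrid search + reranking, pass `rewrite=lambda q: q` and keep the other stages — the minimal setup the paragraph above recommends.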
Practice Exercise
- Take a search query from your domain
- Rewrite it as a better retrieval query (add context, expand terms)
- Identify: would hybrid search help? What exact terms might keyword search catch that vector search would miss?
Key Takeaways
- Hybrid search (vector + keyword via RRF) catches both semantic matches and exact term matches — essential for technical content
- Reranking with cross-encoders dramatically improves relevance by processing query-document pairs together
- Query rewriting bridges the gap between casual user language and formal document vocabulary
- HyDE generates hypothetical answers to improve embedding alignment for vague queries
- Metadata filtering narrows the search space before vector similarity, improving both speed and precision
- Layer strategies: query rewriting → metadata filtering → hybrid search → reranking for maximum quality
Up Next
In the next lesson, you’ll learn the generation side — how to prompt the LLM to stay grounded in retrieved context, provide citations, and avoid hallucination even when the context is incomplete.