Knowledge Files: Making Your GPT an Expert

Your GPT can be conversational and follow instructions perfectly — but if it doesn’t know your specific data, it’s still guessing. Knowledge files transform a generic GPT into a specialist that knows your company policies, product catalog, training materials, or any domain-specific information.

🔄 Quick Recall: In the previous lesson, you learned the ROLE-RULES-FORMAT framework for writing instructions. Knowledge files work alongside those instructions — the instructions say how to behave, the knowledge says what to know.

How Knowledge Files Work

When you upload a file to your GPT, OpenAI automatically prepares it for Retrieval-Augmented Generation (RAG), which works in five steps:

Chunking — The file is split into smaller pieces (chunks)
Embedding — Each chunk is converted into a numerical representation (vector)
Storage — Vectors are stored for your GPT
Retrieval — When a user asks a question, the most relevant chunks are found
Generation — The GPT uses those chunks as context to generate its answer

This means the GPT doesn’t memorize your entire file. For each question, it searches for the relevant sections, much as you’d flip to the right chapter in a textbook.
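The five steps can be sketched in a few lines of Python. This is a toy model, not OpenAI's implementation. It makes two assumptions: chunks are paragraphs, and word counts stand in for a learned embedding model. `build_prompt` only assembles the text a language model would receive.

```python
import math
import re
from collections import Counter

# Toy sketch of the RAG steps. Assumptions: paragraphs are chunks, and a
# word-count vector stands in for a real embedding model.

def chunk(document: str) -> list[str]:
    """Chunking: split the file into paragraphs separated by blank lines."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def embed(text: str) -> Counter:
    """Embedding (toy): count each lowercase word."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Similarity between two vectors, from 0 (unrelated) to 1 (identical)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(document: str) -> list[tuple[str, Counter]]:
    """Storage: keep each chunk next to its vector."""
    return [(c, embed(c)) for c in chunk(document)]

def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Generation: retrieved chunks become context for the language model."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

doc = """Items can be returned within 30 days of purchase.

Electronics must be returned within 14 days.

Shipping is free on orders over 50 dollars."""

index = build_index(doc)
question = "How many days do I have to return electronics?"
print(build_prompt(question, retrieve(index, question, k=1)))
```

Only the electronics paragraph reaches the prompt. The rest of the file is never sent to the model for this question.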

✅ Quick Check: In the RAG process, what triggers the retrieval of specific chunks from a knowledge file? (Answer: A user’s question — the system matches the question against stored vectors to find the most relevant chunks.)

File Limits and Best Formats

Limits:

Up to 20 files per GPT
Each file up to 512 MB
Maximum 2 million tokens per file
10 GB per user, 100 GB per organization
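If you plan to upload many files, a quick pre-flight check can catch obvious limit problems. This is a minimal sketch with a few assumptions:

- It uses the limits listed above, which OpenAI may change.
- Token counts come from a rough four-characters-per-token estimate.
- `PlannedFile` and `check_upload` are hypothetical names.
- The per-user and per-organization storage caps are not checked.

```python
from dataclasses import dataclass

# Limits as listed in this lesson; confirm current values with OpenAI before relying on them.
MAX_FILES_PER_GPT = 20
MAX_FILE_BYTES = 512 * 1024 * 1024  # 512 MB (treated here as MiB)
MAX_TOKENS_PER_FILE = 2_000_000

@dataclass
class PlannedFile:
    name: str
    size_bytes: int
    text: str  # extracted text, used only for a rough token estimate

def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English.
    return len(text) // 4

def check_upload(files: list[PlannedFile]) -> list[str]:
    """Return human-readable problems; an empty list means within limits."""
    problems = []
    if len(files) > MAX_FILES_PER_GPT:
        problems.append(f"{len(files)} files exceeds the {MAX_FILES_PER_GPT}-file limit")
    for f in files:
        if f.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{f.name}: larger than 512 MB")
        if estimate_tokens(f.text) > MAX_TOKENS_PER_FILE:
            problems.append(f"{f.name}: roughly over 2 million tokens")
    return problems

print(check_upload([PlannedFile("company-policies.md", 48_000, "# Policies ...")]))  # → []
```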

Best formats (ranked by reliability):

Format	Quality	Notes
Markdown (.md)	Excellent	Clear headings help chunking
Plain text (.txt)	Excellent	Simple and reliable
PDF (single column)	Good	Works well for reports and documents
Word (.docx)	Good	Tables and formatting preserved
CSV/Excel	Good	Great for structured data
PDF (multi-column)	Poor	Parser struggles with layout
PowerPoint	Poor	Positional text isn’t understood
Image-heavy files	Poor	Only text is processed

Key insight: The parser works best with simple formatting. A clean Markdown file often outperforms a beautifully designed PDF. Its text extracts cleanly, and its headings keep each chunk focused on a single topic.

Preparing Files for Upload

Before uploading, optimize your files for retrieval:

1. Add clear section headings

Instead of a wall of text, structure with headers:

# Product Returns Policy

## Eligibility
Items can be returned within 30 days of purchase...

## Process
To initiate a return, email returns@example.com...

## Exceptions
Electronics must be returned within 14 days...
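Headings matter because they give a parser natural boundaries. The sketch below splits a Markdown file at its headings and keeps the heading path with each section. It illustrates the idea only; OpenAI does not document that its own chunker splits exactly this way.

```python
import re

# Matches ATX-style Markdown headings such as "## Process".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections, tagging each with its heading path."""
    sections = []
    path: list[str] = []  # current stack of headings, e.g. ["Policy", "Process"]
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append({"heading": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(m.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections

doc = """# Product Returns Policy

## Eligibility
Items can be returned within 30 days of purchase.

## Process
To initiate a return, email returns@example.com.

## Exceptions
Electronics must be returned within 14 days.
"""

for s in split_by_headings(doc):
    print(s["heading"], "|", s["text"])
```

Each section carries "Product Returns Policy > ..." with it. A retrieved chunk therefore still says which policy it belongs to.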

2. Front-load key information

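Put the most important facts at the beginning of each section. If a long section is split into several chunks, only the first chunk sits next to the heading. Facts near the top are therefore more likely to be retrieved together with the context that explains them.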
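A small experiment makes the front-loading advice concrete. This sketch assumes fixed-size chunks of 12 words with no overlap. That is far smaller than any real system uses, chosen so the effect is visible. The code checks whether a key fact lands in the same chunk as its heading. Function names are hypothetical.

```python
def window_chunks(heading: str, body: str, max_words: int = 12) -> list[str]:
    """Split heading + body into fixed-size word windows (no overlap)."""
    words = f"{heading} {body}".split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def fact_shares_chunk_with_heading(heading: str, body: str, fact: str, max_words: int = 12) -> bool:
    """True if some chunk contains both the heading and the key fact."""
    return any(heading in c and fact in c for c in window_chunks(heading, body, max_words))

heading = "Return window:"
front_loaded = ("Items can be returned within 30 days of purchase. "
                "Receipts help but are not required for store credit on most items.")
buried = ("Receipts help but are not required for store credit on most items. "
          "Items can be returned within 30 days of purchase.")

print(fact_shares_chunk_with_heading(heading, front_loaded, "30 days"))  # → True
print(fact_shares_chunk_with_heading(heading, buried, "30 days"))        # → False
```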

3. Include context in each section

Don’t assume the GPT will read the sections in order. Each section should stand alone:

Bad: “As mentioned above, the policy applies to…”
Good: “The 30-day return policy applies to…”
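Dangling references are easy to miss by eye, so a simple search can flag them before upload. The phrase list below is a hypothetical starting point, not an exhaustive rule set.

```python
import re

# Phrases that usually point at text outside the current section.
# Hypothetical starting list; extend it for your own documents.
DANGLING = [
    r"as mentioned above",
    r"as noted (?:above|earlier|previously)",
    r"see (?:above|below)",
    r"the previous section",
]
PATTERN = re.compile("|".join(DANGLING), re.IGNORECASE)

def find_dangling_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_phrase) for each suspicious phrase."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in PATTERN.finditer(line):
            hits.append((lineno, m.group(0)))
    return hits

print(find_dangling_references("As mentioned above, the policy applies to all items."))
# → [(1, 'As mentioned above')]
```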

✅ Quick Check: Why should each section in a knowledge file stand alone without referencing other sections? (Answer: RAG retrieves individual chunks, not the whole file. A chunk that says “as mentioned above” loses context because “above” isn’t included in the retrieval.)

Instructing the GPT to Use Knowledge

Uploading files isn’t enough — you need to tell the GPT how to use them. Add instructions like:

KNOWLEDGE FILE USAGE:
- Before answering any question, search the uploaded file
  "company-policies.md" for relevant information
- If the answer is found in the knowledge file, cite the
  specific section
- If the answer is NOT in the knowledge file, say:
  "I don't have that specific information in my reference
  materials. Here's what I know from general knowledge: [answer]"
- When users ask about policies, ALWAYS check the knowledge
  file first — do not rely on general training

Explicit instructions like these make it much less likely that the GPT will ignore your files and answer from its general training instead.
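The instructions ask the model to choose between citing a section and falling back to general knowledge. Inside ChatGPT the model makes that choice itself, so the following is only a sketch of the decision rule. It assumes retrieval returns a similarity score per chunk. `Retrieved`, `answer`, and the `min_score` threshold of 0.5 are hypothetical, illustrative choices.

```python
from dataclasses import dataclass

FALLBACK = ("I don't have that specific information in my reference "
            "materials. Here's what I know from general knowledge: ")

@dataclass
class Retrieved:
    section: str  # heading of the chunk, used for the citation
    text: str
    score: float  # similarity between question and chunk, 0..1 (assumed)

def answer(hits: list[Retrieved], general_answer: str, min_score: float = 0.5) -> str:
    """Cite the best knowledge-file section if relevant enough, else fall back."""
    relevant = [h for h in hits if h.score >= min_score]
    if relevant:
        best = max(relevant, key=lambda h: h.score)
        return f"{best.text} (Source: {best.section})"
    return FALLBACK + general_answer

hits = [
    Retrieved("Exceptions", "Electronics must be returned within 14 days.", 0.82),
    Retrieved("Eligibility", "Items can be returned within 30 days of purchase.", 0.41),
]
print(answer(hits, "n/a"))  # → Electronics must be returned within 14 days. (Source: Exceptions)
```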

What to Upload (and What Not To)

Good knowledge file candidates:

Company handbooks and policies
Product catalogs and specifications
FAQ documents
Training manuals
Style guides and brand voice documents
Technical documentation
Standard operating procedures

Poor knowledge file candidates:

Frequently changing data (use Actions/APIs instead)
Entire databases (too large, retrieval becomes unreliable)
Confidential data you don’t want processed by OpenAI
Image-heavy content (images aren’t processed)

Troubleshooting Knowledge Retrieval

Problem	Likely Cause	Solution
GPT ignores the file	No explicit instruction to use it	Add “search the knowledge file first” to instructions
Wrong answers	Chunks lack context	Give each section a descriptive heading and restate key context in its text
Partial answers	Information split across chunks	Consolidate related info under one heading
“I don’t have that info”	File wording doesn’t match users’ terms	Use the same vocabulary users would use
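The vocabulary problem in the "I don’t have that info" row is easy to reproduce with a crude word-overlap score. Real embedding models capture some synonyms, so the effect is usually less stark than in this toy. Matching users' wording still helps. The stopword list is a hypothetical minimal one.

```python
import re

# Hypothetical minimal stopword list: common words that say little about topic.
STOPWORDS = {"a", "an", "the", "i", "you", "do", "how", "for", "can", "be", "to", "of"}

def words(text: str) -> set[str]:
    """Lowercase content words, ignoring numbers and stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def overlap(question: str, chunk: str) -> int:
    """Count shared content words: a crude stand-in for retrieval similarity."""
    return len(words(question) & words(chunk))

question = "How do I return something for a refund?"
internal_wording = "Merchandise may be sent back within 30 days for reimbursement."
user_wording = "You can return items for a refund within 30 days."

print(overlap(question, internal_wording))  # → 0
print(overlap(question, user_wording))      # → 2
```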

Practice Exercise

Choose a document you know well (a work policy, product guide, or personal reference)
Convert it to Markdown with clear headings (or use a clean, single-column PDF)
Upload it to your GPT in the Configure tab → Knowledge section
Add knowledge file usage instructions to your GPT’s instructions
Test with 5 questions — verify it pulls answers from the file, not general knowledge
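To keep step 5 honest, write down the fact each test question should surface before you ask it, then compare. This is a minimal sketch that assumes you paste the GPT's replies in by hand; `TestCase` and `grade` are hypothetical names. A CHECK result means the expected phrase is missing and the answer deserves a manual look. It does not necessarily mean the answer is wrong.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    question: str
    expected_fact: str  # a phrase copied from your knowledge file
    gpt_answer: str     # the GPT's reply, pasted in by hand

def grade(cases: list[TestCase]) -> list[str]:
    """PASS if the expected phrase appears in the answer (case-insensitive)."""
    report = []
    for c in cases:
        ok = c.expected_fact.lower() in c.gpt_answer.lower()
        report.append(f"{'PASS' if ok else 'CHECK'}: {c.question}")
    return report

cases = [
    TestCase("What is the return window?", "30 days", "You have 30 days to return items."),
    TestCase("How long for electronics?", "14 days", "Usually 30 days."),
]
for line in grade(cases):
    print(line)
```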

Key Takeaways

Knowledge files use RAG: chunking, embedding, storage, retrieval, then generation
Markdown and plain text with clear headings work best for retrieval
Explicitly instruct your GPT to check knowledge files before answering
Each section should stand alone since RAG retrieves individual chunks
Don’t upload frequently changing data — use API Actions for live data

Up Next

In the next lesson, you’ll configure your GPT’s capabilities — web browsing, image generation, and code interpreter — and design conversation flows that guide users to the best experience.