Knowledge Files: Making Your GPT an Expert
Learn to upload and configure knowledge files that give your GPT specialized expertise. Master file formats, size limits, and retrieval optimization.
Your GPT can be conversational and follow instructions perfectly — but if it doesn’t know your specific data, it’s still guessing. Knowledge files transform a generic GPT into a specialist that knows your company policies, product catalog, training materials, or any domain-specific information.
🔄 Quick Recall: In the previous lesson, you learned the ROLE-RULES-FORMAT framework for writing instructions. Knowledge files work alongside those instructions — the instructions say how to behave, the knowledge says what to know.
How Knowledge Files Work
When you upload a file to your GPT, OpenAI automatically processes it for Retrieval-Augmented Generation (RAG):
- Chunking — The file is split into smaller pieces (chunks)
- Embedding — Each chunk is converted into a numerical representation (vector)
- Storage — Vectors are stored for your GPT
- Retrieval — When a user asks a question, the most relevant chunks are found
- Generation — The GPT uses those chunks as context to generate its answer
This means the GPT doesn’t memorize your entire file. It searches for relevant sections on each question — similar to how you’d flip to the right chapter in a textbook.
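The five steps can be made concrete with a toy pipeline in Python. This is a sketch under simplifying assumptions, not OpenAI's implementation: chunks are paragraphs, and the "embedding" is a bag-of-words count vector compared by cosine similarity, standing in for the learned embedding model a real system uses. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(document: str) -> list[str]:
    """Step 1, chunking: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def embed(text: str) -> Counter:
    """Step 2, embedding: a toy bag-of-words vector (word -> count).
    A real system would call a learned embedding model here."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document: str) -> list[tuple[str, Counter]]:
    """Step 3, storage: keep each chunk alongside its vector."""
    return [(c, embed(c)) for c in chunk(document)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 1) -> list[str]:
    """Step 4, retrieval: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


policy = """Items can be returned within 30 days of purchase.

Electronics must be returned within 14 days.

To initiate a return, email returns@example.com."""

index = build_index(policy)
context = retrieve(index, "Can electronics be returned after 14 days?")
# Step 5, generation, would pass `context` to the model along with the question.
print(context)  # ['Electronics must be returned within 14 days.']
```

Only the matching paragraph reaches the model, which is why the GPT answers from the relevant section rather than from the whole file.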
✅ Quick Check: In the RAG process, what triggers the retrieval of specific chunks from a knowledge file? (Answer: A user’s question — the system matches the question against stored vectors to find the most relevant chunks.)
File Limits and Best Formats
Limits:
- Up to 20 files per GPT
- Each file up to 512 MB
- Maximum 2 million tokens per file
- 10 GB per user, 100 GB per organization
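If you are preparing many files, a small script can flag obvious limit violations before you open the GPT editor. This is a hypothetical helper, not an OpenAI tool: the limits are copied from the list above, and the token count is only estimated at roughly four characters per token, so treat a near-limit result as a reason to check more carefully. It does not check the per-user or per-organization storage totals.

```python
from dataclasses import dataclass
from pathlib import Path

# Limits as listed in this lesson; OpenAI can change them, so confirm against current docs.
MAX_FILES = 20
MAX_BYTES = 512 * 1024 * 1024  # 512 MB per file
MAX_TOKENS = 2_000_000         # per file
CHARS_PER_TOKEN = 4            # rough English-text heuristic, NOT a real tokenizer


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    char_count: int  # characters of extracted text

    @classmethod
    def from_text_file(cls, path: str) -> "FileInfo":
        """Build a FileInfo from a plain-text or Markdown file on disk.
        (PDF/DOCX would need text extraction first; not handled here.)"""
        p = Path(path)
        text = p.read_text(encoding="utf-8", errors="ignore")
        return cls(p.name, p.stat().st_size, len(text))


def check_upload(files: list[FileInfo]) -> list[str]:
    """Return a list of problems; an empty list means no limit is obviously exceeded."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the {MAX_FILES}-file limit")
    for f in files:
        if f.size_bytes > MAX_BYTES:
            problems.append(f"{f.name}: {f.size_bytes:,} bytes exceeds 512 MB")
        est_tokens = f.char_count // CHARS_PER_TOKEN
        if est_tokens > MAX_TOKENS:
            problems.append(f"{f.name}: ~{est_tokens:,} estimated tokens exceeds {MAX_TOKENS:,}")
    return problems


files = [
    FileInfo("handbook.md", 300_000, 290_000),
    FileInfo("catalog.txt", 40_000_000, 12_000_000),
]
for problem in check_upload(files):
    print(problem)  # catalog.txt: ~3,000,000 estimated tokens exceeds 2,000,000
```

A file well under the size limit can still exceed the token limit, as `catalog.txt` does here, so it helps to check both.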
Best formats (ranked by reliability):
| Format | Quality | Notes |
|---|---|---|
| Markdown (.md) | Excellent | Clear headings help chunking |
| Plain text (.txt) | Excellent | Simple and reliable |
| PDF (single column) | Good | Works well for reports and documents |
| Word (.docx) | Good | Tables and formatting preserved |
| CSV/Excel | Good | Great for structured data |
| PDF (multi-column) | Poor | Parser struggles with layout |
| PowerPoint | Poor | Positional text isn’t understood |
| Image-heavy files | Poor | Only text is processed |
Key insight: The parser works best with simple formatting. A clean Markdown file tends to outperform a beautifully designed PDF: its text extracts without layout noise, and its headings mark clear boundaries between topics, so related content is more likely to stay together in one chunk.
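Heading-aware splitting is one common chunking strategy, and it shows why headings help: each heading marks a boundary and gives the chunk a label. OpenAI does not publish exactly how GPT knowledge files are chunked, so read this Python sketch as an illustration of the idea rather than a description of the real parser. `split_on_headings` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_on_headings(markdown: str) -> list[dict[str, str]]:
    """Split Markdown into sections at heading lines. Each section records its
    heading path, e.g. 'Product Returns Policy > Eligibility', so a retrieved
    section still says what it is about."""
    sections: list[dict[str, str]] = []
    path: list[str] = []  # current heading at each level
    body: list[str] = []  # lines under the current heading

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append({"heading": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at this level or deeper, then push the new one.
            path[:] = path[: level - 1] + [match.group(2)]
        else:
            body.append(line)
    flush()
    return sections


doc = """# Product Returns Policy
## Eligibility
Items can be returned within 30 days of purchase.
## Exceptions
Electronics must be returned within 14 days."""

for s in split_on_headings(doc):
    print(f"[{s['heading']}] {s['text']}")
# [Product Returns Policy > Eligibility] Items can be returned within 30 days of purchase.
# [Product Returns Policy > Exceptions] Electronics must be returned within 14 days.
```

A multi-column PDF has no such markers, so a splitter like this has nothing reliable to cut on.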
Preparing Files for Upload
Before uploading, optimize your files for retrieval:
1. Add clear section headings
Instead of a wall of text, structure with headers:
# Product Returns Policy
## Eligibility
Items can be returned within 30 days of purchase...
## Process
To initiate a return, email returns@example.com...
## Exceptions
Electronics must be returned within 14 days...
2. Front-load key information
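Put the most important facts at the beginning of each section. Chunks have a maximum size, so a long section may be split across several chunks, and only the first of them sits next to the heading. Facts placed near the top stay in the same chunk as the words that identify the topic, which makes them more likely to be retrieved for a matching question.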
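The effect can be sketched with fixed-size, overlapping windows, another common chunking scheme. The window size and overlap here are hypothetical and counted in words; real chunkers count tokens, and the values used for GPT knowledge files are not given in this lesson. The sketch checks whether any single chunk contains both the topic word and the key fact.

```python
def fixed_size_chunks(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a word list into windows of `size` words; each window repeats the
    last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


filler = "Further details about packaging and shipping labels follow here. " * 6
front_loaded = ("Electronics returns: deadline 14 days. " + filler).split()
buried = ("Electronics returns: " + filler + "Deadline 14 days.").split()


def topic_and_fact_together(words: list[str], size: int = 20, overlap: int = 5) -> bool:
    """True if at least one chunk holds both the topic word and the key fact."""
    return any("Electronics" in c and "14" in c for c in fixed_size_chunks(words, size, overlap))


print(topic_and_fact_together(front_loaded))  # True
print(topic_and_fact_together(buried))        # False
```

With the deadline buried at the end, the chunk that mentions "14 days" never mentions electronics, so a question about electronics returns has little reason to retrieve it.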
3. Include context in each section
Don’t assume the GPT will read the sections in order. Each section should stand alone:
- Bad: “As mentioned above, the policy applies to…”
- Good: “The 30-day return policy applies to…”
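Dangling cross-references are easy to catch mechanically before upload. This hypothetical Python linter flags a few common phrases; the pattern list is an illustrative assumption, so extend it with the phrasings your own documents use.

```python
import re

# Phrases that only make sense next to neighboring text. Illustrative, not
# exhaustive: add the cross-reference phrasings your own documents use.
DANGLING_PATTERNS = [
    r"\bas (?:mentioned|noted|described|discussed) (?:above|below|earlier|previously)\b",
    r"\bsee (?:above|below)\b",
    r"\bthe (?:previous|next|following) section\b",
]
DANGLING = re.compile("|".join(DANGLING_PATTERNS), re.IGNORECASE)


def find_dangling_references(text: str) -> list[tuple[int, str]]:
    """Return (line number, phrase) for each cross-reference that loses its
    meaning when the surrounding chunk is retrieved on its own."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DANGLING.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


doc = """## Exceptions
As mentioned above, the policy applies to opened items.
The 30-day return policy applies to unopened items."""
print(find_dangling_references(doc))  # [(2, 'As mentioned above')]
```

Each hit is a place to restate the fact itself, as in the "Good" example.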
✅ Quick Check: Why should each section in a knowledge file stand alone without referencing other sections? (Answer: RAG retrieves individual chunks, not the whole file. A chunk that says “as mentioned above” loses context because “above” isn’t included in the retrieval.)
Instructing the GPT to Use Knowledge
Uploading files isn’t enough — you need to tell the GPT how to use them. Add instructions like:
KNOWLEDGE FILE USAGE:
- Before answering any question, search the uploaded file "company-policies.md" for relevant information
- If the answer is found in the knowledge file, cite the specific section
- If the answer is NOT in the knowledge file, say: "I don't have that specific information in my reference materials. Here's what I know from general knowledge: [answer]"
- When users ask about policies, ALWAYS check the knowledge file first; do not rely on general training
This prevents the GPT from ignoring your files and making up answers from its general training.
What to Upload (and What Not To)
Good knowledge file candidates:
- Company handbooks and policies
- Product catalogs and specifications
- FAQ documents
- Training manuals
- Style guides and brand voice documents
- Technical documentation
- Standard operating procedures
Poor knowledge file candidates:
- Frequently changing data (use Actions/APIs instead)
- Entire databases (too large, retrieval becomes unreliable)
- Confidential data you don’t want processed by OpenAI
- Image-heavy content (images aren’t processed)
Troubleshooting Knowledge Retrieval
| Problem | Likely Cause | Solution |
|---|---|---|
| GPT ignores the file | No explicit instruction to use it | Add “search the knowledge file first” to instructions |
| Wrong answers | Chunks lack context | Make each section self-contained and give it a descriptive heading |
| Partial answers | Information split across chunks | Consolidate related info under one heading |
| “I don’t have that info” | Content not matching query terms | Use the same vocabulary users would use |
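The vocabulary row is easiest to see with keyword matching. The Python toy below counts shared words between a question and a chunk; the form name and stop-word list are made up for illustration. GPT retrieval uses embeddings, which do capture some synonyms, so the real effect is softer than a score dropping to zero, but writing in your users' words still gives retrieval more to match on.

```python
import re

# Tiny illustrative stop-word list so filler words don't count as matches.
STOP_WORDS = {"a", "an", "the", "to", "do", "i", "how", "can", "my"}


def content_words(text: str) -> set[str]:
    """Lowercased words in the text, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def overlap(query: str, chunk: str) -> int:
    """Count distinct query words that also appear in the chunk."""
    return len(content_words(query) & content_words(chunk))


query = "How do I get a refund?"
original = "Reimbursement procedure: submit form R-2 within 30 days."
revised = "Refund (reimbursement) procedure: to get a refund, submit form R-2 within 30 days."

print(overlap(query, original))  # 0, nothing in common with the user's wording
print(overlap(query, revised))   # 2, 'get' and 'refund' now match
```

Keeping the formal term alongside the everyday one serves both kinds of question.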
Practice Exercise
- Choose a document you know well (a work policy, product guide, or personal reference)
- Convert it to Markdown with clear headings (or use a clean, single-column PDF)
- Upload it to your GPT in the Configure tab → Knowledge section
- Add knowledge file usage instructions to your GPT’s instructions
- Test with 5 questions — verify it pulls answers from the file, not general knowledge
Key Takeaways
- Knowledge files use RAG: chunking, embedding, retrieval, then generation
- Markdown and plain text with clear headings work best for retrieval
- Explicitly instruct your GPT to check knowledge files before answering
- Each section should stand alone since RAG retrieves individual chunks
- Don’t upload frequently changing data — use API Actions for live data
Up Next
In the next lesson, you’ll configure your GPT’s capabilities — web browsing, image generation, and code interpreter — and design conversation flows that guide users to the best experience.