Schema Design with AI
Design robust database schemas with AI — from business requirements to normalized tables, relationships, indexes, and constraints that scale with your application.
🔄 Quick Recall: In the last lesson, you mastered AI-powered SQL writing — the schema-first prompt pattern, JOIN verification, and iterative refinement. Now let’s design the schemas themselves.
From Business Requirements to Tables
The best schemas start with business understanding, not technical decisions. Here’s the prompt pattern:
```
I'm building a database for [describe your application/business].

Business domain:
- [Entity 1] can have many [Entity 2]
- Each [Entity 2] belongs to exactly one [Entity 1]
- [Entity 3] connects [Entity 1] and [Entity 4] (many-to-many)
- [List any special rules: "an order can't exist without a customer"]

Expected data volumes:
- [Entity 1]: ~[number] rows, growing [rate]
- [Entity 2]: ~[number] rows

Key queries I'll run frequently:
1. [describe query 1]
2. [describe query 2]
3. [describe query 3]

Database: PostgreSQL

Design a normalized schema with:
1. CREATE TABLE statements with appropriate data types
2. Primary keys, foreign keys, and constraints
3. Indexes for the frequent queries listed above
4. Comments explaining design decisions
```
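Filled in, the pattern yields output along these lines. The sketch below is a hypothetical result for a simple blog application (authors have many posts); the table and column names are illustrative, not part of the pattern:

```sql
-- Hypothetical AI output for a blog app: authors have many posts.
CREATE TABLE authors (
    id         BIGSERIAL PRIMARY KEY,
    name       VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE posts (
    id           BIGSERIAL PRIMARY KEY,
    -- Business rule: a post can't exist without an author.
    author_id    BIGINT NOT NULL REFERENCES authors (id) ON DELETE CASCADE,
    title        VARCHAR(255) NOT NULL,
    body         TEXT,
    published_at TIMESTAMPTZ
);

-- Supports the frequent query "latest posts by a given author".
CREATE INDEX idx_posts_author_published
    ON posts (author_id, published_at DESC);
```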
✅ Quick Check: Why does the prompt include “expected data volumes” for schema design?
Because volume changes design decisions. A table with 100 rows doesn’t need indexes — PostgreSQL scans it in milliseconds. A table with 10 million rows needs strategic indexes or queries will time out. Volume also drives choices such as whether to store calculated values (denormalize for speed) or compute them at query time (normalize for accuracy).
Normalization: When to Follow the Rules
AI usually suggests fully normalized schemas (3NF). That’s often right, but not always:
Normalize when:
- Data changes frequently (customer addresses, product prices)
- You need data integrity guarantees (financial records)
- Multiple entities reference the same data (avoid inconsistencies)
Denormalize when:
- You need fast reads and can tolerate slower writes
- The data rarely changes (historical records, logs)
- You’re building a reporting/analytics database (star schema)
```
Here's my current normalized schema:

[paste your CREATE TABLE statements]

My application is read-heavy (95% reads, 5% writes). The most common query joins 4 tables and runs 1,000 times per hour.

Should I denormalize? If so, which tables should I combine and what are the trade-offs? Show me the denormalized version alongside the normalized one.
```
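To get a feel for the two shapes, here's a minimal sketch (the orders and customers tables are hypothetical). The normalized version joins at query time; the denormalized version copies the customer name onto each order, trading update effort for read speed:

```sql
-- Normalized: customer data lives in one place; every read pays for a JOIN.
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- Denormalized: copy the name onto orders at write time.
-- Reads skip the JOIN, but renaming a customer means updating many order rows.
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(255);

SELECT id, total, customer_name
FROM orders;
```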
Designing Relationships
The three relationship types and how to implement them:
One-to-Many (most common):
Customer → Orders. Add a customer_id foreign key on the orders table.
Many-to-Many:
Students ↔ Courses. Create a junction table: enrollments (student_id, course_id, enrolled_at).
One-to-One:
User → Profile. Either embed the profile columns in the users table, or create a separate profiles table whose primary key is also a foreign key to users.
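In PostgreSQL, the three patterns look roughly like this (a sketch; it assumes customers, students, courses, and users tables already exist):

```sql
-- One-to-many: the "many" side carries the foreign key.
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id)
);

-- Many-to-many: a junction table with a composite primary key.
CREATE TABLE enrollments (
    student_id  BIGINT NOT NULL REFERENCES students (id),
    course_id   BIGINT NOT NULL REFERENCES courses (id),
    enrolled_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (student_id, course_id)  -- one enrollment per student/course pair
);

-- One-to-one: the profile's primary key is also its foreign key.
CREATE TABLE profiles (
    user_id BIGINT PRIMARY KEY REFERENCES users (id),
    bio     TEXT
);
```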
```
I need to model these relationships:
- A user can have multiple projects
- Each project has multiple tasks
- Tasks can be assigned to multiple users (and users can have multiple tasks)
- Each task belongs to exactly one project
- Projects belong to one organization
- Users can belong to multiple organizations

Design the schema with junction tables where needed. Include ON DELETE behavior for each foreign key.
```
AI verification tip: After AI generates relationships, draw them out. If a junction table feels unnecessary (like a one-to-many disguised as many-to-many), question it. Ask AI: “Is this truly many-to-many, or would a simple foreign key suffice?”
Data Types That Matter
Choosing the right data type prevents bugs and improves performance:
| Use Case | Good Choice | Bad Choice | Why |
|---|---|---|---|
| Money/currency | DECIMAL(10,2) | FLOAT | Float has rounding errors |
| Unique IDs | UUID or BIGSERIAL | INT | INT overflows at ~2 billion |
| Timestamps | TIMESTAMPTZ | TIMESTAMP | Timezone-aware prevents bugs |
| Short text (names) | VARCHAR(255) | TEXT | Communicates expected length |
| Long text (descriptions) | TEXT | VARCHAR(10000) | No practical limit needed |
| Yes/no flags | BOOLEAN | INT | Semantic clarity |
| Email/URL | TEXT + CHECK constraint | VARCHAR | Database-level check backs up app validation |
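Applied together, those choices produce a table like this hypothetical customers example:

```sql
CREATE TABLE customers (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- built in since PostgreSQL 13
    name       VARCHAR(255) NOT NULL,
    email      TEXT NOT NULL CHECK (email LIKE '%@%'),     -- coarse backstop for app-level validation
    is_active  BOOLEAN NOT NULL DEFAULT true,
    balance    DECIMAL(10,2) NOT NULL DEFAULT 0,           -- exact arithmetic for money
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()          -- timezone-aware timestamp
);
```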
```
Review my schema for data type issues:

[paste your CREATE TABLE statements]

Suggest better data types where appropriate. Explain why each change matters. Pay special attention to: monetary values, timestamps, and ID columns.
```
Adding Constraints and Indexes
AI can suggest constraints you might miss:
```
Here's my schema:

[paste CREATE TABLE statements]

Add:
1. NOT NULL constraints where data should always be present
2. CHECK constraints for data validation (positive amounts, valid statuses)
3. UNIQUE constraints where duplicates shouldn't exist
4. DEFAULT values where appropriate
5. Indexes for these common queries: [list your queries]

For each index, explain why it helps and what query it optimizes.
```
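The kind of output to expect, shown on a hypothetical orders table (a fuller version of the earlier relationship sketch; it assumes a customers table with a BIGINT primary key):

```sql
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id), -- NOT NULL: no orphan orders
    reference   TEXT NOT NULL UNIQUE,                      -- no duplicate order references
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'shipped', 'delivered', 'cancelled')),
    total       DECIMAL(10,2) NOT NULL CHECK (total >= 0), -- no negative amounts
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()         -- DEFAULT fills it on insert
);

-- Supports the frequent query "recent orders for a customer".
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at DESC);
```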
✅ Quick Check: Why should you list your common queries when asking AI to suggest indexes?
Because indexes are only valuable for queries that use them. An index on customer_name is useless if you always search by customer_id. Indexes also have a cost — they slow down INSERT and UPDATE operations. AI needs to know your query patterns to suggest indexes that provide real benefit without unnecessary overhead.
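You can verify whether an index actually earns its keep with PostgreSQL's built-in EXPLAIN. The query and index name below continue the hypothetical orders sketch above:

```sql
-- "Index Scan using idx_orders_customer_created" in the plan confirms the index is used;
-- "Seq Scan on orders" means this query gets no benefit from it.
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;
```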
Exercise: Design a Schema from Scratch
Pick one of these scenarios (or use your own):
E-commerce platform: Products, categories, customers, orders, order items, reviews, inventory tracking.
Project management tool: Organizations, users, projects, tasks, comments, time entries, labels.
Restaurant system: Menus, items, orders, tables, reservations, customers, staff schedules.
Use the business-requirements prompt pattern to have AI design the complete schema. Then review it: check relationships, data types, constraints, and indexes.
Key Takeaways
- Start schema design with business requirements and query patterns, not technical decisions
- Include expected data volumes in your prompt — volume changes normalization, indexing, and data type decisions
- AI tends to over-normalize or under-normalize — validate by asking “will I actually query this separately?”
- Choose data types deliberately: DECIMAL for money, TIMESTAMPTZ for times, UUID or BIGSERIAL for IDs
- Add constraints (NOT NULL, CHECK, UNIQUE) to enforce data integrity at the database level
- Index design requires knowing your query patterns — list your frequent queries when asking AI for index recommendations
Up Next: In the next lesson, you’ll tackle data cleaning and transformation — using AI to turn messy, inconsistent data into reliable, queryable datasets.