Schema Design with AI
Design robust database schemas with AI — from business requirements to normalized tables, relationships, indexes, and constraints that scale with your application.
🔄 Quick Recall: In the last lesson, you mastered AI-powered SQL writing — the schema-first prompt pattern, JOIN verification, and iterative refinement. Now let’s design the schemas themselves.
From Business Requirements to Tables
The best schemas start with business understanding, not technical decisions. Here’s the prompt pattern:
```
I'm building a database for [describe your application/business].

Business domain:
- [Entity 1] can have many [Entity 2]
- Each [Entity 2] belongs to exactly one [Entity 1]
- [Entity 3] connects [Entity 1] and [Entity 4] (many-to-many)
- [List any special rules: "an order can't exist without a customer"]

Expected data volumes:
- [Entity 1]: ~[number] rows, growing [rate]
- [Entity 2]: ~[number] rows

Key queries I'll run frequently:
1. [describe query 1]
2. [describe query 2]
3. [describe query 3]

Database: PostgreSQL

Design a normalized schema with:
1. CREATE TABLE statements with appropriate data types
2. Primary keys, foreign keys, and constraints
3. Indexes for the frequent queries listed above
4. Comments explaining design decisions
```
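Filled in, the pattern yields output along these lines. The sketch below is a hypothetical result for a simple blog application (authors have many posts); the table and column names are illustrative, not part of the pattern:

```sql
-- Hypothetical AI output for a blog app: authors have many posts.
CREATE TABLE authors (
    id         BIGSERIAL PRIMARY KEY,
    name       VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE posts (
    id           BIGSERIAL PRIMARY KEY,
    -- Business rule: a post can't exist without an author.
    author_id    BIGINT NOT NULL REFERENCES authors (id) ON DELETE CASCADE,
    title        VARCHAR(255) NOT NULL,
    body         TEXT,
    published_at TIMESTAMPTZ
);

-- Supports the frequent query "latest posts by a given author".
CREATE INDEX idx_posts_author_published
    ON posts (author_id, published_at DESC);
```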
✅ Quick Check: Why does the prompt include “expected data volumes” for schema design?
Because volume changes design decisions. A table with 100 rows doesn’t need indexes — PostgreSQL scans it in milliseconds. A table with 10 million rows needs strategic indexes or queries will time out. Volume also drives choices such as whether to store calculated values (denormalize for speed) or compute them at query time (normalize for accuracy).
Normalization: When to Follow the Rules
AI usually suggests fully normalized schemas (3NF). That’s often right, but not always:
Normalize when:
- Data changes frequently (customer addresses, product prices)
- You need data integrity guarantees (financial records)
- Multiple entities reference the same data (avoid inconsistencies)
Denormalize when:
- You need fast reads and can tolerate slower writes
- The data rarely changes (historical records, logs)
- You’re building a reporting/analytics database (star schema)
```
Here's my current normalized schema:

[paste your CREATE TABLE statements]

My application is read-heavy (95% reads, 5% writes). The most common query joins 4 tables and runs 1,000 times per hour.

Should I denormalize? If so, which tables should I combine and what are the trade-offs? Show me the denormalized version alongside the normalized one.
```
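To get a feel for the two shapes, here's a minimal sketch (the orders and customers tables are hypothetical). The normalized version joins at query time; the denormalized version copies the customer name onto each order, trading update effort for read speed:

```sql
-- Normalized: customer data lives in one place; every read pays for a JOIN.
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- Denormalized: copy the name onto orders at write time.
-- Reads skip the JOIN, but renaming a customer means updating many order rows.
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(255);

SELECT id, total, customer_name
FROM orders;
```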
Designing Relationships
The three relationship types and how to implement them:
One-to-Many (most common):
Customer → Orders. Add a customer_id foreign key on the orders table.
Many-to-Many:
Students ↔ Courses. Create a junction table: enrollments (student_id, course_id, enrolled_at).
One-to-One:
User → Profile. Either embed the profile columns in the users table, or create a separate profiles table whose primary key is also a foreign key to users.
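In PostgreSQL, the three patterns look roughly like this (a sketch; it assumes customers, students, courses, and users tables already exist):

```sql
-- One-to-many: the "many" side carries the foreign key.
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id)
);

-- Many-to-many: a junction table with a composite primary key.
CREATE TABLE enrollments (
    student_id  BIGINT NOT NULL REFERENCES students (id),
    course_id   BIGINT NOT NULL REFERENCES courses (id),
    enrolled_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (student_id, course_id)  -- one enrollment per student/course pair
);

-- One-to-one: the profile's primary key is also its foreign key.
CREATE TABLE profiles (
    user_id BIGINT PRIMARY KEY REFERENCES users (id),
    bio     TEXT
);
```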
```
I need to model these relationships:
- A user can have multiple projects
- Each project has multiple tasks
- Tasks can be assigned to multiple users (and users can have multiple tasks)
- Each task belongs to exactly one project
- Projects belong to one organization
- Users can belong to multiple organizations

Design the schema with junction tables where needed. Include ON DELETE behavior for each foreign key.
```
AI verification tip: After AI generates relationships, draw them out. If a junction table feels unnecessary (like a one-to-many disguised as many-to-many), question it. Ask AI: “Is this truly many-to-many, or would a simple foreign key suffice?”
Data Types That Matter
Choosing the right data type prevents bugs and improves performance:
| Use Case | Good Choice | Bad Choice | Why |
|---|---|---|---|
| Money/currency | DECIMAL(10,2) | FLOAT | Float has rounding errors |
| Unique IDs | UUID or BIGSERIAL | INT | INT overflows at ~2 billion |
| Timestamps | TIMESTAMPTZ | TIMESTAMP | Timezone-aware prevents bugs |
| Short text (names) | VARCHAR(255) | TEXT | Communicates expected length |
| Long text (descriptions) | TEXT | VARCHAR(10000) | No practical limit needed |
| Yes/no flags | BOOLEAN | INT | Semantic clarity |
| Email/URL | TEXT + CHECK constraint | VARCHAR | Database-level check backs up app validation |
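Applied together, those choices produce a table like this hypothetical customers example:

```sql
CREATE TABLE customers (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- built in since PostgreSQL 13
    name       VARCHAR(255) NOT NULL,
    email      TEXT NOT NULL CHECK (email LIKE '%@%'),     -- coarse backstop for app-level validation
    is_active  BOOLEAN NOT NULL DEFAULT true,
    balance    DECIMAL(10,2) NOT NULL DEFAULT 0,           -- exact arithmetic for money
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()          -- timezone-aware timestamp
);
```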
```
Review my schema for data type issues:

[paste your CREATE TABLE statements]

Suggest better data types where appropriate. Explain why each change matters. Pay special attention to: monetary values, timestamps, and ID columns.
```
Adding Constraints and Indexes
AI can suggest constraints you might miss:
```
Here's my schema:

[paste CREATE TABLE statements]

Add:
1. NOT NULL constraints where data should always be present
2. CHECK constraints for data validation (positive amounts, valid statuses)
3. UNIQUE constraints where duplicates shouldn't exist
4. DEFAULT values where appropriate
5. Indexes for these common queries: [list your queries]

For each index, explain why it helps and what query it optimizes.
```
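The kind of output to expect, shown on a hypothetical orders table (a fuller version of the earlier relationship sketch; it assumes a customers table with a BIGINT primary key):

```sql
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id), -- NOT NULL: no orphan orders
    reference   TEXT NOT NULL UNIQUE,                      -- no duplicate order references
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'shipped', 'delivered', 'cancelled')),
    total       DECIMAL(10,2) NOT NULL CHECK (total >= 0), -- no negative amounts
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()         -- DEFAULT fills it on insert
);

-- Supports the frequent query "recent orders for a customer".
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at DESC);
```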
✅ Quick Check: Why should you list your common queries when asking AI to suggest indexes?
Because indexes are only valuable for queries that use them. An index on customer_name is useless if you always search by customer_id. Indexes also have a cost — they slow down INSERT and UPDATE operations. AI needs to know your query patterns to suggest indexes that provide real benefit without unnecessary overhead.
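You can verify whether an index actually earns its keep with PostgreSQL's built-in EXPLAIN. The query and index name below continue the hypothetical orders sketch above:

```sql
-- "Index Scan using idx_orders_customer_created" in the plan confirms the index is used;
-- "Seq Scan on orders" means this query gets no benefit from it.
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;
```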
Exercise: Design a Schema from Scratch
Pick one of these scenarios (or use your own):
E-commerce platform: Products, categories, customers, orders, order items, reviews, inventory tracking.
Project management tool: Organizations, users, projects, tasks, comments, time entries, labels.
Restaurant system: Menus, items, orders, tables, reservations, customers, staff schedules.
Use the business-requirements prompt pattern to have AI design the complete schema. Then review it: check relationships, data types, constraints, and indexes.
Key Takeaways
- Start schema design with business requirements and query patterns, not technical decisions
- Include expected data volumes in your prompt — volume changes normalization, indexing, and data type decisions
- AI tends to over-normalize or under-normalize — validate by asking “will I actually query this separately?”
- Choose data types deliberately: DECIMAL for money, TIMESTAMPTZ for times, UUID or BIGSERIAL for IDs
- Add constraints (NOT NULL, CHECK, UNIQUE) to enforce data integrity at the database level
- Index design requires knowing your query patterns — list your frequent queries when asking AI for index recommendations
Up Next: In the next lesson, you’ll tackle data cleaning and transformation — using AI to turn messy, inconsistent data into reliable, queryable datasets.