Testing and Quality Assurance
Generate comprehensive test suites with AI—including edge cases you'd never think of. Learn to write better tests in a fraction of the time.
The Test You Didn’t Write
In the previous lesson, we explored debugging and error resolution with AI. Now let's build on that foundation. Here's a scenario every developer has lived: you write a function, test it manually with a couple of inputs, it works, you ship it. Three weeks later, a user finds a bug. The input? An empty string. Or a negative number. Or a list with one item instead of many.
You knew you should have tested those cases. You just… didn’t.
AI doesn’t forget edge cases. It doesn’t get bored writing the fifteenth test. And it doesn’t skip the weird inputs because “that’ll never happen in production.” Let’s put that to work.
The AI Testing Workflow
Here’s the workflow you’ll use for every testing session:
- Provide the code under test with its dependencies
- Describe the expected behavior (not just what the code does—what it should do)
- Request specific test types (unit, integration, edge cases)
- Review and refine the generated tests
- Add manual tests for domain-specific cases the AI might miss
Let’s walk through each step with a real example.
Step 1: Give AI the Full Picture
Say you have this function:
def calculate_discount(
    subtotal: float,
    coupon_code: str | None,
    is_member: bool,
    items_count: int,
) -> dict:
    """Calculate order discount based on multiple factors."""
    discount = 0.0
    reasons = []

    # Member discount: 10%
    if is_member:
        discount += subtotal * 0.10
        reasons.append("Member discount: 10%")

    # Bulk discount: 5% for 10+ items
    if items_count >= 10:
        discount += subtotal * 0.05
        reasons.append("Bulk discount: 5%")

    # Coupon codes
    if coupon_code:
        coupon_discounts = {
            "SAVE20": 0.20,
            "WELCOME10": 0.10,
            "VIP30": 0.30,
        }
        rate = coupon_discounts.get(coupon_code.upper())
        if rate:
            discount += subtotal * rate
            reasons.append(f"Coupon {coupon_code}: {int(rate*100)}%")

    # Cap at 50%
    max_discount = subtotal * 0.50
    if discount > max_discount:
        discount = max_discount
        reasons.append("Capped at 50% maximum")

    return {
        "subtotal": subtotal,
        "discount": round(discount, 2),
        "total": round(subtotal - discount, 2),
        "reasons": reasons,
    }
Now here’s how you’d prompt AI:
Write a comprehensive test suite for this calculate_discount
function using pytest. Include:
1. Happy path tests for each discount type
2. Combination tests (member + coupon, member + bulk, all three)
3. Edge cases (zero subtotal, negative, empty string coupon,
invalid coupon, case sensitivity)
4. Boundary tests (exactly 10 items, 9 items for bulk threshold)
5. The 50% cap behavior
6. Type edge cases (None vs empty string for coupon)
Use descriptive test names that explain the scenario.
Follow AAA pattern (Arrange, Act, Assert).
[paste the function above]
Step 2: What AI Generates
The AI will produce something like this (abbreviated):
class TestCalculateDiscount:
    """Tests for the calculate_discount function."""

    # --- Happy Path ---

    def test_member_gets_10_percent_discount(self):
        result = calculate_discount(100.0, None, True, 1)
        assert result["discount"] == 10.0
        assert result["total"] == 90.0
        assert "Member discount: 10%" in result["reasons"]

    def test_bulk_order_gets_5_percent_for_10_plus_items(self):
        result = calculate_discount(200.0, None, False, 10)
        assert result["discount"] == 10.0
        assert "Bulk discount: 5%" in result["reasons"]

    def test_valid_coupon_applies_correct_rate(self):
        result = calculate_discount(100.0, "SAVE20", False, 1)
        assert result["discount"] == 20.0

    # --- Combinations ---

    def test_member_plus_coupon_stacks_discounts(self):
        result = calculate_discount(100.0, "SAVE20", True, 1)
        assert result["discount"] == 30.0  # 10% + 20%

    def test_all_discounts_combined_stays_under_cap(self):
        result = calculate_discount(100.0, "VIP30", True, 15)
        # assert result["discount"] == 50.0  # 10+5+30=45... wait
        # Actually 10% + 5% + 30% = 45%, under the 50% cap
        assert result["discount"] == 45.0

    # --- Edge Cases ---

    def test_zero_subtotal_returns_zero_discount(self):
        result = calculate_discount(0.0, "SAVE20", True, 10)
        assert result["discount"] == 0.0
        assert result["total"] == 0.0

    def test_invalid_coupon_code_ignored(self):
        result = calculate_discount(100.0, "BOGUS", False, 1)
        assert result["discount"] == 0.0

    def test_coupon_case_insensitive(self):
        result = calculate_discount(100.0, "save20", False, 1)
        assert result["discount"] == 20.0

    def test_empty_string_coupon_treated_as_no_coupon(self):
        result = calculate_discount(100.0, "", False, 1)
        assert result["discount"] == 0.0
Notice something interesting? The AI caught a potential bug in its own test, the "all discounts combined" case. It initially expected 50 (capped), then recalculated and realized 45% is actually under the cap; the stray expectation survives only as a comment. This kind of systematic thinking is exactly where AI-generated tests shine.
Step 3: Review the Generated Tests
Don’t blindly trust AI-generated tests. Check for these problems:
Tautological tests test that the code does what the code does, without testing that the code does the right thing:
# BAD: This just re-implements the function logic
def test_discount_calculation(self):
    subtotal = 100.0
    expected = subtotal * 0.10  # Just copying the source logic
    result = calculate_discount(subtotal, None, True, 1)
    assert result["discount"] == expected
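For contrast, a non-tautological version pins the expectation to a value you worked out by hand, independent of the source (a minimal sketch against the function above):

# GOOD: The expected value is computed by hand, not copied from the code
def test_member_discount_on_100_is_10(self):
    result = calculate_discount(100.0, None, True, 1)
    assert result["discount"] == 10.0  # 10% member discount on $100, calculated independently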
Missing edge cases the AI might not think of:
# Did AI test these?
def test_negative_subtotal(self):
    # Should this even be allowed? What's the expected behavior?
    result = calculate_discount(-50.0, None, True, 1)
    # This reveals a design question, not just a bug

def test_floating_point_precision(self):
    result = calculate_discount(33.33, "SAVE20", True, 1)
    # 33.33 * 0.30 = 9.999... Does rounding work correctly?
    assert result["discount"] == 10.0  # or 9.99?
Wrong assertions where the AI’s expected value is incorrect. Run every test and verify that failures are actual bugs, not wrong expectations.
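For example, a plausible-looking generated test can encode a wrong assumption. The expectation below is deliberately wrong, to show what you're looking for during review:

# WRONG EXPECTATION: the AI assumed the bulk threshold is 9 items
def test_bulk_discount_applies_at_nine_items(self):
    result = calculate_discount(100.0, None, False, 9)
    assert result["discount"] == 5.0  # fails: the real threshold is 10, so the discount is 0.0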
Quick Check: Spot the Problem
Here’s an AI-generated test. What’s wrong with it?
def test_discount_applied(self):
    result = calculate_discount(100.0, "SAVE20", False, 1)
    assert result is not None
    assert "discount" in result
    assert "total" in result
The problem: it tests that the function returns something but doesn’t test that the values are correct. This test would pass even if the discount calculation was completely wrong. Always assert on specific values.
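A stronger version of the same test asserts the actual numbers (sketched against the calculate_discount function above):

def test_save20_coupon_gives_20_percent_off(self):
    result = calculate_discount(100.0, "SAVE20", False, 1)
    # Specific values: a broken calculation cannot sneak past these
    assert result["discount"] == 20.0
    assert result["total"] == 80.0
    assert "Coupon SAVE20: 20%" in result["reasons"]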
Generating Integration Tests
Unit tests cover individual functions. Integration tests cover how pieces work together. AI is great at generating both—you just need to provide more context:
Write integration tests for our order checkout flow.
Here's how the pieces connect:
1. CartService.getItems() returns the cart items
2. DiscountService.calculate() applies discounts
3. PaymentService.charge() processes payment
4. OrderService.create() creates the order record
Test these scenarios:
- Successful checkout with discount
- Payment failure should not create an order
- Empty cart should reject early
- Expired coupon during checkout
Here's each service's interface:
[paste relevant interfaces]
Use pytest with unittest.mock for service mocking.
The AI generates tests that verify the interactions between services—that the right methods are called in the right order with the right parameters, and that failure in one step prevents subsequent steps.
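Here is roughly what one of those tests might look like. The checkout helper below is a hypothetical stand-in for whatever orchestrates the flow in your codebase, and the service names come from the prompt above; treat it as a sketch, not the generated output:

from unittest.mock import MagicMock

import pytest

def checkout(cart, discounts, payments, orders):
    # Hypothetical orchestrator for the flow described in the prompt
    items = cart.getItems()
    if not items:
        raise ValueError("empty cart")
    pricing = discounts.calculate(items)
    payments.charge(pricing["total"])  # assumed to raise on payment failure
    return orders.create(items, pricing)

def test_payment_failure_does_not_create_order():
    cart, discounts, payments, orders = MagicMock(), MagicMock(), MagicMock(), MagicMock()
    cart.getItems.return_value = [{"sku": "A1", "qty": 2}]
    discounts.calculate.return_value = {"total": 42.0}
    payments.charge.side_effect = RuntimeError("card declined")

    with pytest.raises(RuntimeError):
        checkout(cart, discounts, payments, orders)

    # The interaction that matters: no order record when the charge fails
    orders.create.assert_not_called()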
The Test Pyramid with AI
AI changes how you approach the test pyramid:
        /\
       /  \      E2E Tests
      /    \     (AI helps write, you design scenarios)
     /------\
    /        \     Integration Tests
   /          \    (AI generates from service interfaces)
  /------------\
 /              \     Unit Tests
/                \    (AI generates most of these for you)
------------------
Unit tests: Let AI generate 80%+ of these. The edge cases AI catches are worth their weight in gold. Review and supplement with domain-specific cases.
Integration tests: Co-create with AI. You define the scenarios and interactions; AI writes the test code and mocking setup.
E2E tests: You design the user flows; AI helps translate them into test code (Playwright, Cypress, etc.).
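As a rough illustration, a hand-designed flow might translate into a pytest-playwright test like the sketch below. The URL and selectors are invented placeholders, and it assumes the pytest-playwright plugin is installed to supply the page fixture:

from playwright.sync_api import Page, expect

def test_coupon_reduces_order_total(page: Page):
    page.goto("https://example.com/cart")   # placeholder URL
    page.fill("#coupon-code", "SAVE20")     # placeholder selectors
    page.click("#apply-coupon")
    expect(page.locator("#order-total")).to_contain_text("$80.00")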
Asking AI to Find What’s NOT Tested
One of the most powerful testing prompts:
Here's my function and its current test suite.
What behaviors, edge cases, or error conditions
are NOT currently tested?
[paste function]
[paste existing tests]
AI will identify gaps like:
- "No test for when items_count is 0"
- "No test verifying the reasons array order"
- "No test for concurrent discount calculations"
- "No test for extremely large subtotals (overflow?)"
This is like having a senior QA engineer review your test coverage.
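Closing two of those gaps might look like this, sketched against calculate_discount above. The zero-items and reasons-order behaviors are inferred from the code, so confirm they match what you actually intend:

def test_zero_items_count_gets_no_bulk_discount():
    result = calculate_discount(100.0, None, False, 0)
    assert result["discount"] == 0.0

def test_reasons_listed_in_member_bulk_coupon_order():
    result = calculate_discount(100.0, "WELCOME10", True, 10)
    # Reasons are appended in the order the rules are evaluated
    assert result["reasons"] == [
        "Member discount: 10%",
        "Bulk discount: 5%",
        "Coupon WELCOME10: 10%",
    ]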
Practical Exercise
Take a function you’ve written recently and try this workflow:
- Paste it into your AI assistant
- Ask for comprehensive tests including edge cases
- Review every generated test—do they test behavior or just implementation?
- Run the tests—do they all pass? If not, is it a bug in the code or the test?
- Ask AI: “What’s NOT tested?” and add those cases
You’ll likely discover at least one edge case you hadn’t considered.
Key Takeaways
- AI excels at identifying edge cases and boundary conditions humans miss
- Provide the function, its dependencies, and expected behaviors for best results
- Review generated tests for tautological assertions and wrong expected values
- Use AI to find gaps in existing test coverage
- Let AI handle the bulk of unit tests; co-create integration and E2E tests
- Always run generated tests and verify that failures indicate real bugs
Next up: code review. You’ve written and tested code—now let’s use AI as a tireless reviewer that catches issues before they reach production.