Lesson 5 · 12 min

API Integration

Learn to connect Python scripts to REST APIs — authentication, pagination, error handling, and building reusable API wrappers with AI assistance.

🔄 Recall Bridge: In the previous lesson, you learned web scraping — extracting data from HTML pages. APIs are the structured, reliable alternative: instead of parsing HTML, you get clean JSON data directly from the service.

APIs are the backbone of modern automation. Instead of scraping a weather website, call the weather API. Instead of screen-scraping your project management tool, use its API. APIs give you structured data, stable interfaces, and explicit permission.

REST API Basics

pip install requests python-dotenv

The four HTTP methods you’ll use:

| Method | Purpose          | Example                                |
|--------|------------------|----------------------------------------|
| GET    | Retrieve data    | Get weather forecast, list users       |
| POST   | Send/create data | Create a task, submit a form           |
| PUT    | Update data      | Update user profile, modify settings   |
| DELETE | Remove data      | Delete a record, cancel a subscription |

Core requests patterns:

import requests

# GET with parameters (api_key is loaded from the environment — see below)
response = requests.get(
    "https://api.example.com/data",
    params={"city": "Tokyo", "units": "metric"},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10,  # always set a timeout so a hung server can't stall your script
)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
data = response.json()

# POST with JSON body
response = requests.post(
    "https://api.example.com/items",
    json={"name": "New Item", "quantity": 5},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10,
)

Script 1: API Data Fetcher

AI prompt:

Write a Python script that fetches data from a REST API: (1) Read the API key from environment variables using python-dotenv, (2) Make GET requests with proper headers and parameters, (3) Handle common HTTP errors: 401 (unauthorized), 403 (forbidden), 404 (not found), 429 (rate limited), 500 (server error), (4) Parse the JSON response and save to CSV, (5) Add retry logic: retry failed requests up to 3 times with exponential backoff (1s, 2s, 4s). Include a .env.example file listing required environment variables.
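The retry-with-backoff part of that prompt can be sketched like this — a minimal version assuming a hypothetical endpoint and the WEATHER_API_KEY variable from later in this lesson; the AI-generated script will flesh out the CSV step:

```python
import os
import time
import requests

RETRYABLE = {429, 500, 502, 503}  # transient errors worth retrying

def backoff_delay(attempt):
    """Exponential backoff: 1s, 2s, 4s for attempts 0, 1, 2."""
    return 2 ** attempt

def fetch_json(url, params=None, max_retries=3):
    headers = {"Authorization": f"Bearer {os.environ.get('WEATHER_API_KEY', '')}"}
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, params=params, headers=headers, timeout=10)
        except requests.RequestException:  # network error or timeout
            time.sleep(backoff_delay(attempt))
            continue
        if resp.status_code in RETRYABLE:
            time.sleep(backoff_delay(attempt))
            continue
        resp.raise_for_status()  # 401/403/404 fail immediately with a clear error
        return resp.json()
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")
```

Note the split: 401/403/404 are *not* retried — the same request will fail the same way — while 429 and 5xx are temporary and worth waiting out.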

Script 2: Paginated API Consumer

AI prompt:

Write a script that consumes a paginated REST API: (1) Start at the first page, (2) Follow pagination: the API returns a “next_page_token” field in each response — pass it as a query parameter to get the next page, (3) Collect all items across all pages into a single list, (4) Stop when there’s no “next_page_token” in the response, (5) Respect rate limits: maximum 60 requests per minute, (6) Save progress after each page (resume-safe if the script crashes), (7) Print progress: “Page 5 — 500 items collected so far”. Return the complete dataset as a pandas DataFrame.
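The core pagination loop from that prompt looks roughly like this — a sketch that takes the page-fetching function as a parameter (the "next_page_token" field name comes from the prompt; everything else is hypothetical):

```python
def collect_all_pages(fetch_page):
    """Collect items from every page of a paginated API.

    fetch_page(token) -> dict with an "items" list and an optional
    "next_page_token"; pass token=None for the first page.
    """
    items, token, page = [], None, 0
    while True:
        response = fetch_page(token)
        items.extend(response["items"])
        page += 1
        print(f"Page {page} — {len(items)} items collected so far")
        token = response.get("next_page_token")
        if not token:  # no token means this was the last page
            return items
```

Passing `fetch_page` in as a function keeps the loop testable without a network, and it is where you would add the rate limiting and save-after-each-page logic the prompt asks for.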

Authentication Patterns

| Auth Type        | How It Works           | requests Code                                  |
|------------------|------------------------|------------------------------------------------|
| API Key (header) | Key in request header  | `headers={"X-API-Key": key}`                   |
| Bearer Token     | OAuth-style token      | `headers={"Authorization": f"Bearer {token}"}` |
| API Key (query)  | Key in URL parameters  | `params={"api_key": key}`                      |
| Basic Auth       | Username + password    | `auth=("username", "password")`                |
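Rather than repeating the auth header on every call, you can attach it once to a requests Session, which also reuses the underlying connection. A minimal sketch:

```python
import requests

def make_api_session(token):
    """Return a Session that sends the Bearer token on every request."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    return session

# Usage: every request through this session is authenticated automatically.
# session = make_api_session(os.environ["GITHUB_TOKEN"])
# response = session.get("https://api.example.com/data", timeout=10)
```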

Environment Variable Security

Create .env file (add to .gitignore):

WEATHER_API_KEY=your-key-here
GITHUB_TOKEN=ghp_xxxxxxxxxxxx

Create .env.example (commit this — shows required variables without values):

WEATHER_API_KEY=
GITHUB_TOKEN=

Load in your script:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.environ.get("WEATHER_API_KEY")
if not api_key:
    raise ValueError("WEATHER_API_KEY not set in .env file")

Script 3: Multi-API Data Pipeline

AI prompt:

Write a Python script that combines data from two APIs: (1) Fetch a list of cities from API_1, (2) For each city, fetch weather data from API_2, (3) Merge the results into a single dataset with columns from both APIs, (4) Handle: one API being down (use cached data if available), rate limits on both APIs (different limits), and missing data (some cities may not have weather data), (5) Save the combined result as CSV and Excel. This demonstrates the common pattern of orchestrating multiple API calls.
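The merge step in that prompt — combining results from both APIs while tolerating cities with no weather data — can be sketched with a pandas left join (column names here are hypothetical):

```python
import pandas as pd

def merge_city_weather(cities, weather):
    """Left-join weather records onto the city list.

    Cities with no matching weather record keep NaN in the weather
    columns instead of being dropped, so missing data is visible.
    """
    cities_df = pd.DataFrame(cities)
    weather_df = pd.DataFrame(weather)
    return cities_df.merge(weather_df, on="city", how="left")
```

The `how="left"` choice is the point: an inner join would silently drop every city API_2 didn't return, which is exactly the missing-data case the prompt asks you to handle.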

Quick Check: An API returns this error: {"error": "rate_limit_exceeded", "retry_after": 30}. What should your script do? (Answer: Wait the specified 30 seconds before retrying. Many APIs include a retry_after field or a Retry-After HTTP header telling you exactly how long to wait. Your error handling should check for this value and use it instead of a fixed backoff. AI prompt: “Add retry_after handling to my API error logic — check both the JSON response body and HTTP headers for retry timing.”)
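A sketch of that retry_after logic — check the Retry-After header first, then the JSON body, then fall back to exponential backoff (the argument is assumed to be a requests-style response object):

```python
def retry_delay(response, attempt):
    """Prefer the server-specified wait; fall back to exponential backoff."""
    header = response.headers.get("Retry-After")
    if header and header.isdigit():  # Retry-After can also be an HTTP date
        return int(header)
    try:
        body = response.json()
        if "retry_after" in body:
            return int(body["retry_after"])
    except ValueError:  # response body was not valid JSON
        pass
    return 2 ** attempt  # no hint from the server: 1s, 2s, 4s
```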

Error Handling for API Scripts

| Error         | Status Code | Your Script Should                        |
|---------------|-------------|-------------------------------------------|
| Rate limited  | 429         | Wait and retry (check Retry-After header) |
| Unauthorized  | 401         | Check API key, raise clear error          |
| Not found     | 404         | Log and skip this resource                |
| Server error  | 500-503     | Retry with backoff (temporary issue)      |
| Timeout       | —           | Retry with longer timeout                 |
| Network error | —           | Retry, then fail with clear message       |

Key Takeaways

  • Never hardcode API keys in source code — use environment variables with python-dotenv, create a .env file (added to .gitignore) for your keys and a .env.example (committed) showing required variables; bots scan GitHub for leaked keys within minutes of accidental pushes
  • Implement smart rate limiting: track request timestamps to maximize throughput within limits, and use exponential backoff (1s, 2s, 4s) with retry logic for 429 errors — every API has limits, and your script must respect them automatically
  • APIs are more reliable than web scraping because they provide structured JSON data, versioned interfaces, and explicit permission — always check if a site has an API before building a scraper

Up Next

In the next lesson, you’ll automate email and notifications — sending scheduled reports, alerts, and status updates from your Python scripts.

Knowledge Check

1. You're writing a script that calls a weather API to get forecasts for 50 cities. The API allows 60 requests per minute. Your script sends all 50 requests in 3 seconds and works fine. But when you add 20 more cities (70 total), the API returns '429 Too Many Requests' errors. How do you fix this?

2. Your script stores the API key directly in the code: `api_key = 'sk-abc123...'`. A colleague says this is a security risk. Why, and what's the fix?

3. An API returns data in pages: each response includes 100 items and a 'next_page_token'. You need all 5,000 items. Your script collects all pages correctly but the API returns slightly different data if you re-run the script 10 minutes later (some items added, some removed). How do you handle this?
