Lesson 5 · 12 min

API Integration

Learn to connect Python scripts to REST APIs — authentication, pagination, error handling, and building reusable API wrappers with AI assistance.

🔄 Recall Bridge: In the previous lesson, you learned web scraping — extracting data from HTML pages. APIs are the structured, reliable alternative: instead of parsing HTML, you get clean JSON data directly from the service.

APIs are the backbone of modern automation. Instead of scraping a weather website, call the weather API. Instead of screen-scraping your project management tool, use its API. APIs give you structured data, stable interfaces, and explicit permission.

REST API Basics

pip install requests python-dotenv

The four HTTP methods you’ll use:

| Method | Purpose          | Example                                |
|--------|------------------|----------------------------------------|
| GET    | Retrieve data    | Get weather forecast, list users       |
| POST   | Send/create data | Create a task, submit a form           |
| PUT    | Update data      | Update user profile, modify settings   |
| DELETE | Remove data      | Delete a record, cancel a subscription |

Core requests patterns:

import requests

# GET with parameters (api_key is loaded from the environment — see below)
response = requests.get(
    "https://api.example.com/data",
    params={"city": "Tokyo", "units": "metric"},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10,  # always set a timeout so a hung server can't stall your script
)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
data = response.json()

# POST with JSON body
response = requests.post(
    "https://api.example.com/items",
    json={"name": "New Item", "quantity": 5},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10,
)

Script 1: API Data Fetcher

AI prompt:

Write a Python script that fetches data from a REST API: (1) Read the API key from environment variables using python-dotenv, (2) Make GET requests with proper headers and parameters, (3) Handle common HTTP errors: 401 (unauthorized), 403 (forbidden), 404 (not found), 429 (rate limited), 500 (server error), (4) Parse the JSON response and save to CSV, (5) Add retry logic: retry failed requests up to 3 times with exponential backoff (1s, 2s, 4s). Include a .env.example file listing required environment variables.
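The retry-with-backoff part of that prompt can be sketched like this — a minimal version assuming a hypothetical endpoint and the WEATHER_API_KEY variable from later in this lesson; the AI-generated script will flesh out the CSV step:

```python
import os
import time
import requests

RETRYABLE = {429, 500, 502, 503}  # transient errors worth retrying

def backoff_delay(attempt):
    """Exponential backoff: 1s, 2s, 4s for attempts 0, 1, 2."""
    return 2 ** attempt

def fetch_json(url, params=None, max_retries=3):
    headers = {"Authorization": f"Bearer {os.environ.get('WEATHER_API_KEY', '')}"}
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, params=params, headers=headers, timeout=10)
        except requests.RequestException:  # network error or timeout
            time.sleep(backoff_delay(attempt))
            continue
        if resp.status_code in RETRYABLE:
            time.sleep(backoff_delay(attempt))
            continue
        resp.raise_for_status()  # 401/403/404 fail immediately with a clear error
        return resp.json()
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")
```

Note the split: 401/403/404 are *not* retried — the same request will fail the same way — while 429 and 5xx are temporary and worth waiting out.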

Script 2: Paginated API Consumer

AI prompt:

Write a script that consumes a paginated REST API: (1) Start at the first page, (2) Follow pagination: the API returns a “next_page_token” field in each response — pass it as a query parameter to get the next page, (3) Collect all items across all pages into a single list, (4) Stop when there’s no “next_page_token” in the response, (5) Respect rate limits: maximum 60 requests per minute, (6) Save progress after each page (resume-safe if the script crashes), (7) Print progress: “Page 5 — 500 items collected so far”. Return the complete dataset as a pandas DataFrame.
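The core pagination loop from that prompt looks roughly like this — a sketch that takes the page-fetching function as a parameter (the "next_page_token" field name comes from the prompt; everything else is hypothetical):

```python
def collect_all_pages(fetch_page):
    """Collect items from every page of a paginated API.

    fetch_page(token) -> dict with an "items" list and an optional
    "next_page_token"; pass token=None for the first page.
    """
    items, token, page = [], None, 0
    while True:
        response = fetch_page(token)
        items.extend(response["items"])
        page += 1
        print(f"Page {page} — {len(items)} items collected so far")
        token = response.get("next_page_token")
        if not token:  # no token means this was the last page
            return items
```

Passing `fetch_page` in as a function keeps the loop testable without a network, and it is where you would add the rate limiting and save-after-each-page logic the prompt asks for.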

Authentication Patterns

| Auth Type        | How It Works           | requests Code                                  |
|------------------|------------------------|------------------------------------------------|
| API Key (header) | Key in request header  | `headers={"X-API-Key": key}`                   |
| Bearer Token     | OAuth-style token      | `headers={"Authorization": f"Bearer {token}"}` |
| API Key (query)  | Key in URL parameters  | `params={"api_key": key}`                      |
| Basic Auth       | Username + password    | `auth=("username", "password")`                |
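Rather than repeating the auth header on every call, you can attach it once to a requests Session, which also reuses the underlying connection. A minimal sketch:

```python
import requests

def make_api_session(token):
    """Return a Session that sends the Bearer token on every request."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    return session

# Usage: every request through this session is authenticated automatically.
# session = make_api_session(os.environ["GITHUB_TOKEN"])
# response = session.get("https://api.example.com/data", timeout=10)
```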

Environment Variable Security

Create .env file (add to .gitignore):

WEATHER_API_KEY=your-key-here
GITHUB_TOKEN=ghp_xxxxxxxxxxxx

Create .env.example (commit this — shows required variables without values):

WEATHER_API_KEY=
GITHUB_TOKEN=

Load in your script:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.environ.get("WEATHER_API_KEY")
if not api_key:
    raise ValueError("WEATHER_API_KEY not set in .env file")

Script 3: Multi-API Data Pipeline

AI prompt:

Write a Python script that combines data from two APIs: (1) Fetch a list of cities from API_1, (2) For each city, fetch weather data from API_2, (3) Merge the results into a single dataset with columns from both APIs, (4) Handle: one API being down (use cached data if available), rate limits on both APIs (different limits), and missing data (some cities may not have weather data), (5) Save the combined result as CSV and Excel. This demonstrates the common pattern of orchestrating multiple API calls.
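The merge step in that prompt — combining results from both APIs while tolerating cities with no weather data — can be sketched with a pandas left join (column names here are hypothetical):

```python
import pandas as pd

def merge_city_weather(cities, weather):
    """Left-join weather records onto the city list.

    Cities with no matching weather record keep NaN in the weather
    columns instead of being dropped, so missing data is visible.
    """
    cities_df = pd.DataFrame(cities)
    weather_df = pd.DataFrame(weather)
    return cities_df.merge(weather_df, on="city", how="left")
```

The `how="left"` choice is the point: an inner join would silently drop every city API_2 didn't return, which is exactly the missing-data case the prompt asks you to handle.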

Quick Check: An API returns this error: {"error": "rate_limit_exceeded", "retry_after": 30}. What should your script do? (Answer: Wait the specified 30 seconds before retrying. Many APIs include a retry_after field or a Retry-After HTTP header telling you exactly how long to wait. Your error handling should check for this value and use it instead of a fixed backoff. AI prompt: “Add retry_after handling to my API error logic — check both the JSON response body and HTTP headers for retry timing.”)
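A sketch of that retry_after logic — check the Retry-After header first, then the JSON body, then fall back to exponential backoff (the argument is assumed to be a requests-style response object):

```python
def retry_delay(response, attempt):
    """Prefer the server-specified wait; fall back to exponential backoff."""
    header = response.headers.get("Retry-After")
    if header and header.isdigit():  # Retry-After can also be an HTTP date
        return int(header)
    try:
        body = response.json()
        if "retry_after" in body:
            return int(body["retry_after"])
    except ValueError:  # response body was not valid JSON
        pass
    return 2 ** attempt  # no hint from the server: 1s, 2s, 4s
```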

Error Handling for API Scripts

| Error         | Status Code | Your Script Should                        |
|---------------|-------------|-------------------------------------------|
| Rate limited  | 429         | Wait and retry (check Retry-After header) |
| Unauthorized  | 401         | Check API key, raise clear error          |
| Not found     | 404         | Log and skip this resource                |
| Server error  | 500-503     | Retry with backoff (temporary issue)      |
| Timeout       | —           | Retry with longer timeout                 |
| Network error | —           | Retry, then fail with clear message       |

Key Takeaways

  • Never hardcode API keys in source code — use environment variables with python-dotenv, create a .env file (added to .gitignore) for your keys and a .env.example (committed) showing required variables; bots scan GitHub for leaked keys within minutes of accidental pushes
  • Implement smart rate limiting: track request timestamps to maximize throughput within limits, and use exponential backoff (1s, 2s, 4s) with retry logic for 429 errors — every API has limits, and your script must respect them automatically
  • APIs are more reliable than web scraping because they provide structured JSON data, versioned interfaces, and explicit permission — always check if a site has an API before building a scraper

Up Next

In the next lesson, you’ll automate email and notifications — sending scheduled reports, alerts, and status updates from your Python scripts.

Knowledge Check

1. You're writing a script that calls a weather API to get forecasts for 50 cities. The API allows 60 requests per minute. Your script sends all 50 requests in 3 seconds and works fine. But when you add 20 more cities (70 total), the API returns '429 Too Many Requests' errors. How do you fix this?

2. Your script stores the API key directly in the code: `api_key = 'sk-abc123...'`. A colleague says this is a security risk. Why, and what's the fix?

3. An API returns data in pages: each response includes 100 items and a 'next_page_token'. You need all 5,000 items. Your script collects all pages correctly but the API returns slightly different data if you re-run the script 10 minutes later (some items added, some removed). How do you handle this?
