Lesson 7 12 min

Scheduling & Error Handling

Learn to run Python automation scripts on a schedule and make them production-ready with error handling, logging, retry logic, and monitoring.

🔄 Recall Bridge: In the previous lesson, you automated email and notifications — sending reports and alerts from your scripts. Now let’s make your scripts run automatically on a schedule and survive errors gracefully.

A script that works when you run it manually is a tool. A script that runs on a schedule, handles errors, and notifies you of problems is an automation system. This lesson bridges that gap.

Scheduling Options

| Tool | Platform | Best For |
| --- | --- | --- |
| cron | macOS/Linux | Simple, reliable, built-in |
| Task Scheduler | Windows | Windows native scheduling |
| schedule (Python library) | All platforms | Readable schedules in Python code |
| APScheduler | All platforms | Advanced scheduling with persistence |
| launchd | macOS | macOS-specific, more features than cron |

Option 1: cron (macOS/Linux)

AI prompt:

Generate cron expressions for these schedules: (1) Every weekday at 8 AM, (2) Every Monday at 9 AM, (3) Every 6 hours, (4) First day of every month at midnight. Show me the full crontab entry for running a Python script at each schedule, including the full path to python3 and the script, and redirecting output to a log file.

Common cron patterns:

| Schedule | Cron Expression |
| --- | --- |
| Every day at 8 AM | `0 8 * * *` |
| Weekdays at 8 AM | `0 8 * * 1-5` |
| Every Monday at 9 AM | `0 9 * * 1` |
| Every 6 hours | `0 */6 * * *` |
| 1st of month at midnight | `0 0 1 * *` |

Crontab entry:

# Edit crontab
crontab -e

# Entry format: minute hour day month weekday command
0 8 * * 1-5 /usr/bin/python3 /home/user/scripts/daily_report.py >> /home/user/logs/daily_report.log 2>&1
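
To make the five-field format concrete, here is a minimal Python sketch that checks whether a given moment matches a cron expression. The function names `field_matches` and `cron_matches` are made up for this illustration. It is deliberately simplified: it treats `*/n` as "divisible by n" (accurate for the minute and hour fields used above), does not accept 7 for Sunday, and requires both the day-of-month and weekday fields to match, whereas real cron implementations handle these cases in more detail.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if one cron field ('*', '*/6', '1-5', '8', '1,15') accepts value."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Simplification: a step matches when the value is divisible by it.
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Check a 'minute hour day month weekday' expression against a datetime."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0; isoweekday() counts Monday=1 ... Sunday=7.
    cron_weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )
```

For example, `cron_matches("0 8 * * 1-5", datetime(2024, 1, 1, 8, 0))` is true because 1 January 2024 was a Monday, while the same expression does not match the following Saturday.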

Option 2: Python schedule Library

pip install schedule

AI prompt:

Write a Python scheduler script using the schedule library that: (1) Runs daily_report() every weekday at 8 AM, (2) Runs weekly_summary() every Friday at 5 PM, (3) Runs price_check() every 6 hours, (4) Logs when each task starts and finishes, (5) Catches and logs errors without crashing the scheduler, (6) Sends an alert if any task fails. Keep the scheduler running indefinitely.
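
The kind of script this prompt asks for might look roughly like the sketch below. It is one possible shape, not a definitive implementation. The task bodies are placeholders, and `send_alert`, `safe_run`, `register_jobs`, and `run_forever` are names invented for this example. The `schedule` library has no built-in "weekdays" shortcut, so the sketch registers the daily report once per weekday.

```python
import functools
import logging
import time

logger = logging.getLogger("scheduler")


def send_alert(message):
    """Hypothetical alert hook; swap in the email code from the previous lesson."""
    logger.critical("ALERT: %s", message)


def safe_run(task, alert=send_alert):
    """Wrap a task so it logs start and finish and never crashes the scheduler."""
    @functools.wraps(task)
    def wrapper():
        logger.info("Starting %s", task.__name__)
        try:
            task()
        except Exception:
            # logger.exception records the full traceback.
            logger.exception("Task %s failed", task.__name__)
            alert(f"{task.__name__} failed; see the log for the traceback")
            return False
        logger.info("Finished %s", task.__name__)
        return True
    return wrapper


def daily_report():
    """Placeholder task body."""


def weekly_summary():
    """Placeholder task body."""


def price_check():
    """Placeholder task body."""


def register_jobs():
    """Register all jobs with the third-party schedule library."""
    import schedule  # pip install schedule

    for day in ("monday", "tuesday", "wednesday", "thursday", "friday"):
        getattr(schedule.every(), day).at("08:00").do(safe_run(daily_report))
    schedule.every().friday.at("17:00").do(safe_run(weekly_summary))
    schedule.every(6).hours.do(safe_run(price_check))
    return schedule


def run_forever(poll_seconds=30):
    """Keep the scheduler alive, checking for due jobs every poll_seconds."""
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    scheduler = register_jobs()
    while True:
        scheduler.run_pending()
        time.sleep(poll_seconds)
```

Calling `run_forever()` at the bottom of the file starts the loop. Because every job goes through `safe_run`, one failing task is logged and reported without stopping the others.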

Production-Ready Error Handling

The error handling hierarchy:

| Level | Handles | Example |
| --- | --- | --- |
| Try/except per operation | Individual failures | One file fails, others continue |
| Retry with backoff | Temporary failures | Network timeout → retry in 30s |
| Failure notification | Exhausted retries | Email alert: “Script failed after 3 retries” |
| Heartbeat monitoring | Silent failures | “Script didn’t report success by 8:15 AM” |

AI prompt for robust error handling:

Add production-ready error handling to my automation script: (1) Wrap each major operation in try/except with specific exception types (not bare except), (2) Add retry logic for network operations: 3 retries with exponential backoff (30s, 60s, 120s), (3) Log all errors with full traceback to a rotating log file (max 10MB, keep 5 rotations), (4) If the script fails completely, send an alert email with the error details, (5) If the script succeeds, write a success marker file (heartbeat) with timestamp.
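
The retry step in this prompt can be sketched with the standard library alone. The helper name `retry_with_backoff` and its parameters are assumptions made for this example. The default exception types are Python's built-in `ConnectionError` and `TimeoutError`; a script using `requests` would pass its own exception classes instead. The `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry_with_backoff(operation, retries=3, base_delay=30,
                       exceptions=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call operation(); on a listed exception, wait and retry with doubling delays.

    With the defaults, the waits are 30s, 60s, and 120s. After the last retry
    fails, the exception is re-raised so the caller can send an alert.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except exceptions as error:
            if attempt == retries:
                logger.error("Giving up after %d retries", retries, exc_info=True)
                raise
            delay = base_delay * 2 ** attempt
            logger.warning("Attempt %d failed (%s); retrying in %ss",
                           attempt + 1, error, delay)
            sleep(delay)
```

Only the listed exception types are retried. Anything else, such as a `ValueError` from bad input, propagates immediately, since waiting will not fix it.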

Logging Setup

AI prompt for logging configuration:

Set up Python logging for my automation script: (1) Log to both console and file, (2) Console shows INFO and above, file shows DEBUG and above, (3) Log format includes timestamp, level, function name, and message, (4) Rotate log files daily, keep 30 days of logs, (5) Create a reusable setup_logging() function I can import into all my scripts.
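
A reusable `setup_logging()` along these lines might look like the sketch below. The `name` and `log_dir` parameters and the exact format string are choices made for this example. Daily rotation with 30 days of history uses `TimedRotatingFileHandler` from the standard library.

```python
import logging
import sys
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def setup_logging(name, log_dir="logs"):
    """Return a logger that writes INFO+ to the console and DEBUG+ to a daily file."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)-8s %(funcName)s: %(message)s"
    )

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)

    # Rotate at midnight and keep 30 old files.
    file_handler = TimedRotatingFileHandler(
        Path(log_dir) / f"{name}.log",
        when="midnight",
        backupCount=30,
        encoding="utf-8",
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

A script would then start with something like `log = setup_logging("daily_report")` and call `log.info(...)` or `log.debug(...)` instead of `print()`.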

Log level guidelines:

| Level | Use For | Example |
| --- | --- | --- |
| DEBUG | Detailed troubleshooting info | “Processing row 142 of 5000” |
| INFO | Normal operation milestones | “Report generated: 500 rows, saved to output.xlsx” |
| WARNING | Something unexpected but handled | “3 rows had missing emails, filled with default” |
| ERROR | Operation failed but script continues | “Failed to fetch page 15, skipping” |
| CRITICAL | Script cannot continue | “Database connection failed after 3 retries” |

Monitoring Your Automations

Simple heartbeat monitoring script:

AI prompt:

Write a monitoring script that checks if my automation scripts ran successfully: (1) Each script writes a “heartbeat” file after success: {script_name}_heartbeat.json with {"last_success": timestamp, "records_processed": count}, (2) The monitor checks all heartbeat files and alerts if any script’s last success is older than its expected schedule (daily scripts → alert if > 25 hours, hourly scripts → alert if > 90 minutes), (3) Generate a daily status summary: which scripts ran, when, how many records processed, any failures. Run this monitor every 30 minutes.
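
The core of such a monitor fits in two small functions. The sketch below is a minimal version under stated assumptions: `write_heartbeat` and `find_stale` are names invented here, timestamps are stored as UTC ISO 8601 strings, and a missing heartbeat file is treated as stale. Alerting and the daily summary are left out.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def write_heartbeat(directory, script_name, records_processed):
    """Called by a script after a successful run."""
    payload = {
        "last_success": datetime.now(timezone.utc).isoformat(),
        "records_processed": records_processed,
    }
    path = Path(directory) / f"{script_name}_heartbeat.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def find_stale(directory, max_age_by_script, now=None):
    """Return names of scripts whose heartbeat is missing or older than allowed."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for script_name, max_age in max_age_by_script.items():
        path = Path(directory) / f"{script_name}_heartbeat.json"
        if not path.exists():
            stale.append(script_name)
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        last_success = datetime.fromisoformat(data["last_success"])
        if now - last_success > max_age:
            stale.append(script_name)
    return stale
```

The monitor would call `find_stale` with limits such as `{"daily_report": timedelta(hours=25), "price_check": timedelta(minutes=90)}` and send an alert for each name returned.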

Quick Check: Your script uses a bare `except:` clause. Why is this a problem?

Answer: A bare `except:` catches everything, including exceptions whose purpose is to stop the script: KeyboardInterrupt (Ctrl+C) and SystemExit (`sys.exit()`). Swallowing them makes the script impossible to stop gracefully. Always catch specific exceptions, for example `except (requests.RequestException, ValueError) as e:`. At minimum, use `except Exception as e:`, which excludes KeyboardInterrupt and SystemExit. (MemoryError is an ordinary Exception subclass, so both forms catch it.)
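
A short sketch of the difference, using a hypothetical helper named `run_step`: ordinary errors are logged and skipped, while Ctrl+C and `sys.exit()` still pass through because they are not among the exception types caught.

```python
import logging

logger = logging.getLogger(__name__)


def run_step(step):
    """Run one step; log expected errors, but let Ctrl+C and sys.exit() through."""
    try:
        return step()
    except (ValueError, OSError) as error:
        # Specific, expected failures: log and continue with the next step.
        logger.error("Step failed, skipping: %s", error)
        return None
```

If `step` raises KeyboardInterrupt or SystemExit, `run_step` does not intercept it, so the script stops as the user or the caller intended.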

Key Takeaways

  • Production automation needs three layers of protection: retry logic for temporary failures (many network timeouts resolve on their own), failure notifications for exhausted retries (you learn about problems promptly), and heartbeat monitoring for silent failures (the script never ran at all). Without all three, you may discover failures hours or days late.
  • Use Python’s logging module instead of print() for automation scripts. Logging provides the timestamps, severity levels, file output, and rotation you need to debug failures in scripts running unattended at 3 AM. A basic setup takes only a few lines and can save hours of investigation.
  • Centralize scheduling and error handling as your automation grows. One scheduler script with consistent logging, retries, and alerts across all tasks is more maintainable than five independent scripts, each with its own ad-hoc error handling.

Up Next

In the final lesson, you’ll build your personalized automation toolkit — identifying your highest-value automation opportunities, creating your script portfolio, and establishing a maintenance routine.

Knowledge Check

1. Your script runs every morning at 8 AM via cron. One day it fails because the target website is temporarily down (503 error). The script logs the error and exits. You don't notice until 3 PM because you weren't checking the logs. How do you fix this?

2. Your automation script uses `print()` statements throughout to show progress and errors. A colleague says you should use Python's `logging` module instead. Is this a meaningful difference or just a style preference?

3. You have 5 automation scripts that each run on different schedules. You're managing them with 5 separate cron entries. This is becoming hard to track. What's a better approach?
