Algorithms That Matter
The ML algorithms you need to know — linear regression, decision trees, random forests, neural networks, and K-means. What each does and when to use it.
The Algorithm Toolkit
🔄 Lesson 2 introduced the three types of ML. Now let’s look at the specific algorithms that power each type. You don’t need to understand the math behind every algorithm — you need to know what each one does, when it works, and when it doesn’t.
Supervised Learning Algorithms
Linear Regression
The simplest ML algorithm. Finds the straight line (or, with multiple features, the flat plane) that best fits your data.
What it does: Predicts a continuous number based on input features.
Example: Predict house price from square footage. The algorithm finds the line: price = $200 × square_feet + $50,000. Each additional square foot adds ~$200 to the predicted price.
When it works: Relationships that are roughly linear, small-to-medium datasets, when you need interpretable results (each feature gets a clear coefficient).
When it fails: Complex, non-linear relationships. If price increases exponentially with square footage rather than linearly, linear regression misses the curve.
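The fitting itself is just least squares. A minimal sketch in pure Python, using toy housing numbers generated from the example's line (price = $200 × square_feet + $50,000), so the recovered slope and intercept match the example:

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated exactly on the line price = 200 * sqft + 50,000
sqft = [800, 1000, 1500, 2000, 2500]
price = [200 * s + 50_000 for s in sqft]

slope, intercept = fit_line(sqft, price)
print(round(slope), round(intercept))  # 200 50000
```

Each coefficient is directly readable ("each square foot adds ~$200"), which is the interpretability advantage mentioned above.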
Decision Trees
A flowchart-like structure that makes decisions by asking a sequence of questions about the data.
What it does: Splits data based on feature values, creating a tree of decisions that leads to a prediction.
Example: Will this customer churn?
```
Is contract month-to-month?
├── YES → Is tenure < 12 months?
│   ├── YES → HIGH RISK (83% churn)
│   └── NO  → MEDIUM RISK (45% churn)
└── NO  → LOW RISK (12% churn)
```
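The churn flowchart maps directly onto nested conditionals. A sketch (the feature names, the 12-month threshold, and the churn percentages are the illustrative values from the example, not real model output):

```python
def churn_risk(contract_month_to_month: bool, tenure_months: int) -> str:
    # Each branch mirrors one path through the decision tree above.
    if contract_month_to_month:
        if tenure_months < 12:
            return "HIGH RISK"    # 83% churn in the example
        return "MEDIUM RISK"      # 45% churn
    return "LOW RISK"             # 12% churn

print(churn_risk(True, 6))     # HIGH RISK
print(churn_risk(False, 36))   # LOW RISK
```

This is why trees are so easy to verify: the model literally is a set of if/else rules a stakeholder can read.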
Strength: Completely interpretable — you can trace every prediction back through the decision path. Non-technical stakeholders can understand and verify the logic.
Weakness: Prone to overfitting — the tree can become so complex it memorizes the training data rather than learning general patterns. A tree with 100 levels will perfectly fit training data and fail on new data.
✅ Quick Check: A decision tree for loan approval has 200 levels and perfectly classifies every training example. Is this good? No — it’s overfitting. A 200-level tree has memorized the training data, including its noise and quirks. It will perform terribly on new loan applications. The fix: limit tree depth (5-20 levels is usually enough), require minimum samples per leaf, or prune unnecessary branches. Simpler trees generalize better.
Random Forests
An ensemble of many decision trees that vote on the final prediction.
What it does: Builds hundreds of decision trees, each trained on a random subset of data and features. The final prediction is the majority vote (classification) or average (regression) across all trees.
Why it’s better than a single tree: Individual trees overfit. But if you build 500 trees, each with slightly different training data and features, their errors cancel out. The ensemble is more accurate and more stable than any individual tree.
The trade-off: Less interpretable than a single decision tree (500 trees voting is hard to trace), but significantly more accurate. For most structured data problems, random forests are the default starting point.
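The bootstrap-and-vote mechanism can be sketched in a few lines. To keep it self-contained, the "trees" here are one-level decision stumps (a single threshold test on one feature) rather than full trees, and the data is a toy one-feature dataset; a real random forest also samples features at each split:

```python
import random
from collections import Counter

def train_stump(points, labels):
    """Pick the threshold that best separates the two classes."""
    best = None
    for t in sorted(set(points)):
        preds = [1 if x >= t else 0 for x in points]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if best is None or acc > best[1]:
            best = (t, acc)
    t = best[0]
    return lambda x: 1 if x >= t else 0

def bootstrap(points, labels, rng):
    """Resample the training set with replacement."""
    idx = [rng.randrange(len(points)) for _ in points]
    return [points[i] for i in idx], [labels[i] for i in idx]

rng = random.Random(0)
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]          # class 1 iff x is "large"

# Train many stumps, each on its own bootstrap sample
stumps = []
for _ in range(25):
    bx, by = bootstrap(xs, ys, rng)
    stumps.append(train_stump(bx, by))

def forest_predict(x):
    votes = Counter(s(x) for s in stumps)
    return votes.most_common(1)[0][0]   # majority vote

print(forest_predict(2), forest_predict(12))
```

Any single stump can be skewed by an unlucky resample, but the majority vote washes those errors out, which is the point made above.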
Neural Networks
Loosely inspired by how biological neurons work. Layers of connected nodes that transform inputs into outputs.
What they do: Learn complex, non-linear patterns from large amounts of data. Especially powerful for images, text, and audio — unstructured data where traditional algorithms struggle.
Key insight: Neural networks learn their own features. With a traditional algorithm, you’d manually define features (edge detection for images, word frequency for text). Neural networks discover the relevant features automatically from raw data. This is why they dominate computer vision, NLP, and speech recognition.
The trade-off: Need much more data and compute than traditional algorithms. Harder to train (require careful architecture design and hyperparameter tuning). Less interpretable — often called “black boxes” because you can’t easily explain why they made a specific prediction.
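To see why layers plus a non-linearity matter, here is a tiny two-layer network computing XOR, a pattern no single linear model can fit. The weights are hand-set for illustration; a trained network would learn them from data:

```python
def relu(z):
    """Non-linear activation: pass positives through, zero out negatives."""
    return max(0.0, z)

def xor_net(x1, x2):
    # Hidden layer: two neurons, each a weighted sum passed through ReLU
    h1 = relu(x1 + x2)          # fires if either input is on
    h2 = relu(x1 + x2 - 1)      # fires only if both inputs are on
    # Output layer: linear combination of the hidden activations
    return h1 - 2 * h2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))   # 0, 1, 1, 0
```

Remove the ReLU and the whole network collapses into one linear function, which cannot represent XOR; stacking non-linear layers is what lets networks build up complex features from raw inputs.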
Unsupervised Learning: K-Means Clustering
What it does: Groups data points into K clusters based on similarity. You specify how many groups (K) you want; the algorithm figures out which data points belong to each group.
Example: Customer segmentation. Feed purchase data into K-Means with K=4, and it might discover:
- Cluster 1: High-value, frequent buyers (your VIPs)
- Cluster 2: Seasonal shoppers (holiday and sale buyers)
- Cluster 3: Discount-only customers (only buy during promotions)
- Cluster 4: One-time purchasers (haven’t returned)
The challenge: You have to choose K (the number of clusters). Too few clusters oversimplify; too many fragment meaningful groups. Techniques like the “elbow method” help find the right K, but there’s always some judgment involved.
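The algorithm itself is a short loop: assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until nothing changes. A minimal sketch on 2-D toy data (in practice you would reach for a library implementation, and run several random initializations):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def kmeans(points, k, rng, iters=100):
    centroids = rng.sample(points, k)   # start from k random points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:            # converged: assignments are stable
            break
        centroids = new
    return centroids, clusters

# Two obvious groups: low spenders and high spenders
points = [(1, 2), (1, 1), (2, 2), (9, 9), (10, 8), (9, 10)]
centroids, clusters = kmeans(points, k=2, rng=random.Random(0))
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Note that K is an input, not an output: the loop will happily split this data into 3 or 5 clusters if asked, which is exactly the judgment call described above.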
Choosing the Right Algorithm
| Data Type | Best Starting Algorithm | Why |
|---|---|---|
| Tabular data, classification | Random Forest or XGBoost | High accuracy, handles mixed feature types |
| Tabular data, regression | Linear Regression → Random Forest | Start simple, add complexity if needed |
| Images | Convolutional Neural Network (CNN) | Learns spatial features automatically |
| Text / NLP | Transformer Neural Network | Captures language patterns and context |
| Finding groups (no labels) | K-Means | Simple, fast, interpretable clusters |
| Need explainability | Decision Tree | Every prediction is traceable |
✅ Quick Check: Your CEO asks: “Why did the model flag this customer as high churn risk?” You used a neural network. Can you explain it? Probably not in business terms. Neural networks are black boxes — they can identify the most important features, but they can’t show the specific decision path like a decision tree can. If explainability is required (compliance, executive reporting, customer-facing decisions), use tree-based models. If raw accuracy matters more than explainability (image recognition, speech processing), neural networks are appropriate.
Key Takeaways
- Linear regression: simplest, best for small datasets and interpretable results
- Decision trees: fully interpretable but prone to overfitting — limit depth
- Random forests: ensemble of trees, more accurate, less interpretable — the default for structured data
- Neural networks: dominate images, text, and audio — require more data and compute, harder to explain
- K-means: groups unlabeled data into clusters — you choose how many groups
- For structured/tabular data, tree-based methods (XGBoost, random forests) typically outperform deep learning
Up Next
Algorithms need data to learn from. Lesson 4 covers the data pipeline — how to prepare, split, and validate data so your ML model learns real patterns instead of noise.