Algorithms That Matter
The ML algorithms you need to know — linear regression, decision trees, random forests, neural networks, and K-means. What each does and when to use it.
The Algorithm Toolkit
🔄 Lesson 2 introduced the three types of ML. Now let’s look at the specific algorithms that power each type. You don’t need to understand the math behind every algorithm — you need to know what each one does, when it works, and when it doesn’t.
Supervised Learning Algorithms
Linear Regression
The simplest ML algorithm. Finds the straight line (or, with multiple features, the flat plane) that best fits your data.
What it does: Predicts a continuous number based on input features.
Example: Predict house price from square footage. The algorithm finds the line: price = $200 × square_feet + $50,000. Each additional square foot adds ~$200 to the predicted price.
When it works: Relationships that are roughly linear, small-to-medium datasets, when you need interpretable results (each feature gets a clear coefficient).
When it fails: Complex, non-linear relationships. If price increases exponentially with square footage rather than linearly, linear regression misses the curve.
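The fitting itself is just least squares. A minimal sketch in pure Python, using toy housing numbers generated from the example's line (price = $200 × square_feet + $50,000), so the recovered slope and intercept match the example:

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated exactly on the line price = 200 * sqft + 50,000
sqft = [800, 1000, 1500, 2000, 2500]
price = [200 * s + 50_000 for s in sqft]

slope, intercept = fit_line(sqft, price)
print(round(slope), round(intercept))  # 200 50000
```

Each coefficient is directly readable ("each square foot adds ~$200"), which is the interpretability advantage mentioned above.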
Decision Trees
A flowchart-like structure that makes decisions by asking a sequence of questions about the data.
What it does: Splits data based on feature values, creating a tree of decisions that leads to a prediction.
Example: Will this customer churn?
```
Is contract month-to-month?
├── YES → Is tenure < 12 months?
│   ├── YES → HIGH RISK (83% churn)
│   └── NO  → MEDIUM RISK (45% churn)
└── NO  → LOW RISK (12% churn)
```
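The churn flowchart maps directly onto nested conditionals. A sketch (the feature names, the 12-month threshold, and the churn percentages are the illustrative values from the example, not real model output):

```python
def churn_risk(contract_month_to_month: bool, tenure_months: int) -> str:
    # Each branch mirrors one path through the decision tree above.
    if contract_month_to_month:
        if tenure_months < 12:
            return "HIGH RISK"    # 83% churn in the example
        return "MEDIUM RISK"      # 45% churn
    return "LOW RISK"             # 12% churn

print(churn_risk(True, 6))     # HIGH RISK
print(churn_risk(False, 36))   # LOW RISK
```

This is why trees are so easy to verify: the model literally is a set of if/else rules a stakeholder can read.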
Strength: Completely interpretable — you can trace every prediction back through the decision path. Non-technical stakeholders can understand and verify the logic.
Weakness: Prone to overfitting — the tree can become so complex it memorizes the training data rather than learning general patterns. A tree with 100 levels will perfectly fit training data and fail on new data.
✅ Quick Check: A decision tree for loan approval has 200 levels and perfectly classifies every training example. Is this good? No — it’s overfitting. A 200-level tree has memorized the training data, including its noise and quirks. It will perform terribly on new loan applications. The fix: limit tree depth (5-20 levels is usually enough), require minimum samples per leaf, or prune unnecessary branches. Simpler trees generalize better.
Random Forests
An ensemble of many decision trees that vote on the final prediction.
What it does: Builds hundreds of decision trees, each trained on a random subset of data and features. The final prediction is the majority vote (classification) or average (regression) across all trees.
Why it’s better than a single tree: Individual trees overfit. But if you build 500 trees, each with slightly different training data and features, their errors cancel out. The ensemble is more accurate and more stable than any individual tree.
The trade-off: Less interpretable than a single decision tree (500 trees voting is hard to trace), but significantly more accurate. For most structured data problems, random forests are the default starting point.
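The bootstrap-and-vote mechanism can be sketched in a few lines. To keep it self-contained, the "trees" here are one-level decision stumps (a single threshold test on one feature) rather than full trees, and the data is a toy one-feature dataset; a real random forest also samples features at each split:

```python
import random
from collections import Counter

def train_stump(points, labels):
    """Pick the threshold that best separates the two classes."""
    best = None
    for t in sorted(set(points)):
        preds = [1 if x >= t else 0 for x in points]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if best is None or acc > best[1]:
            best = (t, acc)
    t = best[0]
    return lambda x: 1 if x >= t else 0

def bootstrap(points, labels, rng):
    """Resample the training set with replacement."""
    idx = [rng.randrange(len(points)) for _ in points]
    return [points[i] for i in idx], [labels[i] for i in idx]

rng = random.Random(0)
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]          # class 1 iff x is "large"

# Train many stumps, each on its own bootstrap sample
stumps = []
for _ in range(25):
    bx, by = bootstrap(xs, ys, rng)
    stumps.append(train_stump(bx, by))

def forest_predict(x):
    votes = Counter(s(x) for s in stumps)
    return votes.most_common(1)[0][0]   # majority vote

print(forest_predict(2), forest_predict(12))
```

Any single stump can be skewed by an unlucky resample, but the majority vote washes those errors out, which is the point made above.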
Neural Networks
Loosely inspired by how biological neurons work. Layers of connected nodes that transform inputs into outputs.
What they do: Learn complex, non-linear patterns from large amounts of data. Especially powerful for images, text, and audio — unstructured data where traditional algorithms struggle.
Key insight: Neural networks learn their own features. With a traditional algorithm, you’d manually define features (edge detection for images, word frequency for text). Neural networks discover the relevant features automatically from raw data. This is why they dominate computer vision, NLP, and speech recognition.
The trade-off: Need much more data and compute than traditional algorithms. Harder to train (require careful architecture design and hyperparameter tuning). Less interpretable — often called “black boxes” because you can’t easily explain why they made a specific prediction.
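To see why layers plus a non-linearity matter, here is a tiny two-layer network computing XOR, a pattern no single linear model can fit. The weights are hand-set for illustration; a trained network would learn them from data:

```python
def relu(z):
    """Non-linear activation: pass positives through, zero out negatives."""
    return max(0.0, z)

def xor_net(x1, x2):
    # Hidden layer: two neurons, each a weighted sum passed through ReLU
    h1 = relu(x1 + x2)          # fires if either input is on
    h2 = relu(x1 + x2 - 1)      # fires only if both inputs are on
    # Output layer: linear combination of the hidden activations
    return h1 - 2 * h2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))   # 0, 1, 1, 0
```

Remove the ReLU and the whole network collapses into one linear function, which cannot represent XOR; stacking non-linear layers is what lets networks build up complex features from raw inputs.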
Unsupervised Learning: K-Means Clustering
What it does: Groups data points into K clusters based on similarity. You specify how many groups (K) you want; the algorithm figures out which data points belong to each group.
Example: Customer segmentation. Feed purchase data into K-Means with K=4, and it might discover:
- Cluster 1: High-value, frequent buyers (your VIPs)
- Cluster 2: Seasonal shoppers (holiday and sale buyers)
- Cluster 3: Discount-only customers (only buy during promotions)
- Cluster 4: One-time purchasers (haven’t returned)
The challenge: You have to choose K (the number of clusters). Too few clusters oversimplify; too many fragment meaningful groups. Techniques like the “elbow method” help find the right K, but there’s always some judgment involved.
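The algorithm itself is a short loop: assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until nothing changes. A minimal sketch on 2-D toy data (in practice you would reach for a library implementation, and run several random initializations):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def kmeans(points, k, rng, iters=100):
    centroids = rng.sample(points, k)   # start from k random points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:            # converged: assignments are stable
            break
        centroids = new
    return centroids, clusters

# Two obvious groups: low spenders and high spenders
points = [(1, 2), (1, 1), (2, 2), (9, 9), (10, 8), (9, 10)]
centroids, clusters = kmeans(points, k=2, rng=random.Random(0))
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Note that K is an input, not an output: the loop will happily split this data into 3 or 5 clusters if asked, which is exactly the judgment call described above.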
Choosing the Right Algorithm
| Data Type | Best Starting Algorithm | Why |
|---|---|---|
| Tabular data, classification | Random Forest or XGBoost | High accuracy, handles mixed feature types |
| Tabular data, regression | Linear Regression → Random Forest | Start simple, add complexity if needed |
| Images | Convolutional Neural Network (CNN) | Learns spatial features automatically |
| Text / NLP | Transformer Neural Network | Captures language patterns and context |
| Finding groups (no labels) | K-Means | Simple, fast, interpretable clusters |
| Need explainability | Decision Tree | Every prediction is traceable |
✅ Quick Check: Your CEO asks: “Why did the model flag this customer as high churn risk?” You used a neural network. Can you explain it? Probably not in business terms. Neural networks are black boxes — they can identify the most important features, but they can’t show the specific decision path like a decision tree can. If explainability is required (compliance, executive reporting, customer-facing decisions), use tree-based models. If raw accuracy matters more than explainability (image recognition, speech processing), neural networks are appropriate.
Key Takeaways
- Linear regression: simplest, best for small datasets and interpretable results
- Decision trees: fully interpretable but prone to overfitting — limit depth
- Random forests: ensemble of trees, more accurate, less interpretable — the default for structured data
- Neural networks: dominate images, text, and audio — require more data and compute, harder to explain
- K-means: groups unlabeled data into clusters — you choose how many groups
- For structured/tabular data, tree-based methods (XGBoost, random forests) typically outperform deep learning
Up Next
Algorithms need data to learn from. Lesson 4 covers the data pipeline — how to prepare, split, and validate data so your ML model learns real patterns instead of noise.