Tools & Frameworks
The ML software stack — scikit-learn, TensorFlow, PyTorch, Keras, pandas, and NumPy. What each does and when to use it.
The Software Stack
🔄 Lessons 3-5 covered algorithms, data, and evaluation. Now let’s look at the tools that turn these concepts into working systems. The ML software stack has clear layers, each with a specific job.
The Foundation: Python
Python dominates ML. Not because it’s the fastest language (it isn’t), but because its ecosystem of libraries makes ML practical:
| Library | Role | Analogy |
|---|---|---|
| NumPy | Numerical computation | The calculator |
| pandas | Data manipulation | The spreadsheet |
| matplotlib | Visualization | The chart maker |
| scikit-learn | Traditional ML | The algorithms |
| PyTorch | Deep learning (research) | The neural network lab |
| TensorFlow | Deep learning (production) | The neural network factory |
| Keras | Deep learning (simplified) | The easy button |
Every ML project uses NumPy and pandas. The ML framework you add on top depends on your problem type.
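To make the foundation layer concrete, here is a minimal NumPy sketch (the arrays and values are hypothetical). The point is vectorization: NumPy operates on whole arrays at once instead of looping in Python.

```python
import numpy as np

# Hypothetical order data: NumPy computes over whole arrays at once,
# avoiding slow Python-level loops.
prices = np.array([9.99, 14.50, 3.25, 20.00])
quantities = np.array([3, 1, 10, 2])

revenue = prices * quantities  # element-wise multiply, no loop needed
total = revenue.sum()
print(total)
```

This array-at-a-time style is what pandas, scikit-learn, PyTorch, and TensorFlow all build on.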
pandas: Data Preparation
Every ML project starts with pandas. It’s the tool that loads, explores, cleans, and transforms data before any algorithm touches it.
What pandas does:
- Load data from CSV, Excel, SQL databases, JSON
- Explore data: column types, missing values, summary statistics
- Clean data: handle missing values, remove duplicates, fix formats
- Engineer features: create new columns, encode categories, aggregate
- Export clean data for ML training
The workflow:
Load CSV → Explore (describe, info) → Clean (fillna, dropna) →
Engineer features → Export for ML
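The workflow above can be sketched in a few lines of pandas; the column names and values here are hypothetical stand-ins for real data:

```python
import io
import pandas as pd

# Hypothetical raw data, as it might arrive in a CSV export
raw_csv = io.StringIO(
    "age,income,city\n"
    "34,52000,Boston\n"
    "29,,Denver\n"       # missing income value
    "34,52000,Boston\n"  # duplicate row
)

# Load
df = pd.read_csv(raw_csv)

# Explore: column types, missing values, summary statistics
df.info()
print(df.describe())

# Clean: remove duplicates, fill missing income with the median
df = df.drop_duplicates()
df["income"] = df["income"].fillna(df["income"].median())

# Engineer a feature, then export clean data for ML training
df["high_income"] = (df["income"] > 50000).astype(int)
df.to_csv("clean_data.csv", index=False)
```

Each step maps directly onto the Load → Explore → Clean → Engineer → Export flow.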
pandas handles the part of ML work that isn’t algorithms. Data scientists joke that they spend 80% of their time on data preparation and only 20% on actual modeling — pandas is the tool for that 80%.
scikit-learn: Traditional Machine Learning
scikit-learn is the gold standard for traditional ML algorithms. If your data lives in a spreadsheet and you’re not using neural networks, scikit-learn is almost certainly the right choice.
What it covers:
- Classification: random forests, SVM, logistic regression, k-nearest neighbors
- Regression: linear regression, decision trees, gradient boosting
- Clustering: K-means, DBSCAN, hierarchical clustering
- Preprocessing: scaling, encoding, imputation
- Evaluation: accuracy, precision, recall, cross-validation
- Model selection: grid search, hyperparameter tuning
Why ML practitioners love it: Consistent API. Every algorithm follows the same pattern: create the model, .fit() on training data, .predict() on new data. Learn the pattern once, apply it to any algorithm.
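That create → .fit() → .predict() pattern, sketched on synthetic data (a random forest here, but any scikit-learn estimator follows the same three steps):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for a real spreadsheet
X, y = make_classification(n_samples=200, n_features=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The universal scikit-learn pattern:
model = RandomForestClassifier(random_state=42)  # 1. create the model
model.fit(X_train, y_train)                      # 2. fit on training data
predictions = model.predict(X_test)              # 3. predict on new data

print(model.score(X_test, y_test))  # accuracy on held-out data
```

Swap `RandomForestClassifier` for `LogisticRegression` or `KNeighborsClassifier` and the rest of the code is unchanged — that consistency is the whole point.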
When to use scikit-learn: Structured/tabular data (spreadsheets, databases, CSV files), traditional algorithms (not deep learning), projects where interpretability matters, quick prototyping.
✅ Quick Check: You need to build a spam classifier. Your data is a CSV with 15 columns (word counts, sender info, email metadata). Would you use scikit-learn or PyTorch?

Answer: scikit-learn. This is structured tabular data with 15 features; a random forest or logistic regression in scikit-learn handles it in a few lines of code. PyTorch is built for neural networks on unstructured data — using it here adds complexity without adding value.
PyTorch: Deep Learning for Research
PyTorch is the dominant framework for deep learning research and education. 60%+ of beginners choose it first, and most new ML research papers use it.
Key strength: dynamic computation graphs. PyTorch code runs like regular Python — you can print values, set breakpoints, and step through execution line by line. This makes debugging intuitive, which is crucial when you’re learning or experimenting.
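A tiny sketch of that eager, line-by-line execution (the tensor values are arbitrary):

```python
import torch

# PyTorch executes eagerly: each line runs immediately, so you can
# print intermediate values or drop in a debugger at any point.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = (x ** 2).sum()
print(y)        # inspect the intermediate result like any Python value

y.backward()    # autograd computes dy/dx = 2x
print(x.grad)   # gradients: [[2., 4.], [6., 8.]]
```

There is no separate "compile the graph, then run it" step — the graph is built as the Python code runs, which is what makes debugging feel natural.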
When to use PyTorch:
- Neural networks (CNNs for images, transformers for text, RNNs for sequences)
- Research and experimentation
- Learning deep learning concepts
- Projects where you need flexibility to customize architecture
TensorFlow: Deep Learning for Production
TensorFlow powers ML systems at Google and many large enterprises. It’s optimized for production deployment — serving models to millions of users, running on mobile devices, and scaling across data centers.
Key strength: production ecosystem. TF Serving deploys models as APIs. TF Lite runs models on mobile and edge devices. TF.js runs in browsers. The ecosystem is built for taking a trained model and putting it in front of users.
When to use TensorFlow:
- Production deployment at scale
- Mobile or edge device ML
- Enterprise ML infrastructure
- When your organization already uses TensorFlow
Keras: The Easy Button
Keras is a high-level interface for building neural networks. It abstracts away the complex details, letting you build and train models in a few lines.
What makes it beginner-friendly: Instead of defining every matrix multiplication and gradient calculation, you describe the network structure in plain terms: “A layer with 128 neurons, then a layer with 64 neurons, then an output layer with 10 classes.”
Keras runs on top of TensorFlow (it’s built into TensorFlow as tf.keras). Think of it as a simplified control panel for TensorFlow’s engine.
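The network described above — 128 neurons, then 64, then 10 output classes — takes only a few lines in tf.keras (the input size of 20 features is a hypothetical placeholder):

```python
import tensorflow as tf

# Each line declares one layer; Keras handles the math underneath.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                        # hypothetical: 20 input features
    tf.keras.layers.Dense(128, activation="relu"),      # layer with 128 neurons
    tf.keras.layers.Dense(64, activation="relu"),       # layer with 64 neurons
    tf.keras.layers.Dense(10, activation="softmax"),    # output layer, 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

No manual matrix multiplications, no gradient code — you describe the structure, and TensorFlow’s engine does the rest.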
When to use Keras: Your first deep learning project, quick prototyping, when you want results fast without deep framework knowledge.
Choosing Your Framework
| Question | Answer |
|---|---|
| Structured data + traditional algorithms? | scikit-learn |
| Images/text/audio + neural networks? | PyTorch (learning) or TensorFlow (production) |
| First deep learning project? | Keras (simplest) or PyTorch (most intuitive) |
| Deploying to mobile/edge? | TensorFlow + TF Lite |
| Research and experimentation? | PyTorch |
| Enterprise deployment? | TensorFlow |
✅ Quick Check: A startup wants to build a recommendation system. They’ll prototype quickly, iterate fast, and eventually deploy to production. What’s their framework path?

Answer: Start with scikit-learn for a baseline (collaborative filtering is traditional ML). If performance demands deep learning, prototype in PyTorch (faster iteration), then deploy the final model with TensorFlow Serving (production-ready). This prototype-to-deploy pattern — PyTorch for research, TensorFlow for production — is common in industry.
Key Takeaways
- Python is the language of ML — NumPy, pandas, matplotlib form the foundation
- pandas handles data preparation (80% of ML work) — loading, cleaning, engineering features
- scikit-learn is the gold standard for traditional ML on structured data — consistent API, broad algorithm coverage
- PyTorch dominates research and learning — dynamic graphs, intuitive debugging, 60% beginner adoption
- TensorFlow dominates production — model serving, mobile deployment, enterprise scale
- Keras simplifies deep learning — high-level interface on top of TensorFlow
- Choose by problem type: structured data → scikit-learn, neural networks → PyTorch/TensorFlow
Up Next
You now understand the core concepts: algorithms, data, evaluation, and tools. Lesson 7 puts it all in context — real-world ML applications across industries, plus the ethical challenges of algorithmic bias, fairness, and accountability.