Transfer Learning & Data Augmentation
How to train accurate computer vision models with limited data — pretrained models, feature extraction, fine-tuning, and augmentation techniques.
Building More with Less
🔄 Lesson 5 showed that segmentation models need extensive pixel-level annotations — labeling every pixel in thousands of images. In practice, most CV projects don’t have millions of labeled images. Transfer learning and data augmentation are the practical techniques that bridge this gap.
Few people train computer vision models from scratch. The standard approach: start with a model pretrained on millions of images, then adapt it to your specific task with your limited data.
Transfer Learning for Vision
ImageNet — 1.2 million images across 1,000 categories — trained the models that power most of modern computer vision. A ResNet pretrained on ImageNet already knows how to detect edges, textures, shapes, and object parts. Your task is to redirect that knowledge toward your specific problem.
Feature extraction (freeze and classify):
- Take a pretrained CNN (ResNet, EfficientNet)
- Remove the final classification layer
- Freeze all other layers — their weights don’t change
- Add a new classifier trained on your data
The pretrained layers act as a fixed feature extractor. Your images flow through them, producing rich feature vectors. The new classifier learns to map those vectors to your categories.
When to use: Small datasets (100-1,000 images), or when your domain is visually similar to ImageNet (natural images, products, everyday objects).
Fine-tuning (unfreeze and adapt):
- Take a pretrained CNN
- Add your new classifier
- Unfreeze the top layers (keep early layers frozen)
- Train with a very low learning rate (10-100× lower than normal)
Fine-tuning adapts the pretrained features to your specific domain. The early layers (edges, textures) stay frozen — these are universal. The later layers (object parts, high-level patterns) get adapted to your domain-specific visual patterns.
When to use: Medium datasets (1,000-10,000+ images), or when your domain differs from ImageNet (medical images, satellite imagery, microscopy).
✅ Quick Check: A hospital has 500 chest X-rays and wants to detect pneumonia. Should they use feature extraction or fine-tuning? Feature extraction — with only 500 images, fine-tuning risks catastrophic forgetting (overwriting pretrained features with noisy updates from too-small data). Freeze the pretrained layers, train only the classifier. The pretrained edge and texture detectors apply to X-rays even though ImageNet doesn’t contain medical images — textures and patterns indicating pneumonia activate the same low-level feature detectors learned from natural images.
Pretrained Model Selection
| Model | Parameters | Speed | Best For |
|---|---|---|---|
| EfficientNet-B0 | 5.3M | Very fast | Mobile, edge, resource-limited |
| ResNet-50 | 25.6M | Fast | General-purpose baseline |
| ResNet-152 | 60.2M | Medium | Higher accuracy when compute allows |
| EfficientNet-B7 | 66M | Slower | Maximum accuracy from CNNs |
| ViT-Base | 86M | Medium | Large datasets (14M+ images) |
Rule of thumb: Start with ResNet-50 (the universal baseline). If accuracy isn’t sufficient, try EfficientNet-B3 or B4. Only move to larger models if your dataset is large enough to exploit the extra capacity.
Data Augmentation
Data augmentation creates new training examples by applying random transformations to existing images. A single image becomes 10-20 variants that the network treats as different training examples.
Standard augmentations (safe for most tasks):
| Augmentation | What It Does | When It Helps |
|---|---|---|
| RandomHorizontalFlip | Mirror the image left-to-right | Most natural images |
| RandomRotation (±15°) | Slightly rotate | When orientation isn’t meaningful |
| RandomResizedCrop | Crop a random portion and resize | Forces model to recognize partial objects |
| ColorJitter | Adjust brightness, contrast, saturation | Varying lighting conditions |
| RandomErasing | Mask random patches | Teaches robustness to occlusion |
Advanced augmentations:
| Augmentation | What It Does | Impact |
|---|---|---|
| Mixup | Blend two images and their labels | Smooth decision boundaries |
| CutMix | Swap random patches between images | Better than Cutout and Mixup alone |
| AutoAugment | Learns optimal augmentation policies | ImageNet policies transfer to other datasets |
| RandAugment | Random selection from augmentation pool | Simpler than AutoAugment, competitive results |
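Mixup is simple enough to sketch directly: blend two images with a Beta-distributed coefficient and blend their one-hot labels by the same amount, so the model trains on soft targets. A minimal NumPy sketch; the 8×8 images, three-class labels, and `alpha=0.2` are illustrative:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Return a convex combination of two (image, one-hot label) pairs."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y, lam

# Two fake 8x8 grayscale "images" with one-hot labels for a 3-class task.
a, b = np.zeros((8, 8)), np.ones((8, 8))
ya = np.array([1.0, 0.0, 0.0])
yb = np.array([0.0, 1.0, 0.0])
x, y, lam = mixup(a, ya, b, yb)
# The mixed label keeps total mass 1, split between the two classes.
```

CutMix follows the same labeling idea but pastes a rectangular patch from one image onto the other instead of blending pixel values.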
The practical impact: Augmentation can effectively multiply your dataset by 10-20×. With 500 base images and aggressive augmentation, the model sees 5,000-10,000 effective training examples — often enough for competitive performance when combined with transfer learning.
✅ Quick Check: You’re training a model to detect traffic signs. Which augmentations are safe, and which could be dangerous? Safe: brightness/contrast jitter (signs appear in varying light), RandomErasing (partially occluded signs). Dangerous: heavy rotation (signs should be upright), horizontal flipping (a “turn left” sign flipped becomes “turn right” — wrong label), extreme color jitter (color is a defining feature of stop signs vs yield signs). Always evaluate augmentations against what’s physically realistic for your domain.
Combining Transfer Learning + Augmentation
The standard recipe for production CV with limited data:
- Choose a pretrained model (ResNet-50 is the safe default)
- Define augmentations appropriate for your domain
- Feature extraction first — freeze backbone, train classifier
- Evaluate — if accuracy is sufficient, deploy
- Fine-tune if needed — unfreeze top layers with low learning rate
- Increase augmentation if overfitting persists
This workflow handles 90% of practical CV projects. Companies routinely deploy production models trained on 500-5,000 domain-specific images using this approach.
Key Takeaways
- Few people train CV models from scratch — transfer learning from ImageNet-pretrained models is standard
- Feature extraction: freeze pretrained layers, train only a new classifier — best for small datasets (100-1,000 images)
- Fine-tuning: unfreeze top layers with 10-100× lower learning rate — best for medium datasets (1,000+)
- Catastrophic forgetting: high learning rates destroy pretrained knowledge — always use low learning rates for fine-tuning
- Data augmentation multiplies effective dataset size by 10-20× with zero new data collection
- Augmentation must preserve label validity — always validate that transformations make sense for your domain
- Transfer learning + augmentation handles 90% of practical CV projects with limited data
Up Next
You know how to build CV models. Lesson 7 shows where they’re deployed in the real world — from autonomous vehicles to medical imaging — and the ethical challenges that come with teaching machines to see.