Transfer Learning & Data Augmentation
How to train accurate computer vision models with limited data — pretrained models, feature extraction, fine-tuning, and augmentation techniques.
Building More with Less
🔄 Lesson 5 showed that segmentation models need extensive pixel-level annotations — labeling every pixel in thousands of images. In practice, most CV projects don’t have millions of labeled images. Transfer learning and data augmentation are the practical techniques that bridge this gap.
Few people train computer vision models from scratch. The standard approach: start with a model pretrained on millions of images, then adapt it to your specific task with your limited data.
Transfer Learning for Vision
ImageNet — 1.2 million images across 1,000 categories — trained the models that power most of modern computer vision. A ResNet pretrained on ImageNet already knows how to detect edges, textures, shapes, and object parts. Your task is to redirect that knowledge toward your specific problem.
Feature extraction (freeze and classify):
- Take a pretrained CNN (ResNet, EfficientNet)
- Remove the final classification layer
- Freeze all other layers — their weights don’t change
- Add a new classifier trained on your data
The pretrained layers act as a fixed feature extractor. Your images flow through them, producing rich feature vectors. The new classifier learns to map those vectors to your categories.
When to use: Small datasets (100-1,000 images), or when your domain is visually similar to ImageNet (natural images, products, everyday objects).
Fine-tuning (unfreeze and adapt):
- Take a pretrained CNN
- Add your new classifier
- Unfreeze the top layers (keep early layers frozen)
- Train with a very low learning rate (10-100× lower than normal)
Fine-tuning adapts the pretrained features to your specific domain. The early layers (edges, textures) stay frozen — these are universal. The later layers (object parts, high-level patterns) get adapted to your domain-specific visual patterns.
When to use: Medium datasets (1,000-10,000+ images), or when your domain differs from ImageNet (medical images, satellite imagery, microscopy).
✅ Quick Check: A hospital has 500 chest X-rays and wants to detect pneumonia. Should they use feature extraction or fine-tuning? Feature extraction — with only 500 images, fine-tuning risks catastrophic forgetting (overwriting pretrained features with noisy updates from too-small data). Freeze the pretrained layers, train only the classifier. The pretrained edge and texture detectors apply to X-rays even though ImageNet doesn’t contain medical images — textures and patterns indicating pneumonia activate the same low-level feature detectors learned from natural images.
Pretrained Model Selection
| Model | Parameters | Speed | Best For |
|---|---|---|---|
| EfficientNet-B0 | 5.3M | Very fast | Mobile, edge, resource-limited |
| ResNet-50 | 25.6M | Fast | General-purpose baseline |
| ResNet-152 | 60.2M | Medium | Higher accuracy when compute allows |
| EfficientNet-B7 | 66M | Slower | Maximum accuracy from CNNs |
| ViT-Base | 86M | Medium | Large datasets (14M+ images) |
Rule of thumb: Start with ResNet-50 (the universal baseline). If accuracy isn’t sufficient, try EfficientNet-B3 or B4. Only move to larger models if your dataset is large enough to exploit the extra capacity.
Data Augmentation
Data augmentation creates new training examples by applying random transformations to existing images. A single image becomes 10-20 variants that the network treats as different training examples.
Standard augmentations (safe for most tasks):
| Augmentation | What It Does | When It Helps |
|---|---|---|
| RandomHorizontalFlip | Mirror the image left-to-right | Most natural images |
| RandomRotation (±15°) | Slightly rotate | When orientation isn’t meaningful |
| RandomResizedCrop | Crop a random portion and resize | Forces model to recognize partial objects |
| ColorJitter | Adjust brightness, contrast, saturation | Varying lighting conditions |
| RandomErasing | Mask random patches | Teaches robustness to occlusion |
Advanced augmentations:
| Augmentation | What It Does | Impact |
|---|---|---|
| Mixup | Blend two images and their labels | Smooth decision boundaries |
| CutMix | Swap random patches between images | Better than Cutout and Mixup alone |
| AutoAugment | Learns optimal augmentation policies | ImageNet policies transfer to other datasets |
| RandAugment | Random selection from augmentation pool | Simpler than AutoAugment, competitive results |
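Mixup is simple enough to sketch directly: blend two images with a Beta-distributed coefficient and blend their one-hot labels by the same amount, so the model trains on soft targets. A minimal NumPy sketch; the 8×8 images, three-class labels, and `alpha=0.2` are illustrative:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Return a convex combination of two (image, one-hot label) pairs."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y, lam

# Two fake 8x8 grayscale "images" with one-hot labels for a 3-class task.
a, b = np.zeros((8, 8)), np.ones((8, 8))
ya = np.array([1.0, 0.0, 0.0])
yb = np.array([0.0, 1.0, 0.0])
x, y, lam = mixup(a, ya, b, yb)
# The mixed label keeps total mass 1, split between the two classes.
```

CutMix follows the same labeling idea but pastes a rectangular patch from one image onto the other instead of blending pixel values.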
The practical impact: Augmentation can effectively multiply your dataset by 10-20×. With 500 base images and aggressive augmentation, the model sees 5,000-10,000 effective training examples — often enough for competitive performance when combined with transfer learning.
✅ Quick Check: You’re training a model to detect traffic signs. Which augmentations are safe, and which could be dangerous? Safe: brightness/contrast jitter (signs appear in varying light), RandomErasing (partially occluded signs). Dangerous: heavy rotation (signs should be upright), horizontal flipping (a “turn left” sign flipped becomes “turn right” — wrong label), extreme color jitter (color is a defining feature of stop signs vs yield signs). Always evaluate augmentations against what’s physically realistic for your domain.
Combining Transfer Learning + Augmentation
The standard recipe for production CV with limited data:
- Choose a pretrained model (ResNet-50 is the safe default)
- Define augmentations appropriate for your domain
- Feature extraction first — freeze backbone, train classifier
- Evaluate — if accuracy is sufficient, deploy
- Fine-tune if needed — unfreeze top layers with low learning rate
- Increase augmentation if overfitting persists
This workflow handles 90% of practical CV projects. Companies routinely deploy production models trained on 500-5,000 domain-specific images using this approach.
Key Takeaways
- Few people train CV models from scratch — transfer learning from ImageNet-pretrained models is standard
- Feature extraction: freeze pretrained layers, train only a new classifier — best for small datasets (100-1,000 images)
- Fine-tuning: unfreeze top layers with 10-100× lower learning rate — best for medium datasets (1,000+)
- Catastrophic forgetting: high learning rates destroy pretrained knowledge — always use low learning rates for fine-tuning
- Data augmentation multiplies effective dataset size by 10-20× with zero new data collection
- Augmentation must preserve label validity — always validate that transformations make sense for your domain
- Transfer learning + augmentation handles 90% of practical CV projects with limited data
Up Next
You know how to build CV models. Lesson 7 shows where they’re deployed in the real world — from autonomous vehicles to medical imaging — and the ethical challenges that come with teaching machines to see.