Tools & Infrastructure
The fine-tuning tool landscape: Unsloth for speed, Axolotl for production, Hugging Face for the ecosystem, and OpenAI's API for no-GPU fine-tuning.
🔄 In the last lesson, you learned how LoRA and QLoRA cut memory requirements by 10-20x. But understanding the math is one thing — actually running a training job is another. You need software that handles the quantization, gradient computation, and model saving. And the ecosystem has options.
Too many, honestly. So let’s cut through the noise.
What You’ll Learn
By the end of this lesson, you’ll know which tool to use for your situation — and why the choice matters less than most people think.
The Four Options
| Tool | Best For | GPU Required | Cost |
|---|---|---|---|
| Unsloth | Speed on single GPU | Yes (Colab free tier works) | Free |
| Axolotl | Production, multi-GPU | Yes (multiple GPUs) | Free |
| Hugging Face (trl + peft) | Full control, custom pipelines | Yes | Free |
| OpenAI API | No GPU, managed service | No | $3-25/1M training tokens |
Unsloth: The Speed Champion
Unsloth is what we’ll use in Lesson 6 for the hands-on exercise. Here’s why:
Performance:
- 2-5x faster training than a standard transformers + FlashAttention 2 setup
- Up to 80% less VRAM (beyond even standard QLoRA)
- Works on Google Colab’s free T4 with just 3 GB VRAM for small models
Supported models (2026): Llama 1-4, Gemma 3, Mistral, Phi-4, Qwen 2.5, DeepSeek, and more. Basically every major open-source model.
What a training script looks like:
```python
from unsloth import FastLanguageModel

# Load model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
)

# Train with trl's SFTTrainer
# (dataset is assumed to be a Hugging Face Dataset prepared beforehand)
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```
That’s it. Under 30 lines to fine-tune a 3B model. Unsloth handles the quantization, memory management, and speed optimizations transparently.
The catch: Multi-GPU is only available in the paid Pro/Enterprise tiers. Free Unsloth = single GPU only.
✅ Quick Check: You’re fine-tuning on a free Colab T4 (16 GB VRAM). Can you use Unsloth with a 7B model? (Yes. With QLoRA and Unsloth’s memory optimizations, a 7B model fits in ~8-10 GB — well within the T4’s 16 GB. You’ll use a batch size of 2-4 to stay within limits.)
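You can sanity-check that answer with back-of-the-envelope arithmetic. This is a rough sketch, not a precise profile: the 0.5 bytes/parameter figure follows from 4-bit quantization, while the overhead range for adapters, optimizer state, and activations is an assumption that depends on sequence length and batch size.

```python
# Rough VRAM estimate for QLoRA on a 7B model (illustrative numbers)
params = 7e9

# 4-bit quantized base weights: 0.5 bytes per parameter
weights_gb = params * 0.5 / 1e9

# LoRA adapters, optimizer state, and activations add a few GB on top;
# ~4-6 GB is an assumed overhead range for short sequences
low, high = weights_gb + 4, weights_gb + 6

print(f"base weights: {weights_gb:.1f} GB")          # 3.5 GB
print(f"estimated total: {low:.1f}-{high:.1f} GB")   # 7.5-9.5 GB
```

Either way, the total lands comfortably under the T4's 16 GB.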
Axolotl: For Teams and Production
Axolotl is the production standard. Config-driven, multi-GPU, reproducible.
Key difference from Unsloth: Axolotl uses YAML config files instead of Python scripts. Every hyperparameter, every dataset path, every model choice goes in a single file:
```yaml
base_model: meta-llama/Llama-3.2-3B-Instruct
model_type: AutoModelForCausalLM
load_in_4bit: true

adapter: qlora
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj

datasets:
  - path: my_training_data.jsonl
    type: sharegpt

num_epochs: 1
learning_rate: 2e-4
micro_batch_size: 4
gradient_accumulation_steps: 4
```
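Once the config is saved (here assumed to be `qlora.yaml`), training is a single command. The exact entry point has changed across Axolotl versions, so treat this as a sketch:

```shell
# Tokenize and cache the dataset, then launch training from the config
axolotl preprocess qlora.yaml
axolotl train qlora.yaml
```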
Why teams prefer Axolotl:
- YAML configs are version-controlled → reproducible experiments
- Multi-GPU with DeepSpeed and FSDP out of the box
- Supports long-context fine-tuning (Ring FlashAttention)
- Community-maintained, well-documented
When to use Axolotl over Unsloth:
- You have 2+ GPUs
- You need reproducible experiment configs
- You’re fine-tuning regularly (not one-off)
- Your team needs to share and review training setups
Hugging Face Ecosystem: The Building Blocks
Unsloth and Axolotl are built on top of Hugging Face libraries. If you need maximum control, use the libraries directly:
| Library | Purpose |
|---|---|
| transformers | Model loading, tokenization, inference |
| peft | LoRA/QLoRA adapter creation and management |
| trl | Training loops: SFTTrainer, DPOTrainer |
| accelerate | Multi-GPU and mixed-precision training |
| bitsandbytes | 4-bit and 8-bit quantization |
| datasets | Dataset loading, formatting, preprocessing |
When to go bare HF libraries:
- Custom training loops that Unsloth/Axolotl don’t support
- Research experiments with non-standard configurations
- Integration into existing ML pipelines
For this course, Unsloth wraps these libraries with speed optimizations. You’ll use it in Lesson 6 without needing to understand each library individually.
OpenAI’s Fine-Tuning API: No GPU Needed
If you don’t want to manage GPUs at all, OpenAI offers managed fine-tuning:
How it works:
1. Prepare your data as JSONL (the same chat format as SFT examples)
2. Upload it via API or dashboard
3. Choose a model: GPT-4o-mini, GPT-4.1-mini, or GPT-4.1
4. Click “Start training”
5. Use your fine-tuned model via the same API
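Each line of the JSONL file is one training example: a JSON object holding a list of chat messages. A minimal sketch of one line (the field names follow OpenAI's chat format; the content itself is made up):

```python
import json

# One training example: a full conversation, serialized as a single JSON line
example = {
    "messages": [
        {"role": "system", "content": "You are a concise support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
    ]
}

# Append one such line per example to build the .jsonl file
line = json.dumps(example)
print(line)
```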
Pricing (2026):
| Model | Training Cost | Inference (Input/Output) |
|---|---|---|
| GPT-4o-mini | $3.00/1M tokens | $0.30 / $1.20 per 1M |
| GPT-4.1-mini | $3.00/1M tokens | $0.30 / $1.20 per 1M |
| GPT-4.1 | $25.00/1M tokens | $2.00 / $8.00 per 1M |
The math: Fine-tuning GPT-4o-mini on 1,000 examples (~500K tokens) costs about $1.50. That’s cheaper than renting a GPU for an hour.
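That figure is straightforward arithmetic (the token counts are assumed averages, and it covers a single training epoch):

```python
# Managed fine-tuning cost estimate for GPT-4o-mini
price_per_million = 3.00       # $ per 1M training tokens (from the table above)
examples = 1_000
tokens_per_example = 500       # assumed average

total_tokens = examples * tokens_per_example          # 500K tokens
cost = total_tokens / 1_000_000 * price_per_million
print(f"${cost:.2f}")          # $1.50
```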
DPO is now available too — upload preference pairs, OpenAI handles the rest.
The trade-off: You don’t own the weights. You can’t self-host. You’re locked into OpenAI’s API. For privacy-sensitive use cases, that’s a deal-breaker.
✅ Quick Check: A startup wants to fine-tune for customer support tone. They have 500 examples, no GPU, and need to deploy fast. Which tool? (OpenAI’s fine-tuning API. Upload 500 examples, train GPT-4o-mini for ~$0.75, deploy immediately via API. No GPU, no ops, no waiting. If they later need self-hosting for privacy, they can migrate to open-source with Unsloth.)
GPU Cloud Options
If Colab’s free tier isn’t enough, here are the main GPU cloud providers:
| Provider | GPU | $/hour | Best For |
|---|---|---|---|
| Google Colab | T4 (free), A100 (Pro) | $0 / $10/mo | Experiments, learning |
| Kaggle | T4 × 2, P100 | $0 (30h/week) | Free dual-GPU |
| RunPod | A100, H100 | $1.74-3.89/h | Flexible on-demand |
| Lambda | A100, H100 | $1.10-2.49/h | Serious training |
| Vast.ai | Various | $0.30-2.00/h | Budget runs |
For this course, Google Colab’s free T4 is enough. For production training runs, RunPod or Lambda give you A100 access at reasonable rates.
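To put the hourly rates in perspective, here is a cost sketch for a hypothetical 3-hour A100 training run, using each provider's low-end rate from the table (real runs and rates vary):

```python
# Hypothetical 3-hour A100 run at each provider's low-end hourly rate
rates_per_hour = {"RunPod": 1.74, "Lambda": 1.10}
hours = 3

costs = {provider: rate * hours for provider, rate in rates_per_hour.items()}
for provider, cost in costs.items():
    print(f"{provider}: ${cost:.2f}")
```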
Key Takeaways
- Unsloth is best for single-GPU speed (2-5x faster, 80% less VRAM) — use for Colab, local GPUs, and experiments
- Axolotl is best for production teams (YAML configs, multi-GPU, reproducible experiments)
- OpenAI’s API is the no-GPU option ($3/1M training tokens for GPT-4o-mini) but you don’t own the weights
- Hugging Face libraries (peft, trl, transformers) are the foundation everything else is built on
- For learning, Colab’s free T4 is enough. For production, RunPod or Lambda offer A100s at $1-3/hour
- The tool choice matters less than the data quality — a great dataset on any tool beats a mediocre dataset on the best tool
Up Next
You’ve got the tools. Now you need data. In the next lesson, you’ll learn dataset preparation — the step that makes or breaks your fine-tuning results. We’ll cover data formats, quality filtering, synthetic data generation, and how many examples you actually need.