Tools & Infrastructure
The fine-tuning tool landscape: Unsloth for speed, Axolotl for production, Hugging Face for the ecosystem, and OpenAI's API for no-GPU fine-tuning.
🔄 In the last lesson, you learned how LoRA and QLoRA cut memory requirements by 10-20x. But understanding the math is one thing — actually running a training job is another. You need software that handles the quantization, gradient computation, and model saving. And the ecosystem has options.
Too many, honestly. So let’s cut through the noise.
What You’ll Learn
By the end of this lesson, you’ll know which tool to use for your situation — and why the choice matters less than most people think.
The Four Options
| Tool | Best For | GPU Required | Cost |
|---|---|---|---|
| Unsloth | Speed on single GPU | Yes (Colab free tier works) | Free |
| Axolotl | Production, multi-GPU | Yes (multiple GPUs) | Free |
| Hugging Face (trl + peft) | Full control, custom pipelines | Yes | Free |
| OpenAI API | No GPU, managed service | No | $3-25/1M training tokens |
Unsloth: The Speed Champion
Unsloth is what we’ll use in Lesson 6 for the hands-on exercise. Here’s why:
Performance:
- 2-5x faster training than a standard transformers + FlashAttention 2 setup
- Up to 80% less VRAM (beyond even standard QLoRA)
- Works on Google Colab’s free T4 with just 3 GB VRAM for small models
Supported models (2026): Llama 1-4, Gemma 3, Mistral, Phi-4, Qwen 2.5, DeepSeek, and more. Basically every major open-source model.
What a training script looks like:
```python
from unsloth import FastLanguageModel

# Load model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
)

# Train with trl's SFTTrainer
# (dataset is assumed to be a Hugging Face Dataset prepared beforehand)
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```
That’s it. Under 30 lines to fine-tune a 3B model. Unsloth handles the quantization, memory management, and speed optimizations transparently.
The catch: Multi-GPU is only available in the paid Pro/Enterprise tiers. Free Unsloth = single GPU only.
✅ Quick Check: You’re fine-tuning on a free Colab T4 (16 GB VRAM). Can you use Unsloth with a 7B model? (Yes. With QLoRA and Unsloth’s memory optimizations, a 7B model fits in ~8-10 GB — well within the T4’s 16 GB. You’ll use a batch size of 2-4 to stay within limits.)
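You can sanity-check that answer with back-of-the-envelope arithmetic. This is a rough sketch, not a precise profile: the 0.5 bytes/parameter figure follows from 4-bit quantization, while the overhead range for adapters, optimizer state, and activations is an assumption that depends on sequence length and batch size.

```python
# Rough VRAM estimate for QLoRA on a 7B model (illustrative numbers)
params = 7e9

# 4-bit quantized base weights: 0.5 bytes per parameter
weights_gb = params * 0.5 / 1e9

# LoRA adapters, optimizer state, and activations add a few GB on top;
# ~4-6 GB is an assumed overhead range for short sequences
low, high = weights_gb + 4, weights_gb + 6

print(f"base weights: {weights_gb:.1f} GB")          # 3.5 GB
print(f"estimated total: {low:.1f}-{high:.1f} GB")   # 7.5-9.5 GB
```

Either way, the total lands comfortably under the T4's 16 GB.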
Axolotl: For Teams and Production
Axolotl is the production standard. Config-driven, multi-GPU, reproducible.
Key difference from Unsloth: Axolotl uses YAML config files instead of Python scripts. Every hyperparameter, every dataset path, every model choice goes in a single file:
```yaml
base_model: meta-llama/Llama-3.2-3B-Instruct
model_type: AutoModelForCausalLM
load_in_4bit: true

adapter: qlora
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj

datasets:
  - path: my_training_data.jsonl
    type: sharegpt

num_epochs: 1
learning_rate: 2e-4
micro_batch_size: 4
gradient_accumulation_steps: 4
```
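Once the config is saved (here assumed to be `qlora.yaml`), training is a single command. The exact entry point has changed across Axolotl versions, so treat this as a sketch:

```shell
# Tokenize and cache the dataset, then launch training from the config
axolotl preprocess qlora.yaml
axolotl train qlora.yaml
```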
Why teams prefer Axolotl:
- YAML configs are version-controlled → reproducible experiments
- Multi-GPU with DeepSpeed and FSDP out of the box
- Supports long-context fine-tuning (Ring FlashAttention)
- Community-maintained, well-documented
When to use Axolotl over Unsloth:
- You have 2+ GPUs
- You need reproducible experiment configs
- You’re fine-tuning regularly (not one-off)
- Your team needs to share and review training setups
Hugging Face Ecosystem: The Building Blocks
Unsloth and Axolotl are built on top of Hugging Face libraries. If you need maximum control, use the libraries directly:
| Library | Purpose |
|---|---|
| transformers | Model loading, tokenization, inference |
| peft | LoRA/QLoRA adapter creation and management |
| trl | Training loops: SFTTrainer, DPOTrainer |
| accelerate | Multi-GPU and mixed-precision training |
| bitsandbytes | 4-bit and 8-bit quantization |
| datasets | Dataset loading, formatting, preprocessing |
When to go bare HF libraries:
- Custom training loops that Unsloth/Axolotl don’t support
- Research experiments with non-standard configurations
- Integration into existing ML pipelines
For this course, Unsloth wraps these libraries with speed optimizations. You’ll use it in Lesson 6 without needing to understand each library individually.
OpenAI’s Fine-Tuning API: No GPU Needed
If you don’t want to manage GPUs at all, OpenAI offers managed fine-tuning:
How it works:
1. Prepare your data as JSONL (the same chat format as SFT examples)
2. Upload it via API or dashboard
3. Choose a model: GPT-4o-mini, GPT-4.1-mini, or GPT-4.1
4. Click “Start training”
5. Use your fine-tuned model via the same API
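Each line of the JSONL file is one training example: a JSON object holding a list of chat messages. A minimal sketch of one line (the field names follow OpenAI's chat format; the content itself is made up):

```python
import json

# One training example: a full conversation, serialized as a single JSON line
example = {
    "messages": [
        {"role": "system", "content": "You are a concise support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
    ]
}

# Append one such line per example to build the .jsonl file
line = json.dumps(example)
print(line)
```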
Pricing (2026):
| Model | Training Cost | Inference (Input/Output) |
|---|---|---|
| GPT-4o-mini | $3.00/1M tokens | $0.30 / $1.20 per 1M |
| GPT-4.1-mini | $3.00/1M tokens | $0.30 / $1.20 per 1M |
| GPT-4.1 | $25.00/1M tokens | $2.00 / $8.00 per 1M |
The math: Fine-tuning GPT-4o-mini on 1,000 examples (~500K tokens) costs about $1.50. That’s cheaper than renting a GPU for an hour.
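That figure is straightforward arithmetic (the token counts are assumed averages, and it covers a single training epoch):

```python
# Managed fine-tuning cost estimate for GPT-4o-mini
price_per_million = 3.00       # $ per 1M training tokens (from the table above)
examples = 1_000
tokens_per_example = 500       # assumed average

total_tokens = examples * tokens_per_example          # 500K tokens
cost = total_tokens / 1_000_000 * price_per_million
print(f"${cost:.2f}")          # $1.50
```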
DPO is now available too — upload preference pairs, OpenAI handles the rest.
The trade-off: You don’t own the weights. You can’t self-host. You’re locked into OpenAI’s API. For privacy-sensitive use cases, that’s a deal-breaker.
✅ Quick Check: A startup wants to fine-tune for customer support tone. They have 500 examples, no GPU, and need to deploy fast. Which tool? (OpenAI’s fine-tuning API. Upload 500 examples, train GPT-4o-mini for ~$0.75, deploy immediately via API. No GPU, no ops, no waiting. If they later need self-hosting for privacy, they can migrate to open-source with Unsloth.)
GPU Cloud Options
If Colab’s free tier isn’t enough, here are the main GPU cloud providers:
| Provider | GPU | $/hour | Best For |
|---|---|---|---|
| Google Colab | T4 (free), A100 (Pro) | $0 / $10/mo | Experiments, learning |
| Kaggle | T4 × 2, P100 | $0 (30h/week) | Free dual-GPU |
| RunPod | A100, H100 | $1.74-3.89/h | Flexible on-demand |
| Lambda | A100, H100 | $1.10-2.49/h | Serious training |
| Vast.ai | Various | $0.30-2.00/h | Budget runs |
For this course, Google Colab’s free T4 is enough. For production training runs, RunPod or Lambda give you A100 access at reasonable rates.
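To put the hourly rates in perspective, here is a cost sketch for a hypothetical 3-hour A100 training run, using each provider's low-end rate from the table (real runs and rates vary):

```python
# Hypothetical 3-hour A100 run at each provider's low-end hourly rate
rates_per_hour = {"RunPod": 1.74, "Lambda": 1.10}
hours = 3

costs = {provider: rate * hours for provider, rate in rates_per_hour.items()}
for provider, cost in costs.items():
    print(f"{provider}: ${cost:.2f}")
```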
Key Takeaways
- Unsloth is best for single-GPU speed (2-5x faster, 80% less VRAM) — use for Colab, local GPUs, and experiments
- Axolotl is best for production teams (YAML configs, multi-GPU, reproducible experiments)
- OpenAI’s API is the no-GPU option ($3/1M training tokens for GPT-4o-mini) but you don’t own the weights
- Hugging Face libraries (peft, trl, transformers) are the foundation everything else is built on
- For learning, Colab’s free T4 is enough. For production, RunPod or Lambda offer A100s at $1-3/hour
- The tool choice matters less than the data quality — a great dataset on any tool beats a mediocre dataset on the best tool
Up Next
You’ve got the tools. Now you need data. In the next lesson, you’ll learn dataset preparation — the step that makes or breaks your fine-tuning results. We’ll cover data formats, quality filtering, synthetic data generation, and how many examples you actually need.