A comprehensive, step-by-step guide with validation steps and troubleshooting
This guide is an expanded version of the original tutorial by Avi Chawla. The original tutorial can be found here.
- macOS ≥ 13.0 (Ventura or newer)
- M1/M2/M3 Mac with ≥16GB RAM
- At least 20GB free disk space
- Xcode Command Line Tools
- Active internet connection for initial downloads
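You can sanity-check these requirements up front. The following is a minimal sketch using only the Python standard library; the thresholds match the list above:
import os
import platform
import shutil

# macOS version (MPS requires macOS 13.0 or newer)
macos = platform.mac_ver()[0]
print(f"macOS {macos}: {'OK' if int(macos.split('.')[0]) >= 13 else 'upgrade required'}")

# Free disk space (at least 20GB recommended)
free_gb = shutil.disk_usage("/").free / 1e9
print(f"Free disk: {free_gb:.1f} GB {'OK' if free_gb >= 20 else 'low'}")

# Physical RAM (16GB or more recommended)
ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
print(f"RAM: {ram_gb:.0f} GB {'OK' if ram_gb >= 16 else 'low'}")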
# Install Xcode CLI tools if not already installed
xcode-select --install
# Install Homebrew if not present
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install Miniconda for Apple Silicon (arm64)
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh
source ~/miniconda3/bin/activate
# Create new conda environment
conda create -n deepseek python=3.11
conda activate deepseek
# Install PyTorch nightly (the macOS "cpu" wheels include Metal/MPS support)
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
# Install UnslothAI with Apple Silicon support
pip install "unsloth[apple-m1]"
# Install Ollama
brew install ollama
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")
# Should output True for both MPS checks
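As an extra smoke test, run a small operation on the MPS device (a minimal sketch):
# Smoke test: small matrix multiply on the Metal GPU
x = torch.randn(64, 64, device="mps")
y = x @ x
torch.mps.synchronize()  # block until the GPU kernel completes
print(f"MPS matmul OK, result on: {y.device}")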
from unsloth import FastLanguageModel
import torch
# Set environment variables for Metal optimization
import os
os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.95'
# Define model path
MODEL = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit"
# Load model with Metal acceleration
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=MODEL,
max_seq_length=2048,
dtype=None,
load_in_4bit=True, # Enable 4-bit quantization
)
# Move model to MPS device
device = torch.device("mps")
model = model.to(device)
# Test tokenizer and model
test_input = "Hello, world!"
tokens = tokenizer(test_input, return_tensors="pt").to(device)
print(f"Tokenized output: {tokens}")
model = FastLanguageModel.get_peft_model(
model,
r=4, # LoRA rank
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
use_gradient_checkpointing="unsloth",
lora_alpha=16,
lora_dropout=0,
bias="none",
use_rslora=False,
loftq_config=None
)
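A quick sanity check that LoRA is wired up correctly: only the adapter weights should be trainable, typically well under 1% of the total:
# Count trainable vs. total parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")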
from datasets import load_dataset
from unsloth import to_sharegpt, standardize_sharegpt
# Load Alpaca dataset
dataset = load_dataset("vicgalle/alpaca-gpt4", split="train")
# Convert to ShareGPT format
dataset = to_sharegpt(
dataset,
merged_prompt="{instruction}[\nYour input is:\n{input}]",
output_column_name="output",
conversation_extension=3
)
# Standardize format
dataset = standardize_sharegpt(dataset)
# Verify dataset format
print(f"Dataset size: {len(dataset)}")
sample = dataset[0]
print(f"Sample entry:\n{sample}")
# Check token lengths on a sample (encoding every row is slow for large datasets)
max_length = max(len(tokenizer.encode(str(dataset[i]))) for i in range(min(1000, len(dataset))))
print(f"Maximum sequence length in sample: {max_length}")
from transformers import TrainingArguments
from trl import SFTTrainer
training_args = TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
max_steps=60,
learning_rate=2e-4,
optim="adamw_8bit",
weight_decay=0.01,
fp16=True, # Mixed precision; set to False if you hit MPS errors (see troubleshooting)
output_dir="./results",
save_strategy="steps",
save_steps=20,
)
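Note that the effective batch size is the per-device batch size times the gradient accumulation steps; with the values above that is 8 sequences per optimizer update:
# Effective batch = per_device_train_batch_size * gradient_accumulation_steps
print(f"Effective batch size: {2 * 4}")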
# Initialize trainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
args=training_args
)
# Enable Metal performance monitoring
print("Starting training...")
trainer_stats = trainer.train()
# Monitor progress
print(f"Initial loss: {trainer_stats.training_loss[0]}")
print(f"Final loss: {trainer_stats.training_loss[-1]}")
- Expected loss pattern: 1.8-2.0 initially, decreasing to 1.0-1.3
- Monitor Activity Monitor for GPU usage
- Watch for memory warnings in terminal
- Training should take approximately 30-60 minutes
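The memory warnings mentioned above can also be checked from inside the training process (requires a recent PyTorch with torch.mps memory introspection):
# Check Metal memory pressure from Python
allocated_gb = torch.mps.current_allocated_memory() / 1e9
print(f"MPS memory currently allocated: {allocated_gb:.2f} GB")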
# Save model in GGUF format
model.save_pretrained_gguf("deepseek_finetuned", tokenizer)
# Create a Modelfile pointing at the exported GGUF
# (adjust the FROM path to the .gguf filename save_pretrained_gguf actually produced)
modelfile = '''FROM ./deepseek_finetuned/unsloth.Q8_0.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.7
PARAMETER stop "User:"
PARAMETER stop "Assistant:"
LICENSE Apache 2.0
TEMPLATE """{{ .System }}
User: {{ .Prompt }}
Assistant: """
'''
with open("Modelfile", "w") as f:
    f.write(modelfile)
# Create Ollama model
ollama create deepseek_finetuned -f ./Modelfile
import ollama
# Test the model
response = ollama.chat(model='deepseek_finetuned', messages=[
{
'role': 'user',
'content': 'What is 2+2?'
}
])
print(response['message']['content'])
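The Python client also supports streaming, which makes response latency easier to judge (the prompt here is arbitrary):
# Stream the response token by token
stream = ollama.chat(
    model='deepseek_finetuned',
    messages=[{'role': 'user', 'content': 'Explain LoRA in one sentence.'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)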
- Metal Device Errors:
  # Reset the Metal allocator cache
  torch.mps.empty_cache()
- Memory Issues:
  - Reduce batch size
  - Enable gradient checkpointing
  - Clear Python memory:
    import gc
    gc.collect()
- Training Crashes:
  - Verify macOS version ≥13.0
  - Check available memory
  - Reduce model size or use more aggressive quantization
  - Set environment variables:
    export MPS_GRAPH_COMPILE_SYNCHRONOUS=1
    export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.95
A companion guide for troubleshooting every step of the fine-tuning process
Error: xcode-select: error: command line tools are already installed
Solution:
# Remove existing installation
sudo rm -rf /Library/Developer/CommandLineTools
# Reinstall
xcode-select --install
Error: Permission denied @ dir_s_mkdir - /usr/local/Cellar
Solution:
# Fix permissions
sudo chown -R $(whoami) /usr/local/*
# Retry installation
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Error: No space left on device
Solution:
- Clean up unnecessary files:
brew cleanup
conda clean --all
- Check space requirements:
df -h
- Required space: at least 20GB free
ERROR: Could not find a version that satisfies the requirement torch
Solution:
# Clear pip cache
pip cache purge
# Try alternative installation
conda install pytorch torchvision torchaudio -c pytorch-nightly
Error: Failed building wheel for unsloth
Solution:
- Install build dependencies:
conda install -y cmake ninja
pip install --upgrade pip setuptools wheel
- Try alternative installation:
pip install "unsloth[apple-m1] @ git+https://github.com/unslothai/unsloth.git"
RuntimeError: MPS backend not available
Solution:
- Verify macOS version:
sw_vers
# Must be ≥13.0
- Check Metal support:
import torch
print(torch.backends.mps.is_built())
if not torch.backends.mps.is_available():
print("Update macOS to version 13.0 or later")
RuntimeError: out of memory
Solution:
- Enable memory optimization:
os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.8'
torch.mps.empty_cache()
- Reduce model size:
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=MODEL,
max_seq_length=1024, # Reduced from 2048
load_in_4bit=True, # Keep 4-bit quantization for the smallest footprint
)
OSError: Can't load tokenizer for 'unsloth/DeepSeek-R1'
Solution:
# Force download tokenizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
MODEL,
trust_remote_code=True,
use_fast=False
)
RuntimeError: CUDA error: no kernel image is available for execution
Solution:
# Force Metal backend
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
device = torch.device("mps")
ValueError: Gradient checkpointing is not compatible
Solution:
model = FastLanguageModel.get_peft_model(
model,
use_gradient_checkpointing=False, # Disable if problematic
lora_alpha=16,
lora_dropout=0.1, # Add dropout for stability
)
FileNotFoundError: Dataset vicgalle/alpaca-gpt4 not found
Solution:
- Check internet connection
- Try alternative dataset source:
# Alternative loading method
dataset = load_dataset(
"json",
data_files={"train": "path/to/local/alpaca.json"},
split="train"
)
KeyError: instruction not found in dataset
Solution:
# Verify dataset structure
print(dataset[0].keys())
# Map correct fields
dataset = dataset.map(
lambda x: {
"instruction": x.get("prompt", ""),
"input": x.get("context", ""),
"output": x.get("response", "")
}
)
RuntimeError: MPS backend: Metal out of memory
Solution:
training_args = TrainingArguments(
per_device_train_batch_size=1, # Reduce batch size
gradient_accumulation_steps=8, # Increase steps
fp16=False, # Disable mixed precision
optim="adamw_torch", # Use standard optimizer
)
WARNING: Loss is NaN
Solution:
training_args = TrainingArguments(
learning_rate=1e-4, # Reduce learning rate
max_grad_norm=1.0, # Add gradient clipping
warmup_steps=100, # Add warmup
)
Symptoms:
- Loss not decreasing
- GPU utilization low
- Training seems frozen
Solution:
- Check Progress:
# Monitor training metrics
print(f"Step: {trainer.state.global_step}")
print(f"Loss: {trainer.state.log_history[-1]}")
- Reset Training:
# Clear cache
torch.mps.empty_cache()
import gc
gc.collect()
# Restart training with monitoring
# Restart training (report_to belongs in TrainingArguments, not train())
trainer.train(resume_from_checkpoint=False)
OSError: Can't save model
Solution:
# Set explicit save directory with permissions
import os
save_dir = "./checkpoints"
os.makedirs(save_dir, exist_ok=True)
os.chmod(save_dir, 0o777)
training_args = TrainingArguments(
output_dir=save_dir,
save_strategy="steps",
save_steps=10,
save_total_limit=2, # Keep only last 2 checkpoints
)
Error: Failed to convert model to GGUF format
Solution:
- Check disk space
- Try alternative conversion:
# Save in safetensors format first
model.save_pretrained("./model_safetensors", safe_serialization=True)
# Then convert with llama.cpp (the converter script name varies by llama.cpp version)
python llama.cpp/convert_hf_to_gguf.py ./model_safetensors --outfile model.gguf
Error: failed to create model
Solution:
- Check Modelfile syntax:
# Validate Modelfile
ollama show --modelfile deepseek_finetuned
- Clean and retry:
# Remove existing model
ollama rm deepseek_finetuned
# Clear cache
rm -rf ~/.ollama/models/deepseek_finetuned
# Retry creation
ollama create deepseek_finetuned -f ./Modelfile
Symptoms:
- No output
- Garbage output
- Very slow responses
Solution:
- Check model loading:
# Verify model is loaded correctly
response = ollama.list()
print(response)
- Adjust inference parameters:
response = ollama.chat(
model='deepseek_finetuned',
messages=[{'role': 'user', 'content': 'Test prompt'}],
options={
'temperature': 0.7,
'top_p': 0.9,
'num_predict': 100,
'stop': ['User:', 'Assistant:']
}
)
Always monitor system resources during the entire process:
# Monitor GPU
sudo powermetrics --samplers gpu_power -i 1000
# Monitor memory
top -l 1 -n 0 -s 0 | grep PhysMem
# Monitor disk space
df -h