Fine-tuning adapts a pre-trained language model to your specific domain or task. Running the process on your own Breeze keeps proprietary training data private.
Requirements
- A GPU-enabled Breeze with 16 GB+ VRAM for LoRA fine-tuning
- Python 3.10+ with CUDA toolkit installed
- At least 50 GB free disk space
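The Python-version and disk-space requirements above can be checked with the standard library alone; the thresholds below simply mirror the list (adjust the path if your training data lives elsewhere):

```python
import shutil
import sys

def meets_requirements(min_python=(3, 10), min_free_gb=50, path="."):
    """Check the local Python version and free disk space against the prerequisites."""
    python_ok = sys.version_info[:2] >= min_python
    free_gb = shutil.disk_usage(path).free / 1024**3
    return python_ok, free_gb >= min_free_gb

python_ok, disk_ok = meets_requirements()
print(f"Python 3.10+: {python_ok}, 50 GB free: {disk_ok}")
```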
Install the Training Stack
pip install torch transformers datasets peft accelerate bitsandbytes
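To confirm the install succeeded before starting a long training run, a small sketch that reports any package from the stack that cannot be imported:

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# The training stack installed above
stack = ["torch", "transformers", "datasets", "peft", "accelerate", "bitsandbytes"]
print("Missing:", missing_packages(stack) or "none")
```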
Prepare Your Dataset
Format training data as JSONL, one example per line, with instruction, input, and output fields:
{"instruction": "Summarize this ticket", "input": "Customer reports...", "output": "The customer is experiencing..."}
{"instruction": "Draft a reply", "input": "Server is down...", "output": "We are investigating..."}
Run LoRA Fine-Tuning
Use Parameter-Efficient Fine-Tuning (PEFT) to train with less memory:
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Load the base model in 4-bit so it fits in 16 GB of VRAM
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", load_in_4bit=True, device_map="auto"
)
# Attach low-rank adapters to the attention query/value projections
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will train

# `dataset` is your tokenized training set from the previous step
trainer = Trainer(model=model, args=TrainingArguments(
    output_dir="./output", num_train_epochs=3, per_device_train_batch_size=4,
    learning_rate=2e-4, fp16=True,
), train_dataset=dataset)
trainer.train()
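The memory savings come from training only the small low-rank adapter matrices while the base weights stay frozen. A back-of-the-envelope count, using illustrative Mistral-7B-style shapes (32 layers, hidden size 4096, a narrower v_proj under grouped-query attention) that you should treat as assumptions:

```python
def lora_params(d_in, d_out, r):
    """A LoRA adapter adds matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

r = 16  # matches the LoraConfig above
per_layer = lora_params(4096, 4096, r) + lora_params(4096, 1024, r)  # q_proj + v_proj
total = 32 * per_layer
print(f"Trainable adapter parameters: {total:,}")  # a few million vs ~7B frozen
```

A few million trainable parameters, against roughly seven billion frozen ones, is why LoRA fits on a 16 GB card where full fine-tuning would not.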
Tips
- Use 4-bit quantization to reduce VRAM requirements significantly
- Start with a small dataset (500-1000 examples) and iterate
- Monitor GPU usage with nvidia-smi during training
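One way to poll GPU memory during a run is nvidia-smi's CSV query mode. A minimal sketch (requires an NVIDIA driver on the machine; the parser itself is plain Python):

```python
import subprocess

# Ask nvidia-smi for used/total memory in MiB as bare CSV values
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory(csv_line):
    """Parse one 'used, total' CSV line (values in MiB) into a tuple of ints."""
    used, total = (int(v.strip()) for v in csv_line.split(","))
    return used, total

def gpu_memory():
    """Return (used_mib, total_mib) per GPU, one tuple per device."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory(line) for line in out.strip().splitlines()]
```

Calling gpu_memory() in a loop (or from a separate shell) gives a simple picture of headroom while the trainer runs.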