How to Fine-Tune a Language Model on Your Own Data
Fine-tuning adapts a pre-trained language model to your domain or task by continuing its training on your own dataset. Running the process on your own Breeze instance keeps the training data, the resulting model, and the associated costs under your control.
Prerequisites
- A Breeze instance with a GPU (8+ GB VRAM recommended) or a high-RAM CPU instance for smaller models
- Python 3.10 or later
- At least 30 GB of free disk space
- A training dataset in JSONL or CSV format
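Before installing anything, it can help to confirm that the machine meets the Python and disk-space requirements listed above. The following is a minimal sketch using only the standard library; the function name check_prerequisites is an illustrative assumption, and the GPU requirement is not covered here.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)  # from the prerequisites list
MIN_FREE_GB = 30      # from the prerequisites list


def check_prerequisites(path=Path.home(), min_python=MIN_PYTHON, min_free_gb=MIN_FREE_GB):
    """Return a dict mapping each prerequisite to True (met) or False (not met)."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_version": sys.version_info[:2] >= min_python,
        "free_disk_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'NOT MET'}")
```

The path argument decides which filesystem is measured, so point it at the directory where the model and checkpoints will be stored.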
Installing the Training Stack
python3 -m venv ~/finetune-env
source ~/finetune-env/bin/activate
pip install torch transformers datasets peft accelerate bitsandbytes trl
Preparing Your Dataset
Format your training data as a JSONL file, with one JSON object per line containing instruction, input, and output fields:
{"instruction": "Summarize the quarterly earnings report", "input": "Revenue was $2.3M...", "output": "Q3 revenue reached $2.3M..."}
{"instruction": "Draft a customer response", "input": "Customer complaint about...", "output": "Dear valued customer..."}
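A malformed line or a missing field is easier to catch before training than during it. The sketch below checks each line for valid JSON and for the three fields used in the examples above; the helper name validate_jsonl_lines is an assumption made for illustration.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate_jsonl_lines(lines):
    """Return (line_number, problem) tuples; an empty list means every record is usable."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((number, f"missing keys: {sorted(missing)}"))
    return problems


# Usage with a file on disk:
#     with open("training_data.jsonl", encoding="utf-8") as f:
#         print(validate_jsonl_lines(f))
```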
Load the dataset using the Hugging Face datasets library:
from datasets import load_dataset
dataset = load_dataset("json", data_files="training_data.jsonl", split="train")
dataset = dataset.train_test_split(test_size=0.1)
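The prerequisites list CSV as an accepted format, while the loading step above reads JSONL. The datasets library can read CSV directly with load_dataset("csv", ...), but converting to JSONL first keeps a single format throughout. This is a rough sketch that assumes the CSV has a header row with instruction, input, and output columns; the file name training_data.csv in the comment is hypothetical.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file, columns=("instruction", "input", "output")):
    """Write one JSON object per CSV row, keeping only the named columns."""
    reader = csv.DictReader(csv_file)
    count = 0
    for row in reader:
        # Missing or empty cells become empty strings.
        record = {name: (row.get(name) or "") for name in columns}
        jsonl_file.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# In-memory demonstration. For real files, pass objects opened with
# open("training_data.csv", newline="", encoding="utf-8") and
# open("training_data.jsonl", "w", encoding="utf-8").
source = io.StringIO(
    "instruction,input,output\n"
    "Summarize the report,Revenue was $2.3M...,Q3 revenue reached $2.3M...\n"
)
target = io.StringIO()
rows_written = csv_to_jsonl(source, target)
print(rows_written, target.getvalue(), end="")
```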
Loading the Base Model with QLoRA
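Use QLoRA (Quantized Low-Rank Adaptation) to fine-tune large models on limited VRAM. QLoRA loads the base model with 4-bit quantized weights, keeps those weights frozen, and trains only a small set of low-rank adapter matrices: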
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)
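# The Hugging Face repository ID is "meta-llama/Meta-Llama-3-8B". The model is gated:
# accept the license on its model page and authenticate with huggingface-cli login first.
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # the Llama tokenizer ships without a padding token
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
# Prepare the quantized model for training
model = prepare_model_for_kbit_training(model)
# Attach LoRA adapters to the attention query and value projections
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)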
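To see why LoRA is cheap to train, it helps to count the parameters the adapter adds. Each adapted linear layer gains two small matrices, so the count is r × (in_features + out_features) per layer. The sketch below applies that formula to the configuration above; the layer shapes and layer count are assumptions about the 8B Llama 3 architecture and are not stated in this guide, so verify them against the model's config.json.

```python
def lora_param_count(rank, module_shapes, num_layers):
    """Count trainable LoRA parameters.

    Each adapted linear layer of shape (in_features, out_features) gains two
    matrices: A (rank x in_features) and B (out_features x rank).
    """
    per_layer = sum(rank * (in_f + out_f) for in_f, out_f in module_shapes.values())
    return per_layer * num_layers


# Assumed dimensions (not taken from this guide; check the model's config.json).
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "v_proj": (4096, 1024),  # smaller output, assuming grouped-query attention
}
ASSUMED_LAYERS = 32

trainable = lora_param_count(rank=16, module_shapes=ASSUMED_SHAPES, num_layers=ASSUMED_LAYERS)
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumptions the adapter has roughly 6.8 million trainable parameters, a small fraction of the 8 billion in the base model. After get_peft_model, calling model.print_trainable_parameters() reports the actual figure for the loaded model.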
Training the Model
Configure the training arguments and launch the fine-tuning process:
from trl import SFTTrainer
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=100,
    logging_steps=10,
    save_strategy="epoch",
    fp16=True,
)
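# Build one training string per example so the model sees the instruction and input,
# not just the output. The prompt template below is one common layout, not a requirement.
def to_text(example):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n"
    return {"text": prompt + f"### Response:\n{example['output']}"}
dataset = dataset.map(to_text)
# Note: recent TRL releases move dataset_text_field and max_seq_length into SFTConfig
# and rename some of these arguments; check the documentation for your installed version.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    max_seq_length=512,
)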
trainer.train()
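The batch size, gradient accumulation, and warmup settings interact, so it is worth working out how many optimizer steps a run will actually take. The sketch below approximates that arithmetic for the configuration above. The dataset size is hypothetical, and the exact step count reported by the trainer may differ slightly because of how the library rounds partial batches.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps, epochs, num_devices=1):
    """Approximate the effective batch size and the number of optimizer steps."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


# Hypothetical dataset: 1,000 examples, of which 900 remain after the 10% test split.
schedule = training_schedule(num_examples=900, per_device_batch_size=4, grad_accum_steps=4, epochs=3)
warmup_fraction = 100 / schedule["total_steps"]  # warmup_steps=100 from the configuration above
print(schedule, f"warmup covers {warmup_fraction:.0%} of training")
```

With these numbers the effective batch size is 16 and the run takes about 171 optimizer steps, so a 100-step warmup would cover more than half of training. For a small dataset, consider lowering warmup_steps accordingly.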
Saving and Using the Fine-Tuned Model
Save the LoRA adapter weights, then merge them into the base model to produce a standalone model for inference:
model.save_pretrained("./my-finetuned-model")
tokenizer.save_pretrained("./my-finetuned-model")
# To merge, reload the base model unquantized (half precision), apply the adapter, and fold it in
import torch
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
finetuned = PeftModel.from_pretrained(base_model, "./my-finetuned-model")
merged = finetuned.merge_and_unload()
merged.save_pretrained("./my-merged-model")
tokenizer.save_pretrained("./my-merged-model")  # keep the tokenizer alongside the merged weights
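Conceptually, merging folds the low-rank update into each adapted weight matrix: W' = W + (lora_alpha / r) × B × A. The toy example below illustrates that formula with plain Python lists. It is a sketch of the arithmetic only, not the PEFT implementation, and the matrix values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]


# Toy values: a 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]     # rank x in_features
B = [[0.5], [0.25]]  # out_features x rank
merged_weight = merge_lora(W, A, B, alpha=2, rank=1)
print(merged_weight)
```

Because the update is added into the weights, the merged model has the same shape and inference cost as the base model and no longer needs the adapter files.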
Monitoring Training Progress
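Use TensorBoard to visualize training metrics in real time. Install it before you start training so that the trainer can write TensorBoard event files, then run the server in a separate terminal: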
pip install tensorboard
tensorboard --logdir ./results --host 0.0.0.0 --port 6006
Open the dashboard at http://your-breeze-ip:6006, replacing your-breeze-ip with your instance's address, and watch the loss curves to confirm that training is converging.
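If a browser is not available, the same trend can be checked from the trainer_state.json file that the trainer writes into each checkpoint directory, which records logged metrics under log_history. The sketch below compares early and late training losses. The helper names and the checkpoint path in the usage comment are hypothetical, and a falling average is only a rough signal of convergence.

```python
import json
from statistics import mean


def loss_trend(log_history, window=5):
    """Compare the mean of the first and last `window` logged training losses."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2 * window:
        return None  # not enough log entries to judge
    start, end = mean(losses[:window]), mean(losses[-window:])
    return {"start": start, "end": end, "improving": end < start}


def load_log_history(trainer_state_path):
    """Read the log_history list from a trainer_state.json file."""
    with open(trainer_state_path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


# Usage (hypothetical checkpoint path):
#     print(loss_trend(load_log_history("./results/checkpoint-57/trainer_state.json")))
```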