Fine-tuning with Predacons
This tutorial demonstrates how to fine-tune a pre-trained language model using the Predacons library. We will use the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model as an example.
Prerequisites
- A machine with a GPU and CUDA installed (recommended).
- Python 3.8 or higher.
- The Predacons library installed (pip install predacons).
- The Hugging Face Transformers, Datasets, TRL, and PEFT libraries installed (pip install transformers datasets trl peft).
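If you want to confirm the environment before starting, a quick check like the following can save time. This is a minimal sketch: it only imports the packages listed above and reports GPU availability.

# Minimal environment check: confirms the required packages import
# and reports whether a CUDA-capable GPU is visible.
import importlib

for pkg in ["predacons", "transformers", "datasets", "trl", "peft", "torch"]:
    try:
        module = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(module, '__version__', 'installed')}")
    except ImportError:
        print(f"{pkg}: NOT installed")

import torch
print("CUDA available:", torch.cuda.is_available())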
Steps
- Import Libraries:
First, import the necessary libraries:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Specify which GPU device to use

import torch
if torch.cuda.is_available():
    print("Using GPU")
else:
    print("No GPU available")

import predacons
from datasets import load_dataset
- Load Dataset:
Load a dataset for fine-tuning. Here, we use the SkunkworksAI/reasoning-0.01 dataset:
ds = load_dataset("SkunkworksAI/reasoning-0.01")
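Before training, it can help to glance at the dataset's splits and columns. The snippet below is a small sketch that assumes the usual train split and simply prints whatever columns the dataset exposes:

# Inspect the dataset: available splits, column names, and one raw record.
print(ds)                          # splits and row counts
print(ds["train"].column_names)    # column names (dataset-specific)
print(ds["train"][0])              # first training example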
- Define Model Path:
Specify the path to the pre-trained model you want to fine-tune.
model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
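The path can be a Hugging Face model ID (as here) or a local directory containing model weights. As an optional lightweight check that the ID resolves before committing to a long run, you can load just the tokenizer with the Transformers library:

# Optional: load only the tokenizer to confirm the model ID/path is reachable.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)
print("Tokenizer loaded, vocabulary size:", tokenizer.vocab_size)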
- Configure Training Parameters:
Set up the training parameters such as output directory, batch size, number of epochs, and save steps.
output_dir = "pico_r1"
overwrite_output_dir = False
per_device_train_batch_size = 1
num_train_epochs = 10
save_steps = 50
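With a per-device batch size of 1 and no gradient accumulation, one training example corresponds to roughly one optimizer step, so save_steps = 50 writes a checkpoint about every 50 examples. A rough back-of-the-envelope estimate follows; this sketch assumes a standard train split and these defaults, and the trainer's actual step count may differ:

# Rough estimate of total optimizer steps and checkpoint count,
# assuming one step per example (batch size 1, no gradient accumulation).
num_examples = len(ds["train"])
steps_per_epoch = num_examples // per_device_train_batch_size
total_steps = steps_per_epoch * num_train_epochs
print(f"~{total_steps} steps total, ~{total_steps // save_steps} checkpoints")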
- Initialize and Run Trainer:
Use the predacons.trainer function to initialize the trainer and start the fine-tuning process. This example uses 4-bit quantization and LoRA.
trainer = predacons.trainer(
    use_legacy_trainer=False,
    train_dataset=ds,
    model_name=model_path,
    output_dir=output_dir,
    overwrite_output_dir=overwrite_output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    num_train_epochs=num_train_epochs,
    save_steps=save_steps,
    trust_remote_code=False,
    resume_from_checkpoint=False,
    auto_quantize="4bit",
    auto_lora_config=True,
)
trainer.train()
Parameters Explanation:
- use_legacy_trainer: Flag to use the legacy trainer implementation. Set to False for the newer implementation.
- train_dataset: The dataset to use for training.
- model_name: The name or path of the pre-trained model.
- output_dir: The directory where the fine-tuned model will be saved.
- overwrite_output_dir: Whether to overwrite the output directory if it exists.
- per_device_train_batch_size: The batch size per GPU.
- num_train_epochs: The number of training epochs.
- save_steps: The number of steps between saving checkpoints.
- trust_remote_code: Whether to trust remote code when loading the model.
- resume_from_checkpoint: Whether to resume training from a checkpoint.
- auto_quantize: Enables automatic quantization ("4bit" or "8bit").
- auto_lora_config: Enables automatic LoRA configuration.
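For context, auto_quantize = "4bit" and auto_lora_config = True roughly take the place of the 4-bit quantization and LoRA configuration you would otherwise define by hand with bitsandbytes and PEFT. The sketch below shows a typical explicit setup; the exact defaults Predacons applies are internal to the library, and the values here (quantization type, rank, alpha) are illustrative only:

# Illustrative hand-written counterpart to auto_quantize="4bit" + auto_lora_config=True.
# These specific values are assumptions, not Predacons' actual defaults.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
)

lora_config = LoraConfig(
    r=16,                  # LoRA rank (illustrative)
    lora_alpha=32,         # LoRA scaling factor
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)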
- Save the Fine-Tuned Model:
The trainer.train() method saves the fine-tuned model to the specified output_dir.
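Once training completes, you can run a quick smoke test on the result. Depending on how Predacons saves the run, output_dir (or a checkpoint subdirectory inside it) will contain either a full model or a LoRA adapter; the sketch below assumes a PEFT/LoRA adapter was saved there:

# Minimal inference smoke test, assuming output_dir holds a PEFT/LoRA adapter.
# If the tokenizer was not saved alongside the adapter, load it from model_path instead.
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(output_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(output_dir)

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))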