This repository contains the implementation of Spread Preference Annotation (SPA), as described in the accompanying paper.
SPA is a framework that aligns large language models (LLMs) to human preferences using only a small amount of annotated data. It leverages direct preference judgment and introduces self-refinement methods to mitigate noise in self-generated data, significantly reducing the reliance on extensive human annotations.
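As a conceptual sketch of the core idea (helper names below are illustrative, not the repository's actual API): a DPO-trained policy defines an implicit reward r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x)), and comparing this reward across two self-generated responses yields a preference label without a human annotator. Tokenization at the prompt/response boundary is simplified here.

```python
# Conceptual sketch only -- function names are illustrative, not the repo's API.
# `model`, `policy`, and `ref` are assumed to be Hugging Face causal LMs, and
# `tokenizer` the matching tokenizer.
import torch

def sequence_logprob(model, tokenizer, prompt: str, response: str) -> float:
    """Sum of log-probabilities of `response` tokens given `prompt`.

    Prompt/response boundary tokenization is simplified for clarity.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the shifted log-probs predicts token i+1 of full_ids.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    response_ids = full_ids[0, prompt_len:]
    token_lp = log_probs[0, prompt_len - 1:].gather(-1, response_ids.unsqueeze(-1))
    return token_lp.sum().item()

def judge_pair(policy, ref, tokenizer, prompt, y1, y2, beta=0.1):
    """Label (chosen, rejected) by comparing the implicit rewards of y1 and y2."""
    r = [
        beta * (sequence_logprob(policy, tokenizer, prompt, y)
                - sequence_logprob(ref, tokenizer, prompt, y))
        for y in (y1, y2)
    ]
    return (y1, y2) if r[0] >= r[1] else (y2, y1)
```

The judged pairs can then be fed back into preference training, with the self-refinement step mitigating noise in these self-generated labels.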
This guide provides detailed instructions on setting up and running experiments with the specified environment and configurations.
- CUDA: 12.1.1
- PyTorch: 2.1.2
- GPU: 4x A6000
- Create and activate the conda environments:

  ```bash
  conda create -n vllm python=3.10
  conda activate vllm
  pip install vllm datasets
  conda deactivate
  conda create -n spa python=3.10
  conda activate spa
  ```
- Install the CUDA toolkit (if necessary):

  ```bash
  conda install nvidia/label/cuda-12.1.1::cuda-toolkit
  ```
- Install PyTorch and related packages:

  ```bash
  conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
  ```
- Install the additional required packages:

  ```bash
  pip install -e trl/.
  pip install -e alignment-handbook/.
  pip install transformers==4.36.2
  pip install numpy==1.26.4
  pip install torchmetrics
  pip install huggingface-hub==0.24.7
  ```
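After the installs, a quick sanity check (a convenience snippet, not part of the repository) can confirm the environment matches the versions listed at the top of this guide:

```python
# Optional environment check: compare installed versions against the
# requirements listed above before running any experiments.
import numpy
import torch
import transformers

print(torch.__version__)          # expected: 2.1.2
print(torch.version.cuda)         # expected: 12.1
print(torch.cuda.device_count())  # expected: 4 on a 4x A6000 node
print(transformers.__version__)   # expected: 4.36.2
print(numpy.__version__)          # expected: 1.26.4
```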
- Navigate to the alignment-handbook directory:

  ```bash
  cd alignment-handbook
  ```
- Split the data:

  ```bash
  python make_data_split.py
  ```
- Run the full pipeline:

  ```bash
  bash scripts/run_all.sh
  ```
- The current setup is based on the zephyr-7b-beta model. The sampling code may need to be adjusted for different models or chat templates.
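When switching models, the chat format can usually be obtained from the tokenizer rather than hard-coded. A minimal illustration using `apply_chat_template` from transformers 4.36 (the model name here is just an example):

```python
# Illustrative only: build a prompt in a model's own chat format before sampling.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]

# add_generation_prompt=True appends the assistant-turn header so the model
# generates a response instead of continuing the user turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```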
- Models trained to align with the current setup include:
  - princeton-nlp/Llama-3-Base-8B-SFT
  - alignment-handbook/zephyr-7b-sft-full
- For initial DPO training:
  - Using 3 epochs is effective for datasets with fewer than 2,000 samples.
  - For larger datasets, 1 epoch of training has been observed to yield better trends, even if initial performance is lower.
- Hyperparameters can be customized using the configuration files:
  - alignment-handbook/recipes/zephyr-7b-beta/dpo/config_full_initial.yaml
  - alignment-handbook/recipes/zephyr-7b-beta/dpo/config_full.yaml
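For scripted sweeps, the recipes can also be edited programmatically. A minimal sketch follows; field names such as `num_train_epochs` and `beta` are typical of alignment-handbook DPO recipes, so verify them against your copy of the files:

```python
# Hypothetical convenience script, not part of the repo: adjust DPO
# hyperparameters in a recipe before launching training.
import yaml

path = "alignment-handbook/recipes/zephyr-7b-beta/dpo/config_full_initial.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["num_train_epochs"] = 3  # e.g. 3 epochs for <2,000 samples (see the note above)
cfg["beta"] = 0.1            # DPO KL-regularization strength; assumed field name

with open(path, "w") as f:
    yaml.safe_dump(cfg, f)
```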