CoPe is a decoding-time personalization framework for large language models (LLMs). It maximizes an implicit user reward by contrasting a personalized model (a PEFT/LoRA adapter tuned per user) with the base task-adapted model at the token level, enabling personalization without external reward models or extra reward labeling.
This repository provides end-to-end scripts for:
- 🔧 Task-Adaptive Model (TAM) training on non-target users
- 👤 Per-user SFT adapters (OPPU)
- 🔄 Synthetic (pseudo) negative generation and selection for DPO
- 🎯 DPO fine-tuning per user
- 🚀 Inference with contrastive decoding (personal vs. base)
📋 Tasks
- Supported `--task_name` values: `news_headline`, `scholarly_title`, `abstract_generation`, `review_writing`, `topic_writing`
We present CoPe, a decoding framework for LLM personalization by Contrasting Personal Preference. The key idea is to use implicit reward signals of user preference to guide both training (via DPO on selected negative pairs) and inference (via contrastive decoding that down-weights tokens preferred by the base model but disfavored by the personalized model).
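As a rough illustration of the inference side, the sketch below combines next-token distributions from the personalized (expert) and task-adapted base (amateur) models. The function and variable names are purely illustrative, and the exact weighting used by the scripts is controlled by `--contrastive_alpha`.

```python
import torch
import torch.nn.functional as F

def contrastive_next_token_scores(expert_logits: torch.Tensor,
                                  amateur_logits: torch.Tensor,
                                  alpha: float = 0.1) -> torch.Tensor:
    """Illustrative token-level contrastive score (one common formulation):
    boost tokens the personalized expert prefers and down-weight tokens that
    are mainly favored by the task-adapted amateur."""
    expert_logp = F.log_softmax(expert_logits, dim=-1)
    amateur_logp = F.log_softmax(amateur_logits, dim=-1)
    return expert_logp - alpha * amateur_logp  # higher = preferred by the user model
```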
- We utilize publicly available data from the LaMP and LongLaMP benchmarks and follow the OPPU setting.
- Download the processed data and place it under the repository root `./data`.
- After extracting, you should have paths like `./data/<task_name>/user_top_100_history.json`.
Create a virtual environment (example with conda):
conda create -n cope python=3.9 -y
conda activate cope
Install dependencies from the repo root:
pip install -r requirements.txt
Change into the CoPe project directory before running the scripts:
cd CoPe
💡 Note
- A CUDA GPU is recommended. Some steps (e.g., vLLM sampling for synthetic/pseudo negatives) may require multiple GPUs for reasonable speed.
- If you use private or gated models on Hugging Face, set `--access_token` accordingly.
- 🔧 Train TAM on non-target users to adapt the base model to the task domain.
- 👤 Train per-user SFT adapters (OPPU) using each user's own history.
- 🔄 Generate pseudo negatives per user and select best negatives by contrasting OPPU vs. TAM likelihoods.
- 🎯 Run DPO with the selected negatives to refine per-user adapters.
- 🚀 Inference with contrastive decoding: contrast OPPU (expert) vs. TAM (amateur) at token level.
The Task-Adaptive Model (TAM) is trained on data excluding the target user. Outputs are saved as LoRA adapters.
python scripts/TAM.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--batch_size 8 \
--max_epoch 3 \
--is_train

python scripts/TAM.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--repetition_penalty 1.0 \
--is_test
📝 Notes
- TAM adapters are saved to `./ckpt/TAM/<task_name>/TAM-<model_name_short>_ckpt/`.
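If you want to sanity-check a trained TAM checkpoint outside of `scripts/TAM.py`, a minimal loading sketch with `transformers` and `peft` could look like the following; the adapter path is only an example of the layout above, so adjust it to your run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
tam_ckpt = "./ckpt/TAM/news_headline/TAM-Mistral-7B-Instruct-v0.3_ckpt"  # example path; adjust to your run

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the task-adaptive LoRA weights on top of the frozen base model.
tam = PeftModel.from_pretrained(base, tam_ckpt)
tam.eval()
```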
Train one PEFT adapter per user on their own history, initializing from TAM (see the initialization sketch at the end of this section).
python scripts/OPPU_sft.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--batch_size 4 \
--max_epoch 2 \
--is_train

python scripts/OPPU_sft.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3

python scripts/OPPU_sft.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--is_cd \
--contrastive_alpha 0.1 \
--repetition_penalty 1.0
📤 Outputs
- Predictions are saved under `./output/<task_name>/OPPU-SFT-<model>-*.json`.
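Conceptually, each per-user adapter continues training from the TAM weights rather than from scratch. A minimal sketch of that initialization with `peft` (the script handles this internally; the checkpoint path is an example):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3", torch_dtype=torch.bfloat16, device_map="auto"
)
# Load the TAM LoRA weights as a trainable adapter so per-user SFT starts
# from the task-adapted initialization.
user_adapter = PeftModel.from_pretrained(
    base,
    "./ckpt/TAM/news_headline/TAM-Mistral-7B-Instruct-v0.3_ckpt",  # example path
    is_trainable=True,
)
user_adapter.print_trainable_parameters()
# ...then run a standard SFT loop on the user's own history.
```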
Generate multiple candidate responses per user to serve as pseudo negatives:
python scripts/generate_pseudo_negatives.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--response_num 3 \
--temperature 1.0 \
--data_path ./data
✅ This creates `./data/<task_name>/user_top_100_history_with_pseudo_negatives.json` with multiple candidate responses per user.
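The sampling idea behind this step is to draw several candidates per query at a nonzero temperature; a minimal vLLM sketch under that assumption (the prompt construction and variable names are illustrative, not the script's exact internals):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
# Mirrors --response_num and --temperature: several diverse candidates per prompt.
params = SamplingParams(n=3, temperature=1.0, max_tokens=256)

prompts = ["Generate a headline for the following article: ..."]  # built per user/query
for request in llm.generate(prompts, params):
    candidates = [completion.text for completion in request.outputs]
    # `candidates` are the pseudo negatives stored alongside each example.
```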
Select negatives by contrasting OPPU and TAM likelihoods over the candidates:
python scripts/compute_scores.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--std max
✅ This writes `./data/<task_name>/user_top_100_history_with_pseudo_negatives_max.json` (or `_min.json`).
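Conceptually, each candidate is scored under both the per-user (OPPU) and task-adapted (TAM) models, and one negative per example is kept based on the contrast between the two scores; the exact statistic behind `--std max`/`min` is implemented in `scripts/compute_scores.py`. A rough, hypothetical sketch of the scoring part:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def response_logprob(model, tokenizer, prompt: str, response: str) -> float:
    """Approximate sum of token log-probs of `response` given `prompt` (illustrative)."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(prompt + response, return_tensors="pt").input_ids.to(model.device)
    logits = model(ids).logits[:, :-1, :]
    logp = F.log_softmax(logits, dim=-1).gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return logp[:, prompt_len - 1:].sum().item()  # count only response tokens

def likelihood_gaps(candidates, oppu_model, tam_model, tokenizer, prompt):
    """OPPU-vs-TAM log-likelihood gap per candidate; larger means the personalized
    model prefers the candidate more strongly than the task-adapted model does."""
    return [
        response_logprob(oppu_model, tokenizer, prompt, c)
        - response_logprob(tam_model, tokenizer, prompt, c)
        for c in candidates
    ]
```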
Run DPO using the selected negatives to refine each user’s adapter.
python scripts/OPPU_sft+dpo.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--batch_size 8 \
--dpo_beta 0.01 \
--negative_sampling_method pseudo \
--mode max \
--is_train
📝 Notes
- Ensure the TAM adapter is discoverable by this script. By default, the code expects TAM at `./ckpt/TAM/<task_name>/TAM-<model>_ckpt/`.
- DPO outputs (adapters) are saved under `./ckpt/OPPU_SFT+DPO/<task_name>/user_<idx>/`.
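For intuition, DPO maximizes the margin between the implicit rewards `beta * (log pi_theta - log pi_ref)` of the chosen response (the user's real output) and the rejected one (the selected pseudo negative); `--dpo_beta` sets `beta`. A self-contained sketch of that loss on precomputed sequence log-probabilities (names are illustrative; the script uses a standard DPO training loop):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.01) -> torch.Tensor:
    """DPO loss on per-example sequence log-probs. Here the reference model is the
    TAM-initialized adapter and `rejected` are the selected pseudo negatives."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)        # implicit reward of the chosen
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)  # implicit reward of the rejected
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy example with dummy log-probabilities for one preference pair:
loss = dpo_loss(torch.tensor([-42.0]), torch.tensor([-55.0]),
                torch.tensor([-44.0]), torch.tensor([-50.0]))
```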
Use the DPO-refined per-user adapter as the expert and TAM as the amateur, decoding with per-token contrastive scores.
python scripts/OPPU_sft+dpo.py \
--task_name news_headline \
--model_name mistralai/Mistral-7B-Instruct-v0.3 \
--is_cd \
--contrastive_alpha 0.1 \
--repetition_penalty 1.0
📤 Outputs
- Files like `OPPU-SFT+DPO-<model>-rp<...>-ca<...>-CD-run_*.json` under `./output/<task_name>/`.
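One way to realize this is to keep a single base model with two LoRA adapters and switch between them at each decoding step. The simplified greedy loop below assumes that setup; the adapter paths and names are examples, the score combination mirrors the sketch in the overview, and the released script exposes the weighting via `--contrastive_alpha`.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
# Amateur = TAM adapter, expert = DPO-refined per-user adapter (example paths).
model = PeftModel.from_pretrained(model, "./ckpt/TAM/news_headline/TAM-Mistral-7B-Instruct-v0.3_ckpt",
                                  adapter_name="amateur")
model.load_adapter("./ckpt/OPPU_SFT+DPO/news_headline/user_0", adapter_name="expert")

@torch.no_grad()
def contrastive_greedy(prompt: str, alpha: float = 0.1, max_new_tokens: int = 64) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids.to(model.device)
    for _ in range(max_new_tokens):
        model.set_adapter("expert")
        expert_logp = F.log_softmax(model(ids).logits[:, -1, :], dim=-1)
        model.set_adapter("amateur")
        amateur_logp = F.log_softmax(model(ids).logits[:, -1, :], dim=-1)
        # Down-weight tokens mainly favored by the task-level amateur model.
        next_id = (expert_logp - alpha * amateur_logp).argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```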
Evaluate predictions with the provided metrics script:
python eval/eval_task.py \
--task <task_name> \
--golds_json ./data/<task_name>/user_top_100_history_label.json \
--preds_json ./output/<task_name>/<PRED_FILE>.json \
--task_name <LaMP_ID> \
--output_file ./output/<task_name>/<PRED_FILE>.json
Examples for `<LaMP_ID>`: `LaMP_4`, `LaMP_5`, `LongLaMP_2`, `LongLaMP_3`, `LongLaMP_4`.
- Memory: prefer bfloat16 and gradient checkpointing; adjust `--batch_size` if OOM.
- Access tokens: if using gated models, pass `--access_token <HF_TOKEN>` for loading.
- Reproducibility: seeds are set to 42 by default. For sampling, use `--is_sampling` and document your runs.
If you find this work useful, please cite:
@inproceedings{bu-etal-2025-personalized,
title = "Personalized {LLM} Decoding via Contrasting Personal Preference",
author = "Bu, Hyungjune and
Jung, ChanJoo and
Kang, Minjae and
Kim, Jaehyung",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.1723/",
pages = "33946--33966",
ISBN = "979-8-89176-332-6"
}

