Code for UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation.
Authors:
Huimin Lu (1), Masaru Isonuma (1,2,3), Junichiro Mori (1,4), and Ichiro Sakata (1)
(1) University of Tokyo
(2) University of Edinburgh
(3) National Institute of Informatics (NII)
(4) RIKEN AIP
Set up the conda environment:

conda env create --name unidetox -f environment.yml
conda activate unidetox
On some Linux systems, you may encounter an error that GLIBCXX_3.4.29 is not found.
This happens when system library paths shadow the conda environment's newer libstdc++.so.6.
To make the conda environment's libraries take priority, run:
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
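The reason prepending works: the dynamic linker searches LD_LIBRARY_PATH entries left to right, so the conda lib directory wins over system paths. A minimal illustration (the prefix /opt/conda and the pre-existing /usr/lib value are example values only; your environment will differ):

```shell
# Illustrative values only -- substitute your actual conda env prefix.
CONDA_PREFIX=/opt/conda
LD_LIBRARY_PATH=/usr/lib                # pre-existing value, for illustration
# Prepend the conda env's lib directory so it is searched first:
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH"                 # prints /opt/conda/lib:/usr/lib
```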
Fine-tune the base model on toxic data and distill the detoxifying text:

python -m unidetox.toxic_gpt2_finetune_and_distill \
--base_model_name gpt2-xl \
--output_dir ./toxic_model \
--auth_token "enter_your_huggingface_auth_token_here_to_load_DGHS_dataset" \
--epochs 3 \
--lr 1e-5 \
--batch_size 4
Fine-tune the target model with the distilled text:

python -m unidetox.unidetox \
--mode finetune \
--auth_token "enter_your_huggingface_auth_token_here_to_use_LLaMA2" \
--target_model "gpt2-xl"
Evaluate the detoxified model:

python -m unidetox.unidetox \
--mode evaluate \
--auth_token "enter_your_huggingface_auth_token_here_to_use_LLaMA2" \
--target_model "gpt2-xl"
If you find our work helpful, please cite our paper:
@inproceedings{lu2025unidetox,
title={UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation},
author={Huimin Lu and Masaru Isonuma and Junichiro Mori and Ichiro Sakata},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=eLLBILFRsA}
}