Skip to content

EminLU/UniDetox

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UniDetox

Codes for UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation.

Authors:
Huimin Lu1, Masaru Isonuma1,2,3, Junichiro Mori1,4, and Ichiro Sakata1

1University of Tokyo
2University of Edinburgh
3National Institute of Informatics (NII)
4RIKEN AIP

GitHub Logo


0. Reproduce the Environment

conda env create --name unidetox -f environment.yml
conda activate unidetox

Potential GLIBCXX Issue

On some Linux systems, you may encounter an error about GLIBCXX_3.4.29 not found.

This happens if your system’s library paths overshadow conda’s newer libstdc++.so.6.

To ensure the conda environment’s libraries take priority, you can do:

export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"

1+2. Obtain a Toxic Model and Distill Detoxifying Data

python -m unidetox.toxic_gpt2_finetune_and_distill \
  --base_model_name gpt2-xl \
  --output_dir ./toxic_model \
  --auth_token "enter_your_huggingface_auth_token_here_to_load_DGHS_dataset" \
  --epochs 3 \
  --lr 1e-5 \
  --batch_size 4

3. Fine-tune Model(s) for Detoxification

  python -m unidetox.unidetox --mode finetune \
  --mode finetune \
  --auth_token "enter_your_huggingface_auth_token_here_to_use_LLaMA2" \
  --target_model "gpt2-xl"

Reproduce our Evaluation Results of GPT-2 XL

  python -m unidetox.unidetox --mode evaluate \
  --mode evaluate \
  --auth_token "enter_your_huggingface_auth_token_here_to_use_LLaMA2" \
  --target_model "gpt2-xl"

If you find our work helpful, please cite our paper:

@inproceedings{
lu2025unidetox,
title={UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation},
author={Huimin LU and Masaru Isonuma and Junichiro Mori and Ichiro Sakata},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=eLLBILFRsA}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published