Dual-Head Knowledge Distillation

Requirements

The repo is tested with:

  • numpy==1.24.3
  • nvidia_dali_cuda120==1.34.0
  • tensorboard_logger==0.1.0
  • torch==2.1.2
  • torchvision==0.16.2

It should also run with other PyTorch versions.

To install requirements:

pip install -r requirements.txt

Note that we use the nvidia_dali_cuda package to accelerate the ImageNet dataloader. If you only want to test performance on CIFAR-100, you can skip installing the nvidia_dali_cuda package and delete the files whose names contain "ImageNet".
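
As an alternative to deleting files, one small hypothetical pattern (not part of this repo) is to guard the optional DALI import so shared code still imports on CIFAR-100-only setups:

# Hypothetical sketch: treat nvidia_dali_cuda as an optional dependency so
# CIFAR-100-only installations can import shared code without DALI present.
try:
    from nvidia.dali.plugin.pytorch import DALIClassificationIterator
    HAS_DALI = True
except ImportError:
    DALIClassificationIterator = None
    HAS_DALI = False

if not HAS_DALI:
    print("NVIDIA DALI not found: ImageNet loaders are unavailable; "
          "CIFAR-100 experiments do not need DALI.")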

Quick start

Quick start on CIFAR-100:

python train_teacher.py --model resnet32x4
python train_student.py --model_s ShuffleV2 --path_t ./save/models/resnet32x4_cifar100_lr_0.05_decay_0.0005_trial_0/resnet32x4_best.pth --dual_head --nonlinear --ga --distill dhkd -r 1 -a 1 --BinaryKL_T 2
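
Here, --distill dhkd selects the dual-head distiller, while -r, -a, and --BinaryKL_T presumably weight the supervised term, weight the distillation term, and set the distillation temperature. The following is a minimal, hypothetical sketch of how a dual-head objective with a binary-KL auxiliary loss along these lines could be assembled; it is not the repo's actual implementation, and the exact formulation is given in the paper and the distiller code.

import torch
import torch.nn.functional as F

def binary_kl(logits_s, logits_t, T=2.0):
    # Treat each class logit as a Bernoulli distribution via a tempered sigmoid
    # and accumulate the per-class binary KL from teacher to student.
    p_s = torch.sigmoid(logits_s / T)
    p_t = torch.sigmoid(logits_t / T)
    eps = 1e-6
    kl = p_t * torch.log((p_t + eps) / (p_s + eps)) \
        + (1 - p_t) * torch.log((1 - p_t + eps) / (1 - p_s + eps))
    return kl.sum(dim=1).mean() * (T ** 2)

def dual_head_loss(logits_main, logits_aux, logits_teacher, targets,
                   r=1.0, a=1.0, T=2.0):
    # r weights the supervised cross-entropy on the main head; a weights the
    # binary-KL distillation term on the auxiliary head (mirroring -r / -a above).
    loss_ce = F.cross_entropy(logits_main, targets)
    loss_bkl = binary_kl(logits_aux, logits_teacher.detach(), T=T)
    return r * loss_ce + a * loss_bkl

In the actual scripts, the two heads sit on top of the shared student backbone, and --dual_head / --nonlinear presumably control how the auxiliary head is constructed; the sketch only shows how the weighted terms combine.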

Quick start on ImageNet:

Before the experiment, you need to prepare the ImageNet dataset in the folder ./data, following the example shell script provided in the official PyTorch ImageNet example.

CUDA_VISIBLE_DEVICES=0,1 python train_student_imagenet.py --model_t ResNet34 --model_s ResNet18 --dataset imagenet --batch_size 128 --epochs 100 --learning_rate 0.1 --lr_decay_epochs 30,60,90 --lr_decay_rate 0.1 --weight_decay 1e-4 --momentum 0.9 --distill dhkd -r 1 -a 0.1 --dual_head --linear --BinaryKL_T 2 --trial 1 --multiprocessing-distributed --world-size 1 --rank 0
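
The --multiprocessing-distributed, --world-size, and --rank flags follow the standard PyTorch ImageNet launch pattern: one worker process per visible GPU, all joined into a single NCCL process group. A minimal, hypothetical sketch of that pattern (not the repo's actual train_student_imagenet.py) looks like this:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def main_worker(gpu, ngpus_per_node, node_rank=0, world_size=1):
    # Global rank = node rank * GPUs per node + local GPU index.
    rank = node_rank * ngpus_per_node + gpu
    dist.init_process_group(backend="nccl",
                            init_method="tcp://127.0.0.1:23456",
                            world_size=world_size * ngpus_per_node,
                            rank=rank)
    torch.cuda.set_device(gpu)
    # ... build the teacher and student here, wrap the student in
    # torch.nn.parallel.DistributedDataParallel, and run the training loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    ngpus = torch.cuda.device_count()  # e.g. 2 with CUDA_VISIBLE_DEVICES=0,1
    mp.spawn(main_worker, nprocs=ngpus, args=(ngpus,))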

About

[KDD 2025] Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head