Problem description
The following error occurred while running this training command:
python train.py --batch_size 100 --max_epochs 60 --runname train --wm_batch_size 2 --wmtrain
The error occurs partway through training. The full output is below; I have marked the parts directly relevant to resolving the issue, and following that thread should lead to a fix:
==> Preparing data..
Using CIFAR10 dataset.
Files already downloaded and verified
Files already downloaded and verified
Loading watermark images
==> Building model..
Using CUDA
Parallel training on 3 GPUs.
/home/visionx/anaconda3/envs/waterknn/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py:32: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 2 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
WM acc:
[=========================== 50/50 =============================>.] Step: 17ms | Tot: 969ms | Loss: 2.304 | Acc: 8.000% (8/100)Epoch: 0
Traceback (most recent call last):
  File "train.py", line 96, in <module>
    trainloader, device, wmloader)
  File "/home/visionx/project/WatermarkNN/trainer.py", line 48, in train
    outputs = net(inputs)
  File "/home/visionx/anaconda3/envs/waterknn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/visionx/anaconda3/envs/waterknn/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/visionx/anaconda3/envs/waterknn/lib/python3.7/site-packages/torch/nn/parallel/data_parallel
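The UserWarning in the output above already suggests one remedy: exclude the weaker GPU 2, either via the `CUDA_VISIBLE_DEVICES` environment variable or via the `device_ids` argument to `DataParallel`. A minimal sketch of both options, assuming the GPU indices match those in the warning (the model variable `net` is hypothetical):

```python
import os

# Make only GPUs 0 and 1 visible to this process, excluding the weaker
# GPU 2 flagged by the DataParallel imbalance warning. This must be set
# before torch initializes CUDA (i.e. before the first CUDA call).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# Alternatively, keep all GPUs visible but tell DataParallel which ones
# to replicate onto (`net` is a hypothetical model instance):
# net = torch.nn.DataParallel(net, device_ids=[0, 1])
```

Equivalently, the variable can be set on the command line when launching the script: `CUDA_VISIBLE_DEVICES=0,1 python train.py ...` — whether this also resolves the crash in the traceback depends on its actual cause, which the truncated output does not show.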