Problem description: As the title says, when fine-tuning a pretrained BERT model for classification, nvidia-smi shows both GPU memory usage and GPU utilization close to 100%, yet training runs no faster than it does on the CPU.
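Before looking at the training code, a quick sanity check can rule out the obvious; the snippet below is a hypothetical check (not part of the original script) that confirms CUDA is visible and that the model's parameters actually live on the GPU:

import torch

# Hypothetical sanity check, separate from the training script:
# confirm CUDA is visible and the parameters really sit on the GPU.
print(torch.cuda.is_available())         # expect: True
print(next(model.parameters()).device)   # expect: cuda:0 after model.to(device)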
import torch
import torch.optim as optim

torch.manual_seed(0)
# Wrap the raw data dicts into a batched dataset iterator.
a = CustomDataset(query_dic=data, table_dic=data2, batch_size=4, tokenizer=tokenizer)
# Restore model and optimizer state from the previous checkpoint.
checkpoint = torch.load('model_checkpoint0.pth')
model = classifier(cwd + bert_path, 7)  # BERT-based classifier with 7 classes
optimizer = optim.Adam(model.parameters(), lr=5e-6)
model.load_state_dict(checkpoint["net"])
optimizer.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint["epoch"]
run(1, a, model, optimizer, start_epoch)
First the model and optimizer are instantiated and the checkpoint is loaded; the model, optimizer, and dataset are then passed to the training function run.
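For completeness, the checkpoint above is assumed to have been written with the keys "net", "optimizer", and "epoch"; a minimal sketch of that save step (not shown here) would look like:

# Hypothetical counterpart of the load above: how such a checkpoint
# would typically be written at the end of epoch `epoch`.
torch.save({
    "net": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": epoch,
}, 'model_checkpoint0.pth')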
The code of the run function is as follows:
def run(epochs, data_iter, model, optimizer, start_epoch):
    device = torch.device("cuda")
    model.to(device)  # move the model to the GPU
    criterion = torch.nn.CrossEntropyLoss(ignore_index=10)
    # Move the optimizer state (restored on the CPU) to the GPU as well,
    # otherwise Adam's buffers stay on the CPU and every step triggers
    # host-device transfers.
    for state in optimizer.state.values():
        for k, v in state.items():
            if isinstance(v, torch.Tensor):
                state[k] = v.to(device)
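(The function is cut off here in the original post.) For context only, here is a minimal sketch of what the rest of such a loop usually looks like, assuming data_iter yields (input_ids, attention_mask, labels) batches on the CPU; the batch fields and the model's call signature are assumptions for illustration, not the actual code:

    # Hypothetical continuation of run(), assuming data_iter yields
    # (input_ids, attention_mask, labels) tuples on the CPU.
    model.train()
    for epoch in range(start_epoch, start_epoch + epochs):
        for input_ids, attention_mask, labels in data_iter:
            # Each batch must be moved to the GPU too; if batches stay on
            # the CPU (or per-batch preprocessing is heavy), the GPU idles
            # and training is no faster than on the CPU.
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()
            logits = model(input_ids, attention_mask)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()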