optimizer.step() fails after loading a model: RuntimeError: expected device cpu but got device cuda:0

Licensed under CC 4.0 BY-SA

Tags: #PyCharm #RuntimeError #ModelReloadError

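This post explains how to fix a RuntimeError caused by a GPU/CPU device mismatch when training a deep learning model with PyTorch. The fix is to make sure that, after the saved state has been loaded, every tensor held by the optimizer is on the same device as the model, so that optimizer.step() no longer fails.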

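The overall workflow is:

1. Save the model parameters at every epoch.

2. If training stops unexpectedly, reload the model saved at the point of interruption.

3. Resume from there (continue training, run tests, and so on). This is where the problem appears: RuntimeError: expected device cpu but got device cuda:0. In other words, a CPU tensor was expected but a CUDA (GPU) tensor was received.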
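The saving step is not shown in this post, so the following is only a minimal sketch of what a per-epoch checkpoint could look like. The helper name build_checkpoint is hypothetical; the dictionary keys are chosen to match the ones read by the loading code further down ('arch', 'epoch', 'state_dict', 'optimizer'). Whether 'epoch' should hold the finished epoch or the next one is left to the caller.

```python
def build_checkpoint(arch, epoch, model, optimizer):
    """Collect what is needed to resume training later.

    `model` and `optimizer` only need a state_dict() method, which
    torch.nn.Module and torch.optim.Optimizer both provide.
    """
    return {
        "arch": arch,                          # checked on resume
        "epoch": epoch,                        # epoch to resume from (caller decides)
        "state_dict": model.state_dict(),      # model weights
        "optimizer": optimizer.state_dict(),   # momentum buffers etc.
    }

# Typical use at the end of each epoch (save_path is a placeholder):
#   torch.save(build_checkpoint(opt.arch, epoch, model, optimizer), save_path)
```

Saving the optimizer state alongside the weights is what makes resuming possible, and it is also the part whose tensors cause the device error described here.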

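Cause: after the optimizer loads its saved state, the tensors in that state (momentum buffers and similar) can end up on the CPU while the model parameters are on the GPU. All of these tensors have to be moved to the GPU; otherwise optimizer.step() raises RuntimeError: expected device cpu but got device cuda:0.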
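To confirm that this is really what is happening, it can help to look at where the optimizer's state tensors live. The helper below is a hypothetical diagnostic, not part of the original fix. It uses duck typing (anything with a device attribute counts as a tensor) so that the sketch needs no imports.

```python
def optimizer_state_devices(optimizer):
    """Return the sorted list of devices used by tensors in optimizer.state."""
    devices = set()
    for state in optimizer.state.values():      # one dict per parameter
        for value in state.values():
            device = getattr(value, "device", None)
            if device is not None:              # skip plain ints/floats
                devices.add(str(device))
    return sorted(devices)
```

Calling optimizer_state_devices(optimizer) right after optimizer.load_state_dict(...) and comparing the result with str(next(model.parameters()).device) shows whether the two disagree. Treat the result as a hint only: some optimizers may keep small bookkeeping values on a different device on purpose.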

The full fix looks like this:

    # Assumes torch is imported and that opt, model and optimizer are defined earlier.
    if opt.resume_path:
        print('loading checkpoint {}'.format(opt.resume_path))

        checkpoint = torch.load(opt.resume_path)

        assert opt.arch == checkpoint['arch']
        # Resume from the epoch where training was interrupted and run until n_epochs.
        opt.begin_epoch = checkpoint['epoch']
        model.load_state_dict(checkpoint['state_dict'])
        if not opt.no_train:
            optimizer.load_state_dict(checkpoint['optimizer'])
            # The optimizer state tensors loaded here can be on the CPU,
            # so move every tensor in the optimizer state to CUDA.
            # Otherwise optimizer.step() fails with:
            # RuntimeError: expected device cpu but got device cuda:0
            for state in optimizer.state.values():
                for k, v in state.items():
                    if torch.is_tensor(v):
                        state[k] = v.cuda()
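
The loop above hard-codes .cuda(), which fails on a machine without a GPU. A slightly more general variant, sketched below under the assumption that the target device is passed in explicitly, moves the state to whatever device the model uses. The function name move_optimizer_state is hypothetical, and the duck-typed check stands in for torch.is_tensor(v) only to keep the sketch import-free.

```python
def move_optimizer_state(optimizer, device):
    """Move every tensor in optimizer.state to `device`; return how many moved."""
    moved = 0
    for state in optimizer.state.values():
        for key, value in state.items():
            # In real code, torch.is_tensor(value) is the more precise check.
            if hasattr(value, "to") and hasattr(value, "device"):
                state[key] = value.to(device)   # reassigning an existing key is safe here
                moved += 1
    return moved
```

A typical call would be move_optimizer_state(optimizer, next(model.parameters()).device), placed right after optimizer.load_state_dict(...). Passing map_location to torch.load is another common way to control which device the checkpoint tensors are loaded onto.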