[Suffocating tensorflow-gpu] Notes on configuring tensorflow-gpu 1.13.1 + CUDA 10.0.130 + cuDNN 7.6.5


Rant (feel free to skip)

After yet another two days of being tormented by a tensorflow-gpu environment, I decided I had to write something down.
Once upon a time I, too, followed the top-voted tutorials step by step, only to be beaten up by inexplicable problems.
Every time I felt more practiced than the last, and after hours of careful setup I still got beaten up:

>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
False

Young me could not understand why Python packages can be installed via both conda and pip;
why one library has so many versions, with both too-high and too-low causing problems;
why there are so many mirror sources that all look about the same yet differ slightly;
why a setup that worked last time fails this time (admittedly, maybe I forgot a step).

I'm sure the same problems plague plenty of people, yet every search result just says "version mismatch".
The tensorflow-gpu, CUDA, and cuDNN packages are all large; downloading them back and forth wastes time and burns patience.
The installed versions match the tutorial exactly, and the errors keep coming anyway.
Uninstall this, install that: the repetitive, tedious setup loop has driven away plenty of newcomers.

But sometimes you have to trust yourself.
Today's problem taught me that failing to reach the GPU is not necessarily a version mismatch between tensorflow-gpu, CUDA, and cuDNN;
even with matching versions, the GPU may still be unreachable.
Today's lesson: a package with the same version number can still differ between sources!
       

The method

I won't repeat the tutorials from the top-voted blogs; the basic steps are much the same everywhere. Instead, here is a method I verified just today.
       
A heads-up: this method requires an already-working tensorflow-gpu environment to compare against (it doesn't matter whether it lives on this machine, as long as you can run commands in it).
       
First, a quick summary of my final tensorflow-gpu configuration; everything below was done inside a conda virtual environment:
       

Package          Version    Installed via   Source
python           3.6.12     conda           conda-forge
tensorflow-gpu   1.13.1     pip             https://pypi.tuna.tsinghua.edu.cn/simple
cudatoolkit      10.0.130   conda           https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn            7.6.5      conda           https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

       
Here is the method this post is actually about:
Use `conda list` to see every installed package and the channel it came from.

(your env) *:\**\****>conda list
# packages in environment at *:\**\***:
#
# Name                    Version                   Build  Channel
absl-py                   0.11.0                   pypi_0    pypi
appdirs                   1.4.4              pyh9f0ad1d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
audioread                 2.1.9            py36ha15d459_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
brotlipy                  0.7.0           py36hc753bc4_1001    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ca-certificates           2020.12.5            h5b45459_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cached-property           1.5.2                    pypi_0    pypi

******

Run the same command in the previously working tensorflow-gpu environment and compare the Channel column carefully; then install each package the new environment needs from the corresponding channel, one by one. (I hereby protest that the same version number can have different contents on different mirrors; then again, maybe I'm just too green to understand why.)
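The comparison step above can be sketched as a small script. This is just a minimal sketch: it assumes you have saved the plain `conda list` output of each environment, and that lines follow the usual four-column layout (name, version, build, channel); the sample strings below are made up for the demo.

```python
def parse_conda_list(text):
    """Map package name -> channel from `conda list` output.

    Lines look like: name  version  build  channel
    (a missing channel column is treated as the `defaults` channel).
    """
    channels = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment headers
        fields = line.split()
        channels[fields[0]] = fields[3] if len(fields) >= 4 else "defaults"
    return channels


def channel_mismatches(good_env, bad_env):
    """Packages present in both envs but installed from different channels."""
    return {
        name: (good_env[name], bad_env[name])
        for name in good_env.keys() & bad_env.keys()
        if good_env[name] != bad_env[name]
    }


# Tiny demo with two fake `conda list` excerpts:
good = parse_conda_list("""\
# Name  Version  Build   Channel
numpy   1.19.5   py36h0  conda-forge
absl-py 0.11.0   pypi_0  pypi
""")
bad = parse_conda_list("""\
numpy   1.19.5   py36h0  pkgs/main
absl-py 0.11.0   pypi_0  pypi
""")
print(channel_mismatches(good, bad))  # -> {'numpy': ('conda-forge', 'pkgs/main')}
```

In practice you would feed it the saved output of `conda list` from both environments and reinstall every package it flags.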
Once the channels match, test GPU availability again:

>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
2021-01-12 21:03:38.047493: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-01-12 21:03:38.216586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2021-01-12 21:03:38.228614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-01-12 21:03:38.848614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-12 21:03:38.855040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2021-01-12 21:03:38.859936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2021-01-12 21:03:38.864840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 4716 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
>>> exit()

Nothing makes you happier than seeing that True!!!
That's it for this trick. The sections below are the last-resort approaches I fall back on when nothing else works. Friends, keep a sharp eye on both versions and channels!

       

Other approaches

Cloning a conda environment

If you already have a working tensorflow-gpu environment, you can of course just clone it with conda; the method above was merely me stubbornly refusing to give up.

Suppose you have a working environment A-env and want to clone it into a new environment B-env:

    conda create -n B-env --clone A-env

Reportedly you can also point --clone at a path, even if the environment isn't registered on this machine (I haven't tried this):

    conda create -n B-env --clone <path-to-A-env>
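When the working environment sits on a different machine, another option (one I'm suggesting here, not something from the steps above; the file name is my own choice) is conda's explicit spec file. `conda list --explicit` records the exact package URLs, channel included, so the channel mismatch this post is about cannot creep back in:

```shell
# On the machine with the working environment:
conda list --explicit > spec-file.txt

# On the new machine, after copying spec-file.txt over:
conda create -n B-env --file spec-file.txt
```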

Docker

Docker remains a relatively fast and convenient way to set up the environment. I won't rehash its advantages here; this is the short version:

  1. Write a Dockerfile
	FROM  tensorflow/tensorflow:1.13.1-gpu-py3
	RUN apt-get update
  2. Build the Docker image
    docker build -t your-image-name .
     # -t : name for the image being built
     # .  : directory containing the Dockerfile (an absolute path to the Dockerfile also works)
  3. Create and run a container
	docker run -t -i your-image-name  /bin/bash
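One caveat to the steps above: for the container to see the GPU at all, the host needs the NVIDIA driver plus an NVIDIA container runtime (nvidia-docker2 in the TF 1.x era, or the newer NVIDIA Container Toolkit). A sketch, reusing the placeholder image name from above:

```shell
# With nvidia-docker2 (Docker < 19.03):
docker run --runtime=nvidia -t -i your-image-name /bin/bash

# With Docker 19.03+ and the NVIDIA Container Toolkit:
docker run --gpus all -t -i your-image-name /bin/bash
```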

That's it for the Docker example; for anything more, learn Docker from more specialized and detailed resources.

       
Discussion and corrections from the experts are welcome!
