Python syntax
Using global variables
_______________________________________________________________________________________________
x = 0
def fun():
    global x
    x = function(x)
def fun2():
    global x
    x = function2()
_______________________________________________________________________________________________
Python function global variables? - Stack Overflow
Python2.7
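Installation in three steps (for caffe):
1. Download and install Python 2.7.
2. Add the install directory to the system variables.
3. Add the pip2.7 directory (install/scripts) to the environment variables.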
Cython
Unable to find vcvarsall.bat
About error: Unable to find vcvarsall.bat - CSDN blog
Note: you need to edit D:\anaconda3\Lib\distutils\_msvccompiler.py and set vcruntime=None.
Tensorflow
Installation
Toolchain: tensorflow 1.1.0, CUDA 8.0, cuDNN 5.1
Q: make fails with /usr/bin/ld: cannot find -lcudart
S: Change the order of the compile command (follow what pointnet++ does).
Q:/home/douxiao/anaconda3/bin/../lib/libstdc++.so.6: version `GLIBCXX/cxxabi_3.4.21' not found
Check with: strings /home/maoyingjun/anaconda3/lib/libstdc++.so.6 | grep CXXABI
locate libstdc++.so.6
find / -name "libstdc++.so*" to find its location inside anaconda
Copying the library over fixes it.
2020-7/7
Python version >= 3.6 required
Installing into tensorflow-gpu35 fails with "futures only for python2", and installing into tensorflow-gpu27 fails with "futures only for python3". Install futures first, pinned to a matching version (for tf-gpu35, futures=3.1.1).
1. pip install numpy==1.16.5
2. pip install tensorflow==1.12.0 numpy==1.16.5 (pin numpy again)
Markdown=2.6.8 works with tf=1.12.0.
2020/9/3
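import tensorflow as tf makes python exit
1. Changing the tf version had no effect;
2. A numpy dtypes message appears at the same time;
3. Debugging got as far as platform and could not continue;
4. Reinstalling conda and PyCharm fixed it.
Another solution:
"python has stopped working" when running import tensorflow as tf - CSDN blog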
Getting the available GPUs
https://zhuanlan.zhihu.com/p/124190212
Remember to exit promptly, because it occupies all of the GPUs.
NaN values
Problems caused by the gradient of tf.sqrt being NaN
The NaN problem in tensorflow - CSDN blog
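Why the NaN appears can be sketched without tensorflow. The derivative of sqrt(x) is 0.5 / sqrt(x), which is infinite at x = 0, and an infinite local gradient multiplied by a zero upstream gradient gives NaN. The helper `sqrt_grad` and the `eps` value are illustrative assumptions, not tensorflow API; in a graph the analogous workaround would be along the lines of tf.sqrt(x + eps).

```python
import math


def sqrt_grad(x, eps=0.0):
    """Analytic derivative of sqrt(x + eps) with respect to x."""
    root = math.sqrt(x + eps)
    if root == 0.0:
        return math.inf  # 0.5 / 0: infinite slope at the origin
    return 0.5 / root


upstream = 0.0  # e.g. an element whose incoming gradient is zero

unsafe = upstream * sqrt_grad(0.0)           # 0 * inf
safe = upstream * sqrt_grad(0.0, eps=1e-12)  # 0 * finite value

print(math.isnan(unsafe), safe)
```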
Loading a model
tf.train.latest_checkpoint(checkpoint_dir)
tf.train.get_checkpoint_state(checkpoint_dir)
Loading only part of the weights:
Method 1.
Pick out the list of variables and pass it to the saver: saver(var_list)
Method 2.
———————————————————————————————————————
previous_variables = [var_name for var_name, _ in tf.contrib.framework.list_variables('./checkpoint-single-02652-1')]
restore_map = {variable.op.name: variable for variable in tf.global_variables()
               if variable.op.name in previous_variables}
tf.contrib.framework.init_from_checkpoint('./checkpoint-single-02652-1', restore_map)
print('not first train, use previous')
———————————————————————————————————————
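Variable problems
ValueError: Variable conv1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope
Add tf.reset_default_graph() at the very top (before the placeholders).
Attempting to use uninitialized value beta1_power
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
init=tf.global_variables_initializer()
This shows up on the first training run; restoring from a ckpt does not have the problem. Create init after the optimizer, as above, so that the variables Adam creates are included.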
Renaming tensorflow variables so they can be used elsewhere:
tensorflow_rename_variables.py
It is in the black_glass training code.
Reusing variables:
1. with tf.variable_scope(name, reuse=True):
2. with tf.variable_scope(name) as scope:
    scope.reuse_variables()
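Gradient is None when computed with respect to an unrelated variable
https://github.com/tensorflow/tensorflow/issues/5861
Learning-rate decay strategies
https://zhuanlan.zhihu.com/p/32923584
Points to watch with the learning rate: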
cur_lr = tf.train.exponential_decay(init_lr, global_step, decay_step, decay, staircase=True)
1. The value of global_step has to be passed in through feed_dict, otherwise it never changes.
print("epoch=%d,iter=%d"%(epoch,iteration),sess.run(cur_lr),my_lr)  # output stays at the initial value (4e-5)
print("epoch=%d,iter=%d"%(epoch,iteration),sess.run(cur_lr,feed_dict={global_step:train_step_}),my_lr)  # cur_lr changes
2. Do not add 1 to global_step.
global_step=global_step+1  # this turns global_step from a variable into an operation (add), so the next line stops working
print("epoch=%d,iter=%d"%( epoch,iteration),sess.run(cur_lr,feed_dict={global_step:train_step_}),my_lr)
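To check what value cur_lr should take at a given step, the documented decay formula can be reproduced in plain Python. This is a sketch of the formula only, not the tensorflow implementation; the 4e-5 initial rate comes from the notes above, while decay_steps=1000 and decay_rate=0.5 are assumed values for illustration.

```python
import math


def exponential_decay(init_lr, global_step, decay_steps, decay_rate, staircase=False):
    """lr = init_lr * decay_rate ** (global_step / decay_steps)."""
    exponent = float(global_step) / decay_steps
    if staircase:
        # staircase=True drops the fractional part, so the rate changes in steps
        exponent = math.floor(exponent)
    return init_lr * decay_rate ** exponent


for step in (0, 999, 1000, 2500):
    print(step, exponential_decay(4e-5, step, 1000, 0.5, staircase=True))
```

If the printed value never leaves the initial rate in a real run, the step actually reaching the decay op is most likely stuck at 0.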
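Versions
[Hdrnet] The tf version it is compiled against must not be higher than 1.2.0 (cuda-helper.h)
Stylegan needs tf 1.4 or later for the tf.AUTO_REUSE mode; using 1.11.0 (the CUDA version has to change; it was tf1.2.0-cuda8.0 before)
[stylegan] tf.AUTO_REUSE needs version 1.4.0 or later
Graph
Restoring a model from the meta and index files: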
-----------------------------------------------------------
import os
import tensorflow as tf
tf.reset_default_graph()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver = tf.train.import_meta_graph(os.path.join('model', 'model.meta'))
saver.restore(sess, tf.train.latest_checkpoint('model'))
for idx, xx in enumerate(tf.global_variables()):
    print(idx, xx.name)
-----------------------------------------------------------
Cannot save
Sess.save GraphDef cannot be larger than 2GB:
tf.constant values are stored as nodes in the tf graph; avoid them where possible and use tf.placeholder instead.
Deconv
The order of the input channel and output channel differs between conv2d and deconv2d
Understanding tf.nn.conv2d and tf.nn.conv2d_transpose - Jianshu
conv2d filter:
[filter_height, filter_width, in_channels, out_channels]
conv2d_transpose filter:
[height, width, output_channels, in_channels]
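The swap is easy to get wrong when building filters by hand, so here is a small sketch that returns the two shapes side by side. The function names are hypothetical helpers; in both cases `in_ch` means the channel count of the tensor fed into the op.

```python
def conv2d_filter_shape(k_h, k_w, in_ch, out_ch):
    # tf.nn.conv2d expects [height, width, in_channels, out_channels]
    return [k_h, k_w, in_ch, out_ch]


def conv2d_transpose_filter_shape(k_h, k_w, in_ch, out_ch):
    # tf.nn.conv2d_transpose expects [height, width, output_channels, in_channels]
    return [k_h, k_w, out_ch, in_ch]


print(conv2d_filter_shape(3, 3, 16, 32))
print(conv2d_transpose_filter_shape(3, 3, 16, 32))
```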
Pad
SAME / VALID padding:
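A short sketch of how the two padding modes determine the output size along one spatial dimension, following the formulas in the tensorflow documentation. The helper names are made up for illustration.

```python
import math


def same_output_size(in_size, stride):
    # SAME: the input is padded so that out = ceil(in / stride)
    return int(math.ceil(in_size / float(stride)))


def valid_output_size(in_size, kernel, stride):
    # VALID: no padding, out = ceil((in - kernel + 1) / stride)
    return int(math.ceil((in_size - kernel + 1) / float(stride)))


print(same_output_size(128, 2), valid_output_size(128, 3, 2))
```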
Keras
module 'tensorflow.python.keras.backend' has no attribute 'get_graph'
Version issue: tf 1.11.0 with keras=2.1.4 solves it
API
tf.nn.batch_normalization(
x,
mean,
variance,
offset,
scale,
variance_epsilon,
name=None
)
Miscellaneous
Inspecting model details
saver.restore(sess, checkpoint_path)
tf.global_variables()
Ways to select a GPU:
https://www.cnblogs.com/darkknightzh/p/6591923.html
1. CUDA_VISIBLE_DEVICES=1 python my_script.py
2. import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
Listing the available devices:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
config = tf.ConfigProto(allow_soft_placement=True)
Setting only allow_growth (without allow_soft_placement) gives a "no supported kernel" error
Q: Deeplab (tf): name 'basestring' is not defined
Python 3 has no basestring; define basestring = str
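One way to apply the fix without breaking Python 2 is a small compatibility shim at the top of the failing module. This is a generic sketch, not the Deeplab code itself; `is_text` is a hypothetical helper added only to show the name in use.

```python
try:
    basestring  # defined on Python 2
except NameError:
    basestring = str  # Python 3: str is the only text base type


def is_text(value):
    return isinstance(value, basestring)


print(is_text("model.ckpt"), is_text(3))
```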
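Order of initialization and parameter restoring:
Tensorflow model loading: the difference between restore and init_from_checkpoint - CSDN blog
When restoring by something other than saver.restore, run the initialization after the restore call
http://stackmirror.caup.cn/page/sksban8zfne8
Blas SGEMM launch failed
Change InteractiveSession to Session; the cause is the GPU not being released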
Error: Failed to convert object of type <class 'list'> to Tensor. Contents: [None, -1]. Consider casting elements to a supported type.
replace x.get_shape()[0] with tf.shape(x)[0]
Error: The last dimension of the inputs to `Dense` should be defined. Found `None`
The .meta file keeps growing during training:
New variables were being added inside the training loop, so the meta file grows and memory use grows with it (until it crashes)
tensor_content in a TensorProto
————————————
import tensorflow as tf
from tensorflow.python.framework import tensor_util
for n in tf.get_default_graph().as_graph_def().node:
    if n.op == 'Const':  # only Const nodes carry a 'value' tensor
        print(tensor_util.MakeNdarray(n.attr['value'].tensor))
————————————
http://www.voidcn.com/article/p-qevyyirs-bxx.html
protobuf
class DescriptorBase(metaclass=DescriptorMetaclass)
pip install protobuf==3.17.3
tflite
Quantization-aware training
deepupe_tensorflow/mobilenetv2.py at master · SystemErrorWang/deepupe_tensorflow · GitHub
build
https://www.tensorflow.org/lite/guide/build_android
mark
caffe
Model deployment:
Build problems:
Could NOT find Atlas (missing: Atlas_CLAPACK_INCLUDE_DIR
Change the BLAS option to open
CMake Error at cmake/TargetResolvePrerequesites.cmake:28 (get_filename_component):
get_filename_component called with incorrect number of arguments
Call Stack (most recent call first):
cmake/TargetResolvePrerequesites.cmake:50 (caffe_prerequisites_directories)
src/caffe/test/CMakeLists.txt:37 (caffe_copy_prerequisites)
USE_OPENCV was not enabled
tf2caffe
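Common problems:
1. The number of network output layers is wrong
2. The input mean is wrong; check this especially for 3-channel input
Training speed-up
Image read speed: 1000 images of 128*128*1, 0.003s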
logging
Basic logging usage:
A summary of the Python logging library: possibly the best summary of the logging library so far - Tencent Cloud developer community
logging usage in hdrnet:
-------------------------------------------------------------------------------
import logging
logging.basicConfig(format="[%(process)d] %(levelname)s %(filename)s:%(lineno)s | %(message)s")
log = logging.getLogger("train")
log.setLevel(logging.INFO)
log.info("Directory input {}, with {} images".format(path, len(inputs)))
-------------------------------------------------------------------------------
pytorch
Loading a pretrained resnet:
PyTorch: importing pretrained models with torchvision.models, residual network code walkthrough - CSDN blog
nn.ReLU(inplace=True) saves GPU memory
Input / output
print: %% prints a percent sign
print: %.2f keeps two decimal places
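Both format codes can be combined in one string. A small sketch with a made-up value:

```python
value = 12.3456  # illustrative number

line = "accuracy=%.2f%%" % value  # %.2f rounds to two decimals, %% emits a literal %
print(line)  # prints accuracy=12.35%
```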
pdb
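Delete a breakpoint: cl k (k is the breakpoint number)
Conditional breakpoints
Chapter 16: Development tools - pdb: interactive debugger - breakpoints - conditional breakpoints - CSDN blog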
Tensorboard:
View image during training:
mnist example:
https://www.cnblogs.com/xuyong437/p/11202047.html
python setup.py install
Open cmd
Go to the install directory
python setup.py build
python setup.py install
Running each layer
Given a known graph, sess.run each layer and compare the results
Pytorch
Installation
pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
Use this to find a suitable version
Version actually used:
pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
Replacing part of the weights
How to load part of a pretrained model in pytorch - CSDN blog
from torch.hub import load_state_dict_from_url
# model, model_urls and progress come from the surrounding code
state_dict = load_state_dict_from_url(model_urls['mobilenet_v2_tv'], progress=progress)
model_dict = model.state_dict()
# keep only the pretrained entries whose keys exist in this model
pretrained_dict = {k: v for k, v in state_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)
print("-----------------validate whether use pretrained weight------------------")
print(model.state_dict()['features.18.1.weight'])
Inspecting weights
model.state_dict()["feature.1.weight"]
N: batch;
C: channel
H: height
W: width
Caffe Blob layout: NCHW;
Tensorflow tensor layout: NHWC by default; NCHW is also supported and is faster with cuDNN;
Pytorch tensor layout: NCHW
TensorRT tensor layout: NCHW
CUDA setup: setting only the environment variable may still leave things on cuda:0 by default
How to run pyTorch on a specific GPU - CSDN blog
Q: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383
Version issue
Install CUDA 10 (the previous one was a cut version)
[inverseGAN] 'torchvision.transforms' has no attribute 'Resize'
Torchvision=0.1.9
The previous torch version was 1.0.2.post
Torchvision 0.3.0 fixes the problem; torchvision requires torch>=1.1.0
[face-parsing.PyTorch]
CUDA problems
Cuda9+torch1.0.1+torchvision0.1.9
[FUNIT]
Cuda10+torch1.1.0+torchvision0.3.0
invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383
Unresolved
undefined symbol: _ZN2at7getTypeERKNS_6TensorE
Fixed by updating to torch1.3.0+torchvision0.4.1
Error:
one of the variables needed for gradient computation has been modified by an
S: pytorch after 0.4.0 does not support this in-place operation
See https://www.cnblogs.com/liangzp/p/9207979.html
'function' object has no attribute 'Variable'
Change to: isinstance(x, torch.autograd.Variable): (one Variable was missing)
Onnx
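How to delete and add nodes (for avgpool, delete the pad layer in front of it)
opset=8
Converting the view function produces an extra Cast layer (float64 to int64); upgrading to opset 10 removes it
RuntimeError
RuntimeError: storage has wrong size: expected 0 got 32
Caused by parallel training; stopping training and copying the model again fixes it
Call model.eval() before the forward pass, otherwise the results may be wrong (different from training, i.e. unreasonable)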
PYTorch3d
Installing fvcore
fvcore in Python: introduction, installation and usage guide for the fvcore library - CSDN blog
Installing detectron
Installation — detectron2 0.6 documentation
pytorch: grad can be implicitly created only for scalar outputs
Caused by not reducing the loss with mean/sum
Simple optimization with torch:
Solving simple optimization problems with pytorch - CSDN blog
torchvision-window
pip install --no-deps torchvision
from pytorch3d import _C fails with _ZNK2at6Tensor7is_cudaEv
python - How to fix "ERROR: Failed building wheel for pytorch3d" error on Colab? - Stack Overflow
pip install from github
$ pip install git+git://github.com/aladagemre/django-notification.git@2927346f4c513a217ac8ad076e494dd1adbf70e1
The foreground color is defined in the rendering code
Changing the background color:
/opt/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/pytorch3d/renderer]$ vim blending.py
Pytorch-tensorboard
https://zhuanlan.zhihu.com/p/97585876?from_voters_page=true
GPU memory growth
loss.backward() increases memory by 2M
del loss
torch.cuda.empty_cache()
GPU memory grows in the nms function when testing yolov7 detect.py:
https://zhuanlan.zhihu.com/p/351705514
Fixed by wrapping the call in with torch.no_grad():
load_state_dict
Unexpected key(s) in state_dict: "module…"
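A common cause of this message is a checkpoint saved from a model wrapped in torch.nn.DataParallel, which prefixes every key with "module.". Assuming that is the case here (the key in the note is truncated, so this is a guess), the prefix can be stripped before calling load_state_dict. The helper name and the sample keys are hypothetical, and plain integers stand in for tensors so the sketch runs without torch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


saved = OrderedDict([("module.conv1.weight", 1), ("module.fc.bias", 2)])
print(list(strip_module_prefix(saved).keys()))
```

The cleaned dictionary would then be passed to model.load_state_dict in place of the original one.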
Inspecting model parameter values
for name,param in model.named_parameters():params[name]=param
Version problems when loading a model:
Solving PyTorch model loading problems across versions - CSDN blog
torch model training state
Note: to tell whether a torch model is in training mode, check model.training
(I defined my own is_training; do not confuse the two)
Ffmpeg
module 'ffmpeg' has no attribute 'input'
Downgrade to version 0.1.18 (from 0.2.0)
Error: module ffmpeg has no attribute input, solution - CSDN blog
pip install ffmpeg-python==0.1.18
Pytorch loss does not decrease:
[Torch] Some ideas for when tensor parameters have gradients but the weights do not update - CSDN blog
protobuf
ModuleNotFoundError: No module named 'google'
conda install protobuf (pip install protobuf does not work)
torchaudio
Using torchaudio - CSDN blog
Pandas
ImportError: cannot import name 'nosetester'
cannot import name 'nosetester' error while importing pandas - Stack Overflow
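pandas==0.19.2, numpy==1.11.0/1.11.1/1.16.0 (also works with tf)
Python classes
Class inheritance:
Class: member variables (arrays), member functions
Inheritance comes in three kinds: members that need no explicit handling (inherited by default); members that need an explicit super(c1,self).__init__(xx) call; and new member variables/functions specific to the subclass. Default inheritance versus inheritance with passed (modified) values.
super() argument 1 must be type, not classobj
In Python 2 the base class has to be declared as inheriting from object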
https://blog.csdn.net/xiongaijing/article/details/14001365
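A minimal sketch of the three cases, written so that it runs on both Python 2 and Python 3. The class and attribute names are invented for illustration.

```python
class Base(object):  # inheriting from object makes super() work on Python 2
    def __init__(self, size):
        self.size = size

    def describe(self):
        return "size=%d" % self.size


class Child(Base):
    def __init__(self, size, name):
        # explicit call to the parent constructor, passing a value through
        super(Child, self).__init__(size)
        self.name = name  # new member specific to the subclass

    # describe() is not redefined, so it is inherited by default


child = Child(3, "crop")
print(child.describe(), child.name)
```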
Opencv 2.4.13
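OpenCV read/write: isOpened fails
Pass the full path as the filename argument
The size for OpenCV video writing is (width, height), but img.shape gives (height, width); keep the order straight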
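A tiny helper makes the swap explicit. The function name is hypothetical, and a plain tuple stands in for img.shape so the sketch runs without OpenCV.

```python
def writer_size_from_shape(shape):
    """Convert an img.shape tuple (height, width[, channels]) to (width, height)."""
    height, width = shape[0], shape[1]
    return (width, height)


print(writer_size_from_shape((480, 640, 3)))
```

The returned tuple is what would be passed as the frame size when constructing the video writer.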
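Copying an image so that the original is not changed:
img_new = np.copy(img)
Note: do not use img_new = img; then a change to img changes img_new as well
Opencv fillpoly
img2_reye_mask1 = cv2.fillPoly(img2_reye_mask, [np.squeeze(convexhull_reye)], 255)
Note that the [ ] around the pts argument is required; otherwise the result shows scattered points instead of a filled region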
Crop img
CvSize size = cvSize(rect_width, rect_height);
// restrict src_img to the crop rectangle
cvSetImageROI(src_img, cvRect(rectx, recty, rect_width, rect_height));
IplImage* pDest = cvCreateImage(size, src_img->depth, src_img->nChannels);
cvCopy(src_img, pDest);
// the ROI was set on src_img, so it is reset there
cvResetImageROI(src_img);
cvSaveImage(str_face.c_str(), pDest);
if (isdebug == 1) {
    cvShowImage("dn face", pDest);
    cvWaitKey(0);
}
cvReleaseImage(&pDest);
split& merge
[OpenCV3] Splitting and merging image channels: cv::split() and cv::merge() explained - CSDN blog
CvMat class
Reading values
Printing CvMat data to the screen - CSDN blog
Assigning values:
cvfillpoly
OpenCV drawing: the polygon functions cvFillPoly, cvPolyLine, cvFillConvexPoly - CSDN blog
get mask
opencv: extracting a polygon mask region - CSDN blog
opencv font styles
How each of the opencv fonts looks - CSDN blog
Unable to stop the stream: Inappropriate ioctl for device
Triggered by cv2.VideoCapture(xx.mp4) when ffmpeg is missing; pip install ffmpeg, then pip install opencv-contrib-python
Error: Unable to stop the stream: Inappropriate ioctl for device - CSDN blog
opencv fails to open; the image read is empty
Change version: 4.4.0 -> 4.2.0
opencv349
imread is not a member of cv
Include the header: #include <opencv2/opencv.hpp>
Tensorboard
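Viewing the network structure from a trained meta file: load the model graph from the meta file, then write it to tensorboard
python - Tensorflow view graph from model.ckpt.meta file - Stack Overflow
Continuing an existing tensorboard display versus starting a new one:
1. New display
2. Continued display: the tf.summary variables have to be defined beforehand
tags and values not the same shape
matplotlib
3D surface plots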
https://www.cnblogs.com/xingshansi/p/6777945.html
File operations
'ascii' codec can't decode byte 0xbf in position 23:
Appending to a txt file
The difference between the optional modes w, w+ and a, a+ in python's open() - CSDN blog
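The difference between writing and appending can be checked with a throwaway file. The file name and contents below are arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w") as f:   # "w" creates the file, or truncates an existing one
    f.write("first\n")
with open(path, "a") as f:   # "a" keeps the existing content and writes at the end
    f.write("second\n")
with open(path) as f:
    after_append = f.read()

with open(path, "w") as f:   # opening with "w" again discards everything
    f.write("third\n")
with open(path) as f:
    after_rewrite = f.read()

print(repr(after_append), repr(after_rewrite))
```

The "+" variants (w+, a+) additionally allow reading through the same file object.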
csv read/write
python3: reading and writing csv - CSDN blog
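A minimal round trip with the standard csv module on Python 3. The column names and rows are sample data.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.csv")
rows = [["name", "score"], ["a", "1"], ["b", "2"]]

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="") as f:
    loaded = list(csv.reader(f))

print(loaded)
```

Values come back as strings, so numeric columns need an explicit conversion after reading.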
conda
Creating a conda environment
conda commands and creating a tensorflow environment - CSDN blog
Network error during conda create
Conda Install Package Error - CSDN blog (delete the defaults channel)
caffe2tf & tf2caffe
Listing the existing conda environments:
conda info -e
Cannot activate:
You may need to close and restart your shell after running 'conda init'
Fix for errors with conda activate - CSDN blog
# re-enter the virtual environment
source activate
# leave the virtual environment
conda deactivate
conda channel not found: check the corresponding folder; if the file is not there, switch to a directory that has it
conda install xxx
Error: conda Malformed version string '~': invalid character(s)
conda Malformed version string '~': invalid character(s) | 码农家园
conda sources and settings
https://zhuanlan.zhihu.com/p/87123943
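Miscellaneous
Python derived classes
1. Define the base class; the derived class redefines its functions
2. @xx: define multiple versions of a function
Multithreading
https://www.cnblogs.com/amengduo/p/9586704.html
Checking the thread count of a process: cat /proc/15016/status
Linux maximum thread limit and querying the current thread count - Linux公社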
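The thread count is the "Threads:" field of that status file. A sketch of pulling it out in Python follows; `thread_count` is a hypothetical helper, and the sample text is a shortened, made-up excerpt so the example also runs off Linux.

```python
def thread_count(status_text):
    """Return the Threads: value from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    return None


sample = "Name:\tpython\nPid:\t15016\nThreads:\t8\n"
print(thread_count(sample))
```

On a Linux machine the text would come from reading /proc/15016/status (or /proc/self/status for the current process).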
'encoding' is an invalid keyword argument for this function
Caused by Python 2.7
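On Python 2.7 the built-in open() does not accept an encoding argument, but io.open() does and behaves the same way on Python 3. A small sketch with an arbitrary file name and text:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "note.txt")

# io.open accepts encoding= on both Python 2.7 and Python 3
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9")

with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(len(text))
```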
Problems during installation
Not uninstalling numpy at /usr/lib/python2.7/dist-packages, outside environment /usr
0. numpy cannot be uninstalled
1.sudo apt remove numpy
2.pip install numpy
Csv
Reading csv files:
https://www.cnblogs.com/liangshian/p/11272155.html
Pycharm
Large files on Windows
C:\Users\user_name1\.PyCharmCE2019.2\system\caches
Pycharm font color
Settings - Editor - Color Scheme - Python - Line Comment (not inherited)
Turning off the analysis markers on the right side of Pycharm
Analysis=inspections
Creating gif animations
images.append(imageio.imread(filepath))
imageio.mimsave('Iter_res.gif', images,duration=0.3)
Problem: Imageio Pillow plugin requires Pillow, not PIL
The pillow from pip does not work; use conda install pillow
Color chart
http://www.ebaomonthly.com/window/photo/lesson/colorList.htm
Python binary data
Detailed usage of struct.pack() and struct.unpack() in Python - CSDN blog
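A short round trip with the struct module. The format string and values are chosen only for illustration: "<if" means little-endian, one 32-bit int followed by one 32-bit float.

```python
import struct

packed = struct.pack("<if", 7, 1.5)            # 4 bytes for the int + 4 for the float
number, value = struct.unpack("<if", packed)   # unpack always returns a tuple

print(len(packed), number, value)
```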
pybind
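cpp to python via a .so
Watch the data type: np.zeros(shape=[]) defaults to float64 and needs to be changed to float32
protobuf
Both caffe and tensorflow depend on this library
from google.protobuf import descriptor as _descriptor fails
Upgrading the version fixes it: tensorflow installation problem, error at import time - CSDN Q&A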
Ffmpeg
Extracting the audio: ffmpeg -i %vid_path% -vn soundfile.wav
Resampling:
ffmpeg - Convert an MP3 from 48000 to 44100 Hz? - Super User
cuda err
CUDA runtime error (59) : device-side assert triggered
Slice error
To analyze it, add CUDA_LAUNCH_BLOCKING=1