Walking through the CNN pipeline and explaining the parameters
Learning Torch and Python side by side
Deep Learning with Torch
The handwritten-digit network is a simple feed-forward network: it takes an input, passes it through the layers one by one, and finally produces an output.
require 'nn'; -- use the nn package in Torch to build neural networks
net = nn.Sequential() -- Sequential is a container; more on this below
net:add(nn.SpatialConvolution(1, 6, 5, 5)) -- 1 input image channel, 6 output channels, 5x5 convolution kernel
Convolution: 1 input channel, 6 output channels, a 5x5 kernel.
net:add(nn.ReLU()) -- non-linearity
ReLU is a non-linear activation function. Without a non-linearity, you can verify that each layer's output is just a linear function of its input, so no matter how many layers you stack the whole network is still linear; that is essentially the original perceptron, which is not very useful.
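A quick numpy check of that claim (made-up shapes and values, just to show that two stacked linear layers collapse into one):
import numpy as np
x = np.random.rand(4)          # input
W1 = np.random.rand(3, 4)      # first layer weights
W2 = np.random.rand(2, 3)      # second layer weights
two_layers = W2 @ (W1 @ x)     # layer2(layer1(x)) with no activation in between
one_layer = (W2 @ W1) @ x      # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: the extra depth added nothing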
net:add(nn.SpatialMaxPooling(2,2,2,2)) -- A max-pooling operation that looks at 2x2 windows and finds the max.
The pooling layer shrinks the size of the output and helps reduce overfitting.
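For example, 2x2 max pooling over a toy 4x4 feature map (made-up numbers) keeps only the maximum of each window and halves each spatial dimension:
import numpy as np
fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 2]], dtype=float)
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 block
print(pooled)  # [[6. 2.]
               #  [2. 8.]]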
net:add(nn.SpatialConvolution(6, 16, 5, 5))
Convolution: 6 input channels, 16 output channels, a 5x5 kernel.
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5)) -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
Reshape the 3D tensor computed above into a 1D vector so the fully connected layers below can consume it.
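Where does 16*5*5 come from? Assuming a 1x32x32 input (as used later in this tutorial) and no padding, the spatial size shrinks like this:
size = 32
size = size - 5 + 1   # 5x5 conv, no padding: 32 -> 28
size = size // 2      # 2x2 max pool, stride 2: 28 -> 14
size = size - 5 + 1   # 5x5 conv: 14 -> 10
size = size // 2      # 2x2 max pool: 10 -> 5
print(16 * size * size)  # 400 = 16*5*5, exactly the length nn.View expects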
net:add(nn.Linear(16*5*5, 120)) -- fully connected layer (matrix multiplication between input and weights)
Fully connected layer: a matrix multiplication between the input and the weights.
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 10)) -- 10 is the number of outputs of the network (in this case, 10 digits)
net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems
Converts the output to log-probabilities.
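LogSoftMax turns raw scores into log-probabilities; a rough numpy equivalent with illustrative values (not Torch's actual implementation):
import numpy as np
scores = np.array([2.0, 1.0, 0.1])                  # raw outputs of the last linear layer
log_probs = scores - np.log(np.exp(scores).sum())   # log softmax
print(log_probs)                 # log-probabilities, all <= 0
print(np.exp(log_probs).sum())   # exp turns them back into probabilities summing to 1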
print('Lenet5\n' .. net:__tostring());
There are other kinds of containers besides Sequential.
Every nn module supports automatic differentiation through two functions:
- forward(input): computes the output for a given input, flowing the input through the network.
- backward(input, gradient): differentiates each neuron in the network with respect to the gradient that is passed in.
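As a rough analogy in plain Python (a sketch, not Torch code), here is what a module's forward/backward pair looks like for ReLU:
import numpy as np
class ReLU:
    def forward(self, x):                   # forward(input): compute the output
        return np.maximum(x, 0.0)
    def backward(self, x, gradOutput):      # backward(input, gradient): chain the incoming gradient through
        return gradOutput * (x > 0)         # dL/dx = dL/dy * dy/dx
relu = ReLU()
x = np.array([-1.0, 2.0, 3.0])
print(relu.forward(x))                  # [0. 2. 3.]
print(relu.backward(x, np.ones(3)))     # [0. 1. 1.]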
Fully connected layers: by this point the features are highly distilled, convenient to hand to the final classifier or regressor.
Just get the rough idea here~
After all these operations the amount of data is much smaller.
input = torch.rand(1,32,32) -- pass a random tensor as input to the network
Randomly initialize an input.
output = net:forward(input)
Run the forward pass.
print(output)
net:zeroGradParameters() -- zero the internal gradient buffers of the network (will come to this later)
gradInput = net:backward(input, torch.rand(10))
print(#gradInput)
criterion = nn.ClassNLLCriterion() -- a negative log-likelihood criterion for multi-class classification
Define the loss function.
criterion:forward(output, 3) -- let's say the groundtruth was class number: 3
gradients = criterion:backward(output, 3)
gradInput = net:backward(input, gradients)
The loss function
When you want a model to learn to do something, you give it feedback on how well it is doing. The function that computes this objective measure of the model's performance is called the loss function.
A typical loss function takes the model's output and the ground truth and computes a value that quantifies the model's performance.
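For instance, ClassNLLCriterion used above takes log-probabilities and the true class index; a small numpy sketch of what it computes, with made-up numbers:
import numpy as np
log_probs = np.log(np.array([0.7, 0.2, 0.1]))  # network output: log-probabilities over 3 classes
target = 3                                      # suppose the ground truth is class 3 (1-based, as in Torch)
loss = -log_probs[target - 1]                   # negative log-likelihood of the true class
print(loss)                                     # ~2.30; the loss is small only when the true class gets high probability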
Example
We now have 5 steps left to do in training our first torch neural network
- Load and normalize data
- Define Neural Network
- Define Loss function
- Train network on training data
- Test network on test data.
Load and normalize data
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}
print(trainset)
print(#trainset.data)
The data has been prepared beforehand as 50000x3x32x32 (training) and 10000x3x32x32 (test).
itorch.image(trainset.data[100]) -- display the 100-th image in dataset
print(classes[trainset.label[100]])
Display the 100th image to take a look.
-- ignore setmetatable for now, it is a feature beyond the scope of this tutorial. It sets the index operator.
setmetatable(trainset,
    {__index = function(t, i)
                    return {t.data[i], t.label[i]}
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.
function trainset:size()
    return self.data:size(1)
end
Set up indexing into the dataset.
print(trainset:size()) -- just to test
print(trainset[33]) -- load sample number 33.
itorch.image(trainset[33][1])
Quick test.
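For readers more comfortable with Python, the setmetatable trick above is roughly what a class with __getitem__ and __len__ does (a loose analogy, not part of the tutorial):
class Dataset:
    def __init__(self, data, labels):
        self.data, self.labels = data, labels
    def __getitem__(self, i):            # dataset[i] -> (image, label), like trainset[33]
        return self.data[i], self.labels[i]
    def __len__(self):                   # len(dataset), like trainset:size()
        return len(self.data)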
redChannel = trainset.data[{ {}, {1}, {}, {} }] -- this picks {all images, 1st channel, all vertical pixels, all horizontal pixels}
{all images, the 1st channel, all vertical pixels, all horizontal pixels}
print(#redChannel)
mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
print('Channel ' .. i .. ', Mean: ' .. mean[i])
trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
Normalize the data.
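The same per-channel normalization written in numpy, as a sketch (the random array is just a stand-in for trainset.data, shaped N x 3 x 32 x 32):
import numpy as np
data = np.random.rand(100, 3, 32, 32)              # stand-in for trainset.data
mean = data.mean(axis=(0, 2, 3), keepdims=True)    # per-channel mean
stdv = data.std(axis=(0, 2, 3), keepdims=True)     # per-channel standard deviation
data = (data - mean) / stdv                        # each channel now has mean ~0 and std ~1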
Time to define our neural network
net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2)) -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5)) -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120)) -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 10)) -- 10 is the number of outputs of the network (in this case, the 10 CIFAR-10 classes)
net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems
Let us define the Loss function
criterion = nn.ClassNLLCriterion()
Define the loss function.
Train the neural network
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5 -- just do 5 epochs of training.
Stochastic gradient descent.
trainer:train(trainset)
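Under the hood, each StochasticGradient step boils down to the plain gradient-descent update; a minimal sketch in numpy (illustrative names, not Torch internals):
import numpy as np
learning_rate = 0.001              # same value as trainer.learningRate above
w = np.random.rand(10)             # some parameter tensor
grad_w = np.random.rand(10)        # its gradient from the backward pass
w -= learning_rate * grad_w        # one SGD step; repeated over the samples for maxIteration epochs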
Test the network, print accuracy
print(classes[testset.label[100]])
itorch.image(testset.data[100])
First, display an image.
testset.data = testset.data:double() -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
testset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
testset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
Normalize the data, using the training-set mean and std.
-- for fun, print the mean and standard-deviation of example-100
horse = testset.data[100]
print(horse:mean(), horse:std())
print(classes[testset.label[100]])
itorch.image(testset.data[100])
predicted = net:forward(testset.data[100])
Let's see what our trained network thinks the image above is.
-- the output of the network is Log-Probabilities. To convert them to probabilities, you have to take e^x
print(predicted:exp())
for i=1,predicted:size(1) do
print(classes[i], predicted[i])
end
Given an image, the network assigns a probability to each class.
correct = 0
for i=1,10000 do
local groundtruth = testset.label[i]
local prediction = net:forward(testset.data[i])
local confidences, indices = torch.sort(prediction, true) -- true means sort in descending order
if groundtruth == indices[1] then
correct = correct + 1
end
end
print(correct, 100*correct/10000 .. ' % ')
So how many of them are correct?
class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
local groundtruth = testset.label[i]
local prediction = net:forward(testset.data[i])
local confidences, indices = torch.sort(prediction, true) -- true means sort in descending order
if groundtruth == indices[1] then
class_performance[groundtruth] = class_performance[groundtruth] + 1
end
end
for i=1,#classes do
print(classes[i], 100*class_performance[i]/1000 .. ' %')
end
Which classes did the network do well on, and which did it do poorly on?
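The same accuracy bookkeeping can be written compactly in numpy; a sketch with made-up prediction and label arrays (values 1..10, matching Torch's 1-based class labels):
import numpy as np
preds = np.random.randint(1, 11, size=10000)     # stand-in predicted class per test image
labels = np.random.randint(1, 11, size=10000)    # stand-in ground-truth labels
overall = (preds == labels).mean() * 100
per_class = [(preds[labels == c] == c).mean() * 100 for c in range(1, 11)]
print(overall)     # with random stand-ins this hovers around 10%, i.e. chance level
print(per_class)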
Python (TensorFlow) implementation
#1 import
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
# 2 load data
mnist = input_data.read_data_sets('MNIST_data',one_hot = True)
# 3 input
imageInput = tf.placeholder(tf.float32,[None,784]) # [number of images, 28*28]
labeInput = tf.placeholder(tf.float32,[None,10]) # [number of images, 10]
# 4 data reshape
# [None,784] -> M*28*28*1, i.e. 2D -> 4D; 28*28 is width x height, 1 channel
imageInputReshape = tf.reshape(imageInput,[-1,28,28,1])
# 5 convolution; w0 is the kernel: 5*5, in: 1 channel, out: 32 channels (similar to the Torch part above)
w0 = tf.Variable(tf.truncated_normal([5,5,1,32],stddev = 0.1)) # weights, drawn from a truncated normal distribution with stddev 0.1
b0 = tf.Variable(tf.constant(0.1,shape=[32])) # bias b, a constant added to the result of convolving the input with the weights; its shape must match the last dimension (32)
# 6 layer1: convolution + activation
# imageInputReshape : M*28*28*1 w0:5,5,1,32
layer1 = tf.nn.relu(tf.nn.conv2d(imageInputReshape,w0,strides=[1,1,1,1],padding='SAME')+b0) # arguments: input image data, weights, strides, padding ('SAME' pads so the kernel can reach the image border)
# M*28*28*32
# pooling (downsampling): the data shrinks a lot, M*28*28*32 => M*7*7*32
layer1_pool = tf.nn.max_pool(layer1,ksize=[1,4,4,1],strides=[1,4,4,1],padding='SAME')
# [1 2 3 4] -> [4]: the data is large, so downsample; each spatial dimension shrinks by a factor of 4
# 7 layer2 (output): multiply-add + activation, then softmax(multiply-add)
# [7*7*32,1024]
w1 = tf.Variable(tf.truncated_normal([7*7*32,1024],stddev=0.1))
b1 = tf.Variable(tf.constant(0.1,shape=[1024]))
h_reshape = tf.reshape(layer1_pool,[-1,7*7*32])# M*7*7*32 -> N*N1
# [N*7*7*32] [7*7*32,1024] = N*1024
h1 = tf.nn.relu(tf.matmul(h_reshape,w1)+b1)
# 7.1 softMax
w2 = tf.Variable(tf.truncated_normal([1024,10],stddev=0.1))
b2 = tf.Variable(tf.constant(0.1,shape=[10]))
pred = tf.nn.softmax(tf.matmul(h1,w2)+b2)# N*1024 1024*10 = N*10
# N*10 (probabilities), e.g. [0.1 0.2 0.4 0.1 0.2 ...]
# label (one-hot), e.g. [0 0 0 0 1 0 0 0 ...]
loss0 = labeInput*tf.log(pred) # per-element label * log(prediction); summed over classes and averaged over the batch below
loss1 = 0
# 7.2
for m in range(0,500): # loop over the 500 samples in a batch
    for n in range(0,10): # loop over the 10 classes
        loss1 = loss1 - loss0[m,n]
loss = loss1/500
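# (Assumed alternative, not in the original:) the double loop above adds one graph op per element.
# With a batch of 500, the same mean cross-entropy can be written in one vectorized line:
#   loss = -tf.reduce_mean(tf.reduce_sum(labeInput * tf.log(pred), axis=1))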
# 8 train 就是让损失函数最小
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# 9 run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images,labels = mnist.train.next_batch(500)
        sess.run(train,feed_dict={imageInput:images,labeInput:labels})
        pred_test = sess.run(pred,feed_dict={imageInput:mnist.test.images,labeInput:labels})
        acc = tf.equal(tf.arg_max(pred_test,1),tf.arg_max(mnist.test.labels,1))
        acc_float = tf.reduce_mean(tf.cast(acc,tf.float32))
        acc_result = sess.run(acc_float,feed_dict={imageInput:mnist.test.images,labeInput:mnist.test.labels})
        print(acc_result)
Emmmmm, I only half understand all this~
Someone who just copies things over never really understands them; you only truly understand after doing it yourself.
I'll revisit this later; reviewing the old is how you learn the new.