Walking through the CNN pipeline and explaining the parameters
Learning Torch and Python side by side
Deep Learning with Torch
The handwritten-digit network is a simple feed-forward network: it takes an input, passes it through the layers one by one, and finally produces an output.
require 'nn'; -- use the nn package in Torch to build neural networks
net = nn.Sequential() -- Sequential is a container; more on this below
net:add(nn.SpatialConvolution(1, 6, 5, 5)) -- 1 input image channel, 6 output channels, 5x5 convolution kernel
Convolution: 1 input channel, 6 output channels, a 5x5 kernel.
net:add(nn.ReLU()) -- non-linearity
ReLU is a non-linear activation function. Without a non-linearity, you can verify that each layer's output is just a linear function of its input, so no matter how many layers you stack the whole network is still linear; that is essentially the original perceptron, which is not very useful.
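A quick numpy check of that claim (made-up shapes and values, just to show that two stacked linear layers collapse into one):
import numpy as np
x = np.random.rand(4)          # input
W1 = np.random.rand(3, 4)      # first layer weights
W2 = np.random.rand(2, 3)      # second layer weights
two_layers = W2 @ (W1 @ x)     # layer2(layer1(x)) with no activation in between
one_layer = (W2 @ W1) @ x      # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: the extra depth added nothing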
net:add(nn.SpatialMaxPooling(2,2,2,2)) -- A max-pooling operation that looks at 2x2 windows and finds the max.
The pooling layer shrinks the size of the output and helps reduce overfitting.
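For example, 2x2 max pooling over a toy 4x4 feature map (made-up numbers) keeps only the maximum of each window and halves each spatial dimension:
import numpy as np
fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 2]], dtype=float)
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 block
print(pooled)  # [[6. 2.]
               #  [2. 8.]]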
net:add(nn.SpatialConvolution(6, 16, 5, 5))
Convolution: 6 input channels, 16 output channels, a 5x5 kernel.
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5)) -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
Reshape the 3D tensor computed above into a 1D vector so the fully connected layers below can consume it.
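Where does 16*5*5 come from? Assuming a 1x32x32 input (as used later in this tutorial) and no padding, the spatial size shrinks like this:
size = 32
size = size - 5 + 1   # 5x5 conv, no padding: 32 -> 28
size = size // 2      # 2x2 max pool, stride 2: 28 -> 14
size = size - 5 + 1   # 5x5 conv: 14 -> 10
size = size // 2      # 2x2 max pool: 10 -> 5
print(16 * size * size)  # 400 = 16*5*5, exactly the length nn.View expects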
net:add(nn.Linear(16*5*5, 120)) -- fully connected layer (matrix multiplication between input and weights)
Fully connected layer: a matrix multiplication between the input and the weights.
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 10)) -- 10 is the number of outputs of the network (in this case, 10 digits)
net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems
Converts the output to log-probabilities.
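LogSoftMax turns raw scores into log-probabilities; a rough numpy equivalent with illustrative values (not Torch's actual implementation):
import numpy as np
scores = np.array([2.0, 1.0, 0.1])                  # raw outputs of the last linear layer
log_probs = scores - np.log(np.exp(scores).sum())   # log softmax
print(log_probs)                 # log-probabilities, all <= 0
print(np.exp(log_probs).sum())   # exp turns them back into probabilities summing to 1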
print('Lenet5\n' .. net:__tostring());
There are other kinds of containers besides Sequential.
Every nn module supports automatic differentiation through two functions:
- forward(input): computes the output for a given input, flowing the input through the network.
- backward(input, gradient): differentiates each neuron in the network with respect to the gradient that is passed in.
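As a rough analogy in plain Python (a sketch, not Torch code), here is what a module's forward/backward pair looks like for ReLU:
import numpy as np
class ReLU:
    def forward(self, x):                   # forward(input): compute the output
        return np.maximum(x, 0.0)
    def backward(self, x, gradOutput):      # backward(input, gradient): chain the incoming gradient through
        return gradOutput * (x > 0)         # dL/dx = dL/dy * dy/dx
relu = ReLU()
x = np.array([-1.0, 2.0, 3.0])
print(relu.forward(x))                  # [0. 2. 3.]
print(relu.backward(x, np.ones(3)))     # [0. 1. 1.]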
Fully connected layers: by this point the features are highly distilled, convenient to hand to the final classifier or regressor.
Just get the rough idea here~
After all these operations the amount of data is much smaller.
input = torch.rand(1,32,32) -- pass a random tensor as input to the network
Randomly initialize an input.
output = net:forward(input)
Run the forward pass.
print(output)
net:zeroGradParameters() -- zero the internal gradient buffers of the network (will come to this later)
gradInput = net:backward(input, torch.rand(10))
print(#gradInput)
criterion = nn.ClassNLLCriterion() -- a negative log-likelihood criterion for multi-class classification
Define the loss function.
criterion:forward(output, 3) -- let's say the groundtruth was class number: 3
gradients = criterion:backward(output, 3)
gradInput = net:backward(input, gradients)
The loss function
When you want a model to learn to do something, you give it feedback on how well it is doing. The function that computes this objective measure of the model's performance is called the loss function.
A typical loss function takes the model's output and the ground truth and computes a value that quantifies the model's performance.
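For instance, ClassNLLCriterion used above takes log-probabilities and the true class index; a small numpy sketch of what it computes, with made-up numbers:
import numpy as np
log_probs = np.log(np.array([0.7, 0.2, 0.1]))  # network output: log-probabilities over 3 classes
target = 3                                      # suppose the ground truth is class 3 (1-based, as in Torch)
loss = -log_probs[target - 1]                   # negative log-likelihood of the true class
print(loss)                                     # ~2.30; the loss is small only when the true class gets high probability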
Example
We now have 5 steps left to do in training our first torch neural network
- Load and normalize data
- Define Neural Network
- Define Loss function
- Train network on training data
- Test network on test data.
Load and normalize data
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}
print(trainset)
print(#trainset.data)
The data has been prepared beforehand as 50000x3x32x32 (training) and 10000x3x32x32 (test).
itorch.image(trainset.data[100]) -- display the 100-th image in dataset
print(classes[trainset.label[100]])
Display the 100th image to take a look.
-- ignore setmetatable for now, it is a feature beyond the scope of this tutorial. It sets the index operator.
setmetatable(trainset,
    {__index = function(t, i)
                    return {t.data[i], t.label[i]}
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.
function trainset:size()
    return self.data:size(1)
end
Set up indexing into the dataset.
print(trainset:size()) -- just to test
print(trainset[33]) -- load sample number 33.
itorch.image(trainset[33][1])
Quick test.
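For readers more comfortable with Python, the setmetatable trick above is roughly what a class with __getitem__ and __len__ does (a loose analogy, not part of the tutorial):
class Dataset:
    def __init__(self, data, labels):
        self.data, self.labels = data, labels
    def __getitem__(self, i):            # dataset[i] -> (image, label), like trainset[33]
        return self.data[i], self.labels[i]
    def __len__(self):                   # len(dataset), like trainset:size()
        return len(self.data)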
redChannel = trainset.data[{ {}, {1}, {}, {} }] -- this picks {all images, 1st channel, all vertical pixels, all horizontal pixels}
{all images, the 1st channel, all vertical pixels, all horizontal pixels}
print(#redChannel)
mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
print('Channel ' .. i .. ', Mean: ' .. mean[i])
trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
Normalize the data.
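The same per-channel normalization written in numpy, as a sketch (the random array is just a stand-in for trainset.data, shaped N x 3 x 32 x 32):
import numpy as np
data = np.random.rand(100, 3, 32, 32)              # stand-in for trainset.data
mean = data.mean(axis=(0, 2, 3), keepdims=True)    # per-channel mean
stdv = data.std(axis=(0, 2, 3), keepdims=True)     # per-channel standard deviation
data = (data - mean) / stdv                        # each channel now has mean ~0 and std ~1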
Time to define our neural network
net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2)) -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5)) -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120)) -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 10)) -- 10 is the number of outputs of the network (in this case, the 10 CIFAR-10 classes)
net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems
Let us define the Loss function
criterion = nn.ClassNLLCriterion()
Define the loss function.
Train the neural network
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5 -- just do 5 epochs of training.
Stochastic gradient descent.
trainer:train(trainset)
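Under the hood, each StochasticGradient step boils down to the plain gradient-descent update; a minimal sketch in numpy (illustrative names, not Torch internals):
import numpy as np
learning_rate = 0.001              # same value as trainer.learningRate above
w = np.random.rand(10)             # some parameter tensor
grad_w = np.random.rand(10)        # its gradient from the backward pass
w -= learning_rate * grad_w        # one SGD step; repeated over the samples for maxIteration epochs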
Test the network, print accuracy
print(classes[testset.label[100]])
itorch.image(testset.data[100])
First, display an image.
testset.data = testset.data:double() -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
testset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
testset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
Normalize the data, using the training-set mean and std.
-- for fun, print the mean and standard-deviation of example-100
horse = testset.data[100]
print(horse:mean(), horse:std())
print(classes[testset.label[100]])
itorch.image(testset.data[100])
predicted = net:forward(testset.data[100])
Let's see what our trained network thinks the image above is.
-- the output of the network is Log-Probabilities. To convert them to probabilities, you have to take e^x
print(predicted:exp())
for i=1,predicted:size(1) do
print(classes[i], predicted[i])
end
Given an image, the network assigns a probability to each class.
correct = 0
for i=1,10000 do
local groundtruth = testset.label[i]
local prediction = net:forward(testset.data[i])
local confidences, indices = torch.sort(prediction, true) -- true means sort in descending order
if groundtruth == indices[1] then
correct = correct + 1
end
end
print(correct, 100*correct/10000 .. ' % ')
So how many of them are correct?
class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
local groundtruth = testset.label[i]
local prediction = net:forward(testset.data[i])
local confidences, indices = torch.sort(prediction, true) -- true means sort in descending order
if groundtruth == indices[1] then
class_performance[groundtruth] = class_performance[groundtruth] + 1
end
end
for i=1,#classes do
print(classes[i], 100*class_performance[i]/1000 .. ' %')
end
Which classes did the network do well on, and which did it do poorly on?
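The same accuracy bookkeeping can be written compactly in numpy; a sketch with made-up prediction and label arrays (values 1..10, matching Torch's 1-based class labels):
import numpy as np
preds = np.random.randint(1, 11, size=10000)     # stand-in predicted class per test image
labels = np.random.randint(1, 11, size=10000)    # stand-in ground-truth labels
overall = (preds == labels).mean() * 100
per_class = [(preds[labels == c] == c).mean() * 100 for c in range(1, 11)]
print(overall)     # with random stand-ins this hovers around 10%, i.e. chance level
print(per_class)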
Python (TensorFlow) implementation
#1 import
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
# 2 load data
mnist = input_data.read_data_sets('MNIST_data',one_hot = True)
# 3 input
imageInput = tf.placeholder(tf.float32,[None,784]) # [number of images, 28*28]
labeInput = tf.placeholder(tf.float32,[None,10]) # [number of images, 10]
# 4 data reshape
# [None,784] -> M*28*28*1, i.e. 2D -> 4D; 28*28 is width x height, 1 channel
imageInputReshape = tf.reshape(imageInput,[-1,28,28,1])
# 5 convolution; w0 is the kernel: 5*5, in: 1 channel, out: 32 channels (similar to the Torch part above)
w0 = tf.Variable(tf.truncated_normal([5,5,1,32],stddev = 0.1)) # weights, drawn from a truncated normal distribution with stddev 0.1
b0 = tf.Variable(tf.constant(0.1,shape=[32])) # bias b, a constant added to the result of convolving the input with the weights; its shape must match the last dimension (32)
# 6 layer1: convolution + activation
# imageInputReshape : M*28*28*1 w0:5,5,1,32
layer1 = tf.nn.relu(tf.nn.conv2d(imageInputReshape,w0,strides=[1,1,1,1],padding='SAME')+b0) # arguments: input image data, weights, strides, padding ('SAME' pads so the kernel can reach the image border)
# M*28*28*32
# pooling (downsampling): the data shrinks a lot, M*28*28*32 => M*7*7*32
layer1_pool = tf.nn.max_pool(layer1,ksize=[1,4,4,1],strides=[1,4,4,1],padding='SAME')
# [1 2 3 4] -> [4]: the data is large, so downsample; each spatial dimension shrinks by a factor of 4
# 7 layer2 (output): multiply-add + activation, then softmax(multiply-add)
# [7*7*32,1024]
w1 = tf.Variable(tf.truncated_normal([7*7*32,1024],stddev=0.1))
b1 = tf.Variable(tf.constant(0.1,shape=[1024]))
h_reshape = tf.reshape(layer1_pool,[-1,7*7*32])# M*7*7*32 -> N*N1
# [N*7*7*32] [7*7*32,1024] = N*1024
h1 = tf.nn.relu(tf.matmul(h_reshape,w1)+b1)
# 7.1 softMax
w2 = tf.Variable(tf.truncated_normal([1024,10],stddev=0.1))
b2 = tf.Variable(tf.constant(0.1,shape=[10]))
pred = tf.nn.softmax(tf.matmul(h1,w2)+b2)# N*1024 1024*10 = N*10
# N*10 (probabilities), e.g. [0.1 0.2 0.4 0.1 0.2 ...]
# label (one-hot), e.g. [0 0 0 0 1 0 0 0 ...]
loss0 = labeInput*tf.log(pred) # per-element label * log(prediction); summed over classes and averaged over the batch below
loss1 = 0
# 7.2
for m in range(0,500): # loop over the 500 samples in a batch
    for n in range(0,10): # loop over the 10 classes
        loss1 = loss1 - loss0[m,n]
loss = loss1/500
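# (Assumed alternative, not in the original:) the double loop above adds one graph op per element.
# With a batch of 500, the same mean cross-entropy can be written in one vectorized line:
#   loss = -tf.reduce_mean(tf.reduce_sum(labeInput * tf.log(pred), axis=1))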
# 8 train 就是让损失函数最小
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# 9 run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        images,labels = mnist.train.next_batch(500)
        sess.run(train,feed_dict={imageInput:images,labeInput:labels})
        pred_test = sess.run(pred,feed_dict={imageInput:mnist.test.images,labeInput:labels})
        acc = tf.equal(tf.arg_max(pred_test,1),tf.arg_max(mnist.test.labels,1))
        acc_float = tf.reduce_mean(tf.cast(acc,tf.float32))
        acc_result = sess.run(acc_float,feed_dict={imageInput:mnist.test.images,labeInput:mnist.test.labels})
        print(acc_result)
Emmmmm, I only half understand all this~
Someone who just copies things over never really understands them; you only truly understand after doing it yourself.
I'll revisit this later; reviewing the old is how you learn the new.