Contents
1. Introduction
2. Depthwise Separable Convolution
3. Network Architecture and Training
4. Width Multiplier and Resolution Multiplier
5. Experimental Results
6. Code Experiment 1: A Cats-vs-Dogs Classifier via Feature Extraction from a Pretrained Model
7. Code Experiment 2: A CIFAR-10 Classifier
8. Summary
1. Introduction
MobileNet was proposed by Google in 2017 [1], mainly to address the computational burden of running convolutional neural networks on mobile and embedded platforms. On one side, researchers keep training ever-larger networks; on the other, mobile and embedded deployment demands lightweight networks with no serious loss of accuracy: as the saying goes, we want the horse to run fast without feeding it.
The version suffix V1 was added retroactively, after V2 and V3 came out, to tell the versions apart.
The key ideas of MobileNet V1 are covered in the following sections.
2. Depthwise Separable Convolution
This is commonly rendered in Chinese as 深度可分离卷积, a translation that loses the meaning of the suffix -wise in "depthwise": -wise indicates that the operation is carried out separately for each slice. "Depth" here refers to the channels of the input tensor; a standard convolution layer convolves all input channels together and merges them (by summation) across channels, so each filter produces a single-channel output.
A depthwise separable convolution instead splits the standard convolution layer into two steps (a cost comparison follows right after this list):
- First, each channel is convolved independently; this is called the depthwise convolution.
- Then a 1×1 standard convolution merges the channels; this is called the pointwise convolution.
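The saving from this factorization can be quantified; the cost analysis below is the one from the paper [1]. For a D_K × D_K kernel, M input channels, N output channels and a D_F × D_F output feature map:
standard convolution cost:  D_K · D_K · M · N · D_F · D_F
depthwise separable cost:   D_K · D_K · M · D_F · D_F + M · N · D_F · D_F
ratio:                      1/N + 1/(D_K · D_K)
With 3×3 kernels this works out to roughly 8 to 9 times less computation.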
A standard convolution layer and a depthwise separable convolution layer are compared in the figure below:
For a more detailed explanation of depthwise separable convolution and its advantages over standard convolution, see [2], A brief introduction to Depthwise Separable Convolution.
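As a minimal sketch of the factorization itself (my own illustration, using standard Keras layers), the parameter counts make the saving tangible:
import tensorflow as tf
from tensorflow.keras import layers

M, N = 32, 64                       # input / output channels
inp = tf.keras.Input(shape=(112, 112, M))
# Standard 3x3 convolution: 3*3*M*N + N = 18,496 parameters
std = layers.Conv2D(N, 3, padding='same')(inp)
# Depthwise separable version of the same mapping:
dw = layers.DepthwiseConv2D(3, padding='same')(inp)   # 3*3*M + M   =   320 parameters
pw = layers.Conv2D(N, 1)(dw)                          # 1*1*M*N + N = 2,112 parameters
tf.keras.Model(inp, [std, pw]).summary()              # ~7.6x fewer parameters for dw+pw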
3. Network Architecture and Training
The MobileNet architecture is shown in the table below.
Here, s1 and s2 denote stride=1 and stride=2. In other words, MobileNet does not use MaxPooling the way conventional convolutional networks do; it shrinks W and H with stride-2 convolutions instead. It even applies average pooling just before the final fully connected layer, which is also uncommon in classical convolutional networks.
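As a quick shape check (my own sketch), a stride-2 depthwise convolution halves W and H exactly as a pooling layer would:
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([1, 224, 224, 32])
y = layers.DepthwiseConv2D(3, strides=2, padding='same')(x)
print(y.shape)   # (1, 112, 112, 32) -- same spatial reduction as MaxPooling2D(2)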
Other blog posts mention one more difference: MobileNet uses ReLU6, an activation clipped at 6, which reportedly "is more robust under low-precision computation". I could not find that statement in the V1 paper; as far as I can tell it actually comes from the MobileNetV2 paper (Sandler et al., 2018), which motivates ReLU6 by exactly this robustness to low-precision computation.
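For reference, ReLU6 is simply min(max(x, 0), 6), available directly in TensorFlow (in layer form, layers.ReLU(max_value=6.0)). A quick check:
import tensorflow as tf
print(tf.nn.relu6(tf.constant([-3.0, 3.0, 9.0])))   # tf.Tensor([0. 3. 6.], ...)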
MobileNet was implemented in TensorFlow and trained with RMSprop, using asynchronous gradient descent similar to Inception V3. Compared with training large models, MobileNet's training differs in the following ways:
- Less regularization and data augmentation are used, because small models are inherently less prone to overfitting.
- No side heads or label smoothing are used, and the amount of image distortion is reduced by limiting the size of the small crops used in large-Inception training. In plain terms: the auxiliary-classifier "side heads" and the label-smoothing regularization from the Inception training recipe are dropped, and the random-crop augmentation is made milder, again because a small model needs less regularization.
- Little or no weight decay (L2 regularization) is put on the depthwise convolution layers, since they have very few parameters to begin with (a sketch of this follows the list).
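A hedged sketch of what per-layer weight decay could look like in Keras (my own illustration; the 4e-5 coefficient is an assumption, not a value from the paper):
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def separable_block(x, filters, weight_decay=4e-5):   # coefficient is illustrative only
    # Little/no L2 on the depthwise filters: they have very few parameters
    x = layers.DepthwiseConv2D(3, padding='same')(x)
    x = layers.ReLU(6.0)(x)                           # BatchNorm omitted for brevity
    # L2 weight decay only on the parameter-heavy 1x1 pointwise convolution
    x = layers.Conv2D(filters, 1,
                      kernel_regularizer=regularizers.l2(weight_decay))(x)
    return layers.ReLU(6.0)(x)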
4. Width Multiplier and Resolution Multiplier
To adapt more flexibly to different application requirements, MobileNet also provides two multiplicative hyperparameters for generating reduced MobileNets.
The width multiplier (denoted α in the paper) thins the network uniformly at every layer: for a given layer, the number of input channels M becomes αM and the number of output channels N becomes αN. α = 1 corresponds to the baseline MobileNet; typical settings are α = 1, 0.75, 0.5 and 0.25. Note that α scales the internal feature maps, not the 3-channel input image, so α = 0.75 simply means every layer carries 75% of the baseline channel count. The name also makes sense once you recall that in CNN parlance "width" conventionally refers to the number of channels per layer, while "depth" refers to the number of layers.
The resolution multiplier (denoted ρ) reduces the resolution of the input image and, with it, of every internal representation. In practice it is set implicitly by the input image size: 224 (ρ = 1), 192, 160 or 128. Both knobs can be combined, as the sketch below shows.
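In the Keras implementation (a sketch of mine using the real tf.keras.applications API), alpha is the width multiplier, while the resolution multiplier is applied by simply choosing a smaller input_shape:
import tensorflow as tf

# alpha=0.5 halves the channel count of every layer;
# input_shape=(128, 128, 3) corresponds to rho = 128/224.
reduced = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3),
    alpha=0.5,
    include_top=False,
    weights=None)   # ImageNet weights exist only for certain alpha/size combinations
print(reduced.count_params())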
5. Experimental Results
The paper presents a range of comparative results, all of which, unsurprisingly, demonstrate how effective MobileNet is ^-^. The detailed numbers are omitted here.
6. Code Experiment 1: A Cats-vs-Dogs Classifier via Feature Extraction from a Pretrained Model
This experiment follows the recipe of my earlier post on training a small-dataset image classifier via feature extraction from a pretrained model: features are extracted with the MobileNet pretrained model built into TensorFlow, and a cats-vs-dogs classifier is trained on top. Experiment environment: Win10 + Anaconda + Jupyter Notebook + TF2.5.
6.1 Model Loading and Construction
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import utils
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
print(tf.__version__)
INPUT_WIDTH = 160
INPUT_HEIGHT = 160
N_CHANNELS = 3
N_CLASSES = 2
# 1. Load the MobileNet backbone with its ImageNet-pretrained weights
mobilenet_base = tf.keras.applications.MobileNet(
    input_shape=[INPUT_WIDTH, INPUT_HEIGHT, N_CHANNELS],
    # Remove the fully-connected classification head at the top of the network.
    # Unless you have exactly the same labels as the original task,
    # you should remove it.
    include_top=False,
    # Use the pretrained ImageNet weights: this base serves as the feature extractor
    weights="imagenet")
# 2. Define the full architecture by adding a classification head.
# For this example, I chose to flatten the results and use a single Dense layer.
model = tf.keras.Sequential()
model.add(mobilenet_base)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(N_CLASSES))
model.summary()
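One caveat worth adding: the Sequential model above still contains the pretrained base as a trainable layer. That is harmless here, because the experiment below only calls mobilenet_base.predict() to pre-compute features; but if you ever train such a model end to end while keeping the ImageNet features fixed, freeze the base first:
mobilenet_base.trainable = False   # freeze the pretrained backbone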
6.2 Dataset Generation and Feature Extraction
For details, see the earlier post on feature extraction with a pretrained model mentioned in Section 6.
# Data generators
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
batch_size = 32
# Raw strings so the backslashes in the Windows paths are not treated as escapes
train_dir = os.path.join(r'F:\DL\cats_vs_dogs_small', 'train')
test_dir = os.path.join(r'F:\DL\cats_vs_dogs_small', 'test')
train_datagen = ImageDataGenerator(rescale=1./255,validation_split=0.3)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
directory=train_dir,
target_size=(INPUT_WIDTH, INPUT_HEIGHT),
color_mode="rgb",
batch_size=batch_size,
class_mode="binary",
subset='training',
shuffle=True,
seed=42
)
valid_generator = train_datagen.flow_from_directory(
directory=train_dir,
target_size=(INPUT_WIDTH, INPUT_HEIGHT),
color_mode="rgb",
batch_size=batch_size,
class_mode="binary",
subset='validation',
shuffle=True,
seed=42
)
test_generator = test_datagen.flow_from_directory(
directory=test_dir,
target_size=(INPUT_WIDTH, INPUT_HEIGHT),
color_mode="rgb",
batch_size=batch_size,
class_mode='binary',
shuffle=False,
seed=42
)
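Before extracting features, it is worth checking how flow_from_directory mapped the class folders to the binary labels (the folder names depend on your directory layout; cats/ and dogs/ below are my assumption):
print(train_generator.class_indices)   # e.g. {'cats': 0, 'dogs': 1}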
import numpy as np
def get_features_and_labels(dataGenerator):
    # Run the pretrained base over exactly one epoch of the generator
    # and collect the extracted features together with the labels.
    all_features = []
    all_labels = []
    k = 0
    for images, labels in dataGenerator:
        features = mobilenet_base.predict(images)
        all_features.append(features)
        all_labels.append(labels)
        k += 1
        # The generator loops forever, so we must stop manually; this condition
        # breaks after one full epoch, keeping the final (possibly partial) batch.
        if dataGenerator.batch_size * k >= dataGenerator.n:
            break
    print('Totally, {0} batches with batch_size={1}'.format(k, dataGenerator.batch_size))
    return np.concatenate(all_features), np.concatenate(all_labels)
train_features, train_labels = get_features_and_labels(train_generator)
val_features, val_labels = get_features_and_labels(valid_generator)
test_features, test_labels = get_features_and_labels(test_generator)
print(train_features.shape,train_labels.shape)
6.3 Model Training and Testing
from tensorflow.keras import optimizers
inputs = keras.Input(shape=(5, 5, 1024)) # This shape has to be the same as the output shape of the convbase
x = layers.Flatten()(inputs)
x = layers.Dense(256,activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
optimizer=optimizers.RMSprop(learning_rate=2e-5),
metrics=["accuracy"])
callbacks = [
keras.callbacks.ModelCheckpoint(
filepath="feature_extraction.mobilenet",
save_best_only=True,
monitor="val_loss")
]
history = model.fit(
train_features, train_labels,
epochs=32,
validation_data=(val_features, val_labels),
callbacks=callbacks)
The training curves are plotted below.
6.4 Accuracy and Loss Curves
import matplotlib.pyplot as plt
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
fig,ax = plt.subplots(1,2,figsize=(12,6)) # figsize=(width, height)
ax[0].plot(epochs, accuracy, "bo", label="Training accuracy")
ax[0].plot(epochs, val_accuracy, "b", label="Validation accuracy")
ax[0].set_title("Training and validation accuracy")
ax[0].legend()
ax[1].plot(epochs, loss, "bo", label="Training loss")
ax[1].plot(epochs, val_loss, "b", label="Validation loss")
ax[1].set_title("Training and validation loss")
ax[1].legend()
Judging from these results, there is clearly some overfitting.
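Since the ModelCheckpoint callback above saved the model with the lowest validation loss, a natural follow-up (my own addition, not in the original run) is to reload that checkpoint and evaluate it on the held-out test features:
best_model = keras.models.load_model("feature_extraction.mobilenet")
test_loss, test_acc = best_model.evaluate(test_features, test_labels)
print('Test accuracy: {:.3f}'.format(test_acc))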
7. Code Experiment 2: A CIFAR-10 Classifier
In this experiment, the MobileNet architecture (minus the final fully connected layer) is trained from scratch on CIFAR-10 to see what classification performance it can reach.
7.1 Model Construction
INPUT_WIDTH = 32
INPUT_HEIGHT = 32
N_CHANNELS = 3
N_CLASSES = 10
# 1. Import the empty architecture
mobilenet_v1 = tf.keras.applications.MobileNet(
input_shape=[INPUT_WIDTH, INPUT_HEIGHT, N_CHANNELS],
# Removing the fully-connected layer at the top of the network.
# Unless you have the same number of labels as the original architecture,
# you should remove it.
include_top=False,
# Using no pretrained weights (random initialization)
weights=None)
# 2. Define the full architecture by adding a classification head.
# For this example, I chose to flatten the results and use a single Dense layer.
model = tf.keras.Sequential()
model.add(mobilenet_v1)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(N_CLASSES, activation='softmax'))
model.summary()
7.2 Loading the CIFAR-10 Dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train / 255.0
y_train = tf.keras.utils.to_categorical(y_train, N_CLASSES)
If the dataset has not been downloaded on this machine before, it will be fetched automatically, which takes a while; please be patient.
7.3 Training
# Training
model.compile(
loss='categorical_crossentropy',
optimizer='RMSprop',
metrics=['categorical_accuracy', 'Recall', 'AUC']
)
model.fit(x_train, y_train, batch_size=64, epochs=10)
model.save('mobilenet_v1_cifar10.h5')
ETA: 0s - loss: 0.4004 - categorical_accuracy: 0.8748 - recall: 0.8530 - auc:0.98
I first trained with Adam and got accuracy below 70%; then I remembered that the paper used RMSprop, and after switching, performance indeed improved substantially. Note, though, that the number in the log above is training accuracy (no validation data was passed to fit); without a baseline comparison it is hard to say how good this really is (to be completed).
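To actually measure a test-set number (a sketch I am adding; note the test data must get the same preprocessing as the training data, which the code above skipped):
x_test = x_test / 255.0
y_test = tf.keras.utils.to_categorical(y_test, N_CLASSES)
results = model.evaluate(x_test, y_test, batch_size=64)
print(results)   # [loss, categorical_accuracy, recall, auc]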
8. Summary
This post briefly introduced the basic ingredients of MobileNet and then ran two code experiments:
(1) a cats-vs-dogs classifier trained via feature extraction from a pretrained MobileNet model;
(2) a 10-class classifier trained from scratch on CIFAR-10, using only the MobileNet architecture.
References
[1] A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv:1704.04861, 2017.
[2] A brief introduction to Depthwise Separable Convolution