Dissecting a TensorFlow project: the residual network (ResNet) structure.
resnet_model
Call hierarchy:
1. official/resnet/imagenet_main.py:
The ImagenetModel() class inherits from the Model() class in official/resnet/resnet_model.py and calls its __init__ constructor with the following parameters:
super(ImagenetModel, self).__init__(
    resnet_size=resnet_size,
    bottleneck=bottleneck,
    num_classes=num_classes,
    num_filters=64,
    kernel_size=7,
    conv_stride=2,
    first_pool_size=3,
    first_pool_stride=2,
    second_pool_size=7,
    second_pool_stride=1,
    block_sizes=_get_block_sizes(resnet_size),
    block_strides=[1, 2, 2, 2],
    final_size=final_size,
    version=version,
    data_format=data_format,
    dtype=dtype
)
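Here block_sizes comes from _get_block_sizes(resnet_size), which maps the requested depth to the number of blocks in each of the four groups. In the official repo it is roughly the following lookup table (a sketch; see _get_block_sizes in imagenet_main.py, whose exact error message may differ):

def _get_block_sizes(resnet_size):
  """Maps a ResNet depth to the block count of each of its 4 block_layers."""
  choices = {
      18: [2, 2, 2, 2],
      34: [3, 4, 6, 3],
      50: [3, 4, 6, 3],
      101: [3, 4, 23, 3],
      152: [3, 8, 36, 3],
      200: [3, 24, 36, 3],
  }
  try:
    return choices[resnet_size]
  except KeyError:
    raise ValueError(
        'Could not find block sizes for resnet_size={}'.format(resnet_size))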
2. The Model() class in official/resnet/resnet_model.py:
It defines the special method __call__(self, inputs, training), so an instance of the class can be called directly like a function.
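For example, a minimal usage sketch (hypothetical; the placeholder name `images` and the exact constructor arguments are illustrative, not taken from the source):

import tensorflow as tf

# Hypothetical input batch in NHWC layout.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
model = ImagenetModel(resnet_size=50, num_classes=1001)
logits = model(images, training=True)  # calling the instance runs __call__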
3. The __call__() method of the Model() class:
- 3.1 The entire model is built inside a variable_scope named 'resnet_model':
with self._model_variable_scope():
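In the official repo this helper is roughly the following (a sketch; version-dependent). Besides naming the scope, it installs a custom getter so that even fp16 models create their variables in fp32:

def _model_variable_scope(self):
  """Returns a variable scope that the model's ops are created under."""
  return tf.variable_scope('resnet_model',
                           custom_getter=self._custom_dtype_getter)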
- 3.2 The first 7x7 convolution layer and the pooling layer. As described in the residual network paper, the model starts with a 7x7 convolution followed by a pooling layer, as shown in the figure. The following code implements the first convolution layer:
inputs = conv2d_fixed_padding(
    inputs=inputs, filters=self.num_filters, kernel_size=self.kernel_size,
    strides=self.conv_stride, data_format=self.data_format)
inputs = tf.identity(inputs, 'initial_conv')
The __init__ parameters correspond to the values in the paper:
self.num_filters = 64,
self.kernel_size = 7,
self.conv_stride = 2,
tf.identity() is applied here to give the tensor an explicit name in the graph ('initial_conv'), which makes this intermediate result easy to locate when logging or debugging.
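Since conv2d_fixed_padding() appears in every layer below, it is worth noting what it does. In the official repo it looks roughly like this (a sketch): when strides > 1 it pads the input explicitly, so the padding amount depends only on the kernel size, not on the input size:

def fixed_padding(inputs, kernel_size, data_format):
  """Pads the input so a strided conv's output size depends only on strides."""
  pad_total = kernel_size - 1
  pad_beg = pad_total // 2
  pad_end = pad_total - pad_beg
  if data_format == 'channels_first':
    return tf.pad(inputs, [[0, 0], [0, 0],
                           [pad_beg, pad_end], [pad_beg, pad_end]])
  return tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                         [pad_beg, pad_end], [0, 0]])

def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
  """Strided 2-D convolution with explicit padding and no bias."""
  if strides > 1:
    inputs = fixed_padding(inputs, kernel_size, data_format)
  return tf.layers.conv2d(
      inputs=inputs, filters=filters, kernel_size=kernel_size,
      strides=strides, padding=('SAME' if strides == 1 else 'VALID'),
      use_bias=False, kernel_initializer=tf.variance_scaling_initializer(),
      data_format=data_format)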
Next comes the pooling layer:
inputs = tf.layers.max_pooling2d(
    inputs=inputs, pool_size=self.first_pool_size,
    strides=self.first_pool_stride, padding='SAME',
    data_format=self.data_format)
inputs = tf.identity(inputs, 'initial_max_pool')
Again, the __init__ parameters correspond to the values in the paper:
first_pool_size = 3,
first_pool_stride = 2,
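As a quick sanity check (my own arithmetic, not from the source): with SAME-style padding the output spatial size is ceil(input_size / stride), which reproduces the shapes traced later:

import math

print(math.ceil(224 / 2))  # first 7x7 conv, stride 2: 224 -> 112
print(math.ceil(112 / 2))  # first 3x3 max pool, stride 2: 112 -> 56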
- 3.3 Building the stack of blocks.
As the paper shows (see figure), the residual network is built out of convolutional blocks; each block in the figure contains 2 convolution layers. The first group has 3 convolutional blocks, and therefore three shortcuts; the second group has 4 blocks; the third group has 6; and the fourth group has 3.
As the code shows, the first block_fn of each block_layer is computed separately from the ones that follow, because only the first block_fn performs a projection_shortcut(). projection_shortcut() applies a single convolution to the shortcut path (kernel size: 1x1, filter count: 4x the block's base filter count, strides: the block_layer's stride):
def projection_shortcut(inputs):
  return conv2d_fixed_padding(
      inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
      data_format=data_format)

# Only the first block per block_layer uses projection_shortcut and strides
inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
                  data_format)
Here block_fn() is _bottleneck_block_v1(), shown below. When projection_shortcut() is performed, the projected shortcut is added to the block's output; when it is not, the block's own input is added back (an identity shortcut). The paper describes two kinds of block, an 'identity' type and a 'bottleneck' type (a sketch of the identity variant follows the code below); this code performs a single 'bottleneck' block, as shown in the figure:
def _bottleneck_block_v1(inputs, filters, training, projection_shortcut,
                         strides, data_format):
  """A single block for ResNet v1, with a bottleneck.

  Similar to _building_block_v1(), except using the "bottleneck" blocks
  described in:
    Convolution then batch normalization then ReLU as described by:
      Deep Residual Learning for Image Recognition
      https://arxiv.org/pdf/1512.03385.pdf
      by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.

  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the convolutions.
    training: A Boolean for whether the model is in training or inference
      mode. Needed for batch normalization.
    projection_shortcut: The function to use for projection shortcuts
      (typically a 1x1 convolution when downsampling the input).
    strides: The block's stride. If greater than 1, this block will ultimately
      downsample the input.
    data_format: The input format ('channels_last' or 'channels_first').

  Returns:
    The output tensor of the block; shape should match inputs.
  """
  shortcut = inputs

  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)
    shortcut = batch_norm(inputs=shortcut, training=training,
                          data_format=data_format)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=1, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs += shortcut
  inputs = tf.nn.relu(inputs)

  return inputs
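For comparison, the 'identity' type mentioned above (the basic building block without a bottleneck, two 3x3 convolutions) is implemented in the same file; roughly as follows (a sketch of _building_block_v1):

def _building_block_v1(inputs, filters, training, projection_shortcut,
                       strides, data_format):
  """A single block for ResNet v1, without a bottleneck: two 3x3 convs."""
  shortcut = inputs

  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)
    shortcut = batch_norm(inputs=shortcut, training=training,
                          data_format=data_format)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs += shortcut
  inputs = tf.nn.relu(inputs)

  return inputs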
Comparing _bottleneck_block_v1() with the figure below confirms that it performs exactly one 'bottleneck' block.
The author explains in the paper that within a block the input and output dimensions are the same (the shortcut is drawn as a solid line), but where two block_layers join, the dimensions grow from input to output (the shortcut is drawn as a dotted line). Therefore the first block_fn() of every block_layer() must first perform a projection_shortcut(), using the strides given by the parameters (strides=1 for the first block_layer, and strides=2 for the second, third, and fourth), while the remaining block_fn() calls skip projection_shortcut() and all use strides=1, as the code below reflects. Keep this in mind, and the dimension bookkeeping later will come out exactly right.
# Only the first block per block_layer uses projection_shortcut and strides
inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
                  data_format)

for _ in range(1, blocks):
  inputs = block_fn(inputs, filters, training, None, 1, data_format)
The code above shows the computation flow of each block_layer: the first block_fn computes a projection_shortcut, and the rest do not. In the first figure, each differently colored group of layers corresponds to one block_layer, and each layer within it to one block_fn(). (A sketch of the enclosing block_layer() helper follows.)
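For context, the block_layer() helper that wraps all of this in the official repo looks roughly like the following (a sketch); note how it computes filters_out = 4 * filters for bottleneck blocks, and how it names its output with tf.identity, which is exactly the block_layerN:0 tensor seen in the trace below:

def block_layer(inputs, filters, bottleneck, block_fn, blocks, strides,
                training, name, data_format):
  """Creates one group ('block_layer') of `blocks` residual blocks."""
  # Bottleneck blocks end with 4x the number of filters they start with.
  filters_out = filters * 4 if bottleneck else filters

  def projection_shortcut(inputs):
    return conv2d_fixed_padding(
        inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
        data_format=data_format)

  # Only the first block per block_layer uses projection_shortcut and strides.
  inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
                    data_format)
  for _ in range(1, blocks):
    inputs = block_fn(inputs, filters, training, None, 1, data_format)

  return tf.identity(inputs, name)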
That completes the main structure of the residual network. Now let's look at how the model's tensor shapes evolve:
Input:
shape=(?, 3, 224, 224)
After the first convolution:
shape=(?, 64, 112, 112)
After the first pooling:
shape=(?, 64, 56, 56)
Next comes the shortcut stage:
lin:shortcut---------> Tensor("resnet_model/batch_normalization/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_1:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_3/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_3:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_4:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_6/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_5:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_6:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_7:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_9/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_8:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 0 blk_layer---------> Tensor("resnet_model/block_layer1:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:shortcut---------> Tensor("resnet_model/batch_normalization_10/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_9:0", shape=(?, 128, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_10:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 2
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_13/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_12:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_13:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_16/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_14:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_15:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_16:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_19/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_17:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_18:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_19:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_22/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 3 sub_layer---------> Tensor("resnet_model/Relu_20:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 1 blk_layer---------> Tensor("resnet_model/block_layer2:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:shortcut---------> Tensor("resnet_model/batch_normalization_23/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_21:0", shape=(?, 256, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_22:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 2
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_26/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_24:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_25:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_29/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_26:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_27:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_28:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_32/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_29:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_30:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_31:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_35/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 3 sub_layer---------> Tensor("resnet_model/Relu_32:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_33:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_34:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_38/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 4 sub_layer---------> Tensor("resnet_model/Relu_35:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_36:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_37:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_41/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 5 sub_layer---------> Tensor("resnet_model/Relu_38:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 2 blk_layer---------> Tensor("resnet_model/block_layer3:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:shortcut---------> Tensor("resnet_model/batch_normalization_42/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_39:0", shape=(?, 512, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_40:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 2
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_45/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_42:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_43:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_48/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_44:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_45:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_46:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_51/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_47:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 3 blk_layer---------> Tensor("resnet_model/block_layer4:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
The feature map size shrinks step by step: 56, 28, 14, and finally 7.
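A quick back-of-the-envelope check (my own arithmetic, not from the source) reproduces this progression from the strides alone, matching block_strides=[1, 2, 2, 2]:

import math

size = 224
for name, stride in [('initial_conv', 2), ('initial_max_pool', 2),
                     ('block_layer1', 1), ('block_layer2', 2),
                     ('block_layer3', 2), ('block_layer4', 2)]:
  size = math.ceil(size / stride)  # SAME-style padding halves with ceil
  print(name, size)
# initial_conv 112, initial_max_pool 56, block_layer1 56,
# block_layer2 28, block_layer3 14, block_layer4 7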