Dissecting a TensorFlow project: the residual network (ResNet) structure.
resnet_model
Call hierarchy:
1. official/resnet/imagenet_main.py:
The ImagenetModel() class inherits from the Model() class in official/resnet/resnet_model.py and calls its __init__ constructor with the following parameters:
super(ImagenetModel, self).__init__(
    resnet_size=resnet_size,
    bottleneck=bottleneck,
    num_classes=num_classes,
    num_filters=64,
    kernel_size=7,
    conv_stride=2,
    first_pool_size=3,
    first_pool_stride=2,
    second_pool_size=7,
    second_pool_stride=1,
    block_sizes=_get_block_sizes(resnet_size),
    block_strides=[1, 2, 2, 2],
    final_size=final_size,
    version=version,
    data_format=data_format,
    dtype=dtype
)
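Here block_sizes comes from _get_block_sizes(resnet_size), which maps the requested depth to the number of blocks in each of the four groups. In the official repo it is roughly the following lookup table (a sketch; see _get_block_sizes in imagenet_main.py, whose exact error message may differ):

def _get_block_sizes(resnet_size):
  """Maps a ResNet depth to the block count of each of its 4 block_layers."""
  choices = {
      18: [2, 2, 2, 2],
      34: [3, 4, 6, 3],
      50: [3, 4, 6, 3],
      101: [3, 4, 23, 3],
      152: [3, 8, 36, 3],
      200: [3, 24, 36, 3],
  }
  try:
    return choices[resnet_size]
  except KeyError:
    raise ValueError(
        'Could not find block sizes for resnet_size={}'.format(resnet_size))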
2. The Model() class in official/resnet/resnet_model.py:
It defines the special method __call__(self, inputs, training), so an instance of the class can be called directly like a function.
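For example, a minimal usage sketch (hypothetical; the placeholder name `images` and the exact constructor arguments are illustrative, not taken from the source):

import tensorflow as tf

# Hypothetical input batch in NHWC layout.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
model = ImagenetModel(resnet_size=50, num_classes=1001)
logits = model(images, training=True)  # calling the instance runs __call__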
3. The __call__() method of the Model() class:
- 3.1 The entire model is built inside a variable_scope named 'resnet_model':
with self._model_variable_scope():
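In the official repo this helper is roughly the following (a sketch; version-dependent). Besides naming the scope, it installs a custom getter so that even fp16 models create their variables in fp32:

def _model_variable_scope(self):
  """Returns a variable scope that the model's ops are created under."""
  return tf.variable_scope('resnet_model',
                           custom_getter=self._custom_dtype_getter)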
- 3.2 The first 7x7 convolution layer and the pooling layer. As described in the residual network paper, the model starts with a 7x7 convolution followed by a pooling layer, as shown in the figure. The following code implements the first convolution layer:
inputs = conv2d_fixed_padding(
    inputs=inputs, filters=self.num_filters, kernel_size=self.kernel_size,
    strides=self.conv_stride, data_format=self.data_format)
inputs = tf.identity(inputs, 'initial_conv')
The __init__ parameters correspond to the values in the paper:
self.num_filters = 64,
self.kernel_size = 7,
self.conv_stride = 2,
tf.identity() is applied here to give the tensor an explicit name in the graph ('initial_conv'), which makes this intermediate result easy to locate when logging or debugging.
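Since conv2d_fixed_padding() appears in every layer below, it is worth noting what it does. In the official repo it looks roughly like this (a sketch): when strides > 1 it pads the input explicitly, so the padding amount depends only on the kernel size, not on the input size:

def fixed_padding(inputs, kernel_size, data_format):
  """Pads the input so a strided conv's output size depends only on strides."""
  pad_total = kernel_size - 1
  pad_beg = pad_total // 2
  pad_end = pad_total - pad_beg
  if data_format == 'channels_first':
    return tf.pad(inputs, [[0, 0], [0, 0],
                           [pad_beg, pad_end], [pad_beg, pad_end]])
  return tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                         [pad_beg, pad_end], [0, 0]])

def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
  """Strided 2-D convolution with explicit padding and no bias."""
  if strides > 1:
    inputs = fixed_padding(inputs, kernel_size, data_format)
  return tf.layers.conv2d(
      inputs=inputs, filters=filters, kernel_size=kernel_size,
      strides=strides, padding=('SAME' if strides == 1 else 'VALID'),
      use_bias=False, kernel_initializer=tf.variance_scaling_initializer(),
      data_format=data_format)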
Next comes the pooling layer:
inputs = tf.layers.max_pooling2d(
    inputs=inputs, pool_size=self.first_pool_size,
    strides=self.first_pool_stride, padding='SAME',
    data_format=self.data_format)
inputs = tf.identity(inputs, 'initial_max_pool')
Again, the __init__ parameters correspond to the values in the paper:
first_pool_size = 3,
first_pool_stride = 2,
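As a quick sanity check (my own arithmetic, not from the source): with SAME-style padding the output spatial size is ceil(input_size / stride), which reproduces the shapes traced later:

import math

print(math.ceil(224 / 2))  # first 7x7 conv, stride 2: 224 -> 112
print(math.ceil(112 / 2))  # first 3x3 max pool, stride 2: 112 -> 56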
- 3.3 Building the stack of blocks.
As the paper shows (see figure), the residual network is built out of convolutional blocks; each block in the figure contains 2 convolution layers. The first group has 3 convolutional blocks, and therefore three shortcuts; the second group has 4 blocks; the third group has 6; and the fourth group has 3.
As the code shows, the first block_fn of each block_layer is computed separately from the ones that follow, because only the first block_fn performs a projection_shortcut(). projection_shortcut() applies a single convolution to the shortcut path (kernel size: 1x1, filter count: 4x the block's base filter count, strides: the block_layer's stride):
def projection_shortcut(inputs):
  return conv2d_fixed_padding(
      inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
      data_format=data_format)

# Only the first block per block_layer uses projection_shortcut and strides
inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
                  data_format)
Here block_fn() is _bottleneck_block_v1(), shown below. When projection_shortcut() is performed, the projected shortcut is added to the block's output; when it is not, the block's own input is added back (an identity shortcut). The paper describes two kinds of block, an 'identity' type and a 'bottleneck' type (a sketch of the identity variant follows the code below); this code performs a single 'bottleneck' block, as shown in the figure:
def _bottleneck_block_v1(inputs, filters, training, projection_shortcut,
                         strides, data_format):
  """A single block for ResNet v1, with a bottleneck.

  Similar to _building_block_v1(), except using the "bottleneck" blocks
  described in:
    Convolution then batch normalization then ReLU as described by:
      Deep Residual Learning for Image Recognition
      https://arxiv.org/pdf/1512.03385.pdf
      by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.

  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the convolutions.
    training: A Boolean for whether the model is in training or inference
      mode. Needed for batch normalization.
    projection_shortcut: The function to use for projection shortcuts
      (typically a 1x1 convolution when downsampling the input).
    strides: The block's stride. If greater than 1, this block will ultimately
      downsample the input.
    data_format: The input format ('channels_last' or 'channels_first').

  Returns:
    The output tensor of the block; shape should match inputs.
  """
  shortcut = inputs

  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)
    shortcut = batch_norm(inputs=shortcut, training=training,
                          data_format=data_format)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=1, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs += shortcut
  inputs = tf.nn.relu(inputs)

  return inputs
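For comparison, the 'identity' type mentioned above (the basic building block without a bottleneck, two 3x3 convolutions) is implemented in the same file; roughly as follows (a sketch of _building_block_v1):

def _building_block_v1(inputs, filters, training, projection_shortcut,
                       strides, data_format):
  """A single block for ResNet v1, without a bottleneck: two 3x3 convs."""
  shortcut = inputs

  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)
    shortcut = batch_norm(inputs=shortcut, training=training,
                          data_format=data_format)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs += shortcut
  inputs = tf.nn.relu(inputs)

  return inputs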
Comparing _bottleneck_block_v1() with the figure below confirms that it performs exactly one 'bottleneck' block.
The author explains in the paper that within a block the input and output dimensions are the same (the shortcut is drawn as a solid line), but where two block_layers join, the dimensions grow from input to output (the shortcut is drawn as a dotted line). Therefore the first block_fn() of every block_layer() must first perform a projection_shortcut(), using the strides given by the parameters (strides=1 for the first block_layer, and strides=2 for the second, third, and fourth), while the remaining block_fn() calls skip projection_shortcut() and all use strides=1, as the code below reflects. Keep this in mind, and the dimension bookkeeping later will come out exactly right.
# Only the first block per block_layer uses projection_shortcut and strides
inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
                  data_format)

for _ in range(1, blocks):
  inputs = block_fn(inputs, filters, training, None, 1, data_format)
The code above shows the computation flow of each block_layer: the first block_fn computes a projection_shortcut, and the rest do not. In the first figure, each differently colored group of layers corresponds to one block_layer, and each layer within it to one block_fn(). (A sketch of the enclosing block_layer() helper follows.)
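For context, the block_layer() helper that wraps all of this in the official repo looks roughly like the following (a sketch); note how it computes filters_out = 4 * filters for bottleneck blocks, and how it names its output with tf.identity, which is exactly the block_layerN:0 tensor seen in the trace below:

def block_layer(inputs, filters, bottleneck, block_fn, blocks, strides,
                training, name, data_format):
  """Creates one group ('block_layer') of `blocks` residual blocks."""
  # Bottleneck blocks end with 4x the number of filters they start with.
  filters_out = filters * 4 if bottleneck else filters

  def projection_shortcut(inputs):
    return conv2d_fixed_padding(
        inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
        data_format=data_format)

  # Only the first block per block_layer uses projection_shortcut and strides.
  inputs = block_fn(inputs, filters, training, projection_shortcut, strides,
                    data_format)
  for _ in range(1, blocks):
    inputs = block_fn(inputs, filters, training, None, 1, data_format)

  return tf.identity(inputs, name)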
That completes the main structure of the residual network. Now let's look at how the model's tensor shapes evolve:
Input:
shape=(?, 3, 224, 224)
After the first convolution:
shape=(?, 64, 112, 112)
After the first pooling:
shape=(?, 64, 56, 56)
Next comes the shortcut stage:
lin:shortcut---------> Tensor("resnet_model/batch_normalization/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_1:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_3/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_3:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_4:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_6/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_5:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_6:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_7:0", shape=(?, 64, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_9/FusedBatchNorm:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_8:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 0 blk_layer---------> Tensor("resnet_model/block_layer1:0", shape=(?, 256, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:shortcut---------> Tensor("resnet_model/batch_normalization_10/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_9:0", shape=(?, 128, 56, 56), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_10:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 2
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_13/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_12:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_13:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_16/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_14:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_15:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_16:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_19/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_17:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_18:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_19:0", shape=(?, 128, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_22/FusedBatchNorm:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 3 sub_layer---------> Tensor("resnet_model/Relu_20:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 1 blk_layer---------> Tensor("resnet_model/block_layer2:0", shape=(?, 512, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:shortcut---------> Tensor("resnet_model/batch_normalization_23/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_21:0", shape=(?, 256, 28, 28), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_22:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 2
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_26/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_24:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_25:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_29/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_26:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_27:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_28:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_32/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_29:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_30:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_31:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_35/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 3 sub_layer---------> Tensor("resnet_model/Relu_32:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_33:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_34:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_38/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 4 sub_layer---------> Tensor("resnet_model/Relu_35:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_36:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_37:0", shape=(?, 256, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_41/FusedBatchNorm:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 5 sub_layer---------> Tensor("resnet_model/Relu_38:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 2 blk_layer---------> Tensor("resnet_model/block_layer3:0", shape=(?, 1024, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:shortcut---------> Tensor("resnet_model/batch_normalization_42/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_39:0", shape=(?, 512, 14, 14), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_40:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 2
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_45/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d1---------> Tensor("resnet_model/Relu_42:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_43:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_48/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 1 sub_layer---------> Tensor("resnet_model/Relu_44:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:conv2d1---------> Tensor("resnet_model/Relu_45:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:conv2d2---------> Tensor("resnet_model/Relu_46:0", shape=(?, 512, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 3 strides: 1
lin:conv2d3---------> Tensor("resnet_model/batch_normalization_51/FusedBatchNorm:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0) Kernel size: 1 strides: 1
lin:the 2 sub_layer---------> Tensor("resnet_model/Relu_47:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
lin:the 3 blk_layer---------> Tensor("resnet_model/block_layer4:0", shape=(?, 2048, 7, 7), dtype=float32, device=/replica:0/task:0/device:GPU:0)
The feature map size shrinks step by step: 56, 28, 14, and finally 7.
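A quick back-of-the-envelope check (my own arithmetic, not from the source) reproduces this progression from the strides alone, matching block_strides=[1, 2, 2, 2]:

import math

size = 224
for name, stride in [('initial_conv', 2), ('initial_max_pool', 2),
                     ('block_layer1', 1), ('block_layer2', 2),
                     ('block_layer3', 2), ('block_layer4', 2)]:
  size = math.ceil(size / stride)  # SAME-style padding halves with ceil
  print(name, size)
# initial_conv 112, initial_max_pool 56, block_layer1 56,
# block_layer2 28, block_layer3 14, block_layer4 7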