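I. Problems to Solve
- Consecutive downsampling and repeated pooling in the network leave the feature maps at low resolution
- Spatial invariance discards a large amount of fine detail
- Objects appear at multiple scales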
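II. Contributions
- Introduces an Encoder-Decoder structure: DeepLab v3 serves as the Encoder, and an added Decoder processes its output to refine object boundaries
- Introduces Xception and depthwise separable convolution, applied to both ASPP and the Decoder
- Modifies Xception, replacing every max pooling with a depthwise separable convolution with stride=2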
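III. Details
1.Encoder-Decoder structure
Figure (a) shows the DeepLab v3 network structure, figure (b) a typical encoder-decoder network, and figure (c) the network proposed in the paper: an encoder-decoder segmentation network that uses v3 as the Encoder. The Decoder has two inputs: the ASPP output and an intermediate Encoder feature map (an intermediate ResNet output).
The detailed network structure is shown in the figure below; the PyTorch code that follows implements it: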
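# Imports needed by the code below. _ConvBnReLU, _ImagePool, _Stem and
# _ResLayer are helper modules that are not shown in this post.
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F


class _ASPP(nn.Module):
    """
    Atrous spatial pyramid pooling with image-level feature
    """

    def __init__(self, in_ch, out_ch, rates):
        super(_ASPP, self).__init__()
        # A plain nn.Module used as a named container for the parallel branches
        self.stages = nn.Module()
        self.stages.add_module("c0", _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1))
        for i, rate in enumerate(rates):
            self.stages.add_module(
                "c{}".format(i + 1),
                _ConvBnReLU(in_ch, out_ch, 3, 1, padding=rate, dilation=rate),
            )
        self.stages.add_module("imagepool", _ImagePool(in_ch, out_ch))

    def forward(self, x):
        # Run every branch on the same input and concatenate along channels
        return torch.cat([stage(x) for stage in self.stages.children()], dim=1)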
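As a rough check on the ASPP branches above, the sketch below uses the standard convolution output-size formula to show why padding=rate together with dilation=rate keeps the spatial size for a 3×3 kernel, and counts the channels produced by the concatenation. The rates (6, 12, 18) and the 33×33 feature map are illustrative assumptions, not values taken from the code above; the helper names are hypothetical.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: gaps of (dilation - 1) between taps."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_out_size(size, kernel_size, stride, padding, dilation):
    """Standard convolution output-size formula along one spatial dimension."""
    k_eff = effective_kernel(kernel_size, dilation)
    return (size + 2 * padding - k_eff) // stride + 1


def aspp_out_channels(out_ch, rates):
    """One 1x1 branch, one 3x3 branch per rate, and one image-pooling branch."""
    return out_ch * (len(rates) + 2)


rates = (6, 12, 18)  # assumption: illustrative atrous rates
size = 33            # assumption: illustrative feature-map height/width

for rate in rates:
    # padding=rate and dilation=rate, as in the _ASPP branches
    print(rate, effective_kernel(3, rate), conv_out_size(size, 3, 1, rate, rate))

print(aspp_out_channels(256, rates))
```

Because every branch keeps the same spatial size, the outputs can be concatenated along the channel dimension, which is where the 256 * (len(atrous_rates) + 2) input width of the following 1×1 convolution comes from.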
class DeepLabV3Plus(nn.Module):
    """
    DeepLab v3+: Dilated ResNet with multi-grid + improved ASPP + decoder
    The dilated ResNet with multi-grid and the improved ASPP are the same as in DeepLab v3.
    The decoder consists of two convolutions followed by 4x upsampling.
    Note: the ResNet layer2 output has its channel count reduced (256 -> 48) before it enters the decoder.
    """

    def __init__(self, n_classes, n_blocks, atrous_rates, multi_grids, output_stride):
        super(DeepLabV3Plus, self).__init__()

        # Stride and dilation
        if output_stride == 8:
            s = [1, 2, 1, 1]
            d = [1, 1, 2, 4]
        elif output_stride == 16:
            s = [1, 2, 2, 1]
            d = [1, 1, 1, 2]
        else:
            # Without this branch, s and d would be undefined below
            raise ValueError("output_stride must be 8 or 16")

        # Encoder
        ch = [64 * 2 ** p for p in range(6)]
        self.layer1 = _Stem(ch[0])
        self.layer2 = _ResLayer(n_blocks[0], ch[0], ch[2], s[0], d[0])
        self.layer3 = _ResLayer(n_blocks[1], ch[2], ch[3], s[1], d[1])
        self.layer4 = _ResLayer(n_blocks[2], ch[3], ch[4], s[2], d[2])
        self.layer5 = _ResLayer(n_blocks[3], ch[4], ch[5], s[3], d[3], multi_grids)
        self.aspp = _ASPP(ch[5], 256, atrous_rates)
        concat_ch = 256 * (len(atrous_rates) + 2)
        self.add_module("fc1", _ConvBnReLU(concat_ch, 256, 1, 1, 0, 1))

        # Decoder
        self.reduce = _ConvBnReLU(256, 48, 1, 1, 0, 1)
        self.fc2 = nn.Sequential(
            OrderedDict(
                [
                    # 304 = 256 (ASPP output) + 48 (reduced low-level feature)
                    ("conv1", _ConvBnReLU(304, 256, 3, 1, 1, 1)),
                    ("conv2", _ConvBnReLU(256, 256, 3, 1, 1, 1)),
                    ("conv3", nn.Conv2d(256, n_classes, kernel_size=1)),
                ]
            )
        )

    def forward(self, x):
        h = self.layer1(x)
        h = self.layer2(h)
        # Low-level feature for the decoder, channels reduced 256 -> 48
        h_ = self.reduce(h)
        h = self.layer3(h)
        h = self.layer4(h)
        h = self.layer5(h)
        h = self.aspp(h)
        h = self.fc1(h)
        # Upsample the ASPP output to the low-level feature size, then concatenate
        h = F.interpolate(h, size=h_.shape[2:], mode="bilinear", align_corners=False)
        h = torch.cat((h, h_), dim=1)
        h = self.fc2(h)
        # Upsample the logits back to the input resolution
        h = F.interpolate(h, size=x.shape[2:], mode="bilinear", align_corners=False)
        return h
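To make the stride bookkeeping in DeepLabV3Plus easier to follow, here is a small pure-Python trace of the downsampling factors and channel counts. It is only a sketch: it assumes the stem downsamples by 4 (as in a standard ResNet), that the input size is divisible by the output stride, and that three atrous rates are used. The function name trace_shapes is hypothetical.

```python
STEM_STRIDE = 4  # assumption: the stem reduces resolution by 4, as in a standard ResNet

# Per-layer strides for layer2..layer5, copied from the two output_stride branches
LAYER_STRIDES = {8: [1, 2, 1, 1], 16: [1, 2, 2, 1]}


def trace_shapes(input_size, output_stride, n_rates=3):
    """Trace spatial sizes and channel counts through the encoder and decoder."""
    total = STEM_STRIDE
    sizes = {}
    for name, stride in zip(["layer2", "layer3", "layer4", "layer5"],
                            LAYER_STRIDES[output_stride]):
        total *= stride
        sizes[name] = input_size // total
    return {
        "encoder_stride": total,                 # should equal output_stride
        "low_level_size": sizes["layer2"],       # feature passed to the decoder
        "aspp_size": sizes["layer5"],            # feature passed to ASPP
        "aspp_concat_ch": 256 * (n_rates + 2),   # input width of fc1
        "decoder_in_ch": 256 + 48,               # fc1 output + reduced layer2 output
    }


for os_ in (8, 16):
    print(os_, trace_shapes(512, os_))
```

Under these assumptions the product of the stem stride and the layer strides equals the requested output stride, and the decoder always works at 1/4 of the input resolution, so the final interpolation is a 4x upsampling in both configurations.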
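2.Depthwise separable convolution (atrous)
DeepLab v3+ also introduces the Xception architecture and depthwise separable convolution. A depthwise separable convolution splits an ordinary convolution into two steps, a depth-wise convolution and a point-wise convolution: a K×K kernel first convolves each channel separately, leaving the number of channels unchanged; a 1×1 convolution then fuses those features across channels and raises or lowers the channel count.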
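For an ordinary convolution with kernel size K, the parameter count is:
$C_{in} \times K \times K \times C_{out}$
With a depthwise separable convolution, the parameter count is:
$C_{in} \times K \times K + C_{in} \times C_{out} \times 1 \times 1$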
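A quick worked example of the two counts may help. The sketch below ignores bias terms, counts C_in × K × K weights for the depthwise step and C_in × C_out weights for the 1×1 pointwise step, and uses 256 input and output channels with K = 3 purely as illustrative values.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * k * k * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing the channels
    return depthwise + pointwise


c_in, c_out, k = 256, 256, 3  # assumption: illustrative sizes
standard = standard_conv_params(c_in, c_out, k)
separable = separable_conv_params(c_in, c_out, k)
print(standard, separable, round(standard / separable, 2))
# 589824 67840 8.69
```

For these sizes the separable version needs roughly one ninth of the weights, which is why it is attractive inside ASPP and the decoder.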
PyTorch implementation of depthwise separable convolution:
class Sep_conv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(Sep_conv, self).__init__()
        # Depthwise step: one 3x3 filter per input channel
        self.depth_conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=in_ch,
            kernel_size=3,
            stride=1,
            padding=1,
            groups=in_ch,  # each channel forms its own group
        )
        # Pointwise step: 1x1 convolution that mixes channels and sets out_ch
        self.point_conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=1,
            stride=1,
            padding=0,
            groups=1,
        )

    def forward(self, x):
        out = self.depth_conv(x)
        out = self.point_conv(out)
        return out
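In addition, grouped convolution splits the feature map along the channel dimension into n groups and applies an ordinary convolution within each group, so its parameter count is:
$\frac{C_{in} \times K \times K \times C_{out}}{n}$. In PyTorch, this only requires passing the groups argument to an ordinary convolution.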
class Group_conv(nn.Module):
    def __init__(self, in_ch, out_ch, groups):
        super(Group_conv, self).__init__()
        # in_ch and out_ch must both be divisible by groups
        self.conv = nn.Conv2d(
            in_channels=in_ch,
            out_channels=out_ch,
            kernel_size=3,
            stride=1,
            padding=1,
            groups=groups,
        )

    def forward(self, x):
        out = self.conv(x)
        return out
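The grouped-convolution count can be checked with a few lines of arithmetic. In this sketch (bias ignored, function name hypothetical), each of the n groups convolves C_in / n input channels into C_out / n output channels, so the total is the ordinary count divided by n. Setting the number of groups equal to the channel count recovers the depthwise step of a separable convolution.

```python
def grouped_conv_params(c_in, c_out, k, groups):
    """Weights in a grouped k x k convolution (bias ignored)."""
    if c_in % groups or c_out % groups:
        # Both channel counts have to split evenly across the groups
        raise ValueError("c_in and c_out must both be divisible by groups")
    per_group = (c_in // groups) * k * k * (c_out // groups)
    return per_group * groups


c_in, c_out, k = 256, 256, 3  # assumption: illustrative sizes
for groups in (1, 4, 256):
    print(groups, grouped_conv_params(c_in, c_out, k, groups))
```

With groups=1 this is the ordinary convolution, with groups=4 it is one quarter of that, and with groups=256 only the 256 × 3 × 3 depthwise weights remain.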
3.Xception
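The v3+ paper also explores Xception as the Encoder and modifies it to some extent, as shown in the figure below.
- The Entry Flow structure of Xception is left unchanged, and more Middle Flow blocks are added.
- Every max pooling is replaced by a depthwise separable convolution with kernel_size=3 and stride=2.
- A BN layer and ReLU are added after every 3×3 depthwise convolution, similar to MobileNet.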
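To see why a strided convolution can stand in for max pooling, the sketch below applies the standard convolution output-size formula to a kernel_size=3, stride=2 layer. The padding of 1 and the input sizes are assumptions made for illustration; the list above specifies only the kernel size and the stride.

```python
def conv_out_size(size, kernel_size=3, stride=2, padding=1, dilation=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# assumption: illustrative feature-map sizes
for size in (128, 64, 32):
    print(size, "->", conv_out_size(size))
```

Under these assumptions each such layer halves the resolution, so the downsampling schedule of the network is preserved while the downsampling operator itself becomes a learnable convolution.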