Semantic Segmentation

2019-05-26 11:59:00

Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes, CVPR 2017. Theano/Lasagne code: https://github.com/TobyPDE/FRRN

For semantic segmentation, this paper focuses on the boundary accuracy of the segmented objects (precise boundary adherence), achieved with two processing streams: the residual stream keeps the image at its full resolution, while the pooling stream first downsamples and then upsamples. The two streams are coupled through residual units: the pooling stream carries high-level features for recognition, and the residual stream carries low-level features for localization.

The residual stream (blue) stays at the full image resolution, the pooling stream (red) undergoes a sequence of pooling and unpooling operations. The two processing streams are coupled using full-resolution residual units (FRRUs)

Most current segmentation methods adopt the FCN [38] architecture, and many papers rely on models pre-trained for image classification, such as ResNet and VGG. Starting from a pre-trained model saves training time, and in most cases it gives better results than training from scratch. But it also has drawbacks: these pre-trained networks are less flexible, making it hard to adopt newer architectural techniques such as batch normalization or new activation functions.

FCN-based segmentation performs a series of downsampling steps via pooling or strided convolutions, for two reasons: 1) it significantly increases the size of the receptive field; 2) downsampling makes the network more robust to small translations of objects in the image. Pooling is very helpful for recognizing objects in an image, but for segmentation it degrades localization. Existing remedies include: 1) adding a decoder; 2) using dilated convolutions; 3) using multi-scale predictions; 4) adding a post-processing step such as CRF smoothing.
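The trade-off can be seen directly in tensor shapes. A minimal PyTorch sketch (PyTorch is my choice here for illustration; the paper's own code is Theano/Lasagne): pooling enlarges the receptive field but halves the spatial resolution, while a dilated convolution enlarges the receptive field at full resolution.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # dummy single-channel 32x32 image

# (a) pool then convolve: larger receptive field, but half the resolution
pooled = nn.Conv2d(1, 1, 3, padding=1)(nn.MaxPool2d(2)(x))

# (b) dilated convolution: a 3x3 kernel with dilation 2 covers a 5x5 area
#     while keeping the full spatial resolution
dilated = nn.Conv2d(1, 1, 3, padding=2, dilation=2)(x)

print(pooled.shape)   # half resolution
print(dilated.shape)  # full resolution preserved
```

This is why dilated convolutions (remedy 2 above) are a popular alternative to aggressive downsampling in segmentation networks.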

  1. Network Architectures for Segmentation

How can we get both: high-level features for recognition and low-level features for localization? The Full-Resolution Residual Unit (FRRU): the residual stream corresponds to low-level features, and the pooling stream corresponds to high-level features.
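A minimal PyTorch sketch of the coupling idea, under stated assumptions: channel counts, the max-pool/nearest-upsample choices, and the layer composition below are illustrative, not the paper's exact design (the official implementation is Theano/Lasagne).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FRRU(nn.Module):
    """Sketch of a Full-Resolution Residual Unit: couples the full-resolution
    residual stream z with the downsampled pooling stream y.
    Layer choices here are illustrative, not the paper's exact ones."""

    def __init__(self, y_channels, z_channels, scale=2):
        super().__init__()
        self.scale = scale
        # pooling-stream body: convolve the pooled z concatenated with y
        self.body = nn.Sequential(
            nn.Conv2d(y_channels + z_channels, y_channels, 3, padding=1),
            nn.BatchNorm2d(y_channels),
            nn.ReLU(inplace=True),
        )
        # 1x1 projection of the pooling-stream output back to z's channels
        self.project = nn.Conv2d(y_channels, z_channels, 1)

    def forward(self, z, y):
        # bring the residual stream down to the pooling stream's resolution
        z_pooled = F.max_pool2d(z, self.scale)
        y_out = self.body(torch.cat([z_pooled, y], dim=1))
        # upsample the result and add it residually at full resolution
        z_out = z + F.interpolate(self.project(y_out),
                                  scale_factor=self.scale, mode="nearest")
        return z_out, y_out

z = torch.randn(1, 32, 64, 64)   # residual stream: full resolution
y = torch.randn(1, 48, 32, 32)   # pooling stream: downsampled by 2
z_out, y_out = FRRU(y_channels=48, z_channels=32)(z, y)
```

Each FRRU thus lets the pooling stream compute high-level features while repeatedly writing refinements back into the full-resolution residual stream, which preserves localization.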

Residual networks are used here because they clearly outperform conventional convolutional networks.

Why can residual networks be trained to such great depth? The paper analyzes the loss derivative and concludes: gradients can flow unhindered from the deeper unit to the shallower unit, which makes training even extremely deep ResNets possible.
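The argument can be written out. Stacking identity-skip residual units $x_{l+1} = x_l + F(x_l; W_l)$ and telescoping the sum gives (generic ResNet notation, not necessarily the paper's symbols):

$$
x_L = x_l + \sum_{i=l}^{L-1} F(x_i; W_i),
\qquad
\frac{\partial \mathcal{L}}{\partial x_l}
= \frac{\partial \mathcal{L}}{\partial x_L}
\left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i; W_i)\right).
$$

The additive $1$ means a portion of the gradient reaches unit $l$ directly, no matter how deep the network is; an additive residual stream has this property by construction.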

The paper then analyzes the Full-Resolution Residual Unit (FRRU) from the same loss-derivative perspective and shows that it has similar training characteristics to ResNets.
