作者 | 已退逼乎
【导读】2019年以来,除各AI 大厂私有网络范围外,MaskRCNN,CascadeRCNN 成为了支撑很多业务得以开展的基础,而以 Faster RCNN 为基础去复现其他的检测网络既省时又省力,也算得上是里程碑性成果了。因此本文主要以简洁易懂的文字复现了 Resnet - Faster R-CNN 。
Faster RCNN 复现
文章之前,我们先来明确检测类任务都在干些什么:
需求:
对图像中的特定种类目标做出分类,并求出目标在图像中所处的位置即最终需要的信息:
- object-classes_name
- object-position
在一般的检测任务中类别信息通常由索引代替,例如1-> apple,2 - > cat,3 - > dog,...... > 而位置一般可以由两组坐标代替:> 矩形的左上角,右下角坐标(x1,y1,x2,y2)
Faster R-CNN 作为两阶段检测网络发展中最重要的一个网络,基本可以视为检测任务的里程碑性成果。
延伸扩展的 MaskRCNN,CascadeRCNN 都成为了 2019 年这个时间点上除了各家 AI 大厂私有网络范围外,支撑很多业务得以开展的基础。所以,Pytorch 为基础来从头复现 FasterRCNN 网络是非常有必要的,其中包含了太多的招数和理论中不会包括的先验知识。
甚至,以 Faster RCNN 为基础去复现其他的检测网络 所需要的精力和时间都会大大降低
我们的目标:尝试用最简洁,最贴合原文得写法复现 Resnet - Faster R-CNN
注:> > 本文中的代码为结构性示例的代码片段,不能够复制粘贴直接运行
完整目录会在文章完成后更新
架构
原始更快 RCNN 以 VGG16 为基础骨干网络。
但是 VGG16-19 因为参数的急剧膨胀和深层结构搭建导致参数量暴涨,网络在反向传播过程中要不断地传播梯度,而当网络层数加深时,梯度在逐层传播过程中会逐渐衰减,导致无法对前面网络层的权重进行有效的调整。
因此 vgg19 就出现了它的局限性。
而在之后提出的残差网络中,加入了短连接为梯度带来了一个直接向前面层的传播通道,缓解了梯度的减小问题,同时,将整个网络的深度加到了 100 层 ,甚至后来的 DenseNet 出现了实用的 200 层 网络。并且大量使用了 1 * 1 卷积来降低参数量因此本文将尝试 ResNet 101 更快的 RCNN ,以及衔接 DenseNet 和 Faster-RCNN 的可能性。
从以上图中我们可以看出 Faster R-CNN 除了作为特征提取部分的主干网络,剩下的最关键的也就是以下部分
- RPN
- RPN LossFunction
- ROI Pooling
- Faster-R-CNN Loss Function
也就是说我们的复现工作要着重从这些部分开始。现在看到的最优秀的复现版本应该是 Jianwei Yang page
本文的代码较多的综合了多种写法,以及 pytorch 标准结构的写法
0.整体流程
先来看看代码(只是大概的看一下就行):
代码语言:javascript复制#################################非Resnet版本只看看基本结构就可以
class FasterRCNN(nn.Module):
n_classes = 21
classes = np.asarray(['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor'])
PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
SCALES = (600,)
MAX_SIZE = 1000
def __init__(self, classes=None, debug=False):
super(FasterRCNN, self).__init__()
if classes is not None:
self.classes = classes
self.n_classes = len(classes)
self.rpn = RPN()
self.roi_pool = RoIPool(7, 7, 1.0/16)
self.fc6 = FC(512 * 7 * 7, 4096)
self.fc7 = FC(4096, 4096)
self.score_fc = FC(4096, self.n_classes, relu=False)
self.bbox_fc = FC(4096, self.n_classes * 4, relu=False)
# loss
self.cross_entropy = None
self.loss_box = None
@property
def loss(self):
return self.cross_entropy self.loss_box * 10
def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
features, rois = self.rpn(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
if self.training:
roi_data = self.proposal_target_layer(rois, gt_boxes, gt_ishard, dontcare_areas, self.n_classes)
rois = roi_data[0]
# roi pool
pooled_features = self.roi_pool(features, rois)
x = pooled_features.view(pooled_features.size()[0], -1)
x = self.fc6(x)
x = F.dropout(x, training=self.training)
x = self.fc7(x)
x = F.dropout(x, training=self.training)
cls_score = self.score_fc(x)
cls_prob = F.softmax(cls_score)
bbox_pred = self.bbox_fc(x)
if self.training:
self.cross_entropy, self.loss_box = self.build_loss(cls_score, bbox_pred, roi_data)
return cls_prob, bbox_pred, rois
这段代码并不是完整定义,只显示了主要流程,辅助性,功能性的方法全部被省略。结构也被大大简化 当我们以数据为线索则会产生以下的流程
Faster RCNN 以数据为节点的处理流程
我们在主干网络中可以清晰地看到,向前按照什么样的顺序执行了整个流程( just take a look )
值得注意的是,在以上执行流程中,有些过程需要相应的辅助函数来进行 比如 loss 的构建,框生成等等,都需要完备的函数库来辅助进行。
以上流程图 ,以及本文的叙述顺序与线索,都是以数据为依托的,明确各个部分数据之间的计算 输入输出信息是非常重要的
初始训练数据包含了:
1.DataLoader
数据加载部分十分自然地,要输入我们的数据集,我们这篇文章使用按最常用的
代码语言:javascript复制coco2014/17标准
pascal VOC标准
自定义数据集标准
全部部分当然不能展现 但是我们会在开源项目中演示Voclike数据集,以及自定义数据集如何方便的被加载->开源-快速训练工具(未完成)(https://github.com/OOXXXXOO/WSNet)
在本文中为了关注主旨我们只介绍自定义数据集和 VocLike 数据集的加载过程
Data2Dataset
数据的原始形式,当然是以图片为主 我们以一张图为例.
使用labelme标注之后
保存之后软件就会自动生成例如:
代码语言:javascript复制{
"version": "3.4.1",
"flags": {},
"shapes": [
{
"label": "dog",
"line_color": null,
"fill_color": null,
"points": [
[
7,
144
],
[
307,
588
]
],
"shape_type": "rectangle"
},
......
{
"label": "dog",
"line_color": null,
"fill_color": null,
"points": [
[
756,
130
],
[
974,
507
]
],
"shape_type": "rectangle"
}
],
"lineColor": [
0,
255,
0,
128
],
"fillColor": [
255,
0,
0,
128
],
"imagePath": "timg.jpeg",
"imageData": "此处为base64编码过得图像数据"
}
还有 labelimg xml 标准的数据样本
代码语言:javascript复制<annotation>
<folder>图片</folder>
<filename>timg.jpeg</filename>
<path>/home/winshare/图片/timg.jpeg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>1000</width>
<height>612</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>dog</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>9</xmin>
<ymin>163</ymin>
<xmax>309</xmax>
<ymax>584</ymax>
</bndbox>
</object>
....
<object>
<name>dog</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>748</xmin>
<ymin>142</ymin>
<xmax>977</xmax>
<ymax>508</ymax>
</bndbox>
</object>
</annotation>
以及 yolo 标准的 bbox
代码语言:javascript复制class_id box
0 0.159000 0.610294 0.300000 0.687908
0 0.346000 0.433824 0.216000 0.638889
0 0.491500 0.449346 0.191000 0.588235
0 0.650000 0.511438 0.246000 0.614379
0 0.863000 0.535948 0.230000 0.588235
yolo 的 box 值最终会由下面的方法转换为标准的框数据( xywh )
代码语言:javascript复制def convert(size, box): # 归一化操作
#size图像尺寸
#box包围盒
dw = 1. / size[0]
dh = 1. / size[1]
x = (box[0] box[1]) / 2.0
y = (box[2] box[3]) / 2.0
w = box[1] - box[0]
h = box[3] - box[2]
x = x * dw
w = w * dw
y = y * dh
h = h * dh
return (x, y, w, h)
Dataset2Dataloader
很多对 Faster RCNN 复现的版本中,标准的数据加载流程没有被固定化,所以数据被以各种 datalayer ,roidb 等等方法包装,Pytorch0.4 之后,实质上已经出现了最标准化的数据输入形式
因此我们设 getdata() 为从一个数据列表的 json 对象中根据索引值返回我们需要的指定信息 imagebboxlist classlist scale
这其中 bboxlist 指的是一张图像中所有目标标注框构成的列表,classlist 指的是类名,并保证两个列表索引对齐
有了这些设计目标,我们就可以开始构建代码了
代码语言:javascript复制class Dataset:
def __init__(self, opt):
self.opt = opt
self.db = DatasetClass(opt.data_dir)
#定义原始数据的来源如果是自定义数据可以尝试设置数据的列表,然后利用一个通用的读取器来读取数据
self.tsf = Transform(opt.min_size, opt.max_size)
def __getitem__(self, idx):
image, bboxlist, classlist= self.db.getdata(idx)
img, bbox, label, scale = self.tsf((image, bboxlist, classlist))#对原始数据的变换操作
# TODO: check whose stride is negative to fix this instead copy all
# some of the strides of a given numpy array are negative.
return img.copy(), bbox.copy(), label.copy(), scale
def __len__(self):
return len(self.db)
其中变换操作,是为了使得数据转换为张量,以及对数据的平移旋转等数据增强手段。实质上,Pytorch 提供了一系列的 Transform 下面的代码实际上有很多部分可以省略或者替代
代码语言:javascript复制class Transform(object):
def __init__(self, min_size=600, max_size=1000):
self.min_size = min_size
self.max_size = max_size
def __call__(self, in_data):
img, bbox, label = in_data
_, H, W = img.shape
img = preprocess(img, self.min_size, self.max_size)
_, o_H, o_W = img.shape
scale = o_H / H
bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W))
# horizontally flip
img, params = util.random_flip(
img, x_random=True, return_param=True)
bbox = util.flip_bbox(
bbox, (o_H, o_W), x_flip=params['x_flip'])
return img, bbox, label, scale
实质上影像的处理依靠 torchvision.transforms 的写法更加符合一般性 pytorch 的标准 这里不在多的探讨. 经过 Dataset 处理和包装之后,其实通过获取方法得到的数据已经可以进入网络训练了,但是实质上还需要最后一层包装。
代码语言:javascript复制from torch.utils import data as data_
dataset=Dataset()
dataloader = data_.DataLoader(dataset,
batch_size=1,
shuffle=True,
# pin_memory=True,
num_workers=num_workers)
在 pytorch 的体系中,数据加载的最终目的使用 Dataloader 处理 dataset 对象,以方便的控制 Batch,Shuffle 等等操作。
建议的简介原始数据被转换为 list 或者以序号为索引的字典,因为训练流程的大 IO 量 所以一些索引比较慢的格式会深刻的影响训练速度。
值得注意的一点:在以上的 DataLoader 中 Worker 是负责数据加载的多进程数量。torch.multiprocessing 是一个本地 multiprocessing 模块的包装.
它注册了自定义的reducers, 并使用共享内存为不同的进程在同一份数据上提供共享的视图. 一旦 tensor/storage 被移动到共享内存 , 将其发送到任何进程不会造成拷贝开销.
此 API 100% 兼容原生模块 - 所以足以将 import multiprocessing 改成 import torch.multiprocessing 使得所有的 tensors 通过队列发送或者使用其它共享机制, 移动到共享内存.
Python 3 支持进程之间共享 CUDA 张量,我们可以使用 spawn 或forkserver 启动此类方法。Python 2 中的 multiprocessing 多进程处理只能使用 fork 创建子进程,并且CUDA 运行时不支持多进程处理。
以下代码显示了 worker 的实际工作样貌
代码语言:javascript复制if self.num_workers > 0:
# worker_init_fn是worker初始化函数
self.worker_init_fn = loader.worker_init_fn
# index_queue 索引队列 每个worker进程对应一个:
self.index_queues = [multiprocessing.Queue() for _ in range(self.num_workers)]
# worker 队列索引
self.worker_queue_idx = 0
# worker_result_queue 进程间通信
# multiprocessing.SimpleQueue是multiprocessing.Queue([maxsize])的简化,只有三个方法------empty(), get(), put()
self.worker_result_queue = multiprocessing.SimpleQueue()
# batches_outstanding
# 当前已经准备好的 batch 的数量(可能有些正在准备中)
# 当为 0 时, 说明, dataset 中已经没有剩余数据了。
# 初始值为 0, 在 self._put_indices() 中 1,在 self.__next__ 中-1
self.batches_outstanding = 0
self.worker_pids_set = False
# shutdown为True是关闭worker
self.shutdown = False
# send_idx, rcvd_idx——发送索引,接收索引
# send_idx 用来记录 这次要放 index_queue 中 batch 的 idx
self.send_idx = 0
# rcvd_idx 用来记录 这次要从 data_queue 中取出 的 batch 的 idx
self.rcvd_idx = 0
# 因为多进程,可能会导致 data_queue 中的batch乱序
# 用这个来保证 batch 的返回是按照send_idx升序出去的。
self.reorder_dict = {}
# 创建num_workers个worker进程来处理
self.workers = [
multiprocessing.Process(
target=_worker_loop,
args=(self.dataset, self.index_queues[i],
self.worker_result_queue, self.collate_fn, base_seed i,
self.worker_init_fn, i))
for i in range(self.num_workers)]
# 这里暂不分析CUDA或者timeout的情况
if self.pin_memory or self.timeout > 0:
...
else:
# data_queue就是self.worker_result_queue(MultiProcessing.SimpleQueue()类型)
# 这个唯一的队列
self.data_queue = self.worker_result_queue
# 设置守护进程
for w in self.workers:
w.daemon = True # ensure that the worker exits on process exit
w.start()
...
# prime the prefetch loop
# 初始化的时候,就将 2*num_workers 个 (batch_idx, sampler_indices) 放到 index_queue 中
for _ in range(2 * self.num_workers):
self._put_indices()
通过数层的封装,我们完成了对训练数据的高速加载,变换,BatchSIze,Shuffle 等训练流程所需的操作的构建。最终在训练流程中通过迭代器就可以获取数据输入网络流程。最终通过:
代码语言:javascript复制
这时候的数据就可以直接输入网络了,我们也顺利的进入到了下一个阶段。
2.BackBone - Resnet/VGG
作为两阶段网络的骨干网络,其深度,和性能优劣都深刻的影响着整个网络的性能,其前向推断的速度和准确度都至关重要,VGG 作为最原始的骨干网络,各方面的表现都已经落后于新提出的网络。所以我们从 Resnet 的结构说起
原始 VGG 网络和 Resnet34 的对比
相比 VGG 的各种问题来说 Resnet 提出了新的残差块来对不必要的卷积流程进行跳过,于是网络的加深,高级特征的提取变得更加容易,在此之后,几乎所有的骨干网络更新都是从块结构的优化着手。例如 DenseNet 就对块结构做出了更多连接模式的探索
简单起见我们从最基础的 BackBone-Resnet 开始
1. BasicBlock
在代码阶段的表现就是 ResNet 网络的构建代码中包含了跳层结构
代码语言:javascript复制class BasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2d(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2d(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out = identity
out = self.relu(out)
return out
代码语言:javascript复制2.Bottleneck
代码语言:javascript复制
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = conv1x1(inplanes, planes)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = conv3x3(planes, planes, stride)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = conv1x1(planes, planes * self.expansion)
self.bn3 = nn.BatchNorm2d(planes * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out = identity
out = self.relu(out)
return out
我们在以上网络中看到 跳层的控制结构由 downsample() 控制,也就是说残差块会判断下采样是否为空,如果训练流程执行的不是下采样,那么就进行正常的卷积流程。但是如果训练流程决定执行下采样,就说明残差块中的卷积结果需要加上下采样生成的恒等(identity)。我们从原理上看一下它为什么有效,首先我们来定义残差单元:
其中 $h(x)$ 为 identity 一般求解过程中直接设为 x
$F(x_{l},W_{l})$ 为残差函数( w 是什么不用说了吧) $f(x)$ 为激活函数 ReLU 从此定义来看我们从 l 层学习到 L 层:
求反向传播梯度:
表示 loss 在 L 层的梯度,小括号里面的是残差梯度,其加法结构相比传统的乘法结构有一个直接的好处就是,可以发现当 Loss 在很小的时候也因为 1 的存在不会出现残差梯度的消失,既该层不会像传统网络一样,因为乘法结构导致梯度消失。具体的讨论可以在下面的文章中找到
Identity Mappings in Deep Residual Networks
(https://arxiv.org/pdf/1603.05027.pdf)
因为我们只需 BackBone 作为提取特征的工具, 最终将图片提取为一个合乎其他部分输入的 featuremap 就可以 我们来看一下,最终的骨干网络怎么构成:
代码语言:javascript复制class ResNet(nn.Module):
def __init__(self, block, layers, num_classes=1000):
self.inplanes = 64
super(ResNet, self).__init__()
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
self.avgpool = nn.AvgPool2d(7, stride=1)
self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def _make_layer(self, block, planes, blocks, stride=1):
downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion),
)
layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.avgpool(x)
x = x.view(x.size(0), -1)
x = self.fc(x)
return x
由此生成的标准 Resnet 肯定是不能为我们直接使用的,因为 RPN 接口所需要的是 FeatureMap 不是最后全连接的结果,因此,我们需要 Layer3 的输出,而不是 fc 的输出。那么问题来了:
Resnet 怎么嫁接到 RPN 网络呢?
这就要从 RPN 需要输入的尺寸,和 Layer 输出的尺寸说起。我们根据图可以看到 Resnet101 layer3 的输出是 1*1,1024
适配工作 AnyFeature to RPN
我们以 Resnet101 为例 来展示一个典型的 Resnet 网络作为 Faster RCNN 的 BackBone 是怎么一种操作
在原版中,VGG 作为 BackBone 我们看到的写法是
代码语言:javascript复制self.features = VGG16(bn=False)
self.conv1 = Conv2d(512, 512, 3, same_padding=True)
.
.
.
def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
features = self.features(im_data)
rpn_conv1 = self.conv1(features)
也就是说只要我们把最终把 BackBone 产生的 feature 尺寸和 Conv1 输入的尺寸匹配好就可以了, 当我们定义一个 Resnet for RPN 的类的时候可以参考下面的流程
代码语言:javascript复制class resnet(_fasterRCNN):
def __init__(self, classes, num_layers=101, pretrained=False, class_agnostic=False):
self.model_path = 'data/pretrained_model/resnet101_caffe.pth'
self.dout_base_model = 1024
self.pretrained = pretrained
self.class_agnostic = class_agnostic
_fasterRCNN.__init__(self, classes, class_agnostic)
def _init_modules(self):
resnet = resnet101()
if self.pretrained == True:
print("Loading pretrained weights from %s" %(self.model_path))
state_dict = torch.load(self.model_path)
resnet.load_state_dict({k:v for k,v in state_dict.items() if k in resnet.state_dict()})
# Build resnet.
##########################
self.RCNN_base = nn.Sequential(resnet.conv1, resnet.bn1,resnet.relu,
resnet.maxpool,resnet.layer1,resnet.layer2,resnet.layer3)
self.RCNN_top = nn.Sequential(resnet.layer4)
##########################
self.RCNN_cls_score = nn.Linear(2048, self.n_classes)
if self.class_agnostic:
self.RCNN_bbox_pred = nn.Linear(2048, 4)
else:
self.RCNN_bbox_pred = nn.Linear(2048, 4 * self.n_classes)
# Fix blocks
for p in self.RCNN_base[0].parameters(): p.requires_grad=False
for p in self.RCNN_base[1].parameters(): p.requires_grad=False
assert (0 <= cfg.RESNET.FIXED_BLOCKS < 4)
if cfg.RESNET.FIXED_BLOCKS >= 3:
for p in self.RCNN_base[6].parameters(): p.requires_grad=False
if cfg.RESNET.FIXED_BLOCKS >= 2:
for p in self.RCNN_base[5].parameters(): p.requires_grad=False
if cfg.RESNET.FIXED_BLOCKS >= 1:
for p in self.RCNN_base[4].parameters(): p.requires_grad=False
def set_bn_fix(m):
classname = m.__class__.__name__
if classname.find('BatchNorm') != -1:
for p in m.parameters(): p.requires_grad=False
self.RCNN_base.apply(set_bn_fix)
self.RCNN_top.apply(set_bn_fix)
可以看到常见的做法就是把这 BackBone 分成两部分,以 ResNet101 为例,这里把构造过程分成了两部分:
代码语言:javascript复制
self.RCNN_base = nn.Sequential(resnet.conv1, resnet.bn1,resnet.relu,
resnet.maxpool,resnet.layer1,resnet.layer2,resnet.layer3)
self.RCNN_top = nn.Sequential(resnet.layer4)
def _head_to_tail(self, pool5):
fc7 = self.RCNN_top(pool5).mean(3).mean(2)
return fc7
base_feat = self.RCNN_base(6)
# feed base feature map tp RPN to obtain rois
rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)
既从最开始到 Layer3 为一部分,layer4 为一部分,在之后的操作中RCNN_Base 作为通用的 feature 输入 RPN,而经过 ROI Pooling(Align)后的 feature 进入最后的 layer4
代码语言:javascript复制
pooled_feat = self.RCNN_roi_pool(base_feat, rois.view(-1,5))
# ..................最终拼接位置
pooled_feat = self._head_to_tail(pooled_feat)# compute bbox offset
经过 layer4 之后池化的 feature 在进入类别预测和 box 预测 如下
代码语言:javascript复制class RPN(nn.Module):
_feat_stride = [16, ]
anchor_scales = [8, 16, 32]
def __init__(self):
super(RPN, self).__init__()
self.features = VGG16(bn=False)
self.conv1 = Conv2d(512, 512, 3, same_padding=True)
self.score_conv = Conv2d(512, len(self.anchor_scales) * 3 * 2, 1, relu=False, same_padding=False)
self.bbox_conv = Conv2d(512, len(self.anchor_scales) * 3 * 4, 1, relu=False, same_padding=False)
# loss
self.cross_entropy = None
self.los_box = None
@property
def loss(self):
return self.cross_entropy self.loss_box * 10
def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
"""
|-->rpn_cls_score_net--->_______--->Class Scores--->|softmax--->|Class Probabilities
| w/16,h/16,9,2 reshape
|
rpn_net -->relu-->|
|
|-->rpn_bbx_pred_net---->_______--->Bounding Box regressors---->|
w/16,h/16,9,4 reshape
"""
im_data = network.np_to_variable(im_data, is_cuda=True)
im_data = im_data.permute(0, 3, 1, 2)
features = self.features(im_data)
rpn_conv1 = self.conv1(features)
# rpn cls score net
rpn_cls_score = self.score_conv(rpn_conv1)
rpn_cls_score_reshape = self.reshape_layer(rpn_cls_score, 2)
rpn_cls_prob = F.softmax(rpn_cls_score_reshape)
rpn_cls_prob_reshape = self.reshape_layer(rpn_cls_prob, len(self.anchor_scales)*3*2)
# rpn bbx pred net
rpn_bbox_pred = self.bbox_conv(rpn_conv1)
# proposal layer
cfg_key = 'TRAIN' if self.training else 'TEST'
#
rois = self.proposal_layer(rpn_cls_prob_reshape, rpn_bbox_pred, im_info,
cfg_key, self._feat_stride, self.anchor_scales)
# generating training labels and build the rpn loss
if self.training:
assert gt_boxes is not None
rpn_data = self.anchor_target_layer(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas,
im_info, self._feat_stride, self.anchor_scales)
self.cross_entropy, self.loss_box = self.build_loss(rpn_cls_score_reshape, rpn_bbox_pred, rpn_data)
return features, rois
代码语言:javascript复制def proposal_layer(rpn_cls_prob_reshape, rpn_bbox_pred, im_info, cfg_key, _feat_stride=[16, ],
anchor_scales=[8, 16, 32]):
"""
Parameters
----------
rpn_cls_prob_reshape: (1 , H , W , Ax2) outputs of RPN, prob of bg or fg
NOTICE: the old version is ordered by (1, H, W, 2, A) !!!!
rpn_bbox_pred: (1 , H , W , Ax4), rgs boxes output of RPN
im_info: a list of [image_height, image_width, scale_ratios]
cfg_key: 'TRAIN' or 'TEST'
_feat_stride: the downsampling ratio of feature map to the original input image
anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16])
----------
Returns
----------
rpn_rois : (1 x H x W x A, 5) e.g. [0, x1, y1, x2, y2]
"""
#注意在这个位置就生成了预置框了
_anchors = generate_anchors(scales=np.array(anchor_scales))
_num_anchors = _anchors.shape[0]
im_info = im_info[0]
assert rpn_cls_prob_reshape.shape[0] == 1,
'Only single item batches are supported'
# cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
# cfg_key = 'TEST'
pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
#这个阈值很重要
nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
min_size = cfg[cfg_key].RPN_MIN_SIZE
# the first set of _num_anchors channels are bg probs
# the second set are the fg probs, which we want
scores = rpn_cls_prob_reshape[:, _num_anchors:, :, :]
bbox_deltas = rpn_bbox_pred
# im_info = bottom[2].data[0, :]
# 1. Generate proposals from bbox deltas and shifted anchors
height, width = scores.shape[-2:]
# Enumerate all shifts
shift_x = np.arange(0, width) * _feat_stride
shift_y = np.arange(0, height) * _feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Enumerate all shifted anchors:
#
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = _num_anchors
K = shifts.shape[0]
anchors = _anchors.reshape((1, A, 4))
shifts.reshape((1, K, 4)).transpose((1, 0, 2))
anchors = anchors.reshape((K * A, 4))
# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
#
# bbox deltas will be (1, 4 * A, H, W) format
# transpose to (1, H, W, 4 * A)
# reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
# in slowest to fastest order
bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))
# Same story for the scores:
#
# scores are (1, A, H, W) format
# transpose to (1, H, W, A)
# reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))
# Convert anchors into proposals via bbox transformations
proposals = bbox_transform_inv(anchors, bbox_deltas)
# 2. clip predicted boxes to image
proposals = clip_boxes(proposals, im_info[:2])
# 3. remove predicted boxes with either height or width < threshold
# (NOTE: convert min_size to input image scale stored in im_info[2])
keep = _filter_boxes(proposals, min_size * im_info[2])
proposals = proposals[keep, :]
scores = scores[keep]
# # remove irregular boxes, too fat too tall
# keep = _filter_irregular_boxes(proposals)
# proposals = proposals[keep, :]
# scores = scores[keep]
# 4. sort all (proposal, score) pairs by score from highest to lowest
# 5. take top pre_nms_topN (e.g. 6000)
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
order = order[:pre_nms_topN]
proposals = proposals[order, :]
scores = scores[order]
# 6. apply nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
keep = nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
# Output rois blob
# Our RPN implementation only supports a single input image, so all
# batch inds are 0
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
return blob
# top[0].reshape(*(blob.shape))
# top[0].data[...] = blob
# [Optional] output scores blob
# if len(top) > 1:
# top[1].reshape(*(scores.shape))
# top[1].data[...] = scores
代码语言:javascript复制import numpy as np
sub_sample=16
ratio = [0.5, 1, 2]
anchor_scales = [8, 16, 32]
anchor_base = np.zeros((len(ratios) * len(scales), 4), dtype=np.float32)
ctr_y = sub_sample / 2.
ctr_x = sub_sample / 2.
print(ctr_y, ctr_x)
for i in range(len(ratios)):
for j in range(len(anchor_scales)):
h = sub_sample * anchor_scales[j] * np.sqrt(ratios[i])
w = sub_sample * anchor_scales[j] * np.sqrt(1./ ratios[i])
index = i * len(anchor_scales) j
anchor_base[index, 0] = ctr_y - h / 2.
anchor_base[index, 1] = ctr_x - w / 2.
anchor_base[index, 2] = ctr_y h / 2.
anchor_base[index, 3] = ctr_x w / 2.
以上的输出为featuremap上第一个像素位置的anchor,
我们必须依照这个流程生成所有像素位置上的anchor:
2.在 feature map 上的每个像素位置,我们需要生成 9 个锚点框,既每个框由(‘y1’, ‘x1’, ‘y2’, ‘x2’)构成因此总共有 95050=22500 个框,因此最后,一张图的 anchor 数据尺寸应该是( 22500,4 )
3.在 22500 个框中最后有相当部分的框实质上超出了图像的边界,因此我们根据最直接的边界计算就能筛除,最终 22500 个框剩下 17500 个有效框( 17500,4 )
fe_size = (800//16)
ctr_x = np.arange(16, (fe_size 1) * 16, 16)
ctr_y = np.arange(16, (fe_size 1) * 16, 16)
index = 0
for x in range(len(ctr_x)):
for y in range(len(ctr_y)):
ctr[index, 1] = ctr_x[x] - 8
ctr[index, 0] = ctr_y[y] - 8
index =1
anchors = np.zeros((fe_size * fe_size * 9), 4)
index = 0
for c in ctr:
ctr_y, ctr_x = c
for i in range(len(ratios)):
for j in range(len(anchor_scales)):
h = sub_sample * anchor_scales[j] * np.sqrt(ratios[i])
w = sub_sample * anchor_scales[j] * np.sqrt(1./ ratios[i])
anchors[index, 0] = ctr_y - h / 2.
anchors[index, 1] = ctr_x - w / 2.
anchors[index, 2] = ctr_y h / 2.
anchors[index, 3] = ctr_x w / 2.
index = 1
print(anchors.shape)
#Out: [22500, 4]
现在我们已经从原理上把所有框都生成了。为了工程调用起见我们用下面的方法包装
def _whctrs(anchor):
"""
Return width, height, x center, and y center for an anchor (window).
"""
w = anchor[2] - anchor[0] 1
h = anchor[3] - anchor[1] 1
x_ctr = anchor[0] 0.5 * (w - 1)
y_ctr = anchor[1] 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""
Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr 0.5 * (ws - 1),
y_ctr 0.5 * (hs - 1)))
return anchors
def _ratio_enum(anchor, ratios):
"""
Enumerate a set of anchors for each aspect ratio wrt an anchor.
"""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def _scale_enum(anchor, scales):
"""
Enumerate a set of anchors for each scale wrt an anchor.
"""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
scales=2**np.arange(3, 6)):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in xrange(ratio_anchors.shape[0])])
return anchors
print(anchors)
#array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
代码语言:javascript复制import numpy as np
import matplotlib.pyplot as plt
import cv2
def display(cordlist):
back=np.zeros((800,800,3),dtype=np.uint8)
for index,cord in enumerate(cordlist):
print('draw ',cord)
color=(np.random.randint(127,255),np.random.randint(127,255),np.random.randint(127,255))
print('color is ',color)
cv2.rectangle(back, (int(cord[0]),int(cord[1])),
(int(cord[2]),int(cord[3])), color, 1)
cv2.putText(back, str(cord[4]), (int(cord[0]),int(cord[1])),cv2.FONT_ITALIC,0.5,color, 1)
plt.imshow(back),plt.show()
return back
def py_cpu_nms(dets, thresh):
"""Pure Python NMS baseline."""
# 所有图片的坐标信息,字典形式储存??
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 1) * (y2 - y1 1) # 计算出所有图片的面积
order = scores.argsort()[::-1] # 图片评分按升序排序
keep = [] # 用来存放最后保留的图片的相应评分
while order.size > 0:
i = order[0] # i 是还未处理的图片中的最大评分
keep.append(i) # 保留改图片的值
# 矩阵操作,下面计算的是图片i分别与其余图片相交的矩形的坐标
tmp=x1[order[1:]]
xxxx = x1[i]
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
# 计算出各个相交矩形的面积
w = np.maximum(0.0, xx2 - xx1 1)
h = np.maximum(0.0, yy2 - yy1 1)
inter = w * h
# 计算重叠比例
ovr = inter / (areas[i] areas[order[1:]] - inter)
# 只保留比例小于阙值的图片,然后继续处理
inds = np.where(ovr <= thresh)[0]
indsd= inds 1
order = order[inds 1]
return keep
boxes = np.array([[100, 100, 150, 168, 0.63],[166, 70, 312, 190, 0.55],[221, 250, 389, 500, 0.79],[12, 190, 300, 399, 0.9],[28, 130, 134, 302, 0.3]])
# boxes[:,:3] =100
display(boxes)#显示NMS输入
thresh = 0.1
keep = py_cpu_nms(boxes, thresh)
print('keep:',keep)
keep_=boxes[keep]#nms之后的索引
display(keep_)#显示NMS输出
代码语言:javascript复制#torch.numel() 表示一个张量总元素的个数
#torch.clamp(min, max) 设置上下限
#tensor.item() 把tensor元素取出作为numpy数字
def nms(self, bboxes, scores, threshold=0.5):
x1 = bboxes[:,0]
y1 = bboxes[:,1]
x2 = bboxes[:,2]
y2 = bboxes[:,3]
areas = (x2-x1)*(y2-y1) # [N,] 每个bbox的面积
_, order = scores.sort(0, descending=True) # 降序排列
keep = []
while order.numel() > 0: # torch.numel()返回张量元素个数
if order.numel() == 1: # 保留框只剩一个
i = order.item()
keep.append(i)
break
else:
i = order[0].item() # 保留scores最大的那个框box[i]
keep.append(i)
# 计算box[i]与其余各框的IOU(思路很好)
xx1 = x1[order[1:]].clamp(min=x1[i]) # [N-1,]
yy1 = y1[order[1:]].clamp(min=y1[i])
xx2 = x2[order[1:]].clamp(max=x2[i])
yy2 = y2[order[1:]].clamp(max=y2[i])
inter = (xx2-xx1).clamp(min=0) * (yy2-yy1).clamp(min=0) # [N-1,]
iou = inter / (areas[i] areas[order[1:]]-inter) # [N-1,]
idx = (iou <= threshold).nonzero().squeeze() # 注意此时idx为[N-1,] 而order为[N,]
if idx.numel() == 0:
break
order = order[idx 1] # 修补索引之间的差值
return torch.LongTensor(keep) # Pytorch的索引值为LongTensor
代码语言:javascript复制
def build_loss(self, rpn_cls_score_reshape, rpn_bbox_pred, rpn_data):
# classification loss
rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(-1, 2)
rpn_label = rpn_data[0].view(-1)
rpn_keep = Variable(rpn_label.data.ne(-1).nonzero().squeeze()).cuda()
rpn_cls_score = torch.index_select(rpn_cls_score, 0, rpn_keep)
rpn_label = torch.index_select(rpn_label, 0, rpn_keep)
fg_cnt = torch.sum(rpn_label.data.ne(0))
rpn_cross_entropy = F.cross_entropy(rpn_cls_score, rpn_label)
# box loss
rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = rpn_data[1:]
rpn_bbox_targets = torch.mul(rpn_bbox_targets, rpn_bbox_inside_weights)
rpn_bbox_pred = torch.mul(rpn_bbox_pred, rpn_bbox_inside_weights)
rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, size_average=False) / (fg_cnt 1e-4)
return rpn_cross_entropy, rpn_loss_box
代码语言:javascript复制
class RoIPoolFunction(Function):
def __init__(self, pooled_height, pooled_width, spatial_scale):
self.pooled_width = int(pooled_width)
self.pooled_height = int(pooled_height)
self.spatial_scale = float(spatial_scale)
self.output = None
self.argmax = None
self.rois = None
self.feature_size = None
def forward(self, features, rois):
batch_size, num_channels, data_height, data_width = features.size()
num_rois = rois.size()[0]
output = torch.zeros(num_rois, num_channels, self.pooled_height, self.pooled_width)
argmax = torch.IntTensor(num_rois, num_channels, self.pooled_height, self.pooled_width).zero_()
if not features.is_cuda:
_features = features.permute(0, 2, 3, 1)
roi_pooling.roi_pooling_forward(self.pooled_height, self.pooled_width, self.spatial_scale,
_features, rois, output)
# output = output.cuda()
else:
output = output.cuda()
argmax = argmax.cuda()
roi_pooling.roi_pooling_forward_cuda(self.pooled_height, self.pooled_width, self.spatial_scale,
features, rois, output, argmax)
self.output = output
self.argmax = argmax
self.rois = rois
self.feature_size = features.size()
return output
def backward(self, grad_output):
assert(self.feature_size is not None and grad_output.is_cuda)
batch_size, num_channels, data_height, data_width = self.feature_size
grad_input = torch.zeros(batch_size, num_channels, data_height, data_width).cuda()
roi_pooling.roi_pooling_backward_cuda(self.pooled_height, self.pooled_width, self.spatial_scale,
grad_output, self.rois, grad_input, self.argmax)
# print grad_input
return grad_input, None
可以看到 ROI Pooling 的 Python 部分其实没有什么计算的部分,其计算部分都被隐藏在了 CUDA-C 以及 C 中,以保证其高效。我们对比一下 C 和CUDA 的代码 ( forward ) CUDA:
int roi_pooling_forward_cuda(int pooled_height, int pooled_width, float spatial_scale,
THCudaTensor * features, THCudaTensor * rois, THCudaTensor * output, THCudaIntTensor * argmax)
{
// Grab the input tensor
float * data_flat = THCudaTensor_data(state, features);//输入的Featuremap
float * rois_flat = THCudaTensor_data(state, rois);//输入的多个ROI
float * output_flat = THCudaTensor_data(state, output);
int * argmax_flat = THCudaIntTensor_data(state, argmax);
// Number of ROIs
int num_rois = THCudaTensor_size(state, rois, 0);//ROI的数量
int size_rois = THCudaTensor_size(state, rois, 1);//ROI的尺寸
if (size_rois != 5)
{
return 0;
}
// batch size
int batch_size = THCudaTensor_size(state, features, 0);
if (batch_size != 1)
{
return 0;
}
// data height
int data_height = THCudaTensor_size(state, features, 2);
// data width
int data_width = THCudaTensor_size(state, features, 3);
// Number of channels
int num_channels = THCudaTensor_size(state, features, 1);
cudaStream_t stream = THCState_getCurrentStream(state);
ROIPoolForwardLaucher(
data_flat, spatial_scale, num_rois, data_height,
data_width, num_channels, pooled_height,
pooled_width, rois_flat,
output_flat, argmax_flat, stream);
return 1;
}
在这段代码中所有数据被整合进了ROI Pool Forward Launcher中,然后执行了
int ROIPoolForwardLaucher(
const float* bottom_data, const float spatial_scale, const int num_rois, const int height,
const int width, const int channels, const int pooled_height,
const int pooled_width, const float* bottom_rois,
float* top_data, int* argmax_data, cudaStream_t stream)
{
const int kThreadsPerBlock = 1024;
const int output_size = num_rois * pooled_height * pooled_width * channels;
cudaError_t err;
////////////////////////////////////在此执行真正的forward计算
ROIPoolForward<<<(output_size kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock, 0, stream>>>(
output_size, bottom_data, spatial_scale, height, width, channels, pooled_height,
pooled_width, bottom_rois, top_data, argmax_data);
////////////////////////////////////
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed : %sn", cudaGetErrorString( err ) );
exit( -1 );
}
return 1;
}
真正的 ROI Pooling Forward Kernal 计算:
__global__ void ROIPoolForward(const int nthreads, const float* bottom_data,
const float spatial_scale, const int height, const int width,
const int channels, const int pooled_height, const int pooled_width,
const float* bottom_rois, float* top_data, int* argmax_data)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
// (n, c, ph, pw) is an element in the pooled output
int n = index;
int pw = n % pooled_width;
n /= pooled_width;
int ph = n % pooled_height;
n /= pooled_height;
int c = n % channels;
n /= channels;
bottom_rois = n * 5;
int roi_batch_ind = bottom_rois[0];
int roi_start_w = round(bottom_rois[1] * spatial_scale);
int roi_start_h = round(bottom_rois[2] * spatial_scale);
int roi_end_w = round(bottom_rois[3] * spatial_scale);
int roi_end_h = round(bottom_rois[4] * spatial_scale);
// Force malformed ROIs to be 1x1
int roi_width = fmaxf(roi_end_w - roi_start_w 1, 1);
int roi_height = fmaxf(roi_end_h - roi_start_h 1, 1);
float bin_size_h = (float)(roi_height) / (float)(pooled_height);
float bin_size_w = (float)(roi_width) / (float)(pooled_width);
int hstart = (int)(floor((float)(ph) * bin_size_h));
int wstart = (int)(floor((float)(pw) * bin_size_w));
int hend = (int)(ceil((float)(ph 1) * bin_size_h));
int wend = (int)(ceil((float)(pw 1) * bin_size_w));
// Add roi offsets and clip to input boundaries
hstart = fminf(fmaxf(hstart roi_start_h, 0), height);
hend = fminf(fmaxf(hend roi_start_h, 0), height);
wstart = fminf(fmaxf(wstart roi_start_w, 0), width);
wend = fminf(fmaxf(wend roi_start_w, 0), width);
bool is_empty = (hend <= hstart) || (wend <= wstart);
// Define an empty pooling region to be zero
float maxval = is_empty ? 0 : -FLT_MAX;
// If nothing is pooled, argmax = -1 causes nothing to be backprop'd
int maxidx = -1;
bottom_data = roi_batch_ind * channels * height * width;
for (int h = hstart; h < hend; h) {
for (int w = wstart; w < wend; w) {
// int bottom_index = (h * width w) * channels c;
int bottom_index = (c * height h) * width w;
if (bottom_data[bottom_index] > maxval) {
maxval = bottom_data[bottom_index];
maxidx = bottom_index;
}
}
}
top_data[index] = maxval;
if (argmax_data != NULL)
argmax_data[index] = maxidx;
}
}
代码语言:javascript复制class Faster_RCNN(nn.Module)
def __init__(self, classes=None, debug=False):
super(FasterRCNN, self).__init__()
# roi pooling 之前的定义部分省略
if classes is not None:
self.classes = classes
self.n_classes = len(classes)
self.rpn = RPN()
self.roi_pool = RoIPool(7, 7, 1.0/16)
self.fc6 = FC(512 * 7 * 7, 4096)
self.fc7 = FC(4096, 4096)
self.score_fc = FC(4096, self.n_classes, relu=False)
self.bbox_fc = FC(4096, self.n_classes * 4, relu=False)
def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
......roi之前的部分省略
pooled_features = self.roi_pool(features, rois)
x = pooled_features.view(pooled_features.size()[0], -1)
x = self.fc6(x)
x = F.dropout(x, training=self.training)
x = self.fc7(x)
x = F.dropout(x, training=self.training)
cls_score = self.score_fc(x)
cls_prob = F.softmax(cls_score)
bbox_pred = self.bbox_fc(x)
return cls_prob, bbox_pred, rois
代码语言:javascript复制def build_loss(self, cls_score, bbox_pred, roi_data):
# classification loss
label = roi_data[1].squeeze()
fg_cnt = torch.sum(label.data.ne(0))
bg_cnt = label.data.numel() - fg_cnt
# for log
if self.debug:
maxv, predict = cls_score.data.max(1)
self.tp = torch.sum(predict[:fg_cnt].eq(label.data[:fg_cnt])) if fg_cnt > 0 else 0
self.tf = torch.sum(predict[fg_cnt:].eq(label.data[fg_cnt:]))
self.fg_cnt = fg_cnt
self.bg_cnt = bg_cnt
ce_weights = torch.ones(cls_score.size()[1])
ce_weights[0] = float(fg_cnt) / bg_cnt
ce_weights = ce_weights.cuda()
cross_entropy = F.cross_entropy(cls_score, label, weight=ce_weights)
# bounding box regression L1 loss
bbox_targets, bbox_inside_weights, bbox_outside_weights = roi_data[2:]
bbox_targets = torch.mul(bbox_targets, bbox_inside_weights)
bbox_pred = torch.mul(bbox_pred, bbox_inside_weights)
loss_box = F.smooth_l1_loss(bbox_pred, bbox_targets, size_average=False) / (fg_cnt 1e-4)
return cross_entropy, loss_box