目标检测基本概念与性能评价指标计算

Contents

1 前言
2 anchors
3 交并比IOU
4 非极大值抑制算法NMS
- 4.1 NMS介绍
- 4.2 NMS算法
5 Soft NMS算法
6 AP计算
- 6.1 近似计算AP(approximated average precision)
- 6.2 插值计算(Interpolated average precision)
- 6.3 AP计算代码实现
- 6.4 不同数据集map计算方法
7 FLOPs计算
- 7.1 卷积层FLOPs计算
- 7.2 全连接层的 FLOPs 计算
- 7.3 参考资料
8 目标检测度量标准汇总
9 参考资料

前言

不同的问题和不同的数据集都会有不同的模型评价指标，比如分类问题，数据集类别平衡的情况下可以使用准确率作为评价指标，但是现实中的数据集几乎都是类别不平衡的，所以一般都是采用 AP 作为评价指标，分别计算每个类别的 AP，再计算mAP。

anchors

所谓 anchors，实际上就是一组由 generate_anchors.py 生成的矩形框。其中每行的4个值 (x1,y1,x2,y2) 表矩形左上和右下角点坐标。9个矩形共有3种形状，长宽比为大约为 {1:1, 1:2, 2:1} 三种, 实际上通过anchors就引入了检测中常用到的多尺度方法。generate_anchors.py的代码如下：

代码语言：javascript复制

import numpy as np
import six
from six import __init__  # 兼容python2和python3模块


def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    """Generate anchor base windows by enumerating aspect ratio and scales.

    Generate anchors that are scaled and modified to the given aspect ratios.
    Area of a scaled anchor is preserved when modifying to the given aspect
    ratio.

    :obj:`R = len(ratios) * len(anchor_scales)` anchors are generated by this
    function.
    The :obj:`i * len(anchor_scales)   j` th anchor corresponds to an anchor
    generated by :obj:`ratios[i]` and :obj:`anchor_scales[j]`.

    For example, if the scale is :math:`8` and the ratio is :math:`0.25`,
    the width and the height of the base window will be stretched by :math:`8`.
    For modifying the anchor to the given aspect ratio,
    the height is halved and the width is doubled.

    Args:
        base_size (number): The width and the height of the reference window.
        ratios (list of floats): This is ratios of width to height of
            the anchors.
        anchor_scales (list of numbers): This is areas of anchors.
            Those areas will be the product of the square of an element in
            :obj:`anchor_scales` and the original area of the reference
            window.

    Returns:
        ~numpy.ndarray:
        An array of shape :math:`(R, 4)`.
        Each element is a set of coordinates of a bounding box.
        The second axis corresponds to
        :math:`(x_{min}, y_{min}, x_{max}, y_{max})` of a bounding box.

    """
    import numpy as np
    py = base_size / 2.
    px = base_size / 2.

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in six.moves.range(len(ratios)):
        for j in six.moves.range(len(anchor_scales)):
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            index = i * len(anchor_scales)   j
            anchor_base[index, 0] = px - w / 2.
            anchor_base[index, 1] = py - h / 2.

            anchor_base[index, 2] = px   h / 2.
            anchor_base[index, 3] = py   w / 2.
    return anchor_base


# test
if __name__ == "__main__":
    bbox_list = generate_anchor_base()
    print(bbox_list)

程序运行输出如下：

[[ -82.50967 -37.254833 53.254833 98.50967 ] [-173.01933 -82.50967 98.50967 189.01933 ] [-354.03867 -173.01933 189.01933 370.03867 ] [ -56. -56. 72. 72. ] [-120. -120. 136. 136. ] [-248. -248. 264. 264. ] [ -37.254833 -82.50967 98.50967 53.254833] [ -82.50967 -173.01933 189.01933 98.50967 ] [-173.01933 -354.03867 370.03867 189.01933 ]]

交并比IOU

交并比（Intersection-over-Union，IoU），目标检测中使用的一个概念，是产生的候选框（candidate bound）与原标记框（ground truth bound）的交叠率，即它们的交集与并集的比值。最理想情况是完全重叠，即比值为1。计算公式如下：

IOU计算公式

IOU计算代码实现如下：

代码语言：javascript复制

# _*_ coding:utf-8 _*_
# 计算iou

"""
bbox的数据结构为(xmin,ymin,xmax,ymax)--(x1,y1,x2,y2),
每个bounding box的左上角和右下角的坐标
输入：
    bbox1, bbox2: Single numpy bounding box, Shape: [4]
输出：
    iou值
"""
import numpy as np
import cv2

def iou(bbox1, bbox2):
    """
    计算两个bbox(两框的交并比)的iou值
    :param bbox1: (x1,y1,x2,y2), type: ndarray or list
    :param bbox2: (x1,y1,x2,y2), type: ndarray or list
    :return: iou, type float
    """
    if type(bbox1) or type(bbox2) != 'ndarray':
        bbox1 = np.array(bbox1)
        bbox2 = np.array(bbox2)

    assert bbox1.size == 4 and bbox2.size == 4, "bounding box coordinate size must be 4"
    xx1 = np.max((bbox1[0], bbox2[0]))
    yy1 = np.max((bbox1[1], bbox1[1]))
    xx2 = np.min((bbox1[2], bbox2[2]))
    yy2 = np.min((bbox1[3], bbox2[3]))
    bwidth = xx2 - xx1
    bheight = yy2 - yy1
    area = bwidth * bheight  # 求两个矩形框的交集
    union = (bbox1[2] - bbox1[0])*(bbox1[3] - bbox1[1])   (bbox2[2] - bbox2[0])*(bbox2[3] - bbox2[1]) - area  # 求两个矩形框的并集
    iou = area / union

    return iou


if __name__=='__main__':
    rect1 = (461, 97, 599, 237)
    # (top, left, bottom, right)
    rect2 = (522, 127, 702, 257)
    iou_ret = round(iou(rect1, rect2), 3) # 保留3位小数
    print(iou_ret)

    # Create a black image
    img=np.zeros((720,720,3), np.uint8)
    cv2.namedWindow('iou_rectangle')
    """
    cv2.rectangle 的 pt1 和 pt2 参数分别代表矩形的左上角和右下角两个点,
    coordinates for the bounding box vertices need to be integers if they are in a tuple,
    and they need to be in the order of (left, top) and (right, bottom). 
    Or, equivalently, (xmin, ymin) and (xmax, ymax).
    """
    cv2.rectangle(img,(461, 97),(599, 237),(0,255,0),3)
    cv2.rectangle(img,(522, 127),(702, 257),(0,255,0),3)
    font  = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(img, 'IoU is '   str(iou_ret), (341,400), font, 1,(255,255,255),1)
    cv2.imshow('iou_rectangle', img)
    cv2.waitKey(0)

IoU代码输出结果如下所示：

IoU计算代码输出结果图

非极大值抑制算法NMS

NMS介绍

在目标检测中，常会利用非极大值抑制算法(NMS，non maximum suppression)对生成的大量候选框进行后处理，去除冗余的候选框，得到最佳检测框，以加快目标检测的效率。其本质思想是其思想是搜素局部最大值，抑制非极大值。非极大值抑制，在计算机视觉任务中得到了广泛的应用，例如边缘检测、人脸检测、目标检测（DPM，YOLO，SSD，Faster R-CNN）等。即如下图所示实现效果，消除多余的候选框，找到最佳的 bbox。NMS过程如下图所示：

NMS过程

以上图为例，每个选出来的 Bounding Box 检测框（既BBox）用（x,y,h,w, confidence score，Pdog,Pcat）表示，confidence score 表示 background 和 foreground 的置信度得分，取值范围[0,1]。Pdog, Pcat分布代表类别是狗和猫的概率。如果是 100 类的目标检测模型，BBox 输出向量为 5 100=105。

NMS算法

NMS主要就是通过迭代的形式，不断地以最大得分的框去与其他框做IoU操作，并过滤那些IoU较大的框。其实现的思想主要是将各个框的置信度进行排序，然后选择其中置信度最高的框A，将其作为标准选择其他框，同时设置一个阈值，当其他框B与A的重合程度超过阈值就将B舍弃掉，然后在剩余的框中选择置信度最大的框，重复上述操作。**算法实现过程如下**：

根据候选框类别分类概率排序：F>E>D>C>B>A，并标记最大概率的矩形框F作为标准框。
分别判断A~E与F的重叠度IOU(两框的交并比)是否大于某个设定的阈值，假设B、D与F的重叠度超过阈值，那么就扔掉B、D；
从剩下的矩形框A、C、E中，选择概率最大的E，标记为要保留下来的，然后判读E与A、C的重叠度，扔掉重叠度超过设定阈值的矩形框；
对剩下的bbx，循环执行(2)和(3)直到所有的bbx均满足要求（即不能再移除bbx）

NMS的Python代码如下：

代码语言：javascript复制

import numpy as np

def py_nms(dets, thresh):
    """Pure Python NMS baseline.注意，这里的计算都是在矩阵层面上计算的
    greedily select boxes with high confidence and overlap with current maximum <= thresh
    rule out overlap >= thresh
    :param dets: [[x1, y1, x2, y2 score],] # ndarray, shape(-1,5)
    :param thresh: retain overlap < thresh
    :return: indexes to keep
    """
    # x1、y1、x2、y2、以及score赋值
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]

    # 计算每一个候选框的面积, 纯矩阵加和乘法运算,为何加1？
    areas = (x2 - x1   1) * (y2 - y1   1)
    # order是将confidence降序排序后得到的矩阵索引
    order = np.argsort(dets[:, 4])[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # 计算当前概率最大矩形框与其他矩形框的相交框的坐标，会用到numpy的broadcast机制，得到的是向量
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # 计算相交框的面积,注意矩形框不相交时w或h算出来会是负数，用0代替
        w = np.maximum(0.0, xx2 - xx1   1)
        h = np.maximum(0.0, yy2 - yy1   1)
        inter = w * h
        # 计算重叠度IOU：重叠面积/（面积1 面积2-重叠面积）
        iou = inter / (areas[i]   areas[order[1:]] - inter)
        # 找到重叠度不高于阈值的矩形框索引
        inds = np.where(iou < thresh)[0]
        # 将order序列更新，由于前面得到的矩形框索引要比矩形框在原order序列中的索引小1，所以要把这个1加回来
        order = order[inds   1]
    return keep

# test
if __name__ == "__main__":
    dets = np.array([[30, 20, 230, 200, 1],
                     [50, 50, 260, 220, 0.9],
                     [210, 30, 420, 5, 0.8],
                     [430, 280, 460, 360, 0.7]])
    thresh = 0.35
    keep_dets = py_nms(dets, thresh)
    print(keep_dets)
    print(dets[keep_dets])

程序输出如下：

[0, 2, 3] [[ 30. 20. 230. 200. 1. ] [210. 30. 420. 5. 0.8] [430. 280. 460. 360. 0.7]]

另一个版本的 nms 的 python 代码如下：

代码语言：javascript复制

from __future__ import print_function
import numpy as np
import time

def intersect(box_a, box_b):
    max_xy = np.minimum(box_a[:, 2:], box_b[2:])
    min_xy = np.maximum(box_a[:, :2], box_b[:2])
    inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf)
    return inter[:, 0] * inter[:, 1]

def get_iou(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes.  The jaccard overlap
    is simply the intersection over union of two boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A)   area(B) - A ∩ B)
        The box should be [x1,y1,x2,y2]
    Args:
        box_a: Single numpy bounding box, Shape: [4] or Multiple bounding boxes, Shape: [num_boxes,4]
        box_b: Single numpy bounding box, Shape: [4]
    Return:
        jaccard overlap: Shape: [box_a.shape[0], box_a.shape[1]]
    """
    if box_a.ndim==1:
        box_a=box_a.reshape([1,-1])
    inter = intersect(box_a, box_b)
    area_a = ((box_a[:, 2]-box_a[:, 0]) *
              (box_a[:, 3]-box_a[:, 1]))  # [A,B]
    area_b = ((box_b[2]-box_b[0]) *
              (box_b[3]-box_b[1]))  # [A,B]
    union = area_a   area_b - inter
    return inter / union  # [A,B]

def nms(bboxs,scores,thresh):
    """
    The box should be [x1,y1,x2,y2]
    :param bboxs: multiple bounding boxes, Shape: [num_boxes,4]
    :param scores: The score for the corresponding box
    :return: keep inds
    """
    if len(bboxs)==0:
        return []
    order=scores.argsort()[::-1]
    keep=[]
    while order.size>0:
        i=order[0]
        keep.append(i)
        ious=get_iou(bboxs[order],bboxs[i])
        order=order[ious<=thresh]
    return keep

Soft NMS算法

Soft NMS算法是对NMS算法的改进，是发表在ICCV2017的文章中提出的。NMS算法存在一个问题是可能会把一些目标框给过滤掉，从而导致目标的recall指标比较低。原来的NMS可以描述如下：将IOU大于阈值的窗口的得分全部置为0。

硬NMS

文章的改进有两种形式，一种是线性加权的:

线性加权形式的soft NMS

另一种是高斯加权的：

高斯加权形式的soft NMS

注意，这两种形式，思想都是 (M) 为当前得分最高框，(b_i) 为待处理框， (b_i)和 (M)的 IOU 越大，bbox 的得分 (s_i) 就下降的越厉害( (N_t) 为给定阈值)。更多细节可以参考原论文。soft nms的python代码如下 ：

代码语言：javascript复制

def soft_nms(dets, thresh, type='gaussian'):
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1   1) * (y2 - y1   1)
    order = scores.argsort()[::-1]
    scores = scores[order]

    keep = []
    while order.size > 0:
        i = order[0]
        dets[i, 4] = scores[0]
        keep.append(i)

        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1   1)
        h = np.maximum(0.0, yy2 - yy1   1)
        inter = w * h
        # 计算重叠度IOU：重叠面积/（面积1 面积2-重叠面积）
        ovr = inter / (areas[i]   areas[order[1:]] - inter)

        order = order[1:]
        scores = scores[1:]
        if type == 'linear':
            inds = np.where(ovr >= thresh)[0]
            scores[inds] *= (1 - ovr[inds])
        else:
            scores *= np.exp(- ovr ** 2 / thresh)
        inds = np.where(scores > 1e-3)[0]
        order = order[inds]
        scores = scores[inds]

        tmp = scores.argsort()[::-1]
        order = order[tmp]
        scores = scores[tmp]
    return keep

AP计算

目标检测领域常用的评估标准是：mAP(mean average precision)，计算mAP需要先计算AP，计算AP需涉及到precision和recall的计算，而这两者的计算又需设计TP、FP、FN的计算。

AP的计算一般先设计到P-R曲线（precision-recall curve）的绘制，P-R曲线下面与x轴围成的面积称为average precison（AP）。下图是一个二分类问题的P-R曲线：

分类的PR曲线图

近似计算AP(approximated average precision)

AP = sum_{k=1}^{N}P(k)Delta r(k)

这个计算方式称为 approximated 形式的，而插值计算的方式里面这个是最精确的，每个样本点都参与了计算
很显然位于一条竖直线上的点对计算AP没有贡献
这里N为数据总量，k为每个样本点的索引， (Δr(k)=r(k)−r(k−1))

近似计算AP和绘制PR曲线代码如下：

代码语言：javascript复制

import numpy as np
import matplotlib.pyplot as plt

class_names = ["car", "pedestrians", "bicycle"]

def draw_PR_curve(predict_scores, eval_labels, name, cls_idx=1):
    """calculate AP and draw PR curve, there are 3 types
    Parameters:
    @all_scores: single test dataset predict scores array, (-1, 3)
    @all_labels: single test dataset predict label array, (-1, 3)
    @cls_idx: the serial number of the AP to be calculated, example: 0,1,2,3...
    """
    # print('sklearn Macro-F1-Score:', f1_score(predict_scores, eval_labels, average='macro'))
    global class_names
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 10))
    # Rank the predicted scores from large to small, extract their corresponding index(index number), and generate an array
    idx = predict_scores[:, cls_idx].argsort()[::-1]
    eval_labels_descend = eval_labels[idx]
    pos_gt_num = np.sum(eval_labels == cls_idx) # number of all gt

    predict_results = np.ones_like(eval_labels)
    tp_arr = np.logical_and(predict_results == cls_idx, eval_labels_descend == cls_idx) # ndarray
    fp_arr = np.logical_and(predict_results == cls_idx, eval_labels_descend != cls_idx)

    tp_cum = np.cumsum(tp_arr).astype(float) # ndarray, Cumulative sum of array elements.
    fp_cum = np.cumsum(fp_arr).astype(float)

    precision_arr = tp_cum / (tp_cum   fp_cum) # ndarray
    recall_arr = tp_cum / pos_gt_num
    ap = 0.0
    prev_recall = 0
    for p, r in zip(precision_arr, recall_arr):
      ap  = p * (r - prev_recall)
      # pdb.set_trace()
      prev_recall = r
    print("------%s, ap: %f-----" % (name, ap))

    fig_label = '[%s, %s] ap=%f' % (name, class_names[cls_idx], ap)
    ax.plot(recall_arr, precision_arr, label=fig_label)

    ax.legend(loc="lower left")
    ax.set_title("PR curve about class: %s" % (class_names[cls_idx]))
    ax.set(xticks=np.arange(0., 1, 0.05), yticks=np.arange(0., 1, 0.05))
    ax.set(xlabel="recall", ylabel="precision", xlim=[0, 1], ylim=[0, 1])

    fig.savefig("./pr-curve-%s.png" % class_names[cls_idx])
    plt.close(fig)

插值计算(Interpolated average precision)

插值计算AP的公式的演变过程我不写了，详情可以参考这篇文章，我这里的公式和图也是参考此文章的。公式如下：

这是通常意义上的 11points_Interpolated 形式的 AP，选取固定的 {0,0.1,0.2,…,1.0} 11个阈值，这个在PASCAL2007中有使用；
因为参与计算的只有11个点，所以 K=11，称为11points_Interpolated，k为阈值索引；
(Pinterp(k)) 取第 k 个阈值所对应的样本点之后的样本中的最大值，只不过这里的阈值被限定在了 ({0,0.1,0.2,…,1.0}) 范围内。

插值计算方式计算AP

从曲线上看，真实 AP< approximated AP < Interpolated AP，11-points Interpolated AP 可能大也可能小，当数据量很多的时候会接近于 Interpolated AP，与 Interpolated AP不同，前面的公式中计算AP时都是对PR曲线的面积估计，PASCAL的论文里给出的公式就更加简单粗暴了，直接计算11个阈值处的precision的平均值。如下图：

11点方式计算AP公式

AP计算代码实现

不同数据集map计算方法

由于 mAP 是数据集中所有类别 AP 值得平均，所以我们要计算mAP，首先得知道某一类别的AP值怎么求。不同数据集的某类别的AP计算方法大同小异，主要分为三种：

（1）在 VOC2007，只需要选取当Recall >= 0, 0.1, 0.2, …, 1共11个点时的Precision最大值，然后AP就是这11个Precision的平均值，map就是所有类别AP值的平均。（2）在 VOC2010 及以后，需要针对每一个不同的 Recall 值（包括0和1），选取其大于等于这些 Recall 值时的 Precision 最大值，然后计算PR曲线下面积作为AP值，map就是所有类别AP值的平均。（3）COCO 数据集，设定多个IOU阈值（0.5-0.95,0.05为步长），在每一个IOU阈值下都有某一类别的AP值，然后求不同IOU阈值下的AP平均，就是所求的最终的某类别的AP值。

VOC 数据集中计算 AP 的代码（用的是插值计算方法，代码出自py-faster-rcnn仓库）如下：

1, 在给定 recal 和 precision 的条件下计算AP：

代码语言：javascript复制

def voc_ap(rec, prec, use_07_metric=False):
    """ 
    ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap   p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (Delta recall) * prec
        ap = np.sum((mrec[i   1] - mrec[i]) * mpre[i   1])
    return ap

给定目标检测结果文件和测试集标签文件 xml 等计算AP：

代码语言：javascript复制

def parse_rec(filename):
    """ Parse a PASCAL VOC xml file 
    Return : list, element is dict.
    """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        obj_struct['pose'] = obj.find('pose').text
        obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    return objects


def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,
                                imagesetfile,
                                classname,
                                [ovthresh],
                                [use_07_metric])
    Top level function that does the PASCAL VOC evaluation.
    detpath: Path to detections result file
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations file
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, '%s_annots.pkl' % imagesetfile)
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    imagenames = [x.strip() for x in lines]

    if not os.path.isfile(cachefile):
        # load annotations
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                print('Reading annotation for {:d}/{:d}'.format(
                    i   1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:
            pickle.dump(recs, f)
    else:
        # load
        with open(cachefile, 'rb') as f:
            try:
                recs = pickle.load(f)
            except:
                recs = pickle.load(f, encoding='bytes')

    # extract gt objects for this class
    class_recs = {}
    npos = 0
    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
        det = [False] * len(R)
        npos = npos   sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]
    confidence = np.array([float(x[1]) for x in splitlines])
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])

    nd = len(image_ids)
    tp = np.zeros(nd)
    fp = np.zeros(nd)

    if BB.shape[0] > 0:
        # sort by confidence
        sorted_ind = np.argsort(-confidence)
        sorted_scores = np.sort(-confidence)
        BB = BB[sorted_ind, :]
        image_ids = [image_ids[x] for x in sorted_ind]

        # go down dets and mark TPs and FPs
        for d in range(nd):
            R = class_recs[image_ids[d]]
            bb = BB[d, :].astype(float)
            ovmax = -np.inf
            BBGT = R['bbox'].astype(float)

            if BBGT.size > 0:
                # compute overlaps
                # intersection
                ixmin = np.maximum(BBGT[:, 0], bb[0])
                iymin = np.maximum(BBGT[:, 1], bb[1])
                ixmax = np.minimum(BBGT[:, 2], bb[2])
                iymax = np.minimum(BBGT[:, 3], bb[3])
                iw = np.maximum(ixmax - ixmin   1., 0.)
                ih = np.maximum(iymax - iymin   1., 0.)
                inters = iw * ih

                # union
                uni = ((bb[2] - bb[0]   1.) * (bb[3] - bb[1]   1.)  
                       (BBGT[:, 2] - BBGT[:, 0]   1.) *
                       (BBGT[:, 3] - BBGT[:, 1]   1.) - inters)

                overlaps = inters / uni
                ovmax = np.max(overlaps)
                jmax = np.argmax(overlaps)

            if ovmax > ovthresh:
                if not R['difficult'][jmax]:
                    if not R['det'][jmax]:
                        tp[d] = 1.
                        R['det'][jmax] = 1
                    else:
                        fp[d] = 1.
            else:
                fp[d] = 1.

    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    rec = tp / float(npos)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp   fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)

    return rec, prec, ap

FLOPs计算

FLOPs：floating point operations 指的是浮点运算次数,理解为计算量，可以用来衡量算法/模型的复杂度。
FLOPS：（全部大写）,Floating-point Operations Per Second，每秒所执行的浮点运算次数，理解为计算速度,是一个衡量硬件性能的指标。
MACC：multiply-accumulate，乘法累加次数。

卷积层FLOPs计算

(FLOPs=(2times C_itimes K^2-1)times Htimes Wtimes C_o)（不考虑bias） (FLOPs=(2times C_itimes K^2)times Htimes Wtimes C_o)（考虑bias (MACC=(C_itimes K^2)times Htimes Wtimes C_o)（考虑bias）

(C_i) 为输入特征图通道数，(K) 为过滤器尺寸，(H,W,C_o)为输出特征图的高，宽和通道数。二维卷积过程如下图所示：

二维卷积是一个相当简单的操作：从卷积核开始，这是一个小的权值矩阵。这个卷积核在 2 维输入数据上滑动，对当前输入的部分元素进行矩阵乘法，然后将结果汇为单个输出像素。

公式解释，参考这里，如下：

理解 FLOPs 的计算公式分两步，括号内是第一步，计算出 output feature map 的一个 pixel，然后再乘以 (Htimes Wtimes C_o)，从而拓展到整个 output feature map。括号内的部分又可以分为两步：((2times C_itimes K^2-1)=(C_itimes K^2) (C_itimes K^2-1))。第一项是乘法运算次数，第二项是加法运算次数，因为 (n) 个数相加，要加 (n-1)次，所以不考虑 bias 的情况下，会有一个-1，如果考虑 bias，刚好中和掉，括号内变为( (2times C_itimes K^2))。

所以卷积层的 (FLOPs=(2times C_{i}times K^2-1)times Htimes Wtimes C_o) ( (C_i) 为输入特征图通道数，(K) 为过滤器尺寸，(H, W, C_o)为输出特征图的高，宽和通道数)。

全连接层的 FLOPs 计算

全连接层的 (FLOPs = (2I − 1)O)，(I) 是输入层的维度，(O) 是输出层的维度。

参考资料

PRUNING CONVOLUTIONAL NEURAL NETWORKS FOR RESOURCE EFFICIENT INFERENCE

目标检测度量标准汇总

参考资料

目标检测评价标准-AP mAP
目标检测的性能评价指标
Soft-NMS
Recent Advances in Deep Learning for Object Detection
A Simple and Fast Implementation of Faster R-CNN

图像识别编程算法

0 人点赞