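PCK (Percentage of Correct Keypoints) is the fraction of keypoints that are estimated correctly: the percentage of detections that fall within a normalized distance of the ground truth, i.e. whose normalized distance to the corresponding ground-truth keypoint is below a chosen threshold. PCK@0.5, for example, means the threshold is set to 0.5.

The normalized distance is the Euclidean distance between the predicted keypoint and the manually annotated one, divided by a body scale factor. The MPII dataset uses the head diameter of the current person as the scale factor, measured as the Euclidean distance between the top-left and bottom-right corners of the head bounding box. PCK computed with this scale factor is also called PCKh.

Note that PCK compares the predicted joints and the ground-truth joints of one person. PCK itself never has to match multiple predictions against multiple ground truths; any such matching must already be resolved before PCK is computed. For multi-person pose estimation, PCK is simply averaged over the person dimension. The code below shows this as well: distances are computed one-to-one, and the multi-person PCK is a mean.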
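
The definition above can also be written as a formula. The notation is introduced here for illustration and is not from the original text: $i$ indexes the $N$ people, $k$ the keypoints, $\hat{p}_{ik}$ and $p_{ik}$ are the predicted and ground-truth positions, $v_{ik} \in \{0, 1\}$ is the visibility flag, $s_i$ is the scale factor of person $i$ (the head size for PCKh), and $T$ is the threshold. The second average, over the set $\mathcal{K}$ of keypoints that are visible at least once, follows what the mmpose code below does.

```latex
\mathrm{PCK}_k@T =
  \frac{\sum_{i=1}^{N} v_{ik}\,
        \mathbf{1}\!\left[\dfrac{\lVert \hat{p}_{ik} - p_{ik} \rVert_2}{s_i} < T\right]}
       {\sum_{i=1}^{N} v_{ik}},
\qquad
\mathrm{PCK}@T = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \mathrm{PCK}_k@T
```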


The following code is from mmpose:

import numpy as np

def keypoint_pck_accuracy(pred, gt, mask, thr, normalize):
    """Calculate the pose accuracy of PCK for each individual keypoint and the
    averaged accuracy across all keypoints for coordinates.

    Note:
        PCK metric measures accuracy of the localization of the body joints.
        The distances between predicted positions and the ground-truth ones
        are typically normalized by the bounding box size.
        The threshold (thr) of the normalized distance is commonly set
        as 0.05, 0.1 or 0.2 etc.

        - batch_size: N
        - num_keypoints: K

    Args:
        pred (np.ndarray[N, K, 2]): Predicted keypoint location.
        gt (np.ndarray[N, K, 2]): Groundtruth keypoint location.
        mask (np.ndarray[N, K]): Visibility of the target. False for invisible
            joints, and True for visible. Invisible joints will be ignored for
            accuracy calculation.
        thr (float): Threshold of PCK calculation.
        normalize (np.ndarray[N, 2]): Normalization factor for H&W.

    Returns:
        tuple: A tuple containing keypoint accuracy.

        - acc (np.ndarray[K]): Accuracy of each keypoint.
        - avg_acc (float): Averaged accuracy across all keypoints.
        - cnt (int): Number of valid keypoints.
    """
    distances = _calc_distances(pred, gt, mask, normalize)

    acc = np.array([_distance_acc(d, thr) for d in distances])
    valid_acc = acc[acc >= 0]
    cnt = len(valid_acc)
    avg_acc = valid_acc.mean() if cnt > 0 else 0
    return acc, avg_acc, cnt

def _calc_distances(preds, targets, mask, normalize):
    """Calculate the normalized distances between preds and target.

    Note:
        - batch_size: N
        - num_keypoints: K
        - dimension of keypoints: D (normally, D=2 or D=3)

    Args:
        preds (np.ndarray[N, K, D]): Predicted keypoint location.
        targets (np.ndarray[N, K, D]): Groundtruth keypoint location.
        mask (np.ndarray[N, K]): Visibility of the target. False for invisible
            joints, and True for visible. Invisible joints will be ignored for
            accuracy calculation.
        normalize (np.ndarray[N, D]): Typical value is heatmap_size

    Returns:
        np.ndarray[K, N]: The normalized distances.
          If target keypoints are missing, the distance is -1.
    """
    N, K, _ = preds.shape
    distances = np.full((N, K), -1, dtype=np.float32)
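    # Handle invalid values: replace non-positive normalization factors with
    # a large number to avoid division by zero. Note that this modifies the
    # caller's `normalize` array in place.
    normalize[np.where(normalize <= 0)] = 1e6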
    distances[mask] = np.linalg.norm(
        ((preds - targets) / normalize[:, None, :])[mask], axis=-1)
    return distances.T

def _distance_acc(distances, thr=0.5):
    """Return the percentage below the distance threshold, while ignoring
    distances values with -1.

    Note:
        batch_size: N

    Args:
        distances (np.ndarray[N, ]): The normalized distances.
        thr (float): Threshold of the distances.

    Returns:
        float: Percentage of distances below the threshold.
          If all target keypoints are missing, return -1.
    """
    distance_valid = distances != -1
    num_distance_valid = distance_valid.sum()
    if num_distance_valid > 0:
        return (distances[distance_valid] < thr).sum() / num_distance_valid
    return -1
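
To make the shapes and the averaging concrete, here is a small self-contained sketch of PCKh@0.5 for two people with three keypoints each. It is not the mmpose implementation: `pck_per_keypoint`, the toy coordinates and the head boxes are all made up for illustration, and it assumes one scalar scale per person (the head-box diagonal) instead of mmpose's per-axis `normalize` array.

```python
import numpy as np


def pck_per_keypoint(pred, gt, mask, thr, scale):
    """Hypothetical helper (not part of mmpose): PCK of each keypoint.

    pred, gt: float arrays of shape [N, K, 2]
    mask:     bool array of shape [N, K], True for visible joints
    thr:      threshold on the normalized distance
    scale:    float array of shape [N], one scale factor per person
    Returns an array of shape [K]; NaN where a keypoint is never visible.
    """
    # One-to-one distance between each prediction and its own ground truth.
    dist = np.linalg.norm(pred - gt, axis=-1) / scale[:, None]  # [N, K]
    correct = (dist < thr) & mask
    visible = mask.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(visible > 0, correct.sum(axis=0) / visible, np.nan)


# Toy data: N=2 people, K=3 keypoints (all values are made up).
gt = np.array([[[0, 0], [10, 0], [0, 10]],
               [[20, 20], [30, 20], [20, 30]]], dtype=float)
pred = np.array([[[2, 0], [10, 3], [0, 10]],
                 [[20, 26], [34, 20], [50, 50]]], dtype=float)
mask = np.array([[True, True, True],
                 [True, True, False]])  # person 1, keypoint 2 is invisible

# Head boxes as [x1, y1, x2, y2]; the scale is the box diagonal.
head_boxes = np.array([[0, 0, 3, 4], [10, 10, 16, 18]], dtype=float)
head_size = np.linalg.norm(head_boxes[:, 2:] - head_boxes[:, :2], axis=1)

acc = pck_per_keypoint(pred, gt, mask, thr=0.5, scale=head_size)
avg_acc = np.nanmean(acc)
```

With these numbers the head sizes are 5 and 10, so the normalized distances are 0.4, 0.6, 0 for the first person and 0.6, 0.4 for the two visible joints of the second. Each of the first two keypoints is correct for one of the two people, and the third is correct for the only person in whom it is visible, giving per-keypoint accuracies of 0.5, 0.5 and 1.0 and a mean of 2/3. The badly predicted third joint of the second person does not affect the result because it is masked out.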

