Overview
PCK是mpii使用的人体关键点估计评价标准,在coco之前,PCK一直是比较主流的metric,包括deepfashion,fashionAI等,都是使用的此标准。
PCK
PCK(Percentage of Correct Keypoints)定义为正确估计出关键点的比例,计算检测的关键点与其对应的groundtruth间的归一化距离小于设定阈值的比例(the percentage of detections that fall within a normalized distance of the ground truth)。 于是就有了PCK@0.5,也就是设定的阈值是0.5。 归一化距离是关键点预测值与人工标注值的欧式距离,进行人体尺度因子的归一化,MPII数据集是以当前人的头部直径作为尺度因子,即头部矩形框的左上点与右下点的欧式距离,使用此尺度因子的姿态估计指标也称PCKh。 需要注意的是PCK是针对于一个人joints的predict和gt,也就是说不存在多么预测结果与gt之前对应的问题,或者说这个对应问题在PCK计算之前就应该解决了,而PCK解决多人姿态估计时使用的方式是在人的维度上进行平均。 从下面的代码也可以看出,距离的计算是一一对应的,而多人的PCK就是求平均值。
code
from mmpose
代码语言:javascript复制def keypoint_pck_accuracy(pred, gt, mask, thr, normalize):
"""Calculate the pose accuracy of PCK for each individual keypoint and the
averaged accuracy across all keypoints for coordinates.
Note:
PCK metric measures accuracy of the localization of the body joints.
The distances between predicted positions and the ground-truth ones
are typically normalized by the bounding box size.
The threshold (thr) of the normalized distance is commonly set
as 0.05, 0.1 or 0.2 etc.
batch_size: N
num_keypoints: K
Args:
pred (np.ndarray[N, K, 2]): Predicted keypoint location.
gt (np.ndarray[N, K, 2]): Groundtruth keypoint location.
mask (np.ndarray[N, K]): Visibility of the target. False for invisible
joints, and True for visible. Invisible joints will be ignored for
accuracy calculation.
thr (float): Threshold of PCK calculation.
normalize (np.ndarray[N, 2]): Normalization factor for H&W.
Returns:
tuple: A tuple containing keypoint accuracy.
- acc (np.ndarray[K]): Accuracy of each keypoint.
- avg_acc (float): Averaged accuracy across all keypoints.
- cnt (int): Number of valid keypoints.
"""
distances = _calc_distances(pred, gt, mask, normalize)
acc = np.array([_distance_acc(d, thr) for d in distances])
valid_acc = acc[acc >= 0]
cnt = len(valid_acc)
avg_acc = valid_acc.mean() if cnt > 0 else 0
return acc, avg_acc, cnt
def _calc_distances(preds, targets, mask, normalize):
"""Calculate the normalized distances between preds and target.
Note:
batch_size: N
num_keypoints: K
dimension of keypoints: D (normally, D=2 or D=3)
Args:
preds (np.ndarray[N, K, D]): Predicted keypoint location.
targets (np.ndarray[N, K, D]): Groundtruth keypoint location.
mask (np.ndarray[N, K]): Visibility of the target. False for invisible
joints, and True for visible. Invisible joints will be ignored for
accuracy calculation.
normalize (np.ndarray[N, D]): Typical value is heatmap_size
Returns:
np.ndarray[K, N]: The normalized distances.
If target keypoints are missing, the distance is -1.
"""
N, K, _ = preds.shape
distances = np.full((N, K), -1, dtype=np.float32)
# handle invalid values
normalize[np.where(normalize <= 0)] = 1e6
distances[mask] = np.linalg.norm(
((preds - targets) / normalize[:, None, :])[mask], axis=-1)
return distances.T
def _distance_acc(distances, thr=0.5):
"""Return the percentage below the distance threshold, while ignoring
distances values with -1.
Note:
batch_size: N
Args:
distances (np.ndarray[N, ]): The normalized distances.
thr (float): Threshold of the distances.
Returns:
float: Percentage of distances below the threshold.
If all target keypoints are missing, return -1.
"""
distance_valid = distances != -1
num_distance_valid = distance_valid.sum()
if num_distance_valid > 0:
return (distances[distance_valid] < thr).sum() / num_distance_valid
return -1