References
- coco eval 解析
- COCO目标检测比赛中的模型评价指标介绍!
COCO's evaluation code corresponds to the cocoeval.py file in pycocotools. Taken as a whole, the COCOeval class pipeline is: set params → evaluate() (which runs _prepare once, then computeIoU/computeOks and evaluateImg per image and category) → accumulate() → summarize().
Basic usage:
# The usage for CocoEval is as follows:
cocoGt=..., cocoDt=... # load dataset and results
E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
E.params.recThrs = ...; # set parameters as desired
E.evaluate(); # run per image evaluation
E.accumulate(); # accumulate per image results
E.summarize(); # display summary metrics of results
What format should cocoGt and cocoDt be? Both are COCO API objects; if you start from COCO-format JSON, note that each detection entry must additionally carry a score value.
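A minimal sketch of loading both (hypothetical file names): cocoGt comes from the annotation JSON, and cocoDt is built from a COCO-results JSON via loadRes, where every detection entry carries image_id, category_id, bbox and score.
from pycocotools.coco import COCO

cocoGt = COCO('instances_val.json')         # ground-truth annotations
cocoDt = cocoGt.loadRes('detections.json')  # detection results with scores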
__init__
The parameters are interpreted as follows; note what these letters stand for (the defaults behind these sizes are sketched after the list):
- N: number of img_ids used for evaluation
- K: number of cat_ids used for evaluation
- T: number of iouThrs
- R: number of recThrs
- A: number of object-area ranges
- M: number of maxDets values (caps on detections per image)
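For reference, a sketch of the default detection Params in pycocotools (paraphrasing Params.setDetParams), which is where the sizes above come from — T=10, R=101, A=4, M=3:
import numpy as np

iouThrs = np.linspace(0.5, 0.95, 10)    # T = 10 IoU thresholds, step 0.05
recThrs = np.linspace(0.0, 1.00, 101)   # R = 101 recall thresholds, step 0.01
areaRng = [[0, 1e5**2], [0, 32**2], [32**2, 96**2], [96**2, 1e5**2]]  # A = 4: all/small/medium/large
maxDets = [1, 10, 100]                  # M = 3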
_prepare
Does some preprocessing based on the parameters passed at initialization:
def _prepare(self):
'''
Prepare ._gts and ._dts for evaluation based on params
:return: None
'''
def _toMask(anns, coco):
# modify ann['segmentation'] by reference
for ann in anns:
rle = coco.annToRLE(ann)
ann['segmentation'] = rle
p = self.params
if p.useCats:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
else:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
# convert ground truth to mask if iouType == 'segm'
if p.iouType == 'segm':
_toMask(gts, self.cocoGt)
_toMask(dts, self.cocoDt)
# set ignore flag
for gt in gts:
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
if p.iouType == 'keypoints':
gt['ignore'] = (gt['num_keypoints'] == 0) or gt['ignore']
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.evalImgs = defaultdict(list) # per-image per-category evaluation results
self.eval = {} # accumulated evaluation results
computeIoU(self, imgId, catId):
Given an image_id and cat_id, computes the IoU matrix between all GTs and DTs of that category in that image; used for the 'bbox' and 'segm' iouTypes.
Everything here is per single image and single category.
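A minimal sketch of its core for iouType == 'bbox', with hypothetical inputs — internally computeIoU sorts the dts by score, truncates to maxDets[-1], and calls pycocotools' maskUtils.iou:
import numpy as np
from pycocotools import mask as maskUtils

dt_boxes = np.array([[10, 10, 50, 50], [12, 12, 48, 48]], dtype=np.float64)  # [x, y, w, h], descending score
gt_boxes = np.array([[11, 11, 49, 49]], dtype=np.float64)
iscrowd = [0]                                      # one flag per GT
ious = maskUtils.iou(dt_boxes, gt_boxes, iscrowd)  # ndarray of shape [D, G]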
computeOks(self, imgId, catId):
Given an image_id and cat_id, computes the OKS (object keypoint similarity) matrix between all GTs and DTs in the image — this is where the OKS described in the referenced write-up (its Sec 1.2) is actually computed. Like the IoU matrix, the OKS matrix has shape [D, G] (detections × ground truths).
So this function is only used for keypoint detection? Yes — evaluate() dispatches on iouType (the OKS definition follows the snippet):
if p.iouType == 'segm' or p.iouType == 'bbox':
computeIoU = self.computeIoU
elif p.iouType == 'keypoints':
computeIoU = self.computeOks
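For reference, OKS between a detection and a GT instance is defined (per the COCO keypoint evaluation) as:

$$\mathrm{OKS} = \frac{\sum_i \exp\!\left(-\frac{d_i^2}{2 s^2 k_i^2}\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

where $d_i$ is the Euclidean distance between predicted and GT keypoint $i$, $s$ is the object scale ($\sqrt{\text{area}}$ of the GT), $k_i$ is a per-keypoint constant, and $v_i$ is the GT visibility flag.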
evaluateImg
Performs evaluation for a single category and image — per-image, per-category statistics.
- maxDets: maximum number of detections kept per image
- useCats: whether to evaluate per category
- cocoGt, cocoDt: both are COCO API objects
Does the process compute per-image results? Yes — each (image, category) pair is evaluated separately under every area range and maxDets setting, and the results are aggregated afterwards.
evaluateImg computes, for every image and category, the match results under the different conditions; accumulate then fills precision[T, R, K, A, M] and recall[T, K, A, M].
What do T, K, A, M stand for? The letters defined above: T IoU thresholds, R recall thresholds, K categories, A area ranges, M maxDets values.
cocoEval.evaluate() only matches each image's detections against its GTs and stores the results in self.evalImgs; computing TP-based metrics requires cocoEval.accumulate().
summarize() then reduces the precision/recall arrays obtained from accumulate along different dimensions and reports the results. Given a concrete IoU threshold, area range and maxDets value, it returns the corresponding slice of precision/recall — so we can also extract AP and AR under arbitrary custom settings ourselves.
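For example, a minimal sketch of pulling AP@IoU=0.5 out of the accumulated array — assuming coco_eval is a COCOeval instance on which accumulate() has run; with default params, area index 0 is 'all' and maxDets index 2 is 100:
import numpy as np

t = np.where(np.isclose(coco_eval.params.iouThrs, 0.5))[0]
p = coco_eval.eval['precision'][t, :, :, 0, 2]         # [1, R, K] slice
ap50 = np.mean(p[p > -1]) if (p > -1).any() else -1.0  # -1 marks empty cells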
How should the COCO API's loadRes be understood? (Reference: COCO API-COCO模块在det中的应用.)
def evaluateImg(self, imgId, catId, aRng, maxDet):
'''
perform evaluation for single category and image
:return: dict (single image results)
'''
p = self.params
if p.useCats:
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return None
for g in gt:
if g['ignore'] or (g['area']<aRng[0] or g['area']>aRng[1]):
g['_ignore'] = 1
else:
g['_ignore'] = 0
# sort dt highest score first, sort gt ignore last
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind[0:maxDet]]
iscrowd = [int(o['iscrowd']) for o in gt]
# load computed ious
ious = self.ious[imgId, catId][:, gtind] if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId]
T = len(p.iouThrs)
G = len(gt)
D = len(dt)
gtm = np.zeros((T,G))
dtm = np.zeros((T,D))
gtIg = np.array([g['_ignore'] for g in gt])
dtIg = np.zeros((T,D))
if not len(ious)==0:
for tind, t in enumerate(p.iouThrs):
for dind, d in enumerate(dt):
# information about best match so far (m=-1 -> unmatched)
iou = min([t,1-1e-10])
m = -1
for gind, g in enumerate(gt):
# if this gt already matched, and not a crowd, continue
if gtm[tind,gind]>0 and not iscrowd[gind]:
continue
# if dt matched to reg gt, and on ignore gt, stop
if m>-1 and gtIg[m]==0 and gtIg[gind]==1:
break
# continue to next gt unless better match made
if ious[dind,gind] < iou:
continue
# if match successful and best so far, store appropriately
iou=ious[dind,gind]
m=gind
# if match made store id of match for both dt and gt
if m ==-1:
continue
dtIg[tind,dind] = gtIg[m]
dtm[tind,dind] = gt[m]['id']
gtm[tind,m] = d['id']
# set unmatched detections outside of area range to ignore
a = np.array([d['area']<aRng[0] or d['area']>aRng[1] for d in dt]).reshape((1, len(dt)))
dtIg = np.logical_or(dtIg, np.logical_and(dtm==0, np.repeat(a,T,0)))
# store results for given image and category
return {
'image_id': imgId,
'category_id': catId,
'aRng': aRng,
'maxDet': maxDet,
'dtIds': [d['id'] for d in dt],
'gtIds': [g['id'] for g in gt],
'dtMatches': dtm,
'gtMatches': gtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg,
}
Accumulate
Accumulates the per-image evaluation results and stores them in self.eval.
Can the accumulate step distinguish individual images? No — it aggregates over all evaluated images; for per-image numbers, restrict params.imgIds and re-run (see the per-image mAP example further below).
The accumulated result is stored in self.eval:
self.eval = {
'params': p,
'counts': [T, R, K, A, M],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'precision': precision,
'recall': recall,
'scores': scores,
}
CocoMetric in mmdet
mmdet/evaluation/metrics/coco_metric.py
results2json formats the predictions into COCO result format and dumps them to JSON:
# convert predictions to coco format and dump to json file
result_files = self.results2json(preds, outfile_prefix)
/home/my_mmdet/demo/inference_demo.ipynb already demonstrates inference in different scenarios:
- a single image
- a folder of images
Check whether both cases go through the full preprocessing pipeline, and confirm the format of mmdet's predictions; then keep one result JSON as an example for the cocoeval experiments.
CocoMetric in mmdet is more of a streaming evaluator: gt and pred are fed in batch by batch via process(), and compute_metrics() runs once all batches have been processed — so: process first, then compute_metrics.
During training/validation the model produces predictions carrying metainfo and img_id, but they are missing when calling DetInferencer directly — why?
- How to work around this, so the results are easier to consume?
- How is dumping results to a pkl implemented? (See the sketch after this list.)
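A minimal sketch of the pkl round trip, assuming results is the list of per-image dicts that process() builds; mmdet's test-time dump goes through mmengine's fileio in the same spirit:
import mmengine

mmengine.dump(results, 'results.pkl')   # format inferred from the .pkl suffix
results = mmengine.load('results.pkl')  # reload for offline metric computation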
The GT is likewise collected inside process — not from data_batch, but from the returned data_samples:
def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None:
"""Process one batch of data samples and predictions. The processed
results should be stored in ``self.results``, which will be used to
compute the metrics when all batches have been processed.
Args:
data_batch (dict): A batch of data from the dataloader.
data_samples (Sequence[dict]): A batch of data samples that
contain annotations and predictions.
"""
for data_sample in data_samples:
result = dict()
pred = data_sample['pred_instances']
result['img_id'] = data_sample['img_id']
result['bboxes'] = pred['bboxes'].cpu().numpy()
result['scores'] = pred['scores'].cpu().numpy()
result['labels'] = pred['labels'].cpu().numpy()
# encode mask to RLE
if 'masks' in pred:
result['masks'] = encode_mask_results(
pred['masks'].detach().cpu().numpy()) if isinstance(
pred['masks'], torch.Tensor) else pred['masks']
# some detectors use different scores for bbox and mask
if 'mask_scores' in pred:
result['mask_scores'] = pred['mask_scores'].cpu().numpy()
# parse gt
gt = dict()
gt['width'] = data_sample['ori_shape'][1]
gt['height'] = data_sample['ori_shape'][0]
gt['img_id'] = data_sample['img_id']
if self._coco_api is None:
# TODO: Need to refactor to support LoadAnnotations
assert 'instances' in data_sample, \
'ground truth is required for evaluation when ' \
'`ann_file` is not provided'
gt['anns'] = data_sample['instances']
# add converted result to the results list
self.results.append((gt, result))
COCO interfaces
Can these two interfaces help construct a COCO object without going through a JSON file? Yes for the detection side — loadRes builds the results object in memory; the GT side still needs an annotation dict or file.
loadRes
loadRes accepts a result-file path, an in-memory list of result dicts, or an Nx7 numpy array (which it converts via loadNumpyAnnotations). Per the COCO results format, each detection dict must include the following keys (extra keys are allowed); and yes, score is required — see the sketch after this list:
- image_id
- category_id
- bbox (or segmentation, for segm results)
- score
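A minimal sketch (hypothetical values) of building cocoDt from an in-memory list instead of a results JSON:
results = [{
    'image_id': 42,
    'category_id': 1,
    'bbox': [10.0, 20.0, 30.0, 40.0],  # [x, y, w, h]
    'score': 0.9,
}]
cocoDt = cocoGt.loadRes(results)       # cocoGt loaded from the annotation file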
loadNumpyAnnotations
def loadNumpyAnnotations(self, data):
"""
Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
:param data (numpy.ndarray)
:return: annotations (python nested list)
"""
print('Converting ndarray to lists...')
assert(type(data) == np.ndarray)
print(data.shape)
assert(data.shape[1] == 7)
N = data.shape[0]
ann = []
for i in range(N):
if i % 1000000 == 0:
print('{}/{}'.format(i,N))
ann += [{
'image_id' : int(data[i, 0]),
'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ],
'score' : data[i, 5],
'category_id': int(data[i, 6]),
}]
return ann
self.datasets
What is datasets? Presumably pycocotools' COCO.dataset — the raw annotation dict parsed from the JSON file.
Masks are a spot with subtle details: the masks mmdet returns are not in the format we fed in — input segmentations can be polygons, while predicted masks are RLE-encoded (via encode_mask_results).
import pycocotools._mask as _mask
CocoMetric already contains an example of parsing these masks; it simply builds an annotation out of them.
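A minimal sketch of the polygon/RLE conversions via the public pycocotools.mask wrapper (hypothetical image size and polygon):
from pycocotools import mask as maskUtils

h, w = 480, 640                              # hypothetical image size
polygon = [[100.0, 100.0, 200.0, 100.0, 200.0, 200.0, 100.0, 200.0]]
rles = maskUtils.frPyObjects(polygon, h, w)  # polygon(s) -> RLE(s)
rle = maskUtils.merge(rles)                  # single RLE for the instance
binary = maskUtils.decode(rle)               # (h, w) uint8 binary mask
area = maskUtils.area(rle)                   # pixel area, as COCO stores it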
Next step: write this up, then continue through to the end and trace the complete detection logic (3 hours?).
In the mAP computation there is a somewhat tricky question: how are edge cases handled — e.g., zero GTs or zero DTs? In cocoeval, evaluateImg returns None when an image has neither; and in accumulate, any cell with no non-ignored GTs keeps its initial precision of -1, which summarize excludes from the averages (it only averages entries > -1).
Per-image mAP
# compute mAP for each image separately (assumes coco_eval = COCOeval(cocoGt, cocoDt, iouType) already exists)
per_image_mAPs = []
for img_id in coco_api.getImgIds():
    coco_eval.params.imgIds = [img_id]
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    # stats[1] is AP@IoU=0.5; use stats[0] for AP@[.5:.95]
    per_image_mAPs.append(coco_eval.stats[1])
# print the per-image values
for i, mAP in enumerate(per_image_mAPs):
    print(f"mAP for image {i + 1}: {mAP}")
Key logic walkthrough
IoU computation
Simply intersection area over union area. (Note that pycocotools' maskUtils.iou additionally special-cases iscrowd GTs: for those, the denominator is the detection's own area rather than the union.)
import numpy as np
import torch
def bbox_iou(box1, box2, x1y1x2y2=True):
"""
Returns the IoU of two bounding boxes
x1y1x2y2: x1y1x2y2 (True) or xywh (False)
"""
if not x1y1x2y2:
# Transform from xywh to x1y1x2y2
b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
else:
# Get the coordinates of bounding boxes
b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
    # get the coordinates of the intersection rectangle
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
# Intersection area
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
inter_rect_y2 - inter_rect_y1 + 1, min=0
)
# Union Area
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
union_area = b1_area + b2_area - inter_area
iou = inter_area / union_area
return iou
def mask_iou(mask1, mask2):
"""
Returns the IoU of two masks
"""
intersection = np.logical_and(mask1, mask2)
union = np.logical_or(mask1, mask2)
iou = np.sum(intersection) / np.sum(union)
return iou
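Note: bbox_iou above follows the legacy +1 pixel convention when computing widths and heights; pycocotools computes box area simply as w*h from the [x, y, w, h] annotation, so the two can differ slightly for small boxes.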
match
Is it: sort by conf and take the first box that satisfies the IoU threshold? In cocoeval it is greedy best-IoU matching in descending score order (see evaluateImg above). YOLO's (ultralytics) match logic seems to be: zero out the IoUs of wrong-class pairs, then keep only the highest-IoU assignment per GT and per detection:
def match_predictions(self, pred_classes, true_classes, iou, use_scipy=False):
"""
Matches predictions to ground truth objects (pred_classes, true_classes) using IoU.
Args:
pred_classes (torch.Tensor): Predicted class indices of shape(N,).
true_classes (torch.Tensor): Target class indices of shape(M,).
iou (torch.Tensor): An NxM tensor containing the pairwise IoU values for predictions and ground of truth
use_scipy (bool): Whether to use scipy for matching (more precise).
Returns:
(torch.Tensor): Correct tensor of shape(N,10) for 10 IoU thresholds.
"""
# Dx10 matrix, where D - detections, 10 - IoU thresholds
correct = np.zeros((pred_classes.shape[0], self.iouv.shape[0])).astype(bool)
# LxD matrix where L - labels (rows), D - detections (columns)
correct_class = true_classes[:, None] == pred_classes
iou = iou * correct_class # zero out the wrong classes
iou = iou.cpu().numpy()
for i, threshold in enumerate(self.iouv.cpu().tolist()):
if use_scipy:
# WARNING: known issue that reduces mAP in https://github.com/ultralytics/ultralytics/pull/4708
import scipy # scope import to avoid importing for all commands
cost_matrix = iou * (iou >= threshold)
if cost_matrix.any():
labels_idx, detections_idx = scipy.optimize.linear_sum_assignment(cost_matrix, maximize=True)
valid = cost_matrix[labels_idx, detections_idx] > 0
if valid.any():
correct[detections_idx[valid], i] = True
else:
matches = np.nonzero(iou >= threshold) # IoU > threshold and classes match
matches = np.array(matches).T
if matches.shape[0]:
if matches.shape[0] > 1:
matches = matches[iou[matches[:, 0], matches[:, 1]].argsort()[::-1]]
matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
# matches = matches[matches[:, 2].argsort()[::-1]]
matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
correct[matches[:, 1].astype(int), i] = True
return torch.tensor(correct, dtype=torch.bool, device=pred_classes.device)
Computing mAP
COCO AP is the interpolated precision sampled at the 101 recall thresholds (recThrs = 0:0.01:1), averaged over recall points and then over IoU thresholds and categories.
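A minimal sketch of the per-IoU-threshold AP computation, mirroring what accumulate() does — assumed inputs: tps, a boolean array over detections sorted by descending score, and npig, the number of non-ignored GTs:
import numpy as np

def coco_ap(tps, npig, rec_thrs=np.linspace(0.0, 1.0, 101)):
    tp = np.cumsum(tps)
    fp = np.cumsum(~tps)
    rc = tp / npig                       # recall
    pr = tp / (tp + fp + np.spacing(1))  # precision
    # make precision monotonically non-increasing, as accumulate() does
    for i in range(len(pr) - 1, 0, -1):
        pr[i - 1] = max(pr[i - 1], pr[i])
    # sample precision at the 101 recall thresholds
    inds = np.searchsorted(rc, rec_thrs, side='left')
    q = np.zeros_like(rec_thrs)
    for ri, pi in enumerate(inds):
        if pi < len(pr):
            q[ri] = pr[pi]
    return q.mean()

# e.g., 3 TPs and 1 FP among 4 detections, 5 GTs in total:
print(coco_ap(np.array([True, True, False, True]), npig=5))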
Questions
- What is the difference between iouType 'segm', 'bbox' and 'keypoints'? It selects the similarity measure: box IoU, mask IoU on RLEs, or OKS (see the computeIoU/computeOks dispatch above).
- maxDets = [1, 10, 100] (M=3 thresholds on max detections per image): does this need adjusting in practice? The defaults match the COCO benchmark; for dense scenes with more than 100 objects per image, the last value likely needs raising.