References
- coco eval 解析
- COCO目标检测比赛中的模型评价指标介绍!
COCO's evaluation code corresponds to the cocoeval.py file in pycocotools. Taken as a whole, the COCOeval class pipeline is: set params → evaluate() (which runs _prepare once, then computeIoU/computeOks and evaluateImg per image and category) → accumulate() → summarize().
Basic usage:
# The usage for CocoEval is as follows:
cocoGt=..., cocoDt=... # load dataset and results
E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
E.params.recThrs = ...; # set parameters as desired
E.evaluate(); # run per image evaluation
E.accumulate(); # accumulate per image results
E.summarize(); # display summary metrics of results
What format should cocoGt and cocoDt be? Both are COCO API objects; if you start from COCO-format JSON, note that each detection entry must additionally carry a score value.
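A minimal sketch of loading both (hypothetical file names): cocoGt comes from the annotation JSON, and cocoDt is built from a COCO-results JSON via loadRes, where every detection entry carries image_id, category_id, bbox and score.
from pycocotools.coco import COCO

cocoGt = COCO('instances_val.json')         # ground-truth annotations
cocoDt = cocoGt.loadRes('detections.json')  # detection results with scores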
__init__
The parameters are interpreted as follows; note what these letters stand for (the defaults behind these sizes are sketched after the list):
- N: number of img_ids used for evaluation
- K: number of cat_ids used for evaluation
- T: number of iouThrs
- R: number of recThrs
- A: number of object-area ranges
- M: number of maxDets values (caps on detections per image)
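For reference, a sketch of the default detection Params in pycocotools (paraphrasing Params.setDetParams), which is where the sizes above come from — T=10, R=101, A=4, M=3:
import numpy as np

iouThrs = np.linspace(0.5, 0.95, 10)    # T = 10 IoU thresholds, step 0.05
recThrs = np.linspace(0.0, 1.00, 101)   # R = 101 recall thresholds, step 0.01
areaRng = [[0, 1e5**2], [0, 32**2], [32**2, 96**2], [96**2, 1e5**2]]  # A = 4: all/small/medium/large
maxDets = [1, 10, 100]                  # M = 3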
_prepare
Does some preprocessing based on the parameters passed at initialization:
def _prepare(self):
'''
Prepare ._gts and ._dts for evaluation based on params
:return: None
'''
def _toMask(anns, coco):
# modify ann['segmentation'] by reference
for ann in anns:
rle = coco.annToRLE(ann)
ann['segmentation'] = rle
p = self.params
if p.useCats:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
else:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
# convert ground truth to mask if iouType == 'segm'
if p.iouType == 'segm':
_toMask(gts, self.cocoGt)
_toMask(dts, self.cocoDt)
# set ignore flag
for gt in gts:
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
if p.iouType == 'keypoints':
gt['ignore'] = (gt['num_keypoints'] == 0) or gt['ignore']
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.evalImgs = defaultdict(list) # per-image per-category evaluation results
self.eval = {} # accumulated evaluation results
computeIoU(self, imgId, catId):
Given an image_id and cat_id, computes the IoU matrix between all GTs and DTs of that category in that image; used for the 'bbox' and 'segm' iouTypes.
Everything here is per single image and single category.
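A minimal sketch of its core for iouType == 'bbox', with hypothetical inputs — internally computeIoU sorts the dts by score, truncates to maxDets[-1], and calls pycocotools' maskUtils.iou:
import numpy as np
from pycocotools import mask as maskUtils

dt_boxes = np.array([[10, 10, 50, 50], [12, 12, 48, 48]], dtype=np.float64)  # [x, y, w, h], descending score
gt_boxes = np.array([[11, 11, 49, 49]], dtype=np.float64)
iscrowd = [0]                                      # one flag per GT
ious = maskUtils.iou(dt_boxes, gt_boxes, iscrowd)  # ndarray of shape [D, G]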
computeOks(self, imgId, catId):
Given an image_id and cat_id, computes the OKS (object keypoint similarity) matrix between all GTs and DTs in the image — this is where the OKS described in the referenced write-up (its Sec 1.2) is actually computed. Like the IoU matrix, the OKS matrix has shape [D, G] (detections × ground truths).
So this function is only used for keypoint detection? Yes — evaluate() dispatches on iouType (the OKS definition follows the snippet):
if p.iouType == 'segm' or p.iouType == 'bbox':
computeIoU = self.computeIoU
elif p.iouType == 'keypoints':
computeIoU = self.computeOks
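For reference, OKS between a detection and a GT instance is defined (per the COCO keypoint evaluation) as:

$$\mathrm{OKS} = \frac{\sum_i \exp\!\left(-\frac{d_i^2}{2 s^2 k_i^2}\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

where $d_i$ is the Euclidean distance between predicted and GT keypoint $i$, $s$ is the object scale ($\sqrt{\text{area}}$ of the GT), $k_i$ is a per-keypoint constant, and $v_i$ is the GT visibility flag.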
evaluateImg
Performs evaluation for a single category and image — per-image, per-category statistics.
- maxDets: maximum number of detections kept per image
- useCats: whether to evaluate per category
- cocoGt, cocoDt: both are COCO API objects
Does the process compute per-image results? Yes — each (image, category) pair is evaluated separately under every area range and maxDets setting, and the results are aggregated afterwards.
evaluateImg computes, for every image and category, the match results under the different conditions; accumulate then fills precision[T, R, K, A, M] and recall[T, K, A, M].
What do T, K, A, M stand for? The letters defined above: T IoU thresholds, R recall thresholds, K categories, A area ranges, M maxDets values.
cocoEval.evaluate() only matches each image's detections against its GTs and stores the results in self.evalImgs; computing TP-based metrics requires cocoEval.accumulate().
summarize() then reduces the precision/recall arrays obtained from accumulate along different dimensions and reports the results. Given a concrete IoU threshold, area range and maxDets value, it returns the corresponding slice of precision/recall — so we can also extract AP and AR under arbitrary custom settings ourselves.
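For example, a minimal sketch of pulling AP@IoU=0.5 out of the accumulated array — assuming coco_eval is a COCOeval instance on which accumulate() has run; with default params, area index 0 is 'all' and maxDets index 2 is 100:
import numpy as np

t = np.where(np.isclose(coco_eval.params.iouThrs, 0.5))[0]
p = coco_eval.eval['precision'][t, :, :, 0, 2]         # [1, R, K] slice
ap50 = np.mean(p[p > -1]) if (p > -1).any() else -1.0  # -1 marks empty cells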
How should the COCO API's loadRes be understood? (Reference: COCO API-COCO模块在det中的应用.)
def evaluateImg(self, imgId, catId, aRng, maxDet):
'''
perform evaluation for single category and image
:return: dict (single image results)
'''
p = self.params
if p.useCats:
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return None
for g in gt:
if g['ignore'] or (g['area']<aRng[0] or g['area']>aRng[1]):
g['_ignore'] = 1
else:
g['_ignore'] = 0
# sort dt highest score first, sort gt ignore last
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind[0:maxDet]]
iscrowd = [int(o['iscrowd']) for o in gt]
# load computed ious
ious = self.ious[imgId, catId][:, gtind] if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId]
T = len(p.iouThrs)
G = len(gt)
D = len(dt)
gtm = np.zeros((T,G))
dtm = np.zeros((T,D))
gtIg = np.array([g['_ignore'] for g in gt])
dtIg = np.zeros((T,D))
if not len(ious)==0:
for tind, t in enumerate(p.iouThrs):
for dind, d in enumerate(dt):
# information about best match so far (m=-1 -> unmatched)
iou = min([t,1-1e-10])
m = -1
for gind, g in enumerate(gt):
# if this gt already matched, and not a crowd, continue
if gtm[tind,gind]>0 and not iscrowd[gind]:
continue
# if dt matched to reg gt, and on ignore gt, stop
if m>-1 and gtIg[m]==0 and gtIg[gind]==1:
break
# continue to next gt unless better match made
if ious[dind,gind] < iou:
continue
# if match successful and best so far, store appropriately
iou=ious[dind,gind]
m=gind
# if match made store id of match for both dt and gt
if m ==-1:
continue
dtIg[tind,dind] = gtIg[m]
dtm[tind,dind] = gt[m]['id']
gtm[tind,m] = d['id']
# set unmatched detections outside of area range to ignore
a = np.array([d['area']<aRng[0] or d['area']>aRng[1] for d in dt]).reshape((1, len(dt)))
dtIg = np.logical_or(dtIg, np.logical_and(dtm==0, np.repeat(a,T,0)))
# store results for given image and category
return {
'image_id': imgId,
'category_id': catId,
'aRng': aRng,
'maxDet': maxDet,
'dtIds': [d['id'] for d in dt],
'gtIds': [g['id'] for g in gt],
'dtMatches': dtm,
'gtMatches': gtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg,
}
Accumulate
Accumulates the per-image evaluation results and stores them in self.eval.
Can the accumulate step distinguish individual images? No — it aggregates over all evaluated images; for per-image numbers, restrict params.imgIds and re-run (see the per-image mAP example further below).
The accumulated result is stored in self.eval:
self.eval = {
'params': p,
'counts': [T, R, K, A, M],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'precision': precision,
'recall': recall,
'scores': scores,
}
CocoMetric in mmdet
mmdet/evaluation/metrics/coco_metric.py
results2json formats the predictions into COCO result format and dumps them to JSON:
# convert predictions to coco format and dump to json file
result_files = self.results2json(preds, outfile_prefix)
/home/my_mmdet/demo/inference_demo.ipynb already demonstrates inference in different scenarios:
- a single image
- a folder of images
Check whether both cases go through the full preprocessing pipeline, and confirm the format of mmdet's predictions; then keep one result JSON as an example for the cocoeval experiments.
CocoMetric in mmdet is more of a streaming evaluator: gt and pred are fed in batch by batch via process(), and compute_metrics() runs once all batches have been processed — so: process first, then compute_metrics.
During training/validation the model produces predictions carrying metainfo and img_id, but they are missing when calling DetInferencer directly — why?
- How to work around this, so the results are easier to consume?
- How is dumping results to a pkl implemented? (See the sketch after this list.)
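A minimal sketch of the pkl round trip, assuming results is the list of per-image dicts that process() builds; mmdet's test-time dump goes through mmengine's fileio in the same spirit:
import mmengine

mmengine.dump(results, 'results.pkl')   # format inferred from the .pkl suffix
results = mmengine.load('results.pkl')  # reload for offline metric computation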
The GT is likewise collected inside process — not from data_batch, but from the returned data_samples:
def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None:
"""Process one batch of data samples and predictions. The processed
results should be stored in ``self.results``, which will be used to
compute the metrics when all batches have been processed.
Args:
data_batch (dict): A batch of data from the dataloader.
data_samples (Sequence[dict]): A batch of data samples that
contain annotations and predictions.
"""
for data_sample in data_samples:
result = dict()
pred = data_sample['pred_instances']
result['img_id'] = data_sample['img_id']
result['bboxes'] = pred['bboxes'].cpu().numpy()
result['scores'] = pred['scores'].cpu().numpy()
result['labels'] = pred['labels'].cpu().numpy()
# encode mask to RLE
if 'masks' in pred:
result['masks'] = encode_mask_results(
pred['masks'].detach().cpu().numpy()) if isinstance(
pred['masks'], torch.Tensor) else pred['masks']
# some detectors use different scores for bbox and mask
if 'mask_scores' in pred:
result['mask_scores'] = pred['mask_scores'].cpu().numpy()
# parse gt
gt = dict()
gt['width'] = data_sample['ori_shape'][1]
gt['height'] = data_sample['ori_shape'][0]
gt['img_id'] = data_sample['img_id']
if self._coco_api is None:
# TODO: Need to refactor to support LoadAnnotations
assert 'instances' in data_sample, \
'ground truth is required for evaluation when ' \
'`ann_file` is not provided'
gt['anns'] = data_sample['instances']
# add converted result to the results list
self.results.append((gt, result))
COCO interfaces
Can these two interfaces help construct a COCO object without going through a JSON file? Yes for the detection side — loadRes builds the results object in memory; the GT side still needs an annotation dict or file.
loadRes
loadRes accepts a result-file path, an in-memory list of result dicts, or an Nx7 numpy array (which it converts via loadNumpyAnnotations). Per the COCO results format, each detection dict must include the following keys (extra keys are allowed); and yes, score is required — see the sketch after this list:
- image_id
- category_id
- bbox (or segmentation, for segm results)
- score
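A minimal sketch (hypothetical values) of building cocoDt from an in-memory list instead of a results JSON:
results = [{
    'image_id': 42,
    'category_id': 1,
    'bbox': [10.0, 20.0, 30.0, 40.0],  # [x, y, w, h]
    'score': 0.9,
}]
cocoDt = cocoGt.loadRes(results)       # cocoGt loaded from the annotation file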
loadNumpyAnnotations
def loadNumpyAnnotations(self, data):
"""
Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
:param data (numpy.ndarray)
:return: annotations (python nested list)
"""
print('Converting ndarray to lists...')
assert(type(data) == np.ndarray)
print(data.shape)
assert(data.shape[1] == 7)
N = data.shape[0]
ann = []
for i in range(N):
if i % 1000000 == 0:
print('{}/{}'.format(i,N))
ann += [{
'image_id' : int(data[i, 0]),
'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ],
'score' : data[i, 5],
'category_id': int(data[i, 6]),
}]
return ann
self.datasets
What is datasets? Presumably pycocotools' COCO.dataset — the raw annotation dict parsed from the JSON file.
Masks are a spot with subtle details: the masks mmdet returns are not in the format we fed in — input segmentations can be polygons, while predicted masks are RLE-encoded (via encode_mask_results).
import pycocotools._mask as _mask
CocoMetric already contains an example of parsing these masks; it simply builds an annotation out of them.
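A minimal sketch of the polygon/RLE conversions via the public pycocotools.mask wrapper (hypothetical image size and polygon):
from pycocotools import mask as maskUtils

h, w = 480, 640                              # hypothetical image size
polygon = [[100.0, 100.0, 200.0, 100.0, 200.0, 200.0, 100.0, 200.0]]
rles = maskUtils.frPyObjects(polygon, h, w)  # polygon(s) -> RLE(s)
rle = maskUtils.merge(rles)                  # single RLE for the instance
binary = maskUtils.decode(rle)               # (h, w) uint8 binary mask
area = maskUtils.area(rle)                   # pixel area, as COCO stores it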
Next step: write this up, then continue through to the end and trace the complete detection logic (3 hours?).
In the mAP computation there is a somewhat tricky question: how are edge cases handled — e.g., zero GTs or zero DTs? In cocoeval, evaluateImg returns None when an image has neither; and in accumulate, any cell with no non-ignored GTs keeps its initial precision of -1, which summarize excludes from the averages (it only averages entries > -1).
Per-image mAP
# compute mAP for each image separately (assumes coco_eval = COCOeval(cocoGt, cocoDt, iouType) already exists)
per_image_mAPs = []
for img_id in coco_api.getImgIds():
    coco_eval.params.imgIds = [img_id]
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    # stats[1] is AP@IoU=0.5; use stats[0] for AP@[.5:.95]
    per_image_mAPs.append(coco_eval.stats[1])
# print the per-image values
for i, mAP in enumerate(per_image_mAPs):
    print(f"mAP for image {i + 1}: {mAP}")
Key logic walkthrough
IoU computation
Simply intersection area over union area. (Note that pycocotools' maskUtils.iou additionally special-cases iscrowd GTs: for those, the denominator is the detection's own area rather than the union.)
import numpy as np
import torch
def bbox_iou(box1, box2, x1y1x2y2=True):
"""
Returns the IoU of two bounding boxes
x1y1x2y2: x1y1x2y2 (True) or xywh (False)
"""
if not x1y1x2y2:
# Transform from xywh to x1y1x2y2
b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
else:
# Get the coordinates of bounding boxes
b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
    # get the coordinates of the intersection rectangle
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
# Intersection area
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
inter_rect_y2 - inter_rect_y1 + 1, min=0
)
# Union Area
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
union_area = b1_area + b2_area - inter_area
iou = inter_area / union_area
return iou
def mask_iou(mask1, mask2):
"""
Returns the IoU of two masks
"""
intersection = np.logical_and(mask1, mask2)
union = np.logical_or(mask1, mask2)
iou = np.sum(intersection) / np.sum(union)
return iou
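Note: bbox_iou above follows the legacy +1 pixel convention when computing widths and heights; pycocotools computes box area simply as w*h from the [x, y, w, h] annotation, so the two can differ slightly for small boxes.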
match
Is it: sort by conf and take the first box that satisfies the IoU threshold? In cocoeval it is greedy best-IoU matching in descending score order (see evaluateImg above). YOLO's (ultralytics) match logic seems to be: zero out the IoUs of wrong-class pairs, then keep only the highest-IoU assignment per GT and per detection:
def match_predictions(self, pred_classes, true_classes, iou, use_scipy=False):
"""
Matches predictions to ground truth objects (pred_classes, true_classes) using IoU.
Args:
pred_classes (torch.Tensor): Predicted class indices of shape(N,).
true_classes (torch.Tensor): Target class indices of shape(M,).
iou (torch.Tensor): An NxM tensor containing the pairwise IoU values for predictions and ground of truth
use_scipy (bool): Whether to use scipy for matching (more precise).
Returns:
(torch.Tensor): Correct tensor of shape(N,10) for 10 IoU thresholds.
"""
# Dx10 matrix, where D - detections, 10 - IoU thresholds
correct = np.zeros((pred_classes.shape[0], self.iouv.shape[0])).astype(bool)
# LxD matrix where L - labels (rows), D - detections (columns)
correct_class = true_classes[:, None] == pred_classes
iou = iou * correct_class # zero out the wrong classes
iou = iou.cpu().numpy()
for i, threshold in enumerate(self.iouv.cpu().tolist()):
if use_scipy:
# WARNING: known issue that reduces mAP in https://github.com/ultralytics/ultralytics/pull/4708
import scipy # scope import to avoid importing for all commands
cost_matrix = iou * (iou >= threshold)
if cost_matrix.any():
labels_idx, detections_idx = scipy.optimize.linear_sum_assignment(cost_matrix, maximize=True)
valid = cost_matrix[labels_idx, detections_idx] > 0
if valid.any():
correct[detections_idx[valid], i] = True
else:
matches = np.nonzero(iou >= threshold) # IoU > threshold and classes match
matches = np.array(matches).T
if matches.shape[0]:
if matches.shape[0] > 1:
matches = matches[iou[matches[:, 0], matches[:, 1]].argsort()[::-1]]
matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
# matches = matches[matches[:, 2].argsort()[::-1]]
matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
correct[matches[:, 1].astype(int), i] = True
return torch.tensor(correct, dtype=torch.bool, device=pred_classes.device)
Computing mAP
COCO AP is the interpolated precision sampled at the 101 recall thresholds (recThrs = 0:0.01:1), averaged over recall points and then over IoU thresholds and categories.
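A minimal sketch of the per-IoU-threshold AP computation, mirroring what accumulate() does — assumed inputs: tps, a boolean array over detections sorted by descending score, and npig, the number of non-ignored GTs:
import numpy as np

def coco_ap(tps, npig, rec_thrs=np.linspace(0.0, 1.0, 101)):
    tp = np.cumsum(tps)
    fp = np.cumsum(~tps)
    rc = tp / npig                       # recall
    pr = tp / (tp + fp + np.spacing(1))  # precision
    # make precision monotonically non-increasing, as accumulate() does
    for i in range(len(pr) - 1, 0, -1):
        pr[i - 1] = max(pr[i - 1], pr[i])
    # sample precision at the 101 recall thresholds
    inds = np.searchsorted(rc, rec_thrs, side='left')
    q = np.zeros_like(rec_thrs)
    for ri, pi in enumerate(inds):
        if pi < len(pr):
            q[ri] = pr[pi]
    return q.mean()

# e.g., 3 TPs and 1 FP among 4 detections, 5 GTs in total:
print(coco_ap(np.array([True, True, False, True]), npig=5))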
Questions
- What is the difference between iouType 'segm', 'bbox' and 'keypoints'? It selects the similarity measure: box IoU, mask IoU on RLEs, or OKS (see the computeIoU/computeOks dispatch above).
- maxDets = [1, 10, 100] (M=3 thresholds on max detections per image): does this need adjusting in practice? The defaults match the COCO benchmark; for dense scenes with more than 100 objects per image, the last value likely needs raising.