
[17, 31], keypoint detection [3] and counting [2]. As one of the high-level vision tasks, object detection might be the only one deviating from the neat fully convolutional per-pixel prediction framework, mainly due to the use of anchor boxes. It is natural to ask: can we solve object detection in the neat per-pixel prediction fashion, analogous to FCN for semantic segmentation, for example? If so, those fundamental vision tasks could be unified in (almost) one single framework. We show that the answer is affirmative. Moreover, we demonstrate that, for the first time, the much simpler FCN-based detector achieves even better performance than its anchor-based counterparts.
In the literature, some works attempted to leverage the FCN-based framework for object detection, such as DenseBox [12]. Specifically, these FCN-based frameworks directly predict a 4D vector plus a class category at each spatial location on a level of feature maps. As shown in Fig. 1 (left), the 4D vector depicts the relative offsets from the four sides of a bounding box to the location. These frameworks are similar to the FCNs for semantic segmentation, except that each location is required to regress a 4D continuous vector. However, to handle bounding boxes of different sizes, DenseBox [12] crops and resizes training images to a fixed scale. Thus DenseBox has to perform detection on image pyramids, which is against FCN's philosophy of computing all convolutions once. More significantly, these methods are mainly used in special-domain object detection such as scene text detection [33, 10] or face detection [32, 12], since it is believed that they do not work well when applied to generic object detection with highly overlapped bounding boxes. As shown in Fig. 1 (right), highly overlapped bounding boxes result in an intractable ambiguity: it is not clear which bounding box the pixels in the overlapping regions should regress to.
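For concreteness, below is a minimal sketch of this per-location regression target in PyTorch; it is our own illustrative code (the helper name `ltrb_targets` is ours), not the implementation of DenseBox or of this paper. It also makes the ambiguity explicit: a location inside two overlapping boxes yields two equally valid 4D targets.

```python
import torch

def ltrb_targets(locations, box):
    """Per-location 4D regression targets (l, t, r, b): distances from each
    location (x, y) to the left, top, right and bottom sides of a box given
    as (x0, y0, x1, y1)."""
    xs, ys = locations[:, 0], locations[:, 1]
    x0, y0, x1, y1 = box
    l = xs - x0
    t = ys - y0
    r = x1 - xs
    b = y1 - ys
    return torch.stack([l, t, r, b], dim=1)

# Two overlapping ground-truth boxes and one location inside both of them:
boxes = torch.tensor([[10., 10., 90., 90.],
                      [30., 30., 70., 70.]])
loc = torch.tensor([[50., 50.]])              # falls in the overlapping region
targets = torch.stack([ltrb_targets(loc, b)[0] for b in boxes])
# Both rows are valid targets (all four distances are positive), so without
# extra rules it is ambiguous which box this location should regress to,
# which is the issue illustrated in Fig. 1 (right).
print(targets)
```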
In the sequel, we take a closer look at the issue and show
that with FPN this ambiguity can be largely eliminated. As
a result, our method can already obtain detection accuracy comparable to that of traditional anchor-based detectors. Furthermore, we observe that our method may produce a number of low-quality predicted bounding boxes at locations far from the center of a target object. In order to suppress these low-quality detections, we introduce a novel “center-ness” branch (only one layer) to predict the deviation of a location from the center of its corresponding bounding box, as defined in Eq. (3). This score is then used to down-weight low-quality detected bounding boxes when merging the detection results in NMS. The simple yet effective center-ness branch allows the FCN-based detector to outperform its anchor-based counterparts under exactly the same training and testing settings.
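As a reference, the following is a minimal sketch of the center-ness target of Eq. (3), computed from the (l, t, r, b) regression targets, together with how such a score can down-weight off-center detections before NMS; the code and names are illustrative and not the released implementation (at test time the score comes from the predicted center-ness branch, for which the target formula serves as a stand-in here).

```python
import torch

def centerness_target(ltrb):
    """Center-ness of a location given its (l, t, r, b) regression targets:
    sqrt( min(l, r)/max(l, r) * min(t, b)/max(t, b) ).
    It equals 1 at the box center and decays towards 0 near the box borders."""
    l, t, r, b = ltrb.unbind(dim=-1)
    lr = torch.min(l, r) / torch.max(l, r)
    tb = torch.min(t, b) / torch.max(t, b)
    return torch.sqrt(lr * tb)

# Ranking score for NMS = classification score * center-ness, so that
# predictions from off-center locations are suppressed:
cls_score = torch.tensor([0.9, 0.9])
ltrb = torch.tensor([[40., 40., 40., 40.],   # location at the box center
                     [ 5., 40., 75., 40.]])  # location far off-center
final_score = cls_score * centerness_target(ltrb)
print(final_score)  # the off-center prediction gets a much lower score
```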
This new detection framework enjoys the following ad-
vantages.
• Detection is now unified with many other FCN-
solvable tasks such as semantic segmentation, making
it easier to re-use ideas from those tasks.
• Detection becomes proposal free and anchor free, which significantly reduces the number of design parameters. These design parameters typically need heuristic tuning, and many tricks are involved in order to achieve good performance. Therefore, our new detection framework makes the detector, particularly its training, considerably simpler.
• By eliminating the anchor boxes, our new detector completely avoids the complicated computation related to anchor boxes, such as the IoU computation and matching between anchor boxes and ground-truth boxes during training, resulting in faster training and testing as well as a smaller training memory footprint than its anchor-based counterpart.
• Without bells and whistles, we achieve state-of-the-art results among one-stage detectors. We also show that the proposed FCOS can be used as the Region Proposal Network (RPN) in two-stage detectors, achieving significantly better performance than its anchor-based RPN counterpart. Given the even better performance of the much simpler anchor-free detector, we encourage the community to rethink the necessity of anchor boxes in object detection, which are currently considered the de facto standard for detection.
• The proposed detector can be immediately extended to solve other vision tasks with minimal modification, including instance segmentation and keypoint detection. We believe that this new method can serve as a new baseline for many instance-wise prediction problems.
2. Related Work
Anchor-based Detectors. Anchor-based detectors inherit the ideas from traditional sliding-window and proposal-based detectors such as Fast R-CNN [6]. In anchor-based detectors, the anchor boxes can be viewed as pre-defined sliding windows or proposals, which are classified as positive or negative patches, with an extra offset regression to refine the prediction of bounding-box locations. Therefore, the anchor boxes in these detectors may be viewed as training samples. Unlike earlier detectors such as Fast R-CNN, which compute image features for each sliding window/proposal repeatedly, anchor boxes make use of the feature maps of CNNs and avoid repeated feature computation, speeding up the detection process dramatically. The design of anchor boxes was popularized by Faster R-CNN in its RPN [24], SSD [18] and YOLOv2 [22], and has become the convention in modern detectors.
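For contrast with the anchor-free formulation, below is a rough sketch of the IoU-based anchor assignment that anchor-based detectors typically perform during training; the thresholds (0.5/0.4) and the helper name are illustrative and not tied to any particular detector.

```python
import torch
from torchvision.ops import box_iou

def assign_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor by its best IoU with the ground-truth boxes:
    >= pos_thr -> index of the matched ground-truth box (positive sample),
    <  neg_thr -> -1 (background), otherwise -2 (ignored).
    The thresholds are illustrative hyper-parameters that anchor-based
    detectors typically have to tune."""
    iou = box_iou(anchors, gt_boxes)        # (num_anchors, num_gt) IoU matrix
    best_iou, best_gt = iou.max(dim=1)      # best ground truth for each anchor
    labels = torch.full((anchors.size(0),), -2, dtype=torch.long)
    labels[best_iou < neg_thr] = -1
    pos = best_iou >= pos_thr
    labels[pos] = best_gt[pos]
    return labels

# In practice this runs over many anchors per location at every feature-map
# position; here just two anchors against one ground-truth box.
anchors = torch.tensor([[0., 0., 50., 50.], [60., 60., 100., 100.]])
gt = torch.tensor([[5., 5., 55., 55.]])
print(assign_anchors(anchors, gt))  # -> tensor([ 0, -1])
```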
However, as described above, anchor boxes result in
excessively many hyper-parameters, which typically need