
Figure 2: Boundary prediction comparison of (a) local information based and (b) global proposal information based methods.
However, it drops actionness information and only adopts boundary matching to capture low-level features, which cannot handle complex activities and cluttered backgrounds. Besides, different from our method shown in Fig. 1, it employs the same method as (Lin et al. 2018) to generate a boundary probability sequence instead of a map, which lacks a global scope for action instances with blurred boundaries and variable temporal durations. Fig. 2 illustrates the difference between local information and our global proposal information for boundary prediction.
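To make this distinction concrete, the toy sketch below contrasts the two scoring schemes; the sequence length and random scores are placeholders rather than outputs of any real model.

```python
# Toy contrast (a sketch, not the authors' code) between boundary scoring
# from local probability sequences and from global boundary maps.
import numpy as np

T = 100                              # assumed number of temporal locations
rng = np.random.default_rng(0)       # placeholder scores

# (a) Local information: independent per-location start/end probabilities,
# each estimated from a small temporal neighborhood only.
start_prob = rng.random(T)           # P(start at t)
end_prob = rng.random(T)             # P(end at t)

# (b) Global proposal information: dense T x T maps that score every
# (start, end) pair jointly, so blurred boundaries and variable durations
# are rated with the whole candidate proposal in view.
start_map = rng.random((T, T))       # start confidence of proposal (s, e)
end_map = rng.random((T, T))         # end confidence of proposal (s, e)

s, e = 10, 42                        # an example candidate proposal
local_score = start_prob[s] * end_prob[e]       # boundaries scored in isolation
global_score = start_map[s, e] * end_map[s, e]  # boundaries scored per proposal
```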
To address the aforementioned drawbacks, we propose a dense boundary generator (DBG) that employs global proposal features to predict the boundary map and explores action-aware features for action completeness analysis. In our framework, a dual stream BaseNet (DSB) takes spatial and temporal video representations as input to exploit the rich local behaviors within the video sequence, and is supervised via an actionness classification loss. DSB generates two types of features: a low-level dual stream feature and a high-level actionness score feature. In addition, a proposal feature generation (PFG) layer is designed to transform these two types of sequence features into matrix-like proposal features. An action-aware completeness regression (ACR) module takes the actionness score feature as input to generate a reliable completeness score map, and a temporal boundary classification (TBC) module produces temporal boundary score maps based on the dual stream feature. These three score maps are combined to generate proposals.
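As a concrete illustration, the following PyTorch-style skeleton sketches this pipeline; the channel sizes, sampling count, and head internals are illustrative assumptions rather than our exact configuration.

```python
# A minimal sketch of the DBG pipeline described above (shapes and module
# internals are assumptions for illustration, not the exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSB(nn.Module):
    """Dual stream BaseNet: temporal convolutions over the spatial and
    temporal streams, supervised by an actionness loss (loss omitted)."""
    def __init__(self, in_dim=200, hid=128):
        super().__init__()
        self.spatial = nn.Conv1d(in_dim, hid, 3, padding=1)
        self.temporal = nn.Conv1d(in_dim, hid, 3, padding=1)
        self.actionness = nn.Conv1d(2 * hid, 1, 1)

    def forward(self, rgb, flow):                      # (B, C, T) each
        dual = torch.cat([F.relu(self.spatial(rgb)),
                          F.relu(self.temporal(flow))], dim=1)
        action = torch.sigmoid(self.actionness(dual))  # (B, 1, T)
        return dual, action  # low-level dual stream / high-level actionness

def pfg(seq_feat, num_samples=8):
    """Proposal feature generation (sketch): sample the sequence feature at
    uniform locations between every (start, end) pair, turning a (B, C, T)
    sequence into a (B, C*num_samples, T, T) matrix-like proposal feature."""
    B, C, T = seq_feat.shape
    out = seq_feat.new_zeros(B, C * num_samples, T, T)
    for s in range(T):
        for e in range(s + 1, T):
            idx = torch.linspace(s, e, num_samples).long()
            out[:, :, s, e] = seq_feat[:, :, idx].flatten(1)
    return out

class Heads(nn.Module):
    """ACR regresses a completeness map from action-aware features; TBC
    classifies start/end boundary maps from the dual stream features."""
    def __init__(self, acr_in, tbc_in):
        super().__init__()
        self.acr = nn.Conv2d(acr_in, 1, 1)             # completeness map
        self.tbc = nn.Conv2d(tbc_in, 2, 1)             # start & end maps

    def forward(self, action_prop, dual_prop):
        completeness = torch.sigmoid(self.acr(action_prop))  # (B, 1, T, T)
        start_end = torch.sigmoid(self.tbc(dual_prop))       # (B, 2, T, T)
        # Combine the three score maps into one proposal confidence map.
        return completeness[:, 0] * start_end[:, 0] * start_end[:, 1]

# Example wiring (shapes only): batch 2, 200 channels per stream, 50 snippets.
dsb, heads = DSB(), Heads(acr_in=1 * 8, tbc_in=256 * 8)
rgb, flow = torch.randn(2, 200, 50), torch.randn(2, 200, 50)
dual, action = dsb(rgb, flow)                 # (2, 256, 50), (2, 1, 50)
score_map = heads(pfg(action), pfg(dual))     # (2, 50, 50)
```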
The main contributions of this paper are summarized as follows:
• We propose a fast and unified dense boundary genera-
tor (DBG) for temporal action proposal, which evaluates
dense boundary confidence maps for all proposals.
• We introduce auxiliary supervision via actionness classification to effectively facilitate action-aware features for the action-aware completeness regression.
• We design an efficient proposal feature generation layer to
capture global proposal features for subsequent regression
and classification modules.
• Experiments conducted on popular benchmarks like
ActivityNet-1.3 (Heilbron et al. 2015) and THUMOS14
(Idrees et al. 2017) demonstrate the superiority of our net-
work over the state-of-the-art methods.
Related Work
Action recognition. Early methods for video action recognition mainly relied on hand-crafted features such as HOF, HOG and MBH. Recent advances resort to deep convolutional networks to improve recognition accuracy. These networks can be divided into two categories: two-stream networks (Feichtenhofer, Pinz, and Zisserman 2016; Simonyan and Zisserman 2014; Wang et al. 2015; Wang et al. 2016) and 3D networks (Tran et al. 2015; Qiu, Yao, and Mei 2017; Carreira and Zisserman 2017). Two-stream networks explore video appearance and motion cues by separately passing RGB images and stacked optical flow through ConvNets pretrained on ImageNet. In contrast, 3D methods directly build hierarchical representations of spatio-temporal data with spatio-temporal filters.
Temporal action proposal. Temporal action proposal generation aims to detect action instances with temporal boundaries and confidence scores in untrimmed videos. Anchor-based methods generate proposals from a set of multi-scale anchors placed at regular temporal intervals. The work in (Shou, Wang, and Chang 2016) adopts the C3D network (Tran et al. 2015) as a binary classifier for anchor evaluation. (Heilbron, Niebles, and Ghanem 2016) proposes a sparse learning framework for scoring temporal anchors. (Gao et al. 2017) applies temporal regression to adjust the action boundaries. Boundary-based methods evaluate each temporal location in the video. (Zhao et al. 2017a) groups continuous high-score regions into proposals via a temporal watershed algorithm. (Lin et al. 2018) locates temporal boundaries with locally high probabilities and evaluates the global confidences of the candidate proposals generated from these boundaries. (Lin et al. 2019) proposes a boundary-matching mechanism for the confidence evaluation of densely distributed proposals in an end-to-end pipeline. MGG (Liu et al. 2019) combines anchor-based and boundary-based methods to accurately generate temporal action proposals.
Temporal action detection. Temporal action detection involves generating temporal proposals and recognizing their action classes, and existing methods can be divided into two patterns, i.e., one-stage (Lin, Zhao, and Shou 2017; Long et al. 2019) and two-stage (Shou, Wang, and Chang 2016; Gao, Yang, and Nevatia 2017; Zhao et al. 2017b; Xu, Das, and Saenko 2017; Chao et al. 2018). Two-stage methods first generate candidate proposals and then classify them. (Chao et al. 2018) improves two-stage temporal action detection by addressing both receptive field alignment and context feature extraction. Among one-stage methods, (Lin, Zhao, and Shou 2017) skips proposal generation by directly detecting action instances in untrimmed videos, and (Long et al. 2019) introduces Gaussian kernels to dynamically optimize the temporal scale of each action proposal.