[Paper Reading] A New Representation of Skeleton Sequences for 3D Action Recognition

This post covers a new representation of skeleton sequences for 3D action recognition. The method selects four stable reference joints to construct spatio-temporal encoding maps, transforms the coordinates into a cylindrical coordinate system to improve accuracy, and finally extracts features with a convolutional neural network trained via multi-task learning.

This paper feels overly complicated and, frankly, not that good, so I will only go over the key points.

Network Architecture

[Figure: overall network architecture]
The overall network architecture is shown in the figure above. It is also based on spatio-temporal encoding maps, which are constructed as follows (a code sketch of steps 2-4 follows the list):

1. First, four reference joints are selected on the human skeleton (two on the shoulders and two on the legs). These four joints are chosen because they remain relatively stable during most motions. (The final experiments also tested six reference joints, but accuracy dropped; I feel that reporting statistics on the motion range of each joint would have been more convincing.)
2. Then the relative position (dx, dy, dz) of every joint with respect to each reference joint is computed, yielding four (m-1)×t spatio-temporal encoding maps, each with x/y/z channels. (I do have a question here: why not concatenate the vectors for the four reference joints directly, so that only one spatio-temporal map needs to be built? I do not know why the authors set it up this way.)
3. The coordinates are then transformed into a cylindrical coordinate system. (It is also unclear why the cylindrical system is needed; it does not seem fundamentally different from Cartesian coordinates, although the final experiments do report higher accuracy with cylindrical coordinates.)
4. Finally, the channels are separated: each channel becomes a clip of grayscale frames, giving three clips with four frames each (one frame per reference joint). (Although the authors' experiments show that separating the channels works better, this feels unreasonable to me; keeping the three channels together as a single 3-channel image should also work, and the difference may just come down to tuning.)
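
To make the construction concrete, here is a minimal sketch of steps 2-4. The array layout, the `ref_ids` joint indices, and the normalization to [0, 255] are my assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def build_clips(skeleton, ref_ids):
    """skeleton: (t, m, 3) array of joint coordinates over t frames.
    ref_ids: indices of the 4 reference joints (assumed here).
    Returns 3 clips of shape (4, m-1, t), one per cylindrical channel."""
    t, m, _ = skeleton.shape
    maps = []
    for r in ref_ids:
        # Step 2: relative position of every other joint w.r.t. reference joint r
        rel = np.delete(skeleton - skeleton[:, r:r+1, :], r, axis=1)  # (t, m-1, 3)
        dx, dy, dz = rel[..., 0], rel[..., 1], rel[..., 2]
        # Step 3: Cartesian -> cylindrical (rho, theta, z)
        rho = np.sqrt(dx**2 + dy**2)
        theta = np.arctan2(dy, dx)
        maps.append(np.stack([rho, theta, dz], axis=-1))  # (t, m-1, 3)
    maps = np.stack(maps)  # (4, t, m-1, 3): one map per reference joint
    # Step 4: split channels -> 3 grayscale clips, each with 4 frames
    clips = []
    for c in range(3):
        ch = maps[..., c].transpose(0, 2, 1)  # (4, m-1, t)
        # scale each clip to [0, 255] like a grayscale image (my assumption)
        ch = 255 * (ch - ch.min()) / (ch.max() - ch.min() + 1e-8)
        clips.append(ch)
    return clips
```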

Once the spatio-temporal encoding maps are obtained, a CNN extracts a feature map (14×14×512), which is then mean-pooled along the temporal dimension (yielding 14×512).
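A minimal sketch of the temporal pooling step; the (batch, channels, spatial, temporal) axis ordering is my assumption about how the 14×14×512 feature map is laid out:

```python
import torch

feat = torch.randn(1, 512, 14, 14)  # CNN feature map: (batch, channels, spatial, temporal)
pooled = feat.mean(dim=-1)          # average over the temporal axis
print(pooled.shape)                 # torch.Size([1, 512, 14]), i.e. 14x512 per sample
```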
Finally, the convolutional network is trained with a multi-task learning method (four outputs); I do not find this part particularly interesting.
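For completeness, a rough sketch of what a four-output multi-task head could look like: one classifier per reference-joint feature, with the four cross-entropy losses summed. The feature dimension, the number of classes (60, as in NTU RGB+D), and the unshared linear heads are my assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Four parallel classifiers, one per reference-joint feature (assumed design)."""
    def __init__(self, feat_dim=14 * 512, num_classes=60):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(4)]
        )

    def forward(self, feats):  # feats: list of 4 tensors, each (batch, feat_dim)
        return [head(f) for head, f in zip(self.heads, feats)]

# Training step: sum the cross-entropy losses of the four outputs
criterion = nn.CrossEntropyLoss()
head = MultiTaskHead()
feats = [torch.randn(8, 14 * 512) for _ in range(4)]
labels = torch.randint(0, 60, (8,))
loss = sum(criterion(logits, labels) for logits in head(feats))
loss.backward()
```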
