
A Lidar-based computer vision system for

monitoring of patients in medical scenes


Xuan Wu
Beijing Institute of Technology
Haiyang Zhang (  [email protected] )
Beijing Institute of Technology
Chunxiu Kong
Beijing Institute of Technology
Yuanze Wang
Beijing Institute of Technology
Yezhao Ju
Beijing Institute of Technology
Changming Zhao
Beijing Institute of Technology

Research Article

Keywords: Patient monitoring, computer vision, Lidar, privacy-safe

Posted Date: April 6th, 2023

DOI: https://doi.org/10.21203/rs.3.rs-2760999/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional Declarations: No competing interests reported.


A Lidar-based computer vision system for monitoring of patients in medical scenes


Xuan Wu1,2, Haiyang Zhang1,2,3*, Chunxiu Kong1,2, Yuanze Wang1,2, Yezhao Ju1,2, Changming Zhao1,2,3
1 School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
2 Key Laboratory of Photoelectronic Imaging Technology and System, Beijing Institute of Technology, Beijing, 100081, China.
3 Key Laboratory of Information Photon Technology, Ministry of Industry and Information Technology, Beijing, 100000, China.
* Corresponding author(s). E-mail(s): [email protected]
Contributing authors: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Abstract: Long-term monitoring of patients can improve patient safety and help doctors diagnose and evaluate the clinical situation. Limited manpower in hospitals makes continuous and nuanced monitoring difficult to achieve. In this paper, we classify the patient's posture as standing, sitting, lying and falling. Using a non-intrusive, privacy-compliant lidar, a medical human pose dataset is collected in the First Affiliated Hospital, Sun Yat-Sen University, and a novel computer vision-based approach is presented to continuously detect patients' poses and provide timely information to health care workers. The recognition accuracy reaches 93.46% and the recognition speed reaches 42 FPS on an RTX 3080 Ti. Experimental results show that the algorithm performs well on the medical human pose dataset and can effectively solve the problem of human pose recognition in medical scenes.
Keywords: Patient monitoring, computer vision, Lidar, privacy-safe

Statements and Declarations


Competing Interests: The authors declare no conflicts of interest.
Ethical Approval and Consent to Participate: All data were gathered and analyzed after obtaining approval from the First Affiliated Hospital, Sun Yat-Sen University, and consent from the volunteers who participated.
Consent for publication: All the authors of this paper have consented to its publication in the Journal of Medical Systems.
Acknowledgments: The authors appreciate the First Affiliated Hospital, Sun Yat-Sen University
for providing the testing environment. We also appreciate Zhen Zhang and Jianfei Lai of Huawei
Inspiration Lab for their technical support.
Funding: This work is supported by the fund of Huawei Inspiration Lab.
Author contributions: All authors contributed to the study conception and design. Material
preparation, data collection and labeling were performed by Xuan Wu, Chunxiu Kong, Yuanze
Wang and Yezhao Ju. Technical guidance and agreement support were provided by Haiyang Zhang
and Changming Zhao. The first draft of the manuscript was written by Xuan Wu and all authors
commented on previous versions of the manuscript. All authors read and approved the final
manuscript.

1. Introduction
Health monitoring is an important basis for evaluating patients' health status, and timely acquisition of patient activity is conducive to doctors' judgment of the illness and formulation of a rehabilitation plan. Traditional detection methods such as human observation [1] require a lot of manpower and time, yet healthcare workers are often overworked, and hospitals understaffed and resource-limited [2, 3]. Computer vision technology is therefore of great significance for reducing the burden on the medical system and improving the quality of medical services.
Human action recognition technology based on computer vision can quickly and accurately obtain human activity information, and has been applied in many fields [4–7]. Some methods combine RGB cameras and deep learning algorithms to recognize human movements [8–12], but because of privacy concerns these approaches are more suitable for public settings and cannot be applied in hospitals. With the development of sensing technology, depth sensors are increasingly applied in production and daily life. Compared with an RGB camera, a depth sensor can obtain 3D information about the scene, is not sensitive to lighting changes or cluttered backgrounds, and can well protect people's privacy. Combining depth cameras with motion recognition algorithms has been used in ICU patient activity monitoring [13, 14], home health monitoring for seniors [15], and so on, but these prior works require additional sensors to ensure accurate results. Lidar-acquired 3D point clouds provide higher spatial accuracy and have promising applications in computer vision tasks, offering simple and precise solutions for monitoring in medical scenarios, and lidar-based human action recognition is gaining attention [16–19]. So far, an automated, vision-based action monitoring system that is simple to use and provides timely feedback and reliable results in the medical scene has not been developed.
In this paper, we propose a monitoring system based on Lidar and a 3D human action recognition algorithm, as shown in Fig.1. The Lidar is deployed in the ward to obtain the patient's posture and movement information in real time and to alert the medical staff promptly when danger occurs, so as to improve the quality of medical services. Human poses are complex and diverse, and the occlusion caused by ward facilities as well as the practical deployment constraints need to be considered. Aiming at these problems, this paper classifies human actions in medical scenes into four categories: standing, sitting, lying and falling. The 3D human pose estimation and action recognition network consists of two stages: an improved anchor-to-joint regression network is used to estimate the keypoints of the human body, and an SVM classifier is then used to classify the action based on manually extracted skeleton features. The method uses the information of multi-scale feature maps to achieve stronger depth perception and better keypoint estimation performance. The manually designed skeleton features are well suited to the characteristics of human actions in medical scenes, which greatly reduces the amount of calculation and shortens the inference time while maintaining recognition accuracy; the recognition speed is 42 FPS on an NVIDIA GeForce RTX 3080 Ti and the recognition accuracy is 93.46%. Compared with existing human action recognition methods, the proposed method can better describe human poses in medical scenes and is suitable for deployment.

Fig.1 Lidar-based patient monitoring system in medical scene

2. Methods
The proposed method consists of a pose estimation module and an action recognition module. Fig.2 shows the overall network framework; the network input is a 288×288 depth image. The 3D pose estimation module consists of a backbone network and two branches, the keypoints of the human body are located based on the idea of anchor-weighted voting, and the obtained 3D human skeleton is composed of 15 keypoints (as shown in Fig.3). The action recognition module extracts and classifies the skeleton features according to the 3D human skeleton and finally generates the action recognition results.

Fig. 2 Proposed human pose estimation and action recognition network

Fig.3 3D skeleton of human body

2.1 Data preprocessing and scene parameter calculation

Intel Realsense L515 is used as the data acquisition device. L515 is a small lidar camera for
indoor scenes, which can collect point clouds, infrared images and visible light images. The point
clouds collected by L515 are mapped to depth images as experimental samples, and point clouds
are used for keypoint annotation in order to obtain more accurate data. Point cloud labeling is labor-
intensive and error-prone, mapping the same frame of point cloud into depth maps of different views
can greatly reduce the labeling time and expand the dataset. labelimg is used to mark the boundary
box of the depth map, the part surrounded by the boundary box is cropped out, and it is filled into a
1:1 size image with pixel 0, so that the human body keeps the original proportion, and prevents
excessive deformation of the human body when the image is reshaped in the network, which can
enhance the robustness of the network for different human postures.
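As an illustration of this cropping-and-padding step, a minimal sketch is given below (NumPy; the function name and the centering of the crop are our assumptions, not the authors' code):

```python
import numpy as np

def crop_and_pad_square(depth: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the body region given by box = (x_min, y_min, x_max, y_max) and
    pad it with zero-valued pixels into a 1:1 image, preserving the original
    aspect ratio of the human body."""
    x_min, y_min, x_max, y_max = box
    crop = depth[y_min:y_max, x_min:x_max]
    h, w = crop.shape
    side = max(h, w)
    square = np.zeros((side, side), dtype=depth.dtype)
    # Center the crop inside the square canvas; the remaining pixels stay 0.
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = crop
    return square  # later resized to 288x288 before entering the network
```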

Fig.4 Deployment schematic for the ward scenario

The location of the lidar deployment should avoid occlusion by medical equipment and medical personnel as far as possible; it is not feasible to shoot from the ideal frontal position, and tilted shooting may affect the judgment of movement. Fig.4 shows the schematic diagram of the deployment in the ward: the L515 lidar is fixed on the wall with the optical axis tilted downward, the origin of the lidar coordinate system is the center of the camera, and the direction of the z axis is consistent with that of the optical axis. When the human body appears in an oblique attitude in the lidar camera coordinate system, the world coordinate system can express the human body posture more accurately. A transformation matrix is therefore calculated to realize the transformation from the camera coordinate system to the world coordinate system.
Random Sample Consensus (RANSAC) is used to fit the ground plane of the captured scene, and the plane equation of the ground can be expressed as Eq. (1):

$ax + by + cz + d = 0$  (1)

Then, the unit normal vector of the ground in the lidar coordinate system is expressed as Eq. (2):

$\mathbf{r}_{lidar} = \dfrac{(a,\, b,\, c)}{\sqrt{a^2 + b^2 + c^2}}$  (2)

The normal vector of the ground in the world coordinate system is $\mathbf{r}_{ground} = (0, 0, 1)$ and $\mathbf{r} = \mathbf{r}_{lidar} \times \mathbf{r}_{ground}$; the transformation matrix $R$ is expressed as Eq. (3):

$R = I\cos\theta + (1 - \cos\theta)\,\mathbf{r}\mathbf{r}^{T} + \sin\theta \begin{pmatrix} 0 & -r_z & r_y \\ r_z & 0 & -r_x \\ -r_y & r_x & 0 \end{pmatrix}$  (3)

where $I$ represents the identity matrix and $\theta$ is the angle between $\mathbf{r}_{lidar}$ and $\mathbf{r}_{ground}$. The keypoints of the human body output by the 3D pose estimation module are transformed into the world coordinate system through this matrix, which is more convenient for feature extraction.

2.2 3D pose estimation module

The 3D pose estimation module locates the 3D coordinates of 15 keypoints of the human body to form a skeleton model, which is used as the input of action classification. Referring to the work [20], we set anchor points evenly over the image. We first estimate the offsets between the anchor coordinates and the keypoint coordinates as well as the response of each anchor, and then convert the anchor responses into anchor weights; the coordinate offsets and weights are combined by weighted voting to calculate the coordinates of the keypoints. The size of the input depth image is 288×288, ResNet-50 [21] is used as the backbone network, and there are three branches that estimate the in-plane coordinate offsets of the keypoints, the depth coordinates, and the anchor responses, respectively.
Most existing human pose depth map datasets have relatively ideal shooting angles, with few cases of occlusion or self-occlusion, an empty background, and a clear human body outline, and the work [20] achieved good results on the ITOP dataset. However, the space of a ward is limited and the medical instruments and equipment are numerous; the occlusion and complex background interfere with human pose recognition, and the human body point cloud obtained from an oblique shooting angle loses some information compared with frontal shooting. The original network therefore falls short when applied to the medical scene, and this paper makes the following improvements: 1) When the depth difference between the human body and the background is limited, the original network structure has a large deviation in predicting the depth coordinates of the keypoints. Merging the in-plane offset estimation branch and the depth estimation branch into a single 3D offset branch makes full use of the 3D characteristics of the depth map, reducing errors as well as the amount of calculation. 2) The human body is highly flexible, poses are complex and diverse, and the lack of color detail in the depth map is not conducive to accurately predicting keypoint coordinates in complex poses. The shallow feature map contains more low-level information such as target shape and texture; concatenating the shallow feature map obtained from convolutional layer 3 of the backbone with the feature map obtained from convolutional layer 4 enhances the ability to locate keypoints of the human body in different positions, and also improves the positioning accuracy for the slender parts of the limbs. The improved network structure is shown in Fig.5.
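As a rough illustration of improvement 2), the following PyTorch sketch fuses the layer-3 and layer-4 feature maps of a torchvision ResNet-50 (the module name, the bilinear upsampling choice, and the replication of the depth channel are our assumptions, not the authors' exact architecture):

```python
import torch
import torch.nn as nn
import torchvision

class FusedBackbone(nn.Module):
    """ResNet-50 backbone that concatenates the shallow layer3 feature map
    (stride 16) with the upsampled layer4 feature map (stride 32)."""
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50()
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool,
                                  r.layer1, r.layer2)
        self.layer3 = r.layer3   # 1024 channels, 1/16 resolution
        self.layer4 = r.layer4   # 2048 channels, 1/32 resolution

    def forward(self, x):
        # x: (B, 3, 288, 288) depth map replicated to 3 channels (an assumption)
        x = self.stem(x)
        f3 = self.layer3(x)                      # (B, 1024, 18, 18)
        f4 = self.layer4(f3)                     # (B, 2048, 9, 9)
        f4 = nn.functional.interpolate(f4, size=f3.shape[-2:],
                                       mode='bilinear', align_corners=False)
        return torch.cat([f3, f4], dim=1)        # (B, 3072, 18, 18) fused features
```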

Fig.5 3D pose estimation network

The work [20] constructs two loss functions based on the positioning of anchor points and keypoints. The informative anchor point surrounding loss is used to locate the anchor points around the keypoints, and then the weights of the anchor points and the coordinate offsets between the keypoints and the anchors are used to calculate the loss of the keypoints. The expression of the informative anchor point surrounding loss is given by Eq. (4):

$\mathrm{loss}_1 = \sum_{j \in J} L_\tau\!\left(\sum_{a \in A} \tilde{P}_j(a)\,S(a) - T_j\right)$  (4)

where $\tilde{P}_j(a)$ is the weight of anchor point $a$ for keypoint $j$, $S(a)$ is the coordinate of anchor point $a$, $T_j$ is the ground-truth position of keypoint $j$, and $L_\tau(x)$ is the smooth-L1-like loss function [22] given by Eq. (5):

$L_\tau(x) = \begin{cases} \dfrac{1}{2\tau}x^2, & |x| \le \tau \\ |x| - \dfrac{\tau}{2}, & |x| > \tau \end{cases}$  (5)

where $\tau$ is 1. When calculating the keypoint locating loss, the depth value is expanded to 50 times the actual depth value (m), so that the plane coordinates and depth coordinates are of the same order of magnitude. Because the wrist, ankle and other limb parts occupy only a small number of pixels, their positioning accuracy is limited, which makes the loss difficult to decrease further in the later stage of training; moreover, the action classification mainly depends on the information of the trunk keypoints. We therefore add a weight factor to the keypoint positioning loss: the keypoints of the trunk are assigned a higher weight and the other keypoints a lower weight. The improved keypoint positioning loss is given in Eq. (6):

$\mathrm{loss}_2 = \sum_{j \in J} \omega_j L_\tau\!\left(\sum_{a \in A} \tilde{P}_j(a)\,\big(S(a) + O_j(a)\big) - T_j\right)$  (6)

where $\omega_j$ is the weight factor of keypoint $j$, which is set to 1.5 for keypoints 0, 1, 8, 9, 10, to 1.2 for keypoints 2, 3, 11, 12, and to 0.8 for the other keypoints. $O_j(a)$ represents the coordinate offset from anchor point $a$ to keypoint $j$. In order to further constrain the positions of keypoints, a bone length loss is introduced: the difference between the skeleton length formed by the predicted keypoints and the actual skeleton length is optimized to reduce abnormal keypoint locations. $L_{m,n}$ is defined as the bone length formed by keypoints $m$ and $n$, and the bone length loss is expressed as Eq. (7):

$\mathrm{loss}_3 = \sum_{m,n \in \{0,1,\dots,14\}} \omega_l \left| L^{truth}_{m,n} - L^{pred}_{m,n} \right|$  (7)

where $\omega_l$ is the weight factor of the bone length, which is set to 1.2 for $L_{0,1}$, $L_{1,2}$, $L_{1,3}$, $L_{1,8}$, $L_{8,9}$, $L_{8,10}$, and 0.8 for the rest. The total loss of the pose estimation network is expressed as Eq. (8):

$\mathrm{loss} = \mathrm{loss}_1 + \lambda\,\mathrm{loss}_2 + \mathrm{loss}_3$  (8)

where $\lambda = 1.5$ is the factor that balances the three losses.
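For concreteness, a compact PyTorch sketch of the loss terms in Eqs. (5)–(7) is given below (the tensor layout, helper names and the omission of the anchor-response normalization are our assumptions):

```python
import torch

TAU = 1.0  # tau in Eq. (5)

# w_j in Eq. (6): 1.5 for keypoints 0,1,8,9,10; 1.2 for 2,3,11,12; 0.8 otherwise
JOINT_W = torch.tensor([1.5, 1.5, 1.2, 1.2, 0.8, 0.8, 0.8, 0.8,
                        1.5, 1.5, 1.5, 1.2, 1.2, 0.8, 0.8])

def smooth_l1_like(x, tau=TAU):
    """L_tau(x) from Eq. (5)."""
    absx = x.abs()
    return torch.where(absx <= tau, x ** 2 / (2 * tau), absx - tau / 2)

def keypoint_loss(weights, anchors, offsets, targets):
    """Eq. (6): anchor-weighted keypoint positioning loss.
    weights: (B, A, J) normalized anchor responses P~_j(a)
    anchors: (A, 3)    anchor coordinates S(a)
    offsets: (B, A, J, 3) predicted anchor-to-keypoint offsets O_j(a)
    targets: (B, J, 3) ground-truth keypoints T_j
    """
    pred = (weights.unsqueeze(-1) *
            (anchors[None, :, None, :] + offsets)).sum(dim=1)    # (B, J, 3)
    err = smooth_l1_like(pred - targets).sum(dim=-1)             # sum over x, y, z
    return (JOINT_W.to(err.device) * err).sum(dim=-1).mean()

def bone_length_loss(pred, target, bones, bone_w):
    """Eq. (7): weighted |L_truth - L_pred| over a list of (m, n) bone pairs."""
    m, n = map(list, zip(*bones))
    l_pred = (pred[:, m] - pred[:, n]).norm(dim=-1)
    l_true = (target[:, m] - target[:, n]).norm(dim=-1)
    return (bone_w.to(l_pred.device) * (l_true - l_pred).abs()).sum(dim=-1).mean()

# total loss, Eq. (8): loss = loss1 + 1.5 * loss2 + loss3
```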

2.3 Action classification module

Keypoints are important for describing human posture and behavior. When people assume different postures, the keypoints and joint vectors show different spatial information; tracking the coordinates of the joints and exploring the spatio-temporal geometric angle information of the joint vectors can provide very direct and reliable information for human action recognition. In the action classification module, we use hand-designed skeleton features to train the classifier and complete the action recognition.

$B = (J, V)$ is used to represent the skeleton information of the human body, where $J$ is the set of joint positions and $V$ is the set of joint vectors. According to the transformation matrix in Section 2.1, the coordinates of the keypoints are converted into the world coordinate system as Eq. (9):

$J = J_{lidar} \cdot R + H_{lidar}$  (9)

where $H_{lidar}$ represents the vertical distance between the lidar and the ground, and the world coordinate of the $k$-th keypoint of the human body is expressed as Eq. (10):

$J_k = (x, y, z)$  (10)

Coordinate $z$ represents the height of the keypoint above the ground. The joint vectors of the effective limb parts are extracted, and the joint vector composed of keypoints $n$ and $m$ is expressed as Eq. (11):

$V_{n,m} = J_n - J_m$  (11)

When the human body in the scene assumes a certain posture, each joint vector forms a certain position and angle with the scene. The angle between two joint vectors connected by the same joint also provides important information for describing human behavior, so computing the direction cosine features and the angle cosine features of the joints is conducive to describing the human posture. The trunk of the human body changes most obviously between postures, while the movement of the limbs is more flexible and does not clearly distinguish different postures. We therefore extract eight groups of features based on the trunk keypoints to describe the posture of the human body, including:

1) Four joint-angle cosine features: $(V_{1,8}, V_{9,11})$, $(V_{1,8}, V_{10,12})$, $(V_{9,11}, V_{11,13})$, $(V_{10,12}, V_{12,14})$

2) Three direction cosine features with respect to the ground normal: $(V_{1,8}, r_{ground})$, $(V_{9,11}, r_{ground})$, $(V_{10,12}, r_{ground})$

3) A height feature:

$H_{8,9,10} = \dfrac{J_8(z) + J_9(z) + J_{10}(z)}{3}$
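A minimal sketch of computing this eight-dimensional feature vector from the 15 world-frame keypoints is given below (NumPy; the function names are ours, and the keypoint indices follow the pairs listed above):

```python
import numpy as np

GROUND_NORMAL = np.array([0.0, 0.0, 1.0])   # r_ground in the world frame

def cos_angle(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def skeleton_features(J):
    """Build the 8-dimensional feature vector from world-frame keypoints
    J: (15, 3) array, J[k] = (x, y, z) with z = height above the ground."""
    V = lambda n, m: J[n] - J[m]            # joint vector V_{n,m}, Eq. (11)
    feats = [
        # 1) four joint-angle cosine features
        cos_angle(V(1, 8), V(9, 11)),
        cos_angle(V(1, 8), V(10, 12)),
        cos_angle(V(9, 11), V(11, 13)),
        cos_angle(V(10, 12), V(12, 14)),
        # 2) three direction cosine features w.r.t. the ground normal
        cos_angle(V(1, 8), GROUND_NORMAL),
        cos_angle(V(9, 11), GROUND_NORMAL),
        cos_angle(V(10, 12), GROUND_NORMAL),
        # 3) height feature H_{8,9,10}
        (J[8, 2] + J[9, 2] + J[10, 2]) / 3.0,
    ]
    return np.array(feats, dtype=np.float32)
```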
After extracting the skeleton features to form samples, an appropriate classifier must be selected to train on them. SVM is a binary classifier proposed by Vapnik [23], which has a well-developed mathematical theory and is often used for data classification and regression prediction. In this paper, the nonlinear multidimensional Support Vector Classifier (SVC) is selected according to the actual situation. SVC allows a more robust model to be established at the cost of a small number of classification errors, which yields more stable classification performance for complex human poses.
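A corresponding training sketch using scikit-learn's nonlinear SVC is shown below (the RBF-kernel hyperparameters and the feature/label file names are illustrative placeholders, not the authors' settings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# X: (N, 8) skeleton feature vectors; y: labels {0: stand, 1: sit, 2: lie, 3: fall}
X, y = np.load('features.npy'), np.load('labels.npy')   # hypothetical files
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```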

3. Results

3.1 Dataset and experimental settings

The data acquisition equipment used in the experiment is the Intel RealSense L515 lidar. Standing, sitting, lying and falling postures of people of different genders, ages and body types are collected in the simulated ward of the laboratory and in a ward of the First Affiliated Hospital of Sun Yat-sen University, respectively. After labeling and data augmentation, a total of 3552 samples are obtained to form the medical human pose dataset; 80% of them are used as training samples and 20% as testing samples.
The experimental system environment is Windows 10 64-bit, with an Intel Core i7-10700 CPU and an NVIDIA GeForce RTX 3080 Ti. The implementation is based on the PyTorch 1.10 framework, with CUDA 11.3.1 and cuDNN 8.2.1, and the size of the input image is 288×288. Training is carried out by fine-tuning for a total of 300 epochs: the first 10 epochs are used to warm up the learning rate, the remaining 290 epochs adjust the learning rate with the Adam optimizer, and the model with the best performance during training is saved.

3.2 Pose estimation results

The mean average precision (mAP) [24, 25] with a 10 cm threshold is used as the evaluation metric to compare the performance of our proposed method with several mainstream 3D pose estimation methods on the medical human pose dataset. Tab.1 compares the accuracy of the different methods for estimating human keypoints, and the percentage of successful frames over different error thresholds is given in Fig. 6. The method proposed in this paper achieves good results. The use of shallow feature maps provides more information about shape, texture and so on, which significantly improves the positioning accuracy of the human limbs; the mAP of each part is no less than 70%. Merging the plane offset estimation branch and the depth branch makes full use of the 3D information of the depth image and improves the depth prediction of the keypoints, and the improved loss function further improves the accuracy of the trunk keypoints; the mAP reaches 83.37%, which is 3.9% higher than the original A2J. The improved method achieves an inference speed of 184.69 frames/s on the 3080 Ti, far faster than the other methods; compared with the original A2J, merging the network branches reduces the amount of calculation and increases the inference speed by 9.54 FPS. Experiments show that the proposed method has better performance in both accuracy and speed.
Fig. 7 shows the estimation results of the proposed 3D pose estimation network for different human poses in the ward scene. In the field of view of the lidar, there is more self-occlusion when people are standing, sitting or lying on their side; when people fall or lie down, the depth difference between their limbs and the background is small and the outline of the body is not clear; when people are lying on the bed covered with a quilt, the positions of their limbs are even more difficult to identify, which brings great challenges to pose estimation. Our proposed method fully considers the above problems and can effectively estimate the positions of keypoints in various human poses. Experiments show that the proposed method can be applied to human pose estimation in medical scenarios.

Tab.1 Comparison of results of different pose estimation methods on the medical human pose dataset (mAP, %)

Keypoint    Towards [26]  Integral [27]  V2V [28]  GAST [29]  A2J [20]  Ours
Head           80.08         91.01        88.68     87.14      88.29    91.14
Neck           81.22         90.30        92.37     89.71      88.86    92.43
Shoulders      74.43         89.52        89.60     83.58      83.00    87.86
Elbows         42.65         71.13        79.86     76.16      75.14    81.71
Wrists         29.09         66.24        67.63     66.72      66.29    71.72
Torso          82.54         86.60        91.84     88.28      91.43    90.29
Hips           77.61         72.89        80.92     84.86      78.14    85.57
Knees          68.87         68.36        81.19     80.71      75.01    84.55
Ankles         56.49         59.95        70.92     71.57      68.29    77.43
Average        65.89         74.94        80.88     79.47      78.11    83.37

Fig.6 The percentage of success frames over different error thresholds.

Fig.7 Results of pose estimation. The red part represents the right side of the body and the blue
part represents the left side

3.3 Action recognition results

The human skeleton obtained by the pose estimation module is used as input to test the
recognition accuracy of the proposed action recognition method for human standing, sitting, lying,
and falling. As shown in Fig. 8, successful pose estimation is beneficial for action classification, and
the action classification module can still output correct recognition results when some keypoints are
incorrectly located.

Fig. 8 Action recognition results on the medical human pose dataset

According to the confusion matrix of recognition results on the medical human pose test set shown in Fig. 9, the precision, recall and F1 score are calculated as the performance indicators of the algorithm; the calculation expressions are given in Eqs. (12), (13) and (14) respectively.

$\mathrm{precision} = \dfrac{TP}{TP + FP}$  (12)

$\mathrm{recall} = \dfrac{TP}{TP + FN}$  (13)

$F1 = 2 \times \dfrac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$  (14)
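These per-class metrics can be computed directly from the confusion matrix; a minimal NumPy sketch follows (the function name and the class ordering are ours):

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """cm[i, j] = number of samples with true class i predicted as class j.
    Returns per-class precision, recall and F1, following Eqs. (12)-(14)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```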

The calculation results are shown in Tab.2. The average precision and recall of the four types of actions are both above 90%, and the F1 score is 0.93. The precision for standing reaches 95% thanks to less occlusion of the human body and a large difference from the background, and the recall for falling reaches 96.75%, indicating that the skeleton features we design capture the patients' posture characteristics well. The experiments prove that the method proposed in this paper, which combines hand-designed features with an SVM classifier, is simple and effective, and is suitable for action recognition in medical scenes.

Fig. 9 Confusion matrix of test results for action recognition

Tab.2 Recognition performance of the proposed method for different poses

Pose          Stand    Sit      Lie      Fall     Average
precision(%)  95.00    92.57    91.95    94.30    93.46
recall(%)     95.00    89.01    93.37    96.75    93.53
F1 score      0.95     0.91     0.93     0.96     0.93

The training of the SVM is fast and its inference time is on the order of microseconds. The whole algorithm achieves a running speed of about 42 frames per second on the 3080 Ti; it takes only 24 ms to process a depth image, including 15 ms for image reading and preprocessing. The algorithm achieves a balance between accuracy and speed and is ready to be deployed in medical scenarios.

4. Conclusion
In this paper, we propose a lidar-based patient monitoring system for medical scenes. Lidar realizes non-intrusive and privacy-safe sensing. Using a dataset we collected at the First Affiliated Hospital, Sun Yat-Sen University, the proposed 3D human pose estimation and action recognition method can recognize standing, sitting, lying and falling with 93.46% accuracy at 42 FPS. The experimental results show that the proposed method achieves better results than other mainstream algorithms on the medical human pose dataset and is suitable for deployment in medical scenarios, which can help meet the challenge of the increasing burden on the healthcare system.

References
[1] Berney, S.C., Rose, J.W., Bernhardt, J., Denehy, L.: Prospective observation of physical activity in
critically ill patients who were intubated for more than 48 hours. Journal of critical care 30(4), 658–663
(2015)
[2] Patel, R.S., Bachu, R., Adikey, A., Malik, M., Shah, M.: Factors related to physician burnout and its
consequences: a review. Behavioral sciences 8(11), 98 (2018)
[3] Lyon, M., Sturgis, L., Lendermon, D., Kuchinski, A.M., Mueller, T., Loeffler, P., Xu, H., Gibson, R.:
Rural ed transfers due to lack of radiology services. The American journal of emergency medicine 33(11),
1630–1634 (2015)
[4] Kong, Y., Fu, Y.: Human action recognition and prediction: A survey. International Journal of
Computer Vision 130(5), 1366–1401 (2022)
[5] Ozcan, T., Basturk, A.: Human action recognition with deep learning and structural optimization using
a hybrid heuristic algorithm. Cluster Computing 23(4), 2847–2860 (2020)
[6] Prati, A., Shan, C., Wang, K.I.-K.: Sensors, vision and networks: From video surveillance to activity
recognition and health monitoring. Journal of Ambient Intelligence and Smart Environments 11(1), 5–
22 (2019)
[7] Wang, L., Huynh, D.Q., Koniusz, P.: A comparative review of recent kinect-based action recognition
algorithms. IEEE Transactions on Image Processing 29, 15–28 (2019)
[8] Jaouedi, N., Boujnah, N., Bouhlel, M.S.: A new hybrid deep learning model for human action
recognition. Journal of King Saud University-Computer and Information Sciences 32(4), 447–453 (2020)
[9] Muhammad, K., Ullah, A., Imran, A.S., Sajjad, M., Kiran, M.S., Sannino, G., Albuquerque, V.H.C.,
et al.: Human action recognition using attention based lstm network with dilated cnn features. Future
Generation Computer Systems 125, 820–830 (2021)
[10] Wang, L., Tong, Z., Ji, B., Wu, G.: Tdn: Temporal difference networks for efficient action
recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 1895–1904 (2021)
[11] Jhuang, Y.-Y., Tsai, W.-J.: Deeppear: Deep pose estimation and action recognition. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7119–7125 (2021). IEEE
[12] Islam, M.S., Bakhat, K., Khan, R., Naqvi, N., Islam, M.M., Ye, Z.: Applied human action recognition network based on snsp features. Neural Processing Letters 54(3), 1481–1494 (2022)
[13] Ma, A.J., Rawat, N., Reiter, A., Shrock, C., Zhan, A., Stone, A., Rabiee, A., Griffin, S., Needham,
D.M., Saria, S.: Measuring patient mobility in the icu using a novel noninvasive sensor. Critical care
medicine 45(4), 630 (2017)
[14] Yeung, S., Rinaldo, F., Jopling, J., Liu, B., Mehra, R., Downing, N.L., Guo, M., Bianconi, G.M.,
Alahi, A., Lee, J., et al.: A computer vision system for deep learning-based detection of patient
mobilization activities in the icu. NPJ digital medicine 2(1), 11 (2019)
[15] Luo, Z., Hsieh, J.-T., Balachandar, N., Yeung, S., Pusiol, G., Luxenberg, J., Li, G., Li, L.-J., Downing,
N.L., Milstein, A., et al.: Computer vision-based descriptive analytics of seniors’ daily activities for long-
term health monitoring. Machine Learning for Healthcare (MLHC) 2(1) (2018)
[16] Min, Y., Zhang, Y., Chai, X., Chen, X.: An efficient pointlstm for point clouds based gesture
recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 5761–5770 (2020)
[17] Fan, H., Yang, Y., Kankanhalli, M.: Point 4d transformer networks for spatio-temporal modeling in point cloud videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14204–14213 (2021)
[18] Katayama, H., Mizomoto, T., Rizk, H., Yamaguchi, H.: You work we care: Sitting posture
assessment based on point cloud data. In: 2022 IEEE International Conference on Pervasive Computing
and Communications Workshops and Other Affiliated Events (PerCom Workshops), pp. 121–123 (2022).
IEEE
[19] Xu, Y., Jung, C., Chang, Y.: Head pose estimation using deep neural networks and 3d point clouds.
Pattern Recognition 121, 108210 (2022)
[20] Xiong, F., Zhang, B., Xiao, Y., Cao, Z., Yu, T., Zhou, J.T., Yuan, J.: A2j: Anchor-to-joint regression
network for 3d articulated pose estimation from a single depth image. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision, pp. 793–802 (2019)
[21] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
[22] Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015)
[23] Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20, 273–297 (1995)
[24] Haque, A., Peng, B., Luo, Z., Alahi, A., Yeung, S., Fei-Fei, L.: Towards viewpoint invariant 3d
human pose estimation. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The
Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 160–177 (2016). Springer
[25] Wang, K., Zhai, S., Cheng, H., Liang, X., Lin, L.: Human pose estimation from depth images via
inference embedded multi-task learning. In: Proceedings of the 24th ACM International Conference on
Multimedia, pp. 1227–1236 (2016)
[26] Zhou, X., Huang, Q., Sun, X., Xue, X., Wei, Y.: Towards 3d human pose estimation in the wild: a
weakly-supervised approach. In: Proceedings of the IEEE International Conference on Computer Vision,
pp. 398–407 (2017)
[27] Sun, X., Xiao, B., Wei, F., Liang, S., Wei, Y.: Integral human pose regression. In: Proceedings of the
European Conference on Computer Vision (ECCV), pp. 529–545 (2018)
[28] Moon, G., Chang, J.Y., Lee, K.M.: V2v-posenet: Voxel-to-voxel prediction network for accurate 3d
hand and human pose estimation from a single depth map. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 5079–5088 (2018)
[29] Liu, J., Rojas, J., Li, Y., Liang, Z., Guan, Y., Xi, N., Zhu, H.: A graph attention spatio-temporal
convolutional network for 3d human pose estimation in video. In: 2021 IEEE International Conference
on Robotics and Automation (ICRA), pp. 3374–3380 (2021). IEEE
