
Practically Adopting Human Activity Recognition

Huatao Xu¹, Pengfei Zhou², Rui Tan¹, Mo Li¹,³
¹Nanyang Technological University, ²University of Pittsburgh, ³Hong Kong University of Science and Technology
Email: {huatao001, tanrui}@ntu.edu.sg, [email protected], [email protected]
ABSTRACT
Existing inertial measurement unit (IMU) based human activity recognition (HAR) approaches still face a major challenge when adopted across users in practice. The severe heterogeneity in IMU data significantly undermines model generalizability in wild adoption. This paper presents UniHAR, a universal HAR framework for mobile devices. To address the challenge of data heterogeneity, we thoroughly study augmenting data with the physics of the IMU sensing process and present a novel adoption of data augmentations for exploiting both unlabeled and labeled data. We consider two application scenarios of UniHAR, which can further integrate federated learning and adversarial training for improved generalization. UniHAR is fully prototyped on the mobile platform and introduces low overhead to mobile devices. Extensive experiments demonstrate its superior performance in adapting HAR models across four open datasets.

CCS CONCEPTS
• Human-centered computing → Ubiquitous and mobile computing systems and tools; • Computing methodologies → Machine learning approaches.

KEYWORDS
Mobile sensing, human activity recognition, IMU, physics-informed data augmentation

ACM Reference Format:
Huatao Xu, Pengfei Zhou, Rui Tan, and Mo Li. 2023. Practically Adopting Human Activity Recognition. In The 29th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '23), October 2–6, 2023, Madrid, Spain. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3570361.3613299

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-9990-6/23/10. https://doi.org/10.1145/3570361.3613299

1 INTRODUCTION
Human activity recognition (HAR) has played a critical role in considerable real-world applications. Existing studies [8, 9, 15, 25, 29, 40, 64] have explored the possibility of pervasive HAR sensing with inertial measurement unit (IMU) sensors in commodity smart devices. A significant challenge arises when most existing approaches are adopted at scale: the data heterogeneity caused by real-world diversities (e.g., different devices and usage patterns) leads to degraded performance when HAR models are applied across different user groups. The straightforward solution to such heterogeneity is to collect a substantial amount of labeled data from each of the numerous users, which, however, is prohibitive in practice due to its high overhead.

This paper is motivated by an essential question: can we have a universal framework that supports applying HAR models across different user groups of real-world diversities, and with realistic adoption overhead? As depicted in Figure 1, we consider an HAR application scenario where a large number of mobile users have their own data locally collected. The IMU data of the target users are implicitly collected during their daily lives and thus unlabeled. The only labeled data are shared from a small group of participating users (source users), and they are of small size and may be biased in terms of users, usage patterns, devices, or environments. Raw data transmissions from the target users to the cloud server are highly undesirable due to the prohibitive processing overhead for the cloud server as well as the related privacy concerns.

Figure 1: A universal HAR scenario.
The objective is to transfer HAR models from the source users to the target users with realistic adoption overhead.

We find that existing works perform poorly in the HAR scenario envisioned in Figure 1. Conventional supervised learning models [15, 25, 27, 64] assume the collected labeled dataset is general and thus suffer from severe performance degradation in practice. Recent self-supervised learning works [11, 12, 40, 52, 62], including those aiming at building foundation models for IMU sensing, e.g., TPN [40] and LIMU-BERT [62], may still overlook the data heterogeneity and overfit to specific user domains. Some domain adaptation works [4, 17, 37, 65] consider certain aspects of diversity but require fully labeled data from source users, and they still underperform when source domain labels are limited. We notice that most existing efforts focus on directly learning common features among raw data, with the implied assumption that data across different domains already share similar distributions. However, this assumption does not hold when the sensor data collected from different user groups are highly heterogeneous. As a result, most existing approaches fail to achieve satisfactory performance in practical adoption at scale.

This paper explores the data augmentation perspective to combat data heterogeneity by incorporating physical knowledge. Most existing IMU data augmentation approaches are directly borrowed from other application domains (e.g., image or text processing [45, 46, 60]) without considering and exploiting the physics of inertial sensing, which can lead to harmful results when improperly adopted. We thoroughly study a variety of IMU data augmentation methods and classify them into three categories based on their relations with the underlying physical processes: complete, which fully aligns with physics; approximate, which captures the underlying physics but with approximate formulations; and flaky, which is not supported by the physical process and may undermine the data distribution. Data augmentation with physical priors does not introduce extra labeling overhead and generalizes data distributions. We refer to this technique as physics-informed data augmentation, as opposed to conventional data-plane approaches that disregard the underlying physical processes.

By applying the carefully designed data augmentation approaches, this paper presents UniHAR, a universal HAR framework that extracts generalizable activity-related representations from heterogeneous IMU data. UniHAR comprises two stages as shown in Figure 1: i) self-supervised learning for feature extraction with massive unlabeled data from all users, and ii) supervised training for activity recognition with limited labeled data from the source users. Catering to the nature of different augmentation methods, UniHAR only applies complete data augmentation during the feature extraction stage to align data distributions from various user groups. On the other hand, both complete and approximate data augmentations are applied during the supervised training stage to increase data diversity for better generalization.

In practical applications, UniHAR is a configurable framework that can adapt to two scenarios, i.e., data-decentralized and data-centralized scenarios. In the data-decentralized scenario, where raw data transmission is not encouraged, as illustrated in Figure 1, UniHAR integrates self-supervised and federated learning techniques to train a generalized feature extraction model. UniHAR then constructs an activity recognition model using limited but augmented labeled data. The recognition model is distributed to all users for activity inference without additional training. In the data-centralized scenario, where raw data transmissions from target users are possible, UniHAR can further leverage adversarial training techniques for improved performance.

For experimental evaluation, different from previous works [11, 12, 15, 27, 33, 40, 47, 52, 62–64], UniHAR is fully prototyped on the mobile platform. The client is deployable on Android devices and supports real-time model training and inference with locally collected data. We conduct extensive experiments with four open datasets by transferring models across datasets, i.e., the activity recognition models are trained with activity labels from only one dataset and then applied to the other three datasets without activity labels. To the best of our knowledge, such a level of heterogeneity in the experiment settings has not been investigated in existing studies. The results show UniHAR achieves an average HAR accuracy of 78.5%, compared to <62% achieved by extending any existing solutions as alternatives. When raw data transmissions are allowed in the data-centralized scenario, UniHAR achieves 82.5% average accuracy, compared to <72% achieved with state-of-the-art solutions. The key contributions of this paper are summarized as follows:

• We consider a practical and challenging HAR scenario, where models trained from a small group of source users are adopted across massive target users with realistic adoption overhead.
• We present a thorough and comprehensive analysis of IMU data augmentation methodology and characterize physics-informed data augmentation based on the underlying physics of IMU sensing.
• We identify a novel approach that organically integrates different data augmentation methods into a self-supervised learning framework to address data heterogeneity.
• We fully prototype UniHAR on the standard mobile platform and evaluate its generalization with practical experiment settings across different datasets. The source code is publicly available¹.

¹ https://dapowan.github.io/wands_unihar/
Figure 2: Impact of real-world diversities. (a) Across-dataset HAR: the performance of domain adaptation models (HDCNN, XHAR) transferred from the UCI (U) to other datasets (H, M, S) in Table 2. (b) Data visualization: solid dots and circles represent samples of the UCI and MotionSense datasets, respectively.

Figure 3: Performance of different data augmentation methods. The "R", "N", "F" and "P" denote rotation, noising, flipping, and permutation. Two letters represent the combination of two methods and "ALL" is the combination of all methods.

2 MOTIVATION

2.1 Data Heterogeneity
Different users, devices, placements, and environments cause data diversity for body-worn IMU-based HAR applications [16, 50, 61, 62]. Most existing works [11, 12, 15, 27, 40, 47, 52, 62–64] overlook the heterogeneity problem and would underperform in practice. Only a few domain adaptation-based works [4, 17, 37, 65] aim at mitigating the impact of certain aspects of diversity. To investigate how those approaches perform with such data heterogeneity, we adopt HDCNN [17] and XHAR [65] to distinguish activity types and examine their performance across datasets. We choose two open datasets (i.e., UCI [38] and MotionSense [31]), with details provided in Section 7, which are collected with different user groups, placements, devices, and environments. The two models are trained to transfer from the UCI dataset to the other three datasets. The detailed settings are provided in Section 7.1. As shown in Figure 2(a), the two models can handle the diversity in the original dataset and achieve nearly 100% classification accuracy. However, when applied across datasets, they suffer from significant performance degradation. Similarly, ASTTL as reported in [37] only yields an average accuracy of 66.3% when transferred across datasets. In summary, there exists a gap in addressing data heterogeneity when adopting HAR in practice.

To investigate why the models do not perform well in this experiment, we select the common activity types of the two datasets and visualize the raw IMU readings with t-distributed Stochastic Neighbor Embedding (t-SNE) [56] in 2D space. The raw data have the same sampling rate and window size. Figure 2(b) clearly suggests that the IMU data of the same activity type are totally mismatched between the two datasets. Existing works [17, 65] may fail to handle the significant data distribution gap and thus cannot achieve satisfactory performance in cross-dataset HAR.

2.2 IMU Data Augmentation
Targeted at improving the diversity and size of training data to prevent overfitting, data augmentation has been a commonly employed technique in various application domains [45, 46, 59, 60]. Prior studies [4, 36, 40, 52, 55, 57] directly borrow this technique and apply many augmentation methods to IMU data for improved performance, e.g., adding random noise, rotation, flipping, etc. However, it remains unclear how effective these methods from other application domains are in handling IMU data heterogeneity. To this end, we apply some classical data augmentation methods to augment data from the UCI dataset as an example. We then train two widely adopted deep learning classifiers, i.e., the GRU [62] and CNN [63] classifiers, with the augmented data and test their performance on the MotionSense dataset. Our results presented in Figure 3 indicate that many data augmentation methods do not improve the cross-dataset HAR performance of the two classifiers. Some methods, such as noising (N) and flipping (F), even negatively impact the end performance. These findings show that many IMU data augmentation methods may not effectively increase IMU data diversity or prevent trained models from overfitting.

As our experimental results suggest, although data augmentation for images or text has been well established [45, 46, 59, 60], a blind adoption of those methods may not work for IMU data. This is because IMU sensor readings are observations of the underlying physical states of device movement, e.g., the device orientation. Conventional data augmentation like flipping does not consider IMU sensing physics, and directly applying it to IMU data may generate readings that do not adhere to the underlying sensing principles. Such unconstrained data generation may lead to biased or even wrong data distributions and, as a result, degrade the performance of trained models. It remains a challenge to design effective IMU data augmentation approaches and appropriately adopt them to improve model performance.
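The classical augmentation methods compared in Figure 3 amount to simple array transforms on a raw IMU window. Below is a minimal NumPy sketch of three of them: noising (N), flipping (F), and permutation (P). The function names and the (M readings × F channels) window layout are our assumptions for illustration, not code from the paper:

```python
import numpy as np

def noising(x, sigma=0.05, rng=None):
    """'N': add Gaussian noise to every reading."""
    rng = rng or np.random.default_rng(0)
    return x + rng.normal(0.0, sigma, x.shape)

def flipping(x):
    """'F': negate the signal -- ignores IMU sensing physics."""
    return -x

def permutation(x, n_segments=4, rng=None):
    """'P': slice the window into segments and shuffle their order."""
    rng = rng or np.random.default_rng(0)
    segments = np.array_split(x, n_segments, axis=0)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order], axis=0)

# One accelerometer+gyroscope window: 120 readings x 6 channels.
window = np.ones((120, 6))
for aug in (noising, flipping, permutation):
    assert aug(window).shape == window.shape
```

Each transform preserves the window shape, which is why such methods are easy to bolt onto a training pipeline even when, as the results above show, they distort the physics of the readings.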
The virtual IMU technique [19, 20] aims at converting videos of human activity into virtual streams of IMU measurements to augment the training data, which follows legitimate physical processes. Virtual IMU, however, requires additional sensing information, including activity videos and the on-body position of the device, to reconstruct the physical states of devices and thus generate virtual IMU data. It cannot be generally applied when the additional camera sensing modality is not available.

2.3 Experiment Setting
We also notice that the experiment settings of most existing studies [4, 5, 11–13, 15, 17, 27, 32, 41, 42, 47, 50, 52, 62–65] are still not practical: the proposed HAR models are primarily evaluated with single IMU datasets. Although some works [4, 40, 41, 52, 62] employ multiple datasets, models are evaluated on individual datasets separately and no cross-dataset evaluation is presented. The data from a single dataset can be highly biased in various ways, e.g., different users and devices, on-body positions, and environments. As suggested by the results in Figure 2(a), evaluation results on a single dataset may not generalize across variable datasets, which however is essential for practical adoption. In this paper, all identified approaches are systematically evaluated by transferring from one dataset to multiple other datasets in order to investigate their generalizability, which to the best of our knowledge is the first time.

3 UNIHAR OVERVIEW

3.1 Problem Definition
We consider an HAR framework consisting of a cloud server and a number of clients (users). As shown in Figure 1, each client has a local IMU dataset collected by single or multiple mobile devices. The cloud server has some initial datasets shared by a small group of clients (the source users in Figure 1), which are defined as the source domain D_S = {X_i^s}_{i=1}^{n_s}. The local datasets of the other clients (the target users) are defined as the target domain D_T = {X_i^t}_{i=1}^{n_t}. Each X_i^s or X_i^t ∈ R^{F×M} represents one IMU sample, where F is the number of sensor features and M is the number of IMU readings. Only a small fraction of D_S is annotated with activity labels, denoted as D_L = {X_i^l, y_i^l}_{i=1}^{n_l}. D_L may be biased to a limited number of combinations of {device, placement, user, environment}. Mobile clients can communicate with the cloud server and exchange necessary information (e.g., trained models). The objective of the framework is to achieve high activity recognition accuracy for the clients in the target domain.

3.2 Overview
Figure 4: UniHAR overview.

As depicted in Figure 4, UniHAR has two training stages:
■ Feature Extraction. All local unlabeled datasets are first augmented to align the distributions of heterogeneous data from various clients. To construct a generalized feature extractor (i.e., the encoder), the cloud server collaborates with all mobile clients to exploit the massive augmented unlabeled data. The encoder and decoder are trained on clients individually, which learn high-level features using self-supervised learning techniques. The cloud server combines the local models and obtains a generalized model. In a nutshell, the whole process aims at solving the following problem:

    w* = arg min_w ℓ_r(w; D_S, D_T),    (1)

where ℓ_r denotes the loss function and w denotes the weights of the encoder and decoder.
■ Activity Recognition. Based on the generalized encoder, the server then adopts a small amount of labeled data from the source users and trains an activity recognition model. Data augmentation is also integrated to enrich the diversity of labeled data and narrow the distribution gap between the source and target domains. The activity recognizer, including the encoder, refiner, and classifier, jointly learns to recognize activity types of labeled IMU data. The training process can be represented by

    c* = arg min_c ℓ_c(w*, c; D_L),    (2)

where ℓ_c denotes the loss function and c represents the weights of the activity recognizer. After the server dispatches the recognizer, each client utilizes it to classify activities without additional training.

4 PHYSICS-INFORMED DATA AUGMENTATION
To mitigate the data heterogeneity, UniHAR enriches the IMU data diversity based on physical knowledge and assists the learning of generalizable features.
Figure 5: Physics-informed IMU data augmentation.

4.1 Physical Sensing Models
UniHAR augments both accelerometer and gyroscope sensor readings, which provide more modality information for HAR [33, 62]. The measured acceleration is defined as

    a = S_a(q, l, g) = q* ⊗ (l + g) ⊗ q,    (3)

where l and g denote the acceleration caused by the movement of the device and gravity in the global frame, respectively. The unit quaternion q represents the orientation of the device, i.e., the rotation from the global frame to the local (body) frame. q* is the conjugate of q and ⊗ is the Hamilton product. The acceleration reading a is thus the sum of l and g rotated into the local frame. The gyroscope measures the angular velocity ω, which can be used to derive the change of orientation q by the following formula²:

    q_t = q_{t−1} + (Δt/2) q_{t−1} ⊗ ω_t,    (4)

where Δt is a small value, e.g., 0.01 s, denoting the sampling interval between q_t and q_{t−1}. By transforming Equation (4), the sensing model of angular velocity is

    ω_t = S_ω(q, Δt) = (2/Δt) q*_{t−1} ⊗ (q_t − q_{t−1}).    (5)

² Note that Equation (4) is an approximation formula, but studies [30, 39, 48] have shown that it works well in practice when Δt is small.

4.2 Data Augmentation Model
In this paper, we propose a general model for IMU data augmentation as indicated in Figure 5. The q, l, g, and Δt are underlying physical states of the device, and the sensor readings a and ω are observations of the physical states:

    (q, l, g) —S_a→ a,    (q, Δt) —S_ω→ ω,    (6)

where S_a and S_ω denote the accelerometer and gyroscope sensing models, respectively. In practice, q and l are typically unknown (dashed circles in Figure 5) while the other physical states and the observations are known (solid circles in Figure 5). A data augmentation is a mapping F(·) that transforms observations from the original space to the augmented space:

    (a, ω) —F→ (a′, ω′).    (7)

We introduce the concept of a physical embedding G(·) to align the mapping F(·) between observations with the underlying physical principles, which is defined as follows:

Definition 1. G(·) is a physical embedding of F(·) if G(·) transforms physical states by

    (q, l, g, Δt) —G→ (q′, l′, g′, Δt′),    (8)

such that the observations from the transformed physical states equal the augmented observations of F(·):

    (q′, l′, g′) —S_a→ a′,    (q′, Δt′) —S_ω→ ω′.    (9)

In practice, a mapping F(·) that has a physical embedding G(·) indicates that the transition of readings can take place through a physical process in reality. In this paper, we thus define three types of data augmentation based on the above mathematical model:

• Complete data augmentation, where the mapping F(·) is connected with a physical embedding G(·), and F(·) can be fully formulated with original observations and known physical states.
• Approximate data augmentation, where the mapping F(·) is connected with a physical embedding G(·), but F(·) involves unknown physical states and can be approximated by a formulation of known states.
• Flaky data augmentation, where we cannot find a physical embedding G(·) to support the mapping F(·).

In this paper, we refer to complete and approximate data augmentations as physics-informed data augmentations, as both have the underlying support of physical embeddings. In the following, we characterize each of the three data augmentation types, based on which we develop more effective augmentation adoption strategies that are grounded in the physical principles of IMU sensing.

4.2.1 Complete data augmentation. This type of data augmentation accurately generates augmented observations using the known physical states and original observations. We elaborate on a few instances.

Acceleration normalization. Accelerometer and gyroscope readings usually have different distributions. The range difference may affect the performance of deep learning models [62]. A simple method is to narrow the difference by normalizing accelerometer readings with the gravity (9.81 m/s²), i.e., a′ = F(a) = a/‖g‖. There exists a physical embedding l′ = G(l) = l/‖g‖, g′ = G(g) = g/‖g‖. The other physical states, including orientation q and time interval Δt, remain the same. The acceleration of the transformed physical states is

    S_a(q′, l′, g′) = q′* ⊗ (l′ + g′) ⊗ q′ = q* ⊗ ((l + g)/‖g‖) ⊗ q = a/‖g‖ = a′.    (10)

F(a) only involves the known physical state g, so acceleration normalization is a complete data augmentation.

Local rotation. The placement diversity causes significant differences in the triaxial distributions of IMU data. To simulate IMU data collected from different device orientations, this augmentation applies an extra rotation to the device and augments the orientation in the local frame by q′ = G(q) = q ⊗ Δq, where Δq is a generated and known rotation. The observations of the transformed physical states are

    S_a(q′, l′, g′) = (q ⊗ Δq)* ⊗ (l′ + g′) ⊗ (q ⊗ Δq)
                   = Δq* ⊗ q* ⊗ (l + g) ⊗ q ⊗ Δq
                   = Δq* ⊗ a ⊗ Δq,    (11)

    S_ω(q′, Δt′) = (2/Δt) (q_{t−1} ⊗ Δq)* ⊗ (q_t ⊗ Δq − q_{t−1} ⊗ Δq)
                 = (2/Δt) Δq* ⊗ q*_{t−1} ⊗ (q_t − q_{t−1}) ⊗ Δq
                 = Δq* ⊗ ω ⊗ Δq.    (12)

The F(·) can thus be designed as a′ = F(a) = Δq* ⊗ a ⊗ Δq and ω′ = F(ω) = Δq* ⊗ ω ⊗ Δq. The augmented observations can be derived from the original observations and the known Δq, so local rotation is a complete data augmentation. Local rotation significantly diversifies the triaxial distributions of the original readings while maintaining other human motion information, e.g., the magnitude and the fluctuation pattern.

Dense sampling. Existing studies [5, 17, 40, 50, 62, 64, 65] simply divide IMU readings using low overlapping rates (e.g., zero or 50% overlapping) and as a result underutilize the data. To fully use the collected IMU data, higher overlapping rates with dense sampling may be adopted. The rationale is that most daily activities are periodic, which means any time can be viewed as the start of the motion. Dense sampling shifts observations along the temporal dimension by a′_t = F(a) = a_{t+n}, ω′_t = F(ω) = ω_{t+n}, where n is a random value. The augmented observations are partitioned with a fixed window and then fed into HAR models for training, which enlarges the number of training samples with existing sensor readings. Its physical embedding shifts the physical states by n accordingly, e.g., l′_t = G(l) = l_{t+n}. The F(·) does not require unknown physical states, so dense sampling is also a complete data augmentation.

4.2.2 Approximate data augmentation. This type of data augmentation has a physical embedding, but the augmented observations depend on the approximation of original observations or known physical states.

Linear upsampling. IMU data are discrete signals and upsampling can enrich data samples. Linear upsampling interpolates physical states by q′_t̃ = G(q) = α q_t + (1 − α) q_{t−1} and l′_t̃ = G(l) = α l_t + (1 − α) l_{t−1}, where α is a value within [0, 1] and t̃ = α t + (1 − α)(t − Δt) = t − (1 − α)Δt. The corresponding augmented observations are

    S_a(q′, l′, g′) = q′*_t̃ ⊗ (l′_t̃ + g) ⊗ q′_t̃,    (13)

    S_ω(q′, Δt′) = (2/Δt) q′*_{t̃−1} ⊗ (q′_t̃ − q′_{t̃−1}),    (14)

both involving the unknown physical states q and l. Linear upsampling is thus an approximate data augmentation, and its augmented observations can be approximated as

    a′ = S_a(q′, l′, g′) = q′*_t̃ ⊗ (α l_t + (1 − α) l_{t−1} + g) ⊗ q′_t̃
       = α q′*_t̃ ⊗ (l_t + g) ⊗ q′_t̃ + (1 − α) q′*_t̃ ⊗ (l_{t−1} + g) ⊗ q′_t̃
       ≈ α q*_t ⊗ (l_t + g) ⊗ q_t + (1 − α) q*_{t−1} ⊗ (l_{t−1} + g) ⊗ q_{t−1}
       = α a_t + (1 − α) a_{t−1},    (15)

    ω′ = S_ω(q′, Δt) = (2/Δt) q′*_{t̃−1} ⊗ (α(q_t − q_{t−1}) + (1 − α)(q_{t−1} − q_{t−2}))
       ≈ (2α/Δt) q*_{t−1} ⊗ (q_t − q_{t−1}) + (2(1 − α)/Δt) q*_{t−2} ⊗ (q_{t−1} − q_{t−2})
       = α ω_t + (1 − α) ω_{t−1},    (16)

where q′_t̃ approximately equals q_{t−1} or q_t when Δt is small. Linear upsampling enlarges the size of the data but introduces approximation errors.

Time warping. The same type of activity may vary in duration across users due to distinct behavioral patterns. To mitigate this temporal divergence, time warping accelerates or decelerates the changes of physical states in the temporal dimension, e.g., q′_t = G(q) = q_{k·t}, where k is a scaling factor usually chosen within [0.8, 1.2]. The augmented observations are accordingly stretched in the temporal dimension, e.g., a′_t = F(a) = a_{k·t}. To facilitate such a transformation, time warping adopts linear upsampling to obtain continuous observations. Therefore, time warping is also an approximate data augmentation, which enhances temporal diversity but also carries approximation errors.
Practically Adopting Human Activity Recognition ACM MobiCom ’23, October 2–6, 2023, Madrid, Spain

(as illustrated in the bottom of Figure 6). Since supervised


training with labels is more robust to errors [3], a wider
range of data augmentation methods can be integrated and
UniHAR applies both complete and approximate data aug-
mentation to augment labeled data in this stage.
Flaky data augmentations are prohibited throughout the
entire training process because they may lead to completely
wrong data distributions. We thoroughly investigate the ef-
fect of data augmentation methods with experiments and
show how their varied usage can either improve or deterio-
rate model performance in Section 7.4.

rearranges the channels of sensor observations to change the triaxial distribution. These methods may not be associated with the underlying physical principles.

Jittering [36, 40, 52, 55], which adds random noise to the original observations, is a special flaky data augmentation. It aims at augmenting the sensor model by introducing sensor noise. However, the applied noise distributions may not match the true distributions, which vary across different devices and are hard to determine [50].

Flaky data augmentations only operate on IMU observations and are not explainable with the underlying physical process. Adopting them may lead to unbounded errors in the generated data distributions.

4.3 Data Augmentation Adoption
UniHAR incorporates the physics-informed data augmentation methods differently during the two stages of the framework based on their respective characteristics, and Figure 6 explains the rationale.

Figure 6: Adoption of data augmentations.

During the feature extraction stage, although unlabeled data are abundant, they come from different users, devices, and environments, and are thus subject to significant domain shift. The purpose of incorporating data augmentation is to generalize the data distributions and improve the inter-domain data representativeness (as illustrated in the top of Figure 6). On the other hand, it is challenging to control the data quality when approximation errors are introduced at scale. Therefore, UniHAR applies only complete data augmentations to unlabeled data in this stage.

During the activity recognition stage, labeled data from the source domain are utilized but they are scarce. In addition to aligning the data distributions across domains, the data augmentation is also expected to enrich the source domain labels and improve the intra-domain data representativeness.

5 UNIHAR ADOPTION
Putting UniHAR to practical adoption, we consider two application scenarios and make further optimizations for improved performance.

5.1 Data-decentralized Scenario
In the data-decentralized scenario, raw data transmission from the target users to the cloud server is assumed to be disallowed due to practical constraints, e.g., prohibitive processing and transmission overheads or privacy concerns.

5.1.1 Feature extraction. To extract effective features from local unlabeled datasets, UniHAR adopts self-supervised learning to train the encoder and decoder. There are several state-of-the-art self-supervised representation models [12, 40, 52, 62] for IMU data. However, many methods [12, 40, 52] are entangled with data augmentation, and flaky data augmentation methods are integrated, which we believe may harm the end performance. We identify LIMU-BERT [62] as an effective foundation model for IMU-based sensing and embed it into our design to build the representation model. LIMU-BERT is orthogonal to the data augmentation employed in UniHAR.

We employ acceleration normalization and local rotation for data augmentation, both being complete data augmentations. As shown in Figure 4, the encoder and decoder jointly predict the original values of the randomly masked IMU readings. Through this reconstruction task, the encoder learns the underlying relations among IMU data and extracts effective features. The Mean Square Error (MSE) loss is used to compute the differences between the original and predicted values, which is defined as follows:

$\ell_{rec}(w; \boldsymbol{X}) = \frac{1}{|\boldsymbol{X}|}\sum_{i=1}^{|\boldsymbol{X}|}\sum_{j \in M^{[i]}} \mathrm{MSE}\big(\hat{\boldsymbol{X}}^{[i]}_{\cdot j} - \boldsymbol{X}^{[i]}_{\cdot j}\big),$ (17)

where $w$ denotes the model weights of the encoder and decoder, and $M^{[i]}$ represents the set of position indices of the masked readings for the $i$-th IMU sample $\boldsymbol{X}^{[i]}$. $\hat{\boldsymbol{X}}^{[i]} \in \mathbb{R}^{F \times m}$ denotes the predicted data as shown in Figure 4.
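The masked-reconstruction objective of Equation (17) can be sketched in a few lines. The tensor shapes and the Python-level loop below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def masked_reconstruction_loss(X_hat, X, masks):
    """Sketch of Eq. (17): average over samples of the summed MSE
    between predicted and original readings at the masked positions.

    X_hat, X : (N, L, F) arrays -- N samples, L readings per window,
               F channels per reading (shapes are our assumption).
    masks    : masks[i] is the index set M^[i] of masked reading
               positions for the i-th sample.
    """
    total = 0.0
    for i, m in enumerate(masks):
        diff = X_hat[i][m] - X[i][m]           # errors at masked readings only
        per_pos = np.mean(diff ** 2, axis=-1)  # MSE of each masked reading
        total += per_pos.sum()                 # sum over j in M^[i]
    return total / len(masks)                  # 1/|X| average over samples

# toy check: perfect reconstruction at the masked positions gives zero loss
X = np.random.randn(4, 20, 6)
masks = [np.array([1, 5, 9])] * 4
assert masked_reconstruction_loss(X.copy(), X, masks) == 0.0
```

Since the loss only touches the masked positions, the encoder can lower it only by inferring those readings from their unmasked context, which is what drives it to learn the relations among IMU data.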
ACM MobiCom ’23, October 2–6, 2023, Madrid, Spain Huatao Xu, Pengfei Zhou, Rui Tan, Mo Li

To avoid the transmission of raw data, UniHAR integrates a federated learning structure to collaborate with all mobile clients and train a more generalized feature extraction model. UniHAR aggregates local models from all clients and obtains a general global model, as shown in Figure 4 (green part). In each round of training, the cloud server distributes the latest global model to the clients, which then use their individual local datasets to update the model with the $\ell_{rec}$ defined in Equation (17) for 5 epochs. There are two potential options to aggregate local models: aggregating gradients and aggregating model weights [24]. Our experiments show that the latter introduces less bias toward the clients with more samples and achieves better overall performance. Therefore, the server aggregates local models with weights defined as the ratio of the number of samples at each client to the total number of samples, which can be expressed as $w_g \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_l^k$. Here $w_g$ and $w_l^k$ denote the parameters of the global model and the $k$-th local model, respectively, and the aggregation weight $\frac{n_k}{n}$ is the ratio of the number of samples $n_k$ at the $k$-th client to the total number of samples $n$. The process repeats until the global model converges.

To optimize the training process, UniHAR initializes the encoder and decoder with the weights trained on the source domain data, such that each mobile client can fine-tune the models with fewer epochs.

5.1.2 Activity recognition. Based on the encoder trained with massive unlabeled data, the server exploits the augmented labeled data from the source users and trains an activity recognizer. Figure 4 gives the workflow of training the recognizer. In addition to acceleration normalization and local rotation, UniHAR further applies dense sampling and time warping to augment the source domain labeled data. To control the approximation errors, time warping is applied with a probability of 0.4, which is fine-tuned based on our empirical experiments. A refiner is designed to distill the representations and extract activity-specific features. The classifier is trained to recognize activity types with the refined features. The training loss is defined as follows:

$\ell_{act}(w, r, c; \boldsymbol{X}) = \frac{1}{|\boldsymbol{X}|}\sum_{i=1}^{|\boldsymbol{X}|} \mathrm{CE}\big(\hat{y}^{[i]}, y^{[i]}\big),$ (18)

where $\ell_{act}$ is defined with the Cross-Entropy (CE) loss, and $r$ and $c$ represent the weights of the refiner and classifier, respectively. $\hat{y}^{[i]}$ and $y^{[i]}$ are the estimated softmax probability and the corresponding ground truth. Note that the encoder is fine-tuned according to $\ell_{act}$ during training.

In UniHAR, the refiner contains two bi-directional Gated Recurrent Unit (GRU) layers with the same hidden size of 10 and an input size of 36. Only the hidden features at the last position are input into the classifier, which consists of a dropout layer with a drop rate of 0.5 and a fully-connected layer with 10 units.

Figure 7: Workflow for training the recognizer in the data-centralized scenario.

5.2 Data-centralized Scenario
We consider a second scenario where some target users may share their unlabeled data with the cloud server for improved HAR performance. In such a case, UniHAR is able to incorporate unsupervised learning techniques to design a more sophisticated activity recognizer and further eliminate the domain discrepancies. Specifically, UniHAR injects extra information, i.e., a domain label specifying which domain the IMU data belong to, into the activity recognizer training process using adversarial domain adaptation techniques.

Figure 7 illustrates the workflow of the activity recognition stage in the data-centralized scenario. Both the source and target domain data are augmented and then processed by the encoder and refiner. The domain classifier learns to distinguish the domains with the training loss

$\ell_{dom}(w, r, d; \boldsymbol{X}) = \frac{1}{|\boldsymbol{X}|}\sum_{i=1}^{|\boldsymbol{X}|} \mathrm{CE}\big(\hat{y}_d^{[i]}, y_d^{[i]}\big),$ (19)

where $\hat{y}_d^{[i]}$ and $y_d^{[i]}$ denote the predicted probability and the actual domain label, respectively. The weights of the domain classifier $d$ are updated with $\ell_{dom}$, while the encoder, refiner, and activity classifier are trained with the mixed loss:

$\ell_{mix}(w, r, c; \boldsymbol{X}) = \ell_{act}(w, r, c; \boldsymbol{X}) - \alpha \cdot \ell_{dom}(w, r, d; \boldsymbol{X}),$ (20)

where $\alpha$ is a weight set to 0.6. By minimizing $\ell_{mix}$, the encoder and refiner are trained against the domain classifier and thus capture domain-independent features, which further mitigates the data heterogeneity issue.

The domain classifier contains two fully-connected layers with a hidden size of 72 and an output size of 2, respectively. The first fully-connected layer is followed by a Rectified Linear Unit (ReLU) [2] activation layer.
6 IMPLEMENTATION AND SYSTEM EVALUATION

Figure 8: UniHAR implementation.

UniHAR is fully prototyped on the mobile platform, and Figure 8 illustrates the system designs for the mobile client and the cloud server. The mobile client is an Android application that supports real-time data collection as well as model training, inference, and sharing. The sensor collector accesses the IMU sensors and implicitly saves readings as files. The model manager, implemented with TensorFlow Lite [1], is then activated to train the autoencoder (including the encoder and decoder) with unlabeled data. The server aggregates all client encoders in the encoder queue and trains the recognizer with labeled data. All components are guided by the configuration data (e.g., sampling rate and batch size) set by the admin website. To avoid affecting the daily use of other applications, the model manager is triggered only when the smartphone is not actively used and is being charged.

The input sensor data are accelerometer and gyroscope readings down-sampled to 20 Hz. The input window contains 20 readings. UniHAR defines the encoder and decoder [62] with $R_{num} = 1$, $A_{dim} = 4$, $H_{dim} = 36$, and $F_{dim} = 72$. The two stages adopt the same learning rate and batch size, which are 0.001 and 64, respectively. The recognizer is trained for 500 epochs. The model weights are updated with the Adam [18] optimizer. To evaluate the system overhead, we install the client on a Samsung Galaxy S8 SM-G9500 (octa-core CPU and 4 GB RAM). The cloud server is deployed on a computer equipped with an Intel(R) Core(TM) i9-9820X 3.30 GHz CPU, 128 GB memory, and four NVIDIA GeForce 2080Ti GPUs.

Table 1: System Overhead.

                        Autoencoder     Recognizer
Latency   Client        train: 66 ms    infer: 25 ms
          Server        aggre.: /       train: 15 ms
Size                    62.6 KB         68.5 KB
Client    CPU           13%             11%
          Memory        102 MB          93 MB
          Energy        light           light

Latency. Table 1 shows the latency of the autoencoder and recognizer on the two platforms. The smartphone requires 66 ms and 25 ms to train the autoencoder and run recognizer inference for one batch of samples, respectively. The first-time training and inference may take a longer time, i.e., 2.5 s and 0.5 s, respectively, which may be attributed to the model file initialization process. The training time of the recognizer is about 15 ms per batch on the server, and the aggregation time depends on the number of autoencoders shared by clients.

Communication overhead. The models are first initiated with TensorFlow Lite files on the client. The model weights are exchanged in the format of TensorFlow checkpoint files. The total communication overhead of the federated training process with 100 rounds for each client is about 18.9 MB (100 × 2 × 62.6 KB + 100 × 68.5 KB), which is easily affordable with today's 4G/5G data bundles.

Computational overhead and energy consumption. The Android Profiler [6] indicates that autoencoder training and recognizer inference cause about 10% CPU load and require about 100 MB of memory on the Samsung Galaxy S8. The energy usages are both below the "light" level defined by the Android Profiler.

In summary, UniHAR introduces low system overhead to mobile devices and the cloud server.

7 COMPARATIVE EVALUATION
To ensure direct and fair comparisons between UniHAR and a variety of existing works with open datasets, we re-build UniHAR and implement baseline models with PyTorch [35] and emulate mobile clients that hold offline local datasets.

7.1 Experiment Setup
7.1.1 Datasets. We evaluate UniHAR with four publicly available datasets, which have been widely used in previous studies [40, 62–64]. These datasets cover a wide variety of {user, device, placement, environment} combinations.
■ HHAR [50] contains accelerometer and gyroscope readings from 9 users performing 6 different activities (sitting, standing, walking, upstairs, downstairs, and biking) with 6 types of mobile phones (3 models of Samsung Galaxy and one model of LG). All smartphones were carried by the users around their waists. The dataset was collected in Denmark.
■ UCI [38] has raw accelerometer and gyroscope data from 30 volunteers aged from 19 to 48 years from Italy. The readings of 6 basic activities (standing, sitting, lying, walking, walking downstairs, and walking upstairs) were collected at
50 Hz with a Samsung Galaxy S II carried on the waist.
■ MotionSense [31] (abbreviated as Motion in our paper) adopted an iPhone 6s to gather accelerometer and gyroscope time-series data. 24 participants from the UK performed 6 activities (sitting, standing, walking, upstairs, downstairs, and jogging) with the device stored in their front pockets. All data were collected at a 50 Hz sampling rate.
■ Shoaib [44] et al. collected data of seven daily activities (sitting, standing, walking, walking upstairs, walking downstairs, jogging, and biking) in the Netherlands. The 10 male participants were equipped with five Samsung Galaxy SII (i9100) smartphones placed at five on-body positions (right pocket, left pocket, belt, upper arm, and wrist). The IMU readings were collected at 50 Hz.

To demonstrate the effectiveness of UniHAR across diverse datasets, we select four common activities (i.e., still, walk, walk upstairs, and walk downstairs) contained in all four open datasets. For the still activity, we merge several similar activities, for example, sit and stand in the HHAR dataset, into still. In addition to the diversity of {user, device, placement, environment}, the merged dataset has general label definitions (i.e., still). The distributions of the four activities also differ across the four datasets.

7.1.2 Baseline models. We consider both data-decentralized and data-centralized scenarios, and compare UniHAR with relevant state-of-the-art solutions in each scenario. In the data-decentralized scenario, we extend the three most relevant approaches as baselines:
■ DCNN [63] designs a CNN-based HAR model that outperforms many traditional methods. It assumes labeled data are abundant and adequately representative.
■ TPN [40] learns features from unlabeled data by recognizing the applied data augmentations. It only requires limited data for training, but with an implicit assumption that the data are not biased.
■ LIMU-GRU [62] learns representations with the self-supervised autoencoder LIMU-BERT. It uses limited labeled data and assumes unbiased data distributions.
In the data-centralized scenario, we compare UniHAR with three existing unsupervised domain adaptation approaches.
■ HDCNN [17] handles domain shift by minimizing the Kullback-Leibler divergence between the fully-labeled source domain features and unlabeled target domain features.
■ FM [4] minimizes the feature distance across domains by maximum mean discrepancy [54]. It requires full supervision with adequate labeled data from the source domain.
■ XHAR [65] is an adversarial domain adaptation model and needs to select the source domain before adapting models to the unlabeled target domain.
SelfHAR [52] and ASTTL [37] are not compared because SelfHAR inherits the training scheme from TPN, which is already selected as one of the baselines, and the performance of ASTTL as originally reported in [37] is poor.

Table 2: Cross-dataset transfer setup.

Case   Source Domain (with labels)   Target Domain (without labels)
1      HHAR                          UCI, Motion, Shoaib
2      UCI                           HHAR, Motion, Shoaib
3      Motion                        HHAR, UCI, Shoaib
4      Shoaib                        HHAR, UCI, Motion

7.1.3 Cross-dataset evaluation. To demonstrate the generalizability of UniHAR, we design four cross-dataset evaluation cases, and Table 2 lists each of them; e.g., in case 1, UniHAR transfers models from the HHAR dataset to the other three datasets without activity labels. The clients in the source domain share a small portion of labeled data with the cloud server, while the clients in the target domain only contribute local unlabeled data. Each mobile client has a local dataset containing the IMU data collected from the same user, and our setup has a total of 73 clients. Each local dataset is partitioned into training (80%), validation (10%), and test (10%) sets. The training sets of all clients participate in the federated training process of the encoder and decoder. To train the recognizer, we randomly select a small portion of labeled samples (i.e., 1,000) from the training sets in the source domain. The validation sets are utilized to select models (e.g., the encoder and classifier), and the trained recognizers are evaluated on the test sets in the target domain.

7.1.4 Metrics. We compare the performance of HAR models with the average accuracy and F1-score over all users in the target domain, defined as $a = \frac{1}{|\mathcal{D}_t|}\sum_{i \in \mathcal{D}_t} a_i$ and $f = \frac{1}{|\mathcal{D}_t|}\sum_{i \in \mathcal{D}_t} f_i$, where $a_i$ and $f_i$ are the activity classification accuracy and F1-score of the $i$-th user, respectively, and $\mathcal{D}_t$ is the set of target-domain users.

7.2 Overall Performance
Table 3 compares the performance of UniHAR and the baseline models in the data-decentralized and data-centralized scenarios. The two numbers in each cell denote average accuracy and F1-score, respectively. The row of UniHAR-A gives the performance of the UniHAR recognizer trained with centralized target user data. In the data-decentralized scenario, DCNN, TPN, and LIMU-GRU achieve poor performance mainly because they overlook the data diversity among the source and target domain users. In contrast, UniHAR achieves 78.5% average accuracy and 67.1% F1-score, which outperforms the best of the three baselines by at least 15%. For the data-centralized scenario, UniHAR-A also yields better accuracies and F1-scores than HDCNN, FM, and XHAR in most cases. Although XHAR delivers the
Table 3: Performance comparison. (The two numbers in each cell are accuracy and F1-score.)

Scenario             Model      Case 1         Case 2         Case 3         Case 4         Average
Data Decentralized   DCNN       0.594, 0.438   0.583, 0.373   0.628, 0.437   0.668, 0.465   0.618, 0.428
                     TPN        0.584, 0.361   0.530, 0.281   0.541, 0.302   0.601, 0.350   0.564, 0.324
                     LIMU-GRU   0.306, 0.174   0.435, 0.178   0.353, 0.248   0.497, 0.337   0.398, 0.234
                     UniHAR     0.757, 0.611   0.785, 0.667   0.789, 0.704   0.810, 0.702   0.785, 0.671
Data Centralized     HDCNN      0.557, 0.439   0.515, 0.233   0.487, 0.293   0.518, 0.376   0.524, 0.335
                     FM         0.386, 0.250   0.757, 0.507   0.410, 0.273   0.564, 0.369   0.539, 0.350
                     XHAR       0.648, 0.430   0.615, 0.433   0.733, 0.566   0.879, 0.777   0.719, 0.552
                     UniHAR-A   0.805, 0.674   0.833, 0.708   0.819, 0.731   0.824, 0.723   0.820, 0.709
Figure 9: Accuracies with different numbers of labels. (a) Data-decentralized. (b) Data-centralized.

Figure 10: Effect of data augmentations. (a) Data-decentralized. (b) Data-centralized.
best performance for case 4, it is shown to be sensitive to the source domain dataset and falls much lower in the other cases. UniHAR-A, however, achieves accuracies consistently higher than 80.0% across all cases and on average outperforms the best of the three by 10% in terms of accuracy and 15% in terms of F1-score. In summary, the results demonstrate the outstanding performance of UniHAR(-A), credited to the effective data augmentations and feature extraction.

7.3 Impact of Labeled Sample Size
We then investigate how UniHAR and the baseline models perform with different amounts of labeled samples. We vary the number of labeled samples from 50 to 1,400. Figure 9 plots the average accuracies achieved in the data-decentralized and data-centralized scenarios, respectively. The error bar represents the standard deviation of the accuracies over the four cross-dataset evaluation cases. The results suggest that UniHAR outperforms the other models in all cases by at least 10% in accuracy (and up to 20% in the data-decentralized scenario). Since HDCNN, TPN, and LIMU-GRU are prone to overfitting to the source domain, their performances on the target domain are not strongly related to the number of labeled samples from the source domain. On the other hand, the models in the data-centralized scenario can achieve higher accuracies when more labeled data are employed. UniHAR-A consistently outperforms the other two models in the data-centralized scenario. UniHAR(-A) is able to achieve average accuracies of 71.0% and 72.1% in the two scenarios when only 50 labeled samples are used. The experiment results also suggest a more robust performance of UniHAR(-A).

7.4 Effect of Data Augmentation
We devise an ablation study to evaluate how effective the different data augmentation methods are in supporting the training objective. Figure 10 compares the performance of UniHAR and the baseline models in both the data-decentralized and data-centralized scenarios. It is shown that the accuracies of all models increase when integrated with the proposed data augmentation methods. Although UniHAR(-A) may deliver lower accuracy when data augmentation is not employed, its training architecture fully exploits the potential of data augmentation within the whole learning framework and eventually outperforms the best baseline models by 10% in both scenarios.
Figure 11: Impact of data augmentation combinations. (a) Feature extraction. (b) Activity recognition.

Figure 12: Accuracies of pretraining approaches. (a) Data-decentralized. (b) Data-centralized.

We then examine how effective the proposed way of integrating complete and approximate data augmentation in UniHAR is. We compare three choices: i) using only complete data augmentation, ii) using both complete and approximate data augmentation, and iii) using all, including flaky data augmentations, during both the feature extraction and activity recognition stages. Figure 11 plots the achieved accuracies in the data-decentralized scenario. Results show that if approximate data augmentation is adopted in the feature extraction stage, it may lead to an accuracy drop of 2.0% because of the approximation errors introduced to massive unlabeled data. On the other hand, applying approximate data augmentation in the activity recognition stage can further enrich data diversity and thus increase accuracy by 1.9%. Flaky data augmentation, however, introduces a negative impact in almost all cases when applied in either stage. The UniHAR performance drops by 5.7% on average when the jittering and permutation data augmentations are applied. Our experiments in the data-centralized scenario show similar results (omitted due to page limits).

We also investigate the detailed performance gains when different data augmentation methods are adopted. The local rotation introduces the largest gains, i.e., 19.4% and 15.3% accuracy improvements in the two scenarios. The dense sampling and time warping together improve the average accuracies by 2.6% and 2.3% in the two scenarios, respectively.

7.5 Effect of Feature Extraction
We evaluate the effectiveness of feature extraction by comparing the performance of three training approaches, i.e., without any pretraining, with self-supervised learning (only available for the data-centralized scenario), and with both self-supervised and federated learning (UniHAR pretraining). Figure 12 plots the end performance of the different pretraining approaches, and the results show that the UniHAR training approach consistently achieves better performance in the data-decentralized scenario. For the data-centralized scenario, although the self-supervised training slightly outperforms the UniHAR training in case 4, it only achieves an accuracy of 71.2% in case 2. This is because the encoder is sensitive to the number of samples [28]; the recognizer with the biased encoder degrades significantly in case 2, where the source domain dataset UCI has the fewest samples. The experiment suggests that UniHAR(-A) can achieve more robust performance with federated training. The encoder and decoder are first initialized with the source domain dataset in the feature extraction process, which gains a 1.5% improvement in average accuracy. The possible reason is that mobile clients can better adapt a well-initialized model to their local datasets [7].

Table 4: Efficiency comparison.

Model      Para.   Size      Train. Time    Infer. Time
DCNN       17 K    76 KB     4.2 ms         0.8 ms
TPN        26 K    194 KB    7.6 ms         2.2 ms
LIMU-GRU   54 K    239 KB    8.1 ms         5.5 ms
HDCNN      28 K    118 KB    3.6 ms         0.8 ms
FM         50 K    203 KB    5.8 ms         2.9 ms
XHAR       700 K   2607 KB   17.0 ms        14.2 ms
UniHAR     15 K    78 KB     8.9, 17.6 ms   3.6 ms

7.6 Model Size and Latency
Table 4 compares UniHAR with the baseline models in terms of the number of parameters, model size, training time, and inference time. The models are optimized by lite PyTorch Mobile [35]. The training time is the time the server takes to train a mini-batch (64) of samples, and the inference time is the execution time for inferring one IMU sample on the Samsung Galaxy S8. The two training times of UniHAR correspond to the models without and with the domain classifier. In summary, the model size of UniHAR is small and its training and inference times are comparable with the others.

8 RELATED WORK
Wearable-based HAR systems [9, 10, 15, 17, 27, 37, 40, 64, 65] are ubiquitous and low-cost. Conventional HAR models [15, 43, 47, 51, 58, 63] adopt deep neural networks and achieve high performance with the help of sufficient well-annotated
datasets. However, IMU data heterogeneity prevents them from achieving promising performance in practice.

Recent federated learning schemes [21–23, 34, 53] allow for distributed training without accessing raw data, but they require fully-labeled data at the target users and thus cannot be directly applied to the considered scenario.

Self-supervised learning works [14, 36, 40, 52, 57, 62] have shown effectiveness in extracting useful features from unlabeled data and thereby improving the performance of downstream HAR models. For example, the encoder models from TPN [40] and LIMU-BERT [62] may be viewed as early efforts in building "foundation" models that extract contextual features from unlabeled IMU data, with which task-specific models can achieve superior performance with limited labeled data. However, these models still require some labeled data to train HAR classifiers, which can overfit to specific domains and fail to achieve high performance for target users without any labeled data.

Unsupervised domain adaptation approaches have been introduced to HAR applications [17, 37, 65] to reduce the distribution divergence between different domains. Specifically, HDCNN [17] learns transferable features by minimizing the Kullback-Leibler divergence between the source and target domains. XHAR [65] extracts domain-independent features by adversarial training. Unfortunately, our experiments demonstrate that purely learning-based domain adaptation approaches fail to handle highly heterogeneous IMU data across domains and cannot achieve satisfactory performance in adapting models across different user groups.

Prior works [49, 55] devise a range of IMU data augmentation methods, e.g., random noising, to increase the label size and prevent overfitting to specific domains. Recent studies [36, 40, 52, 57] have further explored self-supervised learning with data augmentation techniques for leveraging unlabeled data. However, many flaky data augmentations have been adopted in those studies [36, 40, 49, 52, 55, 57], which may generate readings that do not conform to the physical sensing principles and undermine the data distributions.

Different from existing studies, UniHAR aims at building a general HAR framework, in which a representation model is first built with massive unlabeled data, and supervised training with limited labeled data is thereafter adopted to adapt the model across user domains. UniHAR specifically explores physics-informed data augmentations that align with the underlying physical process and constructively embeds them into different learning stages to improve both intra-domain and inter-domain data representativeness.

9 DISCUSSION
Impact of orientation representation. The core idea of the physics-informed data augmentation is general, and the existence of a physical embedding is independent of the orientation representation. While the quaternion is one of several possible ways to represent the device orientation, augmentation with physical embedding can be expressed with other representations. For example, we may also represent the sensing models using rotation matrices: $\boldsymbol{a} = \boldsymbol{R}^{-1}(\boldsymbol{l} + \boldsymbol{g})$, $\boldsymbol{\omega} = f_g^{-1}(\boldsymbol{R}_{t-1}^{-1}\boldsymbol{R}_t)/\Delta t$, where $\boldsymbol{R}$ is the rotation matrix representing the device orientation and $f_g^{-1}$ converts a rotation matrix to angular changes [48]. Taking local rotation as an example, its physical embedding is $\boldsymbol{R}' = \boldsymbol{R}\Delta\boldsymbol{R}$, and the augmented readings are derived as:

$\boldsymbol{a}' = (\boldsymbol{R}\Delta\boldsymbol{R})^{-1}(\boldsymbol{l} + \boldsymbol{g}) = \Delta\boldsymbol{R}^{-1}\boldsymbol{R}^{-1}(\boldsymbol{l} + \boldsymbol{g}) = \Delta\boldsymbol{R}^{-1}\boldsymbol{a},$ (21)

$\boldsymbol{\omega}' = f_g^{-1}\big((\boldsymbol{R}_{t-1}\Delta\boldsymbol{R})^{-1}\boldsymbol{R}_t\Delta\boldsymbol{R}\big)/\Delta t = f_g^{-1}\big(\Delta\boldsymbol{R}^{-1}\boldsymbol{R}_{t-1}^{-1}\boldsymbol{R}_t\Delta\boldsymbol{R}\big)/\Delta t = f_g^{-1}\big(\Delta\boldsymbol{R}^{-1}f_g(\boldsymbol{\omega}\Delta t)\Delta\boldsymbol{R}\big)/\Delta t = f_g^{-1}\big(\Delta\boldsymbol{R}^{-1}f_g(\boldsymbol{\omega})\Delta\boldsymbol{R}\big) = \Delta\boldsymbol{R}^{-1}\boldsymbol{\omega}.$ (22)

These equations yield the same results as those obtained with quaternions (Equations 11 and 12).

Frequency-domain data augmentation. Some studies [26, 36] propose data augmentations in the frequency domain. To establish the physical embedding of these data augmentations, we may accordingly perform a Fourier Transform on physical states such as the orientation. However, frequency-domain operations can potentially violate the constraints of the orientation representation after the inverse Fourier Transform. For example, a low-pass filter on orientation quaternions can lead to non-unit quaternions and a loss of their physical meaning. Further study is needed to understand their relationships with physics-informed data augmentation.

10 CONCLUSION
In this paper, we practically adopt HAR with realistic overhead for mobile devices. The proposed UniHAR framework effectively adopts physics-informed data augmentation on massive unlabeled and limited labeled IMU data to overcome the data heterogeneity across various users. UniHAR is prototyped on the mobile platform and is tested to introduce low overhead. Extensive evaluation with cross-dataset experiments demonstrates its outstanding performance compared with state-of-the-art approaches.

ACKNOWLEDGMENTS
This research is supported by the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative, and under its NRF Investigatorship (NRFI) NRF-NRFI08-2022-0010. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Research Foundation, Singapore. This research is also supported by Singapore MOE Tier 1 (RG88/22). Mo Li is the corresponding author.
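The rotation-matrix identities of Equations (21) and (22) in Section 9 are easy to verify numerically; the sketch below checks the accelerometer case of Equation (21), with the specific rotations and readings being arbitrary illustrative choices:

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis (an arbitrary choice
    used only to instantiate R and dR for the check)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R, dR = rot_z(0.3), rot_z(-1.1)  # device orientation R, local rotation dR
l = np.array([0.2, -0.5, 0.1])   # made-up linear acceleration
g = np.array([0.0, 0.0, 9.8])    # gravity

a = R.T @ (l + g)                          # original reading: a = R^-1 (l + g)
a_aug = np.linalg.inv(R @ dR) @ (l + g)    # left-hand side of Eq. (21)
assert np.allclose(a_aug, dR.T @ a)        # equals dR^-1 a, as Eq. (21) states
```

For rotation matrices the inverse equals the transpose, so the chain $(\boldsymbol{R}\Delta\boldsymbol{R})^{-1} = \Delta\boldsymbol{R}^{-1}\boldsymbol{R}^{-1}$ collapses the augmented reading to $\Delta\boldsymbol{R}^{-1}\boldsymbol{a}$ without ever leaving the space of valid orientations.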
REFERENCES
[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.
[2] Abien Fred Agarap. 2018. Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 (2018).
[3] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116, 32 (2019), 15849–15854.
[4] Youngjae Chang, Akhil Mathur, Anton Isopoussu, Junehwa Song, and Fahim Kawsar. 2020. A systematic study of unsupervised domain adaptation for robust human-activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 1 (2020), 1–30.
[5] Yiqiang Chen, Jindong Wang, Meiyu Huang, and Han Yu. 2019. Cross-position activity recognition with stratified transfer learning. Pervasive and Mobile Computing 57 (2019), 1–13.
[6] Android Developers. [n.d.]. Profile your app performance. https://developer.android.com/studio/profile
[7] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. 2020. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Advances in Neural Information Processing Systems 33 (2020), 3557–3568.
[8] Siwei Feng and Marco F Duarte. 2019. Few-shot learning-based human activity recognition. Expert Systems with Applications 138 (2019), 112782.
[9] Taesik Gong, Yeonsu Kim, Jinwoo Shin, and Sung-Ju Lee. 2019. MetaSense: few-shot adaptation to untrained conditions in deep mobile
[17] … adaptation. In 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 1–9.
[18] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[19] Hyeokhyen Kwon, Catherine Tong, Harish Haresamudram, Yan Gao, Gregory D Abowd, Nicholas D Lane, and Thomas Ploetz. 2020. IMUTube: Automatic extraction of virtual on-body accelerometry from video for human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 3 (2020), 1–29.
[20] Hyeokhyen Kwon, Bingyao Wang, Gregory D Abowd, and Thomas Plötz. 2021. Approaching the real-world: Supporting activity recognition training with virtual IMU data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 3 (2021), 1–32.
[21] Ang Li, Jingwei Sun, Pengcheng Li, Yu Pu, Hai Li, and Yiran Chen. 2021. Hermes: an efficient federated learning framework for heterogeneous mobile clients. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. 420–437.
[22] Chenglin Li, Di Niu, Bei Jiang, Xiao Zuo, and Jianming Yang. 2021. Meta-HAR: Federated representation learning for human activity recognition. In Proceedings of the Web Conference 2021. 912–922.
[23] Chenning Li, Xiao Zeng, Mi Zhang, and Zhichao Cao. 2022. PyramidFL: A fine-grained client selection framework for efficient federated learning. In Proceedings of the 28th Annual International Conference on Mobile Computing and Networking. 158–171.
[24] Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. 2019. On the convergence of FedAvg on non-IID data. In International Conference on Learning Representations.
[25] Xinyu Li, Yanyi Zhang, Ivan Marsic, Aleksandra Sarcevic, and Randall S Burd. 2016. Deep learning for RFID-based activity recognition. In Proceedings of the 14th ACM Conference on Embedded Network Sensor
sensing. In Proceedings of the 17th Conference on Embedded Networked Systems CD-ROM. 164–175.
Sensor Systems. 110–123. [26] Dongxin Liu, Tianshi Wang, Shengzhong Liu, Ruijie Wang, Shuochao
[10] Andreas Grammenos, Cecilia Mascolo, and Jon Crowcroft. 2018. You Yao, and Tarek Abdelzaher. 2021. Contrastive self-supervised represen-
are sensing, but are you biased? a user unaided sensor calibration tation learning for sensing signals from the time-frequency perspec-
approach for mobile sensing. Proceedings of the ACM on Interactive, tive. In 2021 International Conference on Computer Communications
Mobile, Wearable and Ubiquitous Technologies 2, 1 (2018), 1–26. and Networks (ICCCN). IEEE, 1–10.
[11] Harish Haresamudram, Irfan Essa, and Thomas Plötz. 2021. Contrastive [27] Shengzhong Liu, Shuochao Yao, Jinyang Li, Dongxin Liu, Tianshi
predictive coding for human activity recognition. Proceedings of the Wang, Huajie Shao, and Tarek Abdelzaher. 2020. GIobalFusion: A
ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2 Global Attentional Deep Learning Framework for Multisensor Infor-
(2021), 1–26. mation Fusion. Proceedings of the ACM on Interactive, Mobile, Wearable
[12] Yash Jain, Chi Ian Tang, Chulhong Min, Fahim Kawsar, and Akhil and Ubiquitous Technologies 4, 1 (2020), 1–27.
Mathur. 2022. ColloSSL: Collaborative Self-Supervised Learning for [28] Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing
Human Activity Recognition. Proceedings of the ACM on Interactive, Zhang, and Jie Tang. 2021. Self-supervised learning: Generative or
Mobile, Wearable and Ubiquitous Technologies 6, 1 (2022), 1–28. contrastive. IEEE Transactions on Knowledge and Data Engineering
[13] Jeya Vikranth Jeyakumar, Liangzhen Lai, Naveen Suda, and Mani (2021).
Srivastava. 2019. SenseHAR: a robust virtual activity sensor for smart- [29] Yang Liu, Zhenjiang Li, Zhidan Liu, and Kaishun Wu. 2019. Real-time
phones and wearables. In Proceedings of the 17th Conference on Embed- arm skeleton tracking and gesture inference tolerant to missing wear-
ded Networked Sensor Systems. 15–28. able sensors. In Proceedings of the 17th Annual International Conference
[14] Sijie Ji, Yaxiong Xie, and Mo Li. 2022. SiFall: Practical Online Fall on Mobile Systems, Applications, and Services. 287–299.
Detection with RF Sensing. In Proceedings of the 20th ACM Conference [30] Sebastian OH Madgwick, Andrew JL Harrison, and Ravi Vaidyanathan.
on Embedded Networked Sensor Systems. 563–577. 2011. Estimation of IMU and MARG orientation using a gradient de-
[15] Wenchao Jiang and Zhaozheng Yin. 2015. Human activity recognition scent algorithm. In 2011 IEEE international conference on rehabilitation
using wearable sensors by deep convolutional neural networks. In robotics. IEEE, 1–7.
Proceedings of the 23rd ACM international conference on Multimedia. [31] Mohammad Malekzadeh, Richard G Clegg, Andrea Cavallaro, and
1307–1310. Hamed Haddadi. 2019. Mobile sensor data anonymization. In Pro-
[16] Antonio R Jimenez, Fernando Seco, Carlos Prieto, and Jorge Guevara. ceedings of the international conference on internet of things design and
2009. A comparison of pedestrian dead-reckoning algorithms using a implementation. 49–58.
low-cost MEMS IMU. In 2009 IEEE International Symposium on Intelli- [32] Alan Mazankiewicz, Klemens Böhm, and Mario Bergés. 2020. Incre-
gent Signal Processing. IEEE, 37–42. mental Real-Time Personalization in Human Activity Recognition
[17] Md Abdullah Al Hafiz Khan, Nirmalya Roy, and Archan Misra. 2018. Using Domain Adaptive Batch Normalization. Proceedings of the ACM
Scaling human activity recognition via deep learning-based domain on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 4 (2020),
Practically Adopting Human Activity Recognition ACM MobiCom ’23, October 2–6, 2023, Madrid, Spain

1–20. 18, 9 (2018), 2892.


[33] Xiaomin Ouyang, Xian Shuai, Jiayu Zhou, Ivy Wang Shi, Zhiyuan Xie, [50] Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger
Guoliang Xing, and Jianwei Huang. 2022. Cosmo: contrastive fusion Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and
learning with small data for multimodal human activity recognition. Mads Møller Jensen. 2015. Smart devices are different: Assessing
In Proceedings of the 28th Annual International Conference on Mobile and mitigatingmobile sensing heterogeneities for activity recognition.
Computing And Networking. 324–337. In Proceedings of the 13th ACM conference on embedded networked
[34] Xiaomin Ouyang, Zhiyuan Xie, Jiayu Zhou, Jianwei Huang, and Guo- sensor systems. 127–140.
liang Xing. 2021. ClusterFL: a similarity-aware federated learning [51] Scott Sun, Dennis Melamed, and Kris Kitani. 2021. IDOL: Inertial Deep
system for human activity recognition. In Proceedings of the 19th An- Orientation-Estimation and Localization. In Proceedings of the AAAI
nual International Conference on Mobile Systems, Applications, and Conference on Artificial Intelligence, Vol. 35. 6128–6137.
Services. 54–66. [52] Chi Ian Tang, Ignacio Perez-Pozuelo, Dimitris Spathis, Soren Brage,
[35] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Nick Wareham, and Cecilia Mascolo. 2021. SelfHAR: Improving Hu-
Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, man Activity Recognition through Self-training with Unlabeled Data.
and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017). Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous
[36] Hangwei Qian, Tian Tian, and Chunyan Miao. 2022. What makes good Technologies 5, 1 (2021), 1–30.
contrastive learning on small-scale wearable-based tasks?. In Proceed- [53] Linlin Tu, Xiaomin Ouyang, Jiayu Zhou, Yuze He, and Guoliang Xing.
ings of the 28th ACM SIGKDD Conference on Knowledge Discovery and 2021. FedDL: Federated Learning via Dynamic Layer Sharing for Hu-
Data Mining. 3761–3771. man Activity Recognition. In Proceedings of the 19th ACM Conference
[37] Xin Qin, Yiqiang Chen, Jindong Wang, and Chaohui Yu. 2019. Cross- on Embedded Networked Sensor Systems. 15–28.
dataset activity recognition via adaptive spatial-temporal transfer [54] Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Dar-
learning. Proceedings of the ACM on Interactive, Mobile, Wearable rell. 2014. Deep domain confusion: Maximizing for domain invariance.
and Ubiquitous Technologies 3, 4 (2019), 1–25. arXiv preprint arXiv:1412.3474 (2014).
[38] Jorge-L Reyes-Ortiz, Luca Oneto, Albert Samà, Xavier Parra, and Da- [55] Terry T Um, Franz MJ Pfister, Daniel Pichler, Satoshi Endo, Muriel
vide Anguita. 2016. Transition-aware human activity recognition using Lang, Sandra Hirche, Urban Fietzek, and Dana Kulić. 2017. Data aug-
smartphones. Neurocomputing 171 (2016), 754–767. mentation of wearable sensor data for parkinson’s disease monitoring
[39] Angelo Maria Sabatini. 2011. Kalman-filter-based orientation deter- using convolutional neural networks. In Proceedings of the 19th ACM
mination using inertial/magnetic sensors: Observability analysis and international conference on multimodal interaction. 216–220.
performance evaluation. Sensors 11, 10 (2011), 9182–9206. [56] Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data
[40] Aaqib Saeed, Tanir Ozcelebi, and Johan Lukkien. 2019. Multi-task using t-SNE. Journal of machine learning research 9, 11 (2008).
self-supervised learning for human activity detection. Proceedings of [57] Jinqiang Wang, Tao Zhu, Jingyuan Gan, Liming Luke Chen, Huan-
the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies sheng Ning, and Yaping Wan. 2022. Sensor Data Augmentation by
3, 2 (2019), 1–30. Resampling in Contrastive Learning for Human Activity Recognition.
[41] Aaqib Saeed, Flora D Salim, Tanir Ozcelebi, and Johan Lukkien. 2020. IEEE Sensors Journal 22, 23 (2022), 22994–23008.
Federated self-supervised learning of multisensor representations for [58] Yanwen Wang, Jiaxing Shen, and Yuanqing Zheng. 2020. Push the
embedded intelligence. IEEE Internet of Things Journal 8, 2 (2020), limit of acoustic gesture recognition. IEEE Transactions on Mobile
1030–1040. Computing 21, 5 (2020), 1798–1811.
[42] Andrea Rosales Sanabria, Franco Zambonelli, Simon Dobson, and Juan [59] Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue
Ye. 2021. ContrasGAN: Unsupervised domain adaptation in Human Wang, and Huan Xu. 2020. Time series data augmentation for deep
Activity Recognition via adversarial and contrastive learning. Pervasive learning: A survey. arXiv preprint arXiv:2002.12478 (2020).
and Mobile Computing (2021), 101477. [60] Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc Le. 2020.
[43] Zhiyao Sheng, Huatao Xu, Qian Zhang, and Dong Wang. 2022. Facilitat- Unsupervised data augmentation for consistency training. Advances
ing Radar-Based Gesture Recognition With Self-Supervised Learning. in neural information processing systems 33 (2020), 6256–6268.
In 2022 19th Annual IEEE International Conference on Sensing, Commu- [61] Haifeng Xing, Jinglong Li, Bo Hou, Yongjian Zhang, and Meifeng Guo.
nication, and Networking (SECON). IEEE, 154–162. 2017. Pedestrian stride length estimation from IMU measurements
[44] Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans and ANN based algorithm. Journal of Sensors 2017 (2017).
Scholten, and Paul JM Havinga. 2014. Fusion of smartphone mo- [62] Huatao Xu, Pengfei Zhou, Rui Tan, Mo Li, and Guobin Shen. 2021.
tion sensors for physical activity recognition. Sensors 14, 6 (2014), LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU
10146–10176. Sensing Applications. In Proceedings of the 19th ACM Conference on
[45] Connor Shorten and Taghi M Khoshgoftaar. 2019. A survey on image Embedded Networked Sensor Systems. 220–233.
data augmentation for deep learning. Journal of big data 6, 1 (2019), [63] Jianbo Yang, Minh Nhut Nguyen, Phyo Phyo San, Xiaoli Li, and Shonali
1–48. Krishnaswamy. 2015. Deep convolutional neural networks on multi-
[46] Connor Shorten, Taghi M Khoshgoftaar, and Borko Furht. 2021. Text channel time series for human activity recognition.. In Ijcai, Vol. 15.
data augmentation for deep learning. Journal of big Data 8 (2021), Buenos Aires, Argentina, 3995–4001.
1–34. [64] Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek
[47] Pekka Siirtola and Juha Röning. 2012. Recognizing human activi- Abdelzaher. 2017. Deepsense: A unified deep learning framework for
ties user-independently on smartphones based on accelerometer data. time-series mobile sensing data processing. In Proceedings of the 26th
IJIMAI 1, 5 (2012), 38–45. International Conference on World Wide Web. 351–360.
[48] Joan Sola. 2017. Quaternion kinematics for the error-state Kalman [65] Zhijun Zhou, Yingtian Zhang, Xiaojing Yu, Panlong Yang, Xiang-Yang
filter. arXiv preprint arXiv:1711.02508 (2017). Li, Jing Zhao, and Hao Zhou. 2020. XHAR: Deep Domain Adaptation
[49] Odongo Steven Eyobu and Dong Seog Han. 2018. Feature representa- for Human Activity Recognition with Smart Devices. In 2020 17th
tion and data augmentation for human activity classification based on Annual IEEE International Conference on Sensing, Communication, and
wearable IMU sensor data using a deep LSTM neural network. Sensors Networking (SECON). IEEE, 1–9.
