
SPECIAL SECTION ON HEALTHCARE INFORMATION TECHNOLOGY FOR THE EXTREME AND REMOTE ENVIRONMENTS

Received August 9, 2018, accepted September 22, 2018, date of publication October 2, 2018, date of current version October 31, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2873502

Sensor-Based Datasets for Human Activity Recognition – A Systematic Review of Literature
EMIRO DE-LA-HOZ-FRANCO1, PAOLA ARIZA-COLPAS1, JAVIER MEDINA QUERO2, AND MACARENA ESPINILLA2


1 Department of Computer Science and Electronics, Universidad de la Costa–CUC, Barranquilla 080002, Colombia
2 Department of Computer Science, University of Jaén, Campus Las Lagunillas, 23071 Jaén, Spain
Corresponding author: Emiro De-la-Hoz-Franco ([email protected])
This work was supported in part by the REMIND Project through the European Union’s Horizon 2020 Research and Innovation
Programme under Marie Skłodowska-Curie Grant 734355.

ABSTRACT The research area of ambient assisted living has led to the development of activity recognition
systems (ARS) based on human activity recognition (HAR). These systems improve the quality of life and
the health care of the elderly and dependent people. However, before making them available to end users, it is
necessary to evaluate their performance in recognizing activities of daily living, using data set benchmarks
in experimental scenarios. For that reason, the scientific community has developed and provided a large
number of data sets for HAR. Therefore, identifying which ones to use in the evaluation process and which
techniques are the most appropriate for prediction of HAR in a specific context is not a trivial task and
is key to further progress in this area of research. This work presents a systematic review of the literature
of the sensor-based data sets used to evaluate ARS. On the one hand, an analysis of different variables
taken from indexed publications related to this field was performed. The sources of information are journals,
proceedings, and books located in specialized databases. The analyzed variables characterize publications
by year, database, type, quartile, country of origin, and destination, using scientometrics, which allowed
identification of the data set most used by researchers. On the other hand, the descriptive and functional
variables were analyzed for each of the identified data sets: occupation, annotation, approach, segmentation,
representation, feature selection, balancing and addition of instances, and classifier used for recognition.
This paper provides an analysis of the sensor-based data sets used in HAR to date, identifying the most
appropriate dataset to evaluate ARS and the classification techniques that generate better results.

INDEX TERMS Ambient assisted living–AAL, human activity recognition–HAR, activities of daily
living–ADL, activity recognition systems–ARS, dataset.

2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 6, 2018, p. 59192

I. INTRODUCTION
The care of elderly dependent people who have difficulty carrying out ADL effectively requires a great deal of attention and dedication, because both the lifestyle and the health state of these people are affected. The proliferation of problems associated with dementia in older adults between 74 and 84 years of age [1] constitutes one of the main public health challenges worldwide. As a consequence, secondary problems arise that affect mental, physical and mobility capabilities [2]–[4]. In addition, there is a decline in basic communication skills, such as writing and speaking, and in the ability to perform simple and complex motor activities (cooking, taking medications and paying bills, among others) [5].

Nowadays, there is a growing need for society to take care of its health by integrating the use of technology. HAR enables the monitoring of people’s quality of life, and new features and functionalities continue to arise in this area, relying on a wide repertoire of hardware and software components. The research area of AAL has influenced the development of reminder solutions that support people suffering from neurodegenerative diseases. Proof of this is the implementation of several solutions in indoor environments, which capture the data generated from the interactions of people with an intelligent environment [6]. The objectives of HAR, based on the analysis of ADL [7], are: 1) to create predictive models that classify the normal and abnormal behaviour of individuals [8], and 2) to
provide the necessary tools for the caregiver and the medical team to identify the activities carried out by these individuals and to generate preventive and corrective measures.

The data collected from heterogeneous sensors deployed in smart environments or from sensors attached to the body (wearables) are stored in datasets. Different data collection modalities have been proposed: video [9], [10], audio [11], [12] and binary sensors [6], [13], or portable sensors worn on the body, such as accelerometers and gyroscopes [14], [15], among others. The datasets are then used to train machine learning techniques that predict people’s behaviour for different purposes, such as sending early warnings to caregivers and mitigating the risks related to the deterioration of the health of the monitored people.

Currently, there is a large number of datasets for HAR. Therefore, identifying which ones to use in the evaluation process of an ARS, and which techniques (in the phases of pre-processing, feature extraction, feature selection and transformation, classification and post-classification) are the most appropriate to improve Activity Recognition (AR) rates, is a complex task. The exploratory process of identifying the most appropriate dataset for evaluating an ARS, and the techniques that have been successfully applied in different AR approaches to improve accuracy rates, demands considerable time that researchers could spend on other tasks at the core of their research.

ARS have emerged thanks to advances in sensor technology, mainly because of its ability to understand the situations that arise in contexts in which humans interact while performing ADL. As indicated in [16], the practical applications of ARS are numerous: fall detection [17], gait anomaly detection [18], energy expenditure estimation [19], [20], stress detection [21], behaviour monitoring [22] and rehabilitation [23], among others. Therefore, evaluating the reliability of ARS in terms of their ability to predict the different activities collected in a dataset is a challenging task.

Initially, ARS were evaluated with adapted laboratory datasets recorded under controlled conditions. With the growing development of new ARS, data collection has become more difficult, since data recorded in a particular laboratory do not include a wide enough variety of activities to evaluate ARS with sufficient rigor. Given this situation, a series of dataset benchmarks (Opportunity [24], HASC [25], AmI Repository [26], among others) have emerged. Additionally, the scientific community has created a competition called Evaluating AAL Systems Through Competitive Benchmarking – AR (EvAAL-AR) [27]. Both initiatives aim to enable researchers: 1) to develop ARS and test them experimentally, 2) to submit their proposals for evaluation using different dataset benchmarks, and 3) to validate their developments in an academic competition.

Motivated by this research field, the main contributions of this paper are:
1) The identification of the datasets most recognised by the academic community regarding HAR, assessing the types of activity and data, data capture devices, level of occupation, annotation, context and scenario in which the data were collected, the duration of the capture and the number of individuals or inhabitants that generated the activities.
2) A characterisation and analysis of each identified dataset, which includes: the different classification techniques used for AR, the segmentation techniques used to select the data-streaming windows, the feature representation of the data, the distribution of the dataset used for training and testing, and the quality metrics (F-measure and accuracy) used in the HAR evaluation, and
3) A series of suggestions regarding the use of different techniques: segmentation, feature representation, feature selection, balancing or addition of instances, and distribution of datasets for the experimentation processes.

In previous literature, two studies [28], [29] can be identified in which six dataset benchmarks are compared across several ARS. In the first study, a comparative analysis of six benchmark datasets (Berkeley [30], USC [31], HMD [32], Opportunity [24], UTD MHAD [33] and Sport [34]) was made, comparing the F-measure obtained by nine ARS proposals, including the authors’ own (identifying the classifier or classifiers used in each proposal). In the second study, a comparative analysis of 15 proposals (including the authors’ own) was documented, recording whether each one used six dataset benchmarks (CASAS [23], VanKasteren [6] and others), two feature selection techniques (Principal Component Analysis – PCA and Information Gain – IG) and 12 association approaches (or classifiers). The approach proposed in this paper is an original contribution, because no existing benchmark has identified the most relevant datasets in terms of HAR. In this paper, a detailed characterisation and subsequent analysis is made for each of these datasets in relation to the different classification techniques used for AR, including the segmentation techniques used to select the data-streaming windows, the feature representation of the data, the distribution of the dataset used for training and testing, and the quality metrics (F-measure and accuracy) used in the HAR evaluation.

The compilation, documentation and analysis of the aforementioned variables for each dataset, and of the studies published between 2003 and 2017, involve a considerable volume of papers to be reviewed. Evaluating the particularities of each variable analysed in the datasets with scientific rigor makes this work a valuable and relevant contribution.

The work is structured as follows: Section II describes related research. Section III proposes the methodology used for the systematic review of the literature. Section IV presents the scientometric analysis and Section V the technical analyses. Section VI presents the characterisation of each of the identified datasets and the respective analysis of results.
Section VII presents the discussion and, finally, Sections VIII and IX present the conclusions and future work, respectively.

II. RELATED RESEARCH
When consulting related works, we observed that some literature reviews analyse very specific approaches from a technological point of view (Mode of Movement Recognition – MMR [35], Acceleration-Based Activity – ABC [36] and Non-Intrusive Load Monitoring – NILM [37]). Other reviews, albeit not thoroughly, focus on evaluating the effectiveness of classification techniques for recognising activities of daily living from datasets [38], [39]. However, we found no systematic review of the literature that characterises datasets for HAR based on the analysis of ADL. A detailed description of the studies mentioned above is provided below.

In [35], the approaches to MMR are compared and described from different viewpoints: usability and convenience, device types, data collection methods, types and errors of the sensors used, signal pre-processing methods employed (windowing, de-noising and variable calculation), feature extraction (statistical and time-domain features, energy, power, magnitude and frequency-domain features), feature selection and transformation, classification techniques and post-classification refining. The paper ends with a quantitative comparison of the performance of motion mode recognition modules developed by researchers in different domains.

In [36], a naturalistic 3D acceleration-based activity dataset, SCUT-NAA (publicly available), is created to assist researchers in the field of acceleration-based AR and to provide a standard dataset for comparing and evaluating the performance of different algorithms. SCUT-NAA contains 1278 samples from 44 subjects, collected in naturalistic settings with a single tri-axial accelerometer located alternatively on the waist belt, in the trouser pocket and in the shirt pocket. The authors also summarised some representative datasets (none publicly available) on acceleration-based AR, comparing the number of activities, the number of subjects, whether the data were collected in laboratory or naturalistic settings, the number of accelerometers used per subject and the accuracy.

In [37], a novel technique to monitor human activity based on NILM is presented. To evaluate the performance of the proposed algorithm, two datasets were considered: the Household Electricity Survey dataset [40] and the UK Domestic Appliance-Level Electricity – UK-DALE dataset [41]. Both use real data on the aggregated and disaggregated energy consumption of UK households. The former contains a year of data from three single-pensioner households, which are the target community of that study, whereas the latter is a two-year collection of data from a family household (two adults, two children and a dog).

In [38], the authors proposed using Deep Learning (DL) techniques to automatically learn high-level features from binary sensor data. To evaluate the performance of the proposed method, they ran experiments on three publicly available smart home datasets and compared it with a range of shallow models in terms of time-slice accuracy and class accuracy. The datasets were collected using simple sensors (motion detectors, contact switches, pressure mats, mercury contacts and float sensors), and each of the smart homes housed one resident performing ADL. In terms of number of sensors, data collection time, number of activities, resulting sensor events and activity instances, the datasets are described as follows: first dataset (14 sensors, 25 days, 10 activities, 1229 sensor events and 292 activity instances), second dataset (23 sensors, 14 days, 13 activities, 19075 sensor events and 200 activity instances) and third dataset (21 sensors, 19 days, 16 activities, 22700 sensor events and 344 activity instances).

In [39], three classification learning algorithms were implemented to evaluate the AR of ADL: Naïve Bayes (NB), Support Vector Machine (SVM) and Random Forest (RF). For this purpose, ten healthy subjects were recruited and their activities were monitored over 20 days using the sensor system.

III. METHODOLOGY
The systematic review of the literature (SRL) is a key piece of secondary research that allows the creation of frameworks on which future research is supported. An outstanding reference in this respect is [42], which proposes a methodology based on the definition of research questions, search process, inclusion and exclusion criteria, quality assessment, data collection, data analysis and deviations from protocol. The vast majority of research of this type is carried out in the health field, for which the methodologies proposed in [43]–[45] were analysed. In the field of engineering, [42] and [46] propose their respective methodologies. Reference [46] is organised in three stages: the definition of search parameters (objective, hypothesis and search index), identification and debugging in databases (selection of the search chains whose results will be examined in depth) and the proposal of answers to the hypothesis (from the information obtained from the categorisation and analysis of the most relevant articles). Specifically, in [47] a methodology was proposed to validate Chronic Kidney Disease (CKD) and related conditions in existing datasets (including administrative datasets and disease registries). All these studies contributed to the SRL methodology used in the research described here, which was organised in three stages.

The first stage, the definition of search parameters, consists of determining the objectives of the review and then formulating the following hypothesis: ‘‘Which datasets have provided the greatest impact on the development of research related to the recognition of ADL?’’ Subsequently, we identified the databases on which the search would focus (Scopus, IEEE Xplore, ScienceDirect, Web of Science and ACM). From these, the keywords to be used and those to be discarded (because of the noise the latter generated in the results) were identified and validated. Fig. 1 shows
the schema from which the search terms were built. The terms ‘‘video’’ and ‘‘audio’’ were excluded, given that processing these types of data differs considerably from processing data from other feedback mechanisms such as pressure, contact, positioning and accelerometer sensors, among others. Additionally, the term ‘‘outdoor’’ was excluded in order to delimit the scope of the publications to be analysed.

FIGURE 1. Search chains conformation model.

In the second stage, the information obtained from the specialised databases was identified and filtered, and the results obtained from applying the different search chains were analysed. The data were summarised in arrays in which the search indexes are hierarchically organised, discarding the combinations that did not yield any results. In addition, the terms that yielded a large number of results were identified, and more specific criteria were added to limit the searches to the subject matter of the research.

The search chains that yielded results were selected, identifying the articles that match the proposed hypothesis. The search chains were constructed from the keywords identified in Fig. 1, with the following structure: (HAR OR ADL) AND dataset AND (‘‘indoor environment’’ OR ‘‘smart homes’’ OR ‘‘intelligent buildings’’ OR ‘‘ambient intelligence’’ OR ‘‘assisted living’’) AND NOT (video) AND NOT (audio) AND NOT (outdoor). A series of scientometric variables was documented for each article, such as the year of publication, the journal, the document type, the journal’s quartile, the country of publication of the journal, the country where the work was produced and the entity or university that presents it. Additionally, a series of technical variables was documented to characterise the type of dataset used in the research works consulted. These variables were: 1) whether reference was made to a dataset or a repository, 2) its name, 3) the type of events it contained, 4) its level of occupation and annotation, and 5) the sensing modality with which the dataset was fed. Finally, the approach, segmentation, representation, feature selection, balancing and addition of instances, and the category and subcategory of the classifiers referenced in the papers were recorded. The last stage is the presentation and analysis of results.

FIGURE 2. Trend in the number of publications: (a) number of publications by year and (b) publications by scientific database.

IV. SCIENTOMETRIC ANALYSIS
After recording the scientometric variables of the 374 publications, they were quantified according to the following criteria: year of publication, number of articles published by database (identifying those referenced in several databases), and publications by document type and by quartile of the journal, conference or book in which they were published. We also identified the countries that receive and those that generate the most publications, as well as the journals and universities most active in this specific field of research. The following figures and tables illustrate these results in more detail.

Evidence of the relevance of this field of research (see Fig. 2) is the growing trend in the number of publications related to HAR in terms of: 1) the implementation of intelligent environments in indoor contexts, 2) the capture of data generated from the interactions of the inhabitants with the sensors deployed in such environments, 3) the collection and structuring of datasets and 4) the application of predictive algorithms for the classification of ADL. Fig. 2a shows that the number of publications in this research field has grown each year, reaching its peak in 2017. Fig. 2b indicates that, between 2003 and 2017, the scientific database with the most research products registered (journals, proceedings and book chapters, among others) is IEEE Xplore.

66% of the publications in this field are journal and proceedings papers accessible from the IEEE Xplore database. Across the different specialised databases, 72% of the publications appear in proceedings and 27% in journals (see Fig. 3a). Although it is true that the highest percentage of
publications appears in uncategorised resources (71%), there is a significant percentage (19%) of publications in first-quartile journals (see Fig. 3b).

FIGURE 3. Publications by type and quartile: (a) number of publications by type and (b) publications by quartile.

FIGURE 4. Number of publications received and generated by country.

Fig. 4 shows the number of publications received and generated by country between 2003 and 2017. The first value was calculated by counting the publications by the country of origin or edition of the journal, proceedings or book. On this basis, the countries with the highest number of publications received within the scope of HAR are: USA, Netherlands, China, UK and Germany, among others. To count the publications by country of generation, the country of the university, research center or organization with which the authors of each publication are affiliated was identified. Specifically, the country of generation was mostly taken to be the one most common among all the authors of the paper and, in some exceptional cases, the country of the first author. On this basis, the countries with the highest number of publications generated on HAR are: USA, UK, Italy, China and South Korea, among others.

TABLE 1. Journals with the most publications.

Table 1 shows the venues in which most works related to the recognition of ADL are published. It is noteworthy that most of them are well ranked in terms of quartile.

FIGURE 5. Publications by University.

In addition, we note the research output in scientific journals on pervasive healthcare of the University of Ulster and Washington State University – WSU (see Fig. 5). This is because these institutions have generated and validated their own datasets, which have been widely used by a representative sector of the academic community. In particular, we highlight the University of Ulster, which presented an initiative for the creation of open datasets in pervasive healthcare, which can be consulted in [48]. For its part, WSU is the creator of the most complete repository of AR in smart homes, called CASAS [23], [49].

V. TECHNICAL ANALYSIS
The Inter-University Consortium for Political and Social Research – ICPSR [50] is an international consortium of more than 750 academic institutions and research organisations, which provides leadership and training in data access, curation and methods of analysis for the social science research community. This organisation has Institutional Review Boards (IRB) which review research proposals [51]. It is good practice to submit ADL recognition datasets to
the IRB for review, in order to safeguard the human subjects who participate in biomedical or behavioural research.

Rodríguez et al. [52] classify datasets according to the type of event, or activity granularity level, that they record: actions, activities and behaviours. First, actions, or atomic events with a timestamp, define the lowest degree of granularity of representation (e.g., open door, move object, turn light off, walk by, be observed in location, among others). Second, activities, considered either as single actions with an inherent purpose or as compositions of different actions, represent an intermediate level of granularity and have a start date-time and an end date-time (e.g., take coffee, attend conference, group meeting, video call, send email, among others). Last, behaviours are defined as sequences of activities and/or actions, made up of a set of compulsory actions or activities plus a set of optional actions or activities, some of which can have temporal execution interdependencies (e.g., the behaviour coffee break includes the action exit office, the activity make coffee or take coffee, and the action enter office, in this order). Fig. 6 shows the number of papers that reference actions, activities or behaviours (or combinations of these events).

FIGURE 6. Documented events in the datasets.

According to [53], many datasets proposed in the literature distinguish three classes of activities: single activity [54], interleaved activity [55] and multi-occupancy [13]. A single activity is fully carried out before the next one starts; interleaved activities are carried out while another activity is being performed at the same time; and multi-occupancy refers to activities performed simultaneously by several people. Fig. 7 indicates that, in relation to occupation, the largest share of the papers consulted (42.2%) used datasets containing records of single activities, while 20.3% correspond to datasets with interleaved activities.

FIGURE 7. Level of occupations.

A dataset is annotated when each data record is assigned a class tag that identifies it. Researchers use different techniques (manual and automatic) to label the dataset records. Fig. 8 shows that the majority of the datasets referenced in the papers are annotated (60.4%).

FIGURE 8. Number of papers that relate the type of dataset annotation.

FIGURE 9. Number of papers according to sensing modalities.

In Fig. 9, we show the results according to the modality in which the data were sensed to feed the dataset. It is noteworthy that the largest share of the research (46.3%) uses data captured by environmental sensors, although there is evident growth in the use of wearable sensors (19.3%).

A relevant classification of machine learning approaches for AR is presented in [48], which distinguishes Data-Driven Approaches (DDA) from Knowledge-Driven Approaches (KDA). DDA are based on machine learning techniques that require a pre-existing dataset of user behaviours. A training process is usually carried out to build an activity model, followed by a testing process to evaluate how well the model generalises when classifying unseen activities [38], [108]. In KDA, an activity model is built by incorporating rich prior knowledge gleaned from the application domain, using knowledge engineering and knowledge management techniques [109], [110].

The vast majority of the studies reviewed (76.2%, 285 papers) mention the name of the classifier used, while the remaining 23.8% (89 papers) do not. Overall, 60.9% of the papers use classifiers based on DDA, 12.6% use classifiers based on KDA and 2.7% use classifiers of both approaches.
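The data-driven workflow described above (a training phase that builds an activity model from a labelled dataset, followed by a testing phase on unseen instances) can be illustrated with a minimal Python sketch. It assumes a k-nearest-neighbour (kNN) classifier, a lazy learner whose "model" is simply the stored labelled instances, and uses hypothetical feature vectors (per-window firing counts of three binary sensors) with made-up activity labels. None of the values come from the reviewed datasets, and the sketch is not the method of any particular paper.

```python
import math
from collections import Counter

# Hypothetical labelled dataset: each instance is a feature vector for one
# time window (here, firing counts of three binary sensors) plus an activity
# label assigned during annotation. Values are illustrative only.
TRAIN = [
    ((5, 0, 1), "cooking"),
    ((4, 1, 0), "cooking"),
    ((0, 6, 1), "sleeping"),
    ((1, 5, 0), "sleeping"),
    ((0, 1, 4), "toileting"),
    ((1, 0, 5), "toileting"),
]

# Hypothetical unseen windows held out for the testing phase.
TEST = [
    ((6, 0, 0), "cooking"),
    ((0, 5, 2), "sleeping"),
    ((0, 0, 6), "toileting"),
]


def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training instances."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Testing phase: fraction of test instances whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    # kNN is a lazy learner: "training" amounts to keeping TRAIN as the model.
    print(accuracy(TRAIN, TEST))
```

Accuracy is used here because it is one of the two quality metrics this review tracks; a real evaluation would normally also report the F-measure, particularly when activity classes are imbalanced.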


TABLE 2. References to classifier subcategories.

TABLE 3. Most referenced dataset.

TABLE 4. Details of the most referenced dataset.

In this study, the results are reported using the hierarchy defined in [35] for the approaches, categories, subcategories and classification techniques used for HAR based on the identification of ADL. Several papers refer to more than one classification technique; therefore, in some cases, references to several subcategories of techniques were counted for the same paper. Table 2 contains the number of references within these subcategories, associated with their respective categories and approaches (the 89 papers that do not mention the name of the classifier used were discarded). The most referenced subcategories were those belonging to the DDA category of Machine Learning Methods – MLM (e.g., Markov Models – MM, Instance-Based Classifiers or Instance-Based Learning – IBL, Bayesian Classifiers – BC, Decision Trees – DT, Artificial Neural Networks – ANN, among others); the subcategories belonging to Meta-Level Classifiers – MLC (e.g., Multi-agent System – MaS and Cascading) and Semantic Attribute-Based Learning (SABL) within the KDA categories are referenced to a lesser extent.
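Because a paper that cites several techniques contributes one reference to each subcategory, the percentages reported for classifier subcategories are shares of references rather than of papers. The Python sketch below illustrates that counting rule on a hypothetical list of papers (the subcategory sets are invented for illustration) and checks the paper-level figures given earlier: 374 - 89 = 285 papers name a classifier, and the DDA, KDA and mixed shares (60.9%, 12.6% and 2.7%) add up to that same 76.2%, which suggests they were computed over all 374 papers. The function and variable names are illustrative only.

```python
from collections import Counter

TOTAL_PAPERS = 374  # publications analysed in the review
UNNAMED = 89        # papers that do not name their classifier (discarded for Table 2)


def share(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def subcategory_shares(papers):
    """Share of classifier references per subcategory.

    `papers` holds one set of subcategory labels per paper. A paper citing
    several subcategories adds one reference to each, so the denominator is
    the total number of references, not the number of papers.
    """
    refs = Counter(label for labels in papers for label in set(labels))
    total_refs = sum(refs.values())
    return {label: share(n, total_refs) for label, n in refs.items()}


# Paper-level figures taken from the text.
named = TOTAL_PAPERS - UNNAMED  # papers that name a classifier
# Reported approach shares; they appear to be percentages of all 374 papers.
approach_shares = {"DDA": 60.9, "KDA": 12.6, "both": 2.7}

# Hypothetical papers: four papers produce six subcategory references, so MM
# appears in three of four papers but accounts for half of the references.
EXAMPLE_PAPERS = [{"MM"}, {"MM", "SVM"}, {"IBL", "BC"}, {"MM"}]

if __name__ == "__main__":
    print(named, share(named, TOTAL_PAPERS), share(UNNAMED, TOTAL_PAPERS))
    print(round(sum(approach_shares.values()), 1))
    print(subcategory_shares(EXAMPLE_PAPERS))
```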
The consulted papers refer to one or more classification techniques in order to carry out a comparative analysis of their performance. Accordingly, 41.7% of the references to classifiers correspond to the MM subcategory (specifically the Hidden Markov Model – HMM classifier), 17.4% reference the SVM classifier, and the IBL and BC subcategories account for 15.8% of the references each, the first specifically through the k-Nearest Neighbor – kNN algorithm and the second through NB and BC classifiers (see Table 2).

On the other hand, of the 374 publications related to this field of research, 94% (352 papers) indicate that they have used datasets (their own or from other authors), 2% (7 papers) do not specify what type of data structure has been used (a dataset or a repository), and the remaining 4% (15 papers) mention the use of a repository, in which the results obtained from processing the different datasets that constitute it are compared. The most used repositories are CASAS [23], [49], [54] and the UCI Machine Learning Repository [56]. It is important to note that some papers mention one or several datasets generated or used.

Of the 352 papers that indicate the use of datasets, 50% (175 papers) explicitly mention the name of the dataset used. Some of these papers refer to more than one dataset when they perform a comparative analysis of the performance quality metrics of the classifier used. The remaining 50% (177 papers) do not mention the name of the dataset used. Tables 3 and 4 show the details of the seven (7) most referenced datasets (all of which are annotated) in the 175 publications that do identify the name of the dataset used.

The Van Kasteren dataset [57] is the result of measurements from a Wireless Sensor Network (WSN) in an enclosure occupied by two men (26 and 57 years old). In this apartment there are 14 sensors that indicate changes of state associated with actions such as opening and closing of doors and pressure on the apartment floor, as well as sensors on the bed and on the sofa. The characteristics of the dataset are: the stored values are binary (either because of the use of


binary sensors for the capture or because some threshold was applied to the captured analogue value), it has 245 actions from different activities (brushing teeth, showering, toileting, bathing, shaving, breakfast, dinner, snacking, drinking, loading the dishwasher, unloading the dishwasher, among others) and the duration of the capture process was two weeks (which included sensor data and annotation). The data were captured through the implementation of RFID, WSN and different types of sensors (reed switches, mercury contacts, passive infrared - PIR and float sensors). For the annotation process, a combination of Bluetooth headsets with speech recognition and a handwritten register of activities was used. In [2], [29], [64], and [65] different classification techniques (NB, HMM, Hidden Semi-Markov Model – HSMM and Conditional Random Field – CRF) were used to compare Van Kasteren with one or more datasets.

The CASAS project [23], [49], [54] is located on the campus of WSU. The apartment is made up of a bathroom, a living room, three bedrooms and a kitchen. The sensors in the apartment are distributed at a distance of approximately one meter. The sensors can be categorised as: motion sensor, motion area sensor (covers a larger region), item sensor for selected items in the kitchen, door sensor, burner sensor, hot water sensor, cold water sensor, temperature sensor, electricity usage, battery level, light level, shake sensor, light sensor, gyro sensor, experimenter switch (manual trigger) and fan. The official website of the project (http://casas.wsu.edu/) contains a wide variety of datasets and tools. For each dataset, the following is detailed: the name of the testbed, the number of residents or participants, and whether it is annotated or not. Additionally, the files of each dataset are available for download. The CASAS repository also offers the following tools: real-time activity profiling, activity learning (recognition, discovery, and prediction), AR, rule-based activity prediction, pattern visualiser, activity visualisation, real-time annotation tools, data sampling tools, sequential prediction, multi-view transfer learning techniques and mobile activity learner (iOS and Android).

The most referenced datasets of the CASAS repository are: Daily life 2010-2012 (Testbed: Kyoto), also called ADL Activities [23], Daily life 2010-2011-2012 (Testbed: Aruba) [58] and Multiresident ADL Activities (Testbed: Kyoto) [13]. The first contains activities (making a call, washing hands, cooking, eating and washing the dishes) collected from 20 participants using environment sensors (motion sensors, sensors associated with objects such as the medicine box, a flowerpot, a diary and a closet, and water, kitchen and telephone use sensors); the dataset contains information related to the date and time of each event, the sensor ID and the value (binary or numeric) of each sensor activated during the event. The second contains activities (movement from bed to bathroom, eating, getting home, housework, leaving home, preparing food, relaxing, sleeping, washing dishes and working) collected from an older volunteer woman, and her children and grandchildren (who visited her several times). They interacted with environment sensors (motion, door and temperature sensors) and, like the previous dataset, this one contains information related to the date and time of each event, the sensor ID and the value (binary or numeric) of each sensor activated during the event. The third dataset contains a wide variety of activities (filling medication dispenser, hanging up clothes, moving the couch and coffee table, sitting on the couch, watering plants, sweeping the kitchen floor, playing a game of checkers, setting out ingredients for dinner, setting dining room table, reading a magazine, simulating the payment of an electric bill, gathering food for a picnic, retrieving dishes from a kitchen cabinet, packing supplies in the picnic basket and packing food in the picnic basket) collected from two inhabitants who participated at the same time, with 26 tests for each pair of inhabitants (40 participants in total), where they interacted with environment sensors (motion, item, cabinet, water, burner, phone and temperature sensors). This dataset contains information related to the date and time of each event, the sensor ID and (binary) value of each sensor activated during the event, and the Task ID that identifies that event. The annotation process was manual (labelled during the recording). In [29], [64], and [66]–[73], different classification techniques were used to compare one of the datasets available in the CASAS repository with one or more datasets of the same or other repositories. In these works, the most referenced CASAS datasets are: ARAS, Cairo, Aruba, Tulum, Kyoto (ADL Activities and Multiresident ADL Activities), DOMUS and Tokyo.

The HAR using a smartphone dataset – HAR [59], [60] was collected using a Samsung Galaxy SII smartphone, with the collaboration of 30 volunteers, in order to identify actions (walking, going upstairs, going downstairs, sitting, standing and lying down). This dataset is available in the UCI Machine Learning dataset repository. For each record in the dataset, the following is provided: triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration, triaxial angular velocity from the gyroscope, a 561-feature vector with time and frequency domain variables, the activity label and an identifier of the subject who carried out the experiment. In [74]–[79], different classification techniques were used to compare HAR datasets with one or more datasets of the same or other repositories.

The Opportunity AR dataset [24] (also available in the UCI repository) allows multi-patient experimentation on 4 subjects in the same venue. The subjects were monitored for 30 days, in a room simulating a studio flat with kitchen, deckchair, and outdoor access, where subjects performed daily morning activities. The dataset contains activities (waking up, grooming, making breakfast and cleaning). The data capture is achieved through 68 sensors (14 located on objects, 21 ambient sensors and 33 located on the body). The data is provided as a text file containing an array where each row corresponds to a sample, the first column includes the sample timestamp (ms), and the last two columns include the labels for modes of locomotion and gestures respectively.


The data were labelled during the recording. In [16], [28], and [76] different classification techniques were used to compare Opportunity with one or more datasets.

The mHealth dataset [63], [80] (also available in the UCI repository) contains body motion and vital sign recordings for ten volunteers performing several physical actions (standing still, sitting and relaxing, lying down, walking, climbing stairs, bending waist forward, front arm elevation, knee bending, cycling, jogging, running, jumping front and back). Sensors were located on the chest, right wrist and left ankle of the subject and were used to measure the motion experienced by diverse body parts (acceleration, rate of turn and magnetic field orientation). The sensor positioned on the chest also provides 2-lead ECG measurements, which can potentially be used for basic heart monitoring, checking for various arrhythmias or looking at the effects of exercise on the ECG. The dataset includes fine-grained real-valued sensor readings of actions over a short time interval, with no explicit timestamps or locations included in the dataset. Annotation was based on video camera recordings. In [75], [81], and [82] different classification techniques were used to compare mHealth with one or more datasets.

VI. CHARACTERISATION AND ANALYSIS OF RESULTS
Regarding the classification techniques used for processing the aforementioned datasets, Tables 5 and 6 detail each of them according to the different approaches (DDA and KDA). Additionally, the total number of citations of the papers that introduce each dataset (taken from Google Scholar) is indicated. Some of the techniques referenced in the papers were not listed in the tables, because there is no evidence that they have been applied to these datasets. However, the following are mentioned: Activity Discovery (AD) [83], Bayesian Belief Network (BBN) [84], Hierarchical, Autonomic Recursive and Distributed Bayesian Network (HARD-BN) [85], Cross-subject unsupervised transfer learning (CsUTL) [86], Data-Driven Non-Linear Hebbian (DD-NHL) [87], Dynamic Background Subtraction (DBS) [88], Temporal Learning using Echo State Network (TL-ESN) [89], Expectation Maximization (EM) algorithm [90], Extended Episode Discovery (xED) algorithm [91], Finite Action-set Learning Automata (FALA) [92], Finite State Machine (FSM) [93], Fuzzy Logic (FL) [94], Fuzzy HMM (FHMM) [95], Fuzzy Inference Model (FIM) [96], Fuzzy Temporal Relationships (FTR) [97], Learning Frequent Patterns of User Behaviour System (LFPUBS) [8], Minimum Redundancy Maximum Relevance (MRMR) [98], Multi-stage Decision Model (MsDM) [99], Qualitative Spatial Reasoning + AtomGID (QSR-AtGID) [100], Self-Adaptive Neural Networks (SANN) and Growing Self Organizing Maps (GSOM) [101], Semantic Indoor Trajectory Model and N-gram Model (SITM-NgM) [102], Sequential Extreme Learning Algorithm (SELA) [103], Suitability of Multi-label Learning Algorithms (SMLLA) [104], Term Based Labelling (TBL) [105] and User Behaviour Shift Detection (UBSD) [106].

TABLE 5. KDA and classification techniques used in analysed datasets.

A comprehensive review of each of the seven aforementioned datasets was carried out. With the analyses presented below, we aim to identify: the different classification techniques used for AR, the segmentation techniques used to select the data-streaming windows, the feature representation of the data, the distribution of the dataset used for training and testing, and the quality metrics (F-measure and accuracy) used in the evaluation of the proposals analysed. Not all studies applied the same quality metrics, and not all present the values obtained with their corresponding standard deviation. Papers that do not explicitly indicate the techniques and metrics used for the processing and evaluation of the dataset were not documented in the respective tables.

A. VANKASTEREN DATASET ANALYSIS
Table 7 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to VanKasteren. From this, the following analysis was made:
• When the VanKasteren and ARAS datasets were compared, both in [64], using Streaming Multi-Class imbalance ensemble NB classifiers (streamingMEn), and in [113], using HMM and HHMM, the best results were obtained with VanKasteren.
• Only [38] eliminated noise, by applying the Stacked Denoising Autoencoder (SDAE) technique, which improved accuracy compared with not using this technique.
• In [38], [29], and [107] feature selection techniques were applied to improve the performance of the classifier. In the first, an accuracy of up to 85.6% was obtained using the SAE technique. In the second, the IG technique was used, obtaining an accuracy of 95.3%. This significant result is also due to the fact that in this proposal, data balancing was used through the oversampling approach,


using the Synthetic Minority Oversampling Technique (SMOTE) [128]. Finally, in [107] PCA was used, obtaining an accuracy of 96.3%.

TABLE 6. DDA and classification techniques used in analysed datasets.
TABLE 7. VanKasteren dataset evaluation.

• Other studies that applied data balancing techniques were: [64], using Multi-class Stream Imbalance (McSI); [114], using an algorithmic approach; and [2], adding manually synthesised abnormal activities (occurring at a wrong time of the day, and after and before a specific activity). The latter is the proposal with the best accuracy: 96.7% ± 2.6.
• Although different segmentation techniques have been applied, the most commonly used in these studies was a Time-based and Sliding Window (Tb-SW) of 60 seconds. The best accuracy obtained when applying this segmentation technique was 96.7% in [2], using a classifier based on an RNN called Long Short-Term Memory (LSTM). However, it is important to note that a very high accuracy (96.9%) was obtained in [115] without using segmentation techniques, by applying the classification technique PCC + CC. Additionally, significant accuracy values were obtained in [107] and [29] by applying the Segmented Activity Instances (Seg-AI) (96.3%) and Activity Recognition - Segmental Pattern Mining (AR-SPM) (95.3%) techniques, respectively.
• The majority of the analysed studies use the Last-fired (Lf) and Change-point (Cp) feature representation techniques; the highest accuracies with them were obtained in [115] (96.9%) and [2] (96.7% ± 2.6).
• Leave one day out cross validation (Lodo-CV) was the most used distribution of data for training and testing in the different experimentation scenarios.
• The most commonly used classifiers in the analysed studies are those based on MM (in their different variations: HMM, HSMM and HHMM), NB, CRF and SVM,


individually employed, assembled with another technique or as a benchmark to compare with other proposed methods.
• The best accuracy achieved in AR using the VanKasteren dataset was obtained in proposal [115] (96.9%), using PCC + CC as a classifier, and in proposal [2] (96.7% ± 2.6), using the RNN called LSTM as a classifier.
• Proposals such as [108], where the metric evaluated was the average time slice error (%), and [112], where the metrics evaluated were precision, recall and F1-score, are not documented in Table 7, to keep the analysis uniform.

TABLE 8. CASAS-kyoto dataset evaluation.
TABLE 9. CASAS-aruba dataset evaluation.
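Several VanKasteren proposals above discretise the binary sensor stream into 60-second time slices and encode each slice with the Last-fired or Change-point representation. The sketch below illustrates these encodings as they are commonly defined, with a Raw encoding included only for contrast. It is a minimal illustration under stated assumptions: non-overlapping slices, and sensor activations given as hypothetical `(sensor, start, end)` intervals in seconds; the function name `represent` is ours, not taken from any reviewed paper, and individual papers may treat slice boundaries differently.

```python
import numpy as np

def represent(intervals, n_sensors, t0, t1, dt=60.0):
    """Encode binary-sensor activity in fixed, non-overlapping time slices.

    intervals: list of (sensor_index, start_s, end_s) activations with end_s > start_s.
    Returns three int arrays of shape (n_slices, n_sensors):
      raw          - 1 if the sensor is active at some point in the slice
      change_point - 1 if the sensor switches on or off inside the slice
      last_fired   - 1 for the sensor whose state changed most recently,
                     held until another sensor changes state
    """
    n_slices = int(np.ceil((t1 - t0) / dt))
    raw = np.zeros((n_slices, n_sensors), dtype=int)
    change_point = np.zeros_like(raw)
    last_fired = np.zeros_like(raw)

    events = []  # (time, sensor) for every on/off transition
    for sensor, start, end in intervals:
        events += [(start, sensor), (end, sensor)]
        for k in range(n_slices):
            lo, hi = t0 + k * dt, t0 + (k + 1) * dt
            if start < hi and end > lo:  # activation overlaps slice k
                raw[k, sensor] = 1
    events.sort()

    for t, sensor in events:
        k = int((t - t0) // dt)
        if 0 <= k < n_slices:
            change_point[k, sensor] = 1

    current, i = None, 0
    for k in range(n_slices):
        hi = t0 + (k + 1) * dt
        while i < len(events) and events[i][0] < hi:  # transitions up to slice end
            current = events[i][1]
            i += 1
        if current is not None:
            last_fired[k, current] = 1
    return raw, change_point, last_fired

# Toy example: 4 one-minute slices; sensor 0 active 10-70 s, sensor 1 active 130-140 s.
raw, cp, lf = represent([(0, 10, 70), (1, 130, 140)], n_sensors=2, t0=0, t1=240)
print(lf.tolist())  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

In the toy example, sensor 1 fires briefly in the third slice; Change-point marks it only there, whereas Last-fired keeps it set in the fourth slice as well, because no other sensor has changed state since.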

B. CASAS-KYOTO DATASET ANALYSIS
Table 8 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to Kyoto. From this, the following analysis was made:
• In the analysed studies there is no evidence of the application of noise elimination or feature representation techniques when using Kyoto.
• There is only evidence of the application of two data segmentation techniques: Segments of Activities (Seg-oA) in [73] and [117], and AR-SPM in [29]; the best accuracy was obtained with the latter.
• The distribution of data for training and testing in [29] and [117] was Leave one out cross validation (Loo-CV) with k-folds = 20, and in [73] it was k-folds = 10.
• In [29], a comparison was made between VanKasteren and Kyoto, evaluating the performance of ET-kNN; the best accuracy was obtained on Kyoto (97.4%) using IG and overlapping activity classes as a feature selection technique.
• The best accuracy results using Kyoto were obtained in [29]: 97.4% when the IG feature selection technique was applied and 96.2% when it was not. This significant result is also due to the use of data balancing in this proposal through the over-sampling approach, using SMOTE [128]. Although in [73] the combination of the feature selection techniques Consistency Subset Eval (Best First Search) – CsubE and Chi-squared Attribute Evaluation (Ranker Threshold 1) – Ch2AE was applied, the results obtained did not exceed those previously mentioned.
• The most commonly used classifiers in the analysed studies are: LCCRF (with two variations according to the way the activities were evaluated: 1. Single Model for a Single Activity – SMSA and 2. Single Model for All Activities – SMAA), SCCRF, HMM, DCP, ET-kNN (with and without IG) and SVM.

C. CASAS-ARUBA DATASET ANALYSIS
Table 9 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to Aruba. From this, the following analysis was made:
• In [73], a comparison was made between Aruba and Kyoto, evaluating the performance of the SVM classifier (with both non-graphical and graphical features); the best accuracy was obtained for Aruba, at 93.4%, with graphical features.
• In the analysed studies, there is no evidence of the application of noise elimination techniques or addition of instances when using Aruba.
• Only two proposals use feature representation: [107] uses the number of times that sensors were activated during an activity and [121] uses the Last-state (Ls) representation; the better accuracy of the two was obtained with the former, at 91.4%.
• Different feature selection techniques were used: IG in [118] and [119], PCA in [107], CsubE and Ch2AE in [73], and Activity Features Maintain the Statistical Information about the Activities – AFMSI-Act (Mutual Information, Frequency of triggered sensors of an activity, Interval time and Last two sensors) in [67]. This last technique improved the accuracy of the classifier, achieving 100% in AR.
• Regarding the distribution of data for training and testing, although most of the studies used 10-fold


cross-validation, the best result was obtained in [67], with 80% of the data used for training and 20% for testing.
• The analysed studies use different segmentation techniques: Dynamic Window (D-win) or Adaptive Window (A-win), which is based on the number of activated sensors, Segmentation of Activities (Seg-oA), and fixed-size windows based on time. The highest accuracy (100%) was obtained with an innovative segmentation technique called the Adaptive windowing approach [67], which is divided into two phases (off-line modelling and on-line recognition). In the former phase, a representation called Activity Features (AFs) is built from statistical information about the activities in annotated sensory data, and an NB classifier is modelled accordingly. In the second phase, a dynamic multi-feature windowing approach using the AFs and the learned NB classifier is introduced to segment unlabelled sensor data and to predict the related activity, as indicated in [67].
• The most used classifiers in the analysed studies are: MLR, ET-kNN, NB, HMM, CRF and SVM.

TABLE 10. CASAS-multiresident dataset evaluation.
TABLE 11. UCI-har dataset evaluation.

D. CASAS-MULTIRESIDENT DATASET ANALYSIS
Table 10 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to the Multiresident ADL Activities (testbed: Kyoto) dataset. From this, the following analysis was made:
• In the analysed studies there is no evidence of the application of segmentation techniques, noise elimination, or balancing or addition of instances when using the Multiresident ADL Activities dataset.
• Regarding feature representation, Change-point (Cp) was used in [122], reaching an accuracy of 70.3%.
• Both [122] and [66] used cross-validation, the first with k-folds = 10 and the second with k-folds = 3; the best accuracy was obtained in the latter.
• The best accuracy was 81.5% in [66], using the NB classifier and selecting features manually.

E. UCI-HAR DATASET ANALYSIS
Table 11 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to UCI-HAR. From this, the following analysis was made:
• In the analysed studies there is no evidence of the application of noise elimination techniques or of the addition or balancing of instances when using UCI-HAR.
• Regarding segmentation, in [124] a sliding window of 128 samples was established, and in [123] a window of raw signals is used, but no further details are given. Both studies used feature representation techniques: [124] used scaling to obtain z-scores from values lying between -1 and 1, reaching an accuracy of 91.76%, and [123] used a raw-signal representation, reaching 90.5%.
• The only study that explicitly indicates the distribution of the dataset for training and testing in the experimentation process is [124], where 10-fold cross-validation was used, with 70% of the data for training (21 subjects) and 30% for testing (9 subjects), reaching an accuracy of 91.76%.
• In [75], different kernel approaches for the SVM classifier were analysed (Compressive single RBF kernel – Cs-RBFk, Compressive single Laplacian kernel – Cs-Lk, Compressive single sigmoid kernel – Cs-Sk, Compressive uniform multi-kernel – Cu-mk, Compressive uniform multi-RBF-kernel – Cu-mRBFk, Compressive alignment-based multi-kernel – Cab-mk, Compressive alignment-based multi-RBF-kernel – Cab-mRBFk, Compressive SNR-based multi-kernel – C-SNRb-mk and Compressive SNR-based multi-RBF-kernel – C-SNRb-mRBFk). When comparing their performance on mHealth and UCI-HAR, the highest accuracy (91.4%) was obtained when applying SVM C-SNRb-mk on UCI-HAR.
• The best accuracy achieved with UCI-HAR was 92.6%, using the Multiple HMMs classifier with MOT and a kNN ensemble proposed in [78], without applying feature selection techniques.
• The most used classifiers in the analysed studies are: SVM and HMM.
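Study [124] segments UCI-HAR into 128-sample sliding windows and scales the resulting features to z-scores. A minimal NumPy sketch of both steps follows. It assumes a 50% overlap between consecutive windows (a step of 64 samples), which the text above does not state, and uses random numbers in place of the real recordings; the function names are hypothetical. Estimating the z-score statistics on the training split alone keeps test information from leaking into the scaling.

```python
import numpy as np

def sliding_windows(signal, size=128, step=64):
    """Cut a (n_samples, n_channels) signal into fixed-size windows.

    step=64 with size=128 gives 50% overlap (an assumption; [124] states only the
    128-sample size). Requires len(signal) >= size; a trailing partial window is dropped.
    """
    starts = range(0, len(signal) - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])

def zscore_fit(features):
    """Per-feature mean and standard deviation; fit on the training split only."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant features
    return mu, sigma

def zscore_apply(features, mu, sigma):
    """Standardise features with statistics estimated on the training split."""
    return (features - mu) / sigma

# Synthetic stand-in for a 3-axis accelerometer stream (random values, not UCI-HAR data).
rng = np.random.default_rng(0)
signal = rng.normal(size=(1000, 3))
windows = sliding_windows(signal)   # shape (14, 128, 3)
features = windows.mean(axis=1)     # toy per-window feature: mean of each channel
mu, sigma = zscore_fit(features)    # fitted on all windows here only for brevity
scaled = zscore_apply(features, mu, sigma)
print(windows.shape)  # (14, 128, 3)
```

With 1000 samples, window starts run from 0 to 832 in steps of 64, giving 14 full windows; the final 104 samples that do not fill a window are discarded.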


TABLE 12. Opportunity dataset evaluation.

F. OPPORTUNITY DATASET ANALYSIS
Table 12 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to Opportunity. From this, the following analysis was made:
• In the analysed studies there is no evidence of the application of noise elimination techniques when using Opportunity.
• The only study that explicitly claims to use segmentation techniques is [16], with a sliding window of 2 seconds (1 second overlap), reaching an accuracy of 99.0% when using a hybrid model based on ER2RP and a classifier trained with DMA.
• Regarding feature representation, in [125] the raw sensor data was compressed, which made it possible to reach an accuracy of 94.0% ± 2.0 using a 3SC with SVM.
• Only two studies, [127] and [28], applied feature selection techniques: IG was used in the former, reaching an accuracy of 96.3% with the RF classifier, and SAE was used in the latter, achieving an accuracy of 99.9% with a classifier based on a DL technique.
• Regarding the balancing or addition of instances, in [126] the instances were randomly selected for each fold among the four inhabitants, which yielded an accuracy of 92.7% ± 1.3 using a classifier that combined MLP, SVM and BC. A better result was achieved in [125], where the Instance Reassignment technique was applied (avoiding class imbalance within each group and also reducing the number of classes per group), which made it possible to reach an accuracy of 94.0% ± 2.0 using a 3SC classifier with SVM.
• The best accuracy was 99.9% in [28], using a classifier based on a DL technique. The approach proposed in [28] first applied non-negative matrix factorization (NMF) to project the data into a new reduced space, in order to find a better activity representation and to increase discrimination capacity. Then, features were automatically extracted from the projected data using SAE. For classification, a softmax classifier was built on the top hidden layer of the SAE.
• In [127], the Smartphone Activity Recognition (AR) [129] and Opportunity datasets were compared, while in [28] the Berkeley [30], USC [31], HMD [32], UTD MHAD [33], Sport [34] and Opportunity datasets were compared; in both studies Opportunity showed better results.
• The most used classifiers in the analysed studies are: SVM, RF, DT, NB and MLP.

TABLE 13. mHEALTH dataset evaluation.

G. mHEALTH DATASET ANALYSIS
Table 13 presents the evaluation of the pre-processing and classification techniques, as well as the quality metrics, of each of the proposals applied to mHealth. From this, the following analysis was made:
• In the analysed studies there is no evidence of the application of noise elimination techniques, feature representation or selection techniques, balancing or addition of instances, or of the distribution of the dataset for training and testing in the experimentation process with mHealth.
• In [82], sliding window segmentation with a size of 60 samples was applied, obtaining an accuracy of 91.9% when using the CNN classification technique.


TABLE 14. Classification techniques with better accuracy by dataset. CASAS-Multiresident, there is no evidence of the application
of segmentation techniques.
Regarding feature representation, most of the studies anal-
ysed for VanKasteren use the last-fired technique, obtaining
the highest accuracy (96.7% ± 2.6) in [2]. In CASAS-Kyoto
and mHealth there is no evidence of the application of fea-
ture representation techniques. In CASAS-Aruba, CASAS-
Multiresident, UCI-HAR and Opportunity, although some
techniques have been used (number of times that sensors
were activated, Last-state, Change-point, z-scores and raw
signals), this topic should be addressed with much greater
• The highest accuracy was obtained in [81] at 97.2%, and depth since the accuracy obtained with its application has not
was achieved by applying a Hierarchical Classification been as significant.
Method (HCM) based on the combination of binary Although different feature selection techniques were
classifiers. applied in VanKasteren (SAE, IG and PCA), the highest
• The most used classifiers in the analysed studies are accuracy (96.7% ± 2.6) was obtained without the application
SVM and CNN. of these techniques. For CASAS-Kyoto, a combination of
CsubE and Ch2AE techniques was applied with low results
VII. DISCUSSION and IG, with the highest accuracy at 97.4%. For CASAS-
To address the discussion, Table 14 condenses the classifica- Aruba, different feature selection techniques were used (IG,
tion techniques that generated the highest accuracy for each PCA and CsubE + Ch2AE) with low results, the highest
dataset. We note that the datasets correspond to different con- accuracy (100%) was obtained with statistical information
texts and purposes and therefore the evaluation and methods about the activities. For UCI-HAR, different techniques were
are not directly comparable between them. That is, while applied (PCA, RF, LDA and Correlation), however, the high-
VanKasteren, CASAS-Kyoto, CASAS-Aruba and CASAS- est accuracy (92.6%) was achieved without the application of
Multiresident contain data captured with WSN, environmen- any technique. With Opportunity, the IG and SAE techniques
tal sensors and wearables from the interactions of one or have been used, with the latter obtaining the highest accuracy
several inhabitants in indoor environments, the UCI-HAR, at 99.9%. For CASAS-Multiresident, the feature selection
Opportunity and mHealth datasets contain data captured was carried out manually, reaching the maximum accuracy
from wearables or smartphones not necessarily in indoor (81.5%) and for the mHealth dataset no feature selection
environments. The discussion focuses on the use of different techniques in each dataset (segmentation, feature representation, feature selection, balancing or addition of instances, and distribution of datasets for experimentation) in order to support researchers' decisions when choosing datasets for ARS evaluation.

Different segmentation techniques have been applied to the VanKasteren and CASAS-Kyoto datasets in almost all the proposals. In VanKasteren, time-based techniques with 60-second sliding windows and AR-SPM predominate, and the maximum accuracy (96.7% ± 2.6) was reached with the former. In CASAS-Kyoto, the maximum accuracy (97.4%) was reached with AR-SPM. These results give a clear idea of which segmentation techniques are most appropriate for pre-processing the data-streaming windows of these two datasets. In UCI-HAR [124] and mHealth [82], sample-based sliding windows (128 and 60 samples, respectively) have been applied, reaching accuracies of 91.76% and 91.9%; these are not the maximum accuracies achieved on these datasets, so the application of segmentation techniques remains a topic to explore for both. There is only one reference applying segmentation to Opportunity [16], which uses a 2-second sliding window with 1 second of overlap and reaches an accuracy of 99.0%. Although this is not the maximum accuracy reached on this dataset, it is a very good reference for future research. In the case of

technique was applied. Applying techniques such as SAE, statistical information about the activities, IG and PCA, among others, to the datasets in which few or no feature selection techniques have been implemented is an important challenge to be addressed.

In VanKasteren, balancing or addition of instances reached the maximum accuracy (96.7% ± 2.6) by manually adding instances of the abnormal activities; none of the other techniques applied (SMOTE, McSI and an algorithmic approach) exceeded this value. In CASAS-Kyoto only SMOTE was applied, reaching the highest accuracy of 97.4%. In Opportunity, several instance addition techniques were applied (instances randomly selected for each fold, and Instance Reassignment); however, the best accuracy (99.9%) was obtained without applying any of them. There is no evidence of balancing or addition of instances in the tests carried out with CASAS-Aruba, CASAS-Multiresident, UCI-HAR and mHealth. Techniques for adding or balancing instances therefore do not appear to have been studied in depth in this field of research, and some studies prefer to add instances manually. This practice is debated: adding instances could misrepresent the data actually captured, although, for this type of study, adding instances per activity category could help avoid classifier bias.
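The sample-based sliding windows mentioned for UCI-HAR (128 samples), mHealth (60 samples) and Opportunity (2 s with 1 s overlap) can be illustrated with a minimal sketch. It assumes the stream is a NumPy array of shape (n_samples, n_channels) with one integer activity label per sample, and it labels each window by majority vote; the function name `sliding_windows`, the majority-vote labelling and the 50 Hz example stream are illustrative assumptions, not the procedure of any cited proposal.

```python
import numpy as np


def sliding_windows(signal, labels, window, step):
    """Split a multichannel stream into fixed-size, possibly overlapping windows.

    signal: array of shape (n_samples, n_channels)
    labels: non-negative integer activity label for each sample
    window: window length in samples (e.g. 128 or 60)
    step:   hop between consecutive window starts; step < window overlaps
    Each window is labelled with the most frequent per-sample label inside it
    (ties go to the smallest label, since argmax returns the first maximum).
    """
    signal = np.asarray(signal)
    labels = np.asarray(labels)
    starts = list(range(0, len(signal) - window + 1, step))
    if not starts:  # stream shorter than one window
        return np.empty((0, window) + signal.shape[1:]), np.empty(0, dtype=int)
    windows = np.stack([signal[s:s + window] for s in starts])
    window_labels = np.array(
        [np.bincount(labels[s:s + window]).argmax() for s in starts]
    )
    return windows, window_labels


# Hypothetical 10 s stream sampled at 50 Hz: 3 channels, two activities.
rng = np.random.default_rng(0)
stream = rng.normal(size=(500, 3))
activity = np.repeat([0, 1], 250)

# A 2 s window with 1 s overlap at 50 Hz is 100 samples with a 50-sample step.
X, y = sliding_windows(stream, activity, window=100, step=50)
print(X.shape, y.tolist())  # (9, 100, 3) [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

For a sensor sampled at f Hz, a window of w seconds with o seconds of overlap corresponds to `window = w * f` and `step = (w - o) * f` samples; time-based windows over event-driven binary sensors (such as the 60-second windows used in VanKasteren) would instead group events by timestamp.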

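SMOTE, the balancing technique reported for CASAS-Kyoto and VanKasteren, creates synthetic minority-class instances by interpolating between a minority sample and one of its k nearest minority-class neighbours. The NumPy sketch below illustrates that idea for a single minority class; it is not the implementation used in the reviewed proposals, and the function name `smote_oversample`, the default `k=5` and the toy data are illustrative assumptions.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic samples for one minority class, SMOTE-style.

    X_min: array of shape (n_min, n_features) containing only minority samples.
    Each synthetic sample lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n_min = len(X_min)
    if n_min < 2:
        raise ValueError("at least two minority samples are required")
    k = min(k, n_min - 1)

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each sample as its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n_min, size=n_new)      # seed sample per new point
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Hypothetical minority class (e.g. a rare activity) with four 2-D feature vectors.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(minority, n_new=6, k=2, seed=42)
print(synthetic.shape)  # (6, 2)
```

With `k=2`, each corner's nearest neighbours are the two adjacent corners, so every synthetic point falls on an edge of the unit square. For real experiments, a maintained implementation such as `SMOTE` in imbalanced-learn (`imblearn.over_sampling`) handles multiple classes and sampling strategies.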


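Leave-one-day-out cross-validation, reported for the VanKasteren and CASAS-Kyoto evaluations, holds out all instances of one recorded day as the test set, trains on the remaining days, and repeats this for every day. A minimal sketch of the split generation follows, assuming each instance carries a day identifier; the function name `leave_one_day_out` and the `day_ids` argument are hypothetical, and the behaviour corresponds to scikit-learn's `LeaveOneGroupOut` with days used as groups.

```python
import numpy as np


def leave_one_day_out(day_ids):
    """Yield (train_idx, test_idx) pairs, one fold per distinct day.

    day_ids: sequence with the recording day of each instance
    (e.g. a date string or a day index). All instances of the held-out
    day form the test set; every other instance is used for training.
    """
    day_ids = np.asarray(day_ids)
    for day in np.unique(day_ids):  # days are visited in sorted order
        test_mask = day_ids == day
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical instances recorded over three days.
days = ["d1", "d1", "d2", "d3", "d3", "d3"]
for train_idx, test_idx in leave_one_day_out(days):
    print(test_idx.tolist(), train_idx.tolist())
# [0, 1] [2, 3, 4, 5]
# [2] [0, 1, 3, 4, 5]
# [3, 4, 5] [0, 1, 2]
```

Unlike plain k-fold cross-validation, which can place consecutive, highly correlated windows from the same day in both the training and test sets, this split keeps each day entirely on one side.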

The distribution of data for training and testing in the experimental scenarios of the proposals was as follows: leave-one-day-out cross-validation for VanKasteren and CASAS-Kyoto; 10-fold cross-validation for CASAS-Aruba and UCI-HAR; 3- or 10-fold cross-validation for CASAS-Multiresident; and 5- or 10-fold cross-validation for Opportunity. No such distribution was indicated for mHealth.

Additionally, work must be done on the dynamic identification of window sizes according to the type of activity. Another approach that contributes considerably to this process is the use of stacked autoencoders for automatic feature extraction.

A relevant methodological issue is that the vast majority of studies have been carried out in recreated experimental scenarios with labelled activities. These generally yield very good accuracy rates and are adequate as a step prior to real-time implementation. On the one hand, approaches without real-time capabilities are useful for evaluating the daily activities of inhabitants over the long term, as part of the early diagnosis of mental diseases [53]. Examples of health applications for this purpose are analysing disturbed sleep cycles, which have become an indicator of mental diseases such as Alzheimer's disease, and identifying changes in activity patterns, which are related to cognitive or physical decline. On the other hand, recognizing activities in real time undoubtedly raises a series of pre-processing issues, in particular in feature representation and segmentation, which require advanced methods that process data with little delay and high reliability [67]. However, performing activity recognition close to real time allows smart environments to provide valuable short-term interaction with the user: for example, promptly notifying the user of something forgotten while carrying out an activity, such as the umbrella on a rainy day or a dose of medication, or preventing household risks for patients with dementia when they handle appliances, such as forgetting to turn off the oven.

VIII. CONCLUSION
The purpose of this article has been to propose a series of recommendations to HAR researchers on identifying the most appropriate dataset for their type of research. Research in this area has grown over the last fifteen years, from only one research article in 2003 to 85 research articles in 2017. The most representative specialised database, in terms of the number of scientific publications on ADL, is IEEE Xplore, with 66% of the publications in total (247), surpassing the combined results obtained from the Scopus, Science Direct, Web of Science and ACM databases. It is noteworthy that the majority of publications in this field of research are proceedings or conference papers (72% of publications), while a representative percentage (27%) are journal articles. As for publication quartile, although the highest percentage of publications appears in non-categorised proceedings or conferences (71%), a very important percentage (27%) appears in first- and second-quartile journals. The USA stands out as the country that accepts the largest number of publications in this area of research, with 31.8% of all publications worldwide, followed by the Netherlands with 13.1%. In terms of the number of publications produced, the USA and the UK stand out most, with 13.4% of publications each, while Italy is also representative, with 11.2%. Although the venues in which research results in this area are published are very diverse, the most outstanding are the journal Pervasive and Mobile Computing (Netherlands), with 2.6%, and the book series Lecture Notes in Computer Science (Germany), with the same percentage. The institutions with the most experience in this field are Ulster University (Northern Ireland, U.K.), with 4.8% of the publications worldwide, and WSU (USA), with 3.2%.

The technical analysis outlined in this paper has made it possible to identify the WSU CASAS repository as the most used, as it is referenced in 12.5% of the papers consulted, in particular through the following datasets: Tokyo, Aruba, Tulum, DOMUS, Real-Time Smart Home Stats, Single-resident apartment data and Kyoto Multiresident ADL Activities. Additionally, the VanKasteren (5.7%), UCI HAR (2.8%) and Opportunity (2.3%) datasets have shown significant representativeness in terms of usage, which allowed us to answer the hypothesis initially raised. A more detailed characterisation of the datasets referenced in the publications was also made in terms of type of event (activities: 35.3%), occupation (single: 42.2%), annotation (annotated: 60.4%) and sensing modality (environment sensors: 46.3%).

In addition, it has been possible to identify that the vast majority of the papers reviewed (61%) use a classifier based on DDA. Most of the papers refer to several classification techniques in order to compare quality metrics. Accordingly, 41.7% of the references to classifiers correspond to MM and 17.4% to the SVM classifier, while IBL and BC are each mentioned in 15.8% of the papers consulted and DT in 9.6%.

The best activity recognition results have been achieved using MLC, which combines several individual classifiers, as evidenced by [28], [29], [78], and [81]. In addition, the use of RNN (which should be extended to deep neural networks) considerably improves hit rates. On the other hand, proposal [67] obtained a particularly high result because its Activity parameter maintains statistical information about the activities (through Mutual Information, Frequency of triggered sensors of an activity, Interval time and Last two sensors) and because of its innovative Adaptive windowing approach.

Regarding the processing of datasets with multi-occupancy, it is necessary to implement techniques that automatically select feature values, with the challenges that
this entails. However, genetic algorithms could be a key element in solving this challenge. Complex activity recognition (cooperative, parallel and individual) is a non-trivial problem because of the conflicts that arise when capturing the data generated by the inhabitants' interactions. A rule-based approach could provide solutions for managing such conflicts, taking into account the spatial and temporal location of the inhabitants.

IX. FUTURE WORKS
In future work, we propose to evaluate the recognition capacity of an ARS for complex activities by comparing the quality metrics obtained. In a first phase, we will recreate experimental scenarios, which will be validated with our own multi-occupancy dataset and other benchmark datasets. In a second phase, the aim is to provide real-time capabilities. In addition, we will focus on implementing techniques for re-labelling (unidentified) activities and on recognizing activities with a multi-level classifier approach that integrates 1) genetic algorithms for feature selection and 2) Growing Hierarchical Self-Organizing Maps for classification, based on the proposals of [130]–[132].

ACKNOWLEDGMENTS
The authors would like to thank the team members (Miguel Ortiz, Sandra De-la-Hoz, Guillermo Rodriguez, Zhoe Comas, Fabio Mendoza, Dionicio Neira and Andrés Sanchez) and collaborators (Juan De-la-Hoz, Alejandro De-la-Hoz, Mario Orozco, Ernesto Esmeral, Nahum De Ávila, Walter Santiago and Jans Patiño).

REFERENCES
[1] D. A. Umphred, Neurological Rehabilitation, vol. 27. Amsterdam, The Netherlands: Elsevier, no. 5, 2013.
[2] D. Arifoglu and A. Bouchachia, ‘‘Activity recognition and abnormal behaviour detection with recurrent neural networks,’’ Procedia Comput. Sci., no. 110, pp. 86–93, Jul. 2017, doi: 10.1016/j.procs.2017.06.121.
[3] M. S. Albert et al., ‘‘The diagnosis of mild cognitive impairment due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease,’’ Alzheimer’s Dementia, vol. 7, no. 3, pp. 270–279, 2011, doi: 10.1016/j.jalz.2011.03.008.
[4] F. E. Mendoza et al., ‘‘Cardiovascular disease analysis using supervised and unsupervised data mining techniques,’’ J. Softw., vol. 12, no. 2, pp. 81–90, Feb. 2017, doi: 10.17706/jsw.12.2.81-90.
[5] Alzheimer’s Association, ‘‘2013 Alzheimer’s disease facts and figures,’’ Alzheimer’s Dementia, vol. 9, no. 2, pp. 208–245, Mar. 2013, doi: 10.1016/j.jalz.2013.02.003.
[6] T. L. M. van Kasteren, G. Englebienne, and B. J. A. Kröse, ‘‘Human activity recognition from wireless sensor network data: Benchmark and software,’’ in Activity Recognition in Pervasive Intelligent Environments, vol. 4. Paris, France: Atlantis Press, 2011, pp. 165–186, doi: 10.2991/978-94-91216-05-3_8.
[7] J. Ye, S. Dobson, and S. McKeever, ‘‘Situation identification techniques in pervasive computing: A review,’’ Pervasive Mobile Comput., vol. 8, no. 1, pp. 36–66, 2012, doi: 10.1016/j.pmcj.2011.01.004.
[8] A. Aztiria, J. C. Augusto, R. Basagoiti, A. Izaguirre, and D. J. Cook, ‘‘Learning frequent behaviours of the users in intelligent environments,’’ J. Ambient Intell. Smart Environ., vol. 2, no. 4, pp. 435–436, 2010, doi: 10.3233/AIS-2010-0084.
[9] M. Ghazvininejad, H. R. Rabiee, N. Pourdamghani, and P. Khanipour, ‘‘HMM based semi-supervised learning for activity recognition,’’ in Proc. ACM Int. Workshop Situation Activity Goal Awareness, Sep. 2011, pp. 95–100, doi: 10.1145/2030045.2030065.
[10] P. Lasitha and S. Kodagoda, ‘‘Gaussian mixture based HMM for human daily activity recognition using 3D skeleton features,’’ in Proc. IEEE 8th Conf. Ind. Electron. Appl. (ICIEA), Jul. 2013, pp. 567–572, doi: 10.1109/ICIEA.2013.6566433.
[11] N. Oliver, E. Horvitz, and A. Garg, ‘‘Layered representations for human activity recognition,’’ in Proc. 4th IEEE Int. Conf. Multimodal Interfaces, Oct. 2002, pp. 3–8, doi: 10.1109/ICMI.2002.1166960.
[12] S. Mostafa-Al-Masum, H. Keikichi, and I. Mitsuru, ‘‘Recognition of real-world activities from environmental sound cues to create life-log,’’ in The Systemic Dimension of Globalization. Rijeka, Croatia: InTech, 2011, pp. 173–190, doi: 10.5772/22491.
[13] G. Singla, D. J. Cook, and M. Schmitter-Edgecombe, ‘‘Recognizing independent and joint activities among multiple residents in smart environments,’’ J. Ambient Intell. Humanized Comput., vol. 1, no. 1, pp. 57–63, 2010, doi: 10.1007/s12652-009-0007-1.
[14] L. Young-Seol and C. Sung-Bae, ‘‘Activity recognition using hierarchical hidden Markov models on a smartphone with 3D accelerometer,’’ in Proc. Int. Conf. Hybrid Artif. Intell. Syst., in Lecture Notes in Computer Science, vol. 6678. Berlin, Germany: Springer, 2011, pp. 460–467, doi: 10.1007/978-3-642-21219-2_58.
[15] A. M. Mannini and A. Sabatini, ‘‘Machine learning methods for classifying human physical activity from on-body accelerometers,’’ Sensors, vol. 10, no. 2, pp. 1154–1175, 2010, doi: 10.3390/s100201154.
[16] H. Gjoreski, M. Gams, and M. Luštrek, ‘‘Human activity recognition: From controlled lab experiments to competitive live evaluation,’’ in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Nov. 2015, pp. 139–145, doi: 10.1109/ICDMW.2015.29.
[17] H. Gjoreski, M. Gams, and M. Luštrek, ‘‘Context-based fall detection and activity recognition using inertial and location sensors,’’ J. Ambient Intell. Smart Environ., vol. 6, no. 4, pp. 419–433, 2014, doi: 10.3233/AIS-140268.
[18] H. H. Manap, N. M. Tahir, and A. I. M. Yassin, ‘‘Anomalous gait detection based on support vector machine,’’ in Proc. IEEE Int. Conf. Comput. Appl. Ind. Electron., Dec. 2011, pp. 623–626, doi: 10.1109/ICCAIE.2011.6162209.
[19] H. Gjoreski, B. Kaluža, M. Gams, M. Radoje, and M. Luštrek, ‘‘Context-based ensemble method for human energy expenditure estimation,’’ Appl. Soft Comput., vol. 37, pp. 960–970, Dec. 2015, doi: 10.1016/j.asoc.2015.05.001.
[20] M. Altini, J. Penders, R. Vullers, and O. Amft, ‘‘Estimating energy expenditure using body-worn accelerometers: A comparison of methods, sensors number and positioning,’’ IEEE J. Biomed. Health Inform., vol. 19, no. 1, pp. 219–226, Jan. 2015, doi: 10.1109/JBHI.2014.2313039.
[21] M. Gjoreski, H. Gjoreski, M. Luštrek, and M. Gams, ‘‘Automatic detection of perceived stress in campus students using smartphones,’’ in Proc. Int. Conf. Intell. Environ., Jul. 2015, pp. 132–135, doi: 10.1109/IE.2015.27.
[22] H. Alemdar, C. Tunca, and C. Ersoy, ‘‘Daily life behaviour monitoring for health assessment using machine learning: Bridging the gap between domains,’’ Pers. Ubiquitous Comput., vol. 19, no. 2, pp. 303–315, Feb. 2015, doi: 10.1007/s00779-014-0823-y.
[23] D. Cook, A. S. Crandall, B. L. Thomas, and N. C. Krishnan, ‘‘CASAS: A smart home in a box,’’ Computer, vol. 46, no. 7, pp. 62–69, Jul. 2013, doi: 10.1109/MC.2012.328.
[24] R. Chavarriaga et al., ‘‘The opportunity challenge: A benchmark database for on-body sensor-based activity recognition,’’ Pattern Recognit. Lett., vol. 34, no. 15, pp. 2033–2042, Nov. 2013, doi: 10.1016/j.patrec.2012.12.014.
[25] N. Kawaguchi et al., ‘‘HASC Challenge: Gathering large scale human activity corpus for the real-world activity understandings,’’ in Proc. 2nd Augmented Hum. Int. Conf. AH, 2011, pp. 1–5, doi: 10.1145/1959826.1959853.
[26] B. Kaluža, S. Kozina, and M. Luštrek, ‘‘The activity recognition repository: Towards competitive benchmarking in ambient intelligence,’’ in Proc. AAAI Activity Context Represent., Techn. Lang., Jan. 2012, pp. 44–47.
[27] H. Gjoreski et al., ‘‘Competitive live evaluations of activity-recognition systems,’’ IEEE Pervasive Comput., vol. 14, no. 1, pp. 70–77, Jan./Mar. 2015, doi: 10.1109/MPRV.2015.3.
[28] B. Chikhaoui and F. Gouineau, ‘‘Towards automatic feature extraction for activity recognition from wearable sensors: A deep learning approach,’’ in Proc. IEEE Int. Conf. Data Mining Workshops (ICDMW), New Orleans, LA, USA, Nov. 2017, pp. 693–702, doi: 10.1109/ICDMW.2017.97.
[29] L. G. Fahad, S. F. Tahir, and M. Rajarajan, ‘‘Feature selection and data balancing for activity recognition in smart homes,’’ in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2015, pp. 512–517, doi: 10.1109/ICC.2015.7248373.
[30] F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, ‘‘Berkeley MHAD: A comprehensive multimodal human action database,’’ in Proc. IEEE Workshop Appl. Comput. Vis. (WACV), Jan. 2013, pp. 53–60, doi: 10.1109/WACV.2013.6474999.
[31] M. Zhang and A. A. Sawchuk, ‘‘USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors,’’ in Proc. ACM Conf. Ubiquitous Comput. (UbiComp), Sep. 2012, pp. 1036–1043, doi: 10.1145/2370216.2370438.
[32] B. Bruno, F. Mastrogiovanni, A. Sgorbissa, T. Vernazza, and R. Zaccaria, ‘‘Analysis of human behavior recognition algorithms based on acceleration data,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2013, pp. 1602–1607, doi: 10.1109/ICRA.2013.6630784.
[33] C. Chen, R. Jafari, and N. Kehtarnavaz, ‘‘UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 168–172, doi: 10.1109/ICIP.2015.7350781.
[34] K. Altun, B. Barshan, and O. Tunçel, ‘‘Comparative study on classifying human activities with miniature inertial and magnetic sensors,’’ Pattern Recognit., vol. 43, no. 10, pp. 3605–3620, Oct. 2010, doi: 10.1016/j.patcog.2010.04.019.
[35] M. Elhoushi, J. Georgy, A. Noureldin, and M. J. Korenberg, ‘‘A survey on approaches of motion mode recognition using sensors,’’ IEEE Trans. Intell. Transp. Syst., vol. 18, no. 7, pp. 1662–1686, Jul. 2017, doi: 10.1109/TITS.2016.2617200.
[36] X. Yang and J. Lianwen, ‘‘A naturalistic 3D acceleration-based activity dataset & benchmark evaluations,’’ in Proc. IEEE Int. Conf. Syst., Man Cybern., Oct. 2010, pp. 4081–4085, doi: 10.1109/ICSMC.2010.5641790.
[37] J. M. Alcalá, J. Ureña, A. Hernández, and D. Gualda, ‘‘Assessing human activity in elderly people using non-intrusive load monitoring,’’ Sensors, vol. 17, no. 2, p. 351, Feb. 2017, doi: 10.3390/s17020351.
[38] G. Chen, A. Wang, S. Zhao, L. Liu, and C.-Y. Chang, ‘‘Latent feature learning for activity recognition using simple sensors in smart homes,’’ Multimedia Tools Appl., vol. 77, no. 12, pp. 15201–15219, Jun. 2018, doi: 10.1007/s11042-017-5100-4.
[39] T. Nef et al., ‘‘Evaluation of three state-of-the-art classifiers for recognition of activities of daily living from smart home ambient data,’’ Sensors, vol. 15, no. 5, pp. 11725–11740, May 2015, doi: 10.3390/s150511725.
[40] J. P. Zimmermann et al., ‘‘Household electricity survey: A study of domestic electrical product usage,’’ Intertek, London, U.K., Tech. Rep. R66141, May 2012, p. 600.
[41] J. Kelly and W. Knottenbelt, ‘‘The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes,’’ Sci. Data, vol. 2, Mar. 2015, Art. no. 150007, doi: 10.1038/sdata.2015.7.
[42] B. Kitchenham, O. P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, ‘‘Systematic literature reviews in software engineering – A systematic literature review,’’ Inf. Softw. Technol., vol. 51, no. 1, pp. 7–15, Jan. 2009, doi: 10.1016/j.infsof.2008.09.009.
[43] C. Manterola, P. Astudillo, E. Arias, and N. Claros, ‘‘Systematic reviews of the literature: What should be known about them,’’ Cirugía Española, vol. 91, no. 3, pp. 149–155, Mar. 2013, doi: 10.1016/j.ciresp.2011.07.009.
[44] L. García-Pérez et al., ‘‘Systematic review of health-related utilities in Spain: the case of mental health,’’ Gaceta Sanitaria, vol. 28, no. 1, pp. 77–83, May 2013, doi: 10.1016/j.gaceta.2013.04.006.
[45] C. A. Merlano-Porras and L. Gorbanev, ‘‘Health system in Colombia: A systematic review of literature,’’ Revista Gerencia Políticas Salud, vol. 12, no. 24, pp. 74–86, Jan./Jun. 2013.
[46] A. Sanchez, D. Neira, and J. J. Cabello, ‘‘Frameworks applied in quality management – A systematic review,’’ Rev. Espacios, vol. 37, no. 9, p. 17, Jan. 2016. [Online]. Available: http://www.revistaespacios.com/a16v37n09/16370917.html
[47] M. E. Grams et al., ‘‘Validation of CKD and related conditions in existing data sets: A systematic review,’’ Amer. J. Kidney Diseases, vol. 57, no. 1, pp. 44–54, Jan. 2011, doi: 10.1053/j.ajkd.2010.05.013.
[48] C. Nugent et al., ‘‘An initiative for the creation of open datasets within pervasive healthcare,’’ in Proc. 10th EAI Int. Conf. Pervasive Comput. Technol. Healthcare, Cancun, Mexico, May 2016, pp. 318–321, doi: 10.4108/eai.16-5-2016.2263830.
[49] S. K. Das and D. J. Cook, ‘‘Designing smart environments: A paradigm based on learning and prediction,’’ in Pattern Recognition and Machine Intelligence (Lecture Notes in Computer Science), vol. 3776, S. K. Pal, S. Bandyopadhyay, and S. Biswas, Eds. Berlin, Germany: Springer, 2005, pp. 80–90, doi: 10.1007/11590316_11.
[50] ICPSR dataset. Inst. Social Res., Univ. Michigan, Ann Arbor, MI, USA. Accessed: Jul. 31, 2018. [Online]. Available: https://www.icpsr.umich.edu/icpsrweb/content/about
[51] IRBS. International Review Boards. Accessed: Jul. 31, 2018. [Online]. Available: https://www.icpsr.umich.edu/icpsrweb/ICPSR/irb/index.jsp
[52] N. D. Rodríguez, M. P. Cuéllar, J. Lilius, and M. D. Calvo-Flores, ‘‘A fuzzy ontology for semantic modelling and recognition of human behaviour,’’ Knowl.-Based Syst., vol. 66, pp. 46–60, Aug. 2014, doi: 10.1016/j.knosys.2014.04.016.
[53] F. J. Quesada, F. Moya, J. Medina, L. Martínez, C. Nugent, and M. Espinilla, ‘‘Generation of a partitioned dataset with single, interleave and multioccupancy daily living activities,’’ in Proc. Int. Conf. Ubiquitous Comput. Ambient Intell. Cham, Switzerland: Springer, 2015, pp. 60–71, doi: 10.1007/978-3-319-26401-1_6.
[54] D. Cook, M. Schmitter-Edgecombe, A. Crandall, C. Sanders, and B. Thomas, ‘‘Collecting and disseminating smart home sensor data in the CASAS project,’’ in Proc. CHI Workshop Developing Shared Home Behav. Datasets Adv. HCI Ubiquitous Comput. Res., 2009, pp. 1–7.
[55] G. Singla, D. J. Cook, and M. Schmitter-Edgecombe, ‘‘Tracking activities in complex settings using smart environment technologies,’’ Int. J. Biosci. Psychiatry Technol., vol. 1, no. 1, pp. 25–35, Jan. 2009.
[56] UCI Machine Learning Repository. Accessed: Jul. 31, 2018. [Online]. Available: https://archive.ics.uci.edu/ml/index.php
[57] T. L. M. van Kasteren, G. Englebienne, and B. J. A. Kröse, ‘‘Activity recognition using semi-Markov models on real world smart home datasets,’’ J. Ambient Intell. Smart Environ., vol. 2, no. 3, pp. 311–325, Aug. 2010, doi: 10.3233/AIS-2010-0070.
[58] D. Cook, ‘‘Learning setting-generalized activity models for smart spaces,’’ IEEE Intell. Syst., vol. 27, no. 1, pp. 32–38, Jan./Feb. 2012, doi: 10.1109/MIS.2010.112.
[59] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, ‘‘A public domain dataset for human activity recognition using smartphones,’’ in Proc. 21st Eur. Symp. Artif. Neural Netw., Comput. Intell. Mach. Learn. (ESANN), Bruges, Belgium, Apr. 2013, pp. 437–442.
[60] C. A. Ronao and S.-B. Cho, ‘‘Human activity recognition using smartphone sensors with two-stage continuous hidden Markov models,’’ in Proc. 10th Int. Conf. Nat. Comput., Aug. 2014, pp. 681–686, doi: 10.1109/ICNC.2014.6975918.
[61] D. Roggen et al., ‘‘Collecting complex activity datasets in highly rich networked sensor environments,’’ in Proc. 7th Int. Conf. Netw. Sens. Syst., Jun. 2010, pp. 233–240, doi: 10.1109/INSS.2010.5573462.
[62] P. Lukowicz et al., ‘‘Recording a complex, multi modal activity data set for context recognition,’’ in Proc. 23rd Int. Conf. Archit. Comput. Syst., Feb. 2010, pp. 1–6. Accessed: Jul. 31, 2018. [Online]. Available: http://www.opportunity-project.eu/challengeDataset
[63] O. Banos et al., ‘‘mHealthDroid: A novel framework for agile development of mobile health applications,’’ in Proc. 6th Int. Workshop Conf. Ambient Assist. Living (IWAAL), Belfast, U.K., Dec. 2014, pp. 91–98, doi: 10.1007/978-3-319-13105-4_14.
[64] A. Shahi, J. D. Deng, and B. J. Woodford, ‘‘A streaming ensemble classifier with multi-class imbalance learning for activity recognition,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 3983–3990.
[65] M. H. Kabir, M. R. Hoque, K. Thapa, and S.-H. Yang, ‘‘Two-layer hidden Markov model for human activity recognition in home environments,’’ Int. J. Distrib. Sensor Netw., vol. 12, no. 1, p. 4560365, 2016, doi: 10.1155/2016/4560365.
[66] H. Fang, R. Srinivasan, and D. J. Cook, ‘‘Feature selections for human activity recognition in smart home environments,’’ Int. J. Innov. Comput., Inf. Control, vol. 8, no. 5B, pp. 3525–3535, May 2012.
[67] A. Shahi, B. J. Woodford, and H. Lin, ‘‘Dynamic real-time segmentation and recognition of activities using a multi-feature windowing approach,’’ in Proc. PAKDD Workshops, Jeju, South Korea, vol. 10526, U. Kang, Ed. Cham, Switzerland: Springer, 2017, pp. 26–38, doi: 10.1007/978-3-319-67274-8_3.
[68] I. Fatima, M. Fahim, Y.-K. Lee, and S. Lee, ‘‘Effects of smart home dataset characteristics on classifiers performance for human activity recognition,’’ in Computer Science and Its Applications (Lecture Notes in Electrical Engineering), vol. 203, S.-S. Yeo, Ed. 2012, pp. 271–281, doi: 10.1007/978-94-007-5699-1_28.
[69] N. Twomey, T. Diethe, I. Craddock, and P. Flach, ‘‘Unsupervised learning of sensor topologies for improving activity recognition in smart environments,’’ Neurocomputing, vol. 234, pp. 93–106, Apr. 2017, doi: 10.1016/j.neucom.2016.12.049.
[70] N. K. Suryadevara, S. C. Mukhopadhyay, R. Wang, and R. K. Rayudu, ‘‘Forecasting the behavior of an elderly using wireless sensors data in a smart home,’’ Eng. Appl. Artif. Intell., vol. 26, no. 10, pp. 2641–2652, 2013, doi: 10.1016/j.engappai.2013.08.004.
[71] K. Amphawan, J. Soulas, and P. Lenca, ‘‘Mining top-k regular episodes from sensor streams,’’ Procedia Comput. Sci., no. 69, pp. 76–85, Nov. 2015, doi: 10.1016/j.procs.2015.10.008.
[72] J. W. Lee, A. Helal, Y. Sung, and K. Cho, ‘‘Context-driven control algorithms for scalable simulation of human activities in smart homes,’’ in Proc. IEEE 10th Int. Conf. Ubiquitous Intell. Comput., Dec. 2013, pp. 285–292, doi: 10.1109/UIC-ATC.2013.68.
[73] S. S. Akter and L. B. Holder, ‘‘Activity recognition using graphical features,’’ in Proc. 13th Int. Conf. Mach. Learn. Appl., Dec. 2014, pp. 165–170, doi: 10.1109/ICMLA.2014.31.
[74] T. R. Bandaragoda, K. M. Ting, D. Albrecht, F. T. Liu, and J. R. Wells, ‘‘Efficient anomaly detection by isolation using nearest neighbour ensemble,’’ in Proc. IEEE Int. Conf. Data Mining Workshop, Dec. 2014, pp. 698–705, doi: 10.1109/ICDMW.2014.70.
[75] T. Chanyaswad, J. M. Chang, and S. Y. Kung, ‘‘A compressive multi-kernel method for privacy-preserving machine learning,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 4079–4086.
[76] J. Wen and Z. Wang, ‘‘Sensor-based adaptive activity recognition with dynamically available sensors,’’ Neurocomputing, vol. 218, pp. 307–317, Dec. 2016, doi: 10.1016/j.neucom.2016.08.077.
[77] D. Acharjee, A. Mukherjee, J. K. Mandal, and N. Mukherjee, ‘‘Activity recognition system using inbuilt sensors of smart mobile phone and minimizing feature vectors,’’ in Microsystem Technologies. Berlin, Germany: Springer-Verlag, 2015, doi: 10.1007/s00542-015-2551-2.
[78] Y.-J. Kim, Y. Kim, J. Ahn, and D. Kim, ‘‘Integrating hidden Markov models based on mixture-of-templates and k-NN2 ensemble for activity recognition,’’ in Proc. 23rd Int. Conf. Pattern Recognit. (ICPR), Dec. 2016, pp. 1636–1641.
[79] B. Bruno, F. Mastrogiovanni, and A. Sgorbissa, ‘‘A public domain dataset for ADL recognition using wrist-placed accelerometers,’’ in Proc. 23rd IEEE Int. Symp. Robot Hum. Interact. Commun., Aug. 2014, pp. 738–743.
[80] O. Banos et al., ‘‘Design, implementation and validation of a novel open framework for agile development of mobile health applications,’’ BioMed. Eng. OnLine, vol. 14, nos. S2–S6, pp. 1–20, 2015.
[81] S. A. Khowaja, B. N. Yahya, and S.-L. Lee, ‘‘Hierarchical classification method based on selective learning of slacked hierarchy for activity recognition systems,’’ Expert Syst. Appl., vol. 88, pp. 165–177, Dec. 2017, doi: 10.1016/j.eswa.2017.06.040.
[82] S. Ha and S. Choi, ‘‘Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2016, pp. 381–388.
[83] J. Saives, C. Pianon, and G. Faraut, ‘‘Activity discovery and detection of behavioral deviations of an inhabitant from binary sensors,’’ IEEE Trans. Autom. Sci. Eng., vol. 12, no. 4, pp. 1211–1224, Oct. 2015, doi: 10.1109/TASE.2015.2471842.
[84] D. I. Kim and E. Martinson, ‘‘Human centric spatial affordances for improving human activity recognition,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2016, pp. 725–730, doi: 10.1109/IROS.2016.7759132.
[85] A. K. Ramakrishnan, D. Preuveneers, and Y. Berbers, ‘‘A loosely coupled and distributed Bayesian framework for multi-context recognition in dynamic ubiquitous environments,’’ in Proc. IEEE 10th Int. Conf. Ubiquitous Intell. Comput. and IEEE 10th Int. Conf. Autonomic Trusted Comput., Dec. 2013, pp. 270–277, doi: 10.1109/UIC-ATC.2013.66.
[86] R. Fallahzadeh and H. Ghasemzadeh, ‘‘Personalization without user interruption: Boosting activity recognition in new subjects using unlabeled data,’’ in Proc. ACM/IEEE 8th Int. Conf. Cyber-Phys. Syst. (ICCPS), Apr. 2017, pp. 293–302, doi: 10.1145/3055004.3055015.
[87] A. S. Billis et al., ‘‘A decision-support framework for promoting indepen-
[89] S. Basterrech and V. K. Ojha, ‘‘Temporal learning using echo state network for human activity recognition,’’ in Proc. 3rd Eur. Netw. Intell. Conf. (ENIC), Sep. 2016, pp. 217–223, doi: 10.1109/ENIC.2016.039.
[90] Y.-H. Chen, C.-H. Lu, K.-C. Hsu, L.-C. Fu, Y.-J. Yeh, and L.-C. Kuo, ‘‘Preference model assisted activity recognition learning in a smart home environment,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2009, pp. 4657–4662, doi: 10.1109/IROS.2009.5353937.
[91] J. Soulas, P. Lenca, and A. Thépaut, ‘‘Unsupervised discovery of activities of daily living characterized by their periodicity and variability,’’ Eng. Appl. Artif. Intell., vol. 45, pp. 90–102, Oct. 2015, doi: 10.1016/j.engappai.2015.06.006.
[92] M. Ros, M. Delgado, A. Vila, H. Hagras, and A. Bilgin, ‘‘A fuzzy logic approach for learning daily human activities in an ambient intelligent environment,’’ in Proc. IEEE Int. Conf. Fuzzy Syst., Jun. 2012, pp. 1–8, doi: 10.1109/FUZZ-IEEE.2012.6250770.
[93] J. Kavya and M. Geetha, ‘‘An FSM based methodology for interleaved and concurrent activity recognition,’’ in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), Sep. 2016, pp. 994–999, doi: 10.1109/ICACCI.2016.7732174.
[94] Y.-P. Huang and S.-R. Chen, ‘‘A fuzzy approach to discriminating heartbeat types and detecting arrhythmia,’’ in Proc. Int. Conf. Fuzzy Theory Appl., Nov. 2012, pp. 327–332, doi: 10.1109/iFUZZY.2012.6409725.
[95] N. Pathak, N. Roy, and A. Biswas, ‘‘Iterative signal separation assisted energy disaggregation,’’ in Proc. 6th Int. Green Sustain. Comput. Conf. (IGSC), Dec. 2015, pp. 1–8, doi: 10.1109/IGCC.2015.7393701.
[96] G. Acampora and A. Vitiello, ‘‘Interoperable neuro-fuzzy services for emotion-aware ambient intelligence,’’ Neurocomputing, vol. 122, pp. 3–12, Dec. 2013, doi: 10.1016/j.neucom.2013.01.046.
[97] J. Shell and S. Coupland, ‘‘Improved decision making using fuzzy temporal relationships within intelligent assisted living environments,’’ in Proc. 7th Int. Conf. Intell. Environ., Jul. 2011, pp. 149–156, doi: 10.1109/IE.2011.30.
[98] N. Oukrich, A. Maach, E. Sabri, E. Mabrouk, and K. Bouchard, ‘‘Activity recognition using back-propagation algorithm and minimum redundancy feature selection method,’’ in Proc. 4th IEEE Int. Colloq. Inf. Sci. Technol. (CiSt), Oct. 2016, pp. 818–823, doi: 10.1109/CIST.2016.7805000.
[99] S. Soviany and S. Puscoci, ‘‘A hierarchical decision system for human behavioral recognition,’’ in Proc. 7th Int. Conf. Electron., Comput. Artif. Intell. (ECAI), Jun. 2015, pp. S-79–S-84, doi: 10.1109/ECAI.2015.7301165.
[100] K. Bouchard et al., ‘‘Unsupervised spatial data mining for smart homes,’’ in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Nov. 2015, pp. 1433–1440, doi: 10.1109/ICDMW.2015.126.
[101] H. Zheng, H. Wang, and N. Black, ‘‘Human activity detection in smart home environment with self-adaptive neural networks,’’ in Proc. IEEE Int. Conf. Netw., Sens. Control (ICNSC), Apr. 2008, pp. 1505–1510, doi: 10.1109/ICNSC.2008.4525459.
[102] X. Zhang, G.-B. Kim, Y. Xia, and H.-Y. Bae, ‘‘Human activity recognition with trajectory data in multi-floor indoor environment,’’ in Proc. Int. Conf. Rough Sets Knowl. Technol., in Lecture Notes in Computer Science, Chengdu, China, vol. 7414. Berlin, Germany: Springer, 2012, pp. 257–266, doi: 10.1007/978-3-642-31900-6_33.
[103] R. C. Kumar, S. S. Bharadwaj, B. N. Sumukha, and K. George, ‘‘Human activity recognition in cognitive environments using sequential ELM,’’ in Proc. 2nd Int. Conf. Cognit. Comput. Inf. Process. (CCIP), Aug. 2016, pp. 1–6, doi: 10.1109/CCIP.2016.7802880.
[104] R. Kumar, I. Qamar, J. S. Virdi, and N. C. Krishnan, ‘‘Multi-label learning for activity recognition,’’ in Proc. Int. Conf. Intell. Environ., Jul. 2015, pp. 152–155, doi: 10.1109/IE.2015.32.
[105] C. Bhadrachalam, T. Jyothi, and T. S. Indulekha, ‘‘New approaches for discovering unsupervised human activities by mining sensor data,’’ in Proc. Int. Conf. Comput. Netw. Commun. (CoCoNet), Dec. 2015, pp. 118–123, doi: 10.1109/CoCoNet.2015.7411176.
[106] A. Aztiria, G. Farhadi, and H. Aghajan, ‘‘User behavior shift detection in ambient assisted living environments,’’ J. Med. Internet Res., vol. 1, no. 1, p. e6, Jan./Jun. 2013, doi: 10.2196/mhealth.2536.
[107] L. G. Fahad, S. F. Tahir, and M. Rajarajan, ‘‘Activity recognition in smart homes using clustering based classification,’’ in Proc. 22nd Int. Conf. Pat-
dent living and ageing well,’’ IEEE J. Biomed. Health Inform., vol. 19, tern Recognit., Aug. 2014, pp. 1348–1353, doi: 10.1109/ICPR.2014.241.
no. 1, pp. 199–209, Jan. 2015, doi: 10.1109/JBHI.2014.2336757. [108] E. Hoque and J. Stankovic, ‘‘AALO: Activity recognition in
[88] E. E. Stone and M. Skubic, ‘‘Mapping kinect-based in-home gait speed smart homes using active learning in the presence of overlapped
to TUG time: A methodology to facilitate clinical interpretation,’’ in activities,’’ in Proc. 6th Int. Conf. Pervasive Comput. Technol.
Proc. 7th Int. Conf. Pervasive Comput. Technol. Healthcare Workshops, Healthcare (PervasiveHealth) Workshops, May 2012, pp. 139–146,
May 2013, pp. 57–64, doi: 10.4108/icst.pervasivehealth.2013.252097. doi: 10.4108/icst.pervasivehealth.2012.248600.

VOLUME 6, 2018 59209


E. De-la-Hoz-Franco et al.: Sensor-Based Data Sets for HAR–Systematic Review of Literature

[109] V. Ghasemi and A. K. Pouyan, ‘‘Activity recognition in smart homes using absolute temporal information in dynamic graphical models,’’ in Proc. 10th Asian Control Conf., May/Jun. 2015, pp. 1–6.
[110] P. Kodeswaran, R. Kokku, M. Mallick, and S. Sen, ‘‘Demultiplexing activities of daily living in IoT enabled smarthomes,’’ in Proc. 35th Annu. IEEE Int. Conf. Comput. Commun., Apr. 2016, pp. 1–9.
[111] J. L. G. Ortega, L. Han, N. Whittacker, and N. Bowring, ‘‘A machine-learning based approach to model user occupancy and activity patterns for energy saving in buildings,’’ in Proc. Sci. Inf. Conf., London, U.K., Jul. 2015, pp. 474–482.
[112] U. Avci and A. Passerini, ‘‘Improving activity recognition by segmental pattern mining,’’ in Proc. 8th IEEE Int. Conf. Pervasive Comput. Commun. Workshops, Mar. 2012, pp. 709–714.
[113] H. Alemdar, T. L. M. van Kasteren, M. E. Niessen, A. Merentitis, and C. Ersoy, ‘‘A unified model for human behavior modeling using a hierarchy with a variable number of states,’’ in Proc. 22nd Int. Conf. Pattern Recognit., Aug. 2014, pp. 3804–3809, doi: 10.1109/ICPR.2014.653.
[114] M. B. Abidine, B. Fergani, and L. Clavier, ‘‘Importance-weighted the imbalanced data for C-SVM classifier to human activity recognition,’’ in Proc. 8th Int. Workshop Syst., Signal Process. Appl. (WoSSPA), May 2013, pp. 330–335.
[115] M. B. Abidine and B. Fergani, ‘‘Evaluating a new classification method using PCA to human activity recognition,’’ in Proc. Int. Conf. Comput. Med. Appl. (ICCMA), Jan. 2013, pp. 1–4.
[116] X. Hong, C. D. Nugent, M. D. Mulvenna, S. Martin, S. Devlin, and J. G. Wallace, ‘‘Dynamic similarity-based activity detection and recognition within smart homes,’’ Int. J. Pervasive Comput. Commun., vol. 8, no. 3, pp. 264–278, 2012, doi: 10.1108/17427371211262653.
[117] V. Ghasemi, A. A. Pouyan, and M. Sharifi, ‘‘Human activity recognition in smart homes based on a difference of convex programming problem,’’ KSII Trans. Internet Inf. Syst., vol. 11, no. 1, pp. 321–344, Jan. 2017, doi: 10.3837/tiis.2017.01.017.
[118] F. A. Machot and H. C. Mayr, ‘‘Improving human activity recognition by smart windowing and spatio-temporal feature analysis,’’ in Proc. 9th ACM Int. Conf. Pervasive Technol. Rel. Assistive Environ., 2016, p. 56, doi: 10.1145/2910674.2910697.
[119] F. A. Machot, H. C. Mayr, and S. Ranasinghe, ‘‘A windowing approach for activity recognition in sensor data streams,’’ in Proc. 8th Int. Conf. Ubiquitous Future Netw. (ICUFN), Jul. 2016, pp. 951–953.
[120] A. De Paola et al., ‘‘An ambient intelligence system for assisted living,’’ in Proc. AEIT Int. Annu. Conf., Sep. 2017, pp. 1–6.
[121] N. Yala, B. Fergani, and A. Fleury, ‘‘Feature extraction and incremental learning to improve activity recognition on streaming data,’’ in Proc. IEEE Int. Conf. Evolving Adapt. Intell. Syst. (EAIS), Dec. 2015, pp. 1–8.
[122] R. Mohamed, T. Perumal, N. Sulaiman, N. Mustapha, and M. N. Razali, ‘‘Conflict resolution using enhanced label combination method for complex activity recognition in smart home environment,’’ in Proc. IEEE 6th Global Conf. Consum. Electron. (GCCE), Oct. 2017, pp. 1–3.
[123] S. Ntalampiras and M. Roveri, ‘‘An incremental learning mechanism for human activity recognition,’’ in Proc. IEEE Symp. Ser. Comput. Intell., Dec. 2016, pp. 1–6.
[124] C. A. Ronao and S.-B. Cho, ‘‘Human activity recognition using smartphone sensors with two-stage continuous hidden Markov models,’’ in Proc. 10th Int. Conf. Natural Comput., Aug. 2014, pp. 681–686.
[125] E. Garcia-Ceja and R. F. Brena, ‘‘An improved three-stage classifier for activity recognition,’’ Int. J. Pattern Recognit. Artif. Intell., vol. 32, no. 1, p. 1860003, 2018, doi: 10.1142/S0218001418600030.
[126] J. Cumin, G. Lefebvre, F. Ramparany, and J. L. Crowley, ‘‘Human activity recognition using place-based decision fusion in smart home,’’ Arch. Ouverte HAL, Tech. Rep., 2017.
[127] G. Chetty and M. White, ‘‘Body sensor networks for human activity recognition,’’ in Proc. 3rd Int. Conf. Signal Process. Integr. Netw. (SPIN), Noida, India, Feb. 2016, pp. 660–665, doi: 10.1109/SPIN.2016.7566779.
[128] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, ‘‘SMOTE: Synthetic minority over-sampling technique,’’ J. Artif. Intell. Res., vol. 16, no. 1, pp. 321–357, 2002.
[129] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, ‘‘Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine,’’ in Proc. Int. Workshop Ambient Assist. Living (IWAAL), Vitoria-Gasteiz, Spain, Dec. 2012, pp. 216–223.
[130] E. de la Hoz, E. de la Hoz, A. Ortiz, J. Ortega, and A. Martínez-Álvarez, ‘‘Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps,’’ Knowl.-Based Syst., vol. 71, pp. 322–338, Nov. 2014, doi: 10.1016/j.knosys.2014.08.013.
[131] F. Mendoza, A. De-La-Hoz-Manotas, E. De-La-Hoz-Franco, and P. Ariza-Colpas, ‘‘Feature selection, learning metrics and dimension reduction in training and classification processes in intrusion detection systems,’’ J. Theor. Appl. Inf. Technol., vol. 82, no. 2, pp. 291–298, Dec. 2015. Accessed: Jul. 31, 2018. [Online]. Available: http://www.jatit.org/volumes/Vol82No2/12Vol82No2.pdf
[132] E. De-La-Hoz-Franco, A. Ortiz, J. Ortega, E. De-La-Hoz-Correa, and F. Mendoza, ‘‘Implementation of an intrusion detection system based on self organizing map,’’ J. Theor. Appl. Inf. Technol., vol. 71, no. 3, pp. 324–334, Jan. 2015. Accessed: Jul. 31, 2018. [Online]. Available: http://www.jatit.org/volumes/Vol71No3/2Vol71No3.pdf

EMIRO DE-LA-HOZ-FRANCO was born in Barranquilla, Colombia, in 1972. He received the M.Sc. degree in computer and network engineering and the Ph.D. degree in information and communication technology from the University of Granada, Spain, in 2011 and 2016, respectively. He is currently a full-time Professor and also a member of the GIECUC and the Software Engineering & Networks Research Groups, Universidad de la Costa-CUC, Barranquilla. His research interests are in the field of machine learning and recognition of activities of daily living (ADL). ORCID: 0000-0002-4926-7414.

PAOLA ARIZA-COLPAS was born in Barranquilla, Colombia, in 1985. She received the M.Sc. degree in computer and systems engineering from the Universidad del Norte, Barranquilla, Colombia, in 2011. She is currently pursuing the Ph.D. degree in engineering with the Universidad Pontificia Bolivariana, Medellín, Colombia. She is also a full-time Professor and a member of the Software Engineering & Networks Research Group, Universidad de la Costa-CUC, Barranquilla. Her research interests are in the field of data mining and image recognition.

JAVIER MEDINA QUERO was born in Granada, Spain, in 1983. He received the M.Sc. and Ph.D. degrees in computer science from the University of Granada, Spain, in 2007 and 2010, respectively. He is currently a Researcher with the research group Intelligent Systems Based on Fuzzy Decision Analysis (Sinbad2), University of Jaén. His research interests encompass fuzzy logic, e-health, intelligent systems, ubiquitous computing, and ambient intelligence.

MACARENA ESPINILLA was born in Jaén, Spain, in 1983. She received the M.Sc. and Ph.D. degrees, both in computer science, from the University of Jaén, Jaén, Spain, in 2006 and 2009, respectively. She is currently an Associate Professor with the Department of Computer Systems, University of Jaén. Her current research interests include ambient intelligence, ubiquitous computing, ambient assisted living, evaluation processes, decision-making, recommender systems, linguistic preference modeling, and fuzzy logic-based systems.
