Smart Meter Data Analytics
Electricity Consumer Behavior Modeling, Aggregation, and Forecasting

Yi Wang, Qixin Chen, Chongqing Kang
Department of Electrical Engineering, Tsinghua University, Beijing, China

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.
To Our Alma Mater
–Tsinghua University
Foreword
Smart grid is a cyber-physical-social system where the power flow, data flow, and
business flow are deeply coupled. Enlightened consumers facilitated by smart
meters form the foundation of a smart grid. Countries around the world are in the
midst of massive smart meter installations for consumers on the pathway towards
grid digitalization and modernization. This enables the collection of extensive fine-grained smart meter data, which can be processed by data analytical techniques, especially the now widely available machine learning techniques. The terms big data and machine learning are widely used nowadays. People from different industries
try to apply advanced machine learning techniques to solve their own practical
issues. The power and energy industry is no exception. Smart meter data analytics
can be conducted to fully explore the value behind these data to improve the
understanding of consumer behavior and enhance electric services such as demand
response and energy management.
This book explores and discusses the applications of data analytical techniques
to smart meter data. The contents of the book are divided into three parts. The first part
(Chaps. 1–2) provides a comprehensive review of recent developments of smart meter
data analytics and proposes the concept of “electricity consumer behavior model”.
The second part (Chaps. 3–5) studies the data analytical techniques for smart meter
data management, such as data compression, bad data detection, data generation, etc.
The third part (Chaps. 6–12) conducts application-oriented research to depict the
electricity consumer behavior model. This part includes electrical consumption pat-
tern recognition, personalized tariff design for retailers, socio-demographic infor-
mation identification, consumer aggregation, electrical load forecasting, etc. The
prospects of future smart meter data analytics (Chap. 13) are provided at the end
of the book. The authors offer model formulations, novel algorithms, in-depth dis-
cussions, and detailed case studies in various chapters of this book.
One author of this book, Prof. Chongqing Kang, is a professional colleague. He
is a distinguished scholar and pioneer in the power and energy area. He has done
extensive work in the field of data analytics and load forecasting. This is a book
worth reading; one will see how much insight can be gained from smart meter data
alone. There is certainly broader qualitative understanding to be gained from the massive data collected in the realms of generation, transmission, distribution, and end use in the smart grid.
Preface
This book aims to make the best use of the available data by processing and translating them into actionable information and incorporating them into consumer behavior modeling and distribution system operations. The research framework of the smart meter data analytics in this book can be summarized in the following figure.
Chapter 1 provides an overview of smart meter data analytics and its main applications, including load analysis, load forecasting, and load management. We also review the techniques and methodologies adopted or developed to address each application.
Chapter 2 proposes the concept of ECBM and decomposes consumer behavior
into five basic aspects from the sociological perspective: behavior subject, behavior
environment, behavior means, behavior result, and behavior utility. On this basis,
the research framework for ECBM is established.
Chapter 3 provides a highly efficient data compression technique to reduce the
great burden on data transmission, storage, processing, application, etc. It applies
the generalized extreme value distribution characteristic for household load data
and then utilizes it to identify load features including load states and load events.
Finally, a highly efficient lossy data compression format is designed to store key
information of load features.
Chapter 4 applies two novel data mining techniques, the maximum information
coefficient (MIC) and the clustering technique by fast search and find of density
peaks (CFSFDP), to detect electricity abnormal consumption or thefts. On this
basis, a framework of combining the advantages of the two techniques is further
proposed to boost the detection accuracy.
Chapter 5 proposes a residential load profile generation model based on the generative adversarial network (GAN). To account for the different typical load patterns of consumers, an advanced GAN based on the auxiliary classifier GAN (ACGAN) is further proposed to generate profiles under typical modes. The proposed model can generate realistic load profiles under different load patterns without loss of diversity.
Chapter 6 proposes a K-SVD-based sparse representation technique to decom-
pose original load profiles into linear combinations of several partial usage patterns
(PUPs), which allows the smart meter data to be compressed and hidden electricity
consumption patterns to be extracted. Then, a linear support vector machine
(SVM)-based method is used to classify the load profiles into two groups, resi-
dential customers and small- and medium-sized enterprises (SMEs), based on the
extracted patterns.
Chapter 7 studies a data-driven approach for personalized time-of-use
(ToU) price design based on massive historical smart meter data. It can be for-
mulated as a large-scale mixed-integer nonlinear programming (MINLP) problem.
Through load profiling and linear transformation or approximation, the MINLP
model is simplified into a mixed-integer linear programming (MILP) problem. In
this way, various tariffs can be designed.
Chapter 8 investigates how much socio-demographic information can be inferred
or revealed from fine-grained smart meter data. A deep convolutional neural net-
work (CNN) first automatically extracts features from massive load profiles.
Then SVM is applied to identify the characteristics of the consumers. Different
socio-demographic characteristics show different identification accuracies.
Chapter 9 uses smart meter data to identify energy behavior indicators through a
cross-domain feature selection and coding approach. The idea is to extract and
connect customers’ features from the energy domain and demography domain.
Smart meter data are characterized by typical energy spectral patterns, whereas household information is encoded as the energy behavior indicator.
This book summarizes our research on smart meter data analytics in recent years. These works were carried out in the Energy Intelligence
Laboratory (EILAB), Department of Electrical Engineering, Tsinghua University,
Beijing, China.
Many people contributed to this book in various ways. The authors are indebted
to Prof. Daniel Kirschen from the University of Washington; Prof. Furong Li and
Dr. Ran Li from the University of Bath; Dr. Tao Hong from the University of North
Carolina at Charlotte; and Dr. Ning Zhang, Dr. Xing Tong, Mr. Kedi Zheng,
Mr. Yuxuan Gu, Mr. Dahua Gan, and Mr. Cheng Feng from Tsinghua University,
who have contributed materials to this book.
We also thank Mr. Yuxiao Liu, Mr. Qingchun Hou, Mr. Haiyang Jiang,
Mr. Yinxiao Li, Mr. Pei Yong, Mr. Jiawei Zhang, Mr. Xichen Fang, and Mr. Tian
Xia at Tsinghua University for their assistance in pointing out typos and checking
the whole book.
In addition, we acknowledge the innovative works contributed by others in this
increasingly important area especially through IEEE Power & Energy Society
Working Group on Load Aggregator and Distribution Market, and appreciate the
staff at Springer for their assistance and help in the preparation of this book.
This book is supported in part by the National Key R&D Program of China
(2016YFB0900100), in part by the Major Smart Grid Joint Project of National
Natural Science Foundation of China and State Grid (U1766212), and in part by the
Key R&D Program of Guangdong Province (2019B111109002). The authors greatly appreciate this support.
Yi Wang
Qixin Chen
Chongqing Kang
Contents
2.2.3 Denotation
2.2.4 Relationship with Other Models
2.3 Basic Characteristics of Electricity Consumer Behavior
2.4 Mathematical Expression of ECBM
2.5 Research Paradigm of ECBM
2.6 Research Framework of ECBM
2.7 Conclusions
References
3 Smart Meter Data Compression
3.1 Introduction
3.2 Household Load Profile Characteristics
3.2.1 Small Consecutive Value Difference
3.2.2 Generalized Extreme Value Distribution
3.2.3 Effects on Load Data Compression
3.3 Feature-Based Load Data Compression
3.3.1 Distribution Fit
3.3.2 Load State Identification
3.3.3 Base State Discretization
3.3.4 Event Detection
3.3.5 Event Clustering
3.3.6 Load Data Compression and Reconstruction
3.4 Data Compression Performance Evaluation
3.4.1 Related Data Formats
3.4.2 Evaluation Index
3.4.3 Dataset
3.4.4 Compression Efficiency Evaluation Results
3.4.5 Reconstruction Precision Evaluation Results
3.4.6 Performance Map
3.5 Conclusions
References
4 Electricity Theft Detection
4.1 Introduction
4.2 Problem Statement
4.2.1 Observer Meters
4.2.2 False Data Injection
4.2.3 A State-Based Method of Correlation
4.3 Methodology and Detection Framework
4.3.1 Maximum Information Coefficient
4.3.2 CFSFDP-Based Unsupervised Detection
4.3.3 Combined Detecting Framework
1 Overview of Smart Meter Data Analytics

1.1 Introduction
Smart meters have been deployed around the globe during the past decade. Smart
meters, together with the communication network and data management system,
constitute the advanced metering infrastructure (AMI), which plays a vital role in
power delivery systems by recording the load profiles and facilitating bi-directional
information flow [1]. The widespread popularity of smart meters enables an immense
amount of fine-grained electricity consumption data to be collected. Billing is no
longer the only function of smart meters. High-resolution data from smart meters
provide rich information on the electricity consumption behaviors and lifestyles of
the consumers. Meanwhile, the deregulation of the power industry, particularly on the
delivery side, is continuously moving forward in many countries worldwide. These
countries are now sparing no effort on electricity retail market reform. Increasingly
more participants, including retailers, consumers, and aggregators, are involved in
making the retail market more prosperous, active, and competitive [2]. How to employ
massive smart meter data to promote and enhance the efficiency and sustainability
of the demand side has become an important topic worldwide.
Figure 1.1 depicts the five major players on the demand side of the power sys-
tem: consumers, retailers, aggregators, distribution system operators (DSO), and data
service providers. For retailers, at least four businesses related to smart meter data
analytics need to be conducted to increase the competitiveness in the retail market.
(1) Load forecasting, which is the basis of decision making for the optimization of
electricity purchasing in different markets to maximize profits. (2) Price design to
attract more consumers. (3) Providing good service to consumers, which can be im-
plemented by consumer segmentation and characterization. (4) Anomaly detection, to obtain a cleaner dataset for further analysis and to reduce potential losses from elec-
tricity theft. For consumers, individual load forecasting, which is the input of future
home energy management systems (HEMS) [10], can be conducted to reduce their
electricity bill. In the future peer-to-peer (P2P) market, individual load forecasting
can also contribute to the implementation of transactive energy between consumers
[11, 12]. Aggregators represent a group of consumers for demand response or energy efficiency in the ancillary market; aggregation-level load forecasting and demand response potential evaluation techniques should therefore be developed. For DSOs, smart meter data can be applied to distribution network topology identification, optimal distribution system energy management, outage management, and so forth. Data service providers need to collect smart meter data, analyze these massive data, and provide valuable information for retailers and consumers to maximize profits or minimize costs. Providing data services, including data management
and data analytics, is an important business model when increasingly more smart
meter data are collected and need to be processed.
To support the businesses of retailers, consumers, aggregators, DSO, and data
service providers, following the three stages of analytics, namely, descriptive, pre-
dictive and prescriptive analytics, the main applications of smart meter data analytics
are classified into load analysis, load forecasting, load management, and so forth.
The detailed taxonomy is illustrated in Fig. 1.2. The machine learning techniques
used for smart meter data analytics include time series analysis, dimensionality re-
duction, clustering, classification, outlier detection, deep learning, low-rank matrix,
compressed sensing, online learning, and so on. Studies on how smart meter data
analytics works for each application and what methodologies have been applied will
be summarized in the following sections.
This chapter attempts to provide a comprehensive review of the current research in recent years and identify future challenges for smart meter data analytics. Note that second-level or higher-frequency data used for nonintrusive load monitoring (NILM) are very limited at present due to the high cost of communicating and storing the data. The majority of smart meters collect electricity consumption data at intervals of 15 minutes to one hour. In addition, several comprehensive reviews have been conducted on NILM. Thus, works on NILM are not included in this chapter.
1.2 Load Analysis

Figure 1.3 shows eight typical normalized daily residential load profiles obtained us-
ing the simple k-means algorithm in the Irish resident load dataset. The load profiles
of different consumers on different days are diverse. Having a better understanding
of the volatility and uncertainty of the massive load profiles is very important for
further load analysis. In this section, the works on load analysis are reviewed from
the perspectives of anomaly detection and load profiling. Anomaly detection is very
important because training a model such as a forecasting model or clustering model
on a smart meter dataset with anomalous data may result in bias or failure for pa-
rameter estimation and model establishment. Moreover, reliable smart meter data are
important for accurate billing. The works on anomaly detection in smart meter data
are summarized from the perspectives of bad data detection and non-technical loss (NTL) detection (or
energy theft detection). Load profiling is used to find the basic electricity consump-
tion patterns of each consumer or a group of consumers. The load profiling results
can be further used for load forecasting and demand response programs.
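As an illustration of this kind of load profiling, the sketch below clusters normalized daily profiles with k-means, in the spirit of the Fig. 1.3 example. The random data array, the peak normalization, and the choice of eight clusters are assumptions for illustration, not the exact procedure used for the figure.

```python
# Sketch: extracting typical daily load profiles with k-means, assuming
# `profiles` is an (n_profiles, 48) array of half-hourly daily readings
# (48 points per day, as in the Irish dataset). Data here are placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
profiles = rng.random((1000, 48))

# Normalize each daily profile by its peak so clustering captures shape, not level.
profiles = profiles / profiles.max(axis=1, keepdims=True)

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(profiles)
typical_profiles = kmeans.cluster_centers_   # eight typical normalized patterns
labels = kmeans.labels_                      # cluster assignment of each profile
```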
Bad data, as discussed here, can be missing data or unusual patterns caused by
unplanned events or the failure of data collection, communication, or entry. Bad
data detection can be divided into probabilistic, statistical, and machine learning
methods [13]. The methods for bad data detection in other research areas could be
applied to smart meter data. Only the works closely related to smart meter bad data
detection are surveyed in this subsection. According to the modeling methods, these
works are summarized as time-series-based methods, low-rank matrix technique-
based methods, and time-window-based methods.
Smart meter data are essentially time series. An optimally weighted average
(OWA) method was proposed for data cleaning and imputation in [14], which can
be applied to offline or online situations. It was assumed that the load data could
be explained by a linear combination of the nearest neighbor data, which is quite
similar to the autoregressive moving average (ARIMA) model for time series. The
optimal weight was obtained by training an optimization model. While in [15], the
nonlinear relationship between the data at different time periods and exogenous in-
puts was modeled by combining autoregressive with exogenous inputs (ARX) and
artificial neural network (ANN) models where the bad data detection was modeled
as a hypothesis testing on the extreme of the residuals. A case study on gas flow
data was performed and showed an improvement in load forecasting accuracy after
ARX-based bad data detection. Similarly, based on the auto-regression (AR) model,
the generalized extreme Studentized deviate (GESD) and the Q-test were proposed
in [16] to detect the outliers when the number of samples is greater than ten and less than ten, respectively. Then, canonical variate analysis (CVA) was conducted to cluster the recovered load profiles, and a linear discriminant analysis (LDA) classifier was further used to search for abnormal electricity consumption. Instead of detecting bad data, the study in [17] investigated which forecasting methods are robust to cyber attacks or bad data when no bad data detection is performed.
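As a rough illustration of the time-series-based idea (not the specific OWA, ARX, or GESD procedures cited above), the sketch below fits a plain linear autoregression on lagged values and flags points with unusually large residuals; the lag order and threshold are arbitrary placeholders.

```python
# Sketch: time-series-based bad data flagging, assuming `load` is a 1-D hourly
# consumption series. A linear autoregression on the previous 24 values is
# fitted; points whose residuals exceed k standard deviations are flagged.
import numpy as np
from sklearn.linear_model import LinearRegression

def flag_bad_data(load, n_lags=24, k=4.0):
    # Build lagged design matrix: row j holds load[j], ..., load[j+n_lags-1].
    X = np.column_stack([load[i:len(load) - n_lags + i] for i in range(n_lags)])
    y = load[n_lags:]
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    threshold = k * residuals.std()
    flags = np.zeros(len(load), dtype=bool)
    flags[n_lags:] = np.abs(residuals) > threshold   # suspect readings
    return flags
```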
Electricity consumption is spatially and temporally correlated. Exploring
the spatiotemporal correlation can help identify the outliers and recover them. A
low-rank matrix fitting-based method was proposed in [18] to conduct data cleaning
and imputation. An alternating direction method of multipliers (ADMM)-based dis-
tributed low-rank matrix technique was also proposed to enable communication and
data exchange between different consumers and to protect the privacy of consumers.
Similarly, to produce a reliable state estimation, the measurements were first pro-
cessed by low-rank matrix techniques in [19]. Both off-line and on-line algorithms
have been proposed. However, the improvement in state estimation after low-rank
denoising has not been investigated. Low-rank matrix factorization works well when
the bad data are randomly distributed. However, when the data remain unchanged for a certain period, the low-rank matrix approach cannot handle them well.
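A minimal sketch of the underlying low-rank idea follows: missing or bad entries of a consumers-by-time matrix are filled by repeatedly projecting onto a rank-r approximation and re-imposing the observed entries. This generic iterative truncated-SVD scheme is only a stand-in for the ADMM-based and online algorithms in the cited works.

```python
# Sketch of low-rank imputation for a consumers-by-time load matrix `M`
# with NaNs marking missing or bad readings.
import numpy as np

def low_rank_impute(M, rank=5, n_iter=50):
    mask = ~np.isnan(M)                        # True where data are observed
    X = np.where(mask, M, np.nanmean(M))       # initialize gaps with the global mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r approximation
        X = np.where(mask, M, X_low)           # keep observed entries, update gaps
    return X
```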
Rather than detecting all the bad data directly, strategies that continuously detect
and recover a part within a certain time window have also been studied. A clustering
approach was proposed on the load profiles with missing data in [20, 21]. The
clustering was conducted on segmented profiles rather than the entire load profiles
in a rolling manner. In this way, the missing data can be recovered or estimated
by other data in the same cluster. Collective contextual anomaly detection using a
sliding window framework was proposed in [22] by combining various anomaly
classifiers. The anomalous data were detected using overlapping sliding windows.
Since smart meter data are collected in a real-time or near real-time fashion, an online
anomaly detection method using the Lambda architecture was proposed in [23]. The
proposed online detection method can be parallel processed, having high efficiency
when working with large datasets.
Strictly speaking, smart meter data with energy theft also belong to bad data. The
bad data discussed above are unintentional and appear temporarily, whereas energy
theft may change the smart meter data under certain strategies and last for a relatively
long time. Energy theft detection can be implemented using smart meter data and
power system state data, such as node voltages. The energy theft detection methods
with only smart meter data are summarized in this part from two aspects: supervised
learning and unsupervised learning.
Supervised classification methods are effective approaches for energy theft de-
tection, which generally consists of two stages: feature extraction and classification.
To train a theft detection classifier, the non-technical loss was first estimated in [24].
k-means clustering was used to group the load profiles, where the number of clusters
was determined by the silhouette value [25]. To address the challenge of imbalanced
data, various possible malicious samples were generated to train the classifier. An
energy theft alarm was raised after a certain number of abnormal detections. Differ-
ent numbers of abnormal detections resulted in different false-positive rates (FPR)
and Bayesian detection rates (BDR). The proposed method can also identify energy
theft types. Apart from clustering-based feature extraction, an encoding technique
was first performed on the load data in [26], which served as the inputs of classifiers
including SVM and a rule-engine-based algorithm to detect the energy theft. The
proposed method can run in parallel for real-time detection. By introducing external
variables, a top-down scheme based on decision tree and SVM method was proposed
in [27]. The decision tree estimated the expected electricity consumption based on
the number of appliances, persons, and outdoor temperature. Then, the output of the
decision tree was fed to the SVM to determine whether the consumer is normal or
malicious. The proposed framework can also be applied for real-time detection.
Obtaining the labeled dataset for energy theft detection is difficult and expensive.
Compared with supervised learning, unsupervised energy theft detection does not
need the labels of all or partial consumers. An optimum-path forest (OPF) clustering
algorithm was proposed in [28], where each cluster is modeled as a Gaussian dis-
tribution. The load profile can be identified as an anomaly if the distance is greater
than a threshold. Comparisons with frequently used methods, including k-means,
Birch, affinity propagation (AP), and Gaussian mixture model (GMM), verified the
superiority of the proposed method. Rather than clustering all load profiles, clus-
tering was only conducted within an individual consumer to obtain the typical and
atypical load profiles in [29]. A classifier was then trained based on the typical and
atypical load profiles for energy theft detection. A case study in this paper showed
that extreme learning machine (ELM) and online sequential-ELM (OS-ELM)-based
classifiers have better accuracy compared with SVM. Transforming the time series
smart meter data into the frequency domain is another approach for feature extrac-
tion. Based on the discrete Fourier transform (DFT) results, the features extracted in
the reference interval and examined interval were compared based on the so-called
Structure & Detect method in [30]. Then, the load profile can be determined to be
normal or malicious. The proposed method can be implemented in a parallel and
distributed manner, which can be used for the on-line analysis of large datasets. An-
other unsupervised energy theft detection method is to formulate the problem as a
load forecasting problem. If the metered consumption is considerably lower than the
forecasted consumption, then the consumer can be marked as a malicious consumer.
An anomaly score was assigned to each consumption record and displayed in different colors for visualization in [31].
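A minimal sketch of this forecasting-based scoring idea is shown below: a persistent shortfall of metered consumption relative to forecasted consumption is turned into a score. The forecast itself and any decision threshold are placeholders, not the methods of the cited works.

```python
# Sketch: forecasting-based theft scoring. `metered` and `forecast` are 1-D
# daily consumption arrays for one consumer; the score approaches 1 when the
# consumer is consistently under-metered relative to the forecast.
import numpy as np

def theft_score(metered, forecast, eps=1e-6):
    shortfall = np.maximum(forecast - metered, 0.0) / (forecast + eps)
    return shortfall.mean()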
Based on the encoded load profiles, a locality sensitive hashing (LSH) method was further proposed
to classify the load profiles and obtain the representative load profiles.
Insights into the local and global characteristics of smart meter data are important
for finding meaningful typical load profiles. Three new types of features generated
by applying conditional filters to meter-resolution-based features integrated with
shape signatures, calibration and normalization, and profile errors were proposed
in [41] to cluster daily load curves. The proposed feature extraction method was of
low computational complexity, and the features were informative and understand-
able for describing the electricity usage patterns. To capture local and global shape
variations, 10 subspace clustering and projected clustering methods were applied to
identify the contract type of consumers in [42]. By focusing on the subspace of load
profiles, the clustering process was proven to be more robust to noise. To capture the
peak load and major variability in residential consumption behavior, four key time
periods (overnight, breakfast, daytime, and evening) were identified in [43]. On this
basis, seven attributes were calculated for clustering. The robustness of the proposed
clustering was verified using the bootstrap technique.
The variability and uncertainty of smart meter data have also been considered
for load profiling. Four key time periods, which described different peak demand
behaviors, coinciding with common intervals of the day were identified in [43],
and then a finite mixture-model-based clustering was used to discover ten distinct
behavior groups describing customers based on their demand and variability. The
load variation was modeled by a lognormal distribution, and a Gaussian mixture
model (GMM)-based load profiling method was proposed in [44] to capture the dy-
namic behavior of consumers. A mixture model was also used in [45] by integrating
the C-vine copula method for the clustering of residential load profiles. The high-
dimensional nonlinear correlations among consumptions of different time periods
were modeled using the C-vine copula. This method has an effective performance
in large datasets. While in [46], a Markov model was established based on the sep-
arated time periods to describe the electricity consumption behavior dynamics. A
clustering technique consisting of fast search and find of density peaks (CFSFDP)
integrated into a divide-and-conquer distributed approach was proposed to find typi-
cal consumption behaviors. The proposed distributed clustering algorithm had higher
computational efficiency. The expectation-maximization (EM)-based mixture model
clustering method was applied in [47] to obtain typical load profiles, and then the
variabilities in residential load profiles were modeled by a transition matrix based on
a second-order Markov chain and Markov decision processes. The proposed method
can be used to generate pseudo smart meter data for retailers and protect the privacy
of consumers.
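The sketch below illustrates mixture-model-based load profiling with an off-the-shelf EM implementation (scikit-learn's GaussianMixture). The placeholder data, the number of components, and the diagonal covariance are assumptions; the cited works use richer formulations (lognormal variation, C-vine copulas, Markov dynamics).

```python
# Sketch: EM-based Gaussian mixture load profiling, assuming `profiles` is an
# (n_samples, 48) array of normalized daily load curves.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
profiles = rng.random((500, 48))

gmm = GaussianMixture(n_components=10, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(profiles)     # behavior group of each daily profile
typical = gmm.means_                   # component means as typical load profiles
weights = gmm.weights_                 # share of profiles in each behavior group
```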
1.2.4 Remarks
Table 1.1 provides the correspondence between the key techniques and the surveyed
references in smart meter data analytics for load analysis.
For bad data detection, most of the bad data detection methods are suitable for
business/industrial consumers or higher aggregation level load data, which are more
regular and have certain patterns. The research on bad data detection on the individual
consumer is still limited and not a trivial task because the load profiles of an individual
consumer show more variation. In addition, since bad data detection and repairing
are the basis of other data analytics applications, how much improvement can be made for load forecasting or other applications after bad data detection is also an issue that deserves further investigation. In addition, smart meter data are essentially streaming data. Real-time bad data detection for some real-time applications, such as very-short-term load forecasting, is another concern. Finally, as stated above, bad data may result from data collection failures. Short-period anomalous usage patterns may also be identified as bad data even though they are “real” data. More related factors, such as sudden events, need to be considered in this situation. Redundant data are also a good source for identifying “real” but anomalous data.
For energy theft detection, with a longer time period of smart meter data, the
detection accuracy is probably higher because more data can be used. However, using
longer historical smart meter data may also lead to a detection delay, which means that
we need to achieve a balance between the detection accuracy and detection delay.
Moreover, different private data and simulated data have been tested on different
energy theft detection methods in the existing literature. Without the same dataset,
the superiority of a certain method cannot be guaranteed. The research in this area will
be promoted if some open datasets are provided. Besides, in most cases, one paper
proposes one energy theft detection method. Just like ensemble learning for load
forecasting, can we propose an ensemble detection framework to combine different
individual methods?
For load profiling, the majority of the clustering methods are used for stored smart
meter data. However, the fact is that smart meter data are streaming data. Sometimes,
we need to deal with the massive streaming data in a real-time fashion for specific
applications. Thus, distributed clustering and incremental clustering methods can be
further studied in the field of load profiling. Indirect load profiling methods extract features first and then conduct clustering on the extracted features. Some clustering
methods such as deep embedding clustering [48] that can implement feature extrac-
tion and clustering at the same time, have been proposed outside the area of electrical
engineering. It is worth trying to apply these state-of-the-art methods to load pro-
filing. Most load profiling methods are evaluated by clustering-based indices, such
as similarity matrix indicator (SMI), Davies–Bouldin indicator (DBI) and Silhouette
Index (SI) [49]. More application-oriented metrics such as forecasting accuracy are
encouraged to be used to guide the selection of suitable clustering methods. Finally,
how to effectively extract meaningful features before clustering to improve the per-
formance and efficiency of load profiling is another issue that needs to be further
addressed.
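For the clustering-based indices mentioned above, DBI and SI are readily available in common libraries; the sketch below compares candidate cluster numbers on a placeholder profile matrix (SMI is not included here).

```python
# Sketch: comparing candidate cluster numbers with the Davies-Bouldin
# indicator (lower is better) and the Silhouette Index (higher is better).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(2)
profiles = rng.random((400, 48))

for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)
    dbi = davies_bouldin_score(profiles, labels)
    si = silhouette_score(profiles, labels)
    print(f"k={k}: DBI={dbi:.3f}, SI={si:.3f}")
```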
1.3 Load Forecasting

Load forecasts have been widely used by the electric power industry. Power distribu-
tion companies rely on short- and long-term forecasts at the feeder level to support
operations and planning processes, while retail electricity providers make pricing,
procurement and hedging decisions largely based on the forecasted load of their cus-
tomers. Figure 1.4 presents the normalized hourly profiles of a week for four different
types of loads, including a house, a factory, a feeder, and a city. Loads of a house, a
factory, and a feeder are more volatile than the city-level load. In reality, the higher
level the load is measured at, the smoother the load profile typically is. Developing
a highly accurate forecast is nontrivial at lower levels.
Although the majority of the load forecasting literature has been devoted to fore-
casting at the top (high voltage) level, the information from medium/low voltage
levels, such as distribution feeders and even down to the smart meters, offer some
opportunities to improve the forecasts. A recent review of load forecasting was con-
ducted in [50], focusing on the transition from point load forecasting to probabilistic
load forecasting. In this section, we will review the recent literature for both point and
probabilistic load forecasting with the emphasis on the medium/low voltage levels.
Within the point load forecasting literature, we divide the review based on whether
the smart meter data is used or not.
Compared with the load profiles at the high voltage levels, the load profiles aggre-
gated to a customer group or medium/low voltage level are often more volatile and
sensitive to the behaviors of the customers being served. Some of them, such as the
load of a residential community, can be very responsive to the weather conditions.
Some others, such as the load of a large factory, can be driven by specific work
schedules. Although these load profiles differ by the customer composition, these
load forecasting problems share some common challenges, such as accounting for the influence of competitive markets, modeling the effects of weather variables,
and leveraging the hierarchy.
In competitive retail markets, electricity consumption is largely driven by the
number of customers. The volatile customer count contributes to the uncertainties
in the future load profile. A two-stage long-term retail load forecasting method was
proposed in [51] to take customer attrition into consideration. The first stage was
to forecast each customer’s load using multiple linear regression with a variable
selection method. The second stage was to forecast customer attrition using survival
analysis. Thus, the product of the two forecasts provided the final retail load forecast.
Another issue in the retail market is the consumers’ reactions to the various demand
response programs. While some consumers may respond to the price signals, others
may not. A nonparametric test was applied to detect the demand-responsive con-
sumers so that they can be forecasted separately [52]. Because the authors did not
find publicly available demand data for individual consumers, the experiment was
conducted using aggregate load in the Ontario power grid.
Since the large scale adoption of electrical air conditioning systems in the 1940s,
capturing the effects of weather on load has been a major issue in load forecasting.
Most load forecasting models in the literature include temperature variables and their
variants, such as lags and averages. How many lagged hourly temperatures and mov-
ing average temperatures can be included in a regression model? An investigation
was conducted in [53]. The case study was based on the data from the load fore-
casting track of GEFCom2012. An important finding is that a regression-based load
forecasting model estimated using two to three years of hourly data may include
more than a thousand parameters to maximize the forecast accuracy. In addition,
each zone may need a different set of lags and moving averages.
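The sketch below illustrates the general form of such a regression model with lagged and moving-average temperatures. The synthetic data, the particular lags and windows, and the plain linear regression are simplifications of the models studied in [53], which include calendar interactions and many more terms.

```python
# Sketch: regression-based load forecasting with lagged and moving-average
# temperatures. Data are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

n = 24 * 365
idx = pd.date_range("2015-01-01", periods=n, freq="h")
rng = np.random.default_rng(3)
temp = pd.Series(10 + 10 * np.sin(np.arange(n) * 2 * np.pi / 24), index=idx)
load = pd.Series(100 + 2 * temp.values + rng.normal(0, 5, n), index=idx)

features = pd.DataFrame(index=idx)
features["hour"] = idx.hour
features["temp"] = temp
for lag in (1, 2, 3):                    # lagged hourly temperatures
    features[f"temp_lag{lag}"] = temp.shift(lag)
for window in (24, 48):                  # moving-average temperatures
    features[f"temp_ma{window}"] = temp.rolling(window).mean()

features = features.dropna()
model = LinearRegression().fit(features, load.loc[features.index])
```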
Not many load forecasting papers are devoted to other weather variables. How to
include humidity information in load forecasting models was discussed in [54], where
the authors discovered that the temperature-humidity index (THI) might not be op-
timal for load forecasting models. Instead, separating relative humidity, temperature
and their higher-order terms and interactions in the model, with the corresponding
parameters being estimated by the training data, were producing more accurate load
forecasts than the THI-based models. A similar investigation was performed for wind
speed variables in [55]. Comparing with the models that include wind chill index
(WCI), the ones with wind speed, temperature, and their variants separated were
more accurate.
The territory of a power company may cover several micro-climate zones. Cap-
turing the local weather information may help improve the load forecast accuracy
for each zone. Therefore, proper selection of weather stations would contribute to
the final load forecast accuracy. Weather station selection was one of the challenges
designed into the load forecasting track of GEFCom2012 [56]. All four winning teams adopted the same strategy: first deciding how many stations should be selected, and then figuring out which stations to select [57–60]. A different and more ac-
curate method was proposed in [61], which follows a different strategy, determining
how many and which stations to be selected at the same time instead of sequentially.
The method includes three steps: rating and ranking the individual weather stations,
combining weather stations based on a greedy algorithm, and rating and ranking the
combined stations. The method is currently being used by many power companies,
such as the North Carolina Electric Membership Corporation, which was used as one
of the case studies in [61].
The pursuit of operational excellence and large-scale renewable integration is
pushing load forecasting toward the grid edge. Distribution substation load forecast-
ing becomes another emerging topic. One approach is to adopt the forecasting tech-
niques and models with good performance at higher levels. For instance, a three-stage
methodology, which consists of preprocessing, forecasting, and postprocessing, was
taken to forecast loads of three datasets ranging from distribution level to transmis-
sion level [62]. A semi-parametric additive model was proposed in [63] to forecast
the load of the Australian National Electricity Market. The same technique was also
applied to forecast more than 2200 substation loads of the French distribution net-
work in [64]. Another load forecasting study on seven substations from the French
network was reported in [65], where a conventional time series forecasting method-
ology was used. The same research group then proposed a neural network model to
forecast the load of two French distribution substations, which outperformed a time
series model [66].
Another approach to distribution load forecasting is to leverage the connection
hierarchy of the power grid. In [67], the load of a root node of any subtree was fore-
casted first. The child nodes were then treated separately based on their similarities.
The forecast of a “regular” node was proportional to the parent node forecast, while
the “irregular” nodes were forecasted individually using neural networks. Another
attempt to make use of the hierarchical information for load forecasting was made in
[68]. Two case studies were conducted, one based on New York City and its substa-
tions, and the other one based on PJM and its substations. The authors demonstrated
the effectiveness of aggregation in improving the higher-level load forecast accuracy.
The value that smart meters bring to load forecasting is two-fold. First, smart meters
make it possible for the local distribution companies and electricity retailers to better
understand and forecast the load of an individual house or building. Second, the high
granularity load data provided by smart meters offer great potential for improving
the forecast accuracy at aggregate levels.
Because the electricity consumption behaviors at the household and building
levels can be much more random and volatile than those at aggregate levels, the tra-
ditional techniques and methods developed for load forecasting at an aggregate level
may or may not be well suited. To tackle the problem of smart meter load forecasting,
the research community has taken several different approaches, such as evaluating
and modifying the existing load forecasting techniques and methodologies, adopting
and inventing new ones, and a mixture of them.
A highly cited study compared seven existing techniques, including linear re-
gression, ANN, SVM, and their variants [69]. The case study was performed based
on two datasets: one containing two commercial buildings and the other containing
three residential homes. The study demonstrated that these techniques could produce
fine forecasts for the two commercial buildings but not the three residential homes.
A self-recurrent wavelet neural network (SRWNN) was proposed to forecast an
education building in a microgrid setting [70]. The proposed SRWNN was shown to
be more accurate than its ancestor wavelet neural network (WNN) for both building-
level load forecasting (e.g., a 694 kW peak education building in British Columbia,
Canada) and state- or province-level load forecasting (e.g., British Columbia and
California).
Some researchers tried deep learning techniques for the household- and building-
level load forecasting. Conditional Restricted Boltzmann Machine (CRBM) and Fac-
tored Conditional Restricted Boltzmann Machine (FCRBM) were assessed in [71] to
estimate energy consumption for a household and three submetering measurements.
FCRBM achieves the highest load forecast accuracy compared with ANN, RNN,
SVM, and CRBM. Different resolutions ranging from one minute to one week have
been tested. A pooling-based deep recurrent neural network (RNN) was proposed
in [72] to learn spatial information shared between interconnected customers and
to address the over-fitting challenges. It outperformed ARIMA, SVR, and classical
deep RNN on the Irish CER residential dataset.
Sparsity is a key characteristic in household-level load forecasting. A spatiotemporal
forecasting approach was proposed in [73], which incorporated a large dataset of
many driving factors of the load for all surrounding houses of a target house. The
proposed method combined ideas from Compressive Sensing and data decompo-
sition to exploit the low-dimensional structures governing the interactions among
the nearby houses. The Pecan Street data was used to evaluate the proposed method.
Sparse coding was used to model the usage patterns in [74]. The case study was based
on a dataset collected from 5000 households in Chattanooga, TN, where including the sparse coding features led to 10% improvements in forecast accuracy. A least
absolute shrinkage and selection (LASSO)-based sparse linear method was proposed
to forecast individual consumption in [75]. The consumer’s usage patterns can be
extracted from the non-zero coefficients, and it was proven that data from other con-
sumers contribute to the fitted residual. Experiments on real data from Pacific Gas
and Electric Company showed that the LASSO-based method has low computational
complexity and comparable accuracy.
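A minimal sketch of the LASSO-based sparse linear idea follows: a household's consumption is regressed on lagged readings of all consumers, and the non-zero coefficients indicate which lags and neighbors matter. The placeholder data, lag order, and regularization strength are assumptions, not the settings of [75].

```python
# Sketch: LASSO-based sparse linear forecasting for one household, assuming
# `data` is an (n_hours, n_consumers) matrix and column 0 is the target
# household. Lagged readings of all consumers are candidate predictors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
data = rng.random((2000, 20))
n_lags, target = 24, 0

X = np.hstack([data[i:len(data) - n_lags + i, :] for i in range(n_lags)])
y = data[n_lags:, target]

lasso = Lasso(alpha=0.01).fit(X, y)
active = np.flatnonzero(lasso.coef_)    # selected (lag, consumer) predictors
```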
A commonly used method to reduce noise in smart meter data is to aggregate the
individual meters. To keep the salient features from being buried during aggregation,
clustering techniques are often used to group similar meters. In [76], next-day load
forecasting was formulated as a functional time series problem. Clustering was first
performed to classify the historical load curves into different groups. The last ob-
served load curve was then assigned to the most similar cluster. Finally, based on the
load curves in this cluster, a functional wavelet-kernel (FWK) approach was used
to forecast the next-day load curve. The results showed that FWK with clustering
outperforms simple FWK. Clustering was also conducted in [77] to obtain the load
patterns. Classification from contextual information, including time, temperature,
date, and economic indicator to clusters, was then performed. Based on the trained
classifier, the daily load can be forecasted with known contextual information. A
shape-based clustering method was performed in [78] to capture the time drift char-
acteristic of the individual load, where the cluster number was smaller than those
obtained by traditional Euclidean-distance-based clustering methods. The clustering
method is quite similar to k-means, while the distance is quantified by dynamic time
warping (DTW). Markov models were then constructed to forecast the shape of the
next-day load curve. Similar to the clustering method proposed in [78], a k-shape
clustering was proposed in [79] to forecast building time-series data, where the time
series shape similarity was used to update the cluster memberships to address the
time-drift issue.
The fine-grained smart meter data also introduce new perspectives to the aggrega-
tion level load forecasting. A clustering algorithm can be used to group customers.
Each customer group can then be forecasted with different forecasting models.
Finally, the aggregated load forecast can be obtained by summing the load forecast
of each group. Two datasets including the Irish CER residential dataset and another
dataset from New York were used to build the case study in [80]. Both showed that
forecast errors can be reduced by effectively grouping different customers based on
their energy consumption behaviors. A similar finding was presented in [81] where
the Irish CER residential dataset was used in the case study. The results showed that
cluster-based forecasting can improve the forecasting accuracy and that the perfor-
mance depends on the number of clusters and the size of the consumer.
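The cluster-then-forecast strategy discussed above can be sketched as follows: consumers are grouped by the shape of their average daily profile, each group total is forecasted separately, and the group forecasts are summed. The naive last-week persistence used for each group here is only a stand-in for the forecasting models in [80, 81], and the data are placeholders.

```python
# Sketch: cluster-based aggregate load forecasting. `history` is an
# (n_consumers, n_hours) array; n_hours is assumed to be a multiple of 24
# and to cover at least one week.
import numpy as np
from sklearn.cluster import KMeans

def cluster_based_forecast(history, n_clusters=5, horizon=24):
    daily = history.reshape(history.shape[0], -1, 24).mean(axis=1)   # mean daily shape
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(daily)
    total_forecast = np.zeros(horizon)
    for c in range(n_clusters):
        group_load = history[labels == c].sum(axis=0)
        # Persistence stand-in: repeat the same hours from one week earlier.
        total_forecast += group_load[-7 * 24: -7 * 24 + horizon]
    return total_forecast

history = np.random.default_rng(7).random((200, 24 * 28))   # 200 consumers, 4 weeks
print(cluster_based_forecast(history))
```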
The relationship between group size and forecast accuracy based on Seasonal-
Naïve and Holt-Winters algorithms was investigated in [82]. The results showed
that forecasting accuracy increases as group size increases, even for small groups.
A simple empirical scaling law was proposed in [83] to describe how the accuracy changes with the aggregation level. The derivation of the scaling law is based on the Mean Absolute Percentage Error (MAPE). Case studies on the data from Pacific Gas and Electric Company show that MAPE decreases quickly with the increase of the number of consumers when the number of consumers is less than 100,000. When the number of consumers exceeds 100,000, the MAPE decreases only slightly.
Forecast combination is a well-known approach to accuracy improvement. A
residential load forecasting case study showed that the ensembles outperformed all
the individual forecasts from traditional load forecasting models [84]. By varying the
number of clusters, different forecasts can be obtained. A novel ensemble forecasting
framework was proposed in [85] to optimally combine these forecasts to further
improve the forecasting accuracy.
Traditional error measures such as MAPE cannot reasonably quantify the performance of individual load forecasting due to the volatile and time-shifting characteristics of individual loads. For example, MAPE can easily be influenced by outliers. A resistant MAPE
(r-MAPE) based on the calculation of the Huber M-estimator was proposed in [86] to
overcome this situation. The mean arctangent absolute percentage error (MAAPE)
was proposed in [87] to consider the intermittent nature of individual load profiles.
MAAPE, a variation of MAPE, measures each error as an angle whose tangent is the ratio between the absolute error and the real value, i.e., the absolute percentage error (APE). An error measure designed for household-level load forecasts was pro-
posed in [88] to address the time-shifting characteristic of household-level loads. In
addition to these error measures, some modifications of MAPE and mean absolute
error (MAE) have been used in other case studies [74, 75].
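For concreteness, the two error measures compare as follows: MAAPE passes each absolute percentage error through the arctangent, so near-zero actual values, which are common in individual household loads, no longer dominate the average. The example values are arbitrary.

```python
# Sketch: MAPE versus MAAPE for a short forecast series.
import numpy as np

def mape(actual, forecast):
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def maape(actual, forecast):
    # Each APE is mapped through arctan, bounding its contribution.
    return np.mean(np.arctan(np.abs((actual - forecast) / actual)))

actual = np.array([0.1, 2.0, 3.0, 0.05])
forecast = np.array([0.5, 1.8, 3.3, 0.4])
print(mape(actual, forecast), maape(actual, forecast))
```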
a range. An empirical formula was also proposed to select parameters for the tem-
perature scenario generation methods. The idea of generating temperature scenarios
was also applied in [93]. An embedding based quantile regression neural network
was used as the regression model instead of the MLR model, where the embedding
layer can model the effect of calendar variables. In this way, the uncertainties of both
future temperature and the relationship between temperature and load can be com-
prehensively considered. The scenario generation method was also used to develop
a probabilistic view of power distribution system reliability indices [94].
On the output side, one can convert point forecasts to probabilistic ones via resid-
ual simulation or forecast combination. Several residual simulation methods were
evaluated in [95]. The results showed that the residuals do not always follow a nor-
mal distribution, though group analysis increases the passing rate of normality tests.
Adding simulated residuals under the normality assumption improves probabilis-
tic forecasts from deficient models, while the improvement is diminishing as the
underlying model improves. The idea of combining point load forecasts to gener-
ate probabilistic load forecasts was first proposed in [96]. The quantile regression
averaging (QRA) method was applied to eight sister load forecasts, a set of point
forecasts generated from homogeneous models developed in [53]. A constrained
QRA (CQRA) was proposed in [97] to combine a series of quantiles obtained from
individual quantile regression models.
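The sketch below shows the pinball (quantile) loss typically used to evaluate such quantile forecasts, together with a fixed-weight combination of two quantile forecasts; QRA and CQRA would instead learn the combination weights from data, so the equal weights and example values here are placeholders.

```python
# Sketch: pinball loss for a quantile forecast, plus a simple fixed-weight
# combination of two quantile forecasts for the same target quantile.
import numpy as np

def pinball_loss(actual, quantile_forecast, q):
    diff = actual - quantile_forecast
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

actual = np.array([100.0, 110.0, 95.0])
q90_model_a = np.array([120.0, 115.0, 105.0])
q90_model_b = np.array([108.0, 118.0, 100.0])

combined = 0.5 * q90_model_a + 0.5 * q90_model_b     # placeholder weights
print(pinball_loss(actual, combined, q=0.9))
```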
Both approaches mentioned above rely on point forecasting models. It is still an
unsolved question whether a more accurate point forecasting model can lead to a
more skilled probabilistic forecast within this framework. An attempt was made in
[98] to answer this question. The finding is that when the two underlying models
are significantly different w.r.t. the point forecast accuracy, a more accurate point
forecasting model would lead to a more skilled probabilistic forecast.
Various probabilistic forecasting models have been proposed by statisticians and
computer scientists, such as quantile regression, Gaussian process regression, and
density estimation. These off-the-shelf models can be directly applied to generate
probabilistic load forecasts [50]. In GEFCom2014, a winning team developed a quan-
tile generalized additive model (quantGAM), which is a hybrid of quantile regression
and generalized additive models [99]. Probabilistic load forecasting has also been
conducted on individual load profiles. Combining the gradient boosting method and
quantile regression, a boosting additive quantile regression method was proposed in
[100] to quantify the uncertainty and generate probabilistic forecasts. Apart from
the quantile regression model, kernel density estimation methods were tested in
[101]. The density of electricity data was modeled using different implementations
of conditional kernel density (CKD) estimators to accommodate the seasonality in
consumption. A decay parameter was used in the density estimation model for recent
effects. The selection of kernel bandwidths and the presence of boundary effects are
two main challenges with the implementation of CKD that were also investigated.
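As an example of applying an off-the-shelf model, the sketch below fits one gradient boosting quantile regressor per target quantile, loosely in the spirit of boosting additive quantile regression; the synthetic features and hyperparameters are illustrative only.

```python
# Sketch: probabilistic load forecasting with gradient boosting quantile
# regression, one model per target quantile. Features and data are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
X = rng.random((1000, 5))            # e.g., calendar and temperature features
y = 50 + 30 * X[:, 0] + rng.normal(0, 5, 1000)

quantile_forecasts = {}
for q in (0.1, 0.5, 0.9):
    model = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200)
    quantile_forecasts[q] = model.fit(X, y).predict(X[-24:])   # 24-step quantiles
```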
1.3.4 Remarks
Table 1.2 provides the correspondence between the key techniques and the surveyed
references in smart meter data analytics for load forecasting.
Forecasting the loads at aggregate levels is a relatively mature area. Nevertheless,
there are some nuances in the smart grid era due to the increasing need of highly
accurate load forecasts. One is on the evaluation methods. Many forecasts are being
evaluated using widely used error measures such as MAPE, which does not consider
the consequences of over- or under-forecasts. In reality, the cost to the sign and mag-
nitude of errors may differ significantly. Therefore, the following research question
arises: how can the costs of forecast errors be integrated into the forecasting processes?
Some research in this area would be helpful to bridge the gap between forecasting and
decision making. The second one is load transfer detection, which is a rarely touched
area in the literature. Distribution operators may transfer the load from one circuit to
another permanently, seasonally, or on an ad hoc basis, in response to maintenance
needs or reliability reasons. These load transfers are often poorly documented. With-
out smart meter information, it is difficult to physically trace the load blocks being
transferred. Therefore, a data-driven approach is necessary in these situations. The
third one is hierarchical forecasting, specifically, how to fully utilize zonal, regional,
or meter load and local weather data to improve the load forecast accuracy. In addi-
tion, it is worth studying how to reconcile the forecasts from different levels for the
applications of aggregators, system operators, and planners. The fourth one is on the
emerging factors that affect electricity demand. The consumer behaviors are being
changed by many modern technologies, such as rooftop solar panels, large batteries,
and smart home devices. It is important to leverage the emerging data sources, such
as technology adoption, social media, and various marketing surveys.
To comprehensively capture the uncertainties in the future, researchers and prac-
titioners have recently started to investigate probabilistic load forecasting. Several areas
within probabilistic load forecasting would need some further attention. First, dis-
tributed energy resources and energy storage options often disrupt the traditional
load profiles. Some research is needed to generate probabilistic net load forecasts
for the system with high penetration of renewable energy and large scale storage.
Secondly, forecast combination is widely regarded in the point forecasting literature
as an effective way to enhance the forecast accuracy. There is a primary attempt
in [97] to combine quantile forecasts. Further investigations can be conducted on
combining other forms of probabilistic forecasts, such as density forecasts and interval
forecasts. Finally, the literature of probabilistic load forecasting for smart meters is
still quite limited. Since the meter-level loads are more volatile than the aggregate
loads, probabilistic forecasting has a natural application in this area.
The electricity consumption behaviors of the consumers are closely related to their
socio-demographic status. Bridging the load profiles to socio-demographic status is
an important approach to classify the consumers and realize personalized services.
A naive problem is to detect consumer types according to the load profiles. The other
two issues are identifying socio-demographic information from load profiles and
predicting the load shapes using the socio-demographic information.
Identifying the type of consumers can be realized by simple classification. The
temporal load profiles were first transformed into the frequency domain in [104] using
fast Fourier transformation (FFT). Then the coefficients of different frequencies were
used as the inputs of classification and regression tree (CART) to place consumers
in different categories. FFT decomposes smart meter data based on a certain sine
function and cosine function. Another transformation technique, sparse coding, has
no assumption on the base signal but learns them automatically. Non-negative sparse
coding was applied to extract the partial usage patterns from original load profiles in
[105]. Based on the partial usage patterns, linear SVM was implemented to classify
the consumers into residents and small and medium-sized enterprises (SMEs). The
classification accuracy is considerably higher than the discrete wavelet transform
(DWT) and PCA.
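A minimal sketch of the frequency-domain classification step follows, assuming synthetic profiles and scikit-learn's CART-style decision tree; the feature dimensions and data are illustrative and do not reproduce the exact setups of [104] or [105].

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Synthetic half-hourly daily profiles (48 points): residents peak in the evening,
# small businesses peak around midday.
def make_profiles(n, peak_slot):
    t = np.arange(48)
    shape = 1.0 + np.exp(-0.5 * ((t - peak_slot) / 6.0) ** 2)
    return shape + 0.2 * rng.normal(size=(n, 48))

X = np.vstack([make_profiles(200, peak_slot=38), make_profiles(200, peak_slot=22)])
y = np.array([0] * 200 + [1] * 200)                  # 0 = resident, 1 = SME

# Frequency-domain features: real and imaginary parts of the low-order FFT coefficients
spec = np.fft.rfft(X, axis=1)
features = np.hstack([spec.real[:, :6], spec.imag[:, :6]])

X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)   # CART-style tree
print("test accuracy:", tree.score(X_te, y_te))
```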
There are still consumers without smart meter installations. External data, such
as the socio-demographic status of consumers, are applied to estimate their load pro-
files. Clustering was first implemented to classify consumers into different energy
behavior groups, and then energy behavior correlation rate (EBCR) and indicator
dominance index (IGD) were defined and calculated to identify the indicators higher
than a threshold [106]. Finally, the relationship between different energy behav-
ior groups and their socio-demographic status was mapped. Spectral clustering was
applied to generate typical load profiles, which were then used as the inputs of pre-
dictors such as random forests (RF) and stochastic boosting (SB) in [107]. The results
showed that with commercial and cartographic data, the load profiles of consumers
can be accurately predicted. Stepwise selection was applied to investigate the factors that strongly influence residential electricity consumption in [108]. Location, floor area, the age of consumers, and the number of appliances were found to be the main factors, whereas income level and homeownership have little relationship with consumption. A multiple linear regression model was used to bridge total electricity consumption, maximum demand, load factor, and time of use (ToU) to dwelling and occupant socioeconomic variables in [109], and the factors with a great impact on each of these quantities were identified. The influence of socioeconomic status on consumers' electricity consumption patterns was evaluated in [110], where RF regression was applied to combine socioeconomic status and environmental factors to predict the consumption patterns.
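A minimal sketch of predicting consumption from socio-demographic and environmental attributes with random forest regression, in the spirit of [110]; the attribute names and the synthetic data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 500

# Hypothetical socioeconomic and environmental attributes for n households
data = pd.DataFrame({
    "floor_area":   rng.uniform(40, 200, n),         # m^2
    "occupants":    rng.integers(1, 6, n),
    "n_appliances": rng.integers(3, 15, n),
    "avg_temp":     rng.uniform(-5, 25, n),           # degrees C
})
# Synthetic daily consumption driven mainly by floor area and appliance stock
y = (0.05 * data["floor_area"] + 0.8 * data["n_appliances"]
     + 1.5 * data["occupants"] - 0.1 * data["avg_temp"]
     + rng.normal(0, 1, n))

X_tr, X_te, y_tr, y_te = train_test_split(data, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out households:", round(rf.score(X_te, y_te), 3))
print(dict(zip(data.columns, rf.feature_importances_.round(3))))
```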
More works focus on how to mine the socio-demographic information of con-
sumers from the massive smart meter data. One approach is based on a clustering
algorithm. The Dirichlet process mixture model (DPMM) was applied in [111] for household and business premise load profiling, where the number of clusters is not required to be predetermined. The clustering results obtained by the DPMM algorithm have a clear corresponding relation
with the metadata of dwellings, such as the nationality, household size, and type of
dwelling. Based on the clustering results, multinomial logistic regression was applied
to the clusters and dwelling and appliance characteristics in [112]. Each cluster was
analyzed according to the coefficients of the regression model. Feature extraction and selection have also been applied to construct the attributes of the classifier. A feature set including the average consumption over certain periods, the ratios of consumption between different periods, and temporal properties was established in [113]. Then,
classification or regression was implemented to predict the socio-demographic sta-
tus according to these features. Results showed that the proposed feature extraction
method outperforms a biased random guess. More than 88 features from consump-
tion, ratios, statistics, and temporal characteristics were extracted, and then corre-
lation, KS-test, and η²-based feature selection methods were conducted in [114]. The so-called extended CLASS classification framework was used to forecast the de-
duced properties of private dwellings. A supervised classification algorithm called
dependent-independent data classification (DID-Class) was proposed to address the
challenges of dependencies among multiple classification-relevant variables in [115].
The characteristics of dwellings were recognized based on this method, and com-
parisons with SVM and the traditional CLASS framework proposed in [113] were conducted. The accuracies of DID-Class combined with SVM and with CLASS are slightly higher than those of SVM and CLASS alone. To capture the intra-day and inter-day electricity consumption behav-
ior of the consumers, a two-dimensional convolutional neural network (CNN) was
used in [116] to make a bridge between the smart meter data and socio-demographic
information of the consumers. The deep learning method can extract the features
automatically and outperforms traditional methods.
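To illustrate the deep learning approach, here is a minimal sketch of a 2-D CNN that maps a week of half-hourly readings (a 7 × 48 "image") to a socio-demographic label; the layer sizes, input shape, and PyTorch implementation are illustrative assumptions rather than the architecture used in [116].

```python
import torch
import torch.nn as nn

class SocioDemoCNN(nn.Module):
    """2-D CNN over a weekly load 'image': rows are days, columns are half-hour slots."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # (1, 7, 48) -> (8, 7, 48)
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),                         # -> (8, 7, 24)
            nn.Conv2d(8, 16, kernel_size=3, padding=1),   # -> (16, 7, 24)
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),                         # -> (16, 7, 12)
        )
        self.classifier = nn.Linear(16 * 7 * 12, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SocioDemoCNN(n_classes=2)                 # e.g., a binary household attribute
weekly_load = torch.randn(32, 1, 7, 48)           # synthetic batch of weekly profiles
logits = model(weekly_load)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (32,)))  # training would minimize this
print(logits.shape, float(loss))
```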
Demand response program marketing aims to target consumers who have a large potential to be involved in demand response programs. On the one hand, 15-min or half-hourly smart meter data cannot provide detailed information on the operation status of appliances; on the other hand, the attitude of consumers towards demand response is hard to model. Thus, the demand response potential cannot be evaluated directly. As summarized in this subsection, the potential of demand response can instead be evaluated indirectly by analyzing the variability, sensitivity to temperature, and so forth.
Variability is a key index for evaluating the potential of demand response. A
hidden Markov model (HMM)-based spectral clustering was proposed in [117] to
describe the magnitude, duration, and variability of the electricity consumption and
further estimate the occupancy states of consumers. The information on the vari-
ability, occupancy states, and inter-temporal consumption dynamics can help retail-
ers or aggregators target suitable consumers at different time scales. Both adaptive
k-means and hierarchical clustering were used to obtain the typical load shapes of
all the consumers within a certain error threshold in [118]. The entropy of each con-
sumer was then calculated according to the distribution of daily load profiles over
a year, and the typical shapes of load profiles were analyzed. The consumers with
lower entropy have relatively similar consumption patterns on different days and can be viewed as having greater potential for demand response because their load profiles are more predictable. Similarly, the entropy was calculated in [46] based on the state transition matrix. It was stated that consumers with high entropy are suitable for price-based demand response because of their flexibility to adjust their load profiles according to price changes, whereas consumers with low entropy are suitable for incentive-based demand response because of their predictability in following control commands.
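A minimal sketch of the entropy-based targeting idea, assuming daily profiles are first clustered into typical shapes and each consumer's entropy is computed from the distribution of their days over those shapes; the synthetic data and the KMeans settings are illustrative, not the pipelines of [46, 118].

```python
import numpy as np
from scipy.stats import entropy
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Synthetic data: 365 daily profiles (24 hourly readings) per consumer
def consumer_days(shape_probs):
    shapes = np.vstack([np.roll(np.hanning(24), s) for s in (0, 8, 16)])
    picks = rng.choice(3, size=365, p=shape_probs)      # which shape each day follows
    return shapes[picks] + 0.05 * rng.normal(size=(365, 24))

regular   = consumer_days([0.9, 0.05, 0.05])            # almost always the same shape
irregular = consumer_days([0.4, 0.3, 0.3])              # switches shapes frequently

# Cluster all daily profiles into typical load shapes
all_days = np.vstack([regular, irregular])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(all_days)

def consumer_entropy(lbls):
    counts = np.bincount(lbls, minlength=3)
    return entropy(counts / counts.sum())

print("regular consumer entropy:  ", round(consumer_entropy(labels[:365]), 3))
print("irregular consumer entropy:", round(consumer_entropy(labels[365:]), 3))
```

A low entropy indicates the same few shapes dominate the whole year, i.e., a more predictable consumer.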
Estimation of electricity reduction is another approach for evaluating demand response potential. A mixture model clustering was conducted on a survey dataset and smart
meter data in [47] to evaluate the potential for active demand reduction with wet ap-
pliances. The results showed that both the electricity demand of wet appliances and
the attitudes toward demand response have a great influence on the potential for load
shifting. Based on the GMM model of the electricity consumption of consumers and
the estimated baseline, two indices, i.e., the probability that the electricity reduction is greater than or equal to a certain value and the least amount of electricity reduction achieved with a certain probability, were calculated in [119]. These two indices can help demand
response implementers have a probabilistic understanding of how much electricity
can be reduced. A two-stage demand response management strategy was proposed
in [120], where SVM was first used to detect the devices and users with excess
load consumption and then a load balancing algorithm was performed to balance the
overall load.
Since appliances such as heating, ventilation and air conditioning (HVAC) have
great potential for demand response, the sensitivity of electricity consumption to
outdoor air temperature is an effective evaluation criterion. Linear regression was
applied to smart meter data and temperature data to calculate this sensitivity, and
the maximum likelihood approach was used to estimate the change point in [121].
Based on that, the demand response potentials at different hours were estimated.
Apart from the simple regression, an HMM-based thermal regime was proposed
to separate the original load profile into the thermal profile (temperature-sensitive)
and base profile (non-temperature-sensitive) in [122]. The demand response potential
can be calculated for different situations, and the proposed method can achieve much
more savings than random selection. A thermal demand response ranking method
was proposed in [123] for demand response targeting, where the demand response
potential was evaluated from two aspects: temperature sensitivity and occupancy.
Both linear regression and breakpoint detection were used to model the thermal
regimes; the true linear response rate was used to detect the occupancy.
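The temperature-sensitivity idea can be sketched as a change-point regression of daily consumption on outdoor temperature; the grid-search fit below is an illustrative stand-in for the maximum likelihood estimation in [121], with purely synthetic data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic daily data: consumption responds to temperature only above a balance point
temp = rng.uniform(0, 35, 365)
true_cp, base, slope = 18.0, 10.0, 0.6
load = base + slope * np.maximum(temp - true_cp, 0) + rng.normal(0, 0.5, 365)

def fit_change_point(temp, load, candidates):
    """Grid search over candidate change points; least-squares fit of base + slope*(T-cp)+."""
    best = None
    for cp in candidates:
        x = np.maximum(temp - cp, 0)
        A = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(A, load, rcond=None)
        sse = np.sum((load - A @ coef) ** 2)
        if best is None or sse < best[0]:
            best = (sse, cp, coef)
    return best[1], best[2]

cp, (b, s) = fit_change_point(temp, load, np.arange(10, 26, 0.5))
print(f"change point: {cp:.1f} C, base load: {b:.2f}, temperature sensitivity: {s:.2f} per C")
```

Consumers with a large estimated slope are natural candidates for HVAC-based demand response.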
Demand response can be roughly divided into price-based demand response and
incentive-based demand response. Price design is an important business model to
attract consumers and maximize profit in price-based demand response programs;
baseline estimation is the basis of quantifying the performance of consumers in
incentive-based demand response programs. The applications of smart meter data
analytics in price design and baseline estimation are summarized in this subsection.
For tariff design, an improved weighted fuzzy average (WFA) k-means was first
proposed to obtain typical load profiles in [124]. An optimization model was then
formulated with a designed profit function, where the acceptance of consumers over
price was modeled by a piecewise function. A similar price determination strategy
was also presented in [125]. Conditional value at risk (CVaR) for the risk model
was further considered in [126] such that the original optimization model becomes a
stochastic one. Different types of clustering algorithms were applied in [127] to extract load profiles, with the granularity guided by a performance index. The results showed
that different clusterings with different numbers of clusters and algorithms lead to
different costs. GMM clustering was implemented on both energy prices and load
profiles in [128]. Then, a ToU tariff was developed using different combinations of the
classifications of time periods. The impact of the designed price on demand response
was finally quantified.
For baseline estimation, five naive baseline methods, HighXofY, MidXofY,
LowXofY, exponential moving average, and regression baselines, were introduced
in [129]. Different demand response scenarios were modeled and considered. The
results showed that bias rather than accuracy is the main factor for deciding which
baseline provides the largest profits. To describe the uncertainty within the consump-
tion behaviors of consumers, Gaussian-process-based probabilistic baseline estima-
tion was proposed in [130]. In addition, how the aggregation level influences the
relative estimation error was also investigated. k-means clustering of the load profiles on non-event days was first applied in [131], and a decision tree was used to predict the electricity consumption level according to demographic data, including household characteristics and electrical appliances. Thus, a new consumer can be directly classified into a certain group before joining the demand response program, and then simple averaging and piecewise linear regression were used to estimate the baseline load under different weather conditions. Selecting a control group for baseline
estimation was formulated as an optimization problem in [132]. The objective was
to minimize the difference between the load profiles of the control group and de-
mand response group when there is no demand response event. The problem was
transformed into a constrained regression problem.
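As an illustration of the naive baselines introduced in [129], here is a minimal sketch of a HighXofY baseline (the "High 5 of 10" variant); the synthetic data, interval length, and the assumed 20% curtailment are illustrative assumptions.

```python
import numpy as np

def high_x_of_y_baseline(history, x=5, y=10):
    """
    Naive demand-response baseline: among the y most recent non-event days,
    average the x days with the highest daily energy. `history` has shape
    (n_days, n_intervals) and holds non-event-day load profiles.
    """
    recent = history[-y:]
    daily_energy = recent.sum(axis=1)
    top = np.argsort(daily_energy)[-x:]          # indices of the x highest-energy days
    return recent[top].mean(axis=0)              # interval-wise average profile

rng = np.random.default_rng(5)
non_event_days = 2.0 + np.abs(rng.normal(0, 0.5, size=(30, 48)))   # 30 days, half-hourly kW
baseline = high_x_of_y_baseline(non_event_days)

event_day = non_event_days[-1] * 0.8              # pretend 20% of load was curtailed
reduction = baseline - event_day
print("estimated curtailment over the event day (kWh):", round(reduction.sum() * 0.5, 2))
```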
1.4.4 Remarks
Table 1.3 provides the correspondence between the key techniques and the surveyed
references in smart meter data analytics for load management.
Consumer characterization is essentially a high-dimensional and nonlinear
classification problem. There are at least two ways to improve the performance of
consumer characterization: (1) conducting feature extraction or selection; (2) devel-
oping classification models. In the majority of existing literature, the features for
consumer characterization are manually extracted. A data-driven feature extraction
method might be an effective way to further improve performance. The classification
is mainly implemented by the shallow learning models such as ANN and SVM. We
can try different deep learning networks to tackle high nonlinearity. We also find that
the current works are mainly based on the Irish dataset [133]. The Low Carbon London dataset may be another good choice. More open datasets are needed to enrich the
research in this area.
For demand response program marketing, evaluating the potential for load shifting
or reduction is an effective way to target suitable consumers for different demand
response programs. Smart meter data with a frequency of 30 min or lower cannot
reveal the operation states of individual appliances; thus, several indirect indices,
including entropy, sensitivity to temperature and price, are used. More indices can
be further proposed to provide a comprehensive understanding of the electricity
consumption behavior of consumers. Since most papers target potential consumers for demand response according to these indirect indices, a critical question is why and how these indices reflect the demand response potential in the absence of experimental evidence. More real-world experimental results are needed to strengthen this line of research.
For demand response implementation, all the price designs surveyed above are
implemented with a known acceptance function against price. However, the accep-
tance function or utility function is hard to estimate. How to obtain the function
has not been introduced in the existing literature. If the used acceptance function
or utility function is different from the real one, the obtained results will deviate
from the optimal results. Sensitivity analysis of the acceptance function or utility
function assumption can be further conducted. Beyond traditional tariff design, some innovative pricing schemes can be studied, such as different tariff packages based on
fine-grained smart meter data. For baseline estimation, in addition to deterministic
estimation, probabilistic estimation methods can better capture future uncertainties.
Another issue is how to effectively incorporate the deterministic or probabilistic
baseline estimation results into the demand response scheduling problem.
1.5 Miscellanies
In addition to the three main applications summarized above, the works on smart
meter data analytics also cover some other applications, including power network
connection verification, outage management, data compression, data privacy, and so
forth. Since only a few studies have been conducted in these areas and the literature is not yet rich, these works are summarized together in this section.
The distribution connection information can help utilities and DSOs make optimal decisions regarding the operation of the distribution system. Unfortunately, the entire
topology of the system may not be available especially at low voltage levels. Several
works have been conducted to identify the connections of different demand nodes
using smart meter data.
Correlation analysis of the hourly voltage and power consumption data from
smart meters was used to correct connectivity errors in [134]. The analysis assumed
that the voltage magnitude decreases downstream along the feeder. However, the
assumption might be incorrect when there is a large amount of distributed renewable
energy integration. In addition to consumption data, both the voltage and current
data were used in [135] to estimate the topology of the distribution system secondary
circuit and the impedance of each branch. This estimation was conducted in a greedy
fashion rather than an exhaustive search to enhance computational efficiency. The
topology identification problem was formulated as an optimization problem min-
imizing the mutual-information-based Kullback–Leibler (KL) divergence between
each pair of voltage time series in [136]. The effectiveness of mutual information was
discussed from the perspective of conditional probability. Similarly, based on the
assumption that the correlation between interconnected neighboring buses is higher
than that between non-neighbor buses, the topology identification problem was for-
mulated as a probabilistic graph model and a Lasso-based sparse estimation problem
in [137]. How to choose the regularization parameter for Lasso regression was also
discussed.
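The voltage-correlation idea behind these methods can be sketched as follows: treat each meter's voltage magnitude series as a node and recover a radial topology as the maximum-correlation spanning tree. The synthetic feeder and the spanning-tree shortcut are illustrative assumptions, not the exact formulations of [134–137].

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(6)

# Synthetic radial feeder: node 0 is the transformer; each downstream node's voltage
# is its parent's voltage minus a load-dependent drop plus measurement noise.
parents = {1: 0, 2: 1, 3: 1, 4: 0, 5: 4}
T = 1000
volts = np.zeros((6, T))
volts[0] = 230.0 + rng.normal(0, 0.5, T)
for node, parent in parents.items():
    volts[node] = volts[parent] - np.abs(rng.normal(1.0, 0.3, T)) + rng.normal(0, 0.1, T)

# Pairwise correlation of voltage magnitudes; true neighbors correlate most strongly
corr = np.corrcoef(volts)

# Recover a radial topology as the maximum-correlation spanning tree
# (use 1 - corr so that high correlation means low edge weight)
tree = minimum_spanning_tree(1.0 - corr).toarray()
recovered = sorted((min(i, j), max(i, j)) for i, j in zip(*np.nonzero(tree)))
true_edges = sorted((min(c, p), max(c, p)) for c, p in parents.items())
print("recovered edges:", recovered)
print("true edges:     ", true_edges)
```

As noted above, such correlation assumptions can break down when distributed generation reverses the usual voltage-drop pattern along the feeder.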
The electricity consumption data at different levels were analyzed by PCA in
[138] for both phase and topology identification where the errors caused by tech-
nical loss, smart metering, and clock synchronization were formulated as Gaussian
distributions. Rather than using all smart meter data, a phase identification problem
with incomplete data was proposed in [139] to address the challenge of bad data or
null data. The high-frequency load was first obtained by a Fourier transform, and
then the variations in high-frequency load between two adjacent time intervals were
extracted as the inputs of saliency analysis for phase identification. A sensitivity
analysis of smart meter penetration ratios was performed and showed that over 95%
accuracy can be achieved with only 10% smart meters.
Massive smart meter data present more challenges with respect to data communi-
cation and storage. Compressing smart meter data to a much smaller size with little or no loss can ease the communication and storage burden. Data compression can
be divided into lossy compression and lossless compression. Different compression
methods for electric signal waveforms in smart grids are summarized in [148].
Some papers exist that specifically discuss the smart meter data compression
problem. Note that the changes in electricity consumption in adjacent time periods are much smaller than the actual consumption, particularly for very high-frequency data. Thus, a differential coding method combining normalization, variable-length coding, and entropy coding was proposed in [149] for the lossless compression of smart meter data. Different lossless compression methods, including IEC 62056-21, A-XDR, differential exponential Golomb and arithmetic (DEGA) coding, and the Lempel-Ziv-Markov chain algorithm (LZMA), were compared on the REDD and SAG datasets in [150]. The performances on the data with different granularities
were investigated. The results showed that these lossless compression methods have
better performance on higher granularity data.
For low granularity (such as 15 min) smart meter data, symbolic aggregate ap-
proximation (SAX), a classic time series data compression method, was used in [46,
151] to reduce the dimensionality of load profiles before clustering. The distribution
of load profiles was first fitted by a generalized extreme value distribution in [152]. A feature-
based load data compression method (FLDC) was proposed by defining the base
state and stimulus state of the load profile and detecting the change in load status.
Comparisons with the piecewise aggregate approximation (PAA), SAX, and DWT
were conducted. Non-negative sparse coding was applied to transform original load
profiles into a higher dimensional space in [105] to identify the partial usage patterns
and compress the load in a sparse way.
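A minimal sketch of PAA followed by SAX symbolization, assuming a z-normalized profile, equal-width segments, and Gaussian breakpoints for a four-letter alphabet; it illustrates the dimensionality-reduction step used before clustering in [46, 151], not their full pipelines.

```python
import numpy as np
from scipy.stats import norm

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-width segment."""
    return np.array([seg.mean() for seg in np.array_split(series, n_segments)])

def sax(series, n_segments=8, alphabet="abcd"):
    """Symbolic aggregate approximation of a single load profile."""
    z = (series - series.mean()) / series.std()               # z-normalize
    segments = paa(z, n_segments)
    # Breakpoints that split the standard normal into equiprobable regions
    breakpoints = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    symbols = np.digitize(segments, breakpoints)
    return "".join(alphabet[s] for s in symbols)

rng = np.random.default_rng(7)
profile = 2 + np.sin(np.linspace(0, 2 * np.pi, 96)) + 0.1 * rng.normal(size=96)  # 15-min data
print("SAX word:", sax(profile))    # 96 readings compressed to an 8-symbol word
```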
One of the main objections to and concerns about the installation of smart meters is
the privacy issue. The socio-demographic information can be inferred from the fine-
grained smart meter data, as introduced in Sect. 1.4. Several works in the literature
discuss how to preserve the privacy of consumers.
A study on the distributed aggregation architecture for additive smart meter data
was conducted in [153]. A secure communication protocol was designed for the gate-
ways placed at the consumers’ premises to prevent the disclosure of individual consumption data. The proposed communication protocol can be implemented in both centralized
and distributed manners. A framework for the trade-off between privacy and utility
requirement of consumers was presented in [154] based on a hidden Markov model.
The utility requirement was evaluated by the distortion between the original and the
perturbed data, while the privacy was evaluated by the mutual information between
the two data sequences. Then, a utility-privacy trade-off region was defined from
the perspective of information theory. This trade-off was also investigated in [155],
where the attack success probability was defined as an objective function to be mini-
mized and ε-privacy was formulated. The aggregation of individual smart meter data
and the introduction of colored noise were used to reduce the success probability.
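A toy illustration of why aggregation and noise addition reduce the information leaked about any single household: the reported signal's correlation with one consumer's profile drops as more meters are summed. The white noise used here is a simplification of the colored noise discussed in [155].

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 50, 48
individual = np.abs(rng.normal(1.0, 0.4, size=(N, T)))       # individual half-hourly loads

def leak(reported, target):
    """Correlation between the reported signal and one household's true profile."""
    return abs(np.corrcoef(reported, target)[0, 1])

target = individual[0]
print("report own meter:    ", round(leak(target, target), 2))
print("report 10-meter sum: ", round(leak(individual[:10].sum(axis=0), target), 2))
print("report 50-meter sum: ", round(leak(individual.sum(axis=0), target), 2))
print("sum + additive noise:", round(leak(individual.sum(axis=0) + rng.normal(0, 2, T), target), 2))
```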
Edge detection is one main approach for NILM to identify the status of appli-
ances. How the data granularity of smart meter data influences the edge detection
performance was studied in [156]. The results showed that when the data collection
frequency is lower than half the on-time of the appliance, the detection rate dramat-
ically decreases. The privacy was evaluated by the F-score of NILM. The privacy
preservation problem was formulated as an optimization problem in [157], where
the objective was to minimize the sum of the expected cost, disutility of consumers
caused by the late use of appliances, and information leakage. Eight privacy-enhanced
scheduling strategies considering on-site battery, renewable energy resources, and
appliance load moderation were comprehensively compared.
1.6 Conclusions
In this chapter, we have provided a comprehensive review of smart meter data an-
alytics in retail markets, including the applications in load forecasting, anomaly detection, consumer segmentation, and demand response. The latest developments
in this area have been summarized and discussed. In addition, we have proposed
future research directions from the perspectives of big data issues, developments in machine learning, novel business models, energy system transition, and data privacy and
security. Smart meter data analytics is still an emerging and promising research area.
We hope that this review can provide readers with a complete picture and deep insights
into this area.
References
1. Mohassel, R. R., Fung, A., Mohammadi, F., & Raahemifar, K. (2014). A survey on advanced
metering infrastructure. International Journal of Electrical Power & Energy Systems, 63,
473–484.
2. Yang, J., Zhao, J., Luo, F., Wen, F., & Dong, Z. Y. (2017). Decision-making for electricity
retailers: A brief survey. IEEE Transactions on Smart Grid, 9(5), 4140–4153.
3. National Science Foundation. (2016). Smart grids big data. https://www.nsf.gov/awardsearch/
showAward?AWD_ID=1636772&HistoricalAwards=false.
4. Liu, X., Heller, A., & Nielsen P. S. (2017). CITIESData: A smart city data management
framework. Knowledge and Information Systems, 53(3), 699–722.
5. Bits to energy lab projects. Retrieved July 31, 2017, from http://www.bitstoenergy.ch/home/
projects/.
6. Siebel Energy Institute. (2016). Advancing the science of smart energy. http://www.
siebelenergyinstitute.org/.
7. Wp3 overview. Retrieved July 31, 2017, from https://webgate.ec.europa.eu/fpfis/mwikis/
essnetbigdata/index.php/WP3_overview.
8. SAS. (2017). Utility analytics in 2017: Aligning data and analytics with business strategy.
Technical report.
9. Hong, T., Gao, D. W., Laing, T., Kruchten, D., & Calzada, J. (2018). Training energy data
scientists: Universities and industry need to work together to bridge the talent gap. IEEE
Power and Energy Magazine, 16(3), 66–73.
10. Keerthisinghe, C., Verbič, G., & Chapman, A. C. (2016). A fast technique for smart home
management: ADP with temporal difference learning. IEEE Transactions on Smart Grid,
9(4), 3291–3303.
11. Pratt, A., Krishnamurthy, D., Ruth, M., Hongyu, W., Lunacek, M., & Vaynshenk, P. (2016).
Transactive home energy management systems: The impact of their proliferation on the elec-
tric grid. IEEE Electrification Magazine, 4(4), 8–14.
12. Morstyn, T., Farrell, N., Darby, S. J., & McCulloch, M. D. (2018). Using peer-to-peer energy-
trading platforms to incentivize prosumers to form federated power plants. Nature Energy,
3(2), 94.
13. Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intel-
ligence Review, 22(2), 85–126.
14. Peppanen, J., Zhang, X., Grijalva, S., & Reno, M. J. (2016). Handling bad or missing smart
meter data through advanced data imputation. In IEEE Power & Energy Society Innovative
Smart Grid Technologies Conference (ISGT), pp. 1–5.
15. Akouemo, H. N., & Povinelli, R. J. (2017). Data improving in time series using ARX and
ANN models. IEEE Transactions on Power Systems, 32(5), 3352–3359.
16. Li, X., Bowers, C. P., & Schnier, T. (2010). Classification of energy consumption in buildings
with outlier detection. IEEE Transactions on Industrial Electronics, 57(11), 3639–3644.
17. Jian, L., Tao, H., & Meng, Y. (2018). Real-time anomaly detection for very short-term load
forecasting. Journal of Modern Power Systems and Clean Energy, 6(2), 235–243.
18. Mateos, G., & Giannakis, G. B. (2013). Load curve data cleansing and imputation via sparsity
and low rank. IEEE Transactions on Smart Grid, 4(4), 2347–2355.
19. Huang, H., Yan, Q., Zhao, Y., Wei, L., Liu, Z., & Li, Z. (2017). False data separation for data
security in smart grids. Knowledge and Information Systems, 52(3), 815–834.
20. Al-Wakeel, A., Jianzhong, W., & Jenkins, N. (2017). k-means based load estimation of do-
mestic smart meter measurements. Applied Energy, 194, 333–342.
21. Al-Wakeel, A., Jianzhong, W., & Jenkins, N. (2016). State estimation of medium voltage
distribution networks using smart meter measurements. Applied Energy, 184, 207–218.
22. Araya, D. B., Grolinger, K., ElYamany, H. F., Capretz, M. A., & Bitsuamlak, G. (2017). An
ensemble learning framework for anomaly detection in building energy consumption. Energy
and Buildings, 144, 191–206.
23. Liu, X., Iftikhar, N., Nielsen, P. S., & Heller, A. (2016). Online anomaly energy consumption
detection using lambda architecture. In International Conference on Big Data Analytics and
Knowledge Discovery, pp. 193–209.
24. Jokar, P., Arianpoo, N., & Leung, V. C. (2016). Electricity theft detection in AMI using
customers’ consumption patterns. IEEE Transactions on Smart Grid, 7(1), 216–226.
25. Wang, K., Wang, B., & Peng, L. (2009). CVAP: Validation for cluster analyses. Data Science
Journal, 8, 88–93.
26. Depuru, S. S. S. R., Wang, L., Devabhaktuni, V., & Green, R. C. (2013). High performance
computing for detection of electricity theft. International Journal of Electrical Power &
Energy Systems, 47, 21–30.
27. Jindal, A., Dua, A., Kaur, K., Singh, M., Kumar, N., & Mishra, S. (2016). Decision tree and
SVM-based data analytics for theft detection in smart grid. IEEE Transactions on Industrial
Informatics, 12(3), 1005–1016.
28. Júnior, L. A. P., Ramos, Caio C. O., Rodrigues, D., Pereira, D. R., de Souza, A. N., da Costa, K.
A. P. & Papa, J. P. (2016). Unsupervised non-technical losses identification through optimum-
path forest. Electric Power Systems Research, 140, 413–423.
29. Nizar, A. H., Dong, Z. Y., & Wang, Y. (2008). Power utility nontechnical loss analysis with
extreme learning machine method. IEEE Transactions on Power Systems, 23(3), 946–955.
30. Botev, V., Almgren, M., Gulisano, V., Landsiedel, O., Papatriantafilou, M., & van Rooij, J.
(2016). Detecting non-technical energy losses through structural periodic patterns in AMI
data. In IEEE International Conference on Big Data, pp. 3121–3130.
31. Janetzko, H., Stoffel, F., Mittelstädt, S., & Keim, D. A. (2014). Anomaly detection for visual
analytics of power consumption data. Computers & Graphics, 38, 27–37
32. Chicco, G. (2012). Overview and performance assessment of the clustering methods for
electrical load pattern grouping. Energy, 42(1), 68–80.
33. Zhou, K., Yang, S., & Shen, C. (2013). A review of electric load classification in smart grid
environment. Renewable and Sustainable Energy Reviews, 24, 103–110.
34. Wang, Y., Chen, Q., Kang, C., Zhang, M., Wang, K., & Zhao, Y. (2015). Load profiling and its
application to demand response: A review. Tsinghua Science and Technology, 20(2), 117–129.
35. Granell, R., Axon, C. J., & Wallom, D. C. (2015). Impacts of raw data temporal resolution
using selected clustering methods on residential electricity load profiles. IEEE Transactions
on Power Systems, 30(6), 3217–3224.
36. Benítez, I., Quijano, A., Díez, J.-L., & Delgado, I. (2014). Dynamic clustering segmentation
applied to load profiles of energy consumption from spanish customers. International Journal
of Electrical Power & Energy Systems, 55, 437–448.
37. Al-Jarrah, O. Y., Al-Hammadi, Y., Yoo, P. D., & Muhaidat, S. (2017). Multi-layered clustering
for power consumption profiling in smart grids. IEEE Access, 5, 18459–18468.
38. Koivisto, M., Heine, P., Mellin, I., & Lehtonen, M. (2013). Clustering of connection points
and load modeling in distribution systems. IEEE Transactions on Power Systems, 28(2),
1255–1265.
39. Chelmis, C., Kolte, J., & Prasanna, V. K. (2015). Big data analytics for demand response:
Clustering over space and time. In IEEE International Conference on Big Data, pp. 2223–
2232.
40. Varga, E. D., Beretka, S. F., Noce, C., & Sapienza, G. (2015). Robust real-time load profile en-
coding and classification framework for efficient power systems operation. IEEE Transactions
on Power Systems, 30(4), 1897–1904.
41. Al-Otaibi, R., Jin, N., Wilcox, T., & Flach, P. (2016). Feature construction and calibration
for clustering daily load curves from smart-meter data. IEEE Transactions on Industrial
Informatics, 12(2), 645–654.
42. Piao, M., Shon, H. S., Lee, J. Y., & Ryu, K. H. (2014). Subspace projection method based
clustering analysis in load profiling. IEEE Transactions on Power Systems, 29(6), 2628–2635.
43. Haben, S., Singleton, C., & Grindrod, P. (2016). Analysis and clustering of residential cus-
tomers energy behavioral demand using smart meter data. IEEE Transactions on Smart Grid,
7(1), 136–144.
44. Stephen, B., Mutanen, A. J., Galloway, S., Burt, G., & Järventausta, P. (2014). Enhanced load
profiling for residential network customers. IEEE Transactions on Power Delivery, 29(1),
88–96.
45. Sun, M., Konstantelos, I., & Strbac, G. (2016). C-vine copula mixture model for clustering
of residential electrical load pattern data. IEEE Transactions on Power Systems, 32(3), 2382–
2393.
46. Wang, Y., Chen, Q., Kang, C., & Xia, Q. (2016). Clustering of electricity consumption behavior
dynamics toward big data applications. IEEE Transactions on Smart Grid, 7(5), 2437–2447.
47. Labeeuw, W., & Deconinck, G. (2013). Residential electrical load model based on mixture
model clustering and markov models. IEEE Transactions on Industrial Informatics, 9(3),
1561–1569.
48. Xie, J., Girshick, R., & Farhadi, A. (2016). Unsupervised deep embedding for clustering
analysis. In International Conference on Machine Learning, pp. 478–487.
49. Zhang, T., Zhang, G., Jie, L., Feng, X., & Yang, W. (2012). A new index and classification
approach for load pattern analysis of large electricity customers. IEEE Transactions on Power
Systems, 27(1), 153–160.
50. Hong, T., & Fan, S. (2016). Probabilistic electric load forecasting: A tutorial review. Interna-
tional Journal of Forecasting, 32(3), 914–938.
51. Xie, J., Hong, T., & Stroud, J. (2015). Long-term retail energy forecasting with consideration
of residential customer attrition. IEEE Transactions on Smart Grid, 6(5), 2245–2252.
52. Hoiles, W., & Krishnamurthy, V. (2015). Nonparametric demand forecasting and detection of
energy aware consumers. IEEE Transactions on Smart Grid, 6(2), 695–704.
53. Wang, P., Liu, B., & Hong, T. (2016). Electric load forecasting with recency effect: A big data
approach. International Journal of Forecasting, 32(3), 585–597.
54. Xie, J., Chen, Y., Hong, T., & Laing, T. D. (2018). Relative humidity for load forecasting
models. IEEE Transactions on Smart Grid, 9(1), 191–198.
55. Xie, J., & Hong, T. (2017). Wind speed for load forecasting models. Sustainability, 9(5), 795.
56. Hong, T., Pinson, P., & Fan, S. (2014). Global energy forecasting competition 2012. Interna-
tional Journal of Forecasting, 30(2), 357–363.
57. Charlton, N., & Singleton, C. (2014). A refined parametric model for short term load fore-
casting. International Journal of Forecasting, 30(2), 364–368.
58. Lloyd, J. R. (2014). GEFCom2012 hierarchical load forecasting: Gradient boosting
machines and Gaussian processes. International Journal of Forecasting, 30(2), 369–374.
59. Nedellec, R., Cugliari, J., & Goude, Y. (2014). GEFCom2012: Electric load forecasting and
backcasting with semi-parametric models. International Journal of forecasting, 30(2), 375–
381.
60. Taieb, S. B., & Hyndman, R. J. (2014). A gradient boosting approach to the Kaggle load
forecasting competition. International Journal of Forecasting, 30(2), 382–394.
61. Hong, T., Wang, P., & White, L. (2015). Weather station selection for electric load forecasting.
International Journal of Forecasting, 31(2), 286–295.
62. Høverstad, B. A., Tidemann, A., Langseth, H., & Öztürk, P. (2015). Short-term load forecast-
ing with seasonal decomposition using evolution for parameter tuning. IEEE Transactions on
Smart Grid, 6(4), 1904–1913.
63. Fan, S., & Hyndman, R. J. (2012). Short-term load forecasting based on a semi-parametric
additive model. IEEE Transactions on Power Systems, 27(1), 134–141.
64. Goude, Y., Nedellec, R., & Kong, N. (2014). Local short and middle term electricity load
forecasting with semi-parametric additive models. IEEE Transactions on Smart Grid, 5(1),
440–446.
65. Ding, N., Bésanger, Y., & Wurtz, F. (2015). Next-day MV/LV substation load forecaster using
time series method. Electric Power Systems Research, 119, 345–354.
66. Ding, N., Benoit, C., Foggia, G., Bésanger, Y., & Wurtz, F. (2016). Neural network-based
model design for short-term load forecast in distribution systems. IEEE Transactions on
Power Systems, 31(1), 72–81.
67. Sun, X., Luh, P. B., Cheung, K. W., Guan, W., Michel, L. D., Venkata, S.S., & Miller, M. T.
(2016). An efficient approach to short-term load forecasting at the distribution level. IEEE
Transactions on Power Systems, 31(4), 2526–2537.
68. Borges, C. E., Penya, Y. K., & Fernandez, I. (2013). Evaluating combined load forecasting
in large power systems and smart grids. IEEE Transactions on Industrial Informatics, 9(3),
1570–1577.
69. Edwards, R. E., New, J., & Parker, L. E. (2012) Predicting future hourly residential electrical
consumption: A machine learning case study. Energy and Buildings, 49, 591–603.
70. Chitsaz, H., Shaker, H., Zareipour, H., Wood, D., & Amjady, N. (2015). Short-term electricity
load forecasting of buildings in microgrids. Energy and Buildings, 99, 50–60.
71. Mocanu, E., Nguyen, P. H., Gibescu, M., & Kling, W. L. (2016). Deep learning for estimating
building energy consumption. Sustainable Energy, Grids and Networks, 6, 91–99.
72. Shi, H., Xu, M., & Li, R. (2017). Deep learning for household load forecasting—a novel
pooling deep RNN. IEEE Transactions on Smart Grid, 9(5), 5271–5280.
73. Tascikaraoglu, A., & Sanandaji, B. M. (2016). Short-term residential electric load forecasting:
A compressive spatio-temporal approach. Energy and Buildings, 111, 380–392.
74. Yu, C.-N., Mirowski, P., & Ho, T. K. (2017) A sparse coding approach to household electricity
demand forecasting in smart grids. IEEE Transactions on Smart Grid, 8(2), 738–748.
75. Li, P., Zhang, B., Weng, Y., & Rajagopal, R. (2017). A sparse linear model and significance test
for individual consumption prediction. IEEE Transactions on Power Systems, 32(6), 4489–
4500.
76. Chaouch, M. (2014). Clustering-based improvement of nonparametric functional time se-
ries forecasting: Application to intra-day household-level load curves. IEEE Transactions on
Smart Grid, 5(1), 411–419.
77. Hsiao, Y.-H. (2015). Household electricity demand forecast based on context information and
user daily schedule analysis from meter data. IEEE Transactions on Industrial Informatics,
11(1), 33–43.
78. Teeraratkul, T., O’Neill, D., & Lall, S. (2017). Shape-based approach to household electric
load curve clustering and prediction. IEEE Transactions on Smart Grid, 9(5), 5196–5206.
79. Yang, J., Ning, C., Deb, C., Zhang, F., Cheong, D., Lee, S. E., Sekhar, C., & Tham, K. W.
(2017). k-shape clustering algorithm for building energy usage patterns analysis and forecast-
ing model accuracy improvement. Energy and Buildings, 146, 27–37.
80. Quilumba, F. L., Lee, W.-J., Huang, H., Wang, D. Y., & Szabados, R. L. (2015). Using smart
meter data to improve the accuracy of intraday load forecasting considering customer behavior
similarities. IEEE Transactions on Smart Grid, 6(2), 911–918.
81. Wijaya, T. K., Vasirani, M., Humeau, S., & Aberer, K. (2015). Cluster-based aggregate fore-
casting for residential electricity demand using smart meter data. In IEEE International Con-
ference on Big Data, pp. 879–887.
82. Silva, P. G. D., Ilic, D., & Karnouskos, S. (2014). The impact of smart grid prosumer grouping
on forecasting accuracy and its benefits for local electricity market trading. IEEE Transactions
on Smart Grid, 5(1), 402–410.
83. Sevlian, R., & Rajagopal, R. (2018). A scaling law for short term load forecasting on varying
levels of aggregation. International Journal of Electrical Power & Energy Systems, 98, 350–
361.
84. Stephen, B., Tang, X., Harvey, P. R., Galloway, S., & Jennett, K. I. (2017). Incorporating
practice theory in sub-profile models for short term aggregated residential load forecasting.
IEEE Transactions on Smart Grid, 8(4), 1591–1598.
85. Wang, Y., Chen, Q., Sun, M., Kang, C., & Xia, Q. (2018). An ensemble forecasting method
for the aggregated load with subprofiles. IEEE Transactions on Smart Grid, 9(4), 3906–3908.
86. Moreno, J. J. M., Pol, A. P., Abad, A. S., & Blasco, B. C. (2013) Using the R-MAPE index
as a resistant measure of forecast accuracy. Psicothema, 25(4), 500–506.
87. Kim, S., & Kim, H. (2016). A new metric of absolute percentage error for intermittent demand
forecasts. International Journal of Forecasting, 32(3), 669–679.
88. Haben, S., Ward, J., Greetham, D. V., Singleton, C., & Grindrod, P. (2014). A new error
measure for forecasts of household-level, high resolution electrical energy consumption. In-
ternational Journal of Forecasting, 30(2), 246–256.
89. Hong, T., Wilson, J., & Xie, J. (2014). Long term probabilistic load forecasting and normal-
ization with hourly information. IEEE Transactions on Smart Grid, 5(1), 456–462.
90. PJM. (2015). PJM Load Forecast Report January 2015 Prepared by PJM Resource Adequacy
Planning Department. Technical report.
91. Hyndman, R. J., & Fan, S. (2010). Density forecasting for long-term peak electricity demand.
IEEE Transactions on Power Systems, 25(2), 1142–1153.
92. Xie, J., & Hong, T. (2016). Temperature scenario generation for probabilistic load forecasting.
IEEE Transactions on Smart Grid, 9(3), 1680–1687.
93. Gan, D., Wang, Y., Yang, S., & Kang, C. (2018). Embedding based quantile regression neural network for probabilistic load forecasting. Journal
of Modern Power Systems and Clean Energy, 6(2), 244–254.
94. Black, J., Hoffman, A., Hong, T., Roberts, J., & Wang, P. (2018). Weather data for energy
analytics: From modeling outages and reliability indices to simulating distributed photovoltaic
fleets. IEEE Power and Energy Magazine, 16(3), 43–53.
95. Xie, J., Hong, T., Laing, T., & Kang, C. (2015). On normality assumption in residual simulation
for probabilistic load forecasting. IEEE Transactions on Smart Grid, 8(3), 1046–1053.
96. Liu, B., Nowotarski, J., Hong, T., & Weron, R. (2017). Probabilistic load forecasting via quantile
regression averaging on sister forecasts. IEEE Transactions on Smart Grid, 8(2), 730–737.
97. Wang, Y., Zhang, N., Tan, Y., Hong, T., Kirschen, D. S., & Kang, C. (2019). Combining
probabilistic load forecasts. IEEE Transactions on Smart Grid, 10(4), 3664–3674.
98. Xie, J., & Hong, T. (2017). Variable selection methods for probabilistic load forecasting:
Empirical evidence from seven states of the united states. IEEE Transactions on Smart Grid,
9(6), 6039–6046.
99. Gaillard, P., Goude, Y., & Nedellec, R. (2016). Additive models and robust aggregation for
GEFCom2014 probabilistic electric load and electricity price forecasting. International Jour-
nal of Forecasting, 32(3), 1038–1050.
100. Taieb, S. B., Huser, R., Hyndman, R. J., & Genton, M. G. (2016). Forecasting uncertainty in
electricity smart meter data by boosting additive quantile regression. IEEE Transactions on
Smart Grid, 7(5), 2448–2455.
101. Arora, S., & Taylor, J. W. (2016). Forecasting electricity smart meter data using conditional
kernel density estimation. Omega, 59, 47–59.
102. Zhang, P., Xiaoyu, W., Wang, X., & Bi, S. (2015). Short-term load forecasting based on big
data technologies. CSEE Journal of Power and Energy Systems, 1(3), 59–67.
103. Humeau, S., Wijaya, T. K., Vasirani, M., & Aberer, K. (2013). Electricity load forecasting
for residential customers: Exploiting aggregation and correlation between households. In
Sustainable Internet and ICT for Sustainability (SustainIT), pp. 1–6.
104. Zhong, S., & Tam, K.-S. (2015). Hierarchical classification of load profiles based on their
characteristic attributes in frequency domain. IEEE Transactions on Power Systems, 30(5),
2434–2441.
105. Wang, Y., Chen, Q., Kang, C., Xia, Q., & Luo, M. (2016). Sparse and redundant representation-
based smart meter data compression and pattern extraction. IEEE Transactions on Power
Systems, 32(3), 2142–2151.
106. Tong, X., Li, R., Li, F., & Kang, C. (2016). Cross-domain feature selection and coding for
household energy behavior. Energy, 107, 9–16.
107. Vercamer, D., Steurtewagen, B., Van den Poel, D., & Vermeulen, F. (2015). Predicting con-
sumer load profiles using commercial and open data. IEEE Transactions on Power Systems,
31(5), 3693–3701.
108. Kavousian, A., Rajagopal, R., & Fischer, M. (2013). Determinants of residential electricity
consumption: Using smart meter data to examine the effect of climate, building characteristics,
appliance stock, and occupants’ behavior. Energy, 55, 184–194.
109. McLoughlin, F., Duffy, A., & Conlon, M. (2012). Characterising domestic electricity con-
sumption patterns by dwelling and occupant socio-economic variables: An irish case study.
Energy and Buildings, 48, 240–248.
110. Han, Y., Sha, X., Grover-Silva, E., & Michiardi, P. (2014). On the impact of socio-economic
factors on power load forecasting. In IEEE International Conference on Big Data, pp. 742–
747.
111. Granell, R., Axon, C. J., & Wallom, D. C. (2015). Clustering disaggregated load profiles using
a dirichlet process mixture model. Energy Conversion and Management, 92, 507–516.
112. McLoughlin, F., Duffy, A., & Conlon, M. (2015). A clustering approach to domestic electricity
load profile characterisation using smart metering data. Applied energy, 141, 190–199.
113. Beckel, C., Sadamori, L., Staake, T., & Santini, S. (2014). Revealing household characteristics
from smart meter data. Energy, 78, 397–410.
114. Hopf, K., Sodenkamp, M., Kozlovkiy, I., & Staake, T. (2016). Feature extraction and filtering
for household classification based on smart electricity meter data. Computer Science-Research
and Development, 31(3), 141–148.
115. Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2016). Supervised classification with inter-
dependent variables to support targeted energy efficiency measures in the residential sector.
Decision Analytics, 3(1), 1.
116. Wang, Y., Chen, Q., Gan, D., Yang, J., Kirschen, D. S., & Kang, C. (2018). Deep learning-based
socio-demographic information identification from smart meter data. IEEE Transactions on
Smart Grid, 10(3), 2593–2602.
117. Albert, A., & Rajagopal, R. (2013). Smart meter driven segmentation: What your consumption
says about you. IEEE Transactions on Power Systems, 28(4), 4019–4030.
118. Kwac, J., Flora, J., & Rajagopal, R. (2014). Household energy consumption segmentation
using hourly data. IEEE Transactions on Smart Grid, 5(1), 420–430.
119. Bai, Y., Zhong, H., & Xia, Q. (2016). Real-time demand response potential evaluation: A
smart meter driven method. In IEEE Power and Energy Society General Meeting, pp. 1–5.
120. Jindal, A., Kumar, N., & Singh, M. (2016). A data analytical approach using support vector
machine for demand response management in smart grid. In IEEE Power and Energy Society
General Meeting, pp. 1–5.
121. Dyson, M. E., Borgeson, S. D., Tabone, M. D., & Callaway., D. S. (2014). Using smart meter
data to estimate demand response potential, with application to solar energy integration.
Energy Policy, 73, 607–619.
122. Albert, A., & Rajagopal, R. (2015). Thermal profiling of residential energy use. IEEE Trans-
actions on Power Systems, 30(2), 602–611.
123. Albert, A., & Rajagopal, R. (2016). Finding the right consumers for thermal demand-response:
An experimental evaluation. IEEE Transactions on Smart Grid, 9(2), 564–572.
124. Mahmoudi-Kohan, N, Moghaddam, M. P., Sheikh-El-Eslami, M. K., & Shayesteh, E. (2010).
A three-stage strategy for optimal price offering by a retailer based on clustering techniques.
International Journal of Electrical Power & Energy Systems, 32(10), 1135–1142.
125. Joseph, S., & Erakkath Abdu, J. (2018). Real-time retail price determination in smart grid
from real-time load profiles. International Transactions on Electrical Energy Systems.
126. Mahmoudi-Kohan, N., Moghaddam, M. P., & Sheikh-El-Eslami, M. K. (2010). An annual
framework for clustering-based pricing for an electricity retailer. Electric Power Systems
Research, 80(9), 1042–1048.
127. Maigha & Crow, M. L. (2014). Clustering-based methodology for optimal residential time of
use design structure. In North American Power Symposium (NAPS), pp. 1–6.
128. Li, R., Wang, Z., Chenghong, G., Li, F., & Hao, W. (2016). A novel time-of-use tariff design
based on gaussian mixture model. Applied Energy, 162, 1530–1536.
129. Wijaya, T. K., Vasirani, M., & Aberer, K. (2014). When bias matters: An economic assessment
of demand response baselines for residential customers. IEEE Transactions on Smart Grid,
5(4), 1755–1763.
130. Weng, Y., & Rajagopal, R. (2015). Probabilistic baseline estimation via gaussian process. In
IEEE Power & Energy Society General Meeting, pp. 1–5.
131. Zhang, Y., Chen, W., Rui, X., & Black, J. (2016). A cluster-based method for calculating
baselines for residential loads. IEEE Transactions on Smart Grid, 7(5), 2368–2377.
132. Hatton, L., Charpentier, P., & Matzner-Løber, E. (2016). Statistical estimation of the residential
baseline. IEEE Transactions on Power Systems, 31(3), 1752–1759.
133. Irish Social Science Data Archive. (2012). Commission for energy regulation (cer) smart
metering project. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.
134. Luan, W., Peng, J., Maras, M., Lo, J., & Harapnuk, B. (2015). Smart meter data analytics
for distribution network connectivity verification. IEEE Transactions on Smart Grid, 6(4),
1964–1971.
135. Peppanen, J., Grijalva, S., Reno, M. J., & Broderick, R. J. (2016). Distribution system low-
voltage circuit topology estimation using smart metering data. In IEEE/PES Transmission
and Distribution Conference and Exposition, pp. 1–5.
136. Weng, Y., Liao, Y., & Rajagopal, R. (2016). Distributed energy resources topology identifi-
cation via graphical modeling. IEEE Transactions on Power Systems, 32(4), 2682–2694.
137. Liao, Y., Weng, Y., & Rajagopal, R. (2016). Urban distribution grid topology reconstruction
via lasso. In IEEE Power and Energy Society General Meeting (PESGM), pp. 1–5.
138. Pappu, S. J., Bhatt, N., Pasumarthy, R., & Rajeswaran, A. (2017). Identifying topology of low
voltage distribution networks based on smart meter data. IEEE Transactions on Smart Grid,
9(5), 5113–5122.
139. Minghao, X., Li, R., & Li, F. (2016). Phase identification with incomplete data. IEEE Trans-
actions on Smart Grid, 9(4), 2777–2785.
140. Gungor, V. C., Sahin, D., Kocak, T., Ergut, S., Buccella, C., Cecati, C., & Hancke, G. P.
(2013) A survey on smart grid potential applications and communication requirements. IEEE
Transactions on Industrial Informatics, 9(1), 28–42.
141. Tram, H. (2008). Technical and operation considerations in using smart metering for outage
management. In IEEE/PES Transmission and Distribution Conference and Exposition, pp.
1–3.
142. He, Y., Jenkins, N., & Jianzhong, W. (2016). Smart metering for outage management of
electric power distribution networks. Energy Procedia, 103, 159–164.
143. Kuroda, K., Yokoyama, R., Kobayashi, D., & Ichimura, T. (2014). An approach to outage
location prediction utilizing smart metering data. In 8th Asia Modelling Symposium (AMS),
pp. 61–66.
144. Jiang, Y., Liu, C.-C., Diedesch, M., Lee, E., & Srivastava, A. K. (2016). Outage management
of distribution systems incorporating information from smart meters. IEEE Transactions on
Power Systems, 31(5), 4144–4154.
145. Moghaddass, R., & Wang, J. (2017). A hierarchical framework for smart grid anomaly detec-
tion using large-scale smart meter data. IEEE Transactions on Smart Grid, 9(6), 5820–5830.
146. Zheng, J., Gao, D. W., & Lin, L. (2013). Smart meters in smart grid: An overview. In IEEE
Green Technologies Conference, pp. 57–64.
147. Andrysiak, T., Saganowski, Ł., & Kiedrowski, P. (2017). Anomaly detection in smart metering
infrastructure with the use of time series analysis. Journal of Sensors, 2017
148. Tcheou, M. P., Lovisolo, L., Ribeiro, M. V., da Silva, E. A., Rodrigues, M. A., Romano, J. M.,
& Diniz, P. S. (2014). The compression of electric signal waveforms for smart grids: State of
the art and future trends. IEEE Transactions on Smart Grid, 5(1), 291–302.
149. Unterweger, A., & Engel, D. (2015). Resumable load data compression in smart grids. IEEE
Transactions on Smart Grid, 6(2), 919–929.
150. Unterweger, A., Engel, D., & Ringwelski, M. (2015). The effect of data granularity on load
data compression. In DA-CH Conference on Energy Informatics, pp. 69–80.
151. Notaristefano, A., Chicco, G., & Piglione, F. (2013). Data size reduction with symbolic ag-
gregate approximation for electrical load pattern grouping. IET Generation, Transmission &
Distribution, 7(2), 108–117.
152. Tong, X., Kang, C., & Xia, Q. (2016). Smart metering load data compression based on load
feature identification. IEEE Transactions on Smart Grid, 7(5), 2414–2422.
153. Rottondi, C., Verticale, G., & Krauss, C. (2013). Distributed privacy-preserving aggregation
of metering data in smart grids. IEEE Journal on Selected Areas in Communications, 31(7),
1342–1354.
154. Sankar, L., Rajagopalan, S. R., & Mohajer, S. (2013). Smart meter privacy: A theoretical
framework. IEEE Transactions on Smart Grid, 4(2), 837–846.
155. Savi, M., Rottondi, C., & Verticale, G. (2015). Evaluation of the precision-privacy tradeoff
of data perturbation for smart metering. IEEE Transactions on Smart Grid, 6(5), 2409–2416.
156. Eibl, G., & Engel, D. (2015). Influence of data granularity on smart meter privacy. IEEE
Transactions on Smart Grid, 6(2), 930–939.
157. Kement, C. E., Gultekin, H., Tavli, B., Girici, T., & Uludag, S. (2017). Comparative analysis
of load-shaping-based privacy preservation strategies in a smart grid. IEEE Transactions on
Industrial Informatics, 13(6), 3226–3235.
Chapter 2
Electricity Consumer Behavior Model
2.1 Introduction
With the increasing integration of renewable energy and the advancement of the elec-
tricity market, the broad interaction between consumers and systems is an important
part of the future smart grid. As required by the increasing integration of renewable
energy, the power system should provide more flexibility to stabilize the resulting fluctuations. However, consumers in the traditional power system often “consume electricity passively” and rarely participate actively in interactions with the power system, so the flexibility of the power system has yet to be fully explored. In addition,
the opening of the electricity retail market objectively requires electricity retailers to
provide consumer-centric services to improve their competitiveness.
Fortunately, smart grid provides the all-around physical, information and market
supports for the broad interaction between the consumers and systems: (1) Physical
aspect: with the integration of distributed energy resources (DERs) such as renewable
energy and storage, traditional electricity consumers turn into “prosumers” and can reasonably control their electric equipment and energy storage to realize optimal utility. These DERs and control devices lay the physical basis for the interaction between consumers and systems. (2) Information aspect: the advanced metering infrastructure (AMI), which consists of smart meters, communication networks, and data management systems, plays a vital role in collecting smart meter data and realizing the bidirectional flow of energy and information [1]. It
provides an information communication basis for the interaction between consumers
and systems. (3) Market aspect: the open electricity retail market will cultivate var-
ious business models. Consumer services will be provided from the aspects of electricity price design, consumer agency, and demand response [2]. It provides a
market basis for the interaction between consumers and systems.
The power system is increasingly becoming a complex system with high “power-
cyber-social” coupling [3]. Since modeling the power system from a purely physical perspective is not enough to fully depict the whole picture, full consideration should be given to the impacts of environmental, economic, and social factors and of human behaviors on the entire power system. The study of power systems with the “cyber-physical” coupling characteristic has attracted broad attention [4]. It focuses on the impact of cybersecurity and big data technology on the power system and provides a cyber perspective on the power system. However, there are currently very few studies on modeling the social aspect of the power system with its deep “cyber-physical-social” coupling. Notably, the modeling of “consumers” in the power system remains insufficient.
As an essential part of the whole power system, the electrical power load has been widely studied, for example through composite load modeling and load forecasting, which provide the basis for the planning, operation, and stability analysis of the whole power system. The study of load also focuses on its electrical or power characteristics, either by conducting composite load modeling (such as building a ZIP model) for network computation of the power system, or by conducting sensitivity analysis and forecasting of several relevant load factors for the planning and operation of the power system. We know that the load is generated by the use of electrical
appliances by electricity consumers. The traditional studies on power systems mainly
focus on the load, rather than electricity consumers. These works fail to give full
consideration to the impact of electricity consumer behavior on the power system.
That is to say, the modeling of the demand side (such as composite load modeling from the physical perspective) only considers the electrical characteristics of the load rather than analyzing the massive number of consumers. In other words, there have been few analyses of electricity consumer behavior.
With the further development of the smart grid, there are extensive studies on
demand response, energy efficiency management, smart meter big data analytics,
etc. Some studies build optimization models from the physical perspective [5, 6], while others focus on data-driven analysis and electricity price design based on the electrical power consumption patterns of consumers obtained by clustering, etc. [7]. There are also analytical studies on the power consumption behavior of consumers.
Researchers all around the world have conducted a significant number of studies
on smart meter data analytics. These works have broad applications such as demand response, electricity price design, system operation, etc. However, current studies often focus on one specific application, which is similar to “process-oriented” programming: they lack recognition of the systematization of electricity consumer behavior and have no “object-oriented” overall design. That is to say, the current studies have neither accurately analyzed the exact meaning of electricity consumer behavior nor built the “consumer behavior” model “systematically”, and the recognition of consumer behavior has not been raised to the “system” or “model” level, as it has in the “cyber-physical system”.
The study and application of the behavioral and social sciences in various industries are attracting increasing attention. Nature Research has set up an online forum for researchers to discuss and share studies of the behavioral and social sciences and their applications in various industries, the energy industry being one of them [8]. Therefore, modeling and analysis of the demand side can also be conducted from sociological and behavioral perspectives. The consumer in the power system is a complex subsystem that lacks analytical models; thus, a model-driven approach may not be suitable for electricity consumer behavior modeling. Nevertheless, the big data in power systems provide a new, data-driven solution for consumer behavior analysis.
This chapter decomposes the basic components of electricity consumer behavior from the sociological perspective and proposes the concept of the electricity consumer behavior model (ECBM) by analyzing the internal logical relationships among these basic components. The ECBM is then transformed into a series of consumer characteristic attribute identification problems and relationship analysis problems. The data-driven research framework of the ECBM is established by conducting prospective and fundamental research on consumer portraits, load structure, load profiles, load forecasting, and consumer aggregation. Chapters 3–12 of this book can be viewed as approaches to electricity consumer behavior modeling.
2.2 Basic Concept of ECBM

The concept of the consumer behavior model has been widely used in fields such as supply chain management [9], software or web design [10], consumer portraits [11], and intelligent recommendation [12] to realize personalized consumer services. The electricity consumer is one specific kind of consumer in the power system, and the ECBM can be viewed as an intersection between the consumer behavior model and the power system.
2.2.1 Definition
The word "behavior" has various meanings and may be interpreted differently in different research fields. The electricity consumer behavior described in this chapter is interpreted from the sociological and psychological perspectives: it refers to the electricity consumer's power consumption activities and attitudes under the impact of external environments. The power consumption activities are dominant behaviors that can be measured or perceived by sensors such as smart meters, whereas the power consumption attitudes are recessive behaviors that cannot be observed directly. Electricity consumer behavior can be decomposed into five basic components: behavior subject, behavior environment, behavior means, behavior result, and behavior utility. The behavior subject uses the behavior means (such as electrical appliances or equipment) according to its own attributes and the behavior environment (external factors) at that time to generate the behavior result (forming the electricity consumption) and to realize the highest behavior utility (such as making a profit).
The five components follow a progression from intrinsic to presentative behavior and from recessive to dominant behavior. It should be pointed out that electricity consumer behavior is a different concept from the consumer's power consumption behavior: the power consumption behavior only describes the power characteristics of the consumer's electricity use and is a dominant behavior, and it is therefore one important part of the broader electricity consumer behavior.
A single electricity consumer behavior can be extended in space, yielding the aggregated behavior, which refers to collecting multiple similar consumers according to a consumer characteristic to form several consumer groups with similar characteristics; and it can be extended in time, yielding the foreseeable behavior, which refers to the changing trend of the consumer behavior over a future period of time. Power consumption behavior forecasting (load forecasting) is the most common such extension.
On this basis, the ECBM can be defined as an abstract and standard expression of electricity consumer behavior that reveals and describes the intrinsic characteristics of the behavior subject, behavior environment, behavior means, behavior result, behavior utility, foreseeable behavior, and aggregated behavior, and their relationships, based on diversified information and using optimization modeling, data analytics, and other approaches. Consumer smart meter data analytics for a specific application is similar to "process-oriented" programming, which provides a specific solution for a specific application. The ECBM, by contrast, is similar to an "object-oriented" overall design: it treats the consumer behavior as the object, involves the five basic components and two derivative components, and describes the relationships among them.
2.2.2 Connotation
According to its definition, the connotation of the ECBM covers the following aspects:
ECBM is based on diversified data: the popularization of smart meters provides the basis for wider and more fine-grained data collection on the demand side, including the consumers' smart meter data, electric vehicle charging and discharging data, meteorological data, electricity price data, etc. The electricity consumer has the ability of cognition and thinking and can be regarded as one of the most complex systems in the world. For the modeling of physical components in the power system, an a priori physical model is available and its parameters can then be estimated. Human behavior modeling is different: it is usually based on a large amount of observed experience. Thus, the modeling of human behavior should be conducted based on diversified data rather than a few simple physical parameters.
ECBM takes optimization modeling and data analytics as its main approaches: for example, the consumer's power consumption optimization model under a certain external environment can be built based on an assumed utility function, and the consumer's power consumption behavior can then be analyzed. For another example, there is no existing model describing how the consumer's social and economic attributes affect the consumer's load profile, or how the load profile reflects those attributes; this can be deemed a high-dimensional and nonlinear mapping relationship. In this situation, advanced data analytics approaches such as deep learning can be applied to describe the relationship between the consumer's social and economic attributes and their load profiles.
ECBM describes the intrinsic characteristics of the behavior components and their relationships: a model generally includes an objective, variables, and relationships. As consumer behavior has five basic components plus temporal and spatial extensions, the ECBM should be a collection of submodels, each of which describes a relationship among consumer behavior components and has its own objective, variables, and relationships. For example, taking the consumer's load profile as the variable and the identification of the consumer's social and economic information as the target, a consumer portrait identification submodel can be used to build a high-dimensional and nonlinear relationship between the two. For another example, taking the external environment and the load profile as variables and the separation of distributed PV and energy storage as the target, a load disaggregation submodel for the consumer's distributed PV and energy storage can be used to build the relationship between the final net load profile and the external environmental factors.
2.2.3 Denotation
The ECBM has different forms of denotation according to the consumer's basic type and the submodels.
The basic types of consumer include the residential consumer, commercial consumer, industrial consumer, building consumer, etc. Sometimes the load aggregator can also be regarded as a type of consumer, as it interacts with the power system on behalf of a group of consumers. Different types of consumers correspond to different types of behavior subject, so the attributes used to describe their basic characteristics also differ. For example, the portrait of a residential customer can be described through attributes such as age, retirement status, type of work, and social class, whereas a building customer's "portrait" is described by the number of floors, the age of the building, the installation of an energy management system, and other attributes.
Regarding the submodels, the ECBM has complicated compositions and internal interactions. It is therefore difficult to build a single complete relationship describing the five basic components and the spatial and temporal extensions; a series of submodels is needed, each describing the mapping relationship between two or more components. For example, the mapping between the behavior subject and the behavior result can be complicated; the relationship between the behavior means and the behavior result is a simple additive one; and the relationship between the behavior environment and the behavior means can also be complicated, although for distributed PV it can be described with a PV panel energy conversion model. Consumer behavior has numerous submodels, which are detailed in the research framework.
(1) From consumer behavior model to ECBM
Serving massive numbers of electricity consumers individually faces the "information overload" problem. Thus, it is better to build an ECBM for each consumer, including the consumer portrait, load structure, load pattern, load trend, and even power consumption attitude, then to narrow the consumer's range of service selection in terms of electricity price packages, demand response, and goods recommendation in the retail electricity market, and to conduct personalized recommendation or actively provide the corresponding services. In addition, the power system has massive numbers of consumers; only by building the ECBM can the behavior of electricity consumers be abstracted to a certain extent, thereby improving service efficiency. Therefore, the ECBM is an application and expansion of the consumer behavior model in the power system, as shown in Fig. 2.2.
(2) From composite load model to ECBM
The electricity consumer plays a crucial role in the smart grid and the Energy Internet. It is not sufficient to model the whole power system comprehensively by focusing only on its physical characteristics; the modeling of electricity consumer behavior should be fully considered in order to mine its interaction characteristics. Although studies on demand response involve consumer behavior and interaction, they mainly focus on the scheduling of the consumer's electric appliances and other more microscopic physical models. We need to model consumer behavior more comprehensively, and in particular from the sociological and psychological perspectives, so as to truly realize the value creation of a power system with the electricity consumer at its core.
For the whole power system, the synchronous generator set, the power network, the load, and power electronic equipment are the most important basic components. The models of the generator, the excitation system, the prime mover and its speed-governing system, and the composite load are very complicated, and their parameters form the "four parameters" that the traditional power system mainly needs to identify. Identification of these "four parameters" supports the safe, stable, and economic operation of the traditional power system. Figure 2.3 shows the basic components of parameter identification in the power system. On the basis of the traditional "identification of four parameters", the ECBM is added; it focuses on the power consumption behavior of consumers on the demand side and tries to find the underlying basic laws of consumers throughout the power consumption process. Extending from composite load modeling to consumer behavior modeling on the demand side is a transformation in perspective and thinking and a brand-new component of the power system model. Composite load modeling and consumer behavior modeling constitute the two sides of demand-side modeling.
Fig. 2.3 From consumer behavior model to electricity consumer behavior model

2.3 Basic Characteristics of Electricity Consumer Behavior
Electricity consumer behavior has the following basic characteristics: near-optimality of utility, initiative, diversity, foreseeability, uncertainty, high-dimensional complexity, cluster characteristics, and weak observability, all of which form the basis of the ECBM. They are elaborated in turn below:
(1) Near-optimality of utility
The consumer, as a person with the ability of cognition and thinking, exhibits power consumption behavior under the impact of the external environment and meets daily or specific demands by using or controlling certain electrical equipment, thereby maximizing utility. In demand response and home energy management systems, the internal settings achieve the lowest power cost through the reasonable scheduling of electrical equipment while maintaining the consumer's comfort. Consumers cannot model their power consumption behavior precisely and obtain the optimum as a software program would, but they do tend to increase their utility and reduce their power cost.
(2) Initiative
Consumers do not merely consume the power supplied by the power system passively; they have a certain subjective initiative and actively change their power consumption behavior in response to changes in the external environment in order to realize the near-optimality of utility. Current programs such as demand response and energy efficiency management require fully mobilizing the consumers' subjective initiative and transforming the traditional "passive load" into an "active load".
(3) Diversity
Different consumers have different utility functions and own different electrical appliances. In addition, the external environments experienced by consumers in different areas differ. Thus, the behavior results of different behavior subjects in different time periods and environments are diverse, covering both the diversity across consumers and the diversity of the same consumer across periods.
(4) Foreseeability
Due to the near-optimality of utility, consumer behavior follows certain inherent laws. Once these laws are detected, various behaviors of the consumer can be forecasted to a certain extent. For example, the load profile of a consumer over a future period can be forecasted from the consumer's historical load profiles, and the basic patterns of future consumption can be inferred from their social and economic information. The foreseeability of consumer behavior comes from the stability of an individual consumer's behavior and the similar laws governing different consumers' behaviors.
(5) Uncertainty
Consumer behavior is not only foreseeable but also uncertain. In essence, the consumer's power consumption behavior is the result of superimposing a series of random events on long-term working and living habits; therefore, there is inevitable uncertainty in the ECBM. The uncertainty may come either from random behavior caused by purely random events or from model deviation caused by regular behavior that has not been identified. In the short term, the ECBM may differ across periods within a day and between working days and weekends. In the long term, the ECBM will change with changes in lifestyle, the upgrading of consumption levels, and the increasing intelligence of electrical appliances. Therefore, the ECBM cannot be built without depicting its uncertainty.
(6) High-dimensional complexity
The ECBM involves a series of basic consumer attributes. As natural human attributes and social attributes are highly complex, human behavior has multiple complex facets, and a few simple attributes cannot depict the ECBM in all dimensions; the ECBM therefore inevitably has high-dimensional complexity. "There are no two identical leaves in the world, let alone two identical people": each consumer is an instance in the high-dimensional ECBM space. Moreover, the consumer's power consumption behavior is closely related to their production and life, and human behavior is highly subjective. Therefore, compared with objective physical laws, the consumer behavior model usually has no closed-form analytic mathematical expression but rather complicated non-analytic, nonlinear associations.
(7) Cluster characteristics
Human production activities are social in nature, so electricity consumer behavior shows certain cluster characteristics. That is, the ECBMs of different individual consumers naturally form a series of groups in the attribute space or its subspaces: behavior tends to be similar within each group and differs significantly across groups. The consumers' cluster characteristics provide the clue for clustering analysis and aggregation modeling of the consumer model.
(8) Weak observability
Electricity consumer behavior is complex and changeable. The information interaction between the power system and the electricity consumer is usually conducted through the smart meter, which enables direct observation of the load profile and other dominant behaviors. The internal power consumption behaviors, including the power consumption of individual electrical appliances, the output of distributed PV, the response behavior of distributed energy storage, consumer attitudes, and other recessive behaviors, cannot be observed directly. Accordingly, the power system needs to integrate more diversified and fine-grained data to meet the challenge brought by this weak observability.
2.4 Mathematical Expression of ECBM

The ECBM is a model describing the intrinsic characteristics of, and the relationships among, the main components of electricity consumer behavior and its extensions. The main components should be mathematically defined so that the ECBM can be described in a standard manner. The relevant mathematical notations are summarized in Table 2.1.
The behavior subject, i.e. the consumer, can be described with a series of (say $J$) attributes, forming a relatively complete consumer portrait. Accordingly, the consumer attribute space $C$ is defined. The attribute set in this space is $C = [c_1, c_2, \ldots, c_j, \ldots, c_J]$, where each element $c_j$ represents a consumer attribute, such as consumer type, age, social class, children, interests and preferences, and other information. Consumer attributes can take various forms of expression, including continuous variables, discrete variables, fuzzy variables, characteristic matrices, and probabilistic expressions such as quantiles, intervals, or probability distributions. For example, the social and economic information of the consumer, such as age and retirement status, can be expressed with continuous or discrete variables; the consumer's acceptance of installed smart home equipment can be expressed with a fuzzy number; and the uncertainty of future power consumption should be expressed in probabilistic form. As the consumer portrait is time-varying on a long time scale, including changes in age and occupation, we use $C_i^t = [c_{i,1}^t, c_{i,2}^t, \ldots, c_{i,j}^t, \ldots, c_{i,J}^t]$ to denote the complete portrait of the $i$th consumer at time $t$.
The behavior environment comprises the external factors stimulating or affecting the electricity consumer behavior, which are also diversified. Similarly, the behavior environment factor space $E$ is defined. The environmental factor set in this space is $E = [e_1, e_2, \ldots, e_k, \ldots, e_K]$, where each element $e_k$ represents an environmental factor, such as the power network topology, external temperature, illumination intensity, and electricity price. We use $E_i^t = [e_{i,1}^t, e_{i,2}^t, \ldots, e_{i,k}^t, \ldots, e_{i,K}^t]$ to denote the environment of the $i$th consumer at time $t$.
The behavior means comprises the electrical equipment through which the consumer uses electricity to improve their own utility, including household appliances, distributed energy storage, and distributed PV. The set of the consumer's electrical equipment is defined as $A$, and the operating state of the $a$th piece of equipment is directly determined by the power $P_{i,a}^t$ it consumes or generates. The behavior result is the final power exchanged with the power grid, defined as $P_i^t$.
The behavior utility $O_i$ varies with the consumer attributes, the external environment, and the states of the electrical equipment. Therefore, the behavior utility is a function of $C_i^t$, $E_i^t$, and $P_{i,a}^t$, which is defined as $g_i^t$.
So far, the five components of the $i$th consumer's behavior are expressed respectively as: behavior subject $C_i^t$, behavior environment $E_i^t$, behavior means $P_{i,a}^t$, behavior result $P_i^t$, and behavior utility $g_i^t$. It is worth noting that all basic components of consumer behavior are time-varying. The behavior subject attributes and the behavior utility function usually change slowly and can be treated as approximately constant over a period of time, whereas the behavior environment changes quickly, causing the behavior means and behavior result to change.
The electricity consumer exhibits near-optimality of utility, so the behavior subject $C_i^t$ realizes the maximum behavior utility $g_i^t$ by adopting the behavior means $P_{i,a}^t$ under the behavior environment $E_i^t$. The behavior subject $C_i^t$, behavior environment $E_i^t$, and behavior means $P_{i,a}^t$ are coupled through the utility function $g_i^t$:

$$\arg\max_{P_{i,a}^t} O_i = \arg\max_{P_{i,a}^t} g_i^t\left(P_{i,a}^t \mid C_i^t, E_i^t\right) \tag{2.1}$$
As the consumer does not pursue utility optimization in a completely rational manner but only achieves "near-optimality of utility" to a certain extent, the behavior means $P_{i,a}^t$ may be affected by the consumer's habits and various other factors, and thus shows uncertainty. That is to say, whether and how the consumer uses a piece of equipment can be regarded as a random variable with a certain expectation, which is also the direct cause of the high uncertainty of power consumption.
Without considering transmission network losses, there is a simple linear additive relationship between the behavior means $P_{i,a}^t$ and the behavior result $P_i^t$; that is, the final behavior result equals the sum of the consumption of all kinds of electrical equipment:

$$P_i^t = \sum_{a \in A} P_{i,a}^t \tag{2.2}$$
Beyond the five basic components, the aggregated behavior, the extension of consumer behavior in space, essentially refers to dividing the consumer group according to a characteristic of the consumers, i.e. dividing the consumer set $I$ into $N$ subgroups, where each consumer belongs to exactly one of the $N$ subgroups:

$$\max_{S_1, S_2, \ldots, S_N} \sum_{n=1}^{N} \sum_{i \in S_n} \mathrm{Prob}\left(F_i^t \mid i \in S_n\right)$$
$$\text{s.t.} \quad S_1 \cup S_2 \cup \cdots \cup S_N = I, \qquad S_1 \cap S_2 \cap \cdots \cap S_N = \emptyset \tag{2.3}$$

where $F_i^t$ denotes the characteristic used for dividing the consumer groups, such as consumer age, the composition of the consumer's electrical equipment, or the shape of the load profile. The objective function maximizes the probability of the observed characteristic given the group to which each consumer is assigned; the two constraints indicate that each consumer can only be assigned to one group. The consumer groups can be obtained using a clustering algorithm, as sketched below.
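As a minimal illustration of the grouping in Eq. (2.3), the following sketch assigns consumers to groups with k-means clustering; the characteristic used (peak-normalized average daily profiles) and the number of groups are illustrative assumptions rather than choices prescribed by this chapter.

```python
# Minimal sketch: dividing consumers into N groups by a load-shape
# characteristic F_i (here, the normalized average daily profile).
import numpy as np
from sklearn.cluster import KMeans

def group_consumers(daily_profiles, n_groups=5):
    """daily_profiles: array of shape (n_consumers, 48) with average daily loads."""
    # Normalize each profile so that grouping reflects shape rather than magnitude
    peaks = daily_profiles.max(axis=1, keepdims=True)
    features = daily_profiles / np.where(peaks > 0, peaks, 1.0)
    # Each consumer is assigned to exactly one group, satisfying the
    # union/disjointness constraints of Eq. (2.3)
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(features)
    return [np.where(labels == n)[0] for n in range(n_groups)]
```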
The foreseeable behavior, the extension of consumer behavior in time, generally concerns the behavior means and the behavior result, i.e. the future trend of the power $P_{i,a}^{t+h}$ of specific electrical equipment or of the total exchanged power $P_i^{t+h}$ over a future period. Essentially, foreseeing future consumer behavior means uncovering the relationship $f_{i,a}$ or $f_i$ hidden in the historical data and forecasting future power consumption behavior from historical behavior:

$$\hat{P}_{i,a}^{t+h} = f_{i,a}\left(C_i^t, E_i^t, \hat{E}_i^{t+h}, P_{i,a}^t, t\right), \qquad \hat{P}_i^{t+h} = f_i\left(C_i^t, E_i^t, \hat{E}_i^{t+h}, P_i^t, t\right) \tag{2.4}$$

where the superscript $t$ refers to the historical and current values of the variables; $\hat{E}_i^{t+h}$ denotes the forecast of the future behavior environment; and $\hat{P}_{i,a}^{t+h}$ and $\hat{P}_i^{t+h}$ denote the forecast values of the future behavior means and behavior result, respectively.
2.5 Research Paradigm of ECBM

The ECBM is composed of a series of submodels that describe the intrinsic characteristics of the consumer behavior components or the relationships among them. Each submodel can be abstracted to the form $Y = h(X)$, which tries to identify one behavior attribute $Y$ of the consumer given other consumer behavior information $X$, where $h(\cdot)$ is the function to be trained. That is to say, the ECBM is established by identifying the consumer behavior attributes $Y$. This section introduces the research framework of the ECBM, including the basic research paradigm and the research contents.
Figure 2.4 gives the basic research paradigm of the ECBM, which mainly includes three modules: data collection, the consumer behavior model, and consumer interaction. Among the three modules, data collection is the basis, the consumer behavior model is the core, and the interaction between consumers and systems is the purpose. The three modules follow one another to form a closed loop, thereby realizing the continuous updating and optimization of the ECBM.
Specifically, in the data collection module, various data related to the consumer's characteristics are widely collected. There are two ways to collect the data: (1) active collection, such as smart meter data, meteorological data, and electricity price data; and (2) consumer feedback, including direct feedback data (for example, whether the consumer is interested in a demand response program) and indirect feedback data (for example, the consumer's power consumption at different electricity prices).
The consumer model module mainly includes three steps: consumer attribute definition, consumer attribute identification, and ECBM updating:
(1) Firstly, different consumer attributes need to be defined from different aspects according to the diversified requirements of the power system for the consumer, such as the implementation of demand response, electricity price design, and the recommendation of personalized electrical appliances and other commodities. Generally, the attributes can be sorted into endogenous attributes, behavior attributes, and preference attributes; the details are discussed in the next sections.
(2) Secondly, the attributes need to be identified. This step is the key to the whole ECBM; it requires determining the expression form of each attribute and the identification method for it. For example, the uncertainty of electricity consumption can be expressed as a series of quantiles and identified with probabilistic quantile regression.
(3) Finally, the ECBM needs to be updated, i.e. the set formed by all attribute values is updated. The update can either directly substitute the original result with the latest result, or blend the latest attribute value with the historical value using weight decay, as sketched below.
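A minimal sketch of the weight-decay updating strategy in step (3); the decay factor and the numeric attribute used in the example are illustrative assumptions.

```python
# Minimal sketch: updating a numeric consumer attribute by blending the
# latest estimate with the historical value using weight decay.
def update_attribute(old_value, latest_value, lam=0.3):
    """Return the updated attribute value; lam weights the newest estimate."""
    return lam * latest_value + (1.0 - lam) * old_value

# Example: the historical estimate of a (hypothetical) "price sensitivity"
# attribute is 0.40 and the newest estimate is 0.55
updated = update_attribute(0.40, 0.55)   # 0.3 * 0.55 + 0.7 * 0.40 = 0.445
```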
In the research paradigm of the ECBM, the consumer behavior model is the core, and it is established mainly on the basis of the consumer attribute definition. The electricity consumer attributes should have the following four characteristics:
(1) The attribute should be defined for real applications: the consumer attribute is the standard expression for describing electricity consumer characteristics. The consumer is complicated and could be comprehensively depicted with a massive number of attributes. However, the purpose of establishing the consumer behavior model is to realize personalized services for the consumer and the optimization of, and interaction between, the consumer and the power grid. Therefore, the consumer attributes should be screened, and the important attributes with great application potential in the power system should be retained. For example, the socioeconomic information of the consumer can be detected and applied to voice service and electrical appliance promotion; for another example, the identification of power consumption patterns provides the basis for time-of-use electricity pricing.
(2) The attribute may drift: consumer attributes are not fixed but may change over time. To cope with attribute drift, the consumer attributes should be revised in real time or regularly, for example with the weight-decay-based modification mentioned above. To establish and update the evolving consumer behavior model in a timely manner, various consumer behavior data, including smart meter data, meteorological data, electricity price data, and questionnaire data, should be reacquired on a rolling or periodic basis. On this basis, the core relationships and parameters of the consumer behavior model are updated or modified.
(3) The attribute should be consistent: internal consistency should be ensured within the consumer attribute set. Different attributes depict different aspects of the consumer; the obtained attribute values should not contradict each other but should corroborate each other and depict the consumer and their power consumption characteristics as fully as possible.
(4) The attribute can be evaluated: different attribute values have different forms of expression, but each form should be evaluable in order to guide data acquisition and attribute identification. For example, a probabilistic model can be evaluated through the quantile loss (see the sketch after this list), and a classified discrete value can be evaluated through accuracy or classification entropy. All attributes should be expressed with specific values and have corresponding evaluation indexes, including qualitative and quantitative evaluations.
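As a minimal sketch of the quantile (pinball) loss mentioned in point (4), assuming the uncertain attribute is expressed as forecast quantiles; the function and variable names are illustrative.

```python
# Minimal sketch: pinball (quantile) loss for evaluating a probabilistic
# attribute expressed as quantile forecasts.
import numpy as np

def pinball_loss(y_true, y_quantile, q):
    """Average pinball loss of the q-quantile forecast y_quantile against y_true."""
    diff = y_true - y_quantile
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))
```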
According to the above basic characteristics, Fig. 2.5 summarizes several consumer attributes reflecting electricity consumer behavior from the perspectives of endogenous attributes, consumption attributes, and preference attributes, taking the residential customer as an example.
2.6 Research Framework of ECBM

Figure 2.6 summarizes the multi-dimensional research framework of the ECBM and its analysis methods according to the components of consumer behavior.
For the behavior subject, the consumer portrait can be described, including the consumer's basic attributes such as sex and age, occupation and salary, social class, and housing conditions, and the consumer's preference attributes such as demand response willingness and power consumption preferences. Figure 2.7 gives the average weekly load curves of three consumers and their corresponding socioeconomic information.

Fig. 2.7 Illustration of the correspondence between load profiles and characteristics of consumers

This kind of relationship helps to infer the socioeconomic information of consumers from their load profiles conveniently and intuitively. The power consumption of the retired consumer #1018 remains at a relatively high level during working hours, while that of consumers #1020 and #1032, who have not retired, is relatively low during working hours except at weekends; both observations accord with the working states of the three consumers. Consumer #1032 has a small number of bedrooms, and their power level is also relatively low. Consumer #1018, who has children in the family, still shows a higher power level late at night, which may be because the house of this consumer is a bungalow (similar to a villa), all family members live together, and some members keep appliances active at night to look after the children. All three consumers show active power consumption from 6:00 to 8:00 PM, which accords with the habits of a typical family. Chapter 10 builds a bridge between the consumer's power consumption and their social and economic information with a deep convolutional neural network.
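The following sketch is not the Chapter 10 architecture itself but a minimal, illustrative 1-D convolutional classifier mapping a weekly half-hourly load profile to one binary socio-demographic attribute; the layer sizes, the input length of 336 points, and the attribute choice are all assumptions.

```python
# Minimal sketch: a small 1-D CNN mapping a weekly load profile
# (7 * 48 = 336 half-hourly readings) to one binary attribute, e.g. retired or not.
import torch
import torch.nn as nn

class ProfileToAttribute(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * 84, 2)  # length 336 -> 168 -> 84 after pooling

    def forward(self, x):            # x: (batch, 1, 336) weekly load profile
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))

# Training would minimize nn.CrossEntropyLoss() between the logits and the
# known attribute labels of surveyed consumers.
```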
The behavior means calls for a structural analysis of the consumer's power consumption behavior, which can be interpreted in two ways. One is to directly decompose the operating states of one or several pieces of equipment from the total load profile. Non-intrusive load monitoring (NILM) is the main approach to the structural analysis of the power consumption behavior of residential and even building customers; it decomposes the load into the power curves of individual electrical appliances using fine-grained smart meter data. The study of NILM can be traced back to the 1970s, but current studies do not fully consider the impact of the integration of distributed renewable energy and energy storage. The other interpretation is to analyze the different components of the consumer's load. For example, the consumer's power consumption can be decomposed into a meteorologically sensitive component, an electricity price-sensitive component, and a basic power component; or into seasonal, weekly, and daily components; or into a low-frequency stable component and a high-frequency random component.
The behavior result can be used to identify various indexes, such as the consumer's basic power consumption patterns, dynamic characteristics, and consumption uncertainty. The consumers' power consumption patterns can be extracted by clustering the load curves. Chapter 8 re-examines the consumer's load profile from a sparse perspective, regarding the load profile as essentially the superposition of several power consumption behaviors, as shown in Fig. 2.8. The consumer behavior mode extraction problem is then modeled as a sparse coding problem, which can effectively identify the consumer's partial usage patterns as well as compress the massive smart meter data.
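A minimal sketch of the sparse-coding idea, using dictionary learning as an illustrative stand-in for the Chapter 8 formulation; the number of atoms and the sparsity level are assumptions.

```python
# Minimal sketch: treating daily load profiles as sparse combinations of a few
# "partial usage patterns" learned as dictionary atoms.
import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_usage_patterns(profiles, n_atoms=20, n_nonzero=4):
    """profiles: (n_days, 48) daily load curves. Returns (dictionary, sparse codes)."""
    dl = DictionaryLearning(n_components=n_atoms,
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=n_nonzero,
                            random_state=0)
    codes = dl.fit_transform(profiles)       # sparse coefficients, (n_days, n_atoms)
    return dl.components_, codes             # codes @ components_ approximates the profiles
```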
For the foreseeable behavior, the estimation of future power consumption behavior can be conducted on different time scales, such as ultra-short term, short term, and medium to long term. Consumer load forecasting is a typical case of foreseeing the behavior result and of describing the uncertainty of the consumer's future power consumption. Currently, researchers around the world are conducting more and more probabilistic load forecasting studies for individual consumers. For example, Chap. 12 proposes a quantile long short-term memory network model to conduct probabilistic forecasting for the individual consumer. Figure 2.9 gives a typical illustration of ultra-short-term probabilistic load forecasting for a residential customer, which describes the future uncertainty by a series of quantiles.
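A minimal sketch of probabilistic load forecasting for a single consumer, using gradient-boosted quantile regression as a simple stand-in for the quantile LSTM of Chap. 12; the feature set and quantile levels are illustrative assumptions.

```python
# Minimal sketch: one regression model per quantile level, each trained with
# a pinball loss, yielding a set of quantile forecasts for a single consumer.
from sklearn.ensemble import GradientBoostingRegressor

def quantile_forecasts(X_train, y_train, X_future, quantiles=(0.1, 0.5, 0.9)):
    """X: lagged loads, calendar, and weather features (assumed); y: observed loads."""
    preds = {}
    for q in quantiles:
        model = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0)
        model.fit(X_train, y_train)
        preds[q] = model.predict(X_future)
    return preds
```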
For the aggregated behavior, consumers are grouped according to different standards, i.e. a chosen consumer behavior characteristic, such as the consumer's basic attributes, the composition of their electrical equipment, or the shape of their load profile, as shown in Fig. 2.10.
2.7 Conclusions
This chapter proposes the basic concept of the ECBM, decomposes the basic components of consumer behavior, including the behavior subject, behavior environment, behavior means, behavior result, and behavior utility, and further extends them to the aggregated behavior and foreseeable behavior. On this basis, the theoretical research framework of the ECBM is presented through several illustrations. This chapter is expected to provide a reference for the study of the ECBM, support data-driven, consumer-centric research and applications, and further promote the interaction between consumers and systems in the context of the Energy Internet.
References
1. Wang, Y., Chen, Q., Kang, C., Zhang, M., Wang, K., & Zhao, Y. (2015). Load profiling and its
application to demand response: A review. Tsinghua Science and Technology, 20(2), 117–129.
2. Wang, Q., Zhang, C., Ding, Y., Xydis, G., Wang, J., & Østergaard, J. (2015). Review of real-time
electricity markets for integrating distributed energy resources and demand response. Applied
Energy, 138, 695–706.
3. Xue, Y., & Yu, X. (2017). Beyond smart grid—Cyber-physical-social system in energy future [Point of view]. Proceedings of the IEEE, 105(12), 2290–2292.
4. Xin, S., Guo, Q., Sun, H., Chen, C., Wang, J., & Zhang, B. (2017). Information-energy flow com-
putation and cyber-physical sensitivity analysis for power systems. IEEE Journal on Emerging
and Selected Topics in Circuits and Systems, 7(2), 329–341.
5. Palensky, P., & Dietrich, D. (2011). Demand side management: Demand response, intelligent
energy systems, and smart loads. IEEE transactions on Industrial Informatics, 7(3), 381–388.
6. Siano, P. (2014). Demand response and smart grids-a survey. Renewable and Sustainable
Energy Reviews, 30, 461–478.
7. Yang, J., Zhao, J., Wen, F., & Dong, Z. (2018). A model of customizing electricity retail prices
based on load profile clustering analysis. IEEE Transactions on Smart Grid, 10(3), 3374–3386.
8. Behavioural and social sciences at Nature Research. https://socialsciences.nature.com/.
9. Harland, C. M. (1996). Supply chain management: Relationships, chains and networks. British
Journal of Management, 7, S63–S80.
10. Koufaris, M., Kambil, A., & LaBarbera, P. A. (2001). Consumer behavior in web-based com-
merce: An empirical study. International Journal of Electronic Commerce, 6(2), 115–138.
11. Kooti, F., Lerman, K., Aiello, L. M., Grbovic, M., Djuric, N., & Radosavljevic V. (2016). Portrait
of an online shopper: Understanding and predicting consumer behavior. In Proceedings of the
9th ACM International Conference on Web Search and Data Mining, pp. 205–214. ACM
12. Koufaris, M. (2002). Applying the technology acceptance model and flow theory to online
consumer behavior. Information Systems Research, 13(2), 205–223.
Chapter 3
Smart Meter Data Compression
Abstract The huge amount of household load data requires highly efficient data
compression techniques to reduce the great burden on data transmission, storage,
processing, application, etc. This chapter validates the generalized extreme value distribution characteristic of household load data and then utilizes it to identify load features, including load states and load events. Finally, a highly efficient lossy data
compression format is designed to store key information of load features. The pro-
posed feature-based load data compression method can support highly efficient load
data compression with little reconstruction error and simultaneously provide load
feature information directly for applications. A case study based on the Irish Smart
Metering Trial Data validates the high performance of this new approach, including
in-depth comparisons with the state-of-the-art load data compression methods.
3.1 Introduction
Smart meters typically capture the domestic loads accumulated over a 30 min period,
offering a previously unknown degree of insight into the behavior in an individual
dwelling as an aggregation of appliance loads [1]. With the rollout of smart meters,
there is an explosive increase in smart metering load data. The yearly volume of
load profile data for the 1.658 million households in Ireland (statistics obtained from
Central Statistics Office) could amount to 216 GB. Compared with Ireland, where the number of households is relatively small, the volume of load profile data generated by the 230 million smart meters installed by the State Grid Corporation of China is estimated to be 29 TB each year. It should be noted that all encapsulating identifiers and length fields are omitted here to treat different data formats equally; hence, the real volume of load profile data is even larger.
The hundreds of millions of load profile records produced by smart meters have also caused "big data" problems covering data transmission, storage, processing, and application. Smart meters are typically connected via narrow-band powerline communication (PLC) links and upload load data to the aggregator installed at the transformer. Owing to the limited bandwidth, the reliability of data transmission decreases as the data volume increases [2]. The storage requirement and processing time also increase with the data volume, and the sheer volume of smart meter data exacerbates the burden on these applications. Compressing load profile data allows a substantial reduction in data volume, thus providing a highly efficient framework to transmit, store, and process these data.
Data compression can be divided into lossy compression and lossless compression. Lossy compression typically reduces bits by identifying unnecessary information in the data and removing it, whereas lossless compression usually reduces bits by eliminating statistical redundancy. Lossy compression drops nonessential detail and retains the information key to the data's applications; thus, it is mainly applied to accelerate similarity search, which supports important load data mining applications such as load profiling [3, 4] and customer segmentation [5–7]. The similarity between two load profiles is typically measured with a distance index such as the Euclidean distance. Similarity within lossily compressed load data can be calculated more efficiently than within losslessly compressed data because computing distances on partial information is faster than on the complete information.
In terms of load profile data compression, a resumable load data compression (RLDC) method is proposed in [2]. This method is mainly based on differential coding: for a load profile, the first load value is recorded completely, and the following data are the differences between consecutive load values. Most consecutive values of household load profiles exhibit little difference; thus, the differences can be stored in fewer bits, thereby conserving storage. This method accomplishes resumable data compression with compression efficiency improved by orders of magnitude compared with the transmission encodings currently used for electricity metering. However, because of the differential coding technique, the compressed data record the differences between consecutive load values rather than the original load values or symbols marking the load level, which makes them inconvenient and inefficient for direct processing by data mining methods.
Reference [8] exploits the symbolic aggregate approximation (SAX) method [9] for lossy data compression. By symbolizing the average load value in a fixed time window, this method provides high compression efficiency, and the compressed data can easily be processed by data mining methods. However, the compressed data lose some of the high-frequency signal; hence, the data reconstruction precision of this lossy method is not high.
Above all, there is an urgent need for a smart meter data compression method that provides high compression efficiency, high reconstruction precision, and a simple data compression format for applications. Here, we propose a feature-based load data compression method (FLDC), a lossy smart metering load data compression method designed to fulfill these requirements. In this method, the generalized extreme value (GEV) distribution characteristic of household load data is validated and utilized to identify load features such as load states and load events in low-resolution load data. The identified load features are stored in the proposed highly efficient data compression format, which supports highly efficient load data compression with little compression error and simultaneously provides load feature information directly for applications. With the method presented in this chapter, the compressed data volume is only 1.8% of the original data volume, reducing the Irish smart meter data from 216 GB to 3.88 GB and China's smart meter data from 29 TB to 0.52 TB (assuming data properties similar to the test data).
The consecutive value difference rate of a household load profile is defined as

$$r_{n,t} = \frac{P_{n,t} - P_{n,t-1}}{P_{n,\max}} \tag{3.1}$$

where $P_{n,t}$ is the load at interval $t$ of day $n$, $P_{n,t-1}$ is the load at interval $t-1$, and $P_{n,\max}$ is the peak load on day $n$.
The cumulative probability versus consecutive value difference rate plot (Fig. 3.1) shows that 70% of the load values exhibit a consecutive value difference rate smaller than 10%, which means that most value differences are smaller than 10% of the daily peak load. This small difference allows household load data to be compressed, because most load values would be identical if a 10% value difference were ignored. If only load values below 50% of the daily peak load are counted, the probability increases to 78%; for load values below 10% of the daily peak load, it increases to 95%.

Fig. 3.1 Cumulative probability versus consecutive value difference rate for household #1008

The cumulative probability plot illustrates that when the household load is at a very low level, it is stable and exhibits little change. As the load level rises, the household load becomes unstable and shows a large change rate.
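A minimal sketch, assuming half-hourly readings arranged as one row per day, of computing the consecutive value difference rate of Eq. (3.1) and the cumulative probability plotted in Fig. 3.1; using the absolute rate for the cumulative plot is an assumption.

```python
# Minimal sketch: consecutive value difference rates and their cumulative probability.
import numpy as np

def difference_rates(daily_loads):
    """daily_loads: (n_days, 48). Returns |r_{n,t}| for every interval after the first."""
    diffs = np.abs(np.diff(daily_loads, axis=1))          # |P_{n,t} - P_{n,t-1}|
    peaks = daily_loads.max(axis=1, keepdims=True)        # P_{n,max}
    return (diffs / np.where(peaks > 0, peaks, 1.0)).ravel()

def cumulative_probability(rates, threshold=0.10):
    """Share of readings whose difference rate does not exceed the threshold."""
    return np.mean(rates <= threshold)
```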
Several candidate distributions are fitted to the household load data by maximum likelihood, and the best four distributions are finally chosen: the GEV distribution, the exponential distribution, the generalized Pareto distribution, and the t location-scale distribution. Here we show the result for the typical household #1008 in detail in Table 3.1 and Fig. 3.2, and summarize the results for all households in Table 3.2. As shown in Fig. 3.2, the GEV distribution significantly outperforms the other distributions. The Akaike information criterion (AIC) is often used to evaluate a distribution fit; a lower AIC value means a better fit. According to the AIC values in Tables 3.1 and 3.2, the AIC values of the GEV distribution are the lowest; thus, the GEV distribution performs best in fitting the household load data.

Fig. 3.2 Four best distributions for the household load of household #1008. The GEV distribution provided the best fit. The bin interval for the empirical histogram equals 0.0769 kWh
Our distribution fitting results show that the GEV distribution fits smart meter data well. The GEV distribution is often used to model the probability of extreme events; it performs well here because domestic electricity consumption events highly resemble extreme events. For most of the day, residents do not use high-power appliances such as ovens, washers, dryers, air conditioners, and electric water heaters; thus, the load remains at a low level. When residents switch on high-power appliances, the load soon increases to a high level but typically does not stay there for long. This pattern of domestic electricity consumption leads to the load remaining at a low level most of the time and reaching a high level only on rare occasions.

Table 3.2 Statistics of AIC values for the distribution fits of all 4225 households

AIC             Generalized extreme value   Exponential   Generalized Pareto   t location-scale
Mean            −2058                       9903          2929                 14550
Median          3833                        13350         11551                19924
25% quantile    −20885                      −8832         −12477               −5560
75% quantile    22859                       31208         30215                40461
The GEV distribution combines the three possible types of limiting distributions for extreme values into a single form. The distribution function is

$$F(x) = \exp\left\{-\left[1 + k\,\frac{x-u}{\sigma}\right]^{-1/k}\right\}, \quad k \neq 0$$
$$F(x) = \exp\left[-\exp\left(-\frac{x-u}{\sigma}\right)\right], \quad k = 0 \tag{3.2}$$

with $x$ bounded below by $u - \sigma/k$ if $k > 0$ and bounded above by $u - \sigma/k$ if $k < 0$. Here, $u$ and $\sigma$ are the location and scale parameters, respectively, and the shape parameter $k$ determines which extreme value distribution is represented: Fréchet, Weibull, and Gumbel correspond to $k > 0$, $k < 0$, and $k = 0$, respectively.
The Fréchet type has a lower bound below which the probability density equals 0, whereas the Weibull type has an upper bound above which the probability density equals 0; the Gumbel type has no restriction on the value [11]. Most households consume electricity, so their loads typically have a zero lower bound; hence, the best-fitted GEV distributions of their load profile data typically belong to the Fréchet type (k > 0).
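A minimal sketch of fitting the GEV distribution to household load readings by maximum likelihood and comparing it with an exponential fit via AIC, assuming SciPy is used; note that SciPy's shape parameter follows the opposite sign convention to the k in Eq. (3.2).

```python
# Minimal sketch: GEV vs. exponential fit of one household's readings, compared by AIC
# (lower AIC = better fit).
import numpy as np
from scipy import stats

def fit_and_compare(loads):
    """loads: 1-D array of half-hourly consumption values for one household."""
    results = {}
    for name, dist, n_params in [("gev", stats.genextreme, 3),
                                 ("exponential", stats.expon, 2)]:
        params = dist.fit(loads)                       # maximum likelihood estimation
        log_lik = np.sum(dist.logpdf(loads, *params))
        results[name] = 2 * n_params - 2 * log_lik     # AIC
    return results
```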
The small consecutive value difference illustrates that the household load rarely changes between two consecutive time intervals. As shown in Fig. 3.1, this characteristic strengthens as the load level decreases: when the household load decreases, the consecutive value difference becomes smaller and the load becomes more stable. When the household load steps up to a high level, the consecutive value difference increases and the load becomes more unstable.

Fig. 3.3 Boundary separating the base state and stimulus state for the load profile of household #1008
To differentiate stable and unstable load levels, Fig. 3.3 plots a state boundary: the load below the boundary is defined to be in the "base state", and the load above it is in the "stimulus state".
Base state: in this state, the load level and the consecutive value difference are both low.
Stimulus state: in this state, the load level and the consecutive value difference are both high.
Load event: the phenomenon of the household load deviating from the base state, experiencing several stimulus states accompanied by large consecutive value differences, and finally returning to the base state is defined as a "load event". A load event can be detected by searching for the transition from the base state to the stimulus state.
The household load typically remains in the base state, which is accompanied by small value differences between adjacent sampling intervals. Load events are often caused by the switching of high-power appliances, such as air conditioners, microwave ovens, washers, and dryers. When the load event finishes, the load returns to the base state and again remains nearly unchanged. The state boundary and the corresponding load events constitute the key features of household load profiles and are therefore our identification targets.
For data compression, because the base state is a stable state with little value difference and load events occur rarely, the compression efficiency can be improved significantly by recording only the time and load of typical load events; the remaining data are all base-state loads. This process does not introduce much compression error because the consecutive value difference in the base state is small. For the GEV distribution, the values are distributed densely at the low level, i.e. the base state, and sparsely at the high level, i.e. the stimulus state. Hence, we can adopt the quantile at which the cumulative distribution function equals a predetermined probability as the state boundary. With the GEV cumulative distribution function fitted through maximum likelihood estimation (MLE) and the confidence probability chosen, the state boundary separating the base state from the stimulus state can be calculated.
Household loads obey the GEV distribution; hence, the first step is to model the household load time series $x_t$ by a distribution fit using the MLE algorithm. Because household load characteristics differ by season, the distribution fit is made separately for the load data in spring (Mar. to May), summer (Jun. to Aug.), autumn (Sep. to Nov.), and winter (Dec. to Feb.). Given a confidence probability $\alpha$ and the cumulative distribution function $F(x)$, the load state boundary $B$ is calculated as follows:

$$B = x \quad \text{if } F(x) = \alpha \tag{3.3}$$

Through the GEV distribution fit and the boundary calculation, a load state matrix $S = [S_1, S_2, \ldots, S_N]$ composed of 0 (base state) and 1 (stimulus state) is generated by determining whether each load value in the original load profile is below the boundary $B$, as shown in step 1 of Fig. 3.5:

$$S_t = \begin{cases} 0 & \text{if } x_t \leq B \\ 1 & \text{if } x_t > B \end{cases} \tag{3.4}$$
The load data in the base state are discretized by predetermined breakpoints according to the fitted GEV distribution. As shown in Fig. 3.6, the breakpoints are a sequence of quantiles $C = [c_0, c_1, \ldots, c_d]$ such that the area under the fitted GEV probability density function $f(x)$ between $c_{i-1}$ and $c_i$ equals $\alpha/d$ $(i = 1, 2, \ldots, d)$, where $\alpha$ is the confidence probability, $d$ is the number of discretization intervals, $c_0 = u - \sigma/k$, and $c_d = B$. Any load series in the base state whose average value falls into the interval between $c_{i-1}$ and $c_i$ is coded by the sub-base state ID $i$ and the expected value $E(i)$ of the load between $c_{i-1}$ and $c_i$.
As shown in Fig. 3.6, the base state is separated into 8 sub-base states by 9 breakpoints from $c_0$ to $c_8$. $c_8$ is the state boundary $B$; hence, below $c_8$ the load is in the base state, and above it the load is in the stimulus state. The area under the GEV probability density function between two consecutive breakpoints equals $\alpha/8$. The original base state is coded by the single number 0; after discretization it is divided into 8 sub-base states, so the coding resolution of the base state is significantly improved.
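A minimal sketch, again assuming SciPy, of computing the state boundary B of Eq. (3.3), the state series of Eq. (3.4), and the sub-base state breakpoints; the confidence probability and the number of sub-base states are illustrative choices.

```python
# Minimal sketch: boundary, breakpoints, and state series from a fitted GEV distribution.
import numpy as np
from scipy import stats

def state_boundary_and_breakpoints(loads, alpha=0.95, d=8):
    c, loc, scale = stats.genextreme.fit(loads)
    # Boundary B: the alpha-quantile of the fitted GEV distribution, Eq. (3.3)
    B = stats.genextreme.ppf(alpha, c, loc=loc, scale=scale)
    # Breakpoints c_0..c_d: quantiles splitting the base state into d equal-probability bins
    breakpoints = stats.genextreme.ppf(np.linspace(0.0, alpha, d + 1), c, loc=loc, scale=scale)
    return B, breakpoints

def state_series(loads, B):
    return (loads > B).astype(int)    # Eq. (3.4): 0 = base state, 1 = stimulus state
```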
As shown in Step 2 of Fig. 3.5, event detection is performed by scanning all nonzero segments in the state matrix $S$. A load event occurs when the load deviates from the base state and moves into the stimulus state; before the load returns to the base state, it may experience several stimulus states. Hence, the event detection algorithm is composed of two parts: 0–1 and 1–0 edge detection, and event load slicing.
(1) 0–1 edge detection: when the state changes from 0 to 1, a load event starts. Hence, the load event start time $t_s$ is calculated as

$$t_s = t + 1 \quad \text{if } S_{t+1} - S_t = 1 \tag{3.7}$$

(2) 1–0 edge detection: increase $t$ by 1 iteratively until the state changes from 1 to 0, at which point the load event ends. Hence, the load event end time $t_e$ is calculated as

$$t_e = t - 1 \quad \text{if } S_t - S_{t-1} = -1 \tag{3.8}$$

(3) Event load profile slicing: the event load profile (ELP) is sliced from $x_t$ according to each matched pair of detected start time $t_s$ and end time $t_e$. The number of stimulus-state intervals is defined as the length of the ELP:

$$\mathrm{length}(\mathrm{ELP}) = t_e - t_s + 1 \tag{3.10}$$
After all load events are detected, the sliced load event profiles are used to construct
a load event segment pool, on which load event clustering is based.
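A minimal sketch of the edge-detection and slicing steps, assuming the state series is available as a NumPy array; events still open at the end of the series are simply skipped here.

```python
# Minimal sketch: detect 0->1 and 1->0 transitions in the state series S and
# slice the corresponding event load profiles (ELPs).
import numpy as np

def detect_events(loads, states):
    """states: 0/1 array from the state-boundary test; returns a list of (ts, te, ELP)."""
    edges = np.diff(states)
    starts = np.where(edges == 1)[0] + 1     # Eq. (3.7): 0 -> 1 transition
    ends = np.where(edges == -1)[0]          # Eq. (3.8): last stimulus interval before 1 -> 0
    events = []
    for ts in starts:
        later_ends = ends[ends >= ts]
        if later_ends.size:                  # skip an event still open at the end of the series
            te = later_ends[0]
            events.append((ts, te, loads[ts:te + 1]))   # length = te - ts + 1, Eq. (3.10)
    return events
```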
The length of an ELP represents the operating time of the high-power appliances. As shown in Fig. 3.7, the first step is to classify load events according to their lengths. In addition to the length of the load event, the profile shape and load level are also important metrics for clustering. Here, the Euclidean distance is used as the distance between two ELPs of the same length. Based on the Euclidean distance, the hierarchical clustering algorithm [12] is applied to cluster the load events into $M$ groups in which the load events share a similar event profile. The group ID is coded with integers from 1 to $M$. Finally, the profiles in the same group are averaged to form the representative profile, which is associated with the group ID.

Fig. 3.7 Load event clustering decomposed into two steps: event classification based on lengths and hierarchical clustering based on Euclidean distance between ELPs
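A minimal sketch of the two-step load event clustering, grouping ELPs by length and then applying hierarchical clustering with Euclidean distance; the number of clusters per length group is an illustrative assumption.

```python
# Minimal sketch: classify ELPs by length, then hierarchically cluster each group
# and compute the representative (average) profile of every cluster.
import numpy as np
from collections import defaultdict
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_events(elps, n_clusters=4):
    """elps: list of 1-D arrays (event load profiles). Returns {length: (labels, representatives)}."""
    by_length = defaultdict(list)
    for elp in elps:
        by_length[len(elp)].append(elp)

    result = {}
    for length, group in by_length.items():
        X = np.vstack(group)
        if len(group) <= n_clusters:            # too few events of this length to split further
            labels = np.arange(1, len(group) + 1)
        else:
            Z = linkage(X, method="average", metric="euclidean")
            labels = fcluster(Z, t=n_clusters, criterion="maxclust")
        reps = {g: X[labels == g].mean(axis=0) for g in np.unique(labels)}
        result[length] = (labels, reps)
    return result
```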
Here, we propose a special data format for data compression, as shown in Fig. 3.8.
This data structure is event-based, with every 16 bits recording one load event.
Fig. 3.7 Load event clustering decomposed into two steps: event classification based on lengths
and hierarchical clustering based on Euclidean distance between ELPs
The 16 bits are equal to 2 bytes, which is easy for CPUs to process. Of the 16 bits, the first is the next-day bit, which indicates the day on which the load event occurs. If this bit equals 0, the event occurs on the same day; if it equals 1, the event occurs on the following day. Following the next-day bit, there are 6 time-interval bits, which record the time interval at which the load event starts; six bits can represent up to 64 time intervals. The next 6 bits code the event group ID, so the data compression format can support no more than 64 event clusters. The final 3 sub-base-state bits can record no more than 8 sub-base states.
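As an illustration of this layout, the sketch below packs and unpacks one 16-bit event record. The bit ordering (next-day bit in the most significant position, then start interval, group ID, and sub-base state) is an assumption made for the example; the book describes the field widths but the exact ordering used here is illustrative.

```python
# Illustrative packing/unpacking of one 16-bit event record:
# 1 next-day bit | 6 start-interval bits | 6 group-ID bits | 3 sub-base-state bits.
def pack_event(next_day, start_interval, group_id, sub_base_state):
    assert 0 <= next_day <= 1 and 0 <= start_interval < 64
    assert 0 <= group_id < 64 and 0 <= sub_base_state < 8
    return (next_day << 15) | (start_interval << 9) | (group_id << 3) | sub_base_state

def unpack_event(record):
    return ((record >> 15) & 0x1,    # next-day bit
            (record >> 9) & 0x3F,    # start time interval of the load event
            (record >> 3) & 0x3F,    # event group ID
            record & 0x7)            # sub-base state ID of the preceding base load
```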
This data compression format improves the compression efficiency significantly
because all of the load values in the base state are recoded by the integer sub-base state
ID, and the event is represented by the integer event group ID. For household loads,
events rarely occur, which is beneficial for significantly improving data compression.
The data reconstruction is divided into two steps: event reconstruction and base
state reconstruction. In the event reconstruction process, the representative load pro-
file of the event group is used to reconstruct the original event load profile. The
start time and event group ID of any identified load event are recorded in the data
compression format, as shown in Fig. 3.8. The baseload data before load events are
replaced by the expected values corresponding to the sub-base state IDs, which are
recorded in the last three bits of the data compression format.
In terms of compression efficiency, one common index is the average value size in bits.
This index evaluates the number of bits required to store one load value. The lower
the index is, the higher the compression efficiency will be. For uncompressed double-
precision float data, the bit number per value is constant at 64. For the uncompressed
unsigned integer data described in IEC 61334-6, which is also referred to as A-
XDR encoding, the bit number per value equals 16 if we use 16 bits to store an
integer. The other evaluation index is the compression ratio, which is defined here as the uncompressed double-precision floating-point data volume divided by the compressed data volume.
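The two indices are straightforward to compute; the following snippet is an illustrative helper, with 64 bits per uncompressed value assumed as in the text.

```python
# Illustrative computation of the two compression-efficiency indices.
def average_bits_per_value(total_compressed_bits, n_values):
    return total_compressed_bits / n_values

def compression_ratio(total_compressed_bits, n_values, raw_bits_per_value=64):
    # Uncompressed double-precision volume divided by compressed volume.
    return (n_values * raw_bits_per_value) / total_compressed_bits
```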
In terms of reconstruction precision, note that in most time intervals the customer load remains low compared with the peak load, so even a small absolute error can lead to a large percentage error. Consequently, even if the absolute error between the data before and after compression is small, a percentage-error metric such as MAPE can still report an extremely large error. Here, we propose a new precision evaluation metric, the mean peak percent error (MPPE), which uses the daily peak load as the denominator when calculating the percentage error; here T denotes the total number of time intervals over which the error is averaged. For data compression, it is important to evaluate the accuracy of both the reconstructed time and the load level of load events; thus, there is no need to adopt the error metric proposed in [13], which reduces the so-called "double penalty" effect incurred by forecasts whose features are displaced in space or time.
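The exact formula is not reproduced here, but one plausible implementation consistent with the description (absolute error divided by the daily peak load, averaged over all T intervals, expressed in percent) is sketched below; the array layout and the small floor on the daily peak are implementation assumptions.

```python
# One plausible MPPE implementation, assuming day-by-day data with a fixed
# number of intervals per day (48 for half-hourly data).
import numpy as np

def mppe(original, reconstructed, intervals_per_day=48):
    x = np.asarray(original, dtype=float).reshape(-1, intervals_per_day)
    x_hat = np.asarray(reconstructed, dtype=float).reshape(-1, intervals_per_day)
    daily_peak = np.maximum(x.max(axis=1, keepdims=True), 1e-9)   # per-day peak load
    errors = np.abs(x - x_hat) / daily_peak                        # error relative to daily peak
    return 100.0 * errors.mean()                                   # average over all T intervals
```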
3.4.3 Dataset
The dataset for compression performance evaluation is taken from the Irish Smart Metering Trial Data from SEAI. The smart metering data are recorded at 30-min intervals.
There are 4225 households evaluated overall, of which 20 household load compression efficiencies are plotted in Fig. 3.9. For the 20 household load data, the average value size in bits given by FLDC is 1.24, which surpasses most approaches significantly and is only slightly behind SAX, the method with the highest compression efficiency: FLDC requires about 0.25 more bits per value than SAX. However, SAX loses the capability of high reconstruction precision, which will be discussed in the next part.
As shown in Table 3.3, the overall evaluation result shows that the mean com-
pression ratio of the 4225 households through FLDC reaches a high level of 55.71,
which is near that of SAX.
Fig. 3.9 Average value size in bits for 20 households from the Irish smart meter data
Table 3.3 Average compression efficiency and reconstruction precision for 4225 households

| Method | Average bits per value | Average compression ratio | MPPE (%) |
| Double-precision float | 64 | 1 | 0 |
| 16-bit unsigned integer | 16 | 4 | 0 |
| PAA | 8 | 8 | 10.48 |
| SAX | 1 | 64 | 11.42 |
| Haar DWT | 8 | 8 | 10.48 |
| RLDC [6] | 1.6 | 40 | 0 |
| FLDC | 1.27 | 55.71 | 5.57 |
Figure 3.10 shows the load reconstruction profiles of FLDC compared with PAA,
SAX, DWT, and RLDC for households #1009, #1015, and #1018. Figure 3.11 shows
the data reconstruction precision of FLDC for 20 households compared to the existing
methods. With the exception of RLDC, it can be seen that FLDC outperforms the
other methods significantly. Because RLDC does not yield any compression error,
the reconstruction precision is 100%, and the reconstruction profile is the same as
the uncompressed load.
The existing methods (PAA, SAX, and DWT) cannot capture the load events with high time and load-level resolution, whereas FLDC restores the load event profile nearly without error. Figure 3.10c shows that the start time interval of the first load event in a day for household #1018 obtained by PAA, SAX, and DWT is 4:00 a.m., whereas the real start time is 7:00 a.m.
As shown in the MPPE column of Table 3.3, the average MPPE of FLDC for all
4225 residents equals 5.57%, which indicates that the average reconstruction error is
only 5.57% of the daily peak load for the 4225 households. Although it provides high
compression efficiency, SAX loses the capability of high reconstruction precision and
hence has the highest MPPE, which is 11.42%. PAA and DWT have the same MPPE, both equal to 10.48%.
Figure 3.12 shows a performance map in which the state-of-the-art methods are located according to their reconstruction precision (1 - MPPE) and compression ratio. None of SAX, RLDC, and FLDC dominates the others in both performance dimensions, and all three significantly outperform PAA and DWT in terms of compression ratio. The compression ratio of SAX is the highest, but its reconstruction precision is the lowest.
Fig. 3.11 Data reconstruction precision for 20 households. The MPPE of RLDC equals 0. PAA and
DWT have the same reconstruction precision. With the exception of RLDC, FLDC has the lowest
MPPE
Fig. 3.12 The reconstruction precision versus compression ratio for data compression methods.
1 From PAA and DWT to SAX: compression ratio increases by 800%, reconstruction precision
decreases by 0.94%; 2 From SAX to RLDC: compression ratio decreases by 37.5%, reconstruction
precision increases by 11.42%; 3 From RLDC to FLDC: compression ratio increases by 39.3%,
reconstruction precision decreases by 5.57%; 4 From SAX to FLDC: compression ratio decreases
by 13.0%, reconstruction precision increases by 5.85%
From PAA and DWT to SAX and RLDC, there is a huge improvement in compression ratio. However, once the compression ratio is as high as 40-64, it becomes difficult to improve it further without reducing the reconstruction precision. Compared with SAX, FLDC improves the reconstruction precision from 88.58 to 94.43% while sacrificing only 13.0% of the compression ratio. From RLDC to FLDC, the compression ratio is improved by 39.3% at the expense of 5.57% of reconstruction precision. Although a 39.3% increase in compression ratio is much smaller than the 800% increase from PAA and DWT to SAX, it is still significant progress. In fact, FLDC achieves a better trade-off between compression ratio and reconstruction precision, yielding a large improvement in compression ratio with little loss of reconstruction precision.
3.5 Conclusions
This chapter proposes a smart metering load data compression method based on
load feature identification. This feature-based load data compression identifies load
features from the uncompressed load data and restores load features rather than orig-
inal data values. According to the GEV distribution characteristic, load features are
classified into two types: base states and load events. The base state load is then dis-
cretized into several sub-base states, which improves the coding resolution. The load
events are clustered into load event groups in which the load events share a represen-
tative load event profile. Finally, we design an event-based data compression format in which every 16 bits record one load event together with the base load before the event starts. Owing to the GEV distribution characteristic of household load, the base state
load rarely changes, and load events rarely occur, thus giving FLDC the capability of
high compression ratio with little compression error while simultaneously providing
feature information.
The advantages of FLDC include the following:
(1) Applied to the Irish smart meter data, the data compression ratio is as high as
55.71, with an average reconstruction error equaling 5.57% of the daily peak load;
(2) The data compression and reconstruction are simple and efficient, enabling
both online and offline application;
(3) The compressed data directly show load feature information including the
base state and load event type.
References
1. Stephen, B., & Galloway, S. J. (2012). Domestic load characterization through smart meter
advance stratification. IEEE Transactions on Smart Grid, 3(3), 1571–1572.
2. Unterweger, A., & Engel, D. (2015). Resumable load data compression in smart grids. IEEE
Transactions on Smart Grid, 6(2), 919–929.
3. Piao, M., Shon, H. S., Lee, J. Y., & Ryu, K. H. (2014). Subspace projection method based
clustering analysis in load profiling. IEEE Transactions on Power Systems, 29(6), 2628–2635.
4. Wang, Y., Chen, Q., Kang, C., Zhang, M., Wang, K., & Zhao, Y. (2015). Load profiling and its
application to demand response: A review. Tsinghua Science and Technology, 20(2), 117–129.
5. Tsekouras, G. J., Hatziargyriou, N. D., & Dialynas, E. N. (2007). Two-stage pattern recognition
of load curves for classification of electricity customers. IEEE Transactions on Power Systems,
22(3), 1120–1128.
6. Chicco, G., Ionel, O. M., & Porumb, R. (2013). Electrical load pattern grouping based on
centroid model with ant colony clustering. IEEE Transactions on Power Systems, 28(2), 1706–
1715.
7. Espinoza, M., Joye, C., Belmans, R., & Demoor, B. (2005). Short-term load forecasting, profile
identification, and customer segmentation: A methodology based on periodic time series. IEEE
Transactions on Power Systems, 20(3), 1622–1630.
8. Notaristefano, A., Chicco, G., & Piglione, F. (2013). Data size reduction with symbolic aggre-
gate approximation for electrical load pattern grouping. IET Generation Transmission & Dis-
tribution, 7(2), 108–117.
9. Lin, J., Keogh, E., Lonardi, S., & Chiu, B. (2003). A symbolic representation of time series,
with implications for streaming algorithms. In ACM SIGMOD Workshop on Research Issues
in Data Mining and Knowledge Discovery.
10. Commission for Energy Regulation (CER). (2012). CER Smart Metering Project - Electricity
Customer Behaviour Trial, 2009–2010. Irish Social Science Data Archive. SN: 0012-00.
11. Walden, A. T., & Prescott, P. (1983). Maximum likelihood estimation of the parameters of
the three-parameter generalized extreme-value distribution from censored samples. Journal of
Statistical Computation and Simulation, 16(3–4), 241–250.
12. Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241–254.
13. Haben, S., Ward, J., Vukadinovic Greetham, D., Singleton, C., & Grindrod, P. (2014). A new
error measure for forecasts of household-level, high resolution electrical energy consumption.
International Journal of Forecasting, 30(2), 246–256.
Chapter 4
Electricity Theft Detection
Abstract As the problem of electricity thefts via tampering with smart meters
continues to increase, the abnormal behaviors of thefts become more diversified and
more difficult to detect. Thus, a data analytics method for detecting various types of
electricity thefts is required. However, the existing methods either require a labeled dataset or additional system information, which is difficult to obtain in reality, or they have poor detection accuracy. In this chapter, we combine two novel data mining tech-
niques to solve the problem. One technique is the Maximum Information Coefficient
(MIC), which can find the correlations between the non-technical loss (NTL) and
a certain electricity behavior of the consumer. MIC can be used to precisely detect
thefts that appear normal in shape. The other technique is the clustering technique
by fast search and find of density peaks (CFSFDP). CFSFDP finds the abnormal users
among thousands of load profiles, making it quite suitable for detecting electricity
thefts with arbitrary shapes. Next, a framework for combining the advantages of the
two techniques is proposed. Numerical experiments on the Irish smart meter dataset
are conducted to show the good performance of the combined method.
4.1 Introduction
Fraudulent users can tamper with smart meter data using digital tools or cyber-attacks. Thus, the form of electricity theft is very different from that in the past, which relied mostly on physically bypassing or destroying mechanical meters [1]. Cases of organized energy theft, in which tampering tools and methods against smart meters were spread and caused severe losses for power utilities, were reported by the U.S. Federal Bureau of Investigation [2] and by Fujian Daily [3] in China. In total, the non-technical loss (NTL) due to consumer fraud in the electrical grid in the U.S. was estimated to be $6 billion per year [4]. Because the traditional detection methods of dispatching technical staff or using video surveillance are quite time-consuming and labor-intensive, electricity theft detection methods that take advantage of the information flow in the power system are urgently needed to solve the problem of the "Billion-Dollar Bug".
Fig. 4.1 Observer meters for areas and smart meters for customers
Our method is applicable to the scene of Fig. 4.1, where an observer meter is installed
in an area with a group of customers. An observer meter is more secure than a normal smart meter, making it almost impossible for fraudulent users to tamper with it. We believe that DSOs and electricity retailers have access to observer meter data.
Electricity thieves tend to reduce the quantity of their billed electricity. Thus, a false data injection (FDI) that has certain impacts on the tampered load profiles is used to simulate the tampering behaviors of the electricity thieves. We use six FDI types similar to those mentioned in [10] that apply time-variant modifications to the load profiles. Table 4.1 shows our FDI definitions, and Fig. 4.2 gives an example of the tampered load profiles. In Table 4.1, x_t is the ground-truth power consumption during time interval t, and x̃_t is the tampered data recorded by the smart meter. There are many other FDI types in the literature [5, 17]. However, a common characteristic can be generalized from their definitions and examples: an FDI type either keeps the features and fluctuations of the original curve or creates new patterns. This is also true for other sophisticated FDI types, so our method can handle them as well.
Fig. 4.2 An example of the original load profile and the load profiles tampered with by FDI1-FDI6 (power consumption in kWh versus time in hours)
The NTL of an area, e_t, can be calculated by subtracting the sum of the smart meter data x̃_{i,t} in the area from the observer meter data E_t:

e_t = E_t - \sum_{i \in \mathcal{A}} \tilde{x}_{i,t} \qquad (4.1)

where \mathcal{A} is the set of labels of all meters in the area. Let \mathcal{F} denote the set of labels of tampered meters in the area, and \mathcal{B} = \mathcal{A} \setminus \mathcal{F} the set of labels of benign meters. Equation (4.1) can be rewritten as:

e_t = \sum_{i \in \mathcal{F}} (x_{i,t} - \tilde{x}_{i,t}) \qquad (4.2)
where Corr (·, ·) is a proper correlation measurement for two vectors. Figure 4.3
shows a real electricity theft case in Shenzhen [18], where e and x̃i have a high
correlation. In FDI1, the correlation is linear and certain; however, in many other
situations, the correlation is rather fuzzy. Note that Eq. (4.3) may not hold for some
FDI types (e.g., FDI6 which produces a totally random curve); however, we can filter
out a large part of electricity thefts by using Eq. (4.3). The selection of measurement
Corr (·, ·) that can precisely reveal the fuzzy relationship between NTL and tampered
load profiles is of vital importance.
The overall detection methodology is based on two novel data mining techniques, i.e., MIC and CFSFDP. MIC utilizes the analysis in Sect. 4.2.3 to detect associations between the area NTL and tampered load profiles. CFSFDP is used to determine the load profiles with abnormal shapes. According to the suspicion ranks given by the two methods, a combined rank is produced to take advantage of both methods.
Fig. 4.3 A real case of NTL and power consumption of the suspected user (Kengzi substation, F04 line, #2 user); power consumption in kWh over roughly 35 days [18]
where the maximum is over all grids G with a columns and b rows, and I(D|G) is the MI of D|G. The characteristic matrix M(D) is defined as

M(D)_{a,b} = \frac{I^*(D, a, b)}{\log \min\{a, b\}} \qquad (4.5)
The MIC of a finite set D with sample size |D| and grid size less than B(n) is given
by
We use B(|D|) = |D|^{0.6} in this chapter because it is found to work well in practice. The value of MIC falls in the range [0, 1], and a larger value indicates a stronger association.
MIC(·,·) is applied as the Corr(·,·) in Eq. (4.3) to detect electricity thefts whose consumption behaviors have strong relevance to the NTL in the area.
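As a minimal sketch of this correlation step, the snippet below computes MIC between the daily NTL series of an area and each consumer's daily load profile. It assumes the third-party minepy package; alpha=0.6 matches B(|D|) = |D|^{0.6} used above, and the function and variable names are illustrative.

```python
# Illustrative MIC-based suspicion scoring for one day in one area (assumes minepy).
import numpy as np
from minepy import MINE

def mic_suspicion(ntl_day, load_profiles_day):
    """Return the MIC between the day's NTL series and each consumer's profile."""
    mine = MINE(alpha=0.6, c=15)
    scores = []
    for profile in load_profiles_day:          # one 48-point profile per consumer
        mine.compute_score(np.asarray(profile, float), np.asarray(ntl_day, float))
        scores.append(mine.mic())
    return np.array(scores)                    # higher MIC means more suspicious
```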
To tackle the FDI types that cannot be detected by the method of correlation, we use
clustering to find the outliers among the numerous load profiles. Density-based clustering methods have been widely adopted in anomaly detection. CFSFDP [19] is a recently proposed method that has been proven to be very powerful in large-dataset clustering and outlier detection.
In CFSFDP, two values are defined for the pth load profile: its local density ρ_p and its distance δ_p from other load profiles of higher density. Both values depend on the distances d_{p,q} between the data points. Equation (4.7) gives the definition of ρ_p:

\rho_p = \sum_{q} \chi(d_{p,q} - d_c) \qquad (4.7)

where d_c is the cut-off distance and χ(·) is the kernel function. The cut-off kernel is

\chi(x) = \begin{cases} 1, & \text{if } x < 0 \\ 0, & \text{otherwise} \end{cases}

Because the local density ρ_p is discrete under Eq. (4.7), a Gaussian kernel is occasionally used to estimate ρ_p, as shown in Eq. (4.8), to avoid conflicts:

\rho_p = \sum_{q \ne p} \exp\!\left(-\left(\frac{d_{p,q}}{d_c}\right)^2\right) \qquad (4.8)
For the load profile with the highest local density, δ_p is conventionally taken as the maximum distance to any other point, i.e., δ_p = max_q d_{p,q}; for all other points, δ_p is the minimum distance to a point of higher density. Abnormal load profiles have few similar neighbors, and their distance to the high-density area is larger than that of normal points. From the definitions above, the spatial distribution of the abnormal points results in a small ρ_p and a large δ_p (Fig. 4.5). We define the degree of abnormality ζ_p in Eq. (4.11):

\zeta_p = \frac{\delta_p}{\rho_p + 1} \qquad (4.11)
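The sketch below computes ρ_p, δ_p, and ζ_p with the cut-off kernel, assuming a precomputed pairwise distance matrix between load profiles; it follows the standard CFSFDP definitions from [19] and is illustrative rather than the book's exact implementation.

```python
# Illustrative CFSFDP quantities and the abnormality degree of Eq. (4.11).
import numpy as np

def cfsfdp_abnormality(d, dc):
    """d: symmetric (n, n) distance matrix with zero diagonal; dc: cut-off distance."""
    n = d.shape[0]
    rho = (d < dc).sum(axis=1) - 1               # cut-off kernel density, excluding self
    delta = np.empty(n)
    for p in range(n):
        higher = np.where(rho > rho[p])[0]       # points with higher local density
        if higher.size == 0:                     # densest point: conventional choice
            delta[p] = d[p].max()
        else:
            delta[p] = d[p, higher].min()
    zeta = delta / (rho + 1.0)                   # degree of abnormality
    return rho, delta, zeta
```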
Figure 4.6 shows the framework for utilizing MIC and CFSFDP in electricity theft detection and for combining the results of the two independent but complementary methods.
For an area with n consumers and an m-day recorded data series, a time series of NTL is first calculated using Eq. (4.1). Next, we normalize each load profile x̃_p by dividing it by max_t x̃_{p,t} and then reconstruct the smart meter dataset into a normalized load profile dataset with n × m vectors. This procedure retains the shape of each load curve to the greatest extent and helps the clustering method focus on the detection of arbitrary load shapes. Let u_{i,j} denote the normalized vector of the ith consumer's load profile on the jth day and e_j denote the NTL vector of the area on the jth day. For every i and j, MIC(u_{i,j}, e_j) is calculated according to the equations in Sect. 4.3.1. Moreover, ρ_{i,j} and δ_{i,j} are calculated using CFSFDP, and the degree of abnormality ζ_{i,j} for vector u_{i,j} is obtained.
For consumer i with m MIC or ζ values, a k-means clustering method with k = 2
is used to detect the MIC or ζ values of suspicious days by classifying the m days
into 2 groups. The mean of the MIC or ζ values that belong to the more suspicious
group is taken as the suspicion degree for consumer i. Thus, the two suspicion ranks
of the n consumers can be extracted by inter-comparing the n × m MIC or ζ values.
The idea of combining the two ranks is based on the famous Rank Product (RP)
method [20], which is frequently used in Biostatistics. In this chapter, we use the
arithmetic mean and the geometric mean of the two ranks to combine the methods,
as in Eq. (4.12).

Rank_{Arith} = \frac{Rank_1 + Rank_2}{2} \quad \text{or} \quad Rank_{Geo} = \sqrt{Rank_1 \times Rank_2} \qquad (4.12)
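The combination itself is a one-line operation per consumer; the snippet below is an illustrative helper, assuming that a lower rank value means a more suspicious consumer.

```python
# Illustrative combination of the MIC-based and CFSFDP-based suspicion ranks.
import numpy as np

def combine_ranks(rank_mic, rank_cfsfdp, mode="arith"):
    r1, r2 = np.asarray(rank_mic, float), np.asarray(rank_cfsfdp, float)
    if mode == "arith":
        return (r1 + r2) / 2.0          # arithmetic mean of the two ranks
    return np.sqrt(r1 * r2)             # geometric mean of the two ranks
```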
4.4.1 Dataset
We use the smart meter dataset from Irish CER Smart Metering Project [21] that
contains the load profiles of over 5000 Irish residential users and small & medium-
sized enterprises (SMEs) for more than 500 days. Because all users have completed
the pre-trial or post-trial surveys, the original data are considered ground truth. We
use the load profiles of all 391 SMEs in the dataset from July 15 to August 13, 2009.
Thus, we have 391 × 30 = 11 730 load profiles in total, and each load profile consists
of 48 points, with a time interval of half an hour. The 391 SMEs are randomly and
evenly divided into several areas with observer meters. For each area, several users
are randomly chosen as fraudulent users, and certain types of FDI are used to tamper
with their load profiles. Fifteen of the 30 load profiles of each fraudulent user are
tampered with.
Let Y_k denote the number of electricity thieves who rank in the top k, and define the precision P@k = Y_k / k. Given a number N, MAP@N is the mean of P@k as defined in Eq. (4.14):

MAP@N = \frac{\sum_{i=1}^{r} P@k_i}{r} \qquad (4.14)

where r is the number of electricity thieves who rank in the top N and k_i is the position of the ith electricity thief. We use MAP@20 in this chapter. In random guessing (RG), the true positive rate equals the false positive rate; thus, the AUC for RG is always 0.5, and the MAP for RG is |F|/(|F| + |B|), which is the proportion of electricity thieves among all users. We consider these values to be the benchmarks.
Note that all the numerical experiments in this chapter are repeated for 100 ran-
domly generated scenarios to avoid contingency among the results. The values of
AUC and MAP are calculated using the mean value to show the average perfor-
mance.
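For reference, a minimal implementation of the MAP@N metric of Eq. (4.14) is sketched below; the input conventions (a ranking of consumer IDs from most to least suspicious and a set of true fraudulent consumers) are assumptions made for the example.

```python
# Illustrative MAP@N computation per Eq. (4.14).
def map_at_n(ranking, thieves, n=20):
    hits, precisions = 0, []
    for k, consumer in enumerate(ranking[:n], start=1):
        if consumer in thieves:
            hits += 1
            precisions.append(hits / k)     # P@k at the position of each detected thief
    return sum(precisions) / len(precisions) if precisions else 0.0
```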
In this subsection, we divide the users into 10 areas and randomly choose 5 electricity
thieves for each area. Thus, there are approximately 39 users in each area, and the
ratio of fraudulent users is 12.8%.
Figure 4.7 shows the comparison results of the methods. Tables 4.2 and 4.3 show the detailed AUC and MAP@20 values of the correlation-based methods and the unsupervised clustering-based methods for the six FDI types. The type MIX
indicates that the 5 electricity thieves randomly choose one of the six types. We
believe that different fraudulent users might choose different FDI types. The results
for the detection of single FDI type show the advantage of each method under certain
situations, while the results for type MIX are of significance in practice. In CFSFDP,
the cut-off kernel is used because it is faster than the Gaussian kernel and because we
have a large dataset in which conflicts do not occur. In the application of FCM, there are 9 different results depending on the number of cluster centers, and we only present the best among them. MI denotes Kraskov's estimator for mutual information, and Arith and Geo are abbreviations for the arithmetic and geometric mean, respectively.
The best results among the 8 methods are in bold for each FDI type in Tables 4.2
and 4.3.
The results demonstrate that the correlation-based methods exhibit excellent per-
formance in detecting FDI1. The blue lines in Fig. 4.7 show that MIC has a more
balanced performance in both AUC and MAP@20. MIC also shows its superiority
in detecting type MIX. The correlation-based method performs poorly in detect-
ing FDI5 and FDI6 because the tampered load profiles become quite random, and
the correlation no longer exists. The unsupervised clustering methods, especially
CFSFDP and LOF, have quite high values of AUC in detecting FDI4, FDI5, and
FDI6; however, they have zero performance on FDI1 because, after normalization, the tampered load profiles appear exactly the same as the original load profiles. FCM has poor performance on all types except FDI6; thus, FCM may not be a good tool for electricity theft detection. Furthermore, during the numerical experiments, we noticed that the performance of FCM is heavily affected by the number of cluster centers, and it is quite impractical to tune this number over a wide range. From the black lines
in Fig. 4.7, CFSFDP is found to have the best performance in detecting FDI5, FDI6,
and type MIX among all the clustering methods. The MAP@20 of CFSFDP is much
higher than that of LOF for these types.
The combined methods take advantage of both MIC and CFSFDP. For FDI1, in which MIC specializes, the performance of our combined methods is not as good as that of MIC. However, our method still achieves a rather high AUC of 0.766 in detecting FDI1. For FDI5 and FDI6, in which CFSFDP specializes,
our methods also have high values of AUC and MAP@20. The combined methods
achieved improvements in the remaining types. The MIC-CFSFDP combined meth-
ods maintain the excellent performance of the original two methods in their own
specialized situations while achieving significant improvements in the remaining
situations, resulting in the best detection accuracy in type MIX and high and steady
Fig. 4.7 The evaluation results of the original and combined methods: (a) AUC values and (b) MAP@20 values of the methods for FDI types 1-6 and MIX
detection accuracy for FDI1 to FDI6. The AUC value for type MIX increased from
0.748 to 0.816 (approximately 10%), and the MAP@20 value for type MIX increased
from 0.693 to 0.831 (approximately 20%). The results for Arith and Geo are similar
in most cases, and Arith performs slightly better in AUC. It is worthwhile to mention
that weight factors in type MIX alter the detection accuracy. Although we assume
identical weights for the FDI types, the combined methods achieve improvements in
accuracy for other non-extreme weight factors.
Figure 4.8 shows the standard deviations σ of AUC and MAP@20 in the 100 randomly generated scenarios of type MIX for each method. σ of AUC is approximately 4%
for all the methods, and Arith has a minimum σAUC of 3.08%. σMAP@20 is distributed
between 9 and 17%. σMAP@20 of Arith and Geo are 9.16 and 9.13%, respectively,
and are smaller than those of all the other methods. The combined methods improve
both the accuracy and the stability of the original methods.
Figure 4.9 presents the average time consumption of the six methods for one
detection of the whole 11 730 load profiles. For FCM, we only show the results of
4 and 12 cluster centers. The test was done on an Intel Core i7-7900X@4.30GHz
desktop computer with 32 GB RAM. Among these methods, Kraskov's estimator for MI is the most time-consuming. The combining process only requires simple calculation and sorting, and its time consumption is less than 1 s.
Fig. 4.8 Standard deviations of AUC and MAP@20 for each method
Fig. 4.9 Average time consumption (in seconds) of the methods (MIC, PCC, MI, CFSFDP, FCM-4, FCM-12, and LOF) for one detection of the 11 730 load profiles
When applying the electricity theft detection methods in real-world conditions, the number of electricity consumers or electricity thieves per area varies over a wide range. Therefore, we conduct a sensitivity analysis in two aspects.
Fig. 4.10 Performance of the methods with different numbers of electricity thieves per area: (a) AUC values and (b) MAP@20 values
Fig. 4.11 Performance of the methods with different numbers of electricity consumers per area: (a) AUC values and (b) MAP@20 values
First, we change the number of electricity thieves per area from 1 to 7. Second, we fix the number of electricity thieves per area to 5 and change the number of electricity consumers per area from 30 to 98
(which is achieved by dividing the 391 users into 4 to 13 areas). Figures 4.10 and
4.11 show the evaluation results for the two aspects of sensitivity analysis. Due to
space limitations, we only present the results for type MIX.
Fig. 4.12 Standard deviations of the methods with different numbers of electricity thieves and electricity consumers per area
As the number of electricity thieves per area changes, we can see from the AUC
values that MIC and PCC perform well under the conditions of fewer electricity
thieves and that MI is more robust in this aspect. However, MIC and PCC perform
better in MAP@20 than MI. MIC can detect electricity thieves more precisely under
these conditions. CFSFDP always performs the best of the three unsupervised clus-
tering methods. The combined method of Arith maintains excellent performance for
both AUC and MAP@20.
As the number of electricity consumers per area increases, most of the methods
give a stable performance against the benchmark value. MIC is the best overall of
the correlation-based methods, and CFSFDP is the best among the clustering-based
methods. The combined methods achieve improvements against other methods in all
conditions.
Figure 4.12 shows the change in the standard deviations during the two aspects of sensitivity analysis. σ_AUC shows a certain trend as the number of electricity thieves or electricity consumers increases. As the electricity theft problem becomes more severe, σ_AUC decreases slightly, whereas σ_MAP@20 changes in a more disordered way. σ_MAP@20 of most methods shows an upward trend as the number of electricity consumers per area increases. Although the combined methods do not always have the smallest standard deviation, σ varies over a rather small range, which is adequate for practical application of the methods.
4.5 Conclusions
This chapter proposes a combined method for detecting electricity thefts against
AMI in the Energy Internet. We first analyze the basic structure of the observer
meters and the smart meters. Next, a correlation-based detection method using MIC
is given to quantify the association between the tampered load profiles and the NTL.
Considering the FDI types that have little association with the original data, an
unsupervised CFSFDP-based method is proposed to detect outliers in the smart
meter dataset. To improve the detection accuracy and stability, we ensemble the two
techniques by combining the suspicion ranks. The numerical results show that the
combined method achieves good and steady performance for all FDI types in various
conditions.
References
1. Jiang, R., Lu, R., Wang, Y., Luo, J., Shen, C., & Shen, X. S. (2014). Energy-theft detection
issues for advanced metering infrastructure in smart grid. Tsinghua Science and Technology,
19(2), 105–120.
2. Federal Bureau of Investigation. (2012). Cyber intelligence section: smart grid electric meters
altered to steal electricity.
3. Fujian Daily. (2013). The first high-tech smart meter electricity theft case in China reported
solved.
4. McDaniel, P., & McLaughlin, S. (2009). Security and privacy challenges in the smart grid.
IEEE Security & Privacy, 7(3), 75–77.
5. Jokar, P., Arianpoo, N., & Leung, V. C. M. (2016). Electricity theft detection in AMI using
customers’ consumption patterns. IEEE Transactions on Smart Grid, 7(1), 216–226.
6. Nizar, A. H., Dong, Z., & Wang, Y. (2008). Power utility nontechnical loss analysis with
extreme learning machine method. IEEE Transactions on Power Systems, 23(3), 946–955.
7. Zheng, Z., Yatao, Y., Niu, X., Dai, H.-N., & Zhou, Y. (2018). Wide & deep convolutional neural
networks for electricity-theft detection to secure smart grids. IEEE Transactions on Industrial
Informatics, 14(4), 1606–1615.
8. Ahmad, T., Chen, H., Wang, J., & Guo, Y. (2018). Review of various modeling techniques for
the detection of electricity theft in smart grid environment. Renewable and Sustainable Energy
Reviews, 82, 2916–2933.
9. Passos, L. A. Jr., Oba Ramos, C. C., Rodrigues, D., Pereira, D. R., de Souza, A. N., Pontara
da Costa, K. A., & Papa, J. P. (2016). Unsupervised non-technical losses identification through
optimum-path forest. Electric Power Systems Research, 140, 413–423.
10. Zanetti, M., Jamhour, E., Pellenz, M., Penna, M., Zambenedetti, V., & Chueiri, I. (2017). A
tunable fraud detection system for advanced metering infrastructure using short-lived patterns.
IEEE Transactions on Smart Grid, 10(1), 830–840.
11. Sun, M., Konstantelos, I., & Strbac, G. (2016). C-vine copula mixture model for clustering of
residential electrical load pattern data. IEEE Transactions on Power Systems, 32(3), 2382–2393.
12. Aranha Neto, E. A. C., & Coelho, J. (2013). Probabilistic methodology for technical and
non-technical losses estimation in distribution system. Electric Power Systems Research, 97,
93–99.
13. Leite, J. B., & Mantovani, J. R. S. (2016). Detecting and locating non-technical losses in modern
distribution networks. IEEE Transactions on Smart Grid, 9(2), 1023–1032.
14. Cárdenas, A., Amin, S., Schwartz, G., Dong, R., & Sastry, S. (2012). A game theory model for
electricity theft detection and privacy-aware control in AMI systems. In 50th Annual Allerton
Conference on Communication, Control, and Computing (Allerton), 2012 (pp. 1830–1837).
Monticello: IEEE.
15. Amin, S., Schwartz, G. A., Cardenas, A. A., & Sastry, S. S. (2015). Game-theoretic models of
electricity theft detection in smart utility networks: Providing new capabilities with advanced
metering infrastructure. IEEE Control Systems, 35(1), 66–81.
16. Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J.,
Lander, E. S., Mitzenmacher, M., & Sabeti, P. C. (2011). Detecting novel associations in large
data sets. Science, 334(6062), 1518–1524.
17. Han, W., & Xiao, Y. (2016). Combating TNTL: Non-technical loss fraud targeting time-based
pricing in smart grid. In International Conference on Cloud Computing and Security (pp.
48–57). Berlin: Springer.
18. Yijia, T., & Hang, G. (2016). Anomaly detection of power consumption based on waveform
feature recognition. In 2016 11th International Conference on Computer Science & Education
(ICCSE) (pp. 587–591). Nagoya: IEEE.
19. Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science,
344(6191), 1492–1496.
20. Breitling, R., Armengaud, P., Amtmann, A., & Herzyk, P. (2004). Rank products: a simple,
yet powerful, new method to detect differentially regulated genes in replicated microarray
experiments. FEBS Letters, 573(1–3), 83–92.
21. Commission for Energy Regulation (CER). (2012). CER Smart Metering Project - electricity
customer behaviour trial, 2009–2010. Irish Social Science Data Archive. SN: 0012-00.
22. Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical
Review E, 69(6), 066138.
23. Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). LOF: identifying density-based
local outliers. In ACM sigmod record (Vol. 29, pp. 93–104). New York: ACM.
Chapter 5
Residential Load Data Generation
5.1 Introduction
Residential load data play an important role in various research and application fields. However, the large-scale and real-time collection of residential load data still remains a big challenge. First, collecting a large volume of load data is still costly due to the technical barriers of data storage and communication. Second, processing and analyzing real load data might bring potential legal risks due to the rising privacy concerns of customers and the promulgation of relevant laws in recent years [1]. Generally speaking, although smart meters have been widely deployed in many areas around the world, the recorded load data still cannot be fully utilized at present. To address the shortage of available residential load data, researchers propose to generate synthetic loads as an alternative [2].
Existing load generation methods can be classified into two categories: bottom-up and top-down. Bottom-up methods decompose the total electricity consumption of the household into the loads of individual appliances. This kind of approach mainly includes two steps. First, construct the electrical model and usage model for different appliances. Next, generate load profiles of the active appliances and sum them up to form the total profile. Capasso et al. present a model of residential end-users to establish
the load diagram of an area in [3]. The total consumption is constructed from the rel-
evant socioeconomic and demographic characteristics, unitary energy consumption
and load profiles of individual household appliances. McKenna et al. propose a model
capturing closed-loop load behavior with bottom-up load modeling techniques for
the residential sector in [4]. It incorporates time-variant load models and discrete
state-space representation of loads of thermal appliances. Tsagarakis et al. convert
user activity profiles into load profiles in [5]. The user activity profiles, including
time series of daily resident activities, are based on an appliance ownership statis-
tics and electrical characteristics dataset. Dickert et al. present a time-series-based
probabilistic load curve model for residential customers in [6]. The total loads are
constructed by investigating each possible appliance, respective power consumption,
frequency of use, turn-on time, operating time as well as the potential correlation
between appliances. Collin et al. propose a Markov chain Monte Carlo method to gen-
erate load profiles based on the electrical characteristics of appliances [7]. Stephen
et al. propose a Markov chain-based generating method derived from practice theory
of human behavior [8]. To conclude, bottom-up approaches model the residential load based on the details of end-use appliances and are thus interpretable and precise. However, these approaches usually require high computational effort as well as additional historical and statistical data.
Top-down methods consider the residential load as a whole and fit the relationship
between the load and relevant influence factors. Labeeuw et al. propose a top-down
model based on a dataset of over 1300 load profiles in [9]. The load profiles are
first clustered by a mixed model. Then two Markov models are used to construct
the user behavior and randomness of the behavior respectively. Xu et al. compare
the bottom-up model with an agent-based approach and top-down model based on
neural networks in [10]. Uhrig et al. use the generalized extreme value distribution to
describe the distribution of residential loads in [11]. By introducing corresponding
transition matrices, the synthetic load profiles are generated directly with Markov
chains. Gu et al. propose to use the Generative Adversarial Network (GAN) to gen-
erate residential load under typical user behavior groups in [12]. The GAN model
can generate synthetic load profiles from random noise upon finishing training the
neural networks. To conclude, top-down models are mainly data-driven and of low
model complexity. They are suitable for scenarios where low computational effort is
preferred to the consumption details of end-use appliances.
Different from the industrial or commercial load, the residential load has strong randomness and volatility and is difficult to predict. Thus, conventional methods have difficulty in balancing the model complexity and the fidelity of synthetic loads. Numerical experiments in [12] indicate that the GAN model is suitable for load generation. First, GANs are of low complexity due to their general architecture and standard training process. Second, GANs are of low computational cost; once training is finished, they can be used to generate synthetic loads quickly. Third, the generated loads follow a distribution similar to that of the real loads without losing
diversity. Due to these fine properties, GANs have become a new research hotspot
of generative models in recent years. Various variants have been derived based on
the original GAN. In this chapter, we test several classic and popular GAN vari-
ants for residential load synthesis. A comprehensive investigation is conducted to
find the most proper GAN variant. Different metrics are applied to evaluate model
performance.
5.2 Model
Basic GANs contain two parts: one generator and one discriminator [13]. The generator converts random noise vectors into synthetic samples that follow the same distribution as the real samples. The discriminator tells whether an input sample is real or synthetic. Both the generator and the discriminator are composed of neural networks. Their structures should be designed according to the specific task; e.g., convolution and transposed-convolution layers are commonly used in image generation tasks, while fully connected layers are widely used in vectorized data generation tasks. The basic structure sketch of the GAN is shown in Fig. 5.1.
Since a GAN is constructed from neural networks, the training process is similar to that of traditional networks. Both the generator and the discriminator search for optimal parameters by stochastic gradient descent of their loss functions. The main difference is that the training of the two networks must be handled synchronously. In practice, the two networks are trained alternately to ensure that their abilities remain balanced. During the training process, the discriminator gets better at distinguishing real samples from synthetic ones, while the generator gets better at generating samples that can fool the discriminator. When the game between the generator and the discriminator reaches a Nash equilibrium, neither improves further through training. The discriminator can no longer tell the difference between real and generated samples, and we can then use the generator to produce synthetic samples.
Denote the trainable parameters of the generator and the discriminator as θ_G and θ_D, the mapping functions of the generator and the discriminator as G(·) and D(·), the random noise vector subject to a given distribution as z ∼ p(z), the synthetic and real samples as x_s and x_r, and the distribution of real samples as p(x_r). From the generator we have

x_s = G(z; \theta_G) \qquad (5.1)
Since the discriminator outputs the probability that the input sample is real, the loss function of the discriminator l_D can be defined as

l_D = -\mathbb{E}_{x_r \sim p(x_r)} \log D(x_r; \theta_D) - \mathbb{E}_{z \sim p(z)} \log\left(1 - D(G(z; \theta_G); \theta_D)\right) \qquad (5.2)

The l_D decreases during training, indicating that the expectation of the discriminator output for real samples tends to 1 while that for synthetic samples tends to 0. The
generator aims at confusing the discriminator: it outputs synthetic samples that make the discriminator give wrong judgments. Thus, the loss function of the generator l_G can be defined as

l_G = -\mathbb{E}_{z \sim p(z)} \log D(x_s; \theta_D) \qquad (5.3)

The l_G decreases during training, indicating that the expectation of the discriminator output for synthetic samples tends to 1. The parameters θ_G and θ_D are updated by gradient descent of l_D and l_G until the two loss functions converge.
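The alternating update just described can be written compactly in PyTorch. The sketch below is a minimal training loop under the losses of Eqs. (5.2)-(5.3); G, D, loader (a DataLoader yielding batches of real 1 × 7 × 48 load images), nz, and the optimizer settings are assumptions for the example rather than the book's exact configuration.

```python
# A minimal PyTorch sketch of alternating GAN training with the vanilla losses.
import torch

def train_gan(G, D, loader, nz, epochs=100, lr=2e-4, device="cpu"):
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    eps = 1e-8                                         # numerical safety inside the logs
    for _ in range(epochs):
        for x_real in loader:                          # batch of real load images
            x_real = x_real.to(device)
            z = torch.randn(x_real.size(0), nz, device=device)
            # Discriminator step: push D(x_real) towards 1 and D(G(z)) towards 0.
            x_syn = G(z).detach()
            l_d = -(torch.log(D(x_real) + eps).mean()
                    + torch.log(1 - D(x_syn) + eps).mean())
            opt_d.zero_grad(); l_d.backward(); opt_d.step()
            # Generator step: push D(G(z)) towards 1.
            l_g = -torch.log(D(G(z)) + eps).mean()
            opt_g.zero_grad(); l_g.backward(); opt_g.step()
```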
To design GANs for residential load generation, we need to take the load characteristics into consideration. First, smart meters in residences usually record the load every 15 or 30 min; gathering the recorded load points in a day, we can form a daily load curve, which can be viewed mathematically as a 1D vector. Second, the household load is closely related to the living and working habits of the family members, which often take a week as a cycle. Third, neighboring loads within a daily load curve are closely related. Fourth, loads at similar time intervals on different weekdays are closely related. Based on the first and second points, we arrange the daily load curves of a complete week as rows to form a 2D load matrix. The load matrix can be viewed as a one-channel image with every load data point as a pixel. According to the third and fourth points, a load matrix is similar to an image, since in both cases each pixel is relevant to its neighboring pixels. By transforming load curves into load matrices, we can generate loads for a complete week synchronously without missing the relevance among loads on different weekdays. Also, we can use convolutional layers in the generating and discriminating networks to discover the deep features behind the load pattern.
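The conversion from a weekly load curve to a load matrix is a simple reshape; the snippet below illustrates it for half-hourly sampling (336 points per week, as used in the next paragraph), with the function name chosen for the example.

```python
# Illustrative conversion of one week of half-hourly readings into a 1 x 7 x 48 image.
import numpy as np

def weekly_load_matrix(week_readings):
    x = np.asarray(week_readings, dtype=float)
    assert x.size == 7 * 48, "expects one week of 30-min readings"
    return x.reshape(1, 7, 48)          # channel x days x intervals-per-day
```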
Suppose the residential loads are sampled every 30 min, then we have 336 points in
a week. Arrange them to form a one-channel 2D image with a size of 1 × 7 × 48. Then
the output size of the generator and the input size of the discriminator are determined. Following the guidelines above, the design of the network architecture is given below.
5.2.2.1 Generator
Denote the length of the input noise vector as Nz , the height and width of the output
image as Nh and Nw . First, we use a fully-connected layer to map the 1 × Nz input
to a higher dimensional space. The output size of the first layer is 1 × N f c1 . Here
N f c1 is given by
N f c1 = 128 × Nh /4 × Nw /4 (5.4)
Reshape the 1 × N f c1 vector into a 128-channels image with the size of Nh /4 ×
Nw /4. The second layer is a fractional-strided convolutional layer to up-sample
the first layer output. Number of input channels of this layer is set as 128, number
of output channels is set as 128 too, size of the convolving kernel is set as 3 × 3,
stride of the convolution is set as 2, number of zero-padding that will be added to
both sides of each dimension in the input and output image are both set as 1. The
output size of this layer can be derived by
H_out = (H_in - 1) × stride - 2 × input_padding + (kernel[1] - 1) + output_padding + 1   (5.5)
W_out = (W_in - 1) × stride - 2 × input_padding + (kernel[2] - 1) + output_padding + 1   (5.6)
Thus, the output of this layer is a 128-channel image with the size of N_h/2 × N_w/2. In the third layer, we apply a batch norm layer to normalize the input; the mean and standard deviation are calculated per dimension over the batches, and the size of this layer's output remains unchanged. In the fourth layer, we use the ReLU as the activation function. It is an element-wise function given by

ReLU(x) = max(0, x)   (5.7)

From the fifth to the seventh layer, we place a fractional-strided convolutional layer, a batch norm layer, and an activation layer in turn. The parameters of these layers are the same as before except that the number of output channels of the fractional-strided convolutional layer is set to 64. The output of the seventh layer is then a 64-channel image with the size of (N_h + 1) × N_w (when N_h is odd, e.g., 7). In the eighth layer, we use a convolutional layer to regularize the output channel and size. The number of input channels is set to 64, the number of output channels is set to 1, the size of the convolving kernel is set to 4 × 3, the stride of the convolution is set to 1, and the number of zero-padding added to both sides of each dimension of the input is set to 1. The output size of this layer can be derived by

H_out = floor((H_in + 2 × padding - kernel[1]) / stride) + 1   (5.8)
W_out = floor((W_in + 2 × padding - kernel[2]) / stride) + 1   (5.9)
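A PyTorch sketch of a generator following this description, specialized to 7 × 48 load images, is given below. The rounding of N_h/4 (7/4 taken as 2, giving a 128 × 2 × 12 feature map of 3072 units) and the absence of an output activation are assumptions made to keep the tensor shapes consistent; this is an illustration, not the book's exact network.

```python
# Illustrative generator for 1 x 7 x 48 load images, following the layer description above.
import torch
import torch.nn as nn

class LoadGenerator(nn.Module):
    def __init__(self, nz=100):
        super().__init__()
        self.fc = nn.Linear(nz, 128 * 2 * 12)     # noise -> 128 x 2 x 12 feature map
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 128, 3, stride=2, padding=1, output_padding=1),  # -> 128 x 4 x 24
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),   # -> 64 x 8 x 48
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=(4, 3), stride=1, padding=1),               # -> 1 x 7 x 48
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 2, 12)
        return self.net(h)
```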
5.2.2.2 Discriminator
The input of the discriminator is real or synthetic load images. We use convolutional
layers to down-sample the original image and map it to the real/fake binary space.
The network architecture of the discriminator is approximately symmetrical to the
generator. The convolutional layers in the discriminator have same parameters except
for the number of input and output channels. The size of the convolving kernel is set
as 3 × 3, the stride of the convolution is set as 2, and the number of zero-padding that
will be added to both sides of each dimension in the input is set as 1. In the first layer
we place a convolutional layer. The number of input channels is 1; the number of output channels is set to 16. The output of this layer is then a 16-channel image with the size of 4 × 24 according to Eqs. 5.8 and 5.9. In the second layer, we use the LeakyReLU with negative_slope 0.2 as the activation function. It is an element-wise function given by

LeakyReLU(x) = x if x ≥ 0, and 0.2x otherwise   (5.10)
In the third layer, we use a dropout2D layer that randomly zeros out entire channels of the input with a probability of 0.25. The main difference between dropout2D and normal dropout is that the former abandons whole channels of the input while the latter abandons individual pixels. As described in [15], if adjacent pixels within feature maps are strongly correlated (as is often the case in early convolution layers), then normal dropout will not regularize the activations and will otherwise just result in an effective learning-rate decrease. Under this circumstance, dropout2D helps promote independence between feature maps and should be used instead. From
the fourth to the seventh layer, we use a convolutional layer with the number of
input channels as 16, the number of output channels as 32, a leaky-ReLU activation
function layer, a dropout2D layer and a batch norm layer in turn. The output of the
seventh layer is a 32-channels image with a size of 2 × 12. From the eighth to the
eleventh layer, we use a convolutional layer with the number of input channels as
32, the number of output channels as 64, a leaky-ReLU activation function layer, a
dropout2D layer and a batch norm layer in turn. The output of the eleventh layer is a
64-channels image with a size of 1 × 6. Reshape the output into a 1D vector with a
size of 1 × 384. Use a fully-connected layer to map the vector into the binary space
of real/fake in the twelfth layer. Finally use the Sigmoid activation to regularize the
output value within [0, 1]. Layers in the discriminator and their parameters are listed
in Table 5.2.
The network architecture presented in this part is a general design. For the variants of the GAN, the network architecture needs fine-tuning.
In this part, we introduce unclassified GAN variants for residential load generation.
The structure and network architecture are inherited from Tables 5.1 and 5.2. Loss
functions for the generator and discriminator are also defined in this part.
In order to overcome the instability and poor convergence when training the GAN,
boundary equilibrium GAN (BEGAN) modifies the output of the discriminator and
loss functions. The discriminator reconstructs the input instead of classifying it.
The network architecture of the discriminator is presented in Table 5.3. First, the
discriminator uses convolutional and fully-connected layers to down-sample the input
to the feature space. Then we use fully-connected and fractional-strided convolutional
layers to up-sample the features to the original space. The network architecture of
the generator keeps the same as Table 5.1.
To define the loss function for the BEGAN, we first introduce the l_1 distance. For 2D images x_1 and x_2 with a height of N_h and a width of N_w pixels, their l_1 distance can be expressed as

l_1(x_1, x_2) = \frac{\sum_{i=1}^{N_w} \sum_{j=1}^{N_h} |x_1(i, j) - x_2(i, j)|}{N_w \times N_h} \qquad (5.11)
Denote the noise vectors and real load images sampled in the tth training step as z_t and x_r^t. The reconstruction error of a real sample can be expressed as

\mathcal{L}(x_r^t) = l_1(x_r^t, D(x_r^t)) \qquad (5.12)

and, analogously, the reconstruction error of a synthetic sample is

\mathcal{L}(G(z_t)) = l_1(G(z_t), D(G(z_t))) \qquad (5.13)

Then the loss functions of the discriminator and the generator can be defined as

l_D = \mathcal{L}(x_r^t) - k_t \mathcal{L}(G(z_t)) \qquad (5.14)
l_G = \mathcal{L}(G(z_t)) \qquad (5.15)
The balancing variable k_t is updated at each training step as

k_{t+1} = k_t + \lambda\left(\gamma \mathcal{L}(x_r^t) - \mathcal{L}(G(z_t))\right) \qquad (5.16)

In the formula above, λ is the update step of k, and γ ∈ [0, 1] is the parameter that determines the diversity of the synthetic samples: the larger γ is, the more diverse the synthetic samples are. According to [16], we set k_0 = 0, λ = 0.001, and γ = 0.9. During the iterations, if k exceeds the bound [0, 1], it is clipped.
The boundary-seeking GAN (BGAN) retains network architectures in Tables 5.1 and
5.2. It modifies the loss function from vanilla GAN so that the generator could produce
samples on the decision boundary of the current discriminator. As proposed in [17],
the optimal generator is the one that can make the discriminator be 0.5 everywhere.
In order to make the discriminative results of synthetic samples D(G(z)) near the
decision boundary, the BGAN tries to minimize the distance between D(G(z)) and
1 − D(G(z)). Thus, the loss function of the generator is
l_G = \frac{1}{2}\, \mathbb{E}_{z \sim p(z)} \left[\left(\log D(G(z)) - \log(1 - D(G(z)))\right)^2\right] \qquad (5.17)
The loss function of the discriminator remains unchanged and is given in Eq. 5.2.
Besides the instability during training, the vanilla GAN has some other problems; e.g., it easily encounters mode collapse, and its loss function cannot indicate the training progress. The Wasserstein GAN (WGAN) solves these problems by redesigning the network architecture and the loss function. Mathematically, optimiz-
ing the vanilla GAN is equivalent to minimizing the Jensen–Shannon divergence
between the distribution of real samples and generated samples [18]. However, if the
two distributions do not overlap, or their overlapping parts are negligible in high-dimensional space, their JS divergence is constant. Under such circumstances, the JS divergence can neither reflect the distance nor provide meaningful gradients for training
the networks. WGANs use the Wasserstein distance to measure the similarity between
the real and synthetic distribution. The advantage of Wasserstein distance over JS
divergence is that even if two distributions do not overlap, Wasserstein distance
can still reflect their distance. Briefly, the JS divergence is discontinuous while the
Wasserstein distance is smooth. When we use the gradient descent method to opti-
mize the trainable parameters in neural networks, the JS divergence cannot provide gradients at all, while the Wasserstein distance can.
To implement the Wasserstein distance in practice, [18] suggested the following
modifications.
• Each time the discriminator parameters are updated, clip their values into a fixed range, e.g., [−0.01, 0.01].
• Use the RMSProp optimizer instead of the Adam optimizer.
However, the WGAN is also hard to train in practice. In the vanilla WGAN, the trainable parameters of the discriminator are clipped into a given range to satisfy the Lipschitz condition. This brings two main problems. First, the parameters become concentrated on the boundary; in other words, parameters are either maximized or minimized. As a result, the network tends to learn an overly simple mapping function. Second, weight clipping may cause gradient vanishing or explosion. If the clipping threshold is set slightly too small, the gradient decreases exponentially after several layers, leading to gradient vanishing; on the contrary, if it is set slightly too large, the gradient increases exponentially after several layers, leading to gradient explosion.
The authors of the WGAN propose corresponding improvements in [19]. The
solution is that we do not need to impose the Lipschitz restriction on the whole space;
instead, we only need to impose it where the generated and real samples gather and in
the area between them. In practice, we add a penalty term to the loss function of the
discriminator. Denote the synthetic sample as x_s, the real sample as x_r, and a random
variable ε drawn uniformly from [0, 1]. First, we randomly interpolate on the segment
between x_s and x_r:

x̂ = ε x_r + (1 − ε) x_s    (5.20)

A gradient penalty term λ ( ||∇_{x̂} D(x̂)||_2 − 1 )^2, evaluated at the interpolated points,
is then added to the discriminator loss, where λ is the weight of the penalty. We set λ = 10
in this chapter since [19] found that it works well across a variety of architectures and
datasets.
The network architectures of the discriminator and generator are those in Tables 5.1
and 5.2, except that the batch norm layers in the discriminator are removed. As suggested
in [19], all the batch norm layers in the discriminator are omitted since we penalize the
norm of the discriminator's gradient with respect to each input independently, not the
entire batch.
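A sketch of the gradient penalty described above, in the spirit of [19]; the discriminator interface, tensor shapes, and helper name are assumptions for illustration.

import torch

def gradient_penalty(D, x_real, x_fake, lam=10.0):
    # x_real, x_fake: batches of real and synthetic samples with identical shape.
    batch_size = x_real.size(0)
    # One interpolation coefficient per sample, broadcast over the other dims (Eq. 5.20)
    eps = torch.rand(batch_size, *([1] * (x_real.dim() - 1)), device=x_real.device)
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    x_hat.requires_grad_(True)

    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat.sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    # Penalize deviations of the gradient norm from 1; add this term to the
    # discriminator loss with weight lam.
    return lam * ((grad_norm - 1.0) ** 2).mean()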
The conditional GAN (CGAN) modifies the model input: load curve labels are added
to both the generator and the discriminator [21]. Denote the number of categories as
K and the label as y. First, we apply One-Hot encoding to process the label. After
One-Hot encoding, y is converted into a 1 × K vector filled with 0 except that the
kth entry is 1, indicating that the sample belongs to the kth category.
Since the generator has an additional input vector, we modify its architecture in
Table 5.1. The new architecture is shown in Fig. 5.2. We replace the first layer with
two parallel fully-connected layers, each of which outputs a 1D feature vector with
a size of 1 × 1536. Then we concatenate the two vectors to form a 1 × 3072 vector.
Subsequent layers remain unchanged.
Similar to the modification of the generator, we add labels to the discriminator
input. One-Hot encoding is used to convert the label y into a K-channel image
of size 7 × 48; all channels are filled with 0 except that the kth channel is filled with 1.
The new architecture is shown in Fig. 5.3. We replace the first layer in Table 5.2
with two convolutional layers, one with a 1-channel input and an 8-channel output and
the other with a K-channel input and an 8-channel output. The stride step and kernel size
are the same as for the other convolutional layers in Table 5.2. The two outputs are
concatenated to form a 16-channel image. Subsequent layers remain unchanged.
Loss functions of the CGAN remain unchanged as defined in Eqs. 5.2 and 5.3.
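The label-conditioned input stage of the generator can be sketched as follows; only the replaced first layer is shown, the module name is ours, and the labels are assumed to be One-Hot encoded already.

import torch
import torch.nn as nn

class CGANGeneratorInput(nn.Module):
    # Sketch of the modified CGAN generator input (Fig. 5.2): the noise vector
    # and the One-Hot label are mapped by two parallel fully-connected layers
    # and concatenated into a 1 x 3072 feature; subsequent layers are unchanged.
    def __init__(self, noise_dim, num_classes, feature_dim=1536):
        super().__init__()
        self.fc_noise = nn.Linear(noise_dim, feature_dim)
        self.fc_label = nn.Linear(num_classes, feature_dim)

    def forward(self, z, y_onehot):
        return torch.cat([self.fc_noise(z), self.fc_label(y_onehot)], dim=1)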
5.2.4.2 InfoGAN
cross_entropy(ŷ, k) = − log( exp(ŷ_k) / Σ_{i=1}^{K} exp(ŷ_i) )    (5.23)
The reconstruction error is the mean squared error between each element in the
recovered latent information ĉ = D_con(x_s) and the input latent information c:

mse(ĉ, c) = ( Σ_{i=1}^{N_l} (ĉ_i − c_i)^2 ) / N_l    (5.24)

The information loss l_info is the weighted sum of the classification error and the
reconstruction error, with weights λ_cate and λ_con, which are both set to 1 here. In every
training step, the generator and discriminator are first updated according to l_G and l_D,
respectively, and then updated together according to l_info.
The ACGAN is the latest variant of the classified generative models and has been widely
used to generate labeled samples. The generator takes the noise vector and the
label as input and outputs synthetic samples of the given type. Different from the CGAN,
we apply an embedding layer to process the label instead of One-Hot encoding. The
network architecture of the generator is shown in Fig. 5.4. After embedding the
label into the noise space, we multiply the embedded label and the noise to incorporate
the randomness and type information. Subsequent layers remain unchanged as in
Table 5.1. The discriminator outputs the truth probability and the label prediction. It is
almost the same as that of the InfoGAN; we only need to omit the 12(3) layer in
Table 5.4.
Denote the truth probability as D_disc(x), the predicted label as D_cate(x), the
synthetic sample and label as x_s and y_s, and the real sample and label as x_r and y_r. The
objective of the generator is to make the truth probability D_disc(x_s) approximate 1
and the classification accuracy higher. Thus its loss function can be defined as

l_G = (1/2) [ −E_{z∼p(z)} log D_disc(x_s) + cross_entropy(D_cate(x_s), y_s) ]    (5.26)

The objective of the discriminator is composed of two parts. In addition to making
D_disc(x_r) approximate 1 and D_disc(x_s) approximate 0, the discriminator should
increase the classification accuracy for both real and synthetic samples. Thus its loss
function can be defined as

l_D = (1/2) [ −E_{z∼p(z)} log(1 − D_disc(x_s)) + cross_entropy(D_cate(x_s), y_s) ]
    + (1/2) [ −E_{x_r∼p(x_r)} log D_disc(x_r) + cross_entropy(D_cate(x_r), y_r) ]    (5.27)
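A compact sketch of the ACGAN losses in Eqs. (5.26) and (5.27); it assumes that the discriminator's truth output is already a probability, that its category output is a vector of logits fed to a softmax cross-entropy as in Eq. (5.23), and that the two losses are computed on separate forward passes in practice. Variable names are illustrative.

import torch
import torch.nn.functional as F

def acgan_losses(d_disc_real, d_cate_real, y_real,
                 d_disc_fake, d_cate_fake, y_fake, eps=1e-8):
    # d_disc_*: truth probabilities D_disc(x); d_cate_*: class logits D_cate(x);
    # y_*: integer class labels of the real / generated samples.
    g_loss = 0.5 * (-torch.log(d_disc_fake + eps).mean()
                    + F.cross_entropy(d_cate_fake, y_fake))        # Eq. (5.26)
    d_loss = (0.5 * (-torch.log(1.0 - d_disc_fake + eps).mean()
                     + F.cross_entropy(d_cate_fake, y_fake))
              + 0.5 * (-torch.log(d_disc_real + eps).mean()
                       + F.cross_entropy(d_cate_real, y_real)))    # Eq. (5.27)
    return g_loss, d_loss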
5.3 Methodology
In this section, we will present the methodology to generate residential load data. It
contains three steps including data preprocessing, model training, and model evalu-
ation. Different metrics to evaluate the generation performance are also given in this
part.
Data preprocessing includes two stages. The first stage is data cleaning and
regularization; the second is data clustering and labeling.
Smart meters may suffer errors during measurement, storage, communication, etc.,
so some absurd or missing data are unavoidable in the whole dataset. We first omit
samples that contain null or negative load values. After removing the bad data, we
apply l1 norm regularization to each sample. Denote the weekly load curve as x with
the size of 1 × N and the regularized curve as x̂; then we have

x̂ = x / Σ_{i=1}^{N} x_i    (5.28)

After regularization, the sum of all points in a load curve equals 1. The reason for
applying l1 norm regularization is that we care more about the consumption pattern
than about its absolute value.
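A minimal numpy sketch of this cleaning and l1 regularization step (Eq. 5.28); the function name and the input layout (one weekly curve per row) are assumptions.

import numpy as np

def preprocess(curves):
    # curves: array of shape (num_samples, N), one weekly load curve per row.
    curves = np.asarray(curves, dtype=float)
    # Drop rows that contain missing (NaN) or negative load values
    valid = ~np.isnan(curves).any(axis=1) & (curves >= 0).all(axis=1)
    cleaned = curves[valid]
    # Keep only curves with a positive total, then scale each row to sum to 1
    cleaned = cleaned[cleaned.sum(axis=1) > 0]
    return cleaned / cleaned.sum(axis=1, keepdims=True)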
To generate load curves of a specific type, we need to classify the dataset before
training. k-means clustering is used to label the load curves in this chapter. We use
the Silhouette Coefficient (SC) and sum of the squared errors (SSE) to find the best
k for the dataset.
Denote the dataset as {x_1, x_2, ..., x_N}, the clusters as {S_1, S_2, ..., S_k}, the
clustering centers as {c_1, c_2, ..., c_k}, and the distance function as d(x_i, x_j). The SC
measures the cohesion within clusters and the separation among clusters. Suppose
x_i belongs to the jth cluster; then the cohesion of x_i is its mean distance from the other
samples in S_j:

a_i = ( Σ_{x∈S_j} d(x_i, x) ) / |S_j|    (5.29)

In the formula above, |·| returns the size of a set. The separation of x_i is its mean
distance from all samples in S_p, where the center c_p is the nearest center to x_i except
c_j:

b_i = ( Σ_{x∈S_p} d(x_i, x) ) / |S_p|,    p = arg min_{l≠j} d(x_i, c_l)    (5.30)

The SC of x_i is

SC_i = (b_i − a_i) / max(a_i, b_i)    (5.31)

For the whole dataset, the SC equals the mean over all points:

SC = ( Σ_{i=1}^{N} SC_i ) / N    (5.32)
The range of the SC is [−1, 1]. A high SC indicates that samples of the same category are
close while samples of different categories are distant. Since the SC depends strongly on
the data distribution, we care more about its trend with respect to different k than about
its absolute value.
The SSE equals the sum of squared errors between samples and their clustering
centers, which is defined as

SSE = Σ_{i=1}^{k} Σ_{x∈S_i} ||x − c_i||^2    (5.33)
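The choice of k can be scripted with scikit-learn as sketched below; silhouette_score corresponds to the SC of Eq. (5.32) and the inertia_ attribute of KMeans corresponds to the SSE of Eq. (5.33). The function and parameter names are illustrative.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def evaluate_k(daily_curves, k_range=range(2, 15), seed=0):
    # daily_curves: array of shape (num_samples, 48) of averaged daily curves.
    sc, sse = {}, {}
    for k in k_range:
        km = KMeans(n_clusters=k, random_state=seed).fit(daily_curves)
        sc[k] = silhouette_score(daily_curves, km.labels_)
        sse[k] = km.inertia_   # sum of squared distances to cluster centers
    return sc, sse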
The training of GAN models includes three steps: initialization, iteration, and gen-
eration.
First, we initialize the trainable parameters in the network and set the hyperparameters
that control the training process. Initialization configurations of different GAN vari-
ants are shown in Table 5.5. In the table, epoch is the number of times that the
training set is traversed. Batch size is the number of samples trained per itera-
tion. Optimizer is the gradient-descent algorithm used during training. Learning
rate and betas are parameters of the optimizer. Noise dim is the length of the noise
vector. Latent dim is the length of the latent information vector (for the InfoGAN only).
Ncritic is the ratio of the discriminator training frequency to the generator training fre-
quency. Trainable parameters in convolution and fractional-convolution layers are
initialized according to a normal distribution with the mean and standard deviation in
Conv Initial. Trainable parameters in fully-connected layers are initialized accord-
ing to a normal distribution with the mean and standard deviation in Dense Initial.
The hyper-parameters in Table 5.5 are determined by suggestions in the existing
literature that have been found to work well across a variety of architectures and datasets.
Second, we iterate the trainable parameters in the model over the batched samples. The
iteration algorithm is determined by the optimizer. Two optimizers are used in this
chapter, RMSProp for the WGAN and Adam for the others; both are based on the
gradient descent algorithm. Denote the loss function of the network as l and the
trainable parameters as θ. At iteration t, the gradient and the first- and second-order
momenta are
g_t = ∇_θ l(θ)    (5.34)

m_t = φ(g_1, g_2, ..., g_t)    (5.35)

v_t = ϕ(g_1, g_2, ..., g_t)    (5.36)

θ is updated according to

θ_{t+1} = θ_t − m_t / sqrt(v_t + ε)    (5.37)

For the Adam optimizer, m_t and v_t are exponential moving averages of the gradient and
the squared gradient with decay rates β_1 and β_2, respectively, where β_1 and β_2 are the
hyper-parameters betas in Table 5.5 and η is the learning rate that scales the update.
For the RMSProp optimizer, the first-order momentum is simply the scaled gradient,

m_t = η g_t    (5.40)

and the second-order momentum is an exponential moving average of the squared gradient
controlled by the smoothing constant γ, whose default value is 0.99. At the beginning of
training, m_0 = 0 and v_0 = 0.
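In torch, the two optimizers described above are configured as sketched below; the learning rates and betas here are placeholders standing in for the entries of Table 5.5, not the chapter's exact values.

import torch

def build_optimizers(generator, discriminator, variant="acgan"):
    if variant == "wgan":
        # RMSProp for the vanilla WGAN; alpha plays the role of the
        # smoothing constant gamma (default 0.99)
        g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5, alpha=0.99)
        d_opt = torch.optim.RMSprop(discriminator.parameters(), lr=5e-5, alpha=0.99)
    else:
        # Adam with betas = (beta1, beta2) for the other GAN variants
        g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
        d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    return g_opt, d_opt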
Taking the Adam optimizer as an example, the training process of a GAN model is
shown in Algorithm 1. It should be noted that the presented algorithm is the general
case and needs fine-tuning for different GAN variants where necessary. For example, for
the InfoGAN, the generator and discriminator need to be updated once again by the
gradient of the information loss.
2: Initialize the first- and second-order momenta of the generator and discriminator: m_G^0 = 0, v_G^0 = 0, m_D^0 = 0, v_D^0 = 0.
3: Shuffle the training set and pack it into Nb = N /bs batches (N is the volume of training set).
4: for each i = 1, 2, · · · , Ne do
5: for each j = 1, 2, · · · , Nb do
6: t = (i − 1) ∗ Nb + j
7: Get batched real samples x_r and y_r.
8: Get random noise vectors and labels, denoted as z and y_s.
9: Generate synthetic samples x_s = G(z, y_s; θ_G).
10: Get discriminative results of real and synthetic samples D(x_r, y_r; θ_D) and D(x_s, y_s; θ_D).
11: if t % Ncritic == 0 then
12: Calculate l_G(D(x_s, y_s; θ_D)).
13: Update g_G^t by Eq. (5.34).
5.3.3 Metrics
Jensen–Shannon Divergence
The JS divergence is widely used to compute the distance between two distributions.
Denote the real load values as {x_r^i}_{i=1}^{N} and the synthetic load values as {x_s^i}_{i=1}^{N}.
First, we regularize the load values to the range [0, 1] as

x̂_r^i = x_r^i / max({x_r^i}_{i=1}^{N}),    x̂_s^i = x_s^i / max({x_s^i}_{i=1}^{N})    (5.42)

Set the number of discrete intervals as K and divide [0, 1] into K segments. Then
the range of the kth interval is [(k−1)/K, k/K]. Compute the number of real load values
and synthetic load values within the kth interval, denoted as N_rk and N_sk respectively.
Then the discrete distributions of the real and synthetic samples are

P_r = ( N_r1/N, N_r2/N, ..., N_rK/N )    (5.43)

P_s = ( N_s1/N, N_s2/N, ..., N_sK/N )    (5.44)

JS(P_r, P_s) = (1/2) Σ_{k=1}^{K} [ P_r(k) log( 2P_r(k) / (P_r(k) + P_s(k)) ) + P_s(k) log( 2P_s(k) / (P_r(k) + P_s(k)) ) ]    (5.45)

In the formula above, P_r(k) and P_s(k) represent the kth elements of P_r and P_s.
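A numpy sketch of this metric; Eq. (5.45) is rewritten with the mixture M = (P_r + P_s)/2, which is algebraically identical, and the bin count and eps are placeholder choices.

import numpy as np

def js_divergence(real_loads, synth_loads, n_bins=50, eps=1e-12):
    # Scale each set of load values by its own maximum and histogram it on [0, 1]
    p_r, _ = np.histogram(real_loads / real_loads.max(), bins=n_bins, range=(0, 1))
    p_s, _ = np.histogram(synth_loads / synth_loads.max(), bins=n_bins, range=(0, 1))
    p_r = p_r / p_r.sum()
    p_s = p_s / p_s.sum()
    m = 0.5 * (p_r + p_s)
    # log(2P/(P+Q)) = log(P/M); eps avoids 0 * log(0) issues
    kl_rm = np.sum(p_r * np.log((p_r + eps) / (m + eps)))
    kl_sm = np.sum(p_s * np.log((p_s + eps) / (m + eps)))
    return 0.5 * (kl_rm + kl_sm)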
The PRD is a novel definition of precision and recall that can disentangle the diver-
gence of image data distributions [23]. It originates from, but improves upon, recent
evaluation metrics for image distributions such as the Inception Score and the FID.
The PRD quantifies the degree of mode dropping and mode invention on two separate
dimensions, which are visualized as PRD curves.
Denote the real load curves as x_r and the synthetic curves as x_s, and merge them to
form a new dataset {x_r^1, x_r^2, ..., x_r^N, x_s^1, x_s^2, ..., x_s^N}. Then k-means is used to
classify the dataset and label the curves. Denote the numbers of real and synthetic samples
in each type as [N_r1, N_r2, ..., N_rk] and [N_s1, N_s2, ..., N_sk]. The discrete
distributions of the real and synthetic samples are

P_r = ( N_r1/N, N_r2/N, ..., N_rk/N )    (5.46)

P_s = ( N_s1/N, N_s2/N, ..., N_sk/N )    (5.47)

Next we compute the PRD curve for P_s with respect to P_r. The PRD is computed for an
equiangular grid of angle θ values in [0, π/2]. For a given threshold θ, we compute

P̂_s(θ) = ( (N_s1/N) tan θ, (N_s2/N) tan θ, ..., (N_sk/N) tan θ )    (5.48)

Then we compare P̂_s(θ) with P_r entry by entry and retain the smaller one to form a
new vector. The precision at θ equals the sum of the new vector:

p(θ) = Σ_{i=1}^{k} min( P̂_s(θ)_i, P_ri )    (5.49)

It measures how much of the real distribution can be generated by a part of the
synthetic distribution. When the two distributions are highly similar, both the precision
and the recall are close to 1. It should be noted that different thresholds lead to different
trade-offs between precision and recall. If we compute p(θ) and r(θ) at every θ from
0 to π/2, we obtain the precision vector and the recall vector. Plotting precision on the
vertical Y-axis against recall on the horizontal X-axis gives the PRD curve. The PRD
equals the area under the PRD curve. It is given as follows:

PRD = ∫_{r(0)}^{r(π/2)} p(θ) dr(θ)    (5.51)
In order to summarize the PRD curves, we also compute the maximum F_β score, a
generalization of the F_1 score (the harmonic mean of the precision and the recall), as a
single-number summary. It is given as follows:

F_β(θ) = (1 + β^2) p(θ) r(θ) / ( β^2 p(θ) + r(θ) )    (5.52)

Since β > 1 weighs recall higher than precision while β < 1 does the opposite, we
compute a pair of values for the PRD curve: F_β and F_{1/β}. We select the maximum F_β(θ)
and the maximum F_{1/β}(θ) as θ ranges from 0 to π/2. In this chapter, we choose β = 8 as
suggested in [23]. As mentioned above, F_8 weighs the recall higher than the precision
while F_{1/8} does the opposite. If the maximum F_8 ≤ the maximum F_{1/8}, then the model
has higher precision than recall; on the opposite, if the maximum F_8 ≥ the maximum
F_{1/8}, then the model has higher recall than precision. Considering the problem of
privacy leakage of customers, we believe that higher precision and lower recall are
better for residential load generation, indicating that the synthetic distribution is easy
to recover from the real one while the contrary is difficult.
Besides comparing the real and synthetic distributions, we also inspect the visual
characteristics of generated load curves, which is called the fidelity. For example,
the generated weekly load curve should exhibit reasonable periodicity, peak-valley
property, and volatility. The root mean squared error (RMSE) and structural similarity
(SSIM) are applied in this chapter.
The RMSE is used to compute the distance between the vectorized data. It can
measure the similarity of shape and value at the same time. Denote the set of synthetic
curves as {x_s^1, x_s^2, ..., x_s^N} and the set of real curves as {x_r^1, x_r^2, ..., x_r^N}. First,
we compute the mean curves of the two sets respectively as

x̄_s = ( Σ_{i=1}^{N} x_s^i ) / N,    x̄_r = ( Σ_{i=1}^{N} x_r^i ) / N    (5.53)

Next, compute the RMSE distance between x̄_s and x̄_r as

RMSE(x̄_s, x̄_r) = sqrt( ( Σ_{i=1}^{N_l} (x̄_s(i) − x̄_r(i))^2 ) / N_l )    (5.54)
In the formula above, Nl represents the length of curves; x̄ s (i) and x̄ r (i) represent
the load at the ith time slot. The smaller the RMSE, the more similar the synthetic
samples and real samples.
The SSIM index is used to compute the similarity between two images [24]. Denote
the 2D images as x and y and their width and height as N_w and N_h. First, we compute
the mean and variance of each single image and the covariance between the two images
as follows:

μ_x = ( 1 / (N_w N_h) ) Σ_{i=1}^{N_w} Σ_{j=1}^{N_h} x(i, j)    (5.55)

σ_x^2 = ( 1 / (N_w N_h − 1) ) Σ_{i=1}^{N_w} Σ_{j=1}^{N_h} ( x(i, j) − μ_x )^2    (5.56)

σ_xy = ( 1 / (N_w N_h − 1) ) Σ_{i=1}^{N_w} Σ_{j=1}^{N_h} ( x(i, j) − μ_x )( y(i, j) − μ_y )    (5.57)

Then the luminance, contrast, and structure comparison measurements are given as
follows:

l(x, y) = ( 2 μ_x μ_y + C_1 ) / ( μ_x^2 + μ_y^2 + C_1 )    (5.58)

c(x, y) = ( 2 σ_x σ_y + C_2 ) / ( σ_x^2 + σ_y^2 + C_2 )    (5.59)

s(x, y) = ( σ_xy + C_3 ) / ( σ_x σ_y + C_3 )    (5.60)

The SSIM index is the product of these three terms.
Value of the SSIM is between 0 and 1. When two images are similar, their SSIM is
close to 1.
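A direct numpy transcription of Eqs. (5.55)–(5.60); the stabilizing constants C1–C3 are placeholder values (with C3 = C2/2 as the common convention), and combining the three terms by multiplication follows the standard SSIM form assumed here.

import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    # Sample variance and covariance with the 1/(Nw*Nh - 1) factor of Eqs. (5.56)-(5.57)
    var_x = ((x - mu_x) ** 2).sum() / (x.size - 1)
    var_y = ((y - mu_y) ** 2).sum() / (y.size - 1)
    cov_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)    # Eq. (5.58)
    contrast = (2 * np.sqrt(var_x * var_y) + c2) / (var_x + var_y + c2)  # Eq. (5.59)
    structure = (cov_xy + c3) / (np.sqrt(var_x * var_y) + c3)            # Eq. (5.60)
    return luminance * contrast * structure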
In this section, we present the generation and evaluation results of the proposed GAN
models trained on real-world residential load data. All numerical experiments are
conducted on a PC equipped with an Intel(R) Core(TM) i7-8700K CPU @ 3.70 GHz (12
logical processors) and an NVIDIA GeForce RTX 2060 GPU. All programs for the GAN
variants are written in Python using torch v1.1.0.
The training data are from the Smart Metering Electricity Customer Behavior Tri-
als [25]. Electricity consumption of over 5000 Irish homes and businesses from
14/07/2009 to 31/12/2010 was collected. The load data are recorded every 30 min.
We randomly select 20000 weekly load curves from 1000 residential consumers as
our training set after data cleaning.
First, we use k-means to cluster the curves. Each weekly load curve is converted into
a daily load curve by averaging over its seven days. Then we cluster the averaged
daily load curves, which can be viewed as vectors with the size of 1 × 48. The SC and
SSE on the vertical Y-axis against k ranging from 2 to 14 on the horizontal X-axis are
plotted in Fig. 5.5.
It can be observed that the elbow of the SSE curve appears when k is in [5, 7]. The
SC decreases as k increases except for k = 7 and k = 11. Considering the trade-off
between SSE and SC, we set k = 7. Centers of 7 clusters are shown in Fig. 5.6. The
number of curves in each cluster is given in Table 5.6.
In this part, we evaluate the unclassified GAN variants presented in Sect. 5.2.3. For each
variant, we generate 20000 synthetic weekly load curves. It should be noted that, since
we care more about the shape of load curves than about their absolute values, all real
curves are regularized before being fed to the discriminator. Thus, the generated curves
are also regularized. To recover them to the same value range as the real loads, we
multiply the synthetic curves by a constant scalar, which is the ratio of the averaged
real load values to the averaged synthetic load values.
First, we inspect the visual characteristics of the synthetic load curves. We plot the mean
curves of real samples and generated samples for the BEGAN, BGAN, and WGAN-GP,
respectively, in Fig. 5.7. It can be observed that all synthetic curves exhibit periodicity
corresponding to the daily living pattern. Also, their peak-valley positions and values
are similar to the real ones. In terms of fidelity, the BEGAN clearly outperforms the
other two. We find that there are many spikes in the curves generated by the WGAN-GP,
which reflects the instability of the training process.
Second, we inspect the statistical characteristics of the load values and load curves.
We compute the discrete distributions of real and synthetic load values according to
Eqs. 5.43 and 5.44 and plot the probability distribution functions for the three GANs
in Fig. 5.8. It can be observed that loads generated by the BEGAN deviate from the
real loads obviously, which indicates that the BEGAN tends to generate higher loads
compared with the real values. Distributions of loads from the BGAN and WGAN-GP
have features similar to the real one. The scatter plot of load curve means on the
horizontal X-axis against standard deviations on the vertical Y-axis is shown in Fig. 5.9;
it reflects the diversity of the load curves. We find that although the curves from the
BEGAN fit the real curves best in shape, the diversity of its synthetic curves is quite
poor compared with the BGAN and WGAN-GP. In other words, the BEGAN is prone
to mode collapse. We plot the PRD curves in Fig. 5.10. In terms of the PRD, the BEGAN
and BGAN perform better than the WGAN-GP. The precision is higher than the recall
for the BEGAN and BGAN, which indicates that the synthetic curves mainly originate
from the real curve distribution while the real curves are difficult to recover from the
synthetic curve distribution.
Quantitative metrics presented in Sect. 5.3.3 are listed in Table 5.7. In terms of the
similarity of visual characteristics, the BEGAN clearly outperforms the other two. On
the other hand, in terms of the similarity of statistical characteristics, the BGAN and
WGAN-GP perform better. To conclude, unclassified GANs have to make a trade-off
between the diversity and fidelity of the generated curves.
In this part, we evaluate the classified GANs presented in Sect. 5.2.4. For each category,
we generate the same number of synthetic curves as that of the real ones. As above,
each curve is multiplied by a constant scalar.
First, we inspect the visual characteristics of the synthetic load curves. Taking the 2nd
category as an example, we plot the mean curves of real samples and generated samples
for the CGAN, InfoGAN, and ACGAN, respectively, in Fig. 5.11. We can find large
ripples in the curves from the CGAN; although they are periodic, their volatility is quite
unreasonable. Synthetic curves from the InfoGAN exhibit rational peak-valley positions
and values. However, there exist negative load values in the generated curves.
The ACGAN clearly outperforms the other two; the mean of its synthetic curves is almost
the same as that of the real curves.
Second, we inspect the statistical characteristics of the load values and load curves.
Taking the 7th category as an example, we plot the probability distribution functions
for the three GANs in Fig. 5.12. It can be observed that the distributions of generated
loads from the InfoGAN and ACGAN are similar to that of the real loads. However,
we can observe an upward tail at the end of the probability distribution function. This
might be caused by supersaturation of the generator neurons: some parameters in the
network are trapped in a local optimum and cause the relevant neurons to output the
maximum for any input. After the Tanh activation, the output load value is then always
the maximum. The scatter plot of load curve means and standard deviations for the 3rd
category is shown in Fig. 5.13. The ACGAN is shown to generate load curves with
appropriate diversity. However, the synthetic curves are not able to cover all possible
real scenarios since the model weighs fidelity more than diversity when making the
trade-off. The PRD curves of the 1st category are plotted in Fig. 5.14. In terms of the
PRD, the ACGAN clearly outperforms the other two: the area under its PRD curve is far
greater than that of the former GANs, which reflects that the synthetic curve distribution
and the real curve distribution largely overlap.
Quantitative metrics for all categories are listed in Table 5.8. It can be found that
the ACGAN wins on most indexes in terms of fidelity and diversity, and its performance
is stable across different categories. It should also be noted that the maximum F_{1/8}
is far greater than the maximum F_8 for the ACGAN, which indicates that the model has
high precision and low recall. Thus the synthetic distribution is easy to recover from the
real one while the contrary is difficult, which helps prevent the privacy leakage of
customers.
In summary, the ACGAN balances well between the diversity and fidelity of the
generated load curves. Comprehensive comparisons on different metrics between the
ACGAN and the other five widely used GANs reveal the superiority of the ACGAN.
With the ACGAN, we are able to generate residential load curves of different categories.
5.5 Conclusion
Due to technical barriers and rising privacy concerns, acquiring abundant residential
load data has become a big challenge for both academia and industry. To solve this
problem, various generative models have been used to produce synthetic residential
loads. However, as one of the most popular generative models, GANs are rarely used
in this area. In this chapter, we conduct a comprehensive investigation of six widely
used GAN models with regard to their performance on load generation. For every
GAN variant, we design the proper network architecture and loss functions. The
standard process of data preprocessing, model training, and evaluation is also presented.
Case study results demonstrate that the ACGAN outperforms the others significantly:
it balances well between the fidelity and diversity of the generated loads. With the
ACGAN, we are able to generate residential loads of a specific consumption type, which
might be helpful in the generation, delivery, and distribution of electrical power.
References
1. McDaniel, P., & McLaughlin, S. (2009). Security and privacy challenges in the smart grid.
IEEE Security & Privacy, 7(3), 75–77.
2. Swan, L. G., & Ugursal, V. I. (2009). Modeling of end-use energy consumption in the residential
sector: A review of modeling techniques. Renewable and Sustainable Energy Reviews, 13(8),
1819–1835.
3. Capasso, A., Grattieri, W., Lamedica, R., & Prudenzi, A. (1994). A bottom-up approach to
residential load modeling. IEEE Transactions on Power Systems, 9(2), 957–964.
4. McKenna, K., & Keane, A. (2016). Open and closed-loop residential load models for assessment
of conservation voltage reduction. IEEE Transactions on Power Systems, 32(4), 2995–3005.
5. Tsagarakis, G., Collin, A. J., & Kiprakis, A. E. (2012). Modelling the electrical loads of UK res-
idential energy users. In 2012 47th International Universities Power Engineering Conference
(UPEC) (pp. 1–6). Uxbridge: IEEE.
6. Dickert, J., & Schegner, P. (2011). A time series probabilistic synthetic load curve model for
residential customers. In 2011 IEEE Trondheim PowerTech (pp. 1–6). Stockholm: IEEE.
7. Collin, A. J., Tsagarakis, G., Kiprakis, A. E., & McLaughlin, S. (2014). Development of low-
voltage load models for the residential load sector. IEEE Transactions on Power Systems, 29(5),
2180–2188.
8. Stephen, B., Tang, X., Harvey, P. R., Galloway, S., & Jennett, K. I. (2015). Incorporating
practice theory in sub-profile models for short term aggregated residential load forecasting.
IEEE Transactions on Smart Grid, 8(4), 1591–1598.
9. Labeeuw, W., & Deconinck, G. (2013). Residential electrical load model based on mixture
model clustering and markov models. IEEE Transactions on Industrial Informatics, 9(3), 1561–
1569.
10. Xu, F. Y., Wang, X., Lai, L. L., & Lai, C. S. (2013). Agent-based modeling and neural network
for residential customer demand response. In 2013 IEEE International Conference on Systems,
Man, and Cybernetics (pp. 1312–1316). Manchester: IEEE.
11. Uhrig, M., Mueller, R., & Leibfried, T. (2014). Statistical consumer modelling based on smart
meter measurement data. In 2014 International Conference on Probabilistic Methods Applied
to Power Systems (PMAPS) (pp 1–6). Durham: IEEE.
12. Gu, Y., Chen, Q., Liu, K., Xie, L., & Kang, C. (2019). Gan-based model for residential load
generation considering typical consumption patterns. In 2019 IEEE Power & Energy Society
Innovative Smart Grid Technologies Conference (ISGT) (pp. 1–5). (Washington, D.C.: IEEE).
13. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., & Bengio, Y. (2014). Generative Adversarial Networks. arXiv:1406.2661.
14. Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks. arXiv:1511.06434.
15. Tompson, J., Goroshin, R., Jain, A., LeCun, Y., & Bregler, C. (2014). Efficient Object Local-
ization Using Convolutional Networks. arXiv:1411.4280.
16. Berthelot, D., Schumm, T., & Metz, L. (2017). BEGAN: Boundary Equilibrium Generative
Adversarial Networks. arXiv:1703.10717.
17. Hjelm, R. D., Jacob, A. P., Che, T., Trischler, A., Cho, K., & Bengio, Y. (2017). Boundary-
Seeking Generative Adversarial Networks. arXiv:1702.08431.
18. Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv:1701.07875.
19. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. (2017). Improved Train-
ing of Wasserstein GANs. arXiv:1704.00028.
20. Wang, Y., Chen, Q., Kang, C., & Xia, Q. (2016). Clustering of electricity consumption behavior
dynamics toward big data applications. IEEE Transactions on Smart Grid, 7(5), 2437–2447.
21. Mirza, M. & Osindero, S. (2014). Conditional Generative Adversarial Nets. arXiv:1411.1784.
22. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). Info-
GAN: Interpretable Representation Learning by Information Maximizing Generative Adver-
sarial Nets. arXiv:1606.03657.
23. Sajjadi, M. S. M., Bachem, O., Lucic, M., Bousquet, O., & Gelly, S. (2018). Assessing Gen-
erative Models via Precision and Recall. arXiv:1806.00035.
24. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From
error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
25. Commission for Energy Regulation (CER). (2012). CER Smart Metering Project - Electricity
Customer Behaviour Trial 2009–2010.
Chapter 6
Partial Usage Pattern Extraction
Abstract Massive amounts of data are being collected owing to the popularity of
smart meters. Two main issues should be addressed in this context. One is the com-
munication and storage of big data from smart meters at a reduced cost which has
been discussed in Chap. 3. The other one is the effective extraction of useful infor-
mation from this massive dataset. In this chapter, the K-SVD sparse representation
technique, which includes two phases (dictionary learning and sparse coding), is used
to decompose load profiles into linear combinations of several partial usage patterns
(PUPs), which allows the smart meter data to be compressed and hidden electricity
consumption patterns to be extracted at the same time. Then, a linear support vector
machine (SVM)-based method is used to classify the load profiles into two groups,
residential customers and small and medium-sized enterprises (SMEs), based on the
extracted patterns. Comprehensive comparisons with the results of k-means cluster-
ing, the discrete wavelet transform (DWT), principal component analysis (PCA), and
piecewise aggregate approximation (PAA) are conducted on real datasets in Ireland.
The results show that our proposed technique outperforms these methods in both
compression ratio and classification accuracy. Further analysis is also conducted on
the PUPs.
6.1 Introduction
Sparse coding is first applied to the load profiles to extract PUPs, implemented by the
K-SVD algorithm. In this section, the idea of sparse coding is first introduced, and the
non-negative K-SVD algorithm is then given.
or approximated as

x_i ≈ Σ_{k=1}^{K} a_{i,k} d_k    (6.2)

where d_k = [d_{k,1}, d_{k,2}, ..., d_{k,N}]^T denotes the kth PUP, which has N dimensions, and
a_i = [a_{i,1}, a_{i,2}, ..., a_{i,K}]^T denotes the coefficient vector of the K PUPs, which has K
dimensions. These K PUPs form a redundant dictionary, D ∈ R^{N×K}, where K is
greater than N.
Generally, the lossy data compression algorithms include two parts, coding and
reconstruction. The encoder transforms the original load profile into another format
that requires less storage space, and the decoder recovers the load profile with minimal
reconstruction loss. From a data compression perspective on sparse coding, given a
certain dictionary, D, searching the coefficient vector, ai , is load profile encoding,
while the linear combination of basis vectors is essentially load profile reconstruction.
Then, the original load profile, xi , is transformed into ai . Sparse coding attempts to
obtain a sparse and redundant dictionary set for use in characterizing the original
load profile. Sparsity means that only a few elements of ai are nonzero; redundancy
means that K > N . Figure 6.1 presents a visualization of sparse coding. The known
redundant dictionary can be used to obtain the coefficient of each PUP by K-SVD
which is introduced in the next part. It shows that only the first, third, fifth, and
K th coefficients are non-zero. That is to say, among these K PUPs, the presented
load profile is a linear combination of only four (the 1st, 3rd, 5th, and K th) PUPs.
Therefore, in the encoding stage, the 48-dimensional load profile is transformed
into four coefficients. Then, in the reconstruction stage, the load profile is restored
according to Eq. (6.1).
For the sparse coding of M load profiles, Eq. (6.1) can be rewritten in matrix form
as follows:
X = DA (6.3)
min ||X − DA||_F^2
s.t. ||a_i||_0 ≤ s_0,  1 ≤ i ≤ M
     a_{i,k} ≥ 0,  1 ≤ i ≤ M, 1 ≤ k ≤ K    (6.4)
     d_{k,n} ≥ 0,  1 ≤ k ≤ K, 1 ≤ n ≤ N
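As a quick, approximate stand-in for this optimization (not the non-negative K-SVD of [3] itself), scikit-learn's dictionary learning can produce a K-atom dictionary and s0-sparse codes; note that the OMP-based coding used here does not enforce the non-negativity constraints of Eq. (6.4), so it only sketches the sparse-coding idea.

import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_pups(X, n_pups=80, s0=4, n_iter=60, seed=0):
    # X: matrix of load profiles, one N-dimensional profile per row.
    dico = DictionaryLearning(n_components=n_pups,
                              transform_algorithm="omp",
                              transform_n_nonzero_coefs=s0,
                              max_iter=n_iter,
                              random_state=seed)
    codes = dico.fit_transform(X)     # coefficient vectors a_i (at most s0 non-zeros)
    pups = dico.components_           # dictionary atoms, i.e. candidate PUPs
    reconstruction = codes @ pups     # approximation of X as in Eq. (6.2)
    return pups, codes, reconstruction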
Based on the extracted PUPs, load profile classification can be conducted. The effec-
tiveness of sparse coding based feature extraction can be verified by the boost of
classification accuracy. In addition, linear SVM can be used for feature selection and
ranking. In this way, the most relevant PUPs can be selected.
min_{ω,b,ξ}  (1/2) ||ω||^2 + C Σ_{i=1}^{m} ξ_i
s.t.  y_i (ω^T a_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0    (6.6)

where ω denotes the weights of the features; C > 0 is a penalty parameter for the
training error; ξ_i denotes the slack (loss) of the ith sample; and b is the bias term of the
SVM. When the optimal value of C and the feature weights ω have been found, the
decision function for any testing instance a_i is defined as

f(a_i) = sgn(ω^T a_i + b)    (6.7)
where sgn(·) is the sign function; its value is 1 or −1 when the input is positive or
negative, respectively. There are four reasons to use a linear SVM for load profiles
classification: (i) a linear SVM does not need to compute a kernel value for each
pair of load profiles, which makes it run faster than other kernel-based SVMs. This
means that a linear SVM is able to address large datasets; (ii) a linear SVM has only
one parameter, C, that must be determined. The optimal value of C can be found
quickly, in contrast to other types of SVMs that have two or more parameters that
must be determined; (iii) the cross-validation accuracy of a linear SVM is as good as
that of some kernel-based SVMs when the number of load profiles is large enough;
and (iv) the weights of the features, ω, can be used to determine the relevance and
importance of each feature in the linear SVM-based model.
The weights of the features, ω_j, represent the importance of the features in the
decision function. A larger absolute value of ω_j means that the jth feature is more
important and relevant in the classification model [9]. Note that only ω in linear
SVM is meaningful. Thus, the features can be ranked according to the absolute value
of ω. The most relevant features will be analyzed and presented in the section that
describes the numerical experiments.
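A scikit-learn sketch of the classification and feature-ranking step; LinearSVC plays the role of the linear SVM (its default squared-hinge loss deviates slightly from Eq. (6.6)), and the variable names are illustrative.

import numpy as np
from sklearn.svm import LinearSVC

def classify_and_rank(codes, labels, C=1.0):
    # codes: coefficient matrix A, one row per load profile;
    # labels: binary class labels (e.g. residential vs. SME).
    clf = LinearSVC(C=C)
    clf.fit(codes, labels)
    weights = clf.coef_.ravel()                # feature weights w
    ranking = np.argsort(-np.abs(weights))     # most relevant PUPs first
    return clf, ranking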
In this section, five criteria are proposed to evaluate the performance of the proposed
method from the perspective of data compression and load profile classification.
Besides, the theory of four commonly used data compression and feature extraction
methods including k-means, DWT, PCA, and PAA, are briefly introduced.
CR = s_0 / N    (6.8)

(2) The RMSE (root mean squared error) is a frequently used measure of the
reconstruction error,

RMSE = sqrt( (1 / (N M)) Σ_{i=1}^{M} || x_i − Σ_{j=1}^{K} a_{i,j} d_j ||^2 )    (6.9)

(3) The MAE (mean absolute error), another index that quantifies the reconstruc-
tion error, is defined as follows:

MAE = (1 / (N M)) Σ_{i=1}^{M} || x_i − Σ_{j=1}^{K} a_{i,j} d_j ||_1    (6.10)
It is worth noting that the relative error is not suitable for evaluating the loss of
information because when the original data are close to zero, little absolute error will
result in a great deal of relative error. Usually, the smaller the CR, the RMSE, and
the MAE are, the better the compression algorithm is.
(5) The F1 score is essentially the harmonic mean of the recall and the precision. It
is used to evaluate the classifier's performance on a dataset with an imbalance of
labeled data.
recall = TP / (TP + FN)    (6.12)

precision = TP / (TP + FP)    (6.13)

F1 = 2 × precision × recall / (precision + recall)    (6.14)
Both the accuracy and F1 score are values between 0 and 1. The higher the accuracy
and F1 score are, the better the classifier performs. So far, we have proposed five
indexes for evaluating the performance of the proposed data compression and
feature extraction method.
6.4.3 Comparisons
To verify the superiority of the proposed technique, we compare K-SVD with some
other common data compression and pattern extraction algorithms, including k-
means clustering, the DWT, PCA, and PAA, which are briefly introduced in this part.
x_i = Σ_{j=1}^{s_0} c_j u_j    (6.15)
(3) PCA
PCA is another technique that is commonly used for data compression and time
series analysis [11]. PCA is a linear transformation technique that attempts to
identify a new set of orthogonal coordinates for its original dataset. A new set
of uncorrelated variables are derived from the actual interrelated variables in
the data. These new variables, or principal components (PCs), are also sorted in
decreasing order so that the front few capture more of the variations present in
the original variables.
(4) PAA
PAA is an intuitive data compression technique that is often used with time series
[12]. It first segments the time horizon into several equal parts and then, approx-
imates the load profile by replacing the real values that fall in each time interval
with their average values. By piecewise averaging, the “spikes” are filtered out,
and the outline is retained.
(5) Lossless compression methods
A-XDR coding, DEGA (Differential Exponential Golomb and Arithmetic) cod-
ing, and LZMH (Lempel Ziv Markov Chain Huffman) coding are three state-of-
the-art lossless compression algorithms for smart meter data proposed in [13]. For
the datasets with a granularity of 15 min and one hour, the expected compression
ratios of these methods vary from 0.14 to 1 for the REDD load data set [13]. These
methods have also been tested in the numerical experiments on the same dataset.
The dataset used in our study was provided by Electric Ireland and SEAI (Sustainable
Energy Authority of Ireland). We select the load profiles of 500 customers (300
residents, 200 SMEs) over 100 days at a granularity of 30 min. After cleaning and
normalization, the entire dataset, X , consists of 49,232 daily load profiles.
Figure 6.3 shows the average daily load profiles of residential customers and
SMEs. The electricity consumption of residential customers increases gradually from
6:00 to 8:00, reaches a steady-state until 9:00, and remains approximately constant
between 8:00 and 16:00. Then, the consumption continues increasing and peaks at
approximately 20:00. The electricity consumption of SMEs remains high during
working hours, from 9:00 to 17:00. The consumption the rest of the time is relatively
low in comparison to that of residential customers.
Figure 6.4 shows the daily load profiles of a resident and an SME for one week.
The electricity-consuming behavior of resident #1002 is significantly different from
that of SME #1021. Resident #1002’s consumption reaches its peak at noon and
is higher at 20:00 and 24:00. In contrast, there are only two short-duration peaks
at approximately 8:00 and 21:00 in SME #1021’s consumption. The rest of the
Fig. 6.3 Averaged daily load profiles of residential customers and SMEs
(a) Resident#1002
(b) SME#1021
Fig. 6.4 Daily load profiles of a resident and an SME for one week
time, electricity consumption is much lower due to some constantly running electric
appliances, such as refrigerators. Each customer has different usage patterns on
different days in terms of peak hours and peak durations. The peaks in morning
and at night can be decomposed, which shows the sparsity of these load profiles.
As explained above, the RMSE and the MAE are determined by the size of the dic-
tionary K and the CR. The value of s_0 depends on the required compression ratio,
while the value of K is determined through several trials by considering its influence
on the recovery error (RMSE) and the classification accuracy. A larger K results in a
smaller RMSE. However, when the value of K is small, it has a larger influence on the
RMSE; as the value of K gets larger, its influence diminishes.
Fig. 6.5 The RMSE of the K-SVD algorithm as the parameters vary
Fig. 6.6 The accuracy of the K-SVD algorithm as the parameters vary
Fig. 6.7 The RMSE of the K-SVD algorithm for different numbers of iterations
We also record the RMSE at each iteration of the K-SVD algorithm, as shown
in Fig. 6.7. The RMSE decreases slightly when the number of iterations is greater
than 60. Therefore, we choose 60 for the number of iterations, J , in our case studies.
Figure 6.8 shows the reconstructions of four typical loads using the K-SVD algo-
rithm. The solid and dotted lines are the original and reconstructed load profiles,
respectively. The overall trend of each load profile is identified, and most of the
peaks are reproduced.
Fig. 6.10 The ten most relevant and important PUPs for SMEs
Fig. 6.11 The ten most relevant and important PUPs for residential customers
Fig. 6.12 The compression quality of the K-SVD algorithm, the DWT, PCA, and PAA
We retain the largest s0 coefficients of the DWT and the PCs of the PCA, and then,
calculate the MAE of each case by varying s0 from 1 to 20 in steps of one unit. We
also perform PAA by dividing the 48 time periods into 1, 2, 3, 4, 6, 8, 12, and 16
parts. As shown in Fig. 6.12, the K-SVD algorithm provides the best compression
quality for all values of s0 . We also have tested the performance of three state-of-
the-art lossless compression methods on the dataset from Electric Ireland and SEAI.
These methods include DEGA coding, LZMH coding, and A-XDR coding. Their
CRs are summarized in Table 6.3. The results show that DEGA coding performs best
among these three state-of-the-art lossless compression methods. The compression
ratio of DEGA coding is 0.257. Compared with DEGA coding, the proposed sparse
coding-based method can achieve a compression ratio of 0.083 when s0 is set to 4
with very little reconstruction error (only 0.066 measured by MAE).
Figure 6.13 shows reconstructions of the load profile shown in Fig. 6.8 performed
using the DWT, PCA and PAA when s0 = 6. The performance is worse than that of
the K-SVD algorithm when s0 = 5. PAA and PCA can identify trends in the load
profiles; the DWT can retain the peak value of each load profile. However, the K-SVD
algorithm can capture the trend and the peak of each load profile simultaneously, as
shown in Fig. 6.8. The K-SVD algorithm performs better because individual load
Fig. 6.13 A load profile reconstructed using the DWT, PCA, and PAA
profiles vary significantly and have fixed consumption patterns, which makes them
suitable for sparse coding.
As compression algorithms, the K-SVD algorithm and the DWT are very similar
because they can be viewed as using a linear combination of several basis vectors.
The basis vectors of the DWT are predefined and orthogonal. Those of the K-SVD
algorithm are non-orthogonal and can be adapted to the characteristics of the set
of load profile. These are all lossy compression techniques. The DWT and PCA
can recover a load profile without information loss when all 48 elements are used;
however, information is still lost by the K-SVD algorithm when s0 = 48.
In terms of time complexity, the coding performed by PCA and the DWT is explicit,
while that of the K-SVD algorithm is implicit, which means that an optimization or some
other operation is necessary. PCA and the DWT involve only linear
operations. The time required for coding in the K-SVD algorithm is about 6 hours
which is much higher than that required by PCA and the DWT, but it is still acceptable
in practice. A compressed load profile does not exactly require real-time acquisition
but does require that the data be transferred daily.
Table 6.4 compares the performance of the K-SVD algorithm with those of k-
means clustering (k = 80), PCA (s0 = 5), the DWT (s0 = 5), and PAA (s0 = 6) in
terms of the five proposed criteria. Compared with all methods except k-means clustering
and the original load profiles, the K-SVD algorithm has lower reconstruction error and
higher classification accuracy; in particular, the accuracy is significantly improved.
k-means clustering, as a special case of the K-SVD algorithm, performs better than the
other techniques but has larger reconstruction errors at the same time. The original load profiles are also
classified and Fig. 6.14 shows the ωi for different times of day. The negative values
of ω_i are mainly concentrated in the morning and at night, and the positive values of
ω_i are mainly concentrated during working hours and at dawn, which is consistent
with the results of the K-SVD algorithm.
From the dataset, we can clearly see that residential users and SME users have sig-
nificantly different consumption preferences. Figure 6.15 shows the four load profiles for the two
kinds of users, which are drawn by simply applying k-means clustering to the dataset.
Actually, residential profiles usually have short and strong peaks at certain periods
of time in a day, while SME users have less variability and their consumptions last
longer. This is not properly presented in Fig. 6.15, because the means of profiles
reduce the fluctuation. Most traditional clustering methods usually require a step of
calculating the centroid of a cluster. However, a centroid is sometimes not represen-
tative enough, and we can see that PUPs perform well in capturing the features and
keeping the original information of the load profiles from Figs. 6.15 and 6.16.
Figure 6.16 shows the 8 most frequently used PUPs for residential and SME
users. For residential users, some persistent PUPs at night like the orange line can be
considered as the usage of television or personal computers before sleep. Some peak-
shape PUPs might correspond to using a microwave oven or a dryer at a certain time.
For SME users, most PUPs have persistent load during office hours, but the peak time
is usually different. There is a PUP that lasts for a whole day for both kinds of users,
which corresponds to appliances like refrigerators or fresh air systems.
[Fig. 6.17 Aggregate coefficient series of four typical PUPs over the 536 days: (a) seasonal (PUP#48), (b) weekly (PUP#70), (c) constant (PUP#13), (d) mixed seasonal-weekly (PUP#72)]
Periodic patterns can be extracted from users' sparse coding, and seasonal-related
PUPs can be defined according to their coefficients. Residential users are considered
to have stronger seasonal patterns, so we use them as an example. We aggregate the
whole residential dataset of 3411 users and add up their coefficients for each of the
536 days. Our analysis can be extended to user aggregates of other sizes, and the
results are similar in most cases. Typically we can see three kinds of PUPs: seasonal
PUPs, weekly PUPs, and constant PUPs. The coefficients of seasonal PUPs have an
approximate period of 365 days. For the weekly PUPs the period is 7 days, and for
constant PUPs the variation of the coefficient is relatively small. Figure 6.17 shows the
coefficient series of four typical PUPs. Figure 6.17a shows that the residential aggregate
uses less of PUP#48 during winter. Figure 6.17b illustrates a weekly pattern of PUP#70.
Figure 6.17c is a constant PUP and Fig. 6.17d is a seasonal-weekly mixed PUP.
To quantify the periodic characteristics of the PUPs, time series decomposition
methods can be applied to the coefficient series. Since the length of the series is 536,
which is not long enough for most decomposition methods to deal with a period of 365,
we use the Discrete Fourier Transform (DFT) to extract the spectra of the coefficient
series. Figure 6.18 shows the spectrum of PUP#70 using the DFT. Due to
the non-sinusoidality of the periodic components and leakage error, the seasonal and
weekly components correspond to several lines marked with different colors in the
spectrum. Note that we only show half of the spectrum because of symmetry and that
the DC component is not plotted.
The energy of a component in the DFT is defined as its squared amplitude in the
spectrum. We calculate the proportion of energy for the 80 PUPs, and Fig. 6.19 shows
the results. Based on the amplitudes and phases of the DFT results, we can determine
which PUPs residential users use more in winter or summer, as well as on weekdays or
weekends. Figure 6.20 shows typical winter PUPs and summer PUPs for residential
users. Residential users tend to consume more electricity in the evening during winter.
In one of the summer PUPs, we can see people start consuming electricity at midnight;
to some extent, this PUP marks air-conditioning usage during bedtime. Also, if we look
carefully at its coefficient series, we can see there is a sudden increase around Christmas
every year. This is likely to correspond to events such as night parties,
which happen more during summer or winter holidays. Figure 6.21 shows typical
weekly PUPs: residential users clearly consume electricity earlier on weekdays, when
they have to go to work.
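The energy-share computation can be sketched as follows; which frequency bins count as "seasonal" or "weekly" is an assumption (the lowest non-DC bins for the roughly yearly cycle and bins around length/7 for the weekly cycle), since leakage spreads each component over several spectral lines.

import numpy as np

def periodic_energy_shares(coeff_series):
    # coeff_series: daily aggregate coefficient of one PUP (length 536 here)
    series = np.asarray(coeff_series, dtype=float)
    n = len(series)
    spectrum = np.fft.rfft(series - series.mean())
    energy = np.abs(spectrum) ** 2            # squared amplitude per frequency bin

    seasonal_bins = [1, 2]                    # yearly cycle and its first harmonic
    weekly_center = int(round(n / 7.0))       # about 77 for 536 days
    weekly_bins = list(range(weekly_center - 2, weekly_center + 3))

    total = energy[1:].sum()                  # DC component excluded
    return energy[seasonal_bins].sum() / total, energy[weekly_bins].sum() / total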
For every load profile, the dominant PUP is defined as the PUP with the biggest
coefficient. In the SME pre-trial survey, Question 61022 gives the information about
whether a user works during weekends. However, the survey only covers 290 of the 347
SME users. We count their dominant PUPs on their working days and off days,
respectively. The average frequencies with which PUP_i appears on a working day and
on an off day are defined as w_i and o_i, and p_i, defined as w_i/(w_i + o_i), measures the
probability that it is a working day when PUP_i appears. A working-day pattern of SME
users has a greater p_i and
an off-day pattern has a smaller one. Figure 6.22 shows the typical SME PUPs for
working day and off day. The working/off patterns can be applied in designing price
packages and load forecasting.

Fig. 6.22 Working day and off day patterns of SME users

[Fig. 6.23 Share of working-day dominant PUPs per day for the remaining 57 SME users, by week of year and day of week; marked Irish holidays include St Patricks Day, Easter Monday, the May, June, and August Bank Holidays, Halloween, and Christmas & New Year]
To demonstrate the effectiveness of the two kinds of PUPs, we perform a simple test on
the remaining 57 users. The color bar in Fig. 6.23 marks the proportion of working-day
PUPs on a specific day for the 57 users: a yellow cell indicates a working day, and a
blue one indicates an off day. The results in Fig. 6.23 are consistent with weekends
and all the public holidays in Ireland without any prior knowledge. As we can see,
some SMEs work on Saturdays but fewer on Sundays. The prediction of working/off
days is not only useful in energy services but also a good reference for the economy
and the labor market.
While periodic pattern analysis is useful in load forecasting, entropy analysis mea-
sures the variability of a customer, which can help find potential targets for demand
response programs. The 536 days are classified into 7 groups according to the day
of the week, i.e. Monday, Tuesday, etc. For every customer and every group of days,
the occurrences of his dominant PUPs are counted, and the entropy is calculated. A
customer's entropy is defined as the average over the 7 groups of entropies.
Figure 6.24 shows the box-plot of the entropy for the two kinds of customers.
The red lines mark the medians, and the boxes mark the 1st and 3rd quartiles q_1
and q_3. The black lines mark the whisker lines, defined as q_3 + 1.5(q_3 − q_1) and
q_1 − 1.5(q_3 − q_1). The distribution of SME users' entropy is significantly lower than
that of residential users. A customer with higher entropy is more likely to shift between
different PUPs on a fixed day of the week, indicating that his consumption is more
flexible. Also, many residential entropies are below the lower whisker line and are
marked as outliers. In some cases, this is due to bad data with zero measurements. In
other cases, it indicates that there is usually nobody at home, so that the consumption
is consistently low.
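The entropy computation can be sketched as below; using base-2 logarithms and skipping empty day-of-week groups are our own assumptions.

import numpy as np
from collections import Counter

def customer_entropy(dominant_pups, weekdays):
    # dominant_pups: one dominant PUP index per day; weekdays: matching
    # day-of-week indices (0 = Monday, ..., 6 = Sunday).
    entropies = []
    for d in range(7):
        pups = [p for p, w in zip(dominant_pups, weekdays) if w == d]
        if not pups:
            continue
        counts = np.array(list(Counter(pups).values()), dtype=float)
        probs = counts / counts.sum()
        entropies.append(-np.sum(probs * np.log2(probs)))
    return float(np.mean(entropies))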
[Figure: average usage in a day of PUPs #1–80 in summer and winter]
6.7 Conclusions
References
1. Gao, P., Meng, W., Ghiocel, S. G., Chow, J. H., Fardanesh, B., & Stefopoulos, G. (2016). Missing
data recovery by exploiting low-dimensionality in power system synchrophasor measurements.
IEEE Transactions on Power Systems, 31(2), 1–8.
2. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new per-
spectives. IEEE Transactions on Pattern Analysis & Machine Intelligence, 35(8), 1798–1828.
3. Aharon, M., Elad, M., & Bruckstein, A. M. (2005) K-SVD and its non-negative variant for
dictionary design. In Wavelets XI
4. Piao, M., & Ryu, K. H. (2016) Subspace frequency analysis-based field indices extraction for
electricity customer classification. ACM Transactions on Information Systems, 34(2), 1–18.
5. Wang, Y., Chen, Q., Kang, C., Zhang, M., Wang, K., & Zhao, Y. (2015). Load profiling and its
application to demand response: A review. Tsinghua Science and Technology, 20(2), 117–129.
6. Piao, M., Shon, H. S., Lee, J. Y., & Ryu, K. H. (2014) Subspace projection method based
clustering analysis in load profiling. IEEE Transactions on Power Systems, 29(6), 2628–2635.
7. Olshausen, B. A., & Field, D. J. (1996) Emergence of simple-cell receptive field properties by
learning a sparse code for natural images. Nature, 381(6583), 607.
8. Chang, K. W., Hsieh, C. J., & Lin, C. J. (2008) Coordinate descent method for large-scale L2-
loss linear support vector machines. Journal of Machine Learning Research, 9(3), 1369–1398.
9. Chang, Y. W., & Lin, C. J. (2008) Feature ranking using linear SVM. In Causation and pre-
diction challenge (pp. 53–64)
10. Ning, J., Wang, J., Gao, W., & Liu, C. (2011). A wavelet-based data compression technique
for smart grid. IEEE Transactions on Smart Grid, 2(1), 212–218.
11. Mehra, R., Bhatt, N., Kazi, F., & Singh, N. M. (2013) Analysis of pca based compression and
denoising of smart grid data under normal and fault conditions. In 2013 IEEE International
Conference on Electronics, Computing and Communication Technologies (pp. 1–6). IEEE.
12. Lin, J., Keogh, E., Lonardi, S., & Chiu, B. A. (2003) Symbolic representation of time series,
with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop
on Research Issues in Data Mining and Knowledge Discovery (pp. 2–11). ACM.
13. Unterweger, A., Engel, D., & Ringwelski, M. (2015) The effect of data granularity on load
data compression. In DA-CH Conference on Energy Informatics (pp. 69–80). Springer.
Chapter 7
Personalized Retail Price Design
7.1 Introduction
With the increasing installation of smart meters, how to make full use of smart meter data to promote better demand-side management has become a major focus for utility companies [1, 2]. Smart meters can provide the retailer with more detailed, high-quality information about electricity consumption activities, and the retailer can use these data to extract the electricity consumption patterns of consumers and develop innovative customized retailing strategies. Personalized or customized pricing is appealing to the retailer because it could increase profits and market penetration while respecting consumers' willingness [3]. There has been a surge of research interest in how to effectively and practically implement customized pricing for retailers in the power market [4].
The study of customized pricing originates from research on demand-side energy management. To promote better energy management, different types of time-varying tariff approaches have been proposed that give consumers incentives,
among which time-of-use (ToU) pricing is more widely adopted because it imposes less volatility and risk on consumers [5]. The energy management problems faced by the retailer focus on the bidding strategy [6–9] and retail price design [10–12], which are highly correlated with each other. These two kinds of energy management problems are commonly formulated as a Stackelberg game [6–12], a hierarchical control problem in which the retailer and the consumers make decisions sequentially. Smart meter data reveal the detailed differences among consumers, and further improvement can be made by recognizing that every individual strategic consumer may behave differently.
Some interesting works related to customized pricing design have been developed in recent years. References [13, 14] focus on how to identify the differences among consumers through different clustering methods and statistical analysis. Reference [15] uses appliance identification as a fine-grained way to simulate consumer behaviors. Reference [16] proposes a game between the retailer and different types of consumers (residential, industrial, commercial). In a market consisting of a single type of users, as one of the vital considerations of energy management, the market mechanism must be incentive compatible [17, 18]. Consumers should be allowed to choose freely, and consumers' willingness should be fully respected. Incentive compatibility, a concept from economics and game theory, states that the incentives should motivate individual participants (consumers) to behave in a manner consistent with the rules established by the agent (retailer) [19]. In a voluntary optional market, each consumer evaluates the benefits of each tariff scheme provided by the retailer and selects the one that offers the greatest benefits, so each consumer is supposed to prefer his or her designated scheme to any other one [17]. Problems may arise in models that design pricing schemes for different individuals separately, because the retailer still needs to check each consumer's satisfaction with all the schemes to guarantee that each consumer prefers the pricing scheme tailored for him. Otherwise, the design is not incentive compatible, which also implies a large deviation between consumers' choices and the retailer's expectation.
Fueled by the increased availability of high-quality smart meter data, this chapter proposes a novel data-driven approach for incentive-compatible customized time-of-use (ToU) price design based on massive historical smart meter data. The Stackelberg relationship between the profit-maximizing retailer (leader) and the strategic consumers (followers), together with the considerations for an incentive-compatible market, is modeled as a bilevel optimization problem. Smart meter data are used to estimate the satisfaction of the consumers and to predict consumer behaviors and preferences, inspired by the work of [20]. Load profile clustering is also implemented to gather consumers of similar preferences. The bilevel problem is integrated and reformulated as a single mixed-integer nonlinear programming (MINLP) problem and then simplified to a mixed-integer linear programming (MILP) problem. To validate the proposed model, the smart meter dataset from the Commission for Energy Regulation (CER) in Ireland is adopted to illustrate the whole process.
7.2 Problem Formulation
(Figure: market and tariff-design framework at the retailing level, involving contracts, DLMPs, and sensing, metering, caching, and communication.)
$$F(p, q) = u(q) - \sum_{t=1}^{T} p_t q_t \qquad (7.1)$$
where u(q) corresponds to the consumer's satisfaction gained from using a certain amount of power. u(q) often takes the form of a concave function to simulate the diminishing marginal utility [19, 21].
The consumers follow the principle of utility maximization: for any given p, a consumer adapts his consumption to the best response under p. That means q will be set to the value that maximizes F(p, q) under any fixed p. Namely,
consumer k through his load profiles, but he or she does not choose the designated scheme, consumer k can be viewed as declaring a false preference. The retailer should ensure that the satisfaction consumers receive is the highest when the designed outcome is achieved, so that the pricing scheme a consumer likes is exactly the one the retailer designs for him. In this way, truthfully and faithfully choosing the corresponding pricing scheme is the consumer's dominant strategy. Thus, for every consumer k, if the retailer designs the pricing scheme r for him, Eq. (7.4) should be satisfied
$$U_k(p_r) \ge U_k(p') \quad \forall k \qquad (7.4)$$
where p′ denotes any other pricing scheme. Thus, choosing pricing scheme r is the dominant strategy. Besides, if the retailer wants consumers to adopt the new pricing schemes, the utility gained from the new pricing should exceed that of the old one, expressed as follows
$$U_k(p_r) \ge U_k(p_0) \quad \forall k \qquad (7.5)$$
Equation (7.5) is key to associating the price and the power consumption in different time slots. If the retailer desires to raise the price during some hours, it must bring down the price during other periods for Eq. (7.5) to hold. Equation (7.5) makes power consumption in different time periods act as substitutes and also makes load shifting between periods possible.
The retailer wants to maximize its profit by providing diverse types of price schemes. The retailer's daily profit function is given by
$$R = \sum_{k=1}^{K}\sum_{t=1}^{T} p_{k,t} \times q_{k,t} - \sum_{t=1}^{T}\sum_{n=1}^{N_F} p_n^F \times L_n^F \times o_{n,t}^F \times o_n - \sum_{t=1}^{T} p_t^{D,est} \times L_t^D - \xi \times CVaR \qquad (7.6)$$
supply load at period t from day-ahead markets. The retailer will buy all predictable consumers' load through forward contracts and day-ahead markets to avoid price fluctuations in real-time markets. The real-time market cost arises from the need to balance the unpredictable random load. The purchasing strategy to balance the predictable load is as follows
$$\sum_{k=1}^{K} q_{k,t} = \sum_{n=1}^{N_F} L_n^F \times o_{n,t}^F \times o_n + L_t^{D,est}, \quad \forall t \qquad (7.7)$$
(3) The third term is the risk of loss calculated using conditional value at risk (CVaR). CVaR is a risk measure frequently used in risk management for retailers [11, 23]. As discussed above, Eq. (7.7) does not include any purchase from the real-time market, but the cost of balancing the random load must be included. Besides, the estimated price $p_t^{D,est}$ is used for the day-ahead cost, whereas the real day-ahead price contains a certain deviation and fluctuation. These two costs cannot be predicted before the new price schemes take effect and are therefore considered pure risk. The risk-weighting factor ξ measures the degree of importance the retailer attaches to risk. This chapter uses CVaR to represent these two risk losses as follows
$$CVaR = \inf_{u \in \mathbb{R}} \left\{ u + \frac{1}{(1-\alpha_{CVaR}) \cdot N_S} \sum_{n_S=1}^{N_S} \left[ -R_{n_S}^{D} - R_{n_S}^{RT} - u \right]^{+} \right\} \qquad (7.8)$$
The number of pricing schemes could reach as high as the number of consumers, but that would lead to serious price discrimination, which is too hard to implement in reality. A more reasonable approach is to offer a relatively small number of choices compared to the number of consumers, for two reasons: (1) it reduces the complexity that the number of consumers brings to the retailer; (2) it is more practical for consumers to choose among a relatively small number of price schemes. To achieve this goal, consumers with similar preferences are designated the same price scheme, and we should therefore cluster consumers of similar preferences. Figure 7.2 shows the basic idea of how smart meter data are used.
Before clustering, a concrete expression of the utility should be specified because it indicates the level of preference. The choice of the utility function itself is a
Fig. 7.2 The flowchart illustrating how smart meter data are used: discover utility from data, correlate preference with load shape, cluster the load profile data, make the centroids representatives, and form the data-driven price design optimization problem
$$F(p_{(0)}, q_{(0)}) = 0, \quad \frac{\partial F(p_{(0)}, q_{(0)})}{\partial q_t} = 0, \quad \forall t \qquad (7.9)$$
$$u(q) = \sum_{t=1}^{T} \left[ \frac{p_{t(0)}\, q_{t(0)}}{\alpha} \left( \left( \frac{q_t}{q_{t(0)}} \right)^{\alpha} - 1 \right) + p_{t(0)}\, q_{t(0)} \right] \qquad (7.10a)$$
$$q_t^{*} = \left( \frac{p_t}{p_{t(0)}} \right)^{\frac{1}{\alpha-1}} \times q_{t(0)} \qquad (7.10b)$$
$$U(p) = \sum_{t=1}^{T} \left( \frac{1}{\alpha} - 1 \right) \left( \left( \frac{p_t}{p_{t(0)}} \right)^{\frac{\alpha}{\alpha-1}} - 1 \right) \times q_{t(0)}\, p_{t(0)} \qquad (7.10c)$$
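For intuition, the following minimal sketch evaluates the consumer's best response (7.10b) and indirect utility (7.10c) numerically; the baseline flat price, baseline consumption, and α below are illustrative values, not the case-study data.

```python
import numpy as np

alpha = -1.5                           # utility exponent (elasticity 1/(alpha-1) = -0.4)
p0 = np.full(24, 0.2)                  # baseline flat price ($/kWh), illustrative
q0 = np.random.uniform(0.2, 1.5, 24)   # baseline hourly consumption, illustrative

def best_response(p):
    """Optimal consumption under price p, Eq. (7.10b)."""
    return (p / p0) ** (1.0 / (alpha - 1.0)) * q0

def indirect_utility(p):
    """Net benefit of the consumer under price p, Eq. (7.10c); zero at the flat rate."""
    return np.sum((1.0 / alpha - 1.0)
                  * ((p / p0) ** (alpha / (alpha - 1.0)) - 1.0) * q0 * p0)

p_new = np.where(np.arange(24) < 7, 0.12, 0.25)   # toy two-block ToU price
q_new = best_response(p_new)
print(indirect_utility(p_new))   # positive only if the consumer gains vs. the flat rate
```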
and cluster the processed load profiles into r = 1, 2, . . . , R clusters, where cluster r contains k_r = 1, 2, . . . , K_r load profiles.

Theorem 7.1 Load profiles of consumers who have the same preferences are of the same shape after being processed by Eq. (7.11).

From Theorem 7.1, we know that clustering can group consumers of similar preferences. Notice that there may be some deviations between different load profiles in real clustering, so Theorem 7.1 may only approximately hold true. Furthermore, the mean value of the original load profiles in each cluster (the centroid) can represent all the members in the corresponding cluster in terms of both preferences and quantity of load. Since the centroid has a similar shape to the cluster members, they have similar preferences. In terms of the quantity of load in the cluster, we have
$$\sum_{k=1}^{K_r} q_{k,t} = \sum_{k=1}^{K_r} \left( \frac{p_{r,t}}{p_{r,t(0)}} \right)^{\frac{1}{\alpha-1}} \times q_{k,t(0)} = \left( \frac{p_{r,t}}{p_{r,t(0)}} \right)^{\frac{1}{\alpha-1}} \times \sum_{k=1}^{K_r} q_{k,t(0)} = \left( \frac{p_{r,t}}{p_{r,t(0)}} \right)^{\frac{1}{\alpha-1}} \times K_r \times q_{r,t(0)} = K_r \times q_{r,t} \qquad (7.12)$$
So the centroid can represent the members in terms of electricity quantity. This simplifies the retailer's problem by equivalently reducing the number of consumers. Subscript r is used to represent the cluster centroid. Equations (7.6) and (7.7) are converted to the equations below
$$R = \sum_{r=1}^{R}\sum_{t=1}^{T} K_r \times p_{r,t} \times q_{r,t} - \sum_{t=1}^{T}\sum_{n=1}^{N_F} p_n^F \times L_n^F \times o_{n,t}^F \times o_n - \sum_{t=1}^{T} p_t^{D} \times L_t^{D} - \xi \times CVaR \qquad (7.13a)$$
$$\sum_{r=1}^{R} q_{r,t} \times K_r = \sum_{n=1}^{N_F} L_n^F \times o_{n,t}^F \times o_n + L_t^{D}, \quad \forall t \qquad (7.13b)$$
Regarding the detailed clustering methods, in this chapter different clustering methods are adopted to find the best clustering result [24]. These methods include hierarchical clustering, k-means clustering, fuzzy C-means clustering, and the Gaussian mixture model. Both the within-cluster compactness and the between-cluster separation of different clustering results contribute to the different results of the model. Clustering results are evaluated by the Davies–Bouldin index, which represents the worst-case within-to-between cluster ratio over all clusters. A detailed discussion of how clustering affects the whole model is given in the sensitivity analysis (Sect. 7.4.3).
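As a sketch of this selection step (using scikit-learn rather than the authors' implementation, and showing only a subset of the methods listed above), candidate partitions from several algorithms and cluster numbers can be scored with the Davies–Bouldin index and the minimum retained:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import davies_bouldin_score

# X: processed load profiles, one row per consumer (toy data here)
X = np.random.rand(500, 48)

scores = {}
for n_clusters in range(5, 11):
    for name, model in [
        ("KM-PLUS", KMeans(n_clusters=n_clusters, init="k-means++", n_init=10)),
        ("HIA-COMPLETE", AgglomerativeClustering(n_clusters=n_clusters, linkage="complete")),
        ("HIA-WARD", AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")),
    ]:
        labels = model.fit_predict(X)
        scores[(name, n_clusters)] = davies_bouldin_score(X, labels)

best = min(scores, key=scores.get)   # smallest Davies-Bouldin index wins
print(best, scores[best])
```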
Before formulating the integrated model, some other constraints are given below to fix the ToU structure. We assume each ToU scheme has m = 1, 2, . . . , M blocks and $p_r^m$ is the price of block m for pricing scheme r:
$$\sum_{m=1}^{M} e_{r,t}^{m} = 1, \quad \sum_{t=1}^{T} e_{r,t}^{m} \ge D_{min}, \quad \forall m, r \qquad (7.14a)$$
$$\left| e_{r,T}^{m} - e_{r,1}^{m} \right| + \sum_{t=2}^{T} \left| e_{r,t-1}^{m} - e_{r,t}^{m} \right| = 2, \quad \forall m, r \qquad (7.14b)$$
$$p_{r,t} = \sum_{m=1}^{M} e_{r,t}^{m} \times p_r^{m}, \quad \forall t, r \qquad (7.14c)$$
where $e_{r,t}^{m}$ is a binary variable. For pricing scheme r, if period t belongs to its ToU block m, then $e_{r,t}^{m} = 1$. $D_{min}$ is the minimum duration (in periods) of each block. Generally, a ToU price contains three blocks, but M can take values other than 3 in our framework. Equation (7.14a) restricts each time slot to belong to exactly one block. Equation (7.14b) restricts each block to change only twice.
Here, we can give the integrated model for designing customized ToU prices to maximize the retailer's profit while ensuring consumers' willingness:
$$\max \; R \qquad (7.15a)$$
$$\text{s.t.} \;\; (7.4), (7.5), (7.10b), (7.10c), (7.13a), (7.13b), (7.14a), (7.14b), (7.14c) \qquad (7.15b)$$
Notice that subscript r will be added to Eqs. (7.4), (7.5), (7.10b), and (7.10c). Clearly, this model is an MINLP model.
7.3 Solution Methods

7.3.1 Framework
The integrated optimization problem is nonlinear, and it may be difficult to find the optimal solution. The model is nonlinear mainly due to the power functions in Eqs. (7.10b) and (7.10c), the products of two decision variables in Eqs. (7.14c) and (7.13a), the expression of CVaR in Eq. (7.8), and the absolute values in Eq. (7.14b). Piece-wise linear approximation is used to deal with the power functions. Equivalent linear transformations are used to eliminate the binary-variable products, simplify the CVaR, and eliminate the absolute values.
One of the reasons the model is nonlinear is the power-function terms in the constraints and the product of two decision variables $p_{r,t} \times q_{r,t}$ in the objective function Eq. (7.13a). If the term $p_{r,t} \times q_{r,t}$ is treated as a whole and $q_{r,t}$ is substituted by Eq. (7.10b), the whole term can be expressed as
$$p_{r,t} \times q_{r,t} = p_{r,t}^{\frac{\alpha}{\alpha-1}} \times \left( \frac{1}{p_{t(0)}} \right)^{\frac{1}{\alpha-1}} \times q_{r,t(0)}, \qquad (7.16)$$
which is a function of $p_{r,t}$ and involves the term $p_{r,t}^{\alpha/(\alpha-1)}$. Meanwhile, the term $p_{r,t}^{\alpha/(\alpha-1)}$ also appears in the consumers' utility function Eq. (7.10c).
This is not a coincidence or a special case that just fits this model. Taking $p_{r,t} \times q_{r,t}$ as the product of two decision variables in the retailer's profit function is common practice, and $q_{r,t}$ should be affected by $p_{r,t}$ in some way to simulate demand response in related works [7, 13]. Considering the above, this chapter treats the term $p_{r,t} \times q_{r,t}$ as a whole and uses piece-wise linear approximations of $q_{r,t}$ and $p_{r,t} \times q_{r,t}$ respectively to linearize the model. In this chapter, we define
$$\tilde{p}_{r,t} = p_{r,t}^{\frac{\alpha}{\alpha-1}}, \qquad \hat{p}_{r,t} = p_{r,t}^{\frac{1}{\alpha-1}} \qquad (7.17)$$
The first term appears in the profit and utility functions and indicates how the profit and utility change as the price changes. The second term appears in the consumer's reaction to the price and indicates how behaviors change as the price changes.
The piece-wise linear approximations of $\tilde{p}_{r,t}$ and $\hat{p}_{r,t}$ are as follows:
$$p_{r,t} = \sum_{j=1}^{J+1} w_{j,r,t}\, a_{j,r,t} \qquad (7.18a)$$
$$\phi_{r,t} = \sum_{j=1}^{J+1} w_{j,r,t}\, \tilde{a}_{j,r,t} \qquad (7.18b)$$
$$\theta_{r,t} = \sum_{j=1}^{J+1} w_{j,r,t}\, \hat{a}_{j,r,t} \qquad (7.18c)$$
where j = 1, 2, . . . , J is the piece-wise segment number, $a_{1,r,t} < a_{2,r,t} < \cdots < a_{J+1,r,t}$ are the segment connection endpoints, and $\tilde{a}_{j,r,t}$ and $\hat{a}_{j,r,t}$ are the values of $\tilde{p}$ and $\hat{p}$ at these endpoints. Positive continuous variables $w_{j,r,t}$ and binary variables $z_{j,r,t}$ are intermediate variables. $\phi_{r,t}$ and $\theta_{r,t}$ are the piece-wise linear approximations of $\tilde{p}_{r,t}$ and $\hat{p}_{r,t}$, respectively. The specific method for finding the segment connection endpoints is given in Ref. [25].
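To illustrate the approximation (with uniformly spaced endpoints for simplicity, rather than the minimax endpoints of Ref. [25]), the sketch below tabulates the endpoint values of $p^{\alpha/(\alpha-1)}$ and $p^{1/(\alpha-1)}$ and evaluates the piece-wise linear interpolants against the exact power functions:

```python
import numpy as np

alpha, J = -1.5, 15
a = np.linspace(0.04, 0.8, J + 1)          # segment endpoints a_1 < ... < a_{J+1}
a_tilde = a ** (alpha / (alpha - 1.0))     # endpoint values of p^(alpha/(alpha-1))
a_hat = a ** (1.0 / (alpha - 1.0))         # endpoint values of p^(1/(alpha-1))

def approx(p):
    """Piece-wise linear values phi(p) and theta(p); inside the MILP the same
    interpolation is expressed via the weights w_{j,r,t} of Eq. (7.18)."""
    return np.interp(p, a, a_tilde), np.interp(p, a, a_hat)

p = 0.25
phi, theta = approx(p)
print(phi, p ** (alpha / (alpha - 1.0)))   # approximation vs. exact value
print(theta, p ** (1.0 / (alpha - 1.0)))
```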
The product of a binary variable and a continuous variable in Eq. (7.14c) is converted to the linear constraints below:
$$\sigma_{r,t}^{m} \le M \times e_{r,t}^{m}, \quad \sigma_{r,t}^{m} \le p_r^{m} \qquad (7.19a)$$
$$\sigma_{r,t}^{m} \ge p_r^{m} - M \times \left( 1 - e_{r,t}^{m} \right), \quad \sigma_{r,t}^{m} \ge 0 \qquad (7.19b)$$
where $\sigma_{r,t}^{m}$ is the result of the product operation and M is a sufficiently large number compared to $p_r^{m}$.
7.3.4 CVaR
$$CVaR \ge u + \frac{1}{(1-\alpha_{CVaR}) \cdot N_S} \sum_{n_S=1}^{N_S} W_{n_S} \qquad (7.20a)$$
$$W_{n_S} \ge 0, \quad W_{n_S} \ge -R_{n_S}^{D} - R_{n_S}^{RT} - u \qquad (7.20b)$$
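A minimal numerical sketch of the scenario-based CVaR in Eqs. (7.8) and (7.20), assuming the per-scenario day-ahead and real-time revenues are already available (toy numbers here, not the case-study scenarios):

```python
import numpy as np

def cvar_of_loss(revenues, alpha_cvar=0.95):
    """Scenario-based CVaR of the loss -R, Eq. (7.8): at the optimum the
    auxiliary variable u can be taken as the alpha-quantile (VaR) of the loss,
    and the remaining term is the mean excess over u scaled by 1/(1 - alpha)."""
    losses = -np.asarray(revenues)            # loss = -(R^D + R^RT)
    u = np.quantile(losses, alpha_cvar)       # value at risk
    excess = np.maximum(losses - u, 0.0)      # the [.]^+ terms of Eq. (7.8)
    return u + excess.mean() / (1.0 - alpha_cvar)

# Toy equiprobable scenario revenues R^D_{n_S} + R^RT_{n_S}
rng = np.random.default_rng(0)
scenario_revenues = rng.normal(1000.0, 150.0, size=200)
print(cvar_of_loss(scenario_revenues, alpha_cvar=0.95))
```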
$$e_1 - e_2 \le A \le e_1 - e_2 + 2 \times B \qquad (7.21a)$$
$$e_2 - e_1 \le A \le e_2 - e_1 + 2 \times (1 - B) \qquad (7.21b)$$
For simplicity, the subscripts are omitted, and $e_1$, $e_2$ represent any two terms inside an absolute value in Eq. (7.14b). A is the result of the absolute value operation, B is an intermediate variable, and A and B are both binary variables.
To sum up, the objective function and all constraints are converted to linear form, and an MILP model is finally obtained. The problem is coded as a General Algebraic Modeling System (GAMS) model and solved with the MILP solver CPLEX. To compare the performance of the linear and nonlinear models, the nonlinear model is also coded as a GAMS model and solved with the MINLP solver BARON. The programs are run on a personal computer with an Intel Core i5 2.80 GHz CPU and 8 GB of RAM.
7.4 Case Study

The smart meter electricity trial data of 6435 consumers from the Commission for Energy Regulation (CER) in Ireland are used for the case study. The data were collected every 30 min, so T = 48 in this case. Before the new pricing schemes take effect, a flat rate $p_{t(0)}$ = 0.2 $/kWh for all t is adopted by the retailer. The National Institute of Economic and Industry Research estimated the mean long-run elasticity of demand for residential consumers as −0.37, and it could rise to −0.4 [26]. We set the demand elasticity to ε = −0.4 and obtain α = −1.5. Each ToU scheme has three segments, so M = 3. $D_{min}$ is set to 4 so that the minimum duration of each ToU block is 2 h. $\tilde{p}(p)$ and $\hat{p}(p)$ are approximated by J = 15 segments over the range p ∈ [0.04, 0.8], as shown in Fig. 7.3. ξ is set to 1.
7.4.2.1 Clustering
The data are clustered into 5–10 clusters by various methods, and the Davies–Bouldin index is used to choose the best result among them. The evaluation result is shown in Fig. 7.4. For hierarchical clustering, we use complete linkage (HIA-COMPLETE) and the Ward method (HIA-WARD) to perform agglomerative clustering. For k-means clustering, we use the sample method (KM-SAMPLE), uniform scattering (KM-UNIFORM), and k-means++ (KM-PLUS) to initialize the centroids. For fuzzy C-means clustering (FCM), we set the hyper-parameter m that controls how fuzzy the clusters will be to 1.1, 1.2, and 1.3, respectively. For the Gaussian mixture model, the expectation-maximization algorithm (GMEM) is used to perform the iterations, with initial points set by k-means++ (PLUS) or random scattering (RAND).
The optimal value over the varying cluster numbers is chosen such that the Davies–Bouldin index is minimized. The optimum is R = 6 clusters in total, and the corresponding method is HIA-COMPLETE. The corresponding load profiles of the six clusters are shown in Fig. 7.5.
Fig. 7.4 The Davies–Bouldin criterion values of different clustering methods and cluster numbers. The minimum is chosen as the final result

Figure 7.5 provides a glimpse of the detailed load patterns of each cluster: the load profiles in cluster 1 show a nine-to-five peak, while cluster 2 peaks only in the morning. The load in cluster 3 is evenly distributed over the whole day. Consumers in clusters 4 and 5 both prefer to consume in the evening, but cluster 4 consumes as much in the afternoon as in the evening. Consumers in cluster 6 regularly stay up late at night.
Figure 7.6 shows the designed ToU schemes for the different clusters obtained by solving the proposed MILP model, and Fig. 7.7 compares the loads under flat pricing and under the ToU pricing schemes. The retailer encourages consumers to reschedule their electricity consumption toward low-pool-price hours by lowering the ToU price during these hours for all clusters. Generally, low-pool-price periods are usually off-peak periods of the total load as well. In Fig. 7.7, consumers indeed respond to the off-peak retail price reduction, but for different clusters of consumers, how much the retail price falls and how long this block lasts may differ. The retailer needs to balance the consumers' utility decline when the retail price rises
Fig. 7.5 The optimal clustering result under the method HIA-COMPLETE (six panels, Cluster 1–Cluster 6; x-axis: time in 30-min intervals from 0:00 to 23:30; y-axis: processed load; the mean value of the processed load is marked in each panel)
and their utility increase when the retail price falls. The retailer will not unduly raise the retail price for fear of losing consumers, since consumers' utility may drop sharply at a high price, so the price for all clusters stays below 0.3 $/kWh. Some details showing that each consumer will be more satisfied with the pricing scheme designed for him than with any other one can be seen directly in Fig. 7.6. Take cluster 6 as an example. The lowest retail price is designed for cluster 6, whose original peak falls exactly within the low-pool-price periods. Cutting the price further during its peak hours is attractive for cluster 6, and the retailer can benefit from the increased consumption of cluster 6 even though the retail price falls during these hours.
7.4.2.3 Linearization
For the MILP model, it takes 125.6 s to find the optimal solution of $1186.01. The MINLP model converges to a solution of $1008.7 after exhausting the 2-h time budget. Linearization speeds up the solution of the problem, and the MILP model does not fall into a local optimum within the two-hour time limit. Linearization may introduce approximation errors, but increasing the number of linear segments can help close the gap.
Fig. 7.6 The optimal ToU pricing schemes of the six clusters
7.4.3.1 Elasticity ε
Figure 7.9 compares the total loads of these 6435 consumers under different elasticities ε. Peak shaving and valley filling are more notable when the elasticity is higher. Table 7.1 shows that if consumers have higher elasticity, they can be motivated to use more energy. Table 7.2 shows that if consumers have higher elasticity, the retailer can also make a larger profit.
The risk-weighting factor is set to different values to study the effect of CVaR, forward contracts, and the day-ahead market, and the result is shown in Fig. 7.10. As the retailer attaches more importance to risk, it tends to purchase more electricity through forward contracts rather than from the day-ahead market, because the retailer faces risk in the day-ahead market. When ξ = 100, the retailer considers nearly all of its cost as risk. The CVaR decreases as the risk-weighting factor rises because the CVaR becomes the decisive factor in the value of the revenue function as ξ rises.
Fig. 7.10 CVaR, the number of forward contracts to be signed/chosen and the quantity of power
to be purchased in day-ahead market under different risk-weighting factors
Different clustering methods are adopted to group the load profiles, and consumers in the same cluster have similar preferences. A statistical index, the Davies–Bouldin index, is used to choose the best clustering result. But how different clustering methods influence the performance of the whole model is worth discussing. Table 7.3 shows the performance of the different clustering methods when the number of clusters is kept at R = 6.
In Table 7.3, the first column is the retailer's retailing profit. The second column is the total consumers' welfare, calculated as the sum of the individual utility functions, sometimes referred to as classical utilitarian welfare [19]. The third column is the average retailing price. We represent all the consumers' preferences by the corresponding cluster centroids' preferences, but there may be some deviations between individuals
and the cluster centroid. If the 6435 consumers choose among these six pricing schemes by themselves, some members may not select the same pricing scheme as their cluster centroid does because of these deviations. We simulate the real situation with the following steps:
1. First, the utility gained from the six pricing schemes is calculated by Eq. (7.10c) and sorted in descending order for every consumer.
2. Second, since consumers act on the principle of utility maximization, the top-ranked scheme for every consumer is the consumer's first choice in the real market. The proportion of consumers whose first choices are exactly the same as their corresponding centroids' choices is the index First Choice.
3. Third, the second-ranked scheme for every consumer is the consumer's second choice in the real market. The proportion of consumers for whom either the first or the second choice is the same as their corresponding centroid's choice is the index Second Choice. Second Choice is calculated to allow a larger tolerance for differences between individuals and centroids.
According to Table 7.3, all the clustering methods increase both the retailing profit and the social welfare and decrease the average retailing price. By using this model, the retailer will get at least the same profit as under flat pricing, since flat pricing is a feasible solution in which the price of all ToU segments equals the flat price. HIA-COMP may not lead in retailing profit, but it is far ahead in satisfying consumers. Thus, a Pareto improvement is achieved compared with the original flat pricing scheme, and the result obtained with HIA-COMP is also a Pareto optimum among all the methods. This Pareto optimum coincides more with consumers' interests, which implies that if the retailer wants to increase its profit further, consumers are bound to be hurt. The high social welfare of HIA-COMP is due to its wider between-cluster separation. If the load profiles in different clusters are very similar, the clusters differ little from each other, so the retailer can simply keep consumer utility at a small marginal value near zero to maximize its profit. On the contrary, if wide between-cluster separation is achieved, the retailer must keep consumers' utility high to prevent consumers from abandoning their designated pricing schemes. Wider between-cluster separation brings greater social welfare.
First Choice and Second Choice focus more on within-cluster compactness. Denser within-cluster compactness implies low variance within each cluster, so the centroid can be a qualified representative of the members' preferences. It can be inferred from Table 7.3 that HIA-COMP has denser within-cluster compactness. The retailer needs compact clusters because it wants to predict its profit as accurately as possible, so HIA-COMP is the optimal choice.
Appendix I
If $\Delta p_t$ and $\Delta q_t$ are small enough compared with $p_t$ and $q_t$, the left side of (7.22) can be expanded using a first-order Taylor series as
$$\left( \frac{q_{n(0)} + \Delta q_n}{q_{n(0)}} \right)^{\alpha-1} = \left( 1 + \frac{\Delta q_n}{q_{n(0)}} \right)^{\alpha-1} \approx 1 + (\alpha - 1)\frac{\Delta q_n}{q_{n(0)}} \qquad (7.23)$$
$$1 + (\alpha - 1)\frac{\Delta q_n}{q_{n(0)}} \approx 1 + \frac{\Delta p_n}{p_{n(0)}} \qquad (7.24)$$
$$\frac{\Delta q_n / q_{n(0)}}{\Delta p_n / p_{n(0)}} \approx \frac{1}{\alpha - 1} \qquad (7.25)$$
Appendix II
$$\sum_{t=1}^{T} \mu\left(p_{r,t}, p_{k,t(0)}\right) \cdot q_{k,t(0)} \ge \sum_{t=1}^{T} \mu\left(p'_t, p_{k,t(0)}\right) \cdot q_{k,t(0)} \;\Rightarrow\; \mu(p_r, p_{k(0)}) \cdot q_{k(0)}^{T} \ge \mu(p', p_{k(0)}) \cdot q_{k(0)}^{T} \qquad (7.26)$$
where $\mu(p, p_{k(0)}) = \left[ \mu(p_1, p_{k,1(0)}), \ldots, \mu(p_T, p_{k,T(0)}) \right]$ represents the terms unrelated to $q_{t(0)}$ in the expression of $U(p)$, namely
$$\mu\left(p_t, p_{k,t(0)}\right) = \left( \frac{1}{\alpha} - 1 \right) \left( \left( \frac{p_t}{p_{k,t(0)}} \right)^{\frac{\alpha}{\alpha-1}} - 1 \right) \times p_{k,t(0)} \qquad (7.27)$$
$\mu(p, p_{k(0)})$ is a function of the old price scheme $p_{k(0)}$ and the new price scheme p. Equation (7.26) for consumers $k_1$ and $k_2$ is displayed as follows, respectively.
If $k_1$ and $k_2$ have similar preferences, they will choose the same pricing scheme most of the time, including the last time they chose among the various pricing schemes. So $q_{k_1(0)} = q_{k_2(0)}$ is satisfied, and the following should be true.
Aiming at finding consumers who have similar preferences through load profile clustering, it is important to find the relationship between two load profiles such that it is guaranteed that when Eq. (7.28a) is satisfied, Eq. (7.28b) is also satisfied. Thus, considering Eq. (7.29), $q_{k_1(0)}$ needs to vary proportionally with $q_{k_2(0)}$.
Equation (7.31) means that the shapes of the load profiles after being processed are the same.
References
1. Grid 2030 (2013). A national vision for electricity’s second 100 years. Technical report, United
States Department of Energy Office of Electric Transmission and Distribution.
2. Akhavan-Hejazi, H., & Mohsenian-Rad, H. (2018). Power systems big data analytics: An assessment of paradigm shift barriers and prospects. Energy Reports, 4, 91–100.
3. Elmachtoub, A., Gupta, V., & Hamilton, M. The value of personalized pricing. SSRN Electronic Journal, 1–46.
4. Yang, J., Zhao, J., Luo, F., Wen, F., & Yang Dong, Z. (2017). Decision-making for electricity
retailers: A brief survey. IEEE Transactions on Smart Grid, 9(5), 4140–4153.
5. Celebi, E., & David Fuller, J. (2007). A model for efficient consumer pricing schemes in
electricity markets. IEEE Transactions on Power Systems, 22(1), 60–67.
6. Zugno, M., Miguel Morales, J., Pinson, P., & Madsen, H. (2013). A bilevel model for electricity
retailers’ participation in a demand response market environment. Energy Economics, 36, 182–
197.
7. Wei, W., Liu, F., & Mei, S. (2014). Energy pricing and dispatch for smart grid retailers under
demand response and market price uncertainty. IEEE Transactions on Smart Grid, 6(3), 1364–
1374.
8. Song, M., & Amelin, M. (2016). Purchase bidding strategy for a retailer with flexible demands
in day-ahead electricity market. IEEE Transactions on Power Systems, 32(3), 1839–1850.
9. Ghamkhari, M., Sadeghi-Mobarakeh, A., & Mohsenian-Rad, H. (2017). Strategic bidding for
producers in nodal electricity markets: A convex relaxation approach. IEEE Transactions on
Power Systems, 32(3), 2324–2336.
10. Carrión, M., Arroyo, J. M., & Conejo, A. J. (2009). A bilevel stochastic programming approach
for retailer futures market trading. IEEE Transactions on Power Systems, 24(3), 1446–1456.
11. Carrion, M., Conejo, A. J., & Arroyo, J. M. (2007). Forward contracting and selling price
determination for a retailer. IEEE Transactions on Power Systems, 22(4), 2105–2114.
12. Nguyen, D. T., Nguyen, H. T., & Le, L. B. (2016). Dynamic pricing design for demand response
integration in power distribution networks. IEEE Transactions on Power Systems, 31(5), 3457–
3472.
13. Li, R., Wang, Z., Chenghong, G., Li, F., & Hao, W. (2016). A novel time-of-use tariff design
based on gaussian mixture model. Applied Energy, 162, 1530–1536.
14. Yang, J., Zhao, J., Wen, F., & Dong, Z. (2019). A model of customizing electricity retail prices
based on load profile clustering analysis. IEEE Transactions on Smart Grid, 10(3), 3374–3386.
15. Yang, J., Zhao, J., Wen, F., & Dong, Z. Y. (2018). A framework of customizing electricity retail
prices. IEEE Transactions on Power Systems, 33(3), 2415–2428.
16. Yang, P., Tang, G., & Nehorai, A. (2012). A game-theoretic approach for optimal time-of-use
electricity pricing. IEEE Transactions on Power Systems, 28(2), 884–892.
17. Chapman, A. C., Verbič, G., & Hill, D. J. (2016). Algorithmic and strategic aspects to integrating
demand-side aggregation and energy management methods. IEEE Transactions on Smart Grid,
7(6), 2748–2760.
18. Samadi, P., Mohsenian-Rad, H., Schober, R., & Wong, V. W. S. (2012). Advanced demand side
management for the future smart grid using mechanism design. IEEE Transactions on Smart
Grid, 3(3), 1170–1180.
19. Varian, H. R. (2010). Intermediate microeconomics: A modern approach (8th ed.). W.W. Norton
Co.
20. Saez-Gallego, J., Morales, J. M., Zugno, M., & Madsen, H. (2016). A data-driven bidding
model for a cluster of price-responsive consumers of electricity. IEEE Transactions on Power
Systems, 31(6), 5001–5011.
21. Ratliff, L. J., Dong, R., Ohlsson, H., & Sastry, S. S. (2014). Incentive design and utility learning
via energy disaggregation. IFAC Proceedings Volumes, 47(3), 3158 – 3163. 19th IFAC World
Congress.
22. Chiu, T.-C., Shih, Y.-Y., Pang, A.-C., & Pai, C.-W. (2016). Optimized day-ahead pricing with
renewable energy demand-side management for smart grids. IEEE Internet of Things Journal,
4(2), 374–383.
23. García-Bertrand, R. (2013). Sale prices setting tool for retailers. IEEE Transactions on Smart
Grid, 4(4), 2028–2035.
24. Wang, Y., Chen, Q., Kang, C., Zhang, M., Wang, K., & Zhao, Y. (2015). Load profiling and its application to demand response: A review. Tsinghua Science and Technology, 20(2), 117–129.
25. Imamoto, A., & Tang, B. (2008). A recursive descent algorithm for finding the optimal minimax
piecewise linear approximation of convex functions. In Advances in Electrical and Electronics
Engineering-IAENG Special Edition of the World Congress on Engineering and Computer
Science 2008 (pp. 287–293). IEEE.
26. Price elasticity of demand. Technical report, Australian Energy Regulator, 2005.
Chapter 8
Socio-demographic Information
Identification
Abstract This chapter investigates how socio-demographic characteristics of consumers can be inferred from fine-grained smart meter data. A deep convolutional neural network (CNN) first
automatically extracts features from massive load profiles. A support vector machine
(SVM) then identifies the characteristics of the consumers. Comprehensive compar-
isons with state-of-the-art and advanced machine learning techniques are conducted.
Case studies on an Irish dataset demonstrate the effectiveness of the proposed
deep CNN-based method, which achieves higher accuracy in identifying the socio-
demographic information about the consumers.
8.1 Introduction
Kavousian et al. [5] examine how climate, building characteristics, appliance stock, and occupants' behavior influence electricity consumption using the
factor analysis regression method. Jin et al. [6] link unusual consumption patterns
with consumers’ socio-demographic characteristics and generate descriptive and pre-
dictive models to identify subgroups of consumers. Tong et al. [7] define an energy
behavior correlation rate and an indicator dominance index to form a mapping rela-
tionship between different energy behavior groups of Irish people and their energy
behavior indicators using wavelet analysis and X-means clustering. Vercamer et al.
[8] address the issue of assigning new customers, for whom no advanced metering
infrastructure (AMI) readings are available, to one of these load profiles based on
spectral clustering, random forests, and stochastic boosting-based classification.
Other authors have worked on the identification of socio-demographic information
from load profiles. Beckel et al. [9] propose a household characteristic estimation
system called CLASS, where feature selection and classification are conducted, and
the accuracies of the majority of the household characteristic estimations are greater
than 70%. In [10] these authors extend the classification work to regression and
provide additional details on the consumption figures, ratios, temporal and statistical
properties based on feature extraction. Hopf et al. [11] describe an extended system
based on the CLASS tool [9], where a total of 88 features are designed, and a
combined feature selection method is proposed for classification. Viegas et al. [12]
use transparent fuzzy models to estimate the characteristics of consumers and extract
knowledge from the fuzzy model rules. Zhong et al. [13] combine discrete Fourier
transform (DFT) and a classification and regression tree (CART) to systematically
divide the consumers into different groups. Wang et al. [14] apply non-negative sparse
coding to extract partial usage patterns from load profiles and use SVM to identify
the types of consumers.
As this review of the literature shows, the existing methods for identifying socio-
demographic information about the consumers include three main stages: feature
extraction to form a feature set, feature selection, and classification or regression.
The majority of the works on feature extraction are implemented manually, such as
the calculation of consumption, ratios, statistics, and temporal characteristics from
load profiles. These manually extracted features may not effectively model the high
variability and nonlinearity of individual load profiles. This chapter proposes an
automatic feature extraction method based on deep learning techniques to learn
features from a different dataset in a flexible manner.
Deep learning is an emerging technique that has advanced considerably since
efficient optimization methods were proposed to train deep neural networks [15,
16]. Different types of deep neural networks have been proposed, including auto-
encoder, convolutional neural networks (CNNs), recurrent neural networks (RNNs),
restricted Boltzmann machine (RBM), and deep belief network (DBN) [17]. Net-
works that can effectively handle time series, such as deep RNN [18] and RBM
[19], have been proposed for load forecasting. Auto-encoders have been applied to
extract features from load profiles [20]. CNNs are an effective approach for gener-
ating useful and discriminative features from raw data and have broad applications
in image recognition, speech recognition, and natural language processing [21]. In
this chapter, a deep CNN is proposed to extract the highly nonlinear relationships
between electricity consumption at different hours and on different days and the
socio-demographic status of the consumer. To further improve the identification per-
formance, a support vector machine (SVM) is used to replace the softmax classifier to
identify the socio-demographic information of consumers based on the automatically
extracted features.
For the jth label, the classification model F j (si, j , w2, j ) needs to be trained, where
F j denotes the mapping relationship from smart meter data to the jth label and w2, j
denotes the trained optimal parameters for classification. Thus, for given {si, j , yi, j },
the jth label of the ith consumer can be estimated:
A smooth function, the categorical cross entropy, is used as the loss function to guide the training of the functions $G_j(c_i, w_{1,j})$ and $F_j(s_{i,j}, w_{2,j})$ when the total number of training samples is $K_j$:
$$L_j(w_{1,j}, w_{2,j}) = -\frac{1}{K_j} \sum_{i=1}^{K_j} \left[ y_{i,j} \log \hat{y}_{i,j} + (1 - y_{i,j}) \log(1 - \hat{y}_{i,j}) \right]. \qquad (8.3)$$
Since feature extraction and classification models are established for each label, the
subscript j will be omitted for simplicity.
For the socio-demographic information identification problem, three issues should
be addressed:
1. The determination of the feature extraction model G j to obtain the input data si, j .
2. The determination of the classification model F j to produce the estimated label
ŷi, j .
3. The determination of the training method to obtain w1, j and w2, j that achieves
the optimal classification performance.
8.3 Method
This section first introduces the rationale for applying a CNN for feature selection and
extraction rather than other machine learning techniques such as the least absolute
shrinkage and selection operator (LASSO), principal component analysis (PCA),
and sparse coding. Then, it describes how the CNN architecture is constructed for feature
extraction and classification. Finally, it proposes techniques to reduce overfitting and
train optimal parameters.
Figure 8.1 shows the daily load profiles of a consumer over a week. Although some
trends can be observed, such as higher consumption in the morning and at night and
nearly zero consumption at midnight, considerable uncertainty exists regarding when
and how much electricity is used. Time shifting is one of the main characteristics
of residential load profiles. The peaks highlighted in the three red circles in Fig. 8.1
show that this consumer uses comparable amounts of electricity during adjacent time
periods on different days. The load profiles are highly similar but slightly shifted.
In a CNN, the filter weights are uniform for different regions. Thus, the features
calculated in the convolutional layer are invariant to small shifts, which means that
relatively stable features can be obtained from varying load profiles.
Fig. 8.1 Smart meter data of one consumer over one week
Unlike the load profile for an entire power system, which is considerably more reg-
ular and has a relatively clear relationship with time and weather conditions, the
residential load profiles are affected not only by the weather conditions and the type
of day, but also by the socio-demographic status of the consumer, the house size, and
other factors. The correlations between electricity consumption and these factors are
highly nonlinear. Neural networks are able to model these highly nonlinear correla-
tions, particularly networks with multiple layers. A deep CNN can rely on multiple
convolutional and fully connected layers to learn the highly nonlinear relationships
between load profiles and the socio-demographic information.
The filters learned by the convolutional layers in a deep CNN can be visualized
according to the learned weights. This visualization can show how the original pro-
files are transformed into other forms at different layers. Furthermore, the load pro-
files that produce the largest activations of neurons can be extracted.
Figure 8.2 shows the proposed deep CNN architecture. It consists of eight layers: three convolutional layers, three pooling layers, one fully connected layer, and, as the last layer, an SVM layer.
Two factors are considered in determining the CNN network structure. The first
factor is the characteristics of consumer electricity consumption behaviors. Since the
load profiles are so variable, two CNN layers are applied to capture the hidden patterns
to identify the socio-demographic information. Moreover, since the dimensions of
the input data size are 7 × 24, which is quite small compared with what is used in
image recognition problems, only one pooling layer is used. The second factor is
the number of training samples. Since the number of samples is limited, to reduce
the risk of overfitting, the network structure cannot be too complex. Thus, to reduce
the parameters, the architecture consists of two convolutional layers, followed by a
max-pooling layer and a dropout layer. Finally, the fully connected layer performs
the final classification based on the flattened inputs from the previous layers.
The hyperparameters of the proposed deep CNN include the number of kernels,
the kernel size of the CNN layers, the pool size of the max-pooling layer, the ratio of
dropout, and the number of outputs of the last dense layer. These hyperparameters
are obtained by grid search and cross-validation. Table 8.1 summarizes the hyperpa-
rameters and the number of parameters of the proposed deep CNN. A total of 10808
parameters must be trained.
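As an illustrative sketch only (the exact kernel numbers, kernel sizes, and dense width of Table 8.1 are not reproduced here, so the values below are placeholders of the kind a grid search would tune), a Keras model with two convolutional layers, max-pooling, dropout, and a dense feature layer could look like:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_feature_extractor(input_shape=(7, 24, 1), dropout_rate=0.5):
    """Two conv layers + max-pooling + dropout + dense feature layer, mirroring
    the structure described in Sect. 8.3.2 (placeholder hyperparameters)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(dropout_rate),
        layers.Flatten(),
        layers.Dense(32, activation="relu", name="features"),
        layers.Dense(1, activation="sigmoid"),   # binary label, cross-entropy loss
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_feature_extractor()
model.summary()
```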
Section 8.3.2 provides the overall structure of the proposed network. This subsection
introduces how each layer works. Generally, for the lth layer with input xl , the learnt
weight and bias are Wl and bl , respectively. gl is the transformation function of the
layer. The learnt features can be expressed as gl (Wl , bl , xl ). In the following, the
exact expressions of gl for different types of layers are introduced.
8.3.3.1 Activation
For each layer, the information is transferred through the neurons, each of which applies an activation function. The activation of a neuron is a function from the input of the neuron to its output. Various activation functions have been designed, such
as gtanh (x) = tanh(x) and gsig (x) = (1 + e−x )−1 . Activation functions with satu-
rating nonlinearities can significantly slow training with gradient descent or even
block weight convergence, which is called vanishing gradient [22]. A non-saturating
activation function named rectified linear unit (ReLU) is used in the proposed deep
CNN [23], which has been shown to be several times faster than tanh in deep CNNs [24].
g ReLU (xl ) = max(0, xl ). (8.4)
Note that a sigmoid activation function is used in the last layer for the classification
tasks.
Convolutional layers are the main layers for feature extraction in a deep CNN. Each
convolutional layer has a certain number of feature filters. The number of filters in
the lth layer is Fl . The fl th feature filter has its own learnable parameters Wl, fl . Thus,
the convolution results obtained by the fl th filter can be expressed as follows:
Fl
gcon (xl, fl ) = xl, fl ∗ Wl, fl + bl, fl . (8.5)
fl =1
where ∗ is the convolution operation. Note that both xl, fl and bl, fl are matrices with
the same size of filter Wl, fl .
A dense layer is also called a fully connected layer. All the input features xl are
transmitted to the next layer by the weight Wl :
8.3.3.4 Pooling
indicates that max-pooling has better performance than average-pooling [25]. Thus,
max-pooling is used in the pooling layers:
The dropout layer randomly selects a fraction of inputs and sets them to 0. The
random selection is assumed to have a Bernoulli distribution with a probability p:
where rl is a matrix of the same size as the input xl and its elements are either 0 or 1
following a Bernoulli distribution. The dropout layer can be expressed as follows:
8.3.3.6 Classification
Traditionally, softmax is used for classification in the last layer. Softmax is also a
fully connected layer:
gsm (xl ) = Wl · xl + bl . (8.10)
Thus, the probability of the mth class can be calculated using (8.11), and the predicted
class is the class corresponding to the maximum probability.
$$P(y = m \mid x) = \frac{e^{x_m}}{\sum_{m=1}^{M} e^{x_m}}. \qquad (8.11)$$
Rather than applying softmax for classification, the proposed method uses an
SVM to predict the class based on the learned features:
where sgn(·) is the sign function, which maps negative values to −1 and positive
values to 1. The parameter Wl in the SVM layer is formulated as an optimization
problem:
$$\min \;\; \lambda \left\| W_{svm} \right\| + \frac{1}{K_i} \sum_{i=1}^{K_i} \max\left(0,\; 1 - y_i \left( W_{svm} \cdot x_{l,i} + b_l \right)\right). \qquad (8.13)$$
where $\|\cdot\|$ denotes the 2-norm, and λ determines the trade-off between increasing the margin size and ensuring that each sample lies on the correct side of the margin [26].
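One possible way to realize this last stage (a sketch under assumed placeholder data and architecture, not the authors' code) is to take the activations of the last dense layer as features and train a linear SVM with hinge loss on them, e.g. with scikit-learn:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.svm import LinearSVC

# A small CNN feature extractor (placeholder architecture); in the chapter the
# features come from the last dense layer of the trained deep CNN.
cnn = keras.Sequential([
    keras.Input(shape=(7, 24, 1)),
    layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation="relu", name="features"),
    layers.Dense(1, activation="sigmoid"),
])
feature_model = keras.Model(cnn.input, cnn.get_layer("features").output)

X = np.random.rand(1000, 7, 24, 1)     # placeholder weekly load profiles
y = np.random.randint(0, 2, 1000)      # placeholder binary labels

features = feature_model.predict(X, verbose=0)

# Linear SVM with hinge loss; C plays the role of the margin trade-off in Eq. (8.13).
svm = LinearSVC(C=1.0, loss="hinge", max_iter=10000)
svm.fit(features, y)
print(svm.score(features, y))
```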
Although a deep network with a large number of parameters is very powerful for
feature extraction and classification, it can easily become over-fitted. In this case, the
number of parameters to be trained is 10808. Changes are made in the inputs, in the
model, and in the training method to reduce overfitting of the deep CNN.
Increasing the sample size is an effective way to reduce overfitting. Various data
augmentation techniques, including noise injection, horizontal reflection, and ran-
dom sampling, have been applied in CNN-based image classification to enlarge the
size of the input. For the socio-demographic information identification problem, we use one week of smart meter data as one sample associated with each piece of socio-demographic information of the consumer. Even though the electricity consumption behavior of individual consumers can be affected by their socio-demographic status, the weather conditions, and even their mood, a previous study shows that each weekly load profile can more or less reveal the socio-demographic information of the consumer [10]. Thus, data augmentation consists simply of using the smart meter data of other weeks as additional training data.
If the dataset contains Q weeks of smart meter data, then the training dataset can be
enlarged Q times.
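The augmentation can be as simple as slicing each consumer's history into weekly 7 × 24 samples that all carry the same label; a sketch, assuming the half-hourly readings have already been aggregated to hourly values:

```python
import numpy as np

def weekly_samples(hourly_series, label):
    """Split one consumer's hourly load series into 7x24 weekly samples,
    each labelled with the same socio-demographic answer."""
    n_weeks = len(hourly_series) // (7 * 24)
    X = np.asarray(hourly_series[: n_weeks * 7 * 24]).reshape(n_weeks, 7, 24)
    y = np.full(n_weeks, label)
    return X, y

# Toy example: 75 weeks of hourly readings for one consumer, label "has children"
series = np.random.rand(75 * 7 * 24)
X, y = weekly_samples(series, label=1)
print(X.shape, y.shape)   # (75, 7, 24) (75,)
```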
8.3.4.2 Dropout
Establishing a model with good generalization is important for the proposed deep
CNN. Dropping units randomly from the neural network during training can prevent
units from co-adapting too much [27] and make a neuron not rely on the presence of
other specific neurons. Dropout is quite similar to the ensemble method by varying
the hyperparameters to obtain a less correlated model at each epoch.
Applying an appropriate training method is also useful for reducing overfitting. The
weight decay term in (8.14) is essentially a regularizer that adds a penalty for weight
update at each iteration. Regularization in stochastic gradient descent (SGD) reduces
the risk of overfitting.
The deep CNN model is trained using stochastic gradient descent with given batch
size B, learning rate r , weight decay d, and momentum m. Iterations are implemented
as follows [28]:
$$v_{t+1} = m \cdot v_t - d \cdot r \cdot W_t - r \cdot \left. \frac{\partial L}{\partial W} \right|_{W_t, B_t}. \qquad (8.14)$$
where $v_t$ denotes the change in the weights at the tth iteration, $W_t$ denotes the learnt weights at the tth iteration, $m \cdot v_t$ smooths the direction of gradient descent and accelerates the training process, $d \cdot r \cdot W_t$ reduces the risk of overfitting, and $r \cdot \partial L / \partial W |_{W_t, B_t}$ denotes the average value of the partial derivative of the loss function with respect to the weights over the tth batch of data $B_t$.
The weights in each layer are initialized by random sampling from a normal
distribution with a mean of zero and a standard deviation of 0.01. Biases of all the
neurons are initialized at a value of 1 to accelerate the early stage of learning because
the inputs of ReLUs are positive in this case.
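A literal numerical transcription of the update rule (8.14), followed by the weight update W ← W + v that the surrounding text implies (the hyperparameter values below are toy values, not those used in the case study):

```python
import numpy as np

def sgd_step(W, v, grad, r=0.01, m=0.9, d=0.0005):
    """One iteration of Eq. (8.14): momentum, weight decay, and the batch
    gradient; the subsequent update W <- W + v is implied by the text."""
    v_next = m * v - d * r * W - r * grad
    W_next = W + v_next
    return W_next, v_next

# Toy example on a single weight matrix
rng = np.random.default_rng(1)
W = rng.normal(0.0, 0.01, size=(5, 5))   # initialization described in the text
v = np.zeros_like(W)
grad = rng.normal(size=(5, 5))           # stand-in for dL/dW on one batch
W, v = sgd_step(W, v, grad)
print(W)
```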
8.4 Performance Evaluation and Comparisons

This section discusses several evaluation criteria used to quantify the performance of the proposed method. Other methods proposed in the literature are also tested for comparison.
$$\text{Accuracy} = \frac{\sum_{m=1}^{M} C_{m,m}}{\sum_{m=1}^{M} \sum_{n=1}^{M} C_{m,n}}. \qquad (8.16)$$
where Pr and Re denote precision and recall, respectively, and are calculated as follows:
$$Pr = TP/(TP + FP), \qquad Re = TP/(TP + FN). \qquad (8.18)$$
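For concreteness, a small sketch computing accuracy, precision, recall, and the standard F1 score from a binary confusion matrix, matching Eqs. (8.16) and (8.18):

```python
import numpy as np

def binary_metrics(confusion):
    """confusion[m, n] = number of samples of true class m predicted as class n,
    with class 1 taken as the positive class."""
    tp = confusion[1, 1]
    fp = confusion[0, 1]          # predicted positive, actually negative
    fn = confusion[1, 0]          # predicted negative, actually positive
    accuracy = np.trace(confusion) / confusion.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

C = np.array([[850, 150],
              [200, 800]])
print(binary_metrics(C))
```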
The seven methods that are compared with the method proposed in this chapter are
briefly introduced in the following paragraphs.
Since we have prior knowledge of the proportions of different classes in the training
dataset, we can identify the socio-demographic information of consumers as the class
with the largest proportions. The accuracy of this BG strategy is larger than that of
a random guess and can be expressed as follows [10]:
$$\text{Accuracy}_{BG} = \sum_{m=1}^{M} \left( \frac{I_m}{I} \right)^2. \qquad (8.19)$$
where Im and I denote the number of samples of class m and the total number of
samples. The accuracy of BG is used as a naive benchmark for other methods of
identifying socio-demographic information.
Beckel et al. [10] proposed a consumer characteristic identification system where the
majority of the features are extracted manually. The accuracies reported in [10] are
compared in the case studies.
8.4.2.3 SVM
The linear model with an L 1 regularizer penalty can obtain sparse solutions, i.e.,
part of the coefficients corresponding to electricity consumptions at different time
periods are set to zero. A linear SVM combined with an L 1 regularizer is first used
for feature selection and to retain non-zero coefficients.
Sparse coding is a compressive sensing technique to map the original data into a
higher-dimensional space, which is quite different from PCA. The basic idea of
sparse coding is to generate redundant vectors such that the original data can be
represented in terms of a linear combination of a limited number of vectors [14].
The coefficients learned by sparse coding are then fed into the SVM for socio-
demographic information identification.
The variant that uses softmax in (8.11) rather than an SVM in the last layer of the proposed deep CNN is also compared with the proposed method.
8.5 Case Study

In this section, the case studies are implemented using Python 2.7.13 on a standard PC with an Intel Core i7-4770MQ CPU running at 2.40 GHz and with 8.0 GB of RAM. The deep CNN architecture is constructed based on TensorFlow [30], and the interface between the CNN and the SVM is programmed using scikit-learn [31] and Keras [32].
The dataset used in this section was provided by the Commission for Energy Regula-
tion (CER), which is the regulator for the electricity and natural gas sectors in Ireland
[33]. This dataset contains the smart meter data of 4232 residential consumers over
536 days at an interval of 30 min. Among the 536 days of smart meter data, the first
75 weeks (525 days) data were chosen to train, validate, and test the proposed deep
CNN. More specifically, the consumers are first listed in increasing order according
to the ID of the consumers. Then, the smart meter data of the first 80% consumers
are used to train and validate the CNN model; the smart meter data of the rest 20%
consumer are used to test the model. If there are null values or continuous zero values,
the data for these weeks are removed. A total of 300,138 weeks of smart meter data
are used. The training data is thus approximately 28 times the number of parameters
to be estimated, which reduces the risk of overfitting.
The Irish dataset also contains two survey datasets (pre-trial and post-trial sur-
veys) [33] which contain socio-demographic information about the consumers and
are used as labels in the supervised learning task. For a fair comparison with the
existing method, we identify the ten survey questions (socio-demographic informa-
tion) in this section that are also investigated in the existing literature. Table 8.3 lists
the socio-demographic information to be identified. To help readers easily find the
corresponding survey questions, the question numbers in the survey are also pro-
vided in the second column. These questions cover information of the occupants of
the house, the house itself, and the domestic appliances.
Figure 8.3 shows the accuracies and F1 scores of different socio-demographic infor-
mation. Among these ten questions, the accuracies of #2 (chief income earner has
retired or not), #4 (have children or not), and #8 (cooking facility type) are higher
than 75%; the accuracies of #7 (number of bedrooms) and #9 (energy-efficient light
bulb proportion) are lower than 60%; and the accuracies of the remaining questions
are between 60 and 75%. Note that the numbers of consumers giving the two answers to #9 (up to half / three quarters or more) are 2041 and 2149, respectively. Although the accuracy
of #9 is lower, its F1 score is not the lowest. Clearly, whether having children or not
has a great influence on the daily life of consumers and significantly affects the load
profiles. The type of cooking appliance and light bulbs directly determine electricity
consumption. Thus, it is rather easy to identify these two factors from smart meter
data. Compared with other information, the number of bedrooms has a weak rela-
tionship with the behavior of electricity consumption. The average accuracy and F1
score of these questions are 67.3% and 0.622, respectively. Three techniques, namely data augmentation, dropout, and weight decay, are used to reduce the overfitting risk. We also
conduct numerical experiments without the dropout layer and without weight decay. The average accuracies of the methods without the dropout layer and without weight decay are 61.2% and 64.3%, respectively; the average F1 scores are 0.597 and 0.608, respectively. It can be seen that the network with the dropout layer and weight decay has better performance, which verifies the effectiveness of these two techniques for reducing overfitting.
Tables 8.4 and 8.5 compare the accuracies and F1 scores of the proposed and com-
peting methods. The column Improvement 1 shows the relative improvements of CS
(traditional CNN) compared with the best performer among the other six competing
methods. The column Improvement 2 shows the relative improvements of the pro-
posed CNN+SVM method compared with the CS method. Improvement 3 shows
the relative improvements of the average performance of MF, SVM, LS, PS, SS, CS,
and the proposed method compared with BG method. The accuracies of the BG and
MF methods are provided in [10], the F1 scores of which are not provided. Note that
the accuracies of the MF method are obtained by running the classification over the
entire 75 weeks and majority voting.
It is clear that all classification models outperform the BG method. If the electricity
consumption of residents has no relationship with one specific socio-demographic
information, the feature extraction and classification process may not be able to
improve the identification accuracy compared with BG method. In other words,
the improvements of feature extraction and classification methods over BG method
(Improvement 3 in Table 8.4) can more or less indicate how much the socio-
demographic information can affect the electricity consumption behaviors of the
consumers. The improvement of #9 is the lowest, which means that the energy-
efficient light bulb proportion has very little influence on electricity consumption
behavior.
The Lasso-based SVM (LS) performs slightly better than, but is very comparable to, the plain SVM in terms of average accuracy and F1 score. This result means
that the Lasso-based feature extraction method has a very small effect on the per-
formance of the SVM classifier. However, after PCA or sparse coding-based feature
extraction, the classifiers have better performance than the Lasso-based method or
simple SVM. CNN-based deep learning network (CS) has distinct advantages over
SVM, LS, PS, and SS, which means that the proposed method can extract highly
nonlinear relationships hidden in these massive load profiles. By replacing softmax
with SVM, the performance can be further improved in terms of both accuracy and
F1 score.
8.6 Conclusions
This chapter proposes a CNN-based deep learning method for identifying consumer
socio-demographic information. CNN can take into consideration the correlations
between different hours of the day and different days. The proposed method auto-
matically extracts the hidden usage patterns from massive and varying smart meter
data to improve the accuracy of socio-demographic information identification. Case
studies on an Irish dataset show the superiority of the CNN over other feature extraction methods.
References
1. Keerthisinghe, C., Verbič, G., & Chapman, A. C. (2016). A fast technique for smart home
management: ADP with temporal difference learning. IEEE Transactions on Smart Grid, 9(4),
3291–3303.
2. Sun, Siyang, Yang, Qiang, & Yan, Wenjun. (2017). Optimal temporal-spatial PEV charging scheduling in active power distribution networks. Protection and Control of Modern Power Systems, 2(1), 34.
3. McLoughlin, Fintan, Duffy, Aidan, & Conlon, Michael. (2012). Characterising domestic electricity consumption patterns by dwelling and occupant socio-economic variables: An Irish case study. Energy and Buildings, 48, 240–248.
4. McLoughlin, Fintan, Duffy, Aidan, & Conlon, Michael. (2015). A clustering approach to
domestic electricity load profile characterisation using smart metering data. Applied Energy,
141, 190–199.
5. Kavousian, Amir, Rajagopal, Ram, & Fischer, Martin. (2013). Determinants of residential
electricity consumption: Using smart meter data to examine the effect of climate, building
characteristics, appliance stock, and occupants’ behavior. Energy, 55, 184–194.
6. Jin, Nanlin, Flach, Peter, Wilcox, Tom, Sellman, Royston, Thumim, Joshua, & Knobbe, Arno.
(2014). Subgroup discovery in smart electricity meter data. IEEE Transactions on Industrial
Informatics, 10(2), 1327–1336.
7. Tong, Xing, Li, Ran, Li, Furong, & Kang, Chongqing. (2016). Cross-domain feature selection
and coding for household energy behavior. Energy, 107, 9–16.
8. Vercamer, Dauwe, Steurtewagen, Bram, Van den Poel, Dirk, & Vermeulen, Frank. (2015).
Predicting consumer load profiles using commercial and open data. IEEE Transactions on
Power Systems, 31(5), 3693–3701.
9. Beckel, C., Sadamori, L., & Santini, S. (2013). Automatic socio-economic classification of
households using electricity consumption data. In Proceedings of the Fourth International
Conference on Future Energy Systems (pp. 75–86). ACM.
10. Beckel, Christian, Sadamori, Leyna, Staake, Thorsten, & Santini, Silvia. (2014). Revealing
household characteristics from smart meter data. Energy, 78, 397–410.
11. Hopf, Konstantin, Sodenkamp, Mariya, Kozlovkiy, Ilya, & Staake, Thorsten. (2016). Feature
extraction and filtering for household classification based on smart electricity meter data. Com-
puter Science-Research and Development, 31(3), 141–148.
12. Viegas, J. L., Vieira, S. M., & Sousa, J. M. C. (2016). Mining consumer characteristics from
smart metering data through fuzzy modelling. In International Conference on Information Pro-
cessing and Management of Uncertainty in Knowledge-Based Systems (pp. 562–573) Springer.
13. Zhong, Shiyin, & Tam, Kwa-Sur. (2015). Hierarchical classification of load profiles based
on their characteristic attributes in frequency domain. IEEE Transactions on Power Systems,
30(5), 2434–2441.
14. Wang, Yi, Chen, Qixin, Kang, Chongqing, Xia, Qing, & Luo, Min. (2016). Sparse and redundant
representation-based smart meter data compression and pattern extraction. IEEE Transactions
on Power Systems, 32(3), 2142–2151.
15. LeCun, Yann, Bengio, Yoshua, & Hinton, Geoffrey. (2015). Deep learning. Nature, 521(7553),
436–444.
16. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural
networks. Science, 313(5786), 504–507.
17. Schmidhuber, Jürgen. (2015). Deep learning in neural networks: An overview. Neural Net-
works, 61, 85–117.
18. Shi, H., Minghao, X., & Li, R. (2017). Deep learning for household load forecasting—a novel
pooling deep rnn. IEEE Transactions on Smart Grid, 9(5), 5271–5280.
19. Mocanu, E., Nguyen, P. H., Gibescu, M., & Kling, W. L. (2016). Deep learning for estimating
building energy consumption. Sustainable Energy, Grids and Networks, 6, 91–99.
20. Varga, E. D., Beretka, S. F., Noce, C., & Sapienza, G. (2015). Robust real-time load profile
encoding and classification framework for efficient power systems operation. IEEE Transac-
tions on Power Systems, 30(4), 1897–1904.
21. Sharif Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 806–813).
22. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1026–1034).
23. Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807–814).
24. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
25. Boureau, Y. L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pooling in
visual recognition. In Proceedings of the 27th International Conference on Machine Learning
(ICML-10) (pp. 111–118).
26. Furey, T. S., Cristianini, N., Duffy, N., Bednarski, D. W., Schummer, M., & Haussler, D. (2000).
Support vector machine classification and validation of cancer tissue samples using microarray
expression data. Bioinformatics, 16(10), 906–914.
27. Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, & Salakhutdinov, Rus-
lan. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal
of Machine Learning Research, 15(1), 1929–1958.
28. Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures.
In Neural Networks: Tricks of the Trade (pp. 437–478) Springer.
29. Wold, Svante, Esbensen, Kim, & Geladi, Paul. (1987). Principal component analysis. Chemo-
metrics and Intelligent Laboratory Systems, 2(1–3), 37–52.
30. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M. et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.
31. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825–2830.
32. Chollet, F. et al. (2015). Keras.
33. Irish Social Science Data Archive. (2012). Commission for Energy Regulation (CER) Smart
Metering Project. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.
Chapter 9
Coding for Household Energy Behavior
Abstract Household energy behavior is a key factor that dictates energy consump-
tion, efficiency, and conservation. In the past, household energy behavior was typi-
cally unknown because conventional meters only recorded the total amount of energy
consumed by a household over a significant period of time. The rollout of smart
meters enables real-time household energy consumption data to be recorded and
analyzed. This chapter uses smart meter readings from more than 5000 Irish house-
holds to identify energy behavior indicators through a cross-domain feature selection
and coding approach. The idea is to extract and connect customers’ features from
the energy domain and demography domain, i.e., smart meter data and household
information. Smart meter data are characterized by typical energy spectral patterns,
whereas household information is encoded as the energy behavior indicator. The
results show that employment status and internet usage are highly correlated with
household energy behavior in Ireland because employment status and internet usage
have an important effect on lifestyle, including when to work, play, and rest, and
hence yield a difference in electricity use style. The proposed approach offers a
simple, transparent and effective alternative to a challenging cross-domain matching
problem with massive smart meter data and energy behavior indicators.
9.1 Introduction
With the development of household-level low carbon technologies such as PVs (pho-
tovoltaics), EVs (electric vehicles) and HPs (heat pumps), households have taken a
more active role in the energy system. Understanding their energy behaviors would
be beneficial to decreasing energy loss, improving system efficiency, and enhancing
sustainable energy integration. In the past, little information on household energy has
been collected because conventional meters only record the total amount of energy
consumed for a household over a significant period of time. Smart meters record the
consumption of electric energy in intervals of an hour or less and communicate that
information back to the utility for monitoring and billing.
The recent rollout of smart meters has brought opportunities to provide insight into
household energy behavior because energy behavior dictates the shape and magnitude
of the electrical load, which can be captured by smart meter data. With increasingly
extensive and massive smart meter data, the energy industry has embraced big data
[1], in which a series of data-mining-related methods in different aspects, such as clas-
sification [2–4], regression [5–7], and clustering [8–10], are applied to mine and con-
nect data in the energy domain. Based on these methods, several smart meter mining
applications have emerged, including load profiling [11–13], customer segmentation
[14–16], load forecasting [17], and NILM (non-intrusive load monitoring) [18].
Reference [19] proposes a load prediction method for industrial and commercial
consumers based on socioeconomic factors. The socioeconomic factors considered
include the population, crime rate, building size, available area, turnover rate, number of employees, etc. The basic idea is to obtain consumers' typical load profiles using a spectral clustering method, and then to establish an adaptive stochastic boosting and random forest classification model between the socioeconomic factors and the consumers' typical load profiles. The method has been applied to more than 6000 Belgian industrial and commercial consumers, and the results show high prediction accuracy. Apart from this work, few studies have linked data in the demography
domain with smart meter data. Smart meter data represent human energy behavior,
which is affected by data in the demography domain. The idea is to extract and
connect human features from the energy domain to the demography domain, i.e.,
smart meter data and household information. Smart meter data are characterized by
typical energy spectral patterns representing energy behavior, whereas household
information is encoded as an energy behavior indicator.
Motivated by this idea, this chapter proposes a cross-domain feature selection
and coding method for household energy behavior. Human-feature-related data in
the demography domain is typically recorded in a questionnaire composed of a series
of specially designed questions, such as age, occupancy, employment status, income,
and energy usage habits. Each question can have a range of answers which can be
designated as A, B, C, D, etc. In this method, the questionnaire answers constitute a
label sequence, and the subset of labels that has the most significant effect on energy
behavior represented by the typical energy spectral pattern is identified as energy
behavior indicators, making it possible to construct a connection between data in the
energy domain and the demography domain. Through this approach, energy patterns
and energy behavior indicators are connected, providing a deep understanding of why
people behave differently, illustrating the underlying factors that are responsible for
differing human energy behavior, and showing how energy behavior might change
in the future if people’s status changes.
load profiles first. On this basis, for each class of consumers with similar load profile shapes, their common socioeconomic factors can be identified. These identified factors are viewed as influential socioeconomic factors and are defined as "socioeconomic genes". Then, for different types of consumers, the differences in their "socioeconomic genes" are analyzed, and the "socioeconomic genes" with significant differences among classes are identified as the socioeconomic factors that affect consumers' electricity consumption behavior; these are defined as "dominant socioeconomic genes". Finally, the dominant socioeconomic genes are used as the input of the consumer load profile predictor to realize the prediction.
For consumer typical load profile extraction, a two-stage clustering approach based on the Gaussian Mixture Model (GMM) [20] and X-means [21] is proposed. Load profiles differ greatly across consumers, and even the daily load profiles of a single consumer vary considerably from day to day. It is therefore necessary to first extract a typical load profile for each consumer to characterize that consumer's electricity consumption behavior. Clustering can then be conducted over consumers based on their typical load profiles.
In the process of typical load profile extraction for each consumer, the GMM
clustering method is directly applied to extract several types of load profiles. Then,
the typical load curves of multiple consumers are clustered by the X-means clustering method. The overall steps are described in Fig. 9.2, and details are provided in the following subsections.
A simple and efficient method for consumer typical load profile extraction is proposed
here, which can be divided into two steps: piecewise average approximation (PAA)
and GMM clustering.
(1) Piecewise Average Approximation
A fixed window width w is used to divide the original load profile into several segments, and each segment is approximated by the average load within the window. In this way, the fluctuation of the original load profile can be reduced. Denoting the load of consumer n at time i by P_{n,i}, the PAA representation is

$$\bar{P}_{n,1} = \frac{1}{w}\sum_{i=1}^{w} P_{n,i}, \quad \bar{P}_{n,2} = \frac{1}{w}\sum_{i=1}^{w} P_{n,w+i}, \quad \ldots, \quad \bar{P}_{n,T/w} = \frac{1}{w}\sum_{i=1}^{w} P_{n,T-w+i} \qquad (9.1)$$
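The GMM then represents the probability density of a consumer's load profiles as a weighted mixture of K component distributions; in its standard form (assumed here to correspond to (9.2), matching the symbols defined below), this is

$$f(P_n; \Theta) = \sum_{k=1}^{K} \lambda_k f_k(P_n; \theta_k) \qquad (9.2)$$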
where λk denotes the weight of the finite distribution fk; Θ denotes the set of all parameters of the GMM; θk denotes the parameters of the kth PDF; and λk fk(Pn; θk) denotes the weighted PDF. Since the integral of the mixed probability density distribution equals 1, the sum of the weights should also equal 1:
$$\sum_{k=1}^{K} \lambda_k = 1 \qquad (9.3)$$
The GMM can be solved by the Expectation Maximization (EM) algorithm given the historical load profiles Pn, the finite distributions fk, and the number of distributions K. Then, based on the weighted probability densities, the posterior probability that Pn belongs to each finite distribution fk can be obtained by Bayes' theorem.
We then choose the class k with the greatest posterior probability as the classification of Pn, and define the term (1 − posterior probability that Pn belongs to class k) as the risk. The optimal number of distributions K can be searched by gradually increasing K from K = 1; the minimum value of K for which the total risk over all historical days is less than the threshold β is selected as the optimal value.
For each class of historical load profiles, the load profile with the maximum posterior probability is selected as the typical load profile of that class. On this basis, the typical load profile covering the most historical days is taken as the typical load profile of the consumer.
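A minimal Python sketch of this two-step extraction is given below, using scikit-learn's GaussianMixture; the window width, the maximum number of components, and the risk threshold are illustrative assumptions, and the daily profiles are assumed to have a length divisible by the window width.

```python
# Sketch of typical load profile extraction for one consumer: PAA followed by
# GMM clustering with the total-risk criterion for choosing K.
import numpy as np
from sklearn.mixture import GaussianMixture

def paa(profiles, w):
    """Average each daily profile over consecutive windows of width w (T % w == 0 assumed)."""
    n_days, T = profiles.shape
    return profiles.reshape(n_days, T // w, w).mean(axis=2)

def typical_profile(daily_profiles, w=4, max_k=6, beta=5.0):
    """Return the most representative daily profile of this consumer."""
    X = paa(daily_profiles, w)
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        post = gmm.predict_proba(X)          # posterior of each day for each component
        risk = 1.0 - post.max(axis=1)        # risk = 1 - largest posterior, per day
        if risk.sum() < beta:                # smallest K whose total risk is acceptable
            break
    labels = post.argmax(axis=1)
    main = np.bincount(labels).argmax()      # class covering the most historical days
    members = np.where(labels == main)[0]
    best_day = members[post[members, main].argmax()]  # day with maximum posterior
    return daily_profiles[best_day]
```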
(1) For the given k clusters, the cluster center and the corresponding BIC are deter-
mined.
(2) Some existing classes are split into two clusters by a k-means clustering method
(k = 2), where whether the cluster should be split depends on whether it can
help increase the value of BIC.
(3) Repeat the above steps until the number of clusters k reaches the pre-set maximum value, and select the k that corresponds to the optimal value of BIC as the final number of clusters; a simplified sketch of this BIC-guided search is given below.
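The following sketch is a simplified stand-in for X-means: instead of splitting clusters locally, it searches the number of clusters globally and keeps the k with the best BIC. The spherical-Gaussian BIC approximation and the parameter settings are assumptions.

```python
# Simplified BIC-guided search for the number of clusters (a stand-in for X-means).
import numpy as np
from sklearn.cluster import KMeans

def kmeans_bic(X, labels, centers):
    """BIC of the spherical-Gaussian model induced by a k-means partition."""
    n, d = X.shape
    k = len(centers)
    sse = sum(np.sum((X[labels == j] - centers[j]) ** 2) for j in range(k))
    var = max(sse / max(n - k, 1), 1e-12)    # pooled spherical variance estimate
    log_lik = -0.5 * n * d * np.log(2 * np.pi * var) - 0.5 * (n - k) * d
    return log_lik - 0.5 * k * (d + 1) * np.log(n)   # penalize the free parameters

def select_k_by_bic(X, k_max=10):
    best = (1, -np.inf, None)
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10).fit(X)
        score = kmeans_bic(X, km.labels_, km.cluster_centers_)
        if score > best[1]:
            best = (k, score, km.labels_)
    return best[0], best[2]                  # chosen number of clusters and labels
```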
Fig. 9.3 The classification of the Irish people’s household information released by SEAI
In the Irish smart meter trial data released by Electric Ireland and SEAI (Sustain-
able Energy Authority of Ireland) [22], the questionnaire recording Irish people’s
household information comprises 144 questions, and these questions can be divided
into four major categories and 12 minor categories, as shown in Fig. 9.3. The four major categories of questions address social information, lifestyle, electrical appliances, and opinions about energy usage. These are further divided into 12 minor categories: sex & age, income, occupation, house, people living with, internet usage, heater, freezer, other appliances, expectations, determination, and satisfaction.
To illustrate the questionnaire information clearly, the most representative ques-
tion in each category is listed in Table 9.1. For example, in the sex & age category,
the representative question is “please record sex from voice”, with the answer A
corresponding to Male, and B corresponding to Female. In the occupation category,
according to the NRS (National Readership Survey) social grades system [23], the
customers are classified into six grades, which are AB, C1, C2, DE, F, and refused. These grades are represented by answers A to F, respectively. Note that there are
12 categories of questions, and for each category, one representative question and
its corresponding answers are shown in this table. Some answer options, such as
options C, D, E, and F in the sex & age category, are blank, indicating that there is
no corresponding answer to the options.
Table 9.1 Representative questions in the behavior questionnaire released by SEAI

Sex & age. "Please record sex from voice." A: Male; B: Female.
Income. "Can you state which of the following broad categories best represents the yearly household income BEFORE TAX?" A: Less than 15,000 Euros; B: 15,000–30,000 Euros; C: 30,000–50,000 Euros; D: 50,000–75,000 Euros; E: 75,000 Euros or more; F: Refused.
Occupation. "SOCIAL CLASS. Interviewer: Respondent said that occupation of chief income earner was <CLASS>. Please code." A: AB; B: C1; C: C2; D: DE; E: F; F: Refused.
House. "Do you own or rent your home?" A: Rent (from a private landlord); B: Rent (from a local authority); C: Own outright (not mortgaged); D: Own with mortgage etc.; E: Other.
People living with. "What best describes the people you live with?" A: I live alone; B: All people in my home are over 15 years of age; C: Both adults and children under 15 years of age live in my home.
Internet usage. "Do you use the internet regularly yourself?" A: Yes; B: No.
Heater. "Do you have a timer to control when your heating comes on and goes off?" A: Yes; B: No.
Freezer. "Have any of standalone freezers ever applied to you?" A: Yes; B: No.
Non-heating appliances. "Number of washing machines." A: None; B: 1; C: 2; D: More than 2.
Expectations. "I would now like to ask you about your expectations about: Learn how to reduce my energy usage." A: Yes; B: No.
Determination. "My household may decide to make major changes to the way we use electricity." A: Strongly agree; B: Agree; C: Neutral; D: Disagree; E: Strongly disagree.
Satisfaction. "The overall cost of electricity." A: Very Satisfied; B: Satisfied; C: Neutral; D: Disagree; E: Very Dissatisfied.
For each consumer, the answer labels of all the questions in the questionnaire can
be arranged in the order of questions to form a label sequence, which represents
the socioeconomic characteristics of residents, only a part of which has an impact
on residents’ electricity consumption behavior. Coincidentally, hereditary molecular
DNA is composed of four bases, A, T, C, and G, arranged in an orderly manner.
Most of the base sequences in DNA are not expressed; only those that are expressed and affect biological traits are called genes. Inspired by the similarity between DNA carrying biological traits and the socioeconomic label sequences affecting consumers' electricity consumption behavior, the following concepts are defined by analogy:
Socioeconomic DNA: An orderly sequence of labels representing users’ socioe-
conomic information. Labels at different positions represent socioeconomic characteristics of different aspects, whereas different labels at the same position represent different socioeconomic characteristics of the same aspect.
Socioeconomic genes: labels that have a dominant impact on the consumer’s
electricity consumption profiles.
Socioeconomic gene loci: The positions of the socioeconomic genes within the socioeconomic DNA.
Socioeconomic gene profiles: Maps of the socioeconomic genes and their loci on consumers' socioeconomic DNA.
Fig. 9.4 Label sequence, energy behavior indicating question, energy behavior indicator and energy
behavior indicator map for a person
By setting a certain threshold value ε1, we can judge whether the entropy value Sg is less than ε1. If Sg < ε1, the locus is a gene locus, and the label with the highest proportion at the locus is the socioeconomic gene.
As shown in Fig. 9.4, the four different answers A, B, C, and D are marked with different colors. The sample household label sequence is the sequence of answers to the eight questions in the questionnaire, which are marked Q1 to Q8; the sample household's answers are A, B, C, D, B, A, C, and D, respectively. Hence, the sample household label sequence consists of these eight sequential answers. The
number contained in each small rectangle shows the rate of the corresponding answer
in the group of the sample household. For example, the answer rates for Q3 and Q6
are 60% and 70%, respectively; for the remainder, the answer rates are all below 50%.
For this sample household, only the answers to Q3 and Q6 are the most frequent within the group, and the corresponding EBCRs are greater than the preset threshold. Hence, Q3 and Q6 are energy behavior indicating questions, and the corresponding energy behavior indicators are C (Q3) and A (Q6). From the energy behavior indicator map of the household, it is obvious that only questions 3 and 6 in the energy behavior sequence are identified as energy behavior indicating questions, which means that their
answers C (Q3) and A (Q6) are energy behavior indicators and are correlated with
the household energy behavior.
(2) Classification Entropy
In biology, genes can be divided into dominant genes and recessive genes. Compared
with recessive genes, dominant genes play a significant role in biological traits. When
dominant genes and recessive genes coexist, recessive genes will be “shielded” by
dominant genes, and thus have no effect on biological traits. In socioeconomic genes,
a low gene entropy indicates that a class of consumers contains almost the same gene at the corresponding locus. However, many classes of consumers may share the same gene at that locus, in which case it is impossible to judge which class a consumer belongs to from that gene alone. Thus, we define dominant socioeconomic genes as those with strong classification ability for consumers, and recessive socioeconomic genes as those with weak classification ability. The classification ability of a socioeconomic gene is its ability to distinguish which class a consumer belongs to according to that gene. Generally speaking, the more uniformly a socioeconomic gene is distributed among the consumer classes, the worse its classification ability; the more it is concentrated in a minority of classes, the stronger its classification ability. To distinguish dominant genes from recessive genes, an index reflecting the classification ability of genes is needed. The classification entropy of a gene is defined here as follows:
Classification Entropy: A measure of the ability of a gene to classify consumers. A value of 0 means that the gene is concentrated in a single class; the value approaches 1 when the gene is spread over many classes.
For a gene distributed over m clusters, where the proportion in the ith class is qi, the classification entropy of the gene can be calculated as follows:
$$S_c = -\sum_{i=1}^{m} q_i \log_m(q_i) \qquad (9.5)$$
By setting a certain threshold value ε2, we can judge whether the entropy value Sc is less than ε2. If Sc < ε2, the gene is a dominant gene.
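Both entropies can be computed with the same normalized-entropy helper, sketched below; the gene entropy Sg is assumed to be defined analogously to the classification entropy in (9.5), over the answer labels within one group.

```python
# Normalized entropy used for screening genes, following (9.5): 0 when a gene is
# concentrated in a single class, approaching 1 when it spreads evenly over classes.
import numpy as np

def normalized_entropy(counts):
    """Entropy of a count vector, normalized by the log of the number of bins."""
    counts = np.asarray(counts, dtype=float)
    m = len(counts)
    if m < 2 or counts.sum() == 0:
        return 0.0
    q = counts / counts.sum()
    q = q[q > 0]                              # treat 0 * log(0) as 0
    return float(-(q * np.log(q) / np.log(m)).sum())

# A gene appearing only in one of three clusters is dominant (entropy 0), whereas
# one spread evenly over the three clusters is recessive (entropy 1).
print(normalized_entropy([40, 0, 0]))    # 0.0
print(normalized_entropy([10, 10, 10]))  # 1.0
```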
The IGD represents the uniqueness of the energy behavior indicator. Different
groups of customers may have the same energy behavior indicator. The more groups
the energy behavior indicator is shared by, the lower the IGD is. A unique energy
behavior indicator means a higher dominance on energy behavior, and a common
energy behavior indicator probably has no decisive effects on energy behavior. There
is also a threshold for the IGD index to determine whether an energy behavior indi-
cator is dominant or recessive. As shown in Fig. 9.5, there are three groups of cus-
tomers that have different energy behaviors. The energy behavior indicators of group
1 include Q3 (C) and Q8 (D). The energy behavior indicators of group 2 include Q3
(C), Q6 (A), and Q8 (D). The energy behavior indicators of group 3 include Q3 (B)
and Q8 (D). The energy behavior indicator Q8 (D) is shared by all groups; hence, it
has little ability to differentiate energy behavior among different groups. The energy
behavior indicator Q3 (B) only exists in group 3, and Q6 (A) only exists in group 2.
Therefore, these indicators are more beneficial in identifying the energy behavior for
groups 2 and 3 than any other energy behavior indicators. The energy behavior indi-
cator Q3 (C) is shared by two groups (group 1 and group 2), so Q3 (C) is more unique
than Q8 (D), which is shared by three groups. The IGD fully represents these energy
behavior indicators’ uniqueness and dominance on energy behavior. By calculating
the IGD value and setting the threshold as 50%, the energy behavior indicators Q3
(B), Q3 (C), and Q6 (A) can be identified as dominant indicators, whereas Q8 (D) is
recessive.
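The screening in Fig. 9.5 can be reproduced with a very small calculation, assuming IGD is simply the reciprocal of the number of groups that share an indicator; the exact definition used in the chapter may be more elaborate.

```python
# Illustrative IGD screening for the Fig. 9.5 example (IGD assumed to be the
# reciprocal of the number of groups sharing the indicator).
indicator_groups = {
    'Q3(B)': {'group3'},
    'Q3(C)': {'group1', 'group2'},
    'Q6(A)': {'group2'},
    'Q8(D)': {'group1', 'group2', 'group3'},
}
IGD_THRESHOLD = 0.5
for indicator, groups in indicator_groups.items():
    igd = 1.0 / len(groups)
    kind = 'dominant' if igd >= IGD_THRESHOLD else 'recessive'
    print(f'{indicator}: IGD = {igd:.0%} -> {kind}')
# Q3(B), Q3(C) and Q6(A) come out dominant; Q8(D) comes out recessive.
```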
Fig. 9.5 The dominance value represents the uniqueness and dominance of the energy behavior
indicator
A naive Bayesian classifier [24] is used in this chapter to predict the consumer load profile. The inputs of the classifier are the dominant socioeconomic genes. The method assumes that all dominant socioeconomic genes are equally important and mutually independent. The posterior probability of each consumer load profile is calculated from the training data using Bayes' rule. According to the values of the
socioeconomic genes of the test data, the consumer load profile is finally predicted as the load profile with the maximum posterior probability. Assume that there are n dominant socioeconomic gene loci, whose labels are g1 ∼ gn, and that the typical load profiles of the M classes of consumers are C1 ∼ CM. The load profile prediction process can be divided into the following three steps:
(1) Prior probability calculation
According to the training data, calculate the occurrence probabilities of the label values of the n dominant socioeconomic genes, P(g1) ∼ P(gn).
Calculate the occurrence probability of the ith (i = 1 ∼ M) class of typical load profile, P(C = Ci).
Calculate the conditional probabilities of the dominant socioeconomic gene loci, P(g1|C = Ci), P(g2|C = Ci), . . . , P(gn|C = Ci), for each typical load profile C = Ci.
(2) Posterior probability calculation
For the test data, according to Bayes' theorem and the independence assumption on the dominant socioeconomic genes, when the label values of the n dominant socioeconomic gene loci are g1 ∼ gn, the posterior probability of each consumer load profile can be calculated.
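In the standard naive Bayes form (a sketch consistent with the steps above; the normalizing constant is omitted because only the maximizing class matters), this posterior is

$$P(C = C_i \mid g_1, \ldots, g_n) \propto P(C = C_i) \prod_{j=1}^{n} P(g_j \mid C = C_i)$$

The third step then simply selects the class Ci with the maximum posterior probability as the predicted typical load profile.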
The smart meter data recording energy behavior and the demographic data recording household information used in this chapter are from Electric Ireland and SEAI. SEAI
released online fully anonymized data sets from smart meter trials for electricity
customers. The smart meter trials occurred during 2009 and 2010, with more than
5000 Irish households and businesses participating. The participating households
were all carefully recruited to ensure that they were representative of the national
population. 5375 residential participants were initially recruited with a return rate
of 78.7%, which means 4232 participants returned the pre-trial questionnaires [24].
Of the participants who returned the questionnaires, only 3487 have a record of smart meter data. Hence, the final number of participants adopted in this study is 3487.
These participants’ data were mainly composed of two parts: (1) the smart meter data
Fig. 9.6 The typical energy spectral patterns of three groups of customers
which recorded customers' daily energy consumption at 30-min intervals; (2) the demographic data in the form of a questionnaire comprising 144 questions. These
questions and their answers include household information, such as customers’ age,
employment status, social class, electrical appliances, and energy usage habits.
In this step, according to smart meter data, Irish people are clustered into three groups
through x-means clustering. X-means clustering is an extended k-means method
with an efficient estimation of the number of clusters; it overcomes the main shortcoming of the k-means method, namely that the number of clusters k has to be pre-determined. Applying this technique to the Irish smart meter data, three typical
energy spectral patterns are identified and labeled as “day group”, “evening group”,
and “midnight group”. As shown in Fig. 9.6, for the day group, the major energy
usage occurs in the afternoon from 12:00 to 16:00, which means the group uses
electricity mainly during the day. For the evening group, the main consumption is
from 16:00 to 20:00, which means these people use electricity mainly in the evening.
For the midnight group, the load gradually increases throughout the day, with the lowest load in the morning and the highest load late at night, around midnight.
Through the energy behavior indicator searching method, 91 energy behavior indica-
tors are found, of which 74 (81%) are recessive indicators and 17 (19%) are dominant
indicators, as shown in Fig. 9.7. Most of the behavior indicators are recessive, which
Fig. 9.7 Composition of the energy behavior indicators found in Irish people
Fig. 9.8 The energy behavior indicator maps for dominant indicators of the Irish people
indicates that although abundant behavior indicators are found, only a small propor-
tion of them are dominant, and contribute to the difference in energy behavior.
According to the classification results of the questions, the composition of dom-
inant indicators can also be divided into four major categories. Of the 17 dominant
indicators, nine behavior indicators belong to the lifestyle category, six are in the
electricity appliances category, one belongs to the category of social information,
and one is in the category of opinions about energy usage. This statistical result
shows that human features in the lifestyle category have the greatest effect on Irish
people’s energy behavior.
All of the energy behavior indicators in the lifestyle category are related to
internet usage habits. Superficially, internet usage seems entirely unrelated to energy behavior; however, the two are highly associated according to the energy behavior indicator results. To understand in depth how the internet-usage-related behavior indicators affect energy behavior, the energy behavior indicator maps of the three groups are
shown in Fig. 9.8. The energy behavior indicator map is composed of 144 questions.
In this energy behavior indicator map, four of the internet-usage-related indicators
and another important employment-related indicator are plotted. The internet-usage-
related indicators include Q7 (A), Q7 (B), Q8 (A) and Q8 (B), and the employment-
related indicator is Q3 (A). From the figure, these energy behavior indicators’ IGD
values are greater than the IGD threshold; hence, they are all dominant.
Table 9.3 The energy behavior indicating questions and corresponding energy behavior indicators in three groups

Q3. What is the employment status of the chief income earner in your household, is he/she: Day group: Nag; Evening group: An employee; Midnight group: Nag
Q7. Do you use the internet regularly yourself?: Day group: No; Evening group: Nag; Midnight group: Yes
Q8. Are there other people in your household that use the internet regularly?: Day group: No; Evening group: Yes; Midnight group: Yes
Fig. 9.9 The mapping relationship between different energy behavior groups of the Irish people
and their energy behavior indicators
As shown in Table 9.3, the three groups are distinguished by different behavior indicating questions. Of the three questions, two address the fre-
quency of internet usage, and one considers employment status. Using the energy
behavior indicators provided by Table 9.3, a mapping between Irish people's energy behavior indicators and their energy behavior is shown in Fig. 9.9. From this figure, it can be seen that the Irish people in the day group do not use the internet. This group is most likely an elderly, retired group that tends not to use the internet.
The people in this group also tend to stay at home; hence, their daytime electricity
usage is the highest. For the midnight group, questions 7 and 8 show that this group
uses the internet regularly, suggesting this group is largely composed of young peo-
ple in a shared home or students. Young people often join parties, go to pubs and clubs, or stay up late. Therefore, their energy behavior peaks near
midnight. In the evening group, people are typically employed, which can explain
why people in this group use electricity mainly in the early evenings.
The Irish people’s internet usage habits are related to energy behavior. This phe-
nomenon is mainly because internet usage habits represent human’s preference in
lifestyle to some degree, including when to work, play, and rest. The lifestyle infor-
mation plays an important role in people’s energy consumption patterns; therefore,
internet and electricity usage are highly correlated.
9.7 Conclusions
In this chapter, the relationship between customers’ energy behaviors and their house-
hold information is first extracted and analyzed through the proposed cross-domain
feature selection and coding method. This makes it possible to disaggregate smart meter data into a range of energy behaviors. Each energy behavior can then be
uniquely traced by a set of energy behavior indicators, thus offering a simple, trans-
parent and effective alternative to a challenging matching problem with massive
smart meter data and a huge range of possible indicators.
Energy behavior indicators explain household energy behavior by offering a deeper view of the rationale behind differing energy behavior patterns and the underlying factors influencing human energy consumption, and they make it possible to predict people's future energy behavior if their status changes. In the energy industry, according
to the Ireland case study, household energy behavior is highly correlated with the
features of employment status and internet usage. Through this finding, household
energy behavior can be forecasted based on several features, saving the investment
of installing millions of smart meters in Ireland. If other household load profiles in
Ireland are known, this information can also be used to infer household informa-
tion, thus creating value-added services and products for energy utilities and energy
service providers. The correlation between household energy behavior and human
features of employment status and internet usage in Ireland may not carry over to other countries. Because the socioeconomic status (developed, developing, or
under-developed) and geographical location (European, Asian, African) of countries
differ, different countries’ people may have different energy behavior patterns and
also different energy behavior indicators. Although different countries’ results may
differ, the method validated by the Ireland case can be applied universally.
References
1. Brown, B., Chui, M., & Manyika, J. (2011). Are you ready for the era of ‘big data’. McKinsey
Quarterly, 4(1), 24–35.
2. Williams, C. K. I., & Barber, D. (1998). Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12), 1342–1351.
3. Suykens, J. A. K., & Vandewalle, J. (1999). Least squares support vector machine classifiers.
Neural Processing Letters, 9(3), 293–300.
4. Friedl, M. A., & Brodley, C. E. (1997). Decision tree classification of land cover from remotely
sensed data. Remote Sensing of Environment, 61(3), 399–409.
5. Kutner, M. H., Nachtsheim, C. J., Neter, J., Li, W., & et al. (2005). Applied linear statistical
models (Vol. 5). Boston: McGraw-Hill Irwin.
6. Park, D. C., El-Sharkawi, M. A., Marks, R. J., Atlas, L. E., & Damborg, M. J. (1991). Electric
load forecasting using an artificial neural network. IEEE Transactions on Power Systems, 6(2),
442–449.
7. Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and
Computing, 14(3), 199–222.
8. Grygorash, O., Zhou, Y., & Jorgensen, Z. (2006). Minimum spanning tree based clustering
algorithms. 2006 18th IEEE International Conference on Tools with Artificial Intelligence
(ICTAI’06) (pp. 73–81). IEEE.
9. Langfelder, P., Zhang, B., & Horvath, S. (2007). Defining clusters from a hierarchical cluster
tree: The dynamic tree cut package for r. Bioinformatics, 24(5), 719–720.
10. Hartigan, J. A., & Wong, M. A. (1979). Algorithm as 136: A k-means clustering algorithm.
Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100–108.
11. Wang, Y., Chen, Q., Kang, C., Zhang, M., Wang, K., & Zhao, Y. (2015). Load profiling and its
application to demand response: A review. Tsinghua Science and Technology, 20(2), 117–129.
12. Chuan, L., & Ukil, A. (2014). Modeling and validation of electrical load profiling in residential buildings in Singapore. IEEE Transactions on Power Systems, 30(5), 2800–2809.
13. Stephen, B., Mutanen, A. J., Galloway, S., Burt, G., & Järventausta, P. (2013). Enhanced load
profiling for residential network customers. IEEE Transactions on Power Delivery, 29(1), 88–
96.
14. Chicco, G., & Akilimali, J. S. (2010). Renyi entropy-based classification of daily electrical
load patterns. IET Generation, Transmission & Distribution, 4(6), 736–745.
15. Tsekouras, G. J., Hatziargyriou, N. D., & Dialynas, E. N. (2007). Two-stage pattern recognition
of load curves for classification of electricity customers. IEEE Transactions on Power Systems,
22(3), 1120–1128.
16. Chicco, G., Ionel, O.-M., & Porumb, R. (2012). Electrical load pattern grouping based on
centroid model with ant colony clustering. IEEE Transactions on Power Systems, 28(2), 1706–
1715.
17. Espinoza, M., Joye, C., Belmans, R., & De Moor, B. (2005). Short-term load forecasting, profile
identification, and customer segmentation: a methodology based on periodic time series. IEEE
Transactions on Power Systems, 20(3), 1622–1630.
18. Zeifman, M., & Roth, K. (2011). Nonintrusive appliance load monitoring: Review and outlook.
IEEE Transactions on Consumer Electronics, 57(1), 76–84.
19. Vercamer, D., Steurtewagen, B., Van den Poel, D., & Vermeulen, F. (2015). Predicting consumer
load profiles using commercial and open data. IEEE Transactions on Power Systems, 31(5),
3693–3701.
20. Rasmussen, C. E. (2000). The infinite Gaussian mixture model. Advances in Neural Information Processing Systems (pp. 554–560).
21. Pelleg, D., Moore, A. W., et al. (2000). X-means: Extending k-means with efficient estimation of the number of clusters. ICML (Vol. 1, pp. 727–734).
22. Irish Social Science Data Archive. (2012). Commission for Energy Regulation (CER) Smart Metering Project. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.
23. Meier, E., & Moy, C. (2004). Social grading and the census. International Journal of Market
Research, 46(2), 141–170.
24. McCallum, A., Nigam, K., et al. (1998). A comparison of event models for naive Bayes text classification. AAAI-98 Workshop on Learning for Text Categorization (Vol. 752, pp. 41–48). Citeseer.
Chapter 10
Clustering of Consumption Behavior
Dynamics
Abstract In a competitive retail market, large volumes of smart meter data provide
opportunities for load-serving entities (LSEs) to enhance their knowledge of cus-
tomers’ electricity consumption behaviors via load profiling. Instead of focusing on
the shape of the load curves, this chapter proposes a novel approach for the clustering
of electricity consumption behavior dynamics, where “dynamics” refer to transitions
and relations between consumption behaviors, or rather consumption levels, in adja-
cent periods. First, for each individual customer, symbolic aggregate approximation
(SAX) is performed to reduce the scale of the data set, and a time-based Markov
model is applied to model the dynamic of electricity consumption, transforming the
large data set of load curves to several state transition matrices. Second, a clustering
technique by Fast Search and Find of Density Peaks (CFSFDP) is primarily carried
out to obtain the typical dynamics of consumer behavior, with the difference between
any two consumption patterns measured by the Kullback–Leibler (K–L) distance,
and to classify the customers into several clusters. To tackle the challenges of big data,
the CFSFDP technique is integrated into a divide-and-conquer approach toward big
data applications. A numerical case verifies the effectiveness of the proposed models
and approaches.
10.1 Introduction
The existing studies on load profiling mainly focus on individual large industrial/commercial customers, medium- or low-voltage feeders, or combinations of small customers, whose load profiles show much more regularity [1]. It should be noted
that although these dynamic characteristics are always “deluged” in a combination
of customers, they could be described by several typical load patterns. However,
with regard to residential customers, at least two new challenges will be faced. One
challenge is the great variety and variability of the load patterns. As indicated in
Fig. 10.1, there are clear differences in the electricity consumption patterns of the
two residents. Peak loads have different amplitudes and occur at different times of
day, for example. Electricity consumption patterns also vary daily even for the same
customer. In this case, several typical daily load patterns are not fine enough to
reveal the actual consumption behaviors. The daily profile should be decomposed
Fig. 10.1 Daily electricity load profiles of two residents over three weeks: (a) Resident #1; (b) Resident #2
into more fine-grained fragments, which are dynamically changed and identified.
Moreover, as the consumption behavior of a specific customer is essentially a state-
dependent, stochastic process, it is important to explore the dynamic characteristics,
e.g., switching and maintaining, of the consumption states and the corresponding
probabilities. The other challenge is that of “big data”. Considering the high fre-
quency and dimensionality of the data contained in the load curves, data sets in
the multi-petabyte range will be analyzed [2]. Traditional clustering techniques are difficult to execute in such a "big data world".
To tackle these two challenges, this chapter implements a time-based Markov
model to formulate the dynamics of customers’ electricity consumption behaviors,
considering the state-dependent characteristics, which indicates that future consump-
tion behaviors are related to the current states. This assumption is reasonable because various electricity consumption behaviors last for different periods before changing, as can be abstracted from historical records. The
transitions and relations between consumption behaviors, or rather consumption lev-
els, in adjacent periods are referred to as “dynamics” in this chapter. These dynamics
have been modeled by the Markov model in several works [3]. However, few papers
consider the dynamics as a factor for clustering. Profiling of the dynamics could pro-
vide useful information for understanding the consumption patterns of customers,
forecasting the consumption trends in short periods, and identifying the potential
demand response targets. Moreover, this approach formulates the large data set of
load curves as several state transition matrices, greatly reducing the dimensionality
and scale.
In addition to the Markov model, this chapter tries to address the “data deluge”
issue in three other ways. First, SAX is applied to transform the load curves into symbolic strings, reducing the storage space and easing the communication traffic between smart meters and data centers. Second, a recently reported effective cluster-
ing technique by Fast Search and Find of Density Peaks (CFSFDP) is first utilized
to profile the electricity consumption behaviors, which has the advantages of low
time complexity and robustness to noise points [4]. The dynamics of electricity
consumption are described by the differences between every two consumption pat-
terns, as measured by the Kullback–Leibler (K–L) distance [5]. Third, to tackle the
challenges of big and dispersed data, the CFSFDP technique is integrated into a
divide-and-conquer approach to further improve the efficiency of data processing,
where adaptive k-means is applied to obtain the representative customers at the local
sites and a modified CFSFDP method is performed at the global sites. The approach
could be further applied to big data applications.
Finally, the potential applications of the proposed method to demand response
targeting, abnormal consumption behavior detecting and load forecasting are ana-
lyzed and discussed. Especially, entropy analysis is conducted based on the clustering
results to evaluate the variability of consumption behavior for each cluster, which
can be used to quantify the potential of price-based and incentive-based demand
response.
the typical dynamics of electricity consumption. Finally, the results analysis of the
demand response targeting is conducted in the sixth stage. The details of the first
five stages will be introduced in the following, and the demand response targeting
analysis part will be further explained in the case studies.
Data preparation, including data cleaning, is not the subject of this chapter and will not be discussed. To make the load profiles comparable, the normalization process transforms consumption data of arbitrary value x = {x1, x2, . . . , xH} to the range (0, 1), as shown in (10.1).
$$\bar{x}_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (10.1)$$

where $x_i$ and $\bar{x}_i$ denote the actual and normalized electricity consumption at time $i$, and $x_{\min}$ and $x_{\max}$ denote the minimum and maximum consumption over the $H$ periods, respectively.
It should be noted that the normalization is performed daily instead of over the entire period. This strategy is chosen for at least three reasons. First, it can weaken the impact of anomalous days with critical peaks or bad data injections. Second, it provides load shapes whose maximum values are less affected by daily or seasonal changes. Third, it filters out the baseload, which has little effect on demand response and reserve, in favor of the fluctuating part, which shows greater potential for demand response.
SAX is a powerful technique for the dimensional reduction and representation of time
series data with the lower bounding of the Euclidean distance [6]. SAX discretizes
numeric time series into symbolic strings by two steps: transforming the load data into
a piecewise aggregate approximation (PAA) representation and then symbolizing the
PAA representation into a discrete string.
The basic idea of PAA is intuitive and simple, replacing the amplitude values
falling in the same time interval with their mean values, as shown in (10.2).
$$\bar{x}_i = \frac{1}{k_i - k_{i-1}} \sum_{j=k_{i-1}+1}^{k_i} x_j \qquad (10.2)$$

where $j$ is the index of the normalized load data; $i$ is the index of the transformed PAA load data; $k_i$ is the $i$th time domain breakpoint; and $\bar{x}_i$ is the average value of the $i$th segment [6].
The averaging of the PAA can smooth out large, short-duration “spikes” of load
profiles. It has been proven that PAA has all the pruning power of the Haar-based
DWT and can be defined for arbitrary length queries with lower computation cost
[6].
The transformed PAA time series data are then referred by the SAX algorithm
to obtain a discrete symbolic representation. The amplitude axis is partitioned into
N intervals, and each univocal representation $w_p$ corresponds to an amplitude range $[\beta_{p-1}, \beta_p]$. On this basis, the mapping from a PAA approximation $\bar{x}_i$ to a word $w_p$ is obtained as follows:

$$\alpha_i = w_p \quad \text{if } \beta_{p-1} < \bar{x}_i < \beta_p \qquad (10.3)$$
Hence, the load curves can be represented by a symbolic string α. For exam-
ple, Fig. 10.3 shows the normalized electricity consumption data collected from
customer #1512 over one week (168 hours) at a frequency of 30 min. The time
axis is divided into four periods each day. These data can be represented as
“abcabbcaaacabaabbaccbbbcbabb”, with three symbols and a total of 28 periods.
For traditional SAX, the time domain is divided into regular intervals, and inside
each interval, the average of the amplitude values is calculated.
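A compact Python sketch of this pipeline (daily normalization, PAA, and symbolization) is shown below; the window width, breakpoints, and alphabet are illustrative assumptions rather than the breakpoints actually chosen in this chapter.

```python
# Sketch of the SAX pipeline: daily min-max normalization (10.1), PAA (10.2),
# and symbolization against amplitude breakpoints (10.3).
import numpy as np

def normalize_daily(day):
    """Min-max normalize one day of consumption to the (0, 1) range."""
    return (day - day.min()) / (day.max() - day.min())

def paa(series, window):
    """Replace each window of points by its mean value."""
    series = np.asarray(series, dtype=float)
    usable = len(series) // window * window
    return series[:usable].reshape(-1, window).mean(axis=1)

def sax(series, window=12, breakpoints=(1/3, 2/3), alphabet='abc'):
    """Map a normalized load series to a symbolic string."""
    segments = paa(series, window)
    symbols = np.digitize(segments, breakpoints)   # interval index of each segment
    return ''.join(alphabet[s] for s in symbols)

# Example: 48 half-hourly readings -> 4 daily periods -> a 4-symbol word.
day = np.random.rand(48)
print(sax(normalize_daily(day), window=12))
```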
The main concern of SAX is the determination of the time domain breakpoint ki
and the amplitude breakpoint β p . Generally, the time domain is partitioned uniformly,
and the amplitude axis is partitioned based on the normal distribution hypothesis [7].
To give the breakpoints a clear physical meaning, this chapter adopts non-regular intervals on both the time domain and the amplitude axis.
Fig. 10.3 Electricity consumption data of customer #1512 over one week and its SAX representa-
tion
If we want to predict the trend or level of electricity consumption for each customer,
we may make full use of their past and present states. If the future consumption level
10.2 Basic Methodology 231
or state depends only on the present state, it is called a Markov property and can
be modeled by a Markov chain. Various Markov models have been applied to load
forecasting [10].
For a symbolic string with N symbols, a discrete Markov model with N corresponding states can be applied to model the dynamic characteristics of the consumption levels. However, customers have different dynamic characteristics at different periods because of their regular daily routines. Therefore, a time-based Markov model is applied to formulate these characteristics. For each pair of adjacent periods, a Markov chain can be modeled. The one-step transition number matrix F^t at period t can then be calculated, and from F^t, the transition probability matrix P^t at period t can be estimated according to (10.4).
$$\hat{p}_{ij}^{t} = \begin{cases} \dfrac{f_{ij}^{t}}{\sum_{k=1}^{n} f_{ik}^{t}}, & \text{if } \sum_{k=1}^{n} f_{ik}^{t} \neq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (10.4)$$

where
$$F^{t} = \begin{bmatrix} f_{11}^{t} & f_{12}^{t} & \cdots & f_{1n}^{t} \\ f_{21}^{t} & f_{22}^{t} & \cdots & f_{2n}^{t} \\ \cdots & \cdots & \cdots & \cdots \\ f_{n1}^{t} & f_{n2}^{t} & \cdots & f_{nn}^{t} \end{bmatrix}; \qquad \hat{P}^{t} = \begin{bmatrix} \hat{p}_{11}^{t} & \hat{p}_{12}^{t} & \cdots & \hat{p}_{1n}^{t} \\ \hat{p}_{21}^{t} & \hat{p}_{22}^{t} & \cdots & \hat{p}_{2n}^{t} \\ \cdots & \cdots & \cdots & \cdots \\ \hat{p}_{n1}^{t} & \hat{p}_{n2}^{t} & \cdots & \hat{p}_{nn}^{t} \end{bmatrix};$$
and $\hat{p}_{ij}^{t}$ denotes the estimated transition probability from state $i$ to state $j$ at period $t$.
$$\chi^{2} = 2\sum_{i=1}^{N}\sum_{j=1}^{N} f_{ij} \log\frac{p_{ij}}{p_{\bullet j}} \qquad (10.5)$$

where $p_{\bullet j} = \sum_{i=1}^{N} f_{ij} \Big/ \sum_{i=1}^{N}\sum_{k=1}^{N} f_{ik}$ and $N$ is the number of states.
Given a significance level $\alpha$, if $\chi^{2} \geq \chi_{\alpha}^{2}\left((N-1)^{2}\right)$ holds, we can be reasonably confident that the electricity consumption of customers has the Markov property.
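The estimation in (10.4) can be sketched as follows for one customer, assuming the symbolic data are arranged as a days-by-periods array of state indices; rows of F^t with no observed transitions are simply left as zeros.

```python
# Sketch of the time-based Markov model: count one-step transitions between each
# pair of adjacent periods across days, then row-normalize as in (10.4).
import numpy as np

def transition_matrices(states, n_symbols):
    """Return P^t for t = 1..T-1 from a (n_days x T) array of state indices."""
    n_days, T = states.shape
    P = np.zeros((T - 1, n_symbols, n_symbols))
    for t in range(T - 1):
        F = np.zeros((n_symbols, n_symbols))
        for d in range(n_days):
            F[states[d, t], states[d, t + 1]] += 1             # transition counts
        row_sums = F.sum(axis=1, keepdims=True)
        np.divide(F, row_sums, out=P[t], where=row_sums != 0)  # empty rows stay 0
    return P
```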
$$KLD(P_i^t, P_j^t) = \frac{1}{N}\sum_{m=1}^{N}\sum_{n=1}^{N} p_{imn}^{t} \log\frac{p_{imn}^{t}}{p_{jmn}^{t}} \qquad (10.6)$$
Note that $KLD(P_i^t, P_j^t) = KLD(P_j^t, P_i^t)$ is not guaranteed to hold; that is, the K–L distance is asymmetric. For the convenience of clustering, the symmetric K–L distance $D_{ij}^t$ between two Markov models at period $t$ is defined following [14].
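A common symmetrization, assumed here as the form of (10.7), averages the two directed divergences:

$$D_{ij}^{t} = \frac{1}{2}\left[KLD(P_i^t, P_j^t) + KLD(P_j^t, P_i^t)\right] \qquad (10.7)$$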
$$D_{ij} = \sum_{t=1}^{T} D_{ij}^{t} \qquad (10.8)$$
The dissimilarity matrix is derived by calculating the K–L distance among all cus-
tomers according to (10.8).
For the point with the highest local density, the distance is instead set to $\delta_i = \max_j(D_{ij})$. Thus, an object with a much larger $\delta_i$ has the maximum density in its local area or globally.
Hence, each object or point has two important quantities: local density ρi and
distance δi . We can plot all the points Ai (ρi , δi ) on a two-dimensional plane, which
is called the decision graph. Points whose local density and distance are both larger than the thresholds (ρ0, δ0) can be identified as density peaks, i.e., cluster centers. After
these density peaks are found, other remaining points are assigned to the same cluster
as its nearest neighbor of higher density.
As stated above, the adopted clustering method has the following advantages, which is why we use it in this study.
First, CFSFDP is elegant and simple: it needs few parameters, has low time complexity, and has shown high performance in classifying several data sets. After finding the density peaks, the assignment of each object can be performed
in a single step without iteration, in contrast with many other clustering methods like
k-means.
Second, CFSFDP, as a density-based clustering technique, can effectively detect non-spherically distributed clusters and is robust to noise points, which is verified in our case studies.
Third, the distribution of objects on the decision graph reveals much information.
For example, it is easy to detect the outliers or bad data injections with a small ρi and
large δi , and find the objects around the edge of the cluster with both small ρi and
δi . The number of clusters can be adjusted elastically according to the distribution
of objects by setting different thresholds for ρi and δi .
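The two decision-graph quantities can be computed directly from the dissimilarity matrix, as in the sketch below; the cutoff distance d_c and the thresholds for picking cluster centers are left as user choices.

```python
# Sketch of the CFSFDP decision-graph quantities: local density rho_i (neighbors
# within the cutoff d_c) and delta_i (distance to the nearest higher-density point).
import numpy as np

def cfsfdp_decision_graph(D, d_c):
    """D is the symmetric dissimilarity matrix with zero diagonal."""
    n = D.shape[0]
    rho = (D < d_c).sum(axis=1) - 1            # exclude the point itself
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if len(higher) == 0:
            delta[i] = D[i].max()              # highest-density point
        else:
            delta[i] = D[i, higher].min()
    return rho, delta

# Points with both rho and delta above chosen thresholds are taken as cluster
# centers; each remaining point is assigned to the cluster of its nearest
# higher-density neighbor.
```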
analysis and clustering of the large data sets gathered from each distributed site require a very long time and a large memory overhead. When applying CFSFDP, the dissimilarity matrix of all the customers must first be obtained, which accounts for most of the computation time. Both the time and space complexity of CFSFDP are O(N^2).
In fact, there exist many works on parallel clustering for big data applications
[15, 16]. For these algorithms, the whole data set should reside on the same data
center and then be distributed to different clients like map-and-reduce in Hadoop. It
does not match the practical situation of electricity consumption data collection and storage. In addition, some fully distributed clustering algorithms have been proposed [17] that aggregate information from local data and send it to a central site for centralized analysis. However, these algorithms do not share the advantages of CFSFDP. Thus, this section designs a fully distributed, rather than parallel, clustering algorithm that eases the communication and computation burden while retaining the advantages of CFSFDP through a divide-and-conquer framework.
10.3.1 Framework
A set of clustering centers is obtained by k-means, which minimizes the sum of the squared distances between each object and its cluster center. These centroids can be used as a "codebook": each object is represented by its corresponding centroid with the least error. This is called vector quantization (VQ). We try to establish a local model by finding the "codebook" that guarantees that the distortion of each object under VQ satisfies the threshold condition (10.11):
$$E_k = \sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{i=1}^{N}\left(p_{ij}^{t} - C_{kij}^{t}\right)^{2} \leq \theta \sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{i=1}^{N}\left(C_{kij}^{t}\right)^{2} \qquad (10.11)$$

where $C_{kij}^{t}$ denotes the $k$th centroid and $\theta$ denotes the distortion threshold.
Traditional k-means needs a given number of centers, which makes it difficult
to guarantee that (10.11) holds. In this chapter, an adaptive k-means is adopted to
dynamically adjust the number of centers following a simple rule: if an object violates
the threshold condition, 2-means (i.e. k-means for k = 2) will be applied to partition
this cluster further and add a new center to the “codebook” [18].
Figure 10.5 shows the detailed procedure of the adaptive k-means method. The distortion threshold θ varies depending on different needs. A smaller threshold corresponds to higher clustering accuracy and a larger number of local representative objects, and vice versa. As a supplement to the distortion threshold and another
terminating condition of the iteration, the parameters, K min and K max , are given to
limit the size of the “codebook”. The value of K min and K max can be determined
according to the data transmission limits. In particular, if ensuring a certain precision is the priority, the adaptive k-means can start from Kmin = 2 and run until (10.11) holds, by setting Kmax to positive infinity. The proposed adaptive k-means differs from traditional k-means in at least four aspects:
First, for adaptive k-means, the number of clusters adjusts dynamically depending
on whether the distortion threshold condition is satisfied, in contrast to traditional
k-means, where it should be pre-determined.
Second, the convergence condition of adaptive k-means is given by (10.11) and Kmax, whereas traditional k-means converges when the sum of the squared distances between each object and its center no longer decreases.
Third, the proposed algorithm is capable of retaining the information of outliers
on each site because these outliers will become separate clusters.
Fourth, this algorithm applies 2-means to each violating cluster separately. Thus it has a small computational burden, and its potential for parallel computation makes it applicable to large data sets, whereas traditional k-means is conducted on the whole data set.
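The following is a minimal sketch of this adaptive splitting procedure under the assumptions above; the per-object distortion test mirrors Eq. (10.11), while the helper names and the use of scikit-learn's KMeans are illustrative choices rather than the exact implementation.

import numpy as np
from sklearn.cluster import KMeans

def adaptive_kmeans(P, theta, k_min=2, k_max=1000):
    """Adaptive k-means 'code book' construction: any cluster containing an
    object that violates the per-object distortion condition of Eq. (10.11)
    is split by 2-means until the condition holds or k_max centers exist.
    P: (n_objects, n_features) array of flattened transition matrices."""
    km = KMeans(n_clusters=k_min, n_init=10).fit(P)
    centers, labels = list(km.cluster_centers_), km.labels_.copy()
    splitting = True
    while splitting and len(centers) < k_max:
        splitting = False
        for k in range(len(centers)):
            c = centers[k]
            members = np.where(labels == k)[0]
            if len(members) < 2:
                continue
            errors = ((P[members] - c) ** 2).sum(axis=1)             # E_k per object
            if (errors > theta * (c ** 2).sum()).any():              # threshold violated
                sub = KMeans(n_clusters=2, n_init=10).fit(P[members])  # 2-means split
                centers[k] = sub.cluster_centers_[0]
                centers.append(sub.cluster_centers_[1])
                labels[members[sub.labels_ == 1]] = len(centers) - 1
                splitting = True
    return np.array(centers), labels

The returned centers form the local "code book" that each site transmits, while the labels record which representative stands for each local object.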
The original CFSFDP algorithm considers the clustered objects equally. However, in
a two-level clustering framework, the selected representative models from different
local sites might represent “samples” of different populations. It would be reasonable
to consider the representativeness of the local models in the centralized clustering.
Thus, a modified CFSFDP method is proposed, which introduces a weight factor to
differentiate the representativeness of the local models. Without loss of generality,
the weight factor, C j , is added to the local density calculation
ρ_i = Σ_{j=1}^{N} C_j χ(D_{ij} − d_c)    (10.12)
where C j refers to the weight of the representative points of each cluster in Mi, which
is equal to the number of objects belonging to the cluster. The calculation of δi is the
same as (10). Similarly, based on the calculated ρi and δi , a decision graph can be
drawn to find the density peaks that have a higher local density ρi and larger distance
δi as cluster centers. After the determination of the cluster centers, each of the other
objects is assigned to the same cluster as its nearest neighbor of higher density [4].
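A minimal sketch of the weighted local density of Eq. (10.12), reusing the distance conventions of the earlier CFSFDP sketch, is given below; the weight vector holds the local cluster sizes C_j, and the rest of the algorithm (the δ_i computation and the assignment step) is unchanged.

import numpy as np

def weighted_local_density(D, weights, dc):
    """Eq. (10.12): each representative point contributes the size C_j of the
    local cluster it stands for instead of a unit count."""
    within = (D < dc).astype(float)            # χ(D_ij − d_c)
    return within @ weights - weights          # exclude each point's own weight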
Now that each representative object from the distributed site has its own cluster
label, the objects on the distributed sites will be relabeled according to the cluster
label of the representative object. If two centroids end up in the same cluster, then all of their objects will belong to the same cluster.
The data set used in this chapter was provided by Research Perspective, Ltd. and con-
tains the electricity consumption of 6,445 customers (4,511 residents, 391 industries,
and 1533 unknown) over one and a half years (537 days) at a granularity of 30 min
[19]. The whole data set consists of a total of 3.46 million (6445 × 537) daily load
profiles. The bad load profiles are roughly identified by detecting the load profiles
with missing values or all zeroes. Among these massive load data, we eliminate 6187 bad load profiles, which is a very small proportion (approximately 0.18%) of the whole data set.
Fig. 10.6 Histogram and CDF of PAA representations of the whole data sets
Fig. 10.7 The average error with different number of Markov states
Fig. 10.8 Decision graph to find density peaks for full periods
Fig. 10.9 2-D plane mapping for full periods of 6445 customers by MDS according to their K–L
distance
Multidimensional scaling (MDS) [20] attempts to place each object in an N-dimensional space (here, a two-dimensional plane) such that the between-object distances are preserved as closely as possible. Each point in the plane stands for a customer. Points
in the same cluster are marked with the same color. It can be seen that the customers
of different clusters are unevenly distributed. Approximately 90% of the customers
belong to the 10 larger clusters, whereas the other 10% are distributed in the other
30 clusters. In this way, these 6445 customers are segmented into different groups
according to their electricity consumption dynamic characteristics for full periods.
Note that the customers in the same cluster have similar electricity consumption
behavior dynamics over a certain period instead of similar shape in load profiles.
Sometimes, we may not be concerned with the dynamic characteristics of full peri-
ods and instead concentrate on a certain period. For example, to evaluate the demand
response potential in noon peak shaving of each customer, the dynamics from Period
1 to Period 2 are much more important; to measure the potential to follow the change
of wind power at midnight, the dynamics from Period 4 to Period 1 should be empha-
sized. Thus, it is necessary to conduct customer segmentation for different adjacent
periods. Figure 10.10 illustrates the decision graph and 2-D plane mapping of cus-
tomers for the four adjacent periods.
It can be seen that the distributions of the customers of the four adjacent periods
are shaped like bells, and the proposed clustering technique can effectively address
the non-spherically distributed data. Unsurprisingly, the dynamics from Period 2 to
Period 3 and from Period 3 to Period 4 show more diversity because people become
more active during the day, whereas the dynamics from Period 1 to Period 2 and
from Period 4 to Period 1 show less diversity because most people are off duty
and go to sleep with less electricity consumption.

Fig. 10.10 Decision graph and 2-D plane mapping of customers for different adjacent periods

Taking the dynamics from Period 2 to Period 3 as an example, the six most typical dynamic patterns are shown in
Fig. 10.11. The percent in each matrix stands for the percentage of customers who
belong to the cluster. For example, approximately 37% of the customers have very
similar electricity consumption dynamics to that of Type_1.
To verify the proposed distributed clustering algorithm, we divide the 6445 customers
into three equal parts. Then, the distortion threshold θ is carefully selected for the
adaptive k-means method, as a larger threshold leads to poor accuracy, whereas a
smaller one leads to little compression. We run 100 cases by varying θ from 0.0025
to 0.25 with steps of 0.0025 and calculate the average compression ratio (CR) of
the three distributed sites for each case. The CR is defined as the ratio between the volume of the compressed data and the volume of the original data. Specifically, the compressed data refer to the local models obtained by adaptive k-means, and the original data refer to all the objects stored on each site.
The lower the CR, the better the compression effect. Figure 10.12 shows the
relationship between the average compression ratio and the threshold of different
periods. To obtain a lower compression ratio and guarantee clustering quality, we
choose “knee point” A as a balance, where θ is approximately 0.025 and the average
compression ratio is approximately 0.065. Kmin and Kmax are set to 10 and 1000, respectively.
To evaluate the performance of the proposed algorithm, we run both the centralized and distributed clustering processes; a high consistency between the two results indicates good performance of the distributed algorithm. As shown in Table 10.1, the matching rate between the distributed and centralized algorithms can be as high as 96.47%. This indicates that the proposed algorithm achieves high clustering quality with a low CR. In addition, the time and space complexity of the modified CFSFDP in global modeling is O((CR · N)^2). This means that the efficiency of the global clustering is increased by (1/CR)^2 times, where CR < 1 holds. In this case, the efficiency has been boosted to approximately (1/0.065)^2 ≈ 235 times.
Fig. 10.12 The relationship between averaged compression ratio and threshold for Markov model
of different periods
Table 10.1 Matching matrix of centralized clustering with three clusters for full periods

                                     Centralized clustering
                                     Cluster 1    Cluster 2    Cluster 3
Distributed clustering  Cluster 1    2417         15           143
                        Cluster 2    46           991          0
                        Cluster 3    22           3            2808
Different from traditional load profiling methods, which mainly focus on the shape of load profiles, this chapter performs clustering on the extent and probability of load consumption changes between adjacent periods, which reflect the dynamic features of customer consumption behaviors. The proposed modeling method has many potential applications. For example, on the decision graph obtained by CFSFDP, such as
Figs. 10.8 and 10.10, we can easily find the objects with small ρi and large δi , which
can be considered outliers. That is to say, these customers show markedly different electricity consumption behavior dynamics, whereas customers with similar socioeconomic backgrounds are more likely to have similar dynamics. Thus, we can detect abnormal or suspicious electricity consumption
behavior quickly through the decision graph. As another example, if the state transition probability matrix is known, future consumption can be simulated through Monte Carlo methods from a statistical and probabilistic perspective. Based on the simulated electricity
consumption, optimal ToU tariff can be designed. Moreover, entropy-based demand
response targeting will be further analyzed in this section as an illustration of the
applications.
It is believed that customers with less variability and heavier consumption are suitable for incentive-based demand response programs such as direct load control (DLC) because their consumption is predictable enough to be controlled, whereas customers with greater variability and heavier consumption are suitable for price-based demand response programs, such as ToU pricing, because they have the flexibility to modify their consumption. Note that an N × N state transition probability matrix is essentially a combination of N probability distributions, as mentioned before. Although the dynamic characteristics have been abstracted into 3 × 3 matrices, as shown in Fig. 10.11, we can still make intuitive evaluations of the customers for demand response targeting by introducing entropy evaluation to further extract information from the matrices. The
variability could be quantified by the Shannon entropy [21] of the state transition
matrix:
Entropy = − Σ_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{N} p_{ij}^t log p_{ij}^t    (10.14)
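As a small illustration, the entropy of Eq. (10.14) can be computed directly from a customer's T transition matrices; the sketch below assumes the matrices are stored as a numpy array and skips zero probabilities.

import numpy as np

def transition_entropy(P):
    """Shannon entropy of Eq. (10.14) for one customer, where P has shape
    (T, N, N) and P[t, i, j] is the probability of moving from state i to
    state j in period t. Zero entries contribute nothing to the sum."""
    p = np.asarray(P, dtype=float)
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log(nonzero))

Lower entropy indicates more deterministic dynamics, which is the property exploited below for incentive-based demand response targeting.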
Table 10.2 shows the entropies of the Markov model in Fig. 10.11. It can be seen
that Type_3 shows the minimum entropy. The 0.994 in the Type_3 matrix means that
the Type_3 customers are very likely to remain in state c, i.e., the higher consumption level, and are easier to predict. Thus, customers of Type_3
may have a greater potential for an incentive-based demand response during Period
3. However, Type_1 and Type_2 show much higher entropies and have a relatively
higher consumption level than Type_3, which makes them much more suitable for
price-based demand response. For example, the Type_1 and Type_2 customers have
almost the same probability of switching from state c to state b as of remaining in state c, which makes them hard to predict, and they have more flexibility to adjust their consumption behaviors.
10.6 Conclusions
In this chapter, a novel approach for the clustering of electricity consumption behav-
ior dynamics toward large data sets has been proposed. Different from traditional
load profiling from a static perspective, SAX and the time-based Markov model
are utilized to model the electricity consumption dynamic characteristics of each
customer. A density-based clustering technique, CFSFDP, is performed to discover
the typical dynamics of electricity consumption and segment customers into differ-
ent groups. Finally, a time-domain analysis and entropy evaluation are conducted
on the result of the dynamic clustering to identify the demand response potential
of each group’s customers. The challenges of massive high-dimensional electricity
consumption data are addressed in three ways. First, SAX can reduce and discretize
the numerical consumption data to ease the cost of data communication and storage.
Second, the Markov model is utilized to transform long-term data to several tran-
sition matrices. Third, a distributed clustering algorithm is proposed for distributed
big data sets.
References
1. Notaristefano, A., Chicco, G., & Piglione, F. (2013). Data size reduction with symbolic aggre-
gate approximation for electrical load pattern grouping. IET Generation, Transmission & Dis-
tribution, 7(2), 108–117.
2. Rodriguez, M., González, I., & Zalama, E. (2014). Identification of electrical devices apply-
ing big data and machine learning techniques to power consumption data. In International
Technology Robotics Applications, pp. 37–46. Springer.
3. Torriti, J. (2014). A review of time use models of residential electricity demand. Renewable
and Sustainable Energy Reviews, 37, 265–272.
4. Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science,
344(6191), 1492–1496.
5. Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathe-
matical Statistics, 22(1), 79–86.
6. Lin, J., Keogh, E., Lonardi, S., & Chiu, B. (2003). A symbolic representation of time series,
with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop
on Research issues in data mining and knowledge discovery, pp. 2–11. ACM.
7. Lin, J., Keogh, E., Wei, L., & Lonardi, S. (2007). Experiencing SAX: A novel symbolic
representation of time series. Data Mining and Knowledge Discovery, 15(2), 107–144.
8. Haben, S., Singleton, C., & Grindrod, P. (2015). Analysis and clustering of residential customers
energy behavioral demand using smart meter data. IEEE Transactions on Smart Grid, 7(1),
136–144.
9. Labeeuw, W., & Deconinck, G. (2013). Residential electrical load model based on mixture
model clustering and Markov models. IEEE Transactions on Industrial Informatics, 9(3),
1561–1569.
10. Niu, D., Shi, H., Li, J., & Xu, C. (2010). Research on power load forecasting based on combined
model of Markov and BP neural networks. In 2010 8th World Congress on Intelligent Control
and Automation, pp. 4372–4375. IEEE.
11. Yang, Y., Wang, Z., Zhang, Q., & Yang, Y. (2010). A time based Markov model for automatic
position-dependent services in smart home. In 2010 Chinese Control and Decision Conference,
pp. 2771–2776. IEEE.
12. Zhang, Y., Zhang, Q., & Yu, R. (2010). Markov property of Markov chains and its test. In 2010
International Conference on Machine Learning and Cybernetics, vol. 4, pp. 1864–1867.
13. Liao, T. W. (2005). Clustering of time series data—a survey. Pattern Recognition, 38(11),
1857–1874.
14. Tabibian, S., Akbari, A., & Nasersharif, B. (2015). Speech enhancement using a wavelet thresh-
olding method based on symmetric Kullback-Leibler divergence. Signal Processing, 106, 184–
197.
15. Zhao, W., Ma, H., & He, Q. (2009). Parallel k-means clustering based on mapreduce. In IEEE
International Conference on Cloud Computing, pp. 674–679. Springer.
16. Sun, Z., Fox, G., Weidong, G., & Li, Z. (2014). A parallel clustering method combined infor-
mation bottleneck theory and centroid-based clustering. The Journal of Supercomputing, 69(1),
452–467.
17. Januzaj, E., Kriegel, H.-P., & Pfeifle, M. (2004). Dbdc: Density based distributed clustering.
In International Conference on Extending Database Technology, pp. 88–105. Springer.
18. Kwac, J., Flora, J., & Rajagopal, R. (2014). Household energy consumption segmentation using
hourly data. IEEE Transactions on Smart Grid, 5(1), 420–430.
19. Commission for Energy Regulation (CER). (2012). CER Smart Metering Project - electricity
customer behaviour trial, 2009-2010. Irish Social Science Data Archive. SN: 0012-00.
20. de Leeuw, J., & Heiser, W. (1982). 13 theory of multidimensional scaling. Handbook of Statis-
tics, 2, 285–316.
21. Lin, J. (1991). Divergence measures based on the shannon entropy. IEEE Transactions on
Information Theory, 37(1), 145–151.
Chapter 11
Probabilistic Residential Load
Forecasting
11.1 Introduction
Electrical load forecasting is the basis of power system planning and operation. It is of
great significance to provide a load forecast that strikes a balance between supply and
demand, thus allowing for more efficient planning and dispatch of the energy and min-
imization of energy waste [1]. Traditional load forecasting mainly focuses on system-
level or bus-level loads. However, the wide installation of smart meters enables the
collection of massive amounts of fine-grained electricity consumption data, mak-
ing it possible to implement load forecasting for individual consumers. Individual
consumer load forecasting acts as the operation data source for demand response
implementation [2], energy home management [3], transactive energy [4], etc.
In recent years, an increasing amount of research has been carried out on individual
consumer load forecasting. Loads of individual households show greater volatility
compared with aggregated load [2]. Different machine learning techniques, such as
linear regression, feed-forward NNs, SVR, and least squares support vector machine
(LS-SVM), were applied to four house loads. The results show that these methods
perform poorly on individual loads, with LS-SVM giving the best performance. A
least absolute shrinkage and selection operator (Lasso) panelized linear regression
model was proposed in [5] to capture the sparsity of individual household load pro-
files. This linear Lasso regression has low computational burden, and the results are
more interpretable compared with nonlinear machine learning models. A clustering-
based approach considered in [6] took into account the daily load profile as a segment
and directly forecasted the whole segment instead of the load at each time point sep-
arately. A clustering algorithm was also used in [7] for household load prediction
where the transition of load shape between two days was characterized by a Markov
model. Shape-based clustering can also consider the time drift of the load profiles.
Recurrent neural networks (RNNs), the most widely used deep-learning architecture
for time series modeling, were applied for household load forecasting in [8]. Case
studies on 920 customers from Ireland show that an RNN-based method outper-
forms traditional forecasting models, including an autoregressive integrated moving
average model (ARIMA) and SVR. A long short-term memory (LSTM) RNN was
also used in [9, 10]. The difference between the work done in [10] and that in [9]
is that more detailed information about appliance-level consumption was known.
Sparse coding was applied to model the household load profiles in [11], and differ-
ent forecasting methods including ARIMA, Holt-Winters, and ridge regression were
then performed on the extracted features. Compressive sensing techniques were also
applied in [12] to explore the spatiotemporal sparsity within the residential load pro-
files. Reference [13] investigated how calendar variables, forecasting granularity, and
the length of the training set influenced the forecasting performance. Comprehensive
case studies were carried out with various regression models. The performance was
evaluated using root mean square error (RMSE) and normalized RMSE.
Since the individual load is highly volatile and may be very close to zero for
some time periods, traditional forecasting error metrics, such as the mean absolute
percentage error (MAPE), are not suitable for quantifying the performance of dif-
ferent methods. Thus, in addition to the establishment of new forecasting models,
the metrics of forecasting error have also been studied. A novel adjusted p-norm
error was proposed in [14] to reduce the “double penalty” effect caused by the time
drift of residential load profiles. This metric is quite similar to dynamic time warping
(DTW). To tackle the challenges of near-zero values and outliers for MAPE, another
metric, the mean arctangent absolute percentage error (MAAPE), was proposed in
[15]. MAAPE, a variation of MAPE, is defined as an angle whose tangent equals the ratio between the absolute error and the real value, i.e., the absolute percentage error (APE).
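A minimal numpy sketch of this arctangent mapping is given below (illustrative only): because arctan(∞) = π/2, even a near-zero actual value yields a bounded penalty.

import numpy as np

def maape(y_true, y_pred):
    """Mean arctangent absolute percentage error: the APE is passed through
    arctan, so near-zero actual values cannot blow up the metric."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ape = np.abs((y_true - y_pred) / y_true)
    return float(np.mean(np.arctan(ape)))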
Traditional point load forecasting can only provide the expected values of future
loads. One of the recent advances in load forecasting has been probabilistic load fore-
casting, which is presented in the form of density, quantiles, or intervals. Density
load forecasts were obtained by Gaussian process quantile regression in [16]. The
proposed Gaussian process quantile regression is a nonparametric method.
Different quantile regression methods have also been applied to net load forecast-
ing [17] and modeling the effect of temperature on load [18]. Quantile regression
averaging was applied to multiple point sister forecasts in [19]. The quantile regres-
sion averaging bridges point forecasts and probabilistic forecasts. A comprehensive
review of probabilistic load forecasting can be found in [20].
This section introduces the pinball loss guided LSTM regression for probabilistic
residential load forecasting. The main idea is to combine the strength of LSTM with
quantile regression: the former is able to capture the long- and short-term depen-
dencies within the load data, and the latter is able to provide the future uncertainty
information using predefined quantiles.
11.2 Pinball Loss Guided LSTM
11.2.1 LSTM
LSTM is an efficient RNN architecture for time series modeling and forecasting.
Traditional neural networks try to learn the correspondence between inputs and out-
puts from a static perspective. However, when the input data are a time series, the
information will be lost if these data are independently trained as inputs and outputs
of the neural network. Compared with traditional neural networks, RNNs make a
link between each two “input-output” pair. Figure 11.1 shows the basic topology of
a simple RNN, where X and Y denote the input and output data; h denotes the hid-
den state; Whx , W yh , and Whh denote the weight matrices describing the relationship
between X and h, h and Y , and h and h. The output yt is not only determined by
the input Xt but also by the last hidden state ht−1 . The hidden state ht is the key
component to keep the temporal dependences within the time series.
However, the simple RNN has only a single hidden state h, which is sensitive
to short-term input. To capture long-term dependencies within the time series, an
LSTM unit contains two hidden states ht and ct , which are designed for keeping
short-term information and long-term information, respectively. The inner structure
of an LSTM unit is presented in Fig. 11.2.
The hidden state c contains an extra mechanism for strategically forgetting unre-
lated information corresponding to the current time. To retain the long-term informa-
tion, three control gates are introduced in the LSTM unit, as shown in Fig. 11.2. These
are the forget gate, the input gate, and the output gate. The control gates are essentially fully connected layers (denoted as σ).
The first gate in the LSTM unit is the forget gate ft , which determines how much
information is kept from the last state ct−1. The forget gate at time t is formulated as:

ft = σ(Wf · [ht−1, Xt] + bf),    (11.1)
where σ (·) denotes the sigmoid activation function; Xt is the input vector for the
regression model, which mainly include historical load data, calendar data, and exter-
nal factors; ft , ht−1 , and b f stand for the forget gate vector at time t, the output vector
(also the state-h vector) at time t − 1, and the bias of the forget gate at time t, respec-
tively; W f is the weight matrix of the forget gate; and [·] is the concatenating operator
for vectors.
The second gate is the input gate it , which determines how much current infor-
mation should be treated as input to generate the current state ct . it is calculated by:
it = σ(Wi · [ht−1, Xt] + bi),    (11.2)
where Wi and bi denote the weight matrix and bias of the input gate, respectively. It
can be seen that it has a similar formulation to ft . Both gates are determined by ht−1
and Xt .
The current hidden state ct is determined by adding the parts of information the gates control: the long-term information is controlled by ft, and the short-term information is controlled by it:

c̃t = tanh(Wc · [ht−1, Xt] + bc),    (11.3)

ct = ft ∗ ct−1 + it ∗ c̃t,    (11.4)
where tanh(·) denotes the tanh activation function; Wc and bc denote the weight
matrix and the bias of the current gate, respectively; and the operator ∗ stands for the
element-wise product.
The last phase of the LSTM unit is to calculate how much information can even-
tually be treated as the output. Another control gate is chosen as the output gate ot :
ot = σ(Wo · [ht−1, Xt] + bo).    (11.5)

ht = ot ∗ tanh(ct).    (11.6)
In traditional LSTM regression, the network is trained by minimizing the mean squared error (MSE):

L_MSE = (1/T) Σ_{t=1}^{T} (yt − ŷtE)^2,    (11.7)
where yt and ŷtE denote the measured and predicted load at time t, respectively, and
T denotes the total prediction time period.
Traditional LSTM can only provide the expected value of the future load. To
provide more information about future uncertainties, we replace the loss function
MSE by the pinball loss, also called the quantile loss, to guide the training of the
LSTM network. The pinball loss is calculated as follows:
Lq,t(yt, ŷtq) = { (1 − q)(ŷtq − yt),   ŷtq ≥ yt
               { q(yt − ŷtq),          ŷtq < yt    (11.8)

where q denotes the targeted quantile, ŷtq denotes the estimated qth quantile at time t, and Lq,t denotes the pinball loss for the qth quantile at time t. Figure 11.3 gives
an illustration of pinball loss, which is asymmetric. When the forecasted quantile is
higher than the real value, the penalty will be multiplied by (1 − q), and when the
forecasted quantile is lower than the real value, the penalty will be multiplied by q.
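As a quick illustration of Eq. (11.8), a minimal numpy sketch of the pinball loss for a single quantile is given below; the example values are purely hypothetical.

import numpy as np

def pinball_loss(y_true, y_quantile, q):
    """Pinball (quantile) loss of Eq. (11.8) for one target quantile q."""
    diff = y_quantile - y_true
    return float(np.mean(np.where(diff >= 0, (1 - q) * diff, -q * diff)))

# For q = 0.9, over-forecasting by 1 kW costs 0.1 kW, under-forecasting by 1 kW costs 0.9 kW
print(pinball_loss(np.array([2.0]), np.array([3.0]), 0.9))  # 0.1
print(pinball_loss(np.array([2.0]), np.array([1.0]), 0.9))  # 0.9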
There are at least two advantages to choosing pinball loss as the loss function:
1. Under the guidance of pinball loss, the trained LSTM network provides the
targeted quantile value instead of the expected value. By varying the value of the
quantile, we can obtain a series of quantiles to represent the uncertainties. The
whole training process is non-parametric and requires no presumption about the
distributions.
2. The probabilistic forecasts are usually evaluated using three aspects: reliability,
sharpness, and calibration. Pinball loss is a comprehensive index for these three
criteria, which means that the pinball loss can guarantee the performance of the
final probabilistic forecasts.
As introduced above, the pinball loss guided LSTM is a combination of LSTM and
pinball loss. The overall pinball loss guided LSTM network is shown in Fig. 11.4.
Concretely, the proposed pinball loss guided LSTM in this chapter consists of
three phases.
The first phase is stacked by LSTM units where the inputs are the sequential loads
at different time stamps, and the output is the hidden state ht at the last timestamp,
corresponding to the encoded features learned from the historical load. In this figure,
m denotes the number of time periods ahead, and d denotes the number of time
periods that are considered as the inputs in the forecasting model.
The second phase is a one-hot encoder, converting the numerical time variables Weekt and Hourt into encoded vectors, which denote the day of the week and the hour of the day of the forecasted load yt, respectively. Weekt(en) and Hourt(en) denote the encoded vectors corresponding to the week and hour variables, respectively.
The third phase is a fully-connected (FC) network, where the inputs are the con-
catenated feature vectors generated from the two phases mentioned above, and the
outputs are the forecasted quantiles.
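A minimal sketch of this three-phase structure using the tf.keras functional API is shown below. The layer sizes loosely follow Table 11.1, while the lookback length d, the set of target quantiles, and the joint pinball loss wrapper are illustrative assumptions rather than the exact implementation.

import tensorflow as tf

QUANTILES = [0.1, 0.25, 0.5, 0.75, 0.9]        # assumed set of target quantiles
d = 48                                         # assumed number of lagged load inputs

def joint_pinball_loss(y_true, y_pred):
    """Average pinball loss over all quantiles, cf. Eq. (11.10); y_true is the
    scalar load broadcast against the Q quantile outputs."""
    q = tf.constant(QUANTILES, dtype=y_pred.dtype)
    e = y_true - y_pred
    return tf.reduce_mean(tf.maximum(q * e, (q - 1.0) * e))

# Phase 1: LSTM encoder over the historical load sequence
load_in = tf.keras.Input(shape=(d, 1), name="lagged_load")
h = tf.keras.layers.LSTM(16)(load_in)

# Phase 2: one-hot encoded calendar variables (encoded outside the network)
week_in = tf.keras.Input(shape=(7,), name="week_onehot")
hour_in = tf.keras.Input(shape=(24,), name="hour_onehot")

# Phase 3: fully connected layers producing one output per target quantile
x = tf.keras.layers.Concatenate()([h, week_in, hour_in])
x = tf.keras.layers.Dense(16, activation="relu")(x)
x = tf.keras.layers.Dense(16, activation="relu")(x)
quantile_out = tf.keras.layers.Dense(len(QUANTILES))(x)

model = tf.keras.Model([load_in, week_in, hour_in], quantile_out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=joint_pinball_loss)

With this setup a single training run yields all quantiles at once, matching the joint objective described next.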
In traditional quantile regression, the model would be trained for each quantile
individually. The training objective is to minimize the average loss function L q for
the qth quantile, which is described as:
min Lq = (1/T) Σ_{t=1}^{T} Lq,t(yt, ŷtq).    (11.9)

In the proposed method, the Q quantiles are produced by a single network that is trained jointly by minimizing the average pinball loss over all quantiles and all time periods:

min L = (1/(Q × T)) Σ_{q=1}^{Q} Σ_{t=1}^{T} Lq,t(yt, ŷtq).    (11.10)
In this way, the LSTM network needs to be trained only once. Our numerical
experiments show that the integrated model has comparable performance with mul-
tiple individual models.
11.3 Implementations
11.3.1 Framework
The basic structure of the proposed pinball loss guided LSTM was introduced in the
above section. In this section, we provide more details on the implementation of the
whole probabilistic forecasting process. The implementation can be roughly divided
into three stages: data preparation, model training, and probabilistic forecasting.
These are shown in Fig. 11.5.
In the data preparation stage, we only use the historical load data since the weather
data are not available in our dataset. We first clean the load dataset as follows. Any
not-a-number (NAN) data are simply replaced by the average of the load data at
the same time period one day earlier and one day later. The input data of the
regression model include the historical load data and the calendar variables, such as
the current hour of the day and day of a week. After formulating the input and output
dataset, we split the dataset into three parts for model training (S1), validation (S2),
and testing (S3).
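A minimal pandas sketch of this cleaning step is given below, assuming a Series of 30-min readings in time order; the helper name is ours.

import pandas as pd

def fill_missing(load: pd.Series, periods_per_day: int = 48) -> pd.Series:
    """Replace NaN readings with the mean of the readings at the same time
    period one day earlier and one day later."""
    neighbour_mean = (load.shift(periods_per_day) + load.shift(-periods_per_day)) / 2
    return load.fillna(neighbour_mean)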
The first step is the setup of the neural network. A static computing graph is generated
with TensorFlow [25] according to Fig. 11.4. Then, the parameters are initialized.
The parameters can be divided into weights and biases. All weights in the three
phases are initialized with values sampled from a truncated normal distribution with
a mean of 0 and a standard deviation of 0.01. All biases are initialized to 0. Such
initialization can, to some extent, prevent the neural network from becoming stuck
in a local minimum. After that, the loss function of the neural network is optimized
using a gradient-descent-based method with an adequate learning rate—Adam [26].
We define the maximum training epoch as Nmax , and an early stopping mechanism is
utilized to prevent the model from overfitting. Concretely, if the monitored validation
loss does not drop for k epochs, the training process is terminated.
One requirement of the application of Adam is that the loss function should be
differentiable so that the neural network can be trained using gradient descent. How-
ever, the pinball loss is not differentiable everywhere. In this chapter, we introduce
the Huber norm [27] to the loss function, with very little approximation, in order to
make the loss function differentiable everywhere. The Huber norm can be viewed as
a combination of the L1- and L2-norms:
H(yt, ŷtq) = { (ŷtq − yt)^2 / (2ε),    0 ≤ |ŷtq − yt| ≤ ε
             { |ŷtq − yt| − ε/2,        |ŷtq − yt| > ε    (11.11)
where ε denotes the threshold magnitude for the L1- and L2-norms. When the forecast error |ŷtq − yt| is below the threshold, the Huber norm is the L2-norm; when the forecast error is larger than the threshold, the Huber norm is the L1-norm.
We then substitute (ŷtq − yt) in Eq. (11.8) with the Huber norm, and the approximated pinball loss can be calculated as:

Lq,t(yt, ŷtq) = { (1 − q)H(yt, ŷtq),    ŷtq ≥ yt
               { qH(yt, ŷtq),           ŷtq < yt    (11.12)
Compared with the standard pinball loss, the approximated pinball loss is differentiable when the forecast error is zero, i.e., ŷtq = yt. The gradient of the approximated
pinball loss is equal to that of the standard pinball loss when the forecast error is
larger than the threshold, and there is very little difference when the forecast error is
below the threshold.
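A minimal TensorFlow sketch of this Huber-smoothed pinball loss (Eqs. (11.11)–(11.12)) is shown below; the threshold eps and the quantile list are assumed values, and the wrapper name is ours.

import tensorflow as tf

def huber_pinball_loss(quantiles, eps=1e-3):
    """Pinball loss with the forecast error replaced by the Huber norm of
    Eq. (11.11), so the loss is differentiable at zero error (Eq. (11.12))."""
    q = tf.constant(quantiles, dtype=tf.float32)

    def loss(y_true, y_pred):
        err = tf.abs(y_pred - y_true)                               # |ŷ_t^q − y_t|
        huber = tf.where(err <= eps, err ** 2 / (2.0 * eps), err - eps / 2.0)
        over = tf.cast(y_pred >= y_true, tf.float32)                # indicator ŷ ≥ y
        weight = over * (1.0 - q) + (1.0 - over) * q                # (1−q) above, q below
        return tf.reduce_mean(weight * huber)

    return loss

# e.g. model.compile(optimizer="adam", loss=huber_pinball_loss([0.1, 0.5, 0.9]))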
The performance of the probabilistic forecasts is evaluated by the average of the total
pinball loss:
L = (1/(Q × |S3|)) Σ_{q=1}^{Q} Σ_{t∈S3} Lq,t(yt, ŷtq),    (11.13)
11.4 Benchmarks
11.4.1 QRNN
11.4.2 QGBRT
11.4.3 LSTM+E
In addition to two frequently used nonlinear quantile regression models, the prob-
abilistic forecasts can also be obtained by the statistics of the point forecast errors.
To make a fair comparison, the point forecasts are produced based on a traditional
LSTM with the same structure as the proposed pinball loss guided LSTM. We sim-
ply assume that the errors follow Gaussian distributions. Since the distribution of
the errors varies for different time periods, the variances for different time peri-
ods are calculated individually. Then, the quantiles can be calculated based on the
corresponding variances.
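A minimal sketch of this error-statistics benchmark is given below; the function and variable names are ours, and the Gaussian assumption is exactly the one discussed above.

import numpy as np
from scipy.stats import norm

def gaussian_error_quantiles(point_forecast, residuals, period_index, quantiles):
    """Derive quantile forecasts from point forecasts by assuming Gaussian
    errors, with a separate residual standard deviation per time period."""
    periods = np.unique(period_index)
    sigma = np.array([residuals[period_index == p].std() for p in periods])
    z = norm.ppf(quantiles)                              # standard normal quantiles
    sigma_t = sigma[np.searchsorted(periods, period_index)]
    return point_forecast[:, None] + sigma_t[:, None] * z[None, :]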
The proposed pinball loss guided LSTM, QRNN, and traditional LSTM are imple-
mented using TensorFlow [25]. QGBRT is implemented using the GBRT package in
Scikit-Learn [28]. The model training is supported by CUDA8.0 and an Nvidia GPU,
TITAN X (Pascal). The GPU is also applied for parallel computation to accelerate
the model training process. For the implementation of QGBRT, a total of Q parallel
processes are opened for the individual training of Q quantile regression models.
The hyperparameters of the proposed pinball loss guided LSTM (denoted as
QLSTM in the following), and the competing methods (QRNN, QGBRT, and
LSTM+E) are illustrated in Table 11.1. The structures of QLSTM and LSTM+E
are the same except for the loss function, allowing us to make a fair comparison.

Table 11.1 Hyperparameter settings for different models

Models            Parameters
QLSTM/LSTM+E      LSTM-unit: 16; FC-unit: 16; FC-layer: 3
QRNN              FC-unit: 16; FC-layer: 3
QGBRT             N_estimators: 500; min_samples_split = 2; max_depth = 3; samples_leaf = 1
The full connection layer of QRNN is the same as that of Phase 3 in QLSTM. The
number of estimators of QGBRT is set to 500, which makes the GBRT an adequately
strong learning model.
The dataset used in the case studies was collected from the Smart Metering Elec-
tricity Customer Behavior Trials (CBTs) proposed by the Commission for Energy
Regulation (CER) in Ireland. It contains over 6000 residential load profiles and small
and medium enterprises (SME) load profiles for approximately one and a half years
(from the 1st of July 2009 to the 31st of December 2010). These load profiles were
collected at 30-min intervals. There are a total of 26,000 data points for each indi-
vidual consumer. We use the first 22,000 load data points for model training and
validation (S1 and S2) and apply the following 2000 points for model testing (S3).
We implement the case studies on the load profiles of 100 randomly selected residen-
tial and SME consumers. We provide comprehensive model testing by using several
forecast lead times, 30 min, one hour, two hours, and four hours.
Table 11.2 presents the performance of the proposed QLSTM and three other compet-
ing methods measured by the average pinball loss for all 100 residential consumers.
The proposed QLSTM has the lowest pinball loss for the four different lead times. In
Table 11.2, I_QRNN, I_QGBRT, and I_LSTM+E denote the relative improvements
of the proposed QLTSM model compared with QRNN, QGBRT, and LSTM+E.
Except for QLSTM, QRNN has better performance than QGBRT and LSTM+E,
while LSTM+E has the worst performance. A possible reason for this is the unrea-
sonable assumption that forecasting errors follow a Gaussian distribution. Compared
with QRNN, the relative improvements of QLSTM are 3.46, 2.76, 2.18, and 2.19%.
Fig. 11.7 Performance comparison between QLSTM and three benchmarks for all residential consumers
In addition, the averaged pinball loss gets larger with longer lead time, especially for
the lead time from 30 min to one hour. However, the pinball loss stays relatively sta-
ble when the lead time is longer than one hour. The reason is that the residential load profiles are so stochastic that the historical data are only effective for capturing trends 30 min or one hour ahead.
To provide the detailed performance of the proposed QLSTM and the three com-
peting methods for the 100 residential consumers, a scatter plot of the average pin-
ball loss of the proposed QLSTM versus the three competing methods is provided
in Fig. 11.7. It can be seen that most of the points fall under the line y = x, which
means that QLSTM outperforms the other three methods for almost all residential consumers.

Fig. 11.8 Thirty minute ahead forecasts for one sample residential consumer over one week
Figure 11.8 shows the 30 min ahead forecasts of one sample residential consumer
over one week, from 17 October 2010 to 23 October 2010 (a total of 336 time
periods), where the dotted lines denote a series of forecasted quantiles and the red
line denotes the actual values. The quantiles can effectively capture the basic trends
in the load profile, except for several sudden peaks.
We calculate the pinball loss for all 2000 time periods. Then, we draw the box-
plots for the distribution of the pinball loss for the 48 time periods in a day. The
distribution of the pinball losses is shown in Fig. 11.9. The pinball loss is higher and
more dispersed from 7:00 to 8:00 and from 17:30 to 22:00. The time period from
7:00 to 8:00 corresponds to the time that people get up for work, and the time period
from 17:30 to 22:00 corresponds to the after-work time. Consumers have higher
load demands with larger uncertainties in these two time periods, and thus, the loads
are more difficult to forecast. The distributions of the pinball losses for different
forecasting lead times have similar trends. These results can be a good reference
for demand response targeting, baseline estimation, and reliability assessments. In
addition, the pinball loss distributions for 1, 2, and 4 h ahead forecasting are similar, while the distribution for 30 min ahead forecasting differs from the other three and has a smaller average pinball loss.
Fig. 11.9 Distribution of pinball loss at different time periods for one residential consumer

Similar to the residential forecasts, we summarize the average pinball loss for all 100 of the SME consumers in Table 11.3. For these data, the proposed QLSTM also gives the best performance. Similarly, QRNN performs best among the benchmarks. However, in contrast to the residential consumer test, it is interesting that the performance of LSTM+E is better than that of QGBRT. This may be due to two
reasons: (1) the assumption that the forecasting errors follow a Gaussian distribution
may be more reasonable for SME consumers than for residential consumers; (2)
LSTM is able to provide more accurate point forecasts compared with GBRT. The
improvements with respect to QLSTM are also greater than those for residential
consumers for different forecasting lead times.
Figure 11.10 presents the scatter plot of the average pinball loss for the proposed
QLSTM versus the three competing methods. We obtain similar results in that most
of the points fall under the line y = x, which means that QLSTM outperforms the
other three methods for almost all of the SME consumers.
Figure 11.11 shows the 30 min ahead forecasts for one sample SME consumer
over one week, from 17 October 2010 to 23 October 2010, where the dotted lines
denote a series of forecasted quantiles and the red line denotes the actual values. In
contrast with the residential consumer, the SME consumer has more stable electricity
consumption behavior and clearer patterns (there are no sudden peaks).
Accordingly, the box-plot of the pinball loss at different time periods is shown in
Fig. 11.12. The pinball loss is higher and more dispersed between 9:30 and 20:00.
This time period corresponds to working hours, which means that the consumer has
large but highly uncertain electricity consumption in this time period.
Fig. 11.10 Performance comparison between QLSTM and three benchmarks for all SME consumers

Fig. 11.11 Thirty minute ahead forecasts for one sample SME consumer over one week

Fig. 11.12 Distribution of pinball loss at different time periods for one SME consumer

11.6 Conclusions
In this chapter, we proposed a pinball loss guided LSTM for probabilistic residential
and SME consumer load forecasting. Comprehensive case studies are conducted on
different consumers with different forecasting lead times and with state-of-the-art
competing methods. We can draw the following conclusions:
1. The proposed pinball loss guided LSTM has better performance than QRNN,
QGBRT, and LSTM+E for almost all 100 residential loads and 100 SME loads.
2. Compared with the three competing methods, the proposed method has improvements ranging from 2.19 to 7.52% for residential consumers, while the improvements range from 3.79 to 25.80% for SME consumers. The improvements for SME consumers are greater than those for residential consumers, which means that QLSTM can more effectively capture the change patterns of SME loads.
3. The distributions of the pinball loss in different time periods are different. For residential consumers, the time periods with the largest and most dispersed pinball losses are 7:00–8:00 and 17:30–22:00, while for SME consumers they are 9:00–20:00. These time periods for residential consumers are complementary to those of SME consumers.
References
1. Hong, T., Pinson, P., Fan, S., Zareipour, H., Troccoli, A., & Hyndman, R. J. (2016). Probabilistic
energy forecasting: Global energy forecasting competition 2014 and beyond. International
Journal of Forecasting, 32(3), 896–913.
2. Wang, Y., Chen, Q., Kang, C., & Xia, Q. (2016). Clustering of electricity consumption behavior
dynamics toward big data applications. IEEE Transactions on Smart Grid, 7(5), 2437–2447.
3. Keerthisinghe, C., Verbič, G., & Chapman, A. C. (2016). A fast technique for smart home
management: Adp with temporal difference learning. IEEE Transactions on Smart Grid, 9(4),
3291–3303.
4. Morstyn, T., Farrell, N., Darby, S. J., & McCulloch, M. D. (2018). Using peer-to-peer energy-
trading platforms to incentivize prosumers to form federated power plants. Nature Energy,
3(2), 94.
5. Li, P., Zhang, B., Weng, Y., & Rajagopal, R. (2017). A sparse linear model and significance
test for individual consumption prediction. IEEE Transactions on Power Systems, 32(6), 4489–
4500.
6. Chaouch, M. (2014). Clustering-based improvement of nonparametric functional time series
forecasting: Application to intra-day household-level load curves. IEEE Transactions on Smart
Grid, 5(1), 411–419.
7. Teeraratkul, T., O’Neill, D., & Lall, S. (2017). Shape-based approach to household electric
load curve clustering and prediction. IEEE Transactions on Smart Grid, 9(5), 5196–5206.
8. Shi, H., Minghao, X., & Li, R. (2017). Deep learning for household load forecasting–a novel
pooling deep rnn. IEEE Transactions on Smart Grid, 9(5), 5271–5280.
9. Kong, W., Dong, Z. Y., Jia, Y., Hill, D. J., Xu, Y., & Zhang, Y. (2017). Short-term residential
load forecasting based on lstm recurrent neural network. IEEE Transactions on Smart Grid,
10(1), 841–851.
10. Kong, W., Dong, Z. Y., Hill, D. J., Luo, F., & Xu, Y. (2017) Short-term residential load fore-
casting based on resident behaviour learning. IEEE Transactions on Power Systems, 33(1),
1087–1088.
11. Yu, C-N., Mirowski, P., & Ho, T. K. (2017). A sparse coding approach to household electricity
demand forecasting in smart grids. IEEE Transactions on Smart Grid, 8(2), 738–748.
12. Tascikaraoglu, A., & Sanandaji, B. M. (2016). Short-term residential electric load forecasting:
A compressive spatio-temporal approach. Energy and Buildings, 111, 380–392.
13. Lusis, P., Khalilpour, K. R., Andrew, L., & Liebman, A. (2017). Short-term residential load forecasting: Impact of calendar effects and forecast granularity. Applied Energy, 205, 654–669.
14. Haben, S., Ward, J., Greetham, D. V., Singleton, C., & Grindrod, P. (2014). A new error measure
for forecasts of household-level, high resolution electrical energy consumption. International
Journal of Forecasting, 30(2), 246–256.
15. Kim, S., & Kim, H. (2016). A new metric of absolute percentage error for intermittent demand
forecasts. International Journal of Forecasting, 32(3), 669–679.
16. Yang, Y., Li, S., Li, W., & Meijun, Q. (2018). Power load probability density forecasting using
gaussian process quantile regression. Applied Energy, 213, 499–509.
17. Wang, Y., Zhang, N., Chen, Q., Kirschen, D. S., Li, P., & Xia, Q. (2017). Data-driven proba-
bilistic net load forecasting with high penetration of behind-the-meter pv. IEEE Transactions
on Power Systems, 33(3), 3255–3264.
18. Gan, D., Wang, Y., Yang, S., & Kang, C. (2018). Embedding based quantile regression
neural network for probabilistic load forecasting. Journal of Modern Power Systems and Clean
Energy, 6(2), 244–254.
19. Liu, B., Nowotarski, J., Hong, T., & Weron, R. (2017). Probabilistic load forecasting via quantile
regression averaging on sister forecasts. IEEE Transactions on Smart Grid, 8(2), 730–737.
20. Hong, T., & Fan, S. (2016). Probabilistic electric load forecasting: A tutorial review. Interna-
tional Journal of Forecasting, 32(3), 914–938.
21. Arora, S., Taylor, J. W. (2016). Forecasting electricity smart meter data using conditional kernel
density estimation. Omega, 59, 47–59.
22. Taieb, S. B., Huser, R., Hyndman, R. J., & Genton, M. G. (2016). Forecasting uncertainty in
electricity smart meter data by boosting additive quantile regression. IEEE Transactions on
Smart Grid, 7(5), 2448–2455.
23. Shepero, M., van der Meer, D., Munkhammar, J., & Widén, J. (2018). Residential probabilistic
load forecasting: A method using gaussian process designed for electric load data. Applied
Energy, 218, 159 – 172.
24. Amini, M. H., Karabasoglu, O., Ilic, M. D., & Boroojeni, K. G. (2015). Arima-based demand
forecasting method considering probabilistic model of electric vehicles’ parking lots. In Power
& Energy Society General Meeting (pp. 1–5).
25. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). Tensorflow: A
system for large-scale machine learning. OSDI, 16, 265–283.
26. Kingma, D. P., Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.
27. Huber, P. J., & Ronchetti, E. M. (1981). Robust statistics. Series in probability and mathematical
statistics. New York: Wiley.
28. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O. et al. (2011).
Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(Oct),
2825–2830.
Chapter 12
Aggregated Load Forecasting with
Sub-profiles
Abstract With the prevalence of smart meters, fine-grained sub-profiles reveal more
information about the aggregated load and further help improve forecasting accuracy.
This chapter proposes a novel ensemble approach for aggregated load forecasting.
An ensemble is an effective approach for load forecasting. It either generates mul-
tiple training datasets or applies multiple forecasting models to produce multiple
forecasts. In this chapter, the proposed ensemble forecast method for the aggregated
load with sub-profiles is conducted based on the multiple forecasts produced by dif-
ferent groupings of sub-profiles. Specifically, the sub-profiles are first clustered into
different groups, and forecasting is conducted on the grouped load profiles individ-
ually. Thus, these forecasts can be summed to form the aggregated load forecast. In
this way, different aggregated load forecasts can be obtained by varying the number
of clusters. Finally, an optimal weighted ensemble approach is employed to combine
these forecasts and provide the final forecasting result. Case studies are conducted
on two open datasets and verify the effectiveness and superiority of the proposed
method.
12.1 Introduction
the aggregated load forecast. The optimal number of clusters is determined by cross-
validation. The results demonstrate that the clustering-based method outperforms the
direct forecasting method.
Beyond the aforementioned single-output forecasting methods (i.e., methods that provide only one final forecast value), a series of works have been done on ensemble forecasting methods, which can produce multiple forecasts from different models [5]. In general, ensemble forecasting can be classified into homogeneous and heterogeneous methods, such as bootstrap aggregating and the combination of SVM and ANN [6], respectively.
This chapter tries to answer the following question: Is it possible to utilize both
ensemble techniques and fine-grained sub profiles to further improve the forecasting
accuracy?
This chapter first provides some preliminary experiments on the sub-profiles, including a study of how the aggregation level affects the forecasting performance and a clustering-based forecasting approach that makes full use of the fine-grained smart
meter data. On this basis, this chapter proposes a novel ensemble forecasting method
for the aggregated load with sub-profiles to answer this question. A brief summary
of the ensemble method is as follows: First, the sub-profiles are grouped using a hierarchical clustering method, and forecasting is conducted on the grouped load profiles
individually. Then, these forecasts are summed to form the aggregated load fore-
cast. Thus, we can vary the number of clusters to obtain multiple aggregated load
forecasts instead of a single forecast. Subsequently, an optimally weighted ensemble approach is used to combine these forecasts and provide the final result. Finally,
case studies are conducted on two open datasets (residential and substation loads) to
verify the effectiveness and superiority of the proposed method.
Root-mean-square error (RMSE) and mean absolute error (MAE) are two frequently applied forecasting performance evaluation criteria. To make the load forecasting performances of different aggregation levels comparable, their relative versions, R-RMSE and R-MAE, are used. For example,

R-MAE = [ (1/T) Σ_{t=1}^{T} |Lt − L̂t| ] / [ (1/T) Σ_{t=1}^{T} Lt ]    (12.3)

where T denotes the total number of forecasting time periods and L̂t denotes the forecasted load value. R-RMSE (or R-MAE) is the ratio between the RMSE (or MAE) and the average load.
Reference [7] provides a scaling law for short term load forecasting on varying
aggregation levels. It is proven that the average R-RMSE can be approximated as a
function of the aggregation level W .
R-RMSE(W) = √(α0/W + α1)    (12.4)

where W is the average load indicating the aggregation level; α0 and α1 are constants. The two constants can be estimated using regression based on experimental results. This scaling law can also be extended to R-MAE.
It clearly shows that R-RMSE decreases as the aggregation level W increases. When W is very small, α0/W >> α1, so the dominant part is α0/W. Thus, the average R-RMSE can be estimated as:

R-RMSE(W) ≈ √(α0/W)    (12.5)

That is to say, when the aggregation level is very small, R-RMSE is approximately and linearly determined by 1/√W.
When W is very large, α0/W << α1, so the dominant part is α1. Thus, the average R-RMSE can be estimated as:

R-RMSE(W) ≈ √α1    (12.6)

It is interesting that when the aggregation level is very large, R-RMSE changes very slightly and is approximately equal to √α1.
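A minimal sketch of fitting this scaling law to measured errors with scipy is given below, assuming the square-root form of Eq. (12.4) reconstructed above; the data points are purely illustrative.

import numpy as np
from scipy.optimize import curve_fit

def scaling_law(W, a0, a1):
    # Eq. (12.4): relative error as a function of the aggregation level W
    return np.sqrt(a0 / W + a1)

# Illustrative aggregation levels and measured relative errors
W = np.array([1.0, 5.0, 20.0, 100.0, 500.0, 2000.0])
r_rmse = np.array([0.80, 0.45, 0.28, 0.15, 0.09, 0.07])

(a0, a1), _ = curve_fit(scaling_law, W, r_rmse, p0=(1.0, 0.01))
print(f"alpha_0 = {a0:.3f}, alpha_1 = {a1:.4f}")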
To verify the scaling law, we conduct massive case studies by randomly selecting
individual consumers. We define 13 aggregation levels, where the number of consumers at the nth aggregation level is 2^(n−1). For example, 4096 = 2^12 consumers are randomly selected and aggregated at the 13th aggregation level. For each aggregation level, 20 experiments are conducted by repeatedly re-selecting the individual consumers.
Figures 12.2 and 12.3 provide the boxplots of R-MAE and R-RMSE. We can find a clear trend that both the average R-MAE and the average R-RMSE decrease when the number of aggregated consumers increases. When the number of aggregated consumers is greater than 256 = 2^8, both the average R-MAE and the average R-RMSE change very slightly. These trends are consistent with the scaling law provided in Eq. (12.4).
Another observation is that the variances of R-MAE and R-RMSE also decrease when the number of aggregated consumers increases. At lower aggregation levels, the aggregated load profile shows higher volatility, and thus the forecasting performance is unstable; at higher aggregation levels, the aggregated load profile shows lower volatility, and thus the forecasting performance is much more stable.
12.3 Clustering-Based Aggregated Load Forecasting

12.3.1 Framework
With fine-grained sub-profiles, we have two intuitive ideas for aggregated load forecasting: (1) directly train the forecasting model on the final aggregated load; (2) train a forecasting model for each individual consumer first and then sum all the individual forecasts to form the final forecast. The first strategy is the traditional load forecasting approach but does not make full use of the fine-grained sub-profiles. The second approach can train a specific forecasting model for each consumer. However, it suffers from two drawbacks: (1) training a forecasting model for each consumer is time-consuming and needs more computing resources; (2) since individual load profiles have great volatility, the trained models may over-fit, and their summation may even have worse performance.
Clustering is an effective approach to aggregated consumers with similar con-
sumption behavior into the same group. Is it possible to first partition the consumers
into different group first, then train the forecasting model for each group, and finally
sum all the forecasts? This section studies the performance of this forecasting strategy. The clustering-based aggregated forecasting strategy is shown in Fig. 12.4.
For a region with M sub-profiles, let L_t and L_{m,t} denote the total load and the mth sub-load at time t; the matrix form of the sub-load profiles can be represented as L_{M×T}. Each column of L_{M×T}, L_{·,t}, denotes the load consumption of all M consumers at time period t; each row of L_{M×T}, L_{m,·}, denotes the load consumption of consumer m over all T time periods.
First, we partition all the M consumers into K different groups C = {C_1, C_2, ..., C_K}; then the kth aggregated load profile is:
$$\mathbf{l}_k = \sum_{m \in C_k} \mathbf{L}_{m,\cdot} \qquad (12.7)$$
On this basis, a forecasting model f_k is trained for each aggregated load profile l_k: L̂_{t,k} = f_k(X_{t,k}), where X_{t,k} denotes the input features of group k at time t. Thus, the final forecast is:

$$\hat{L}_t = \sum_{k=1}^{K} \hat{L}_{t,k} \qquad (12.8)$$
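The following sketch illustrates this cluster-then-forecast-then-sum pipeline of Eqs. (12.7)–(12.8) with scikit-learn; the lag length, network size, and the in-sample prediction at the end are illustrative simplifications, not the authors' exact configuration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

def cluster_forecast(load_matrix, K, lag=48):
    """load_matrix: (M consumers x T half-hourly periods), T a multiple of 336.
    Cluster consumers on their average weekly profile, train one ANN per group
    on lagged values of the group-level load, and sum the K group forecasts."""
    M, T = load_matrix.shape
    weekly = load_matrix.reshape(M, -1, 336).mean(axis=1)    # average weekly profiles
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(weekly)

    total_forecast = np.zeros(T - lag)
    for k in range(K):
        lk = load_matrix[labels == k].sum(axis=0)             # Eq. (12.7)
        X = np.array([lk[t - lag:t] for t in range(lag, T)])  # lagged loads as features
        y = lk[lag:]
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
        model.fit(X, y)
        total_forecast += model.predict(X)                    # summed forecasts, Eq. (12.8)
    return total_forecast
```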
We apply two traditional clustering methods, k-means and k-medoids, to group the
consumers according to their average weekly load profiles. The number of clusters
varies from 1 to 50. All the forecasting models are ANNs. Figures 12.5 and 12.6 give
the aggregated load forecasting performance with different numbers of clusters in
terms of MAPE, MAE, and RMSE, based on k-means and k-medoids, respectively.
The optimal number of clusters is 2 for the Irish dataset for both clustering methods. However, there is no clear trend or correlation between forecasting performance and the number of clusters. This observation is different from the results in [4], where the optimal number of clusters can easily be found according to the clear trend of the performance.
Fig. 12.5 Load forecasting performance with different numbers of clusters (k-means)
Fig. 12.6 Load forecasting performance with different numbers of clusters (k-medoids)
Since the forecasts obtained with different numbers of clusters differ, we can apply an ensemble learning method to combine them. This idea inspires the work in the next section.
12.4 Ensemble Forecasting for the Aggregated Load
To highlight the idea of the proposed method, only historical load data are employed as input features for constructing the forecasting model. Note that other relevant factors (e.g., temperature) can also be considered in the proposed framework. First, the sub-profiles L_{M×T} are segmented into three parts: the first part Ltr is used to train the forecasting model for each group-level load profile; the second part Len is used to calculate the weights ω for the ensemble; the third part Lte is used to test the performance of the aggregated load ensemble forecasting model. The proposed method includes four main stages: the clustering stage, training stage, ensemble stage, and test stage.
In the clustering stage, the consumers are grouped on Ltr according to their average weekly load profiles, where TW denotes the number of time periods over one week. It is important to note that, in this stage, clustering needs to be performed for many different numbers of groups. Therefore, in this research, the agglomerative hierarchical clustering method with single linkage is selected to cluster the consumers because of its capability to establish a hierarchical structure and the fact that it does not need to be performed repeatedly [8].
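A minimal sketch of this one-pass hierarchy with SciPy (the function name is ours): the single-linkage tree is built once and then cut at every required number of clusters without re-running the clustering:

```python
from scipy.cluster.hierarchy import linkage, fcluster

def hierarchical_labels(weekly_profiles, cluster_numbers):
    """weekly_profiles: (M consumers x T_W) average weekly load profiles.
    Build the single-linkage hierarchy once, then cut it at each requested
    number of clusters."""
    Z = linkage(weekly_profiles, method='single', metric='euclidean')
    return {k: fcluster(Z, t=k, criterion='maxclust') for k in cluster_numbers}
```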
The purpose of this stage is to produce multiple forecasts by varying the number of
clusters. This stage is also conducted on Ltr . When the number of clusters is M, the
forecasting is essentially the bottom-up approach; when the number of clusters is 1,
the forecasting is performed directly based on historical aggregated load data. In order
to diversify the forecasting results, we vary the number of clusters exponentially.
Thus, a total of N forecasts will be obtained:
$$N = \lfloor \log_2 M \rfloor + 1 \qquad (12.10)$$

where ⌊·⌋ denotes the round-down (floor) function. For example, N = 7 when M = 100.
The nth forecast is obtained by summing the forecasts of the k_n grouped load profiles, where k_n is expressed as follows:

$$k_n = \min\left(2^{n-1}, M\right) \qquad (12.11)$$

For example, the set of cluster numbers is K = [1, 2, 4, 8, 16, 32, 64, 100] when M = 100.
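A small sketch of this schedule follows; our reading of the reported cluster sets (both in the example above and in the case-study tables) is that M itself is appended when it is not a power of two, so that the bottom-up forecast is always included:

```python
import math

def cluster_schedule(M):
    """Exponentially spaced cluster numbers following Eqs. (12.10)-(12.11):
    k_n = min(2**(n-1), M), with M appended when it is not already reached."""
    N = math.floor(math.log2(M)) + 1              # Eq. (12.10)
    ks = [min(2 ** (n - 1), M) for n in range(1, N + 1)]
    if ks[-1] != M:
        ks.append(M)                              # include the bottom-up case
    return ks

print(cluster_schedule(100))   # [1, 2, 4, 8, 16, 32, 64, 100]
print(cluster_schedule(155))   # [1, 2, 4, 8, 16, 32, 64, 128, 155]
```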
As one of the main contributions of this work, the ensemble stage is proposed to calculate the weights ω for the N forecasts and combine them into the final forecast. This stage is conducted on Len instead of Ltr to reduce the risk of overfitting. The ensemble of the N forecasts is formulated as an optimization problem whose objective is to minimize the mean absolute percentage error (MAPE) and whose constraints include the equation of the combined forecast, the summation of all the weights, and the non-negativity of the weights:
$$\hat{\boldsymbol{\omega}} = \arg\min_{\boldsymbol{\omega}} \; \frac{1}{T}\sum_{t=1}^{T}\frac{\left|L_{en,t} - \hat{L}_{en,t}\right|}{L_{en,t}} \qquad (12.12)$$
$$\text{s.t.} \quad \hat{L}_{en,t} = \sum_{n=1}^{N}\omega_n \hat{L}_{en,n,t}, \quad \sum_{n=1}^{N}\omega_n = 1, \quad \omega_n \ge 0$$
The absolute percentage error in the objective function can be transformed into a linear programming (LP) problem by introducing auxiliary decision variables v_{en,t}, which are constrained to be no less than the absolute errors:

$$\hat{\boldsymbol{\omega}} = \arg\min_{\boldsymbol{\omega}} \; \frac{1}{T}\sum_{t=1}^{T}\frac{v_{en,t}}{L_{en,t}} \qquad (12.13)$$
$$\text{s.t.} \quad \hat{L}_{en,t} = \sum_{n=1}^{N}\omega_n \hat{L}_{en,n,t}, \quad v_{en,t} \ge L_{en,t} - \hat{L}_{en,t}, \quad v_{en,t} \ge \hat{L}_{en,t} - L_{en,t}, \quad \sum_{n=1}^{N}\omega_n = 1, \quad \omega_n \ge 0$$
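A sketch of this LP with scipy.optimize.linprog (the function name is ours): F holds the N individual forecasts on the ensemble set as columns, y the actual loads; the decision vector stacks the N weights and the T auxiliary variables:

```python
import numpy as np
from scipy.optimize import linprog

def ensemble_weights(F, y):
    """Solve Eq. (12.13): minimize (1/T) * sum_t v_t / y_t over weights w and
    auxiliary variables v_t that bound |y_t - F_t @ w| from above.
    F: (T x N) individual forecasts; y: (T,) actual loads on the ensemble set."""
    T, N = F.shape
    c = np.concatenate([np.zeros(N), 1.0 / (T * y)])      # objective acts only on v

    # v_t >= y_t - F_t @ w  and  v_t >= F_t @ w - y_t  (rewritten as A_ub x <= b_ub)
    A_ub = np.block([[-F, -np.eye(T)],
                     [ F, -np.eye(T)]])
    b_ub = np.concatenate([-y, y])

    A_eq = np.concatenate([np.ones(N), np.zeros(T)]).reshape(1, -1)   # weights sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (N + T), method='highs')
    return res.x[:N]
```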
In this section, case studies are conducted on two open datasets. In particular, 50%, 25%, and 25% of the whole dataset are partitioned into the training dataset, test dataset, and ensemble dataset, respectively.
Table 12.1 Performance of individual and ensemble forecasts for Irish dataset
N 1 2 4 8 16 32 64 128 256 … 5237 Ensemble
ω 0.634 0 0 0.271 0 0 0.095 0 0 … 0 /
MAPE 4.25% 5.05% 5.29% 4.74% 5.55% 4.66% 4.79% 5.09% 5.59% … 10.31% 4.05%
RMSE 210.95 229.73 228.01 217.68 244.9 217.64 227.36 232.61 250.27 … 441.33 202.88
The residential load data used in this section are obtained from the Smart Meter-
ing Electricity Customer Behaviour Trials (CBTs) initiated by the Commission for
Energy Regulation (CER) in Ireland. It contains half-hour electricity consumption
data of over 5000 Irish residential consumers and small and medium enterprises
(SMEs) [9]. After excluding the consumers with a large number of zero values, the
data of a total of 5237 consumers from July 20, 2009, to December 26, 2010 (75
weeks) are used for forecasting and testing. Figure 12.7 shows the weekly predicted
and real load profiles from December 13, 2010 to December 19, 2010. As shown
in the figure, the dotted lines are individual forecasts; the blue and red lines are the
ensemble forecast and actual value, respectively. Table 12.1 provides the weights,
MAPE, and RMSE of the individual and ensemble forecasts. Regarding the individual forecasts, direct load forecasting based on the aggregated data (i.e., N = 1) exhibits the best performance, rather than any clustering strategy (i.e., N > 1). Nevertheless, the superiority of the proposed ensemble method is indicated by its MAPE and RMSE, which are 4.71% and 3.83% lower, respectively, than those of the best individual forecast. The results also show that the performance of the bottom-up approach is much worse than that of the clustering-based forecasts due to the large variety of individual load profiles.
We use the Ausgrid substation load data from May 5, 2014, to April 24, 2016 (103 weeks). After deleting the substations with a large number of missing values, data from a total of 155 substations are retained [10]. Thus, nine individual forecasts are obtained by varying the number of clusters.
Table 12.2 Performance of individual and ensemble forecasts for Ausgrid dataset
N 1 2 4 8 16 32 64 128 155 Ensemble
ω 0 0 0 0 0.113 0 0 0 0.887 /
MAPE 5.68% 5.59% 5.47% 5.27% 5.15% 5.19% 5.13% 5.12% 5.09% 5.08%
RMSE 223.23 217.4 215.47 208.21 203.91 206.3 204.66 202.73 202.65 202.55
The predicted load profiles from April 11, 2016 to April 17, 2016 and the forecasting performances are shown in Fig. 12.8 and Table 12.2, respectively. After the optimization procedure, the weights for forecasts #5 and #9 are 0.113 and 0.887, respectively, whereas the weights for the other forecasts are zero. When comparing the calculated MAPE and RMSE values, it is interesting to find that, in contrast to the Irish dataset, the bottom-up approach (i.e., N = 155) has the lowest forecasting errors among the individual forecasts. The reason for this phenomenon might be that substation load profiles are more regular than residential load profiles.
12.5 Conclusions
This chapter proposes an ensemble forecasting method for the aggregated load based on hierarchical clustering of fine-grained sub-load profiles. It is a new way to take full advantage of fine-grained data to further improve the forecasting accuracy of the aggregated load. Case studies on both residential load data and substation load data demonstrate the superior performance of the proposed ensemble method compared with the traditional direct and bottom-up forecasting strategies.
References
1. Hong, T., & Fan, S. (2016). Probabilistic electric load forecasting: A tutorial review. Interna-
tional Journal of Forecasting, 32(3), 914–938.
2. Yu, C.-N., Mirowski, P., & Ho, T. K. (2017). A sparse coding approach to household electricity
demand forecasting in smart grids. IEEE Transactions on Smart Grid, 8(2), 738–748.
3. Stephen, B., Tang, X., Harvey, P. R., Galloway, S., & Jennett, K. I. (2017). Incorporating
practice theory in sub-profile models for short term aggregated residential load forecasting.
IEEE Transactions on Smart Grid, 8(4), 1591–1598.
4. Quilumba, F. L., Lee, W.-J., Huang, H., Wang, D. Y., & Szabados, R. L. (2015). Using smart
meter data to improve the accuracy of intraday load forecasting considering customer behavior
similarities. IEEE Transactions on Smart Grid, 6(2), 911–918.
5. Li, S., Wang, P., & Goel, L. (2016). A novel wavelet-based ensemble method for short-term load
forecasting with hybrid neural networks and feature selection. IEEE Transactions on Power
Systems, 31(3), 1788–1798.
6. Mendes-Moreira, J., Soares, C., Jorge, A. M., & De Sousa, J. F. (2012). Ensemble approaches
for regression: A survey. ACM Computing Surveys (CSUR), 45(1), 1–10.
7. Sevlian, R., & Rajagopal, R. (2018). A scaling law for short term load forecasting on varying
levels of aggregation. International Journal of Electrical Power & Energy Systems, 98, 350–
361.
8. Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. KDD Workshop on Text Mining (Vol. 400, pp. 525–526). Boston.
9. Irish Social Science Data Archive. (2012). Commission for Energy Regulation (CER) Smart
Metering Project. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.
10. Ausgrid. Distribution zone substation information data to share. http://www.ausgrid.com.au/Common/About-us/Corporate-information/Data-to-share/DistZone-subs.aspx#.WYD6KenauUl. Retrieved July 31, 2017.
Chapter 13
Prospects of Future Research Issues
Abstract Although smart meter data analytics has received extensive attention and a rich body of literature in this area has been published, developments in computer science and in the energy system itself will certainly lead to new problems and opportunities. In this chapter, we discuss several research trends for smart meter data analytics, such as big data issues, novel machine learning technologies, new business models, the transition of energy systems, and data privacy and security. At the end of this book, we hope this chapter helps readers identify new issues and directions for smart meter data analytics in the future smart grid.
Substantial works in the literature have addressed smart meter data analytics. Two special sections on big data analytics for smart grid modernization were hosted by IEEE Transactions on Smart Grid in 2016 [1] and by IEEE Power and Energy Magazine in 2018 [2], respectively. However, the size of the datasets analyzed can hardly be called big data. How to efficiently integrate more multivariate data of larger size to discover more knowledge is an emerging issue. As shown in Fig. 13.1, big data issues in smart meter data analytics include at least two aspects: the first is multivariate data fusion, such as economic information, meteorological data, and EV charging data in addition to energy consumption data; the second is high-performance computing, such as distributed computing, GPU computing, cloud computing, and fog computing. It should also be noted that collecting and analyzing more data may bring more value as well as a larger cost. Collecting smart meter data without consideration of cost is unreasonable. How to balance the value and the cost of data collection and analysis is also an interesting problem.

Fig. 13.1 Big data issues with smart meter data analytics
(1) Multivariate Data Fusion
The fusion of various data is one of the basic characteristics of big data [3]. Current studies mainly focus on the smart meter data itself or even only electricity consumption data. Very few papers consider weather data, survey data from consumers, or some
other external data. Integrating more external data, such as crowd-sourced data from the Internet, weather data, voltage and current data, and even voice data from service systems, may reveal more information. Multivariate data fusion needs to deal with structured data of different granularities as well as unstructured data. We would like to emphasize that big data represents a change of concept: more data-driven methods will be proposed to solve practical problems that have traditionally been solved by model-based methods. For example, with redundant smart meter data, the power flow of the distribution system can be approximated through hyperplane-fitting methods such as ANN and SVM. In addition, how to visualize high-dimensional and multivariate data to highlight the crucial components and discover the hidden patterns or correlations among these data is a seldom-touched area [4].
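As a hedged illustration of this data-driven idea (the file names and the choice of SVR are our assumptions, not the authors' implementation), a regression model can be fitted from historical nodal injections to measured voltages without an explicit network model:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Hypothetical historical measurements: nodal injections (features) and
# nodal voltage magnitudes (targets); the file names are placeholders.
injections = np.load('nodal_injections.npy')   # shape: (samples, nodes)
voltages = np.load('nodal_voltages.npy')       # shape: (samples, nodes)

X_tr, X_te, y_tr, y_te = train_test_split(injections, voltages, test_size=0.25, random_state=0)
model = MultiOutputRegressor(SVR(kernel='linear'))   # hyperplane-style fit per node
model.fit(X_tr, y_tr)
print('test R^2:', model.score(X_te, y_te))
```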
(2) High-Performance Computing
In addition, a majority of smart meter data analytics methods that are applicable
to small datasets may not be appropriate for large datasets. Highly efficient algo-
rithms and tools such as distributed and parallel computing and the Hadoop platform
should be further investigated. Cloud computing, an efficient computation architec-
ture that shares computing resources on the Internet, can provide different types of
big data analytics services, including Platform as a Service (PaaS), Software as a
Service (SaaS), and Infrastructure as a Service (IaaS) [5]. How to make full use of
cloud computing resources for smart meter data analytics is an important issue. How-
ever, the security problem introduced by cloud computing should be addressed [6].
Another high-performance computing approach is GPU computing, which enables highly efficient parallel computation [7]. Specific algorithms should be designed for different GPU computing tasks.
13.2 New Machine Learning Technologies
Smart meter data analytics is an interdisciplinary field that involves electrical engineering and computer science, particularly machine learning. The development of machine learning has a great impact on smart meter data analytics, and the application of new machine learning technologies is an important aspect of it. For example, the clustering method recently proposed in [8] has been applied in [9], and the progress in deep learning reported in [10] has been exploited in [11]. When applying a machine learning technology to smart meter data analytics, the limitations of the method and the physical meaning it reveals should be carefully considered. For example, the size of the dataset should be considered in deep learning to avoid overfitting.
(1) Deep Learning and Transfer Learning
Deep learning has been applied in different industries, including smart grids. As summarized above, different deep learning techniques have been used for smart meter data analytics, but this is just a start. Designing deep learning structures for different applications is still an active research area. The lack of labeled data is one of the main challenges for smart meter data analytics. Transfer learning, which applies knowledge learned from other objects to the objects under study, can help make full use of the various available data [12]. Many transfer learning tasks are implemented by deep learning [13]. The combination of these two emerging machine learning techniques may therefore have widespread applications.
(2) Online Learning and Incremental Learning
Note that smart meter data are essentially real-time stream data. Online learning and incremental learning are well suited to handling such real-time stream data [14]. Many online learning techniques, such as online dictionary learning [15], and incremental learning techniques, such as incremental clustering [16], have been proposed in other areas. However, existing works on smart meter data analytics rarely use online or incremental learning, except for several online anomaly detection methods.
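A minimal online-learning sketch (the lag length, learning rate, and callback structure are illustrative assumptions; feature scaling is omitted for brevity): the model is updated with scikit-learn's partial_fit as each new reading of an aggregated feeder arrives, instead of being retrained from scratch:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

lag = 48                                   # previous day of half-hourly readings
model = SGDRegressor(learning_rate='constant', eta0=0.01)
history = []                               # buffer of past readings

def on_new_reading(value):
    """Called whenever a new half-hourly reading of the aggregated load arrives."""
    if len(history) >= lag:
        x = np.array(history[-lag:], dtype=float).reshape(1, -1)
        model.partial_fit(x, np.array([value]))     # incremental model update
    history.append(value)
```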
retailers, aggregators, and individual consumers. How to analyze smart meter data, and how much data should be analyzed, in the micro electricity market to promote friendly electricity consumption and renewable energy accommodation is a new question for future distribution systems.
(2) Sharing Economy
For distribution systems with integrated distributed renewable energy and energy storage, a new business model, the sharing economy, can be introduced. Consumers can share their rooftop PV [19] and storage [20] with their neighbors. In this situation, the roles of consumers, retailers, and the DSO in the energy market game will change [21]. Other potential applications of smart meter data analytics may exist, such as analyzing changes in electricity purchasing and consumption behavior and designing optimal grouping strategies for sharing energy.
13.4 Transition of Energy Systems

As shown in Fig. 13.2, the integration of distributed renewable energy and multiple energy systems is an inevitable trend in the development of smart grids. A typical smart home has multiple loads, including cooling, heat, gas, and electricity. Newcomers such as rooftop PV, energy storage, and EVs also change the structure of future distribution systems.
(1) High Penetration of Renewable Energy
High penetration of renewable energy, such as behind-the-meter PV [22, 23], will greatly change electricity consumption behavior and significantly influence the net load profiles. Traditional load profiling methods should be improved to account for the high penetration of renewable energy. In addition, by combining weather data, electricity price data, and net load data, the capacity and output of renewable energy can be estimated; in this way, the original load profile can be recovered. Energy storage is widely used to suppress renewable energy fluctuations.
As stated above, concern regarding smart meter privacy and security is one of the main barriers to the popularization of smart meters. Most existing works on data privacy and security focus on the data communication architecture and physical circuits [26]. Studies of data privacy and security from the perspective of data analytics are still limited.
(1) Data Privacy
Data-analytics-based privacy protection offers a new perspective beyond communication architecture, for example the design of privacy-preserving clustering algorithms [27] and PCA algorithms [28]. A strategic battery charging and discharging schedule was proposed in [29] to mask the actual electricity consumption behavior and alleviate privacy concerns. However, several basic issues about smart meter data should be, but have not yet been, addressed: Who owns the smart meter data? How much private information can be mined from these data? Is it possible to disguise the data to protect privacy without influencing the decision making of retailers?
(2) Data Security
For data security, works on cyber-physical security in the smart grid, such as attacks on phasor measurement unit (PMU) and supervisory control and data acquisition (SCADA) data, have been widely studied [30]. However, different types of cyberattacks on electricity consumption data, such as those causing non-technical losses, should be further studied.
References
1. Hong, T., Chen, C., Huang, J., Ning, L., Xie, L., & Zareipour, H. (2016). Guest editorial big
data analytics for grid modernization. IEEE Transactions on Smart Grid, 7(5), 2395–2396.
2. Hong, T. (2018). Big data analytics: Making the smart grid smarter [guest editorial]. IEEE
Power and Energy Magazine, 16(3), 12–16.
3. Lv, Z., Song, H., Basanta-Val, P., Steed, A., & Jo, M. (2017). Next-generation big data ana-
lytics: State of the art, challenges, and future research topics. IEEE Transactions on Industrial
Informatics, 13(4), 1891–1899.
4. Hyndman, R. J., Liu, X. A., & Pinson, P. (2018). Visualizing big energy data: Solutions for
this crucial component of data analysis. IEEE Power and Energy Magazine, 16(3), 18–25.
5. Baek, J., Vu, Q. H., Liu, J. K., Huang, X., & Xiang, Y. (2015). A secure cloud computing based
framework for big data information management of smart grid. IEEE Transactions on Cloud
Computing, 3(2), 233–244.
6. Bera, S., Misra, S., & Rodrigues, J. J. P. C. (2015). Cloud computing applications for smart
grid: A survey. IEEE Transactions on Parallel and Distributed Systems, 26(5), 1477–1494.
7. Mittal, S. (2017). A survey of techniques for architecting and managing GPU register file. IEEE Transactions on Parallel and Distributed Systems, 28(1), 16–28.
8. Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science,
344(6191), 1492–1496.
9. Wang, Y., Chen, Q., Kang, C., & Xia, Q. (2016). Clustering of electricity consumption behavior
dynamics toward big data applications. IEEE Transactions on Smart Grid, 7(5), 2437–2447.
10. Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471.
11. Marino, D. L., Amarasinghe, K., & Manic, M. (2016). Building energy load forecasting using
deep neural networks. IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics
Society (pp. 7046–7051). IEEE.
12. Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge
and Data Engineering, 22(10), 1345–1359.
13. Bengio, Y. (2012). Deep learning of representations for unsupervised and transfer learning.
Proceedings of ICML Workshop on Unsupervised and Transfer Learning (pp. 17–36).
14. Diethe, T., & Girolami, M. (2013). Online learning with (multiple) kernels: A review. Neural
Computation, 25(3), 567–625.
15. Xie, Y., Zhang, W., Li, C., Lin, S., Yanyun, Q., & Zhang, Y. (2014). Discriminative object track-
ing via sparse representation and online dictionary learning. IEEE Transactions on Cybernetics,
44(4), 539–553.
16. Zhang, Q., Zhu, C., Yang, L. T., Chen, Z., Zhao, L., & Li, P. (2017). An incremental CFS algo-
rithm for clustering large data in industrial internet of things. IEEE Transactions on Industrial
Informatics, 13(3), 1193–1201.
17. Rahimi, F. A., & Ipakchi, A. (2012). Transactive energy techniques: Closing the gap between
wholesale and retail markets. The Electricity Journal, 25(8), 29–35.
18. Kok, K., & Widergren, S. (2016). A society of devices: Integrating intelligent distributed
resources with transactive energy. IEEE Power and Energy Magazine, 14(3), 34–45.
19. Celik, B., Roche, R., Bouquain, D., & Miraoui, A. (2017). Decentralized neighborhood energy
management with coordinated smart home energy sharing. IEEE Transactions on Smart Grid,
9(6), 6387–6397.
20. Liu, N., Xinghuo, Y., Wang, C., & Wang, J. (2017). Energy sharing management for microgrids with PV prosumers: A Stackelberg game approach. IEEE Transactions on Industrial Informatics, 13(3), 1088–1098.
21. Ye, G., Li, G., Di, W., Chen, X., & Zhou, Y. (2017). Towards cost minimization with renewable
energy sharing in cooperative residential communities. IEEE Access, 5, 11688–11699.
22. Shaker, H., Zareipour, H., & Wood, D. (2016). Estimating power generation of invisible solar
sites using publicly available data. IEEE Transactions on Smart Grid, 7(5), 2456–2465.
References 293
23. Wang, Y., Zhang, N., Chen, Q., Kirschen, D. S., Li, P., & Xia, Q. (2017). Data-driven proba-
bilistic net load forecasting with high penetration of behind-the-meter PV. IEEE Transactions
on Power Systems, 33(3), 3255–3264.
24. Chitsaz, H., Zamani-Dehkordi, P., Zareipour, H., & Parikh, P. P. (2017). Electricity price fore-
casting for operational scheduling of behind-the-meter storage systems. IEEE Transactions on
Smart Grid, 9(6), 6612–6622.
25. Krause, T., Andersson, G., Frohlich, K., & Vaccaro, A. (2011). Multiple-energy carriers: mod-
eling of production, delivery, and consumption. Proceedings of the IEEE, 99(1), 15–27.
26. Yongdong, W., Chen, B., Weng, J., Wei, Z., Li, X., Qiu, B., & et al. (2018). False load attack to
smart meters by synchronously switching power circuits. IEEE Transactions on Smart Grid,
10(3), 2641–2649.
27. Xing, K., Chunqiang, H., Jiguo, Y., Cheng, X., & Zhang, F. (2017). Mutual privacy preserving k-
means clustering in social participatory sensing. IEEE Transactions on Industrial Informatics,
13(4), 2066–2076.
28. Wei, L., Sarwate, A. D., Corander, J., Hero, A., & Tarokh, V. (2016). Analysis of a privacy-preserving PCA algorithm using random matrix theory. IEEE Global Conference on Signal and Information Processing (GlobalSIP) (pp. 1335–1339).
29. Salehkalaibar, S., Aminifar, F., & Shahidehpour, M. (2017). Hypothesis testing for privacy of
smart meters with side information. IEEE Transactions on Smart Grid, 10(2), 2059–2067.
30. Yan, Y., Qian, Y., Sharif, H., & Tipper, D. (2012). A survey on cyber security for smart grid
communications. IEEE Communications Surveys & Tutorials, 14(4), 998–1010.