
MACHINE LEARNING

Figure 1. Illustration of the method employed to train multiple machine learning (ML) algorithms based on transformers' operational data supervised by human experts. The output of each ML algorithm is the actual condition of individual transformers (green = good, yellow = acceptable but requiring maintenance, and red = unacceptable, presenting elevated operational risk).

AI – machine learning algorithms applied to transformer diagnostics
ABSTRACT

With the arrival of the age of big data, e-commerce, and smartphones, there has been a growing interest in the application of fast and sophisticated tools, namely machine learning algorithms, to handle massive amounts of data and extract meaningful information that can boost and speed up regression and classification problems, for example in short-term load forecasting and asset condition assessment. This paper describes the use of machine learning (ML) algorithms as supporting tools for the automatic classification of power transformer operating conditions. The work [1] consists of training multiple ML algorithms with real-life data from 1,000 (one thousand) transformers that were individually analyzed by human experts. Each transformer in the database was scored with a 'green,' 'yellow' or 'red' card depending on the data and the interpretation of the human experts, thus serving as the target variable in ML supervised training mode. The paper describes the main steps towards the training of the multiple ML algorithms and the stunning output produced by those algorithms when requested to analyze 200 unseen transformer cases (new cases).

KEYWORDS

automated tool, condition assessment, machine learning algorithms, transformer diagnostics

TRANSFORMERS MAGAZINE | Special Edition: Digitalization | 2020
Advertorial

Machine learning algorithms can be interpreted as a universal non-linear approximator which can be used for fitting very complex multidimensional data with an arbitrary number of inputs and outputs

Table 1. Structure of training dataset, showing 10 random samples ("…" marks columns omitted from the original table; decimal commas as in the original)

| Age | IMP | HV | MVA | TF | IFT | DS | PF25 | H2O | H2 | … | CO2 | O2 | N2 | Bsh-PF | Bsh-Cap | H1PF | H1Cap | CO2/CO | O2N2 | Class |
| 43 | 70 | 345,0 | 201,6 | 1,00 | 33,1 | 44,1 | 0,005 | 16,0 | 10 | … | 913 | 12479 | 65221 | 0,51 | 35,9 | 0,51 | 359 | 14,27 | 0,19 | 2 |
| 20 | 57 | 141,0 | 93,0 | 14,00 | 33,0 | 35,0 | 0,030 | 3,9 | 2 | … | 210 | 1900 | 110000 | 0,39 | 2562,8 | 0,52 | 190 | 42,00 | 0,02 | 3 |
| 44 | 70 | 345,0 | 33,3 | 0,10 | 33,4 | 34,4 | 0,036 | 20,0 | 8 | … | 333 | 6910 | 32940 | 0,40 | 36,5 | 0,40 | 365 | 8,12 | 0,21 | 2 |
| 44 | 100 | 765,0 | 100,0 | 0,10 | 33,9 | 42,0 | 0,078 | 6,6 | 50 | … | 3484 | 377 | 26202 | 0,25 | 44,0 | 0,25 | 440 | 11,28 | 0,01 | 1 |
| 34 | 60 | 20,9 | 39,2 | 0,20 | 31,0 | 35,0 | 0,020 | 33,0 | 8 | … | 2012 | 260 | 21440 | 0,42 | 1051,2 | 0,35 | 151 | 15,24 | 0,02 | 3 |
| 25 | 85 | 345,0 | 660,8 | 0,66 | 26,0 | 35,0 | 0,051 | 9,0 | 3 | … | 8818 | 5715 | 70864 | 0,19 | 8000,0 | 0,41 | 1838 | 58,79 | 0,08 | 3 |
| 22 | 30 | 230,0 | 53,3 | 0,66 | 42,0 | 39,0 | 0,013 | 12,0 | 11 | … | 540 | 2135 | 79702 | 0,36 | 1542,8 | 0,32 | 179 | 22,50 | 0,03 | 3 |
| 23 | 100 | 765,0 | 500,0 | 2,00 | 34,0 | 25,0 | 0,042 | 7,2 | 48 | … | 2710 | 28215 | 79492 | 0,24 | 38,9 | 0,24 | 389 | 3,07 | 0,35 | 1 |
| 51 | 47 | 161,0 | 230,0 | 0,20 | 33,0 | 42,0 | 0,180 | 13,0 | 25 | … | 5472 | 1103 | 68585 | 0,61 | 3396,3 | 0,60 | 308 | 195,43 | 0,02 | 2 |
| 10 | 100 | 765,0 | 112,4 | 0,10 | 25,0 | 35,0 | 0,005 | 2,7 | 24 | … | 5608 | 8661 | 24715 | 0,37 | 58,6 | 0,37 | 586 | 4,38 | 0,35 | 3 |
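The structure of Table 1 maps naturally onto a features-plus-label layout for supervised training. As a minimal sketch, assuming a hypothetical CSV export of the table (with decimal points instead of commas and only a subset of columns), the labeled dataset could be loaded like this:

```python
import io

import pandas as pd

# Hypothetical excerpt of the labeled training data from Table 1;
# the class codes 1/2/3 as an encoding of green/yellow/red are an
# assumption, not stated explicitly in the article.
csv = io.StringIO(
    "Age,IMP,HV,MVA,TF,PF25,H2O,H2,Class\n"
    "43,70,345.0,201.6,1.00,0.005,16.0,10,2\n"
    "20,57,141.0,93.0,14.00,0.030,3.9,2,3\n"
    "44,100,765.0,100.0,0.10,0.078,6.6,50,1\n"
)
df = pd.read_csv(csv)

# Features are the operational parameters; the target is the
# expert-assigned condition class.
X = df.drop(columns="Class")
y = df["Class"]
print(X.shape, y.tolist())
```

In the actual work the feature matrix would hold all 24 parameters for the 1,000 transformers, with the expert scores as the target column.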

Table 2. Statistical description of all features of the transformer dataset

| | Age | IMP | HV | MVA | TF | IFT | DS | PF25 | H2O | H2 | … | CO2 | O2 | N2 | Bsh-PF | Bsh-Cap | H1PF | H1Cap | CO2/CO | O2N2 | Class |
| Count | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | … | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
| Mean | 28,8 | 54,0 | 290,3 | 191,4 | 7,6 | 34,7 | 36,6 | 0,121 | 8,9 | 63,4 | … | 2156,6 | 6221,2 | 57002,1 | 0,40 | 1149,2 | 0,38 | 394,8 | 38,20 | 0,11 | – |
| Stdev | 16,2 | 30,8 | 250,3 | 202,9 | 14,6 | 5,5 | 8,0 | 1,534 | 9,2 | 568,0 | … | 2594,6 | 8256,0 | 33658,7 | 0,26 | 1308,2 | 0,21 | 615,7 | 69,47 | 0,11 | – |
| Min | 1,0 | 0,6 | 4,2 | 0,2 | 0,0 | 0,0 | 7,0 | 0,001 | 1,0 | 1,0 | … | 15,0 | 62,0 | 53,0 | 0,04 | 0,0 | 0,12 | 0,0 | 0,07 | 0,00 | 1 |
| 25% | 16,8 | 29,0 | 138,0 | 47,0 | 0,2 | 33,0 | 35,0 | 0,005 | 3,9 | 8,0 | … | 517,8 | 960,0 | 28962,3 | 0,26 | 40,2 | 0,27 | 156,8 | 7,46 | 0,02 | 1 |
| 50% | 30,0 | 50,0 | 161,0 | 100,0 | 0,7 | 33,0 | 35,0 | 0,023 | 5,0 | 11,2 | … | 1290,5 | 3113,5 | 56716,0 | 0,35 | 964,8 | 0,36 | 217,2 | 13,35 | 0,06 | 3 |
| 75% | 39,0 | 80,0 | 345,0 | 260,8 | 5,0 | 37,8 | 37,0 | 0,060 | 11,0 | 25,0 | … | 2643,0 | 8577,8 | 77402,8 | 0,45 | 1764,0 | 0,42 | 408,5 | 34,86 | 0,18 | 3 |
| Max | 79,0 | 100,0 | 765,0 | 1000,0 | 79,0 | 56,8 | 75,0 | 35,300 | 117,0 | 15092,0 | … | 22200,0 | 74556,0 | 300210,0 | 4,63 | 9195,6 | 2,90 | 7062,2 | 700,00 | 0,50 | 3 |
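The Count/Mean/Stdev/quantile rows of Table 2 are standard descriptive statistics. A sketch with Python's statistics module, using a hypothetical sample of H2O (moisture) readings rather than the real 1,000-unit column:

```python
import statistics

# Hypothetical H2O (moisture) readings for a handful of transformers;
# Table 2 reports the same statistics computed over all 1,000 units.
h2o = [16.0, 3.9, 20.0, 6.6, 33.0, 9.0, 12.0, 7.2, 13.0, 2.7]

count = len(h2o)
mean = statistics.mean(h2o)
stdev = statistics.stdev(h2o)                    # sample standard deviation
q1, median, q3 = statistics.quantiles(h2o, n=4)  # 25 %, 50 %, 75 % points

print(count, round(mean, 2), round(stdev, 2), q1, median, q3)
```

The 25 %/50 %/75 % rows of Table 2 are exactly these quartile cut points per feature.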

Fitting of input data to desired output data using ML algorithms is called learning or training, and once the ML model is trained, it can be used for predicting output values for arbitrary inputs
www.transformers-magazine.com

Machine learning algorithms and techniques are used for the assessment of the transformer's condition

1. Introduction

1.1 Dataset

The dataset employed to train the machine learning algorithms contained 24 typical transformer parameters such as nameplate data, DGA, oil quality, insulation power factor, etc. Table 1 illustrates the structure of the dataset, and Table 2 provides a general statistical description of each parameter for the whole dataset.

1.2 Machine learning training with 10-fold cross-validation

The training was achieved by first randomly partitioning the original dataset of 1,000 transformers into two subsets: one containing the data for 800 transformers (training dataset), while the remaining 200 transformers were used as the validation or test dataset. The training process was supervised learning based on a 10-fold cross-validation procedure with 3 repeats, yielding 30 output accuracies for each machine learning algorithm [2-5], with each accuracy corresponding to one fold in a given repeat. The supervised learning was applied with the support of human experts, who had analyzed the same 1,000 cases provided to the machine learning algorithms.

Machine learning algorithms

The following 12 ML algorithms were trained and compared in the present work:

Linear algorithms
1. General linear regression (logistic regression) – GLM
2. Linear discriminant analysis – LDA

Non-linear algorithms
3. Classification and regression trees (CART)
4. C5.0 (a type of CART algorithm)
5. Naïve Bayes algorithm (NB)
6. K-nearest neighbor (KNN)
7. Support vector machine (SVM)

Ensemble algorithms
8. Random forest (stochastic assembly of a large number of CART algorithms)
9. Tree bagging
10. Extreme gradient boosting machine (XGB tree)
11. Extreme gradient boosting machine (XGB linear)

12. Artificial neural networks (ANN – not deep learning yet)

Machine learning algorithms have the possibility to estimate a part of the missing data, which is extremely important in the transformer diagnostics application

Figure 2. Map of missing values in the 1,000 cases used in the current paper to train and test the machine learning algorithms. Red lines show missing values for each column of data. Greyscale color shows available data, varying from low numbers (white) to high numbers (black).
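The evaluation protocol of section 1.2 (10-fold cross-validation with 3 repeats, i.e. 30 accuracy scores per model) can be sketched with scikit-learn. The dataset below is a synthetic stand-in for the 800-case training set, and only two of the twelve algorithms are shown:

```python
# Sketch of the paper's protocol: 10-fold cross-validation with
# 3 repeats, giving 30 accuracy scores per model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 800 cases, 24 features, 3 classes (green/yellow/red).
X, y = make_classification(n_samples=800, n_features=24, n_informative=10,
                           n_classes=3, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

models = {
    "GLM (logistic regression)": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f} "
          f"({len(scores)} folds)")
```

Each model yields exactly 30 fold accuracies, which is what the boxplots in Fig. 4 later summarize.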

The following section describes the meaning of statistical learning for each algorithm.

2. Statistical learning process

Statistical learning has a different interpretation for each of the above-indicated algorithms. In linear regression, for example, the learning process is associated with the optimal search of the linear model coefficients that best correlate inputs to outputs in a given problem. In classification and regression trees, the training is related to a statistical method that optimizes the breakdown of the feature space (transformer parameters) into a decision tree capable of classifying transformers based on the class distribution inside the tree. The support vector machine (SVM) is the so-called "widest band classifier", which optimizes the separation between different classes in a given dataset. In the neural networks method, the learning process consists of finding the optimal distribution of the weights interconnecting the multiple nodes in different layers until the classification error reaches a maximum acceptable threshold. The fact of the matter is that, although there are different algorithms and learning methodologies, the so-called "learning" is only possible due to the robust statistical procedures applied to each individual algorithm, through the repetition of thousands of examples of different class types, until each algorithm is capable of outputting an acceptable level of accuracy.

Each random case illustrated in Fig. 1, out of the 800 training cases, contains the features and the classification (expert judgment) necessary in the learning phase of each ML algorithm. In the end, each algorithm maps input data to output class in a statistical process that is characteristic of that specific algorithm. The supervised learning takes place as a comparison between the output of each individual machine learning algorithm and the classification posted by the human expert for each individual example. An error or cost function is defined, and a proper statistical process is employed to minimize that cost, so that each algorithm provides the best possible accuracy based on its training strategy. After the learning, the algorithms are tested against the 200 unseen cases, and another accuracy is calculated. An interesting method of showing the accuracy of such a test is the so-called "confusion matrix", described in the sections below.

3. Handling missing data

Missing data is perhaps the single most important aspect of any machine learning technique, and it is also extremely important in any transformer diagnostic process, since human experts are typically forced to make decisions based on incomplete data. Fig. 2 shows a concise map of the actual missing data in the current dataset of 1,000 transformers.

There are several possible approaches to the problem of missing data, but one can say for sure that missing data is like a medical issue: it will not go away just because it has been overlooked. A human expert will intuitively handle missing data by, for example, assuming that a missing parameter (say, bushing power factor) is normal and, as such, will not influence the decision about the condition of the transformer. This is called "single value imputation" or "educated guess", since it replaces missing data with "normal" values. The most common imputation procedures are:

a) Single imputation (educated guess, mean or even median value of a distribution),
b) Feature correlation (make the column of missing data a function of all other parameters),
c) Multiple imputation (find the probability distribution function that best adheres to the data),
d) Use of a probabilistic belief propagation algorithm (such as in Bayesian networks).

Figure 3. Correlation matrix of all 24 variables used in the study. Blue ellipses indicate a positive correlation; red ellipses show a negative correlation. Color intensity is proportional to correlation. Blank squares denote that there is no correlation.

The condition assessment data generated by the transformer experts have been used to train the various types of ML algorithms, and their accuracy was compared


The algorithms that showed the best performance were those based on aggregation or ensemble of classification and regression trees, with an accuracy close to 97 %

Statistically speaking, a single value imputation (as in the educated guess, or in the replacement of missing data by the mean or median, for example) may work well in certain applications but suffers from a possibly significant change in the original distribution of the data. Another very powerful technique is so-called statistical multiple imputation, although its application is not trivial. The idea is to replace each missing datum with a randomly selected value from the actual probability density function that best fits the remaining data for that parameter. Feature correlation may also work, but it depends on a complex analysis regarding the level of correlation between the features and the target variable (the class, in the current example), as illustrated in Fig. 3. Several methods have been tested, but simple imputation by the median showed good enough results in the present work.

4. Best performing machine learning algorithms

After replacing the missing data for each transformer, and training and testing all 12 machine learning algorithms with the available dataset, duly analyzed by transformer experts, the algorithms that showed the best performance were those based on aggregation or ensemble of classification and regression trees (CART). One should mention that no optimization procedure was applied to any of the tested algorithms and that so-called "deep learning" was not employed with the artificial neural networks.

4.1 Principle behind CART

A full explanation of entropy and CART is beyond the scope of the present paper, but interested readers may find a wealth of information in the references below and further. Finally, it is important to mention that tree bagging, random forest, and the extreme gradient boosting machines (xGBM1 and xGBM2 in the present work) are different forms of association of multiple CARTs, so that statistical combinations of weaker algorithms may lead to much stronger outputs.

5. ML algorithms output

Fig. 4 below shows the boxplots with comparative results of training accuracy for the 12 described machine learning models. Notice that the top 5 best performing models are all variations and ensembles of CART, and their major differences are in the process of building multiple trees and their combinations that will best separate the data after learning from the training dataset.

Figure 4. Comparative accuracy of machine learning algorithms after training 12 models with 80 % of the available data, by 10-fold cross-validation
(CV) and 3 repeats. The ML algorithms were Naïve Bayes, linear discriminant analysis (LDA), classification and regression trees (CART), general linear
model (GLM), support vector machine (SVM), K-nearest neighbor (KNN), artificial neural networks (ANN), tree bagging, extreme gradient boosting
machine (xGBM1 and xGBM2), random forest (RF) and C5.0.
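The "combinations of weaker algorithms" idea behind the CART ensembles can be illustrated by hand: train many shallow CARTs on bootstrap samples of the training data and combine them by majority vote. This is a sketch on synthetic data, with the tree count and depth chosen purely for illustration:

```python
import random
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the transformer dataset.
X, y = make_classification(n_samples=600, n_features=24, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

rng = random.Random(0)
trees = []
for _ in range(25):  # 25 deliberately weak (shallow) CARTs
    # Bootstrap: sample the training set with replacement.
    idx = [rng.randrange(len(X_tr)) for _ in range(len(X_tr))]
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    trees.append(tree.fit(X_tr[idx], y_tr[idx]))

def vote(x):
    """Majority vote over all trees for one sample."""
    return Counter(t.predict([x])[0] for t in trees).most_common(1)[0][0]

preds = [vote(x) for x in X_te]
acc = sum(p == true for p, true in zip(preds, y_te)) / len(y_te)
print(f"bagged-CART accuracy: {acc:.2f}")
```

Random forest adds per-split random feature selection on top of this scheme, and gradient boosting grows the trees sequentially instead of independently; the voting/aggregation principle is the common thread.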



The test accuracy is obtained by comparing the output of the system when classifying data that were not used during training (the 200 new cases) against the human experts' opinion for those new cases. This is typically given in the format of the so-called confusion matrix, illustrated in Table 3 for the best performing method, Extreme Gradient Boosting Machine 1 (xGBM1).

The machine learning algorithms have shown an impressive accuracy when analyzing complex power transformer data; however, human expert judgment is crucial in their training process

6. Discussion and conclusions

The machine learning algorithms have shown an impressive accuracy when analyzing complex power transformer data without using any engineering model whatsoever. In other words, the algorithms were not provided with reference levels or flags to indicate that a given parameter was within the acceptable range or outside the "normal" range. The 12 ML models were only provided with the final classification between green, yellow, and red previously established by the transformer human experts. The best performing algorithm (xGBM1) presented near 97 % accuracy when analyzing the 200 new test cases unseen during training. It missed one green case that was "wrongly" but conservatively classified as red, 3 yellow cases that were wrongly classified as green, and 3 yellow cases that were wrongly classified as red. No red case was wrongly classified. The significant number of misses in practical terms is the 3 yellow cases classified as green out of 200, leading to 3/200 = 1.5 % real misses, since the other misses were conservative and would not lead to any unfavorable situation like a possible failure. The paper has also demonstrated the importance of human expert judgment in the training and learning process of the ML algorithms, particularly with respect to power transformer diagnostics.

Bibliography

[1] L. Cheim, "Machine Learning Tools in Support of Transformer Diagnostics", Cigre Paris Session, August 2018, paper A2-206 (Best Paper Award)

[2] J. H. Friedman, R. Tibshirani, T. Hastie, "The Elements of Statistical Learning", 2nd edition, Springer, 2009

[3] M. Kuhn, K. Johnson, "Applied Predictive Modelling", Springer, 2013

[4] J. Berger, "Statistical Decision Theory and Bayesian Analysis", 2nd edition, 1993

[5] I. Witten, E. Frank, M. Hall, "Data Mining: Practical Machine Learning Tools and Techniques", 3rd edition, Elsevier, 2011

Dr. Luiz Cheim

Dr. Luiz Cheim is a Senior Principal R&D Engineer at Hitachi ABB Power Grids, with over 30 years' experience in the power transformers industry. His major activities and interests are in the development of transformer condition assessment, performance models and algorithms, as well as new sensors and state-of-the-art monitoring technologies. Dr. Cheim developed the transformer algorithms in the Ellipse APM solution and is the proponent of the new Transformer Inspection Robot (TXploreTM). In August 2018, he was granted the Best Paper Award by the Cigre organization in Paris, Study Committee A2-206/PS2. He has over 20 patents granted or filed over the last 10 years alone, and he was recently selected by Public Utilities Magazine (PUC, November 2019 issue) to represent ABB as a top innovator. He has been a long-standing member of both Cigre and the IEEE and has taken several active roles in both organizations.

Table 3. Confusion matrix and statistics (200 new test cases, ML = xGBM1)

                 Human expert classification
ML prediction    Green    Yellow    Red
Green               61         3      0
Yellow               0        14      0
Red                  1         3    118
Totals              62        20    118

Algorithm accuracy = (61 + 14 + 118) / 200 = 96.5 %    Eq. (1)
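Eq. (1) is simply the sum of the diagonal of Table 3 (cases where the ML prediction agrees with the expert) divided by the total number of test cases. A short sketch reproducing the computation:

```python
# Table 3 confusion matrix: rows = ML prediction, columns = human
# expert classification.
confusion = {
    "green":  {"green": 61, "yellow": 3,  "red": 0},
    "yellow": {"green": 0,  "yellow": 14, "red": 0},
    "red":    {"green": 1,  "yellow": 3,  "red": 118},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[c][c] for c in confusion)  # diagonal entries
accuracy = correct / total
print(f"accuracy = {correct}/{total} = {accuracy:.1%}")
```

The off-diagonal entries are the misses discussed in section 6: the conservative ones sit below the diagonal (e.g. the one green case predicted red), while the 3 yellow cases predicted green are the only optimistic errors.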
