Solar Energy
journal homepage: www.elsevier.com/locate/solener
Keywords: Deep learning; Defect classification; Electroluminescence imaging; Photovoltaic modules; Regression analysis; Support vector machines; Visual inspection

Electroluminescence (EL) imaging is a useful modality for the inspection of photovoltaic (PV) modules. EL images provide high spatial resolution, which makes it possible to detect even finest defects on the surface of PV modules. However, the analysis of EL images is typically a manual process that is expensive, time-consuming, and requires expert knowledge of many different types of defects.

In this work, we investigate two approaches for automatic detection of such defects in a single image of a PV cell. The approaches differ in their hardware requirements, which are dictated by their respective application scenarios. The more hardware-efficient approach is based on hand-crafted features that are classified in a Support Vector Machine (SVM). To obtain a strong performance, we investigate and compare various processing variants. The more hardware-demanding approach uses an end-to-end deep Convolutional Neural Network (CNN) that runs on a Graphics Processing Unit (GPU). Both approaches are trained on 1968 cells extracted from high resolution EL intensity images of mono- and polycrystalline PV modules. The CNN is more accurate, and reaches an average accuracy of 88.42%. The SVM achieves a slightly lower average accuracy of 82.44%, but can run on arbitrary hardware. Both automated approaches make continuous, highly accurate monitoring of PV cells feasible.
⁎ Corresponding author at: Energy Campus Nuremberg, Fürther Str. 250, 90429 Nuremberg, Germany.
E-mail address: [email protected] (S. Deitsch).
https://doi.org/10.1016/j.solener.2019.02.067
Received 8 July 2018; Received in revised form 9 February 2019; Accepted 26 February 2019
Available online 02 May 2019
0038-092X/ © 2019 International Solar Energy Society. Published by Elsevier Ltd. All rights reserved.
S. Deitsch, et al. Solar Energy 185 (2019) 455–468
Fig. 1. Various intrinsic and extrinsic defects in monocrystalline ((a)–(b)) and polycrystalline ((c)–(e)) solar cells. (a) shows a solar cell with a typical material defect.
(b) shows finger interruptions in the encircled areas, which do not necessarily reduce the module efficiency. The solar cell in (c) contains a microcrack that is very
subtle in its appearance. While microcracks do not divide the cell completely, they still must be detected because such cracks may grow over time and eventually
impair the module efficiency. The spots at the bottom of this cell are likely to indicate cell damage as well. However, such spots can often be difficult to
distinguish from actual material defects. (d) shows a disconnected area due to degradation of the cell interconnection. (e) shows a cell with electrically separated or
degraded parts, which are usually caused by mechanical damage.
Electroluminescence (EL) imaging (Fuyuki et al., 2005; Fuyuki and Kitiyanan, 2009) is another established non-destructive technology for failure analysis of PV modules with the ability to image solar modules at a much higher resolution. In EL images, defective cells appear darker, because disconnected parts do not irradiate. To obtain an EL image, current is applied to a PV module, which induces EL emission at a wavelength of 1150 nm. The emission can be imaged by a silicon Charge-coupled Device (CCD) sensor. The high spatial image resolution enables the detection of microcracks (Breitenstein et al., 2011), and EL imaging also does not suffer from blurring due to lateral heat propagation. However, visual inspection of EL images is not only time-consuming and expensive, but also requires trained specialists. In this work, we remove this constraint by proposing an automated method for classifying defects in EL images.

In general, defects in solar modules can be classified into two categories (Fuyuki and Kitiyanan, 2009): (1) intrinsic deficiencies due to material properties such as crystal grain boundaries and dislocations, and (2) process-induced extrinsic defects such as microcracks and breaks, which reduce the overall module efficiency over time.

Fig. 1 shows an example EL image with different types of defects in monocrystalline and polycrystalline solar cells. Fig. 1(a) and (b) show general material defects from the production process such as finger interruptions, which do not necessarily reduce the lifespan of the affected solar panel unless caused by high strain at the solder joints (Köntges et al., 2014). Specifically, the efficiency degradation induced by finger interruptions is a complex interaction between their size, position, and the number of interruptions (De Rose et al., 2012; Köntges et al., 2014). Fig. 1(c) to (e) show microcracks, degradation of cell-interconnections, and cells with electrically separated or degraded parts that are well known to reduce the module efficiency. The detection of microcracks in particular requires cameras with high spatial resolution.

For the detection of defects during monitoring one can set different goals. Highlighting the exact location of defects within a solar module allows affected areas to be monitored with high precision. However, the exact defect location within the solar cell is less important for the quality assessment of a whole PV module. For this task, the overall likelihood indicating a cell defect is more important. This enables a quick identification of defective areas and can potentially complement the prediction of future efficiency loss within a PV module. In this work, we propose two classification pipelines that automatically solve the second task, i.e., to determine a per-cell defect likelihood that may lead to efficiency loss.

The investigated classification approaches in this work are SVM and CNN classifiers.

Support Vector Machines (SVMs) are trained on various features extracted from EL images of solar cells.

A Convolutional Neural Network (CNN) is directly fed with image pixels of solar cells and the corresponding labels.

The SVM approach is computationally particularly efficient during training and inference. This allows the method to run on a wide range of commodity hardware, such as tablet computers or drones, whose usage is dictated by the respective application scenario. Conversely, the prediction accuracy of the CNN is generally higher, while training and inference are much more time-intensive and commonly require a GPU for an acceptably short runtime. Particularly for aerial imagery, however, additional issues may arise and will need to be solved. Kang and Cha (2018) highlight several challenges that need to be addressed before applying our approach outside of a manufacturing setting.

1.1. Contributions

The contribution of this work consists of three parts. First, we present a resource-efficient framework for supervised classification of defective solar cells using hand-crafted features and an SVM classifier that can be used on a wide range of commodity hardware, including tablet computers and drones equipped with low-power single-board computers. The low computational requirements make the on-site evaluation of the EL imagery possible, similar to analysis of low resolution IR images (Dotenco et al., 2016). Second, we present a supervised classification framework using a convolutional neural network that is slightly more accurate, but requires a GPU for efficient training and classification. In particular, we show how uncertainty can be incorporated into both frameworks to improve the classification accuracy. Third, we contribute an annotated dataset consisting of 2624 aligned solar cells extracted from high resolution EL images to the community, and we use this dataset to perform an extensive evaluation and comparison of the proposed approaches.

Fig. 2 shows the assessment results of a solar panel using the proposed convolutional neural network. Each solar cell in the EL image is
Fig. 2. Defect probabilities inferred for each PV module cell by the proposed CNN. A darker shade of red indicates a higher likelihood of a cell defect. (For
interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
overlaid by the likelihood of a defect in the corresponding cell.

1.2. Outline

The remainder of this work is organized as follows. Related work is reviewed in Section 2. Section 3 introduces both proposed classification approaches. In Section 4, we evaluate and compare these approaches, and discuss the results. This work is concluded in Section 5.

2. Related work

Visual inspection of solar modules via EL imaging is an active research topic. Most of the related work, however, focuses on the detection of specific intrinsic or extrinsic defects, but not on the prediction of defects that eventually lower the power efficiency of solar modules. Detection of surface abnormalities in EL images of solar cells is related to structural health monitoring. However, it is important to note that certain defects in solar cells are only specific to EL imaging of PV modules. For instance, fully disconnected solar cells simply appear as dark image regions (similar to Fig. 1(d)) and thus have no comparable equivalent in terms of structural defects. Additionally, surface irregularities in solar wafers (such as finger interruptions) are easily confused with cell cracks, even though they do not significantly affect the power loss.

In the context of visual inspection of solar modules, Tsai et al. (2012) use Fourier image reconstruction to detect defective solar cells in EL images of polycrystalline PV modules. The targeted extrinsic defects are (small) cracks, breaks, and finger interruptions. Fourier image reconstruction is applied to remove possible defects by setting high-frequency coefficients associated with line- and bar-shaped artifacts to zero. The spectral representation is then transformed back into the spatial domain. The defects can then be identified as intensity differences between the original and the high-pass filtered image. Due to the shape assumption, the method has difficulties detecting defects with more complex shapes.

Tsai et al. (2013) also introduced a supervised learning method for identification of defects using Independent Component Analysis (ICA) basis images. Defect-free solar cell subimages are used to find a set of independent basis images with ICA. The method achieves a high accuracy of 93.40% with a relatively small training dataset of 300 solar cell subimages. However, material defects such as finger interruptions are treated equally to cell cracks. This strategy is therefore only suitable for detection of every abnormality on the surface of solar cells, but not for the prediction of future energy loss.

Anwar and Abdullah (2014) developed an algorithm for the detection of microcracks in polycrystalline solar cells. They use anisotropic diffusion filtering followed by shape analysis to localize the defects in solar cells. While the method performs well at detecting microcracks, it does not consider other defect types such as completely disconnected cells, which appear completely dark in EL images.

Tseng et al. (2015) proposed a method for automatic detection of finger interruptions in monocrystalline solar cells. The method employs binary clustering of features from candidate regions for the detection of defects. Finger interruptions, however, do not necessarily provide suitable cues for prediction of future power loss.

The success of deep learning led to a gradual replacement of traditional pattern recognition pipelines for optical inspection. However, to our knowledge, no CNN architecture has been proposed for EL images, but only for other modalities or applications. Most closely related is the work by Mehta et al. (2018), who presented a system for predicting the power loss, localization and type of soiling from RGB images of solar modules. Their approach does not require manual localization labels, but instead operates on images with the corresponding power loss as input. Masci et al. (2012) proposed an end-to-end max-pooling CNN for classifying steel defects. Their network performance is compared against multiple hand-crafted feature descriptors that are trained using SVMs. Although their dataset consists of only 2281 training and 646 test images, the CNN architecture classifies steel defects at least twice as accurately as the SVMs. Zhang et al. (2016) proposed a CNN architecture for detection of cracks on roads. To train the CNN, approximately 45,000 hand-labeled image patches were used. They show that CNNs greatly outperform hand-crafted features
classified by a combination of an SVM and boosting. Cha et al. (2017) use a very similar approach to detect concrete cracks in a broad range of images taken under various environmental and illumination conditions. Kang and Cha (2018) employ deep learning for structural health monitoring on aerial imagery. Cha et al. (2018) additionally investigated defect localization using the modern learning-based segmentation approaches for region proposals based on the Faster R-CNN framework, which can perform in real-time. Lee et al. (2019) also use semantic segmentation to detect cracks in concrete.

In the medical context, Esteva et al. (2017) employ deep neural networks to classify different types of skin cancer. They trained the CNN end-to-end on a large dataset consisting of 129,450 clinical images and 2032 different diseases, making it possible to achieve a high degree of accuracy.

3. Methodology

We subdivide each module into its solar cells, and analyze each cell individually to eventually infer the defect likelihood. This breaks down the analysis to the smallest meaningful unit, in the sense that the mechanical design of PV modules interconnects units of cells in series. Also, the breakdown considerably increases the number of available data samples for training. For the segmentation of solar cells, we use a recently developed method (Deitsch et al., 2018), which brings every cell into a normal form free of perspective and lens distortions.

Unless otherwise stated, the proposed methods operate on size-normalized EL images of solar cells with a resolution of 300 × 300 pixels. This image resolution was derived from the median dimensions of image regions corresponding to individual solar cells in the original EL images of PV modules. The solar cell images are used directly as pipeline input. The image resolution of solar cells in the wild will generally deviate from the required resolution and therefore must be adjusted accordingly. The CNN architecture sets a minimum image resolution, which typically equals the CNN's receptive field (e.g., the original VGG-19 architecture uses 224 × 224). If the resolution is lower than this minimum resolution, then the image must be upscaled. For higher resolutions, the network can be applied in a strided window manner and afterwards the outputs are pooled together (typically using average or maximum pooling). We followed an alternative approach in which the CNN architecture encodes this process inherently. In case of the SVM pipeline, the resolution requirement is less stringent. Given local features that are scale-invariant, the image resolution of the classified solar cells does not need to be adjusted and may vary from image to image.

3.1. Classification using support vector machines

The general approach for classification using SVMs (Cortes and Vapnik, 1995) is as follows. First, local descriptors are extracted from images of segmented PV cells. The features are typically extracted at salient points, also known as keypoints, or from a dense pixel grid. For training the classifier and subsequent predictions, a global representation needs to be computed from the set of local descriptors, oftentimes referred to as encoding. Finally, this global descriptor for a solar cell is classified into defective or functional. Fig. 3 visualizes the classification pipeline, consisting of masking, keypoint detection, feature description, encoding, and classification. We describe these steps in the following subsections.

3.1.1. Masking

We assume that the solar cells were segmented from a PV module, e.g., using the automated algorithm we proposed in earlier work (Deitsch et al., 2018). A binary mask then allows separating the foreground of every cell from the background. The cell background includes image regions that generally do not belong to the silicon wafer, such as the busbars and the inter-cell borders. This mask can be used to strictly limit feature extraction to the cell interior. In the evaluation, we investigate the usefulness of masking, and find that its effect is minor, i.e., it only slightly improves the performance in a few feature/classifier combinations.

3.1.2. Feature extraction

In order to train the SVMs, feature descriptors are extracted first. The locations of these local features are determined using two main sampling strategies: (1) keypoint detection, and (2) dense sampling. These strategies are illustrated in Fig. 4. The two strategies produce different sets of features, each of which can be better suited to specific types of solar wafers than to others. Dense sampling disregards the image content and instead uses a fixed configuration of feature points. Keypoint detectors, on the other hand, rely on the texture in the image, and therefore the number of keypoints is proportional to the amount of high-frequency elements, such as edges and corners (as can be seen in Fig. 4(c) and (d)). Keypoint detectors typically operate in scale space, allowing feature detection at different scale levels and also at different orientations. Fig. 4(d) shows keypoints detected by KAZE. Here, each keypoint has a different scale (visualized by the radius of corresponding circles) and also a specific orientation exemplified by the line drawn from the center to the circle border. Keypoints that capture both the scale and the rotation are invariant to changes in image resolution and to in-plane rotations, which makes them very robust.

Dense sampling subdivides the 300 × 300 pixels PV cell by overlaying it with a grid consisting of n × n cells. The center of each grid cell specifies the position at which a feature descriptor will be subsequently extracted. The number of feature locations only depends on the grid size. Dense sampling can be useful if computational resources are very limited, or if the purpose is to identify defects only in monocrystalline PV modules.

We employ different popular combinations of keypoint detectors and feature extractors from the literature, as listed in Table 1 and outlined below.

Several algorithms combine keypoint detection and feature description. Probably the most popular of these methods is Scale-invariant
Fig. 3. An overview of the SVM classification pipeline with the four proposed variations of the preprocessing and feature extraction process.
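The masking step of this pipeline can be illustrated with a minimal sketch that keeps only those sampling positions falling on the foreground of a binary cell mask. The function name `filter_by_mask`, the mask layout (1 for wafer, 0 for background such as busbars), and the toy coordinates are illustrative assumptions and are not taken from the original implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel position


def filter_by_mask(points: Sequence[Point], mask: Sequence[Sequence[int]]) -> List[Point]:
    """Keep only points that lie on the foreground (mask value 1)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for x, y in points:
        inside = 0 <= x < width and 0 <= y < height
        if inside and mask[y][x] == 1:
            kept.append((x, y))
    return kept


# Toy 4 x 6 mask: row 1 plays the role of a busbar (background).
mask = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
points = [(0, 0), (3, 1), (5, 2), (7, 3)]
print(filter_by_mask(points, mask))  # [(0, 0), (5, 2)]
```

In this toy case the point on the busbar row and the point outside the image are discarded, so feature extraction would be limited to the cell interior.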
Fig. 4. Two different feature extraction strategies applied to the same PV cell with and without masking. In (a), keypoints are sampled at fixed positions specified by
the center of a cell in the overlaid grid. (b) uses equally sized and oriented keypoints laid out on a dense grid similar to (a). (c) shows an example for AGAST keypoints
(detection threshold slightly increased for visualization). (d) shows KAZE keypoints of various sizes and orientations after masking out the background area.
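The dense sampling strategy in (a) can be sketched in a few lines: the image is overlaid with an n × n grid and the center of every grid cell becomes a feature location. The function name `dense_grid_centers` and the default grid size are hypothetical choices for illustration, since no specific value of n is implied here.

```python
from typing import List, Tuple


def dense_grid_centers(image_size: int = 300, n: int = 10) -> List[Tuple[float, float]]:
    """Return the (x, y) centers of an n x n grid laid over a square image."""
    if n <= 0:
        raise ValueError("n must be positive")
    step = image_size / n  # side length of one grid cell in pixels
    return [((col + 0.5) * step, (row + 0.5) * step)
            for row in range(n) for col in range(n)]


centers = dense_grid_centers(image_size=300, n=3)
print(len(centers))             # 9
print(centers[0], centers[-1])  # (50.0, 50.0) (250.0, 250.0)
```

The number of locations is n² regardless of image content, which is why the cost of this strategy is easy to bound in advance.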
Table 1
Investigated keypoint detectors and feature descriptors. SIFT, SURF, and KAZE (in bold) contain both a detector and a descriptor. We explored also combinations of the keypoint detectors of AGAST and KAZE with other feature descriptors. Note, the keypoints provided by SIFT and SURF were not reliable enough and thus not further evaluated.

Method                              Keypoint detector   Feature descriptor
AGAST (Mair et al., 2010)           yes                 -
KAZE (Alcantarilla et al., 2012)    yes                 yes
HOG (Dalal and Triggs, 2005)        -                   yes
PHOW (Bosch et al., 2007)           -                   yes
SIFT (Lowe, 1999)                   (yes)               yes
SURF (Bay et al., 2008)             (yes)               yes
VGG (Simonyan et al., 2014)         -                   yes

Feature Transform (SIFT) (Lowe, 1999), which detects and describes features at multiple scales. SIFT is invariant to rotation, translation, and scaling, and partially resilient to varying illumination conditions. Speeded Up Robust Features (SURF) (Bay et al., 2008) is a faster variant of SIFT, and also consists of a keypoint detector and a local feature descriptor. However, the detector part of SURF is not invariant to affine transformations. In initial experiments, we were not able to successfully use the keypoint detectors of SIFT and SURF, because the keypoint detector at times failed to detect features in relatively homogeneous monocrystalline cell images, and hence we used only the descriptor parts.

KAZE (Alcantarilla et al., 2012) is a multiscale feature detector and descriptor. The keypoint detection algorithm is very similar to SIFT, except that the linear Gaussian scale space used by SIFT is replaced by nonlinear diffusion filtering. For feature description, however, KAZE uses the SURF descriptor.

We also investigated Adaptive and Generic Accelerated Segment Test (AGAST) (Mair et al., 2010) as a dedicated keypoint detector without descriptor. It is based on a random forest classifier trained on a set of corner features that is known as Features from Accelerated Segment Test (FAST) (Rosten and Drummond, 2005, 2006).

Among the dedicated descriptors, Pyramid Histogram of Visual Words (PHOW) (Bosch et al., 2007) is an extension of SIFT that computes SIFT descriptors densely over a uniformly spaced grid. We use the implementation variant from VLFEAT (Vedaldi and Fulkerson, 2008). Similarly, Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2005) […] Group (VGG) descriptor trained end-to-end using an efficient optimization method (Simonyan et al., 2014). In our implementation, we […] binary feature descriptors are typically very fast to compute, they generally do not perform better than real-valued descriptors (Heinly et al., 2012).

3.1.3. Combinations of detectors and extractors

For the purpose of determining the most powerful feature detector/extractor combination, we evaluated all feature detector and feature extractor combinations with few exceptions.

In most cases, we neither tuned the parameters of keypoint detectors nor those of feature extractors but rather used the defaults by OPENCV (Itseez, 2017) as of version 3.3.1. One notable exception is AGAST, where we lowered the detection threshold to 5 to be able to detect keypoints in monocrystalline PV modules. For SIFT and SURF, similar adjustments were not successful, which is why we only used their descriptors. HOG requires a grid of overlapping image regions, which is incompatible with the keypoint detectors. Instead, we downsampled the 300 × 300 pixels cell images to 256 × 256 pixels (the closest power of 2) for feature extraction. Masking was omitted for HOG due to implementation-specific limitations. Given these exceptions, we overall evaluate twelve feature combinations.

3.1.4. Encoding

The computed features are encoded into a global feature descriptor. The purpose of encoding is the formation of a single, fixed-length global descriptor from multiple local descriptors. Encoding is commonly represented as a histogram that draws its statistics from a background model. To this end, we employ Vectors of Locally Aggregated Descriptors (VLAD) (Jégou et al., 2012), which offers a compact state-of-the-art representation (Peng et al., 2015). VLAD encoding is sometimes also used for deep learning based features in classification, identification and retrieval tasks (Gong et al., 2014; Ng et al., 2015; Paulin et al., 2016; Christlein et al., 2017).

The VLAD dictionary is created by k-means clustering of a random subset of feature descriptors from the training set. For performance reasons, we use the fast mini-batch variant (Sculley, 2010) of k-means. The cluster centroids $\mu_k$ correspond to anchor points of the dictionary. Afterwards, first order statistics are aggregated as a sum of residuals of all descriptors $X := \{x_t \in \mathbb{R}^d \mid t = 1, \ldots, T\}$ extracted from a solar cell image. The residuals are computed with respect to their nearest anchor point $\mu_k$ in the dictionary $D := \{\mu_k \in \mathbb{R}^d \mid k = 1, \ldots, K\}$ as

$$\nu_k := \sum_{t=1}^{T} \eta_k(x_t)\,(x_t - \mu_k) \tag{1}$$
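As a rough illustration of this aggregation, the following pure-Python sketch assigns every local descriptor to its nearest centroid, sums the residuals per centroid, and concatenates the sums into one vector of length K·d. It also applies a signed square root followed by ℓ2 normalization. The function names, the toy centroids, and the toy descriptors are assumptions made for this example; clustering with mini-batch k-means and PCA whitening are deliberately left out.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid(x: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to x in Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in centroids]
    return dists.index(min(dists))


def vlad(descriptors: Sequence[Vector], centroids: Sequence[Vector]) -> List[float]:
    """Sum residuals per nearest centroid and concatenate them (K * d values)."""
    d = len(centroids[0])
    residual_sums = [[0.0] * d for _ in centroids]
    for x in descriptors:
        k = nearest_centroid(x, centroids)
        for i in range(d):
            residual_sums[k][i] += x[i] - centroids[k][i]
    return [value for block in residual_sums for value in block]


def power_l2_normalize(v: Sequence[float], rho: float = 0.5) -> List[float]:
    """Signed power normalization followed by l2 normalization."""
    powered = [math.copysign(abs(a) ** rho, a) for a in v]
    norm = math.sqrt(sum(a * a for a in powered))
    return [a / norm for a in powered] if norm > 0 else powered


centroids = [(0.0, 0.0), (10.0, 10.0)]                # toy dictionary, K = 2, d = 2
descriptors = [(1.0, 2.0), (3.0, -2.0), (9.0, 12.0)]  # toy local descriptors
encoding = vlad(descriptors, centroids)
print(encoding)  # [4.0, 0.0, -1.0, 2.0]
print(power_l2_normalize(encoding))
```

The first two descriptors fall to the first centroid and the third to the second, so the encoding has one residual sum per centroid and a fixed length that is independent of the number of descriptors.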
Here, $\eta_k$ is the indicator function

$$\eta_k(x) := \begin{cases} 1 & \text{if } k = \operatorname*{arg\,min}_{j = 1, \ldots, K} \lVert x - \mu_j \rVert_2, \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

which indicates whether $x$ is the nearest neighbor of $\mu_k$. The final VLAD representation $\nu \in \mathbb{R}^{Kd}$ corresponds to the concatenation of all residual terms (1) into a $Kd$-dimensional vector:

$$\nu := (\nu_1^\top, \ldots, \nu_K^\top)^\top. \tag{3}$$

Several normalization steps are required to make the VLAD descriptor robust. Power normalization addresses issues when some local descriptors occur more frequently than others. Here, each element $v_i$ of the global descriptor $\nu$ is normalized as

$$v_i \leftarrow \operatorname{sign}(v_i)\,|v_i|^{\rho}, \quad i = 1, \ldots, Kd \tag{4}$$

where we chose $\rho = 0.5$ as a typical value from the literature. After power normalization, the vector is normalized such that its $\ell_2$-norm equals one.

Similarly, an over-counting of co-occurrences can occur if at least two descriptors appear together frequently. Jégou and Ondřej (2012) showed that Principal Component Analysis (PCA) whitening effectively eliminates such co-occurrences and additionally decorrelates the data.

To enhance the robustness of the codebook $D$ against potentially suboptimal solutions from the probabilistic k-means clustering, we compute five VLAD representations from different training subsets using different random seeds. Afterwards, the concatenation of the VLAD encodings $(\hat{\nu}_1^\top, \ldots, \hat{\nu}_m^\top)^\top \in \mathbb{R}^{mKd}$ is jointly decorrelated and whitened by means of PCA (Kessy et al., 2016). The transformed representation is again normalized such that its $\ell_2$-norm equals one and the result is eventually passed to the SVM classifier.

3.1.5. Support vector machine training

We trained SVMs both with a linear and a Radial Basis Function (RBF) kernel. For the linear kernel, we use LIBLINEAR (Fan et al., 2008), which is optimized for linear classification tasks and large datasets. For the non-linear RBF kernel, we use LIBSVM (Chang and Lin, 2011).

The SVM hyperparameters are determined by evaluating the average F1 score (van Rijsbergen, 1979) in an inner fivefold cross-validation on the training set using a grid search. For the linear SVM, we employ the $\ell_2$ penalty on a squared hinge loss. The penalty parameter $C$ is selected from a set of powers of ten, i.e., $C_{\text{linear}} \in \{10^{k} \mid k = -2, \ldots, 6\} \subset \mathbb{R}_{>0}$. For RBF SVMs, the penalty parameter $C$ is determined from a slightly smaller set $C_{\text{RBF}} \in \{10^{k} \mid k = 2, \ldots, 6\}$. The search space of the kernel coefficient $\gamma$ is constrained to $\gamma \in \{10^{-7}, 10^{-6}, S^{-1}\} \subset [0, 1]$, where $S$ denotes the number of training samples.

3.2. Regression using a deep convolutional neural network

We considered several strategies to train the CNN. Given the limited amount of data we had at our disposal, best results were achieved by means of transfer learning. We utilized the VGG-19 network architecture (Simonyan and Zisserman, 2015) originally trained on the IMAGENET dataset (Deng et al., 2009) using 1.28 million images and 1000 classes. We then refined the network using our dataset.

We replaced the two fully connected layers of VGG-19 by a Global Average Pooling (GAP) (Lin et al., 2013) and two fully connected layers with 4096 and 2048 neurons, respectively (cf., Fig. 5). The GAP layer is used to make the VGG-19 network input tensor (224 × 224 × 3) compatible to the resolution of our solar cell image samples (300 × 300 × 3), in order to avoid additional downsampling of the samples. The output layer consists of a single neuron that outputs the defect probability of a cell. The CNN is refined by minimizing the Mean Squared Error (MSE) loss function. Hereby, we essentially train a deep regression network, which allows us to predict (continuous) defect probabilities trained using only two defect likelihood categories (functional and defective). By rounding the predicted continuous probability to the nearest neighbor of the four original classes, we can directly compare CNN decisions against the original ground truth labels without binarizing them.

Data augmentation is used to generate additional, slightly perturbed training samples. The augmentation variability, however, is kept moderate, since the segmented cells vary only by few pixels along the translational axes, and few degrees along the axis of rotation. The training samples are scaled by at most 2% of the original resolution. The rotation range is capped to ± 3°. The translation is limited to ± 2% of the cell dimensions. We also use random flips along the vertical and horizontal axes. Since the busbars can be laid out both vertically and horizontally, we additionally include training samples rotated by exactly 90°. The rotated samples are augmented the same way as described above.

We fine-tune the pretrained IMAGENET model on our data to adapt the CNN to the new task, similar to Girshick et al. (2014). We, however, do
Fig. 5. Architecture of the modified VGG-19 network used for prediction of defect probability in 300 × 300 pixels EL images of solar cells. Boldface denotes layers that
deviate from VGG-19.
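The single output neuron of this network yields a continuous value, which can be mapped back to a discrete class by nearest-neighbor rounding. The sketch below assumes that the four classes correspond to the defect likelihoods 0, 1/3, 2/3, and 1; this mapping, the clamping to [0, 1], and the function names are assumptions for illustration rather than details confirmed by the text. A plain MSE helper is included to show the quantity that such a regression network minimizes.

```python
from typing import Sequence

# Assumed defect-likelihood classes (0%, 33%, 67%, 100%).
CLASSES = (0.0, 1 / 3, 2 / 3, 1.0)


def round_to_class(p: float, classes: Sequence[float] = CLASSES) -> float:
    """Clamp p to [0, 1] and return the closest class value."""
    p = min(1.0, max(0.0, p))
    return min(classes, key=lambda c: abs(c - p))


def mean_squared_error(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean of squared differences between targets and predictions."""
    return sum((t - y) ** 2 for t, y in zip(targets, predictions)) / len(targets)


predictions = [0.05, 0.41, 0.58, 1.20]  # hypothetical network outputs
print([round(round_to_class(p), 2) for p in predictions])  # [0.0, 0.33, 0.67, 1.0]
print(mean_squared_error([0.0, 1.0], [0.1, 0.7]))
```

Because a regression output is not guaranteed to stay inside [0, 1], the sketch clamps the value before rounding; whether the original implementation does the same is not stated.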
this in two stages. First, we train only the fully connected layers with randomly initialized weights while keeping the weights of the convolutional blocks fixed. Here, we employ the ADAM optimizer (Kingma and Ba, 2014) with a learning rate of $10^{-3}$, exponential decay rates $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and the regularization value $\epsilon = 10^{-8}$. In the second step, we refine the weights of all layers. At this stage, we use the Stochastic Gradient Descent (SGD) optimizer with a learning rate of $5 \cdot 10^{-4}$ and a momentum of 0.9. We observed that fine-tuning the CNN in several stages by subsequently increasing the number of hyperparameters slightly improves the generalization ability of the resulting model compared to a single refinement step.

In both stages, we process the augmented versions of the 1968 training samples in mini-batches of 16 samples on two NVIDIA GeForce GTX 1080, and run the training procedure for a maximum of 100 epochs. This totals 196,800 augmented variations of the original 1968 training samples that are used to refine the network. For the implementation of the deep regression network, we use KERAS version 2.0 (Chollet et al., 2015) with TENSORFLOW version 1.4 (Abadi et al., 2015) in the backend.

4. Evaluation

For the quantitative evaluation, we first evaluate different feature descriptors extracted densely over a grid. Then, we compare the best configurations against feature descriptors extracted at automatically detected keypoints to determine the best performing variation of the SVM classification pipeline. Finally, we compare the latter against the proposed deep CNN, and visualize the internal feature mapping of the CNN.

4.1. Dataset

We propose a public dataset¹ of solar cells extracted from high resolution EL images of monocrystalline and polycrystalline PV modules (Buerhop-Lutz et al., 2018). The dataset consists of 2624 solar cell images at a resolution of 300 × 300 pixels originally extracted from 44 different PV modules, where 18 modules are of monocrystalline type, and 26 are of polycrystalline type.

The images of PV modules used to extract the individual solar cell samples were taken in a manufacturing setting. Such controlled conditions enable a certain degree of control on quality of imaged panels and allow minimizing negative effects on image quality, such as

Table 2
Partitioning of the solar cells into functional and defective, with an additional self-assessment on the rater's confidence after visual inspection. Non-confident decisions obtain a weight lower than 100% for the evaluation of the classifier performance.

Condition    Confident?   Label p      Weight w
functional   yes          functional   100%
functional   no           defective    33%
defective    yes          defective    100%
defective    no           defective    67%

[…] the ground truth. Instead, the extracted cells were presented in random order to a recognized expert, who is familiar with intricate details of different defects in EL images. The criteria for such failures are summarized by Köntges et al. (2014). In their failure categorization, the expert focused specifically on defects with known power loss above 3% from the initial power output. The expert answered the questions (1) is the cell functional or defective? (2) are you confident in your assessment? The assessments into functional and defective cells by a confident rater were directly used as labels. Non-confident assessments of functional and defective cells were all labeled as defective. To reflect the rater's uncertainty, lower weights are assigned to these assessments, namely a weight of 33% to a non-confident assessment of a functional cell, and a weight of 67% to a non-confident assessment of a defective cell. Table 2 shows this in summary, with the rater assessment on the left, and the associated classification labels and weights on the right. Table 3 shows the distribution of ground truth solar cell labels, separated by the type of the source PV module.

We used 25% of the labeled cells (656 cells) for testing, and the remaining 75% (1968 cells) for training. Stratified sampling was used to randomly split the samples while retaining the distribution of samples within different classes in the training and the test sets. To further balance the training set, we weight the classes using the inverse proportion heuristic derived from King and Zeng (2001)

$$c_j := \frac{S}{2\,n_j}, \tag{5}$$

where $S$ is the total number of training samples, and $n_j$ is the number of functional ($j = 0$) or defective ($j = 1$) samples.
4.2. Dense sampling
overexposure. Controlled conditions are also required particularly be-
cause background irradiation can predominate EL irradiation. Given PV
In this experiment, we evaluate different grid sizes for subdividing a
modules emit the only light during acquisition performed in a dark
single 300 × 300 pixels cell image. The number of grid points per cell is
room, it can be ensured the images are uniformly illuminated. This is
varied between 5 × 5 to 75 × 75 points. At each grid point, SIFT, SURF,
opposed to image acquisition in general structural health monitoring,
and VGG descriptors are computed. The remaining two descriptors,
which introduces additional degrees of freedom where images can
PHOW and HOG, are omitted in this experiment, because they do not
suffer from shadows or spot lighting (Cha et al., 2017). An important
allow to arbitrarily specify the position for descriptor computation.
issue in EL imaging, however, can be considered blurry (i.e., out-of-
Note that at a 75 × 75 point grid, the distance between two grid points
focus) EL images due to incorrectly focused lens which can be at times
is only 4 pixels, which leads to a significant overlap between neigh-
challenging to attain. Therefore, we ensured to include such images in
bored descriptors. Therefore, further increase of the grid resolution
the proposed dataset (cf., Fig. 1 for an example).
cannot be expected to considerably improve the classification results.
The solar cells exhibit intrinsic and extrinsic defects commonly oc-
The goal of this experiment is to find the best performing combi-
curring in mono- and polycrystalline solar modules. In particular, the
nation of grid size and classifier. We trained both linear SVMs and SVMs
dataset includes microcracks and cells with electrically separated and
with the RBF kernel. For each classifier, we also examine two additional
degraded parts, short-circuited cells, open inter-connects, and soldering
options, namely whether the addition of the sample weights w (cf.
failures. These cell defects are widely known to negatively influence
Table 2) or masking out the background region (cf. Section 3.1.1) im-
efficiency, reliability, and durability of solar modules. Finger inter-
proves the classifiers.
ruptions are excluded since the power loss caused by such defects is
Performance is measured using the F1 score, which is the harmonic
typically negligible.
mean of precision and recall. Fig. 6 shows the F1 scores that are aver-
Measurements of power degradation were not available to provide
aged over the individual per-class F1 scores. From left to right, these
scores are shown for the SURF descriptor (Fig. 6(a)), SIFT descriptor
1
The solar cell dataset is available at https://github.com/zae-bayern/elpv- (Fig. 6(b)) and VGG descriptor (Fig. 6(c)). Here, the VGG descriptor
dataset. achieves the highest score on a grid of size 65 × 65 using a linear SVM
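The inverse proportion class weights of Eq. (5) can be illustrated with a minimal sketch. The per-class counts below are hypothetical placeholders chosen to sum to the 1968 training samples; the actual distribution is the one reported in Table 3. The function name is likewise an assumption made for illustration.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Return c_j = S / (2 * n_j) for binary labels (0 = functional, 1 = defective)."""
    counts = Counter(labels)   # n_j: number of samples per class
    total = len(labels)        # S: total number of training samples
    return {j: total / (2 * n_j) for j, n_j in counts.items()}


# Hypothetical class distribution (NOT the counts from Table 3).
labels = [0] * 1400 + [1] * 568
weights = inverse_proportion_weights(labels)

# With these weights, both classes contribute the same total weight S / 2.
contribution = {j: weights[j] * labels.count(j) for j in weights}
```

With this weighting, the minority class receives a weight above 1 and the majority class a weight below 1, so that each class contributes S/2 to the total weight of the training set.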
S. Deitsch, et al. Solar Energy 185 (2019) 455–468
Fig. 6. Classification performance for different dense sampling configurations in terms of F1 score grouped by the feature descriptor, classifier, weighting strategy,
and the use of masking. The highest F1 score is achieved using a linear SVM and the VGG feature descriptor at a grid resolution of 65 × 65 cells with sample weighting
and masking (c).
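The averaged F1 score used in Fig. 6 can be sketched as follows: the F1 score is computed once per class (treating that class as the positive one) and the per-class scores are then averaged. This is a minimal illustration that assumes unweighted samples; the function names and the toy labels are assumptions, not part of the original evaluation code.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def averaged_f1(y_true, y_pred, classes=(0, 1)):
    """Average of the per-class F1 scores (0 = functional, 1 = defective)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example with four functional and two defective cells.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = averaged_f1(y_true, y_pred)
```

Averaging over classes prevents the larger class from dominating the score, which matters when the classes are imbalanced.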
Fig. 7. Receiver Operating Characteristic (ROC) for top performing feature detector/extractor combinations grouped by mono-, polycrystalline, and both solar
module types combined. The dashed curve (—) represents the baseline in terms of a random classifier. Note the logarithmic scale of the false positive rate axis. Refer
to the text for details.
Fig. 8. ROC curves of the best performing KAZE/VGG feature detector/descriptor combination compared to the ROC of the deep regression network. In the monocrystalline case (a), the classification performance of the CNN is almost on par with the linear SVM, whereas for polycrystalline PV modules (b) the CNN considerably outperforms the SVM with the linear kernel trained on KAZE/VGG features. The latter outcome leads to a higher CNN ROC Area Under the Curve (AUC) for both PV module types combined (c). The dashed curve represents the baseline in terms of a random classifier.
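As a minimal sketch of how an AUC value such as those compared in Fig. 8 can be obtained from classifier scores, the following uses the rank interpretation of the ROC AUC: the probability that a randomly chosen defective cell receives a higher score than a randomly chosen functional cell, with ties counted as one half. The function name and the toy scores are assumptions for illustration and do not reproduce the evaluation code of this work.

```python
def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # defective cell ranked above functional cell
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = defective, 0 = functional.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

A random classifier yields an AUC of about 0.5 under this definition, which corresponds to the dashed baseline in the ROC plots.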
abnormalities on homogeneous surfaces almost as accurately as a CNN trained on image pixels directly.

For polycrystalline PV modules, the CNN is able to predict defective solar cells almost 11% more accurately than the SVM in terms of the AUC. This is also clearly a more difficult test due to the large variety of textures among the solar cells.

Overall, the CNN outperforms the SVM. However, the performances of both classifiers differ in total by only about 6%. The SVM classifier can therefore also be useful for a quick, on-the-spot assessment of a PV module in situations where specialized hardware for a CNN is not available.

4.5. Model performance per defect category

Here, we provide a detailed report of the performance of the proposed models with respect to individual categories of solar cells (i.e., defective and functional) in terms of confusion matrices. The two-dimensional confusion matrix stores the proportion of correctly identified cells (true negatives and true positives) in each category on its primary diagonal. The secondary diagonal provides the proportion of incorrectly identified solar cells (false negatives and false positives) with respect to the other category.

Fig. 9 shows the confusion matrices for the proposed models. The confusion matrices are given for each type of solar wafer, and their combination. The vertical axis of a confusion matrix specifies the expected (i.e., ground truth) labels, whereas the horizontal axis specifies the labels predicted by the corresponding model. Here, the predictions of the CNN were thresholded at 50% to produce the two categories of functional (0%) and defective (100%) solar cells.

In regard to monocrystalline PV modules, the confusion matrices in Fig. 9(a) and (d) underline that both models provide comparable classification results. The linear SVM, however, is able to identify more defective cells correctly than the CNN at the expense of functional cells being identified as defective (false negatives). To this end, the linear SVM also makes fewer errors at identifying defective solar cells as being intact (false positives).
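The thresholding and the construction of the confusion matrices described above can be sketched as follows. This is a minimal illustration, assuming that a predicted defect probability of exactly 50% counts as defective and that each row is normalized by the number of cells with that ground truth label; both choices, as well as the function name and toy values, are assumptions.

```python
def confusion_matrix(y_true, probabilities, threshold=0.5):
    """Row-normalized 2x2 confusion matrix.

    Rows are ground truth labels, columns are predicted labels
    (index 0 = functional, index 1 = defective).
    """
    counts = [[0, 0], [0, 0]]
    for truth, probability in zip(y_true, probabilities):
        # Assumption: a probability equal to the threshold counts as defective.
        predicted = 1 if probability >= threshold else 0
        counts[truth][predicted] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy example: three functional cells and two defective cells.
y_true = [0, 0, 0, 1, 1]
probabilities = [0.1, 0.6, 0.2, 0.9, 0.4]
matrix = confusion_matrix(y_true, probabilities)
```

In this layout, the primary diagonal holds the proportions of correctly identified cells and the secondary diagonal the proportions of misclassified cells per ground truth category.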
Fig. 9. Confusion matrices for the proposed classification models. Each row of a confusion matrix stores the relative frequency of instances in the expected defect likelihood categories. The columns, on the other hand, contain the relative frequency of instances of predictions made by the classification models. Ideally, only the diagonals of the confusion matrices would contain non-zero entries, which corresponds to perfect agreement in all categories between the ground truth and the classification model. The CNN generally makes fewer prediction errors than an SVM trained on KAZE/VGG features.
In the polycrystalline case given by Fig. 9(b) and (e), the CNN clearly outperforms the linear SVM in every category. This also leads to overall better performance of the CNN in both cases, as evidenced in Fig. 9(c) and (f).

4.6. Impact of training dataset size on model performance

For training both the linear SVM and the CNN, a relatively small dataset of unique solar cell images was used. Given that typical PV module production lines have an output of 1500 modules per day containing around 90,000 solar cells, models can be expected to greatly benefit from additional training data. In order to examine how the proposed models improve if more training samples are used, we evaluate their performance on subsets of the original training samples since no additional training samples are available.

To infer the performance trend, we evaluate the models on three differently sized subsets of the original training samples. We used 25%, 50% and 75% of the original training samples. To avoid a bias in the obtained metrics, we not only sample the subsets randomly but also sample each subset 50 times to obtain the samples used to train the models. We additionally use stratified sampling to retain the distribution of labels from the original set of training samples. To evaluate the
Fig. 10. Performance of the proposed models trained on subsets of original training samples. The results are grouped by the solar wafer type (left two columns) and
the combination of both wafer types (last column). The first three plots in the top row show the distribution of evaluated metrics as boxplots for the linear SVM
trained using KAZE/VGG features. The bottom row shows the results for the CNN. The horizontal lines specify the reference scores with respect to the F1 measure, ROC AUC, and the accuracy of the proposed models trained on 100% of training samples. The circles denote outliers in the distribution of evaluated metrics given by each boxplot. Increasing the number of training samples directly improves the performance of both models. The improvement is approximately logarithmic with respect to the number of training samples.
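The repeated stratified subsampling behind the boxplots in Fig. 10 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed random seed, and the toy label distribution are hypothetical, and training and scoring the models on each subset is left out.

```python
import random


def stratified_subset(labels, fraction, rng):
    """Draw a random subset of sample indices that keeps the label distribution."""
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)
    subset = []
    for indices in indices_by_class.values():
        subset.extend(rng.sample(indices, round(fraction * len(indices))))
    return sorted(subset)


def repeated_subsets(labels, fractions=(0.25, 0.5, 0.75), repetitions=50, seed=0):
    """For each fraction, draw `repetitions` independent stratified subsets."""
    rng = random.Random(seed)
    return {
        fraction: [stratified_subset(labels, fraction, rng) for _ in range(repetitions)]
        for fraction in fractions
    }


# Hypothetical toy labels (0 = functional, 1 = defective).
labels = [0] * 80 + [1] * 40
subsets = repeated_subsets(labels)
```

Each of the 50 subsets per fraction would then be used to train a model, and the spread of the resulting metrics gives the distribution summarized by one boxplot.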
Fig. 12. Qualitative results of predictions made by the proposed CNN with correctly classified solar cell images (a) and misclassifications (b). Each column is labeled
using the ground truth label. Red shaded probabilities above each solar cell image correspond to predictions made by the CNN. The upper two rows correspond to
monocrystalline solar cells and bottom two rows to polycrystalline solar cell images.
Fig. 13. Qualitative defect classification results in a PV module previously not seen by the deep regression network. The red shaded circles in the top right corner of
each solar cell specify the ground truth labels. The solar cells are additionally overlaid by CAMs determined using Grad-CAM++ (Chattopadhay et al., 2018). The
CAM for individual solar cells was additionally weighted by network’s predictions to reduce the clutter. Notably, the network pays attention to very specific defects
(such as fine cell cracks) that are harder to identify than cell cracks which are more obvious.
cells are overlaid by CAMs and additionally weighted by the network's predictions to reduce the amount of visual clutter. By inspecting the CAMs it can be observed that the CNN focuses on particularly unique defects within solar cells that are harder to identify than more obvious defects such as degraded or electrically insulated cell parts (appearing as dark regions) in the same cell.

4.9. Runtime evaluation

Here, we evaluate the time taken by each step of the SVM pipeline and by the CNN, both during training and testing. The runtime is evaluated on a system running an Intel i7-3770K CPU clocked at 3.50 GHz with 32 GB of RAM. The results are summarized in Fig. 14.

Unsurprisingly, training takes most of the time for both models. While training the SVM takes around 30 min in total, refining the CNN is almost ten times slower and takes around 5 h. However, inference using the CNN is much faster than that of the SVM pipeline and takes just under 20 s, compared to 8 min for the SVM. It is, however, important to note that the SVM pipeline inference duration is reported for the execution on the CPU, whereas the duration of the much faster CNN inference is obtained on the GPU only. Additionally, only a part of the SVM pipeline performs the processing in parallel. When running the highly parallel CNN inference on the CPU, the test time increases considerably to over 12 min. Consequently, training the CNN on the CPU becomes intractable and we therefore refrained from measuring the corresponding runtime.

Considering the relative contributions of individual SVM pipeline steps, feature extraction is most time-consuming, followed by encoding of local features and clustering (cf., Fig. 15). Preprocessing of features and hyperparameter optimization require the least time.

In applications that not only require a low resource footprint but also must run fast, the total execution time of the SVM pipeline can be reduced by replacing the VGG feature descriptor either by SIFT or PHOW. Both feature descriptors substantially reduce the time taken for feature extraction during inference from originally 8 min to around 23 s and 12 s, respectively, while maintaining a classification performance similar to the VGG descriptor.

Fig. 15. Relative runtime contributions to training and test phases of the SVM pipeline. The most time-demanding step during SVM training and inference is the detection and extraction of KAZE/VGG features.

4.10. Discussion

Several conclusions can be drawn from the evaluation results. First, masking can be useful if the spatial distribution of keypoints is rather sparse. However, in most cases masking does not improve the classification accuracy. Secondly, weighting samples proportionally to the confidence of the defect likelihood in a cell does improve the generalization ability of the learned classifiers.

A linear SVM trained on KAZE/VGG features is the best performing SVM pipeline variant with an accuracy of 82.44% and an F1 score of 82.52%. The CNN is even more accurate. It distinguishes functional and defective solar cells with an accuracy of 88.42%. The corresponding F1 score is 88.39%. The 2-dimensional visualization of the CNN feature distribution via t-SNE underlines that the network learns the actual structure of the task at hand.

A limitation of the proposed method is that each solar cell is examined independently. In particular, some types of surface abnormalities that do not affect the module efficiency can appear in repetitive patterns across cells. Accurate classification of such larger-scale effects requires taking context into consideration, which is subject to future work.

Instead of predicting the defect likelihood, one may want to predict specific defect types. Given additional training data with appropriate labels, the methodology presented in this work can be applied without major changes (e.g., by fine-tuning to the new defect categories). Fine-tuning the network to multiple defect categories with the goal of predicting defect types instead of their probabilities, however, will generally affect the choice of the loss function and consequently the number of neurons in the last activation layer. A common choice for the loss function for such tasks is the (categorical) cross entropy loss with softmax activation (Goodfellow et al., 2016).

5. Conclusions

We presented a general framework for training an SVM and a CNN that can be employed for identifying defective solar cells in high resolution EL images. The processing pipeline for the SVM classifier is carefully designed. In a series of experiments, the best performing pipeline is determined as KAZE/VGG features in a linear SVM trained on samples that take the confidence of the labeler into consideration. The CNN is a fine-tuned regression network based on VGG-19, trained on augmented cell images that also consider the labeler confidence.

On monocrystalline solar modules, both classifiers perform similarly well, with only a slight advantage on average for the CNN. However, the CNN classifier outperforms the SVM classifier by about 6% accuracy on the more inhomogeneous polycrystalline cells. This also leads to the better average accuracy across all cells of 88.42% for the CNN versus 82.44% for the SVM. The high accuracies make both classifiers useful for visual inspection. If the application scenario permits the usage of GPUs and higher processing times, the computationally more expensive CNN is preferred. Otherwise, the SVM classifier is a viable alternative for applications that require a low resource footprint.

Acknowledgments

This work was funded by Energy Campus Nuremberg (EnCN) and partially supported by the Research Training Group 1773 “Heterogeneous Image Systems” funded by the German Research Foundation (DFG).

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. URL: https://www.tensorflow.org.
Alcantarilla, P.F., Bartoli, A., Davison, A.J., 2012. KAZE features. In: European Conference on Computer Vision (ECCV). Lect. Notes Comput. Sci., vol. 7577. pp. 214–227. https://doi.org/10.1007/978-3-642-33783-3_16.
Anwar, S.A., Abdullah, M.Z., 2014. Micro-crack detection of multicrystalline solar cells featuring an improved anisotropic diffusion filter and image segmentation technique. EURASIP J. Image Video Process. 2014, 15. https://doi.org/10.1186/1687-5281-2014-15.
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L., 2008. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 346–359. https://doi.org/10.1016/j.cviu.