
Draft version October 25, 2021

Typeset using LaTeX twocolumn style in AASTeX61

A NEW TOOL FOR CME ARRIVAL TIME PREDICTION USING MACHINE LEARNING ALGORITHMS: CAT-PUMA

Jiajia Liu,1 Yudong Ye,2,3 Chenglong Shen,4,5 Yuming Wang,4,5 and Robert Erdélyi1,6
arXiv:1802.02803v1 [astro-ph.SR] 8 Feb 2018

1 Solar Physics and Space Plasma Research Center (SP2RC), School of Mathematics and Statistics, The University of Sheffield, Sheffield S3 7RH, UK
2 SIGMA Weather Group, State Key Laboratory of Space Weather, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China
5 Synergetic Innovation Center of Quantum Information & Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
6 Department of Astronomy, Eötvös Loránd University, Budapest, Pázmány P. sétány 1/A, H-1117, Hungary

ABSTRACT
Coronal Mass Ejections (CMEs) are arguably the most violent eruptions in the Solar System. CMEs can cause severe disturbances in interplanetary space and can even affect human activities in many respects, causing damage to infrastructure and loss of revenue. Fast and accurate prediction of CME arrival time is therefore vital to minimize the disruption that CMEs may cause when interacting with geospace. In this paper, we propose a new approach for partial-/full-halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). Via detailed analysis of CME features and solar wind parameters, we build a prediction engine that takes advantage of 182 previously observed geo-effective partial-/full-halo CMEs and uses the Support Vector Machine (SVM) algorithm. We demonstrate that CAT-PUMA is both accurate and fast. In particular, predictions made after applying CAT-PUMA to a test set that is unknown to the engine show a mean absolute prediction error of ∼5.9 hours in the CME arrival time, with 54% of the predictions having absolute errors less than 5.9 hours. Comparison with other models reveals that CAT-PUMA gives a more accurate prediction for 77% of the events investigated, and can be carried out very fast, i.e. within minutes of providing the necessary input parameters of a CME. A practical guide containing the CAT-PUMA engine and the source code of two examples is available in the Appendix, allowing the community to perform their own predictions using CAT-PUMA.

Keywords: Sun: coronal mass ejections (CMEs) — Sun: solar-terrestrial relations

[email protected]

1. INTRODUCTION

Coronal Mass Ejections (CMEs) are one of the two major eruptive phenomena (the other being flares) occurring within the solar atmosphere with an effect on the heliosphere. CMEs leave the Sun at average speeds around 500 km s−1, carry a large amount of magnetized plasma with an average mass of 1015 g into interplanetary space, and carry a huge amount of kinetic energy, often of the order of 1030 erg (for reviews, see e.g., Low 2001; Chen 2011; Webb & Howard 2012; Gopalswamy 2016, and references therein). The following observational facts highlight some of the most important reasons why enormous attention has been paid to CMEs in the several decades since their first discovery (Hansen et al. 1971; Tousey 1973): 1) CMEs are usually accompanied by other dynamic, large-scale phenomena including, e.g., filament eruptions (e.g., Jing et al. 2004; Wang et al. 2006; Liu et al. 2010a), flares (e.g., Harrison 1995; Qiu et al. 2004; Zhang et al. 2012), magneto-hydrodynamic (MHD) waves (e.g., Chen et al. 2005; Biesecker et al. 2002; Liu et al. 2010b), radio bursts (e.g., Jackson et al. 1978; Lantos et al. 1981; Shen et al. 2013a; Chen et al. 2014) and solar jets (e.g., Shen et al. 2012; Liu et al. 2015; Zheng et al. 2016). Combined studies of CMEs and their accompanying phenomena could improve our understanding of the physical processes taking place in various regimes of the Sun. 2) MHD shocks caused by CMEs could be employed to gain insight into the characteristic properties of the plasma state in interplanetary space (for reviews, see e.g., Vršnak & Cliver 2008). 3) CMEs occur at a range of rates during both solar minimum and maximum (e.g., Gopalswamy et al. 2003; Robbrecht et al. 2009), the study of which may help us explore the solar cycle and dynamo. 4) Shocks and the often large amount of magnetic flux carried by CMEs can cause severe disturbances in the Earth's magnetosphere (e.g., Wang et al. 2003; Wang et al. 2007; Zhang et al. 2007; Sharma et al. 2013; Chi et al. 2016), and can further affect the operation of high-tech facilities such as spacecraft, disrupt the functioning of modern communication systems (including radio, TV and mobile signals) and navigation systems, and affect the working of pipelines and high-voltage power grids.

Besides intensive efforts made towards a better understanding of how CMEs are triggered (e.g., Gibson & Low 1998; Antiochos et al. 1999; Lin & Forbes 2000; Forbes 2000), many studies have focused on predicting the arrival (or transit) times of CMEs at the Earth, considering their potential to strongly affect the Earth's magnetosphere and outer atmosphere. This has become one of the most important components of so-called space weather forecasting efforts. However, besides the lack of in-situ observations of the ambient solar wind and CME plasma in the inner heliosphere at the time of a CME's eruption, there are several further effects that make it more complex and rather challenging to predict CME arrival times, including, e.g., the fact that CMEs may experience significant deflection while traveling in interplanetary space (e.g. Wang et al. 2004; Gui et al. 2011; Isavnin et al. 2014; Kay et al. 2015; Zhuang et al. 2017) and that CMEs may interact with each other, causing merging or acceleration/deceleration (e.g., Wang et al. 2002a; Shen et al. 2012, 2013c; Mishra et al. 2016; Lugaz et al. 2017).

Current models for the prediction of CME arrival time may be classified into three types: empirical, drag-based and physics-based (MHD) models (for a review, see e.g., Zhao & Dryer 2014). Most empirical models use a set of observed CMEs to fit a simple relation (linear or parabolic) between observed CME speeds (and/or accelerations) and their transit times in interplanetary space (e.g., Vandas et al. 1996; Wang et al. 2002b; Manoharan 2006; Schwenn et al. 2005; Xie et al. 2004). Vršnak & Žic (2007) took the ambient solar wind speed into account in their empirical model, but still utilized linear least-square fitting. The drag-based models (DBMs) have an advantage over the empirical models in that DBMs take into account the speed difference between CMEs and their ambient solar wind, which may cause considerable acceleration or deceleration of CMEs (e.g., Vršnak 2001; Subramanian et al. 2012). On the other hand, DBMs are based on a hydrodynamic (HD) approach and ignore the potentially important role of the magnetic field in the interaction between CMEs and the solar wind. Finally, physics-based (MHD) models (e.g., Smith & Dryer 1990; Dryer et al. 2001; Moon et al. 2002; Tóth et al. 2005; Detman et al. 2006; Feng & Zhao 2006; Feng et al. 2007; Riley et al. 2012, 2013) mostly utilize (M)HD simulations, employing observations as boundary/initial conditions, to predict the transit times of CMEs. Despite the sophistication and smaller prediction errors of physics-based (MHD) models, they have a few drawbacks, e.g., they are still highly idealized and may require extensive computational resources in terms of hardware and CPU time (e.g., Tóth et al. 2005). Complex or not, previous predictions give, on average, around 10-hour mean absolute errors in CME arrival times (see the review by Zhao & Dryer 2014). Employing 3D observations from the STEREO spacecraft, Mays et al. (2013) reduced the mean absolute error to ∼8.2 hours when predicting the arrival time of 15 CMEs. Again using STEREO observations, but allowing only very short lead times (∼1 day), Möstl et al. (2014) further improved the arrival-time performance to ∼6.1 hours after applying empirical corrections to their models. A fast and accurate prediction with large lead time, using only one spacecraft, is therefore still much needed.
In this paper, we propose a new approach to modeling partial-/full-halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). We divide 182 geo-effective CMEs observed in the past two decades, i.e. from 1996 to 2015, into two sets: one for training and one for testing purposes. All inputs are observables only. Without a priori assumptions or an underlying physical theory, our method gives a mean absolute prediction error of as little as around 6 hours. Details on the data mining are given in Sec. 2. An overview of the employed machine learning algorithms and the implemented training process is given in Sec. 3. Results and a comparison with previous prediction models are discussed in Sec. 4. We summarize in Sec. 5. A practical guide on how to perform predictions with CAT-PUMA is presented in Appendix A.

2. DATA MINING

To build a suitable set of inputs for the machine learning algorithms, the first step of our data mining is to construct a list of CMEs that have eventually arrived at Earth and have also caused disturbances to the terrestrial magnetic field. Such CMEs are usually called geo-effective CMEs. We defined four different Python crawlers to automatically gather the onset time, which is usually defined as the first appearance in the Field-of-View (FOV) of SOHO LASCO C2 (Brueckner et al. 1995), and the arrival time of CMEs, which hereafter represents the arrival time of the interplanetary shocks driven by CMEs, from the following four lists:

1. The Richardson and Cane List (Richardson & Cane 2010). The list is available at http://www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm and contains various parameters, including the average speed, magnetic field and associated DST index of more than 500 Interplanetary CMEs (ICMEs) from 1996 to 2006, and the onset time of their associated CMEs if observed. We discard events with no or ambiguously associated CMEs, and obtain the onset and arrival times of 186 geo-effective CMEs from this list.

2. The List of Full Halo CMEs provided by the Research Group on Solar-TErrestrial Physics (STEP) at the University of Science and Technology of China (USTC) (Shen et al. 2013b). A full halo CME is defined as one whose angular width observed by SOHO LASCO is 360◦. This list is available at http://space.ustc.edu.cn/dreams/fhcmes/index.php and provides the 3D direction, angular width, and real and projected velocities of 49 CMEs from 2009 to 2012, and the arrival time of their associated shocks if observed. Events without observations of the associated interplanetary shocks are removed. The onset and arrival times of 24 geo-effective CMEs are obtained from this list.

3. The George Mason University (GMU) CME/ICME List (Hess & Zhang 2017). This list contains information similar to that of the Richardson and Cane list for 73 geo-effective CMEs and corresponding ICMEs from 2007 to 2017. It is available at http://solar.gmu.edu/heliophysics/index.php/GMU_CME/ICME_List. We only select ICME events satisfying the following criteria: i) there are associated shocks and ii) multiple CMEs are not involved. After implementing these selection criteria, 38 events are obtained from this list.

4. The CME Scoreboard developed at the Community Coordinated Modeling Center (CCMC), NASA. It is a website allowing the community to submit and view the actual and predicted arrival times of CMEs from 2013 to the present (https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/). For our analysis, we remove events that did not interact with the Earth and events that have a "note". An event was labeled with a "note" because, e.g., the target CME did not arrive at Earth, there was some uncertainty in measuring the shock arrival time, or there were multiple CME events. We obtained 134 CME events from this list.

Combining all four lists, we eventually obtain 382 geo-effective CME events via data mining. However, there are overlaps between these lists. To remove duplicates, we remove one of a pair of CMEs if their onset times differ by less than 1 hour; 90 events are therefore removed. The SOHO LASCO CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/) provides a database of all CMEs observed by SOHO LASCO from 1996 to 2016 (Gopalswamy et al. 2009). Via matching the onset times of CMEs in our list with the onset times of CMEs recorded in the SOHO LASCO CME Catalog, we obtain various parameters of them, including the angular width, average speed, acceleration, final speed in the FOV of LASCO, estimated mass and main position angle (MPA, corresponding to the position angle of the fastest moving part of the CME's leading edge). The location of the source region of full halo CMEs can be obtained from the SOHO/LASCO Halo CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/halo/halo.html). CMEs that have no source-region information in the above catalog are investigated manually, one by one, to determine their source region locations. Further, events from our compiled list are removed if they have: i) an angular width less than 90◦; ii) no available mass estimation; or iii) an ambiguous source region location. Finally, two CMEs, at 2003-10-29 20:54 UT and 2011-10-27 12:12 UT, are also removed, because the first has incorrect velocity and acceleration estimations and the second erupted together with more than a dozen CMEs during that day.

Eventually, after applying all the above selection criteria, we obtain a list of 182 geo-effective CMEs from 1996 to 2015, of which 56 are partial-halo CMEs and 126 are halo CMEs. The average speed of these CMEs ranges from 400 km s−1 to 1500 km s−1 in the LASCO FOV.
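The duplicate-removal criterion described above (dropping one of a pair of CMEs whose onset times differ by less than 1 hour) is simple to express in code. The following is a minimal sketch, assuming the merged catalog has already been loaded into a pandas DataFrame with an onset column; the file name and column name are hypothetical and are not part of the released CAT-PUMA code.

import pandas as pd

# Hypothetical merged catalog of geo-effective CMEs gathered from the four lists;
# each row must provide at least the CME onset time.
events = pd.read_csv("combined_cme_list.csv", parse_dates=["onset"])

# Sort by onset time and keep an event only if it starts at least 1 hour
# after the last event that was kept.
kept_rows = []
last_kept_onset = None
for _, row in events.sort_values("onset").iterrows():
    if last_kept_onset is None or row["onset"] - last_kept_onset >= pd.Timedelta(hours=1):
        kept_rows.append(row)
        last_kept_onset = row["onset"]

deduplicated = pd.DataFrame(kept_rows).reset_index(drop=True)
print(len(events) - len(deduplicated), "duplicate events removed")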
3. OPTIMIZATION

One of the most popular machine learning algorithms is the Support Vector Machine (SVM). It is a set of supervised learning methods for classification, regression and outlier detection. The original SVMs were linear (see the review by Smola & Schölkopf 2004), though SVMs are also suitable for conducting nonlinear analysis via mapping input parameters into higher-dimensional spaces with different kernel functions. An implementation of the SVM has been integrated into the Python scikit-learn library (Pedregosa et al. 2011), with open-source access and well-established documentation (http://scikit-learn.org/stable/). According to the scikit-learn documentation, major advantages of the SVM are that it is: 1) effective in high-dimensional spaces, 2) still effective even if the number of dimensions is greater than the number of samples, and 3) memory efficient. Besides, it is particularly well suited to small- or medium-sized datasets (Géron 2017).

Recent works utilizing machine learning algorithms have mainly focused on solar flare prediction, CME productivity and solar feature identification using classification methods (e.g., Li et al. 2007; Qahwaji & Colak 2007; Ahmed et al. 2013; Bobra & Couvidat 2015; Bobra & Ilonidis 2016; Nishizuka et al. 2017) or multi-labeling algorithms (e.g. Yang et al. 2017). However, to the best of our knowledge, the SVM regression algorithm, which is suitable for a wide range of solar/space physics research such as solar cycle prediction, DST index prediction and active region occurrence prediction, has not yet been widely used by the solar/space physics community. Further, no previous study has attempted to employ the SVM regression algorithm in the context of predicting CME arrival times.

3.1. Brief Re-cap of SVM Regression

To make it simple and clear, we first briefly explain the SVM regression algorithm by demonstrating its capabilities on a simple two-dimensional, linear and hard-margin problem. Let us suppose there is an input set x = (x1, x2, x3...xl) and a corresponding set of known results y = (y1, y2, y3...yl), where l is the number of data points. The basic idea of SVM regression is to find a function,

f(x) = ωx + b,    (1)

where f(x) has at most ǫ (> 0) deviation from the actual result yi for all xi (as shown in Fig. 1). Points at the margins (green dots with black edges) are then called the "support vectors". A new observation xl+1 can therefore be fed into Eq. (1) to yield a prediction for its unknown result yl+1.

Figure 1. An example of SVM regression for a simple two-dimensional, linear and hard-margin problem. Adopted from Fig. 5-10 in Géron (2017).

The solution of the above two-dimensional, linear and hard-margin problem can be extended to multi-dimensional, linear and soft-margin problems. In this case, the target of the SVM regression is to:

  minimize    (1/2) ||ω||² + C Σ_{i=1}^{l} (ξi + ξi*),
  subject to  yi − ⟨ω, xi⟩ − b ≤ ǫ + ξi,
              ⟨ω, xi⟩ + b − yi ≤ ǫ + ξi*,
              ξi, ξi* ≥ 0,  i = 1, 2, 3...l,    (2)

where xi = (xi^1, xi^2 ... xi^n) is an n-dimensional vector with n the number of features, i ∈ [1, l], ||ω|| is the norm of ω, ⟨ω, xi⟩ is the dot product between ω and xi, and ξi, ξi* are slack variables introduced to make the constraints feasible for the soft margins (Vapnik 2013; Smola & Schölkopf 2004). The regularization factor C > 0 is introduced to trade off the amount up to which deviations larger than ǫ are tolerated. A larger value of C indicates a lower tolerance of errors.

To extend the solution to non-linear problems, we map the original non-linear n-dimensional input x into a higher-dimensional space φ(x), in which the problem might be linear. φ(x) then replaces x in Eq. 2. The most common way to map x into φ(x) is using kernel functions. One of the most frequently used kernels is the Radial Basis Function (RBF) kernel:

K(xi, xj) = exp(−γ ||xi − xj||²),    (3)

where ||xi − xj||² is the squared Euclidean distance between the two data points. Here, γ > 0 defines the area a single point can influence; a larger γ indicates less influence of a point on its neighbors. The description of the SVM regression algorithm above is highly abbreviated. More details can be found in e.g. Smola & Schölkopf (2004) and Vapnik (2013).
Figure 2. Normalized F-scores of all 18 CME and solar wind features with m = 6 hours. The vertical dashed line indicates a normalized F-score of 0.01.
Besides C and γ, another important variable, m, will be introduced in the rest of this section. The definition of m is given at the beginning of Sec. 3.2. The process determining the value of m employed in building the prediction engine is detailed in Sec. 3.3. Optimization of the selection of the parameters C and γ is presented in Sec. 3.4.

3.2. Feature Selection

Employing the SVM regression algorithm to make predictions of CME arrival time, we take the 182 vectors, each of which contains the n parameters of a CME and the corresponding solar wind plasma, as x, and their actual transit times as y. Because it is currently not feasible to determine the actual background solar wind plasma in which a CME is immersed, we use in-situ solar wind parameters detected at Earth, averaged from the onset of the CME to m hours later, to approximate the actual solar wind parameters at the CME location. In-situ solar wind observations at the Earth, including solar wind Bx, By, Bz, plasma density, alpha to proton ratio, flow latitude (North/South direction), flow longitude (East/West direction), plasma beta, pressure, speed and proton temperature, are downloaded from OMNIWeb Plus (https://omniweb.gsfc.nasa.gov/). Together with suitable CME parameters, including the CME average speed, acceleration, angular width, final speed, mass, MPA, source region latitude and source region longitude described in Sec. 2, we have in total 19 (n = 19) features in the input x space.

However, some of the above features might be important in determining the CME transit time, while others might be irrelevant and unnecessary. Firstly, the CME acceleration is removed from the feature space because it is not independent, being basically determined by the CME average and final speeds. To determine the importance of the remaining features, following Bobra & Ilonidis (2016) but for regression in this case, we use a univariate feature selection tool (sklearn.feature_selection.SelectKBest) implemented in the Python scikit-learn library to test the F-score of every individual feature. For feature k ∈ [1, n], xk is a vector of length l. The correlation between xk and y, and the F-score of feature k, are then defined as:

Corr = (xk − x̄k) · (y − ȳ) / (σxk σy),
F = Corr² / (1 − Corr²) × (l − 2),    (4)

where l is the number of data points as defined in Sec. 3.1, and σxk and σy are the standard deviations of xk and y, respectively. A higher F-score indicates a higher linear correlation between the kth feature and the CME transit time y.
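The univariate F-score of Eq. (4) is what scikit-learn's f_regression scorer computes, and SelectKBest uses it to rank features for regression problems. The snippet below is a minimal sketch of this ranking step, with placeholder arrays standing in for the real 182-event dataset.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Placeholder design matrix: l events x n features. Random numbers stand in
# for the CME and solar wind parameters of the 182 geo-effective events.
rng = np.random.RandomState(1)
X = rng.rand(182, 18)          # 18 candidate features after dropping CME acceleration
y = 100.0 * rng.rand(182)      # actual transit times in hours (placeholder)

# Score every feature individually with the F-statistic of Eq. (4)
# and keep the 12 highest-scoring features.
selector = SelectKBest(score_func=f_regression, k=12)
X_selected = selector.fit_transform(X, y)

normalized_scores = selector.scores_ / selector.scores_.max()
print(np.argsort(normalized_scores)[::-1])   # feature indices ranked from best to worst
print(X_selected.shape)                      # (182, 12)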
Table 1 lists the rankings of all 18 features (excluding CME acceleration) with m from 1 to mmax hours. Again, m represents the number of hours after the onset of the CME. mmax, the upper limit of m, is set to 12 hours considering the prediction purpose of CAT-PUMA, because an extremely fast CME (with a speed over 3000 km s−1) could reach the Earth within around 13 hours (Gopalswamy et al. 2010). Features with higher F-scores have lower ranking numbers in the table. It turns out that the rankings of all features stay relatively stable; their changes are minor with increasing m, especially for the first 12 features in the table. Figure 2 depicts the normalized F-scores of all features when m = 6 hours, with the largest F-score normalized to 1.

Not surprisingly, the average and final CME speeds have the highest F-scores, suggesting their importance in determining the CME transit time. The CME angular width and mass rank 3rd and 4th, respectively, which might be because the angular width contains information about the CME propagation direction, and because the angular width and mass together imply the CME's plasma density, which could play an important role in the interaction between the CME and the ambient solar wind. Solar wind features including the magnetic field Bz and Bx (strength and poloidal direction of the solar wind magnetic field), proton temperature, plasma pressure, plasma speed and flow longitude (toroidal direction of the solar wind plasma flow) also play important roles, with relatively high normalized F-scores. The alpha particle to proton number density ratio in the solar wind also ranks high among all the features, which might be because this ratio is usually high in CMEs and Co-rotating Interaction Regions (CIRs) (e.g., Prise et al. 2015); CMEs/CIRs in front of a CME could potentially influence its transit time. However, this needs to be further examined via analyzing the in-situ observations preceding all the CMEs. Finally, we select the 12 features with normalized F-scores over 0.01, from high to low, as the input of the SVM. The CME MPA is also included because it has a normalized F-score of 0.008, very close to 0.01.

3.3. Determining Solar Wind Parameters

In the previous sub-section, we have shown the result of feature selection using solar wind parameters averaged between the onset time of a CME and m hours later, where m ranges from 1 to 12. To determine the most favorable value of m in building the prediction engine, 1) we find the optimal C and γ for the dataset, followed by 2) training the SVM 100000 times, and 3) re-calculating the optimal C and γ for the best training result. Finally, we repeat the above 3 steps for m ranging from 1 to 12 hours. Details on the first 3 steps will be given in Sec. 3.4. To evaluate how good the models using solar wind parameters with different values of m are, we use the R2 score defined as:

R² = 1 − [ Σ_{i=1}^{l} (yi − f(xi))² ] / [ Σ_{i=1}^{l} (yi − ȳ)² ],    (5)

where yi, f(xi) and l are the same as defined in Sec. 3.1, and ȳ is the average value of y. The variation of the maximum and average R2 scores with increasing m is shown in Figure 3. The average R2 score peaks at m = 6 hours, indicating that the best fitting result is obtained with 6-hour averaged solar wind parameters after CME onset. The maximum R2 score varies "periodically" within the range 0.7 to 0.85 without an overall peak. This "periodicity" might be caused by the combined effect of 1) 100000 being only a fraction of all C_{182}^{37} (∼6 × 10^38) possibilities (for further details see Sec. 3.4), so that the best R2 score out of all possibilities cannot always be found during every training, and 2) the imperfect stochastic process of the computer in shuffling the dataset (see Paragraph 2, Sec. 3.4). Even though the exact causes of the above "periodicity" need further investigation, the variation of the average R2 scores suggests that 100000 trainings are enough to reflect the overall distribution of the R2 scores.

Figure 3. Variation of the average (blue curve) and maximum (green curve) R2 scores during 100000 trainings, for different values of m used to calculate the average solar wind parameters after CME onset.

To summarize the above, we find that using 6-hour averaged solar wind parameters after the CME onset results in the best output.

Table 1. Ranking of all 18 features with m from 1 to 12 hours

Feature m (hours)
1 2 3 4 5 6 7 8 9 10 11 12
CME Average Speed 1 1 1 1 1 1 1 1 1 1 1 1
CME Final Speed 2 2 2 2 2 2 2 2 2 2 2 2
CME Angular Width 3 3 3 3 3 3 3 3 3 3 3 3
CME Mass 4 4 4 4 4 4 4 5 5 5 4 4
Solar Wind Bz 5 5 5 5 5 5 5 4 4 4 5 6
Solar Wind Temperature 7 7 7 7 7 6 6 6 6 6 6 5
Solar Wind Speed 6 6 6 6 6 7 7 7 7 7 7 7
Solar Wind Pressure 8 8 8 8 8 8 8 8 8 8 8 8
Solar Wind Longitude 11 9 9 9 9 9 9 9 9 9 9 9
CME Acceleration 10 10 10 10 10 10 10 10 10 10 10 10
Solar Wind He Proton Ratio 12 12 11 11 11 11 11 11 11 11 11 11
Solar Wind Bx 9 11 12 12 12 12 12 12 12 13 15 15
CME Position Angle 13 13 13 13 13 13 13 13 15 14 13 12
Solar Wind Density 16 17 15 14 14 14 14 14 14 15 14 13
Solar Wind Plasma Beta 19 18 17 15 18 15 15 15 13 12 12 14
Solar Wind Latitude 18 19 19 19 16 16 18 18 17 17 17 16
CME Source Region Longitude 15 15 14 16 15 17 16 16 16 16 16 17
CME Source Region Latitude 17 16 16 18 17 18 17 17 18 18 18 18
Solar Wind By 14 14 18 17 19 19 19 19 19 19 19 19
Note. The column in bold denotes the ranking of all features at m =6 hours, which is the most favorable value in building the prediction engine
(Sec. 3.3).

3.4. Training the SVM

One major concern of SVM regression is the choice of the parameters C and γ. In Sec. 3.1, it was demonstrated that the regularization factor C trades off the tolerance of errors: a larger (smaller) C indicates that the SVM will attempt to incorporate more (fewer) data points. An ill-posed C or γ could result in over-fitting (the SVM attempts to fit all data points, which may result in bad predictions for new inputs) or under-fitting (the SVM fits too few data points and cannot represent the trend of variation of the data). To find the optimal parameters, we utilize the sklearn.model_selection.GridSearchCV function to perform exhaustive searches over specified values. First, we build a logarithmic grid with base 10, in which C ranges in [10−2, 106] and γ ranges in [10−5, 103], as the input of the GridSearchCV function. It turns out that the R2 score peaks when C is of the order of 10² and γ of the order of 10−2 (Fig. 4a). Then, we perform the above exhaustive search again, but with C in (0, 200] with a step of 1 and γ in (0, 0.2] with a step of 10−3. A more accurate pair of C and γ is then found: C = 32 and γ = 0.012 (Fig. 4b).

For cross-validation purposes, we split the entire dataset into two subsets: the training set and the test set. Amari et al. (1997) found the optimal size of the test set to be l/√(2n), where l and n are the numbers of data points and features, respectively. Taking l = 182 and n = 12 in our case, we find that the partition of the entire dataset between the training set and the test set should be 80%:20% (145:37). Using the optimal pair of parameters C and γ found above, we feed the training set into the SVM regression algorithm to build a prediction engine. Next, we make a prediction of the CME transit times using the test set and calculate the R2 score between the predicted and actual transit times. To find the best result with the highest R2 score, we randomly shuffle the entire dataset (the order of the events in the dataset is shuffled, which is a general practice to avoid bias, see e.g., Géron 2017) and repeat the above steps (i.e. split the shuffled dataset into the training and test sets, build an engine using the training set and calculate the R2 score of the test set). Theoretically, there are C_{182}^{37} (∼6 × 10^38) possible combinations of training set and test set. This is a huge number, and it is impossible to exhaustively test all the possibilities with the computer power available to us.
spectively. Taking l = 187 and n = 12 in our case, we find reaches 1000, and remains almost unchanged after that. This
Figure 4. Distribution of the average correlation coefficient between the predicted and actual CME transit times of the test sets during 3-fold cross-validations repeated for different pairs of C and γ. In panel (a), C ranges in [10−2, 106] and γ ranges in [10−5, 103]. In panel (b), C ranges in (0, 200] and γ ranges in (0, 0.2].

Figure 5 shows the variation of the average (blue curve) and maximum (green curve) R2 scores among all the test sets with increasing number of trainings. The average R2 score increases continuously before the number of trainings reaches 1000, and remains almost unchanged after that. This suggests that, when the training is performed over 1000 times, the result can reflect the basic distribution of the R2 scores for all C_{182}^{37} possibilities. The maximum R2 score increases steeply when the number of performed trainings is less than 100000, and yields a similar value when that number is increased by a factor of 10. This indicates that it becomes more feasible to find the best engine with an increasing number of trainings.

Figure 5. Variation of the average (blue curve) and maximum (green curve) R2 scores with increasing number of trainings.

Considering the above results and a reasonable CPU time consumption, we repeat the training 100000 times to find the best training set, i.e. the one that results in the highest R2 score for its corresponding test set, and use it to construct the engine. This could be rather costly. However, by parallelizing the process using the open-source Message Passing Interface (Open MPI, https://www.open-mpi.org/), 100000 trainings take only ∼25 minutes on an Intel(R) Core(TM) i7-7770K desktop with 8 threads. We should note, however, that training the SVM regression 100000 times cannot always reveal the best result (as shown by the green dashed line in Fig. 3), because 100000 is only a fraction of all the possibilities (C_{182}^{37}). Multiple runs of the 100000 trainings are sometimes needed.

4. RESULTS AND COMPARISON

Let us now use the shuffled dataset that yields the highest R2 score of the test set among all the training instances as the input to the engine. The optimal C = 71 and γ = 0.012 are obtained, again, based on the selected shuffled dataset. Then, we split this dataset into a training set and a test set. CAT-PUMA is then built based on the training set and the optimal parameters.

Figure 6a shows the relation between the actual transit time and the transit time predicted by CAT-PUMA for the test set. Blue dots represent different CME events. The black dashed line represents a perfect prediction, when the predicted transit time has the same value as the actual transit time. From the distribution of the dots, one sees that they scatter close to the dashed line. The R2 score is ∼0.82. The mean absolute error of the prediction is 5.9 ± 4.3 hours, and the root mean square error is 7.3 hours. The probability of detection (POD) is defined as:

POD = Hits / (Hits + Misses),    (6)

where events with absolute prediction errors less than and more than 5.9 hours are defined as "hits" and "misses", respectively. There are 20 events in the test set having absolute prediction errors less than 5.9 hours (Table 2), giving a POD of 54%.
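The error statistics and the POD of Eq. (6) are straightforward to compute once the predicted and actual transit times are in hand. Below is a minimal sketch; the five made-up values merely stand in for the 37 test-set events.

import numpy as np

# Placeholder actual and predicted transit times in hours (illustrative only).
actual = np.array([45.0, 60.2, 71.5, 38.9, 55.0])
predicted = np.array([50.1, 57.3, 80.0, 40.2, 49.8])

errors = predicted - actual
mae = np.mean(np.abs(errors))              # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))       # root mean square error

# Eq. (6): events with absolute errors below the mean absolute error are "hits".
hits = int(np.sum(np.abs(errors) < mae))
misses = errors.size - hits
pod = float(hits) / (hits + misses)
print("MAE = %.1f h, RMSE = %.1f h, POD = %.0f%%" % (mae, rmse, 100 * pod))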
Table 2. Number and percentage of hits and misses in the test set

             Hits    Misses
Number       20      17
Percentage   54%     46%

There are currently more than a dozen different methods, submitted to the NASA CME Scoreboard by a number of teams, presenting their predictions of CME arrival times. These methods include empirical, drag-based and physics-based models. More details on the utilized models can be found on the NASA CME Scoreboard website (https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/) and references therein. Let us now compare the absolute prediction error of CAT-PUMA with the average absolute errors of all other methods available from the NASA CME Scoreboard, and determine how much progress we have made over the average level of current predictions. Figure 6b shows the comparison for CMEs included in both the test set and the NASA CME Scoreboard, and Figure 6c shows it for all CMEs included in the NASA CME Scoreboard. The dashed lines in both panels indicate where CAT-PUMA has the same prediction error as the average of the other models. The dash-dotted lines represent a prediction error level of 9.3 (panel b) and 13.7 (panel c) hours, which are the mean values of the average absolute errors of the other methods. Both panels show very similar results. Considering that there are only 9 data points in panel (b), we focus on the results revealed by panel (c). Green dots (61.7%) are events for which CAT-PUMA performs better and has errors less than 13.7 hours. Blue dots (14.9%) are events for which CAT-PUMA performs better but has errors larger than 13.7 hours. Purple dots (12.8%) are events for which CAT-PUMA performs worse but has errors less than 13.7 hours. Finally, red dots (10.6%) are events for which CAT-PUMA performs worse and has errors larger than 13.7 hours. In total, CAT-PUMA gives a better prediction for 77% of the events, and has an error less than 13.7 hours for 74% of the events.

Figure 6. (a): Predicted transit time by CAT-PUMA vs. actual transit time for CMEs in the test set. The black dashed line denotes equal values of the predicted and actual transit times. (b): Comparison between the absolute prediction errors of CAT-PUMA and the average absolute errors of other methods in the NASA CME Scoreboard. Only data points included in both the NASA CME Scoreboard and the test set are shown in this panel. (c): Similar to panel (b), but for all CMEs included in the NASA CME Scoreboard. Black dashed lines represent CAT-PUMA having the same prediction errors as the average of the other methods. Black dash-dotted lines indicate an absolute error of 9.3 (panel b) and 13.7 (panel c) hours, respectively.

5. SUMMARY

In this paper, we proposed a new tool for partial-/full-halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). While building the prediction engine, we investigated which observed features may be important in determining the CME arrival time via a feature selection process. CME properties including the average speed, final speed, angular width and mass were found to play the most relevant roles in determining the transit time in interplanetary space. Solar wind parameters including the magnetic field Bz and Bx, proton temperature, flow speed, flow pressure, flow longitude and the alpha particle to proton number density ratio were found to be important too.
The average values of the solar wind parameters between the onset time of the CME and 6 hours later were found to be the most favorable in building the engine. Considering an average solar wind speed of 400 km s−1, it typically takes the solar wind about 104 hours to travel from the Sun to the Earth. Our results then indicate that the properties of the solar wind detected at Earth might have a periodicity of (104+6)/24 = 4.6 days. However, this needs to be examined very carefully in future work.

After obtaining the optimal pair of input parameters C and γ, the CAT-PUMA engine is constructed based on the training set that yields the highest R2 score for the test set during the 100000 trainings carried out. The constructed engine turns out to have a mean absolute error of about 5.9 hours in predicting the arrival time of CMEs in the test set, with 54% of the predictions having absolute errors less than 5.9 hours. Compared with the average performance of other models available in the literature, CAT-PUMA has better predictions for 77% of events, and prediction errors less than the mean value of the average absolute errors of other models for 74% of events.

To summarize, the main advantages of CAT-PUMA are that: it provides accurate predictions with a mean absolute error of less than 6 hours; it does not rely on a priori assumptions or theory; due to the underlying principles of machine learning, CAT-PUMA can evolve and promisingly improve with more input events in the future; and, finally, CAT-PUMA is a very fast open-source tool allowing all interested users to make their own predictions within several minutes of providing the necessary inputs. The shortcoming of CAT-PUMA is that it cannot predict whether a CME will hit the Earth or not.

CAT-PUMA does not yet include information on the 3D propagation direction of CMEs. We propose that future efforts towards including the 3D propagation direction and 3D de-projected speed, employing either the graduated cylindrical shell (GCS) model with multi-instrument observations (Thernisien et al. 2006) or the integrated CME-arrival forecasting (iCAF) system (Zhuang et al. 2017), together with more observed geo-effective CME events, will further improve the prediction accuracy of CAT-PUMA.

Acknowledgements. The SOHO LASCO CME catalog is generated and maintained at the CDAW Data Center by NASA and The Catholic University of America in cooperation with the Naval Research Laboratory. SOHO is a project of international cooperation between ESA and NASA. JL appreciates discussions with Dr. Xin Huang (National Astronomical Observatories, Chinese Academy of Sciences). We thank Dr. Manolis K. Georgoulis (Research Center for Astronomy and Applied Mathematics, Academy of Athens) for his useful advice in improving this paper. JL and RE acknowledge the support (grant number ST/M000826/1) received from the Science and Technology Facilities Council (STFC), UK. RE is grateful for the support received from the Royal Society (UK). YW is supported by grants 41574165 and 41774178 from the NSFC.

APPENDIX

A. A PRACTICAL GUIDE TO USING CAT-PUMA TO PREDICT CME ARRIVAL TIME


CAT-PUMA is designed to be very easy and user-friendly to use. Users can download the CAT-PUMA engine ("engine.obj"), the source code ("cat_puma.py") of an example demonstrating how we perform the prediction, and the source code ("cat_puma_qt.py") of a well-designed User Interface (UI) from the following link: https://github.com/PyDL/cat-puma. All codes are written in Python, and have been tested with Python 2.7 on two Debian-based x86-64 Linux systems (Ubuntu and Deepin) and on x86-64 Windows 10. Modifications of the code will be needed if one prefers to run CAT-PUMA with Python 3. The Python libraries datetime, numpy, pandas, pickle and scikit-learn (v0.19.1) are needed for a proper run of "cat_puma.py". In the following, we first explain the example code "cat_puma.py" in detail.
The first 134 lines in the code import necessary libraries and define functions that will be used in the main program. Lines 138 to 152 define the features we are going to use, the value of m (see Sec. 3.3) and the location of the engine file. Users are advised not to modify these lines. Lines 155 to 163 are as follows:

# CME Parameters
time = '2015-12-28T12:12:00'   # CME onset time in LASCO C2
width = 360.                   # angular width in degrees; set to 360 if it is a halo CME
speed = 1212.                  # linear speed in the LASCO FOV, km/s
final_speed = 1243.            # second-order final speed leaving the LASCO FOV, km/s
mass = 1.9e16                  # estimated mass using 'cme_mass.pro' in SSWIDL or
                               # obtained from the SOHO LASCO CME Catalog
mpa = 163.                     # degrees; position angle corresponding to the fastest front
actual = '2015-12-31T00:02:00' # actual arrival time; set to None if unknown
CAT- PUMA 11

The above lines define the onset time, angular width, average speed, final speed, estimated mass and MPA of the target CME. These parameters can easily be obtained from the SOHO LASCO CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/) if available, or by analyzing LASCO FITS files otherwise. Here, we employ a fast halo CME that erupted at 2015-12-28T12:12 UT as the first example. This event was not included in our input dataset when constructing CAT-PUMA. Line 166 defines whether the user prefers to obtain the solar wind parameters automatically. If yes, the code will automatically download the solar wind parameters for the specified CME from the OMNIWeb Plus website (https://omniweb.gsfc.nasa.gov/).
One can then run the code, typically by typing the command python2 cat_puma.py, after following the above instructions to set up one's own target CME. The prediction will be given within minutes. The prediction result for the above CME is as follows (the information in the last two lines will not be given if one has not specified the actual arrival time):
CME with onset time 2015-12-28T12:12:00 UT
will hit the Earth at 2015-12-30T18:29:33 UT
with a transit time of 54.3 hours
The actual arrival time is 2015-12-31T00:02:00 UT
The prediction error is -5.5 hours
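For readers who want to embed the engine in their own scripts, the general pattern for querying a pickled scikit-learn regressor such as "engine.obj" is sketched below. The feature vector shown is purely illustrative: the exact number, ordering, units and any scaling of the inputs expected by the real engine are defined inside "cat_puma.py", so this sketch is not a substitute for running that script.

import pickle

# Load the pickled prediction engine (a trained scikit-learn regressor).
with open('engine.obj', 'rb') as f:
    engine = pickle.load(f)

# Hypothetical, illustrative input row; the real feature list (CME speed,
# width, mass, MPA, averaged solar wind parameters, ...) and its ordering
# are defined in cat_puma.py.
features = [[1212., 1243., 360., 1.9e16, 163., -3.2, 4.5e4, 420., 1.8, 0.5, 0.03, 2.1, 0.04]]

transit_time_hours = engine.predict(features)[0]
print('Predicted transit time: %.1f hours' % transit_time_hours)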

Figure 7. The User Interface of CAT-PUMA: (a) the input panel with the CME parameters; (b) the prediction result.

Alternatively, one can use the UI by running the command python2 cat_puma_qt.py. A proper run of it additionally requires the Python library PyQt5 to be installed. Let us illustrate how this UI can be used with another example CME, which erupted at 2016-04-10T11:12 UT. Again, this event was not included in our input dataset when constructing CAT-PUMA. Figure 7a shows the UI and the corresponding CME parameters for this event. The average speed (543 km s−1), final speed (547 km s−1), angular width (136◦) and MPA (25◦) were obtained from the SOHO LASCO CME Catalog. The mass of the CME was estimated with the built-in function "cme_mass.pro" in SolarSoft IDL; it turns out to be ∼4.6 × 1015 g. By checking the option "Automatically Obtain Solar Wind Parameters", the solar wind parameters are obtained automatically from the OMNIWeb Plus website (https://omniweb.gsfc.nasa.gov/) after clicking the "Submit" button. The actual values of the solar wind parameters are then shown. Parameters that are not available from the OMNIWeb Plus website are set to 0.00001 (manual input of these parameters is then needed; near real-time solar wind data can be downloaded from the CDAWeb website https://cdaweb.sci.gsfc.nasa.gov/istp_public/). Figure 7b shows the prediction result for the above CME, revealing an error of 5.2 hours.

REFERENCES
Ahmed, O. W., Qahwaji, R., Colak, T., et al. 2013, SoPh, 283, 157
Amari, S.-i., Murata, N., Muller, K.-R., Finke, M., & Yang, H. H. 1997, IEEE Transactions on Neural Networks, 8, 985
Antiochos, S. K., DeVore, C. R., & Klimchuk, J. A. 1999, ApJ, 510, 485
Biesecker, D. A., Myers, D. C., Thompson, B. J., Hammer, D. M., & Vourlidas, A. 2002, ApJ, 569, 1009
Bobra, M. G., & Couvidat, S. 2015, ApJ, 798, 135
Bobra, M. G., & Ilonidis, S. 2016, ApJ, 821, 127
Brueckner, G. E., Howard, R. A., Koomen, M. J., et al. 1995, SoPh, 162, 357
Chen, P. F. 2011, Living Reviews in Solar Physics, 8, 1
Chen, P. F., Fang, C., & Shibata, K. 2005, ApJ, 622, 1202
Chen, Y., Du, G., Feng, L., et al. 2014, ApJ, 787, 59
Chi, Y., Shen, C., Wang, Y., et al. 2016, SoPh, 291, 2419
Detman, T., Smith, Z., Dryer, M., et al. 2006, Journal of Geophysical Research (Space Physics), 111, A07102
Dryer, M., Fry, C. D., Sun, W., et al. 2001, SoPh, 204, 265
Feng, X., & Zhao, X. 2006, SoPh, 238, 167
Feng, X., Zhou, Y., & Wu, S. T. 2007, ApJ, 655, 1110
Forbes, T. G. 2000, J. Geophys. Res., 105, 23153
Géron, A. 2017, Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems
Gibson, S. E., & Low, B. C. 1998, ApJ, 493, 460
Gopalswamy, N. 2016, Geoscience Letters, 3, 8
Gopalswamy, N., Lara, A., Yashiro, S., Nunes, S., & Howard, R. A. 2003, in ESA Special Publication, Vol. 535, Solar Variability as an Input to the Earth's Environment, ed. A. Wilson, 403
Gopalswamy, N., Yashiro, S., Michalek, G., et al. 2009, Earth Moon and Planets, 104, 295
Gopalswamy, N., Yashiro, S., Michalek, G., et al. 2010, Sun and Geosphere, 5, 7
Gui, B., Shen, C., Wang, Y., et al. 2011, SoPh, 271, 111
Hansen, R. T., Garcia, C. J., Grognard, R. J.-M., & Sheridan, K. V. 1971, Proceedings of the Astronomical Society of Australia, 2, 57
Harrison, R. A. 1995, A&A, 304, 585
Hess, P., & Zhang, J. 2017, SoPh, 292, 80
Isavnin, A., Vourlidas, A., & Kilpua, E. K. J. 2014, SoPh, 289, 2141
Jackson, B. V., Sheridan, K. V., Dulk, G. A., & McLean, D. J. 1978, Proceedings of the Astronomical Society of Australia, 3, 241
Jing, J., Yurchyshyn, V. B., Yang, G., Xu, Y., & Wang, H. 2004, ApJ, 614, 1054
Kay, C., Opher, M., & Evans, R. M. 2015, ApJ, 805, 168
Lantos, P., Kerdraon, A., Rapley, G. G., & Bentley, R. D. 1981, A&A, 101, 33
Li, R., Wang, H.-N., He, H., Cui, Y.-M., & Zhan-LeDu. 2007, ChJA&A, 7, 441
Lin, J., & Forbes, T. G. 2000, J. Geophys. Res., 105, 2375
Liu, J., Wang, Y., Shen, C., et al. 2015, ApJ, 813, 115
Liu, R., Liu, C., Wang, S., Deng, N., & Wang, H. 2010a, ApJL, 725, L84
Liu, W., Nitta, N. V., Schrijver, C. J., Title, A. M., & Tarbell, T. D. 2010b, ApJL, 723, L53
Low, B. C. 2001, J. Geophys. Res., 106, 25141
Lugaz, N., Temmer, M., Wang, Y., & Farrugia, C. J. 2017, SoPh, 292, 64
Manoharan, P. K. 2006, SoPh, 235, 345
Mays, M. L., Taktakishvili, A., Pulkkinen, A. A., et al. 2013, AGU Fall Meeting Abstracts
Mishra, W., Wang, Y., & Srivastava, N. 2016, ApJ, 831, 99
Moon, Y.-J., Dryer, M., Smith, Z., Park, Y. D., & Cho, K. S. 2002, Geophys. Res. Lett., 29, 28
Möstl, C., Amla, K., Hall, J. R., et al. 2014, ApJ, 787, 119
Nishizuka, N., Sugiura, K., Kubo, Y., et al. 2017, ApJ, 835, 156
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
Prise, A. J., Harra, L. K., Matthews, S. A., Arridge, C. S., & Achilleos, N. 2015, Journal of Geophysical Research (Space Physics), 120, 1566
Qahwaji, R., & Colak, T. 2007, SoPh, 241, 195
Qiu, J., Wang, H., Cheng, C. Z., & Gary, D. E. 2004, ApJ, 604, 900
Richardson, I. G., & Cane, H. V. 2010, SoPh, 264, 189
Riley, P., Linker, J. A., Lionello, R., & Mikic, Z. 2012, Journal of Atmospheric and Solar-Terrestrial Physics, 83, 1
Riley, P., Linker, J. A., & Mikić, Z. 2013, Journal of Geophysical Research (Space Physics), 118, 600
Robbrecht, E., Berghmans, D., & Van der Linden, R. A. M. 2009, ApJ, 691, 1222
Schwenn, R., dal Lago, A., Huttunen, E., & Gonzalez, W. D. 2005, Annales Geophysicae, 23, 1033
Sharma, R., Srivastava, N., Chakrabarty, D., Möstl, C., & Hu, Q. 2013, Journal of Geophysical Research (Space Physics), 118, 3954
Shen, C., Liao, C., Wang, Y., Ye, P., & Wang, S. 2013a, SoPh, 282, 543
Shen, C., Wang, Y., Pan, Z., et al. 2013b, Journal of Geophysical Research (Space Physics), 118, 6858
Shen, C., Wang, Y., Wang, S., et al. 2012, Nature Physics, 8, 923
Shen, F., Shen, C., Wang, Y., Feng, X., & Xiang, C. 2013c, Geophys. Res. Lett., 40, 1457
Shen, Y., Liu, Y., Su, J., & Deng, Y. 2012, ApJ, 745, 164
Smith, Z., & Dryer, M. 1990, SoPh, 129, 387
Smola, A. J., & Schölkopf, B. 2004, Statistics and Computing, 14, 199
Subramanian, P., Lara, A., & Borgazzi, A. 2012, Geophys. Res. Lett., 39, L19107
Thernisien, A. F. R., Howard, R. A., & Vourlidas, A. 2006, ApJ, 652, 763
Tóth, G., Sokolov, I. V., Gombosi, T. I., et al. 2005, Journal of Geophysical Research (Space Physics), 110, A12226
Tousey, R. 1973, in Space Research Conference, Vol. 2, Space Research Conference, ed. M. J. Rycroft & S. K. Runcorn, 713
Vandas, M., Fischer, S., Dryer, M., Smith, Z., & Detman, T. 1996, J. Geophys. Res., 101, 15645
Vapnik, V. 2013, The nature of statistical learning theory (Springer science & business media)
Vršnak, B. 2001, SoPh, 202, 173
Vršnak, B., & Cliver, E. W. 2008, SoPh, 253, 215
Vršnak, B., & Žic, T. 2007, A&A, 472, 937
Wang, Y., Shen, C., Wang, S., & Ye, P. 2004, SoPh, 222, 329
Wang, Y., Ye, P., & Wang, S. 2007, SoPh, 240, 373
Wang, Y., Zhou, G., Ye, P., Wang, S., & Wang, J. 2006, ApJ, 651, 1245
Wang, Y. M., Wang, S., & Ye, P. Z. 2002a, SoPh, 211, 333
Wang, Y. M., Ye, P. Z., Wang, S., & Xue, X. H. 2003, Geophys. Res. Lett., 30, 33
Wang, Y. M., Ye, P. Z., Wang, S., Zhou, G. P., & Wang, J. X. 2002b, Journal of Geophysical Research (Space Physics), 107, 1340
Webb, D. F., & Howard, T. A. 2012, Living Reviews in Solar Physics, 9, 3
Xie, H., Ofman, L., & Lawrence, G. 2004, Journal of Geophysical Research (Space Physics), 109, A03109
Yang, Y. H., Tian, H. M., Peng, B., Li, T. R., & Xie, Z. X. 2017, SoPh, 292, 131
Zhang, J., Cheng, X., & Ding, M.-D. 2012, Nature Communications, 3, 747
Zhang, J., Richardson, I. G., Webb, D. F., et al. 2007, Journal of Geophysical Research (Space Physics), 112, A10102
Zhao, X., & Dryer, M. 2014, Space Weather, 12, 448
Zheng, R., Chen, Y., Du, G., & Li, C. 2016, ApJL, 819, L18
Zhuang, B., Wang, Y., Shen, C., et al. 2017, ApJ, 845, 117
