J L, Y Y, C S, Y W, R E: Twocolumn
A NEW TOOL FOR CME ARRIVAL TIME PREDICTION USING MACHINE LEARNING ALGORITHMS: CAT-PUMA
Jiajia Liu,1 Yudong Ye,2,3 Chenglong Shen,4,5 Yuming Wang,4,5 and Robert Erdélyi1,6
arXiv:1802.02803v1 [astro-ph.SR] 8 Feb 2018
1 Solar Physics and Space Plasma Research Center (SP2RC), School of Mathematics and Statistics, The University of Sheffield, Sheffield S3 7RH, UK
2 SIGMA Weather Group, State Key Laboratory of Space Weather, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Sciences, University of Science and Technology of China, Hefei, Anhui
230026, China
5 Synergetic Innovation Center of Quantum Information & Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
6 Department of Astronomy, Eötvös Loránd University, Budapest, Pázmány P. sétány 1/A, H-1117, Hungary
ABSTRACT
Coronal Mass Ejections (CMEs) are arguably the most violent eruptions in the Solar System. CMEs can cause severe disturbances in interplanetary space and even affect human activities in many respects, causing damage to infrastructure and loss of revenue. Fast and accurate prediction of CME arrival time is therefore vital to minimize the disruption CMEs may cause when interacting with geospace. In this paper, we propose a new approach for partial-/full-halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). Via detailed analysis of the CME features and solar wind parameters, we build a prediction engine taking advantage of 182 previously observed geo-effective partial-/full-halo CMEs and using algorithms of the Support Vector Machine (SVM). We demonstrate that CAT-PUMA is accurate and fast. In particular, predictions after applying CAT-PUMA to a test set that is unknown to the engine show a mean absolute prediction error of ∼5.9 hours of the CME arrival time, with 54% of the predictions having absolute errors less than 5.9 hours. Comparison with other models reveals that CAT-PUMA has a more accurate prediction for 77% of the events investigated, and can be carried out very fast, i.e. within minutes after providing the necessary input parameters of a CME. A practical guide containing the CAT-PUMA engine and the source code of two examples is available in the Appendix, allowing the community to perform their own applications for prediction using CAT-PUMA.
In this paper, we propose a new approach to modeling the partial-/full-halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). We will divide 182 geo-effective CMEs observed in the past two decades, i.e. from 1996 to 2015, into two sets: namely, for training and for testing purposes, respectively. All inputs will be only observables. Without a priori assumption or underlying physical theory, our method gives a mean absolute prediction error of as little as around 6 hours. Details on data mining are in Sec. 2. An overview of the employed machine learning algorithms and the implemented training process is given in Sec. 3. Results and comparison with previous prediction models are discussed in Sec. 4. We summarize in Sec. 5. A practical guide on how to perform predictions with CAT-PUMA is presented in Appendix A.

2. DATA MINING

To build a suitable set of input for the machine learning algorithms, the first step of our data mining is to construct a list of CMEs that have eventually arrived at Earth and have also caused disturbances to the terrestrial magnetic field. Such CMEs are usually called geo-effective CMEs. We defined four different Python crawlers to automatically gather the onset time, which is usually defined as the first appearance in the Field-of-View (FOV) of SOHO LASCO C2 (Brueckner et al. 1995), and the arrival time of CMEs, which represents the arrival time of interplanetary shocks driven by CMEs hereafter, from the following four lists:

1. The Richardson and Cane List (Richardson & Cane 2010). The list is available at http://www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm and contains various parameters, including the average speed, magnetic field, associated DST index of more than 500 Interplanetary CMEs (ICMEs) from 1996 to 2006 and the onset time of their associated CMEs if observed. We discard events with no or ambiguously associated CMEs, and obtain the onset and arrival time of 186 geo-effective CMEs from this list.

2. List of Full Halo CMEs provided by the Research Group on Solar-TErrestrial Physics (STEP) at University of Science and Technology of China (USTC) (Shen et al. 2013b). A full halo CME is defined when its angular width observed by SOHO LASCO is 360°. This list is available at http://space.ustc.edu.cn/dreams/fhcmes/index.php and provides the 3D direction, angular width, real and projected velocity of 49 CMEs from 2009 to 2012, and the arrival time of their associated shocks if observed. Events without observation of the associated interplanetary shocks are removed. The onset and arrival times of 24 geo-effective CMEs are obtained from this list.

3. The George Mason University (GMU) CME/ICME List (Hess & Zhang 2017). This list contains information similar to that of the Richardson and Cane list of 73 geo-effective CMEs and corresponding ICMEs from 2007 to 2017. It is available at http://solar.gmu.edu/heliophysics/index.php/GMU_CME/ICME_List. We only select ICME events satisfying the following criteria: i) there are associated shocks and ii) multiple CMEs are not involved. After implementing the selection criteria, 38 events are obtained from this list.

4. The CME Scoreboard developed at the Community Coordinated Modeling Center (CCMC), NASA. It is a website allowing the community to submit and view the actual and predicted arrival time of CMEs from 2013 to the present (https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/). For our analysis, we remove those events that did not interact with the Earth and those that have a “note”. An event was labeled with a “note” because, e.g., the target CME did not arrive at Earth, or there was some uncertainty in measuring the shock arrival time, or there were multiple CME events. Here, we obtained 134 CME events from this list.

Combining all four lists, we eventually obtain 382 geo-effective CME events via data-mining. However, there are overlaps between these lists. To remove duplicates, we remove one event of a pair if the two CMEs have onset times that differ by less than 1 hour. 90 events are therefore removed.

The SOHO LASCO CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/) provides a database of all CMEs observed by SOHO LASCO from 1996 to 2016 (Gopalswamy et al. 2009). Via matching the onset time of CMEs in our list with the onset time of CMEs recorded in the SOHO LASCO CME Catalog, we obtain various parameters of them including the angular width, average speed, acceleration, final speed in the FOV of LASCO, estimated mass and main position angle (MPA, corresponding to the position angle of the fastest moving part of the CME’s leading edge). The location of the source region of full halo CMEs can be obtained from the SOHO/LASCO Halo CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/halo/halo.html). CMEs that have no source-region information in the above catalog are further investigated manually, one-by-one, to determine their source region location.

In addition, events from our compiled list are removed if they have: i) angular width less than 90°; ii) no available mass estimation; or iii) ambiguous source region location. Finally, two CMEs at 2003-10-29 20:54 UT and 2011-10-27 12:12 UT are also removed because the first one has incorrect velocity and acceleration estimation, and the second one erupted with more than a dozen CMEs during that day.

Eventually, after applying all the above selection criteria, we obtain a list of 182 events containing geo-effective CMEs from 1996 to 2015, of which 56 are partial-halo CMEs and 126 are halo CMEs. The average speed of these CMEs in the LASCO FOV ranges from 400 km s−1 to 1500 km s−1.
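The duplicate-removal rule of Sec. 2 (drop one of two events whose onset times differ by less than 1 hour) can be sketched as follows. This is a minimal illustration, not the authors' crawler code: the function name `remove_duplicates`, the record layout and the sample onset times are all hypothetical.

```python
from datetime import datetime, timedelta


def remove_duplicates(events, window=timedelta(hours=1)):
    """Keep one event from each run of events whose onsets lie within `window`."""
    kept = []
    for event in sorted(events, key=lambda e: e["onset"]):
        # An event closer than `window` to the last kept one is treated as a duplicate.
        if kept and event["onset"] - kept[-1]["onset"] < window:
            continue
        kept.append(event)
    return kept


# Hypothetical records merged from different lists (illustrative values only).
events = [
    {"onset": datetime(2013, 3, 15, 7, 12), "source": "list A"},
    {"onset": datetime(2013, 3, 15, 7, 36), "source": "list B"},
    {"onset": datetime(2014, 9, 10, 18, 0), "source": "list B"},
]
unique_events = remove_duplicates(events)
print(len(unique_events))
```

Sorting by onset time first means each event only has to be compared with the most recently kept one.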
3. OPTIMIZATION

One of the most popular machine learning algorithms is the Support Vector Machine algorithm (SVM). It is a set of supervised learning methods for classification, regression and outliers detection. The original SVMs were linear (see the review Smola & Schölkopf 2004), though SVMs are also suitable for conducting nonlinear analysis via mapping input parameters into higher dimensional spaces with different kernel functions. An implementation of the SVM has been integrated into the Python scikit-learn library (Pedregosa et al. 2011), with open-source access and well-established documentation (http://scikit-learn.org/stable/). According to the scikit-learn documentation, major advantages of the SVM are that it is: 1) effective in high dimensional spaces, 2) still effective even if the number of dimensions is greater than the number of samples, and 3) memory efficient. Besides, it is particularly well-suited for small- or medium-sized datasets (Géron 2017).

Recent works utilizing machine learning algorithms have been mainly focused on solar flare prediction, CME productivity and solar feature identification using classification methods (e.g., Li et al. 2007; Qahwaji & Colak 2007; Ahmed et al. 2013; Bobra & Couvidat 2015; Bobra & Ilonidis 2016; Nishizuka et al. 2017) or multi-labeling algorithms (e.g. Yang et al. 2017). However, to the best of our knowledge, the SVM regression algorithm which is suitable for a wide range of solar/space physics research such as solar cycle prediction, DST index prediction and active region occurrence prediction has not yet been widely used by the solar/space physics community. Further, no previous study has attempted to employ the SVM regression algorithm in [...]

[Figure 1 labels: regression line f(x) = ωx + b, with margins y = f(x) + ε and y = f(x) − ε.]
Figure 1. An example of the SVM regression in a simple two-dimensional, linear and hard-margin problem. Adopted from Fig. 5-10 in Géron (2017).

The solution for the above two-dimensional, linear and hard-margin problem can be extended into multi-dimensional, linear and soft-margin problems. In this case, the target for the SVM regression is to:

$$
\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*), \\
\text{subject to} \quad & y_i - \langle \omega, x_i \rangle - b \le \epsilon + \xi_i, \\
& \langle \omega, x_i \rangle + b - y_i \le \epsilon + \xi_i^*, \\
& \xi_i, \xi_i^* \ge 0, \quad i = 1, 2, 3, \ldots, l,
\end{aligned}
\tag{2}
$$
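For a fixed linear model, the soft-margin objective of Eq. (2) can be evaluated directly by taking the smallest slack variables that satisfy the constraints. The sketch below is only an illustration of that objective; the function name `svr_primal_objective` and the toy numbers are assumptions, and it does not perform the minimization itself.

```python
import numpy as np


def svr_primal_objective(w, b, X, y, C, epsilon):
    """Value of 0.5*||w||^2 + C*sum(xi + xi_star) with the minimal feasible slacks."""
    residual = y - (X @ w + b)
    xi = np.maximum(0.0, residual - epsilon)        # points above the epsilon-tube
    xi_star = np.maximum(0.0, -residual - epsilon)  # points below the epsilon-tube
    return 0.5 * float(np.dot(w, w)) + C * float(np.sum(xi + xi_star))


# Toy one-dimensional data (hypothetical values).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.5, 1.0])
value = svr_primal_objective(w=np.array([1.0]), b=0.0, X=X, y=y, C=10.0, epsilon=0.2)
print(round(value, 6))
```

Points inside the tube of half-width ε contribute no slack, so only the second and third points add to the penalty term here.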
where l is the number of data points as defined in Sec. 3.1, σ_xk and σ_y are the standard deviation of x_k and y, respectively. A [...] the kth feature and the CME transit time y in this case.

Table 1 lists the rankings of all 18 features (excluding CME [...] the upper limit of m, is set as 12 hours after considering the prediction purpose of CAT-PUMA, because an extremely fast CME (with speed over 3000 km s−1) could reach the Earth [...]ble. It turns out that the rankings of all features keep relatively stable. The changes are minor with increasing m, especially for the first 12 features in the table. Figure 2 depicts [...]

[Figure: R2 Score versus Duration After CME Onset For Average Solar Wind Parameters (hrs); legend: 2 × Average R2 Score, Maximum R2 Score.]
Feature                       m (hours)
                              1  2  3  4  5  6  7  8  9 10 11 12
CME Average Speed             1  1  1  1  1  1  1  1  1  1  1  1
CME Final Speed               2  2  2  2  2  2  2  2  2  2  2  2
CME Angular Width             3  3  3  3  3  3  3  3  3  3  3  3
CME Mass                      4  4  4  4  4  4  4  5  5  5  4  4
Solar Wind Bz                 5  5  5  5  5  5  5  4  4  4  5  6
Solar Wind Temperature        7  7  7  7  7  6  6  6  6  6  6  5
Solar Wind Speed              6  6  6  6  6  7  7  7  7  7  7  7
Solar Wind Pressure           8  8  8  8  8  8  8  8  8  8  8  8
Solar Wind Longitude         11  9  9  9  9  9  9  9  9  9  9  9
CME Acceleration             10 10 10 10 10 10 10 10 10 10 10 10
Solar Wind He Proton Ratio   12 12 11 11 11 11 11 11 11 11 11 11
Solar Wind Bx                 9 11 12 12 12 12 12 12 12 13 15 15
CME Position Angle           13 13 13 13 13 13 13 13 15 14 13 12
Solar Wind Density           16 17 15 14 14 14 14 14 14 15 14 13
Solar Wind Plasma Beta       19 18 17 15 18 15 15 15 13 12 12 14
Solar Wind Latitude          18 19 19 19 16 16 18 18 17 17 17 16
CME Source Region Longitude  15 15 14 16 15 17 16 16 16 16 16 17
CME Source Region Latitude   17 16 16 18 17 18 17 17 18 18 18 18
Solar Wind By                14 14 18 17 19 19 19 19 19 19 19 19
Note. The column in bold denotes the ranking of all features at m =6 hours, which is the most favorable value in building the prediction engine
(Sec. 3.3).
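A ranking like the one in Table 1 can be produced from any univariate score that relates each feature to the transit time. The exact scoring formula is not recoverable from the text here, so the sketch below swaps in scikit-learn's `f_regression` (a correlation-based F-statistic) as a stand-in; this is an assumption, not the authors' method. The data are synthetic and the feature names are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(1)

# Synthetic stand-in data: 182 events, 3 hypothetical features.
n_events = 182
feature_names = ["feature_a", "feature_b", "feature_c"]
X = rng.normal(size=(n_events, len(feature_names)))
# The target depends strongly on feature_a, weakly on feature_b, not on feature_c.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_events)

# Univariate F-statistic of each feature against the target.
f_scores, _ = f_regression(X, y)

# Rank 1 goes to the feature with the highest score.
order = np.argsort(-f_scores)
ranking = {feature_names[i]: rank for rank, i in enumerate(order, start=1)}
print(ranking)
```

Repeating this for solar wind parameters averaged over different durations m would give one ranking column per value of m, as in the table.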
the regularization factor C trades off the tolerance on errors. A larger (smaller) C indicates that the SVM will attempt to incorporate more (less) data points. Ill-posed C or γ could result in over-fitting (the SVM attempts to fit all data points, which may result in bad prediction for new inputs) or under-fitting (the SVM fits too few data points - it cannot represent the trend of variation of the data). To find the optimal parameters, we utilize the sklearn.model_selection.GridSearchCV function to perform exhaustive searches over specified values. First, we build a logarithmic grid with basis of 10, in which C ranges in [10^-2, 10^6] and γ ranges in [10^-5, 10^3], as the input of the GridSearchCV function. It turns out that the R2 score peaks when C is of the order of 10^2 and γ of 10^-2 (Fig. 4a). Then, we perform the above exhaustive search again but with C in (0, 200] with a step of 1 and γ in (0, 0.2] with a step of 10^-3. A more accurate pair of C and γ is then found, C = 32 and γ = 0.012 (Fig. 4b).

Figure 4. Distribution of the average correlation coefficient between the predicted and actual CME transit times of test sets during 3-fold cross-validations repeated for different pairs of C and γ. In panel (a), C ranges in [10^-2, 10^6] and γ ranges in [10^-5, 10^3]. In panel (b), C ranges in (0, 200] and γ ranges in (0, 0.2].

For the cross-validation purpose, we split the entire dataset into two subsets: the training set and the test set. Amari et al. (1997) found the optimal number of the test set as l/√(2n), where l and n are the number of data points and features, respectively. Taking l = 182 and n = 12 in our case, we find the partition of the entire dataset between the training set and the test set should be 80%:20% (145:37). Using the optimal pair of parameters C and γ found above, we feed the training set into the SVM regression algorithm to build a prediction engine. Next, we make a prediction of the CME transit times using the test set and calculate the R2 score between the predicted and actual transit times. To find the best result with the highest R2 score, we randomly shuffle the entire dataset (the order of the events in the dataset is shuffled, which is a general practice to avoid bias, see e.g., Géron 2017) and repeat the above steps (i.e. split the shuffled dataset into the training and test sets, build an engine using the training set and calculate the R2 score of the test set). Theoretically, there are C_182^37 (∼ 6 × 10^38) possible combinations of the training set and test set. This is a huge number, and it is impossible to exhaustively test all the possibilities given the computer power available to us.

Figure 5 shows the variation of the average (blue curve) and maximum (green curve) R2 scores among all the test sets with the increasing number of trainings. The average R2 score increases continuously before the number of trainings reaches 1000, and remains almost unchanged after that. This suggests that, when the training is performed over 1000 times, the result can reflect the basic distribution of the R2 scores for all C_182^37 possibilities. The maximum R2 score increases steeply when the number of performed trainings is less than 100000, and yields a similar value when it is increased by a factor of 10. This indicates that it becomes more feasible to find the best engine with increasing number of trainings.

[Figure 5 axes: R2 Score versus Number of Trainings (10^1 to 10^6); legend: 2 × Average R2 Score, Maximum R2 Score.]
Figure 5. Variation of the average (blue curve) and maximum (green curve) R2 scores with increasing number of trainings.

Considering the above results and reasonable CPU time consumption, we repeat the training 100000 times to find the best training set, i.e. the one that results in the highest R2 score of its corresponding test set, to construct the engine. This could be rather costly. However, via paralleling the process employing the open source Message Passing Interface (Open MPI, https://www.open-mpi.org/), 100000 trainings only take ∼25 minutes on an Intel(R) Core(TM) i7-7700K desktop with 8 threads. However, we should notice that training the SVM regression 100000 times cannot always reveal the best result (as shown by the green dashed line in Fig. 3), because 100000 is only a fraction of all the possibilities (C_182^37). Multiple runs are sometimes needed to repeat the 100000 trainings.

4. RESULTS AND COMPARISON

Let us now use the shuffled dataset that yields the highest R2 score of the test set among all the training instances as the input to the engine. The optimal C = 32 and γ = 0.012 are obtained, again, based on the selected shuffled dataset. Then, we split this dataset into a training set and a test set. CAT-PUMA is then built based on the training set and optimal parameters.

Figure 6a shows the relation between the actual transit time and predicted transit time given by CAT-PUMA of the test set. Different blue dots represent different CME events. The black dashed line represents a perfect prediction when the predicted transit time has the same value as the actual transit time. From the distribution of the dots, one sees that they scatter close to the dashed line. The R2 score is ∼0.82. The mean absolute error of the prediction is 5.9 ± 4.3 hours, and the root mean square error is 7.3 hours. The probability of detection (POD) is defined as:

$$
\mathrm{POD} = \frac{\mathrm{Hits}}{\mathrm{Hits} + \mathrm{Misses}}, \tag{6}
$$

where events with absolute prediction errors less than and more than 5.9 hours are defined as “hits” and “misses”, respectively. There are 20 events in the test set having absolute prediction errors less than 5.9 hours (Table 2), giving a POD of 54%.
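The workflow of Secs. 3 and 4 (test-set size from l/√(2n), grid search over C and γ, then R2, mean absolute error, root mean square error and POD on the test set) can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions rather than the CAT-PUMA code: the data are synthetic, an RBF kernel is assumed because γ is tuned, the grid is deliberately smaller than the ranges quoted in the text so that it runs quickly, and only a single shuffle is performed instead of the 100000 repetitions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for 182 events with 12 features; y plays the role of transit time (hours).
n_events, n_features = 182, 12
X = rng.normal(size=(n_events, n_features))
y = 60.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(scale=3.0, size=n_events)

# Test-set size l / sqrt(2 n) (Amari et al. 1997), rounded to an integer.
n_test = int(round(n_events / np.sqrt(2 * n_features)))

# One random shuffle and split; the paper repeats this step many times.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=n_test, shuffle=True, random_state=0
)

# Reduced logarithmic grid (assumption) with 3-fold cross-validation scored by R2.
param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-5, 1, 7)}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
predicted = search.best_estimator_.predict(X_test)
abs_err = np.abs(predicted - y_test)
r2 = search.best_estimator_.score(X_test, y_test)
mae = float(abs_err.mean())
rmse = float(np.sqrt(np.mean((predicted - y_test) ** 2)))

# POD with a 5.9-hour threshold; errors equal to the threshold count as misses here.
threshold = 5.9
hits = int(np.sum(abs_err < threshold))
misses = int(np.sum(abs_err >= threshold))
pod = hits / (hits + misses)
print(search.best_params_, n_test, round(pod, 2))
```

Wrapping the split-fit-score steps in a loop over different `random_state` values, and keeping the split with the highest test R2, would mimic the repeated-shuffling strategy described in the text.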
Table 2. Number and percentage of hits and misses in the test set
             Hits   Misses
Number       20     17
Percentage   54%    46%

There are currently more than a dozen different methods [...] based models. More details on the utilized models can be found in the NASA CME Scoreboard website (https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/) and references therein. Let us now compare the absolute prediction error of CAT-PUMA and the average absolute errors of all other methods available from the NASA CME Scoreboard, and determine how much progress we have made over the average level of current predictions. Figure 6b shows the comparison for CMEs included in both the test set and the [...] included in the NASA CME Scoreboard. The dashed lines in both panels indicate when CAT-PUMA has the same prediction error as the average of other models. The dash-dotted lines represent a prediction error level of 9.3 (panel b) and 13.7 (panel c) hours, which are the mean values of the average absolute errors of other methods. Both panels show very similar results. Considering there are only 9 data points in panel (b), we focus on results revealed by panel (c). Green dots (61.7%) are events of which CAT-PUMA performs better and has errors less than 13.7 hours. Blue dots (14.9%) are events of which CAT-PUMA performs better but has errors [...] which CAT-PUMA performs worse but has errors less than [...] CAT-PUMA performs worse and has errors larger than 13.7 hours. In total, CAT-PUMA gives a better prediction for 77% of the events, and has an error less than 13.7 hours for 74% of the events.
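The percentages quoted in this comparison are simple fractions of events on either side of the two reference lines in Figure 6. A minimal sketch of that bookkeeping is given below; the function name `compare_errors` and the four error values are hypothetical and are not taken from the NASA CME Scoreboard.

```python
import numpy as np


def compare_errors(catpuma_err, others_avg_err):
    """Fractions of events that beat the other models and/or fall below their mean error."""
    catpuma_err = np.asarray(catpuma_err, dtype=float)
    others_avg_err = np.asarray(others_avg_err, dtype=float)
    # Level of the dash-dotted line: mean of the other methods' average absolute errors.
    reference = others_avg_err.mean()
    better = catpuma_err < others_avg_err  # below the dashed one-to-one line
    below = catpuma_err < reference        # below the dash-dotted line
    return {
        "reference_level": float(reference),
        "better": float(np.mean(better)),
        "below_reference": float(np.mean(below)),
        "better_and_below": float(np.mean(better & below)),
    }


# Hypothetical absolute errors in hours for four events.
summary = compare_errors([2.0, 15.0, 8.0, 20.0], [10.0, 18.0, 6.0, 14.0])
print(summary)
```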
Figure 6. (a): Predicted transit time by CAT-PUMA versus actual transit time for CMEs in the test set. Black dashed line denotes the same values of the predicted and actual transit time. (b): Comparison between absolute prediction errors by CAT-PUMA and average absolute errors of other methods in the NASA CME Scoreboard. Only data points included in both the NASA CME Scoreboard and the test set are shown in this panel. (c): Similar to panel (b), but for all CMEs included in the NASA CME Scoreboard. Black dashed lines represent that CAT-PUMA has the same prediction errors as the average of other methods. Black dash-dotted lines indicate an absolute error of 9.3 (panel b) and 13.7 (panel c) hours, respectively.

5. SUMMARY

In this paper, we proposed a new tool for partial-/full-halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). While building the prediction engine, we investigated which observed features may be important in determining the CME arrival time via a feature selection process. CME properties including the average speed, final speed, angular width and mass were found to play the most relevant roles in determining the transit time in the interplanetary space. Solar wind parameters including magnetic field Bz and Bx, proton temperature, flow speed, flow pressure, flow longitude and alpha particle to proton number density ratio were found important too.
The average values of solar wind parameters between the onset time of the CME and 6 hours later were found to be the most favorable in building the engine. Considering an average speed of 400 km s−1 of the solar wind, it typically takes a 104-hour traveling time from the Sun to Earth. Our results indicate that properties of solar wind detected at Earth might have a periodicity of (104+6)/24=4.6 days. However, this needs to be further examined very carefully by future works.

After obtaining the optimal pair of input parameters C and γ, the CAT-PUMA engine is then constructed based on the training set that yields the highest R2 score of the test set during trainings carried out 100000 times. The constructed engine turns out to have a mean absolute error of about 5.9 hours in predicting the arrival time of CMEs for the test set, with 54% of the predictions having absolute errors less than 5.9 hours. Compared with the average performance of other models available in the literature, CAT-PUMA has better predictions in 77% of the events and prediction errors less than the mean value of average absolute errors of other models in 74% of the events.

To summarize, the main advantages of CAT-PUMA are that: it provides accurate prediction with mean absolute error less than 6 hours; it does not rely on a priori assumption or theory; due to the underlying principles of machine learning, CAT-PUMA can evolve and promisingly improve with more input events in the future; and finally, CAT-PUMA is a very fast open-source tool allowing all interested users to give their own predictions within several minutes after providing necessary inputs. The shortcoming of CAT-PUMA is that it cannot give a prediction whether a CME will hit the Earth or not.

CAT-PUMA has not included information on the 3D propagating direction of CMEs. We propose that future efforts towards including the 3D propagation direction and 3D de-projected speed, employing either the graduated cylindrical shell (GCS) model with multi-instrument observations (Thernisien et al. 2006) or the integrated CME-arrival forecasting (iCAF) system (Zhuang et al. 2017), together with more observed geo-effective CME events, will further improve the prediction accuracy of CAT-PUMA.

Acknowledgements. The SOHO LASCO CME catalog is generated and maintained at the CDAW Data Center by NASA and The Catholic University of America in cooperation with the Naval Research Laboratory. SOHO is a project of international cooperation between ESA and NASA. JL appreciates discussions with Dr. Xin Huang (National Astronomical Observatories, Chinese Academy of Sciences). We thank Dr. Manolis K. Georgoulis (Research Center for Astronomy and Applied Mathematics, Academy of Athens) for his useful advice in improving this paper. JL and RE acknowledge the support (grant number ST/M000826/1) received by the Science and Technology Facility Council (STFC), UK. RE is grateful for the support received from the Royal Society (UK). YW is supported by the grants 41574165 and 41774178 from NSFC.
APPENDIX
# CME Parameters
time = '2015-12-28T12:12:00'    # CME onset time in LASCO C2
width = 360.                    # angular width, degree, set as 360 if it is halo
speed = 1212.                   # linear speed in LASCO FOV, km/s
final_speed = 1243.             # second order final speed leaving LASCO FOV, km/s
mass = 1.9e16                   # estimated mass using 'cme_mass.pro' in SSWIDL or
                                # obtained from the SOHO LASCO CME Catalog
mpa = 163.                      # degree, position angle corresponding to the fastest front
actual = '2015-12-31T00:02:00'  # actual arrival time, set to None if unknown
The above lines define the onset time, angular width, average speed, final speed, estimated mass and MPA of the target CME. These parameters can easily be obtained from the SOHO LASCO CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/) if available, or via analyzing LASCO fits files otherwise. Here, we employ a fast halo CME that erupted at 2015-12-28T12:12 UT as the first example. This event was not included in our input dataset when constructing CAT-PUMA. Line 166 defines whether a user prefers to obtain the solar wind parameters automatically. If yes, the code will download solar wind parameters for the specified CME automatically from the OMNIWeb Plus website (https://omniweb.gsfc.nasa.gov/).
Next, one can run the code, typically by typing the command python2 cat_puma.py, after following the above instructions to set up the user’s own target CME. The prediction will be given within minutes. The prediction result for the above CME is as follows (information in the last two lines will not be given if one has not specified the actual arrival time):
CME with onset time 2015-12-28T12:12:00 UT
will hit the Earth at 2015-12-30T18:29:33 UT
with a transit time of 54.3 hours
The actual arrival time is 2015-12-31T00:02:00 UT
The prediction error is -5.5 hours
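The transit time and the prediction error in this output follow directly from the three timestamps. The short check below is an illustration using only the Python standard library, not part of cat_puma.py; the sign convention (predicted minus actual) is inferred from the printed value of -5.5 hours.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"
onset = datetime.strptime("2015-12-28T12:12:00", FMT)
predicted = datetime.strptime("2015-12-30T18:29:33", FMT)
actual = datetime.strptime("2015-12-31T00:02:00", FMT)

# Transit time: predicted arrival minus onset, in hours.
transit_hours = (predicted - onset).total_seconds() / 3600.0
# Prediction error: predicted arrival minus actual arrival, in hours.
error_hours = (predicted - actual).total_seconds() / 3600.0

print(f"{transit_hours:.1f}")  # 54.3
print(f"{error_hours:.1f}")    # -5.5
```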
[Figure 7: panels (a) and (b).]
Alternatively, one can use the well-designed UI by running the command python2 cat_puma_qt.py. A proper run requires the additional Python library PyQt5 to be installed. Let us illustrate how this UI can be used with another example CME that erupted at 2016-04-10T11:12 UT. Again, this event was not included in our input dataset when constructing CAT-PUMA. Figure 7a shows the UI and corresponding CME parameters for this event. Average speed (543 km s−1), final speed (547 km s−1), angular width (136°) and the MPA (25°) were obtained from the SOHO LASCO CME Catalog. The mass of the CME was estimated by the built-in function “cme_mass.pro” in the SolarSoft IDL. It turns out to be ∼ 4.6 × 10^15 g. By checking the option “Automatically Obtain Solar Wind Parameters”, solar wind parameters are obtained automatically from the OMNIWeb Plus website (https://omniweb.gsfc.nasa.gov/) after clicking the “Submit” button. Then, the actual values of the solar wind parameters are shown. Parameters that are not available from the OMNIWeb Plus website are set to 0.00001 (manual input of these parameters is then needed in this case; near real-time solar wind data can be downloaded from the CDAWeb website https://cdaweb.sci.gsfc.nasa.gov/istp_public/). Figure 7b shows the prediction result for the above CME, revealing an error of 5.2 hours.
REFERENCES
Ahmed, O. W., Qahwaji, R., Colak, T., et al. 2013, SoPh, 283, 157
Amari, S.-i., Murata, N., Muller, K.-R., Finke, M., & Yang, H. H. 1997, IEEE Transactions on Neural Networks, 8, 985
Antiochos, S. K., DeVore, C. R., & Klimchuk, J. A. 1999, ApJ, 510, 485
Biesecker, D. A., Myers, D. C., Thompson, B. J., Hammer, D. M., & Vourlidas, A. 2002, ApJ, 569, 1009
Bobra, M. G., & Couvidat, S. 2015, ApJ, 798, 135
Bobra, M. G., & Ilonidis, S. 2016, ApJ, 821, 127
Brueckner, G. E., Howard, R. A., Koomen, M. J., et al. 1995, SoPh, 162, 357
Chen, P. F. 2011, Living Reviews in Solar Physics, 8, 1
Chen, P. F., Fang, C., & Shibata, K. 2005, ApJ, 622, 1202
Chen, Y., Du, G., Feng, L., et al. 2014, ApJ, 787, 59
Chi, Y., Shen, C., Wang, Y., et al. 2016, SoPh, 291, 2419
Detman, T., Smith, Z., Dryer, M., et al. 2006, Journal of Geophysical Research (Space Physics), 111, A07102
Dryer, M., Fry, C. D., Sun, W., et al. 2001, SoPh, 204, 265
Feng, X., & Zhao, X. 2006, SoPh, 238, 167
Feng, X., Zhou, Y., & Wu, S. T. 2007, ApJ, 655, 1110
Forbes, T. G. 2000, J. Geophys. Res., 105, 23153
Géron, A. 2017, Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems
Gibson, S. E., & Low, B. C. 1998, ApJ, 493, 460
Gopalswamy, N. 2016, Geoscience Letters, 3, 8
Gopalswamy, N., Lara, A., Yashiro, S., Nunes, S., & Howard, R. A. 2003, in ESA Special Publication, Vol. 535, Solar Variability as an Input to the Earth’s Environment, ed. A. Wilson, 403
Gopalswamy, N., Yashiro, S., Michalek, G., et al. 2009, Earth Moon and Planets, 104, 295
Gopalswamy, N., Yashiro, S., Michalek, G., et al. 2010, Sun and Geosphere, 5, 7
Gui, B., Shen, C., Wang, Y., et al. 2011, SoPh, 271, 111
Hansen, R. T., Garcia, C. J., Grognard, R. J.-M., & Sheridan, K. V. 1971, Proceedings of the Astronomical Society of Australia, 2, 57
Harrison, R. A. 1995, A&A, 304, 585
Hess, P., & Zhang, J. 2017, SoPh, 292, 80
Isavnin, A., Vourlidas, A., & Kilpua, E. K. J. 2014, SoPh, 289, 2141
Jackson, B. V., Sheridan, K. V., Dulk, G. A., & McLean, D. J. 1978, Proceedings of the Astronomical Society of Australia, 3, 241
Jing, J., Yurchyshyn, V. B., Yang, G., Xu, Y., & Wang, H. 2004, ApJ, 614, 1054
Kay, C., Opher, M., & Evans, R. M. 2015, ApJ, 805, 168
Lantos, P., Kerdraon, A., Rapley, G. G., & Bentley, R. D. 1981, A&A, 101, 33
Li, R., Wang, H.-N., He, H., Cui, Y.-M., & Zhan-LeDu. 2007, ChJA&A, 7, 441
Lin, J., & Forbes, T. G. 2000, J. Geophys. Res., 105, 2375
Liu, J., Wang, Y., Shen, C., et al. 2015, ApJ, 813, 115
Liu, R., Liu, C., Wang, S., Deng, N., & Wang, H. 2010a, ApJL, 725, L84
Liu, W., Nitta, N. V., Schrijver, C. J., Title, A. M., & Tarbell, T. D. 2010b, ApJL, 723, L53
Low, B. C. 2001, J. Geophys. Res., 106, 25141
Lugaz, N., Temmer, M., Wang, Y., & Farrugia, C. J. 2017, SoPh, 292, 64
Manoharan, P. K. 2006, SoPh, 235, 345
Mays, M. L., Taktakishvili, A., Pulkkinen, A. A., et al. 2013, AGU Fall Meeting Abstracts
Mishra, W., Wang, Y., & Srivastava, N. 2016, ApJ, 831, 99
Moon, Y.-J., Dryer, M., Smith, Z., Park, Y. D., & Cho, K. S. 2002, Geophys. Res. Lett., 29, 28
Möstl, C., Amla, K., Hall, J. R., et al. 2014, ApJ, 787, 119
Nishizuka, N., Sugiura, K., Kubo, Y., et al. 2017, ApJ, 835, 156
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
Prise, A. J., Harra, L. K., Matthews, S. A., Arridge, C. S., & Achilleos, N. 2015, Journal of Geophysical Research (Space Physics), 120, 1566
Qahwaji, R., & Colak, T. 2007, SoPh, 241, 195
Qiu, J., Wang, H., Cheng, C. Z., & Gary, D. E. 2004, ApJ, 604, 900
Richardson, I. G., & Cane, H. V. 2010, SoPh, 264, 189
Riley, P., Linker, J. A., Lionello, R., & Mikic, Z. 2012, Journal of Atmospheric and Solar-Terrestrial Physics, 83, 1
Riley, P., Linker, J. A., & Mikić, Z. 2013, Journal of Geophysical Research (Space Physics), 118, 600
Robbrecht, E., Berghmans, D., & Van der Linden, R. A. M. 2009, ApJ, 691, 1222
Schwenn, R., dal Lago, A., Huttunen, E., & Gonzalez, W. D. 2005, Annales Geophysicae, 23, 1033
Sharma, R., Srivastava, N., Chakrabarty, D., Möstl, C., & Hu, Q. 2013, Journal of Geophysical Research (Space Physics), 118, 3954
Shen, C., Liao, C., Wang, Y., Ye, P., & Wang, S. 2013a, SoPh, 282, 543
Shen, C., Wang, Y., Pan, Z., et al. 2013b, Journal of Geophysical Research (Space Physics), 118, 6858
Shen, C., Wang, Y., Wang, S., et al. 2012, Nature Physics, 8, 923
Shen, F., Shen, C., Wang, Y., Feng, X., & Xiang, C. 2013c, Geophys. Res. Lett., 40, 1457
Shen, Y., Liu, Y., Su, J., & Deng, Y. 2012, ApJ, 745, 164
Smith, Z., & Dryer, M. 1990, SoPh, 129, 387
Smola, A. J., & Schölkopf, B. 2004, Statistics and Computing, 14, 199
Subramanian, P., Lara, A., & Borgazzi, A. 2012, Geophys. Res. Lett., 39, L19107
Thernisien, A. F. R., Howard, R. A., & Vourlidas, A. 2006, ApJ, 652, 763
Tóth, G., Sokolov, I. V., Gombosi, T. I., et al. 2005, Journal of Geophysical Research (Space Physics), 110, A12226
Tousey, R. 1973, in Space Research Conference, Vol. 2, Space Research Conference, ed. M. J. Rycroft & S. K. Runcorn, 713
Vandas, M., Fischer, S., Dryer, M., Smith, Z., & Detman, T. 1996, J. Geophys. Res., 101, 15645
Vapnik, V. 2013, The nature of statistical learning theory (Springer science & business media)
Vršnak, B. 2001, SoPh, 202, 173
Vršnak, B., & Cliver, E. W. 2008, SoPh, 253, 215
Vršnak, B., & Žic, T. 2007, A&A, 472, 937
Wang, Y., Shen, C., Wang, S., & Ye, P. 2004, SoPh, 222, 329
Wang, Y., Ye, P., & Wang, S. 2007, SoPh, 240, 373
Wang, Y., Zhou, G., Ye, P., Wang, S., & Wang, J. 2006, ApJ, 651, 1245
Wang, Y. M., Wang, S., & Ye, P. Z. 2002a, SoPh, 211, 333
Wang, Y. M., Ye, P. Z., Wang, S., & Xue, X. H. 2003, Geophys. Res. Lett., 30, 33
Wang, Y. M., Ye, P. Z., Wang, S., Zhou, G. P., & Wang, J. X. 2002b, Journal of Geophysical Research (Space Physics), 107, 1340
Webb, D. F., & Howard, T. A. 2012, Living Reviews in Solar Physics, 9, 3
Xie, H., Ofman, L., & Lawrence, G. 2004, Journal of Geophysical Research (Space Physics), 109, A03109
Yang, Y. H., Tian, H. M., Peng, B., Li, T. R., & Xie, Z. X. 2017, SoPh, 292, 131
Zhang, J., Cheng, X., & Ding, M.-D. 2012, Nature Communications, 3, 747
Zhang, J., Richardson, I. G., Webb, D. F., et al. 2007, Journal of Geophysical Research (Space Physics), 112, A10102
Zhao, X., & Dryer, M. 2014, Space Weather, 12, 448
Zheng, R., Chen, Y., Du, G., & Li, C. 2016, ApJL, 819, L18
Zhuang, B., Wang, Y., Shen, C., et al. 2017, ApJ, 845, 117