Predictive Models For Vertical Total Electron Content in Ionosphere
Submitted by S.PRIYA, M.Phil (Computer Science), Avinashilingam University for Women, 30/07/2011
The model will be used for predicting the value of the response variable y for any new value of the predictor x. Even descriptive models make predictions: a histogram predicts the value of the density of a distribution where a new observation happens to fall.
Figure 1 shows the proposed system design for prediction of Vertical Total Electron Content.
(Figure 1: proposed system design, with blocks for the prediction methods, e.g. KNN-AD, followed by a Select Winner step.)
The research methodology consists of three stages:
Stage 1: Preparation of the input data into training and testing sets
Stage 2: Prediction of future values
Stage 3: Comparison of the results with respect to prediction efficiency
In stage 1, given an input dataset (X) with n VTEC values obtained over a period of time t, the proposed methodology first groups X into two sets, namely a training dataset and a testing dataset. To make meaningful forecasts, the predictors have to be trained on an appropriate data series. Data in the form of <input, output> pairs are extracted from X, where input and output are vectors equal in size to the number of network inputs and outputs, respectively. In the present research work, a 70%/30% division was adopted, that is, 70% of the records in X are taken as training data and the remaining 30% as testing data.
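As an illustration of stage 1, the following minimal NumPy sketch builds <input, output> pairs from a VTEC series and splits them 70%/30%. The function name, the window-length argument, and the default split fraction are assumptions made for this example, not code from the thesis.

```python
# Sketch of Stage 1: turn a VTEC series into <input, output> pairs and split
# them 70% / 30% (window length and names are illustrative).
import numpy as np

def prepare_pairs(series, n_inputs, train_frac=0.70):
    """Return (train_pairs, test_pairs) of <input, output> examples."""
    x = np.asarray(series, dtype=float)
    pairs = [(x[i:i + n_inputs], x[i + n_inputs])   # n_inputs past values -> next value
             for i in range(len(x) - n_inputs)]
    n_train = int(len(pairs) * train_frac)          # first 70% of the pairs for training
    return pairs[:n_train], pairs[n_train:]
```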
PREDICTION METHODS
1. Prediction using K Nearest Neighbor with Correlation method
2. Prediction using K Nearest Neighbor with Absolute Distance method
3. Prediction using Neural Network method
4. Prediction using Linear Predictive Coding
DATASET USED
The TEC data are obtained using the dual-frequency GPS receivers installed for the satellite navigation project of ISRO called GAGAN. These TEC data were recorded at Agatti in January 2008 with an elevation angle of 60°.
The TEC dataset used in this research work contains 1065 records with the columns IST time, latitude, longitude, VTEC (Vertical Total Electron Content) and STEC (Slant Total Electron Content). The VTEC values are used for prediction. From the dataset, 70% of the records are taken for training and 30% for testing.
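A minimal sketch of reading such a dataset and making the 70%/30% split in time order is given below; the file name and the column label "VTEC" are assumptions for illustration and may differ from the actual data files.

```python
# Sketch of loading the TEC records and splitting the VTEC series 70% / 30%
# (file name and column label are assumed for illustration).
import pandas as pd

def load_and_split(path="vtec_jan2008.csv", train_frac=0.70):
    df = pd.read_csv(path)                  # columns: IST time, latitude, longitude, VTEC, STEC
    vtec = df["VTEC"].to_numpy()            # only VTEC values are used for prediction
    n_train = int(len(vtec) * train_frac)   # first 70% of the series for training
    return vtec[:n_train], vtec[n_train:]   # (training set, testing set)
```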
PREDICTION USING K NEAREST NEIGHBOR WITH CORRELATION METHOD
Input:
X - input vector of VTEC values
D - defines where to start the forecasts; specify the column number at which to predict the VTEC values
M - embedding dimension
k - number of nearest neighbors to use in the forecast calculation
Output:
Insamp_for_corr - predicted VTEC values
Procedure:
The K Nearest Neighbor method is usually based on the identification of several historic neighbors that may then be used for forecasting, either by averaging their contribution or by using an extrapolation method. The accuracy of the method depends directly on the ability to identify good neighbors.
Steps for KNN Correlation:
Step 1: Define a starting training period and divide this period into different VTEC vectors represented as $X_t^m$, $t = 1, \dots, T$, where T is the number of observations in the training period. The term m is also defined as the embedding dimension of the time series. The training VTEC vector is denoted as $X_1^{m+k_1}$ and the testing vector as $X^{m+k_2}$, where $k_2$ is the number of predictions to be made.
Step 2: Select the $k_1$ observations that are most similar to the training VTEC vector $Y_t^m$. For the correlation method, in formal notation, the search is for the k pieces with the highest value of $|\rho|$, which represents the absolute correlation between $Y_i^m$ and $Y_t^m$.
Step 3: With the $k_1$ data on hand, for each observation it is necessary to determine how the k vectors can be used to construct the forecast at t+1. The absolute distance method simply takes the observations one step ahead of the k chosen neighbors and averages them. The absolute correlation is calculated using Equation (1):
$|\rho(X_i^m, X_T^m)| + |\rho(X_i^m, Y_T^m)|$    (1)
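The steps above can be condensed into a short NumPy sketch. This is an illustrative one-step-ahead forecaster, assuming the average-of-neighbors rule; the function and variable names are invented for the example and it is not the thesis code.

```python
# Sketch of KNN forecasting with a correlation similarity measure: find the k1
# past m-histories most correlated with the most recent one and average the
# observations that followed them.
import numpy as np

def knn_corr_forecast(series, m, k1):
    x = np.asarray(series, dtype=float)
    target = x[-m:]                                  # most recent m-history
    scores = []
    for s in range(len(x) - m):                      # histories with a known next value
        hist = x[s:s + m]
        corr = np.corrcoef(hist, target)[0, 1]       # correlation with the target history
        scores.append((abs(corr), x[s + m]))         # keep |corr| and the following value
    scores.sort(key=lambda p: p[0], reverse=True)    # highest absolute correlation first
    return float(np.mean([nxt for _, nxt in scores[:k1]]))
```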
PREDICTION USING K NEAREST NEIGHBOR WITH ABSOLUTE DISTANCE METHOD
Input:
X - input vector of VTEC values
D - defines where to start the forecasts; specify the column number at which to predict the VTEC values
M - embedding dimension
k - number of nearest neighbors to use in the forecast calculation
Output:
Insamp_For_Abs - predicted VTEC values
Procedure:
The steps of the KNN Absolute Distance algorithm are presented below.
Step 1: Define a starting training period and divide this period into different VTEC vectors represented as $X_t^m$. The term m is also defined as the embedding dimension of the time series. The training VTEC vector is denoted as $X_1^{m+k_1}$ and the testing vector as $X^{m+k_2}$, where $k_2$ is the number of predictions to be made.
Step 2: Select the $k_1$ observations that are most similar to the training VTEC vector $X_1^{m+k_1}$.
Step 3: With the $k_1$ data on hand, for each observation it is necessary to determine how the k vectors can be used to construct the forecast at t+1. The absolute distance method simply takes the observations one step ahead of the k chosen neighbors and averages them. Steps 1-3 are executed in a loop until all forecasts at t+1 have been created.
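For comparison with the correlation-based sketch above, the following variant uses the absolute distance between m-histories as the similarity measure. It is again an illustrative sketch rather than the thesis implementation.

```python
# Same KNN forecast, but neighbors are ranked by absolute (L1) distance to the
# most recent m-history instead of by correlation.
import numpy as np

def knn_abs_forecast(series, m, k1):
    x = np.asarray(series, dtype=float)
    target = x[-m:]
    cands = [(np.sum(np.abs(x[s:s + m] - target)), x[s + m])   # (distance, next value)
             for s in range(len(x) - m)]
    cands.sort(key=lambda p: p[0])                             # smallest distance first
    return float(np.mean([nxt for _, nxt in cands[:k1]]))      # average of k1 neighbors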
PREDICTION USING NEURAL NETWORK METHOD
Input:
y - a time series in vertical vector form (VTEC values)
maxlag - maximum lag that should be entered in the model (e.g. 5)
nhiden - number of hidden layer units
trset - percentage of observations used for the training set
HPF - number of periods that should be forecasted
lr - learning rate; a control parameter of some training algorithms, which controls the step size when weights are iteratively adjusted
Outputs:
RMSE - root mean squared error
yL - matrix of y's lags
minRMSE - minimum of the root mean squared error
yf - forecast values of VTEC
Procedure:
The backpropagation training stage uses the training VTEC dataset and trains the network in three steps:
1. Present an input VTEC vector to the network for training. Compute the activation functions sequentially forward from the first hidden layer to the output layer (Figure 4.2, from layer A to layer C).
2. Compute the difference between the desired output and the actual network output (the output of the unit(s) in the output layer). Propagate the error sequentially backward from the output layer to the first hidden layer (Figure 4.2, from layer C to layer A).
3. For every connection, change the weight modifying that connection in proportion to the error.
When these three steps have been performed for every input in the training dataset, one epoch has occurred. Training usually lasts thousands of epochs, possibly until a predetermined maximum number of epochs (epoch limit) is reached or the network output error falls below an acceptable threshold (error limit). Training can be time-consuming, depending on the network size, the number of examples, the epoch limit, and the error limit.
In the first step, when a given input VTEC vector is presented, the output of each unit's activation function is computed for each layer, starting with the first hidden layer (Equations 4.1 and 4.2); in this way the network propagates values through all units to the output(s). In the second step, an error term is computed for each layer and each unit in that layer. For each unit in the output layer, the error term in Equation (2) is computed.
$\delta_c = h'_{\mathrm{Output}}(x)\,(D_c - O_c)$    (2)
where $D_c$ is the desired network output (from the output vector) corresponding to the current output layer unit, $O_c$ is the actual network output corresponding to the current output layer unit, and $h'_{\mathrm{Output}}(x)$ is the derivative of the output unit's linear activation function, i.e., 1. For each unit in the hidden layers, the error term in Equation (3) is computed.
$\delta_c = h'_{\mathrm{Hidden}}(x)\,\sum_{n=1}^{N} \delta_n\, w_{n,c}$    (3)
where N is the number of units in the next layer (either a hidden or the output layer), $\delta_n$ is the error term for a unit in the next layer, and $w_{n,c}$ is the weight modifying the connection from unit c to unit n. The derivative of the hidden unit's sigmoid activation function, $h'_{\mathrm{Hidden}}(x)$, is $O_k(1 - O_k)$.
In the third step, for each connection, the value of Equation (4), which gives the change in the weight modifying the connection from unit p to unit c, is computed and added to the weight.
$\Delta w_{c,p} = \eta\,\delta_c\,O_p$    (4)
where $w_{c,p}$ is the weight modifying the connection from unit p to unit c, $\eta$ is the learning rate (discussed later), and $O_p$ is the output of unit p (or the network input p). Thus, after step three, the weights have different values. The goal of backpropagation training is to converge to a near-optimal solution based on the total squared error (Equation 5).
$E = \frac{1}{2}\sum_{c=1}^{C} (D_c - O_c)^2$    (5)
where C is the number of units in the output layer, $D_c$ is the desired output corresponding to the current output layer unit, and $O_c$ is the actual network output corresponding to the current output layer unit.
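The three training steps and Equations (2)-(5) can be condensed into a small NumPy sketch. It assumes one hidden layer of sigmoid units, a single linear output unit, and per-pattern weight updates, with the <input, output> pairs coming from Stage 1; it is an illustrative sketch, not the network implementation used in this work.

```python
# Illustrative backpropagation sketch: one hidden layer of sigmoid units and a
# linear output unit, trained with per-pattern updates (Equations 2-5).
import numpy as np

def train_backprop(X, t, nhidden=8, lr=0.01, epochs=1000):
    """X: array of input vectors (lagged VTEC values); t: array of target values."""
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], nhidden))   # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=nhidden)                  # hidden -> output weights
    for _ in range(epochs):
        for x, d in zip(X, t):
            h = 1.0 / (1.0 + np.exp(-x @ W1))           # step 1: forward pass (sigmoid hidden units)
            o = h @ W2                                   # linear output unit
            delta_out = 1.0 * (d - o)                    # step 2: Eq. (2), output derivative = 1
            delta_hid = h * (1 - h) * (W2 * delta_out)   # Eq. (3), sigmoid derivative O(1-O)
            W2 += lr * delta_out * h                     # step 3: Eq. (4), output layer weights
            W1 += lr * np.outer(x, delta_hid)            # Eq. (4), hidden layer weights
    return W1, W2

def forecast(X, W1, W2):
    """One-step forecasts for a matrix of input vectors."""
    h = 1.0 / (1.0 + np.exp(-X @ W1))
    return h @ W2
```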
PREDICTION USING LINEAR PREDICTIVE CODING
LPC uses linear prediction to extrapolate data, typically a time series; this is not the same as linear extrapolation. A window of autocorrelation coefficients is moved beyond the data limits to extrapolate the data.
Input:
X - the input VTEC data series as a column vector, or a matrix with series organized in columns
np - the number of predictor coefficients to use (>= 2)
npred - the number of data values to return in the output
pos - a string 'pre' or 'post' (default: 'post'); determines whether extrapolation occurs before or after the observed series x
Output:
y - the output, appropriately sequenced for concatenation with the input x. The output y is calculated using the formula
y(k) = -a(2)*y(k-1) - a(3)*y(k-2) - ... - a(np)*y(k-np)
where y(n) => x(end-n) for n <= 0
a - the coefficients returned by LPC (organized in rows). These can be used to check the quality/stability of the fit of the LPC function to the observed data.
Procedure:
Given x(n-1), x(n-2), ..., x(n-M), the problem here is to predict the VTEC value denoted as x(n). In LPC, this predicted VTEC value can be expressed as a linear function of the given M past samples (Equation 6).
$\hat{x}(n \mid n-1, n-2, \dots, n-M) = f(x(n-1), x(n-2), \dots, x(n-M))$    (6)
When a value is predicted using the above equation, it is said to be predicted linearly. The above equation can be rewritten using an M-dimensional coefficient vector (Equation 7):
$\hat{x}(n \mid n-1, n-2, \dots, n-M) = \sum_{k=1}^{M} a_k\, x(n-k)$    (7)
Using Equation (7), the VTEC values are predicted, where the $a_k$ are constant coefficients. The prediction error for VTEC is defined as
$f_M(n) = x(n) - \hat{x}(n \mid n-1, n-2, \dots, n-M)$    (8)
where M is the number of past samples used to predict the next VTEC value.
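A minimal NumPy sketch of this extrapolation is given below. Unlike a typical lpc() routine, it fits the predictor coefficients by ordinary least squares rather than the Levinson-Durbin recursion, and the function name and arguments are assumptions made for illustration.

```python
# Sketch of LPC-style extrapolation (Equations 7-8): fit an order-M linear
# predictor to the observed VTEC series and recursively extrapolate npred values.
import numpy as np

def lpc_extrapolate(x, M, npred):
    x = np.asarray(x, dtype=float)
    # Regression rows: [x(n-1), x(n-2), ..., x(n-M)] for each observed x(n).
    rows = np.array([x[n - M:n][::-1] for n in range(M, len(x))])
    targets = x[M:]
    a, *_ = np.linalg.lstsq(rows, targets, rcond=None)        # a[k-1] multiplies x(n-k)
    history = list(x[-M:])                                     # last M observed values
    preds = []
    for _ in range(npred):
        nxt = sum(a[k] * history[-(k + 1)] for k in range(M))  # Equation (7)
        preds.append(nxt)
        history.append(nxt)                                    # feed the forecast back in
    return np.array(preds), a
```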