Statistical Estimation
Statistical Estimation
Introduction
Nowadys, you can often meet articles and publications written on subjects connected with
econometrics, forecasting of price series, choosing and estimating adequacy of a model, etc. But
in most cases, reasoning is based on assumption, that a reader is acquainted with the methods of
math statistics and can easily estimate statistical parameters of an analyzed sequence.
The article shortly describes the simplest statistical parameters of a random sequence and several
methods of its visual analysis. It offers the implementation of these methods in MQL5 and the
methods of visualization of the result of calculations using the Gnuplot application. By no means
has this article pretended to be a manual or a reference; that is why it may contain certain
familiarities accepted regarding the terminology and definitions.
Avoiding Outliers
Before starting the statistical estimation of parameters, we should note that the accuracy of
estimation may be insufficient if the sampling contains gross errors (outliers). There is a huge
influence of outliers on the accuracy of estimations if the sampling has a small volume. Outliers
are the values that abnormally diverge from the middle of distribution. Such deviations can be
caused by different hardly probable events and errors appeared while gathering statistical data
and forming the sequence.
It is hard to make a decision whether to filter outliers or not, since in most cases it is impossible
to clearly detect whether a value is an outlier or belongs to the analyzed process. So if outliers
are detected and there's a decision to filter them, then a question arises - what should we do
with those error values? The most logical variant is excluding from the sampling, what may
increase the accuracy of estimation of statistical characteristics; but you should be careful with
excluding outliers from sampling when working with time sequences
To have a possibility of excluding outliers from a sampling or at least detecting them, let's
implement the algorithm described in the book "Statistics for Traders" written by S.V. Bulashev.
According to this algorithm, we need to calculate five values of estimation of the center of
distribution:
1. Median;
2. Center of 50% interquartile range (midquartile range, MQR);
3. Arithmetical mean of entire sampling;
4. Arithmetical mean on the 50% interquartile range (interquartile mean, IQM);
5. Center of range (midrange) - determined as the average value of the maximum and
minimum value in a sampling.
Then the results of estimation of the center of distribution are sorted in ascending order; and
then the average value or the third one in order is chosen as the center of distribution Xcen.
Thus, the chosen estimation appears to be minimally affected by outliers.
Further, using the obtained estimation of the center of distribution Xcen, let's calculate the
standard deviation s, the excess K and the rate of censoring according to the empirical formula:
will be counted as outliers, thus they should be excluded from the sampling.
This method is described in details in the "Statistics for Traders" book, so let's go straight to
implementation of the algorithm. The algorithm that allows detecting and excluding outliers is
implemented in the erremove() function.
Below you can find the script written for testing this function.
//-----------------------------------------------------------------
-----------
// erremove.mq5
// Copyright 2011, MetaQuotes Software
Corp.
//
http://www.mql5.com
//-----------------------------------------------------------------
-----------
#property copyright "Copyright 2011, MetaQuotes Software Corp."
#property link "http://www.mql5.com"
#property version "1.00"
#import "shell32.dll"
bool ShellExecuteW( int hwnd, string lpOperation, string lpFile,
string lpParameters, string lpDirectory, int nShowCmd);
#import
//-----------------------------------------------------------------
-----------
// Script program start function
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 100 ];
double y[];
srand ( 1 );
for (i= 0 ;i< ArraySize(dat);i++)dat[i]= rand()/ 16000.0 ;
dat[ 25 ]= 3 ; // Make Error !!!
erremove(dat,y, 1 );
}
//-----------------------------------------------------------------
-----------
int erremove( const double &x[], double &y[], int visual= 1 )
{
int i,m,n;
double a[],b[ 5 ];
double dcen,kurt,sum2,sum4,gs,v,max,min;
if (! ArrayIsDynamic(y)) // Error
{
Print ( "Function erremove() error!" );
return (- 1 );
}
n= ArraySize(x);
if (n< 4 ) // Error
{
Print ( "Function erremove() error!" );
return (- 1 );
}
ArrayResize (a,n);
ArrayCopy(a,x);
ArraySort(a);
b[ 0 ]=(a[ 0 ]+a[n- 1 ])/ 2.0 ; // Midrange
m=(n- 1 )/ 2 ;
b[ 1 ]=a[m]; // Median
if ((n& 0x01)== 0 )b[ 1 ]=(b[ 1 ]+a[m+ 1 ])/ 2.0 ;
m=n/ 4 ;
b[ 2 ]=(a[m]+a[n-m- 1 ])/ 2.0 ; // Midquartile range
b[ 3 ]= 0 ;
for (i=m;i<n-m;i++)b[ 3 ]+=a[i]; // Interquartile mean(IQM)
b[ 3 ]=b[ 3 ]/(n- 2 *m);
b[ 4 ]= 0 ;
for (i= 0 ;i<n;i++)b[ 4 ]+=a[i]; // Mean
b[ 4 ]=b[ 4 ]/n;
ArraySort(b);
dcen=b[ 2 ]; // Distribution center
sum2= 0 ; sum4= 0 ;
for (i= 0 ;i<n;i++)
{
a[i]=a[i]-dcen;
v=a[i]*a[i];
sum2+=v;
sum4+=v*v;
}
if (sum2< 1 .e- 150 )kurt= 1.0 ;
kurt=((n*n- 2 *n+ 3 )*sum4/sum2/sum2-( 6.0 *n- 9.0 )/n)*(n- 1.0 )/(n- 2.0 )/(n- 3.0 ); //
Kurtosis
if (kurt< 1.0 )kurt= 1.0 ;
gs=( 1.55+ 0.8 * MathLog10(( double )n/ 10.0)* MathSqrt (kurt- 1 ))* MathSqrt (sum2/(n- 1
));
max=dcen+gs;
min=dcen-gs;
m= 0 ;
for (i= 0 ;i<n;i++) if (x[i]<=max&&x[i]>=min)a[m++]=x[i];
ArrayResize (y,m);
ArrayCopy(y,a, 0 , 0 ,m);
if (visual== 1 )vis(x,dcen,min,max,n-m);
return (n-m);
}
//-----------------------------------------------------------------
-----------
void vis( const double &x[], double dcen, double min, double max, int numerr)
{
int i;
double d,yma,ymi;
string str;
yma=x[ 0 ];ymi=x[ 0 ];
for (i= 0 ;i< ArraySize(x);i++)
{
if (yma<x[i])yma=x[i];
if (ymi>x[i])ymi=x[i];
}
if (yma<max)yma=max;
if (ymi>min)ymi=min;
d=(yma-ymi)/ 20.0;
yma+=d;ymi-=d;
str= "unset key\n" ;
str+= "set title 'Sequence and error levels (number of errors = "+
( string )numerr+ ")' font ',10'\n" ;
str+= "set yrange [" +( string )ymi+ ":" +( string )yma+ "]\n" ;
str+= "set xrange [0:"+( string ) ArraySize(x)+ "]\n" ;
str+= "plot " +( string )dcen+ " lt rgb 'green',";
str+=( string )min+ " lt rgb 'red'," ;
str+=( string )max+ " lt rgb 'red'," ;
str+= "'-' with line lt rgb 'dark-blue'\n" ;
for (i= 0 ;i< ArraySize(x);i++)str+=( string )x[i]+ "\n";
str+= "e\n" ;
if (!saveScript(str)){ Print ( "Create script file error" ); return ;}
if (!grPlot()) Print ( "ShellExecuteW() error" );
}
//-----------------------------------------------------------------
-----------
bool grPlot()
{
string pnam,param;
pnam= "GNUPlot\\binary\\wgnuplot.exe" ;
param= "-p MQL5\\Files\\gplot.txt" ;
return (ShellExecuteW( NULL, "open" ,pnam,param, NULL, 1 ));
}
//-----------------------------------------------------------------
-----------
bool saveScript( string scr1= "" , string scr2= "" )
{
int fhandle;
fhandle= FileOpen ( "gplot.txt" , FILE_WRITE | FILE_TXT | FILE_ANSI);
if (fhandle== INVALID_HANDLE) return ( false );
FileWriteString (fhandle, "set terminal windows enhanced size 560,420 font
8\n");
FileWriteString (fhandle,scr1);
if (scr2!= "" ) FileWriteString (fhandle,scr2);
FileClose(fhandle);
return ( true);
}
//-----------------------------------------------------------------
-----------
Let's take a detailed look at the erremove() function. As the first parameter of the function we
pass the address of the array x[], where the values of the analyzed sampling are stored; the
sampling volume must be no less than four elements. It is supposed that the size of the x[] array
is equal to the sampling size, that's why the N value of volume of the sampling is not passed.
Data located in the x[] array is not changed as a result of execution of the function.
The next parameter is the address of the y[] array. In case of successful execution of the
function, this array will contain the input sequence with outliers excluded. Size of the y[] array is
less than the size of the x[] array by the number of values excluded from the sampling. The y[]
array must be declared as a dynamic one, otherwise it will be impossible to change its size in the
body of the function.
The last (optional) parameter is the flag responsible for the visualization of the calculation results.
If its value is equal to one (default value), then before the end of execution of the function the
chart displaying the following information will be drawn in a separate window: the input
sequence, the line of center of distribution and the limits of range, values outside of which will be
considers as outliers.
The method of drawing charts will be described later. In case of successful execution, the
function returns the number of values excluded from the sampling; in case of an error, it returns
-1. If no error values (outliers) are discovered, the function will return 0 and the sequence in the
y[] array will be the same as in x[].
In the beginning of the function, the information is copied from the x[] to the a[] array, then it is
sorted in ascending order, and then five estimations of the center of distribution are made.
The middle of range (midrange) is determined as the sum of extreme values of the sorted
array a[] divided by two.
Considering that indexes of the sorted array a[] start from zero, we get:
m=(n- 1 )/ 2 ;
median=a[m];
if ((n& 0x01)== 0 )b[ 1 ]=(median+a[m+ 1 ])/ 2.0 ;
m=n/ 4 ;
MQR=(a[m]+a[n-m- 1 ])/ 2.0 ; // Midquartile range
Arithmetical means of the 50% interquartile range (interquartile mean, IQM). 25% of
samples are cut from both sides of the sampling, and the remaining 50% are used for calculation
of the arithmetical mean:
m=n/ 4 ; IQM= 0 ;
for (i=m;i<n-m;i++)IQM+=a[i];
IQM=IQM/(n- 2 *m); // Interquartile mean(IQM)
Each of the determined values is written to the b[] array, and then the array is sorted in
ascending order. An element value of the b[2] array is chosen as the center of the distribution.
Further, using this value, we will calculate the unbiased estimations of the arithmetical mean and
the coefficient of excess; the algorithm of calculation will be described later.
The obtained estimations are used for calculation of the coefficient of censoration and limits of
the range for detecting outliers (the expressions are shown above). In the end, the sequence
with excluded outliers is formed in the y[] array, and the vis() functions is called for drawing the
graph. Let's take a short look at the method of visualization used in this article.
Visualization
To display the results of calculation, I use the freeware application gnuplot intended for making
various 2D and 3D graphs. Gnuplot has the possibility of displaying charts on the screen (in a
separate window) or writing them to a file in different graphic formats. The commands of plotting
charts can be executed from a preliminary prepared text file. The official web page of the gnuplot
project is - gnuplot.sourceforge.net. The application is multi-platform, and it is distributed both as
the source code files and as binary files compiled for a certain platform.
The examples written for this article were tested under Windows XP SP3 and the 4.2.2 version of
gnuplot. The gp442win32.zip file can be downloaded at
http://sourceforge.net/projects/gnuplot/files/gnuplot/4.4.2/. I haven't tested the examples with
other versions and builds of gnuplot.
Once you download the gp442win32.zip archive, unzip it. As a result, the \gnuplot folder is
created, it contains the application, the help file, documentation and examples. To interact with
applications, put the entire \gnuplot folder to the root folder of your MetaTrader 5 client terminal.
Once the folder is moved, you can change the operability of the gnuplot application. To do it,
execute the file \gnuplot\binary\wgnuplot.exe, and then, as the "gnuplot>" command prompt
appears, type the "plot sin(x)" command. As a result, a window with the sin(x) function drawn in
it should appear. Also you can try the examples included in the application delivery; to do it,
choose the File\Demos item and select the file \gnuplot\demo\all.dem.
Now as you start the erremove.mq5 script, the graph demonstrated in the figure 2 will be draw
in a separate window:
Figure 2. The graph that is drawn using the erremove.mq5 script.
Further in the article, we are going to talk little about some feature of using gnuplot, since the
information about the program and its controls can be easily found in the documentation, which
is delivered together with it, and at various websites, such as http://gnuplot.ikir.ru/.
The examples of programs written for this article use a simplest method of interaction with
gnuplot for drawing the charts. At first, the text file gplot.txt is created; it contains the gnuplot
commands and information to be displayed. Then the wgnuplot.exe application is started with the
name of that file passed as an argument in the command line. The wgnuplot.exe application is
called using the ShellExecuteW() function imported from the system library shell32.dll; it's the
reason why the import of external dlls must be allowed in the client terminal.
The given version of gnuplot allows drawing charts in a separate window for two types of
terminals: wxt and windows. The wxt terminal uses the algorithms of antialiasing for drawing of
charts, what allows obtaining a higher quality picture comparing to the windows terminal.
However, the windows terminal was used for writing the examples for this article. The reason is
when working with the windows terminal, the system process created as a result of the
"wgnuplot.exe -p MQL5\\Files\\gplot.txt" call and opening of a graph window is automatically
killed as the window is closed.
If you choose the wxt terminal, then when you close the chart window, the system process
wgnuplot.exe will not be shut down automatically. Thus, if you use the wxt terminal and call
wgnuplot.exe for many times as described above, multiple processes without any signs of activity
may accumulate in the system. Using the "wgnuplot.exe -p MQL5\\Files\\gplot.txt" call and the
windows terminal, you can avoid opening of unwanted additional window and appearing of
unclosed system processes.
The window, where the chart is displayed, is interactive and it processes the mouse click and
keyboard events. To get the information about default hotkeys, run wgnuplot.exe, choose a type
of terminal using the "set terminal windows" command and plot any chart, for example using the
"plot sin(x)" command. If the chart window is active (in focus), then you'll see a tip displayed in
the text window of wgnuplot.exe as soon as you press the "h" button.
Estimation of Parameters
After the short acquaintance with the method of chart drawing, let's return to the estimation of
parameters of the general population on the basis of its finite sampling. Supposing that no
statistical parameters of the general population are known, we are going to use only unbiased
estimations of these parameters.
The estimation of mathematical expectation or the sampling mean can be considered as the main
parameter that determines the distribution of a sequence. The sampling mean is calculated using
the following formula:
The mean value is an estimation of the center of distribution and it is used for calculation of
other parameters connected with central moments, what makes this parameter especially
important. In addition to the mean value, we will use the estimation of dispersion (dispersion,
variance), the standard deviation, the coefficient of skewness (skewness) and the coefficient of
excess (kurtosis) as statistical parameters.
The second, third and fourth selective central moments are determined by the following
expressions:
But those values are unbiased. Here we should mention k-Statistic and h-Statistic. Under certain
conditions they allow obtaining unbiased estimations of central moments, so they can be used for
calculation of unbiased estimations of dispersion, standard deviation, skewness and kurtosis.
Note that the calculation of the fourth moment in the k and h estimations is performed in
different ways. It results in obtaining different expressions for the estimation of kurtosis when
using k or h. For example, in Microsoft Excel the excess is calculated using the formula that
corresponds to the use of k-estimations, and in the "Statistics for Traders" book, the unbiased
estimation of kurtosis is done using the h-estimations.
Let's choose the h-estimations, and then by substituting them instead of 'm' in previously given
expression, we will calculate the necessary parameters.
Skewness:
Kurtosis:
The coefficient of excess (kurtosis) calculated according to the given expression for the sequence
with normal distribution law is equal to 3.
You should pay attention that the value obtained by subtracting 3 from the calculated value is
often used as the kurtosis value; thus the obtained value is normalized relatively to the normal
distribution law. In the first case, this coefficient is called kurtosis; in the second case, it is called
"excess kurtosis".
The calculation of parameters according to the given expression is performed in the dStat()
function:
struct statParam
{
double mean;
double median;
double var;
double stdev;
double skew;
double kurt;
};
//-----------------------------------------------------------------
-----------
int dStat( const double &x[],statParam &sP)
{
int i,m,n;
double a,b,sum2,sum3,sum4,y[];
ZeroMemory (sP); // Reset sP
n= ArraySize(x);
if (n< 4 ) // Error
{
Print ( "Function dStat() error!" );
return (- 1 );
}
sP.kurt= 1.0 ;
ArrayResize (y,n);
ArrayCopy(y,x);
ArraySort(y);
m=(n- 1 )/ 2 ;
sP.median=y[m]; // Median
if ((n& 0x01)== 0 )sP.median=(sP.median+y[m+ 1 ])/ 2.0 ;
sP.mean= 0 ;
for (i= 0 ;i<n;i++)sP.mean+=x[i];
sP.mean/=n; // Mean
sum2= 0 ;sum3= 0 ;sum4= 0 ;
for (i= 0 ;i<n;i++)
{
a=x[i]-sP.mean;
b=a*a;sum2+=b;
b=b*a;sum3+=b;
b=b*a;sum4+=b;
}
if (sum2< 1 .e- 150 ) return ( 1 );
sP.var=sum2/(n- 1 ); // Variance
sP.stdev= MathSqrt (sP.var); // Standard deviation
sP.skew=n*sum3/(n- 2 )/sum2/sP.stdev; // Skewness
sP.kurt=((n*n- 2 *n+ 3 )*sum4/sum2/sum2-( 6.0 *n- 9.0 )/n)*
(n- 1.0 )/(n- 2.0 )/(n- 3.0 ); // Kurtosis
return ( 1 );
When dStat() is called, the address of the x[] array is passed to the function. It contains the
initial data and the link to the statParam structure, which will contain calculated values of the
parameters. In case of an error occurring when there are less than four elements in the array,
the function returns -1.
Histogram
In addition to the parameters calculated in the dStat() function, the law of distribution of the
general population is of a big interest for us. To visually estimate the distribution law on the finite
sampling, we can draw a histogram. When drawing the histogram, the range of values of the
sampling is divided into several similar sections. And then the number of elements in each section
is calculated (group frequencies).
Further, a bar diagram is drawn on the basis of group frequencies. It is called a histogram. After
normalizing to the range width, the histogram will represent an empiric density of distribution of
a random value. Let's use the empiric expression described in the "Statistics for Traders" for
determining an optimal number of sections for drawing the histogram:
where L is the required number of sections, N is the volume of sampling and e is the kurtosis.
Below you can find the dHist(), which determines the number of sections, calculates the number
of elements in each of them and normalizes obtained group frequencies.
struct statParam
{
double mean;
double median;
double var;
double stdev;
double skew;
double kurt;
};
//-----------------------------------------------------------------
-----------
int dHist( const double &x[], double &histo[], const statParam &sp)
{
int i,k,n,nbar;
double a[],max,s,xmin;
if (! ArrayIsDynamic(histo)) // Error
{
Print ( "Function dHist() error!" );
return (- 1 );
}
n= ArraySize(x);
if (n< 4 ) // Error
{
Print ( "Function dHist() error!" );
return (- 1 );
}
nbar=(sp.kurt+ 1.5 )* MathPow (n, 0.4 )/ 6.0 ;
if ((nbar& 0x01)== 0 )nbar--; if (nbar< 5 )nbar= 5 ; // Number of bars
ArrayResize (a,n);
ArrayCopy(a,x);
max= 0.0 ;
for (i= 0 ;i<n;i++)
{
a[i]=(a[i]-sp.mean)/sp.stdev; // Normalization
if ( MathAbs (a[i])>max)max= MathAbs (a[i]);
}
xmin=-max;
s= 2.0 *max*n/nbar;
ArrayResize (histo,nbar+ 2 );
ArrayInitialize (histo, 0.0 );
histo[ 0 ]= 0.0 ;histo[nbar+ 1 ]= 0.0 ;
for (i= 0 ;i<n;i++)
{
k=(a[i]-xmin)/max/ 2.0 *nbar;
if (k>(nbar- 1 ))k=nbar- 1 ;
histo[k+ 1 ]++;
}
for (i= 0 ;i<nbar;i++)histo[i+ 1 ]/=s;
return ( 1 );
}
The address of the x[] array is passed to the function. it contains the initial sequence. The
content of the array is not changed as a result of execution of the function. The next parameters
is the link to the histo[] dynamic array, where the calculated values will be stored. The number of
elements of that array will correspond to the number of sections used for the calculation plus two
elements.
One element containing zero value is added to the beginning and to the end of the histo[] array.
The last parameter is the address of the statParam structure that should contain the previously
calculated values of the parameters stored in it. In case the histo[] array passed to the function
is not a dynamic array or the input array x[] contains less than four elements, the function stops
its execution and returns -1.
Once you've drawn a histogram of obtained values, you can visually estimate whether the
sampling corresponds to the normal law of distribution. For a more visual graphical representation
of the correspondence to the normal law of distribution, we can draw a graph with the scale of
normal probability (Normal Probability Plot) in addition to the histogram.
To calculate values required for drawing the graph of normal probability, the function dRankit()
shown below is used.
struct statParam
{
double mean;
double median;
double var;
double stdev;
double skew;
double kurt;
};
//-----------------------------------------------------------------
-----------
int dRankit( const double &x[], double &resp[], double &xscale[], const statParam &sp)
{
int i,n;
double np;
if (! ArrayIsDynamic(resp)||! ArrayIsDynamic(xscale)) // Error
{
Print ( "Function dHist() error!" );
return (- 1 );
}
n= ArraySize(x);
if (n< 4 ) // Error
{
Print ( "Function dHist() error!" );
return (- 1 );
}
ArrayResize (resp,n);
ArrayCopy(resp,x);
ArraySort(resp);
for (i= 0 ;i<n;i++)resp[i]=(resp[i]-sp.mean)/sp.stdev;
ArrayResize (xscale,n);
xscale[n- 1 ]= MathPow ( 0.5 , 1.0 /n);
xscale[ 0 ]= 1 -xscale[n- 1 ];
np=n+ 0.365 ;
for (i= 1 ;i<(n- 1 );i++)xscale[i]=(i+ 1 - 0.3175 )/np;
for (i= 0 ;i<n;i++)xscale[i]=ltqnorm(xscale[i]);
return ( 1 );
}
//-----------------------------------------------------------------
-----------
double A1 = - 3.969683028665376 e+ 01 , A2 = 2.209460984245205 e+ 02 ,
A3 = - 2.759285104469687 e+ 02 , A4 = 1.383577518672690 e+ 02 ,
A5 = - 3.066479806614716 e+ 01 , A6 = 2.506628277459239 e+ 00 ;
double B1 = - 5.447609879822406 e+ 01 , B2 = 1.615858368580409 e+ 02 ,
B3 = - 1.556989798598866 e+ 02 , B4 = 6.680131188771972 e+ 01 ,
B5 = - 1.328068155288572 e+ 01 ;
double C1 = - 7.784894002430293 e- 03 , C2 = - 3.223964580411365 e- 01 ,
C3 = - 2.400758277161838 e+ 00 , C4 = - 2.549732539343734 e+ 00 ,
C5 = 4.374664141464968 e+ 00 , C6 = 2.938163982698783 e+ 00 ;
double D1 = 7.784695709041462 e- 03 , D2 = 3.224671290700398 e- 01 ,
D3 = 2.445134137142996 e+ 00 , D4 = 3.754408661907416 e+ 00 ;
//-----------------------------------------------------------------
-----------
double ltqnorm( double p)
{
int s= 1 ;
double r,x,q= 0 ;
if (p<= 0 ||p>= 1 ){ Print ( "Function ltqnorm() error!" ); return ( 0 );}
if ((p>= 0.02425 )&&(p<= 0.97575 )) // Rational approximation for central
region
{
q=p- 0.5 ; r=q*q;
x=(((((A1*r+A2)*r+A3)*r+A4)*r+A5)*r+A6)*q/(((((B1*r+B2)*r+B3)*r+B4)*r+B5)*r+ 1 );
return (x);
}
if (p< 0.02425 ) // Rational approximation for lower region
{
q= sqrt(- 2 * log (p));
s= 1 ;
}
else //if(p>0.97575) // Rational approximation for upper
region
{
q = sqrt(- 2 * log ( 1 -p));
s=- 1 ;
}
x=s*(((((C1*q+C2)*q+C3)*q+C4)*q+C5)*q+C6)/((((D1*q+D2)*q+D3)*q+D4)*q+ 1 );
return (x);
}
The address of the x[] array is inputted to the function. The array contains the initial sequence.
The next parameters are references to the output arrays resp[] and xscale[]. After the execution
of the function, the values used for drawing of the chart on the X and Y axes respectively are
written to the arrays. Then the address of the statParam structure is passed to the function. It
should contain previously calculated values of the statistical parameters of the input sequence. In
case of an error, the function returns -1.
When forming values for the X axis, the function ltqnorm() is called. It calculates the reverse
integral function of normal distribution. The algorithm that is used for calculation is taken from
"An algorithm for computing the inverse normal cumulative distribution function".
Four Charts
Previously I mentioned the dStat() function where the values of the statistical parameters are
calculated. Let's briefly repeat their meaning.
Dispersion (variance) – the mean value of squares of deviation of a random value from its
mathematical expectation (average value). The parameter that shows how big is the deviation of
a random value from its center of distribution. The more the value of this parameter is, the more
is the deviation.
Standard deviation – since the dispersion is measured as the square of a random value, the
standard deviation is often used as a more obvious characteristics of dispersion. It is equal to the
square root of the dispersion.
Skewness – if we draw a curve of distribution of a random value, the skewness will show how
asymmetrical is the curve of probability density relatively to the center of distribution. If the
skewness value is greater than zero, the curve of probability density has a steep left slope and a
flat right slope. If the skewness value is negative, then the left slope is flat and the right one is
steep. When the curve of probability density is symmetric to the center of distribution, the
skewness is equal to zero.
The coefficient of excess (kurtosis) – it describes the sharpness of a peak of the curve of
probability density and the steepness of slopes of the distribution tails. The sharper is the curve
peak near the center of distribution, the greater is the value of the kurtosis.
Despite the fact that mentioned statistical parameters describe a sequence in details, often you
can characterize a sequence in an easier way - on the basis of result of estimations represented
in a graphical form. For example, an ordinary graph of a sequence can greatly complete a view
obtained when analyzing the statistical parameters.
Previously in the article, I have mentioned the dHist() and dRankit() functions that allow
preparing data for drawing a histogram or a chart with the scale of normal probability. Displaying
the histogram and the graph of normal distribution together with the ordinary graph on the same
sheet, allows you determining main features of the analyzed sequence visually.
Those three listed charts should be supplemented with another one - the chart with the current
values of the sequence on the Y axis and its previous values on the X axis. Such a chart is called
a "Lag Plot". If there is a strong correlation between adjacent indications, values of a sampling
will stretch in a straight line. And if there is no correlation between adjacent indications, for
example, when analyzing a random sequence, then values will be dispersed all over the chart.
For a quick estimation of an initial sampling, I suggest to draw four charts on a single sheet and
to display the calculated values of the statistical parameter on it. This is not a new idea; you can
read about using the analysis of four mentioned charts here: "4-Plot".
In the end of the article, there is the "Files" section containing the script s4plot.mq5, which draws
those four charts on a single sheet. The dat[] array is created within the OnStart() function of the
script. It contains the initial sequence. Then the dStat(), dHist() and dRankit() functions are called
consequently for calculation of data required for drawing of the charts. The vis4plot() function is
called next. It creates a text file with the gnuplot commands on the basis of the calculated data,
and then it calls the application for drawing the charts in a separate window.
There is no point is showing the entire code of the script in article, since the dStat(), dHist() and
dRankit() were described previously, the vis4plot() function, which creates a sequence of gnuplot
commands, doesn't have any significant peculiarities, and description of the gnuplot commands
goes out of bounds of the article subject. In addition to that, you can use another method for
drawing the charts instead of the gnuplot application.
//-----------------------------------------------------------------
-----------
// Script program start function
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 128 ],histo[],rankit[],xrankit[];
statParam sp;
MathSrand( 1 );
for (i= 0 ;i< ArraySize(dat);i++) dat[i]= MathRand ();
if (dStat(dat,sp)==- 1 ) return ;
if (dHist(dat,histo,sp)==- 1 ) return ;
if (dRankit(dat,rankit,xrankit,sp)==- 1 ) return ;
vis4plot(dat,histo,rankit,xrankit,sp, 6 );
}
In this example, a random sequence is used for filling the dat[] array with initial data using the
MathRand() function. The script execution should result in following:
You should pay attention to the last parameter of the vis4plot() function. It is responsible for the
format of outputted numeric values. In this example, the values are outputted with six decimal
places. This parameter is the same as the one that determines the format in the DoubleToString()
function.
If values of the input sequence are too small or too big, you can use the scientific format for a
more obvious displaying. To do it, set that parameter to -5, for example. The -5 value is set on
default for the vis4plot() function.
To demonstrate the obviousness of the method of four charts for displaying of peculiarities of a
sequence, we need a generator of such sequences.
Below, there is the source code of the include file describing that class.
//-----------------------------------------------------------------
------------------
// RNDXor
// 2011,
victorg
// http://www.m
//-----------------------------------------------------------------
------------------
#property copyright "2011, victorg"
#property link "http://www.mql5.com"
#include <Object.mqh>
//-----------------------------------------------------------------
------------------
// Generation of pseudo-random sequences. The Xorshift RNG algorithm
// (George Marsaglia) with the 2**128 period of initial sequence is used.
// uint rand_xor128()
// {
// static uint x=123456789,y=362436069,z=521288629,w=88675123;
// uint t=(x^(x<<11));x=y;y=z;z=w;
// return(w=(w^(w>>19))^(t^(t>>8)));
// }
// Methods:
// Rand() - even distribution withing the range
[0,UINT_MAX=4294967295].
// Rand_01() - even distribution within the range [0,1].
// Rand_Norm() - normal distribution with zero mean and dispersion one.
// Rand_Exp() - exponential distribution with the parameter 1.0.
// Rand_Laplace() - Laplace distribution with the parameter 1.0
// Reset() - resetting of all basic values to initial state.
// SRand() - setting new basic values of the generator.
//-----------------------------------------------------------------
------------------
#define xor32 xx=xx^(xx<<13);xx=xx^(xx>>17);xx=xx^(xx<<5)
#define xor128 t=(x^(x<<11));x=y;y=z;z=w;w=(w^(w>>19))^(t^(t>>8))
#define inidat
x=123456789;y=362436069;z=521288629;w=88675123;xx=2463534242
class RNDXor128: public CObject
{
protected:
uint x,y,z,w,xx,t;
uint UINT_half;
public :
RNDXor128() {UINT_half= UINT_MAX >> 1 ;inidat;};
double Rand() {xor128; return (( double )w);};
int Rand( double & a[], int n)
{ int i; if (n< 1 ) return (- 1 );
if ( ArraySize(a)<n) return (- 2 );
for (i= 0 ;i<n;i++){xor128;a[i]=( double )w;}
return ( 0 );};
double Rand_01() {xor128; return (( double )w/ UINT_MAX );};
int Rand_01( double & a[], int n)
{ int i; if (n< 1 ) return (- 1 );
if ( ArraySize(a)<n) return (- 2 );
for (i= 0 ;i<n;i++){xor128;a[i]=( double )w/ UINT_MAX ;}
return ( 0 );};
double Rand_Norm() { double v1,v2,s,sln; static double ra; static uint
b= 0 ;
if (b==w){b= 0 ; return (ra);}
do {
xor128;v1=( double )w/UINT_half- 1.0 ;
xor128;v2=( double )w/UINT_half- 1.0 ;
s=v1*v1+v2*v2;
}
while (s>= 1.0 ||s== 0.0 );
sln= MathLog (s);sln= MathSqrt ((-sln-sln)/s);
ra=v2*sln;b=w;
return (v1*sln);};
int Rand_Norm( double & a[], int n)
{ int i; if (n< 1 ) return (- 1 );
if ( ArraySize(a)<n) return (- 2 );
for (i= 0 ;i<n;i++)a[i]=Rand_Norm();
return ( 0 );};
double Rand_Exp() {xor128; if (w== 0 ) return ( DBL_MAX );
return (- MathLog (( double )w/ UINT_MAX ));};
int Rand_Exp( double & a[], int n)
{ int i; if (n< 1 ) return (- 1 );
if ( ArraySize(a)<n) return (- 2 );
for (i= 0 ;i<n;i++)a[i]=Rand_Exp();
return ( 0 );};
double Rand_Laplace() { double a;xor128;
a=( double )w/UINT_half;
if (w>UINT_half)
{a= 2.0 -a;
if (a== 0.0 ) return (- DBL_MAX );
return ( MathLog (a));}
else
{ if (a== 0.0 ) return ( DBL_MAX );
return (- MathLog (a));}};
int Rand_Laplace( double & a[], int n)
{ int i; if (n< 1 ) return (- 1 );
if ( ArraySize(a)<n) return (- 2 );
for (i= 0 ;i<n;i++)a[i]=Rand_Laplace();
return ( 0 );};
void Reset() {inidat;};
void SRand( uint seed) { int i; if (seed!= 0 )xx=seed;
for (i= 0 ;i< 16 ;i++){xor32;}
xor32;x=xx;xor32;y=xx;
xor32;z=xx;xor32;w=xx;
for (i= 0 ;i< 16 ;i++){xor128;}};
int SRand( uint xs, uint ys, uint zs, uint ws)
{ int i; if (xs== 0 &&ys== 0 &&zs== 0 &&ws== 0 ) return (- 1 );
x=xs;y=ys;z=zs;w=ws;
for (i= 0 ;i< 16 ;i++){xor128;}
return ( 0 );};
};
//-----------------------------------------------------------------
------------------
The algorithm used for generating a random sequence is described in details in the article
"Xorshift RNGs" by George Marsaglia (see the xorshift.zip at the end of the article). Methods of
the RNDXor128 class are described in the RNDXor128.mqh file. Using this class, you can get
sequences with even, normal or exponential distribution or with Laplace distribution (double
exponential).
You should pay attention to the fact, that when an instance of the RNDXor128 class is created,
the basic values of the sequence are set to initial state. Thus in contrast to calling the
MathRand() function at each new start of a script or indicator that uses RNDXor128, one and the
same sequence will be generated. The same as when calling MathSrand() and then MathRand().
Sequence Examples
Below, as an example, you can find the results obtained when analyzing sequences that are
extremely different from each other with their properties.
#include "RNDXor128.mqh"
RNDXor128 Rnd;
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 512 ];
for (i= 0 ;i< ArraySize(dat);i++) dat[i]=Rnd.Rand_01();
...
}
Figure 4. Even distribution
#include "RNDXor128.mqh"
RNDXor128 Rnd;
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 512 ];
for (i= 0 ;i< ArraySize(dat);i++) dat[i]=Rnd.Rand_Norm();
...
}
Figure 5. Normal distribution
#include "RNDXor128.mqh"
RNDXor128 Rnd;
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 512 ];
for (i= 0 ;i< ArraySize(dat);i++) dat[i]=Rnd.Rand_Exp();
...
}
Figure 6. Exponential distribution
#include "RNDXor128.mqh"
RNDXor128 Rnd;
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 512 ];
for (i= 0 ;i< ArraySize(dat);i++) dat[i]=Rnd.Rand_Laplace();
...
}
Figure 7. Laplace Distribution
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 512 ];
for (i= 0 ;i< ArraySize(dat);i++) dat[i]= MathSin ( 2 * M_PI/ 4.37*i);
...
}
Figure 8. Sinusoidal sequence
#include "RNDXor128.mqh"
RNDXor128 Rnd;
//-----------------------------------------------------------------
-----------
void OnStart ()
{
int i;
double dat[ 512 ],a;
for (i= 0 ;i< ArraySize(dat);i++) {a+=Rnd.Rand_Laplace();dat[i]=a;}
...
}
Figure 9. Correlation between adjacent indications
Conclusion
Development of program algorithms that implement any kind of calculations is always a hard
work to do. The reason is a necessity of considering a lot of requirements to minimize mistakes
that can arise when rounding, truncating and overflowing variables.
While writing the examples for the article, I didn't perform any analysis of program algorithms.
When writing the function, the mathematical algorithms were implemented "directly. Thus if you
are going to use them in "serious" applications, you should analyze their stability and accuracy.
The article doesn't describe features of the gnuplot application at all. Those questions are just
beyond the scope of this article. Anyway, I would like to mention that gnuplot can be adapted for
joint using with MetaTrader 5. For this purpose, you need to make some corrections in its source
code and recompile it. In addition to it, the way of passing commands to gnuplot using a file is
probably not an optimal way, since interaction with gnuplot can be organized via a programming
interface.
Files
erremove.mq5 – example of a script that excludes errors from a sampling.
function_dstat.mq5 – function for calculation of statistical parameters.
function_dhist.mq5 - function for calculation of values of the histogram.
function_drankit.mq5 – function for calculation of values used when drawing a chart with
the scale of normal distribution.
s4plot.mq5 – example of a script that draws four chart on a single sheet.
RNDXor128.mqh – class of the generator of a random sequence.
xorshift.zip - George Marsaglia. "Xorshift RNGs".
Attached files
erremove.mq5 (4.53 KB)
function_dhist.mq5 (1.21 KB)
function_drankit.mq5 (2.46 KB)
xorshift.zip (38.35 KB)
function_dstat.mq5 (1.35 KB)
s4plot.mq5 (10.52 KB)
rndxor128.mqh (12.01 KB)