Chapter Six Stochastic Hydrology 5 Stochastic Hydrology 5.4: Engineering Hydrology Lecture Note
CHAPTER SIX
STOCHASTIC HYDROLOGY
5 Stochastic Hydrology
5.4 Introduction
Stochastic hydrology describes the physical processes involved in the movement of water onto,
over, and through the soil surface. Quite often the hydrologic problems we face do not require a
detailed discussion of the physical process, but only a time series representation of these
processes. Stochastic models may be used to represent, in simplified form, these hydrologic time
series. Some background in probability and statistics is necessary to fully understand this
concept.
The first two terms are deterministic in form and can be identified and quantified fairly easily;
the last two are stochastic with major random elements, and some minor persistence effects, less
easily identified and quantified.
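$$\text{Mean:}\quad \mu = E(X_t) = \bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t \qquad (5.2)$$

$$\text{Variance:}\quad \sigma^2 = S^2 = \frac{1}{n-1}\sum_{t=1}^{n}\left(X_t - \bar{X}\right)^2 \qquad (5.3)$$

$$\text{Covariance:}\quad \lambda_L = \mathrm{Cov}(X_t, X_{t+L}) = \frac{1}{n-L}\sum_{t=1}^{n-L}\left(X_t - \bar{X}\right)\left(X_{t+L} - \bar{X}\right) \qquad (5.4)$$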
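The sample estimates in equations (5.2)-(5.4) can be sketched in a few lines of Python. This is only an illustration: the function names and the short data list are made up for the example, and the divisors (n, n-1 and n-L) follow the equations as written above.

```python
def sample_mean(x):
    """Eq. (5.2): arithmetic mean of the series."""
    return sum(x) / len(x)


def sample_variance(x):
    """Eq. (5.3): sample variance with divisor n - 1."""
    m = sample_mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def lag_covariance(x, lag):
    """Eq. (5.4): lag-L covariance with divisor n - L."""
    n = len(x)
    m = sample_mean(x)
    total = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return total / (n - lag)


# Hypothetical data, only to show the calls.
series = [1.0, 2.0, 3.0, 4.0]
print(sample_mean(series), sample_variance(series), lag_covariance(series, 1))
```

If these statistics change noticeably when they are computed on different sub-periods of a record, that is a first hint of non-stationarity.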
In hydrology, moments of the third and higher orders are rarely considered because their estimates are unreliable. Second-order stationarity, also called covariance stationarity, is usually sufficient in hydrology. A process is strictly stationary when the distribution of Xt does not depend on time and when all simultaneous distributions of the random variables of the process depend only on their mutual time lag. In other words, a process is said to be strictly stationary if its n-th order moments (for any integer n) do not depend on time and depend only on the time lag.
If the values of the sample statistics (mean, variance, covariance, etc.) calculated by equations (5.2)-(5.4) depend on the timing or the length of the sample, i.e. if a definite trend is discernible in the series, then the series is non-stationary. Similarly, periodicity in a series means that it is non-stationary. Mathematically, one can write:
$$E(X_t) = \mu_t$$
$$\mathrm{Var}(X_t) = \sigma_t^2$$
$$\mathrm{Cov}(X_t, X_{t+L}) = \lambda_{L,t}$$
White noise time series:
For a stationary time series, if the process is purely random and stochastically independent, the time series is called a white noise series. Mathematically, one can write:
$$E(X_t) = \mu$$
$$\mathrm{Var}(X_t) = \sigma^2$$
$$\mathrm{Cov}(X_t, X_{t+L}) = 0 \quad \text{for all } L \neq 0$$
Gaussian time series:
A Gaussian random process is a process (not necessarily stationary) in which all random variables are normally distributed and all simultaneous distributions of the random variables of the process are normal. When a Gaussian random process is weakly stationary, it is also strictly stationary, since the normal distribution is completely characterized by its first- and second-order moments.
Different statistical methods, both nonparametric tests and parametric tests, for identifying trend
in time-series are available in the literature. Two commonly used methods for identifying the
trend are discussed briefly in this section.
(1) Mann-Kendall test
The test uses the raw (un-smoothed) hydrologic data to detect possible trends. The Kendall statistic was originally devised by Mann (1945) as a non-parametric test for trend. Later the exact distribution of the test statistic was derived by Kendall (1975). The test statistic S is computed as
$$S = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \operatorname{sgn}(x_j - x_i) \qquad (5.5)$$
where the $x_j$ are the sequential data values, n is the length of the data set, and
$$\operatorname{sgn}(\theta) = \begin{cases} 1 & \text{if } \theta > 0 \\ 0 & \text{if } \theta = 0 \\ -1 & \text{if } \theta < 0 \end{cases} \qquad (5.6)$$
Mann (1945) and Kendall (1975) have documented that, when the sample size n is sufficiently large, the statistic S is approximately normally distributed with the following mean and variance:
$$E(S) = 0 \qquad (5.7)$$
$$V(S) = \frac{n(n-1)(2n+5) - \sum_{p=1}^{q} t_p (t_p - 1)(2t_p + 5)}{18} \qquad (5.8)$$
where n is the number of data, $t_p$ is the number of ties for the p-th value (the number of data in the p-th group), and q is the number of tied values (the number of groups with equal values). The standardized Mann-Kendall test statistic $Z_{MK}$ is computed by
$$Z_{MK} = \begin{cases} \dfrac{S-1}{\sqrt{V(S)}} & S > 0 \\ 0 & S = 0 \\ \dfrac{S+1}{\sqrt{V(S)}} & S < 0 \end{cases} \qquad (5.9)$$
Under the hypothesis of no trend, the standardized statistic $Z_{MK}$ follows the standard normal distribution, with mean zero and variance one. The hypothesis that there is no trend is rejected if
$$|Z_{MK}| > Z_{1-\alpha/2} \qquad (5.10)$$
where $Z_{1-\alpha/2}$ is the value read from a standard normal distribution table, with α being the significance level of the test.
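The Mann-Kendall procedure of equations (5.5)-(5.10) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `mann_kendall` and its return layout are assumptions of this sketch, and the critical value is taken from Python's `statistics.NormalDist` instead of a printed table.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(x, alpha=0.05):
    """Return (S, V(S), Z_MK, trend_detected) for the series x."""
    n = len(x)
    # Eq. (5.5): sum of signs of all pairwise later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)
    # Eq. (5.8): variance with the correction for groups of tied values.
    ties = [count for count in Counter(x).values() if count > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    # Eq. (5.9): standardized statistic with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Eq. (5.10): two-sided test at significance level alpha.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return s, var_s, z, abs(z) > z_crit


# A strictly increasing series of 10 values gives S = 45 and V(S) = 125.
print(mann_kendall(list(range(1, 11))))
```

The double loop is O(n²), which is usually acceptable for annual or monthly hydrologic records.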
(2) Linear regression method
The linear regression method can be used to identify whether a linear trend exists in a hydrologic time series. The procedure consists of two steps: fitting a linear regression equation with the time T as the independent variable and the hydrologic data Y as the dependent variable, i.e.
$$Y = \alpha + \beta T \qquad (5.11)$$
and testing the statistical significance of the regression coefficient β. A test of hypothesis concerning β can be made by noting that $(\beta - \beta_0)/S_\beta$ has a t distribution with n-2 degrees of freedom. Thus the hypothesis H0: β = β0 versus H1: β ≠ β0 is tested by computing
$$t = \frac{\beta - \beta_0}{S_\beta} \qquad (5.12)$$
where $S_\beta$ is the standard deviation of the coefficient β, with
$$S_\beta = \frac{S}{\sqrt{\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^2}} \qquad (5.13)$$
and
$$S = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2} \qquad (5.14)$$
where S is the standard error of the regression, and $Y_i$ and $\hat{Y}_i$ are the observed values and the values estimated from the regression equation, respectively. The hypothesis of no trend (β0 = 0) is rejected if
$$|t| > t_{1-\alpha/2,\,n-2}$$
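As an illustration of equations (5.11)-(5.14), the sketch below fits the regression line by ordinary least squares and forms the t statistic for H0: β = 0. It assumes equally spaced times T = 1, 2, ..., n, and it takes the critical value `t_crit` as an argument (read from a t table) because the Python standard library has no t distribution. The function name and the sample data are hypothetical.

```python
import math


def linear_trend_test(y, t_crit):
    """Return (alpha, beta, t_statistic, trend_detected) for the series y."""
    n = len(y)
    times = list(range(1, n + 1))
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in times)
    # Least-squares slope and intercept of Y = alpha + beta * T, eq. (5.11).
    beta = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(times, y)) / sxx
    alpha = y_bar - beta * t_bar
    # Eq. (5.14): standard error of the regression.
    sse = sum((yi - (alpha + beta * ti)) ** 2 for ti, yi in zip(times, y))
    s = math.sqrt(sse / (n - 2))
    # Eq. (5.13): standard deviation of the slope estimate.
    s_beta = s / math.sqrt(sxx)
    # Eq. (5.12) with beta_0 = 0 (undefined if the fit is exact, s = 0).
    t_stat = beta / s_beta
    return alpha, beta, t_stat, abs(t_stat) > t_crit


# Made-up data; 3.182 is the two-sided 5% critical value for 3 degrees of freedom.
print(linear_trend_test([2.1, 2.9, 4.2, 4.8, 6.1], t_crit=3.182))
```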
Models for trend:
The shape of the trend depends on the background of the phenomenon studied. Any smooth trend that is discernible may be quantified and then subtracted from the sample series. Common models for trend take the following forms:
$$T_t = a + bt \quad \text{(a linear trend, as in Fig. 5.1)} \qquad (5.15)$$
or
$$T_t = a + bt + ct^2 + dt^3 + \dots \quad \text{(a non-linear trend)} \qquad (5.16)$$
The coefficients a, b, c, d, ... are usually evaluated by least-squares fitting. The number of terms required in a polynomial trend is primarily imposed by the interpretation of the studied phenomenon. In practice it is usually based on statistical analysis, which determines the terms that contribute significantly to the description and the interpretation of the time series. Only the significant terms are retained, because of the principle of parsimony concerning the number of unknown parameters (constants) used in the model. One wishes to use as few parameters as possible, because in most cases adding a further parameter decreases the accuracy of the other parameters. Prediction and control procedures are also negatively influenced by an excessive number of parameters. This principle of parsimony is important not only for the selection of the trend function but also for other parts of the model.
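A least-squares fit of the polynomial trend (5.16), followed by its removal from the series, might look like the sketch below. It is a plain-Python illustration that solves the normal equations by Gaussian elimination; the function names are hypothetical, time is assumed to run t = 1, 2, ..., n, and for high polynomial degrees a numerically more stable routine would be preferable.

```python
def fit_polynomial_trend(y, degree):
    """Least-squares coefficients [a, b, c, ...] of T_t = a + b*t + c*t**2 + ..."""
    times = range(1, len(y) + 1)
    m = degree + 1
    # Normal equations A * coef = rhs with A[r][c] = sum of t**(r + c).
    A = [[float(sum(t ** (r + c) for t in times)) for c in range(m)]
         for r in range(m)]
    rhs = [float(sum((t ** r) * yi for t, yi in zip(times, y))) for r in range(m)]
    # Forward elimination with partial pivoting.
    for k in range(m):
        p = max(range(k, m), key=lambda row: abs(A[row][k]))
        A[k], A[p] = A[p], A[k]
        rhs[k], rhs[p] = rhs[p], rhs[k]
        for r in range(k + 1, m):
            f = A[r][k] / A[k][k]
            for c in range(k, m):
                A[r][c] -= f * A[k][c]
            rhs[r] -= f * rhs[k]
    # Back substitution.
    coef = [0.0] * m
    for k in reversed(range(m)):
        acc = sum(A[k][c] * coef[c] for c in range(k + 1, m))
        coef[k] = (rhs[k] - acc) / A[k][k]
    return coef


def remove_trend(y, coef):
    """Subtract the fitted trend T_t from each observation."""
    return [yi - sum(c * t ** p for p, c in enumerate(coef))
            for t, yi in zip(range(1, len(y) + 1), y)]


# Made-up series that follows an exact quadratic trend.
y = [3.0 + 2.0 * t + 0.5 * t * t for t in range(1, 9)]
coef = fit_polynomial_trend(y, 2)
residual = remove_trend(y, coef)
```

In line with the principle of parsimony, the degree would be raised only while the added coefficient remains statistically significant.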
occurred prior to time t-τ. The correlogram for such a non-independent stochastic process is
shown in Fig 5.2(b). This is representative of an autoregressive process. Typically, such a correlogram could be produced from a series described by the autoregressive model:
$$X_t = a_1 X_{t-1} + a_2 X_{t-2} + a_3 X_{t-3} + \dots + \varepsilon_t \qquad (5.18)$$
where the $a_i$ are related to the autocorrelation coefficients $r_i$ and $\varepsilon_t$ is a random independent element.
• In the case of data containing a cyclic (deterministic) component, rL ≠ 0 for all L ≠ 0 and the correlogram would appear as in Fig. 5.2(c). Where T is
The smallest value of T is called the period. The dimension of T is time, T thus being a number
of time-units (years, months, days or hours, etc.) and we also have Pt+nT = Pt for all t and for all
integer n.
$$\omega = \frac{2\pi}{\text{period}} = 2\pi \cdot \text{frequency}$$
The constant α is termed the amplitude and β the phase (with respect to the origin) of the sine function.
A simple model for the periodic component may be defined as (for more discussion refer to the literature on time series analysis):
$$P_t = m + C\sin(2\pi t/T) \qquad (5.19)$$
where C is the amplitude of the sine wave about a level m and T is its wavelength. The serial (auto) correlation coefficients for such a $P_t$ are given by:
$$r_L = \cos(2\pi L/T) \qquad (5.20)$$
The cosine curve repeats every T time units throughout the correlogram, with $r_L = 1$ for L = 0, T, 2T, 3T, ... Thus periodicities in a time series are exposed by regular cycles in the corresponding correlograms.
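The cosine shape of the correlogram in (5.20) can be checked numerically. The sketch below builds a purely periodic series from (5.19) and computes its sample autocorrelation. The estimator used here (lagged products summed over n-L terms and divided by the full sum of squares) is one common choice and is an assumption of this sketch; with a finite record it reproduces cos(2πL/T) only approximately, shrinking slightly with the lag.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / c0
            for lag in range(max_lag + 1)]


# Periodic component P_t = m + C*sin(2*pi*t/T) with illustrative values
# m = 50, C = 20, T = 12 (months), over 20 years.
period = 12
p = [50.0 + 20.0 * math.sin(2.0 * math.pi * t / period) for t in range(1, 241)]
r = autocorrelation(p, 24)
for lag in (0, 3, 6, 12, 24):
    print(lag, round(r[lag], 3), round(math.cos(2.0 * math.pi * lag / period), 3))
```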
Once the significant periodicities, Pt, have been identified and quantified by mt (the means) and st (the standard deviations), they can be removed from the original time series along with any trend, Tt, so that a new series of data, Et, is formed:
$$E_t = \frac{X_t - T_t - m_t}{s_t} \qquad (5.21)$$
Simple models for the periodic component in hydrology can be found in the literature.
For example, in many regions, typical monthly potential evapotranspiration variation during the
year can be modelled more or less by a sinusoidal function, with a couple of parameters to tune
the annual mean and the amplitude (Xu and Vandewiele, 1995).
This behaviour leads to the idea of modelling it by a truncated Fourier series:
$$ep_t = \{a + b\sin[(2\pi/12)(t - c)]\}^{+}$$
where again t is time in months. The plus sign at the end denotes the positive part, i.e. negative values are set to zero; it is necessary to avoid negative values of ep, which may otherwise occur in rare cases. Again, the parameters a, b and c are characteristics of the basin.
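The sinusoidal evapotranspiration model above is simple enough to write down directly. In this sketch the function name and the parameter values are illustrative only; the positive-part operator is implemented with `max(0, ...)`.

```python
import math


def potential_evapotranspiration(t, a, b, c):
    """ep_t = {a + b*sin[(2*pi/12)*(t - c)]}+ with t in months."""
    return max(0.0, a + b * math.sin(2.0 * math.pi / 12.0 * (t - c)))


# Illustrative parameters: annual mean level a, amplitude b, phase shift c.
for month in range(1, 13):
    print(month, round(potential_evapotranspiration(month, a=50.0, b=40.0, c=4.0), 1))
```

Here a sets the annual mean level, b the amplitude, and c shifts the curve so that the maximum falls three months after month c.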
$$Z_t = \frac{E_t - \bar{E}}{s_E} \qquad (5.22)$$
where $\bar{E}$ and $s_E$ are the mean and standard deviation of the Et series. The series, Zt, then has
zero mean and unit standard deviation. The autocorrelation coefficients of Zt are calculated and
the resultant correlogram is examined for evidence and recognition of a correlation and/or
random structure.
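The two standardization steps, equation (5.21) followed by equation (5.22), can be sketched as below for a monthly series. The sketch assumes that no trend is present (Tt = 0) and that the periodic component is described by the twelve calendar-month means and standard deviations; the function names and the synthetic test series are hypothetical.

```python
import math
import random


def monthly_stats(x, period=12):
    """Mean m_t and standard deviation s_t for each position in the cycle."""
    means, stds = [], []
    for j in range(period):
        vals = x[j::period]
        m = sum(vals) / len(vals)
        s = math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))
        means.append(m)
        stds.append(s)
    return means, stds


def deseasonalize(x, period=12):
    """Return (E_t, Z_t): eq. (5.21) with T_t = 0, then eq. (5.22)."""
    means, stds = monthly_stats(x, period)
    e = [(v - means[t % period]) / stds[t % period] for t, v in enumerate(x)]
    e_bar = sum(e) / len(e)
    s_e = math.sqrt(sum((v - e_bar) ** 2 for v in e) / (len(e) - 1))
    z = [(v - e_bar) / s_e for v in e]
    return e, z


# Synthetic 10-year monthly series: seasonal cycle plus random noise.
random.seed(1)
x = [100.0 + 50.0 * math.sin(2.0 * math.pi * t / 12.0) + random.gauss(0.0, 10.0)
     for t in range(120)]
e_series, z_series = deseasonalize(x)
```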
For example, in Fig. 5.3a for a monthly flow, the correlogram of the stationary Zt series (with the periodicities removed) has distinctive features that can be recognized. Comparing it with Fig. 5.2, the Zt correlogram resembles that of an autoregressive (Markov) process. For a first-order Markov model:
$$Z_t = r_1 Z_{t-1} + e_t \qquad (5.23)$$
where r1 is the lag-one serial correlation coefficient of Zt and et is the random residual. The correlogram of the residuals is finally computed and drawn (Fig. 5.3b). For these data it resembles the correlogram of 'white noise', i.e. independently distributed random values. If there are still signs of autoregression in the et correlogram, a second-order Markov model is tried, and the order is increased until a random et correlogram is obtained. The frequency distribution diagram of the first-order et values (Fig. 5.3c) demonstrates an approximate approach to the normal (Gaussian) distribution.
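The residual check described above can be sketched as follows: fit the first-order model (5.23) using the lag-one correlation, compute the residuals et, and inspect their lag-one correlation. The test series is generated artificially with a known coefficient of 0.6, so the data and the function names are assumptions of the example only.

```python
import random


def lag_corr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / c0


def ar1_residuals(z):
    """Fit Z_t = r1 * Z_(t-1) + e_t and return (r1, residual series e_t)."""
    r1 = lag_corr(z, 1)
    e = [z[t] - r1 * z[t - 1] for t in range(1, len(z))]
    return r1, e


# Artificial first-order Markov series with coefficient 0.6.
random.seed(42)
z = [0.0]
for _ in range(2000):
    z.append(0.6 * z[-1] + random.gauss(0.0, 0.8))

r1, e = ar1_residuals(z)
print("r1 of Z:", round(r1, 3), " r1 of residuals:", round(lag_corr(e, 1), 3))
```

A residual lag-one correlation close to zero is consistent with white noise; if it were clearly non-zero, a higher-order model would be tried.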
At this stage, the final definition of the recognizable components of the time series has been
accomplished including the distribution of the random residuals. As part of the analysis, the
fitted models should be tested by the accepted statistical methods applied to time series. Once
the models have been formulated and quantified to satisfactory confidence limits, the total
mathematical representation of the time series can be used for solving hydrological problems by
synthesizing non-historic data series having the same statistical properties as the original data
series.
Figure 5.3 River Thames at Teddington Weirs (82 years of monthly flows, from Shaw, 1988)
The periodic component Pt, represented by mt and st for time period t, is then added to the Et values to give:
$$X_t = T_t + E_t s_t + m_t \quad \text{(from equation 5.21)} \qquad (5.26)$$
The incorporation of the trend component Tt then produces a synthetic series of Xt having statistical properties similar to those of the historic data series.
precipitation, solar radiation, etc., nor is it likely that deterministic models for these inputs will
be available in the near future. Stochastic models must be used for these inputs.
Where RN is a standard random normal deviate (i.e. a random observation from a standard normal distribution) and µ and σ are the parameters of the desired normal distribution of y. Computer routines are available for generating standard normal random deviates.
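As a small illustration of such a routine, the sketch below draws a standard normal deviate RN with Python's `random.gauss` and rescales it. It assumes the usual relation y = µ + σ·RN for a purely random normal variable; the function name and the parameter values are hypothetical.

```python
import random


def normal_deviate(mu, sigma, rng=random):
    """Return one value of y, assuming y = mu + sigma * R_N."""
    r_n = rng.gauss(0.0, 1.0)  # standard random normal deviate R_N
    return mu + sigma * r_n


# Illustrative parameters only.
random.seed(0)
sample = [normal_deviate(100.0, 15.0) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
sample_std = (sum((v - sample_mean) ** 2 for v in sample) / (len(sample) - 1)) ** 0.5
print(round(sample_mean, 1), round(sample_std, 1))
```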
Where µ is the mean value of the series, β is the regression coefficient, {y1, y2, …, yt, …} is the observed sequence, and the random variables εt are usually assumed to be normally and independently distributed with zero mean and variance $\sigma_\varepsilon^2$. In order to determine the order k of autoregression required to describe the persistence adequately, it is necessary to estimate k+2 parameters: β1, β2, …, βk, µ and the variance of the residuals $\sigma_\varepsilon^2$. Efficient methods for estimating these parameters have been described by Kendall and Stuart (1968) and Jenkins and Watts (1968).
Equation (5.29) is the well-known first-order Markov model in the literature. It has three parameters to be estimated: µ, β1, and $\sigma_\varepsilon^2$.
For the moment method of parameter estimation, the parameter µ can be computed from the time series as the arithmetic mean of the observed data. As for β1, the Yule-Walker equation (Delleur,
$$\rho_k = \sum_{j=1}^{p} \beta_j \rho_{k-j}, \qquad k > 0 \qquad (5.30)$$
The above equation, written for k = 1, 2, …, yields a set of equations, where ρk is the autocorrelation coefficient for time lag k. As the autocorrelation coefficients ρ1, ρ2, … can be estimated from the data using equation (5.17), these equations can be solved for the autoregressive parameters β1, β2, …, βp. This is the estimation of parameters by the method of moments. For example, for the first-order autoregressive model, AR(1), the Yule-Walker equations yield
$$\rho_1 = \beta_1 \rho_0 = \beta_1, \quad \text{since } \rho_0 = 1 \qquad (5.31)$$
Similarly, we can derive the equations for computing β1 and β2 for the AR(2) model as
$$\beta_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}, \qquad \beta_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} \qquad (5.32)$$
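For a general order p, the Yule-Walker equations (5.30) form a linear system in β1, …, βp. One way to solve it is Durbin's recursion, sketched below; the choice of this recursion and the function name are assumptions of the sketch, not something prescribed by these notes. For p = 1 and p = 2 it reproduces the closed forms (5.31) and (5.32).

```python
def yule_walker(rho):
    """Solve eq. (5.30) for [beta_1..beta_p] given rho = [rho_1..rho_p]."""
    beta = [rho[0]]  # order 1: beta_1 = rho_1, eq. (5.31)
    for k in range(2, len(rho) + 1):
        # Durbin's recursion: coefficient of the newly added lag k ...
        num = rho[k - 1] - sum(beta[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(beta[j] * rho[j] for j in range(k - 1))
        bkk = num / den
        # ... followed by the update of the lower-order coefficients.
        beta = [beta[j] - bkk * beta[k - 2 - j] for j in range(k - 1)] + [bkk]
    return beta


# Illustrative autocorrelations rho_1 = 0.6, rho_2 = 0.5.
b1, b2 = yule_walker([0.6, 0.5])
print(b1, 0.6 * (1 - 0.5) / (1 - 0.6 ** 2))      # compare with eq. (5.32)
print(b2, (0.5 - 0.6 ** 2) / (1 - 0.6 ** 2))
```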
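It can be shown that $\sigma_\varepsilon^2$ is related to $\sigma_y^2$ (the variance of the yt series) by:
$$\sigma_\varepsilon^2 = \sigma_y^2\left(1 - \beta_1^2\right) \qquad (5.33)$$
If the distribution of y is $N(\mu_y, \sigma_y^2)$, then the distribution of ε is $N(0, \sigma_\varepsilon^2)$. Random values yt can now be generated by selecting εt randomly from a $N(0, \sigma_\varepsilon^2)$ distribution. If z is N(0, 1), then $z\sigma_\varepsilon$, or $z\sigma_y\sqrt{1 - \beta_1^2}$, is $N(0, \sigma_\varepsilon^2)$. Thus, a model for generating y's that are $N(\mu_y, \sigma_y^2)$ and follow the first-order Markov model is
$$y_t = \mu_y + \beta_1\left(y_{t-1} - \mu_y\right) + z_t\,\sigma_y\sqrt{1 - \beta_1^2} \qquad (5.34)$$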
The procedure for generating a value for yt is:
(1) estimate µy, σy, and β1 by $\bar{y}$, $s_y$ and r1 (eq. 5.17), respectively,
(2) select a zt at random from a N(0, 1) distribution, and
(3) calculate yt by eq. (5.34) based on $\bar{y}$, $s_y$, r1 and yt-1.
The first value of yt, i.e. y1, might be selected at random from a $N(\mu_y, \sigma_y^2)$ distribution. To eliminate the effect of y1 on the generated sequence, the first 50 or 100 generated values might be discarded. Equation (5.34) has been widely used for generating annual runoff from watersheds.
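The generation procedure based on equation (5.34), including the random start and the discarded warm-up values, can be sketched as follows. The function name, the `warmup` and `seed` arguments and the parameter values are assumptions made for the illustration.

```python
import math
import random


def generate_ar1(n, mean, std, r1, warmup=100, seed=None):
    """Generate n values from the first-order Markov model, eq. (5.34)."""
    rng = random.Random(seed)
    y = rng.gauss(mean, std)  # starting value drawn from N(mu_y, sigma_y^2)
    out = []
    for i in range(warmup + n):
        z = rng.gauss(0.0, 1.0)  # z_t from N(0, 1)
        y = mean + r1 * (y - mean) + z * std * math.sqrt(1.0 - r1 ** 2)
        if i >= warmup:  # discard the warm-up values
            out.append(y)
    return out


# Illustrative parameters: mean 100, standard deviation 20, r1 = 0.5.
annual_runoff = generate_ar1(5000, mean=100.0, std=20.0, r1=0.5, seed=3)
```

Because the model is normal, negative values can be generated when the standard deviation is large relative to the mean; how to treat them is a separate modelling decision not covered by this sketch.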
5.9.3 First order Markov process with periodicity: Thomas - Fiering model
The first order Markov model of the previous section assumes that the process is stationary in its
first three moments. It is possible to generalise the model so that the periodicity in hydrologic
data is accounted for to some extent. The main application of this generalisation has been in
generating monthly streamflow where pronounced seasonality in the monthly flows exists. In its
simplest form, the method consists of the use of twelve linear regression equations. If, say,
twelve years of record are available, the twelve January flows and the twelve December flows
are abstracted and January flow is regressed upon December flow; similarly, February flow is
regressed upon January flow, and so on for each month of the year.
$$q_{jan} = \bar{q}_{jan} + b_{jan}\left(q_{dec} - \bar{q}_{dec}\right) + \varepsilon_{jan}$$
The random component ε is taken as
$$Z\,S_{j+1}\sqrt{1 - r_j^2}$$
where $S_{j+1}$ is the standard deviation of the flows in month j+1, $r_j$ is the correlation coefficient between flows in months j+1 and j throughout the record, and Z = N(0, 1), a normally distributed random deviate with zero mean and unit standard deviation. The general form may be written as
$$\hat{q}_{j+1} = \bar{q}_{j+1} + b_j\left(q_j - \bar{q}_j\right) + Z_{j+1,i}\,S_{j+1}\sqrt{1 - r_j^2} \qquad (5.36)$$
where $b_j = r_j S_{j+1}/S_j$. There are 36 parameters for the monthly model ($\bar{q}_j$, $S_j$ and $r_j$ for each month). The subscript j refers to the month; for monthly synthesis j varies from 1 to 12 throughout the year. The subscript i is a serial designation from year 1 to year n. Other symbols are the same as mentioned earlier.
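The standard deviation of the flows in month j and the correlation coefficient between the flows in months j and j+1 are computed from the n years of record as
$$S_j = \sqrt{\frac{\sum_i \left(q_{j,i} - \bar{q}_j\right)^2}{n-1}}, \qquad r_j = \frac{\sum_i \left(q_{j,i} - \bar{q}_j\right)\left(q_{j+1,i} - \bar{q}_{j+1}\right)}{\sqrt{\sum_i \left(q_{j,i} - \bar{q}_j\right)^2 \, \sum_i \left(q_{j+1,i} - \bar{q}_{j+1}\right)^2}}$$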
(d) The slope of the regression equation relating the month's flow to the flow in the preceding month:
$$b_j = r_j \frac{S_{j+1}}{S_j}$$
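A compact sketch of the Thomas-Fiering procedure is given below: estimate the monthly means, standard deviations and month-to-month correlations from a record, then generate synthetic monthly flows with equation (5.36). The data layout (a list of years, each with 12 monthly flows), the function names, the start from the December mean and the artificial "historic" record are all assumptions of the example. Negative generated flows are not treated here.

```python
import math
import random


def _pearson(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


def thomas_fiering_params(record):
    """Monthly means, standard deviations and correlations r_j (month j, j+1)."""
    n = len(record)
    mean = [sum(yr[j] for yr in record) / n for j in range(12)]
    std = [math.sqrt(sum((yr[j] - mean[j]) ** 2 for yr in record) / (n - 1))
           for j in range(12)]
    r = []
    for j in range(12):
        if j < 11:  # month j paired with month j+1 of the same year
            pairs = [(yr[j], yr[j + 1]) for yr in record]
        else:       # December paired with January of the following year
            pairs = [(record[i][11], record[i + 1][0]) for i in range(n - 1)]
        r.append(_pearson(pairs))
    return mean, std, r


def generate_thomas_fiering(record, n_years, seed=None):
    """Generate n_years of synthetic monthly flows with eq. (5.36)."""
    mean, std, r = thomas_fiering_params(record)
    rng = random.Random(seed)
    q = mean[11]  # start the recursion from the December mean
    synthetic = []
    for _ in range(n_years):
        year = []
        for k in range(12):   # k is the month being generated (j+1)
            j = (k - 1) % 12  # j is the preceding month
            b = r[j] * std[k] / std[j]
            noise = (rng.gauss(0.0, 1.0) * std[k]
                     * math.sqrt(max(0.0, 1.0 - r[j] ** 2)))
            q = mean[k] + b * (q - mean[j]) + noise
            year.append(q)
        synthetic.append(year)
    return synthetic


# Artificial 30-year "historic" record, for demonstration only.
_rng = random.Random(0)
historic = [[50.0 + 30.0 * math.sin(2.0 * math.pi * m / 12.0) + _rng.gauss(0.0, 5.0)
             for m in range(12)] for _ in range(30)]
synthetic = generate_thomas_fiering(historic, 200, seed=1)
```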
(5.43)
Equations (5.40) and (5.41) can be used for the estimation of the parameters by method of
moments. For this purpose they are rewritten as follows:
(5.44)
data.
One of the merits of the ARMA process is that, in general, it is possible to fit a model with a
small number of parameters, i.e. p+q. This number is generally smaller than the number of
parameters that would be necessary using either an AR model or a MA model. This principle is
called the parsimony of parameters.
Properties of ARMA model: Consider in the ARMA(1, 1) model which has been used
extensively in
(5.52) Multiplying
both sides of (5.52) by Xt-k
(5.53)
For k = 0, equ (5.53) becomes
but
and
(5.54)
Thus
(5.55)
(5.56)
and
(5.57)
For k ≥ 2
k ≥ 2 (5.58)
the autocorrelation function (ACF) is obtained by dividing (5.56), (5.57) and
Observe that the MA parameter θ1 enters only in the expression for ρ1. For ρ2 and beyond, the behaviour of the autocorrelation is identical to that of the AR(1) model. Estimates of the parameters θ1 and β1 can be obtained from equations (5.59b) and (5.59c), since the serial (auto) correlation coefficients ρ1 and ρ2 can be computed from data.
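A moment estimate of the ARMA(1, 1) parameters from ρ1 and ρ2 might be sketched as below. The sketch assumes the common sign convention X_t = β1 X_{t-1} + ε_t - θ1 ε_{t-1}, for which ρ1 = (1 - β1θ1)(β1 - θ1)/(1 + θ1² - 2β1θ1) and ρ2 = β1ρ1; if the model is written with the opposite sign for θ1, the sign of the result changes accordingly. The function names are hypothetical.

```python
import math


def arma11_acf(beta1, theta1):
    """Theoretical (rho_1, rho_2) under the assumed sign convention."""
    rho1 = ((1.0 - beta1 * theta1) * (beta1 - theta1)
            / (1.0 + theta1 ** 2 - 2.0 * beta1 * theta1))
    return rho1, beta1 * rho1


def arma11_moments(rho1, rho2):
    """Moment estimates (beta1, theta1) from rho_1 and rho_2 (rho_1 != 0)."""
    beta1 = rho2 / rho1  # because rho_2 = beta1 * rho_1
    # Rearranging the rho_1 expression gives a*theta^2 + b*theta + a = 0.
    a = rho1 - beta1
    b = 1.0 + beta1 ** 2 - 2.0 * beta1 * rho1
    if abs(a) < 1e-12:
        return beta1, 0.0  # pure AR(1): no moving-average part
    disc = b * b - 4.0 * a * a
    if disc < 0.0:
        raise ValueError("no real solution for theta1")
    roots = [(-b + math.sqrt(disc)) / (2.0 * a),
             (-b - math.sqrt(disc)) / (2.0 * a)]
    # The two roots are reciprocals; keep the invertible one, |theta1| <= 1.
    return beta1, min(roots, key=abs)


# Round trip with illustrative parameters beta1 = 0.7, theta1 = 0.3.
rho1, rho2 = arma11_acf(0.7, 0.3)
print(arma11_moments(rho1, rho2))
```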
evapotranspiration bXt. The surface runoff is (1-a-b)Xt = dXt. (See Fig 5.5).
(5.62)
(5.63)
or
(5.64)
Rewriting (5.62) or
(5.65) and
rewriting (5.64) as
(5.66) Combining
(5.62), (5.66) and (5.65) we obtain
which has the form of an ARMA(1, 1) model, i.e. equation (5.52), when the precipitation Xt is an independent series and when (1-c) = β1, d = 1, and [d(1-c)ac)] = θ1.
sequence {yt} can then be constructed, and the frequency with which the extreme event occurs in
them can be taken as an estimate of the "true" frequency with which it would occur in the long
run.
(2) For the investigation of system operating rules
A further use for synthetic sequences generated by stochastic models is in reservoir operation,
such as the investigation of the suitability of proposed operating rules for the release of water
from complex systems of interconnected reservoirs. By using the generated sequence as inputs to
the reservoir system operated according to the proposed rules, the frequency with which
demands fail to be met can be estimated. This may lead to revision of the proposed release rules;
the modified rules may be tested by a similar procedure.
(3) To provide short-term forecasts
Stochastic models have been used to make forecasts. Given the values xt, xt-1, xt-2, ...; yt, yt-1,
yt-2, ... assumed by the input and output variables up to time t, stochastic models have been
constructed from this data for forecasting the output from the system at future times, t+1, t+2, ...,
t+k, .... In statistical terminology, k is the lead-time of the forecast. Many stochastic models have
a particular advantage for forecasting purposes in that they provide, as a by-product of the
procedure for estimating model parameters, confidence limits for forecasts (i.e. a pair of values,
one less than the forecast and one greater, such that there is a given probability P that these
values will bracket the observed value of the variable at time t+k). Confidence limits therefore
express the uncertainty in forecasts; the wider apart the confidence limits, the less reliable the
forecast. Furthermore, the greater the lead-time k for which forecasts are required, the greater will be the width of the confidence interval, since the distant future is more uncertain than the immediate future.
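The widening of confidence limits with lead-time can be illustrated with the first-order Markov model. For that model the k-step forecast is µ + β1^k (y_t - µ) and its error variance is σ_y²(1 - β1^(2k)). The sketch below assumes normally distributed errors and treats the parameters as known, so it ignores parameter uncertainty; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def ar1_forecast(y_t, mean, std, beta1, lead, prob=0.95):
    """Return (lower, forecast, upper) for lead-time `lead` under an AR(1) model."""
    point = mean + beta1 ** lead * (y_t - mean)
    # Forecast standard error grows with lead-time towards the series std.
    se = std * math.sqrt(1.0 - beta1 ** (2 * lead))
    z = NormalDist().inv_cdf(0.5 + prob / 2.0)
    return point - z * se, point, point + z * se


# Illustrative values: last observation 130, mean 100, std 20, beta1 = 0.6.
for k in (1, 3, 12):
    low, mid, high = ar1_forecast(130.0, 100.0, 20.0, 0.6, k)
    print(k, round(low, 1), round(mid, 1), round(high, 1))
```

As k grows, the forecast tends to the series mean and the limits tend to those of the unconditional distribution.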
(4) To "extend" records of short duration, by correlation
Stochastic models have been used to "extend" records of basin discharge where this record is
short. For example, suppose that it is required to estimate the instantaneous peak discharge with
a return period of T years (i.e. such that it would recur with frequency once in T years, in the
long run). One approach to this problem is to examine the discharge record at the site for which
the estimate is required, to abstract the maximum instantaneous discharge for each year of
record, and to represent the distribution of annual maximum instantaneous discharge by a
suitable probability density function. The abscissa, Yo, say, that is exceeded by a proportion 1/T
of the distribution then estimates the T-year flood. It, however, frequently happens that the
length of discharge record available is short, say ten years or fewer. On the other hand, a much
longer record of discharge may be available for another gauging site, such that the peak
discharges at the two sites are correlated. In certain circumstances, it is then permissible to
represent the relation between the annual maximum discharges at the two sites by a regression
equation and to use this fitted equation to estimate the annual maximum instantaneous discharges
for the site with short record.
(5) To provide synthetic sequences of basin input
Suppose that the model has been developed for a system consisting of a basin with rainfall as
input variable, streamflow as output variable. If a stochastic model were developed from which a
synthetic sequence of rainfall could be generated having statistical properties resembling those of
the historic rainfall sequence, the synthetic rainfall sequence could be used as input to the main
model for transformation to the synthetic discharge sequence. The discharge so derived could
then be examined for the frequency of extreme events.
This approach to the study of the frequency of extreme discharge events is essentially an
alternative to that described in paragraph (1) above. In the latter, a synthetic sequence is derived
from a stochastic model of the discharge alone; in the former, a synthetic discharge sequence is
derived by using a model to convert a synthetic sequence of rainfall into discharge.