Math-Stats-Econometrics_Revisit
Foundations: Revisit
Mathematical Foundations
Functions
Straight Lines
• The intercept is the point at which the line crosses the y-axis
• Example: suppose that we were modelling the relationship between a
student’s average mark, y (in percent), and the number of hours studied
per year, x
• Suppose that the relationship can be written as a linear function
y = 25 + 0.05x
• The intercept, a, is 25 and the slope, b, is 0.05
• This means that with no study (x=0), the student could expect to earn a
mark of 25%
• For every additional hour of study, the mark would on average improve by
0.05 percentage points, so another 100 hours of study would raise the mark
by 5 percentage points
Plot of Hours Studied Against Mark Obtained
Straight Lines
Roots
• The point at which a line crosses the x-axis is known as the root
• A straight line will have one root (except for a horizontal line such as y=4
which has no roots)
• In the example above the root is at x = −500 (the solution of 25 + 0.05x = 0),
which does not have a sensible interpretation: the number of hours of study
required to obtain a mark of zero!
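The straight-line example can be checked with a few lines of Python. This is a minimal sketch: the function names `expected_mark` and `line_root` are illustrative choices, not part of the original slides.

```python
# Linear mark model from the slide: y = a + b*x
INTERCEPT = 25.0   # a: expected mark (%) with no study
SLOPE = 0.05       # b: extra percentage points per hour studied


def expected_mark(hours):
    """Return the expected mark (in percent) for a number of study hours."""
    return INTERCEPT + SLOPE * hours


def line_root(intercept, slope):
    """Return the x at which intercept + slope*x crosses the x-axis."""
    if slope == 0:
        raise ValueError("a horizontal line y = c (c != 0) has no root")
    return -intercept / slope


for hours in (0, 100, 500):
    print(hours, expected_mark(hours))
print("root:", line_root(INTERCEPT, SLOPE))
```

The root comes out negative, which is why it has no practical meaning in this example.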
Quadratic Functions
The Roots of Quadratic Functions (Cont’d)
• If $b^2 > 4ac$, the function will have two distinct real roots and it will
cross the x-axis in two separate places
• If $b^2 = 4ac$, the function will have two equal roots and it will only
touch the x-axis in one place
• If $b^2 < 4ac$, the function will have no real roots (only complex roots), it
will not cross the x-axis at all and thus the function will always lie above
the x-axis (if a > 0) or always below it (if a < 0).
Calculating the Roots of Quadratics - Examples
1. $y = x^2 + x - 6$
2. $y = 9x^2 + 6x + 1$
3. $y = x^2 - 3x + 1$
4. $y = x^2 - 4x$
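The four examples can be worked through with the standard quadratic formula, $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. The sketch below assumes the usual form $y = ax^2 + bx + c$; the helper name `quadratic_roots` is hypothetical.

```python
import cmath
import math


def quadratic_roots(a, b, c):
    """Return the two roots of a*x**2 + b*x + c = 0 (a must be nonzero)."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    disc = b * b - 4 * a * c          # discriminant b^2 - 4ac
    if disc >= 0:
        root = math.sqrt(disc)        # real roots (equal when disc == 0)
    else:
        root = cmath.sqrt(disc)       # complex conjugate pair
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


examples = {
    "x^2 + x - 6": (1, 1, -6),
    "9x^2 + 6x + 1": (9, 6, 1),
    "x^2 - 3x + 1": (1, -3, 1),
    "x^2 - 4x": (1, -4, 0),
}
for label, coeffs in examples.items():
    print(label, quadratic_roots(*coeffs))
```

The second example has a zero discriminant, so both returned roots coincide.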
Calculating the Roots of Quadratics - Solutions
The Exponential Function, e
• It is sometimes the case that the relationship between two variables is best
described by an exponential function
• For example, when a variable grows (or shrinks) at a rate in proportion to
its current value, we would write $y = e^x$
• e is simply a number: 2.71828...
• It is also useful for capturing the increase in value of an amount of money
that is subject to compound interest
• The exponential function can never be negative, so when x is negative, y is
close to zero but positive
• It crosses the y-axis at one and the slope increases at an increasing rate
from left to right.
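The link between e and compound interest can be illustrated numerically: compounding more and more often approaches continuous growth at rate r, i.e. $e^r$. This is only a sketch; the 5% rate and the function name `compound_growth` are assumptions made for the illustration.

```python
import math


def compound_growth(rate, periods_per_year, years=1.0):
    """Value of 1 unit of money compounded periods_per_year times a year."""
    return (1 + rate / periods_per_year) ** (periods_per_year * years)


rate = 0.05  # hypothetical annual interest rate
for n in (1, 12, 365, 1_000_000):
    print(n, compound_growth(rate, n))
print("continuous:", math.exp(rate))

# The exponential is positive even for very negative x
print(math.exp(-10) > 0)
```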
A Plot of the Exponential Function
Logarithms
• There are at least three reasons why log transforms may be useful.
1. Taking a logarithm can often help to rescale the data so that their variance is
more constant, which overcomes a common statistical problem known as
heteroscedasticity.
2. Logarithmic transforms can help to make a positively skewed distribution
closer to a normal distribution.
3. Taking logarithms can also be a way to make a non-linear, multiplicative
relationship between variables into a linear, additive one.
How do Logs Work?
A Graph of a Log Function
The Laws of Logs
• $\ln(xy) = \ln(x) + \ln(y)$
• $\ln(x/y) = \ln(x) - \ln(y)$
• $\ln(y^c) = c \ln(y)$
• $\ln(1) = 0$
• $\ln(1/y) = \ln(1) - \ln(y) = -\ln(y)$
• $\ln(e^x) = x \ln(e) = x$
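The laws above can be spot-checked numerically. The values of `x`, `y` and `c` below are arbitrary positive numbers chosen for illustration.

```python
import math

x, y, c = 3.0, 7.0, 2.5   # arbitrary positive values for illustration

# Each entry pairs the left-hand side of a law with its right-hand side
checks = {
    "product": (math.log(x * y), math.log(x) + math.log(y)),
    "quotient": (math.log(x / y), math.log(x) - math.log(y)),
    "power": (math.log(y ** c), c * math.log(y)),
    "log of one": (math.log(1.0), 0.0),
    "reciprocal": (math.log(1 / y), -math.log(y)),
    "log of exp": (math.log(math.exp(x)), x),
}
for name, (lhs, rhs) in checks.items():
    print(name, math.isclose(lhs, rhs, abs_tol=1e-12))
```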
Sigma Notation
Properties of the Sigma Operator
Pi Notation
• Similar to the use of sigma to denote sums, the pi operator (Π) is used to
denote repeated multiplications.
• For example, $\prod_{i=1}^{n} x_i = x_1 \times x_2 \times \dots \times x_n$
means ‘multiply together all of the $x_i$ for each value of i between the lower
and upper limits’ (here 1 and n).
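In code, the sigma and pi operators correspond to a running sum and a running product. A minimal sketch with hypothetical values (`math.prod` requires Python 3.8 or later):

```python
import math

x = [2, 3, 5, 7]            # hypothetical values x_1, ..., x_4

total = sum(x)              # sigma: x_1 + x_2 + x_3 + x_4
product = math.prod(x)      # pi:    x_1 * x_2 * x_3 * x_4

# Taking logs turns the product into a sum (first law of logs)
log_of_product = math.log(product)
sum_of_logs = sum(math.log(v) for v in x)

print(total, product)
print(math.isclose(log_of_product, sum_of_logs))
```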
Differential Calculus
• The rate of change of one variable with respect to another is measured by
a mathematical derivative
• If the relationship between the two variables can be represented by a
curve, the gradient of the curve will be this rate of change
• Consider a variable y that is a function f of another variable x, i.e. y = f(x):
the derivative of y with respect to x is written $\frac{dy}{dx}$,
or sometimes $f'(x)$.
• This term measures the instantaneous rate of change of y with respect to x,
or in other words, the impact of an infinitesimally small change in x
• Notice the difference between the notations Δy and dy
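The difference between Δy and dy can be seen by shrinking the step in a difference quotient Δy/Δx. The sketch below uses $f(x) = 4x^2$ purely as an illustration; its exact derivative is $8x$.

```python
def f(x):
    """Illustrative function f(x) = 4x^2, with exact derivative 8x."""
    return 4 * x ** 2


def difference_quotient(func, x, dx):
    """Average rate of change (delta y / delta x) over a finite step dx."""
    return (func(x + dx) - func(x)) / dx


x0 = 3.0
for dx in (1.0, 0.1, 0.001, 1e-6):
    print(dx, difference_quotient(f, x0, dx))
print("exact derivative at x0:", 8 * x0)
```

As the step shrinks, the finite quotient approaches the instantaneous rate of change.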
Differentiation: The Basics
The Derivative of a Power Function or of a Sum
The Derivatives of Logs and Exponentials
Higher Order Derivatives (Cont’d)
Optimization / Maxima and Minima of Functions
How to do Partial Differentiation
Integration
$y = f(x)$              $dy/dx$
$4x^2$                  $8x$
$3x + 2$                $3$
$2x^{10}$               $20x^9$
$5x^3$                  $15x^2$
$e^x + 2x$              $e^x + 2$
$4x^2 + \ln(x)$         $8x + \frac{1}{x}$

$\frac{d}{dx}\left(4x^2\right) = 8x$ and, in reverse, $\int 8x\,dx = 4x^2$

$y = f(x)$              $dy/dx$
$4x^2$                  $8x$
$4x^2 + 32$             $8x$
$4x^2 - \pi$            $8x$
?                       $8x$
$4x^2 + c$              $8x$
• Because differentiation eliminates any additive constant, integrating f′(x)
recovers the initial function f(x) only up to an arbitrary constant c
$\int f'(x)\,dx = F(x) + c = f(x)$
• There are basic formulas (rules) of integration for most known functions, each
of which is the inverse of the corresponding differentiation rule
Definite integral
The definite integral of f(x) from a to b gives the area below the curve f(x),
between x = a and x = b, in a Cartesian plane and is symbolized as:
$A = \int_a^b f(x)\,dx$
$\int_a^b f(x)\,dx = \Big[F(x)\Big]_{x=a}^{x=b} = F(b) - F(a)$
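The rule F(b) − F(a) can be compared with a direct numerical area calculation. This sketch uses the pair $f(x) = 8x$, $F(x) = 4x^2$ from the integration slide; the limits a = 1, b = 3 and the trapezoid-rule helper are assumptions made for illustration.

```python
def integrand(x):
    """f(x) = 8x, the function being integrated."""
    return 8 * x


def antiderivative(x):
    """F(x) = 4x^2, an antiderivative of 8x."""
    return 4 * x ** 2


def trapezoid(func, a, b, n=1000):
    """Approximate the integral of func over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


a, b = 1.0, 3.0
area_numeric = trapezoid(integrand, a, b)
area_exact = antiderivative(b) - antiderivative(a)   # F(b) - F(a)
print(area_numeric, area_exact)
```

The constant of integration cancels in F(b) − F(a), which is why it does not appear in a definite integral.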
Matrix Algebra - Background
Working with Matrices
• When the number of rows and columns is equal (i.e. R = C), it would be
said that the matrix is square, e.g. the 2 × 2 matrix:
Working with Matrices 2
Working with Matrices 3
• A diagonal matrix with 1 in all places on the leading diagonal and zero
everywhere else is known as the identity matrix, denoted by I, e.g.
• The identity matrix is essentially the matrix equivalent of the number one
• Multiplying any matrix by the identity matrix of the appropriate size
results in the original matrix being left unchanged
• So for any matrix M, MI = IM = M
• In order to perform operations with matrices, they must be conformable
• The dimensions of matrices required for them to be conformable depend
on the operation.
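The identity-matrix property MI = IM = M and the idea of conformability can be sketched with plain Python lists. The helper names and the example matrix are hypothetical; for multiplication, the number of columns of the first matrix must equal the number of rows of the second.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply a (R x K) by b (K x C); raise if they are not conformable."""
    if len(a[0]) != len(b):
        raise ValueError("not conformable: columns of a must equal rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


M = [[1, 2], [3, 4]]      # hypothetical 2 x 2 matrix
I = identity(2)
print(matmul(M, I) == M, matmul(I, M) == M)

# A 1 x 3 row times a 3 x 1 column is conformable and gives a 1 x 1 matrix
print(matmul([[1, 2, 3]], [[1], [2], [3]]))
```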
Matrix Addition or Subtraction
Matrix Multiplication
• More generally, for two matrices A and B of the same order and for c a
scalar, the following results hold
– A+B=B+A
– A+0=0+A=A
– cA = Ac
– c(A + B) = cA + cB
– A0 = 0A = 0
Matrix Multiplication Example
The Transpose of a Matrix
• If A is of dimensions R × C, A′ will be C × R.
The Rank of a Matrix
• In the first case, all rows and columns are (linearly) independent of one another,
but in the second case, the second column is not independent of the first (the
second column is simply twice the first)
• A matrix with a rank equal to its dimension is a matrix of full rank
• A matrix of less than full rank is known as a short-rank matrix; a square short-rank matrix is singular
• Three important results:
- Rank(A) = Rank (A′);
- Rank(AB) ≤ min(Rank(A), Rank(B));
- Rank (A′A) = Rank (AA′) = Rank (A)
The Inverse of a Matrix
• The inverse of a matrix A, where defined and denoted $A^{-1}$, is that matrix
which, when pre-multiplied or post-multiplied by A, will result in the
identity matrix, i.e. $AA^{-1} = A^{-1}A = I$
• The inverse of a matrix exists only when the matrix is square and
non-singular, i.e. its determinant is different from zero, $|A| \neq 0$
• Properties of the inverse of a matrix include:
– $I^{-1} = I$
– $(A^{-1})^{-1} = A$
– $(A')^{-1} = (A^{-1})'$
– $(AB)^{-1} = B^{-1}A^{-1}$
Calculating Inverse of a 2×2 Matrix
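• If the matrix is $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, its inverse is
$\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$
• The expression (ad − bc) is the determinant of the matrix, and will be a
scalar. The determinant is defined ONLY for square matrices
• As a check, multiply the two matrices together and it should give the
identity matrix I.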
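The 2×2 inverse and the suggested check can be sketched as follows, assuming the matrix entries are laid out row by row as a, b, c, d so that the determinant is ad − bc. The example matrix and function names are hypothetical.

```python
def inverse_2x2(m):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the determinant ad - bc."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: determinant is zero")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Multiply two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[2.0, 1.0], [4.0, 6.0]]       # hypothetical matrix, determinant 8
A_inv = inverse_2x2(A)
print(A_inv)
print(matmul_2x2(A, A_inv))        # check: should be the identity matrix
```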
Complex Numbers
But this is impossible, since the square of any real number is nonnegative. [For
example, $(-2)^2 = 4$, a positive number.]
A complex number is then a number of the form a + bi, where a and b are
real numbers.
Complex Numbers
Note that both the real and imaginary parts of a complex number are real
numbers.
Complex Numbers
For example, z = 5 + 3i has real part 5 and imaginary part 3.
The only difference that we need to keep in mind is that $i^2 = -1$. Thus, the
following calculations are valid.
$(a + bi)(c + di) = ac + (ad + bc)i + bd\,i^2 = (ac - bd) + (ad + bc)i$ (multiply and collect like terms)
Note that
$z\bar{z} = (a + bi)(a - bi) = a^2 + b^2$
So the product of a complex number and its conjugate is always a
nonnegative real number.
Arithmetic Operations on Complex Numbers
Solution:
We multiply both the numerator and denominator by the complex
conjugate of the denominator to make the new denominator a real number.
Square Roots of Negative Numbers
But in the complex number system, this equation will always have
solutions, because negative numbers have square roots in this expanded
setting.
Example
Solve $x^2 + 4x + 5 = 0$
$x = \frac{-4 \pm \sqrt{4^2 - 4 \cdot 5}}{2} = \frac{-4 \pm \sqrt{-4}}{2} = \frac{-4 \pm 2i}{2} = -2 \pm i$
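Python's built-in complex type (written with `j` instead of `i`) can be used to check these calculations. In this sketch, `w` is a hypothetical second number introduced only to illustrate division by multiplying with the conjugate.

```python
import cmath

z = 5 + 3j                  # real part 5, imaginary part 3
w = 1 - 2j                  # hypothetical second number for illustration

print(z.real, z.imag)       # both parts are ordinary real numbers
print(z * z.conjugate())    # a^2 + b^2, a nonnegative real number

# Division: multiply numerator and denominator by the conjugate of w,
# which makes the new denominator the real number |w|^2
denominator = (w * w.conjugate()).real
quotient = (z * w.conjugate()) / denominator
print(quotient, z / w)

# Roots of x^2 + 4x + 5 = 0 via the quadratic formula
a, b, c = 1, 4, 5
sqrt_disc = cmath.sqrt(b * b - 4 * a * c)       # square root of -4
roots = ((-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a))
print(roots)
```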
The next theorem gives the form that every convergent power series must
take.
The coefficients of the power series are precisely the coefficients of the Taylor
polynomials for f(x) at c. For this reason, the series is called the Taylor series
for f(x) at c.
Taylor Series and Maclaurin Series
Example – Forming a Power Series
Solution:
Successive differentiation of f(x) yields
You cannot conclude that the power series converges to sin x for all x.
You can simply conclude that the power series converges to some function, but
you are not sure what function it is.
To see why the series might converge to a function other than f, remember that
the derivatives are being evaluated at a single point.
Taylor Series and Maclaurin Series
It can easily happen that another function will agree with the values of
f (n)(x) when x = c and disagree at other x-values.
The Taylor series for f may fail to converge for some x in I. Or, even if it
is convergent, it may fail to have f(x) as its sum.
Taylor Series and Maclaurin Series
Note that in this remainder formula, the particular value of z that makes the
remainder formula true depends on the values of x and n. If $R_n \to 0$ as
$n \to \infty$, then the next theorem tells us that the Taylor series for f
actually converges to f(x) for all x in I.
Taylor Series and Maclaurin Series
Binomial Series
Find the Maclaurin series for $f(x) = (1 + x)^k$ and determine its radius of
convergence. Assume that k is not a positive integer and k ≠ 0.
Solution:
By successive differentiation, you have
$f(x) = (1 + x)^k$,  $f(0) = 1$
$f'(x) = k(1 + x)^{k-1}$,  $f'(0) = k$
$f''(x) = k(k - 1)(1 + x)^{k-2}$,  $f''(0) = k(k - 1)$
$f'''(x) = k(k - 1)(k - 2)(1 + x)^{k-3}$,  $f'''(0) = k(k - 1)(k - 2)$
and so on.
Because $|a_{n+1}/a_n| \to 1$, you can apply the Ratio Test to conclude that
the radius of convergence is R = 1.
So, the series converges to some function in the interval (–1, 1).
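The partial sums of the binomial series can be computed directly from the derivatives above, since the coefficient of $x^n$ is $k(k-1)\cdots(k-n+1)/n!$. The sketch below uses hypothetical values k = 0.5 and x = 0.2 (inside the interval of convergence) and compares the partial sums with $(1 + x)^k$.

```python
def binomial_series(x, k, terms):
    """Partial sum of 1 + kx + k(k-1)x^2/2! + ... using `terms` terms."""
    total = 0.0
    coeff = 1.0                      # coefficient of x^0
    for n in range(terms):
        total += coeff * x ** n
        coeff *= (k - n) / (n + 1)   # move from the x^n to the x^(n+1) coefficient
    return total


k, x = 0.5, 0.2      # hypothetical values with |x| < 1
for terms in (2, 4, 8, 16):
    print(terms, binomial_series(x, k, terms))
print("(1 + x)^k:", (1 + x) ** k)
```

For |x| < 1 the partial sums settle down quickly; outside that interval they need not converge.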
Basic Taylor Series
Statistical Foundations
Distributions: The population and the sample
Probability and probability distributions - Some definitions
A plot of the pdf for a normal distribution
Other important distributions
• The spread of a series about its mean value can be measured using the
variance or standard deviation (which is the square root of the variance)
• This quantity is an important measure of risk in finance
• The standard deviation scales with the data, whereas the variance scales
with the square of the data. So, for example, if the units of the data points
are US dollars, the standard deviation will also be measured in dollars
whereas the variance will be in dollars squared
• Other measures of spread include the range (the difference between the
largest and smallest of the data points) and the semi-interquartile range
(half of the difference between the first (25%) and third (75%) quartile
points in the series)
• The coefficient of variation divides the standard deviation by the sample
mean to obtain a unit-free measure of spread that can be compared across
series with different scales.
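The measures of spread listed above can be computed with Python's standard `statistics` module. The data are hypothetical; note that `statistics.variance` uses the sample divisor (n − 1) and that quartile values depend on the interpolation method, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical observations, e.g. amounts in US dollars
data = [12.0, 15.0, 11.0, 18.0, 14.0, 20.0, 13.0, 17.0]

mean = statistics.mean(data)
variance = statistics.variance(data)    # sample variance, in dollars squared
std_dev = statistics.stdev(data)        # square root of the variance, in dollars
data_range = max(data) - min(data)
q1, _median, q3 = statistics.quantiles(data, n=4)   # quartile cut points
semi_iqr = (q3 - q1) / 2                # semi-interquartile range
coeff_var = std_dev / mean              # unit-free coefficient of variation

print(mean, variance, std_dev, data_range, semi_iqr, coeff_var)

# Rescaling the data by 100 scales the standard deviation by 100
# and the variance by 100 squared
scaled = [100 * v for v in data]
print(statistics.stdev(scaled) / std_dev, statistics.variance(scaled) / variance)
```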
Higher moments
• The higher moments of a data sample give further indications of its features
and shape.
• The formulae for skewness and kurtosis calculate the quantities using the
sample data in the same way that the variance is calculated
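Since the slide's formulae are not reproduced here, the sketch below assumes the simple moment-based definitions: skewness as the third central moment divided by the variance to the power 3/2, and kurtosis as the fourth central moment divided by the squared variance, each with divisor n. Other texts apply small-sample corrections, so treat this as one convention among several.

```python
def central_moment(data, order):
    """Average of (x - mean)^order over the sample (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** order for v in data) / n


def skewness(data):
    """Standardised third moment; zero for a symmetric sample."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Standardised fourth moment; a normal distribution has kurtosis 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]             # hypothetical data
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]    # long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```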
Plot of a positive skewed series versus a normal
distribution
Plot of a leptokurtic df (fat-tailed, kurtosis >3) versus a
Normal distribution
Measures of association
Introduction:
The Nature and Purpose of Econometrics
• What is Econometrics?
• There are 4 types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1. & 2.
4. Big data
• The data may be quantitative (e.g. exchange rates, stock prices, number of
shares outstanding), or qualitative (e.g. day of the week).
Steps involved in the formulation of
econometric models
(Flowchart of the steps; the recoverable stages are Collection of Data and Model Estimation, followed by a Yes/No decision.)
Why do we include a random term?
Determining the Regression Coefficients
Ordinary Least Squares Method
• The most common method used to fit a line to the data is known as
OLS (ordinary least squares).
• Take each distance and square it (i.e. take the area of each of the
squares in the diagram) and minimise the total sum of the squares
(hence least squares).
• In general:
$y_t$ denotes the actual data point t
$\hat{y}_t$ denotes the fitted value from the regression line
$\hat{u}_t$ denotes the residual, $y_t - \hat{y}_t$
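Actual and Fitted Value
(Figure: for an observation at $x_i$, the actual value $y_i$, the fitted value $\hat{y}_i$ on the regression line, and the residual $\hat{u}_i$ between them.)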
OLS (1)
• So min. $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise
$\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the residual sum of squares.
• But what was $\hat{u}_t$? It was the difference between the actual point and the line,
$y_t - \hat{y}_t$.
OLS (2)
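$\frac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (1)
$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (2)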
OLS (3)
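• It can be shown that
$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
or
$\hat{\beta} = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2}$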
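Both expressions for the slope estimator can be coded directly and should give the same answer. This is a minimal sketch on hypothetical data; the function names are illustrative.

```python
def ols_raw_sums(x, y):
    """OLS via (sum x_t y_t - T x_bar y_bar) / (sum x_t^2 - T x_bar^2)."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    numerator = sum(a * b for a, b in zip(x, y)) - T * x_bar * y_bar
    denominator = sum(a * a for a in x) - T * x_bar ** 2
    beta_hat = numerator / denominator
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def ols_deviations(x, y):
    """OLS via sum (x_t - x_bar)(y_t - y_bar) / sum (x_t - x_bar)^2."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    numerator = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    denominator = sum((a - x_bar) ** 2 for a in x)
    beta_hat = numerator / denominator
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


x = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical dependent variable
print(ols_raw_sums(x, y))
print(ols_deviations(x, y))
```

The deviations form tends to be numerically safer when the x values are large relative to their spread.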
Estimator or Estimate?
• Estimate is the actual numerical value for the coefficients, or the value of
the estimator for a particular sample.
Properties of the OLS Estimator
• If assumptions hold, then the estimators are known as Best Linear Unbiased
Estimators (BLUE).
Precision and Standard Errors
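• Any set of regression estimates of $\hat{\alpha}$ and $\hat{\beta}$ are specific to the sample used in their
estimation.
$SE(\hat{\alpha}) = \sqrt{Var(\hat{\alpha})} = s\sqrt{\frac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}} = s\sqrt{\frac{\sum x_t^2}{T\sum x_t^2 - T^2\bar{x}^2}}$,
$SE(\hat{\beta}) = \sqrt{Var(\hat{\beta})} = s\sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s\sqrt{\frac{1}{\sum x_t^2 - T\bar{x}^2}}$,
$s = \sqrt{\frac{\sum_t \hat{u}_t^2}{T - 2}}$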
Example: How to Calculate the Parameters and
Standard Errors
• We write $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$
$\hat{y}_t = -59.12 + 0.35 x_t$
Example (cont’d)
• SE(regression), $s = \sqrt{\frac{\sum_t \hat{u}_t^2}{T - 2}} = \sqrt{\frac{130.6}{20}} = 2.55$
$SE(\hat{\alpha}) = 2.55 \times \sqrt{\frac{3919654}{(22 \times 3919654) - (22 \times 416.5)^2}} = 3.35$
$SE(\hat{\beta}) = 2.55 \times \sqrt{\frac{1}{3919654 - 22 \times 416.5^2}} = 0.0079$
• We now write the results as
$\hat{y}_t = -59.12 + 0.35 x_t$
       (3.35)   (0.0079)
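The standard-error arithmetic in this example can be reproduced from the summary figures on the slide (T = 22, residual sum of squares 130.6, sum of squared x values 3919654, mean of x 416.5). The variable names below are illustrative. Because the code does not round s to 2.55 before reusing it, the last digit of each result may differ slightly from the slide.

```python
import math

T = 22                  # number of observations
rss = 130.6             # residual sum of squares, sum of u_hat_t^2
sum_x_sq = 3919654.0    # sum of x_t^2
x_bar = 416.5           # sample mean of x

s = math.sqrt(rss / (T - 2))                    # standard error of the regression
sxx = sum_x_sq - T * x_bar ** 2                 # equals sum of (x_t - x_bar)^2
se_alpha = s * math.sqrt(sum_x_sq / (T * sxx))
se_beta = s * math.sqrt(1.0 / sxx)

print(round(s, 2), round(se_alpha, 2), round(se_beta, 4))
```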
Hypothesis Testing
• We will always have two hypotheses that go together, the null hypothesis (denoted H0)
and the alternative hypothesis (denoted H1).
• The null hypothesis is the statement or the statistical hypothesis that is actually being
tested. The alternative represents the remaining outcomes of interest.
• For example:
H0 : β = 0.5
H1 : β ≠ 0.5
This would be known as a two sided test.
• If we have prior information that, e.g. β > 0.5 rather than β < 0.5, we would do a one-
sided test:
H0 : β = 0.5
H1 : β > 0.5 or H1 : β < 0.5
• There are two ways to conduct a hypothesis test: via the test of significance approach
or via the confidence interval approach.
Testing Hypotheses: I) The Test of Significance Approach
• After estimating $\hat{\alpha}$, $\hat{\beta}$ and $SE(\hat{\alpha})$, $SE(\hat{\beta})$ in the usual way, we calculate the test
statistic. This is given by the formula (proof in the book)
test statistic $= \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$
where $\beta^*$ is the value of β under the null hypothesis.
• This test statistic follows a Student t-distribution with T-2 degrees of freedom
• Then we need to choose a “significance level”, often denoted α. This determines the
region where we will reject or not reject the null hypothesis that we are testing.
• We use the t-tables to obtain a critical value/values with which to compare the test
statistic.
• Finally we perform the test: If the test statistic lies in the rejection region then reject
the null hypothesis (H0), else do not reject H0
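The steps of the test of significance approach can be sketched as below. The estimate 0.35 and standard error 0.0079 are taken from the worked regression example; the null value 0.5 is used purely for illustration, and 2.086 is the two-sided 5% critical value of the t-distribution with T − 2 = 20 degrees of freedom, as read from standard t-tables.

```python
beta_hat = 0.35     # slope estimate from the worked example
se_beta = 0.0079    # its standard error
beta_null = 0.5     # value of beta under H0 (illustrative)
t_crit = 2.086      # 5% two-sided critical value, 20 degrees of freedom

# Test statistic: (estimate - hypothesised value) / standard error
test_stat = (beta_hat - beta_null) / se_beta

# Two-sided test: reject H0 if the statistic falls in either tail
reject_h0 = abs(test_stat) > t_crit
print(test_stat, "reject H0" if reject_h0 else "do not reject H0")
```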
Determining the Rejection Region for a Test of Significance
The Rejection Region for a 1-Sided Test (Upper Tail)
(Figure: density f(x) with a 95% non-rejection region and a 5% rejection region in the upper tail.)
II) The Confidence Interval Approach to Hypothesis Testing
3. Use the t-tables to find the appropriate critical value, which will again have T-2
degrees of freedom.
5. Perform the test: If the hypothesised value β* lies outside the confidence interval,
then reject the H0: β = β*, otherwise do not reject the null.
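A sketch of the confidence interval approach, using the same illustrative inputs (estimate 0.35, standard error 0.0079, and the 5% two-sided critical value 2.086 for 20 degrees of freedom). The helper `reject` is a hypothetical name; the interval is formed as the estimate plus or minus the critical value times the standard error.

```python
beta_hat = 0.35     # slope estimate from the worked example
se_beta = 0.0079    # its standard error
t_crit = 2.086      # 5% two-sided critical value, 20 degrees of freedom

# 95% confidence interval: estimate +/- critical value * standard error
lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta


def reject(beta_null):
    """Reject H0: beta = beta_null if it lies outside the interval."""
    return not (lower <= beta_null <= upper)


print((lower, upper))
print(reject(0.5), reject(0.35))
```

Both approaches give the same conclusion for a two-sided test at the same significance level.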
The t-ratio
Multiple Linear Regression Model
• In order to obtain the parameter estimates, $\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k$, we would minimise the
RSS with respect to all the βs. It can be shown that:
$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1} X'y$
• To estimate the variance of the errors, we use $s^2 = \frac{\hat{u}'\hat{u}}{T - k}$
where k = number of regressors. It can be proved that the OLS estimator of
the variance of $\hat{\beta}$ is given by the diagonal elements of
$s^2 (X'X)^{-1}$
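The matrix formulas can be sketched in pure Python for the smallest interesting case, k = 2 (a constant and one regressor), where $(X'X)$ is 2×2 and can be inverted by hand. The data and helper names are hypothetical; in practice a linear algebra library would be used for larger k.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply conformable matrices a and b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Invert a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


x = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]          # hypothetical dependent variable
X = [[1.0, xi] for xi in x]             # T x k matrix with a column of ones
Y = [[yi] for yi in y]                  # T x 1 vector

Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))
beta_hat = matmul(XtX_inv, matmul(Xt, Y))       # (X'X)^{-1} X'y

fitted = matmul(X, beta_hat)
resid = [Y[t][0] - fitted[t][0] for t in range(len(y))]
T, k = len(X), len(X[0])
s2 = sum(u * u for u in resid) / (T - k)        # u'u / (T - k)

# Variances of the estimates are the diagonal elements of s^2 (X'X)^{-1}
var_beta = [[s2 * v for v in row] for row in XtX_inv]
std_errors = [var_beta[j][j] ** 0.5 for j in range(k)]
print(beta_hat, s2, std_errors)
```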
Goodness of Fit
• TSS = ESS + RSS => $\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2$
where the part which we have explained is known as the explained sum of
squares (ESS), and the part which cannot be explained using the model and is
due to random factors is the residual sum of squares (RSS).
• Our goodness of fit statistic is $R^2 = \frac{ESS}{TSS}$
• But since TSS = ESS + RSS, we can also write $R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$
• $R^2$ must always lie in [0, 1]. To understand this, consider two extremes
RSS = TSS i.e. ESS = 0 so $R^2$ = ESS/TSS = 0
ESS = TSS i.e. RSS = 0 so $R^2$ = ESS/TSS = 1
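The decomposition TSS = ESS + RSS and the two expressions for R² can be verified numerically. The sketch fits a simple OLS line (with an intercept) to hypothetical data and then computes the three sums of squares; the function names are illustrative.

```python
def fit_line(x, y):
    """Simple OLS with an intercept; returns (alpha_hat, beta_hat)."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    beta_hat = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
                / sum((a - x_bar) ** 2 for a in x))
    return y_bar - beta_hat * x_bar, beta_hat


def sums_of_squares(y, fitted):
    """Return (TSS, ESS, RSS) for actual values y and fitted values."""
    y_bar = sum(y) / len(y)
    tss = sum((v - y_bar) ** 2 for v in y)
    ess = sum((f - y_bar) ** 2 for f in fitted)
    rss = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return tss, ess, rss


x = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]          # hypothetical dependent variable
alpha_hat, beta_hat = fit_line(x, y)
fitted = [alpha_hat + beta_hat * xi for xi in x]

tss, ess, rss = sums_of_squares(y, fitted)
r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(tss, ess, rss, r2_from_ess, r2_from_rss)
```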
$R^2 = 0$ and $R^2 = 1$
(Figure: two scatter plots of $y_t$ against $x_t$, one illustrating $R^2 = 0$ and one illustrating $R^2 = 1$.)