
Mathematics, Statistics & Econometrics

Foundations: Revisit

1
Mathematical Foundations

2
Functions

• A function is a mapping or relationship between an input or set of inputs


and an output
• We write that y, the output, is a function f of x, the input, or y = f(x)
• y could be a linear function of x where the relationship can be expressed
on a straight line
• Or it could be non-linear where it would be expressed graphically as a
curve
• If the equation is linear, we would write the relationship as
y = a + bx
where y and x are called variables and a and b are parameters
• a is the intercept and b is the slope or gradient

3
Straight Lines

• The intercept is the point at which the line crosses the y-axis
• Example: suppose that we were modelling the relationship between a
student’s average mark, y (in percent), and the number of hours studied
per year, x
• Suppose that the relationship can be written as a linear function
y = 25 + 0.05x
• The intercept, a, is 25 and the slope, b, is 0.05
• This means that with no study (x=0), the student could expect to earn a
mark of 25%
• For every hour of study, the grade would on average improve by 0.05%,
so another 100 hours of study would lead to a 5% increase in the mark

4
Plot of Hours Studied Against Mark Obtained

5
Straight Lines

• In the graph above, the slope is positive


– i.e. the line slopes upwards from left to right
• But in other examples the gradient could be zero or negative
• For a straight line the slope is constant – i.e. the same along the whole line
• In general, we can calculate the slope of a straight line by taking any two
points on the line and dividing the change in y by the change in x
• ∆ (Delta) denotes the change in a variable
• For example, take two points x=100, y=30 and x=1000, y=75
• We can write these using coordinate notation (x,y) as (100,30) and
(1000,75)
• We would calculate the slope as Δy/Δx = (75 − 30)/(1000 − 100) = 45/900 = 0.05

6
Roots

• The point at which a line crosses the x-axis is known as the root

• A straight line will have one root (except for a horizontal line such as y=4
which has no roots)

• To find the root of an equation, set y to zero and rearrange


0 = 25 + 0.05x

• So the root is x = −500

• In this case it does not have a sensible interpretation: the number of hours
of study required to obtain a mark of zero!
7
Quadratic Functions

• A linear function is often not sufficiently flexible to accurately describe


the relationship between two series
• We could use a quadratic function instead. We would write it as
y = a + bx + cx2
where a, b, c are the parameters that describe the shape of the function
• Quadratics have an additional parameter compared with linear functions
• The linear function is a special case of a quadratic where c=0
• a still represents where the function crosses the y-axis
• As x becomes very large, the x2 term will come to dominate
• Thus if c is positive, the function will be ∪-shaped, while if c is negative it
will be ∩-shaped.
8
The Roots of Quadratic Functions

• A quadratic equation has two roots


• The roots may be distinct (i.e., different from one another), or they may
be the same (repeated roots); they may be real numbers (e.g., 1.7, -2.357,
4, etc.) or what are known as complex numbers
• The roots can be obtained either by factorising the equation (contracting it
into parentheses), by ‘completing the square’, or by using the formula
x = (−b ± √(b2 − 4ac)) / 2c

9
The Roots of Quadratic Functions (Cont’d)

• If b2 > 4ac, the function will have two unique roots and it will cross the x-
axis in two separate places

• If b2 = 4ac, the function will have two equal roots and it will only cross
the x-axis in one place

• If b2 < 4ac, the function will have no real roots (only complex roots) and it
will not cross the x-axis at all; the whole curve will lie above the x-axis if c is
positive, or below it if c is negative.

10
Calculating the Roots of Quadratics - Examples

Determine the roots of the following quadratic equations:

1. y = x2 + x − 6
2. y = 9x2 + 6x + 1
3. y = x2 − 3x + 1
4. y = x2 − 4x

11
Calculating the Roots of Quadratics - Solutions

• We solve these equations by setting them in turn to zero


• We could use the quadratic formula in each case, although it is usually
quicker to determine first whether they factorise

1. x2 + x − 6 = 0 factorises to (x − 2)(x + 3) = 0 and thus the roots are 2 and


−3, which are the values of x that set the function to zero. In other words,
the function will cross the x-axis at x = 2 and x = −3

2. 9x2 + 6x + 1 = 0 factorises to (3x + 1)(3x + 1) = 0 and thus the roots are


−1/3 and −1/3. This is known as repeated roots – since this is a quadratic
equation there will always be two roots but in this case they are both the
same.
12
Calculating the Roots of Quadratics – Solutions Cont’d

3. x2 − 3x + 1 = 0 does not factorise and so the formula must be used


with a = 1, b = −3, c = 1 and the roots are 0.38 and 2.62 to two decimal
places

4. x2 − 4x = 0 factorises to x(x − 4) = 0 and so the roots are 0 and 4.

• All of these equations have two real roots


• But if we had an equation such as y = 3x2 − 2x + 4, this would not
factorise and would have complex roots since b2 − 4ac < 0 in the
quadratic formula.
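
• As a quick cross-check of these roots, the short sketch below (assuming NumPy is
available; the coefficient lists simply restate the four equations above plus the
complex-root example) passes each set of coefficients to np.roots:

```python
# A minimal sketch, assuming NumPy: verify the quadratic roots numerically.
import numpy as np

# Coefficients are listed from the highest power down, e.g. x^2 + x - 6 -> [1, 1, -6].
equations = {
    "x2 + x - 6": [1, 1, -6],       # roots 2 and -3
    "9x2 + 6x + 1": [9, 6, 1],      # repeated root -1/3
    "x2 - 3x + 1": [1, -3, 1],      # roots 0.38 and 2.62
    "3x2 - 2x + 4": [3, -2, 4],     # complex roots, since b2 - 4ac < 0
}

for label, coeffs in equations.items():
    print(label, "->", np.roots(coeffs))
```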

13
The Exponential Function, e

• It is sometimes the case that the relationship between two variables is best
described by an exponential function
• For example, when a variable grows (or reduces) at a rate in proportion to
its current value, we would write y = ex
• e is simply a number: 2.71828. . .
• It is also useful for capturing the increase in value of an amount of money
that is subject to compound interest
• The exponential function can never be negative, so when x is negative, y is
close to zero but positive
• It crosses the y-axis at one and the slope increases at an increasing rate
from left to right.

14
A Plot of the Exponential Function

15
Logarithms

• Logarithms were invented to simplify cumbersome calculations, since


exponents can then be added or subtracted, which is easier than
multiplying or dividing the original numbers

• There are at least three reasons why log transforms may be useful.
1. Taking a logarithm can often help to rescale the data so that their variance is
more constant, which overcomes a common statistical problem known as
heteroscedasticity.
2. Logarithmic transforms can help to make a positively skewed distribution
closer to a normal distribution.
3. Taking logarithms can also be a way to make a non-linear, multiplicative
relationship between variables into a linear, additive one.

16
How do Logs Work?

• Consider the power relationship 2³ = 8


• Using logarithms, we would write this as log₂ 8 = 3, or ‘the log to the base
2 of 8 is 3’
• Hence we could say that a logarithm is defined as the power to which the
base must be raised to obtain the given number
• More generally, if aᵇ = c, then we can also write logₐ c = b
• If we plot a log function, y = log(x), it would cross the x-axis at one – see
the following slide
• It can be seen that as x increases, y increases at a slower rate, which is the
opposite to an exponential function where y increases at a faster rate as x
increases.

17
A Graph of a Log Function

18
How do Logs Work?

• Natural logarithms, also known as logs to base e, are more commonly


used and more useful mathematically than logs to any other base
• A log to base e is known as a natural or Naperian logarithm, denoted
interchangeably by ln(y) or log(y)
• Taking a natural logarithm is the inverse of taking an exponential, so
sometimes the exponential function is called the antilog
• The log of a number less than one will be negative, e.g. ln(0.5) ≈ −0.69
• We cannot take the log of a negative number
– So ln(−0.6), for example, does not exist.

19
The Laws of Logs

For variables x and y:

• ln (x y) = ln (x) + ln (y)
• ln (x/y) = ln (x) − ln (y)
• ln (yc) = c ln (y)
• ln (1) = 0
• ln (1/y) = ln (1) − ln (y) = −ln (y)
• ln(ex) = x ln(e) = x
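
• As a quick numerical check of these laws, the sketch below (assuming only Python’s
standard math module; the values of x, y and c are arbitrary illustrations) confirms
each identity with math.isclose:

```python
# A small sketch verifying the laws of logs numerically (standard library only).
import math

x, y, c = 4.0, 2.0, 3.0
print(math.isclose(math.log(x * y), math.log(x) + math.log(y)))   # ln(xy) = ln(x) + ln(y)
print(math.isclose(math.log(x / y), math.log(x) - math.log(y)))   # ln(x/y) = ln(x) - ln(y)
print(math.isclose(math.log(y ** c), c * math.log(y)))            # ln(y^c) = c ln(y)
print(math.isclose(math.log(1.0), 0.0))                           # ln(1) = 0
print(math.isclose(math.log(math.exp(x)), x))                     # ln(e^x) = x
```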

20
Sigma Notation

• If we wish to add together several numbers (or observations from


variables), the sigma or summation operator can be very useful
• Σ means ‘add up all of the following elements.’ For example, Σ(1 + 2 + 3) = 6
• In the context of adding the observations on a variable, it is helpful to add
‘limits’ to the summation
• For instance, we might write ∑ᵢ₌₁⁴ xi ,
where the i subscript is an index, 1 is the lower limit and 4 is the upper
limit of the sum
• This would mean adding all of the values of x from x1 to x4.

21
Properties of the Sigma Operator

22
Pi Notation

• Similar to the use of sigma to denote sums, the pi operator (Π) is used to
denote repeated multiplications.

• For example, ∏ᵢ₌₁ⁿ xi = x1 × x2 × · · · × xn
means ‘multiply together all of the xi for each value of i between the lower
and upper limits’.

• It also follows that

23
Differential Calculus

• The rate of change of one variable with respect to another is measured by
a mathematical derivative
• If the relationship between the two variables can be represented by a
curve, the gradient of the curve will be this rate of change
• Consider a variable y that is a function f of another variable x, i.e. y = f (x):
the derivative of y with respect to x is written dy/dx, or sometimes f ′(x).
• This term measures the instantaneous rate of change of y with respect to x,
or in other words, the impact of an infinitesimally small change in x
• Notice the difference between the notations Δy and dy
24
Differentiation: The Basics

1. The derivative of a constant is zero – e.g. if y = 10, dy/dx = 0


This is because y = 10 would be a horizontal straight line on a graph of y
against x, and therefore the gradient of this function is zero
2. The derivative of a linear function is simply its slope
e.g. if y = 3x + 2, dy/dx = 3
• But non-linear functions will have different gradients at each point along
the curve
• In effect, the gradient at each point is equal to the gradient of the tangent
at that point
• The gradient will be zero at the point where the curve changes direction
from positive to negative or from negative to positive – this is known as a
turning point.
25
The Tangent to a Curve

26
The Derivative of a Power Function or of a Sum

• The derivative of a power function n of x, i.e. y = cxn is given by


dy/dx = cnxn−1
• For example:
– If y = 4x3, dy/dx = (4 × 3)x2 = 12x2
– If y = 3/x = 3x−1, dy/dx= (3 × −1)x−2 = −3x−2 = −3/x2

• The derivative of a sum is equal to the sum of the derivatives of the


individual parts: e.g., if y = f (x) + g (x), dy/dx = f ′(x) + g′(x)
• The derivative of a difference is equal to the difference of the derivatives
of the individual parts: e.g., if y = f (x) − g (x), dy/dx = f ′(x) − g′(x).

27
The Derivatives of Logs and Exponentials

• The derivative of the log of x is given by 1/x, i.e. d(log(x))/dx = 1/x


• The derivative of the log of a function of x is the derivative of the
function divided by the function, i.e. d(log(f (x)))/dx = f ′(x)/f (x)
E.g., the derivative of log(x3 + 2x − 1) is (3x2 + 2)/(x3 + 2x − 1)

• The derivative of ex is ex.


• The derivative of e f (x) is given by f ′(x)e f (x)
E.g., if y = e3x2, dy/dx = 6xe3x2

• The differentiation of a composite function is carried out via the
so-called “chain rule”
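
• These rules are easy to verify by symbolic differentiation; the sketch below
(assuming the SymPy library; the particular functions are the examples quoted above)
applies the power, log and chain rules:

```python
# A sketch using SymPy (assumed available) to confirm the differentiation rules above.
import sympy as sp

x = sp.symbols('x')

print(sp.diff(4 * x**3, x))                  # power rule: 12*x**2
print(sp.diff(sp.log(x**3 + 2*x - 1), x))    # log of a function: (3*x**2 + 2)/(x**3 + 2*x - 1)
print(sp.diff(sp.exp(3 * x**2), x))          # chain rule: 6*x*exp(3*x**2)
```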
28
Higher Order Derivatives

• It is possible to differentiate a function more than once to calculate the


second order, third order, . . ., nth order derivatives
• The notation for the second order derivative, which is usually just termed
the second derivative, is d2y/dx2

• To calculate second order derivatives, differentiate the function with


respect to x and then differentiate it again
• For example, suppose that we have the function y = 4x5 + 3x3 + 2x + 6,
the first order derivative is dy/dx = 20x4 + 9x2 + 2

29
Higher Order Derivatives (Cont’d)

• The second order derivative is d2y/dx2 = 80x3 + 18x

• The second order derivative can be interpreted as the gradient of the


gradient of a function – i.e., the rate of change of the gradient
• How can we tell whether a particular turning point is a maximum or a
minimum?
• The answer is that we would look at the second derivative
• When a function reaches a maximum, its second derivative is negative,
while it is positive for a minimum (optimization).

30
Optimization / Maxima and Minima of Functions

• Consider the quadratic function y = 5x2 + 3x − 6


• Since the squared term in the equation has a positive sign (i.e., it is 5
rather than, say, −5), the function will have a ∪-shape rather than an
∩-shape, and thus it will have a minimum rather than a maximum:
dy/dx = 10x + 3, d2y/dx2 = 10
• Since the second derivative is positive, the function indeed has a
minimum
• To find where this minimum is located, take the first derivative, set it to
zero and solve it for x
• So we have 10x + 3 = 0, and x = −3/10 = −0.3. If x = −0.3, y is found by
substituting −0.3 into y = 5x2 + 3x − 6 = 5 × (−0.3)2 + (3 × −0.3) − 6 =
−6.45. Therefore, the minimum of this function is found at (−0.3,−6.45).
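
• The sketch below (again assuming SymPy; the function is the quadratic from this
slide) repeats this optimisation symbolically: solve dy/dx = 0, substitute back, and
check the sign of the second derivative:

```python
# A sketch, assuming SymPy: locate and classify the turning point of y = 5x^2 + 3x - 6.
import sympy as sp

x = sp.symbols('x')
y = 5 * x**2 + 3 * x - 6

dy = sp.diff(y, x)                  # first derivative: 10*x + 3
x_star = sp.solve(dy, x)[0]         # turning point at x = -3/10

print(x_star, y.subs(x, x_star))    # -3/10 and -129/20 (= -6.45)
print(sp.diff(y, x, 2))             # second derivative: 10 > 0, so a minimum
```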
31
Partial Differentiation

• In the case where y is a function of more than one variable (e.g.


y = f (x1, x2, . . . , xn)), it may be of interest to determine the effect that
changes in each of the individual x variables would have on y
• Differentiation of y with respect to only one of the variables, holding the
others constant, is partial differentiation
• The partial derivative of y with respect to a variable x1 is usually denoted
∂y/∂x1
• All of the rules for differentiation explained above still apply and there
will be one (first order) partial derivative for each variable on the right
hand side of the equation.

32
How to do Partial Differentiation

• We calculate these partial derivatives one at a time, treating all of the


other variables as if they were constants.
• To give an illustration, suppose y = 3x1³ + 4x1 − 2x2⁴ + 2x2², the partial
derivative of y with respect to x1 would be ∂y/∂x1 = 9x1² + 4, while the
partial derivative of y with respect to x2 would be ∂y/∂x2 = −8x2³ + 4x2
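
• A short SymPy sketch (assumed library; the function is the one in the illustration
above) reproduces both partial derivatives, treating the other variable as a constant:

```python
# A sketch, assuming SymPy: partials of y = 3*x1**3 + 4*x1 - 2*x2**4 + 2*x2**2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
y = 3 * x1**3 + 4 * x1 - 2 * x2**4 + 2 * x2**2

print(sp.diff(y, x1))   # 9*x1**2 + 4
print(sp.diff(y, x2))   # -8*x2**3 + 4*x2
```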

33
Integration

• Indefinite Integration is the opposite of differentiation


• If we integrate a function and then differentiate the result, we get back
the original function

• Definite Integration is used to calculate the area under a curve (between


two specific points)

34
Integration

Given dy/dx, find the function y = f (x) that it came from:

y = f (x)        dy/dx
?                8x
?                20x9
?                15x2
?                ex + 2
?                8x + 1/x


Integration

y = f (x)        dy/dx
4x2              8x
3x + 2           3
2x10             20x9
5x3              15x2
ex + 2x          ex + 2
4x2 + ln (x)     8x + 1/x


Indefinite integral

d/dx (4x2) = 8x

… so the inverse operation is ….

∫ 8x dx = 4x2

The (indefinite) integral of 8x with respect to x is 4x2


Integration constant

d/dx (4x2) = 8x
d/dx (4x2 + 32) = 8x
d/dx (4x2 − π) = 8x

So which function has 8x as its derivative?

∫ 8x dx = 4x2 + c

We call c the constant/parameter of indefinite integration


Indefinite integral

• Part of the integration process involves adding a constant of integration in
order to recover the initial function f(x) from its derivative f ′(x)

∫ f ′(x) dx = F(x) + c = f(x)

This is known as the indefinite integral of f ’(x)

• There are basic formulas (rules) of integration for most known functions, which
comprise the inverse operation vis-à-vis differentiation
Definite integral

The definite integral of f (x) from a to b gives the area below the curve f(x) in
a Cartesian plane and is symbolized as:

A = ∫ₐᵇ f(x) dx

The Fundamental Theorem of integral calculus states that if F(x) denotes the
antiderivative of f(x), then the definite integral of f(x) is:

∫ₐᵇ f(x) dx = F(b) − F(a)
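
As a numerical illustration of the theorem, the sketch below (assuming SymPy; the
integrand 8x and the limits a = 1, b = 3 are purely illustrative) computes both the
antiderivative and the definite integral:

```python
# A sketch, assuming SymPy: the definite integral of 8x from 1 to 3 equals F(3) - F(1).
import sympy as sp

x = sp.symbols('x')
F = sp.integrate(8 * x, x)               # antiderivative 4*x**2 (constant of integration omitted)
area = sp.integrate(8 * x, (x, 1, 3))    # definite integral

print(F, area, F.subs(x, 3) - F.subs(x, 1))   # 4*x**2, 32, 32
```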
Matrix Algebra - Background

• Some useful terminology:


– A scalar is simply a single number (although it need not be a whole number
– e.g., 3, −5, 0.5 are all scalars)
– A vector is a one-dimensional array of numbers (see below for examples)
– A matrix is a two-dimensional collection or array of numbers. The size of a
matrix is given by its numbers of rows and columns (e.g., an Excel spreadsheet)
• Matrices are very useful and important ways for organising sets of data
together, which make manipulating and transforming them easy
• Matrices are widely used in econometrics and finance for solving
systems of linear equations, for deriving key results, and for expressing
formulae.

41
Working with Matrices

• The dimensions of a matrix are quoted as R × C, which is the number of


rows by the number of columns
• Each element in a matrix is referred to using subscripts.
• For example, suppose a matrix M has two rows and four columns. The
element in the second row and the third column of this matrix would be
denoted m23.
• More generally mij refers to the element in the ith row and the jth column.
• Thus a 2 × 4 matrix would have elements

• If a matrix has only one row, it is a row vector, which will be of


dimension 1 × C, where C is the number of columns, e.g.
( 2.7 3.0 −1.5 0.3 )
42
Working with Matrices

• A matrix having only one column is a column vector, which will be of


dimension R× 1, where R is the number of rows, e.g.

• When the number of rows and columns is equal (i.e. R = C), it would be
said that the matrix is square, e.g. the 2 × 2 matrix:

• A matrix in which all the elements are zero is a zero matrix.

43
Working with Matrices 2

• A symmetric matrix is a special square matrix that is symmetric about the


leading diagonal so that mij = mji ∀ i, j, e.g.

• A diagonal matrix is a square matrix which has non-zero terms on the


leading diagonal and zeros everywhere else, e.g.

44
Working with Matrices 3

• A diagonal matrix with 1 in all places on the leading diagonal and zero
everywhere else is known as the identity matrix, denoted by I, e.g.

• The identity matrix is essentially the matrix equivalent of the number one
• Multiplying any matrix by the identity matrix of the appropriate size
results in the original matrix being left unchanged
• So for any matrix M, MI = IM = M
• In order to perform operations with matrices, they must be conformable
• The dimensions of matrices required for them to be conformable depend
on the operation.
45
Matrix Addition or Subtraction

• Addition and subtraction of matrices requires the matrices concerned to


be of the same order (i.e. to have the same number of rows and the same
number of columns as one another)
• The operations are then performed element by element

46
Matrix Multiplication

• Multiplying or dividing a matrix by a scalar (that is, a single number),


implies that every element of the matrix is multiplied (or divided) by that number

• More generally, for two matrices A and B of the same order and for c a
scalar, the following results hold
– A+B=B+A
– A+0=0+A=A
– cA = Ac
– c(A + B) = cA + cB
– A0 = 0A = 0

47
Matrix Multiplication

• Multiplying two matrices together requires the number of columns of the


first matrix to be equal to the number of rows of the second matrix
• Note that the ordering of the matrices is important in multiplication:
AB ≠ BA
• When the matrices are multiplied together, the resulting matrix will be of
size (number of rows of first matrix × number of columns of second
matrix), e.g.
(3 × 2) × (2 × 4) = (3 × 4).
• More generally, (a × b) × (b × c) ×(c × d) × (d × e) = (a × e), etc.
• In general, matrices cannot be divided by one another.
– Instead, we multiply by the inverse.

48
Matrix Multiplication Example

• The actual multiplication of the elements of the two matrices is done by


multiplying along the rows of the first matrix and down the columns of
the second
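
• A tiny NumPy sketch (assumed library; the matrices are made-up illustrations) shows
the conformability requirement and the dimensions of a product:

```python
# A sketch, assuming NumPy: a (3 x 2) matrix times a (2 x 4) matrix gives a (3 x 4) matrix.
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])             # 3 x 2
B = np.array([[7, 8, 9, 10],
              [11, 12, 13, 14]])   # 2 x 4

C = A @ B                          # rows of A multiplied down the columns of B
print(C.shape)                     # (3, 4)
print(C)
```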

49
The Transpose of a Matrix

• The transpose (or prime) of a matrix, written A′ or AT, is the matrix


obtained by transposing (switching) the rows and columns of a matrix

• If A is of dimensions R × C, A′ will be C × R.

50
The Rank of a Matrix

• The rank of a matrix A is given by the maximum number of linearly independent


rows (or columns). For example,

• In the first case, all rows and columns are (linearly) independent of one another,
but in the second case, the second column is not independent of the first (the
second column is simply twice the first)
• A matrix with a rank equal to its dimension is a matrix of full rank
• A matrix whose rank is less than full is known as a short-rank matrix, and is singular
• Three important results:
- Rank(A) = Rank (A′);
- Rank(AB) ≤ min(Rank(A), Rank(B));
- Rank (A′A) = Rank (AA′) = Rank (A)
51
The Inverse of a Matrix

• The inverse of a matrix A, where defined and denoted A−1, is that matrix
which, when pre-multiplied or post multiplied by A, will result in the
identity matrix, i.e. AA−1 = A−1A = I
• The inverse of a matrix exists only when the matrix is square and
non-singular, i.e., the determinant is non-zero: |A| ≠ 0
• Properties of the inverse of a matrix include:
– I−1 = I
– (A−1)−1 = A
– (A′)−1 = (A−1)′
– (AB)−1 = B−1A−1

52
Calculating Inverse of a 2×2 Matrix

• The inverse of a 2 × 2 non-singular matrix whose elements are


will be

• The expression (ad − bc) is the determinant of the matrix, and will be a
scalar. The determinant is defined ONLY for square matrices
• If the matrix is

the inverse will be

• As a check, multiply the two matrices together and it should give the
identity matrix I.
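
• The sketch below (assuming NumPy; the 2 × 2 matrix is an arbitrary non-singular
example) computes the determinant ad − bc, inverts the matrix, and performs exactly
this check against the identity matrix:

```python
# A sketch, assuming NumPy: invert a 2x2 matrix and check that A @ A_inv = I.
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 6.0]])

det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # ad - bc = 8, so A is non-singular
A_inv = np.linalg.inv(A)

print(det)
print(A_inv)
print(np.allclose(A @ A_inv, np.eye(2)))      # True: the product is the identity matrix
```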

53
Complex Numbers

If the discriminant of a quadratic equation is negative, the equation has no


real solution. For example, the equation
x2 + 4 = 0
has no real solution. If we try to solve this equation, we get
x2 = –4, so x = ±√–4

But this is impossible, since the square of any real number is nonnegative. [For
example, (–2)2 = 4, a positive number.]

Thus, negative numbers don’t have real square roots.


Complex Numbers

To make it possible to solve all quadratic equations, mathematicians


invented an expanded number system, called the complex number system.

First they defined the new number (the imaginary unit) i = √–1

This means that i2 = –1.

A complex number is then a number of the form a + bi, where a and b are
real numbers.
Complex Numbers

Note that both the real and imaginary parts of a complex number are real
numbers.
Complex Numbers

Have both a real and imaginary part

Z = 5 + 3i

Imaginary part
Real part

General form: z = x +iy


Example

The following are examples of complex numbers.

3 + 4i Real part 3, imaginary part 4

Real part , imaginary part

6i Real part 0, imaginary part 6

–7 Real part –7, imaginary part 0


Arithmetic Operations on Complex Numbers

Complex numbers are added, subtracted, multiplied, and divided just as


we would any number of the form a + ib

The only difference that we need to keep in mind is that i2 = –1. Thus, the
following calculations are valid.

(a + bi)(c + di) = ac + (ad + bc)i + bdi2 Multiply and collect like terms

= ac + (ad + bc)i + bd(–1) i2 = –1

= (ac – bd) + (ad + bc)i Combine real and imaginary parts


Arithmetic Operations on Complex Numbers

We therefore define the sum, difference, and product of complex numbers


as follows.
Complex conjugate

Division of complex numbers is much like rationalizing the denominator


of a radical expression.

For the complex number z = a + bi we define its complex conjugate to be


z̄ = a – bi.

Note that
z z̄ = (a + bi)(a – bi) = a2 + b2
So the product of a complex number and its conjugate is always a
nonnegative real number.
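
Python’s built-in complex type follows exactly these rules; the sketch below (the
particular numbers are illustrative) reproduces multiplication, conjugation and
division via the conjugate:

```python
# A sketch using Python's built-in complex numbers (j plays the role of i).
z1 = 3 + 4j
z2 = 1 - 2j

print(z1 * z2)               # multiply and collect like terms, using 1j**2 = -1
print(z2.conjugate())        # complex conjugate of 1 - 2i is 1 + 2i
print(z2 * z2.conjugate())   # a^2 + b^2 = 5 (a real, nonnegative number)
print(z1 / z2)               # division works by multiplying through by the conjugate
```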
Arithmetic Operations on Complex Numbers

We use this property to divide complex numbers.


Example Dividing Complex Numbers

Express the following in the form a + bi.

Solution:
We multiply both the numerator and denominator by the complex
conjugate of the denominator to make the new denominator a real number.
cont’d
Solution

The complex conjugate of 1 – 2i is 1 + 2i.


Square Roots of Negative Numbers

Just as every positive real number r has two square roots, √r and –√r,


every negative number has two square roots as well.
Square Roots of Negative Numbers

√–2 · √–3 = i√2 · i√3 = i2 √6 = –√6,
but
√(–2)(–3) = √6,
so
√–2 · √–3 ≠ √(–2)(–3)

When multiplying radicals of negative numbers, express them first in the
form i√r (where r > 0) to avoid possible errors of this type.
Complex Solutions of Quadratic Equations

We have already seen that if a ≠ 0, then the solutions of the quadratic


equation ax2 + bx + c = 0 are x = (–b ± √(b2 – 4ac)) / 2a

If b2 – 4ac < 0, then the equation has no real solution.

But in the complex number system, this equation will always have
solutions, because negative numbers have square roots in this expanded
setting.
cont’d
Example

Solve x2 + 4x + 5 = 0

By the Quadratic Formula we have
x = (–4 ± √(16 – 20)) / 2 = (–4 ± √–4) / 2 = (–4 ± 2i) / 2 = –2 ± i

So the solutions are –2 + i and –2 – i.


Taylor Series and Maclaurin Series

The next theorem gives the form that every convergent power series must
take.

The coefficients of the power series are precisely the coefficients of the Taylor
polynomials for f(x) at c. For this reason, the series is called the Taylor series
for f(x) at c.
Taylor Series and Maclaurin Series
Example – Forming a Power Series

Use the function f(x) = sin x to form the Maclaurin series

∑ f (n)(0) xn / n!  =  f(0) + f ′(0)x + f ′′(0)x2/2! + f ′′′(0)x3/3! + · · ·

and determine the interval of convergence.

Solution:
Successive differentiation of f(x) yields

f(x) = sin x f(0) = sin 0 = 0


f'(x) = cos x f'(0) = cos 0 = 1
f''(x) = –sin x f''(0) = –sin 0 = 0
f(3)(x) = –cos x f(3)(0) = –cos 0 = –1
Example – Solution
cont’d

f(4)(x) = sin x f(4)(0) = sin 0 = 0


f(5)(x) = cos x f(5)(0) = cos 0 = 1
and so on.

The pattern repeats after the third derivative.


Example – Solution
cont’d

So, the power series is as follows:

sin x = x − x3/3! + x5/5! − x7/7! + · · ·
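
A quick numerical check (standard library only; the value x = 1.2 and the number of
terms are arbitrary) shows how fast the truncated Maclaurin sum approaches sin x:

```python
# A sketch comparing a truncated Maclaurin series for sin x with math.sin.
import math

def maclaurin_sin(x, n_terms=6):
    """Partial sum x - x^3/3! + x^5/5! - ... with n_terms terms."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(n_terms))

x = 1.2
print(maclaurin_sin(x), math.sin(x))   # the partial sum is already very close to sin(1.2)
```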


Taylor Series and Maclaurin Series

You cannot conclude that the power series converges to sin x for all x.

You can simply conclude that the power series converges to some function, but
you are not sure what function it is.

This is a subtle, but important, point in dealing with Taylor or Maclaurin


series.

To persuade yourself that the series

might converge to a function other than f, remember that the derivatives are
being evaluated at a single point.
Taylor Series and Maclaurin Series

It can easily happen that another function will agree with the values of
f (n)(x) when x = c and disagree at other x-values.

If you formed the power series for


the function shown in Figure
you would obtain the same series
as in Example

You know that the series converges


for all x, and yet it obviously cannot
converge to both f(x) and sin x
for all x .
Taylor Series and Maclaurin Series

Let f have derivatives of all orders in an open interval I centered at c.

The Taylor series for f may fail to converge for some x in I. Or, even if it
is convergent, it may fail to have f(x) as its sum.

Nevertheless, the Theorem tells us that for each n,

where
Taylor Series and Maclaurin Series

Note that in this remainder formula, the particular value of z that makes the
remainder formula true depends on the values of x and n. If Rn(x) → 0 as n → ∞,
then the next theorem tells us that the Taylor series for f actually converges to
f(x) for all x in I.
Taylor Series and Maclaurin Series
Binomial Series

Before presenting the basic list for elementary functions, you


will develop one more series — for a function of the form
f(x) = (1 + x)k

This produces the binomial series


Example – Binomial Series

Find the Maclaurin series for f(x) = (1 + x)k and determine its radius of
convergence. Assume that k is not a positive integer and k ≠ 0.

Solution:
By successive differentiation, you have
f(x) = (1 + x)k f(0) = 1
f'(x) = k(1 + x)k – 1 f'(0) = k
f''(x) = k(k – 1)(1 + x)k – 2 f''(0) = k(k – 1)
f'''(x) = k(k – 1)(k – 2)(1 + x)k – 3 f'''(0) = k(k – 1)(k – 2)
. .
.

f (n)(x) = k(k – 1)…(k – n + 1)(1 + x)k – n f (n)(0) = k(k – 1)…(k – n + 1)


Example – Binomial Series
cont’d

which produces the series:

Because an+1/an → 1, you can apply the Ratio Test to conclude that
the radius of convergence is R = 1.

So, the series converges to some function in the interval (–1, 1).
Basic Taylor Series
Statistical Foundations

84
Distributions: The population and the sample

• The population is the total collection of all objects to be studied.


• The population may be either finite or infinite, while a sample is a
selection of just some items from the population.
• A population is finite if it contains a fixed number of elements.
• In general, either not all of the observations for the entire population will
be available, or they may be so many in number that it is infeasible to work
with them, in which case a sample of data is taken for analysis.
• The sample is usually random, and it should be representative of the
population of interest.
• A random sample is one in which each individual item in the population is
equally likely to be drawn.

85
Probability and probability distributions
- Some definitions

• A random variable can take any value from a given set


• A discrete random variable can take on only certain specific values (e.g.,
the sum of two dice thrown)
• A probability is the likelihood of a particular event happening
• A probability distribution function shows the outcomes that are possible
from a random process and how likely each one is to occur
• A continuous random variable can take any value (possibly only within a
fixed range), and the probabilities associated with each range of outcomes
is shown in a probability density function (pdf)
• The probability that a continuous variable takes on a specific value is
always zero, since the variable could be defined to any arbitrary degree of
accuracy (0.1 vs 0.1000001…. etc.) and thus we can only calculate the
probability that the variable lies within a particular range.
• There are many continuous distributions, including the uniform and the
normal, Student, Chi-square, F etc.
86
The normal distribution

• The normal (Gaussian) distribution is the most commonly used in statistics


• It has many desirable properties and is easy to work with
• It is unimodal (has only one peak) and symmetric
• The moments of a distribution describe its properties.
• The first two moments of a distribution are its mean and variance
respectively
• Only knowledge of the mean and variance is required to completely
describe the Normal distribution
• A normal distribution has a skewness of zero and a kurtosis of 3 (excess
kurtosis of zero)
• Skewness and kurtosis are the (standardised) third and fourth moments of
the distribution respectively
• All moments are derived from the moment generating function.
87
The normal distribution 2

88
A plot of the pdf for a normal distribution

89
Other important distributions

• Three other important continuous distributions are the chi-squared (χ2),


the F and the t (sometimes known as Student’s t)
• These distributions are all related to the normal and to each other
• The sum of squares of n independent standard normal random
variables will follow a chi-squared distribution with n degrees
of freedom
• The ratio of two independent chi-squared variates, each divided by its
degrees of freedom (n1 and n2), will follow an F-distribution with n1 and n2
degrees of freedom
• The t distribution tends to the normal as its degrees of freedom
increase to infinity
• Each of these distributions will be discussed in detail later at the point
used.
90
Descriptive stats: Measures of central tendency

• The average value of a series is its measure of location or measure of


central tendency, capturing its ‘typical’ behaviour
• There are three broad methods to calculate the average value of a series: the
mean, median and mode
• The mean is the very familiar sum of all N observations divided by N.
More strictly, this is known as the arithmetic mean, adding all values in the
data set and then dividing by the number of obs.
• The mode is the most frequently occurring value in a set of observations
• The median is the middle value in a series when the observations are
arranged in ascending order (corresponding to the 50th percentile); it is
the value at or below which 50% of the data values fall
• Each of the three methods of calculating an average has advantages and
disadvantages
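
• A small sketch (standard library statistics module; the marks are invented
illustrative data) computes all three measures of central tendency at once:

```python
# A sketch: mean, median and mode of a small, made-up set of marks.
import statistics as st

marks = [45, 55, 55, 60, 62, 70, 88]
print(st.mean(marks), st.median(marks), st.mode(marks))   # 62.14..., 60, 55
```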
91
Measures of spread

• The spread of a series about its mean value can be measured using the
variance or standard deviation (which is the square root of the variance)
• This quantity is an important measure of risk in finance
• The standard deviation scales with the data, whereas the variance scales
with the square of the data. So, for example, if the units of the data points
are US dollars, the standard deviation will also be measured in dollars
whereas the variance will be in dollars squared
• Other measures of spread include the range (the difference between the
largest and smallest of the data points) and the semi-interquartile range
(half the difference between the first (25%) and third (75%) quartile points
in the series)
• The coefficient of variation divides the standard deviation by the sample
mean to obtain a unit-free measure of spread that can be compared across
series with different scales.
92
Higher moments

• The higher moments of a data sample give further indications of its features
and shape.

• Skewness is the standardised third moment of a distribution and indicates the


extent to which it is asymmetric

• Kurtosis is the standardised fourth moment and measures whether a series is


‘fat’ or ‘thin’ tailed

• Skewness can be positive or negative while kurtosis can only be positive


(and equal to 3 for the Normal)

• The formulae for skewness and kurtosis calculate the quantities using the
sample data in the same way that the variance is calculated
93
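
• As an illustration, the sketch below (assuming NumPy and SciPy; the data are a
simulated normal sample) computes sample skewness and kurtosis; note that SciPy
reports excess kurtosis unless fisher=False is passed:

```python
# A sketch, assuming SciPy: skewness and kurtosis of a simulated normal sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=10_000)

print(stats.skew(sample))                    # close to 0 for a normal sample
print(stats.kurtosis(sample, fisher=False))  # close to 3 (fisher=False gives raw kurtosis)
```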
Plot of a positive skewed series versus a normal
distribution

94
Plot of a leptokurtic df (fat-tailed, kurtosis >3) versus a
Normal distribution

95
Measures of association

• Covariance is a linear measure of association between two random


variables.
Cov(x, y) = ∑ (x − x̄)(y − ȳ) / (N − 1)
• It is simple to calculate but scales with the standard deviations of the two
variables
• Correlation is another measure of association that is calculated by dividing
the covariance between two variables by the product of their standard
deviations
• Correlations are unit-free and must lie between [-1,+1]
• The correlation calculated in this way is more specifically known as
Pearson’s correlation measure.
• An alternative measure is known as Spearman’s rank correlation measure – this is
preferable when the data are a long way from following a normal distribution
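
• The sketch below (assuming NumPy; the two short series are invented) computes the
covariance, rescales it by the standard deviations, and confirms the result against
NumPy’s own correlation function:

```python
# A sketch, assuming NumPy: covariance and Pearson correlation of two made-up series.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov = np.cov(x, y, ddof=1)[0, 1]                       # sum((x - xbar)(y - ybar)) / (N - 1)
corr = cov / (np.std(x, ddof=1) * np.std(y, ddof=1))   # unit-free, lies in [-1, +1]

print(cov, corr)
print(np.corrcoef(x, y)[0, 1])                         # should match corr
```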
96
Econometrics/Regression
Foundations

97
Introduction:
The Nature and Purpose of Econometrics

• What is Econometrics?

• Literal meaning is “measurement in economics”.

• Definition of financial econometrics:


The application of statistical and mathematical techniques to problems in
finance.
Types of Data

• There are several types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1. & 2.
4. Big data

• The data may be quantitative (e.g. exchange rates, stock prices, number of
shares outstanding), or qualitative (e.g. day of the week).
Steps involved in the formulation of
econometric models

Economic or Financial Theory (Previous Studies)

Formulation of an Econometrically Estimable Theoretical Model

Collection of Data

Model Estimation

Is the Model Statistically Adequate?

No Yes

Reformulate Model Interpret Model

Use for Analysis & Extrapolation


Finding the “Best Fit” line

• We can use the general equation for a straight line,


y=a+bx
to get the line that best “fits” the data.

• But y=a+bx is completely deterministic. Is this realistic? No.

• So what we do is to add a random term, u into the equation.


yt = α + βxt + ut
where t = 1,2,3,4,5

101
Why do we include a random term?

• The disturbance term can capture:

- “Omitted” or “neglected” determinants of yt


- Errors in the measurement of yt that cannot be modelled.
- Random outside influences on yt which we cannot model
- Ignorance…

102
Determining the Regression Coefficients

• So how do we determine what α and β are?


• Choose α and β so that the distances from the data points to the fitted line
are minimised:

103
Ordinary Least Squares Method

• The most common method used to fit a line to the data is known as
OLS (ordinary least squares).

• Take each distance and square it (i.e. take the area of each of the
squares in the diagram) and minimise the total sum of the squares
(hence least squares).

• In general:
yt denotes the actual data point t
ŷt denotes the fitted value from the regression line
ût denotes the residual, yt - ŷt

104
Actual and Fitted Value

[Diagram: at a given xi, the vertical distance between the actual value yi and the fitted value ŷi is the residual ûi]

105
OLS (1)

• So minimise û1² + û2² + û3² + û4² + û5², i.e. minimise ∑ ût² over t = 1, …, 5. This is
known as the residual sum of squares.

• But what was ût ? It was the difference between the actual point and the line,
yt - ŷt .

• So minimising ∑ (yt − ŷt)² is equivalent to minimising ∑ ût²
with respect to α̂ and β̂.

106
OLS (2)

• But ŷt = α̂ + β̂xt , so let L = ∑t (yt − ŷt)² = ∑t (yt − α̂ − β̂xt)²

• We want to minimise L with respect to (w.r.t.) α̂ and β̂, so differentiate L w.r.t.
α̂ and β̂:

∂L/∂α̂ = −2 ∑t (yt − α̂ − β̂xt) = 0   (1)

∂L/∂β̂ = −2 ∑t xt (yt − α̂ − β̂xt) = 0   (2)

107
OLS (3)

• It can be shown that

β̂ = (∑ xt yt − T x̄ ȳ) / (∑ xt² − T x̄²)   and   α̂ = ȳ − β̂ x̄

or, equivalently,

β̂ = ∑ (xt − x̄)(yt − ȳ) / ∑ (xt − x̄)²

108
Estimator or Estimate?

• Estimator is the formula/function used to calculate the coefficients, or a


function of the obs. of the sample used to calculate the population
parameters

• Estimate is the actual numerical value for the coefficients, or the value of
the estimator for a particular sample.

109
Properties of the OLS Estimator

• If assumptions hold, then the estimators are known as Best Linear Unbiased
Estimators (BLUE).

• “Estimator” - β̂ is an estimator of the true value of β.


• “Linear” - β̂ is a linear estimator
• “Unbiased” - On average, the actual value of the α̂ and β̂’s will be equal to
the true values.
• “Best” - The OLS estimator has minimum variance among the class of
linear unbiased estimators.

110
Precision and Standard Errors

• Any set of regression estimates of α̂ and β̂ are specific to the sample used in their
estimation.

• What we need is some measure of the reliability or precision of the estimators.


The precision of the estimate is given by its standard error. Given the BLUE
assumptions before, the standard errors can be shown to be given by

SE(α̂) = √Var(α̂) = s √( ∑xt² / (T ∑(xt − x̄)²) ) = s √( ∑xt² / (T ∑xt² − T² x̄²) ) ,

SE(β̂) = √Var(β̂) = s √( 1 / ∑(xt − x̄)² ) = s √( 1 / (∑xt² − T x̄²) )

where s is the estimated standard deviation of the residuals ût


• An unbiased estimator of σ is given by

s = √( ∑ ût² / (T − 2) )
111
Example: How to Calculate the Parameters and
Standard Errors

• Assume we have the following data calculated from a regression of y on a


single variable x and a constant over 22 observations.
• Data:
∑ xt yt = 830102 , T = 22 , x̄ = 416.5 , ȳ = 86.65 ,
∑ xt² = 3919654 , RSS = 130.6

• Calculations:

β̂ = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35

α̂ = 86.65 − 0.35 × 416.5 = −59.12

• We write ŷt = α̂ + β̂ xt
ŷt = −59.12 + 0.35 xt

112
Example (cont’d)

• SE(regression), s = √( ∑ ût² / (T − 2) ) = √( 130.6 / 20 ) = 2.55

SE(α̂) = 2.55 × √( 3919654 / ((22 × 3919654) − 22² × 416.5²) ) = 3.35

SE(β̂) = 2.55 × √( 1 / (3919654 − 22 × 416.5²) ) = 0.0079

• We now write the results as

ŷt = −59.12 + 0.35 xt
      (3.35)   (0.0079)
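
• The sketch below (pure Python, using only the summary statistics quoted in this
example) reproduces these numbers from the formulae on the previous slides:

```python
# A sketch reproducing the example: coefficients and standard errors from summary statistics.
import math

sum_xy, T, xbar, ybar = 830102.0, 22, 416.5, 86.65
sum_x2, rss = 3919654.0, 130.6

beta_hat = (sum_xy - T * xbar * ybar) / (sum_x2 - T * xbar**2)
alpha_hat = ybar - beta_hat * xbar
s = math.sqrt(rss / (T - 2))                                   # SE of the regression

se_alpha = s * math.sqrt(sum_x2 / (T * (sum_x2 - T * xbar**2)))
se_beta = s * math.sqrt(1.0 / (sum_x2 - T * xbar**2))

print(beta_hat, alpha_hat)   # roughly 0.35 and -59.1
print(s, se_alpha, se_beta)  # roughly 2.55, 3.35 and 0.0079
```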

113
Hypothesis Testing

• We will always have two hypotheses that go together, the null hypothesis (denoted H0)
and the alternative hypothesis (denoted H1).
• The null hypothesis is the statement or the statistical hypothesis that is actually being
tested. The alternative represents the remaining outcomes of interest.
• For example:
H0 : β = 0.5
H1 : β ≠ 0.5
This would be known as a two sided test.

• If we have prior information that, e.g. β > 0.5 rather than β < 0.5, we would do a one-
sided test:
H0 : β = 0.5
H1 : β > 0.5 or H1 : β < 0.5

• There are two ways to conduct a hypothesis test: via the test of significance approach
or via the confidence interval approach.
114
Testing Hypotheses: I) The Test of Significance Approach

• After estimating α̂, β̂ and SE(α̂), SE(β̂) in the usual way, we calculate the test
statistic. This is given by the formula (proof in the book)

test statistic = (β̂ − β*) / SE(β̂)
where β * is the value of β under the null hypothesis.

• This test statistic follows a Student t-distribution with T-2 degrees of freedom
• Then we need to choose a “significance level”, often denoted α. This determines the
region where we will reject or not reject the null hypothesis that we are testing.
• We use the t-tables to obtain a critical value/values with which to compare the test
statistic.
• Finally we perform the test: If the test statistic lies in the rejection region then reject
the null hypothesis (H0), else do not reject H0
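
• The sketch below (assuming SciPy for the t critical value; the numbers re-use the
earlier example, β̂ = 0.35 and SE(β̂) = 0.0079, and test H0: β = 0.5) walks through
exactly these steps:

```python
# A sketch of the test of significance approach for H0: beta = 0.5, two-sided alternative.
from scipy import stats

beta_hat, beta_null, se_beta, T = 0.35, 0.5, 0.0079, 22

test_stat = (beta_hat - beta_null) / se_beta   # (beta_hat - beta*) / SE(beta_hat)
t_crit = stats.t.ppf(1 - 0.025, df=T - 2)      # 5% significance, two-sided, T-2 df

print(test_stat, t_crit)
print(abs(test_stat) > t_crit)                 # True -> test statistic lies in the rejection region
```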

115
Determining the Rejection Region for a Test of
Significance

For a 2-sided test:

[Diagram of the distribution f(x): a 2.5% rejection region in each tail and a 95% non-rejection region in the middle]

116
The Rejection Region for a 1-Sided Test (Upper Tail)

[Diagram of the distribution f(x): a 95% non-rejection region and a single 5% rejection region in the upper tail]

117
II) The Confidence Interval Approach to Hypothesis Testing

1. Calculate α̂, β̂ and SE(α̂), SE(β̂) as before.

2. Choose a significance level, α, (again the convention is 5%). This is equivalent to


choosing a (1-α)×100% confidence interval, i.e. 5% significance level = 95%
confidence interval

3. Use the t-tables to find the appropriate critical value, which will again have T-2
degrees of freedom.

4. The confidence interval is given by ( β̂ − tcrit × SE(β̂) , β̂ + tcrit × SE(β̂) )

5. Perform the test: If the hypothesised value β* lies outside the confidence interval,
then reject the H0: β = β*, otherwise do not reject the null.

118
The t-ratio

• The test of significance approach to hypothesis testing using a t-test was:


test statistic = (β̂i − βi*) / SE(β̂i)

• If the test is H0 : βi = 0, H1 : βi ≠ 0, i.e. a test that the population coefficient is


zero against a two-sided alternative, this is known as a t-ratio test

• Since β i* = 0, the ratio of the coefficient to its SE is the t-ratio or t-statistic:


test stat = β̂i / SE(β̂i) = t-ratio

119
Multiple Linear Regression Model

• We could write out a separate equation for every value of t (obs.):

y1 = β1 + β2 x21 + β3 x31 + ... + βk xk1 + u1
y2 = β1 + β2 x22 + β3 x32 + ... + βk xk2 + u2
⋮             ⋮             ⋮
yT = β1 + β2 x2T + β3 x3T + ... + βk xkT + uT
• We can write this in matrix form
y = Xβ + u

where y is T × 1, X is T × k (1st column of ones), β is k × 1, u is T × 1


• Now the RSS would be given by û′û = [û1 û2 … ûT][û1 û2 … ûT]′ = û1² + û2² + ... + ûT² = ∑ ût²
120
The OLS Estimator & St. Errors for the MLR Model

• In order to obtain the parameter estimates, β1, β2,..., βk, we would minimise the
RSS with respect to all the βs. It can be shown that:

β̂ = (β̂1, β̂2, …, β̂k)′ = (X′X)⁻¹ X′y

• To estimate the variance of the errors, we use s² = û′û / (T − k)
where k = number of regressors. It can be proved that the OLS estimator of
the variance of β̂ is given by the diagonal elements of
s² (X′X)⁻¹
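
• The sketch below (assuming NumPy; the simulated data set and true coefficients are
purely illustrative) implements these matrix formulae directly:

```python
# A sketch, assuming NumPy: OLS for the multiple regression model y = X*beta + u.
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])   # first column of ones
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=0.3, size=T)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y       # (X'X)^(-1) X'y
residuals = y - X @ beta_hat
s2 = residuals @ residuals / (T - k)              # estimated error variance
var_beta_hat = s2 * np.linalg.inv(X.T @ X)        # covariance matrix of beta_hat

print(beta_hat)                                   # close to the true coefficients
print(np.sqrt(np.diag(var_beta_hat)))             # standard errors from the diagonal
```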
121
Goodness of Fit

• How well does our regression model actually fit the data?

• TSS = ESS + RSS  =>  ∑t (yt − ȳ)² = ∑t (ŷt − ȳ)² + ∑t ût²

where the part which we have explained is known as the explained sum of
squares (ESS), and the part which cannot be explained using the model and is
due to random factors is the residual sum of squares (RSS).

• Our goodness of fit statistic is R2 = ESS / TSS
• But since TSS = ESS + RSS, we can also write
R2 = ESS/TSS = (TSS − RSS)/TSS = 1 − RSS/TSS

• R2 must always lie in [0, 1]. To understand this, consider two extremes
RSS = TSS i.e. ESS = 0 so R2 = ESS/TSS = 0
ESS = TSS i.e. RSS = 0 so R2 = ESS/TSS = 1
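
• The sketch below (assuming NumPy; the actual and fitted values are invented) computes
R2 as 1 − RSS/TSS for a well-fitting line:

```python
# A sketch: goodness of fit R^2 = 1 - RSS/TSS for some made-up actual and fitted values.
import numpy as np

y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])        # actual values y_t
y_hat = np.array([2.1, 4.0, 6.0, 8.0, 10.0])    # fitted values from a regression line

tss = np.sum((y - y.mean())**2)                 # total sum of squares
rss = np.sum((y - y_hat)**2)                    # residual sum of squares
r_squared = 1 - rss / tss

print(r_squared)                                # close to 1 because the fit is good
```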
122
R2 = 0 and R2 = 1

[Two scatter diagrams of yt against xt: one where the fitted line explains none of the variation in yt (R2 = 0) and one where all of the data points lie exactly on the fitted line (R2 = 1)]

123
