Lecture Topic 4

This document discusses key probability distributions including the binomial, normal, and standard normal distributions. It provides information on their properties and how to calculate probabilities using the distributions. Formulas are given for the mean, variance, and probability mass/density functions. Examples are provided to demonstrate how to apply the distributions to calculate probabilities of outcomes. Tables for the standard normal distribution are introduced as a way to look up cumulative probabilities.

Uploaded by

dhadkan
Econ6034: Econometrics and Business Statistics
Topic 4: Some Important Probability Distributions

1 / 60
Probability Distributions

I Discrete probability distributions
(Ref: Keller Chapter 7-4)
Binomial distribution

I Continuous probability distributions
(Ref: Keller Chapter 8-1, 8-2, 8-4)
Normal distribution
Chi-square distribution
F distribution
t distribution

2 / 60
The Binomial Distribution

I A Bernoulli process has the following properties:

There are two possible outcomes, which we call success and failure.
The probability of a success is p; the probability of a failure is (1 − p).
The trials are independent: the result of one trial does not affect the result of any other trial.

I If a fixed number, n, of Bernoulli trials is undertaken, the random variable representing the number of successes in the n trials has a binomial distribution.

3 / 60
The Binomial Distribution

I Notation: X ∼ Bin(n, p)

Examples:

I Flip a coin 10 times, X = number of heads.
X is a binomial random variable, X ∼ Bin(n, p) with n = 10, p = 0.5.

I Survey 1000 people, X = number of people who think the current PM is doing a good job.
X is a binomial random variable, n = 1000, p = ?

I Check the quality of 100 products made at a factory, X = number of products that pass quality control.
X is a binomial random variable, n = 100, p = ?

4 / 60
Binomial Random Variable

I The number of successes in n trials, denoted by X.
I Possible values: 0, 1, 2, ..., n
I Discrete random variable
I Formula to calculate probabilities:

\[ P(X = x) = \frac{n!}{x!\,(n-x)!}\; p^{x} (1-p)^{n-x} \]

Note: n! = n × (n − 1) × (n − 2) × ... × 2 × 1
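The probability formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `binom_pmf` is made up for this example and is not part of the lecture material. It uses `math.comb`, which computes n!/(x!(n − x)!) directly.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): n!/(x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Coin example from the earlier slide: 10 flips of a fair coin,
# probability of exactly 5 heads.
p_five_heads = binom_pmf(5, 10, 0.5)

# Sanity check: the probabilities over all possible values 0..n add up to 1.
total = sum(binom_pmf(x, 10, 0.5) for x in range(11))
```

Here `p_five_heads` is 252/1024, roughly 0.246, so even the most likely outcome of ten fair flips occurs less than a quarter of the time.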

5 / 60
Binomial Random Variable
Example:
A student sitting an econometrics quiz decides to answer each of the 10 multiple choice questions entirely by chance.

I Each question has 5 options, only one of which is correct.
I Let X be the number of questions the student answers correctly.

Is this a binomial experiment? Check the conditions:

I There is a fixed, finite number of trials (n = 10).
I An answer can be either correct or incorrect.
I Each answer is independent of the others.
I The probability of a correct answer (P(success) = 0.20) does not change from question to question.

Then X ∼ Bin(n = 10, p = 0.2).

6 / 60
Example cont.

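I What is the probability the student gets no answers correct?

\[ P(X = x) = \frac{n!}{x!\,(n-x)!}\; p^{x} (1-p)^{n-x} \]

\[
\begin{aligned}
P(X = 0) &= \frac{10!}{0!\,(10-0)!}\; 0.2^{0}\, (1-0.2)^{10-0} \\
         &= 1 \times 1 \times 0.8^{10} \\
         &= 0.10737 \quad \text{(5 dp)}
\end{aligned}
\]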

7 / 60
Example cont.

I What is the probability that the student passes (i.e. gets 5 or more correct)?

\[
\begin{aligned}
P(X \ge 5) &= P(X = 5) + P(X = 6) + P(X = 7) \\
           &\quad + P(X = 8) + P(X = 9) + P(X = 10)
\end{aligned}
\]

= a lot of formulas to calculate!

It is easier to use the complement:

\[
\begin{aligned}
P(X \ge 5) &= 1 - P(X \le 4) \\
           &= 1 - \left[ P(X = 4) + P(X = 3) + P(X = 2) + P(X = 1) + P(X = 0) \right]
\end{aligned}
\]

8 / 60
Using Excel

=binomdist(x, n, p, cumulative)

If cumulative = FALSE, the function returns the probability that there are exactly x successes.

If cumulative = TRUE, the function returns the cumulative distribution (the probability that there are at most x successes).

P(X ≤ 4) = binomdist(4, 10, 0.2, TRUE) = 0.9672
P(X ≥ 5) = 1 − binomdist(4, 10, 0.2, TRUE) = 1 − 0.9672 = 0.0328
P(X = 5) = binomdist(5, 10, 0.2, FALSE) = 0.026424
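For readers without Excel, the same three numbers can be reproduced with a small Python stand-in. The helper `binomdist` below is a hypothetical re-implementation written for this illustration (it only mimics the exact and cumulative modes described above); it is not Excel's own code.

```python
from math import comb

def binomdist(x: int, n: int, p: float, cumulative: bool) -> float:
    """Hypothetical stand-in for Excel's BINOMDIST, for illustration only."""
    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if cumulative:
        # P(X <= x): add up the exact probabilities for 0, 1, ..., x.
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)  # P(X = x)

p_at_most_4 = binomdist(4, 10, 0.2, True)   # P(X <= 4)
p_pass = 1 - p_at_most_4                    # P(X >= 5) via the complement
p_exactly_5 = binomdist(5, 10, 0.2, False)  # P(X = 5)
```

Rounded to the precision used on the slide, these evaluate to 0.9672, 0.0328 and 0.026424.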

9 / 60
Binomial Tables
Binomial tables (available on iLearn) give P(X ≤ k) for different values of k, n, and p.

10 / 60
Binomial Tables

11 / 60
I The probability that the student passes:

P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.967 = 0.033

I Find the probability the student gets exactly 2 questions right.

P (X = 2) = P (X ≤ 2) − P (X ≤ 1)
= 0.6778 − 0.3758
= 0.3020

12 / 60
Binomial Distribution

Formulas for the mean, variance and standard deviation of a binomial variable:

\[ \mu = E(X) = np \]

\[ \sigma^{2} = V(X) = np(1-p) \]

\[ \sigma = SD(X) = \sqrt{np(1-p)} \]
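As a quick check, these formulas can be applied to the quiz example (n = 10, p = 0.2) and compared with the mean and variance computed directly from the probabilities. This is a sketch for illustration; the variable names are chosen here and do not come from the slides.

```python
from math import comb, sqrt

n, p = 10, 0.2  # quiz example: 10 questions, 1-in-5 chance of guessing correctly

# Shortcut formulas from the slide.
mean = n * p
variance = n * p * (1 - p)
sd = sqrt(variance)

# The same quantities from first principles, using the binomial probabilities.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * q for x, q in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * q for x, q in enumerate(pmf))
```

A guessing student therefore expects 2 correct answers, with a standard deviation of about 1.26.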

13 / 60
Probability Density Function (PDF)
I A function f(x) is called a probability density function if it
satisfies the following requirements:

1. f(x) ≥ 0 for all x, that is, it must be non-negative.


2. The total area underneath the curve representing f(x) equals 1.

14 / 60
Notes

1. P(a < X < b) = area under the curve between a and b.

Evaluate as

\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx \]

This shows the link between probabilities and the pdf.

15 / 60
2. For a continuous random variable, the probability that X takes any specific value is zero. To see this, let a → b in P(a < X < b): the area under the curve shrinks to 0.

3. A continuous random variable has a mean and a variance (with exceptions).
The mean measures the location of the distribution.
The variance measures the spread of the distribution.

16 / 60
The Normal Distribution

I The normal distribution is the most important of all probability distributions.

Notation: X ∼ N(µ, σ²).

I The general form of the pdf is given by:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty \]
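The density formula can be written out directly in Python and compared with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the parameter values are assumptions made for this sketch only.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x-mu)/sigma)^2)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Height of the standard normal curve at its peak, x = mu = 0.
peak = normal_pdf(0.0, 0.0, 1.0)

# Cross-check against the standard library implementation.
reference = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)

# Symmetry about the mean: equal heights at mu - 2 and mu + 2.
left = normal_pdf(98.0, 100.0, 1.5)
right = normal_pdf(102.0, 100.0, 1.5)
```

Note that `peak` is a density (about 0.399), not a probability; probabilities come from areas under the curve.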

17 / 60
The Normal Distribution cont.

I Bell-shaped, symmetric about µ, reaches highest point at x=µ,


tends to zero as x → ± ∞ (range from minus infinity to plus
infinity).

18 / 60
Characteristics of the Normal Distribution

I involves a continuous variable;


I is symmetric about its mean;
I the median and the mode are equal to the mean;
I has a single mode (unimodal);
I is defined over −∞ < X < +∞;
I is shaped by its variance (σ 2 ); and
I is shifted by its mean (µ).

19 / 60
Normal Distribution cont.

20 / 60
Probabilities from the Normal Distribution

I Cumulative distribution function

\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \]

21 / 60
Probabilities from the Normal Distribution
I Probability

\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx \]

22 / 60
Area under the curve

I Calculate the area under the curve

\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \]

Not easy!

I Statistical tables are available that give P(X ≤ x) for a normal distribution.

I However, there is an infinite number of normal distributions (with different means and variances) out there!

23 / 60
The Standard Normal Distribution

I One statistical table is provided for the “standardized” normal distribution, N(0,1).

I The standard normal distribution has mean zero and variance one, denoted by N(0,1).

I Every normal distribution can be “standardized” to become a N(0,1) distribution.

I Hence, all we need is this one table, known as the z-table.

I The z-table assumes Z ∼ N(0,1) and provides P(Z ≤ z).

24 / 60
Standardizing a Normal Distribution

25 / 60
Standardizing a Normal Distribution

I Standardizing is the process of converting any normal random variable to a standard normal random variable.

I If X ∼ N(µ, σ²), then use the linear transformation below:

\[ Z = \frac{X-\mu}{\sigma} \sim N(0, 1) \]
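A short Python sketch shows why standardizing works: a probability computed on the original scale equals the probability computed on the Z scale after the transformation. The helper name `standardize` and the numbers (mean 100, standard deviation 1.5, cut-off 103) are illustrative assumptions.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """Convert a value x from N(mu, sigma^2) to the standard normal scale."""
    return (x - mu) / sigma

mu, sigma = 100.0, 1.5  # illustrative values
x = 103.0

z = standardize(x, mu, sigma)

# P(X <= x) on the original scale and P(Z <= z) on the standard scale agree.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)
```

Because the two probabilities match, a single table for N(0,1) is enough for any normal distribution.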

26 / 60
Z-Table

Use the z-table to find P(Z<z) for various values of z.

27 / 60
Z-Table (Cumulative Standardised Normal
Distribution)

28 / 60
Finding areas using Z-Table
P(Z<1.5)=?

29 / 60
Finding areas using Z-Table

P(Z<1.5)=0.9332

30 / 60
Finding areas using Z-Table

P(Z>1)=?

Hint: the area under the curve is equal to 1, therefore


P(Z>1) = 1 – P(Z ≤1).

31 / 60
Finding areas using Z-Table

P(Z>1) =1 – P(Z<1) = 1 – 0.8413 = 0.1587

32 / 60
Finding areas using Z-Table
P(Z < −1.52) = P(Z > 1.52). The normal distribution is symmetrical.

33 / 60
Finding areas using Z-Table

P(Z< -1.52)=P(Z>1.52)=1-P(Z ≤1.52)=1-0.9357=0.0643

34 / 60
Finding areas using Z-Table

P(1<Z<1.5)=P(Z<1.5) - P(Z<1)

35 / 60
Finding areas using Z-Table

P(1<Z<1.5)=P(Z<1.5) - P(Z<1) =0.9332-0.8413=0.0919

How to use the Z-Table

I Symmetry → P(Z < −a) = P(Z > a)
I Total area under the curve is 1 → P(Z > a) = 1 − P(Z ≤ a)
I Subtracting probabilities → P(a < Z < b) = P(Z < b) − P(Z < a)
I Drawing the graph and shading the regions of interest always helps!
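The three rules can also be checked numerically. In the sketch below, `phi` is a name chosen here for P(Z ≤ z); it uses Python's `statistics.NormalDist` in place of the printed z-table, so the last decimal can differ slightly from values built out of rounded table entries.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def phi(z: float) -> float:
    """P(Z <= z): the quantity tabulated in the z-table."""
    return std_normal.cdf(z)

p_below_1_5 = phi(1.5)               # direct lookup: P(Z < 1.5)
p_above_1 = 1 - phi(1.0)             # total area is 1: P(Z > 1)
p_below_neg = phi(-1.52)             # P(Z < -1.52)
p_above_pos = 1 - phi(1.52)          # symmetry: equals P(Z > 1.52)
p_between = phi(1.5) - phi(1.0)      # subtraction: P(1 < Z < 1.5)
```

These agree with the slide values 0.9332, 0.1587, 0.0643 and 0.0919 up to table rounding.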

36 / 60
Application Example 1

The lengths of metallic strips produced by a machine are normally distributed with a mean of 100 cm and a variance of 2.25 cm². Only strips that are between 98 and 103 cm are acceptable. What proportion of strips is acceptable?

I Let X be the length of a metallic strip in cm.
X ∼ N(µ = 100, σ² = 2.25)
Find the area P(98 < X < 103).

37 / 60
Application Example 1

38 / 60
Application Example 1

Standardize X = 98 and X = 103 from the X distribution, so we can determine the cut-off points of the shaded region in the Z distribution.

I Standardize X = 98

\[ Z = \frac{X-\mu}{\sigma} = \frac{98-100}{\sqrt{2.25}} = -1.33 \]

I Standardize X = 103

\[ Z = \frac{X-\mu}{\sigma} = \frac{103-100}{\sqrt{2.25}} = 2 \]

39 / 60
Application Example 1

40 / 60
Application Example 1

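I The shaded area is

\[
\begin{aligned}
P(98 < X < 103) &= P\left( \frac{98-100}{\sqrt{2.25}} < \frac{X-\mu}{\sigma} < \frac{103-100}{\sqrt{2.25}} \right) \\
                &= P(-1.33 < Z < 2) \\
                &= P(Z < 2) - P(Z < -1.33) \\
                &= 0.9772 - 0.0918 \\
                &= 0.8854
\end{aligned}
\]

I 88.54% of metallic strips produced are acceptable.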
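The same calculation can be sketched in Python, both without rounding and with the z-values rounded as in the table lookup. Variable names are chosen for this illustration. The unrounded answer is slightly larger than the table-based 0.8854 because −1.33 is a rounded version of −1.333...

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 100.0, 2.25
strip_length = NormalDist(mu, sqrt(variance))  # NormalDist takes the standard deviation

# Direct calculation on the original scale, no rounding of z.
p_acceptable = strip_length.cdf(103) - strip_length.cdf(98)

# Table-style calculation with z rounded to two decimals, as on the slide.
std_normal = NormalDist()
p_table_style = std_normal.cdf(2.00) - std_normal.cdf(-1.33)
```

Both routes give roughly 0.885 to 0.886, so about 88.5% of strips are acceptable either way.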

41 / 60
Application Example 2

Consider an investment whose return is normally distributed with a mean of 10% and a standard deviation of 5%.

I Determine the probability of losing money.
Let X = investment return (in %).
X ∼ N(10, 5²)
P(X < 0) = ?

I Standardize X = 0 from the X distribution, so we can determine the cut-off point of the shaded region in the Z distribution.

\[ Z = \frac{X-\mu}{\sigma} = \frac{0-10}{5} = -2 \]

42 / 60
Application Example 2

\[
\begin{aligned}
P(X < 0) &= P\left( \frac{X-\mu}{\sigma} < \frac{0-10}{5} \right) \\
         &= P(Z < -2) \\
         &= 0.0228
\end{aligned}
\]

The probability of losing money is 0.0228.

43 / 60
Application Example 2

I Find the probability of losing money when the standard deviation is equal to 10%.

X ∼ N(10, 10²)
P(X < 0) = ?

I Standardize X = 0 from the X distribution

\[ Z = \frac{X-\mu}{\sigma} = \frac{0-10}{10} = -1 \]

P(X < 0) = P(Z < −1) = 0.1587
The probability of losing money is 0.1587.
The larger the standard deviation, the riskier the investment.
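The comparison between the two standard deviations can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `prob_of_loss` is hypothetical and returns are measured in percent, as in the example.

```python
from statistics import NormalDist

def prob_of_loss(mean_return: float, sd_return: float) -> float:
    """P(X < 0) when the return X is N(mean_return, sd_return^2)."""
    return NormalDist(mean_return, sd_return).cdf(0.0)

p_loss_sd5 = prob_of_loss(10.0, 5.0)    # standard deviation of 5%
p_loss_sd10 = prob_of_loss(10.0, 10.0)  # standard deviation of 10%
```

With the same mean return, doubling the standard deviation raises the chance of a loss from about 0.023 to about 0.159.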

44 / 60
Application Example 3
Salaries of workers in a factory are normally distributed with mean $48,000 and standard deviation $3,500. What is the minimum salary of the top 20% of workers?

I Let X = salary of a worker.

I X ∼ N(48000, 3500²)

I Find the value x such that P(X > x) = 0.2.
We essentially have to work “backwards”: start with the probability, find the z-value, then de-standardize it to arrive at the associated X-value (in dollars).
45 / 60
Application Example 3

Find the cut-off z-value for the


shaded region.

That is, the z-value that satisfies


P(Z<z) = 0.8.

46 / 60
Application Example 3
P(Z<z) = 0.8
z = 0.84

47 / 60
Application Example 3

I The cut-off for the 80th percentile (the boundary of the top 20%) is 0.84 in the Z distribution.

I We now have to de-standardize this to find the corresponding value in the X distribution, N(48000, 3500²):

\[
\begin{aligned}
Z = \frac{x-\mu}{\sigma} &= 0.84 \\
\frac{x-48000}{3500} &= 0.84 \\
x &= 48000 + 0.84 \times 3500 = 50940
\end{aligned}
\]

I The minimum salary for the top 20% of workers is $50,940.
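Working "backwards" from a probability to a value is an inverse-cdf lookup. The sketch below uses `NormalDist.inv_cdf` from Python's standard library; the variable names are chosen for this illustration. The unrounded cut-off is a few dollars higher than $50,940 because the table value z = 0.84 is rounded.

```python
from statistics import NormalDist

salary = NormalDist(mu=48_000, sigma=3_500)

# z-value with 80% of the area below it (the table gives 0.84).
z_cutoff = NormalDist().inv_cdf(0.80)

# Unrounded salary cut-off for the top 20% of workers.
cutoff_exact = salary.inv_cdf(0.80)

# De-standardizing the rounded table value, as on the slide.
cutoff_table = 48_000 + 0.84 * 3_500
```

Checking in the forward direction, `1 - salary.cdf(cutoff_exact)` returns 0.2, the required upper-tail probability.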

48 / 60
Using Excel

I =NORMDIST(x, mean, standard deviation, cumulative)

I x: the value for which you want the distribution.

I If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function.

49 / 60
Linear Combinations of Normal Distributions

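I A linear combination of normal variables is also normally distributed.

I Thus if

\[ X \sim N\left(\mu_X, \sigma_X^{2}\right) \quad \text{and} \quad Y \sim N\left(\mu_Y, \sigma_Y^{2}\right), \]

then

\[ (\alpha X \pm \beta Y) \sim N\left( \alpha\mu_X \pm \beta\mu_Y,\; \alpha^{2}\sigma_X^{2} + \beta^{2}\sigma_Y^{2} \pm 2\alpha\beta\sigma_{XY} \right) \]

where α and β are constants, and σ_XY = cov(X, Y).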

50 / 60
Example

Consider a vending machine that sells canned cola and lemonade.


Each can of cola is sold at $1.20 while each can of lemonade is sold
at $1.50. It is known that mean daily sales of cola and lemonade are
200 cans and 180 cans respectively, while the standard deviations of
daily sales of cola and lemonade are 50 and 30 cans respectively.
Both daily sales of cola and daily sales of lemonade are normally
distributed and the covariance between the two variables is -1050.
What is the distribution of the total revenue that this vending
machine can make in a day?

51 / 60
Example
I Let X be the number of cans of cola sold in a day and Y be the number of cans of lemonade sold in a day. Then it is known that
X ∼ N(200, 50²), Y ∼ N(180, 30²)

I Let R = 1.2X + 1.5Y be the total daily revenue in dollars.

\[
\begin{aligned}
E(R) &= E(1.2X + 1.5Y) = E(1.2X) + E(1.5Y) \\
     &= 1.2\,E(X) + 1.5\,E(Y) = 1.2 \times 200 + 1.5 \times 180 = 510 \\
V(R) &= V(1.2X + 1.5Y) = V(1.2X) + V(1.5Y) + 2\,\mathrm{Cov}(1.2X, 1.5Y) \\
     &= 1.2^{2}\,V(X) + 1.5^{2}\,V(Y) + 2 \times 1.2 \times 1.5 \times \mathrm{Cov}(X, Y) \\
     &= 1.44 \times 50^{2} + 2.25 \times 30^{2} + 3.6 \times (-1050) \\
     &= 3600 + 2025 - 3780 = 1845
\end{aligned}
\]

Since both X and Y are normally distributed, (1.2X + 1.5Y) is also normally distributed. So R = 1.2X + 1.5Y ∼ N(510, 1845).
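The mean and variance of the revenue can be computed with a small helper that applies the linear-combination formulas for the "+" case. The function name `linear_combination_moments` is made up for this sketch, and the standard deviation at the end is an extra step not shown on the slide.

```python
from math import sqrt

def linear_combination_moments(a, b, mean_x, mean_y, var_x, var_y, cov_xy):
    """Mean and variance of a*X + b*Y."""
    mean = a * mean_x + b * mean_y
    variance = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return mean, variance

# Vending machine: cola at $1.20 (X), lemonade at $1.50 (Y).
mean_r, var_r = linear_combination_moments(
    a=1.2, b=1.5,
    mean_x=200, mean_y=180,
    var_x=50**2, var_y=30**2,
    cov_xy=-1050,
)
sd_r = sqrt(var_r)  # standard deviation of daily revenue, in dollars
```

The negative covariance lowers the revenue variance: with zero covariance it would be 3600 + 2025 = 5625 rather than 1845.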
52 / 60
We have looked at

I Normal distribution
I Standard normal distribution
I Linear combinations of normal distributions

Next: Important variants of the normal distribution


I Chi-square distribution
I (Student’s) t distribution
I F distribution.

53 / 60
Chi-Squared Distribution

I The sum of the squares of k independent standard normal variables follows a χ² distribution with k degrees of freedom.

X ∼ χ²(k)

I The mean and variance of a chi-squared random variable are

E(X) = k and Var(X) = 2k
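A small simulation can make this definition concrete: build chi-squared draws by summing squared standard normal draws, then compare the sample mean and variance with k and 2k. This is an illustrative sketch; the choice of k = 3, the sample size and the seed are arbitrary assumptions, and the sample values only approximate the theoretical ones.

```python
import random
from statistics import fmean, pvariance

def chi_squared_draw(k: int, rng: random.Random) -> float:
    """One chi-squared(k) draw: the sum of k squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable
k = 3
draws = [chi_squared_draw(k, rng) for _ in range(20_000)]

sample_mean = fmean(draws)    # should be close to k = 3
sample_var = pvariance(draws) # should be close to 2k = 6
```

Every draw is non-negative because it is a sum of squares, which is why the distribution starts at zero.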

54 / 60
Chi-Squared Distribution

The chi-squared distribution is not symmetrical.

55 / 60
Student t Distribution

I The ratio of a standard normal variable Z to the square root of an independent χ² variable divided by its degrees of freedom follows a Student’s t distribution:

\[ X = \frac{Z}{\sqrt{\chi^{2}_{k} / k}} \sim t_{k} \]

where k is the degrees of freedom.

I The Student t distribution is used extensively in statistical inference.

I The mean and variance of a Student t random variable are

\[ E(X) = 0 \quad \text{and} \quad Var(X) = \frac{k}{k-2}, \qquad k > 2 \]
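The construction of a t variable from a standard normal and an independent chi-squared variable can be simulated directly. This is a sketch under arbitrary assumptions (k = 10, 20,000 draws, a fixed seed); the sample mean and variance should land near 0 and k/(k − 2) = 1.25 but will not match exactly.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

def t_draw(k: int, rng: random.Random) -> float:
    """One t(k) draw: Z divided by sqrt(chi-squared(k) / k), with Z independent."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
    return z / sqrt(chi2 / k)

rng = random.Random(1)  # fixed seed so the run is repeatable
k = 10
draws = [t_draw(k, rng) for _ in range(20_000)]

sample_mean = fmean(draws)       # theory: 0
sample_var = pvariance(draws)    # theory: k / (k - 2) = 1.25
```

The variance above 1 reflects the heavier tails of the t distribution compared with N(0,1).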

56 / 60
Student t Distribution
I The Student t distribution is “mound” shaped and symmetrical
about its mean of zero.

I This distribution converges to the standard normal distribution as the degrees of freedom increase.

57 / 60
F Distribution

I The ratio of two independent χ² variables, each divided by its degrees of freedom, follows an F distribution (with one d.f. for the numerator and one d.f. for the denominator).

\[ X \sim F_{v_1, v_2} \]

Both parameters are degrees of freedom.
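The definition can be illustrated by simulating the ratio directly. This sketch assumes v1 = 5 and v2 = 20, a fixed seed and 20,000 draws, all chosen arbitrarily. The comparison value v2/(v2 − 2) for the mean is a standard result for v2 > 2 that is not stated on the slides.

```python
import random
from statistics import fmean

def chi_squared_draw(k: int, rng: random.Random) -> float:
    """One chi-squared(k) draw: the sum of k squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_draw(v1: int, v2: int, rng: random.Random) -> float:
    """One F(v1, v2) draw: ratio of two independent chi-squared variables,
    each divided by its own degrees of freedom."""
    return (chi_squared_draw(v1, rng) / v1) / (chi_squared_draw(v2, rng) / v2)

rng = random.Random(2)  # fixed seed so the run is repeatable
v1, v2 = 5, 20
draws = [f_draw(v1, v2, rng) for _ in range(20_000)]

sample_mean = fmean(draws)  # standard result (not from the slides): v2 / (v2 - 2)
smallest = min(draws)       # never below zero
```

All draws are non-negative because both numerator and denominator are sums of squares.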

58 / 60
F Distribution
I The F distribution is similar to the chi-squared distribution in that it starts at zero (is non-negative) and is not symmetrical.

59 / 60
Summary

I Discrete probability distribution


Binomial Distribution

I Continuous probability distribution


Normal distribution
Standard normal distribution
Linear combinations of normal distributions

Important variants of the normal distribution


I Chi-square distribution
I (Student’s) t distribution
I F distribution.

60 / 60
