Probability Density Functions

This document discusses probability density functions (PDFs) and how to construct and normalize them using experimental data. Key points:
1. A PDF is a smooth curve fit to a vertically normalized histogram; the area under it over a given range is the probability that a continuous variable lies in that range.
2. To construct a PDF from data: bin the data, calculate probabilities and bin midpoints, determine the PDF by dividing probabilities by bin widths, and plot the smoothed curve.
3. Normalizing the PDF transforms the variables so that the PDF is centered at zero and has a spread independent of the standard deviation, allowing comparison to standard distributions.
4. The document walks through constructing and normalizing the PDF of 1000 temperature measurements to demonstrate the process.

Probability Density Functions

Author: John M. Cimbala, Penn State University
Latest revision: 20 January 2010

Probability Density Functions
Probability density function: In simple terms, a probability density function (PDF) is constructed by drawing a smooth curve fit through the vertically normalized histogram, as sketched. You can think of a PDF as the smooth limit of a vertically normalized histogram if there were millions of measurements and a huge number of bins.
o The main difference between a
histogram and a PDF is that a
histogram involves discrete data
(individual bins or classes), whereas a PDF involves continuous data (a smooth curve).
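[Figure: sketch of a vertically normalized histogram with the smooth PDF curve f(x) drawn through it. Horizontal axis: x, with values $x_1$, $x_2$, $x_3$, ... marked; vertical axis: f(x).]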
o Mathematically, f(x) is defined as
$$f(x_i) = \frac{P\left(x_i - \frac{dx}{2} < x \le x_i + \frac{dx}{2}\right)}{dx},$$
where $P\left(x_i - \frac{dx}{2} < x \le x_i + \frac{dx}{2}\right)$ represents the probability that variable x lies in the given range, and f(x) is the probability density function (PDF). In other words, for the given infinitesimal range of width dx between $x_i - dx/2$ and $x_i + dx/2$, the integral under the PDF curve is the probability that a measurement lies within that range, as sketched.

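[Figure: sketch of f(x) versus x, with the area under the curve between $x_i - dx/2$ and $x_i + dx/2$ (a strip of width dx at $x_i$) shaded; the shaded area is $P\left(x_i - \frac{dx}{2} < x \le x_i + \frac{dx}{2}\right)$.]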
o As shown in the sketch, this probability is equal to the area (shaded blue region) under the f(x) curve, i.e., the integral under the PDF over the specified infinitesimal range of width dx.
o The usefulness of the PDF is as follows: Suppose we choose a range of variable x, say between a and b. The probability that a measurement lies between a and b is simply the integral under the PDF curve between a and b, as sketched, where we define the probability as
$$P(a < x \le b) = \int_{x=a}^{x=b} f(x)\,dx.$$
[Figure: sketch of f(x) versus x, with the area under the curve between x = a and x = b marked as $P(a < x \le b)$.]

o If $a \to -\infty$ and $b \to +\infty$, the probability must equal 1 (100%), i.e.,
$$P(-\infty < x < \infty) = \int_{x=-\infty}^{x=\infty} f(x)\,dx = 1.$$
In other words, the probability that x lies between $-\infty$ and $+\infty$ is 100% (a fact that should be obvious, since there are no other possibilities for real number x).
o Once we have defined the probability density function f(x), we leave the system of discrete random
variables and enter the system of continuous random variables, on which we make some more formal
definitions:
Expected value is defined in terms of the probability density function as the mean of all possible x values in the continuous system. Namely,
$$\mu = E(x) = \text{expected value} = \int_{-\infty}^{\infty} x f(x)\,dx.$$
In an ideal situation in which f(x) exactly represents the population, $\mu$ is the mean of the entire population of x values, and that is why it is called the expected value. It is therefore also called the population mean. In general, $\bar{x} \ne \mu$, but $\bar{x} \to \mu$ when n is large, i.e., the sample mean approaches the expected value when n is large. $\bar{x}$ and $\mu$ are often used interchangeably, but this should be done only if n is large.
Standard deviation is defined in terms of the PDF as
$$\sigma = \text{standard deviation} = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx}.$$
In an ideal situation in which f(x) exactly represents the population, $\sigma$ is the standard deviation of the entire population. It is therefore also called the population standard deviation. If n is large, $S \approx \sigma$. Often, S and $\sigma$ are used interchangeably, but this should be done only if n is large.
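
The integral definitions above can be checked numerically. The following is a minimal sketch and not part of the original notes: it assumes a hypothetical PDF, f(x) = exp(-x) for x ≥ 0, chosen only because its exact results are known (total area 1, μ = 1, σ = 1), and it uses a simple trapezoidal rule over a truncated range. The names `pdf` and `integrate` are illustrative.

```python
import math

def pdf(x):
    """Hypothetical PDF used only for illustration: f(x) = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0.0 else 0.0

def integrate(g, a, b, n=20000):
    """Trapezoidal-rule estimate of the integral of g(x) from a to b."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * h)
    return total * h

# The upper limit 50 stands in for +infinity; exp(-50) is negligible.
LO, HI = 0.0, 50.0

area = integrate(pdf, LO, HI)                    # total probability, should be close to 1
p_ab = integrate(pdf, 1.0, 2.0)                  # P(1 < x <= 2)
mu = integrate(lambda x: x * pdf(x), LO, HI)     # expected value
var = integrate(lambda x: (x - mu) ** 2 * pdf(x), LO, HI)
sigma = math.sqrt(var)                           # standard deviation

print(f"area = {area:.4f}, P(1 < x <= 2) = {p_ab:.4f}")
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
```

For a PDF estimated from measurements rather than given as a formula, the same integrals would be approximated by sums over the bins.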

Normalized probability density function: A normalized probability density function is constructed by transforming both the abscissa (horizontal axis) and ordinate (vertical axis) of the PDF plot as follows:
$$z = \frac{x - \mu}{\sigma} \quad \text{and} \quad f(z) = \sigma f(x).$$
o The above transformations accomplish two things:
The first transformation normalizes the abscissa such that the PDF is centered around z = 0.
The second transformation normalizes the ordinate such that the PDF is spread out in similar fashion regardless of the value of standard deviation.
o When normalized in this way, the normalized PDF can be directly compared to standard PDFs, which we
discuss in a later learning module.
o To summarize, here are several steps used in Excel to generate a normalized PDF of experimental data:
1. Generate the histogram with Excel as discussed in the histogram learning module. Excel generates a table called a frequency table. The table contains two columns, bin and frequency. Bin is the maximum value of the range of each bin, and frequency is the number of data points in that bin range.
(For example, suppose there are 200 data points total, the mean value of x is 10.0, and the standard deviation of the data set is 3.0. Also suppose that 8 of those data points lie in the bin with x between 4 and 6 ($4 < x \le 6$). Thus, for this bin, Bin = 6 and Frequency = 8.)
2. Create a new column called probability in which you divide each frequency by the total number of data points. This gives the probability that a data point lies in that bin, i.e., probability = frequency/n.
(In the example here, probability = 8/200 = 0.040 or 4.0%.)
3. Create a new column called $x_{\text{mid}}$ in which you list the mid value of each bin: $x_{\text{mid}} = (x_{\min} + x_{\max})/2$.
(In the example here, the mid value of the sample bin is (4 + 6)/2 = 5.0.)
4. Create a new column called f(x) in which you divide each probability by the appropriate bin width, i.e., $f(x) = \text{probability}/\Delta x$.
(In the example here, the bin width of the sample bin is $\Delta x = 6 - 4 = 2$, and f(x) = 0.04/2 = 0.02 at $x = x_{\text{mid}} = 5.0$.) A smoothed plot of f(x) versus x is the PDF.
5. Create a new column called z in which you normalize the x values into nondimensional z values. This is accomplished by converting each mid value of x into z: $z = (x - \mu)/\sigma$.
(In the example here, z for the sample bin is $z = (5.0 - 10.0)/3.0 = -1.667$.)
6. Create a new column called f(z) in which you normalize the PDF into the f(z) values. This is accomplished by converting each f(x) into f(z): $f(z) = \sigma f(x)$.
(In the example here, f(z) of the sample bin is f(z) = 0.02*3.0 = 0.060 at $z = -1.667$.)
7. Finally, a plot of f(z) vs. z can be generated. A smooth curve through these data represents the normalized PDF.
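
Steps 2 through 6 can also be sketched outside Excel. The Python function below is an illustrative sketch rather than the procedure used in these notes; the function name `normalized_pdf_row` and its argument names are assumptions. It computes the derived columns for one row of the frequency table and is applied to the sample bin from the steps above (200 data points, mean 10.0, standard deviation 3.0, 8 points in the bin from 4 to 6).

```python
def normalized_pdf_row(x_min, x_max, frequency, n, mean, std):
    """Return the derived columns for one bin of a frequency table.

    x_min, x_max : lower and upper limits of the bin
    frequency    : number of data points in the bin
    n            : total number of data points
    mean, std    : mean and standard deviation of the data set
    """
    probability = frequency / n          # step 2
    x_mid = (x_min + x_max) / 2          # step 3
    f_x = probability / (x_max - x_min)  # step 4: divide by the bin width
    z = (x_mid - mean) / std             # step 5
    f_z = std * f_x                      # step 6
    return {"probability": probability, "x_mid": x_mid,
            "f_x": f_x, "z": z, "f_z": f_z}

# Sample bin from the text.
row = normalized_pdf_row(x_min=4.0, x_max=6.0, frequency=8,
                         n=200, mean=10.0, std=3.0)
for name, value in row.items():
    print(f"{name:12s} {value:.3f}")
```

For the sample bin this prints probability 0.040, x_mid 5.000, f_x 0.020, z -1.667, and f_z 0.060. Applying the function to every row of the frequency table gives the columns needed for the plot in step 7.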

Example:
Given: The same 1000 temperature measurements used in a previous example for generating a histogram.
The data are provided in an Excel spreadsheet (Temperature_data_analysis.xls) on the website.
To do: Generate a PDF of these data. Normalize the PDF.
Solution:
o In a previous example (see the Histogram learning module), we generated a histogram of the temperature
data. We begin with the bin and frequency data generated in Excel.
o To generate the PDF, we follow the step-by-step instructions provided above. This will be shown in class in Excel. The vertically normalized PDF is shown below (left side).
[Figure: two plots related by "Transform": the vertically normalized PDF (left) and the fully normalized PDF (right).]
o Finally, we transform to normalized variables; the fully normalized PDF is shown above (right side). Notice that the shape is the same, but the variable transformation to f(z) is nondimensional, making it more useful for comparison with other probability density distributions.
o The final PDF should be continuous, not discrete. Because of scatter, it is difficult to get Excel to draw a
smooth curve through these data. For lack of a better method at this point, we sketch the smooth curve
by eye below:

Discussion:
o The peak in the vertically normalized PDF occurs at $x \approx 31$, which is very close to the sample mean. This peak transforms to $z \approx 0$ in the fully normalized PDF; this is a useful feature of the normalization.
o We can estimate the area under the f(x) curve by eye by counting squares; the area is indeed approximately 1.0 or 100%, as it must be.
o We can also estimate the area under the f(z) curve by eye; it is approximately 1.0 or 100%, as it also must be.
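
The area checks in this discussion can also be done numerically rather than by counting squares. The sketch below is illustrative only: the actual temperature measurements are in the spreadsheet named above and are not reproduced here, so it uses 1000 synthetic values. The mean of 31, the standard deviation of 2, the choice of 20 bins, and all variable names are assumptions. It bins the data, builds f(x) and f(z), and sums PDF value times bin width for each.

```python
import math
import random
import statistics

# Stand-in data: synthetic values, NOT the measurements from the spreadsheet.
random.seed(0)
data = [random.gauss(31.0, 2.0) for _ in range(1000)]

n = len(data)
mean = statistics.mean(data)
std = statistics.stdev(data)          # sample standard deviation, S

# Equal-width bins spanning the data (the bin count is arbitrary).
n_bins = 20
lo, hi = min(data), max(data)
width = (hi - lo) / n_bins

# Each bin covers lower < x <= upper, as in the text; the minimum value
# is placed in the first bin.
freq = [0] * n_bins
for x in data:
    k = math.ceil((x - lo) / width) - 1
    k = min(max(k, 0), n_bins - 1)
    freq[k] += 1

rows = []
for k, count in enumerate(freq):
    x_mid = lo + (k + 0.5) * width
    f_x = count / n / width           # vertically normalized histogram
    z = (x_mid - mean) / std
    f_z = std * f_x                   # fully normalized PDF
    rows.append((x_mid, f_x, z, f_z))

# Area = sum of (PDF value) * (bin width); the bin width in z is width / std.
area_x = sum(f_x * width for _, f_x, _, _ in rows)
area_z = sum(f_z * (width / std) for _, _, _, f_z in rows)
print(f"area under f(x) = {area_x:.6f}, area under f(z) = {area_z:.6f}")
```

Both sums equal 1 up to round-off regardless of the data. The probabilities sum to 1, and because dz = dx/σ while f(z) = σ f(x), each strip satisfies f(z) dz = f(x) dx, so the area is unchanged by the transformation.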
There are several standard PDFs discussed in statistics literature. Of these, the normal PDF is the most common and will be discussed next. We will also compare the above results with the normal PDF.
