Ins305e Part1

The document outlines a course on Probability and Statistics, taught by Ayda Şafak AĞAR ÖZBEK, with a focus on engineering applications. It includes course materials, grading criteria, and a detailed course outline covering key statistical concepts and their relevance in engineering. The document emphasizes the importance of probability and statistics in decision-making under uncertainty, providing examples of their application in various fields.

Probability and Statistics

INS 305E
PART 1
Introduction

Ayda Şafak AĞAR ÖZBEK

Course originally developed by: Prof. Dr. Mehmetçik BAYAZIT and Prof. Dr. Beyhan YEĞEN
INSTRUCTOR:

Ayda Şafak AĞAR ÖZBEK


sagar@itu.edu.tr
Civil Engineering / Building Materials Section

TEXTBOOK:
• Bayazıt, M., Oğuz, B., Probability and Statistics for Engineers, Birsen Yayınevi, 1998.

OTHER REFERENCES:
• Ross, S., A First Course in Probability, Prentice-Hall International, 1998.
• Walpole, R. E., Myers, R. H., Myers, S. L., Ye, K., Essentials of Probability and Statistics for Engineers and Scientists, Pearson, 2013.
• Spiegel, M. R., Theory and Problems of Statistics, McGraw-Hill, 1961.
• Bulu, A., İstatistik Problemleri, Teknik Kitaplar Yayınevi, 1986.
• Weiss, N. A., Introductory Statistics, Pearson, 2008.

GRADING:
• 1 Midterm Exam 40 %
• 2 Homework Assignments 10 % (5 % each)
• Final Exam 50 %

All course material will be uploaded to Ninova; check Ninova regularly.

COURSE OUTLINE:
• Introduction
• Elements of Probability Theory
• Distributions of Random Variables
• Multivariable Distributions
• Parameters of Random Variables, Bernoulli Trials
• Probability Distribution Functions (Normal Distribution)
• Probability Distribution Functions (Other Distributions)
• Sampling Distributions
• Statistical Hypotheses
• Hypothesis Tests
• Regression Analysis

PROBABILITY and STATISTICS
Using and understanding probability and statistics has become a required skill in nearly every
profession and academic discipline.

“Probability theory is nothing but common sense reduced to calculation.”
P. S. LAPLACE

“Statistics is the grammar of science.”
K. PEARSON
Probability:
The chance that a given event will occur.
A branch of mathematics concerned with developing models that quantify the likelihood of an
event.
Statistics:
Statistics deals with the collection, analysis, interpretation, and presentation of numerical
facts (data).
A branch of mathematics dealing with fitting the available data to probability models and thus
estimating the properties of the variable under study.
Probability vs. Statistics

Probability theory is devoted to the study of uncertainty and variability.


Probability quantifies how certain/uncertain we are about future events.

Statistics can be described as the study of how to make decisions in the face of uncertainty and
variability.
Some examples of how probability and statistics shape your life, even when you don't
realize it:
• Weather Forecasts
• Emergency Preparedness
• Predicting Catastrophes (earthquakes, floods...)
• Medical Studies
• Genetics
• Insurance
• Consumer Goods
• Stock Market
• Quality Control
• etc...
Engineers...

Civil engineers apply scientific laws and mathematics
to design, develop, test, construct and supervise
various structures.
They perform experiments and collect and analyze
data that can be used to better explain relationships
and to reveal information about the quality of the
products they provide.

Engineers use the fundamental laws of probability and statistical results to draw
conclusions about scientific systems.
Information is gathered in the form of sample data or collections of observations.
To better visualize and examine the nature of the available information,
several types of tools are often used, such as histograms, scatter plots and box plots.

How to approach a problem:
Deterministic Approach
• Deterministic approach assumes certainty in all aspects.
• A deterministic situation is one in which the system parameters can be determined
exactly. This is also called a situation of certainty.
• In real engineering systems, such a situation rarely exists; there is usually some
associated uncertainty.
Some Examples:
Predicting the amount of money in a bank account:
If you know the initial deposit, the interest earned and the amount you spent, you can
determine the amount left in the account.
Finding the acceleration (a) of a body of known mass (m) when a known force (F) is
exerted:
Using Newton's second law (F = ma), you can calculate the acceleration as a = F/m and
always obtain the same output from the same input.
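The two deterministic examples can be sketched in Python as below. The function names and the input values are hypothetical, chosen only for illustration; the point is that identical inputs always produce identical outputs.

```python
def remaining_balance(initial_deposit: float, interest: float, spent: float) -> float:
    """Balance left in the account: fully determined by the three known inputs."""
    return initial_deposit + interest - spent


def acceleration(force: float, mass: float) -> float:
    """Newton's second law, F = m * a, solved for a = F / m."""
    return force / mass


# Hypothetical inputs: calling either function again with the same values
# always reproduces the same result, which is what makes the situation deterministic.
print(remaining_balance(1000.0, 50.0, 300.0))  # 750.0
print(acceleration(force=20.0, mass=4.0))      # 5.0
```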
Probabilistic Approach
• A probabilistic situation is called a situation of uncertainty.
• You know the likelihood that something will happen, but you don’t know if or when it
is going to happen.

Some Examples:
Predicting what number will come up when you roll a die.
(Dice are commonly used in probability examples. "Dice" is plural; "die" is
singular.)
Predicting when the number 6 will come up when rolling a die.
For a fair die, you know that in each roll each number comes up with a probability of 1/6, but you
cannot predict exactly which number will come up, or when.
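A short simulation illustrates the contrast with the deterministic case. This sketch assumes a fair die and uses Python's standard `random` module as a stand-in for physical rolls; `roll_frequencies` is a hypothetical helper name.

```python
import random
from collections import Counter


def roll_frequencies(n_rolls: int, seed: int = 0) -> dict[int, float]:
    """Roll a simulated fair six-sided die n_rolls times and return the relative
    frequency of each face (1 to 6)."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Any single roll is unpredictable ...
rng = random.Random(1)
print([rng.randint(1, 6) for _ in range(10)])

# ... but over many rolls each face appears about 1/6 ≈ 0.167 of the time.
for face, freq in roll_frequencies(60_000).items():
    print(face, round(freq, 3))
```

The individual outcomes cannot be predicted, yet the long-run relative frequencies settle near the probability 1/6.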

Some commonly used statistical terms
(each will be covered in detail later in the course)

Mean (Arithmetic Mean)


It is computed by adding all of the numbers in the data set together and dividing by the
number of elements in the data set.

Example:
• Data Set = 2, 5, 9, 3, 5, 4, 7
• Mean = ( 2 + 5 + 9 + 3 + 5 + 4 + 7 ) / 7 = 35 / 7 = 5
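The same calculation in Python, as a minimal sketch: the sum divided by the count, checked against `statistics.mean` from the standard library.

```python
import statistics

data = [2, 5, 9, 3, 5, 4, 7]

# Add all the values and divide by how many there are.
mean_by_hand = sum(data) / len(data)  # 35 / 7
print(mean_by_hand)  # 5.0

# The standard library gives the same result.
print(statistics.mean(data) == mean_by_hand)  # True
```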

Median

• The median is the middle number in a sorted list of numbers.

• How to calculate:
First, reorder the data set from the smallest to the largest value.
Find the middle value.
If there are two middle values (an even number of elements), add them and divide by 2.

Example: Odd Number of Elements
• Data Set = 2, 5, 9, 3, 5, 4, 7
• Number of Elements in Data Set = 7
• Reordered = 2, 3, 4, 5, 5, 7, 9 (middle value: the 4th element)
• Median = 5
Example: Even Number of Elements
• Data Set = 2, 5, 9, 3, 5, 4
• Number of Elements in Data Set = 6
• Reordered = 2, 3, 4, 5, 5, 9 (two middle values: the 3rd and 4th elements)
• Median = ( 4 + 5 ) / 2 = 4.5
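The sorting rule above can be sketched as a small Python function; `median` here is an illustrative helper, and its results are compared with the standard library's `statistics.median` on both example data sets.

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:          # odd number of elements: a single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even: average the two middle values


odd_data = [2, 5, 9, 3, 5, 4, 7]
even_data = [2, 5, 9, 3, 5, 4]
print(median(odd_data))    # 5
print(median(even_data))   # 4.5

# statistics.median follows the same rule.
print(statistics.median(odd_data), statistics.median(even_data))  # 5 4.5
```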

Mean vs. Median

Mode
• The most frequently occurring number (or member) found in a set of
numbers (members).
• The mode is found by collecting and organizing data in order to count the
frequency of each result.
• The result with the highest count of occurrences is the mode of the set.
Example:
mode: 6
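The counting procedure can be sketched with `collections.Counter`. The data for the slide's example is not reproduced in this text, so the sketch reuses the data set from the mean and median examples, in which 5 occurs twice and every other value once.

```python
from collections import Counter
import statistics

data = [2, 5, 9, 3, 5, 4, 7]

# Count how often each value occurs ...
counts = Counter(data)
# ... and pick the value with the highest count.
value, count = counts.most_common(1)[0]
print(value, count)            # 5 2
print(statistics.mode(data))   # 5
```

If several values tie for the highest count, `statistics.multimode` returns all of them.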

Range:

• The range of a data set is the difference between the largest and the smallest values
in the data set.
• First, reorder the data set from smallest to largest. Then subtract the first element
from the last element (or simply subtract the smallest value from the largest).
Example:
• Data Set = 7, 6, 4, 9, 3
• Reordered = 3, 4, 6, 7, 9
• Range = ( 9 - 3 ) = 6
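Both ways of finding the range, sketched in Python for the example data set:

```python
data = [7, 6, 4, 9, 3]

reordered = sorted(data)                    # [3, 4, 6, 7, 9]
range_sorted = reordered[-1] - reordered[0]  # last element minus first element
range_direct = max(data) - min(data)         # largest minus smallest, no sorting needed
print(range_sorted, range_direct)            # 6 6
```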

Variance:

• The variance measures how spread out the numbers in the set are around the arithmetic mean.
• Variance is calculated by taking the difference between each number in the set and
the mean, squaring the differences (to make them non-negative) and dividing the sum of
the squares by the number of values in the set.
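Following the definition above, $s_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, here is a minimal sketch applied to the data set from the mean example. It divides by n as the slide does; note that Python's `statistics.pvariance` also divides by n, while `statistics.variance` divides by n - 1 (the sample variance), so the two library functions give different values.

```python
import math
import statistics

data = [2, 5, 9, 3, 5, 4, 7]
n = len(data)
mean = sum(data) / n                             # 5.0

squared_diffs = [(x - mean) ** 2 for x in data]  # squares: 9, 0, 16, 4, 0, 1, 4
variance = sum(squared_diffs) / n                # 34 / 7
print(round(variance, 4))                        # 4.8571

# statistics.pvariance divides by n, matching the definition above.
print(math.isclose(variance, statistics.pvariance(data)))  # True
```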

Standard Deviation:

• Standard deviation is a measure of the dispersion of a set of data around its arithmetic
mean.
• It is calculated as the square root of the variance.
• The farther the data points lie from the mean, the larger the standard deviation.

• It is never negative. It has the same unit (dimension) as the data itself.
• It is equal to zero if all the data are equal.

Standard Deviation:

• Standard deviation is not affected if all the data are increased or decreased
by the same amount.
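The properties listed above can be checked numerically with a short sketch; `std_dev` is an illustrative helper that divides by n, consistent with the variance definition, and it agrees with `statistics.pstdev`.

```python
import math
import statistics


def std_dev(values):
    """Square root of the variance (sum of squared differences divided by n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


data = [2, 5, 9, 3, 5, 4, 7]
print(round(std_dev(data), 3))                                # 2.204
print(math.isclose(std_dev(data), statistics.pstdev(data)))   # True

# Adding the same constant to every value leaves the spread unchanged.
shifted = [x + 100 for x in data]
print(math.isclose(std_dev(shifted), std_dev(data)))          # True

# If all the data are equal, the standard deviation is zero.
print(std_dev([4, 4, 4, 4]))                                  # 0.0
```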

Coefficient of Variation:

• The coefficient of variation is a statistical measure of the dispersion of data points in a
data series around the mean.
• The coefficient of variation is the ratio of the standard deviation to the
mean.
• It is useful for comparing the degree of variation of one data series with
another, even if their means are drastically different.
• The coefficient of variation is unitless (dimensionless).
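A minimal sketch of the ratio, using the standard deviation with division by n. Multiplying every value by 1000 (as if changing the unit) scales the standard deviation and the mean by the same factor, which shows numerically why the coefficient of variation is dimensionless. The function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the standard deviation (dividing by n) to the arithmetic mean."""
    return statistics.pstdev(values) / statistics.mean(values)


data = [2, 5, 9, 3, 5, 4, 7]
cv = coefficient_of_variation(data)
print(round(cv, 3))  # 0.441

# Rescaling all values (e.g. a change of unit) does not change the ratio.
data_scaled = [1000 * x for x in data]
print(round(coefficient_of_variation(data_scaled), 3))  # 0.441
```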

Coefficient of Skewness:
• Skewness can be measured by the mean of the cubes of the differences $(x_i - \bar{x})^3$, divided by the cube of the standard deviation, $s_x^3$:
$\text{coefficient of skewness} = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{s_x^3}$
• The coefficient of skewness is unitless (dimensionless).
• It is a pure number and may also be expressed as a percentage. It can be positive
(positively skewed) or negative (negatively skewed).
• If the data are perfectly symmetrical, the cube of each positive difference is canceled by
the cube of an equal negative difference, so the mean of the cubes is zero.
Therefore, the coefficient of skewness is also zero.

Coefficient of Skewness:

• Skewness is a statistical term used to describe the asymmetry of a distribution.

• Skewness can be negative or positive, depending on whether the distribution has a longer tail
to the left (negative skewness) or to the right (positive skewness).
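A minimal sketch of the coefficient of skewness as described on the previous slide: the mean of the cubed differences divided by the cube of the standard deviation, with the standard deviation computed by dividing by n. Statistical packages often apply a small-sample correction, so their values can differ slightly from this one. The function name is hypothetical.

```python
import math


def skewness_coefficient(values):
    """Mean of the cubed differences from the mean, divided by s_x**3,
    where s_x is the standard deviation computed by dividing by n."""
    n = len(values)
    mean = sum(values) / n
    s_x = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    mean_cube = sum((x - mean) ** 3 for x in values) / n
    return mean_cube / s_x ** 3


print(skewness_coefficient([1, 2, 3, 4, 5]))                  # 0.0 (symmetric data)
print(round(skewness_coefficient([2, 5, 9, 3, 5, 4, 7]), 2))  # 0.48 (longer right tail)
```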

Some introductory examples that show the significance of statistics in engineering
problems...
Example:
Annual flow volumes at the Keban station (dam) on the Fırat River were measured from
1937 to 1967.
The thirty-one recorded values are given below (in 10⁹ m³).

How can we arrange and analyze the data to better understand it?

• We can start by drawing a histogram (step diagram):
First, we classify the data into class intervals (of width 3×10⁹ m³).
We then plot the number of observations in each class interval as a horizontal
line.
• The histogram clearly shows the distribution of the observations, which
cannot be seen as clearly from the tabulated values.

For example, we can now easily see that the streamflow was in the range of
19-22 (×10⁹ m³) for seven years.
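The classification step can be sketched as below. The 31 annual flow values are not reproduced in this text, so the sketch uses the small data set from the earlier examples with class intervals of width 3 starting at 0. The helper name `class_counts` and the convention that each interval includes its lower bound but not its upper bound are assumptions for illustration.

```python
def class_counts(values, start, width):
    """Number of observations in each class interval [start + k*width, start + (k+1)*width)."""
    if min(values) < start:
        raise ValueError("start must not exceed the smallest observation")
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for x in values:
        counts[int((x - start) // width)] += 1
    return counts


data = [2, 5, 9, 3, 5, 4, 7]
start, width = 0, 3
counts = class_counts(data, start, width)
print(counts)  # [1, 4, 1, 1]

# A text "step diagram": one row per class interval, one '#' per observation.
for k, c in enumerate(counts):
    low = start + k * width
    print(f"[{low}, {low + width}) {'#' * c}")
```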
• We can then draw a frequency histogram by plotting the frequencies (the fraction, or
percentage, of observations in each class interval) on the vertical axis, which gives a more
meaningful graph.
• The y-axis of the frequency histogram is unitless (dimensionless).

For example, we can now see that the frequency in the range of 19-22 (×10⁹ m³)
equals 7/31 ≈ 0.23 = 23%.
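Converting counts to frequencies is a single division per class, sketched below. The 7/31 value comes from the example; the class counts passed to the helper afterwards are hypothetical and serve only to show the conversion for a whole histogram. The helper name is also hypothetical.

```python
def relative_frequencies(counts):
    """Frequency of each class = observations in the class / total observations.
    The counts cancel out, so the result has no unit."""
    total = sum(counts)
    return [c / total for c in counts]


# From the example: 7 of the 31 annual flows fell in the 19-22 (x 10^9 m^3) class.
print(round(7 / 31, 2))  # 0.23

# Hypothetical class counts, used only to show the conversion for a whole histogram.
freqs = relative_frequencies([1, 4, 1, 1])
print([round(f, 3) for f in freqs])  # [0.143, 0.571, 0.143, 0.143]
# The frequencies of all classes add up to 1 (up to floating-point rounding).
print(sum(freqs))
```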
We can also summarize the information contained in the tabulated or plotted
data by using statistical parameters.
• We can calculate the arithmetic mean, around which the observations are
scattered.

• We can also determine the median.

• To do so, the data should first be rearranged in increasing order of
streamflow.

• It is not sufficient to characterize the data set by the mean and/or the
median alone.
• At least one more parameter is needed to describe the uncertainty.
• In individual years, the streamflow is lower or higher than the mean.
Therefore, the variance, which is a measure of the scatter (dispersion) of the data
around the mean, can be used.

• The variance in this example has the unit (10⁹ m³)², i.e. 10¹⁸ m⁶, because it is
calculated from the squares of the differences.
• To obtain a parameter that has the same unit (dimension) as the data set, we
take the square root of the variance, which is the standard deviation.

• We can also check the coefficient of skewness of the distribution.
• The coefficient of skewness is zero for symmetrical data.

• For the Fırat River flows, the coefficient of skewness is 0.075, which means
that the data are nearly symmetrical, with a small positive skew.

If we want to make a comparison:

Example:
The annual flows of the Ceyhan River have:
a mean of ȳ = 7.1×10⁹ m³, and
a standard deviation of s_y = 2.3×10⁹ m³.

Which river (Ceyhan or Fırat) has more variable (more scattered) flows?

To compare the variability of two variables, a dimensionless (unitless) parameter should be used.


Example:
The coefficient of variation (which is unitless) is the ratio of the standard deviation of a
variable to its mean.

The flows of the Ceyhan River have a higher coefficient of variation, i.e. a higher relative dispersion.

Therefore, the Ceyhan River has more variable flows
(even though its standard deviation is lower).
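The Ceyhan arithmetic can be sketched as below. Only the Ceyhan mean and standard deviation appear in this text; the Fırat values from the earlier slides are not reproduced here, so the sketch computes only the Ceyhan coefficient of variation. The function name is hypothetical.

```python
def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """CV = s / mean; both arguments share the same unit, so the result is dimensionless."""
    return std_dev / mean


# Ceyhan River annual flows from the example (both values in 10^9 m^3).
cv_ceyhan = coefficient_of_variation(mean=7.1, std_dev=2.3)
print(round(cv_ceyhan, 3))  # 0.324

# The river with the larger CV has the more variable flows; the Fırat's mean and
# standard deviation would be passed in the same way to compare the two rivers.
```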