IE101
ENGINEERING DATA ANALYSIS
Module 1 – Part 1
Review of Basic Concepts of Probability and Statistics
Descriptive Statistics
Organizing Data
When data are collected in original form, they are
called raw data.
When the raw data are organized into a frequency
distribution, the frequency is the number of values in
a specific class of the distribution.
A frequency distribution is the organizing of raw data in
table form, using classes and frequencies.
Organizing Data
Categorical frequency distributions - can be used for
data that can be placed in specific categories, such as
nominal- or ordinal-level data.
o Examples:
political affiliation,
religious affiliation,
blood type
Organizing Data
Categorical frequency distributions
Class    Frequency    Percent
A        5            20%
B        7            28%
O        9            36%
AB       4            16%
Total    25
Percent for class A: 5/25 = 20%
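The percent column is each class frequency divided by the total count. A minimal Python sketch of that tally follows. The list `blood_types` is hypothetical raw data built only to match the counts in the table, and `categorical_distribution` is an illustrative helper name, not part of the module.

```python
from collections import Counter

# Hypothetical raw data: 25 blood types matching the counts in the table.
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4


def categorical_distribution(values):
    """Return {category: (frequency, percent)} for categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (freq, 100 * freq / total) for cat, freq in counts.items()}


dist = categorical_distribution(blood_types)
for cat, (freq, pct) in dist.items():
    print(f"{cat:<3} {freq:>3} {pct:.0f}%")
```

The same function works for ordinal data, since it only counts labels and never compares them.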
Organizing Data
Ungrouped frequency distributions - can be used for
data that can be enumerated and when the range of values
in the data set is not large.
o Examples:
Number of miles your instructors have to travel
from home to campus,
number of girls in a 4-child family,
student’s score in a 10-point quiz
Organizing Data
Ungrouped frequency distributions
Class    Frequency    Percent
10       24           48%
15       16           32%
20       10           20%
Total    50
Percent for class 10: 24/50 = 48%
Organizing Data
Grouped frequency distributions - can be used when
the range of values in the data set is very large. The data
must be grouped into classes that are more than one unit
in width.
o Example:
the life of boat batteries in hours.
Class Limits    Class Boundaries    Frequency    Cumulative Frequency
24 - 37         23.5 - 37.5         4            4
38 - 51         37.5 - 51.5         14           18
52 - 65         51.5 - 65.5         7            25
Terms associated with Grouped Frequency
Distribution
Class limits represent the smallest and largest data values
that can be included in a class.
In the lifetimes of boat batteries example, the values 24
and 37 of the first class are the class limits.
The lower class limit is 24 and the upper class limit is
37.
Terms associated with Grouped Frequency
Distribution
The class boundaries are used to separate the classes so
that there are no gaps in the frequency distribution.
The class width for a class in a frequency distribution is
found by subtracting the lower (or upper) class limit of one
class from the lower (or upper) class limit of the next
class. In the boat batteries example, the width is 38 - 24 = 14.
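As a rough illustration of the two definitions above, the sketch below derives class boundaries and class width from the class limits of the boat batteries example. The helper names and the `unit` parameter are assumptions made for this example; `unit=1` assumes the data are recorded to the nearest whole hour.

```python
# Class limits (lower, upper) from the boat batteries example, in hours.
limits = [(24, 37), (38, 51), (52, 65)]


def class_boundaries(lower, upper, unit=1):
    """Extend each limit by half a unit so adjacent classes meet with no gap."""
    return (lower - unit / 2, upper + unit / 2)


def class_width(limits):
    """Lower limit of the second class minus lower limit of the first class."""
    return limits[1][0] - limits[0][0]


boundaries = [class_boundaries(lo, hi) for lo, hi in limits]
width = class_width(limits)
print(boundaries)
print(width)
```

Note that the width equals the difference between the boundaries of a single class (37.5 - 23.5), not the difference between its own limits (37 - 24).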
Guidelines for constructing a frequency
distribution
The classes must be mutually exclusive.
The classes must be continuous.
The classes must be equal in width.
Procedures for constructing a frequency
distribution
1. Find the highest and lowest value.
2. Find the range.
3. Select the number of classes desired.
4. Find the width by dividing the range by the number of
classes and rounding up.
5. Select a starting point (usually the lowest value); add the
width to get the lower limits.
6. Find the upper class limits.
7. Find the boundaries.
8. Tally the data, find the frequencies, and find the
cumulative frequency.
Procedures for constructing a frequency
distribution
Example: In a survey of 20 patients who smoked, the
following data were obtained. Each value represents the
number of cigarettes the patient smoked per day. Construct a
frequency distribution using six classes.
10 8 6 14
22 13 17 19
11 9 18 14
13 12 15 15
5 11 16 11
Procedures for constructing a frequency
distribution
Step 1: Find the highest and lowest values: H = 22 and L = 5.
Step 2: Find the range: R = H – L = 22 – 5 = 17.
Step 3: Select the number of classes desired. In this case it is equal
to 6.
Procedures for constructing a frequency
distribution
Step 4: Find the class width by dividing the range by the number of
classes. Width = 17/6 = 2.83. This value is rounded up to 3.
Step 5: Select a starting point for the lowest class limit. For
convenience, this value is chosen to be 5, the smallest data value.
The lower class limits will be 5, 8, 11, 14, 17, and 20.
Procedures for constructing a frequency
distribution
Step 6: The upper class limits will be 7, 10, 13, 16, 19, and 22.
Step 7: Find the class boundaries by subtracting 0.5 from each
lower class limit and adding 0.5 to each upper class limit.
Step 8: Tally the data, write the numerical values for the tallies in
the frequency column, and find the cumulative frequencies.
Class Limits    Class Boundaries    Frequency    Cumulative Frequency
5 to 7          4.5 - 7.5           2            2
8 to 10         7.5 - 10.5          3            5
11 to 13        10.5 - 13.5         6            11
14 to 16        13.5 - 16.5         5            16
17 to 19        16.5 - 19.5         3            19
20 to 22        19.5 - 22.5         1            20
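The eight steps can be sketched in Python as shown below, using the cigarette data from the example. This is an illustration under stated assumptions, not a general-purpose routine: it assumes whole-number data (so each upper limit is the next lower limit minus 1 and the boundaries are offset by 0.5), and the function name `grouped_distribution` is made up for this example.

```python
import math
from itertools import accumulate

# Number of cigarettes smoked per day by the 20 patients.
data = [10, 8, 6, 14, 22, 13, 17, 19, 11, 9,
        18, 14, 13, 12, 15, 15, 5, 11, 16, 11]


def grouped_distribution(data, num_classes):
    """Return rows of (lower, upper, boundaries, frequency, cumulative)."""
    high, low = max(data), min(data)               # Step 1
    value_range = high - low                       # Step 2
    # Steps 3-4: divide the range by the number of classes and round up.
    width = math.ceil(value_range / num_classes)
    # Step 5: start at the lowest value and add the width repeatedly.
    lowers = [low + i * width for i in range(num_classes)]
    # Step 6: whole-number data assumed, so upper = next lower limit - 1.
    uppers = [lo + width - 1 for lo in lowers]
    if uppers[-1] < high:
        # Happens when range / num_classes is already a whole number;
        # this sketch does not choose a wider class width automatically.
        raise ValueError("classes do not reach the highest value")
    # Step 7: boundaries sit half a unit outside the limits.
    bounds = [(lo - 0.5, hi + 0.5) for lo, hi in zip(lowers, uppers)]
    # Step 8: tally, then accumulate the frequencies.
    freqs = [sum(lo <= x <= hi for x in data)
             for lo, hi in zip(lowers, uppers)]
    cumulative = list(accumulate(freqs))
    return list(zip(lowers, uppers, bounds, freqs, cumulative))


rows = grouped_distribution(data, 6)
for lower, upper, (b_lo, b_hi), freq, cum in rows:
    print(f"{lower} to {upper}  {b_lo} - {b_hi}  {freq}  {cum}")
```

Running it on the example data with six classes gives the same limits, boundaries, frequencies, and cumulative frequencies as the table.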
Frequency Tables and Graphs
The five most commonly used graphs in research
are:
o The histogram
o The frequency polygon
o The cumulative frequency graph, or ogive
(pronounced o-jive)
o The pie graph
o The time series graph
Frequency Tables and Graphs
The histogram is a graph that displays the data by using
vertical bars of various heights to represent the
frequencies.
o Example:
Class        No. of Students
English      26
Algebra      24
Economics    32
Science      27
Gym          18
Woodshop     13
[Figure: bar graph of No. of Students for each class; vertical axis from 0 to 35]
Frequency Tables and Graphs
A frequency polygon is a graph that displays the data by
using lines that connect points plotted for the frequencies at
the midpoints of the classes. The frequencies are represented
by the heights of the points.
Class Limits    Class Boundaries    Frequency    Midpoint
5 to 7          4.5 - 7.5           2            6
8 to 10         7.5 - 10.5          3            9
11 to 13        10.5 - 13.5         6            12
14 to 16        13.5 - 16.5         5            15
17 to 19        16.5 - 19.5         3            18
20 to 22        19.5 - 22.5         1            21
Midpoint = (lower limit + upper limit) / 2, for example (5 + 7) / 2 = 6.
Frequency Tables and Graphs
A cumulative frequency graph or ogive is a graph that
represents the cumulative frequencies for the classes
in a frequency distribution.
Class Limits    Class Boundaries    Frequency    Cumulative Frequency
5 to 7          4.5 - 7.5           2            2
8 to 10         7.5 - 10.5          3            5
11 to 13        10.5 - 13.5         6            11
14 to 16        13.5 - 16.5         5            16
17 to 19        16.5 - 19.5         3            19
20 to 22        19.5 - 22.5         1            20
Frequency Tables and Graphs
Pie graph - A pie graph is a circle that is divided into
sections or wedges according to the percentage of
frequencies in each category of the distribution.
Class        No. of Students    %
English      26                 19%
Algebra      24                 17%
Economics    32                 23%
Science      27                 19%
Gym          18                 13%
Woodshop     13                 9%
Total        140
Percent for English: 26/140 = 19%
Degrees for English: 26/140 × 360° = 66.86°
[Figure: pie graph of No. of Students with sections for English, Algebra, Economics, Science, Gym, and Woodshop]
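Each wedge of the pie graph takes the same share of 360° that its category takes of the total. The sketch below reproduces the percentages and the 66.86° angle for English. The dictionary `enrollment` holds the counts from the table, and `pie_sections` is an illustrative helper name.

```python
# Number of students per class, taken from the table.
enrollment = {"English": 26, "Algebra": 24, "Economics": 32,
              "Science": 27, "Gym": 18, "Woodshop": 13}


def pie_sections(counts):
    """Return {category: (percent rounded to a whole number, degrees)}."""
    total = sum(counts.values())
    return {name: (round(100 * n / total), round(360 * n / total, 2))
            for name, n in counts.items()}


sections = pie_sections(enrollment)
for name, (pct, deg) in sections.items():
    print(f"{name:<10} {pct:>3}%  {deg:>6.2f} degrees")
```

Because each value is rounded separately, the printed percentages or angles of other data sets may not add up to exactly 100% or 360°.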
Frequency Tables and Graphs
Time series graph - A time series graph represents data
that occur over a specific period of time.
Frequency Tables and Graphs
Example: These data represent the record high
temperatures in degrees Fahrenheit (F) for each of the 50
states. Construct a grouped frequency distribution for the
data using 7 classes. Construct an ogive for the frequency
distribution.
Class Limits    Class Boundaries    Midpoint    Frequency    Cumulative Frequency
100 to 104      99.5 to 104.5       102         2            2
105 to 109      104.5 to 109.5      107         8            10
110 to 114      109.5 to 114.5      112         18           28
115 to 119      114.5 to 119.5      117         13           41
120 to 124      119.5 to 124.5      122         7            48
125 to 129      124.5 to 129.5      127         1            49
130 to 134      129.5 to 134.5      132         1            50
Frequency Tables and Graphs
Distribution Shapes
o When describing data, it is important to be able
to recognize the shape of the distribution.
o In later chapters you will see that the shape of a
distribution also determines the appropriate statistical
methods used to analyze the data.
Frequency Tables and Graphs
Example: Construct a histogram, frequency polygon, and
ogive using relative frequencies for the distribution (shown
below) of the miles that 20 randomly selected runners ran
during a given week.
Class Boundaries    Frequency
5.5 to 10.5         1
10.5 to 15.5        2
15.5 to 20.5        3
20.5 to 25.5        5
25.5 to 30.5        4
30.5 to 35.5        3
35.5 to 40.5        2
Frequency Tables and Graphs
Compute the relative frequency for each class by dividing its
frequency by the total. For the first class: 1/20 = 0.05.
Class Boundaries    Frequency
5.5 to 10.5         1
10.5 to 15.5        2
15.5 to 20.5        3
20.5 to 25.5        5
25.5 to 30.5        4
30.5 to 35.5        3
35.5 to 40.5        2
Total               20
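The relative frequency of a class is its frequency divided by the total number of values. A short sketch for the runners example follows; `relative_frequencies` is an illustrative helper name.

```python
# Frequencies of the seven classes in the runners example.
frequencies = [1, 2, 3, 5, 4, 3, 2]


def relative_frequencies(freqs):
    """Divide each class frequency by the total number of values."""
    total = sum(freqs)
    return [f / total for f in freqs]


rel = relative_frequencies(frequencies)
print([round(r, 2) for r in rel])
```

The relative frequencies always sum to 1, which is a quick check on the arithmetic.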
Frequency Tables and Graphs
(a) Histogram
Class Boundaries    Frequency    Relative Frequency
5.5 to 10.5         1            0.05
10.5 to 15.5        2            0.10
15.5 to 20.5        3            0.15
20.5 to 25.5        5            0.25
25.5 to 30.5        4            0.20
30.5 to 35.5        3            0.15
35.5 to 40.5        2            0.10
Frequency Tables and Graphs
(b) Frequency Polygon
Class Boundaries    Frequency    Relative Frequency    Midpoint
5.5 to 10.5         1            0.05                  8
10.5 to 15.5        2            0.10                  13
15.5 to 20.5        3            0.15                  18
20.5 to 25.5        5            0.25                  23
25.5 to 30.5        4            0.20                  28
30.5 to 35.5        3            0.15                  33
35.5 to 40.5        2            0.10                  38
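The points of the frequency polygon pair each class midpoint with its relative frequency. As a small sketch, the midpoint is taken here as the average of the two class boundaries; the names `midpoint` and `polygon_points` are chosen for this example only.

```python
# Class boundaries and relative frequencies from the runners example.
boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
rel_freq = [0.05, 0.10, 0.15, 0.25, 0.20, 0.15, 0.10]


def midpoint(lower, upper):
    """Average of the lower and upper boundary of a class."""
    return (lower + upper) / 2


# Each point is (x = class midpoint, y = relative frequency).
polygon_points = [(midpoint(lo, hi), rf)
                  for (lo, hi), rf in zip(boundaries, rel_freq)]
for x, y in polygon_points:
    print(x, y)
```

Averaging the class limits instead of the boundaries gives the same midpoints, since the boundaries extend the limits by the same amount on each side.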
Frequency Tables and Graphs
(c) Ogive
Class Boundaries    Cumulative Frequency    Cumulative Relative Frequency
5.5 to 10.5         1                       0.05
10.5 to 15.5        3                       0.15
15.5 to 20.5        6                       0.30
20.5 to 25.5        11                      0.55
25.5 to 30.5        15                      0.75
30.5 to 35.5        18                      0.90
35.5 to 40.5        20                      1.00
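The cumulative columns can be obtained with a running total of the class frequencies. The sketch below pairs each cumulative relative frequency with the upper boundary of its class, which is the usual convention for plotting an ogive and is assumed here rather than stated in the table. The names `cumulative_relative` and `ogive_points` are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the runners example.
upper_boundaries = [10.5, 15.5, 20.5, 25.5, 30.5, 35.5, 40.5]
frequencies = [1, 2, 3, 5, 4, 3, 2]


def cumulative_relative(freqs):
    """Running total of the counts, divided by the overall total."""
    total = sum(freqs)
    # Accumulate the integer counts first, then divide once per class,
    # so no rounding error builds up from adding decimals.
    return [c / total for c in accumulate(freqs)]


cum_freq = list(accumulate(frequencies))
ogive_points = list(zip(upper_boundaries, cumulative_relative(frequencies)))
for x, y in ogive_points:
    print(x, round(y, 2))
```

The last cumulative relative frequency is always 1.00, because the final running total equals the number of values.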