Probabilistik Dan Proses Stokastik

The document discusses various statistical concepts for representing and analyzing data, including histograms, measures of central tendency (mean, median), measures of spread (range, interquartile range), and outliers. It provides examples to demonstrate how to calculate the median, quartiles, IQR, and identify outliers for a data set. Box and whisker plots are also introduced as a way to visually depict these aspects of a data distribution.

Uploaded by faris

Probabilistik dan Proses Stokastik (Probability and Stochastic Processes)

Today's Agenda
Continue from Data Representation
Histogram
Center and Spread of Data
Quartiles
Box and Whisker Plot
Outliers

Data Representation (Example)

89 84 87 81 89 86 91 90 78 89 87 99 83 89
Sort this data:
78 81 83 84 86 87 87 89 89 89 89 90 91 99
Group this data into 5 groups:

Group      No. of Elements
75 - 79    1
80 - 84    3
85 - 89    7
90 - 94    2
95 - 99    1
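
The grouping step can be sketched in Python. This is a minimal illustration, not part of the original slides: it assumes five non-overlapping classes of width 5 starting at 75 (so the last class is 95 - 99), and the helper name `class_label` is hypothetical.

```python
from collections import Counter

data = [89, 84, 87, 81, 89, 86, 91, 90, 78, 89, 87, 99, 83, 89]

def class_label(value, start=75, width=5):
    """Return the 'low - high' label of the class containing value."""
    low = start + ((value - start) // width) * width
    return f"{low} - {low + width - 1}"

# Counter keeps first-seen order, so sorting first lists the classes in order.
counts = Counter(class_label(v) for v in sorted(data))
for label, count in counts.items():
    print(f"{label}: {count}")
```

The counts add up to 14, the number of values in the data.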

Data Representation (Example)

78 81 83 84 86 87 87 89 89 89 89 90 91 99
Representing the same data in a stem-and-leaf plot:

Stem    Leaf
7       8
8       134
8       6779999
9       01
9       9
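
A stem-and-leaf plot for this data can be built with a short Python sketch. It assumes two-digit values and, as the leaves on the slide suggest, splits each stem into two rows (leaves 0 to 4 and leaves 5 to 9); the function name `stem_and_leaf` is hypothetical.

```python
data = [89, 84, 87, 81, 89, 86, 91, 90, 78, 89, 87, 99, 83, 89]

def stem_and_leaf(values):
    """Return (stem, leaves) rows; each stem is split into leaves 0-4 and 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # 87 -> stem 8, leaf 7
        key = (stem, leaf >= 5)             # second part selects the row
        rows.setdefault(key, []).append(str(leaf))
    return [(stem, "".join(leaves)) for (stem, _), leaves in rows.items()]

for stem, leaves in stem_and_leaf(data):
    print(f"{stem} | {leaves}")
```

Because the values are processed in sorted order, the rows come out from the smallest stem to the largest.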

Data Representation (Example)

78 81 83 84 86 87 87 89 89 89 89 90 91 99
Counting how many leaves a certain stem
has, we write that number in the leftmost
column and call it the absolute frequency.

Absolute frequency    Stem    Leaf
1                     7       8
3                     8       134
7                     8       6779999
2                     9       01
1                     9       9

Data Representation (Example)

78 81 83 84 86 87 87 89 89 89 89 90 91 99
To find the cumulative absolute frequency, we
add up the absolute frequencies up to the line
of the leaf.

Cumulative absolute frequency    Absolute frequency    Stem    Leaf
1                                1                     7       8
4                                3                     8       134
11                               7                     8       6779999
13                               2                     9       01
14                               1                     9       9

Data Representation (Example)

Cumulative absolute frequency    Absolute frequency    Stem    Leaf
1                                1                     7       8
4                                3                     8       134
11                               7                     8       6779999
13                               2                     9       01
14                               1                     9       9

The individual entries of the leftmost column in the
stem-and-leaf plot are called cumulative absolute
frequencies (CAF), i.e. the sum of the absolute
frequencies of values up to the line of the leaf.
For example, 11 shows that there are 11 values in
the data not exceeding 89.
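
The running sum described above can be checked with a few lines of Python. This is only a sketch; the list `absolute` holds the absolute frequencies of the five rows of the plot, from the lowest row to the highest.

```python
from itertools import accumulate

# Absolute frequencies of the rows 7|8, 8|134, 8|6779999, 9|01, 9|9
absolute = [1, 3, 7, 2, 1]

# Each entry is the sum of all absolute frequencies up to and including that row.
cumulative = list(accumulate(absolute))
print(cumulative)
```

The third entry is 11, matching the 11 values that do not exceed 89, and the last entry equals the total number of values.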

Data Representation (Example)

Dividing the absolute frequency by n (the total
number of entries in the data) gives the relative
class frequency.
In the present example there are 14 entries in
total, so the relative class frequencies are:

Group      Abs. Freq.    Relative Class Frequency
75 - 79    1             1/14
80 - 84    3             3/14
85 - 89    7             7/14
90 - 94    2             2/14
95 - 99    1             1/14
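
As a small check of the division by n, the relative class frequencies can be computed in Python. The dictionary `absolute` is an illustrative container for the class counts, with the last class taken as 95 - 99.

```python
absolute = {"75 - 79": 1, "80 - 84": 3, "85 - 89": 7, "90 - 94": 2, "95 - 99": 1}

n = sum(absolute.values())  # total number of entries, 14 here

# Relative class frequency = absolute frequency / n
relative = {group: count / n for group, count in absolute.items()}

for group, count in absolute.items():
    print(f"{group}: {count}/{n} = {relative[group]:.2f}")
```

The relative frequencies always add up to 1, which is a quick way to catch a miscounted class.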

Relative frequency
How is the relative class frequency used for data
representation?

Histogram
The areas of the rectangles are proportional to the
relative class frequencies.

Group      Abs. Freq.    Rel. Freq.    Rel. Freq. (decimal)
75 - 79    1             1/14          0.07
80 - 84    3             3/14          0.21
85 - 89    7             7/14          0.50
90 - 94    2             2/14          0.14
95 - 99    1             1/14          0.07

[Figure: histogram of the relative class frequencies; vertical axis from 0.00 to 0.60, groups 75 - 79, 80 - 84, 85 - 89, 90 - 94, 95 - 99]

Histogram
What information does a histogram give?
The data was
78 81 83 84 86 87 87 89 89 89 89 90 91 99
[Figure: histogram of the relative class frequencies; vertical axis from 0.00 to 0.60, groups 75 - 79, 80 - 84, 85 - 89, 90 - 94, 95 - 99]

Histogram
What information does a histogram give?
It gives us a clear picture of where the data is
concentrated.
Or, we can say, which way the data is inclined.
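
To see the concentration of the data without a plotting library, a rough text histogram can be printed in Python. This is a sketch under stated assumptions: all classes have the same width, so bar length can stand in for rectangle area, and the value of `scale` is an arbitrary choice.

```python
absolute = {"75 - 79": 1, "80 - 84": 3, "85 - 89": 7, "90 - 94": 2, "95 - 99": 1}
n = sum(absolute.values())

scale = 40  # number of characters that would represent a relative frequency of 1.0

# With equal class widths, bar length proportional to relative frequency
# is equivalent to area proportional to relative frequency.
bars = {group: "#" * round(count / n * scale) for group, count in absolute.items()}

for group, bar in bars.items():
    print(f"{group} | {bar} {absolute[group] / n:.2f}")
```

The longest bar belongs to the class 85 - 89, which is where the data is concentrated.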

Progress so far?
We have studied
absolute frequencies,
relative frequencies,
and how to use them in plotting a histogram.

Data
We have collected data and we want to
analyze it.
We take the previous data
89 84 87 81 89 86 91 90 78 89 87 99 83 89
Sorting this data we get
78 81 83 84 86 87 87 89 89 89 89 90 91 99

Center and Spread of Data

As the center of the location of the data values we
can take the median.
78 81 83 84 86 87 87 89 89 89 89 90 91 99
There are 14 values in total.
Since this data set has an even number of values,
there is no single center value.
But we have 87 and 89 as the middle values (7th and
8th), so we take the median as
(87 + 89)/2 = 88
Therefore, the median is 88.

Remember: the median may not be present in the data.

Median (cont.)
Take another example:
51 54 55 55 57 62 63 63 69
There are 9 values in total.
Since this data set has an odd number of values,
there is a center value.
The center value (the 5th) is 57.

Therefore, the median is 57.

Notice that in this example the median is present in
the data.

Median (cont.)
Take another example:
51 54 55 55 56 57 62 63 63 69
There are 10 values in total.
Since this data set has an even number of values,
there is no single center value.
But we have 56 and 57 as the middle values (5th
and 6th), so we take the median as
(56 + 57)/2 = 56.5
Therefore, the median is 56.5.

Remember: the median may have decimal places.
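
The two cases (odd and even number of values) can be summarized in a short Python sketch. The function name `median` is just illustrative; Python's standard library offers `statistics.median` with the same behavior.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single center value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]))
print(median([51, 54, 55, 55, 57, 62, 63, 63, 69]))
print(median([51, 54, 55, 55, 56, 57, 62, 63, 63, 69]))
```

The three calls reproduce the medians of the three examples above: 88, 57 and 56.5.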

Spread of Data
The spread of data can be measured by the range.
Spread is also called variability.
Spread = maximum value - minimum value
Example data
78 81 83 84 86 87 87 89 89 89 89 90 91 99
In this case the spread is 99 - 78 = 21.

Spread of Data
Example data
51 54 55 55 57 62 63 63 69
In this case the spread is 69 - 51 = 18.
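
The range is a one-line computation; a minimal Python sketch (the function name `spread` is an illustrative choice) applied to both example data sets:

```python
def spread(values):
    """Range of the data: maximum value minus minimum value."""
    return max(values) - min(values)

print(spread([78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]))
print(spread([51, 54, 55, 55, 57, 62, 63, 63, 69]))
```

Note that the data does not need to be sorted for this measure, since only the two extreme values matter.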

Example 1
3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
Putting the data in order:
3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56
There are 15 values in total, so the 8th value is in the middle.
The median turns out to be 23.
The spread is 56 - 3 = 53.

Example 1
3, 13, 7, 5, 21, 23, 23, 40, 23, 14, 12, 56, 23, 29
Here we have an even number of elements in the data.
Putting this data in order:
3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56
n = 14, so the two middle values are the 7th and 8th (21 and 23):
3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56
The median is found as (21 + 23)/2 = 22, i.e. by taking
the mean of the two middle values.
The spread is 56 - 3 = 53.
The median separates the data into two equal halves.

Quartiles
With quartiles the data is divided into 4 groups, in
the same manner as the median divides it into 2.
There are three quartiles in the data:
Lower quartile qL (median of the lower half of the data)
Middle quartile qM (median of the data)
Upper quartile qU (median of the upper half of the data)

The interquartile range (IQR) is found by
IQR = qU - qL

Example 2

78 81 83 84 86 87 87 89 89 89 89 90 91 99
The lower half of the data is
78 81 83 84 86 87 87
The lower quartile is 84.
The upper half of the data is
89 89 89 89 90 91 99
The upper quartile is 89.
The middle quartile (same as the median) is 88.
IQR (interquartile range) = 89 - 84 = 5
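
The median-of-halves rule used in this example can be sketched in Python. Two points are assumptions rather than something the slides state: for an odd number of values the middle value is left out of both halves, and the function name `quartiles` is hypothetical. Other conventions for quartiles exist and can give slightly different numbers.

```python
from statistics import median

def quartiles(values):
    """Return (lower, middle, upper) quartiles using the median-of-halves rule."""
    ordered = sorted(values)
    half = len(ordered) // 2       # size of each half; needs at least 2 values
    lower_half = ordered[:half]
    upper_half = ordered[-half:]   # for odd n the middle value is in neither half
    return median(lower_half), median(ordered), median(upper_half)

data = [78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]
q_lower, q_middle, q_upper = quartiles(data)
iqr = q_upper - q_lower
print(q_lower, q_middle, q_upper, iqr)
```

For the 14 values above this gives the quartiles 84, 88 and 89 and an interquartile range of 5.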

Box and Whisker Plot

Also called a box plot.
A box plot is obtained from 5 values of the data:
Minimum value of the data
Three quartiles
Maximum value of the data

Example 2
78 81 83 84 86 87 87 89 89 89 89 90 91 99
The middle quartile is 88.
The lower half of the data is
78 81 83 84 86 87 87
The lower quartile is 84.

The upper half of the data is
89 89 89 89 90 91 99
The upper quartile is 89.
IQR = 89 - 84 = 5
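
As an illustration of how the five values define a box plot, the sketch below draws one on a single text line. It is not the plotting method of the slides: the function name `text_box_plot`, the characters used and the default `width` are all arbitrary choices, and the maximum must be larger than the minimum.

```python
def text_box_plot(minimum, q_lower, q_middle, q_upper, maximum, width=43):
    """Draw whiskers (-), the box (=) and the median (|) on one text line."""
    span = maximum - minimum

    def column(x):
        # Map a data value to a character position between 0 and width - 1.
        return round((x - minimum) / span * (width - 1))

    line = ["-"] * width                      # whiskers from minimum to maximum
    for i in range(column(q_lower), column(q_upper) + 1):
        line[i] = "="                         # box from lower to upper quartile
    line[column(q_middle)] = "|"              # median marker inside the box
    return "".join(line)

# Five values of the example: minimum, three quartiles, maximum
plot = text_box_plot(78, 84, 88, 89, 99)
print(plot)
```

The box covers the range from 84 to 89, with the median marker near its right edge, which shows that the middle half of the data is packed into a narrow range compared with the whiskers.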

Outliers
Let's say an experiment was performed in
which the time was noted for a toy parachute to
land on the ground from a fixed height. The
experiment was repeated 10 times under
similar conditions.
The data was recorded as
14 13 15 16 5 27 16 11 12 22

Outliers
14 13 15 16 5 27 16 11 12 22
Sorting this data:
5 11 12 13 14 15 16 16 22 27
Remember, we said that the same experiment was
repeated 10 times under the same conditions, so
the time taken should be the same in all cases
and we should have the same number 10 times.
However, due to the unavoidable delay in the response
of the human clicking the stopwatch, we have
varied data.
But some of the data is completely out of sync
with the rest of the data.
Data which is not representative of the rest
of the data is called an OUTLIER.

Outliers
An outlier is a value that appears to be
uniquely different from the rest of the data
set.
It might indicate that something went wrong
with the data collection process.
An outlier is normally defined as a value
more than a distance of 1.5 x IQR from either
end of the box.

Outliers

Coming back to the data
14 13 15 16 5 27 16 11 12 22
Sorting this data:
5 11 12 13 14 15 16 16 22 27
Middle quartile = 14.5
Lower quartile = 12
Upper quartile = 16
Spread = 27 - 5 = 22
IQR = 16 - 12 = 4
1.5 x IQR = 1.5 x 4 = 6
Therefore all values above (upper quartile + 6),
i.e. above 16 + 6 = 22, are outliers, as is 27.

Outliers

Coming back to the data
14 13 15 16 5 23 16 11 12 22
Sorting this data:
5 11 12 13 14 15 16 16 22 23
Middle quartile = 14.5
Lower quartile = 12
Upper quartile = 16
Spread = 23 - 5 = 18
IQR = 16 - 12 = 4
1.5 x IQR = 1.5 x 4 = 6
Therefore all values below (lower quartile - 6),
i.e. below 12 - 6 = 6, are outliers, as is 5.
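
The 1.5 x IQR rule can be applied to both ends of the box at once with a short Python sketch. It assumes the quartiles are the medians of the lower and upper halves (with the middle value left out when the count is odd); the function name `outliers` and the parameter `k` are hypothetical.

```python
from statistics import median

def outliers(values, k=1.5):
    """Values lying more than k * IQR below the lower or above the upper quartile."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q_lower = median(ordered[:half])     # median of the lower half
    q_upper = median(ordered[-half:])    # median of the upper half
    reach = k * (q_upper - q_lower)      # 1.5 x IQR by default
    low, high = q_lower - reach, q_upper + reach
    return [v for v in ordered if v < low or v > high]

print(outliers([14, 13, 15, 16, 5, 27, 16, 11, 12, 22]))
print(outliers([14, 13, 15, 16, 5, 23, 16, 11, 12, 22]))
```

For both data sets the limits are 6 and 22. Checking both limits together flags 5 and 27 in the first data set; in the second data set it flags 5 and, under the same rule, also 23, since 23 lies above 22. The value 22 itself is not flagged because it does not exceed the upper limit.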

References
1. E. Kreyszig, Advanced Engineering Mathematics, 8th edition
