Introductory Statistics Course Overview
Course Contents
STAT-302/STAT-102 Introductory Statistics 2(2-0)/3(3-0)
Learning Objectives:
To introduce students to the basic concepts of statistical theory: translating data into statistical information, analysing it further, and ultimately drawing inferences from the data.
Theory:
Role of statistics in research. Measurement and measurement scales. Types of data: primary and secondary, continuous and discrete. Presentation of data using graphs and tables, stem-and-leaf and box plots. Measures of central tendency and dispersion. Chebyshev's theorem and the empirical rule; the concept of an outlier and its detection through the box-and-whisker plot. Sampling and the sampling distribution of a single mean. Statistical inference for a single population mean, the difference between two means, and paired observations. Chi-square goodness-of-fit test and chi-square test of independence. One-way and two-way analysis of variance (ANOVA).
Suggested Readings:
1. Choudhry, M., 2000. Introduction to Statistical Theory. Ilmi Kitab Khana, Urdu Bazar, Lahore.
2. Muhammad, F., 2015. Statistical Methods and Data Analysis. Kitab Markaz, Bhawana Bazaar, Faisalabad.
3. Walpole, R.E., R.H. Myers and S.L. Myers, 2007. Probability and Statistics for Engineers and Scientists. 7th Edition. Prentice Hall, N.Y.
4. Zar, J.H., 2010. Biostatistical Analysis. 5th Edition. Prentice Hall.
Preferably Suggested Book: “Statistical Methods & Data Analysis”, 6th Edition, by Faqir Muhammad & Hassan Dawood. Kitab Markaz, Amin Pur Bazar, Faisalabad, 041-2642707.
Provided By Dr. Muhammad Imran Khan
05-Mar-19
INTRODUCTION TO STATISTICS
Parameter
A parameter is a numerical characteristic of a population, such as its mean or standard deviation. Parameters are fixed constants that characterize a population. They are denoted by Greek letters. A parameter is a fixed quantity.
Statistic
A statistic is a numerical characteristic of a sample, such as its mean or standard deviation. Statistics are used to draw valid inferences about the population. They are denoted by Latin letters. A statistic is a variable quantity.
Variables
Any characteristic which may vary with respect to individual, time, and (or) place. For example:
Number of products produced by a machine during a specified period of time
Number of workers
Weight of an individual
Price, Sale, Adv. expenditures
Quality, Design, Performance
Variables are usually represented by the last letters of the alphabet, such as X, Y, Z.
Dimension of Variable
Univariate
Bivariate
Multivariate
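The distinction between a parameter (fixed) and a statistic (variable from sample to sample) can be illustrated with a small sketch. The population below is hypothetical and is not taken from the course material; the seed and sample size are likewise arbitrary choices made only for illustration.

```python
import random
import statistics

# HYPOTHETICAL population: the integers 1..100 (an assumption for illustration only).
population = list(range(1, 101))

# Parameter: the population mean is one fixed number.
mu = statistics.mean(population)

# Statistic: the sample mean changes from one random sample to the next.
rng = random.Random(2019)  # arbitrary seed so the run is repeatable
sample_means = [statistics.mean(rng.sample(population, 10)) for _ in range(5)]

print("population mean (parameter):", mu)
print("sample means (statistics):  ", sample_means)
```

Each run with a different seed gives different sample means, while the population mean stays the same.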
Types of variables
Quantitative:
Continuous
Discrete
Qualitative:
Categorical
Latent
Fixed variables:
1. Design
2. Adv. Expenditures
3. Diet
4. Dose of a medicine
5. Amount of Fertilizer
6. Study Hours
Random Variables:
1. Sales
2. Growth
3. Recovery Time
4. Yield
5. Marks
Qualitative variable
A characteristic which varies in quality (not numerically) from one individual to another, also called an attribute, e.g. eye color, education level, behavior, quality, design, performance.
Quantitative variable
A variable is called a quantitative variable when it varies in quantity (or numerically) from one individual to another, e.g. age, income, temperature, price, sale, advertising expenditures.
Types of Quantitative variable
Discrete variable
A variable that takes only specified values, or takes values by jumps or breaks, e.g. number of rooms in a house, number of deaths in an accident.
Continuous variable
A variable that can assume any value (fractional or integral) between two specified values ‘a’ and ‘b’, e.g. height of a plant, speed of a car, sale, price.
INTRODUCTION TO STATISTICS
Data
The collection of some related observations is called data.
Classification of data
Data that have been originally collected and have not undergone any sort of statistical treatment are called Primary data, while data that have undergone any sort of statistical treatment at least once are called Secondary data.
Data may be available from existing sources, e.g. records and publications, or they may have to be collected afresh.
Collection of primary data
(1) Direct personal investigation
(2) Personal interview
(3) Collection through questionnaires
(4) Collection through enumerators
(5) Collection through local sources
Collection of Secondary data:
(1) Official Publications
Federal Bureau of Statistics
Population Census Organization
Ministries of Health, Food, Agriculture, Finance etc.
Provincial Bureaus of Statistics
(2) Semi-official
FREQUENCY DISTRIBUTION
Solution
Step-1 : Number of values in the data set = n
n = 30
Step-2 : Find maximum & minimum values
Max value = Xm = 112, Min value = X0 = 87
Range: R = Xm - X0 = 112 - 87 = 25
Step-3
Class limits  Class boundaries  Tally      Frequency  Midpoint  c.f.  r.f.    %          Cumulative
(Marks)                                    (f)        (X)       (F)           frequency  % frequency
86 – 90       85.5 – 90.5       |||| |     6          88        6     0.2000  20.00      20.00
91 – 95       90.5 – 95.5       ||||       4          93        10    0.1333  13.33      33.33
96 – 100      95.5 – 100.5      |||| ||||  10         98        20    0.3333  33.33      66.67
101 – 105     100.5 – 105.5     |||| |     6          103       26    0.2000  20.00      86.67
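The derived columns of the table can be recomputed from the class limits and frequencies. The sketch below uses only the four rows shown above and n = 30 from Step-1; the function name and dictionary keys are illustrative choices, not part of the course material.

```python
# Class limits and frequencies copied from the rows shown in the table above.
n = 30  # number of values, from Step-1
classes = [(86, 90, 6), (91, 95, 4), (96, 100, 10), (101, 105, 6)]

data_range = 112 - 87  # Step-2: maximum minus minimum

def describe(classes, n):
    """Return one dict per class with boundaries, midpoint and cumulative columns."""
    rows = []
    cumulative = 0
    for lower, upper, f in classes:
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # limits are whole numbers
            "midpoint": (lower + upper) / 2,
            "f": f,
            "cf": cumulative,
            "rf": round(f / n, 4),
            "pct": round(100 * f / n, 2),
            "cum_pct": round(100 * cumulative / n, 2),
        })
    return rows

rows = describe(classes, n)
for row in rows:
    print(row)
```

Because the cumulative percentage is computed from the cumulative frequency rather than by adding rounded percentages, the third and fourth rows come out as 66.67 and 86.67.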
Weight Height
4.0 – 4.4 4.5 – 4.9 5.0 – 5.4 5.5 – 5.9
45 – 49 | (1) || (2)
50 – 54 ||| (3)
55 – 59 || (2) |||| (5)
60 – 64 ||| (3) || (2)
65 – 69 || (2) |||| (5)
Histogram
Frequency polygon & Frequency curve
Cumulative Frequency polygon or Ogive
Historigram
A historigram is constructed by taking
Time along X-axis and
Value of the variable along Y-axis
Example:
The data represent the records of a company’s savings over the years. Construct a time series plot to represent it.
Index 1 2 3 4 5 6 7 8
Year 1950 1951 1952 1953 1954 1955 1956 1957
[Figure: Historigram; X-axis: Year; Y-axis: Saving (Rs)]
HISTOGRAM
A histogram consists of a set of adjacent rectangles whose bases are marked off by class boundaries along the X-axis, and whose heights are proportional to the frequencies associated with the respective classes.
Class boundaries along the X-axis
Frequency along Y-axis
Draw rectangles whose heights are proportional to the frequencies of the respective classes
Class Limit    Class Boundaries   Frequency
30.0 – 32.9    29.95 – 32.95      2
33.0 – 35.9    32.95 – 35.95      4
36.0 – 38.9    35.95 – 38.95      14
39.0 – 41.9    38.95 – 41.95      8
42.0 – 44.9    41.95 – 44.95      2
Total                             30
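As a rough sketch of how the class boundaries in this table follow from the class limits, the code below takes half the gap between one class's upper limit and the next class's lower limit and applies it to both ends. It then prints a plain-text histogram, one `#` per car. The function name and the text rendering are illustrative assumptions; `Decimal` is used only to avoid floating-point noise in values such as 29.95.

```python
from decimal import Decimal

# Class limits and frequencies copied from the table above.
limits = [("30.0", "32.9"), ("33.0", "35.9"), ("36.0", "38.9"),
          ("39.0", "41.9"), ("42.0", "44.9")]
freqs = [2, 4, 14, 8, 2]

def class_boundaries(limits):
    """Widen each class by half the gap between consecutive class limits."""
    lims = [(Decimal(lo), Decimal(hi)) for lo, hi in limits]
    gap = lims[1][0] - lims[0][1]  # 33.0 - 32.9 = 0.1
    half = gap / 2                 # 0.05
    return [(lo - half, hi + half) for lo, hi in lims]

boundaries = class_boundaries(limits)

# Text histogram: bar length is proportional to the class frequency.
for (lo, hi), f in zip(boundaries, freqs):
    print(f"{lo} - {hi} | {'#' * f} ({f})")
```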
second class, and thus obtain the following situation:
[Figure: histogram; X-axis: Miles per gallon (class boundaries 29.95, 32.95, 35.95, 38.95, 41.95, 44.95); Y-axis: Number of Cars]
the following picture:
[Figure: histogram; X-axis: Miles per gallon; Y-axis: Number of Cars]
Frequency Distribution
For Hudson Auto Repair, if we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
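The class-width arithmetic can be written as a short sketch. The largest value (109), smallest value (52) and number of classes (6) come from the slide; the starting lower limit of 50 is an assumption chosen to match the 49.5 boundary that appears in the cumulative table for the same data, and the function name is illustrative.

```python
import math

def approximate_class_width(largest, smallest, k):
    """Range divided by the desired number of classes."""
    return (largest - smallest) / k

width = approximate_class_width(109, 52, 6)  # (109 - 52) / 6
rounded_width = math.ceil(width)             # round up to a convenient width

# ASSUMPTION: the first class starts at 50 (not stated on this slide).
first_lower_limit = 50
classes = [
    (first_lower_limit + i * rounded_width,
     first_lower_limit + (i + 1) * rounded_width - 1)
    for i in range(6)
]

print(width, rounded_width)
print(classes)
```

Rounding the width up rather than down keeps the largest observation inside the last class.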
Total 50
[Figure: histogram; X-axis: Miles per gallon; Y-axis: Number of Cars]
FREQUENCY POLYGON
classes were:
Class Boundaries   Mid-Point (X)   Frequency (f)
26.95 – 29.95      28.45           0
29.95 – 32.95      31.45           2
32.95 – 35.95      34.45           4
35.95 – 38.95      37.45           14
38.95 – 41.95      40.45           8
41.95 – 44.95      43.45           2
44.95 – 47.95      46.45           0
[Figure: frequency polygon; X-axis: Miles per gallon (mid-points 28.45 to 46.45); Y-axis: Number of Cars]
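The points of the frequency polygon can be generated from the class boundaries and frequencies of the miles-per-gallon distribution. This is a minimal sketch: each point is (class mid-point, frequency), and one empty class is added at each end so the polygon closes on the X-axis. The function name is an illustrative choice.

```python
from decimal import Decimal

# Class boundaries and frequencies of the five observed classes.
boundaries = [Decimal(b) for b in
              ("29.95", "32.95", "35.95", "38.95", "41.95", "44.95")]
freqs = [2, 4, 14, 8, 2]

def polygon_points(boundaries, freqs):
    """Mid-point/frequency pairs, padded with a zero-frequency class at each end."""
    width = boundaries[1] - boundaries[0]
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    first = (mids[0] - width, 0)   # extra class below the first one
    last = (mids[-1] + width, 0)   # extra class above the last one
    return [first] + list(zip(mids, freqs)) + [last]

points = polygon_points(boundaries, freqs)
for x, f in points:
    print(x, f)
```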
TYPES OF FREQUENCY DISTRIBUTION
Symmetrical distribution
A frequency distribution or curve is symmetrical if values equidistant from a central maximum have the same frequencies.
Skewed distribution
A frequency distribution or curve is skewed when it departs from symmetry.
Cumulative Distributions
Hudson Auto Repair
Cost ($)   Cumulative   Cumulative Relative   Cumulative Percent
           Frequency    Frequency             Frequency
< 49.5     0            0                     0
< 59.5     2            .04                   4
< 69.5     15           .30                   30
< 79.5     31           .62                   62
< 89.5     38           .76                   76
< 99.5     45           .90                   90
< 109.5    50           1.00                  100
For the < 69.5 row: 2 + 13 = 15, 15/50 = .30, .30(100) = 30.
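The three cumulative columns can be reproduced with a running sum. The class frequencies below are not listed on the slide; they are recovered here by differencing the cumulative frequencies in the table (for example 15 - 2 = 13), so treat them as derived values. Variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries from the table, and class frequencies obtained by
# differencing the cumulative frequencies shown there (derived, not quoted).
upper_boundaries = [59.5, 69.5, 79.5, 89.5, 99.5, 109.5]
class_freqs = [2, 13, 16, 7, 7, 5]

n = sum(class_freqs)
cum_freq = list(accumulate(class_freqs))
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

for ub, c, r, p in zip(upper_boundaries, cum_freq, cum_rel, cum_pct):
    print(f"< {ub:<6} {c:>3} {r:.2f} {p:.0f}")
```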
GRAPHICAL REPRESENTATION
Bar Graph
Example: PC Hotel
[Figure: bar graph; X-axis: Rating (Poor, Below Average, Average, Above Average, Excellent); Y-axis: Frequency]
MULTIPLE BAR CHART
A multiple bar chart consists of more than one horizontal or vertical bar, of equal widths and with lengths proportional to the magnitudes of the respective observations. The space separating the bars should not exceed the width of a bar and should not be less than half of its width. Data that do not relate to time should be arranged in ascending or descending order before charting.
Rating          Frequency
Below Average   3    2
Average         5    4
Above Average   9    7
Excellent       1    4
Total           20   20
[Figure: multiple bar chart; X-axis: Rating (Poor, Below Average, Average, Above Average, Excellent); Y-axis: Frequency]
TURNOVER
Years               1965    1966    1967     1968    1969
Turnover (Rupees)   35000   42000   435000   48000   485000
[Figure: bar chart of turnover; X-axis: YEARS (1965 to 1969); Y-axis: Turnover (Rupees)]
characteristics and each of which is shaded differently for identification.
               1987   1988   1989
Locality-I     500    600    800
Locality-II    600    700    700
Locality-III   200    400    500
COMPONENT BAR CHART
In a component bar chart each bar is divided into two or more sections. The length of the bar represents the total, and the various sections represent the components of the total.
GRAPHICAL REPRESENTATION
[Figure: Multiple bar diagram of different localities; X-axis: Years (1987, 1988, 1989); Y-axis: Production; series: L-I, L-II, L-III]
Component Bar Chart:
A component bar diagram to represent the population of different divisions of Pakistan.
Population in Lakhs
Division     Male   Female   Total
Peshawar     33     31       64
Rawalpindi   21     19       40
Sargodha     32     28       60
Lahore       35     30       65
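A component bar can be sketched in plain text from the population table: the bar length is the total and its two sections are the male and female components. The one-character-per-lakh scale and the function name are illustrative assumptions, not part of the slides.

```python
# Male and female population (in lakhs) per division, copied from the table.
population = {
    "Peshawar": (33, 31),
    "Rawalpindi": (21, 19),
    "Sargodha": (32, 28),
    "Lahore": (35, 30),
}

def component_bar(male, female):
    """One character per lakh: 'M' section followed by 'F' section."""
    return "M" * male + "F" * female

totals = {name: male + female for name, (male, female) in population.items()}
bars = {name: component_bar(*parts) for name, parts in population.items()}

for name in population:
    print(f"{name:<11} {bars[name]} ({totals[name]})")
```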
[Figure: COMPONENT BAR CHART SHOWING POPULATION OF 4 DIVISIONS; X-axis: DIVISIONS; Y-axis: POPULATION (IN LAKH); components: Male, Female]
Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
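The sector-angle rule for a pie chart is a single multiplication. The sketch below applies it to the .25 case from the text and, as a further illustration, to the division totals from the population table; using those totals as pie-chart input is an assumption made here, and the function names are hypothetical.

```python
def sector_angle(relative_frequency):
    """Degrees of the circle taken by a class with this relative frequency."""
    return relative_frequency * 360

def sector_angles(frequencies):
    """Convert raw frequencies to sector angles via their relative frequencies."""
    total = sum(frequencies.values())
    return {name: sector_angle(f / total) for name, f in frequencies.items()}

print(sector_angle(0.25))  # the .25(360) case from the text

# Illustration only: total population (in lakhs) of the four divisions.
division_totals = {"Peshawar": 64, "Rawalpindi": 40, "Sargodha": 60, "Lahore": 65}
angles = sector_angles(division_totals)
for name, angle in angles.items():
    print(f"{name:<11} {angle:.1f} degrees")
```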
GRAPHICAL REPRESENTATION
Pie Chart:
A Pie Chart to represent the total production of urea fertilizer and its use on different crops.
[Figure: Pie Chart for production of urea fertilizer]
Bar chart is used for Nominal or Ordinal data
Pie chart is an alternative to the bar chart
Stem and Leaf
STEM AND LEAF PLOTS
Uses actual data values to create a graph
Stem and Leaf graph (Marks of 39 Students)
52, 56, 60, 60, 60, 60, 60, 60, 64, 64, 64, 64, 64, 64, 64, 68, 68, 68, 68, 72, 72, 72, 72, 72, 72, 72, 72, 74, 76, 76, 76, 76, 80, 80, 84, 88, 88, 88, 92
Let’s try one together…
First, draw the dividing lines.
Here are one student’s math test scores. Make a stem-and-leaf plot for this data.
Continue with this process until you have entered all the data.
Key: 7 | 5 means 75
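The stem-and-leaf construction can be sketched for the marks of the 39 students listed earlier: the tens digit is the stem and the units digit is the leaf. The function name is an illustrative choice, and the repeated marks are written with list multiplication only to keep the listing short.

```python
from collections import defaultdict

# Marks of 39 students, as listed in the stem-and-leaf example.
marks = ([52, 56] + [60] * 6 + [64] * 7 + [68] * 4 + [72] * 8
         + [74] + [76] * 4 + [80] * 2 + [84] + [88] * 3 + [92])

def stem_and_leaf(values):
    """Group the units digits (leaves) under their tens digit (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(marks)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 7 | 5 means 75")
```

Sorting the values first keeps the leaves in ascending order within each stem.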