INTRODUCTION
Module 1
INTRODUCTION TO STATISTICS
Overview
I. Objectives
1. identify the different types of samples;
2. discuss the statistical procedures involved in collecting and presenting data; and
3. summarize observations/data following the statistical procedure.
Statistics, Defined
Singular Sense: The science that deals with the collection, organization,
presentation, analysis and interpretation of data
Plural Sense: A set of numerical information; a processed data set. Examples
are: statistics on enrollment, graduates
Inferential Statistics – deals with making generalizations or drawing
conclusions/judgments about the entire set of data based on the analysis of a
subset of data.
Examples: sampling and sampling distributions, estimation, hypothesis testing
Basic Terms
Universe – the set of all entities under study; addresses the question, “Who do
we want to study?”
Variable – a quantity that can assume any value within a range or prescribed set of
values; a characteristic that manifests differences/variation in magnitude; the
characteristic that is measurable/observable in all units in the universe;
answers the question, “What do we wish to know from the elements of the
universe?”
Example: GPA of students
Constant – a quantity that takes a single fixed value
Qualitative Variables – those that cannot be ordered in some dimension;
categorical data; classifications are simply labels, assuming alphanumeric
possible values
Example: college majors, hair color
Quantitative Variables – those that assume numerical data
Example: heights, air temperature
Discrete Data – those that are usually associated with count values
Example: number of students in a class, money
Continuous Data – those that are usually associated with measurement values
Example: car speed, height, temperature
Distribution – the pattern of variation of the variable, displaying how often
each value occurs in the data set
Data Gathering
Types of Data
• Primary Data – data gathered by the user directly from the units in the
universe
• Secondary Data – data gathered not directly from the units in the universe
Raw Data – the data as gathered/collected
Frequency Distribution Table, FDT
The class width is the distance between lower (or upper) limits of consecutive
classes.
The range is the difference between the maximum and minimum data entries.
Tally the data by counting the frequency or number of observations that
belong to each of the classes
e. Construct other columns of information, such as:
• Class Mark or Midpoint – obtained by adding the lower and the upper
class limits and dividing by two.
Example:
Example:
• Cumulative Frequency
< CF – the number of observations less than or equal to the upper
limit of a class
> CF – the number of observations greater than or equal to the lower
limit of a class
Example:
• Relative Cumulative Frequency – the cumulative frequency expressed
as a percentage of the total number of observations
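The table-building steps above can be sketched in Python. This is only an illustration under stated assumptions: it borrows the 15 quiz scores that appear later in this module, uses five classes, and assumes the common textbook rule of dividing the range by the number of classes and rounding up to get the class width. The function name build_fdt and the column labels are hypothetical.

```python
import math

def build_fdt(data, num_classes):
    """Build a frequency distribution table as a list of row dictionaries."""
    lo, hi = min(data), max(data)
    data_range = hi - lo                      # range = maximum - minimum
    # Assumed rule: class width = range / number of classes, rounded up.
    width = math.ceil(data_range / num_classes)
    if width * num_classes <= data_range:
        width += 1                            # make sure the maximum fits
    n = len(data)
    rows = []
    less_cf = 0
    for k in range(num_classes):
        lower = lo + k * width                # lower class limit
        upper = lower + width - 1             # upper class limit (integer data)
        freq = sum(1 for x in data if lower <= x <= upper)   # tally
        less_cf += freq
        rows.append({
            "limits": (lower, upper),
            "midpoint": (lower + upper) / 2,  # class mark
            "f": freq,
            "<CF": less_cf,                   # observations <= upper limit
            ">CF": n - (less_cf - freq),      # observations >= lower limit
            "<RCF %": 100 * less_cf / n,      # relative cumulative frequency
        })
    return rows

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
table = build_fdt(scores, 5)
for row in table:
    print(row)
```

With these assumptions the class width is 6 and the classes run from 28-33 up to 52-57. A different width rule or starting limit would give a different but equally valid table.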
Definition of Terms
Methods of Sampling
Sampling Procedures
Measures of Central Tendency – values computed from the data around which the
observations tend to center or cluster
Properties:
• The sum of the deviations from the arithmetic mean is zero
• The sum of the squares of the deviations from the arithmetic mean is less
than the sum of the squares of the deviations from any value
Advantages:
• The most commonly used average
• Easy to compute
• Easily understood
• Lends itself to algebraic manipulation
Disadvantage:
• Unduly affected by extreme values and may therefore be far from
representative of the sample
Example:
The following are the ages of all seven employees of a small company:
53 32 61 57 39 44 57
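Assuming this example asks for the arithmetic mean of the seven ages, the computation can be sketched as follows. The sketch also checks the two properties listed above; the comparison value 53 (the middle age) is chosen only for illustration, and all variable names are hypothetical.

```python
ages = [53, 32, 61, 57, 39, 44, 57]

# Arithmetic mean: sum of the values divided by their count.
mean_age = sum(ages) / len(ages)

# Property 1: deviations from the mean sum to zero.
deviations = [x - mean_age for x in ages]

# Property 2: squared deviations about the mean are smaller than
# squared deviations about any other value (53 is used here as an example).
ss_about_mean = sum(d ** 2 for d in deviations)
ss_about_53 = sum((x - 53) ** 2 for x in ages)

print(mean_age)                     # 49.0
print(sum(deviations))              # 0.0
print(ss_about_mean, ss_about_53)   # 702.0 814
```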
Example:
The following frequency distribution represents the ages of 30 students in a
statistics class. Find the mean of the frequency distribution.
b. Weighted Mean, Xw – an average of n quantities computed by attaching more
significance (or weight) to some of the numbers than to others.
$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$
where: $W_i$ = assigned weight for the ith quantity
Example:
Grades in a statistics class are weighted as follows:
Tests are worth 50% of the grade, homework is worth 30% of the grade and
the final is worth 20% of the grade. A student receives a total of 80 points on
tests, 100 points on homework, and 85 points on his final. What is his current
grade?
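A short sketch of this example, applying the weighted mean formula directly. The weights are entered as whole percentages so the arithmetic stays exact; the function name weighted_mean is hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(W_i * X_i) / sum(W_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

points = [80, 100, 85]    # tests, homework, final
weights = [50, 30, 20]    # percent of the grade for each component

grade = weighted_mean(points, weights)
print(grade)   # 87.0
```

The numerator is 50(80) + 30(100) + 20(85) = 8700 and the weights sum to 100, so the current grade is 87.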
c. Geometric Mean, G – the nth root of the product of n positive numbers; used
primarily to average data for which the ratio of consecutive terms remains
approximately constant, which occurs with such data as rates of change, ratios,
etc.
$$G = (X_1 X_2 \cdots X_n)^{1/n}$$
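The geometric mean formula can be sketched as below. The data values are hypothetical and were picked so that the ratio of consecutive terms is constant; the function name is also an assumption. math.prod requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n positive numbers."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.prod(values) ** (1 / len(values))

terms = [2, 4, 8]            # hypothetical data, each term twice the previous
g = geometric_mean(terms)    # cube root of 64, approximately 4
print(round(g, 6))
```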
d. Harmonic Mean, H – for n numbers, the number n divided by the sum of the
reciprocals of the n numbers; most frequently used in averaging speeds for
various distances covered where the distances remain constant, and also in finding
the average cost of a common commodity when several different purchases are
made by investing the same amount of money each time.
$$H = \frac{n}{\sum_{i=1}^{n} (1/X_i)}$$
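The averaging-speeds use case can be sketched as follows. The speeds and the 120 km leg length are hypothetical; the second computation (total distance over total time) is included only to show that it agrees with the harmonic mean when the distances are equal.

```python
def harmonic_mean(values):
    """Return n divided by the sum of the reciprocals of the n numbers."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("harmonic mean needs at least one value, all positive")
    return len(values) / sum(1 / x for x in values)

speeds = [40, 60]            # hypothetical speeds (km/h) over two equal distances
h = harmonic_mean(speeds)    # approximately 48 km/h

# Cross-check: two legs of 120 km each, total distance / total time.
leg = 120
average_speed = (2 * leg) / (leg / 40 + leg / 60)

print(round(h, 6), average_speed)
```

Note that the arithmetic mean of the two speeds, 50 km/h, would overstate the average because more time is spent at the slower speed.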
Exercise:
Find the Geometric and Harmonic Mean.
e. Midrange, MR – the average of the highest and lowest values in the data set.
MR = (Xmin + Xmax)/2
It is quick and easy to compute but often inefficient, because all the information
contained in the intermediate values is ignored.
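A brief sketch of the midrange, reusing the seven employee ages from the earlier example. The altered list is hypothetical and only illustrates that intermediate values have no effect on the result.

```python
ages = [53, 32, 61, 57, 39, 44, 57]
midrange = (min(ages) + max(ages)) / 2
print(midrange)   # 46.5

# Changing an intermediate value (44 -> 60) leaves the midrange unchanged.
altered = [53, 32, 61, 57, 39, 60, 57]
midrange_altered = (min(altered) + max(altered)) / 2
print(midrange_altered)   # 46.5
```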
f. Mode – the value that occurs most frequently (absolute mode). There could be
several modes; a relative mode is a value that occurs more frequently than
neighboring values even if it is not an absolute mode
$$Mo = L_{Mo} + \left[\frac{d_1}{d_1 + d_2}\right] w$$
where:
$L_{Mo}$ = lower limit of the modal class
$d_1$ = the difference, sign neglected, between the frequency of the modal class
and the frequency of the preceding class
$d_2$ = the difference, sign neglected, between the frequency of the modal class
and the frequency of the following class
$w$ = width of the modal class
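The grouped-data mode formula can be sketched as below. The frequencies are an illustrative grouping of the 15 quiz scores listed later in this module into five classes of width 6. Two choices here are assumptions, not part of the formula: a missing neighboring class is treated as having frequency 0, and ties are resolved by taking the first class with the highest frequency.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mo = L_Mo + [d1 / (d1 + d2)] * w for the modal class."""
    if not freqs or max(freqs) == 0:
        raise ValueError("need at least one class with a nonzero frequency")
    m = freqs.index(max(freqs))                 # modal class (first one if tied)
    prev_f = freqs[m - 1] if m > 0 else 0       # assumed 0 when no preceding class
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = abs(freqs[m] - prev_f)
    d2 = abs(freqs[m] - next_f)
    return lower_limits[m] + d1 / (d1 + d2) * width

lower_limits = [28, 34, 40, 46, 52]   # illustrative classes 28-33, ..., 52-57
freqs = [3, 3, 5, 3, 1]
mode = grouped_mode(lower_limits, freqs, 6)
print(mode)   # 43.0
```

Here the modal class is 40-45, d1 = d2 = 2, so Mo = 40 + (2/4)(6) = 43.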
g. Median – Half of the observations should have a value less than the median
and half should have a value greater than the median
$$Md = X_{(N+1)/2} \quad \text{if } N \text{ is odd}$$
$$Md = L_{Md} + \left[\frac{(N+1)/2 - S}{f_{Md}}\right] w$$
where:
$L_{Md}$ = lower limit of the median class
$N$ = number of observations in the sample
$S$ = sum of the frequencies in all classes preceding the median class
$f_{Md}$ = frequency of the median class
$w$ = width of the median class
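Both median formulas can be sketched as follows, using the 15 quiz scores listed later in this module and an illustrative grouping of them into five classes of width 6. The function names are hypothetical, and the ungrouped function covers only the odd-N rule stated above.

```python
def median_odd(data):
    """Md = X_((N+1)/2) for an odd number of observations."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("this sketch covers only the odd-N rule")
    return ordered[(n + 1) // 2 - 1]      # convert 1-based position to 0-based

def grouped_median(lower_limits, freqs, width):
    """Md = L_Md + [((N + 1)/2 - S) / f_Md] * w."""
    n = sum(freqs)
    position = (n + 1) / 2
    s = 0                                  # frequencies before the median class
    for lower, f in zip(lower_limits, freqs):
        if s + f >= position:              # this is the median class
            return lower + (position - s) / f * width
        s += f
    raise ValueError("no observations")

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
print(median_odd(scores))                  # 43

md_grouped = grouped_median([28, 34, 40, 46, 52], [3, 3, 5, 3, 1], 6)
print(round(md_grouped, 2))                # 42.4
```

The grouped value differs slightly from the exact median because grouping discards the individual scores inside each class.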
h. Percentile, Decile, and Quartile Limits
Percentiles – values dividing the array into 100 equal parts
Deciles – values dividing the array into 10 equal parts
Quartiles – values dividing the array into 4 equal parts
Example:
If a score is located at the 80th percentile (P80 ), it means that 80% of all the
scores fall at or below this score in the distribution and 20% of all the scores fall
above this value.
The three quartiles, Q1, Q2, and Q3, approximately divide an ordered data set
into four equal parts.
Example:
The quiz scores for 15 students are listed below. Find the first, second, and third
quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38
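One way to work this example is sketched below. It assumes the median-of-halves convention: Q2 is the median of the ordered data, and Q1 and Q3 are the medians of the lower and upper halves, with the middle value left out of both halves when the count is odd. Other quartile conventions exist and can give slightly different answers. The function names are hypothetical.

```python
def median_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]   # skip the middle value when n is odd
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)   # 37 43 48
```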
Measures of Dispersion – values used to describe the extent of dispersion or
variability of data
a. Range – the difference between the largest and the smallest measurement in
the data set
R = Xmax - Xmin
Example:
The following data are the closing prices for a certain stock on ten successive
Fridays. Find the range.
b. Mean Deviation, MD – the arithmetic mean of the absolute deviations from the
mean
$$MD = \frac{\sum |X_i - \bar{X}|}{n}$$
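The range and mean deviation can be sketched together. Since the stock prices for the range example are not reproduced here, the sketch reuses the seven employee ages from the arithmetic mean example; variable names are illustrative.

```python
ages = [53, 32, 61, 57, 39, 44, 57]
n = len(ages)

value_range = max(ages) - min(ages)     # R = Xmax - Xmin

mean_age = sum(ages) / n                # 49.0
# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean_age) for x in ages) / n

print(value_range)                # 29
print(round(mean_deviation, 2))   # 9.14
```

The absolute deviations are 4, 17, 12, 8, 10, 5, and 8, which sum to 64, so MD = 64/7.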
c. Variance
For population variance, σ²:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
For sample variance, S²:
$$S^2 = \frac{n \sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)}$$
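Both variance formulas above can be sketched as follows, taking the S² shortcut formula to be the sample variance. The data are the seven employee ages from the arithmetic mean example; treating the same list as a population in one call and as a sample in the other is done purely to compare the two formulas. Function names are hypothetical.

```python
import math

def population_variance(data):
    """sigma^2 = sum((X_i - mu)^2) / N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

def sample_variance(data):
    """S^2 = [n * sum(X_i^2) - (sum(X_i))^2] / [n * (n - 1)]."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

ages = [53, 32, 61, 57, 39, 44, 57]
pop_var = population_variance(ages)    # 702 / 7, about 100.29
samp_var = sample_variance(ages)       # 4914 / 42
print(round(pop_var, 2))
print(samp_var)                        # 117.0
print(round(math.sqrt(pop_var), 2))    # population standard deviation
```

The standard deviation is the square root of the corresponding variance.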
For grouped data, the sample variance is obtained from:
$$S^2 = \frac{n \sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n-1)}$$
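A sketch of the grouped-data formula follows. It assumes that each X_i is the class mark (midpoint) of its class and that n is the sum of the frequencies. The midpoints and frequencies are an illustrative grouping of the module's 15 quiz scores into five classes of width 6; the function name is hypothetical.

```python
def grouped_sample_variance(midpoints, freqs):
    """S^2 = [n * sum(f_i X_i^2) - (sum(f_i X_i))^2] / [n * (n - 1)]."""
    n = sum(freqs)                                   # total observations
    if n < 2:
        raise ValueError("need at least two observations")
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

midpoints = [30.5, 36.5, 42.5, 48.5, 54.5]   # class marks (assumed to be X_i)
freqs = [3, 3, 5, 3, 1]
s2_grouped = grouped_sample_variance(midpoints, freqs)
print(round(s2_grouped, 2))   # 53.83
```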
Example:
The following data are the closing prices for a certain stock on five successive
Fridays. Find the population standard deviation.
Q = ½ ( Q3 - Q1)
b. Pearson’s Second Coefficient of Skewness, SK
$$SK = \frac{3(\bar{X} - Md)}{S}$$
If SK = 0, the distribution of data is symmetric.
If SK < 0, the distribution is negatively skewed, or the frequency curve of the
distribution has a longer tail to the left of the central maximum than to the right.
If SK > 0, the distribution is positively skewed.
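The skewness coefficient can be sketched with Python's standard statistics module. The data are the seven employee ages from the arithmetic mean example, and S is taken here to be the sample standard deviation (statistics.stdev); that choice is an assumption, since the formula above does not say which standard deviation is meant.

```python
import statistics

ages = [53, 32, 61, 57, 39, 44, 57]

mean_age = statistics.mean(ages)       # 49
median_age = statistics.median(ages)   # 53
s = statistics.stdev(ages)             # sample standard deviation (assumed S)

sk = 3 * (mean_age - median_age) / s
print(round(sk, 2))                    # -1.11

if sk < 0:
    print("negatively skewed")
elif sk > 0:
    print("positively skewed")
else:
    print("symmetric")
```

The mean lies below the median, so SK is negative and the longer tail is on the left.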
b. Kurtosis, K
$$K = a_4 - 3$$
If K = 0, the distribution is normal.
If K > 0, the distribution is leptokurtic.
If K < 0, the distribution is platykurtic.
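The kurtosis rule can be sketched as below. The text does not define a4, so this sketch assumes the usual moment coefficient of kurtosis, a4 = m4 / m2², where m2 and m4 are the second and fourth moments about the mean (each divided by n). The data are the seven employee ages from the arithmetic mean example, and the function name is hypothetical.

```python
def kurtosis_k(data):
    """K = a4 - 3, with a4 assumed to be m4 / m2**2 (moments about the mean)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("all values are identical; kurtosis is undefined")
    return m4 / m2 ** 2 - 3

ages = [53, 32, 61, 57, 39, 44, 57]
k = kurtosis_k(ages)
print(round(k, 2))   # -1.25

if k > 0:
    print("leptokurtic")
elif k < 0:
    print("platykurtic")
else:
    print("normal")
```

With only seven observations the coefficient is a rough descriptive figure, not strong evidence about the shape of the distribution.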
ABEN 3413: Engineering Data Analysis
III. Assessment
1. Conceptualize a research problem to collect data for the
identified variable(s), using:
a. objective method
b. subjective method
c. use of existing records
2. The following data represent the ages of 30 students in a
statistics class. Construct a frequency distribution that has five
classes.
3. Consider the following set of test scores for James and Rob in
Chemistry.
James 62 80 83 72 73
Rob 50 46 90 91 93
IV. References
Walpole, R.E. and Myers, R.H. 1989. Probability and Statistics for Engineers
and Scientists. 4th edition. Macmillan Publishing Company. 866 Third
Avenue, New York, New York 10022
Triola, M.F. 1994. Elementary Statistics. 6th edition. Addison-Wesley Publishing
Company, Inc. United States of America
Triola, M.F. 2011. Essentials of Statistics. 4th edition. Pearson Education, Inc.
500 Boylston Street, Suite 900, Boston, MA 02116