INTRODUCTION

ab power engineering

Uploaded by

Ibale, Arjay D.
Copyright
© © All Rights Reserved
Central Luzon State University

Science City of Muñoz 3120


Nueva Ecija, Philippines

Instructional Module for the Course


ENGINEERING DATA ANALYSIS

Module 1
INTRODUCTION TO STATISTICS

Overview

This section explains the fundamental concepts and terminology in
statistics. The different statistical procedures involved in data collection and
presentation are discussed.

I. Objectives
1. identify the different types of sample;
2. discuss the statistical procedures involved in collecting and presenting data;
and
3. summarize observation/data following the statistical procedure.

II. Learning Activities

II.1 BASIC CONSIDERATIONS

Statistics, Defined

Singular Sense: The science that deals with the collection, organization,
presentation, analysis and interpretation of data
Plural Sense: A set of numerical information; a processed data set. Examples
are: statistics on enrollment, graduates

Two Phases/Fields of Statistics

Descriptive Statistics – deals with the methods of collecting/gathering,
organizing, summarizing, presenting data and their interpretation.
Examples: measures of central tendency, variability, skewness

Inferential Statistics – deals with making generalizations or drawing
conclusions/judgments about the entire set of data based on the analysis of a
subset of data.
Examples: sampling and sampling distributions, estimation, hypothesis testing

Basic Terms

Universe – the set of all entities under study; addresses the question, “Who do
we want to study?”
Variable – a quantity that can assume any of the range or prescribed set of
values; a characteristic that manifests differences/variation in magnitudes; the
characteristic that is measurable/observable in all units in the universe;
answers the question, “What do we wish to know from the elements of the
universe?”
Example: GPA of students
Constant – a quantity that takes a single fixed value
Qualitative Variables – those that cannot be ordered in some dimension;
categorical data; classifications are simply labels, assuming alphanumeric
possible values
Example: college majors, hair color
Quantitative Variables – those that assume numerical data
Example: heights, air temperature
Discrete Data – those that are usually associated with count values
Example: number of students in a class, money
Continuous Data – those that are usually associated with measurement values
Example: car speed, height, temperature
Distribution – the pattern of variation of the variable, displaying how often
each value occurs in the data set

Data Gathering

Ways of Data Gathering


• Objective Method – involves getting actual measurements and
observations
Example: direct observations, experiments
• Subjective Method – involves the respondent providing the data
Example: Surveys
• Use of Existing Records – done by utilizing the data previously gathered
by certain persons/agencies

Types of Data
• Primary Data – data gathered by the user directly from the units in the
universe
• Secondary Data – data gathered not directly from the units in the universe

Raw Data – the data as gathered/collected

Array – arrangement of raw numerical data in ascending or descending order of
magnitude

Grouped Data – arrangement/organization of data in a frequency distribution
form

Methods of Data Presentation

Textual Method – uses sentences and paragraphs, giving a narrative that
describes the characteristics of the universe or the population based on the
data gathered and organized
Tabular Method – data are organized in rows and columns to present
the greatest amount of information
Graphical method – uses pictures, figures to display trends and distribution of
data. Examples are: bar graph, line graph, pie diagram, pictograph, statistical
map, histogram, frequency polygon, ogive.
• Bar graph – represents the frequency or magnitude of quantities of each of
the categories as a bar rising vertically (or horizontally) from the horizontal
(or vertical) axis, with the height (or width) of each bar being proportional
to the frequency or magnitude of the corresponding category
• Line graph – obtained by plotting the frequency of a category above the
point on the horizontal axis representing the category, and then joining
consecutive points with straight line segments.
• Pie diagram – a circle subdivided into a number of slices that represent the
different categories, and with each slice proportional to the percentage
corresponding to the category
• Pictograph – makes use of symbols and is used to compare few discrete
data, usually of one kind
• Statistical map – shows the geographical location and may contain different
symbols on the map. This should carry a legend to tell the meaning of the
symbols
• Histogram – a bar graph associated with a frequency distribution table. It
is constructed by marking off the true class boundaries along the horizontal
axis and erecting over each class interval a rectangle whose height is equal
to the frequency of the class.
• Frequency polygon – a line graph associated with a frequency distribution
table. It is constructed by plotting the class mark vs. the frequency for each
class and then joining consecutive points with straight line segments.
• Ogive – a graphical representation of the cumulative frequency of a
frequency distribution table

Frequency Distribution Table, FDT

A tabular presentation of a given set of data that presents the classes/categories
established for the data and the frequency of observations falling under a
specified class/category.

The frequency f of a class is the number of data points in the class.

The class width is the distance between lower (or upper) limits of consecutive
classes.

The range is the difference between the maximum and minimum data entries.

Construction of a Quantitative FDT


a. Determine the lowest value (LV) and the highest value (HV) in the set of
data to compute the range, R = HV – LV
b. Determine the number of classes, k = √N , where N is the number of
observations in the data set and k being rounded off to the nearest integer.
Each class is an interval of values defined by its lower limit (LL) and upper
limit (UL).
c. Obtain the class size or class width, c = R/k, rounding off c to the nearest
value with precision the same as those of the raw data.
d. Construct the classes as follows:
The LL of the lowest class is LV. The lower limits of the succeeding classes
are set by adding c to the lower limit of the preceding class.
The UL of the lowest class is set as the lower limit of the next class minus
one unit of measure. The upper limits of the succeeding classes are
computed by adding c to the UL of the preceding class.

e. Tally the data by counting the frequency or number of observations that
belong to each of the classes
f. Construct other columns of information, such as:
• Class Mark or Midpoint – obtained by adding the lower and the upper
class limits and dividing by two.

Example:

• Relative Frequency – the frequency of a class expressed as a
percentage of the total number of observations

Example:

• Cumulative Frequency
< CF – the number of observations less than or equal to the upper
limit of a class
> CF – the number of observations greater than or equal to the lower
limit of a class

Example:

• Relative Cumulative Frequency – the cumulative frequency expressed
as a percentage of the total number of observations
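
The construction steps above can be sketched in Python. This is only an illustration under stated assumptions: the data set is hypothetical, the data are integers (so one unit of measure is 1), and the function name `build_fdt` and the column keys are invented for this sketch. Note that when c is rounded down, the k classes may not reach HV, so the loop keeps adding classes until HV is covered.

```python
import math

def build_fdt(data):
    """Build a frequency distribution table for integer data (unit of measure = 1)."""
    n = len(data)
    lv, hv = min(data), max(data)
    r = hv - lv                      # range, R = HV - LV
    k = round(math.sqrt(n))          # number of classes, k = sqrt(N)
    c = max(1, round(r / k))         # class width, rounded to the data's precision
    rows, ll, less_cf = [], lv, 0
    while ll <= hv:                  # add classes until HV is covered
        ul = ll + c - 1              # UL = next LL minus one unit of measure
        f = sum(1 for x in data if ll <= x <= ul)   # tally
        less_cf += f
        rows.append({
            "LL": ll,
            "UL": ul,
            "f": f,
            "mark": (ll + ul) / 2,                  # class mark
            "rf": 100 * f / n,                      # relative frequency, %
            "less_cf": less_cf,                     # < CF
            "greater_cf": n - (less_cf - f),        # > CF
        })
        ll += c
    return rows

# Hypothetical data set of N = 16 observations
data = [12, 15, 17, 18, 20, 21, 22, 24, 25, 27, 28, 30, 31, 33, 34, 35]
fdt = build_fdt(data)
for row in fdt:
    print(row)
```

For this data set R = 23, k = 4 and c = 6, which gives the classes 12–17, 18–23, 24–29 and 30–35.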

II.2 ELEMENTS OF SAMPLING AND DESCRIPTIVE STATISTICS

Definition of Terms

Population – the totality of all possible values (measurements, counts, etc.) of a
particular characteristic for a specific group of objects
Example: all of the BSABE students who take classes at CLSU this school year.
Sample – a part of a population selected according to some rule or plan. It is desired
that the sample be representative of the population
Example: 10 BSABE students who attend CLSU as representative of all the BSABE
students who take classes at CLSU this school year
Parameter – a value computed from a population; a number describing some
property of a population
Example: average age of all BSABE students
Statistic – a value computed from a sample; a number describing some property of
a sample.
Example: the average age of the 10 BSABE students
Sampling – the process of selecting a part of the universe or the population
Advantages of Sampling
• Sampling just a few units saves money
• Sampling just a few units saves time
• Some measurements are destructive:
Example: cutting down trees to inspect ring patterns or stem or rooting
depth analysis; capturing wildlife to examine their morphology
Sampling Design – the set of rules or procedures employed in selecting the sample,
including the sampling scheme or the manner by which the samples are taken
and the sample size which is the number of sample units taken from the universe
or population.

Methods of Sampling

Non-probability Sampling – the elements of the universe or population have no
known chance of being taken in the sample
Probability Sampling – assigns a known probability of selection for all possible
samples; allows for the computation of sampling error, or the error in inference
inherent to the fact that what was observed was only a sample

Sampling Procedures

Probability Sampling Procedures


• Simple Random Sampling – the elements of the universe or population have
equal chance of being included in the sample; applicable when the universe is
believed to be homogeneous
• Stratified Random Sampling – the elements of the universe/population are first
grouped into strata and simple random samples are taken from each stratum;
applicable under the following situations: information is required for certain
subdivisions of the population; the population is extremely heterogeneous; the
problem of sampling may differ in different parts of the population
• Cluster Sampling – the elements are grouped into clusters, for example,
geographical location, and a simple random sample of clusters is selected and all
the elements of the selected clusters are included in the sample
• Systematic Sampling – adopts a skipping pattern in the selection of the sample
units; the only sampling scheme that allows sample selection without a
sampling frame
• Multi-stage Sampling – characterized by sampling being done in stages before
the ultimate sampling units are selected

Non-probability Sampling Procedures


• Purposive sampling
• Quota sampling
• Judgment sampling
• Accidental sampling

Some Descriptive Statistics

Measures of Central Tendency – values computed from the data around which
the observations tend to center or cluster

a. Arithmetic Mean – the arithmetic average of all the values.

The sample mean, for ungrouped data, is computed as:

$$\bar{X} = \frac{\sum X_i}{n}$$

where n = sample size

For grouped data:

$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$$

where fi = frequency of the ith class
Xi = class mark for the ith class

The population mean, μ, for ungrouped data is:

$$\mu = \frac{\sum X_i}{N}$$

where N = population size

Properties:
• The sum of the deviations from the arithmetic mean is zero
• The sum of the squares of the deviations from the arithmetic mean is less
than the sum of the squares of the deviations from any other value

Advantages:
• The most commonly used average
• Easy to compute
• Easily understood
• Lends itself to algebraic manipulation

Disadvantage:
• Unduly affected by extreme values and may therefore be far from
representative of the sample

Example:
The following are the ages of all seven employees of a small company:
53 32 61 57 39 44 57

Calculate the population mean.
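
As a quick check, the population mean for the seven ages above can be computed directly from μ = ∑ Xi / N. The short Python sketch below also illustrates the two properties of the mean listed earlier; the variable names and the comparison value 50 are arbitrary choices made for illustration.

```python
ages = [53, 32, 61, 57, 39, 44, 57]

N = len(ages)                 # population size
mu = sum(ages) / N            # population mean
print(mu)

# Property 1: the deviations from the mean sum to zero
deviations = [x - mu for x in ages]
print(sum(deviations))

# Property 2: squared deviations are smallest when taken about the mean
ss_about_mean = sum((x - mu) ** 2 for x in ages)
ss_about_50 = sum((x - 50) ** 2 for x in ages)   # 50 is an arbitrary other value
print(ss_about_mean, ss_about_50)
```

The sum of the ages is 343, so the mean is 343 / 7 = 49.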

Example:
The following frequency distribution represents the ages of 30 students in a
statistics class. Find the mean of the frequency distribution.

b. Weighted Mean, Xw – an average of n quantities by attaching more significance
(or weight) to some of the numbers than to others.

$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

where: Wi = assigned weight for the ith quantity

Example:
Grades in a statistics class are weighted as follows:
Tests are worth 50% of the grade, homework is worth 30% of the grade and
the final is worth 20% of the grade. A student receives a total of 80 points on
tests, 100 points on homework, and 85 points on his final. What is his current
grade?
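
One way to work this example is to apply the weighted mean formula with the percentages as weights. The Python sketch below assumes each score is out of 100 points; the variable names are chosen for illustration only.

```python
weights = [50, 30, 20]        # tests, homework, final (percent of the grade)
scores = [80, 100, 85]        # points earned in each component

weighted_sum = sum(w * x for w, x in zip(weights, scores))
grade = weighted_sum / sum(weights)
print(grade)
```

Here ∑ Wi Xi = 4000 + 3000 + 1700 = 8700 and ∑ Wi = 100, so the weighted mean is 87.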

c. Geometric Mean, G – the nth root of the product of n positive numbers; used
primarily to average data for which the ratio of consecutive terms remains
approximately constant, which occurs with such data as rates of change, ratios,
etc.

$$G = (X_1 X_2 \cdots X_n)^{1/n}$$

d. Harmonic Mean, H – for n numbers, the number n divided by the sum of the
reciprocals of the n numbers; most frequently used in averaging speeds for
various distances covered where the distances remain constant, also in finding
the average cost of common commodity when several different purchases are
made by investing the same amount of money each time.

H = n / ( ∑(1/Xi) )
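
A minimal Python sketch of the geometric and harmonic mean formulas follows. The two small data sets are hypothetical and the function names are invented for this illustration; both functions assume all values are positive.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    n = len(values)
    return math.prod(values) ** (1 / n)

def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the n numbers."""
    n = len(values)
    return n / sum(1 / x for x in values)

ratios = [2, 4, 8]       # hypothetical ratios with a constant ratio of consecutive terms
speeds = [40, 60]        # hypothetical speeds over two equal distances

print(geometric_mean(ratios))   # approximately 4
print(harmonic_mean(speeds))    # approximately 48
```

For the speeds, the harmonic mean (about 48) is the correct average speed over the two equal distances, whereas the arithmetic mean would give 50.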

Exercise:
Find the Geometric and Harmonic Mean.

e. Midrange, MR – the average of the highest and lowest value in the data set.

MR = (Xmin + Xmax)/2

Quick and easy to compute but is often inefficient because all the information
contained in the intermediate values has been ignored.

f. Mode – the value that occurs most frequently (absolute mode). There could be
several modes; a relative mode is a value that occurs more frequently than
neighboring values even if it is not an absolute mode

For grouped data, the mode Mo is:

$$Mo = L_{Mo} + \left(\frac{d_1}{d_1 + d_2}\right) w$$

where: LMo = lower limit of the modal class
d1 = the difference, sign neglected, between the frequency of the modal class
and the frequency of the preceding class
d2 = the difference, sign neglected, between the frequency of the modal class
and the frequency of the following class
w = width of the modal class
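
The grouped-data mode formula can be sketched as follows. The frequency table is hypothetical, the function name `grouped_mode` is invented, and two choices are assumptions of this sketch rather than part of the formula: a missing neighboring class is treated as having frequency 0, and the first class with the highest frequency is taken as the modal class.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: Mo = LMo + [d1 / (d1 + d2)] * w."""
    i = freqs.index(max(freqs))                      # modal class (first one if tied)
    prev_f = freqs[i - 1] if i > 0 else 0            # assumed 0 when no preceding class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = abs(freqs[i] - prev_f)
    d2 = abs(freqs[i] - next_f)
    return lower_limits[i] + d1 / (d1 + d2) * width

# Hypothetical classes 12-17, 18-23, 24-29, 30-35 with class width 6
lower_limits = [12, 18, 24, 30]
freqs = [3, 7, 4, 2]
mode = grouped_mode(lower_limits, freqs, 6)
print(mode)
```

With the modal class 18–23, d1 = 4 and d2 = 3, so Mo = 18 + (4/7)(6), or about 21.43.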

g. Median – Half of the observations should have a value less than the median
and half should have a value greater than the median

The median, Md, for ungrouped data is computed as:

$$Md = X_{(N+1)/2} \quad \text{if } N \text{ is odd}$$

$$Md = \tfrac{1}{2}\left(X_{N/2} + X_{(N/2)+1}\right) \quad \text{if } N \text{ is even}$$

For grouped data,

$$Md = L_{Md} + \left(\frac{(N+1)/2 - S}{f_{Md}}\right) w$$

where: LMd = lower limit of the median class
N = number of observations in the sample
S = sum of the frequencies in all classes preceding the median class
fMd = frequency of the median class
w = width of the median class
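
Both median formulas can be sketched in Python as below. The function names and the grouped frequency table are hypothetical. The sketch takes the median class to be the first class whose cumulative frequency reaches (N + 1)/2, which is an assumption about how the median class is located.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the odd/even position rules."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]               # position (N+1)/2, as a 0-based index
    return (x[n // 2 - 1] + x[n // 2]) / 2       # average of positions N/2 and N/2 + 1

def median_grouped(lower_limits, freqs, width):
    """Median of grouped data: Md = LMd + [((N+1)/2 - S) / fMd] * w."""
    n = sum(freqs)
    target = (n + 1) / 2
    s = 0                                        # frequencies preceding the current class
    for ll, f in zip(lower_limits, freqs):
        if s + f >= target:                      # this is the median class
            return ll + (target - s) / f * width
        s += f
    raise ValueError("empty frequency table")

print(median_ungrouped([53, 32, 61, 57, 39, 44, 57]))   # odd N
print(median_ungrouped([1, 2, 3, 4]))                   # even N

# Hypothetical classes 12-17, 18-23, 24-29, 30-35 with class width 6
print(median_grouped([12, 18, 24, 30], [3, 7, 4, 2], 6))
```

For the grouped table, N = 16, S = 3 and fMd = 7, so Md = 18 + (5.5/7)(6), or about 22.71.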

h. Percentile, Decile, and Quartile Limits
Percentiles – values dividing the array into 100 equal parts
Deciles – values dividing the array into 10 equal parts
Quartiles – values dividing the array into 4 equal parts

Example:
If a score is located at the 80th percentile (P80 ), it means that 80% of all the
scores fall at or below this score in the distribution and 20% of all the scores fall
above this value.

The jth Percentile is obtained as follows:


• Arrange the data in increasing order
• Compute for k
k = (j /100) N where N = total number of observations
• If k is a whole number, then the jth percentile is the average of the values in
the kth and (k + 1)th positions. Otherwise, it is the observation in the next
higher whole number position.

The three quartiles, Q1, Q2, and Q3, approximately divide an ordered data set
into four equal parts.

Example:
The quiz scores for 15 students are listed below. Find the first, second, and third
quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38
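
One way to work this example is to treat Q1, Q2 and Q3 as the 25th, 50th and 75th percentiles and apply the percentile rule given above. The Python sketch below does that; the function name `percentile` is invented here, and it assumes j is a whole number with 0 < j < 100. Other textbooks and software use different quartile conventions and may give slightly different values.

```python
def percentile(values, j):
    """jth percentile using k = (j/100) N and the whole-number rule."""
    x = sorted(values)                  # arrange the data in increasing order
    n = len(x)
    k, remainder = divmod(j * n, 100)   # k is the whole part of (j/100) N
    if remainder == 0:
        # k is a whole number: average the kth and (k+1)th values (1-based positions)
        return (x[k - 1] + x[k]) / 2
    # otherwise take the observation in the next higher whole-number position
    return x[k]

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
q1 = percentile(scores, 25)
q2 = percentile(scores, 50)
q3 = percentile(scores, 75)
print(q1, q2, q3)
```

With N = 15, k is 3.75, 7.5 and 11.25, so the quartiles are the 4th, 8th and 12th ordered scores: 37, 43 and 48.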

Measures of Dispersion – values used to describe the extent of dispersion or
variability of data

a. Range – the difference between the largest and the smallest measurement in
the data set
R = Xmax - Xmin

Example:
The following data are the closing prices for a certain stock on ten successive
Fridays. Find the range.

b. Mean Deviation, MD – the arithmetic mean of the absolute deviations from the
mean
$$MD = \frac{\sum |X_i - \bar{X}|}{n}$$

c. Variance
For population variance, σ²:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

For sample variance, S²:

$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

where: n – 1 = degrees of freedom, or the number of values that are free to vary
after certain restrictions have been placed upon the data

The computing formula is:

$$S^2 = \frac{n \sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}$$

For grouped data, the sample variance is obtained from:

$$S^2 = \frac{n \sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)}$$

d. Standard Deviation – the square root of the variance

Example:
The following data are the closing prices for a certain stock on five successive
Fridays. Find the population standard deviation.

e. Coefficient of Variation, CV – a measure of variation relative to the mean; an
ideal device for comparing the variation in two series of data that are measured
in two different units, e.g. a comparison of variation in height with variation in
weight

$$CV = \frac{S}{\bar{X}}$$

f. Quartile Deviation or Semi-Interquartile Range, Q – points out that 50 per cent
of the total distribution is comprised of variates lying between the first and
third quartiles

$$Q = \tfrac{1}{2}(Q_3 - Q_1)$$
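
The measures of dispersion above can be sketched in Python, reusing the seven employee ages from the arithmetic mean example. The variable names are illustrative. Both the population (divisor N) and sample (divisor n – 1) variances are shown so the two formulas can be compared, even though the original example describes the ages as a population.

```python
import math

ages = [53, 32, 61, 57, 39, 44, 57]
n = len(ages)
mean = sum(ages) / n

data_range = max(ages) - min(ages)                       # R = Xmax - Xmin
mean_dev = sum(abs(x - mean) for x in ages) / n          # mean deviation
sum_sq = sum((x - mean) ** 2 for x in ages)              # sum of squared deviations

pop_var = sum_sq / n                                     # population variance
sample_var = sum_sq / (n - 1)                            # sample variance (definition)
sample_var_cf = (n * sum(x * x for x in ages) - sum(ages) ** 2) / (n * (n - 1))

sample_sd = math.sqrt(sample_var)                        # standard deviation
cv = sample_sd / mean                                    # coefficient of variation

print(data_range, mean_dev, pop_var)
print(sample_var, sample_var_cf)                         # the two formulas agree
print(sample_sd, cv)
```

The definition and the computing formula give the same sample variance, 702 / 6 = 117.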

Measures of Skewness – values measuring the extent of departure of the
distribution from symmetry

a. Pearson’s First Coefficient of Skewness

$$\text{Skewness} = \frac{\bar{X} - Mo}{S}$$

b. Pearson’s Second Coefficient of Skewness, SK

$$SK = \frac{3(\bar{X} - Md)}{S}$$

If SK = 0, distribution of data is symmetric
SK < 0, distribution is negatively skewed, or the frequency curve of the
distribution has a longer tail to the left of the central maximum than to the right
SK > 0, distribution is positively skewed

c. Moment Coefficient of Skewness, a3

$$a_3 = \frac{m_3}{S^3} = \frac{\sum (X_i - \bar{X})^3 / n}{S^3}$$

Measures of Kurtosis – a value that measures the flatness or peakedness of the
distribution of data, usually taken relative to a normal curve
• Leptokurtic – a distribution having a relatively high peak
• Platykurtic – a flat-topped distribution
• Mesokurtic – a distribution that is neither very peaked nor very flat-topped,
like the normal distribution

a. Moment Coefficient of Kurtosis, a4

$$a_4 = \frac{m_4}{S^4} = \frac{\sum (X_i - \bar{X})^4 / n}{(S^2)^2}$$

If a4 = 3, distribution is normal

b. Kurtosis, K

K = ( a4 - 3) If K = 0, distribution is normal
K > 0, distribution is leptokurtic
K < 0, distribution is platykurtic
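
The moment coefficients of skewness and kurtosis can be sketched in Python as below. One assumption is made loudly here: S² is taken as the second central moment m2 (divisor n), a common convention for moment coefficients; using the sample variance with divisor n – 1 would give slightly different values. The function names and the small symmetric data set are hypothetical.

```python
def central_moment(values, r):
    """rth moment about the mean: m_r = sum((X - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

def moment_skewness(values):
    """a3 = m3 / S^3, with S^2 assumed equal to m2 (divisor n)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def moment_kurtosis(values):
    """a4 = m4 / (S^2)^2, with S^2 assumed equal to m2 (divisor n)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def describe_kurtosis(values):
    """Classify the distribution from K = a4 - 3."""
    k = moment_kurtosis(values) - 3
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"

symmetric = [1, 2, 3, 4, 5]                 # hypothetical symmetric data
ages = [53, 32, 61, 57, 39, 44, 57]         # ages from the arithmetic mean example

print(moment_skewness(symmetric), moment_kurtosis(symmetric))
print(describe_kurtosis(symmetric))
print(moment_skewness(ages))                # negative: longer tail to the left
```

The symmetric data set has a3 = 0 and a4 below 3, so it is classified as platykurtic under the rule K = a4 – 3.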

ABEN 3413: Engineering Data Analysis

III. Assessment
1. Conceptualize a research problem to collect data for the
identified variable(s), using:
a. objective method
b. subjective method
c. use of existing record
2. The following data represent the ages of 30 students in a
statistics class. Construct a frequency distribution that has five
classes.

3. Consider the following set of test scores for James and Rob in
Chemistry.

James 62 80 83 72 73
Rob 50 46 90 91 93

Calculate the standard deviation for each set of data and
interpret the results.

