Data Analysis
INTERPRETATION AND
RESEARCH REPORT WRITING
measurement of the data and can use statistics to help reveal results and
Benefits Of Data Analysis
Among the many benefits of data analysis, the more important ones are:
Data analysis helps in structuring the findings from different sources of data.
Data analysis is very helpful in breaking a macro problem into micro parts.
Scoring
Coding
Data Cleaning
STATISTICAL DATA ANALYSIS –
DESCRIPTIVE STATISTICS
Descriptive and Inferential Statistics
Statistics is a set of procedures for gathering, measuring, classifying,
computing, describing, synthesizing, analyzing, and interpreting
systematically acquired quantitative data.
For a single variable, there are three major characteristics that we tend to look at:
Distribution
Central Tendency
Dispersion
Distribution
The distribution is a summary of the frequency of individual values
or ranges of values for a variable. The simplest distribution would list
every value of a variable and the number of times each value occurs.
One of the most common ways to describe a single variable is with a
frequency distribution.
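As an illustration of the description above, here is a minimal Python sketch that builds both forms of a frequency distribution: one listing every value, and one grouping values into ranges. The ages, the class width of 2, and the helper name grouped_frequency are assumptions made up for this example; they do not come from any data set in these notes.

```python
from collections import Counter

# Hypothetical sample: ages of ten survey respondents (invented for illustration).
ages = [21, 22, 21, 23, 22, 21, 24, 22, 21, 23]

# Simplest form: every distinct value with the number of times it occurs.
frequency = Counter(ages)
for value in sorted(frequency):
    print(value, frequency[value])

def grouped_frequency(values, width):
    """Count integer values per class interval start..start + width - 1."""
    groups = Counter((v // width) * width for v in values)
    return {(start, start + width - 1): groups[start] for start in sorted(groups)}

# Grouped form: frequencies for ranges of values (class width of 2, an assumption).
grouped = grouped_frequency(ages, 2)
for (low, high), count in grouped.items():
    print(f"{low}-{high}: {count}")
```

The grouped form is usually preferred when a variable takes many distinct values, since listing each value separately would make the table too long to read.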
The mode is used when there is the need for a rough estimate of the measure of location.
It is used when there is the need to know the most frequently occurring value e.g., dress
styles.
It is not useful for further statistical work because the distribution can be bi-modal or
The Median
It is the score such that approximately one-half (50%) of the
scores are above it and one-half (50%) are below it when
the scores are arranged in order. In short, it is the
midpoint: the score found at the exact
middle of the set of values.
Features of the median
It is not influenced by extreme scores. For example, the median for the following numbers,
It does not use all the scores in a distribution but uses only one value.
It can be used when there is incomplete data at the beginning or end of the distribution.
It is mostly appropriate for data from interval and ratio scales.
Where there are very few observations, the median is not representative of the data.
Where the data set is large, it is tedious to arrange the data in an array for ungrouped data
Uses of the median
1. It is used as the most appropriate measure of location when there is reason to believe that the
distribution is skewed.
2. It is used as the most appropriate measure of location when extreme scores would distort
the mean, e.g., the typical income in a company of senior and junior staff.
3. It is useful when the exact midpoint of the distribution is wanted.
4. It provides a standard of performance for comparison with individual scores when the score
distribution is skewed. For example, if the median score is 60 and an individual student
obtains 55, performance can be said to be below average/median. Also performance can be
described as just above average or far below average or just below average.
5. It can be compared with the mean to determine the direction of skew in student performance.
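The uses listed above can be sketched in a few lines of Python. The income figures and the helper skew_direction are hypothetical, invented for this illustration; the mean-versus-median comparison is only a rule of thumb for the direction of skew, not a formal test.

```python
import statistics

# Hypothetical monthly incomes: six junior staff and one senior manager.
incomes = [1200, 1300, 1250, 1400, 1350, 1280, 9500]

mean_income = statistics.mean(incomes)      # pulled upward by the 9500
median_income = statistics.median(incomes)  # middle value of the sorted list

def skew_direction(mean, median):
    """Rule of thumb only: compare the mean with the median."""
    if mean > median:
        return "positive"   # long tail of high values
    if mean < median:
        return "negative"   # long tail of low values
    return "none indicated"

print(f"mean = {mean_income:.2f}, median = {median_income}")
print("skew:", skew_direction(mean_income, median_income))
```

The single extreme income moves the mean well above what any junior staff member earns, while the median stays at a value that a typical employee actually receives.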
The Mean
It is the sum of a set of observations divided by the total
number of observations.
To compute the mean, all the values are added up and divided
by the number of values. If the distribution is truly normal
(i.e., bell-shaped), the mean, median and mode are all equal to
each other.
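A short sketch of the computation described above, using a small set of hypothetical scores chosen to be perfectly symmetric. The data are an assumption for illustration; real data are rarely exactly symmetric, so the three measures will usually only be approximately equal.

```python
import statistics

# Hypothetical scores, deliberately symmetric around 60.
scores = [40, 50, 50, 60, 60, 60, 70, 70, 80]

# Mean: add up all the values and divide by the number of values.
mean = sum(scores) / len(scores)

median = statistics.median(scores)  # middle value of the ordered scores
mode = statistics.mode(scores)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```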
Uses of the mean
It is useful when the actual magnitude of the scores is needed to get
an average. E.g., total sales for a new product, selecting a student to
represent a whole class in a competition.
Although measures of central tendency are useful statistics for summarizing the scores in a distribution, they are not sufficient.
Averages are representative of a frequency distribution, but they fail to give a complete picture of
the distribution. They tell us nothing about the scatter of observations within the
distribution.
the standard deviation. Both variance and standard deviation are computed for both
ungrouped and grouped data. Microsoft Excel is also useful in obtaining the variance and
standard deviations.
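Besides a spreadsheet, the same quantities can be obtained in Python. The sketch below handles ungrouped data only and uses hypothetical scores. It shows both common conventions, dividing by n (population) and by n - 1 (sample), because which one applies depends on whether the scores are the whole population or a sample from it.

```python
import math
import statistics

# Hypothetical ungrouped scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Sum of the squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

population_variance = sum_sq_dev / n    # divide by n
sample_variance = sum_sq_dev / (n - 1)  # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# The standard library gives the same results.
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(sample_sd, statistics.stdev(scores))

print("population variance:", population_variance, "SD:", population_sd)
print("sample variance:", round(sample_variance, 3), "SD:", round(sample_sd, 3))
```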
The Standard Deviation (SD) is a more accurate and detailed estimate of dispersion than the range,
because an outlier can greatly exaggerate the range. The Standard Deviation shows the
relation that a set of scores has to the mean of the sample. The standard deviation is the
square root of the sum of the squared deviations from the mean divided by the number of
2. It helps to find out the variation in achievement among a group of students (i.e.,
However, if their means are widely different or if they are expressed in different
units of measurement, we cannot use the standard deviations as such for
comparing their variability. We have to use relative measures of dispersion in
such situations. There are relative measures of dispersion based on the range, the quartile
deviation, the mean deviation, and the standard deviation. Of these, the coefficient
of variation (∂), which is based on the standard deviation, is important.
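The coefficient of variation is commonly computed as the standard deviation divided by the mean, expressed as a percentage. The sketch below applies that definition to two hypothetical variables measured in different units. The data, the function name, and the choice of the population standard deviation are assumptions for illustration.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population SD assumed)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_height:.1f}%")
print(f"CV of weights: {cv_weight:.1f}%")
```

Because the coefficient of variation has no unit, the two percentages can be compared directly, which is not meaningful for standard deviations expressed in centimetres and kilograms.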
Measurement Scales
Nominal variables
Ordinal variables
Interval variables
Ratio variables
Quartile Deviation (QD)
It is also called the semi-interquartile range and it depends on quartiles.
Quartiles divide distributions into four equal parts. Practically, there are three
quartiles.
The QD is half the distance between the first quartile (Q1) and the third quartile
(Q3).
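A minimal sketch of the definition above, using hypothetical scores. It relies on statistics.quantiles from the Python standard library (Python 3.8 or later) with its default method. Textbooks use several slightly different rules for locating quartiles, so hand calculations may differ a little from this output on other data sets.

```python
import statistics

# Hypothetical scores, already sorted for readability.
scores = [35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# n=4 returns the three cut points that divide the data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Quartile deviation: half the distance between Q1 and Q3.
quartile_deviation = (q3 - q1) / 2

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
print("QD:", quartile_deviation)
```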
Features of the Quartile Deviation
For skewed distributions, where the median is used as a measure of location the
b. Set the level of significance or alpha level for rejecting the null hypothesis
c. Collect data
An independent samples t-test is used to compare the means of a normally distributed interval dependent variable for two independent
groups.
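A comparison of two independent group means is commonly carried out with an independent-samples t-test. The sketch below computes the pooled-variance t statistic and its degrees of freedom by hand, using only the standard library. The two groups of scores and the function name independent_t are hypothetical, and equal variances in the two groups are assumed. Converting the statistic to a p-value requires a t table or a statistics library (for example scipy.stats.ttest_ind), which is not done here.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)

    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical test scores for two independent groups of students.
group_a = [78, 82, 85, 88, 92]
group_b = [70, 74, 77, 80, 84]

t_statistic, degrees_of_freedom = independent_t(group_a, group_b)
print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom}")
```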
attributed to chance. More technically, it means that if the Null Hypothesis is true (which
means there really is no difference), there's a low probability of getting a result that large or
larger.
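The idea of "a result that large or larger, if the null hypothesis is true" can be made concrete with an exact permutation test. This is a different technique from the t-test, swapped in here because it needs no probability tables. If group membership really makes no difference, every way of splitting the pooled scores into two groups is equally likely, so the p-value is the share of splits whose mean difference is at least as large as the observed one. The scores and the function name are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))

    extreme = 0
    total = 0
    # Enumerate every possible way to pick which scores form the first group.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mean(first) - mean(second)) >= observed:
            extreme += 1
    return extreme / total

# Hypothetical scores for two small independent groups.
group_a = [85, 88, 90, 92]
group_b = [70, 75, 78, 80]

p_value = permutation_p_value(group_a, group_b)
alpha = 0.05  # significance level, set before looking at the data
print(f"p = {p_value:.4f}")
print("reject null hypothesis" if p_value <= alpha else "fail to reject null hypothesis")
```

Full enumeration is only practical for small groups, since the number of possible splits grows very quickly with sample size.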
Comparing Two Groups
If the p-Value is Greater than .05
That's our p-value! When a p-value is less than or equal to the significance