Data Analysis

The document discusses data analysis, interpretation, and research report writing, emphasizing the importance of both qualitative and quantitative methods. It outlines the benefits of data analysis, various statistical techniques including descriptive and inferential statistics, and the significance of measures such as central tendency and dispersion. Additionally, it covers hypothesis testing, types of statistical tests, and common misconceptions in data analysis.

DATA ANALYSIS,

INTERPRETATION AND
RESEARCH REPORT WRITING

DR. CHARLES OMANE-ADJEKUM


DEPARTMENT OF ACCOUNTING
PREPARATION AND
ORGANIZATION OF DATA
Data analysis is a process used to inspect, clean, transform and remodel data with a view to reaching a conclusion for a given situation. Data analysis is typically of two kinds: qualitative or quantitative. The type of data dictates the method of analysis.

In qualitative research, non-numerical data such as text or individual words are analysed. Quantitative analysis, on the other hand, focuses on measurement of the data and can use statistics to help reveal results and support conclusions.
Benefits Of Data Analysis
Among the many benefits of data analysis, the more important ones are:
 Data analysis helps in structuring the findings from different sources of data.
 Data analysis is very helpful in breaking a macro problem into micro parts.
 Data analysis acts like a filter when it comes to acquiring meaningful insights out of a huge data set.
 Data analysis helps in keeping human bias away from the research conclusion with the help of proper statistical treatment.


Preparation of Data
 Editing
 Scoring
 Coding
 Data Cleaning
STATISTICAL DATA ANALYSIS –
DESCRIPTIVE STATISTICS
Descriptive and Inferential Statistics
 Statistics is a set of procedures for gathering, measuring, classifying,
computing, describing, synthesizing, analyzing, and interpreting
systematically acquired quantitative data.

 Statistics has two major components: Descriptive Statistics and Inferential Statistics.

 Descriptive Statistics give numerical and graphic procedures to summarize a collection of data in a clear and understandable way, whereas Inferential Statistics provide procedures to draw inferences about a population from a sample.
Variable
There are three major characteristics of a single variable that we tend to

look at:
 Distribution

 Central Tendency

 Dispersion
Distribution
 The distribution is a summary of the frequency of individual values
or ranges of values for a variable. The simplest distribution would list
every value of a variable and the number of times each value occurs.
One of the most common ways to describe a single variable is with a
frequency distribution.

 Frequency distributions can be depicted in two ways, as a table or as


a graph. Distributions may also be displayed using percentages.
A frequency distribution organizes the raw data or observations that have been collected, either as ungrouped data or as grouped data.
Shape of the Distribution
 Simple descriptive statistics can provide some information relevant to this issue. For example, if the skewness (which measures the deviation of the distribution from symmetry) is clearly different from 0, then that distribution is asymmetrical, while normal distributions are perfectly symmetrical.

 If the kurtosis (which measures the peakedness of the distribution) is clearly different from 0, then the distribution is either flatter or more peaked than normal; the excess kurtosis of the normal distribution is 0 (this is the figure most statistical packages report).
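The two shape measures above can be sketched by hand from the central moments. This is a minimal illustration using the common moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 − 3); the sample data are hypothetical.

```python
# Skewness and excess kurtosis from central moments.
# Both are 0 for a perfectly normal distribution.

def central_moment(data, k):
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

scores = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric sample
skew = skewness(scores)          # 0.0 -> symmetric distribution
kurt = excess_kurtosis(scores)   # negative -> flatter than normal
```

A clearly non-zero `skew` flags asymmetry, and a clearly non-zero `kurt` flags a flatter or more peaked shape than normal, exactly as described above.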
Central Tendency
 These measures are also called Averages. They provide single
values which are used to summarise a set of
observations/data. The central tendency of a distribution is an
estimate of the "centre" of a distribution of values. They are
measures or statistics that describe the location of the centre
of a distribution.
 A distribution as we have described earlier, consists of scores
and other numerical values such as number of years of
teaching, age, income, score in a test and the frequency of
their occurrence.
The Mode
 The Mode is the most frequently occurring value in a set of
scores.

 Thus, the mode is the most frequent score in a distribution – that is, the score obtained by more students than any other score.
Features of the mode
 The main advantage is that it is the only measure that is useful for nominal scale data.
 It is used when there is the need for a rough estimate of the measure of location.
 It is used when there is the need to know the most frequently occurring value, e.g., dress styles.
 It is not useful for further statistical work because the distribution can be bi-modal or trimodal or have no mode at all.


The Median
It is a score such that approximately one-half (50%) of the
scores are above it and one-half (50%) are below it when
the scores are arranged sequentially – in short, the
midpoint. The Median is the score found at the exact
middle of the set of values
Features of the median
 It is not influenced by extreme scores. For example, the median for the numbers 2, 3, 4, 5, 6 is 4. If 6 changes to 23 as an extreme score, the median remains 4.
 It does not use all the scores in a distribution but uses only one value.
 It has limited use for further statistical work.
 It can be used when there is incomplete data at the beginning or end of the distribution.
 It is mostly appropriate for data from interval and ratio scales.
 Where there are very few observations, the median is not representative of the data.
 Where the data set is large, it is tedious to arrange the data in an array for ungrouped data.
Uses of the median
1. It is used as the most appropriate measure of location when there is reason to believe that the
distribution is skewed.
2. It is used as the most appropriate measure of location when there are extreme scores to affect
the mean. E.g., Typical income in a company of senior and junior staff.
3. It is useful when the exact midpoint of the distribution is wanted.
4. It provides a standard of performance for comparison with individual scores when the score
distribution is skewed. For example, if the median score is 60 and an individual student
obtains 55, performance can be said to be below average/median. Also performance can be
described as just above average or far below average or just below average.
5. It can be compared with the mean to determine the direction of student performance.
The Mean
 It is the sum of a set of observations divided by the total
number of observations.

 The Mean or average is probably the most commonly used


method of describing central tendency.

 To compute the mean, all the values are added up and divided
by the number of values. If the distribution is truly normal
(i.e., bell-shaped), the mean, median and mode are all equal to
each other
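The three measures of central tendency can be computed directly with Python's standard-library statistics module; the test scores here are hypothetical.

```python
# Mode, median and mean of one set of test scores.
import statistics

scores = [55, 60, 60, 65, 70, 80, 90]   # hypothetical test scores

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score of the sorted list
mean = statistics.mean(scores)      # sum of scores / number of scores
```

Here the mode (60), median (65) and mean (about 68.6) differ, which is typical of a skewed distribution; in a truly normal distribution all three would coincide.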
Uses of the mean
1. It is useful when the actual magnitude of the scores is needed to get an average, e.g., total sales for a new product, or selecting a student to represent a whole class in a competition.
2. It is useful for further statistical work (e.g., the standard deviation).
3. It is useful when the scores are symmetrically distributed (i.e., normal).
Uses of the mean (CON’T)
4. It provides a direction of performance, compared with other measures of location,
especially the median. Where Mean > Median, the distribution is skewed to the
right (positive skewness) showing that performance tends to be low and where
Mean < Median, the distribution is skewed to the left (negative skewness) showing
that performance tends to be high.

5. It serves as a standard of performance with which individual scores are


compared. For example, for normally distributed scores, where the mean is 56, an
individual score of 80 can be said to be far above average. Also, performance can
be described as just above average or far below average or just below average.
Dispersion
These are also called measures of variation, dispersion or scatter. While the measures of central tendency are useful statistics for summarizing the scores in a distribution, they are not sufficient. Averages are representatives of a frequency distribution, but they fail to give a complete picture of the distribution: they do not tell anything about the scatter of observations within the distribution.

The main measures used are:
1. The Range
2. The Variance
3. The Standard Deviation
4. The Quartile Deviation (semi-interquartile range)
Variance & Standard Deviation
The variance is always considered together with the standard deviation: it is the square of the standard deviation. Both variance and standard deviation are computed for both ungrouped and grouped data. Microsoft Excel is also useful for obtaining the variance and standard deviation.

The Standard Deviation (SD) is a more accurate and detailed estimate of dispersion than the range, because an outlier can greatly exaggerate the range. The standard deviation shows the relation that a set of scores has to the mean of the sample. The standard deviation is the square root of the sum of the squared deviations from the mean divided by the number of scores minus one.
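The verbal definition above (squared deviations from the mean, divided by n − 1, then square-rooted) can be checked by hand against the standard library, which uses the same n − 1 divisor; the scores are hypothetical.

```python
# Sample variance and standard deviation, computed from the definition
# and verified against statistics.variance / statistics.stdev.
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean, divided by (n - 1).
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = math.sqrt(variance)

# The hand computation agrees with the standard library:
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))
```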


Uses of the Standard Deviation
1. It is used as the most appropriate measure of variation/dispersion when there is reason to believe that the distribution is normal.
2. It helps to find out the variation in achievement among a group of students (i.e., it determines if a group is homogeneous or heterogeneous).
Measure of Relative Dispersion
Suppose that the two distributions to be compared are expressed in the same units
and their means are equal or nearly equal, then their variability can be compared
directly by using their standard deviations.

However, if their means are widely different, or if they are expressed in different units of measurement, we cannot use the standard deviations as such for comparing their variability. We have to use relative measures of dispersion in such situations. There are relative dispersions in relation to the range, the quartile deviation, the mean deviation, and the standard deviation. Of these, the coefficient of variation (CV), which is related to the standard deviation, is the important one.
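A minimal sketch of the coefficient of variation, CV = standard deviation / mean (often multiplied by 100 to give a percentage). Because it is unit-free, it lets us compare the variability of distributions measured in different units; the heights and weights below are hypothetical.

```python
# Coefficient of variation: relative dispersion in percent.
import statistics

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data in centimetres
weights_kg = [55, 60, 70, 80, 95]        # hypothetical data in kilograms

def coefficient_of_variation(data):
    # SD expressed as a percentage of the mean (unit-free).
    return statistics.stdev(data) / statistics.mean(data) * 100

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
```

Even though centimetres and kilograms cannot be compared directly, the CVs can: here the weights vary far more relative to their mean than the heights do.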
Measurement Scales
 Nominal variables
 Ordinal variables
 Interval variables
 Ratio variables
Quartile Deviation (QD)
It is also called the semi-interquartile range, and it depends on quartiles. Quartiles divide a distribution into four equal parts; practically, there are three quartiles. The QD is half the distance between the first quartile (Q1) and the third quartile (Q3).
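The definition QD = (Q3 − Q1) / 2 can be sketched with the standard library. Note that `statistics.quantiles` supports several interpolation methods which give slightly different quartiles for small samples; `method="inclusive"` is used here, and the scores are hypothetical.

```python
# Quartile deviation (semi-interquartile range).
import statistics

scores = [1, 2, 3, 4, 5, 6, 7]   # hypothetical scores

# n=4 splits the distribution into quarters, giving the three quartiles.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
qd = (q3 - q1) / 2   # half the distance between Q1 and Q3
```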
Features of the Quartile Deviation
 For skewed distributions, where the median is used as a measure of location, the quartile deviation is a better measure of variability.
 The quartile deviation is a measure of individual differences. It helps to find out the variation in achievement among a group of students (i.e., it determines if a group is homogeneous or heterogeneous).
STATISTICAL DATA ANALYSIS:
HYPOTHESIS TESTING
Inferential Statistics
Inferential statistics are divided into two types: parametric and
non-parametric.
‘Parametric’ is derived from the word parameter. A parameter
describes some aspect of a set of scores for a population. For
example, the mean of a set of scores for a population would be a
parameter, whereas the mean of a set of scores for a sample would
be a statistic. Parametric statistics are, therefore, statistical tests based on the premise that the population from which samples are obtained follows a normal distribution, and that the parameters of interest to the researcher are the population mean (μ) and standard deviation (σ).
Inferential Statistics
Parametric statistics make certain assumptions about the observations/scores. These assumptions are:
a. The variables are measured on interval scales
b. Scores from any two individuals in a study are independent of each other
c. The variables are normally distributed in the population
d. The variables are similarly distributed among each population, in the case of two or more groups
Parametric statistics
 Examples of parametric statistics include the t-test and
Pearson Correlation Coefficient.

 There are other, more advanced parametric procedures such as analysis of variance (ANOVA) and analysis of covariance (ANCOVA).
Nonparametric tests
 Non-parametric statistics, on the other hand, are statistical tests that
only make the assumption of independent observations of scores for
each individual in the study.

Nonparametric tests are also called distribution-free tests because they


don't assume that your data follow a specific distribution. You should
use nonparametric tests when your data don't meet the assumptions of
the parametric test, especially the assumption about normally
distributed data.
Example Nonparametric tests
Pearson's correlation coefficient is parametric, whereas Spearman's rho is its non-parametric counterpart.
 In nonparametric tests, the data are typically measured in categorical scores on either the independent or dependent variable.
The most frequently used non-parametric test in research is chi-square (χ²). It is used to test a hypothesis concerned with categories within groups for comparison.
 In the next section of this session, we will explain the procedure of testing a hypothesis using chi-square.
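As a preview, the chi-square statistic itself is simple to compute by hand: χ² = Σ (O − E)² / E, summed over the categories, where O is the observed count and E the count expected under the null hypothesis. The counts below are hypothetical.

```python
# Chi-square goodness-of-fit statistic for categorical counts.
observed = [50, 30, 20]   # hypothetical observed counts in three categories
expected = [40, 40, 20]   # hypothetical counts expected under the null hypothesis

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Compare chi_square against the critical value for (k - 1) degrees of freedom,
# where k is the number of categories.
```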
Conducting Hypothesis Testing
Generally, there are six steps in hypothesis testing (Creswell, 2002):
a. Establish the null and alternative hypotheses
b. Set the level of significance, or alpha level, for rejecting the null hypothesis
c. Collect data
d. Compute the sample statistic (usually using a computer programme)
e. Make a decision about rejecting or failing to reject the null hypothesis
f. Determine the degree of difference if a statistically significant difference is found
Types of Statistical Tests and their uses
TYPE -- USE
Paired t-test -- Tests for the difference between the means of two related measures
Independent t-test -- Tests for the difference between the means of two independent groups
ANOVA -- Tests for the difference between the means of three or more independent groups

Steps of Hypothesis Testing
Step 1: State the Null Hypothesis.
Step 2: State the Alternative Hypothesis.
Step 3: Set α.
Step 4: Collect Data.
Step 5: Calculate a test statistic.
Step 6: Construct Acceptance / Rejection regions.
Step 7: Based on steps 5 and 6, draw a conclusion about H0.
Comparing Two Groups
Two independent samples t-test. An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups.

What it means if the t-test is Statistically Significant

In principle, a statistically significant result (usually a difference) is a result that's not attributed to chance. More technically, it means that if the Null Hypothesis is true (which means there really is no difference), there's a low probability of getting a result that large or larger.
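The pooled independent-samples t statistic can be sketched by hand with the standard library (a package such as scipy.stats offers the same test ready-made). The two groups and their scores are hypothetical, and the pooled formula assumes equal population variances.

```python
# Pooled independent-samples t statistic.
import math
import statistics

group_a = [5, 6, 7, 8, 9]   # hypothetical scores, group A
group_b = [1, 2, 3, 4, 5]   # hypothetical scores, group B

n1, n2 = len(group_a), len(group_b)
mean1, mean2 = statistics.mean(group_a), statistics.mean(group_b)

# Pooled variance: weighted average of the two sample variances,
# assuming the two populations have equal variances.
pooled_var = ((n1 - 1) * statistics.variance(group_a)
              + (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2)

t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
# Compare |t| against the critical t value for df degrees of freedom
# at the chosen alpha level.
```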
Comparing Two Groups
If the p-Value is Less than .05

In the majority of analyses, an alpha of .05 is used as the cutoff for significance. If the p-value is less than .05, we reject the null hypothesis that there's no difference between the means and conclude that a significant difference does exist.

Definition of p-value
In statistical science, the p-value is the probability of obtaining a result at least as extreme as the one that was actually observed in an experiment or a study, given that the null hypothesis is true.

This probability represents the likelihood of obtaining a sample mean that is at least as extreme as our sample mean, in both tails of the distribution. That's our p-value! When a p-value is less than or equal to the significance level, you reject the null hypothesis.
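The definition above can be made concrete for a two-tailed z test, where the p-value is the area in both tails of the standard normal distribution beyond the observed statistic. This is a sketch using only the standard library; the observed z value is hypothetical.

```python
# Two-tailed p-value under the standard normal distribution.
import math

def normal_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p(z):
    # Probability of a result at least as extreme as z, in both tails.
    return 2 * (1 - normal_cdf(abs(z)))

p = two_tailed_p(1.96)    # 1.96 is the classic cutoff: p is almost exactly 0.05
reject = p <= 0.05        # decision rule at the .05 significance level
```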


COMMON MYTHS IN DATA ANALYSIS
Myth: Complex analysis and big words impress people.
Reality: Most people appreciate practical and understandable analyses.
Myth: Analysis comes at the end, after all the data are collected.
Reality: Think about analysis upfront so that you can collect all the data you need to analyze.
Myth: Quantitative analysis is the most accurate type of data analysis.
Reality: Some think numbers are more accurate than words, but it is the quality of the analysis process that matters.
Myth: Data have their own meaning.
Reality: Data must be interpreted; numbers do not speak for themselves.
Myth: Stating limitations to the analysis weakens the evaluation.
Reality: All analyses have weaknesses; it is more honest and responsible to acknowledge them.
Myth: Computer analysis is always easier and better.
Reality: It depends upon the size of the data set and personal competencies. For small sets of information, hand tabulation may be more efficient.
