
Unit-5

The document discusses data analysis, editing, coding, and various methods of data presentation, including tables, frequency distribution, and graphical representations like bar charts and pie charts. It also covers hypothesis formulation, testing, and the analysis of variance (ANOVA) as a statistical technique for comparing datasets. Key concepts include null and alternative hypotheses, types of errors in hypothesis testing, and the differences between one-way and two-way ANOVA.

Uploaded by

rituku31
Copyright © All Rights Reserved

Data analysis is a process of inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business, data analysis plays a role in making decisions more scientific and in helping businesses operate effectively.
EDITING

Editing is the process of checking and adjusting responses in completed questionnaires for omissions, legibility, and consistency, and readying them for coding and storage.

Purpose of Editing

The purposes of editing are:

1. To ensure consistency between and among responses.

2. To ensure completeness in responses, reducing the effects of item non-response.

3. To make better use of questions answered out of order.

4. To facilitate the coding process.
Basic Principles of Editing

1. Checking the number of schedules/questionnaires

2. Completeness (all questions have been filled in)

3. Legibility

4. Avoiding inconsistencies in answers

5. Maintaining a degree of uniformity

6. Eliminating irrelevant responses
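Several of these principles (number of questionnaires, completeness, consistency, irrelevant answers) can be checked mechanically once completed questionnaires are keyed in; legibility still needs a human reader. The following is a minimal Python sketch, not a standard procedure: it assumes each questionnaire is a dict of numeric answers, and the question names, valid ranges, and the consistency rule (work experience cannot exceed age) are hypothetical examples.

```python
# Hypothetical questionnaire layout: question id -> (lowest, highest) valid answer
VALID_RANGES = {"age": (18, 99), "experience_years": (0, 80), "satisfaction": (1, 5)}


def edit_questionnaire(response):
    """List the editing problems found in one completed questionnaire (a dict)."""
    problems = []
    for question, (low, high) in VALID_RANGES.items():
        value = response.get(question)
        if value is None:
            # Completeness: the question was left unanswered (item non-response)
            problems.append(f"{question}: missing")
        elif not low <= value <= high:
            # Irrelevant or impossible answer outside the valid range
            problems.append(f"{question}: out of range ({value})")
    # Consistency: years of work experience cannot exceed the respondent's age
    age, experience = response.get("age"), response.get("experience_years")
    if age is not None and experience is not None and experience > age:
        problems.append("experience_years exceeds age")
    return problems


def edit_batch(responses, expected_count):
    """Check the number of questionnaires received, then edit each one."""
    report = {}
    if len(responses) != expected_count:
        report["batch"] = [f"expected {expected_count} questionnaires, got {len(responses)}"]
    for number, response in enumerate(responses, start=1):
        problems = edit_questionnaire(response)
        if problems:
            report[number] = problems
    return report
```

The returned report maps each questionnaire number to its problems, so an editor can go back to exactly those schedules rather than re-reading every form.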


CODING

Coding is the process of identifying and classifying each answer with a numerical score or other character symbol. The numerical score or symbol is called a code, and it serves as a rule for interpreting, classifying, and recording data. Identifying responses with codes is necessary if the data are to be processed by computer.

Coded data is often stored electronically in the form of a data matrix – a rectangular arrangement of the data
into rows (representing cases) and columns (representing variables).
The data matrix is organized into fields, records, and files:

Field: A collection of characters that represents a single type of data.

Record: A collection of related fields, i.e., fields related to the same case (or respondent).

File: A collection of related records, i.e., records related to the same sample.
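A minimal Python sketch of coding, assuming a hypothetical codebook with two variables and the code 9 reserved for missing or uncodable answers (a common convention, but an assumption here). Each respondent becomes a record (one row), each coded answer is a field (one column), and the list of records plays the role of the file, i.e. the data matrix.

```python
# Hypothetical codebook: for each variable (field), map each answer to a numeric code
CODEBOOK = {
    "gender": {"Male": 1, "Female": 2, "Other": 3},
    "satisfaction": {"Very dissatisfied": 1, "Dissatisfied": 2, "Neutral": 3,
                     "Satisfied": 4, "Very satisfied": 5},
}
MISSING = 9  # assumed code for a missing or uncodable answer


def code_response(response):
    """Turn one respondent's answers into a record: one coded field per variable."""
    return [CODEBOOK[field].get(response.get(field), MISSING) for field in CODEBOOK]


def build_data_matrix(responses):
    """Rows are cases (respondents); columns are variables, in codebook order."""
    return [code_response(r) for r in responses]
```

For example, a female respondent who is "Satisfied" is coded as the record `[2, 4]`, and a male respondent who skipped the satisfaction question as `[1, 9]`.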
Tabular Representation of Data

Presentation of data is of utmost importance nowadays; after all, whatever is pleasing to the eye readily grabs our attention. Presentation of data refers to exhibiting data in an attractive and useful manner so that it can be easily interpreted.

Tabular Representation

A table facilitates the representation of even large amounts of data in an attractive, easy-to-read, and organized manner. The data are organized in rows and columns. This is one of the most widely used forms of data presentation, since data tables are easy to construct and read.
Components of Data Tables

• Table Number: Each table should have a specific table number for ease of access and location. The number can be cited anywhere as a reference that leads the reader directly to the data in that particular table.

• Title: A table must contain a title that clearly tells readers about the data it contains, the time period of study, the place of study, and the nature of the classification of data.

• Headnotes: A headnote supplements the title and displays more information about the table. Generally, headnotes present the units of data in brackets at the end of the table title.

• Stubs: These are the titles of the rows in a table. Thus a stub displays information about the data contained in a particular row.

• Caption: A caption is the title of a column in the data table. It is the counterpart of a stub and indicates the information contained in a column.

• Body or field: The body of a table is the content of the table in its entirety. Each item in the body is known as a ‘cell’.

• Footnotes: Footnotes are rarely used. When required, they supplement the title of a table.

• Source: When data are obtained from a secondary source, the source must be mentioned below the footnote.
Frequency Table

A frequency distribution table is a way to organize data: it is an organized tabulation of the number of individual events located in each category. It contains at least two columns, one for the score categories (X) and another for the frequencies (f). The following solved example illustrates the concept.
Solved Example
Question: The table below lists the marks obtained by students in an examination. Find the number of students who scored (a) more than 85 marks, (b) more than 95 marks, and (c) between 76 and 80 marks.

Score (X)     Frequency (f)
Below 75      4
76 – 80       14
81 – 85       2
86 – 90       8
91 – 95       5
96 – 100      1
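The example does not show its working. Assuming whole-number marks (so that "more than 85" covers the classes from 86 upward) and reading part (c) as the 76 – 80 class, each answer is a sum of frequencies from the table. A minimal Python sketch:

```python
# Frequency table from the solved example: (class label, lower limit, frequency).
# "Below 75" is open-ended, so its lower limit is recorded as None.
TABLE = [("Below 75", None, 4), ("76-80", 76, 14), ("81-85", 81, 2),
         ("86-90", 86, 8), ("91-95", 91, 5), ("96-100", 96, 1)]


def students_above(table, mark):
    """Total frequency of classes whose lower limit is greater than `mark`."""
    return sum(f for _, low, f in table if low is not None and low > mark)


def students_in(table, label):
    """Frequency of a single class, looked up by its label."""
    return sum(f for name, _, f in table if name == label)


print(students_above(TABLE, 85))    # more than 85 marks: 8 + 5 + 1 → 14
print(students_above(TABLE, 95))    # more than 95 marks → 1
print(students_in(TABLE, "76-80"))  # between 76 and 80 marks → 14
```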
Construction of Frequency Distribution

(1) Find the range of the data: The range is the difference between the largest and the smallest values.

(2) Decide the approximate number of classes into which the data are to be grouped. There are no hard and fast rules for the number of classes; in most cases we have 5 to 20 classes. H.A. Sturges provides a formula for approximating the number of classes: $k = 1 + 3.322\log_{10} N$, where $N$ is the total number of observations.

(3) Determine the approximate class interval size: The size of the class interval, denoted by $h$, is obtained by dividing the range of the data by the number of classes: $h = \text{Range}/k$.

In the case of fractional results, the next higher whole number is taken as the size of the class interval.

(4) Decide the starting point: The lower class limit or class boundary should cover the smallest value in the raw data. It is usually taken as a multiple of the class interval.
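Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration, assuming whole-number raw data, Sturges' $k$ rounded to the nearest whole number (some texts round up instead), and inclusive class limits in the style of the table above; the sample scores in the usage note are hypothetical.

```python
import math
from collections import Counter


def frequency_distribution(data):
    """Group whole-number scores into equal-width classes, following steps (1)-(4)."""
    # (1) Range: largest value minus smallest value
    data_range = max(data) - min(data)
    # (2) Approximate number of classes by Sturges' formula, k = 1 + 3.322 log10(N)
    k = round(1 + 3.322 * math.log10(len(data)))
    # (3) Class interval h = range / k, taking the next higher whole number if fractional
    h = max(1, math.ceil(data_range / k))
    # (4) Starting point: the largest multiple of h that does not exceed the smallest value
    start = (min(data) // h) * h
    # Place each value in the class whose index is (value - start) // h
    counts = Counter((x - start) // h for x in data)
    # Return (lower limit, upper limit, frequency) with inclusive whole-number limits
    return [(start + i * h, start + (i + 1) * h - 1, counts.get(i, 0))
            for i in range(max(counts) + 1)]
```

For ten hypothetical scores `[12, 15, 21, 22, 27, 30, 34, 35, 38, 41]`, the range is 29, $k = 4$, $h = \lceil 29/4 \rceil = 8$, and the starting point is 8, giving the classes 8 – 15, 16 – 23, 24 – 31, 32 – 39, and 40 – 47 with frequencies 2, 2, 2, 3, and 1.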
Graphic representation

Graphic representation is another way of analyzing numerical data. A graph is a sort of chart through which statistical data are represented in the form of lines or curves drawn across coordinate points plotted on its surface.

Graphs enable us to study the cause-and-effect relationship between two variables. Graphs help to measure the extent of change in one variable when another variable changes by a certain amount.

Graphs also enable us to study both time series and frequency distributions, as they give a clear account and a precise picture of the problem. Graphs are also easy to understand and eye-catching.
BAR CHART

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars whose heights or lengths are proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.

A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific
categories being compared, and the other axis represents a measured value. Some bar graphs present bars
clustered in groups of more than one, showing the values of more than one measured variable.
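The key property of a bar chart is that each bar's length is proportional to its value. A minimal Python sketch that draws horizontal bars as text, assuming no plotting library is available; the regions and sales figures in the usage note are hypothetical.

```python
def text_bar_chart(categories, values, max_width=50):
    """Draw horizontal bars whose lengths are proportional to the values."""
    largest = max(values)
    label_width = max(len(c) for c in categories)
    rows = []
    for category, value in zip(categories, values):
        # Scale each bar so that the largest value gets max_width characters
        length = round(value / largest * max_width)
        rows.append(f"{category.ljust(label_width)} | {'#' * length} {value}")
    return "\n".join(rows)
```

Calling `print(text_bar_chart(["North", "South", "East", "West"], [50, 25, 15, 10]))` draws bars of 50, 25, 15, and 10 characters, so the bar for North is exactly twice as long as the bar for South.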

A vertical bar graph is shown below:

[Figure: vertical bar graph]
PIE CHART

A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented.
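Because each slice is proportional to its quantity, its central angle is the quantity's share of the total multiplied by 360°. A minimal Python sketch; the values in the usage note are hypothetical.

```python
def pie_slices(values):
    """Return (central angle in degrees, percentage share) for each slice."""
    total = sum(values)
    # Each slice's angle (and so its arc length and area) is proportional to its value
    return [(360 * v / total, 100 * v / total) for v in values]
```

For the values 50, 25, 15, and 10, `pie_slices([50, 25, 15, 10])` gives central angles of 180°, 90°, 54°, and 36°, which add up to the full 360°.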

Pie charts are very widely used in the business world and the mass media. However, they have been
criticized, and many experts recommend avoiding them, pointing out that research has shown it is difficult
to compare different sections of a given pie chart, or to compare data across different pie charts. Pie charts
can be replaced in most cases by other plots such as the bar chart, box plot or dot plots.
HISTOGRAM

A histogram is a non-cumulative frequency graph drawn on a natural scale, in which the frequencies of the different classes of values are represented by vertical rectangles drawn adjacent to each other. The mode, a measure of central tendency, can be easily determined with the help of this graph.
Hypothesis

A hypothesis (plural: hypotheses), in a scientific context, is a testable statement about the relationship between
two or more variables or a proposed explanation for some observed phenomenon. In a scientific experiment or
study, the hypothesis is a brief summation of the researcher’s prediction of the study’s findings, which may be
supported or not by the outcome. Hypothesis testing is the core of the scientific method.

The researcher’s prediction is usually referred to as the alternative hypothesis, and any other outcome as the null
hypothesis — basically, the opposite outcome to what is predicted. The null hypothesis satisfies the requirement
for falsifiability: the capacity for a proposition to be proven false, which some schools of thought consider
essential to the scientific method. According to others, however, testability is adequate, on the grounds that if
there is sufficient support for a hypothesis it is not necessary to be able to conceive of a contrary outcome.
Framing Null Hypothesis

The null hypothesis is a general statement or default position that there is no relationship between two
measured phenomena, or no association among groups. Testing (accepting, approving, rejecting, or
disproving) the null hypothesis—and thus concluding that there are or are not grounds for believing that
there is a relationship between two phenomena (e.g. that a potential treatment has a measurable effect)—is a
central task in the modern practice of science; the field of statistics gives precise criteria for rejecting a null
hypothesis.
A null hypothesis is a precise statement about a population that we try to reject with sample data.
“Null” Does Not Mean “Zero”

A common misunderstanding is that “null” implies “zero”. This is often but not always the case. For example, a null
hypothesis may also state that

The correlation between frustration and aggression is 0.5.

No zero is involved here and, although somewhat unusual, this null hypothesis is perfectly valid.

The “null” in “null hypothesis” derives from “nullify”: the null hypothesis is the statement that we’re trying to refute, regardless of whether it specifies a zero effect.

Null Hypothesis – Limitations

Suppose we test the null hypothesis that a population correlation is zero and reject it. Thus far, we have only concluded that the population correlation is probably not zero. That is the only conclusion from the null hypothesis approach, and it is not really that interesting.

What we really want to know is the population correlation itself. If our sample correlation is 0.25, that seems a reasonable estimate. We call such a single number a point estimate.
Framing Alternative Hypothesis
An alternative hypothesis is one in which a difference (or an effect) between two or more variables is anticipated by the researchers;
that is, the observed pattern of the data is not due to a chance occurrence. This follows from the tenets of science, in which empirical
evidence must be found to refute the null hypothesis before one can claim support for an alternative hypothesis (i.e. there is in fact a
reliable difference or effect in whatever is being studied). The concept of the alternative hypothesis is a central part of formal
hypothesis testing.

An alternative hypothesis states that there is a statistically significant relationship (or difference) between two variables. For example, in the Mentos and Diet Coke experiment, the two variables are Mentos and Diet Coke. The alternative hypothesis is the hypothesis that the researcher is trying to support. In that experiment, Arnold was trying to show that the Diet Coke would explode if he put Mentos in the bottle; his results therefore supported his alternative hypothesis.

The alternative hypothesis is generally denoted as H1. It states a potential result or outcome that the investigator or researcher expects. It falls into two categories: a directional alternative hypothesis, which predicts the direction of the effect (e.g. H1: μ > μ0), and a non-directional alternative hypothesis, which predicts only that a difference exists (e.g. H1: μ ≠ μ0).
Hypothesis Testing

Hypothesis testing was introduced by Ronald Fisher, Jerzy Neyman, Karl Pearson, and Pearson’s son, Egon Pearson. Hypothesis testing is a statistical method for making statistical decisions using experimental data. A hypothesis is basically an assumption that we make about a population parameter, and hypothesis testing checks that assumption against sample data.

Hypothesis testing helps determine whether the variation between or among groups of data reflects true variation or is merely the result of sampling variation. With the help of sample data we form assumptions about the population, and then we test those assumptions statistically.
Key terms and concepts:

Level of significance: The probability of rejecting the null hypothesis when it is actually true, denoted by α. Because 100% accuracy is not possible when accepting or rejecting a hypothesis, we select a level of significance, usually 5%.
Level of confidence: The confidence level, equal to 1 − α, is the percentage of interval estimates that you expect to contain the true population parameter if you repeat the experiment or resample the population in the same way many times.
Types of Errors:
Type I error: Rejecting the null hypothesis when it is actually true. The probability of a Type I error is denoted by alpha (α). In hypothesis testing, the area under the normal curve that forms the critical (rejection) region is called the alpha region.

Type II error: Accepting (failing to reject) the null hypothesis when it is actually false. The probability of a Type II error is denoted by beta (β). In hypothesis testing, the area over the acceptance region under the curve of the true (alternative) distribution is called the beta region.
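The two error probabilities can be computed directly for a simple case. A minimal Python sketch, assuming a one-sided z-test of a population mean with known standard deviation; the numbers in the usage line (H0: μ = 100, H1: μ = 105, σ = 10, n = 25) are hypothetical.

```python
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against H1: mu = mu1 (mu1 > mu0), sigma known."""
    se = sigma / n ** 0.5                  # standard error of the sample mean
    null_dist = NormalDist(mu0, se)        # sampling distribution if H0 is true
    alt_dist = NormalDist(mu1, se)         # sampling distribution if H1 is true
    # Reject H0 when the sample mean exceeds this value (the alpha region lies above it)
    critical = null_dist.inv_cdf(1 - alpha)
    # Type I error: H0 is true, but the sample mean falls in the critical region
    type_1 = 1 - null_dist.cdf(critical)
    # Type II error: H1 is true, but the sample mean falls in the acceptance region
    type_2 = alt_dist.cdf(critical)
    return critical, type_1, type_2


critical, a, b = error_probabilities(mu0=100, mu1=105, sigma=10, n=25)
print(f"critical mean = {critical:.2f}, alpha = {a:.3f}, beta = {b:.3f}")
```

With these numbers the critical sample mean is about 103.29, α is 0.05 by construction, and β is about 0.196. Lowering α moves the critical value to the right, which enlarges the beta region, so the two errors trade off against each other for a fixed sample size.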
The key steps of hypothesis testing:

• Formulate the hypotheses to be tested. This means stating the null hypothesis and the alternative hypothesis.

• Determine the sampling distribution of the proportion. If the sample proportion is the outcome of a binomial experiment, the sampling distribution will be binomial. If it is the outcome of a hypergeometric experiment, the sampling distribution will be hypergeometric.

• Specify the significance level. (Researchers often set the significance level equal to 0.05 or 0.01, although other values may be used.)

• Based on the hypotheses, the sampling distribution, and the significance level, define the region of acceptance.

• Test the null hypothesis. If the sample proportion falls within the region of acceptance, do not reject the null hypothesis; otherwise, reject the null hypothesis.
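The five steps can be sketched in Python for a binomial experiment. This is a minimal illustration, assuming a one-sided alternative (H1: p > p0) and working with the number of successes, which is equivalent to working with the sample proportion because n is fixed; the numbers in the usage note are hypothetical.

```python
from math import comb


def upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def proportion_test(successes, n, p0, alpha=0.05):
    """One-sided exact test of H0: p = p0 against H1: p > p0."""
    # Step 1: the hypotheses are H0: p = p0 and H1: p > p0 (one-sided by assumption)
    # Step 2: under H0 the number of successes X follows Binomial(n, p0)
    # Step 3: alpha is the chosen significance level
    # Step 4: smallest k with P(X >= k) <= alpha; the region of acceptance is X < k
    k = next(c for c in range(n + 2) if upper_tail(n, p0, c) <= alpha)
    # Step 5: reject H0 if the observed count falls outside the region of acceptance
    return k, successes >= k
```

With 16 successes in 20 trials (a sample proportion of 0.8) and p0 = 0.5, `proportion_test(16, 20, 0.5)` returns `(15, True)`: the region of acceptance is 0 to 14 successes, so the null hypothesis is rejected at the 0.05 level.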
Analysis of Variance
Analysis of Variance (ANOVA) is a parametric statistical technique used to compare datasets. This
technique was invented by R.A. Fisher, and is thus often referred to as Fisher’s ANOVA, as well. It is
similar in application to techniques such as t-test and z-test, in that it is used to compare means and the
relative variance between them. However, analysis of variance (ANOVA) is best applied where more than 2
populations or samples are meant to be compared.
Example of How to Use ANOVA

A researcher might, for example, test students from multiple colleges to see whether students from one of the colleges consistently outperform students from the other colleges. In a business application, an R&D researcher might test two different processes for creating a product to see whether one process is more cost-efficient than the other.
Differences between One-Way and Two-Way ANOVA

1. A one-way ANOVA is primarily designed to test the equality of three or more means. A two-way ANOVA is designed to assess the effects of two independent variables, and the interaction between them, on a dependent variable.

2. A one-way ANOVA involves only one factor or independent variable, whereas a two-way ANOVA involves two independent variables.

3. In a one-way ANOVA, the single factor or independent variable analyzed has three or more categorical groups. A two-way ANOVA instead compares multiple groups of two factors.

4. A one-way ANOVA needs to satisfy only two principles of the design of experiments, i.e., replication and randomization, whereas a two-way ANOVA meets all three principles: replication, randomization, and local control.
ANOVA TEST
Suppose we have three different diets, and we want to determine if there is a significant difference in weight
loss among the three diets. We collect data from 15 participants, with 5 participants following each diet. The
weight losses (in pounds) are as follows:

• Diet 1: 2, 4, 5, 3, 4

• Diet 2: 3, 2, 4, 4, 3

• Diet 3: 5, 4, 3, 6, 5

We want to test the null hypothesis that the mean weight loss is the same for all three diets at a 0.05
significance level.
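Solution
Step 1: Calculate the group means and the grand mean
$\bar{x}_1 = 18/5 = 3.6$, $\bar{x}_2 = 16/5 = 3.2$, $\bar{x}_3 = 23/5 = 4.6$, and the grand mean $\bar{x} = 57/15 = 3.8$
Step 2: Calculate the Sum of Squares Between Groups (SSB)
$$SSB = \sum_{j=1}^{3} n_j(\bar{x}_j - \bar{x})^2 = 5\left[(3.6-3.8)^2 + (3.2-3.8)^2 + (4.6-3.8)^2\right] = 5(0.04 + 0.36 + 0.64) = 5.2$$
Step 3: Calculate the Sum of Squares Within Groups (SSW)
$$SSW = \sum_{j=1}^{3}\sum_{i=1}^{5}(x_{ij} - \bar{x}_j)^2$$
$$= \left[(2-3.6)^2+(4-3.6)^2+(5-3.6)^2+(3-3.6)^2+(4-3.6)^2\right] + \left[(3-3.2)^2+(2-3.2)^2+(4-3.2)^2+(4-3.2)^2+(3-3.2)^2\right] + \left[(5-4.6)^2+(4-4.6)^2+(3-4.6)^2+(6-4.6)^2+(5-4.6)^2\right]$$
$$= (2.56+0.16+1.96+0.36+0.16) + (0.04+1.44+0.64+0.64+0.04) + (0.16+0.36+2.56+1.96+0.16) = 5.2 + 2.8 + 5.2 = 13.2$$
Step 4: Calculate the Mean Squares
With $k = 3$ groups and $N = 15$ observations:
Mean Square Between Groups (MSB): $MSB = \frac{SSB}{k-1} = \frac{5.2}{2} = 2.6$
Mean Square Within Groups (MSW): $MSW = \frac{SSW}{N-k} = \frac{13.2}{12} = 1.1$
Step 5: Calculate the F statistic and compare it with the critical value
$F = \frac{MSB}{MSW} = \frac{2.6}{1.1} \approx 2.36$
The critical value of F with (2, 12) degrees of freedom at the 0.05 significance level is approximately 3.89. Since 2.36 < 3.89, we fail to reject the null hypothesis.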
Conclusion
There is not enough evidence to conclude that there is a significant difference in mean weight loss among the
three diets at the 0.05 significance level.
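The hand calculation above can be checked with a short script. A minimal pure-Python sketch of a one-way ANOVA (it computes the F statistic only; the critical value still comes from an F table):

```python
def one_way_anova(*groups):
    """Return SSB, SSW, MSB, MSW and the F statistic for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between groups: group size times the squared deviation of each group mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviations of each value from its own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1                 # k - 1
    df_within = len(all_values) - len(groups)    # N - k
    msb, msw = ssb / df_between, ssw / df_within
    return ssb, ssw, msb, msw, msb / msw


ssb, ssw, msb, msw, f = one_way_anova([2, 4, 5, 3, 4], [3, 2, 4, 4, 3], [5, 4, 3, 6, 5])
print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, MSB = {msb:.1f}, MSW = {msw:.1f}, F = {f:.2f}")
```

For the diet data this prints SSB = 5.2, SSW = 13.2, MSB = 2.6, MSW = 1.1, and F = 2.36. If SciPy is available, `scipy.stats.f_oneway` applied to the same three lists should report the same F statistic together with a p-value.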
Report

A report is the formal write-up of a project or a research investigation. A report has clearly defined sections presented in a standard format, which tell the reader what you did, why and how you did it, and what you found. Reports differ from essays because they require an objective writing style that conveys information clearly and concisely.


Most reports include the following sections:
• Title
• Abstract
• Introduction
• Methods
• Results
• Discussion
• Conclusions
• References
• Appendices
