T10 statistical analysis

Statistical Analysis

By Rama Krishna Kompella

Relationships Between Variables
• The relationship between variables can be
explained in various ways such as:
– Presence / absence of a relationship
– Directionality of the relationship
– Strength of association
– Type of relationship

• Presence / absence of a relationship
– E.g., if we want to study the customer
satisfaction levels of a fast-food restaurant, we
first need to know whether the quality of food and
customer satisfaction are related at all

• Direction of the relationship
– The direction of a relationship can be either
positive or negative
– Food quality perceptions are related positively to
customer commitment toward a restaurant.

• Strength of association
– Strength is generally categorized as nonexistent, weak,
moderate, or strong.
– Quality of food is strongly associated with customer
satisfaction in a fast-food restaurant

• Type of association
– How can the link between Y and X best be
described?
– There are different ways in which two variables
can share a relationship
• Linear relationship
• Curvilinear relationship

Chi-Square (χ2) and Frequency Data
• Today the data that we analyze consists of frequencies; that
is, the number of individuals falling into categories. In other
words, the variables are measured on a nominal scale.
• The test statistic for frequency data is Pearson Chi-Square.
The magnitude of Pearson Chi-Square reflects the amount of
discrepancy between observed frequencies and expected
frequencies.

Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a
tabled/critical value

1. Determine Appropriate Test
• Chi Square is used when both variables are
measured on a nominal scale.
• It can be applied to interval or ratio data that
have been categorized into a small number of
groups.
• It assumes that the observations are randomly
sampled from the population.
• All observations are independent (an individual
can appear only once in a table and there are no
overlapping categories).
• It does not make any assumptions about the
shape of the distribution nor about the
homogeneity of variances.

2. Establish Level of Significance
• α is a predetermined value
• The conventional values are
– α = .05
– α = .01
– α = .001

3. Determine The Hypothesis:
Whether There is an Association
or Not
• Ho : The two variables are independent
• Ha : The two variables are associated

4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a
contingency table with expected frequencies.
• The expected frequencies represent the number of
cases that would be found in each cell if the null
hypothesis were true (i.e., the nominal variables are
unrelated).
• The expected frequency of a cell for two unrelated
variables is the product of its row and column
frequencies divided by the number of cases:
Fe = Fr Fc / N
• The test statistic sums the discrepancies over all cells:

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

where Fo = observed frequencies and Fe = expected frequencies.
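The two formulas on this slide can be sketched in Python. This is only an illustration, not part of the original slides: the function names `expected_frequencies` and `chi_square` and the small 2x2 table are hypothetical.

```python
def expected_frequencies(observed):
    """Fe = Fr * Fc / N for every cell of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


def chi_square(observed):
    """Pearson chi-square: sum of (Fo - Fe)^2 / Fe over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 table, used only to illustrate the arithmetic.
table = [[20, 30], [30, 20]]
print(expected_frequencies(table))  # every expected count is 25.0
print(chi_square(table))            # 4 cells * (5^2 / 25) = 4.0
```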

5. Determine Degrees of Freedom
df = (R-1)(C-1)
• R = number of levels in row variable
• C = number of levels in column variable

6. Compare computed test statistic
against a tabled/critical value
• The computed value of the Pearson chi-
square statistic is compared with the critical
value to determine if the computed value is
improbable
• The critical tabled values are based on
sampling distributions of the Pearson chi-
square statistic
• If the calculated χ2 is greater than the tabled χ2
value, reject Ho
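Steps 5 and 6 can be sketched as a small lookup, assuming α = .05. The function names and the statistic of 4.0 are hypothetical; the critical values are the standard tabled chi-square values for df = 1 to 4.

```python
# Critical values of chi-square at alpha = .05 for small df (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(chi2, n_rows, n_cols):
    """Return True when the computed statistic exceeds the tabled value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi2 > CRITICAL_05[df]


# Hypothetical statistic of 4.0 from a 2x2 table: df = 1, critical value 3.841.
print(reject_null(4.0, 2, 2))  # True, because 4.0 > 3.841
```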

Example
• Suppose a researcher is interested in buying
preferences of environmentally conscious
consumers.
• A questionnaire was developed and sent to a
random sample of 90 consumers.
• The researcher also collects information about
the gender of the sample of 90 respondents.

Bivariate Frequency Table or
Contingency Table

            Favor   Neutral   Oppose   f row
Male          10       10       30      50
Female        15       15       10      40
f column      25       25       40      n = 90

Contingency Table
• Observed frequencies: the six cell counts
(10, 10, 30 for males; 15, 15, 10 for females)
• Row frequency: the f row totals (50 and 40)
• Column frequency: the f column totals (25, 25, 40)

1. Determine Appropriate Test

1. Gender (2 levels), nominal
2. Buying preference (3 levels), nominal
Both variables are nominal, so Chi Square is appropriate.

2. Establish Level of Significance

Alpha of .05

3. Determine The Hypothesis
• Ho : There is no difference between men and
women in their opinion on pro-environmental
products.

• Ha : There is an association between gender
and opinion on pro-environmental products.



4. Calculating Test Statistics

            Favor       Neutral     Oppose      f row
Men         fo = 10     fo = 10     fo = 30     50
            fe = 13.9   fe = 13.9   fe = 22.2
Women       fo = 15     fo = 15     fo = 10     40
            fe = 11.1   fe = 11.1   fe = 17.8
f column    25          25          40          n = 90

• fe (Men, Favor) = 50*25/90 = 13.9
• fe (Women, Favor) = 40*25/90 = 11.1


$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$$
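The same arithmetic can be checked with a short Python sketch. It recomputes each expected frequency from the row and column totals instead of using the rounded values on the slide; the variable names are illustrative only.

```python
observed = {
    "Male": [10, 10, 30],    # Favor, Neutral, Oppose
    "Female": [15, 15, 10],
}

row_totals = {g: sum(counts) for g, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

chi2 = 0.0
for gender, counts in observed.items():
    for fo, fc in zip(counts, col_totals):
        fe = row_totals[gender] * fc / n   # e.g. 50 * 25 / 90 = 13.89
        chi2 += (fo - fe) ** 2 / fe

print(round(chi2, 3))  # 11.025, i.e. 11.03 to two decimals
```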

5. Determine Degrees of Freedom
df = (R-1)(C-1) = (2-1)(3-1) = 2

6. Compare computed test statistic
against a tabled/critical value
• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Men and women differ significantly in their
opinions on pro-environmental products
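For df = 2 specifically, the chi-square tail probability has the closed form P(X > x) = exp(-x/2), so the tabled value and an approximate p-value can be checked with the standard library alone. This shortcut is an addition to the slides and does not hold for other degrees of freedom.

```python
import math

alpha = 0.05
chi2 = 11.03          # test statistic from the example

# For df = 2 only: P(X > x) = exp(-x / 2), so the critical value is -2 * ln(alpha).
critical = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

print(round(critical, 3))  # 5.991
print(round(p_value, 3))   # 0.004
print(chi2 > critical)     # True: reject Ho
```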

SPSS Output Example

Chi-Square Tests

                                Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025 (a)   2    .004
Likelihood Ratio                11.365       2    .003
Linear-by-Linear Association     8.722       1    .003
N of Valid Cases                90
a. 0 cells (.0%) have expected count less than 5. The
minimum expected count is 11.11.
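The Likelihood Ratio row is not derived on the slides. As a sketch, assuming the usual likelihood-ratio chi-square formula G² = 2 Σ Fo ln(Fo / Fe), the value can be reproduced from the gender table:

```python
import math

observed = [[10, 10, 30], [15, 15, 10]]   # rows: Male, Female
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

g2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        g2 += 2 * fo * math.log(fo / fe)   # assumes every fo > 0

print(round(g2, 3))  # 11.365
```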

Additional Information in SPSS Output
• Exceptions that might distort χ2 assumptions
– Associations in some but not all categories
– Low expected frequency per cell
• Extent of association is not the same as statistical
significance (demonstrated through an example)

Another Example: Heparin Lock Placement

Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
Time: 1 = 72 hrs, 2 = 96 hrs

                                                 Heparin Lock Placement Time Group
Complication Incidence                           1         2         Total
Had complications      Count                     9         11        20
                       Expected Count            10.0      10.0      20.0
                       % within Heparin Lock     18.0%     22.0%     20.0%
Had no complications   Count                     41        39        80
                       Expected Count            40.0      40.0      80.0
                       % within Heparin Lock     82.0%     78.0%     80.0%
Total                  Count                     50        50        100
                       Expected Count            50.0      50.0      100.0
                       % within Heparin Lock     100.0%    100.0%    100.0%

from Polit Text: Table 8-1
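The Expected Count and percentage rows of this crosstabulation follow directly from the observed counts. A minimal sketch, with illustrative variable names:

```python
counts = {
    "Had complications": [9, 11],      # group 1 (72 hrs), group 2 (96 hrs)
    "Had no complications": [41, 39],
}
col_totals = [sum(c) for c in zip(*counts.values())]
n = sum(col_totals)

summary = {}
for label, row in counts.items():
    row_total = sum(row)
    expected = [row_total * ct / n for ct in col_totals]          # Fe = Fr * Fc / N
    percent = [100 * fo / ct for fo, ct in zip(row, col_totals)]  # % within group
    summary[label] = (expected, percent)
    print(label, expected, percent)
```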

Hypotheses in Heparin Lock Placement

• Ho: There is no association between
complication incidence and heparin lock
placement time. (The variables are
independent).
• Ha: There is an association between
complication incidence and heparin lock
placement time. (The variables are related).

More of SPSS Output

Chi-Square Tests

                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              .250 (b)   1    .617
Continuity Correction (a)       .063       1    .803
Likelihood Ratio                .250       1    .617
Fisher's Exact Test                                           .803         .402
Linear-by-Linear Association    .248       1    .619
N of Valid Cases                100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.
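The Fisher's Exact Test row is not explained on the slides. As a rough sketch, it can be reproduced from the hypergeometric distribution of the 2x2 table with fixed margins; the helper `weight` and the variable names are hypothetical, and the two-sided rule used here (sum all tables no more probable than the observed one) is one common convention.

```python
from math import comb

a, b = 9, 11     # had complications: group 1, group 2
c, d = 41, 39    # no complications:  group 1, group 2
row1, col1, n = a + b, a + c, a + b + c + d


def weight(k):
    """Unnormalised hypergeometric probability of k complications in group 1."""
    return comb(col1, k) * comb(n - col1, row1 - k)


support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)
total = comb(n, row1)

# One-sided: tables with as few or fewer complications in group 1 as observed.
p_one_sided = sum(weight(k) for k in support if k <= a) / total
# Two-sided: all tables no more probable than the observed one.
p_two_sided = sum(weight(k) for k in support if weight(k) <= weight(a)) / total

print(round(p_one_sided, 3), round(p_two_sided, 3))  # 0.402 0.803
```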

Pearson Chi-Square
• Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis
that the complication rate is unrelated to heparin
lock placement time.
• Continuity correction is used in situations in which
the expected frequency for any cell in a 2 by 2
table is less than 10.
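The slides do not give the formula for the continuity correction. A minimal sketch, assuming the usual Yates form in which each |Fo − Fe| is reduced by 0.5 before squaring, reproduces both the Pearson and the corrected values for the heparin lock table:

```python
observed = [[9, 11], [41, 39]]
expected = [[10.0, 10.0], [40.0, 40.0]]

pearson = 0.0
corrected = 0.0
for obs_row, exp_row in zip(observed, expected):
    for fo, fe in zip(obs_row, exp_row):
        pearson += (fo - fe) ** 2 / fe
        # Yates' correction: shrink each |Fo - Fe| by 0.5 before squaring.
        corrected += (abs(fo - fe) - 0.5) ** 2 / fe

print(round(pearson, 4), round(corrected, 4))  # 0.25 0.0625
```

The corrected value of 0.0625 appears as .063 in the SPSS table.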

More SPSS Output

Symmetric Measures

                                               Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Nominal by Nominal     Phi                     -.050                                           .617
                       Cramer's V               .050                                           .617
Interval by Interval   Pearson's R             -.050   .100                    -.496           .621 (c)
Ordinal by Ordinal     Spearman Correlation    -.050   .100                    -.496           .621 (c)
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

Phi Coefficient
• Pearson Chi-Square provides information about the
existence of a relationship between 2 nominal
variables, but not about the magnitude of
the relationship
• Phi coefficient is the measure of the strength
of the association

$$\phi = \sqrt{\frac{\chi^2}{N}}$$
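Applied to the heparin lock example (χ2 = .250, N = 100), the phi formula gives the magnitude reported by SPSS. A short check:

```python
import math

chi2 = 0.250   # Pearson chi-square from the heparin lock example
n = 100        # number of valid cases

phi = math.sqrt(chi2 / n)
print(round(phi, 3))  # 0.05
```

The square root yields only the magnitude; the negative sign in the SPSS output (-.050) reflects the direction of the association in the 2 by 2 table.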

Cramer’s V
• When the table is larger than 2 by 2, a different
index must be used to measure the strength of the
relationship between the variables. One such index is
Cramer’s V.
• If Cramer’s V is large, it means that there is a
tendency for particular categories of the first
variable to be associated with particular categories
of the second variable.

$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

• N = number of cases
• k = smallest of number of rows or columns
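As an illustration, the formula can be applied to the 2 by 3 gender example from earlier in the deck (χ2 = 11.025, N = 90). This calculation does not appear on the original slides.

```python
import math

chi2 = 11.025          # Pearson chi-square from the gender example
n = 90                 # number of cases
n_rows, n_cols = 2, 3  # gender by buying preference

k = min(n_rows, n_cols)
v = math.sqrt(chi2 / (n * (k - 1)))
print(round(v, 2))  # 0.35
```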
