
STATISTICS

NORMAL PROBABILITY CURVE


- This bell-shaped curve is technically known as the Normal Probability Curve or simply
the Normal Curve, and the corresponding frequency distribution of scores, in which all
three measures of central tendency (Mean, Median and Mode) have exactly the same
value, is known as the Normal Distribution.
- A normal probability curve, also called a Gaussian distribution, is a bell-shaped curve
that depicts the probability of a continuous random variable.
- It shows the distribution of a continuous random variable.
- It helps us understand how a characteristic is distributed across the population.
- It describes data from the population but can also be used to represent a sample.
CHARACTERISTICS / PROPERTIES OF NPC
1. The Normal Curve is Symmetrical: The normal probability curve is symmetrical
around its vertical axis, called the ordinate. The left and right halves of the curve about
the middle central point are mirror images.
2. The Normal Curve is Unimodal: Since there is only one maximum point in the curve,
the normal probability curve is unimodal, i.e. it has only one mode.
3. The Maximum Ordinate occurs at the Center: The maximum height of the ordinate
always occurs at the central point of the curve, that is, the mid-point.
4. The Normal Curve is Asymptotic to the X Axis: The normal probability curve
approaches the horizontal axis asymptotically; i.e. the curve continues to decrease in
height on both ends away from the middle point (the maximum ordinate point), but
it never touches the horizontal axis.
5. The Height of the Curve declines Symmetrically: In the normal probability curve the
height declines symmetrically in either direction from the maximum point.
6. The Total Percentage of Area of the Normal Curve within the Two Points of Inflection is
Fixed: The points of inflection lie one standard deviation on either side of the mean, and
the area between them is fixed at approximately 68.26% of the total area.
7. The Normal Curve is Bilateral: 50% of the area of the curve lies to the left of the
maximum central ordinate and 50% of the area lies to the right. Hence the curve
is bilateral.
IMPORTANCE OF NORMAL DISTRIBUTION
The Normal distribution is by far the most used distribution in inferential statistics because
of the following reasons

1. Hypothesis Testing- In hypothesis testing, the normal distribution plays a crucial role
in determining critical regions and calculating p-values. Many statistical tests, such as
the t-test and z-test, rely on the assumption that the data follows a normal
distribution. Deviations from normality may affect the validity of these tests,
emphasizing the importance of understanding normal distribution properties.
2. Statistical Inference: Many statistical methods assume normality. Parametric tests
(e.g., t-tests, ANOVA, linear regression) often require the assumption of normality.
3. Modeling Real-World Phenomena:
o The normal distribution is used to model and describe the behavior of many real-
valued random variables.
o In natural and social sciences, we encounter phenomena that exhibit a bell-shaped
distribution. Examples include:
o Heights of people in a population.
o Errors in measurements (e.g., instrument readings, experimental data).
o Blood pressure levels in patients.
o Test scores in educational assessments.
o IQ scores.
4. The normal distribution is of great value in educational evaluation and educational
research, where we make use of mental measurement.

APPLICATIONS OF NPC
There are number of applications of normal curve in the field of psychology as well as
educational measurement and evaluation. These are:
i) To determine the percentage of cases (in a normal distribution) within given limits
or scores.
ii) To determine the percentage of cases that are above or below a given score or
reference point (see the sketch after this list).
iii) To determine the limits of scores which include a given percentage of cases, and to
determine the percentile rank of an individual or a student in his own group.
iv) To find out the percentile value of an individual on the basis of his percentile rank.
v) Dividing a group into sub-groups according to a certain ability and assigning
grades.
vi) To compare two distributions in terms of overlapping.
vii) To determine the relative difficulty of test items.
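A minimal sketch of applications (i), (ii) and (iii), assuming SciPy is available; the mean, standard deviation and cut-off scores below are hypothetical examples, not values from the text.

```python
# Hypothetical test distribution: mean = 50, SD = 10 (assumed values for illustration).
from scipy.stats import norm

mean, sd = 50, 10

# (i) Percentage of cases between two scores, e.g. 40 and 60 (i.e. within ±1 SD)
pct_between = (norm.cdf(60, mean, sd) - norm.cdf(40, mean, sd)) * 100
print(f"Cases between 40 and 60: {pct_between:.2f}%")   # about 68.27%

# (ii) Percentage of cases above a given score, e.g. 65
pct_above = norm.sf(65, mean, sd) * 100                 # survival function = 1 - cdf
print(f"Cases above 65: {pct_above:.2f}%")

# (iii) Limits of scores that include the middle 50% of cases (Q1 and Q3)
q1, q3 = norm.ppf([0.25, 0.75], mean, sd)
print(f"Middle 50% of cases lie between {q1:.1f} and {q3:.1f}")
```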
Mean: It is the average of the data set
- Mean as the Location Parameter
- The mean signifies the center of the bell curve.
- It tells you where the data tends to cluster the most.
- Imagine the curve shifting left or right on the horizontal axis (x-axis). The mean value
dictates this location shift.

Standard Deviation: How scores are dispersed around the mean


- It indicates how far scores (or class intervals) deviate from the mean and from one another.
- Standard Deviation as the Scale Parameter
- The standard deviation (SD) determines how spread out the data is.
- A larger SD makes the curve wider, indicating data points are further from the mean.
- A smaller SD creates a narrower curve, showing data concentrated around the mean.
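An illustrative sketch of the mean as a location parameter and the SD as a scale parameter, assuming NumPy, SciPy and Matplotlib are installed; the parameter values are arbitrary.

```python
# Plot three normal curves: shifting the mean moves the curve along the x-axis,
# while changing the SD widens or narrows it.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-10, 10, 500)
for mu, sigma in [(0, 1), (2, 1), (0, 2)]:          # arbitrary (mean, SD) pairs
    plt.plot(x, norm.pdf(x, mu, sigma), label=f"mean={mu}, SD={sigma}")

plt.legend()
plt.title("Mean shifts the curve; SD widens or narrows it")
plt.show()
```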
DIVERGENCE FROM NORMALITY
Generally two types of divergence occur in the normal curve.

Skewness
A distribution is said to be skewed when the mean and median fall at different points in
the distribution and the balance, i.e. the point of center of gravity, is shifted to one side or
the other, to the left or to the right. In a normal distribution the mean equals the median exactly
and the skewness is of course zero (SK = 0).
There are two types of skewness which appear in the normal curve.
a) Negative Skewness: A distribution is said to be skewed negatively, or to the left, when scores
are massed at the high end of the scale, i.e. the right side of the curve, and are spread out more
gradually toward the low end, i.e. the left side of the curve. In a negatively skewed distribution
the value of the median will be higher than the value of the mean.

b) Positive Skewness: Distributions are skewed positively, or to the right, when scores are
massed at the low end of the scale, i.e. the left end, and are spread out gradually toward the
high or right end. In a positively skewed distribution the value of the mean will be higher than
the value of the median.
Kurtosis
The term kurtosis refers to the divergence in the height of the curve, especially in its
peakedness. There are two types of divergence in the peakedness of the curve.
a) Leptokurtosis: The curve becomes more peaked, i.e. its top becomes narrower than that of the
normal curve, and the scatter in the scores, or the area of the curve, shrinks towards the center.
Thus in a leptokurtic distribution, the frequency distribution curve is more peaked than
the normal distribution curve.

b) Platykurtosis: Now suppose we put heavy pressure on the top of a normal curve made of wire.
What would be the change in the shape of the curve? Probably you would say that the
top of the curve becomes flatter than the normal. Thus a distribution with a flatter peak
than the normal is known as a platykurtic distribution.
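A small sketch of measuring these two divergences from normality, assuming SciPy and NumPy; the simulated scores are hypothetical.

```python
# skew() and kurtosis() from scipy.stats quantify divergence from normality.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=1000)   # roughly normal sample

print("Skewness:", skew(scores))             # close to 0 for a normal distribution
print("Excess kurtosis:", kurtosis(scores))  # Fisher definition: ~0 normal, >0 leptokurtic, <0 platykurtic
```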

FACTORS CAUSING DIVERGENCE IN THE NORMAL DISTRIBUTION /NORMAL CURVE

1. Selection of the Sample: Selection of the subjects (individuals) can produce skewness and
kurtosis in the distribution. If the sample size is small or the sample is a biased one,
skewness is possible in the distribution of scores obtained from the selected
sample or group of individuals. Scores from small and highly heterogeneous groups
yield platykurtic distributions.
2. Unsuitable or Poorly Made Tests: If a test is too easy, scores will pile up at the high
end of the scale, whereas if the test is too hard, scores will pile up at the low end of the
scale.
3. The Trait being Measured is Non-Normal: Skewness or kurtosis or both will appear
when there is a real lack of normality in the trait being measured, e.g. interest,
attitude, suggestibility.
4. Errors in the Construction and Administration of Tests: While administering the test,
unclear instructions, errors in timing, errors in scoring, and differences in practice and
motivation to complete the test may all cause skewness in the
distribution.
PARAMETRIC TESTS
These are tests that make assumptions about the parameters of the population distribution
from which the sample is drawn. When data can be measured in units which are
interchangeable, e.g. weights (on ratio scales) or temperatures (on interval scales), the data
are said to be parametric and can be subjected to most kinds of statistical and mathematical
processes.

Assumptions

 Normality: Each sample was drawn from a normally distributed population.


 Independence: The observations are independent of each other and do not affect each
other in any way.
 Continuous variable: The level of measurement used in a parametric test is an interval or
ratio scale.
 Homogeneity of variance: The variances of the populations the data samples came from
are equal.
 Probability sampling: The sample is randomly selected.

CORRELATION
Correlation is a measure of association between two variables. Typically, one variable is
denoted as X and the other variable is denoted as Y. The relationship between these
variables is assessed by a correlation coefficient.
The relationship between two variables can be of various types. Broadly, they can be
classified as linear and nonlinear relationships.
1. Linear Relationship - One of the basic forms of relationship is the linear relationship.
A linear relationship can be expressed as a relationship between two variables that can
be plotted as a straight line.
2. Non-linear Relationship - These are called curvilinear or nonlinear relationships.
The Yerkes-Dodson Law, Stevens' Power Law in psychophysics, etc. are good examples
of non-linear relationships.
If the two variables are correlated then the relationship is either positive or negative. The
absence of a relationship indicates "zero correlation".
1. Positive Correlation - A positive correlation indicates that as the values of one
variable increase, the values of the other variable also increase. Conversely, as the
values of one variable decrease, the values of the other variable also decrease. This
means that both variables move in the same direction.
2. Negative Correlation - A negative correlation indicates that as the values of one
variable increase, the values of the other variable decrease. Conversely, as the
values of one variable decrease, the values of the other variable increase. This
means that the two variables move in opposite directions.
3. No relationship - If the two variables do not share any relationship (that is, technically the
correlation coefficient is zero), then, obviously, the direction of the correlation is
neither positive nor negative. This is often called zero correlation or no correlation.

Correlation Coefficient
1. The correlation between any two variables is expressed in terms of a number, usually
called the correlation coefficient. The correlation coefficient is denoted by various
symbols depending on the type of correlation. The most common is 'r' (small 'r'),
indicating Pearson's product-moment correlation coefficient.
2. The range of the correlation coefficient is from –1.00 to +1.00.
3. If the correlation coefficient is ±1, then the relationship between the two variables is
perfect.
4. This happens only if the correlation coefficient is exactly –1 or +1.
5. As the correlation coefficient moves nearer to +1 or –1, the strength of the relationship
between the two variables increases.
6. If the correlation coefficient moves away from +1 or –1, then the strength of the
relationship between the two variables decreases (that is, it becomes weak).

Pearson’s Product Moment Correlation


1. The most popular way to compute correlation is 'Pearson's Product Moment
Correlation (r)'. This correlation coefficient can be computed when the data on both
variables are on at least an equal-interval scale or a ratio scale.
2. The Pearson's correlation coefficient was developed by Karl Pearson in the 1890s.
3. The Pearson's correlation coefficient is usually calculated for two continuous variables.
4. The product-moment coefficient of correlation may be thought of essentially as that
ratio which expresses the extent to which changes in one variable are accompanied
by, or are dependent upon, changes in a second variable.
5. The following formula suggested by Karl Pearson can be used for measuring the
degree of relationship, or correlation:

r = Σxy / √(Σx² · Σy²)

where x and y are the deviations of X and Y from their respective actual means, and
Σx² and Σy² are the sums of the squared deviations in X and Y taken from the two means.
The formula is used to find how strong a relationship is between the data; it returns a value
between –1 and +1.
 +1 indicates a perfect positive relationship: for every increase in one variable
there is an increase of a fixed proportion in the other.
 –1 indicates a perfect negative relationship: for every increase in one
variable there is a decrease of a fixed proportion in the other.
 A result of zero indicates no relationship: there is no positive or negative relationship
between the two variables.

SIGNIFICANCE - One can determine whether a correlation is statistically significant by
looking at the p-value. If it is less than 0.05, it indicates a statistically significant
correlation. This means that there is less than a 5% chance that the finding is due to chance
or error.
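A hedged sketch of computing r and its p-value with SciPy's pearsonr; the paired X and Y values are hypothetical.

```python
# pearsonr returns the correlation coefficient and the two-sided p-value.
from scipy.stats import pearsonr

x = [2, 4, 5, 6, 8, 10, 11, 13]
y = [10, 12, 15, 14, 18, 20, 23, 24]

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The correlation is statistically significant at the 0.05 level.")
```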

When using Pearson’s product-moment correlation coefficient (also known as Pearson’s r),
there are several assumptions that need to be met:
1. Level of Measurement:
Both variables should be measured at the interval or ratio level.
2. Linear Relationship:
There should exist a linear relationship between the two variables.
3. Normality:
Both variables should be roughly normally distributed.
4. Related Pairs:
Each observation in the dataset should have a pair of values for the two variables.
5. No Outliers:
There should be no extreme outliers in the dataset.
BISERIAL CORRELATION
1. The biserial correlation coefficient is computed when one variable is continuous and
the other variable is artificially reduced to two categories (dichotomy).
2. The general formula for this is

r_bis = ((Mp − Mq) / σt) × (pq / y)

where
p = proportion of cases in one (the higher) category of the dichotomous variable
q = proportion of cases in the lower category (q = 1 − p)
Mp = mean of the values of the continuous variable in the higher group
Mq = mean of the values of the continuous variable in the lower group
σt = standard deviation of the total group on the continuous variable
y = height of the ordinate of the normal curve separating the proportions p and q
Assumptions
The biserial correlation coefficient rbis gives an estimate of the product-moment r for the
given data when the following assumptions are fulfilled:
1. Continuity in the dichotomized trait
2. Normality of the distribution underlying the dichotomy
3. A large N
4. A split near the median
Limitations
1. The biserial r cannot be used in a regression equation
2. It does not have any standard error of estimate
3. Unlike r, it is not limited to a range of ±1.00
4. It creates problems when compared with other coefficients of correlation

POINT BISERIAL CORRELATION


1. We resort to the computation of the point biserial correlation coefficient rp.bis for
estimating the relationship between two variables when one variable is in a
continuous state and the other is in a state of natural or genuine dichotomy.
2. This means that it is a correlational index that estimates the strength of the relationship
between a true dichotomous variable and a true continuous variable.
3. One continuous variable (ratio or interval) must be there.
4. One naturally binary variable must be there.

r_pbis = ((Mp − Mq) / σt) × √(pq)

where
Mp = mean of the higher group, which received the positive value of the binary variable
Mq = mean of the lower group, which received the negative value of the binary variable
σt = standard deviation of the total group on the continuous variable
q = proportion of cases in the lower group
p = proportion of cases in the higher group
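A minimal sketch of the point-biserial coefficient using SciPy's pointbiserialr; the pass/fail codes and scores below are hypothetical.

```python
# pointbiserialr takes the binary variable and the continuous variable.
from scipy.stats import pointbiserialr

passed = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]            # natural dichotomy (e.g. pass/fail)
scores = [78, 85, 60, 90, 55, 62, 88, 81, 58, 79]  # continuous variable

r_pb, p_value = pointbiserialr(passed, scores)
print(f"r_pbis = {r_pb:.3f}, p = {p_value:.4f}")
```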

Spearman’s Rank Order Correlation


1. Spearman's Rank Order Correlation or Spearman's rho (rs) is a useful correlation
coefficient when the data are in rank order.
2. Spearman's correlation coefficient measures the strength and direction of association
between two ranked variables.
3. It is used to test for a rank-order relationship between two quantitative variables when
there is concern that one or both variables are ordinal (rather than interval) and/or not
normally distributed, or when the sample size is small.
4. Thus, it is used in the same data situations as a Pearson's correlation, except that it is
used when the data are importantly non-normally distributed, the measurement
scale of the dependent variable is ordinal (not interval or ratio), or the sample is too
small.
5. Spearman’s correlation focuses on the monotonic relationship between variables
rather than the linear relationship assessed by Pearson’s correlation.
A monotonic relationship is one where:
- As the value of one variable increases, the value of the other variable also increases
(or decreases).
- It does not strictly assume linearity.
6. It was developed by Charles Spearman, and its coefficient (R) is expressed by the following
formula:

R = 1 − (6ΣD²) / (N(N² − 1))

where
R = rank correlation coefficient
D = difference of ranks between paired items in the two series
N = total number of observations
The value of the rank correlation coefficient, R, ranges from −1 to +1.
• If R = +1, then there is complete agreement in the order of the
ranks and the ranks are in the same direction.
• If R = −1, then there is complete agreement in the order of the
ranks, but the ranks are in opposite directions.
• If R = 0, then there is no correlation.
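A sketch of Spearman's rho with SciPy's spearmanr; the two sets of judges' rankings are hypothetical.

```python
# spearmanr returns rho and the p-value for the rank-order association.
from scipy.stats import spearmanr

judge_a = [1, 2, 3, 4, 5, 6, 7, 8]
judge_b = [2, 1, 4, 3, 6, 5, 8, 7]

rho, p_value = spearmanr(judge_a, judge_b)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```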

What is a scatter diagram plot?


- It is a graph of ordered pairs showing a relationship between two sets of data. When
creating a scatter plot, one has two sets of information, known as bivariate data: two
sets of variables that can change and are compared to find relationships.
- Each point on this graph is called an ordered pair, which is two numbers that indicate
a location on the coordinate plane.
- The first number is the location on the X axis and the second number is the location
on the Y axis.
- Scatter plots are the graphs that present the relationship between two variables in a
data-set. It represents data points on a two-dimensional plane or on a Cartesian
system.
- The independent variable or attribute is plotted on the X-axis, while the dependent
variable is plotted on the Y-axis. These plots are often called scatter graphs or scatter
diagrams.
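A simple sketch of drawing a scatter diagram with Matplotlib; the bivariate data (hours studied vs. test scores) are hypothetical.

```python
# The independent variable goes on the X axis and the dependent variable on the Y axis.
import matplotlib.pyplot as plt

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]            # independent variable (X)
test_scores = [52, 55, 61, 64, 70, 74, 79, 85]      # dependent variable (Y)

plt.scatter(hours_studied, test_scores)
plt.xlabel("Hours studied (X)")
plt.ylabel("Test score (Y)")
plt.title("Scatter diagram of paired observations")
plt.show()
```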
SIGNIFICANCE OF DIFFERENCE BETWEEN MEANS

STUDENT T-TEST
A t test is a statistical test that is used to compare the means of two groups. It is often used
in hypothesis testing to determine whether a process or treatment actually has an effect on
the population of interest, or whether two groups are different from one another.

Assumptions
The t test is a parametric test of difference, meaning that it makes the same assumptions
about your data as other parametric tests. The t test assumes your data:
1. are independent
2. are (approximately) normally distributed
3. have a similar amount of variance within each group being compared (a.k.a.
homogeneity of variance)

Choosing the Right t-Test


- Comparing groups from the same population (before-after treatment): Paired t-test (within-subjects design)
- Comparing groups from different populations: Two-sample t-test / Independent t-test (between-subjects design)
- Comparing one group to a standard value: One-sample t-test (single group)

When to use which t-test?

- Testing for differences between two populations (in either direction): Two-tailed t-test
- Testing if one population mean is greater or lesser than the other: One-tailed t-test
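A hedged sketch of the three t-test variants using SciPy; all scores and group data below are hypothetical.

```python
# ttest_rel = paired, ttest_ind = independent two-sample, ttest_1samp = one-sample.
from scipy.stats import ttest_rel, ttest_ind, ttest_1samp

before = [72, 75, 70, 68, 74, 71]
after = [78, 80, 74, 73, 79, 77]
group_a = [12, 15, 14, 10, 13, 16]
group_b = [18, 20, 17, 19, 21, 22]

print(ttest_rel(before, after))           # paired t-test (within-subjects)
print(ttest_ind(group_a, group_b))        # independent two-sample t-test (between-subjects)
print(ttest_1samp(group_a, popmean=12))   # one-sample t-test against a standard value
# For a one-tailed test, pass alternative="greater" or "less" (available in recent SciPy).
```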
ANOVA
It is a statistical method that compares the means of two or more groups to determine if
there are significant differences among them by using variance. This is done using the F-test
in statistics to determine if the differences are significant. The ANOVA test allows different
levels of an independent variable to be compared on the dependent variable.

NULL HYPOTHESIS: The mean scores of all the groups are the same.
ALTERNATE HYPOTHESIS: At least one group mean is different.

Assumptions:
1. Normality: The data within each group should be approximately normally distributed.
2. Homogeneity of Variance: The variance of the data within each group should be
similar.
3. Independence: Observations within each group should be independent.

Importance and Uses:


1. Biomedical Research: ANOVA helps compare means across different treatments,
survival rates, or medical device efficacy.
2. Experimental Studies: It’s crucial for assessing the impact of various factors on
outcomes.
3. Quality Control: ANOVA detects differences in product quality across production
lines.
4. Social Sciences: Used in psychology, sociology, and education research.
5. Business and Marketing: Analyzing customer preferences, product effectiveness, etc.
6. The ANOVA technique enables us to compare several population means simultaneously
and thus results in considerable savings of time and money.

Formula
F = MSB / MSE
where MSB = mean of the sum of squares between the groups, and MSE = mean of the sum
of squares due to error (within groups).

What is ANOVA Table ?


It is a table that is used to summarize the findings of an ANOVA test. It has five columns,
consisting of the source of variation, the sum of squares, the degrees of freedom, the mean
squares, and the F statistic, respectively.
ONE-WAY ANOVA
It compares the means of two or more independent groups to determine whether there is
statistical evidence that the associated population means are significantly different. In a one-
way ANOVA test, there is one categorical independent variable (typically with at least 3 levels)
and one quantitative dependent variable.

Assumption
 Normality
 Independence
 Homogeneity of variance
 Level of measurement: Continuous and categorical
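A minimal sketch of a one-way ANOVA with SciPy's f_oneway; the three groups (e.g. scores under three teaching methods) are hypothetical.

```python
# f_oneway takes one array of scores per group and returns the F statistic and p-value.
from scipy.stats import f_oneway

method_1 = [85, 86, 88, 75, 78, 94]
method_2 = [91, 92, 93, 85, 87, 84]
method_3 = [79, 78, 88, 94, 92, 85]

f_stat, p_value = f_oneway(method_1, method_2, method_3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# If p < 0.05, at least one group mean differs; a post-hoc test would identify which.
```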

TWO-WAY ANOVA
A statistical test used to determine the effect of two nominal predictor variables on a
continuous outcome variable. In two-way ANOVA, there is still one quantitative dependent
variable, and there are two categorical independent variables.

NULL HYPOTHESIS 1: The means of observations grouped by the first factor are the same.
NULL HYPOTHESIS 2: The means of observations grouped by the second factor are the same.
NULL HYPOTHESIS 3: There is no interaction effect between the two factors.

Assumption
 Normality
 Independence
 Homogeneity of variance
 Level of measurement: Continuous and categorical
When to Use Two-Way ANOVA:
 You can use it when you have collected data on a quantitative dependent variable at
multiple levels of two categorical independent variables.
 Examples:
 Investigating the effect of different social media platforms (Facebook, Twitter,
Instagram) and time of day (morning, afternoon, evening) on user engagement.
 Analyzing how temperature (hot, moderate, cold) and humidity (low, moderate, high)
impact energy consumption in buildings.
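A hedged sketch of a two-way ANOVA using statsmodels (ols plus anova_lm), assuming pandas and statsmodels are installed; the platform/time engagement data are invented for illustration.

```python
# Fit an OLS model with both factors and their interaction, then produce the ANOVA table.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "platform": ["FB", "FB", "TW", "TW", "IG", "IG"] * 2,
    "time": ["morning"] * 6 + ["evening"] * 6,
    "engagement": [20, 22, 18, 17, 25, 27, 30, 28, 21, 19, 35, 33],
})

model = ols("engagement ~ C(platform) * C(time)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effect of each factor plus the interaction
```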

NON-PARAMETRIC TEST

1. Non-parametric tests are the mathematical methods used in statistical hypothesis
testing which do not make assumptions about the frequency distribution of the variables
that are to be evaluated.
2. The non-parametric experiment is used when there are skewed data, and it
comprises techniques that do not depend on data pertaining to any particular
distribution.
3. Nonparametric tests serve as an alternative to parametric tests such as T-test or
ANOVA that can be employed only if the underlying data satisfies certain criteria and
assumptions.

ASSUMPTIONS
1. Random sampling - the sample is randomly drawn, but the underlying data do not meet the
assumptions about the population distribution required by parametric tests.
2. The population sample size is too small.
3. Level of measurement - the analysed data are ordinal or nominal.
Chi-square TEST
It is used to determine whether there is a relationship between two categorical variables
and to see whether the data are significantly different from what was expected. Chi-square is
symbolically written as χ². It is used for one of two purposes:

 Goodness of fit: To see whether the data from the sample match the population
from which the data were taken; in other words, to test whether the frequency
distribution of a categorical variable matches your expectations.
 Test for independence: To see whether two categorical variables are related to, or
independent of, each other.

Chi-Square Goodness of Fit Test:

 Used for one categorical variable.


 Tests whether the observed frequencies of the categories match a predefined
expected distribution.
 Imagine you flip a fair coin 100 times and expect 50 heads and 50 tails. You can use
the chi-square goodness of fit test to see if the actual number of heads and tails
observed in your experiment deviates significantly from this expectation.

Chi-Square Test of Independence:

 Used for two categorical variables.


 Tests whether the two variables are independent of each other, meaning the
outcome of one doesn't influence the other.
 For instance, you might record eye color (brown, blue, green) and hair color (blonde,
brunette, black) of people. The chi-square test of independence can help determine if
there's a relationship between these two traits (e.g., are people with brown eyes
more likely to be brunette?).
Formula:

χ² = Σ (O − E)² / E

where O = observed frequency and E = expected frequency.
df = (rows − 1)(columns − 1)

Assumptions
 Level of measurement: Categorical
 Independence
 Cells in the contingency table are mutually exclusive: It’s assumed that individuals
can only belong to one cell in the contingency table. That is, cells in the table are
mutually exclusive – an individual cannot belong to more than one cell.
 The expected value of cells should be 5 or greater in at least 80% of cells.

The null hypothesis (H0) is that there is no association between the two variables
The alternative hypothesis (H1) is that there is an association of any kind.
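A hedged sketch of both chi-square uses with SciPy; the coin-flip counts and the eye-colour by hair-colour table are hypothetical.

```python
# chisquare performs the goodness-of-fit test; chi2_contingency performs the test of independence.
from scipy.stats import chisquare, chi2_contingency

# Goodness of fit: 100 coin flips, expecting 50 heads and 50 tails
observed = [58, 42]
expected = [50, 50]
print(chisquare(f_obs=observed, f_exp=expected))

# Test of independence: eye colour (rows) by hair colour (columns) contingency table
table = [[20, 15, 5],
         [10, 25, 10]]
chi2, p, dof, expected_counts = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
```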

SIGN TEST
- It compares two related samples (paired observations).
- The Sign Test stands as a fundamental non-parametric statistical method designed to
compare two related samples, typically used in scenarios where more conventional
tests such as the t-test cannot be applied due to the distributional characteristics of
the data.
- It focuses on the direction (sign) of changes between paired observations rather than
their numerical difference.

Assumptions of the Sign Test


 Data Distribution: The test is distribution-free, meaning it does not require the data
to follow a specific distribution pattern.
 Sample Origin: The data should originate from two related samples, which could
represent the same group under different conditions or times.
 Dependence: Samples must be paired or matched, often reflecting a 'before-and-
after' scenario, where the pairing is intrinsic to the research design.
Key Features of the Sign Test
 Non-parametric Nature: It does not assume a normal distribution of the data, making
it suitable for a wide range of datasets, including ordinal data.
 Simplicity: The test relies solely on the signs (+ or -) of the differences between
paired observations, disregarding their magnitudes.
 Application: Known as the binomial sign test, it operates under the hypothesis that
the probability (p) of observing a positive difference is 0.5, reflecting no systematic
bias between the two groups.
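A sketch of the binomial sign test using SciPy's binomtest (available in SciPy 1.7 and later); the before/after pain scores are hypothetical, and tied pairs (zero differences) are dropped, as is usual for the sign test.

```python
# Count positive differences and test them against a binomial with p = 0.5.
from scipy.stats import binomtest

before = [6, 7, 5, 8, 6, 7, 9, 5, 6, 7]
after = [4, 6, 5, 5, 5, 6, 7, 4, 6, 5]

diffs = [b - a for b, a in zip(before, after) if b != a]   # discard ties
n_positive = sum(d > 0 for d in diffs)

result = binomtest(n_positive, n=len(diffs), p=0.5)        # H0: P(positive sign) = 0.5
print(f"positives = {n_positive} of {len(diffs)}, p = {result.pvalue:.4f}")
```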

Practical Application
1. Consumer preference testing: Imagine a taste test comparing two sodas (A and B)
for a group of people. By analyzing the "before and after" preferences (positive for A,
negative for B), the sign test can tell you if there's a statistically significant preference
for one soda over the other.
2. Medical research: Researchers might use the sign test to compare the effectiveness
of two different pain medications. They can track pain levels (before and after) for
patients and use the sign test to see if one medication leads to a significantly greater
reduction in pain compared to the other.
3. Survey analysis: Imagine a survey asking people's opinions on a new policy before
and after its implementation. The sign test can help analyze if there's a significant
shift in public opinion (more positive, more negative) after the policy change.

MEDIAN TEST
The median test is used to compare the performance of two independent groups as for
example an experimental group and a control group.

The null hypothesis: the groups are drawn from populations with the same median.
The alternative hypothesis: either that the two medians are different (two-tailed test) or
that one median is greater than the other (one-tailed test).

Assumptions
 Level of measurement: ordinal or continuous.
 Independence
 Random Sampling: each observation is chosen randomly and represents the
population.
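A minimal sketch of the median test for an experimental and a control group, using SciPy's median_test; the recovery scores below are hypothetical.

```python
# median_test returns the test statistic, p-value, grand median and the contingency table.
from scipy.stats import median_test

experimental = [83, 91, 94, 89, 89, 96, 91, 92, 90]
control = [78, 82, 81, 77, 79, 81, 80, 81, 76]

stat, p_value, grand_median, table = median_test(experimental, control)
print(f"statistic = {stat:.3f}, p = {p_value:.4f}, grand median = {grand_median}")
```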

Practical Applications- Median tests find use in diverse fields:

 Social science: Comparing educational attainment between different social


demographics.
 Medicine: Evaluating the effectiveness of new treatments by comparing median
recovery times between treatment and control groups.
 Ecology: Assessing if there are variations in plant growth between different fertilizer
types based on median plant heights.
 Business: Analyzing customer reviews to see if there's a difference in sentiment
between products based on median rating scores.

Making decisions based on results: The outcome of a median test, typically given by a p-
value, helps you decide whether to reject the null hypothesis (that the medians are equal).
This informs practical decisions. For example, a marketing campaign might target a specific
demographic group if the median test shows a significant difference in purchase
preferences between that group and others.

MANN-WHITNEY U TEST
The Mann-Whitney U test is the non-parametric alternative to the independent-samples t-
test. It is used to compare two independent samples in order to test whether they come from
the same population, i.e. whether the two samples are equal or not.

Null hypothesis: The two populations are equal


Alternate hypothesis: The two populations are not equal

Assumption
1. The sample drawn from the population is random.
2. Independence within the samples and mutual independence between the samples is
assumed. That means that an observation is in one group or the other (it cannot be in both).
3. An ordinal measurement scale is assumed.
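A hedged sketch of the Mann-Whitney U test with SciPy's mannwhitneyu; the two independent groups are hypothetical.

```python
# mannwhitneyu compares two independent samples; alternative="two-sided" tests for any difference.
from scipy.stats import mannwhitneyu

group_a = [14, 17, 12, 19, 16, 15, 13]
group_b = [21, 25, 22, 18, 24, 26, 20]

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```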
Practical Implications

 Wide range of applications: The Mann-Whitney U test finds use in various fields like
psychology (comparing treatment effects), medicine (evaluating drug efficacy
between groups), economics (analyzing differences between income groups), and
many more.
 Focus on medians: While not directly providing information about means, the test
helps us understand if the medians (center points) of the two groups are likely to be
different. This can be crucial when data may have outliers or skewed distributions.
 Decision making: The test results help researchers and analysts decide whether to
reject the null hypothesis (no difference between groups) or accept it. This informs
conclusions about the effectiveness of interventions, group characteristics, and more.

FRIEDMAN TEST
The Friedman test is a non-parametric statistical test developed by Milton Friedman. Similar
to the parametric repeated-measures ANOVA, it is used to detect differences in treatments
across multiple test attempts.

Assumptions
 Data should be ordinal (e.g. the Likert scale) or continuous,
 Data comes from a single group, measured on at least three different
occasions,
 The sample was created with a random sampling method,
 Blocks are mutually independent (i.e. all of the pairs are independent — one
doesn’t affect the other),
 Observations are ranked within blocks with no ties.

Null hypothesis: there is no significant difference between the dependent groups.


Alternative hypothesis: there is a significant difference between the dependent groups.

Formula:

χ²r = [12 / (n k (k + 1))] ΣRj² − 3n(k + 1)

where
k = number of columns (conditions)
n = number of rows (subjects or blocks)
Rj = sum of ranks in column j
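A sketch of the Friedman test with SciPy's friedmanchisquare; each list holds the same subjects' scores on one of three occasions, and the values are hypothetical.

```python
# friedmanchisquare takes one array per condition (column) for the same set of subjects (rows).
from scipy.stats import friedmanchisquare

occasion_1 = [7, 5, 8, 6, 7, 9, 5]
occasion_2 = [8, 6, 9, 7, 8, 9, 6]
occasion_3 = [6, 4, 7, 5, 6, 8, 4]

stat, p_value = friedmanchisquare(occasion_1, occasion_2, occasion_3)
print(f"chi2_r = {stat:.3f}, p = {p_value:.4f}")
```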
Practical Implications

 Educational research: Comparing the effectiveness of different teaching methods on


student achievement.
 Psychology research: Assessing changes in mood or behavior across different time
points within the same group.
 Marketing research: Evaluating consumer preference for various product designs or
marketing campaigns.

By using the Friedman test, researchers can gain valuable insights into whether different
interventions, conditions, or time points have a statistically significant impact within the
same group of subjects. This helps them draw stronger conclusions about the effects being
studied.

MULTIPLE REGRESSION
Multiple regression is a statistical technique that explores how several independent
(predictor) variables influence a single dependent (criterion) variable. It can be used to
predict the value of the dependent variable if the independent variables are known. It is also
used to see if there is a statistically significant relationship between sets of variables and to
find the trends in those sets of data.

Assumptions
1. Model Specification: Ensure the model includes all relevant variables and
accurately reflects the relationships being studied.
2. Linearity: The relationship between the predictors and the outcome should be
linear.
3. Normality: The variables involved should follow a normal distribution.
4. Homoscedasticity: The variance (spread of values) should be consistent across all
levels of the predictors.

Formula
Y = a + b1X1 + b2X2 + … + bkXk + e
where Y is the predicted value of the dependent variable, a is the constant (intercept),
b1 … bk are the regression coefficients of the independent (predictor) variables X1 … Xk,
and e is the error term.

Practical Implications
 Understanding complex relationships: In real-world scenarios, outcomes are rarely
influenced by just one factor. Multiple regression allows you to analyze how multiple
independent variables interact to affect a dependent variable. This provides a more
comprehensive understanding of the system you're studying.
 Making predictions: A well-constructed regression model can be used to predict
future values of the dependent variable based on the values of the independent
variables. This is useful in various domains, from business (e.g., predicting sales based
on marketing spend and economic factors) to science (e.g., predicting crop yield
based on weather patterns and fertilizer application).
 Informed decision-making: By isolating the independent contributions of different
factors, you can make more informed decisions. For instance, a company might use
regression to assess the impact of various advertising channels on customer
conversion rates, allowing them to optimize their marketing budget.

 Controlling for confounding variables: Often, there might be extraneous factors


affecting your results. Regression helps control for these confounding variables,
isolating the specific effects you're interested in. For example, studying the link
between income and educational attainment might be confounded by factors like
family background. Regression can account for this, providing a clearer picture of the
relationship between income and education.
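A hedged sketch of a standard multiple regression fitted with statsmodels OLS; the two predictors (ad spend and store size) and the sales figures are invented for illustration.

```python
# Fit Y = a + b1*X1 + b2*X2 + e and use the model for prediction.
import numpy as np
import statsmodels.api as sm

ad_spend = np.array([10, 12, 15, 18, 20, 22, 25, 30])
store_size = np.array([1.0, 1.2, 1.1, 1.5, 1.4, 1.8, 2.0, 2.2])
sales = np.array([100, 110, 118, 135, 140, 155, 168, 185])

X = sm.add_constant(np.column_stack([ad_spend, store_size]))  # adds the intercept term a
model = sm.OLS(sales, X).fit()

print(model.params)     # estimates of a, b1, b2
print(model.rsquared)   # proportion of variance in Y explained by the predictors

new_x = np.array([[1.0, 28, 2.1]])   # [intercept, ad_spend, store_size]
print(model.predict(new_x))          # predicted sales for the new case
```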

Types

 Standard multiple regression: This is the most common type of multiple regression.
In standard multiple regression, all of the independent variables (predictors) are
entered into the regression equation at once. This type of regression is used to assess
the overall relationship between the independent variables and the dependent
variable.
 Stepwise multiple regression: Stepwise multiple regression is a more complex type of
regression that is used to identify the best subset of independent variables to predict
the dependent variable. In stepwise regression, the variables are entered into the
model one at a time, based on a statistical criterion. The process continues until no
more variables meet the criteria for inclusion. Stepwise regression is a useful tool for
identifying the most important predictors of a dependent variable, but it is important
to be aware that the results can be sensitive to the order in which the variables are
entered into the model.

 Hierarchical regression: This type of regression is used to assess the impact of


independent variables on the dependent variable, while controlling for the effects of
other independent variables. Hierarchical regression is often used in research studies
where there are multiple levels of analysis, such as a study of student achievement
that examines the effects of individual student characteristics, classroom factors, and
school-level factors.

 Ridge regression and Lasso regression: These are types of regression that are used to
address the problem of multicollinearity, which occurs when the independent
variables are highly correlated with each other. Multicollinearity can make it difficult
to estimate the coefficients of the regression model and can lead to unreliable
results. Ridge regression and Lasso regression are techniques that can be used to
shrink the coefficients of the regression model, which can help to reduce the impact
of multicollinearity.

FACTOR ANALYSIS
Factor analysis is a sophisticated statistical method aimed at reducing a large number of
variables into a smaller set of factors. This technique is valuable for extracting the
maximum common variance from all variables, transforming them into a single score for
further analysis. It is a part of the general linear model (GLM).
It determines whether underlying latent variables (factors) may explain the predictable
(patterned) connections within a set of observed variables.
Four primary objectives
 To determine the factors that underlie a set of observable variables.
 To provide a system that can explain the variance among certain observable variables
through fewer statistically established factors.
 To reduce the data by extracting a small group of factors from a collection of
observable variables, so as to be able to summarise those variables into fewer factors.
 To establish the characteristics of the extracted factors.

Assumptions
1. There is a linear relationship between variables.
2. There is no multicollinearity, which implies that each variable is unique; multicollinearity
exists when two independent variables are highly correlated.
3. Relevant variables are included in the analysis.
4. There is a true correlation between variables and factors.
5. There are no outliers in the data set.
6. The sample used is of sufficient size; that is, there are more variables than factors and
each variable has more data values than the number of factors (see the sketch after this list).
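A hedged sketch of exploratory factor analysis with scikit-learn's FactorAnalysis (the varimax rotation option requires a recent scikit-learn); the simulated survey data, the choice of two factors, and the variable names are illustrative assumptions.

```python
# Simulate six observed items driven by two latent factors, then extract the factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
anxiety = rng.normal(size=n)       # hypothetical latent factor 1
sociability = rng.normal(size=n)   # hypothetical latent factor 2

# Six observed "survey items", each loading mainly on one latent factor plus noise
X = np.column_stack([
    anxiety + 0.3 * rng.normal(size=n),
    anxiety + 0.4 * rng.normal(size=n),
    anxiety + 0.5 * rng.normal(size=n),
    sociability + 0.3 * rng.normal(size=n),
    sociability + 0.4 * rng.normal(size=n),
    sociability + 0.5 * rng.normal(size=n),
])

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
factor_scores = fa.fit_transform(X)            # one score per respondent per factor
print("Loadings (variables x factors):")
print(fa.components_.T.round(2))               # how strongly each item loads on each factor
```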

Practical Applications

Data Reduction and Simplification:


 Fewer Variables, More Meaning: Imagine having dozens of survey questions. Factor
analysis helps condense these into a smaller set of underlying factors, making data
easier to handle and visualize.

Understanding Underlying Structures:


 What's Really Going On?: It reveals hidden patterns in your data. For instance, in
psychology, it might identify a factor like "neuroticism" that explains why someone
scores high on anxiety and stress-related questions.

Developing Measurement Tools:


 Building Better Surveys: Say you're designing a personality test. Factor analysis can
help ensure your questions effectively tap into the specific traits you're interested in
measuring.

Improving Research & Decision Making:


 Sharper Insights: By identifying key factors, it strengthens the foundation of your
research. This can lead to more targeted interventions, better product development,
or effective marketing campaigns.

Here are some specific examples of how factor analysis is used in practice:
 Psychology: Identifying personality traits (e.g., "Big Five").
 Marketing: Understanding customer preferences and segmenting markets.
 Finance: Analyzing financial risk factors in investments.
 Education: Assessing student learning outcomes and identifying areas for
improvement.

Types

 Exploratory Factor Analysis (EFA): It is applied in situations where there isn't a fixed
idea of the number of factors involved or of the relationship they have with the observed
variables. The goal is to investigate the way the factors are structured and to identify
the underlying correlations within the variables.
 It is not based on previous theories, and it aims to uncover structures in large sets of
variables by measuring the latent factors that affect the variables within a
given data structure.
 It does not require previous hypotheses on the relationship between factors and
variables, and the results are of an inductive nature, based on observations.
 It is mostly used in empirical research and in the development, validation, and
adaptation of measurement instruments in psychology, because it is useful to detect a
set of common factors that explain the responses to test items.
 Confirmatory Factor Analysis (CFA): It is used to confirm predefined components that
have already been explored in the literature before, and it is applied to confirm the
effects and the possible correlation between a collection of certain factors and
variables.
 It usually requires a large sample, the model is specified in advance, and it
produces statistics based on deduction. It is used in situations where the
researcher has a particular hypothesis on how many factors there are and how the
observable variables are associated with each component.
 The hypothesis is founded on past studies or theories and has the purpose of
corroborating that there is a link between the factors and the observed variables.
