ENENDA30 - Module 01 Part 1

NATIONAL UNIVERSITY - LAGUNA

COLLEGE OF ENGINEERING AND ARCHITECTURE

ENGR. KENT PATRICK FERRARO


INSTRUCTOR
Statistics may be defined as the science that deals with the collection,
organization, presentation, analysis, and interpretation of data in order to be
able to draw judgements or conclusions that help in the decision-making process.
Statistics is divided into two main divisions:

❑ Descriptive Statistics deals with the procedures that organize, summarize,
and describe quantitative data.

❑ Inferential Statistics deals with making a judgement or a conclusion
about a population based on the findings from a sample that is taken
from the population.
Terms and Definitions

Population or Universe: Refers to the overall number of subjects under a particular study.
Sample: Any subset of the population.
Data: Information collected on some characteristics of a population or sample; classified as qualitative or quantitative data.
Ungrouped Data (Raw Data): Data which are not organized in any specific way; simply the collection of data as they are gathered.
Grouped Data: Raw data organized into groups or categories with corresponding frequencies.
Parameter: A descriptive measure of a characteristic of a population.
Statistic: A descriptive measure of a characteristic of a sample.
Constant: A characteristic or property of a population or sample which is common to all members of the group.
Variable: A characteristic or attribute that can assume different values.
A subset of the population (a sample) is taken for a particular study. The data
gathered are analyzed in order to draw conclusions about the population. Since
not all of the subjects in the population are taken for the study, there will be
variation or uncertainty within the data gathered. The role of probability is to
reduce these uncertainties and increase the strength of, or confidence in, the
conclusions.
Data can be collected in a variety of ways. One of the most common methods is
through the use of surveys. Surveys can be done by using a variety of methods.
Using samples saves time and money and in some cases enables the researcher
to get more detailed information about a particular subject.

Samples cannot be selected in haphazard ways because the information
obtained might be biased. To obtain samples that are unbiased, statisticians use
four basic methods of sampling.
Random samples are selected by using chance methods or random numbers.

Example:
Placing numbered cards in a bowl, mix them thoroughly and select as many
cards as needed.
Researchers obtain systematic samples by numbering each subject of the
population and then selecting every nth subject.

Example:
Suppose there are 200 subjects in a population and a sample of 20 subjects is
needed. Since 200 / 20 = 10, every 10th subject is selected, starting from a
randomly chosen subject among the first 10.
Researchers obtain stratified samples by dividing the population into groups
called strata according to some characteristic that is important to the study,
then sampling from each group.
Researchers also use cluster samples. Here the population is divided into groups
called clusters by some means, such as geographic area or schools in a large
school district. Then the researcher randomly selects some of these clusters
and uses all members of the selected clusters as the subjects of the sample.
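As a rough illustration (not part of the module), the following Python sketch shows how random and systematic samples could be drawn from a population of 200 numbered subjects, matching the systematic-sampling example above; the subject IDs are hypothetical.

```python
import random

# Hypothetical population: subjects numbered 1 to 200
population = list(range(1, 201))
sample_size = 20

# Random sample: every subject has an equal chance of being selected
random_sample = random.sample(population, sample_size)

# Systematic sample: select every k-th subject after a random start,
# where k = population size / sample size (200 / 20 = 10)
k = len(population) // sample_size
start = random.randint(0, k - 1)
systematic_sample = population[start::k]

print("Random sample:    ", sorted(random_sample))
print("Systematic sample:", systematic_sample)
```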
There are different ways for conducting a survey.
❑ Telephone Surveys
❑ Mailed questionnaire Surveys
❑ Personal Interview

Surveys can take different forms. They can be used to ask only one question or
they can ask a series of questions. We can use surveys to test out people’s
opinions or to test a hypothesis.
When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What question do you want to
answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method
4. Decide what questions you will ask in what order, and how to phrase
them
5. Conduct the interview and collect the data.
6. Analyze the result by making graphs and drawing conclusions.
The products and processes in the engineering and scientific disciplines are
mostly derived from experimentation. An experiment is a series of tests
conducted in a systematic manner for a better understanding of an existing
process or to explore a new product or process.

If time and resources were infinite, there would be no need to design
experiments. In production and quality control, we want to control the error
and learn as much as we can about the process or the underlying theory with
the resources at hand.
From an engineering perspective we are trying to use experimentation for the
following purposes:
❑ Reduce time
❑ Improve performance
❑ Improve reliability
❑ Achieve product and process robustness
❑ Perform evaluation of materials, design alternatives, setting component
and system tolerances
The practical steps needed for planning and conducting an experiment are
similar to those of the scientific method. These include:
1. Recognition and statement of the problem
2. Choice of factors, levels and ranges
3. Selection of the response variables
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions and making recommendations
Statistical analysis means investigating trends, patterns, and relationships
using quantitative data. It is an important research tool used by scientists,
governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from
the very start of the research process. You need to specify your hypotheses and
make decisions about your research design, sample size, and sampling
procedure.
After collecting data from your sample, you can organize and summarize the
data using descriptive statistics. Then, you can use inferential statistics to
formally test hypotheses and make estimates about the population. Finally, you
can interpret and generalize your findings.
Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify
your hypotheses and plan out your research design.

Writing Statistical Hypotheses


The goal of research is often to investigate a relationship between variables
within a population. You start with a prediction, and use statistical analysis to
test that prediction.
A statistical hypothesis is a formal way of writing a prediction about a
population. Every research prediction is rephrased into null and alternative
hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between
variables, the alternative hypothesis states your research prediction of an effect
or relationship.
Example: Statistical Hypotheses to Test an Effect

Null Hypothesis:
A 5-minute meditation exercise will have no effect on math test scores in teenagers.

Alternative Hypothesis:
A 5-minute meditation exercise will improve math test scores in teenagers.
Example: Statistical Hypotheses to Test a Correlation

Null Hypothesis:
Parental income and GPA have no relationship with each other in college students.

Alternative Hypothesis:
Parental income and GPA are positively correlated in college students.
A research design is your overall strategy for data collection and analysis. It
determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or
experimental design. Experiments directly influence variables, whereas
descriptive and correlational studies only measure variables.
In an experimental design, you can assess a cause-and-effect relationship (e.g.,
the effect of meditation on test scores) using statistical tests of comparison or
regression.

In a correlational design, you can explore relationships between variables (e.g.,
parental income and GPA) without any assumption of causality, using
correlation coefficients and significance tests.

In a descriptive design, you can study the characteristics of a population or
phenomenon (e.g., the prevalence of anxiety in U.S. college students) using
statistical tests to draw inferences from sample data.
Your research design also concerns whether you’ll compare participants at the
group level or individual level, or both.
In a between-subjects design, you compare the group-level outcomes of
participants who have been exposed to different treatments (e.g., those who
performed a meditation exercise vs those who didn’t).
In a within-subjects design, you compare repeated measures from participants
who have participated in all treatments of a study (e.g., scores from before and
after performing a meditation exercise).
In a mixed (factorial) design, one variable is altered between subjects and
another is altered within subjects (e.g., pretest and posttest scores from
participants who either did or didn’t do a meditation exercise).
Example:
You design a within-subject experiment to study whether a 5-minute
meditation exercise can improve math test scores. Your study takes repeated
measures from one group of participants.

First, you will take baseline test scores from participants.


Then, your participants will undergo a 5-minute meditation exercise.
Finally, you will record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation
exercise, and the dependent variable is the math test score from before and after
the intervention.
Example:
In a correlational study, you test whether there is a relationship between
parental income and GPA in graduating college students. To collect your data,
you will ask participants to fill in a survey and self-report their parents' income
and their own GPA.

There are no dependent or independent variables in this study, because you
only want to measure variables without influencing them in any way.
When planning a research design, you should operationalize your variables and
decide exactly how you will measure them.

For statistical analysis, it's important to consider the level of measurement of
your variables, which tells you what kind of data they contain:

❑ Categorical data represents groupings. These may be nominal (e.g.,
gender) or ordinal (e.g., level of language ability).

❑ Quantitative data represents amounts. These may be on an interval
scale (e.g., test score) or a ratio scale (e.g., age).
Many variables can be measured at different levels of precision. For example,
age data can be quantitative (8 years old) or categorical (young). If a variable is
coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically
mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate
statistics and hypothesis tests. For example, you can calculate a mean score
with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you'll
often collect data on relevant participant characteristics.
Example:
You can perform many calculations with quantitative age or test score data,
whereas categorical variables can be used to decide groupings for comparison
tests.

Variable Type of Data


Age Quantitative (ratio)
Gender Categorical (nominal)
Race or Ethnicity Categorical (nominal)
Baseline Test Scores Quantitative (interval)
Final Test Scores Quantitative (interval)
Example:
The types of variables in a correlational study determine the test you will use
for a correlation coefficient. A parametric correlation test can be used for
quantitative data, while a non-parametric correlation test should be used if one
of the variables is ordinal.

Variable Type of Data


Parental Income Quantitative (ratio)
GPA Quantitative (interval)
Step 2: Collect Data from a Sample

In most cases, it’s too difficult or expensive to collect data from every member
of the population you’re interested in studying. Instead, you’ll collect data from
a sample.

Statistical analysis allows you to apply your findings beyond your own sample
as long as you use appropriate sampling procedures. You should aim for a
sample that is representative of the population.
Sampling for statistical analysis

There are two main approaches to selecting a sample.

❑ Probability sampling: every member of the population has a chance of
being selected for the study through random selection.

❑ Non-probability sampling: some members of the population are more
likely than others to be selected for the study because of criteria such as
convenience or voluntary self-selection.
A population is the entire group that you want to
draw conclusions about.

A sample is the specific group that you will collect
data from. The size of the sample is always less
than the total size of the population.
In research, a population doesn’t always refer to people. It can mean a group
containing elements of anything you want to study, such as objects, events,
organizations, countries, species, organisms, etc.
Populations are used when your research question requires, or when you have
access to, data from every member of the population.

Usually, it is only straightforward to collect data from a whole population when
it is small, accessible, and cooperative.

Example:
A high school administrator wants to analyze the final exam scores of all
graduating seniors to see if there is a trend. Since they are only interested in
applying their findings to the graduating seniors in this high school, they use
the whole population dataset.
When your population is large in size, geographically dispersed, or difficult to
contact, it’s necessary to use a sample. With statistical analysis, you can use
sample data to make estimates or test hypotheses about population data.

Example:
You want to study political attitudes in young people. Your population is the
300,000 undergraduate students in the Netherlands. Because it is not practical
to collect data from all of them, you use a sample of 300 undergraduate
volunteers from three Dutch universities who meet your inclusion criteria. This
is the group who will complete your online survey.
❑ Necessity: Sometimes it’s simply not possible to study the whole
population due to its size or inaccessibility.

❑ Practicality: It’s easier and more efficient to collect data from a sample.

❑ Cost-effectiveness: There are fewer participant, laboratory, equipment, and
researcher costs involved.

❑ Manageability: Storing and running statistical analyses on smaller datasets
is easier and more reliable.
A sampling error is the difference between a population parameter and a
sample statistic. In your study, the sampling error is the difference between the
mean political attitude rating of your sample and the true mean political
attitude rating of all undergraduate students in the Netherlands.

Sampling errors happen even when you use a randomly selected sample. This is
because random samples are not identical to the population in terms of
numerical measures like means and standard deviations.
Step 3: Summarize Your Data with Descriptive Statistics
Once you’ve collected all of your data, you can inspect them and
calculate descriptive statistics that summarize them.
Inspect your data
There are various ways to inspect your data, including the following:
❑ Organizing data from each variable in frequency distribution tables.
❑ Displaying data from a key variable in a bar chart to view the distribution
of responses.
❑ Visualizing the relationship between two variables using a scatter plot.
By visualizing your data in tables and graphs, you can assess whether your
data follow a skewed or normal distribution and whether there are any
outliers or missing data.
A normal distribution means that your data are symmetrically distributed
around a center where most values lie, with the values tapering off at the tail
ends.
In contrast, a skewed distribution is asymmetric and has more values on one
end than the other. The shape of the distribution is important to keep in mind
because only some descriptive statistics should be used with skewed
distributions.

Extreme outliers can also produce misleading statistics, so you may need a
systematic approach to dealing with these values.
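As a sketch of one possible way to inspect a variable before choosing descriptive statistics, the snippet below computes quartiles and flags values beyond the common 1.5 × IQR rule of thumb; the data values and the outlier rule are illustrative assumptions, not part of the module.

```python
import statistics

# Illustrative responses for a single variable; 58 is a deliberately extreme value
values = [15, 3, 12, 0, 24, 3, 7, 9, 11, 58]

q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
iqr = q3 - q1

# Rule of thumb: flag values more than 1.5 * IQR beyond the quartiles as possible outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Possible outliers:", outliers)
```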
Measures of central tendency describe where most of the values in a data set lie.
Three main measures of central tendency are often reported:
❑ Mode: the most popular response or value in the data set.
❑ Median: the value in the exact middle of the data set when ordered from
low to high.
❑ Mean: the sum of all values divided by the number of values.
However, depending on the shape of the distribution and level of
measurement, only one or two of these measures may be appropriate. For
example, many demographic characteristics can only be described using the
mode or proportions, while a variable like reaction time may not have a mode
at all.
Calculate measures of variability
Measures of variability tell you how spread out the values in a data set are. Four
main measures of variability are often reported:
❑ Range: the highest value minus the lowest value of the data set.
❑ Interquartile range: the range of the middle half of the data set.
❑ Standard deviation: the average distance between each value in your data
set and the mean.
❑ Variance: the square of the standard deviation.
Once again, the shape of the distribution and level of measurement should
guide your choice of variability statistics. The interquartile range is the best
measure for skewed distributions, while standard deviation and variance
provide the best information for normal distributions.
Descriptive statistics summarize and organize characteristics of a data set. A
data set is a collection of responses or observations from a sample or entire
population.

In quantitative research, after collecting data, the first step of statistical
analysis is to describe characteristics of the responses, such as the average of one
variable (e.g., age), or the relation between two variables (e.g., age and
creativity).

The next step is inferential statistics, which help you decide whether your data
confirms or refutes your hypothesis and whether it is generalizable to a larger
population.
There are 3 main types of descriptive statistics:
❑ The distribution concerns the frequency of each value.
❑ The central tendency concerns the averages of the values.
❑ The variability or dispersion concerns how spread out the values are.
Example:

You want to study the popularity of different leisure activities by gender. You
distribute a survey and ask participants how many times they did each of the
following in the past year:
• Go to the library
• Watch a movie at a theatre
• Visit a national park

Your dataset is the collection of responses to the survey. Now you can use
descriptive statistics to find out the overall frequency of each activity
(distribution), the averages for each activity (central tendency), and the spread
of responses for each activity (variability).
A data set is made up of a distribution of values, or scores. In tables or graphs,
you can summarize the frequency of every possible value of a variable in
numbers or percentages. This is called a frequency distribution.
For the variable of gender, you list all possible answers on the left-hand column.
You count the number or percentage of responses for each answer and display
it on the right-hand column.

Gender Number
Male 182
Female 235
Other 27

From this table, you can see that more women than men or people with
another gender identity took part in this study.
In a grouped frequency distribution, you can group numerical response values
and add up the number of responses for each group. You can also convert each
of these numbers to percentages.
Library Visits in the Past Year Percent
0-4 6%
5-8 20 %
9-12 42 %
13-16 24 %
17+ 8%

From this table, you can see that most people visited the library between 5 and
16 times in the past year.
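For illustration, a grouped frequency distribution like the one above could be computed in Python as follows; the raw visit counts are invented for the example and do not reproduce the exact percentages in the table.

```python
from collections import Counter

# Hypothetical numbers of library visits reported by respondents
visits = [2, 6, 7, 10, 11, 12, 14, 15, 16, 3, 9, 18, 5, 13, 8, 11, 10, 14, 6, 9]

def bin_label(v):
    # Same groupings as the table above
    if v <= 4:
        return "0-4"
    if v <= 8:
        return "5-8"
    if v <= 12:
        return "9-12"
    if v <= 16:
        return "13-16"
    return "17+"

counts = Counter(bin_label(v) for v in visits)
n = len(visits)
for label in ["0-4", "5-8", "9-12", "13-16", "17+"]:
    print(f"{label:>5}: {counts[label]:2d} responses ({100 * counts[label] / n:.0f} %)")
```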
Measures of central tendency estimate the center, or average, of a data set. The
mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using
the first 6 responses of our survey.

Data Set: 15, 3, 12, 0, 24, 3


Total Number of Responses: 6
The mean, or M, is the most commonly used measure of the average.
To find the mean, add up all response values and divide the sum by the
total number of responses. The total number of responses or observations is
called N.
Step 1: Get the sum of all values:
15 + 3 + 12 + 0 + 24 + 3 = 57
Step 2: Divide the sum by the total number of responses:
57 / 6 = 9.5
The mean is equal to 9.5.
The median is the value that’s exactly in the middle of a data set.
To find the median, order each response value from the smallest to the biggest.
Then, the median is the number in the middle. If there are two numbers in the
middle, find their mean.
Step 1: Arrange from smallest to biggest:
0 3 3 12 15 24
Step 2: Take the two middle values, 3 and 12, and find their mean:
(3 + 12) / 2 = 15 / 2 = 7.5
The median is equal to 7.5.
The mode is simply the most popular or most frequent response value. A
data set can have no mode, one mode, or more than one mode.
To find the mode, order your data set from lowest to highest and find the
response that occurs most frequently.
Step 1: Arrange from lowest to highest
0 3 3 12 15 24
Step 2: Get the most frequent response value.
Mode is equal to 3.
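The three averages worked out above can be checked with Python's standard statistics module (a minimal sketch using the same six responses):

```python
import statistics

data = [15, 3, 12, 0, 24, 3]  # the six survey responses used above

print("Mean:  ", statistics.mean(data))    # (15 + 3 + 12 + 0 + 24 + 3) / 6 = 9.5
print("Median:", statistics.median(data))  # middle of 0 3 3 12 15 24 -> (3 + 12) / 2 = 7.5
print("Mode:  ", statistics.mode(data))    # most frequent value -> 3
```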
Measures of variability give you a sense of how spread out the response values
are. The range, standard deviation and variance each reflect different aspects of
spread.

Here we will demonstrate how to calculate the range, standard deviation and
variance using the first 6 responses of our survey.

Data Set: 15, 3, 12, 0, 24, 3


Total Number of Responses: 6
The range gives you an idea of how far apart the most extreme response scores
are.
To find the range, simply subtract the lowest value from the highest value.
Step 1: Arrange from lowest to highest
0 3 3 12 15 24
Step 2: Subtract the lowest from the highest value
24 − 0 = 24
Range is equal to 24.
The standard deviation, s or SD, is the average amount of variability in your
data set. It tells you, on average, how far each score lies from the mean. The
larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:


1. List each score and find their mean.
2. Subtract the mean from each score to get the deviation from the mean.
3. Square each of these deviations.
4. Add up all of the squared deviations.
5. Divide the sum of the squared deviations by N – 1.
6. Find the square root of the number you found.
Steps 1-3: List each score, find the mean (x̄ = 9.5), subtract the mean from each
score, and square each deviation.

xᵢ     xᵢ − x̄              (xᵢ − x̄)²
0      0 − 9.5 = −9.5      (−9.5)² = 90.25
3      3 − 9.5 = −6.5      (−6.5)² = 42.25
3      3 − 9.5 = −6.5      (−6.5)² = 42.25
12     12 − 9.5 = 2.5      (2.5)² = 6.25
15     15 − 9.5 = 5.5      (5.5)² = 30.25
24     24 − 9.5 = 14.5     (14.5)² = 210.25

Step 4: Add up the squared deviations: Σ(xᵢ − x̄)² = 421.50
Step 5: Divide the sum by N − 1: 421.50 / 5 = 84.30
Step 6: Take the square root: √84.30 = 9.181503

The standard deviation is equal to 9.181503.
The variance is the average of squared deviations from the mean. Variance
reflects the degree of spread in the data set. The more spread the data, the larger
the variance is in relation to the mean.
To find the variance, simply square the standard deviation. The symbol for
variance is s².
Step 1: Get the standard deviation: 9.181503
Step 2: Square the standard deviation: 9.181503² = 84.30

The variance is equal to 84.30.
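Likewise, the range, sample standard deviation, and sample variance computed above can be verified with a short Python sketch:

```python
import statistics

data = [15, 3, 12, 0, 24, 3]

data_range = max(data) - min(data)   # 24 - 0 = 24
sd = statistics.stdev(data)          # sample standard deviation (divides by N - 1)
var = statistics.variance(data)      # sample variance = sd squared

print("Range:             ", data_range)      # 24
print("Standard deviation:", round(sd, 6))    # 9.181503
print("Variance:          ", round(var, 2))   # 84.3
```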


Univariate descriptive statistics focus on only one variable at a time. It's
important to examine data from each variable separately using multiple
measures of distribution, central tendency, and spread. Programs like SPSS and
Excel can be used to easily calculate these.

Library Visits in the Past Year
N                    6
Mean                 9.5
Median               7.5
Mode                 3
Standard Deviation   9.181503
Variance             84.30
Range                24
If you were to only consider the mean as a measure of central tendency, your
impression of the “middle” of the data set can be skewed by outliers, unlike the
median or mode.

Likewise, while the range is sensitive to outliers, you should also consider the
standard deviation and variance to get easily comparable measures of spread.
If you’ve collected data on more than one variable, you can use bivariate or
multivariate descriptive statistics to explore whether there are relationships
between them.

In bivariate analysis, you simultaneously study the frequency and variability of
two variables to see if they vary together. You can also compare the central
tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two
variables.
In a contingency table, each cell represents the intersection of two variables.
Usually, an independent variable (e.g., gender) appears along the vertical axis
and a dependent one appears along the horizontal axis (e.g., activities).

You read “across” the table to see how the independent and dependent
variables relate to each other.

Library Visits in the Past Year


Group 0-4 5-8 9-12 13-16 17+
Children 32 68 37 23 22
Adults 36 48 43 83 25
Interpreting a contingency table is easier when the raw data is converted to
percentages. Percentages make each row comparable to the other by making it
seem as if each group had only 100 observations or participants. When creating
a percentage-based contingency table, you add the N for each independent
variable on the end.

Library Visits in the Past Year (Percentages)


Group 0-4 5-8 9-12 13-16 17+ N
Children 18 % 37 % 20 % 13 % 12 % 182
Adults 15 % 20 % 18 % 35 % 11 % 235
From this table, it is clearer that similar proportions of children and adults go
to the library over 17 times a year. Additionally, children most commonly went
to the library between 5 and 8 times, while for adults, this number was between
13 and 16.
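As an illustration, the percentage-based contingency table above can be reproduced from the raw counts with pandas; the counts are copied from the table, and the rounding is an assumption of this sketch.

```python
import pandas as pd

# Raw counts from the contingency table above
counts = pd.DataFrame(
    {"0-4": [32, 36], "5-8": [68, 48], "9-12": [37, 43],
     "13-16": [23, 83], "17+": [22, 25]},
    index=["Children", "Adults"],
)

# Convert each row to percentages so the two groups are directly comparable
percentages = counts.div(counts.sum(axis=1), axis=0).mul(100).round(0)
percentages["N"] = counts.sum(axis=1)

print(percentages)
```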
A scatter plot is a chart that shows you the relationship between two or
three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along
the y-axis. Each data point is represented by a point in the chart.
Example:
You investigate whether people who visit the library more tend to watch movies
at a theatre less often. You plot the number of times participants watched
movies at a theatre along the x-axis and the number of library visits along the y-axis.

From your scatter plot, you see that as the number of movies seen at movie
theatres increases, the number of visits to the library decreases. Based on your
visual assessment of a possible linear relationship, you perform further tests of
correlation and regression.
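A minimal matplotlib sketch of such a scatter plot is shown below; the paired values for theatre visits and library visits are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical paired responses per participant
movies  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # movies watched at a theatre
library = [14, 13, 12, 10, 9, 8, 6, 5, 3, 2]   # library visits

plt.scatter(movies, library)
plt.xlabel("Movies watched at a theatre (past year)")
plt.ylabel("Library visits (past year)")
plt.title("Library visits vs. theatre visits")
plt.show()
```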
Step 4: Test Hypotheses or Make Estimates with Inferential Statistics
A number that describes a sample is called a statistic, while a number
describing a population is called a parameter. Using inferential statistics, you
can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences
in statistics.
❑ Estimation: calculating population parameters based on sample
statistics.
❑ Hypothesis testing: a formal process for testing research predictions
about the population using samples.
You can make two types of estimates of population parameters from sample
statistics:
❑ A point estimate: a value that represents your best guess of the exact
parameter.
❑ An interval estimate: a range of values that represent your best guess of
where the parameter lies.

If your aim is to infer and report population characteristics from sample data,
it’s best to use both point and interval estimates in your paper.
You can consider a sample statistic a point estimate for the population
parameter when you have a representative sample (e.g., in a wide public
opinion poll, the proportion of a sample that supports the current government
is taken as the population proportion of government supporters).

There's always error involved in estimation, so you should also provide
a confidence interval as an interval estimate to show the variability around a
point estimate.
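A small sketch of a point estimate with a 95% confidence interval for a mean, using scipy's t distribution, is given below; the sample ratings are hypothetical.

```python
import numpy as np
from scipy import stats

# Illustrative sample of attitude ratings on a 1-10 scale
sample = np.array([6, 7, 5, 8, 6, 7, 9, 5, 6, 7, 8, 6])

point_estimate = sample.mean()   # best single guess of the population mean
sem = stats.sem(sample)          # standard error of the mean

# 95% interval estimate based on the t distribution with N - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1,
                                   loc=point_estimate, scale=sem)

print(f"Point estimate: {point_estimate:.2f}")
print(f"95% confidence interval: ({ci_low:.2f}, {ci_high:.2f})")
```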
Using data from a sample, you can test hypotheses about relationships between
variables in the population.

Hypothesis testing starts with the assumption that the null hypothesis is true in
the population, and you use statistical tests to assess whether the null
hypothesis can be rejected or not.
Statistical tests determine where your sample data would lie on an expected
distribution of sample data if the null hypothesis were true. These tests give two
main outputs:

❑ A test statistic tells you how much your data differs from the null
hypothesis of the test.

❑ A p value tells you the likelihood of obtaining your results if the null
hypothesis is actually true in the population.
Statistical tests come in three main varieties:

❑ Comparison tests assess group differences in outcomes.

❑ Regression tests assess cause-and-effect relationships between variables.

❑ Correlation tests assess relationships between variables without
assuming causation.

Your choice of statistical test depends on your research questions, research
design, sampling method, and data characteristics.
Parametric tests make powerful inferences about the population based on
sample data. But to use them, some assumptions must be met, and only some
types of variables can be used. If your data violate these assumptions, you can
perform appropriate data transformations or use alternative non-parametric
tests instead.
A regression models the extent to which changes in a predictor variable result
in changes in the outcome variable(s).
❑ A simple linear regression includes one predictor variable and one
outcome variable.
❑ A multiple linear regression includes two or more predictor variables
and one outcome variable.
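For illustration, a simple linear regression with one predictor and one outcome can be fitted with scipy; the hours-studied and test-score values here are made up, and are not data from the module.

```python
from scipy import stats

# Hypothetical predictor (hours studied) and outcome (test score)
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]

result = stats.linregress(hours, scores)  # simple linear regression

print(f"score = {result.slope:.2f} * hours + {result.intercept:.2f}")
print(f"R-squared: {result.rvalue ** 2:.3f}, p-value: {result.pvalue:.4f}")
```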
Comparison tests usually compare the means of groups. These may be the
means of different groups within a sample (e.g., a treatment and control
group), the means of one sample group taken at different times (e.g., pretest
and posttest scores), or a sample mean and a population mean.

❑ A t test is for exactly 1 or 2 groups when the sample is small (30 or less).

❑ A z test is for exactly 1 or 2 groups when the sample is large.

❑ An ANOVA is for 3 or more groups.


The z and t tests have subtypes based on the number and types of samples and
the hypotheses:
❑ If you have only one sample that you want to compare to a population
mean, use a one-sample test.
❑ If you have paired measurements (within-subjects design), use
a dependent (paired) samples test.
❑ If you have completely separate measurements from two unmatched
groups (between-subjects design), use an independent (unpaired)
samples test.
❑ If you expect a difference between groups in a specific direction, use
a one-tailed test.
❑ If you don’t have any expectations for the direction of a difference
between groups, use a two-tailed test.
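As a sketch of how two of these subtypes might be run in practice, the snippet below applies a paired t test to pretest/posttest scores and an independent t test to two unrelated groups; all scores are invented for the example.

```python
from scipy import stats

# Hypothetical scores
pretest  = [70, 65, 80, 75, 68, 72]   # same participants, before
posttest = [74, 70, 83, 79, 69, 78]   # same participants, after
control  = [71, 66, 79, 74, 70, 73]   # a separate, unmatched group

# Dependent (paired) samples t test: within-subjects design
t_paired, p_paired = stats.ttest_rel(pretest, posttest)

# Independent (unpaired) samples t test: between-subjects design
t_ind, p_ind = stats.ttest_ind(posttest, control)

print(f"Paired t test:      t = {t_paired:.3f}, p = {p_paired:.4f}")
print(f"Independent t test: t = {t_ind:.3f}, p = {p_ind:.4f}")
```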
The only parametric correlation test is Pearson’s r. The correlation coefficient
(r) tells you the strength of a linear relationship between two quantitative
variables.
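A minimal sketch of a correlation test in Python is shown below; the income and GPA values are invented, and Spearman's rho is included only as an example of a non-parametric alternative.

```python
from scipy import stats

# Hypothetical paired quantitative measurements
parental_income = [28, 35, 42, 50, 61, 75, 82, 90]         # in thousands
gpa             = [2.7, 2.9, 3.0, 3.1, 3.3, 3.5, 3.6, 3.8]

r, p_value = stats.pearsonr(parental_income, gpa)           # parametric
print(f"Pearson's r = {r:.3f}, p = {p_value:.4f}")

rho, p_sp = stats.spearmanr(parental_income, gpa)           # non-parametric alternative
print(f"Spearman's rho = {rho:.3f}, p = {p_sp:.4f}")
```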
Step 5: Interpret Your Results
The final step of statistical analysis is interpreting your results.

In hypothesis testing, statistical significance is the main criterion for forming
conclusions. You compare your p value to a set significance level (usually 0.05)
to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due
to chance. There is only a very low chance of such a result occurring if the null
hypothesis is true in the population.
Example:
You compare your p value of 0.0027 to your significance threshold of 0.05.

Since your p value is lower, you decide to reject the null hypothesis, and you
consider your results statistically significant.

This means that you believe the meditation intervention, rather than random
factors, directly caused the increase in test scores.
Example:
You compare your p value of 0.001 to your significance threshold of 0.05. With
a p value under this threshold, you can reject the null hypothesis. This indicates
a statistically significant correlation between parental income and GPA in male
college students.
Note that correlation doesn't always mean causation, because there are often
many underlying factors contributing to a complex variable like GPA. Even if
one variable is related to another, this may be because of a third variable
influencing both of them or indirect links between the two variables.
A larger sample size can also strongly influence the statistical significance of a
correlation coefficient by making very small correlation coefficients seem
significant.
A statistically significant result doesn’t necessarily mean that there are
important real-life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s
important to report effect sizes along with your inferential statistics for a
complete picture of your results. You should also report interval estimates of
effect sizes if you’re writing an APA style paper.
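One commonly reported effect size for a comparison of two group means is Cohen's d; a minimal sketch (with invented scores) is shown below.

```python
import math
import statistics

# Hypothetical scores from a treatment and a control group
treatment = [78, 82, 75, 88, 80, 85]
control   = [70, 74, 68, 77, 72, 75]

n1, n2 = len(treatment), len(control)
s1, s2 = statistics.stdev(treatment), statistics.stdev(control)

# Pooled standard deviation, then Cohen's d = mean difference / pooled SD
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
cohens_d = (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

print(f"Cohen's d = {cohens_d:.2f}")  # rough guide: 0.2 small, 0.5 medium, 0.8 large
```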
Type I and Type II errors are mistakes made in research conclusions.
❑ Type I error means rejecting the null hypothesis when it’s actually true.
❑ Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal
significance level and ensuring high power. However, there’s a trade-off
between the two errors, so a fine balance is necessary.
