ENENDA30 - Module 01 Part 1
Example:
Place numbered cards in a bowl, mix them thoroughly, and select as many
cards as needed.
Researchers obtain systematic samples by numbering each subject of the
population and then selecting every nth subject.
Example:
Suppose there are 200 subjects in a population and a sample of 20 subjects is
needed. Since 200 / 20 = 10, every 10th subject is selected, with the first
subject chosen at random from subjects 1 to 10.
Researchers obtain stratified samples by dividing the population into groups
called strata according to some characteristic that is important to the study,
then sampling from each group.
Researchers also use cluster samples. Here the population is divided into groups
called clusters by some means such as geographic area or schools in a large
school district, etc. Then the researcher randomly selects some of these clusters
and uses all members of the selected clusters as the subjects of the sample.
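The four sampling methods described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the population of 200 numbered subjects follows the systematic-sampling example, while the strata, the clusters, and all function names are hypothetical and invented for the sketch.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n subjects so that every subject has an equal chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Select every k-th subject after a random start among the first k."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a random sample from each stratum (group)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 201))  # 200 numbered subjects

# Hypothetical strata and clusters, invented for this illustration only.
strata = {"first_half": population[:100], "second_half": population[100:]}
clusters = {f"area_{i}": population[i * 20:(i + 1) * 20] for i in range(10)}

random_20 = simple_random_sample(population, 20, rng)
systematic_20 = systematic_sample(population, 20, rng)
stratified = stratified_sample(strata, 10, rng)
clustered = cluster_sample(clusters, 2, rng)
```

Note the difference between the last two functions: stratified sampling takes some subjects from every group, whereas cluster sampling takes every subject from some groups.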
There are different ways to conduct a survey:
❑ Telephone surveys
❑ Mailed questionnaire surveys
❑ Personal interviews
Surveys can take different forms. They can be used to ask only one question or
they can ask a series of questions. We can use surveys to test out people’s
opinions or to test a hypothesis.
When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What question do you want to
answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method.
4. Decide what questions you will ask, in what order, and how to phrase
them.
5. Conduct the interview and collect the data.
6. Analyze the results by making graphs and drawing conclusions.
The products and processes in the engineering and scientific disciplines are
mostly derived from experimentation. An experiment is a series of tests
conducted in a systematic manner for a better understanding of an existing
process or to explore a new product or process.
If time and resources were infinite, there would be no need for designing
experiments. In production and quality control, we want to control the error
and learn as much as we can about the process or the underlying theory with
the resources at hand.
From an engineering perspective we are trying to use experimentation for the
following purposes:
❑ Reduce time
❑ Improve performance
❑ Improve reliability
❑ Achieve product and process robustness
❑ Evaluate materials and design alternatives, and set component and system
tolerances
The practical steps needed for planning and conducting an experiment are
somewhat similar to the scientific method. These include:
1. Recognition and statement of the problem
2. Choice of factors, levels and ranges
3. Selection of the response variables
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions and making recommendations
Statistical analysis means investigating trends, patterns, and relationships
using quantitative data. It is an important research tool used by scientists,
governments, businesses, and other organizations.
To collect valid data for statistical analysis, you first need to specify
your hypotheses and plan out your research design.
Example: Statistical Hypotheses to Test an Effect
Null Hypothesis:
A 5-minute meditation exercise will have no effect on math test scores in teenagers.
Alternative Hypothesis:
A 5-minute meditation exercise will improve math test scores in teenagers.
Example: Statistical Hypotheses to Test a Correlation
Null Hypothesis:
Parental income and GPA have no relationship with each other in college students.
Alternative Hypothesis:
Parental income and GPA are positively correlated in college students.
A research design is your overall strategy for data collection and analysis. It
determines the statistical tests you can use to test your hypothesis later on.
In most cases, it’s too difficult or expensive to collect data from every member
of the population you’re interested in studying. Instead, you’ll collect data from
a sample.
Statistical analysis allows you to apply your findings beyond your own sample
as long as you use appropriate sampling procedures. You should aim for a
sample that is representative of the population.
Sampling for statistical analysis
Example:
A high school administrator wants to analyze the final exam scores of all
graduating seniors to see if there is a trend. Since they are only interested in
applying their findings to the graduating seniors in this high school, they use
the whole population dataset.
When your population is large in size, geographically dispersed, or difficult to
contact, it’s necessary to use a sample. With statistical analysis, you can use
sample data to make estimates or test hypotheses about population data.
Example:
You want to study political attitudes in young people. Your population is the
300,000 undergraduate students in the Netherlands. Because it is not practical
to collect data from all of them, you use a sample of 300 undergraduate
volunteers from three Dutch universities who meet your inclusion criteria. This
is the group who will complete your online survey.
❑ Necessity: Sometimes it’s simply not possible to study the whole
population due to its size or inaccessibility.
❑ Practicality: It’s easier and more efficient to collect data from a sample.
Sampling errors happen even when you use a randomly selected sample. This is
because random samples are not identical to the population in terms of
numerical measures like means and standard deviations.
Step 3: Summarize Your Data with Descriptive Statistics
Once you’ve collected all of your data, you can inspect them and
calculate descriptive statistics that summarize them.
Inspect your data
There are various ways to inspect your data, including the following:
❑ Organizing data from each variable in frequency distribution tables.
❑ Displaying data from a key variable in a bar chart to view the distribution
of responses.
❑ Visualizing the relationship between two variables using a scatter plot.
By visualizing your data in tables and graphs, you can assess whether your
data follow a skewed or normal distribution and whether there are any
outliers or missing data.
A normal distribution means that your data are symmetrically distributed
around a center where most values lie, with the values tapering off at the tail
ends.
In contrast, a skewed distribution is asymmetric and has more values on one
end than the other. The shape of the distribution is important to keep in mind
because only some descriptive statistics should be used with skewed
distributions.
Extreme outliers can also produce misleading statistics, so you may need a
systematic approach to dealing with these values.
Measures of central tendency describe where most of the values in a data set lie.
Three main measures of central tendency are often reported:
❑ Mode: the most popular response or value in the data set.
❑ Median: the value in the exact middle of the data set when ordered from
low to high.
❑ Mean: the sum of all values divided by the number of values.
However, depending on the shape of the distribution and level of
measurement, only one or two of these measures may be appropriate. For
example, many demographic characteristics can only be described using the
mode or proportions, while a variable like reaction time may not have a mode
at all.
Calculate measures of variability
Measures of variability tell you how spread out the values in a data set are. Four
main measures of variability are often reported:
❑ Range: the highest value minus the lowest value of the data set.
❑ Interquartile range: the range of the middle half of the data set.
❑ Standard deviation: the average distance between each value in your data
set and the mean.
❑ Variance: the square of the standard deviation.
Once again, the shape of the distribution and level of measurement should
guide your choice of variability statistics. The interquartile range is the best
measure for skewed distributions, while standard deviation and variance
provide the best information for normal distributions.
Descriptive statistics summarize and organize characteristics of a data set. A
data set is a collection of responses or observations from a sample or entire
population.
The next step is inferential statistics, which help you decide whether your data
confirm or refute your hypothesis and whether the findings are generalizable to
a larger population.
There are 3 main types of descriptive statistics:
❑ The distribution concerns the frequency of each value.
❑ The central tendency concerns the averages of the values.
❑ The variability or dispersion concerns how spread out the values are.
Example:
You want to study the popularity of different leisure activities by gender. You
distribute a survey and ask participants how many times they did each of the
following in the past year:
• Go to the library
• Watch a movie at a theatre
• Visit a national park
Your dataset is the collection of responses to the survey. Now you can use
descriptive statistics to find out the overall frequency of each activity
(distribution), the averages for each activity (central tendency), and the spread
of responses for each activity (variability).
A data set is made up of a distribution of values, or scores. In tables or graphs,
you can summarize the frequency of every possible value of a variable in
numbers or percentages. This is called a frequency distribution.
For the variable of gender, you list all possible answers in the left-hand column.
You count the number or percentage of responses for each answer and display
it in the right-hand column.
Gender    Number
Male      182
Female    235
Other     27
From this table, you can see that more women than men or people with
another gender identity took part in this study.
In a grouped frequency distribution, you can group numerical response values
and add up the number of responses for each group. You can also convert each
of these numbers to a percentage.
Library Visits in the Past Year    Percent
0-4                                6%
5-8                                20%
9-12                               42%
13-16                              24%
17+                                8%
From this table, you can see that most people visited the library between 5 and 16
times in the past year.
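Both kinds of frequency distribution can be sketched in a few lines of Python. The gender counts are the ones given in the table above; the raw library-visit values and the helper name visit_group are hypothetical, invented only to show how responses are grouped.

```python
from collections import Counter

# Counts taken from the gender table above.
gender_counts = {"Male": 182, "Female": 235, "Other": 27}
total = sum(gender_counts.values())

# Convert each count to a percentage of all responses.
gender_percent = {g: round(100 * c / total, 1) for g, c in gender_counts.items()}

def visit_group(visits):
    """Map a number of library visits to the groups used in the table above."""
    if visits <= 4:
        return "0-4"
    if visits <= 8:
        return "5-8"
    if visits <= 12:
        return "9-12"
    if visits <= 16:
        return "13-16"
    return "17+"

# Hypothetical raw responses, invented for this sketch (not the survey data).
library_visits = [3, 6, 10, 11, 9, 14, 18, 12, 7, 15]
grouped = Counter(visit_group(v) for v in library_visits)
grouped_percent = {g: 100 * c / len(library_visits) for g, c in grouped.items()}

print(gender_percent)
print(grouped_percent)
```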
Measures of central tendency estimate the center, or average, of a data set. The
mean, median and mode are 3 ways of finding the average.
Here we will demonstrate how to calculate the mean, median, and mode using
the first 6 responses of our survey.
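A minimal sketch of the three calculations using Python's statistics module. The six values are placeholders, because the actual responses are not reproduced in these notes.

```python
import statistics

# Placeholder values: number of library visits reported by six respondents.
visits = [15, 3, 12, 0, 24, 3]

mean_visits = statistics.mean(visits)      # sum of values / number of values
median_visits = statistics.median(visits)  # middle of the ordered values
mode_visits = statistics.mode(visits)      # most frequent value

print(mean_visits, median_visits, mode_visits)
```

With an even number of values, the median is the average of the two middle values once the data are sorted.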
Here we will demonstrate how to calculate the range, standard deviation and
variance using the first 6 responses of our survey.
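A minimal sketch of these calculations with Python's statistics module. The six values are placeholders, because the actual responses are not reproduced in these notes, and the sketch assumes the data are a sample, so the variance divides by n - 1.

```python
import statistics

# Placeholder values: number of library visits reported by six respondents.
visits = [15, 3, 12, 0, 24, 3]

value_range = max(visits) - min(visits)  # highest value minus lowest value
variance = statistics.variance(visits)   # sample variance, divides by n - 1
std_dev = statistics.stdev(visits)       # square root of the sample variance

print(value_range, round(variance, 2), round(std_dev, 2))
```

If the data covered the entire population, statistics.pvariance and statistics.pstdev (which divide by n) would be the matching functions.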
Because the range is sensitive to outliers, you should also consider the
standard deviation and variance to get easily comparable measures of spread.
If you’ve collected data on more than one variable, you can use bivariate or
multivariate descriptive statistics to explore whether there are relationships
between them.
Multivariate analysis is the same as bivariate analysis but with more than two
variables.
In a contingency table, each cell represents the intersection of two variables.
Usually, an independent variable (e.g., gender) appears along the vertical axis
and a dependent one appears along the horizontal axis (e.g., activities).
You read “across” the table to see how the independent and dependent
variables relate to each other.
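A contingency table of counts can be sketched as follows. The rows of data and the simplified variable "activity done most often" are hypothetical and serve only to show the layout, with gender down the side and activities across the top.

```python
from collections import Counter

# Hypothetical survey rows: (gender, activity done most often).
responses = [
    ("Male", "Library"), ("Female", "Library"), ("Female", "Movie theatre"),
    ("Male", "National park"), ("Female", "Library"), ("Other", "Movie theatre"),
    ("Male", "Movie theatre"), ("Female", "National park"),
]

counts = Counter(responses)
genders = sorted({g for g, _ in responses})
activities = sorted({a for _, a in responses})

# Rows are the independent variable (gender); columns are the dependent one.
table = {g: {a: counts[(g, a)] for a in activities} for g in genders}

print("Gender".ljust(8) + "".join(a.ljust(16) for a in activities))
for g in genders:
    print(g.ljust(8) + "".join(str(table[g][a]).ljust(16) for a in activities))
```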
In a scatter plot, you plot one variable along the x-axis and another one along
the y-axis. Each data point is represented by a point in the chart.
Example:
You investigate whether people who visit the library more tend to watch a
movie at a theatre less. You plot the number of times participants watched
movies at a theatre along the x-axis and visits to the library along the y-axis.
From your scatter plot, you see that as the number of movies seen at movie
theatres increases, the number of visits to the library decreases. Based on your
visual assessment of a possible linear relationship, you perform further tests of
correlation and regression.
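As a first follow-up test, the strength of a linear relationship can be summarized with the Pearson correlation coefficient. The sketch below computes it by hand; the two lists are hypothetical values invented for illustration, and pearson_r is a helper name chosen for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: yearly theatre visits (x) and library visits (y).
movies = [1, 2, 4, 6, 8, 10]
library = [20, 18, 15, 11, 8, 4]

r = pearson_r(movies, library)
print(round(r, 3))
```

A value close to -1 indicates a strong negative linear relationship, close to +1 a strong positive one, and close to 0 little or no linear relationship.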
Step 4: Test Hypotheses or Make Estimates with Inferential Statistics
A number that describes a sample is called a statistic, while a number
describing a population is called a parameter. Using inferential statistics, you
can make conclusions about population parameters based on sample statistics.
If your aim is to infer and report population characteristics from sample data,
it’s best to use both point and interval estimates in your paper.
You can consider a sample statistic a point estimate for the population
parameter when you have a representative sample (e.g., in a wide public
opinion poll, the proportion of a sample that supports the current government
is taken as the population proportion of government supporters).
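A point estimate and an interval estimate for a proportion can be sketched as follows. The poll numbers are hypothetical, the function name is invented for this example, and the interval uses the simple normal-approximation (Wald) formula, which is only one of several possible methods.

```python
import math
from statistics import NormalDist

def proportion_estimates(successes, n, confidence=0.95):
    """Return the point estimate and a normal-approximation (Wald) interval."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - margin, p_hat + margin)

# Hypothetical poll: 540 of 1,000 respondents support the current government.
p_hat, (low, high) = proportion_estimates(540, 1000)
print(f"point estimate: {p_hat:.3f}, 95% interval: ({low:.3f}, {high:.3f})")
```

The point estimate is a single best guess, while the interval conveys how much uncertainty surrounds that guess.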
Hypothesis testing starts with the assumption that the null hypothesis is true in
the population, and you use statistical tests to assess whether the null
hypothesis can be rejected or not.
Statistical tests determine where your sample data would lie on an expected
distribution of sample data if the null hypothesis were true. These tests give two
main outputs:
❑ A test statistic tells you how much your data differ from the null
hypothesis of the test.
❑ A p value tells you the likelihood of obtaining results at least as extreme
as yours if the null hypothesis is actually true in the population.
Statistical tests come in three main varieties:
❑ A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
Statistically significant results are considered unlikely to have arisen solely due
to chance. There is only a very low chance of such a result occurring if the null
hypothesis is true in the population.
Example:
You compare your p value of 0.0027 to your significance threshold of 0.05.
Since your p value is lower, you decide to reject the null hypothesis, and you
consider your results statistically significant.
This means that you believe the meditation intervention, rather than random
factors, directly caused the increase in test scores.
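The kind of test behind such a decision can be sketched with a paired t test computed by hand. Several things here are assumptions rather than details from the example: the scores are hypothetical, the design is assumed to measure the same 10 teenagers before and after the exercise, and the test is one-tailed. The sketch compares the t statistic with a critical value from a standard t table instead of computing a p value, so it does not reproduce the p value of 0.0027 quoted above.

```python
import math
import statistics

# Hypothetical math scores for 10 teenagers before and after the exercise.
before = [62, 70, 55, 68, 74, 60, 66, 59, 71, 65]
after = [66, 73, 58, 70, 79, 63, 68, 64, 75, 66]

# Paired design: analyze the per-person differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# One-tailed critical value for alpha = 0.05 with n - 1 = 9 degrees of freedom,
# taken from a standard t table.
T_CRITICAL = 1.833
reject_null = t_stat > T_CRITICAL

print(round(t_stat, 2), reject_null)
```

When the t statistic exceeds the critical value, the corresponding p value is below the 0.05 threshold and the null hypothesis is rejected.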
Example:
You compare your p value of 0.001 to your significance threshold of 0.05. With
a p value under this threshold, you can reject the null hypothesis. This indicates
a statistically significant correlation between parental income and GPA in male
college students.
Note that correlation doesn’t always mean causation, because there are often
many underlying factors contributing to a complex variable like GPA. Even if
one variable is related to another, this may be because of a third variable
influencing both of them or indirect links between the two variables.
A larger sample size can also strongly influence the statistical significance of a
correlation coefficient by making very small correlation coefficients seem
significant.
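The effect of sample size can be illustrated with the usual t statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). In the sketch below the correlation r = 0.05 and both sample sizes are hypothetical, and the p value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples.

```python
import math
from statistics import NormalDist

def correlation_test(r, n):
    """t statistic for H0: no correlation, with an approximate two-sided p value."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    # Normal approximation to the t distribution; reasonable only for large n.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_approx

# The same weak correlation (r = 0.05, a hypothetical value) at two sample sizes.
for n in (100, 10_000):
    t, p = correlation_test(0.05, n)
    print(f"n={n}: t={t:.2f}, approximate p={p:.4f}")
```

The correlation is equally weak in both cases; only the sample size changes whether it crosses the significance threshold.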
A statistically significant result doesn’t necessarily mean that there are
important real-life applications or clinical outcomes for a finding.
In contrast, the effect size indicates the practical significance of your results. It’s
important to report effect sizes along with your inferential statistics for a
complete picture of your results. You should also report interval estimates of
effect sizes if you’re writing an APA style paper.
Type I and Type II errors are mistakes made in research conclusions.
❑ Type I error means rejecting the null hypothesis when it’s actually true.
❑ Type II error means failing to reject the null hypothesis when it’s false.
You can aim to minimize the risk of these errors by selecting an optimal
significance level and ensuring high power. However, there’s a trade-off
between the two errors, so a fine balance is necessary.
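The trade-off can be made concrete with a small simulation. Everything in this sketch is an assumption chosen for illustration: a two-sided z test with known standard deviation 1, samples of 25 values, a true mean of 0 when the null hypothesis holds and 0.5 when it does not, and the helper names themselves.

```python
import math
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p value for H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, alpha, n=25, trials=4000, seed=1):
    """Share of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials

for alpha in (0.05, 0.01):
    type_1 = rejection_rate(true_mean=0.0, alpha=alpha)      # null is true
    type_2 = 1 - rejection_rate(true_mean=0.5, alpha=alpha)  # null is false
    print(f"alpha={alpha}: Type I rate={type_1:.3f}, Type II rate={type_2:.3f}")
```

Lowering the significance level reduces the Type I error rate but raises the Type II error rate for the same sample size; increasing the sample size is one way to raise power without loosening the significance level.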