Designs
Scientific Research
- Scientific orientation: the essence of such orientation is a critical attitude toward findings
a) expecting constant change and improvement
b) public nature
c) no certainty
- Definitions and variables
• Definition: a statement about the meaning of a word
- Theoretical definition: a scientifically meaningful concept or construct
• a scientific concept or construct has to be part of an implicit or explicit theoretical
framework that explains its relations to other concepts
• indicators have to be measurable, and they are what make up the variables
- broadly speaking, scientific inquiry is the pursuit of relations among variables
• a variable must…
(a) have at least 2 values or categories (EX: sex—male/female)
(b) assign only one value to each element in the population at a given time
• Classification of variables
- Measurement perspective: categorical, semi-quantitative, quantitative
- Research design perspective: dependent (criterion), independent (predictor), moderator,
mediator, and control
- Problems
• Problem: a statement about the relations btwn 2+ variables, usually in an interrogative form
- problems can come in diff formats….
• EX:
- Format 1: What is the relation btwn school engagement and academic achievement?
What is the relation btwn sex and physical violence?
• Problem: What is the effect of school engagement on academic achievement?
Hypothesis: There is a positive effect of school engagement on academic
achievement.
- hypotheses guide what to observe, what variables to relate w/ one another, and how to
relate them
- EX: Kepler made 19 hypotheses about the motion of Mars and calculated the results of
all of them before finally establishing that a planet's orbit is an ellipse
• Participants
- in order to statistically test hypotheses, must get observations
• observations in psychology usually mean getting participants!!
- Three necessary characteristics of the individuals to be observed:
(1) Representativeness: participants are representative if they resemble other
individuals in the population to be analyzed
(2) Suitability: the adequacy of the participants in relation to the phenomenon to
be studied
(3) Accessibility: the choice of participants must take into account the space and
time limitations of the research
- BUT!! variables are not always dependent/independent…
a) Extraneous variables: all variables that can influence the dependent variable(s),
despite not being the independent variable(s)
- poor habits (poor diet, drug and alcohol abuse) are more likely on
weekends, so a heart attack is more likely on the Monday that follows
(poor habits act as the confounder)
- EX: we may assume that social support from a teacher causes better academic
achievement in students
d) Moderators: variables that affect the relationship btwn an independent variable
and a dependent variable
- EX: we may assume that group work in the classroom causes better academic
achievement, but ONLY if the social climate is positive (thought of as an
interaction)
- as many potential variables may explain a certain phenomenon (DV), the researcher
needs to control for these other variables…
• Forms of control:
(a) Manipulation
(c) Statistical
(d) Randomization
• Control
A. Manipulation
- where the researcher chooses the levels or categories of the IV the participants are
going to receive
- EX:
Three doses of a drug: 0ml / 10ml / 20ml
Teaching methods for math: Online vs. traditional
• this type of control is only possible for certain variables and in certain types of
research designs (experimental and quasi-experimental)
B. Elimination or inclusion
- scientists try to identify and isolate extraneous variables that may confound the causal
effects of the IV
• can either eliminate these variables from the design OR include them
- eliminating a variable is to make it constant
• including a variable is to estimate its effects w/in the analyses of the research
design
C. Statistical control
• EX: a researcher wants to see if adding graded exercises at the end of each
lesson affects academic achievement
- SO!! they compare students in a class where grades are dependent only on
the final exam w/ students where exercises are added to the final grade
D. Randomization
- when there’s a large number of potential confounders, these confounders have not
been registered (measured), or they’re simply not known… elimination, inclusion,
or statistical control are not adequate anymore!!
• securing that, in the long run, all confounders will be equally distributed across the
different categories of the IV, so their effect will be constant across those
levels
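A minimal sketch of random assignment, assuming hypothetical participant IDs and two conditions (the function name `randomize` and the seed are illustrative, not part of the notes):

```python
import random

def randomize(participants, conditions, seed=None):
    """Randomly assign participants to conditions in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    # Deal the shuffled participants out in turn so group sizes differ by at most 1.
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

participants = [f"P{i:02d}" for i in range(1, 13)]  # hypothetical IDs
groups = randomize(participants, ["treatment", "control"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Because chance alone decides who lands in which group, measured and unmeasured confounders tend to balance out across the categories of the IV as the sample grows.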
• Validity
- highly related to control
• Campbell and colleagues initially considered internal and external validity
- later, they distinguished btwn statistical conclusion validity, construct validity,
internal validity, and external validity
(b) Maturation: changes in the people analyzed during the time of the study
- EX: practice
(d) Instrumentation: any change in the measurement instruments, including
malfunctions
(e) Regression toward the mean: the tendency for extreme outcomes to be
followed by more average results; occurring usually when two variables
are not perfectly correlated—while high performance on one occasion is
likely to be followed by above-average performance, it’s not usually as
extreme the next time
(g) Mortality: any attrition of people during the research (missing data for any
reason)
- External validity
• the generalizability (stability) of the research findings to different populations, times,
or settings
- Scientific research
• Steps in all research (no matter what design is used):
1. Formulation of the problem
- hypothesis as the experimenters’ judgment of the results they expect to find in their
research
3. Data collection
- sample selection
• considering two issues:
(a) representative (sampling)
c) Designs that try to DESCRIBE: observational, case studies
- Experimental designs
• the IV is manipulated and subjects (participants) are randomized
- Quasi-experimental designs
• the IV is manipulated but individual subjects are NOT randomized
- Non-experimental (correlational) designs
• there is no manipulation and no randomization
Unit 2 and 3: Experimental designs
- in order to better understand the main types of experiments, we will follow the notational
system by Campbell and Stanley (1963):
• EX: two groups, one treated and one untreated (control) measured before and after the
treatment would be represented as…
- O1 X O2
O1 O2
ii) since row 1 is treated (applying X), expectation for a change in row 1
iii) can test mean differences btwn O1 and O2 in the same row to observe
if there are any changes due to the treatment
iv) expecting no change btwn O1 and O2 in the second row (there could
be some changes that occur naturally in the DV), so!! must compare
changes btwn both groups
• in this design, only the measures after the treatment has ended are taken
- in the long run, randomization guarantees internal validity
• equality of groups before treatment can’t be analyzed: will never know if groups
are different before treatment bc of missing information (why it’s called a pre-
experiment!!)
- there’s usually a control group (w/ no treatment), BUT!! sometimes this isn’t
possible and the group is simply a comparison group
- they can’t have a pretest bc there’s no attitude toward the task (DV) before
the task is done
- Hypotheses
a) Alternative hypothesis (H1): In the attitude test, the mean of the 20-euro group
will be higher than the mean of the 1-euro group.
b) Null hypothesis (H0): The mean of the 20-euro group will be less than or equal to the
mean of the 1-euro group.
• we will treat the IV as a categorical variable (two groups) even if it's actually
quantitative (amount of money paid)
- where the 1-euro group shows a better attitude toward the task (contrary to
cognitive dissonance theory)
- Simple statistical analysis: Comparison of two means
• two groups that are independent of each other (and all the people involved)
- can perform a t-test (Student’s t-test)
• Assumptions in a t-test:
1. Normality in the dependent variable (quantitative variable) — use Mann-
Whitney in case of extreme non-normality
3. Independent observations
- we have 2 values for t, and in this case we are lucky and they’re the same (bc
the standard deviations of both groups are similar, both groups have similar
variance)
• BUT!! usually they are different and we want the first one!!
• variances are homogeneous (equal)
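The "2 values for t" presumably refer to the version that assumes equal variances (Student) and the version that does not (Welch). A minimal sketch of both statistics, using made-up attitude scores (all data and function names here are illustrative assumptions):

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """t statistic that assumes equal variances (uses the pooled variance)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))

def welch_t(a, b):
    """t statistic that does not assume equal variances."""
    n1, n2 = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(variance(a) / n1 + variance(b) / n2)

# Hypothetical scores for two independent groups.
similar_a, similar_b = [4, 5, 6, 5, 4, 6], [3, 4, 5, 4, 3, 5]   # similar spread
unequal_a, unequal_b = [4, 5, 6, 5, 4, 6], [1, 9, 3, 8]         # different spread and n

print(student_t(similar_a, similar_b), welch_t(similar_a, similar_b))
print(student_t(unequal_a, unequal_b), welch_t(unequal_a, unequal_b))
```

With similar variances (and equal group sizes) the two statistics coincide; with unequal variances and unequal group sizes they diverge, which is why the homogeneity check matters when choosing which row to read.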
- in the long run, randomization guarantees internal validity
• there may or may not be a control group
• Statistical analysis: Janis and Feshbach
- giving talks to high school students about dental hygiene
• 3 conditions in manipulated IV:
- information on dental hygiene followed by consequences of bad hygiene w/ either…
(1) strong fear appeal
- Descriptive statistics:
• we can observe means and SD for all conditions, and the graph of means w/ confidence
intervals
- One-factor ANOVA
• F-statistic is statistically significant (p < 0.05)
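To make the F-statistic less of a black box, here is a minimal sketch of how it is built from between-group and within-group variability. The three groups of scores are invented for illustration, and the p-value (which needs the F distribution) is not computed here:

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor between-subjects ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical scores for three conditions of a manipulated IV.
condition_1, condition_2, condition_3 = [1, 2, 3], [2, 3, 4], [6, 7, 8]
f_value, df_between, df_within = one_way_anova_f([condition_1, condition_2, condition_3])
print(f_value, df_between, df_within)
```

A large F means the group means differ much more than would be expected from the spread of scores inside each group.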
- Pretest-posttest treatment and control design
• aka randomized experimental/clinical trial
- O1 X O2
O1 O2
• children were randomly assigned to either receive the yoga program or their standard
morning routine
- BUT!! problem w/ gain scores: not simple to generalize!!!
• w/ two means of two gain scores, can calculate the differences when there are two
measurement times
(b) when there’s an initial improvement (better scores in time 2) but then
comes back to the initial scores and then worsens (worse scores in time 4)
• the mean of gain scores will not indicate anything in these cases
- and it’s quite standard to have more than 4 times of measurement in any
experiment: this is why we have 3 options of statistical tests and can’t use
t-test alone!!!!!
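For the simple two-time-point case, gain scores can be sketched as follows. The counts of disruptive behaviors are hypothetical, and `gain_scores` is an illustrative helper name:

```python
from statistics import mean

def gain_scores(pretest, posttest):
    """Posttest minus pretest for each participant (same order in both lists)."""
    return [post - pre for pre, post in zip(pretest, posttest)]

# Hypothetical counts of disruptive behaviors at pretest and posttest.
treatment_pre, treatment_post = [10, 12, 9, 11], [7, 8, 6, 9]
control_pre, control_post = [10, 11, 9, 12], [10, 12, 9, 12]

treatment_gain = gain_scores(treatment_pre, treatment_post)
control_gain = gain_scores(control_pre, control_post)
difference_in_mean_gain = mean(treatment_gain) - mean(control_gain)
print(treatment_gain, control_gain, difference_in_mean_gain)
```

The two lists of gains are what an independent samples t-test would compare. With more than two measurement times a single "last minus first" gain hides what happened in between, which is the generalization problem noted above.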
• we observe significant differences btwn treatment and control bc…
- p < 0.05
- Cohen’s d = 1.01 (large effect size)
• variables can be considered normal (normality is reasonable)
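Cohen's d is the difference between two means expressed in pooled standard deviation units. A minimal sketch with invented scores (not the data behind the d = 1.01 reported above):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_sd = sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    return (mean(a) - mean(b)) / pooled_sd

# Hypothetical scores for two independent groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)
```

By Cohen's conventional benchmarks, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects.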
- Mixed ANOVA (treatment x time)
• called mixed ANOVA bc there’s repeated measure of the same people AND people in
the two groups are independent samples
- need some measure of effect size to know how effective the program is
• EX: obtaining the following results indicates that there's an effect of the yoga
program (decreasing the number of disruptive behaviors in the treatment group),
while the control group increased its number of disruptive behaviors
- if we calculate a new mean (different from the mean µ of each of the groups;
EX: µ treatment = 20, µ control = 10), we can obtain the mean between the two
groups in T1 = µ T1 = (20+10)/2 = 15
- H0: There's no interaction effect, where the change from T1 to T2 is the same in
both groups (the four means, T1 and T2 for treatment and for control, change in parallel)
- H1: The means are different, suggesting an interaction btwn time and
group
- the effects of time and group alone are not of primary interest
• instead, we need to examine the four specific means involved in the interaction
btwn time and group to determine if there’s a differential effect
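One way to look at the four means is as a "difference of differences". In the sketch below, the T1 means (20 and 10) come from the example above, while the T2 means are hypothetical values added only for illustration:

```python
from statistics import mean

# Cell means: T1 values are from the example above, T2 values are hypothetical.
cell_means = {
    ("treatment", "T1"): 20, ("treatment", "T2"): 12,
    ("control", "T1"): 10, ("control", "T2"): 11,
}

def marginal_mean_at(time):
    """Mean of the two group means at one time point."""
    return mean([value for (group, t), value in cell_means.items() if t == time])

def interaction_contrast():
    """Change in the treatment group minus change in the control group."""
    treatment_change = cell_means[("treatment", "T2")] - cell_means[("treatment", "T1")]
    control_change = cell_means[("control", "T2")] - cell_means[("control", "T1")]
    return treatment_change - control_change

print(marginal_mean_at("T1"))      # (20 + 10) / 2
print(interaction_contrast())
```

A contrast of zero would mean both groups changed by the same amount over time (no interaction); the further it is from zero, the more the change differs btwn groups.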
(3) Sphericity: correlations have to be equal (checked when there’s more than two
repeated measures!!)
• First, name factors according to the experiment and place each variable in its place…
• we are interested in the time x group interaction…
• when looking at the graph, we observe that there’s a change in the treatment group (a
decrease in the number of disruptive behaviors), while there’s no change in the control
group (constant number of disruptive behaviors)
- results are as expected: the yoga program decreases the number of disruptive
behaviors!!
• an average autistic child under the yoga program is going to have 3 fewer disruptive
behaviors in T2
- ANCOVA
• basic idea of ANCOVA: we can test mean differences while controlling for a
quantitative variable
- condition for this statistical test: both groups are measured in both time points!!
• O1 X O2
O1 X O2
- this way, we can compare means to know if any difference btwn the two time
points is due to the treatment
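A rough sketch of what "controlling for a quantitative variable" does to the group means: each posttest mean is adjusted along the pooled within-group regression slope to the value it would have at the grand mean of the covariate (pretest). The data and the helper name are assumptions, and the F test and p-value of a full ANCOVA are not computed here:

```python
from statistics import mean

def ancova_adjusted_means(groups):
    """groups maps a label to a list of (covariate, outcome) pairs.

    Returns (pooled within-group slope, dict of covariate-adjusted outcome means).
    """
    grand_covariate_mean = mean([x for pairs in groups.values() for x, _ in pairs])
    sxy = sxx = 0.0
    for pairs in groups.values():
        x_bar = mean([x for x, _ in pairs])
        y_bar = mean([y for _, y in pairs])
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    adjusted = {}
    for label, pairs in groups.items():
        x_bar = mean([x for x, _ in pairs])
        y_bar = mean([y for _, y in pairs])
        adjusted[label] = y_bar - slope * (x_bar - grand_covariate_mean)
    return slope, adjusted

# Hypothetical (pretest, posttest) counts of disruptive behaviors.
data = {
    "treatment": [(10, 7), (12, 9), (14, 11)],
    "control": [(8, 8), (10, 10), (12, 12)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

In this made-up example the raw posttest means differ by 1, but after adjusting for the pretest the groups differ by 3, which shows how a covariate can change the comparison.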
• Testing assumptions:
(1) Variances are homogeneous, as p = 0.072 > 0.05 in Levene's test
• When applying the ANCOVA test, we observe that there are group differences (3
fewer disruptive behaviors) in time 2, even after controlling for time 1 as a covariate, as
p = 0.009 < 0.05
- when we have two groups and two time points, the best thing we can do is an
ANCOVA analysis to see if the results match
• if t-test and mixed ANOVA tell us that the treatment is not effective, but
ANCOVA test shows differences, we have to trust ANCOVA bc it has the
most statistical power to test differences
b) can still use mixed ANOVA to make the analysis in two time points w/ a categorical
variable w/ 3+ categories (the number of groups)
- BUT!! this is not the case if the randomized
clinical trial is expanded
- in this experimental design, there are only posttest measures of several treatments (or
treatments and control)
• this variable is included when we suspect that randomization may not have fully
controlled for its influence, where it potentially acts as a confounding variable
- the type of concomitant variable affects which statistical test we should use:
(a) if C is categorical, factorial ANOVA
—> similar to mixed ANOVA, but w/o repeated measures for the
same subjects— measuring distinct groups (e.g., male, female, and
treatment groups) only once, such as after treatment at time O2
• she has 60 depressive participants, half of whom are men and half women (30 and 30)
- placing them in 3 treatment groups, always following the
rule that there should be an equal number of men and women
in each of the three treatment groups (stratification by
gender)
- BUT!! if we consider gender, we now have 6 means of interest (the mean of each
gender in each of the treatments)
• Testing assumptions:
• in a simple one-factor ANOVA w/ only the dependent variable (DV) and independent
variable (IV), we would compare three means, one for each level of the IV
- BUT!! since we also have a concomitant variable (gender), we now have 6 means to
consider…
- we can examine the overall means for each gender, ignoring treatment
effects
—> providing 2 additional means of interest: the total mean for depressive
symptoms in each gender (Xm and Xw) — representing the effect of
gender only
- we can look at the overall means for each treatment, ignoring gender
differences
—> providing 3 additional means of interest: the means for each treatment
(Xp, Xd, and Xb) — representing the effect of treatment only
3) Does the effectiveness of each treatment differ btwn men and women?
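The 6 cell means, the 2 gender means, and the 3 treatment means can all be obtained from the same records by grouping on different keys. A minimal sketch with invented scores (two participants per cell instead of the 10 in the example):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gender, treatment, depressive symptoms after treatment).
records = [
    ("man", "psychotherapy", 12), ("man", "psychotherapy", 14),
    ("man", "antidepressants", 10), ("man", "antidepressants", 12),
    ("man", "both", 8), ("man", "both", 10),
    ("woman", "psychotherapy", 8), ("woman", "psychotherapy", 10),
    ("woman", "antidepressants", 9), ("woman", "antidepressants", 11),
    ("woman", "both", 4), ("woman", "both", 6),
]

def means_by(records, key):
    """Group scores by key(gender, treatment) and return the mean of each group."""
    buckets = defaultdict(list)
    for gender, treatment, score in records:
        buckets[key(gender, treatment)].append(score)
    return {label: mean(scores) for label, scores in buckets.items()}

cell_means = means_by(records, lambda gender, treatment: (gender, treatment))  # 6 means
gender_means = means_by(records, lambda gender, treatment: gender)             # 2 means
treatment_means = means_by(records, lambda gender, treatment: treatment)       # 3 means
print(cell_means, gender_means, treatment_means, sep="\n")
```

The marginal means answer the "gender only" and "treatment only" questions; the pattern across the 6 cell means is what the interaction question is about.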
• Effect of gender only:
- after the treatment, women have a much lower number of
depressive symptoms than men
• when looking at the numbers in post-hoc tests, we observe that means are different
- treatments are working differently for men and women (interaction)
• BUT!! the effect of each variable is also significant
- the effects of gender only and treatment only are both significant (p-values less
than 0.05)
- ANCOVA
• now considering the statistical control of age
- as age is a quantitative variable, we have independent
variables “treatment” (w/ 3 levels— psychotherapy,
antidepressants, and both) and a covariate (age)
• these are the estimated means after controlling for age w/ a confidence interval
(CI) of 95%
- multiple comparisons (Bonferroni in SPSS and Post-hoc in Jamovi) do not show
mean differences
- Solomon design
• all the aforementioned designs may be further complicated…
- EX: adding covariates, factors, time points, etc.
• one such complication: the so-called Solomon design
- O1 X1 O2
O1 O2
X1 O2
O2
- Within subjects designs
• sometimes, control groups are not needed bc subjects are their own control
- they’re compared in the experimental conditions w/ themselves
• here, all subjects are subjected to all treatment conditions (repeated measures of the
participants)
• to test this effect, he recruited 36 student volunteers from the faculty, experimenting on
them w/ the tachistoscopic presentation of both words and pseudo-words
- the subject had to answer as quickly as possible whether it was a word or not
• all subjects saw the same words and pseudo-words, only in a different order
- words could be of high, medium, or low frequency
• pseudo-words were chosen to have the same number of letters as the words
- for each subject in each experimental condition, the average number of
milliseconds it took to answer whether it was a word was measured
• Testing assumptions:
- Sphericity doesn’t hold: p < 0.05
- Normality is reasonable, as seen in the graph
• when making the statistical analysis (repeated measures ANOVA), we observe that
there are mean differences in the three conditions
• often resulting in lower or altered levels as they progress through treatments (making it
challenging to conduct a within subjects design)
- BUT!! these designs are still possible if you don’t expect strong carryover effects
• still have to control for them though by counterbalancing the order of treatments
- counterbalancing: administering treatments in all possible orders across
participants to minimize the impact of carryover on the results
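Full counterbalancing can be sketched by listing every possible order and cycling through them. The three conditions mirror the high/medium/low frequency example; participant IDs and the function name are hypothetical:

```python
from itertools import permutations

def counterbalance(participants, treatments):
    """Cycle through every possible treatment order, one order per participant."""
    orders = list(permutations(treatments))
    return {person: orders[i % len(orders)] for i, person in enumerate(participants)}

treatments = ["high", "medium", "low"]               # word-frequency conditions
participants = [f"S{i:02d}" for i in range(1, 37)]   # 36 hypothetical IDs
schedule = counterbalance(participants, treatments)
print(schedule["S01"], schedule["S02"])
```

With 3 treatments there are 6 possible orders, so 36 participants give 6 people per order. In practice the participant list would usually be shuffled first so that the order received does not depend on recruitment order.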
- nowadays, many academic journals, research agencies, ethical committees at the university,
etc. ask for a prior calculation of the sample size needed in order to detect a given effect
b) the statistical power the research wants to achieve (usually 0.80 in psychology, or
0.90)
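A rough sketch of such a calculation for comparing two independent means, using the normal approximation (dedicated power software works w/ the t distribution and may give a slightly larger n). The expected effect size d = 0.5 is an assumption chosen for illustration:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided, two-group comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

print(n_per_group(0.5))                # power = 0.80
print(n_per_group(0.5, power=0.90))    # higher power needs more participants
```

The required n grows as the expected effect gets smaller or the desired power gets higher.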
- EX: imagine you have designed a pretest-posttest experiment w/ a control group and
treatment (RCT)
- now you need to know what statistical tool you are going to use….
i) independent samples t-test on gain scores
• Mixed ANOVA
- Interpreting effect sizes
• partialling out: η²p calculates the effect size for each IV while controlling for the effects of
other variables in the model
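Partial eta squared is the effect's sum of squares divided by that same sum of squares plus the error sum of squares, so variance explained by the other factors is left out of the denominator. A minimal sketch, with hypothetical numbers:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Effect sum of squares relative to effect plus error sums of squares."""
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Same quantity recovered from a reported F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

print(partial_eta_squared(42, 6))             # hypothetical sums of squares
print(partial_eta_squared_from_f(21, 2, 6))   # hypothetical F(2, 6) = 21
```

The second function is handy when an output table or a paper reports F and degrees of freedom but not the effect size.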
Summary
1. For two categories (treatment and control): use a t-test to compare gain scores, or the
differences btwn the two time points (pretest and posttest)
2. For more than two treatment groups: use ANOVA to assess differences across
multiple groups
3. For repeated measures (e.g., pretest and posttest for each group): use mixed ANOVA
to test both main effects (time and treatment) and the interaction effect btwn them
(1) Define your variables and identify factors (e.g., time and treatment)
(2) Check if F (btwn two variables) is greater than 1, w/ p < 0.05 and η²p > 0.06
(this indicates a significant effect!!)
(3) If assumptions are met, run a post-hoc test (e.g., Tukey’s), and if not, use a
non-parametric test; look for p < 0.05 to find significant mean differences
(3) Control for other categorical variables (EX: sex) by setting them as fixed
factors, and check if treatment remains significant
- Within subjects design
• we perform a repeated measures ANOVA
a) Name variables (factor) and check assumptions
- variances have to be equal bc people are the same in all time points!!!
b) Look at F, p, and eta square to see if there are significant differences and the effect size
c) Post-hoc (Ptukey) to see if there are differences among all means (time points)
Unit 4: Quasi-experimental designs
• SO!! all experimental designs mentioned in units 2+3 may be considered quasi-
experimental if observation units are not individually randomized!!
• participants are “treated” for a particular behavior (dependent variable), while in these
same participants, another behavior that has not been treated is measured and used as a
control (similar to a within subjects design!!)
• the therapy is directed toward the behavior “arguing w/ adults” (the dependent
variable, Y)
- NOTE: one of the main problems of this design is that the variable not being treated
needs to be similar to the one of interest
• Pretest-posttest w/ quasi-control from a previous cohort
- this is also a very specific design
- Example: A school is thinking about eliminating paper materials and instead adopting
multimedia teaching tools (tablets, computer, pdfs, exercises on computers, …)
• before they make all necessary changes, they decide to implement this strategy for one
year in a first year math class
- they get the marks at the beginning and end of the year, and compare them to the
marks of the same first year classroom, but from the previous year (which used
traditional paper materials)!!!!
Unit 5: Single case experimental design
(1) Single case is their own control (w/ repeated measures taken)
e) Change from one condition to the next is not fixed, the researcher waits until the
participant’s behavior in the condition becomes fairly consistent (steady state)
- Basic single-subject research design is the ABA design
• phase A, a baseline (no treatment) is established for the dependent variable
- the baseline phase is a kind of control condition
• when a steady state is reached, phase B begins as the researcher introduces the
treatment
- again, the researcher waits until the measures reach a steady state
• finally, the researcher removes the treatment and again waits until the
dependent variable (A) reaches a steady state
• the basic ABA design can be extended w/ the reintroduction of the treatment (ABAB)
and another return to the baseline (ABABA), etc.
- the amount of generalization is in doubt unless we observe that the effect of the
treatment holds across individuals
• where sometimes the treatment may work for one individual but not for others
- “Simpler” single case designs
• characterized by not having reversal and are not considered “proper” experiments
- two types of designs that are not experiments but are single-case designs:
(1) AB design: no reversal to the baseline condition
- Multiple treatment reversal designs
• single case designs, such as those already mentioned, can be extended to several
“treatments”
- Multiple-baseline designs
• single case designs can be changed to suit several “participants”
- potential problem w/ the reversal design is that sometimes it’s impossible to reverse bc…
a) the treatment is working, where it's unethical not to treat
b) the dependent variable does not return to the baseline (the treatment has lasting
effects)
- BUT!! if the dependent variable changes when the treatment is introduced for
multiple participants (especially when the treatment is introduced at different
times for different participants), then it is extremely unlikely to be a
coincidence
- Time series
• all single case designs have something in common: many measures are taken at different
points in time
- these models are fitted to time series data either to better understand the data or to
predict future points in the series (forecasting)
- this is the statistical decomposition of this time series w/ ARIMA methodology:
• We observe…
(1) small changes in time rather than a big change across timeline
(3) seasonal changes (e.g., observing in the trend that there are more diseases in
winter than summer)
(4) the peaks of random error (particular points where data change for no reason
— we must investigate these points!!)
- time intervals are not exactly the same, and this must be managed statistically
• Latent Growth Modeling: the methodology to establish changes
Unit 6: Ex post facto designs
c) Clinical method
d) Developmental designs
• we only have a DV that has happened (the person has committed suicide)
- Ex: Durkheim is one of the fathers of sociology, and in 1897, he analyzed in retrospect
about 26,000 cases of completed suicides
• he offered an examination of how suicide rates at the time differed across religions
- suicidal rates for Catholics were lower than for Protestants (theorized that this was
due to stronger forms of social control and cohesion in Catholics)
• additionally, he found that suicide was less common among women than men,
more common among single people than those who were romantically partnered,
and less common among those who had children
- further, he found that soldiers commit suicide more often than civilians and
that, curiously, rates of suicide are higher during peacetime than they are
during wars
• all these observations are made once the suicides have already been
committed!!!
• we don't have a comparison group (no similar people to serve as a control
group): the DV is a constant
• EX: people w/o that disease
- this group is called the quasi-control group
- very important to understand that this is NOT an experiment
• the subjects are not randomly allocated to the different categories of the IV
- rather, they are selected bc of their values in the DV
• once the people in the quasi-control group are found, it's important to match them
w/ the key group characteristics in the confounding variables (CV)
- we are doing a retrospective ex post facto design, but we are trying to control for the
confounding variables
- EX: such a research design was used by Plutchik and van Praag (1990)
• they got data on 20 adolescents that had committed suicide
- they measured a lot of variables related to the suicidal act
• later, they looked for friends of these adolescents w/ similar characteristics and
backgrounds and tried to measure the same variables
- Prospective ex post facto designs
• Simple prospective
- sometimes what “has already happened” are the independent variables (IV), but the
dependent variable has yet to occur
• later on, we will study these people regarding the dependent variable
- EX: we want to analyze the effects of motivation on academic achievement
• we select a group of students at the beginning of the semester and measure
them in motivation (IV), grouping them into high-medium-low motivation
groups
- Clinical method
• again, the clinical method is non-experimental, or correlational
- sometimes the “standardized” measurement of attitudes or behaviors alone is not enough
to understand a phenomenon
- EX: very difficult w/ simple observation to know with certainty if a child who
has a conversation w/ a toy believes that the toy is alive or if they're just
pretending
- then, examining the child’s perception of the world through their responses
• cornerstone of the design: to give the child a task and then, after task completion,
interview the child to try to understand the processes involved
- the child is asked if the two balls (masses) are still the same, and is asked to justify
their statements
• these are the responses of a 7 and a half year old boy once one of the clay balls
has been transformed into a flat disk, and the other into a cylinder…
- Child: This one (cylinder) is heavier than the other because it is thicker
- Researcher: But why is it heavier?
- Child: Because it has more clay.
- Researcher: But earlier you told me that they had just as much clay!
- Child: Yes, I said that, but now there are more here than there, because it is
thicker.
- Developmental designs
• any design in which the independent variable is time!!!
- since time can’t be manipulated, this design is an ex post facto prospective design
• Three main developmental designs:
1) Cross-sectional developmental design
• EX: a research team wants to explore the trajectory of recent memory in people older than
65 yrs old
- in other words, they want to see if there’s some sort of evolution over time in older
adults’ recent memory
• the idea: measure recent memory (and covariates) every 5 years for 20 years in
people 65+….
a) Cross-sectional developmental design
- we can compare the means of the people when they started the study (at 65) to
analyze generational differences (between subjects design)
• we can compare the means of people within the same subject at the
beginning and at the end of the study (at 65 and 85) to analyze the cohort
effect (within subjects design)
Unit 7: Survey designs
• one of the defining features of surveys: we have a population (or set of individuals
about whom we want to obtain information) and an impossibility of obtaining info
from all of them, or practical reasons against it
- SO!! we will only use a sample, or a subset of the total number of individuals!!!
- survey designs = non-experimental methods
• no experimental manipulation of variables
- we are dealing w/ a correlational design!!!
c) Time series design: observations of the same individual on one or more variables
many times over time
- Types of survey designs according to method of data collection
• Mail survey
- questionnaires sent by mail to the sample to be answered w/in a certain period of time,
and returned (typically by mail)
• Telephone survey
- an interviewer (or machine) asks questions over the telephone, using a partially or fully
standardized questionnaire or survey
• In-person survey
- surveys where the interviewer and interviewee meet face-to-face or at least an interviewer
is present
• Computer-based methods
- most use the internet, although we could include computerized test administration methods here
too!!
- Components of a survey
• A survey, even as a unitary research design, has distinct identifiable components:
1. Sampling techniques
3. Data collection
4. Statistical analysis
• Sampling procedures
- the first step in sampling: to be clear about the target population
• target population: ideal group of objects (or subjects) that will be subjected to the
survey design
• for this to be possible, the population must be known and manageable!!
(2) Whether the entire population is potentially available during the time the
survey is to be conducted
• SO!! distinguish btwn the target population and the survey population or
sample frame
(1) Purposive or opinion sampling: the researcher selects the sample and tries
to make it representative
- once a certain sample size is known or estimated, it’s necessary to randomly select
the cases that will be part of the sample
• necessary to have a list of all the subjects of the population in order to obtain a
sample from them
• Stratified sampling: researchers divide or classify different subjects into different
subpopulations or strata, and then perform simple random sampling w/in each stratum
- each individual must belong to a stratum, and each individual in that stratum will
have the same probability of being chosen to be part of the sample
• to form the strata, one or more variables are used that are of interest to the
researcher, and/or that are related to the objective of the study
- you may be interested in strata that are not defined by a single variable, but
rather a combination of several variables
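Stratified sampling can be sketched as "split the frame into strata, then draw a simple random sample inside each one". The sampling frame, the two stratifying variables, and the equal allocation of 5 per stratum are all assumptions made for illustration:

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Simple random sample of n_per_stratum units inside every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    return {label: rng.sample(units, n_per_stratum) for label, units in strata.items()}

# Hypothetical sampling frame: (id, gender, school year).
population = [
    (i, "woman" if i % 2 else "man", "first" if (i // 2) % 2 else "second")
    for i in range(60)
]
# Strata defined by a combination of two variables (gender x school year).
sample = stratified_sample(population, lambda unit: (unit[1], unit[2]), 5, seed=1)
for stratum, units in sample.items():
    print(stratum, [unit[0] for unit in units])
```

Drawing the same number from every stratum is only one option; sample sizes proportional to the size of each stratum are another common choice.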
• Cluster sampling: here, clusters, or sets, are defined such that they include 2+ of the
ultimate sampling units to be selected (people, for example)
- what is chosen at random is a random sample of clusters, and w/in each chosen
cluster no sampling is done, but all the target sampling units (people) are selected
• SO!! what’s chosen at random are the clusters, which are usually naturally formed
sets, and not the elementary units to finally be studied
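By contrast, in cluster sampling the random draw happens at the cluster level and everyone inside a chosen cluster is kept. A minimal sketch, assuming hypothetical classrooms as the naturally formed clusters:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; keep every unit inside the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: six classrooms w/ five students each.
classrooms = {
    f"class_{letter}": [f"{letter}{i}" for i in range(1, 6)] for letter in "ABCDEF"
}
selected = cluster_sample(classrooms, 2, seed=7)
for name, students in selected.items():
    print(name, students)
```

Only a list of clusters is needed, not a list of every individual in the population, which is the practical appeal of this procedure.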
• in some cases, translation skills are necessary, as when back translation is done
- in addition, there is now a tendency to consider cognitive aspects, such as the way in
which questions are understood, the ability to remember events, etc.
(b) To ensure that the questions we ask are understood in the same way, and
correctly, by all respondents
(c) To favor a high response rate (EX: a survey of 120 pages won’t favor a
high response rate!!)
(d) To improve the quality of the answers, eliminating biases (social
desirability, lying, mechanical answering, etc.)
• Data collection
- in general, the main methods of data collection are…
a) interviews conducted by interviewers
b) telephone surveys
c) self-administered surveys
d) group surveys
e) mail surveys
f) electronic surveys
• response rate: the percentage of selected sample units that actually answer the survey
and return it to the survey manager
- a low response rate may cause us to take longer to have our results (as we may
need someone superior to convince the sample to participate)
- different analyses may be used depending on the types of variables we are working
w/
• many types of variables are used in different statistical ways, leading us to use
multivariate models to work w/ many variables at once
- Data processing
• the first phase of data analysis begins when the data are entered into a database
- the order of the variables, their possible combinations, their transformations, their
selections, etc. are all aspects that can determine the quality of the results
(2) for quantitative variables: central tendency, variability, skewness, and kurtosis
• scatterplots can help us understand if variables are related to each other (high
Pearson coefficient)
• this leads us to understand the variables, as well as giving important information on the
type of analyses that will be necessary later (EX: parametric or non-parametric)
- we want to know if our data is “clean” to start w/ analysis, as there may be multivariate
outliers
- descriptive statistics allow us to understand what fits better for our analysis
• multivariate description helps us to understand the associations based on
many variables
- where it’s needed to control for many variables and decide the type of
analysis
• when we split the relationship btwn men and women, we can observe that
neurotic women were more hostile against other women, while more
neurotic men were less hostile
- here, a distinction can be made btwn bivariate relationships (relationships of
variables taken in pairs) and multivariate relationships btwn variables, which may
involve the simultaneous analysis of a multitude of variables
• Interdependence methods:
- not distinguishing btwn dependent and independent variables
• to identify which variables are related, how they’re related, and why
(1) for quantitative variables: principal component analysis, factor analysis,
cluster analysis, and similar (EX: latent profile analysis) or multidimensional
scaling
• Dependence methods:
- assuming that the variables analyzed are divided into two groups: dependent and
independent variables
• to determine whether and how the set of independent variables affects the set of
dependent variables
(c) B = slope; how much Y changes w/ one-unit change in X
2. Basic or reduced report: a reduced, non-technical report of all of the above that can
be understood by a wide audience
(a) Categorical variables: only percentages (EX: for the variable sex, analysis
simply involves percentages)
ii) variability: grades in this subject are not highly variable bc at the
beginning the scores are not really good; low levels of variability are
going to make future predictions difficult (dependent variable will be
very homogenous in the future)
- important to know what’s the minimum and maximum of every variable if it’s a
Likert scale
- Average: knowing the mean grade for a course can help interpret student
performance, where if the course is known to be very challenging and the mean
grade is 90, this indicates that most students perform exceptionally well
• from this, you can identify who falls into the “excellent” category (those
scoring around or above 90) and what can be considered “standard” or
typical performance for the course
- Symmetry: can observe some negative asymmetry (skew) in the results, but the
distributions stay roughly symmetric around the mean grades
- Kurtosis: very low in all of them, but not in grades 4 (kurtosis = 14.1)
• SO!! will have problems in future research in the concrete variable
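A minimal sketch of the descriptives discussed above (min, max, mean, variability, skewness, kurtosis), using plain moment formulas. The grades list is hypothetical, and statistical packages often apply small-sample corrections, so their values may differ slightly from these.

```python
from statistics import fmean

def describe(values):
    """Min, max, mean, SD, skewness and excess kurtosis from raw moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (divides by n)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,           # < 0: long tail toward low scores
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # > 0: heavier tails than normal
    }

# Hypothetical grades: most students score 90, two score very low
grades = [90] * 18 + [20, 10]
summary = describe(grades)
```

A couple of extreme low scores are enough to produce negative skewness and a high kurtosis, which is the pattern flagged as a problem for later analyses.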
2. Obtaining a graphical display to visually observe these variables
b) Density plot
(median), percentiles (variability), max and min scores, and variability
• Pearson and Spearman are not that different, they provide similar results (but
Spearman’s suitable for non-parametric data!!)
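To make the Pearson/Spearman comparison concrete, here is a small sketch in which Spearman is computed as Pearson on the ranks. The data are hypothetical: a perfectly monotone but non-linear relation, where the two coefficients are close but not identical.

```python
from statistics import fmean

def pearson(x, y):
    """Pearson correlation: linear association between raw scores."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows with x, but not linearly
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because Spearman only uses the order of the scores, it reaches its maximum here, while Pearson stays slightly below it.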
• T-tests performed to test the association btwn some factor and the variables
- EX: Is sex a factor associated w/ grades?
• in this case, we observe that variances are homogenous and that there are no
gender differences at all (all p > 0.05)!!
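A minimal sketch of the t statistic for two independent groups, assuming equal variances (as in the case above, where variances are homogeneous). The two groups of grades are hypothetical. The p-value needs the t distribution, which the Python standard library does not provide, so only the statistic and its degrees of freedom are computed here.

```python
from statistics import fmean, variance

def student_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
    standard_error = (pooled * (1 / na + 1 / nb)) ** 0.5
    t = (fmean(group_a) - fmean(group_b)) / standard_error
    return t, na + nb - 2

# Hypothetical mean grades (0-100) for two groups defined by sex
grades_group_1 = [70, 75, 80, 85, 90]
grades_group_2 = [68, 76, 79, 84, 88]
t_value, df = student_t(grades_group_1, grades_group_2)
```

A t value this close to 0 w/ so few degrees of freedom would not reach p < 0.05, which is the "no differences" pattern described above.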
• we only have one DV (mean grades from 0-100) and the predictors by which we
will predict the DV
- we follow a bivariate model: have to ask ourselves how many are statistically
associated to the p-value level
(1) Know which are the predictors and how to evaluate them
• we can also perform a t-test for independent samples to control for sex
- We observed that there is no significant association btwn mean grades and
sex
• SO!! we focus on the predictors and if they are important and relevant or
not!!
• Important to take into account that…
- the signs say that there’s no direct correlation, BUT!! it doesn’t mean that
there’s no correlation; there can still be a correlation!!
• there are 2 predictors that have no effect (sex and autonomy) but there are
small effects of the variables, so they add something to the overall prediction
(some are positive and others negative)
• these factors are related to each other, so we are going to calculate the
multivariate model!!
• slopes for each predictor tell us about the association w/ the DV controlled
by the rest of the predictors
- we observe that the overall prediction is poor (R² = 0.0459, so the model explains
only about 4.6% of the variance in the DV) and only two predictors are significant
(p < 0.05)
- collinearity statistics: when analyzing predictors, it’s important to check for
collinearity— this happens when 2+ predictors are highly correlated, where
there’s an overlap btwn predictors, making it hard to determine their individual
effects
• Variance Inflation Factor (VIF) to assess this: if it’s close to 10, potential
problem, where the value is significant and the factor is too heavily
associated w/ one or others (so we really don’t need this
factor!!)
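A minimal sketch of the VIF idea, restricted to the two-predictor case, where VIF = 1 / (1 - r²) and r is the correlation btwn the two predictors. With more predictors, each one is regressed on all the others and r² is replaced by that regression's R². All data below are hypothetical.

```python
from statistics import fmean

def pearson(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF for a model w/ exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors
predictor_a = [1, 2, 3, 4, 5, 6, 7, 8]
unrelated = [1, -1, -1, 1, 1, -1, -1, 1]                    # r = 0 w/ predictor_a
near_copy = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]        # almost the same scores

vif_unrelated = vif_two_predictors(predictor_a, unrelated)
vif_near_copy = vif_two_predictors(predictor_a, near_copy)
```

An unrelated predictor gives a VIF of 1 (no inflation), while a near-copy gives a value far above 10, signalling that one of the two is redundant.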
- we observe that the Q-Q plot is not bad (all points are close to
the middle line)
5. Logistic regression (binary dependent variable)
• the dependent variable is not always quantitative: quite common that DV is binary
(something happens or doesn’t), as in our scenario (pass/fail)!!
• BUT!! w/ a binary variable, math isn’t easy— we may try to predict the
probability of passing the exam, or p(Y=1), but the functional relation w/ a
probability of several predictors is exponential…
• in order to linearize the relationship, logarithms are used, and this is the logistic
regression!!
(a) Positive B estimate: the probability of 1 (pass the exam) increases when X
goes up
(b) Negative B estimate: the probability of 1 (pass the exam) decreases when
X goes up
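The linearization mentioned above can be sketched as follows: the probability follows an S-shaped (logistic) curve, and taking the logarithm of the odds turns it back into a straight line, B0 + B·X. The coefficient values are hypothetical.

```python
import math

def predicted_probability(b0, b1, x):
    """p(Y=1) for one predictor: 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_odds(p):
    """Logit: log of prob. of 1 over prob. of 0."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients: intercept -2, positive slope 0.5
b0, b1 = -2.0, 0.5
p_low_x = predicted_probability(b0, b1, 2.0)
p_high_x = predicted_probability(b0, b1, 6.0)   # larger X, positive B

# On the log-odds scale the relation is linear in X
logit_at_3 = log_odds(predicted_probability(b0, b1, 3.0))  # equals b0 + b1*3
```

W/ a positive B the probability of 1 goes up as X goes up; flipping the sign of B reverses the direction.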
• Odds ratios go from 0 to infinity…
(a) 1 means no relationship btwn the DV and predictor (the odds of passing
don’t change when the predictor changes)
(b) A value of 2 means that when the predictor changes 1 unit, the
odds of 1 (pass) double!!
(c) If the odds ratio is 4, the odds of the DV (EX: passing the exam) are
multiplied by 4… if it’s 10, we multiply the odds by 10…
(d) 0 is the maximum negative association (the closer the value is to 0, the
higher the negative effect, bc 1 means no effect); the closer we are to 0,
the closer we are to the strongest negative association
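A short sketch of how an odds ratio relates to the B estimate and to probabilities. The odds ratio is exp(B), and it multiplies the odds (prob. of 1 / prob. of 0), so the change in probability depends on the starting point. The coefficient is hypothetical, chosen so that the odds ratio is exactly 2.

```python
import math

def odds(p):
    """Odds = prob. of 1 / prob. of 0."""
    return p / (1.0 - p)

def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1.0 + o)

def shift_probability(p, odds_ratio):
    """Probability after a one-unit increase in the predictor."""
    return prob_from_odds(odds(p) * odds_ratio)

b = math.log(2.0)          # hypothetical logistic B estimate
odds_ratio = math.exp(b)   # odds ratio = exp(B)

p_from_half = shift_probability(0.50, odds_ratio)  # odds 1 -> 2
p_from_high = shift_probability(0.90, odds_ratio)  # odds 9 -> 18
```

Starting at 0.50 the probability rises to about 0.67, and starting at 0.90 it rises to about 0.95: the odds double in both cases, but the probability itself does not.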
• at the conceptual level, logistic regression works the same as linear regression, but
the results we get are in a scale that’s impossible to understand (logarithms or
prob. of 1/ prob. of 0)
• Statistical analysis in survey designs: Example 2
- survey of 709 Valencian high-school students about prediction of sexist attitudes and
behaviors
c) Age (quantitative)
h) Extraversion (quantitative)
i) Agreeableness (quantitative)
j) Conscientiousness (quantitative)
k) Neuroticism (quantitative)
l) Openness (quantitative)
- Statistical analysis:
1. Descriptive statistics
• we can focus on…
(a) the range from the minimum to the maximum in order to know the parameters
(b) skewness
(c) kurtosis
• we can observe the graphs and find things like that there’s more homogeneity
from 0-3 than from 3-6
- when looking at the violin box plot, we observe where most people are and how the
data is distributed
- if we split by gender, we observe that men have higher hostile sexism than women
• we can do it w/ all variables we are interested in, and we can split by other
variables (e.g., age)
- when observing the box plots of the living context (urban, metropolitan, or rural),
we observe that there are no big differences btwn them
2. Bivariate relations
• once we have checked the descriptive statistics, we start w/ the bivariate analysis (in
Jamovi: analysis — correlation matrix)
- we obtain the correlation matrix and look for the highest correlations
• we only have to look at the * because…
(a) * = p < 0.05
- exploration w/ a scatterplot:
(1) X = neuroticism
(3) regression line = linear
• we observe that neurotic women are more sexist than women who are not neurotic,
but that low neuroticism men are even more sexist!!
- we perform more tests
• in order to analyze the categorical and quantitative variables, we must perform an
ANCOVA but it’s very difficult so we will use quantitative regression
• we may change the reference people being examined (EX: change urban for
metropolitan in the context variable)
- reference group: the important group w/ which we want to compare the others
• if we analyze the estimated marginal means of the three variables (sex,
impulsivity, and context), we obtain these graphs comparing the DV (hostile
sexism) of each group!!
- when analyzing the significance of the predictors, we look at the p-values (must
be p < 0.05) and the standardized estimates
• the main predictor of hostile sexism is sex, since it has p < 0.001 and the
largest standardized estimate in absolute value (-0.7737), so it’s significant!!
Unit 8: Observational studies
What, how, who, when, and where to observe; and assessments of observations
- Simple research problem
• in 1923, Piaget observed that pre-school children often speak alone, even in a group
- in other words, they talk to themselves!!
• he called this egocentric speech
- theorizing that this speech was part of a developmental process, where children first
learn to speak and then understand that speech is for communication
• later, Vygotsky, observing the same evidence, called it private speech, and
theorized the opposite
- Observation
• the action or process of carefully watching someone or something
- in order to make an observation scientific, something else is needed: systematic
observation!!
(5) Where to observe: natural setting or lab?
• What to observe
- researchers in a good observational study have to…
a) decide which dependent variable or outcome to be measured
• each observer needs to know how each behavior should be considered and
categorized
(2) Intra-rater (w/in-rater) reliability: how consistently the same rater can assign a
score or category to the same subjects; conducted by re-scoring video footage
• researchers as observers: must be decided if the researcher (the one that develops
the research and its theoretical framework) is going to be an observer
• Who to observe
- first thing to decide: the level of analysis
(1) Group level
• this time frame may be further divided into periods w/ and w/o observation
- EX: the chosen time frame is one week of teaching lessons in the first year of
high school in a particular group of students
• it’s decided that 5 measures w/ a time lapse of 10 minutes will be taken each
day
- when a time frame is divided into different times of observation, this can
be done randomly or non-randomly (just as w/ sampling of subjects, but
sampling of time)
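Time sampling can be sketched like subject sampling. The example below assumes that the 10 minutes are the spacing btwn the starts of consecutive measures and that a lesson lasts 50 minutes; both are assumptions for illustration, as are the function names.

```python
import random

def systematic_times(start_min, n_measures, gap_min):
    """Non-random time sampling: fixed spacing btwn observation points."""
    return [start_min + i * gap_min for i in range(n_measures)]

def random_times(lesson_length_min, n_measures, seed=0):
    """Random time sampling: observation minutes drawn at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(lesson_length_min), n_measures))

# 5 measures per day, 10 minutes apart, starting at minute 0
fixed_schedule = systematic_times(start_min=0, n_measures=5, gap_min=10)

# 5 randomly chosen minutes w/in a hypothetical 50-minute lesson
random_schedule = random_times(lesson_length_min=50, n_measures=5)
```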
• Where to observe
- on the continuum from natural setting to laboratory conditions, it must be decided where
to observe
- Example of reliability (agreement)
• two individuals observe the occurrence of a behavior in 10 time intervals
- 3 possible categories:
1. Not happening
- Inter-observers (inter-rater) agreement
• Cohen’s kappa coefficient (κ, lowercase Greek kappa)
- statistic used to measure inter-rater reliability (and also intra-rater reliability) for
qualitative (categorical) items
• it’s a more robust measure than simple percent agreement calculation, as κ takes
into account the possibility that the agreement occurred by chance
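A minimal sketch of Cohen’s kappa for two observers, κ = (observed agreement - chance agreement) / (1 - chance agreement). The codes below are hypothetical: two observers, 10 time intervals, 3 categories labelled 1, 2, 3.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement btwn two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # chance agreement: product of each rater's marginal proportions
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes for 10 time intervals
observer_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
observer_2 = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]

percent_agreement = sum(
    a == b for a, b in zip(observer_1, observer_2)
) / len(observer_1)
kappa = cohens_kappa(observer_1, observer_2)
```

The observers agree on 8 of 10 intervals (80%), but since 34% agreement is expected by chance alone, κ comes out at about 0.70, lower than the simple percent agreement.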
• Validity
- Content validity
• whether the selection of behaviors in the code is a representative sample of the
phenomenon to be observed (judged by experts)
- Construct validity
• the extent to which the observation code is congruent w/ the theory from which the
problem is formulated
- Criterion validity
• the degree of sensitivity of the observation code to variations in the phenomenon under
study (relationship to external measurements)