
Research designs in psychology: Unit 1.

Scientific Research

Introduction to research designs


- Science: derived from the Latin “scientia,” or knowledge
- Scientific inquiry: a way of comprehending nature (Thurstone, 1947)
• trying to find truth, where truth is a working hypothesis held until further notice, until it’s
replaced by another hypothesis

- Scientific orientation: the essence of such orientation is a critical attitude toward findings
a) expecting constant change and improvement

b) public nature

c) no certainty

d) willingness to keep questioning our knowledge

e) comparing rival hypotheses

- Basic vs. Applied research


• Basic science: research in the pursuit of knowledge and understanding
• Applied science: research aimed at solving practical problems
- for some researchers, socio-behavioral research is inherently applied
• BUT!! it has some peculiarities…
(1) the object under study has free will

(2) cultural and temporal relativity of the laws

(3) uniqueness of each individual and non-repeatability of experience

(4) complexity of the object under study

(5) ethical issues when experimenting on humans

(6) reactivity and reflexivity of humans

(7) ability to “speak” w/ the object of study

- Definitions and variables
• Definition: a statement about the meaning of a word
- Theoretical definition: a scientifically meaningful concept or construct
• a scientific concept or construct has to be part of an implicit or explicit theoretical
framework that explains its relations to other concepts

- BUT!! in order to test hypotheses derived from theory, we need empirical definitions…

- Empirical definition: relating a construct to reflective or formative indicators


a) Reflective indicators: caused by the construct

• EX: in Depression, indicators could include pessimism, suicidal ideations, guilt, sleeping problems, lack of interest, irritability, lack of appetite, fatigue, lack of concentration, lack of enjoyment, tearfulness, etc.

- need for covariation btwn variables…


b) Formative indicators: the cause of the construct

• EX: in relation to socio-economic status, indicators could include income, education, and occupation

• indicators have to be measurable, and it is the indicators that constitute the variables
- broadly speaking, scientific inquiry is the pursuit of relations among variables
• a variable must…
(a) have at least 2 values or categories (EX: sex—male/female)

(b) assign only one value to each element in the population at a given time

• Classification of variables
- Measurement perspective: categorical, semi-quantitative, quantitative
- Research design perspective: dependent (criterion), independent (predictor), moderator,
mediator, and control

- Problems
• Problem: a statement about the relations btwn 2+ variables, usually in an interrogative form
- problems can come in diff formats….
• EX:
- Format 1: What is the relation btwn school engagement and academic achievement?
What is the relation btwn sex and physical violence?

• no distinction btwn independent and dependent variables!!!


- Format 2: How do students of differing levels of engagement differ in their
academic achievement?
How do males and females differ in physical violence?

• no clear distinction btwn independent and dependent variables!!!


- Format 3: What is the effect of school engagement on academic achievement?
What is the effect of sex on physical violence?

• clear distinction btwn independent and dependent variables!!!


• once you have a research problem, you need two more things:
I. to know what is already known about this problem (literature review!!)

II. formulation of hypotheses

• Hypotheses and hypothesis testing


- Hypothesis: a conjectural statement about the relation btwn 2+ variables
• relationship neither implies nor fails to imply causality
- EX:
• Problem: What is the relation btwn school engagement and academic
achievement?
Hypothesis: There is a positive association btwn school engagement and academic
achievement.

• Problem: What is the effect of school engagement on academic achievement?
Hypothesis: There is a positive effect of school engagement on academic
achievement.

- hypotheses guide what to observe, what variables to relate w/ one another, and how to
relate them

• scientists have recognized the importance of developing several alternative hypotheses


about a problem of interest

- EX: Kepler made 19 hypotheses about the motion of Mars, and calculated the results of all of them, before finally establishing that a planet's orbit is an ellipse

- Hypothesis testing: examining the evidence implied by them


• hypotheses are rejected if evidence is not consistent w/ them
• Statistical tests
- tools to assess the amount of evidence contained in the data regarding the tenability of the
hypotheses under consideration

• always complemented by effect sizes


- statistical tests deal w/ the probability of observing the data if the null hypothesis were true, while broadly speaking, effect sizes focus on the substantive importance of the results

• Participants
- in order to statistically test hypotheses, must get observations
• observations in psychology usually mean getting participants!!
- Three necessary characteristics of the individuals to be observed:
(1) Representativeness: participants are representative if they resemble other
individuals in the population to be analyzed

• random sampling (seen in Statistics) makes samples representative of their populations

(2) Suitability: the adequacy of the participants in relation to the phenomenon to
be studied

• EX: if we are interested in studying patients w/ depression, it may be necessary to have both depressed and non-depressed people in order to be able to compare them

(3) Accessibility: the choice of participants must take into account the space and
time limitations of the research

• Variables (in depth)


- once we have a research problem, w/ the corresponding hypothesis, we should clearly state the role each variable plays in the research

• based on their role, variables may be…


a. Dependent: a variable we want to explain

- where we expect changes as a consequence of changes in other (independent) variable(s)

b. Independent: the explanatory variables, causing the changes (variations) in the dependent variable

- a supposed or assumed cause


- Yerkes-Dodson law (1908): relationship btwn arousal and performance
• performance as increasing w/ physiological or mental arousal, but only up to a point
- when levels of arousal become too high, performance decreases

- BUT!! variables are not always dependent/independent…
a) Strange (extraneous) variables: all variables that can influence the dependent variable(s), despite not being the independent variable(s)

b) Confounding variables: a type of strange variable whose effects are mixed (confounded) w/ the levels of the independent variable

• EX: heart attacks found to be more prevalent on Mondays


- a plausible explanation is that the day of the week causes heart attacks
• BUT!! other potential explanations depend upon confounded variables
- maybe stress levels at work increase on Monday after a weekend (stress
being the confounding variable)

- poor habits (poor diet, drug and alcohol abuse) are more likely on weekends, so a heart attack becomes more likely on the following Monday (poor habits being the confounded variable)

- Both strange and confounding variables must be controlled for!!!


• Other types of variables:
c) Mediators: variables that intervene in the causal process btwn an independent and
dependent variable

- EX: we may assume that social support from a teacher causes better academic
achievement in students

• BUT!! this may be due to the teacher's behavior increasing students' motivation, where this produces better achievement

d) Moderators: variables that affect the relationship btwn an independent variable
and a dependent variable

- EX: we may assume that group work in the classroom causes better academic
achievement, but ONLY if the social climate is positive (thought of as an
interaction)

- Control and validity


• central to any research design!!
- even more central for explanatory research than they are for predictive research
• Explanatory research: trying to explain the variability of 1+ dependent variables (DV)
by attributing it to its potential or presumed causes (independent variables, IV)

- as many potential variables may explain a certain phenomenon (DV), the researcher
needs to control for these other variables…

• Forms of control:
(a) Manipulation

(b) Elimination or inclusion

(c) Statistical

(d) Randomization

• Control
A. Manipulation

- where the researcher chooses the levels or categories of the IV the participants are
going to receive

• the researcher assigns the levels of the IV to the participants

- EX:
Three doses of a drug: 0ml / 10ml / 20ml
Teaching methods for math: Online vs. traditional

• this type of control is only possible for certain variables and in certain types of
research designs (experimental and quasi-experimental)

- Uniformity in the implementation: crucial for manipulation


• i.e. the manipulation must be described precisely enough that any other researcher can replicate it (replicability in science)

B. Elimination or inclusion

- scientists try to identify and isolate strange variables that may confound the causal
effects of the IV

• can either eliminate these variables from the design OR include them
- eliminating a variable is to make it constant
• including a variable is to estimate its effects w/in the analyses of the research
design

- EX: a scientist wants to evaluate the effect that three doses of an antidepressant have on the level of depression, suspecting that treatment w/ this antidepressant may be differentially effective in men and women

- SO!! she wishes to control for sex as a possible confounding variable…


(a) Control through elimination by studying only depressed
women

(b) Control through inclusion by adding sex in the design as a factor (men/women)

- BUT!! elimination, as opposed to inclusion, affects the generalizability of the effects


• elimination compromising the generalizability, as only one category (a constant)
is included

C. Statistical control

- strange variables can be quantitative as well as categorical


• when this type of variable is included in the analysis of the research design, it’s
usually called statistical control, and the confounding variable is called
“covariate”

- SO!! statistical control consists of the inclusion of a strange variable as a covariate

• EX: a researcher wants to see if adding graded exercises at the end of each lesson affects academic achievement

- SO!! they compare students in a class where grades are dependent only on
the final exam w/ students where exercises are added to the final grade

- thinking that motivation could function as a confounding variable, the


researcher includes motivation toward the subject as a covariate

D. Randomization

- when there’s a large number of potential confounders, these confounders have not
been registered (measured), or they’re simply not known… elimination, inclusion,
or statistical control are not adequate anymore!!

• here, if possible, randomization may be used to control for confounders


- Randomization: a random process in which participants (or units measured) are
allocated randomly to different levels of the IV

• securing that, in the long run, all confounders will be equally distributed across the different categories of the IV— so their effect will be constant across those levels

- randomization is done w/ random numbers, tables, and/or generators (e.g., statistical software, etc.)

- EX: a scientist, evaluating the effect of three doses of an antidepressant on depression levels, suspects that there may be many confounding variables— SO!! she decides to randomly allocate her participants into different groups w/ a random number generator
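The random-allocation step just described can be sketched in a few lines of Python. Everything here is illustrative: the participant labels, the dose labels, and the `randomize` helper are hypothetical, not part of the notes.

```python
import random

def randomize(participants, groups, seed=None):
    """Randomly allocate participants to equally sized groups."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    # deal the shuffled list round-robin into the groups
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

participants = [f"P{i:02d}" for i in range(1, 31)]  # 30 hypothetical subjects
allocation = randomize(participants, ["0ml", "10ml", "20ml"], seed=42)
for dose, members in allocation.items():
    print(dose, len(members))  # 10 per condition
```

Fixing the seed makes the allocation reproducible, which also helps when documenting the manipulation precisely enough for replication.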

• Validity
- highly related to control
• Campbell and colleagues initially considered internal and external validity
- later, they distinguished btwn statistical conclusion validity, construct validity,
internal validity, and external validity

• statistical conclusion validity as related to data analysis (statistics)


- construct validity to psychometrics
• internal and external validity are those properly related to research design…
- Internal validity
• the adequacy of assertions regarding the causal connection of the independent
variable(s) w/ the dependent one(s)

- i.e. how sure we are that the relation is causal


• Major threats to internal validity:
(a) History: any factor occurring during the research that may alter the results

- EX: a longitudinal study linking conservatives to voting for right-wing parties— during the study, a great scandal on unfair commissions of a right-wing party is uncovered

(b) Maturation: changes in the people analyzed during the time of the study

- EX: a study on intellectual capabilities in early adolescents


(c) Testing: people measured several times may change their performance

- EX: practice
(d) Instrumentation: any change in the measurement instruments, including
malfunctions

- EX: differential functioning of tests among subgroups— where some tests work well only for certain populations

(e) Regression toward the mean: the tendency for extreme outcomes to be followed by more average results; usually occurring when two variables are not perfectly correlated—while high performance on one occasion is likely to be followed by above-average performance, it's not usually as extreme the next time
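Regression toward the mean is easy to demonstrate by simulation. The sketch below uses invented numbers and assumes each occasion's score is a stable ability plus independent noise (so the two occasions are imperfectly correlated); it then follows the people who scored extremely high on the first occasion.

```python
import random
import statistics

random.seed(1)
n = 10_000
# score on each occasion = stable ability + independent measurement noise
ability = [random.gauss(0, 1) for _ in range(n)]
first = [a + random.gauss(0, 1) for a in ability]
second = [a + random.gauss(0, 1) for a in ability]

# people who were extreme (very high) on the first occasion...
top = [second[i] for i in range(n) if first[i] > 2.0]
# ...score above the population mean (0) the second time,
# but well below their first-occasion cutoff of 2.0
print(f"second-occasion mean of the extreme group: {statistics.mean(top):.2f}")
```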

(f) Selection: experiments conducted w/ pre-existing groups (quasi-experiments) may lead to group differences affecting the results

(g) Mortality: any attrition of people during the research (missing data for any
reason)

(h) Diffusion or imitation of treatments: people exposed to a treatment (including controls) learn about another treatment that is not meant for them

- External validity
• the generalizability (stability) of the research findings to different populations, times,
or settings

- EX: a researcher has found that a program promoting transactional leadership improves sales of a multinational company, and wonders if this finding can be generalized to medium-sized companies

• Interaction effects: factors jeopardizing external validity


- when a relation btwn 2+ variables changes as a consequence of another factor (e.g.,
a change in population, setting, time, etc.)

- Scientific research
• Steps in all research (no matter what design is used):
1. Formulation of the problem

2. Formulation of a hypothesis (relationship btwn variables)

- hypothesis as the experimenters’ judgment of the results they expect to find in their
research

• must be precise and testable (empirical)


- usually tested in terms of a null hypothesis

3. Data collection

- sample selection
• considering two issues:
(a) representative (sampling)

(b) large enough to make adequate conclusions (power)

- Choice of research design


(1) What variables are going to be manipulated—measured/controlled?

(2) How are they going to be quantified?

(3) Choice of apparatus and materials

4. Application of appropriate statistical analyses

- statistical analysis beginning w/ measurement of raw data and its description


• choice of analytical technique
- checking conditions of use (statistical assumptions)
• application of the analytical technique
5. Inference of the type of relationship btwn variables

- assessment of statistical analyses in relation to the hypothesis


6. Preparation of a report

- conclusion, discussion, and assessment of the generalizability of results


• Research design
- Design: strategy or sequence of decisions about how to collect, sort, and analyze data
• involving decision-making about how to solve a research problem
- broadly speaking, research designs can be divided according to the nature of relations
they try to tap:

a) Designs that try to EXPLAIN: experimental and quasi-experimental

b) Designs that try to PREDICT: correlational, surveys

c) Designs that try to DESCRIBE: observational, case studies

• all research designs try to accomplish the MAX-MIN-CON principle


(1) MAXimize the effect of the IV on the DV (primary systematic variance)

(2) MINimize error variance (random error)

(3) CONtrol the effects of confounding variables (secondary systematic variance)

- Experimental designs
• the IV is manipulated and subjects (participants) are randomized
- Quasi-experimental designs
• the IV is manipulated but individual subjects are NOT randomized
- Non-experimental (correlational) designs
• there is no manipulation and no randomization

- Final note about research designs


• analytical (statistical) solutions for experimental and quasi-experimental designs are almost
the same

- analytical (statistical) solutions for non-experimental designs are usually different— involving more complex statistical multivariate models

Unit 2 and 3: Experimental designs

Treatment-control post-measure only, RCT, etc.


- Experiment
• a study where at least one variable is manipulated, and units are randomly assigned to the
different levels or categories of the manipulated variable(s)

- in order to better understand the main types of experiments, we will follow the notational
system by Campbell and Stanley (1963):

a) O stands for an observation, a measurement

• numbered subscripts represent different time points (the same subjects are observed on a DV at different times)

b) X stands for treatment (manipulated IV)

• numbered subscripts represent different categories or levels of the IV


c) Different rows refer to different groups

• EX: two groups, one treated and one untreated (control) measured before and after the
treatment would be represented as…

- O1 X O2
O1 O2

• representing an experimental design where we have an observation of a randomized group of people before the treatment X (O1) and then after the treatment (O2)

- second row represents the control group w/o treatment application


• at the same time point, we measure the experimental and control group
i) as groups are randomly created, expectation that both groups will be
naturally balanced, where O1 in both rows will be similar (no previous
differences in the pretest)

ii) since row 1 is treated (applying X), expectation for a change in row 1

iii) can test mean differences btwn O1 and O2 in the same row to observe
if there are any changes due to the treatment

iv) expecting no change btwn O1 and O2 in the second row (though some changes may occur naturally in the DV), so!! must compare changes btwn both groups

v) if they are equal in O1, any change in O2 is due to the treatment

Types of experimental designs

- Treatment-control post-measure only


• aka pre-experimental design: heavily relying on good randomization
- X O2
O2

• in this design, only the measures after the treatment has ended are taken
- in the long run, randomization guarantees internal validity
• equality of groups before treatment can’t be analyzed: will never know if groups
are different before treatment bc of missing information (why it’s called a pre-
experiment!!)

- there’s usually a control group (w/ no treatment), BUT!! sometimes this isn’t
possible and the group is simply a comparison group

• Statistical analysis: Festinger and Carlsmith


- college students performing very dull tasks for an hour are then randomly paid either 1 or 20 euros to tell a prospective participant that the task was very interesting

• testing if different payments would differently affect attitude on the task


- after the experiment ends, attitude is measured in both groups
• goal of testing Festinger’s cognitive dissonance theory
- NOTE: There is no control group in the sense of no treatment
• BUT!! there is manipulation, randomization, and only post-measure

- they can’t have a pretest bc there’s no attitude toward the task (DV) before
the task is done

- Hypotheses
a) Alternative hypothesis (H1): In the attitude test, the mean of the 1-euro group will be higher than the mean of the 20-euro group (the dissonance prediction).

b) Null hypothesis (H0): The mean of the 1-euro group will be less than or equal to the mean of the 20-euro group.

• we will treat the IV as a categorical variable (two groups) even if it's actually quantitative (amount of money paid)

- instead of using 1 and 20 to name the categories, we could use 0 and 1


• before making any statistical analysis, we can observe that, in general, the scores in the attitude test are higher in the 1-euro group than in the 20-euro group

- where the 1-euro group shows a better attitude toward the task (as predicted by cognitive dissonance theory)

- Simple statistical analysis: Comparison of two means
• two groups that are independent of each other (the observations of all the people involved are independent)
- can perform a t-test (Student’s t-test)
• Assumptions in a t-test:
1. Normality in the dependent variable (quantitative variable) — use Mann-Whitney in case of extreme non-normality

2. Homogeneity of variance of both groups (2 categories) — use Welch in case of heterogeneous variances

3. Independent observations

3. Independent observations

• different means for each group!!

• the t-value we are interested in is always the first value

- we have 2 values for t, and in this case we are lucky and they’re the same (bc
the standard deviations of both groups are similar, both groups have similar
variance)

• BUT!! usually they are different and we want the first one!!
• variances are homogeneous (equal)

- Levene test: p = 0.731 > 0.05


• regarding the significance of difference btwn groups (p-value in the t-test)…
(a) if p < 0.05, reject the null hypothesis (H0) in favor of H1

(b) if p ≥ 0.05, fail to reject the null hypothesis (H0)

- since p < 0.001 < 0.05, we reject H0 in favor of H1!!


• large effect size: Cohen’s d = 2.57
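The t-statistic and Cohen's d reported above can be reproduced by hand. The attitude scores below are invented for illustration (the real Festinger and Carlsmith data are not in the notes), and `t_test_and_d` is a minimal sketch of the pooled-variance (Student) formulas.

```python
import math
import statistics

def t_test_and_d(x, y):
    """Student's t for two independent groups (pooled variance) plus Cohen's d."""
    nx, ny = len(x), len(y)
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    t = (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))
    d = (mx - my) / math.sqrt(pooled)  # standardized mean difference
    return t, d

# hypothetical attitude scores for the two payment conditions
one_euro = [8, 9, 7, 9, 8, 10, 9, 8]
twenty_euro = [4, 5, 3, 5, 4, 6, 5, 4]
t, d = t_test_and_d(one_euro, twenty_euro)
print(f"t = {t:.2f}, Cohen's d = {d:.2f}")
```

With these invented scores the 1-euro group mean is higher, mirroring the pattern described in the notes; in practice the p-value for t would come from statistical software.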

- Categorical IV post-measure only


• X1 O2
X2 O2
X3 O2

- similar to the previous design


• only adding more groups!!!

- in the long run, randomization guarantees internal validity
• there may or may not be a control group
• Statistical analysis: Janis and Feshbach
- giving talks to high school students about dental hygiene
• 3 conditions in manipulated IV:
- information on dental hygiene followed by consequences of bad hygiene w/ either…
(1) strong fear appeal

(2) moderate fear appeal

(3) low fear appeal

• students randomly allocated to these three conditions


- after the talk, they filled out a questionnaire to assess how much information
they could recall about the talk

• NOTE: No control group in the sense of no treatment


- BUT!! there’s manipulation, randomization, and only post-measure
- Testing assumptions:
a) Normality is assumed, as p = 0.924 > 0.05 in the
Shapiro-Wilk test

b) Q-Q plot shows reasonable normality

c) Since p > 0.05 in Levene test, variances can be considered equal

- Descriptive statistics:

• we can observe means and SD for all conditions, and the graph of means w/ confidence
intervals

- One-factor ANOVA
• F-statistic is statistically significant (p < 0.05)

- SO!! we consider that there are mean differences


• Tukey’s post hoc test (homogeneous variances): differences among all conditions

- Note: if variances are heterogeneous, Games-Howell test is adequate!!


• when normality assumption is greatly violated and/or the dependent variable is
ordinal w/ few categories, a non-parametric test is possible (Kruskal-Wallis test)

- since p < 0.05, there are statistically significant mean differences!!!

• and again, post-hoc test shows differences among all conditions
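The one-factor ANOVA F-statistic is just the ratio of the between-group to the within-group mean square. The sketch below computes it from scratch; the recall scores for the three fear-appeal conditions are invented (the Janis and Feshbach data are not in the notes).

```python
import statistics

def one_way_anova_F(groups):
    """F = between-group mean square / within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# hypothetical recall scores per fear-appeal condition
strong = [12, 11, 13, 10, 12]
moderate = [15, 14, 16, 15, 14]
low = [18, 17, 19, 18, 17]
print(f"F = {one_way_anova_F([strong, moderate, low]):.2f}")
```

A large F relative to its F(df_between, df_within) distribution is what the significant p-value in the notes reflects; the post-hoc tests (Tukey, Games-Howell) then locate which pairs of conditions differ.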

- Pretest-posttest treatment and control design
• aka randomized experimental/clinical trial
- O1 X O2
O1 O2

• in the long run, randomization guarantees internal validity


- there is a control group!! although it may only be a comparison group
• Statistical analysis: Koenig et al.
- using a pretest-posttest control group design (RCT), they studied the effect of a yoga
program on the classroom behavior of autistic children

• children were randomly assigned to either receive the yoga program or their standard
morning routine

- the DV is a measure of children’s disruptive behavior in the classroom


• NOTE: There is a “control” group in the sense of “usual treatment”
- also, there’s manipulation, randomization, and pre- and post-test
- Three main analytical strategies (statistical tests) to be used:
1) T-test comparing treatment and control in the gain scores (= the subtraction of
pretest and posttest for each person)

2) Mixed ANOVA treatment x time

3) ANCOVA w/ pretest scores as the covariate

• Gain scores, in this case, will be the pretest MINUS posttest


- bc that way a positive score indicates less disruptive behaviors (which is positive as
an effect of the treatment), while negative scores indicate more disruptive behaviors

• EX: subject 6 —> 18 - 12 = 6

- SO!! down by 6 disruptive behaviors after treatment

- BUT!! problem w/ gain scores: not simple to generalize!!!
• w/ two means of two gain scores, can calculate the differences when there are two
measurement times

- but it’s a problem when there’s three or four time of measurement…


(a) when there is a non-linear effect (if the treatment group recovers to the
control group at the third time): there’s an improvement in time 2, but
comes back to initial scores in time 3

(b) when there’s an initial improvement (better scores in time 2) but then
comes back to the initial scores and then worsens (worse scores in time 4)

• the mean of gain scores will not indicate anything in these cases
- and it’s quite standard to have more than 4 times of measurement in any
experiment: this is why we have 3 options of statistical tests and can’t use
t-test alone!!!!!
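Gain scores themselves are trivial to compute. The sketch below uses invented pre/post disruptive-behavior counts (not the Koenig et al. data) and follows the notes' convention that gain = pretest − posttest, so a positive gain means improvement.

```python
import statistics

# hypothetical pre/post disruptive-behavior counts for each subject
treatment_pre, treatment_post = [18, 15, 20, 17, 16, 18], [12, 11, 15, 13, 12, 13]
control_pre, control_post = [17, 16, 19, 18, 15, 17], [17, 17, 18, 18, 16, 16]

# gain = pretest - posttest: positive gain = fewer disruptive behaviors
gain_t = [pre - post for pre, post in zip(treatment_pre, treatment_post)]
gain_c = [pre - post for pre, post in zip(control_pre, control_post)]
print("treatment gains:", gain_t, "mean:", statistics.mean(gain_t))
print("control gains:  ", gain_c, "mean:", statistics.mean(gain_c))
```

The between-group t-test on gains then simply compares `gain_t` against `gain_c`, which is exactly the first analytical strategy listed above.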

- T-test comparing control and treatment on gain scores


• expecting that subjects in the treatment group present less disruptive behaviors after treatment

- the first 5 subjects show no effect of the treatment (gain difference = 0), as they have the same number of disruptive behaviors in the two measurement times

• subject 6 shows an improvement in behavior after the yoga treatment (gain score = 6 disruptive behaviors less than before treatment)

- if we look at the control group, the first person shows an improvement of 1 disruptive behavior less than in the first measurement time (even if this person has not had a treatment)

• in general, the control group shows a mix btwn better and worse behaviors in time 2 (negative numbers indicating more disruptive behaviors in time 2)

- while in the treatment group, we only have positive numbers!!

• we observe significant differences btwn treatment and control bc…

- p < 0.05
- Cohen’s d = 1.01 (large effect size)
• variables can be considered normal (normality is reasonable)

- p = 0.269 > 0.05 in the Shapiro-Wilk normality test


• variances can be considered equal (homogeneous)

- p = 0.170 > 0.05 in the Levene test


• results regarding group means indicate that we can expect an improvement of 3 in the
treatment group

- where the yoga program is effective in decreasing the number of disruptive behaviors

• while the control group doesn’t show any difference!!

- Mixed ANOVA (treatment x time)
• called mixed ANOVA bc there’s repeated measure of the same people AND people in
the two groups are independent samples

- need some measure of effect size to know how effective the program is
• EX: obtaining the following results indicates that there's an effect of the yoga program (decreasing the number of disruptive behaviors in the treatment group), while the control group increased the number of disruptive behaviors

- if we calculate a new mean (different from the mean µ of each of the groups;
EX: µ treatment = 20, µ control = 10), we can obtain the mean between the two
groups in T1 = µ T1 = (20+10)/2 = 15

• no matter which group you’re in, the mean of T1 is 15


- if we are interested in the differences btwn T1 and T2 (no matter what
group you’re in), we’ll observe that there’s no difference btwn T1 and T2

- in this case, it makes no sense bc the mean in T2 will also be 15,


indicating no change btwn the two measurement times, BUT!! we have to
understand that ANOVA analysis uses this type of mean in its
calculations…

• Three results obtained w/ ANOVA (analysis of variance):


1. Effect of time: testing if there are differences btwn time points (e.g., T1 vs. T2),
regardless of group

2. Effect of group: testing if there’s differences btwn groups, regardless of time


points

3. Interaction effect (Time x Group): testing whether there’s an interaction btwn


time and group

- the primary focus of our analysis!!


• specifically, we want to know if the two groups behave differently over the
two time points

- H0: There’s no interaction effect, where all four means (T1 for treatment,
T1 for control, T2 for treatment, and T2 for control) are equal

- H1: The means are different, suggesting an interaction btwn time and
group

- the effects of time and group alone are not of primary interest
• instead, we need to examine the four specific means involved in the interaction
btwn time and group to determine if there’s a differential effect
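The interaction the mixed ANOVA tests can be read directly off the four cell means: it is nonzero when the two groups change by different amounts over time. A minimal sketch with hypothetical cell means (chosen so the treatment group improves by 3 while the control group stays flat; these numbers are not the study's output):

```python
# hypothetical cell means for the four conditions of the mixed design
cells = {
    ("treatment", "T1"): 18.0, ("treatment", "T2"): 15.0,
    ("control", "T1"): 18.0, ("control", "T2"): 18.0,
}
change_treatment = cells[("treatment", "T1")] - cells[("treatment", "T2")]
change_control = cells[("control", "T1")] - cells[("control", "T2")]
# interaction contrast: difference of the two changes; nonzero means the
# groups behave differently over time (what the Time x Group test targets)
interaction = change_treatment - change_control
print(f"treatment change: {change_treatment}, control change: {change_control}")
print(f"interaction contrast: {interaction}")
```

The ANOVA's interaction F-test asks whether a contrast like this is larger than would be expected from sampling error alone.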

• Conditions for ANOVA (Assumptions):


(1) Homogeneity of variances

(2) Normality of the DV

(3) Sphericity: correlations have to be equal (checked when there’s more than two
repeated measures!!)

(4) In mixed ANOVA, we have repeated measures of the same people!!

• First, name factors according to the experiment and place each variable in its place…

• Next, test assumptions:


a) Sphericity necessarily holds (the output shows NaN, i.e. not applicable): there's only two repeated measures, so there's only one correlation

b) Homogeneous variances, as p = 0.7 > 0.05 in Levene test

c) Normality reasonably holds, as shown in the Q-Q plot

• we are interested in the time x group interaction…

- F-value = 7.65, w/ associated p-value = 0.010 < 0.05


• SO!! significant mean differences!! rejecting the null hypothesis
- the four means are somehow different, where the yoga program has an effect
• BUT!! we still don’t know what effect that is…
- need to conduct a post-hoc test!!
• Post-hoc test:

- 1st: we expect both groups to be equal in T1 bc subjects are randomly assigned to each of the groups

• T1: p = 0.999 > 0.05


- SO!! we fail to reject the null hypothesis (H0), meaning there are no significant differences btwn groups in T1 and randomization worked!!

- 2nd: we expect differences btwn groups in T2 bc we have applied a yoga program

• T2: p = 0.002 < 0.05
- SO!! H0 is rejected, and there are significant differences btwn groups after the treatment!!

• when looking at the graph, we observe that there’s a change in the treatment group (a
decrease in the number of disruptive behaviors), while there’s no change in the control
group (constant number of disruptive behaviors)

- results are as expected: the yoga program decreases the number of disruptive
behaviors!!

• an average autistic child under the yoga program is going to have 3 less disruptive
behaviors in T2

- ANCOVA
• basic idea of ANCOVA: we can test mean differences while controlling for a
quantitative variable

- condition for this statistical test: both groups are measured in both time points!!
• O1 X O2
  O1    O2

- this way, we can compare means to know if any difference btwn the two time
points is due to the treatment

• can test if the treatment is effective or not: expecting less disruptive behaviors in O2 in the treatment group, and no change in control group
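The core of ANCOVA can be sketched as a regression of the posttest on group membership plus the pretest covariate: the group coefficient is then the adjusted treatment effect. A sketch on simulated, hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 15  # hypothetical group size

# simulated disruptive-behavior counts (invented numbers)
pre_t = rng.normal(10, 2, n)
post_t = pre_t - 3 + rng.normal(0, 1, n)   # treatment: ~3 fewer behaviors
pre_c = rng.normal(10, 2, n)
post_c = pre_c + rng.normal(0, 1, n)       # control: no change

pre = np.concatenate([pre_t, pre_c])
post = np.concatenate([post_t, post_c])
group = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = treatment

# post ~ intercept + group + pre (the covariate)
X = np.column_stack([np.ones(2 * n), group, pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
adjusted_effect = beta[1]   # group difference after controlling for pretest
print(f"adjusted treatment effect: {adjusted_effect:.2f}")
```

A full ANCOVA additionally gives an F-test and p-value for this coefficient, which is what SPSS/Jamovi report.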

• Testing assumptions:
(1) Variances are homogenous, as p = 0.072 > 0.05
in Levene test

(2) Normality holds, as p = 0.341 > 0.05 in Shapiro-Wilk test

• When applying the ANCOVA test, we observe that there are group differences (3
disruptive behaviors less) in time 2, even after controlling for time 1 as a covariate, as
p = 0.009 < 0.05

- when we have two groups and two time points, the best thing we can do is an ANCOVA analysis to see if the results match

• usually, we will find differences!!


- if we perform the three types of tests (t-test, mixed ANOVA, and ANCOVA)
and results don’t match, we have to take the one that tells us there are
differences!!

• if t-test and mixed ANOVA tell us that the treatment is not effective, but ANCOVA shows differences, we have to trust ANCOVA bc it has the most power to detect differences

- ANCOVA as the most sensitive test to group differences


• if there’s 3-4 groups and 2 time points, can still use the three
techniques if it’s a randomized clinical trial…

a) can compare group means in two time points (not exactly a t-test, but the same idea)

b) can still use mixed ANOVA to make the analysis in two time points w/ a categorical
variable w/ 3+ categories (the number of groups)

c) can use O1 as the covariate and compare means in time 2 (ANCOVA)

- BUT!! this is not the case if the randomized
clinical trial is expanded

• EX: 16 means to be compared


- when we have a RCT w/ more than two time points, we can't use t-test or ANCOVA
• can only use mixed ANOVA!!!!

- Treatments and concomitant variables


• C X1 O2
C X2 O2
C X3 O2

- in this experimental design, there are only posttest measures of several treatments (or
treatments and control)

• BUT!! there’s at least 1 concomitant variable to control for…


- the concomitant variable (C) may be either categorical or quantitative (at least semiquantitative); EX: age, sex, …

• this variable is included when we suspect that randomization may not have fully controlled for its influence, where it potentially acts as a confounding variable

- the type of concomitant variable affects which statistical test we should use:
(a) if C is categorical, factorial ANOVA
—> similar to mixed ANOVA, but w/o repeated measures for the
same subjects— measuring distinct groups (e.g., male, female, and
treatment groups) only once, such as after treatment at time O2

(b) if C is quantitative, ANCOVA w/ C as a covariate


—> EX: if C is IQ

• Statistical analysis: Depression treatments


- clinical psychologist testing which treatment is best: psychotherapy, pharmacological
treatment (antidepressants), or both

• she has 60 depressive participants, half of whom are men and half women (30 and 30)

- placing them in 3 treatment groups, always following the
rule that there should be an equal number of men and women
in each of the three treatment groups (stratification by
gender)

• randomization + stratifying by gender: randomly allocating the same number of men and women to each group (10 men and 10 women per group)

- 2 concomitant variables (confounders) the clinical psychologist wants to control for: age and gender

• NOTE: There is manipulation and randomization, but only posttest measures


- Factorial ANOVA
• if we don't consider gender, we have 3 means of interest (one mean per treatment group)

- BUT!! if we consider gender, we now have 6 means of interest (the mean of each
gender in each of the treatments)

• sex (2) * treatment (3)


- considering three variables:
(a) dependent variable: depression (number of
depressive symptoms)

(b) manipulated/independent variable: treatment type (categorical variable w/ 3 groups)

(c) concomitant variable: gender (categorical variable w/ 2 groups)

• Testing assumptions:

• in a simple one-factor ANOVA w/ only the dependent variable (DV) and independent
variable (IV), we would compare three means, one for each level of the IV

- BUT!! since we also have a concomitant variable (gender), we now have 6 means to
consider…

• ANOVA will address three main questions:


1) Is there a difference btwn genders, regardless of treatment?

- we can examine the overall means for each gender, ignoring treatment
effects
—> providing 2 additional means of interest: the total mean for depressive
symptoms in each gender (Xm and Xw) — representing the effect of
gender only

2) Is there a difference btwn treatments, regardless of gender?

- we can look at the overall means for each treatment, ignoring gender
differences
—> providing 3 additional means of interest: the means for each treatment
(Xp, Xd, and Xb) — representing the effect of treatment only

3) Does the effectiveness of each treatment differ btwn men and women?

- to answer this, we examine all 6 means


—> addressing the interaction btwn treatment and gender (whether certain
treatments work differently for men than for women)
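These three questions correspond to marginal and cell means, which can be computed directly. A sketch w/ simulated scores (the "true" cell means below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
treatments = ["psychotherapy", "drugs", "both"]
genders = ["M", "F"]

# hypothetical cell means (depressive symptoms after treatment), 10 people per cell
true_means = {("psychotherapy", "M"): 12, ("psychotherapy", "F"): 9,
              ("drugs", "M"): 15, ("drugs", "F"): 14,
              ("both", "M"): 11, ("both", "F"): 5}
cells = {k: rng.normal(m, 2, 10) for k, m in true_means.items()}

# question 3: the 6 cell means (treatment x gender interaction)
cell_means = {k: v.mean() for k, v in cells.items()}
# question 1: 2 gender marginals, ignoring treatment
gender_means = {g: np.mean([cells[(t, g)] for t in treatments]) for g in genders}
# question 2: 3 treatment marginals, ignoring gender
treat_means = {t: np.mean([cells[(t, g)] for g in genders]) for t in treatments}
print(gender_means, treat_means)
```

The factorial ANOVA then attaches an F-test to each of the three questions: gender main effect, treatment main effect, and their interaction.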

• Effect of gender only:
- after the treatment, women have a much lower number of
depressive symptoms than men

• all treatments together work better for women than men


• Effect of treatment only:
- no matter the gender, the treatment that works best (the one showing the fewest depressive symptoms) for both genders is the combination of both, followed by psychotherapy, w/ antidepressants being last

• drugs alone don't work for any gender (the worst treatment for both genders!!)

• Interaction btwn treatment and gender:

- the combination of both = the best treatment for women

• BUT!! not much difference btwn treatments for men…

• looking at the table of data, we observe significant differences

• when looking at the numbers in post-hoc tests, we observe that means are different

- treatments are working differently for men and women (interaction)
• BUT!! the effect of each variable is also significant
- the effects of gender only and treatment only are both significant (p-values less than 0.05)

- ANCOVA
• now considering the statistical control of age
- as age is a quantitative variable, we have independent
variables “treatment” (w/ 3 levels— psychotherapy,
antidepressants, and both) and a covariate (age)

• SO!! must resort to ANCOVA…


• between-subjects 3 (treatment) ANCOVA
- when looking at the between-subjects effects, we observe that there are neither
significant effects for the covariate (p = 0.992 > 0.05) nor for the treatment (p =
0.063 > 0.05)

- we can estimate the means after controlling for the covariate age

• these are the estimated means after controlling for age w/ a confidence interval
(CI) of 95%

- multiple comparisons (Bonferroni in SPSS and Post-hoc in Jamovi) do not show
mean differences

• where the value of p is greater than 0.05 in all cases

- Solomon design
• all the aforementioned designs may be further complicated…
- EX: adding covariates, factors, time points, etc.
• one such complication: the so-called Solomon design
- O1 X1 O2
O1 O2
X1 O2
O2

• combining two designs into one:


1. Treatment-control post-measure only, AND

2. Pretest-posttest treatment and control

- we will not analyze such a design, as it can be done in different ways

• but basically it can be analyzed w/ several ANOVAs depending on the research question to be answered

- EX: if we want to know if pretest affects treatment effects in posttest, a t-test comparing both groups under treatment will do the trick!!

- Within subjects designs
• sometimes, control groups are not needed bc subjects are their own control
- they’re compared in the experimental conditions w/ themselves
• here, all subjects are subjected to all treatment conditions (repeated measures of the
participants)

• Statistical analysis: Word frequency


- a language psychologist wants to replicate results from another research group that points
out that words of a higher frequency are recognized faster than those of a lower
frequency

• to test this effect, he recruited 36 student volunteers from the faculty, experimenting on
them w/ the tachistoscopic presentation of both words and pseudo-words

- the subject had to answer as quickly as possible whether it was a word or not
• all subjects saw the same words and pseudo-words, only in a different order
- words could be of high, medium, or low frequency
• pseudo-words were chosen to have the same number of letters as the words
- for each subject in each experimental condition, the average number of
milliseconds it took to answer whether it was a word was measured

- One factor within design: Repeated measures ANOVA of one factor

• Testing assumptions:
- Sphericity doesn’t hold: p < 0.05
- Normality is reasonable, as seen in the graph
• when making the statistical analysis (repeated measures ANOVA), we observe that
there are mean differences in the three conditions

- all values of p < 0.001 (< 0.05)


• we can also observe that 18.5% of the variance is explained by word frequency
- where ƞ2 w/ the Greenhouse-Geisser correction is 0.185
• Post-hoc tests
- showing differences among all means (p < 0.05)

• we can also observe it in the graph of means w/ 95% CI
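The one-factor repeated measures ANOVA can be computed by hand from the subjects x conditions matrix; the subject-effect term is what distinguishes it from a between-subjects ANOVA. A sketch w/ simulated reaction times (hypothetical values: faster responses for frequent words):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 36  # subjects

# simulated mean RTs in ms, one column per condition (invented numbers)
data = np.column_stack([rng.normal(520, 40, n),   # high frequency
                        rng.normal(560, 40, n),   # medium frequency
                        rng.normal(600, 40, n)])  # low frequency

k = data.shape[1]
grand = data.mean()
ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # effect of condition
ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # subject effect (removed from error)
ss_total = ((data - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj

df_cond, df_error = k - 1, (n - 1) * (k - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)
eta_sq = ss_cond / ss_total
print(f"F({df_cond}, {df_error}) = {F:.1f}, eta^2 = {eta_sq:.3f}")
```

When sphericity fails, as in the example above, statistical packages additionally scale both degrees of freedom by the Greenhouse-Geisser epsilon before looking up the p-value.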


• NOTE: Carryover effects
- in within subjects designs, people may experience carryover effects, where their
responses in later treatments are influenced by earlier ones

• often resulting in lower or altered levels as they progress through treatments (making it
challenging to conduct a within subjects design)

- BUT!! these designs are still possible if you don’t expect strong carryover effects
• still have to control for them though by counterbalancing the order of treatments
- counterbalancing: administering treatments in all possible orders across
participants to minimize the impact of carryover on the results
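Full counterbalancing can be sketched w/ itertools: list every order of the conditions and spread them evenly over participants (condition labels here are hypothetical):

```python
from itertools import permutations
from collections import Counter

conditions = ["high", "medium", "low"]     # hypothetical treatment conditions
orders = list(permutations(conditions))    # 3! = 6 possible orders

n_participants = 36
# round-robin: each order is used equally often (36 / 6 = 6 participants per order)
assignment = {p: orders[p % len(orders)] for p in range(n_participants)}

counts = Counter(assignment.values())
print(counts)
```

With many conditions, full counterbalancing explodes (k! orders), which is why partial schemes such as Latin squares are often used instead.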

Calculating a priori sample sizes

- nowadays, many academic journals, research agencies, ethical committees at the university, etc. ask for a priori calculations of the sample size needed in order to detect a given effect

• in order to calculate a priori sample size, we need…

a) a significance level (the standard 0.05)

b) the statistical power the research wants to achieve (usually 0.80 in psychology, or 0.90)

c) an estimated effect size

- we can get an estimated a priori effect size…


(1) from meta-analysis (effect that other people obtained in other research)

(2) from a leading experiment in the field

(3) using Cohen’s d (or other) conventions

• Cohen's d = a standardized mean difference used to indicate effect size

- Cohen's d of 0.5 = medium effect size
- Cohen's d of 0.8 or more = large effect size
• NOTE: there's a formula to convert ƞ2 to Cohen's f
- G*power
• there are different packages, spreadsheets, etc. to calculate sample sizes
- a very good and free one is G*power
• we have to enter the 3 values we need

- EX: imagine you have designed a pretest-posttest experiment w/ a control group and
treatment (RCT)

• you want to be able to detect an effect size measured by f = 0.25 or Cohen's d = 0.5, for an alpha of 0.05 and power of 0.80

- now you need to know what statistical tool you are going to use….
i) independent samples t-test on gain scores

ii) mixed ANOVA

• Independent samples t-test on gain scores

• Mixed ANOVA

- NOTE: we may select F tests here

- Interpreting effect sizes

                      Small   Medium   Large   Used in…

Cohen's d             0.2     0.5      0.8     t-tests

Eta squared (ƞ2)      0.01    0.06     0.14    ANOVA

Partial eta squared   0.01    0.06     0.14    Effects of IV are
(ƞ2p)                                          partialled out

Cohen's f             0.10    0.25     0.40    One-way ANCOVA

• partialling out: ƞ2p calculates the effect size for each IV while controlling for the effects of
other variables in the model

- allowing us to isolate the unique contribution of each IV to the variance in the DV
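These benchmarks are linked by simple conversion formulas, which is what lets you feed an ƞ2 from the literature into a power program that asks for Cohen's f. A sketch:

```python
import math

def eta2_to_f(eta2):
    """Cohen's f from (partial) eta squared: f = sqrt(eta2 / (1 - eta2))."""
    return math.sqrt(eta2 / (1 - eta2))

def d_to_f(d):
    """For two groups of equal size, f = d / 2."""
    return d / 2

# the table's benchmarks line up: eta2 = 0.06 (medium) gives f ~ 0.25,
# eta2 = 0.14 (large) gives f ~ 0.40, and d = 0.5 gives f = 0.25
print(round(eta2_to_f(0.06), 2), round(eta2_to_f(0.14), 2), d_to_f(0.5))
```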

Summary

- Treatment-control post-measure only


• IV = categorical w/ 2 groups; DV = quantitative
- t-test of independent samples:
• check assumptions…
a) if non-homogeneity of variances —> Welch-t

b) if non-normality of the dependent variable —> Mann-Whitney

c) if all assumptions are met —> Student-t

- Pretest-posttest treatment and control design (Randomized clinical trial)


• we can perform any of the main analytical strategies (depending on the number of
categories)…

1. For two categories (treatment and control): use a t-test to compare gain scores, or the
differences btwn the two time points (pretest and posttest)

2. For more than two treatment groups: use ANOVA to assess differences across
multiple groups

3. For repeated measures (e.g., pretest and posttest for each group): use mixed ANOVA
to test both main effects (time and treatment) and the interaction effect btwn them

(1) Define your variables and identify factors (e.g., time and treatment)

(2) Check if F (for the interaction btwn the two factors) is greater than 1, w/ p < 0.05 and ƞ2p > 0.06
(this indicates a significant effect!!)

(3) If assumptions are met, run a post-hoc test (e.g., Tukey’s), and if not, use a
non-parametric test; look for p < 0.05 to find significant mean differences

4. For controlling additional continuous variables (covariates): use ANCOVA

- look at F and p-value to determine significance and check if differences remain


significant after controlling for covariates
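Strategy 1 above (comparing gain scores w/ a t-test) can be sketched w/ scipy on simulated data (all numbers are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 20  # hypothetical group size

pre_t = rng.normal(20, 4, n)
post_t = pre_t - 5 + rng.normal(0, 2, n)   # simulated treatment effect of -5
pre_c = rng.normal(20, 4, n)
post_c = pre_c + rng.normal(0, 2, n)       # control: no change

gain_t = post_t - pre_t                    # gain score per participant
gain_c = post_c - pre_c

t_stat, p_val = stats.ttest_ind(gain_t, gain_c)
print(f"t = {t_stat:.2f}, p = {p_val:.4f}")
```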

- Treatment and concomitant variables


• only posttest measures of several treatments
1. If C is categorical (ex: gender): factorial ANOVA, btwn subjects

(1) Test assumptions

(2) Look at F and p-value to observe if there are significant differences

(3) Post hoc (Ptukey or non-parametric) in treatment*gender to identify where the differences lie

2. If C is quantitative (ex: age): ANCOVA, btwn subjects

(1) If assumptions are not met, there is no non-parametric alternative available…

(2) Look at p and F to check if there are significant differences when we introduce age as a covariate

(3) Control for other categorical variables (EX: sex) by setting them as fixed
factors, and check if treatment remains significant

(4) Post hoc (Ptukey)

- Within subjects design
• we perform a repeated measures ANOVA
a) Name variables (factor) and check assumptions

- variances have to be equal bc people are the same in all time points!!!
b) Look at F, p, and eta square to see if there are significant differences and the effect size

c) Post-hoc (Ptukey) to see if there are differences among all means (time points)

Unit 4: Quasi-experimental designs

Pretest-posttest w/ quasi-control in a different dependent variable or from a previous cohort, and interrupted time series design
- Quasi-experiment
• best defined in comparison to experiments…
- an experiment has both manipulation and randomization of the observation units
• a quasi-experiment has manipulation, BUT!! randomization is not applied to individual observation units

- rather, natural groups of observation units (pre-existing groups; e.g., classes of students, athletic teams, wards of patients, etc.) are randomly assigned to different conditions

• SO!! all experimental designs mentioned in units 2+3 may be considered quasi-
experimental if observation units are not individually randomized!!

- where the statistical analysis of these designs does not vary!!!


• NOTE: less internal validity than in an experiment…
• Graphically, quasi-experiments are represented w/ a dashed line
- EX: the pretest-posttest w/ control groups turns into a pretest-posttest design w/ quasi-control groups or a non-equivalent control group

• nevertheless, there are some “specific” quasi-experimental designs…

- Types of quasi-experimental designs


• Pretest-posttest w/ quasi-control in a different dependent variable
- this design is rare to find, but relatively useful in behavioral therapy

• participants are “treated” for a particular behavior (dependent variable), while in these
same participants, another behavior that has not been treated is measured and used as a
control (similar to a within subjects design!!)

- Y is the dependent variable (the behavior being treated)

- Z is another behavior that is not treated
- Example: Oppositional defiant disorder— a repetitive pattern of negative, defiant, and
disobedient behavior, often directed against authority figures

• we have 20 children w/ this disorder in a psychology clinic who have to be treated


- they are going to be treated w/ behavior modification therapy, specifically token
economy

• the therapy is directed toward the behavior “arguing w/ adults” (the dependent
variable, Y)

- Y is measured in all children before therapy and after therapy ends


• additionally, the behavior “doing things to annoy or upset other children” is
measured in pre and posttest, but is not treated

- there is manipulation, but no randomization: we are treating a group of children w/ the disorder (a group that existed prior to the study)

- NOTE: one of the main problems of this design is that the variable not being treated needs to be similar to the one of interest

• obviously this can lead to covariation

- where one of the goals of the design is to find a variable similar enough to the variable of interest that does not covary…

• Pretest-posttest w/ quasi-control from a previous cohort
- this is also a very specific design

• including "treated" participants AND a previous cohort of similar participants, measured earlier, but not treated (acting as controls)

- Example: A school is thinking about eliminating paper materials and instead adopting
multimedia teaching tools (tablets, computer, pdfs, exercises on computers, …)

• before they make all necessary changes, they decide to implement this strategy for one
year in a first year math class

- they get the marks at the beginning and end of the year, and compare them to the
marks of the same first year classroom, but from the previous year (which used
traditional paper materials)!!!!

• this previous year acts as a control group


• Interrupted time series design
- in this type of design there is no control group
• BUT!! observations are extended (repeated measures over time)
- usually involving more than 3 pretest measures and more than 3 posttest measures

• can be further complicated, for example by doing an…


(a) interrupted time series w/ non-equivalent
control group

(b) interrupted time series w/ quasi-control in


a different dependent variable

- and so on, and so forth…

Unit 5: Single case experimental design

ABA, Multiple treatment reversal, Multiple baseline, Time series, etc.


- Single case design
• emerged in parallel to the development of learning theories, and the behavioral therapy that
went w/ it

- usually employed in clinical psychology (but also in other disciplines…)


• How can we get control over confounders in order to be able to call this case design
experimental?

(1) The single case is its own control (w/ repeated measures taken)

(2) We take many measures in each step of the design

(3) The researcher can “reverse” the treatment

- these many measures make us confident about the trends


• this design is useful when it's extremely difficult to get subjects (EX: very rare diseases)

• Common features of all case designs (aka Reversal designs)


a) The dependent variable (y-axis) is measured repeatedly over time (x-axis) at regular
intervals

b) The study is divided into distinct phases, and the participant is tested under one condition per phase

c) Conditions are usually marked w/ capital letters: A, B, C

d) The case is measured first in one condition (A: usually baseline or no treatment), then another condition (B: usually treatment), and finally the original condition (A)

e) Change from one condition to the next is not fixed; the researcher waits until the participant's behavior in the condition becomes fairly consistent (steady state)

- Basic single-subject research design is the ABA design
• phase A, a baseline (no treatment) is established for the dependent variable
- the baseline phase is a kind of control condition
• when a steady state is reached, phase B begins as the researcher introduces the
treatment

- again, the researcher waits until the measures reach a steady state
• finally, the researcher removes the treatment and again waits until the
dependent variable (A) reaches a steady state

• the basic ABA design can be extended w/ the reintroduction of the treatment (ABAB)
and another return to the baseline (ABABA), etc.

- the amount of generalization is in doubt unless we observe that the effect of the treatment replicates across individuals

• where sometimes the treatment may work for one individual but not for others
- “Simpler” single case designs
• characterized by not having a reversal, so they are not considered "proper" experiments
- two types of designs that are not experiments but are single-case designs:
(1) AB design: no reversal to the baseline condition

(2) BA design: no reversal to the treatment condition

• EX: in cases such as a suicidal patient, treatment may be immediately necessary, preventing an initial baseline period

- sometimes, immediate treatment is required, so it's not possible to first establish a steady baseline

• in these cases, a basic BAB design is used

- this design can be extended, ex: BABA
• these kinds of designs are considered "proper" experiments
- NOTE: for single case designs to be considered “proper” experiments
there must be at least one reversal!!

- Multiple treatment reversal designs
• single case designs, such as those already mentioned, can be extended to several
“treatments”

- first, a baseline phase (A) is established till steady state is accomplished


• followed by separate phases in which different treatments are introduced
- EX: a researcher may establish a baseline of studying behavior for a disruptive
student (A), then introduces a reward for attention treatment (B) and then a new
phase is introduced with a mild punishment for lack of attention treatment (C)

• then, we reverse the treatment going back to A (baseline)


- finally, treatments are reintroduced but in a different order to control for
carryover effects

• SO!! overall we get an ABCACB design, that can be extended

- Multiple-baseline designs
• single case designs can be changed to suit several “participants”
- potential problem w/ the reversal design is that sometimes it’s impossible to reverse bc…
a) the treatment is working, where it's unethical not to treat

b) the dependent variable does not return to the baseline (the treatment has lasting
effects)

• we may prove the effects of the treatment by establishing a baseline for each of several participants, and then introducing the treatment for each one

- in essence, each participant is tested in an AB design


• the key to this design is that the treatment is introduced at a
different time for each participant, where if the dependent
variable changes when the treatment is introduced for one
participant, it may be a coincidence

- BUT!! if the dependent variable changes when the treatment is introduced for
multiple participants (especially when the treatment is introduced at different
times for different participants), then it is extremely unlikely to be a
coincidence

• SO!! the treatment is effective!!

- Time series
• all single case designs have something in common: many measures are taken at different time points

- SO!! these data may be considered time series


• in statistics and econometrics, time series analysis is usually made w/ AutoRegressive
Integrated Moving Average (ARIMA) models

- these models are fitted to time series data either to better understand the data or to
predict future points in the series (forecasting)

• ARIMA models are pretty difficult, so we won't cover them here!!!

- what we need in order to compare conditions (baseline A, treatment B, baseline A, …) is a t-test of paired repeated samples, since we have pretest-posttest measures of the same subject at different times

• we simply compare the means!!!
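Under that simplification (comparing phase means, ignoring the serial dependence a full ARIMA model would capture), an ABA comparison for one case might look like this sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# hypothetical single-case data: target behaviors per session in each phase
phase_a1 = rng.normal(12, 1.5, 10)   # A: baseline, steady state
phase_b  = rng.normal(6, 1.5, 10)    # B: treatment, behavior drops
phase_a2 = rng.normal(11, 1.5, 10)   # A: reversal, behavior returns

# rough phase comparisons (ignores autocorrelation between sessions)
t_ab, p_ab = stats.ttest_ind(phase_a1, phase_b)
t_ba, p_ba = stats.ttest_ind(phase_b, phase_a2)
print(f"A1 vs B: p = {p_ab:.4f}; B vs A2: p = {p_ba:.4f}")
```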


• Examples
- Time series of diseases of the nervous system (Spain, 1980-2017)
• observed trend: curvilinear increase

- this is the statistical decomposition of this time series w/ ARIMA methodology:

• We observe…
(1) small changes in time rather than a big change across timeline

(2) the general increase of the trend

(3) seasonal changes (e.g., observing in the trend that there are more diseases in
winter than summer)

(4) the peaks of random error (particular points where data change for no reason
— we must investigate these points!!)

- Another example w/ real data (although it’s not a single case)


• Several people are measured in “search for meaning of life” at the beginning of the
COVID-19 lockdown (14 measurements) and after lockdown is over

- time intervals are not exactly the same, and this must be managed statistically
• Latent Growth Modeling: the methodology to establish changes

- we clearly observe a progressive decrease in the meaning of life during the


lockdown and then a progressive increase when it ends

Unit 6: Ex post facto designs

Retrospective ex post facto, prospective ex post facto, clinical method, and developmental designs
- Ex post facto designs
• non-experimental (correlational) research designs!!
- no manipulation and no randomization of subjects or groups!!!!
• Types:
a) Retrospective ex post facto designs

(1) Simple retrospective

(2) Retrospective w/ quasi-control

b) Prospective ex post facto designs

(1) Simple prospective

(2) Factorial prospective

c) Clinical method

d) Developmental designs

- Retrospective ex post facto designs


• Simple retrospective
- ex post facto: Latin for “from a thing done afterward”
• in this case, the DV we want to analyze has already occurred
- the researcher finds that when they arrive to investigate the relationships, everything
has already happened!!!

• EX: imagine that a researcher is interested in potential explanations of suicide


- we only know that someone has committed suicide (completed) when this
person dies

• we only have a DV that has happened (the person has committed suicide)
- Ex: Durkheim is one of the fathers of sociology, and in 1897, he analyzed in retrospect
about 26,000 cases of completed suicides

• he offered an examination of how suicide rates at the time differed across religions
- suicidal rates for Catholics were lower than for Protestants (theorized that this was
due to stronger forms of social control and cohesion in Catholics)

• additionally, he found that suicide was less common among women than men, more common among single people than those who were romantically partnered, and less common among those who had children

- further, he found that soldiers commit suicide more often than civilians and that, curiously, rates of suicide are higher during peacetimes than they are during wars

• all these predictions (observations) are done once that suicides are already
committed!!!

- Main problems w/ this particular type of design


1. Dependent variable (DV) is not a variable, it’s a constant

• we don't have a comparison group or similar people to act as controls; we have a constant

2. The potential independent variables (IV) have to be recovered after the event has occurred

• this means that only already recorded data are available

• Retrospective w/ quasi-control group


- first, the group that has the value of interest in the DV is located
• EX: people w/ a disease
- this group is called the key group— a natural group, formed bc its members already have specific values on the DV

- second, a new group w/ a different value in the DV is located

• EX: people w/o that disease
- this group is called the quasi-control group
- very important to understand that this is NOT an experiment
• the subjects are not randomly allocated to the different categories of the IV
- rather, they are selected bc of their values in the DV
• once the people in the quasi-control group are found, it's important to match them w/ the key group characteristics in the confounding variables (CV)

- then, the IV are measured for both groups


- Example of matrix:
a) DV: depression

b) CV1: gender (1 = male, 2 = female)— we match this variable to control for this confounder

c) CV2: age— we match this variable to control for this confounder

• we may match subjects on as many concomitant variables as we can, and then measure the IVs we're interested in
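The matching step can be sketched in plain Python: for each member of the key group, take the same-gender candidate w/ the closest age, w/o replacement (records below are toy, hypothetical tuples of id, gender, age):

```python
# toy records: (id, gender, age); all values are hypothetical
key_group = [(1, "F", 16), (2, "M", 15), (3, "F", 17)]
pool = [(10, "F", 18), (11, "M", 15), (12, "F", 16), (13, "M", 19), (14, "F", 17)]

def match(case, candidates):
    """Quasi-control match: same gender, nearest age."""
    _, gender, age = case
    same_gender = [p for p in candidates if p[1] == gender]
    return min(same_gender, key=lambda p: abs(p[2] - age))

matched = {}
available = list(pool)
for case in key_group:
    ctrl = match(case, available)
    matched[case[0]] = ctrl
    available.remove(ctrl)   # match w/o replacement

print(matched)
```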

- we are doing a retrospective ex post facto design, but we are trying to control for the
confounding variables

• we are trying to create a "control" group, but we can't be as confident as w/ randomization

- EX: such a research design was used by Plutchik and van Praag (1990)
• they got data on 20 adolescents that had committed suicide
- they measured a lot of variables related to the suicidal act
• later, they looked for friends of these adolescents w/ similar characteristics and
backgrounds and tried to measure the same variables

- Prospective ex post facto designs
• Simple prospective
- sometimes what “has already happened” are the independent variables (IV), but the
dependent variable has yet to occur

• thus, the order is inverted


- first, we recover people that have some independent variable (cause) we want to
know something about

• later on, we will study these people regarding the dependent variable
- EX: we want to analyze the effects of motivation on academic achievement
• we select a group of students at the beginning of the semester and measure
them in motivation (IV), grouping them into high-medium-low motivation
groups

- later on, we recover their grades (DV)


• Factorial prospective
- this type of research design simply adds more than one IV to the design
• EX: we want to analyze the effects of motivation on academic achievement
- we select a group of students at the beginning of the semester and measure them in
motivation (IV), grouping them into high-medium-low motivation groups

• later on, we recover their grades (DV)


- but the researcher may also be interested in other IVs such as previous grades
in the subject under study, number of hours of study, general intelligence, etc.

• if the "three new IVs" are measured quantitatively, an ANCOVA may be used w/ these three new IVs as covariates

- Clinical method
• again, the clinical method is non-experimental, or correlational
- sometimes the “standardized” measurement of attitudes or behaviors alone is not enough
to understand a phenomenon

• a case or problem may help to understand this type of research design


• EX: Jean Piaget was a Swiss psychologist known for his work on child development
- Piaget’s theory of cognitive development and epistemological view are called “genetic
epistemology”

• Piaget was mainly interested in the processes of knowledge acquisition in children


- BUT!! how can we understand how they obtain knowledge?
• pure observation is not enough bc we can’t see the internal processes involved in
something like thinking!!

- EX: very difficult w/ simple observation to know for certain if a child who has a conversation w/ a toy believes that the toy is alive or if they're just pretending

• SO!! Piaget proposed clinical examination…


• the clinical method included questioning and carefully examining their responses in order to
observe how the child reasoned according to the questions asked

- then, examining the child’s perception of the world through their responses
• cornerstone of the design: to give the child a task and then, after task completion,
interview the child to try to understand the processes involved

- in essence, the research design is a structured intervention!!


• BUT!! the internal validity of this type of research is quite low…
• Prototype task from Piaget’s lab: Task of conservation of substance
- child is offered a ball of clay, and is asked to create one exactly like it
• once the child considers that the two balls are the same, one of these balls is deformed
by elongating it, squashing it, or breaking it into several pieces

- the child is asked if the two balls (masses) are still the same, and is asked to justify
their statements

• these are the responses of a 7 and a half year old boy once one of the clay balls
has been transformed into a flat disk, and the other into a cylinder…

- Child: This one (cylinder) is heavier than the other because it is thicker
- Researcher: But why is it heavier?
- Child: Because it has more clay.
- Researcher: But earlier you told me that they had just as much clay!
- Child: Yes, I said that, but now there are more here than there, because it is
thicker.

- Developmental designs
• any design in which the independent variable is time!!!
- since time can’t be manipulated, this design is an ex post facto prospective design
• Three main developmental designs:
1) Cross-sectional developmental design

2) Longitudinal developmental design

3) Sequential developmental design

- may also be thought of as a type of survey design where time is an independent
variable

• EX: a research team wants to explore the trajectory of recent memory in people older than
65 yrs old

- in other words, they want to see if there’s some sort of evolution over time in older
adults’ recent memory

• idea: measure recent memory (and covariates) every 5 years in people 65+ for 20
years….

a) Cross-sectional developmental design

• we take a sample of different age-groups at the same time


- potential confounder: differences in memory may be due to
generational differences (where people aged 65 are from a
diff generation than people aged 85 in 2022) and not due to
the evolution of memory

b) Longitudinal developmental design

• we take a group of people and measure


them at different time points (following the same subjects across their age)

- potential confounder: cohort effects


• the effects found in memory are specific to the people in that particular age
cohort

- in order to avoid these confounders (generational differences and cohort effects), we


may further complicate the design…

c) Sequential developmental design

• combination of the two former designs:


we follow people from 65 to 85

- we measure groups of the same people at their same age periods


• when we finish w/ one cohort of people (finishing one longitudinal design),
we start w/ a new group of people using the same life period (a second
longitudinal study)

- kinda like an extended longitudinal study in a sequential way (ex: in 5


year steps)

- we can compare the means of the people when they started the study (at 65) to
analyze generational differences (between subjects design)

• we can compare the means of people within the same subject at the
beginning and at the end of the study (at 65 and 85) to analyze the cohort
effect (within subjects design)

- these kinds of designs can only be done w/ a large national effort!!!

Unit 7: Survey designs

Sampling techniques, data collection, statistical analysis, etc.


- Survey
• a system for collecting information to describe, compare, or explain knowledge, attitudes,
and/or behaviors

- more formally, relatively systematic and standardized procedures to collect information


from individuals, or entities, through the questioning of systematically identified samples
of individuals

• one of the defining features of surveys: we have a population (or set of individuals
about whom we want to obtain information) and an impossibility of obtaining info
from all of them, or practical reasons against it

- SO!! we will only use a sample, or a subset of the total number of individuals!!!
- survey designs = non-experimental methods
• no experimental manipulation of variables
- we are dealing w/ a correlational design!!!

- Types of survey designs according to time


• Cross-sectional
- units of information (usually participants) are measured only one time
• Longitudinal
- units of information are measured more than one time
a) Panel designs: information collected from the same subjects at different time points

b) Repeated cross-sectional designs: different samples of subjects at each time point,


measured on the same variables

c) Time series design: observations of the same individual on one or more variables
many times over time

- Types of survey designs according to method of data collection
• Mail survey
- questionnaires sent by mail to the sample to be answered w/in a certain period of time,
and returned (typically by mail)

• Telephone survey
- an interviewer (or machine) questions from the telephone, using a partial or fully
standardized questionnaire or survey

• In-person survey
- surveys where the interviewer and interviewee meet face-to-face or at least an interviewer
is present

• Computer-based methods
- most use the internet, although we could include computerized test administration
methods here too!!

- Components of a survey
• A survey, even as a unitary research design, has distinct identifiable components:
1. Sampling techniques

2. Design of questions, scales, and questionnaires

3. Data collection

4. Statistical analysis

• Sampling procedures
- the first step in sampling: to be clear about the target population
• target population: ideal group of objects (or subjects) that will be subjected to the
survey design

- when the target population is clear, two questions remain…


(1) Whether you want to survey the entire population

• for this to be possible, the population must be known and manageable!!
(2) Whether the entire population is potentially available during the time the
survey is to be conducted

• SO!! distinguish btwn the target population and the survey population or
sample frame

- Most types of sampling can be distinguished:


a) Random or probabilistic sampling

• the probability of extraction of all the elements of a population is known, or it’s


possible to calculate it

- random sampling is the only procedure that’s 100% scientific


• Three main types of probabilistic sampling schemas:
(1) Simple random sampling

(2) Stratified sampling

(3) Cluster sampling

b) Non-probabilistic sampling, which can be divided into…

(1) Purposive or opinion sampling: the researcher selects the sample and tries
to make it representative

(2) Non-normative or incidental sampling: the sample is taken for


convenience or circumstances

- Probabilistic sampling schemes


• Simple random sampling: where a priori all the elements of the population have the
same probability of selection, whether the population is finite or infinite

- once a certain sample size is known or estimated, it’s necessary to randomly select
the cases that will be part of the sample

• necessary to have a list of all the subjects of the population in order to obtain a
sample from them

• Stratified sampling: researchers divide or classify different subjects into different
subpopulations or strata, and then perform simple random sampling w/in each stratum

- each individual must belong to a stratum, and each individual in that stratum will
have the same probability of being chosen to be part of the sample

• to form the strata, one or more variables are used that are of interest to the
researcher, and/or that are related to the objective of the study

- you may be interested in strata that are not defined by a single variable, but
rather a combination of several variables

• Cluster sampling: here, clusters, or sets, are defined such that they include 2+ of the
ultimate sampling units to be selected (people, for example)

- what is chosen at random is a random sample of clusters, and w/in each chosen
cluster no sampling is done, but all the target sampling units (people) are selected

• SO!! what’s chosen at random are the clusters, which are usually naturally formed
sets, and not the elementary units to finally be studied
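The three probabilistic schemas can be sketched w/ Python’s stdlib `random` module (the population of 100 students, the strata by year of study, and the classroom clusters are all invented for illustration):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 students, each tagged w/ a stratum (year of study)
population = [{"id": i, "year": i % 4 + 1} for i in range(100)]

# (1) Simple random sampling: every element has the same selection probability
srs = random.sample(population, k=20)

# (2) Stratified sampling: simple random sampling w/in each stratum
strata = {}
for person in population:
    strata.setdefault(person["year"], []).append(person)
stratified = [p for members in strata.values() for p in random.sample(members, k=5)]

# (3) Cluster sampling: draw whole clusters (e.g., classrooms) at random,
# then keep every person inside the chosen clusters (no further sampling)
clusters = [population[i:i + 10] for i in range(0, 100, 10)]  # 10 classrooms of 10
chosen = random.sample(clusters, k=2)
cluster_sample = [p for cluster in chosen for p in cluster]
```

Note that in the cluster scheme only the clusters themselves are drawn at random; everyone inside a chosen cluster ends up in the sample.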

• Design of questions, scales, and questionnaires


- in the process of choosing the most suitable questions, there is a mixture of an adequate
knowledge of the literature together w/ a knowledge of the contents to be covered and
how to ask them

• in some cases, translation skills are necessary, as when back translation is done
- in addition, there is now a tendency to consider cognitive aspects, such as the way in
which questions are understood, the ability to remember events, etc.

• Aims of good questions:


(a) To maximize the relationship btwn the answers from the subjects and the
contents we intend to measure

(b) To ensure that the questions we ask are understood in the same way, and
correctly, by all respondents

(c) To favor a high response rate (EX: a survey of 120 pages won’t favor a
high response rate!!)

(d) To improve the quality of the answers, eliminating biases (social
desirability, lying, mechanical answering, etc.)

• Data collection
- in general, the main methods of data collection are…
a) interviews conducted by interviewers

b) telephone surveys

c) self-administered surveys

d) group surveys

e) mail surveys

f) electronic surveys

• response rate: the percentage of selected sample units that actually answer the survey
and return it to the survey manager

- non-response may be due to a multitude of factors


• including the temporary absence of the respondent, refusal to collaborate, lack of
knowledge or interest of the objectives of the survey, distrust in the anonymity
and/or confidentiality of the answers, failure to reach them through the
information collection method in place, sensitivity of the survey topics, etc.

- a low response rate may cause us to take longer to have our results (as we may
need someone superior to convince the sample to participate)

• proxy-responses: when someone close to the selected respondent (e.g., a family member) answers the survey on their behalf


• Statistical analysis
- for the results of a survey to be tractable, informative, and ultimately useful, they need to
have some kind of quantitative (statistical) treatment

• in all statistical analyses, there are some phases…


(1) Data processing

(2) Description of variables: Univariate and Multivariate

(3) Relations among variables

- different analyses may be used depending on the types of variables we are working
w/

• many types of variables are used in different statistical ways, leading us to use
multivariate models to work w/ many variables at once

- Data processing
• the first phase of data analysis begins when the data are entered into a database
- the order of the variables, their possible combinations, their transformations, their
selections, etc. are all aspects that can determine the quality of the results

• this is done to assure that data is in its correct form


- EX: if someone has said they are 199 years old, this is probably a mistake
• we process the data to assure that responses make sense!!
- Description of variables: Univariate and Multivariate
A. Univariate description

- carried out variable-by-variable


• to clearly know the properties of each of the variables
(1) for categorical variables: absolute or relative frequencies (percentages)

(2) for quantitative variables: central tendency, variability, skewness, and kurtosis

- we understand the % of women and men in the sample


• remember that if there’s no variability, no predictions can be made
- where the transformation of variables is sometimes necessary!!
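A minimal sketch of a univariate description for a quantitative variable, using only Python’s `statistics` module (the grades are invented; the sample-skewness formula used is the usual bias-corrected one):

```python
from statistics import mean, median, stdev

# Invented exam grades (0-100) w/ a tail toward low scores (negative skew)
grades = [55, 62, 70, 74, 78, 80, 82, 85, 88, 90, 91, 93, 95, 96, 98]

n = len(grades)
m = mean(grades)   # central tendency
s = stdev(grades)  # variability

# Sample skewness (bias-corrected); negative = tail toward the low scores
skewness = (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in grades)
```

W/ negatively skewed grades like these, the median sits above the mean, which is exactly the pattern described for grade distributions later in these notes.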
B. Multivariate description

- combining information on 2+ variables


• requiring more sophisticated techniques, such as double-entry tables, multivariate
graphs, factorial correspondence analysis, etc.

- this gets more complex when more than 2 predictors are used

• scatterplots can help us understand if variables are related to each other (high
Pearson coefficient)

• this leads us to understand the variables, as well as giving important information on the
type of analyses that will be necessary later (EX: parametric or non-parametric)

- we want to know if our data is “clean” to start w/ analysis, as there may be multivariate
outliers

• we can suddenly realize that a quantitative variable is better described as ordinal
or even binary (yes/no, rather than many levels)

- descriptive statistics allow us to understand what fits better for our analysis
• multivariate description helps us to understand the associations based on
many variables

- where it’s needed to control for many variables and decide the type of
analysis

- EX: prediction of sexism, particularly against women


• 2 main types: hostile and benevolent in terms of behavior
- Univariate variable: young men are more hostile against women than other
women

• this gives us an idea that the problem is related to gender


- BUT!! personality and culture may also have an effect
- if we study the relationship of neuroticism w/ sexism, we see that there’s no
relationship (this is good, as neuroticism can’t be changed)

• when we split the relationship btwn men and women, we can observe that
neurotic women were more hostile against other women, while more
neurotic men were less hostile

- Relations among variables


• the third phase of statistical analysis is broadly based on the analysis of relationship
btwn variables

- here, a distinction can be made btwn bivariate relationships (relationships of
variables taken in pairs) and multivariate relationships btwn variables, which may
involve the simultaneous analysis of a multitude of variables

• we can also distinguish btwn methods of interdependence (based on associations)


and dependence (based on explanation or prediction)…

• Interdependence methods:
- not distinguishing btwn dependent and independent variables
• to identify which variables are related, how they’re related, and why
(1) for quantitative variables: principal component analysis, factor analysis,
cluster analysis, and similar (EX: latent profile analysis) or multidimensional
scaling

(2) for categorical (or non-quantitative) variables: correspondence analysis,
log-linear models, cluster analysis (or more current models such as latent class
analysis) or multidimensional scaling

• Dependence methods:
- assuming that the variables analyzed are divided into two groups: dependent and
independent variables

• to determine whether and how the set of independent variables affects the set of
dependent variables

(1) for quantitative dependent variables: multiple linear or non-linear regression,


survival analysis, ANOVA, ANCOVA, MANOVA, MANCOVA, canonical
correlation, or structural equation models

(2) for categorical dependent variables: logistic regression, Poisson regression,


discriminant analysis, or conjoint analysis

• if we are lucky, the independent variables may be independent of each other,
but in some cases, they can be related among themselves

- EX: interpretation of the slope (B) in regression models (Y = A + BX)


(a) Y = dependent variable (e.g., final grade)

(b) X = independent variable (e.g., hours studied)

(c) B = slope; how much Y changes w/ one-unit change in X

• if B = 5, studying one extra hour increases the grade by 5 points
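The slope interpretation can be checked w/ a tiny ordinary-least-squares fit in plain Python (the hours and grades are invented so that they follow Y = 50 + 5X exactly):

```python
# Invented data consistent w/ the example: each extra hour of study adds 5 points
hours = [0, 1, 2, 3, 4]
grades = [50, 55, 60, 65, 70]  # follows Y = 50 + 5 * X exactly

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(grades) / n

# Ordinary-least-squares estimates of the slope (B) and intercept (A)
B = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, grades)) \
    / sum((x - mean_x) ** 2 for x in hours)
A = mean_y - B * mean_x
```

The recovered B of 5 reads exactly as in the notes: one extra hour of study predicts 5 more points in the final grade.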

- Report of the survey


• each survey must end in a final report
- usually, there are…
1. Technical report: composed of background, introduction, survey details, design,
sampling, method and psychometric aspects, results, conclusions, recommendations,
and references

2. Basic or reduced report: a reduced, non-technical report of all of the above that can
be understood by a wide audience

3. Executive report: an “easy-to-read” summary of the report of about 3-15 pages

• Statistical analyses in survey designs: Example 1


- survey of Dominican students at the Universidad Autónoma de Santo Domingo (UASD)
• sampling: all available students in a specific degree; selection of a larger survey
- variables measured in the survey include:
(1) Demographics and performance: age, sex, grades in social psychology,
history of psychology, psychobiology, statistics, and mean grades

(2) Psychosocial factors: classroom climate (for mastery and execution),


autonomy, relation, competence, satisfaction w/ the university, and teacher
support for student autonomy

(3) Outcome variable: binary grades (pass/fail)

• the choice of independent (predictors) and dependent (outcome) variables


depends on the research question

- BUT!! no matter the research design of choice, understanding each variable’s


characteristics is essential before analysis…

(a) Categorical variables: only percentages (EX: for the variable sex, analysis
simply involves percentages)

(b) Quantitative variables: need a much more specific description

i) Central tendency: mean, median, and mode

ii) Symmetry: assessing skewness (e.g., grades may exhibit negative


skewness)

iii) Kurtosis: evaluating peakedness (e.g., some variables may show


extreme kurtosis)

- EX: Grades in social psychology (quantitative variable):


• by looking at the density plot, we can extract the
following conclusions to define the variable (of
course we will have to contrast our conclusions w/ the
obtained numbers to verify them!!):

i) the bulk of the grades (most people pass the exam) is higher than the mean

ii) variability: grades in this subject are not highly variable bc at the
beginning the scores are not really good; low levels of variability are
going to make future predictions difficult (dependent variable will be
very homogenous in the future)

iii) Symmetry: negative skewness is observed

iv) Kurtosis: extremely unlikely that a very asymmetrical variable follows


a kurtosis, but needs to be verified numerically

1. Obtaining descriptive statistics of all variables

• Analyzing the variables for future research problems:

- important to know the minimum and maximum of every variable if it’s a
Likert scale

• EX: in teacher support, the scale goes from 1 (minimum) to 5 (maximum),


where 1 means “I never feel teacher support” and 5 means “I feel a lot of
teacher support”

- Average: knowing the mean grade for a course can help interpret student
performance, where if the course is known to be very challenging and the mean
grade is 90, this indicates that most students perform exceptionally well

• from this, you can identify who falls into the “excellent” category (those
scoring around or above 90) and what can be considered “standard” or
typical performance for the course

- Standard deviation: 0.7, so we know that variability is going to be low


• where most of the students obtained similar grades bc they’re very
concentrated

- Symmetry: can observe some negative symmetry in the results, but symmetry is
close to the mean grades

- Kurtosis: very low in all of them, but not in grades 4 (kurtosis = 14.1)
• SO!! will have problems in future research in the concrete variable
2. Obtaining a graphical display to visually observe these variables

- visual aids are crucial for understanding variable distributions


a) Histograms: showing score concentration (e.g., grades btwn
80-100), very visual!!

b) Density plot

c) Box plots: very important!! indicating central tendency

(median), percentiles (variability), max and min scores, and variability

d) Violin plots: good!! combining box plots w/ density info

e) Q-Q plots: very visual, assessing whether data follow a


normal distribution, BUT!! when you have many
subjects and they are dispersed, they are not so clear…

• NOTE: w/ categorical variables, don’t make a box plot!! histograms are


better here!!

3. Obtaining data about the associations btwn variables

• standard Pearson’s correlations


- ranging from -1 to +1, with 0 indicating no association,
negative values an inverse relation, and positive values a
direct, positive relation

• we can obtain the same correlations matrix but w/ a graphical


display of the relationships btwn the variables (bivariate)

- here, we are only interested in the relationship of mean


grades w/ other variables (predictors) of interest (again,
Spearman correlations)

• Pearson and Spearman are not that different, they provide similar results (but
Spearman’s suitable for non-parametric data!!)
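Pearson’s r can be computed by hand in a few lines of Python (the motivation and grade values are invented for illustration):

```python
# Invented scores on two quantitative variables
motivation = [2.0, 3.5, 1.0, 4.0, 5.0, 2.5]
grades = [60, 75, 55, 80, 92, 68]

n = len(motivation)
mx = sum(motivation) / n
my = sum(grades) / n

# Pearson's r = covariance / (sd_x * sd_y); here via sums of deviations
num = sum((x - mx) * (y - my) for x, y in zip(motivation, grades))
den = (sum((x - mx) ** 2 for x in motivation)
       * sum((y - my) ** 2 for y in grades)) ** 0.5
r = num / den
```

W/ these invented values r comes out strongly positive (close to +1), i.e., a direct relation btwn motivation and grades.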

• T-tests performed to test the association btwn some factor and the variables
- EX: Is sex a factor associated w/ grades?

• in this case, we observe that variances are homogenous and that there are no
gender differences at all (all p > 0.05)!!

- we can also perform a MANOVA for multiple


dependent variables, which can yield similar
results to individual t-tests in a single analysis!!!
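The t-test from the example above can be sketched in plain Python, assuming homogeneous variances (pooled variance); the two groups’ grades are invented, and deliberately built w/ equal means so the test finds no difference:

```python
from statistics import mean, variance

# Invented grades for two groups (e.g., men and women); equal means on purpose
group_1 = [70, 75, 80, 85, 90]
group_2 = [72, 76, 79, 84, 89]

n1, n2 = len(group_1), len(group_2)
m1, m2 = mean(group_1), mean(group_2)

# Pooled variance (assumes homogeneity of variances, as in the example above)
sp2 = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)

# Student's t statistic and its degrees of freedom
t = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
df = n1 + n2 - 2
```

A t near 0 (w/ a correspondingly large p-value) matches the “no gender differences at all” conclusion in the notes.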

• we only have one DV (mean grades from 0-100) and the predictors by which we
will predict the DV

- we follow a bivariate model: we have to ask ourselves how many predictors are
statistically significant at the chosen p-value level

(1) Know which are the predictors and how to evaluate them

(2) For the quantitative variables, we need to follow this procedure:

• we can do it one by one to test significance, but it’s better to measure


everything together (regression —> correlation matrix)…

• Example 1: Controlling for sex—


- Important: we must put the “mean grades” as the first one so that we can only
focus on the correlations on the first column

• we can also perform a t-test for independent samples to control for sex
- We observed that there is not a significant correlation btwn mean grades and sex
differences

• SO!! we focus on the predictors and if they are important and relevant or
not!!

• Important to take into account that…
- negative signs say that there’s no direct (positive) correlation, BUT!! that doesn’t
mean that there’s no correlation: it can be an inverse one!!

• there are 2 predictors that have no effect (sex and autonomy) but there are
small effects of the variables, so they add something to the overall prediction
(some are positive and others negative)

• these factors are related to each other, so we are going to calculate the
multivariate model!!

4. From bivariate association to multivariate prediction

• there are several potential predictors for the grades


- grades are quantitative
• so!! we may use the linear regression model here!!
• Linear regression (Quantitative dependent variable):
- predicting scores in the DV (Y) w/ a linear composite of the predictors (which
may be quantitative, qualitative, or a mixture of both!!)

• slopes for each predictor tell us about the association w/ the DV controlled
by the rest of the predictors

- we observe that the overall prediction is poor (R² = 0.0459 < 0.05) and only
two predictors are significant (p < 0.05)

• there are no problems of collinearity, as VIF does not approach 10

- collinearity statistics: when analyzing predictors, it’s important to check for
collinearity— this happens when 2+ predictors are highly correlated, where
there’s an overlap btwn predictors, making it hard to determine their individual
effects

• Variance Inflation Factor (VIF) to assess this: if it’s close to 10, there’s a
potential problem, where the factor is too heavily associated w/ one or more of
the others (so we really don’t need this
factor!!)
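For the special case of only two predictors, the VIF reduces to 1/(1 - r²), where r is the correlation btwn them; a sketch w/ invented, deliberately collinear predictor values:

```python
# Two invented predictors, deliberately almost collinear (x2 is roughly 2 * x1)
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n

# Pearson correlation btwn the two predictors
num = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
den = (sum((a - m1) ** 2 for a in x1) * sum((b - m2) ** 2 for b in x2)) ** 0.5
r = num / den

# For 2 predictors: VIF = 1 / (1 - r^2); values near (or above) 10 signal trouble
vif = 1 / (1 - r ** 2)
```

Here r is nearly 1, so the VIF explodes far past 10: the two predictors overlap so much that their individual effects can’t be separated.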

- we observe that the Q-Q plot is not bad (all points are close to
the middle line)

- we also need the adjustment of the model: R² is a biased
estimate (always too high)

• related to the formula, so if we introduce more predictors, even if they don’t


make any change, the R² will be higher

• that’s why we need the adjusted or corrected R² !!!
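The correction can be sketched w/ the usual adjusted-R² formula (the sample size, R², and predictor counts below are invented):

```python
# Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
# n = sample size, k = number of predictors
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same raw R² (invented), but more predictors -> lower adjusted R²
few_predictors = adjusted_r2(0.30, n=100, k=2)
many_predictors = adjusted_r2(0.30, n=100, k=20)
```

The penalty grows w/ k: piling on useless predictors inflates the raw R² but is punished by the adjusted version.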


- we should also put the standardized estimator to see if it’s statistically
significant

5. Logistic regression (binary dependent variable)

• the dependent variable is not always quantitative: quite common that DV is binary
(something happens or doesn’t), as in our scenario (pass/fail)!!

- we may not be interested in predicting grades (quantitative), but rather who


passes or fails the subject (0 or 1)

• BUT!! w/ a binary variable, math isn’t easy— we may try to predict the
probability of passing the exam, or p(Y=1), but the functional relation w/ a
probability of several predictors is exponential…

• in order to linearize the relationship, logarithms are used, and this is the logistic
regression!!

- w/ such a linear model, slopes (or B coefficients) can be “easily” understood:
each B is the change in the logarithm of p/(1-p) for a unit change
of the X variable

(a) Positive B estimate: the probability of 1 (pass the exam) increases when X
goes up

(b) Negative B estimate: the probability of 1 (pass the exam) decreases when
X goes up

• unfortunately, a “standard mortal” can’t easily figure out what “a change in


the logarithm” means…

• SO!! we normally use a transformation of the B estimates: odds ratios

- interpretation of the odds ratios is easier…

• Odds ratios go from 0 to infinity…
(a) 1 means no relationship btwn the DV and predictor (the predictor doesn’t
change the odds of passing)

(b) A value of 2+ means that when the predictor changes 1 unit, the
odds of 1 (pass) double!!

(c) If the odds ratio is 4, the odds of the DV (EX: passing the exam) are
multiplied by 4… if it’s 10, we multiply the odds by 10…

(d) 0 is the maximum negative association (the closer the value is to 0, the
stronger the negative effect, bc 1 means no effect)
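Converting B estimates to odds ratios is just exponentiation; a sketch w/ invented logistic-regression slopes:

```python
import math

# Invented logistic-regression slopes (B, on the log-odds scale)
b_hours = 0.69     # studying more hours -> higher chance of passing
b_absences = -0.50 # more absences -> lower chance of passing

or_hours = math.exp(b_hours)        # > 1: odds of passing go up per extra hour
or_absences = math.exp(b_absences)  # < 1: odds of passing go down per absence
```

W/ these invented slopes, the odds ratio for hours lands almost exactly at 2 (each extra hour roughly doubles the odds of passing), while the one for absences falls below 1 (a negative association).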

• at the conceptual level, logistic regression works the same as linear regression, but
the results we get are in a scale that’s impossible to understand (logarithms or
prob. of 1/ prob. of 0)

- in order to understand those results, we need odds ratios to grasp the
meaning of the effects

• statistical software makes it relatively easy to calculate and interpret logistic


regression:

i) Estimates of the amount of variance explained = R²

ii) Odds ratios: in this case, only two are significant

• Statistical analysis in survey designs: Example 2
- survey of 709 Valencian high-school students about prediction of sexist attitudes and
behaviors

• variables in the data base include:


b) Living context (a categorical variable or factor)

c) Age (quantitative)

d) Sex (categorical variable or factor)

e) Do you consider yourself an impulsive person? (binary indicator yes/no)

f) Hostile sexism (quantitative)

g) Benevolent sexism (quantitative)

h) Extraversion (quantitative)

i) Agreeableness (quantitative)

j) Conscientiousness (quantitative)

k) Neuroticism (quantitative)

l) Openness (quantitative)

m) Critical thinking (quantitative)

n) Need for closure (quantitative)

- Statistical analysis:
1. Descriptive statistics

• we can focus on…
(a) the range from the minimum to the maximum in order to know the parameters

(b) skewness

(c) kurtosis

- when considering categorical variables, we need frequency tables!!


- we obtain the histograms:

• we can observe the graphs and find, for example, that there’s more homogeneity
from 0-3 than from 3-6

- when looking at the violin box plot, we observe where most people are and how the
data is distributed

- if we split by gender, we observe that men have higher hostile sexism than women

• we can do it w/ all variables we are interested in, and we can split by other
variables (e.g., age)

- when observing the box plots of the living context (urban, metropolitan, or rural),
we observe that there are no big differences btwn them

2. Bivariate relations

• once we have checked the descriptive statistics, we start w/ the bivariate analysis (in
Jamovi: analysis — correlation matrix)

- we obtain the correlation matrix and look for the highest correlations
• we only have to look at the * because…
(a) * = p < 0.05

(b) ** = p < 0.01

(c) *** = p < 0.001

- we focus on the valence of the correlations (positive or


negative)

• EX: negative correlation btwn openness and hostile


sexism

• the professor marks that maybe sex interacts w/ some


predictors differently for men and women

- exploration w/ a scatterplot:
(1) X = neuroticism

(2) Y = hostile sexism

(3) regression line = linear

(4) if we add the group by sex

• we observe that neurotic women are more sexist than women who are not neurotic,
but that low neuroticism men are even more sexist!!

- this information will be useful when designing an intervention


• the relationship btwn neuroticism and sexism is an interesting relationship
mediated by sex

- Next, we will perform a one-way ANOVA:

• nothing seems remarkable, but now we split by sex and… differences!!

- we perform more tests

• in the one-way ANOVA, p-values are off the charts


3. Multivariate prediction

• now we start w/ the multivariate prediction


- categorical variables are sex, impulsivity, and context
• DV = hostile sexism — between subjects!!
- we perform an ANOVA
• looking at the descriptive statistics of ANOVA, we find
significant interactions (p-value)

• then, we look at the effect size by looking at η²


- regarding the effect of context on hostile sexism and dividing by sex, we observe
that regardless of the context, women are lower than men in hostile sexism

• in order to analyze the categorical and quantitative variables, we must perform an
ANCOVA, but it’s very difficult, so we will use linear regression

- in Jamovi: regression — linear regression

• we may change the reference people being examined (EX: change urban for
metropolitan in the context variable)

- reference group: the important group w/ which we want to compare the others
• if we analyze the estimated marginal means of the three variables (sex,
impulsivity, and context), we obtain these graphs comparing the DV (hostile
sexism) of each group!!

- when analyzing the significance of the predictors, we look at the p-values (must
be p < 0.05) and the standardized estimates

• the main predictor of hostile sexism is sex, since the p < 0.001 and has the
largest estimate (-0.7737), so it’s significant!!

Unit 8: Observational studies

What, how, who, when, and where to observe; and assessments of observations
- Simple research problem
• in 1923, Piaget observed that pre-school children often speak alone, even in a group
- in other words, they talk to themselves!!
• he called this egocentric speech
- theorizing that this speech was part of a developmental process, where children first
learn to speak and only later understand that speech is for communication

• later, Vygotsky, observing the same evidence, called it private speech, and
theorized the opposite

- where speech first develops as a communication tool, and then starts to be


individual, in order to produce thinking

• SO!! radically dissimilar theories using observation to try to understand a


similar finding!!

- Observation
• the action or process of carefully watching someone or something
- in order to make an observation scientific, something else is needed: systematic
observation!!

• observation is systematic if it results in data that can be obtained (replicated) by any


other observer

- SO!! we need to carefully think about…


(1) What to observe: the variable(s) to be uncovered

(2) How to observe: one or several observers? taped or videotaped?

(3) Who to observe: which subjects? all present?

(4) When to observe: continuously or not?

(5) Where to observe: natural setting or lab?

• What to observe
- researchers in a good observational study have to…
a) decide which dependent variable or outcome to be measured

• EX: private speech in preschool children, reinforcement used by parents during
parent-child interactions, violent behavior during sports games, etc.

b) decide the level of analysis

• Main levels of analysis in psychology:


i. group (EX: teams)

ii. dyads (EX: mother and child)

iii. individuals (EX: children)

c) establish a clear and specific code of observation

• each observer needs to know how each behavior should be considered and
categorized

d) choose what types of measures they’re going to observe in the outcome

• main ones: occurrence, frequency, latency, duration, and intensity
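These measure types can be computed from a simple timestamped record of the behavior. A minimal sketch in Python (the event log, session length, and intensity codes are made up for illustration):

```python
# Hypothetical observation log for one target behavior in a 300-second
# session: each event is (start_s, end_s, rated_intensity_1_to_3).
events = [(12.0, 20.0, 2), (75.5, 80.0, 1), (140.0, 155.0, 3)]

occurrence = len(events) > 0                             # did the behavior appear at all?
frequency = len(events)                                  # number of occurrences
latency = events[0][0]                                   # seconds until first onset
duration = sum(end - start for start, end, _ in events)  # total seconds engaged
mean_intensity = sum(i for _, _, i in events) / frequency

print(occurrence, frequency, latency, duration, mean_intensity)
# True 3 12.0 27.5 2.0
```

Note that occurrence, frequency, latency, and duration all fall out of the timestamps alone; intensity needs an extra rated code per event.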


• How to observe
- in order to observe, we need observer(s)
• sometimes there’s only one, other times more than one
- among other things, this choice has implications for the analysis of the reliability of
the observation

(1) Inter-rater (btwn-rater) reliability: researchers evaluate agreement in how
consistently different (usually trained) raters can assign the same score or
category to the study subjects

(2) Intra-rater (w/in-rater) reliability: how consistently the same rater can assign a
score or category to the same subjects; conducted by re-scoring video footage

• researchers as observers: it must be decided whether the researcher (the one who develops
the research and its theoretical framework) is going to be an observer

- or!! if observers will be “independent,” or!! a combination!!


• if the researcher is involved, this may lead to observer bias
• finally, it must be decided if the observation will be recorded somehow
- if recorded, the observation may be done later (and several times by several
different observers)

• Who to observe
- first thing to decide: the level of analysis
(1) Group level

• EX: observing misbehaviors in the classroom


(2) Dyad level

• EX: observing reward behaviors of parents when interacting w/ their children
and their corresponding response

(3) Individual level

• EX: observing violent behaviors of a child


• also!! it must be decided how many people to observe
- sample: some of the available subjects
• EX: some of the children in the classroom are selected (maybe randomly) for
observation

- population: all available subjects are observed at once
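Drawing a random sample of subjects can be sketched in a few lines of Python (the roster and sample size here are hypothetical, not from the source):

```python
import random

# Hypothetical roster: 25 children in a classroom (the available population);
# we draw a simple random sample of 5 of them for observation.
random.seed(42)  # fixed seed only so the sketch is reproducible
children = [f"child_{i:02d}" for i in range(1, 26)]
sampled = random.sample(children, k=5)  # sampling w/o replacement
print(sampled)
```

Observing the population instead would simply mean coding all 25 children rather than the sampled subset.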


• When to observe
- a time frame for observation must be decided
• this time frame may be continuously observed
- EX: when a teaching lesson is observed and the observer is measuring the outcome
of interest all the time

• this time frame may be further divided into periods w/ and w/o observation
- EX: the chosen time frame is one week of teaching lessons in the first year of
high school in a particular group of students

• it’s decided that 5 measures w/ a time lapse of 10 minutes will be taken each
day

- when a time frame is divided into different times of observation, this can
be done randomly or non-randomly (just as w/ sampling of subjects, but
sampling of time)
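The random version of this time sampling follows the same logic as sampling subjects. A sketch under assumed numbers (school days, interval labels, and the 5-per-day quota mirror the example above but are otherwise made up):

```python
import random

# Hypothetical time frame: one school week, each day's lessons divided into
# 10-minute intervals; we randomly pick 5 observation intervals per day.
random.seed(0)  # reproducible illustration only
intervals = ["0-10", "10-20", "20-30", "30-40", "40-50", "50-60"]

schedule = {day: sorted(random.sample(intervals, k=5))
            for day in ["mon", "tue", "wed", "thu", "fri"]}
print(schedule)
```

A non-random alternative would fix the same intervals every day (e.g., always the first 10 minutes of each hour).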

• Where to observe
- in the continuum btwn natural settings and laboratory conditions, it must be decided where
to observe

- How to assess if the observation was properly made


• Reliability
- we can say that an observation is reliable if the procedure gives the same results as long
as the same conditions are met

• usually tested w/ inter-rater agreement


- simplest way: percentage of agreement btwn (or among) independent observers, or
raters

- better measure: Cohen’s kappa coefficient


• considering agreements expected by chance

- Example of reliability (agreement)
• two individuals observe the occurrence of a behavior in 10 time intervals
- 3 possible categories:
1. Not happening

2. Happening at a low intensity

3. Happening at a high intensity

- Inter-observers (inter-rater) agreement
• Cohen’s kappa coefficient (κ, lowercase Greek kappa)
- statistic used to measure inter-rater reliability (and also intra-rater reliability) for
qualitative (categorical) items

• it’s a more robust measure than simple percent agreement calculation, as κ takes
into account the possibility that the agreement occurred by chance

- Kappa statistics may be interpreted as follows…


i) less than 0.21 = poor agreement

ii) 0.21 - 0.40 = fair agreement

iii) 0.41 - 0.60 = moderate agreement

iv) 0.61 - 0.80 = substantial agreement

v) greater than 0.80 = great agreement
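The chance correction can be computed by hand: kappa is the observed agreement p_o minus the chance agreement p_e (the sum, over categories, of the product of the two raters’ marginal proportions), scaled by 1 − p_e. A sketch with made-up codes:

```python
from collections import Counter

# Made-up codes from two observers over 10 intervals (categories 1-3)
rater_a = [1, 1, 2, 3, 1, 2, 2, 1, 3, 1]
rater_b = [1, 2, 2, 3, 1, 2, 1, 1, 3, 1]
n = len(rater_a)

# observed agreement: proportion of intervals coded identically
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# chance agreement: product of each rater's marginal proportion per category
count_a, count_b = Counter(rater_a), Counter(rater_b)
cats = set(rater_a) | set(rater_b)
p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in cats)

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.68
```

Here p_o = 0.80 and p_e = 0.38, giving kappa ≈ 0.68: the raters agree 80% of the time, but after discounting chance agreement the result lands in the "substantial" band of the scale above.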

• Validity
- Content validity
• whether the selection of behaviors in the code is a representative sample of the
phenomenon to be observed (as judged by experts)

- Construct validity
• the extent to which the observation code is congruent w/ the theory from which the
problem is formulated

- Criterion validity
• the degree of sensitivity of the observation code to variations in the phenomenon under
study (relationship to external measurements)
