Unit 2
DEFINITIONS
OBJECTIVES
1. To gain familiarity with a phenomenon or to achieve new insights into it (i.e., formulative or exploratory research studies)
RESEARCH SIGNIFICANCE
RESEARCH APPROACHES
RESEARCH DESIGN
x) Data Analysis
RESEARCH PROBLEM
VARIABLES
● Independent - Dependent
● Mediating - Moderating
● Extraneous (Potential to affect)
● Confound (Does affect)
● Control- Composite
● A variable is any kind of attribute or characteristic that you are trying to measure, manipulate and control in statistics and research.
Independent variable
- Definition: a variable that stands alone and isn't changed by the other variables or factors that are measured.
Dependent variable
- Definition: a variable that relies on and can be changed by other factors that are measured.
Intervening (mediating) variable
- Example: if wealth is the independent variable and a long life span is the dependent variable, a researcher might hypothesize that access to quality health care is the intervening variable that links wealth and life span.
Moderating variable
- Example: if a study looks at the relationship between economic status (independent variable) and how frequently people get physical exams from a doctor (dependent variable), age is a moderating variable; that relationship might be weaker in younger individuals and stronger in older individuals.
Control variable
- Definition: characteristics that are constant and do not change during a study.
- Example: in an experiment about plant development, control variables might include the amounts of fertilizer and water each plant gets; these amounts are kept the same so that they do not affect the plants' growth.
Composite variable
- Definition: two or more variables combined to make a more complex variable.
- Example: overall health is a composite variable if a researcher uses other variables, such as genetics, medical care, education, quality of environment and chosen behaviors, to determine overall health in an experiment.
LEVELS OF MEASUREMENT
● If H0 is accepted, it implies that Ha is rejected, and vice versa
● Ha is usually the one the researcher wishes to prove
TYPES OF HYPOTHESIS
CHARACTERISTICS OF HYPOTHESIS
SAMPLING
SAMPLE DESIGNS
2. Data collection
- Anonymity
Research ethics provides guidelines for the responsible conduct of research. It educates
and monitors scientists conducting research. The following is a general summary of some
ethical principles:
● Honesty: Honestly report data, results, methods and procedures, and publication status.
Do not fabricate, falsify, or misrepresent data.
● Objectivity: Strive to avoid bias in experimental design, data analysis, data interpretation,
peer review, personnel decisions, grant writing, expert testimony, and other aspects of
research.
● Integrity: Keep your promises and agreements; act with sincerity; strive for consistency
of thought and action.
● Carefulness: Avoid careless errors and negligence; carefully and critically examine your own work and the work of your peers. Keep good records of research activities.
● Openness: Share data, results, ideas, tools, resources. Be open to criticism and new ideas.
● Respect for Intellectual Property: Honor patents, copyrights, and other forms of intellectual property. Do not use unpublished data, methods, or results without permission. Give credit where credit is due. Never plagiarize.
● Confidentiality: Protect confidential communications, such as papers or grants submitted for publication, personnel records, trade or military secrets, and patient records.
● Responsible Publication: Publish in order to advance research and scholarship, not to advance just your own career. Avoid wasteful and duplicative publication.
● Responsible Mentoring: Help to educate, mentor, and advise students. Promote their welfare and allow them to make their own decisions.
● Respect for Colleagues: Respect your colleagues and treat them fairly.
● Social Responsibility: Strive to promote social good and prevent or mitigate social harms through research, public education, and advocacy.
● Legality: Know and obey relevant laws and institutional and governmental policies.
● Animal Care: Show proper respect and care for animals when using them in research. Do not conduct unnecessary or poorly designed animal experiments.
● Human Subjects Protection: When conducting research on human subjects, minimize harms and risks and maximize benefits; respect human dignity, privacy, and autonomy.
● Non-Discrimination: Avoid discrimination against colleagues or students on the basis of sex, race, ethnicity, or other factors that are not related to their scientific competence and integrity.
● Competence: Maintain and improve your own professional competence and expertise through lifelong education and learning; take steps to promote competence in science as a whole.
● Basic Research Misconduct: falsification, fabrication, and plagiarism (FFP), known as the three "cardinal sins" of research conduct, are the primary concerns in avoiding research misconduct.
● Falsification: Falsification is the changing or omission of research results (data) to
support claims, hypotheses, other data, etc. Falsification can include the manipulation of
research instrumentation, materials, or processes.
● Fabrication is the construction and/or addition of data, observations, or characterizations that never occurred in the gathering of data or running of experiments. Fabrication includes "filling out" the rest of the experimental runs with data that were never collected.
● Plagiarism is "the appropriation of another person's ideas, processes, results, or words without giving appropriate credit."
PARADIGMS OF RESEARCH
● Descriptive versus Analytical
● Applied (Action) versus Fundamental
● Quantitative versus Qualitative
● Conceptual versus Empirical
● Cross sectional versus longitudinal (cohort studies)
● Laboratory (simulation) research versus field setting research
Descriptive research classifies, describes, compares, and measures data, whereas analytical research focuses on cause and effect. For example, in descriptive research you might report the mean or average trade deficit; in analytical research you would instead look at why and how the trade deficit has changed.
Conceptual research is about creating an idea after looking at existing data or adding on a theory
after going through available literature. Empirical research involves research based on
observation, experiments, and verifiable evidence.
While longitudinal studies repeatedly observe the same participants over a period of time,
cross-sectional studies examine different samples (or a “cross-section”) of the population at one
point in time
OBSERVATION
- Types:
● Non-participant (naturalistic) observation: observing events without intrusion, since artificial probes or manipulation might destroy the character of the event being studied
● Participant observation: the observer can interact with the subjects; may be undisguised or disguised (the observer is disguised to the extent of being accepted by the group)
● Hawthorne effect: an observation bias in which people alter their behaviour because they know they are being observed
SURVEY- QUESTIONNAIRES
● A notable feature: there is usually only one opportunity to collect data from each informant, so questions need to be clear, comprehensive and effective
● Closed-ended questions (MCQs): straightforward, quick to answer, and responses are easily turned into quantitative data. Types of closed questions are:
INTERVIEWS
● They involve social interaction; researchers need training; different types of questions generate different types of data
● Interview schedule: a set of prepared questions designed to be asked exactly as worded; a standardised format
● Recording of interviews: data written up as a transcript, a written account of the interview
● Special care is needed when interviewing vulnerable groups
● Language should be appropriate to the vocabulary of the group: social background, respondents' age, educational level, social class, ethnicity, etc.
● Types of Interviews: Structured & Unstructured
STRUCTURED INTERVIEW
● Formal interview: questions are set in a standardized order and the interviewer will not deviate from the interview schedule; based on closed-ended questions worded in advance
● Strengths
- Easy to replicate because a fixed set of closed questions is used (high reliability)
- Fairly quick to conduct; many interviews can take place within a short amount of time, so a large sample can be obtained and results generalized to a large population
● Limitations
UNSTRUCTURED INTERVIEW
GROUP INTERVIEW
● Many respondents are interviewed together – ‘focus group’
● The researcher must be highly skilled: able to establish rapport, know when to probe, and make sure the group interacts with each other and does not drift off topic
● Strengths: same as the unstructured interview
● Limitations
- Keeping confidentiality and respecting participant privacy is difficult; the researcher cannot guarantee that other participants will keep information private
- Less reliable: open questions are used and answers may deviate, making it difficult to repeat the questions exactly
- May lack validity: social desirability bias; participants conform to peer pressure and give false answers
THE INTERVIEWER EFFECT
FIELD RESEARCH
● Social scientists observe, interact with and seek to understand people while they are in a natural environment, including how they react to situations around them
● Expensive and time-consuming, but the amount and diversity of the original data are invaluable
● Techniques: direct observation, document analysis, informal interviews, surveys
● Data analysis is based mostly on correlation; no manipulation
● Data collected are specific to the one purpose or setting and not generalizable
● Method: clearly stating the problem > defining the area of study > setting forth a hypothesis or theory > deciding how data are to be classified and scaled > classifying observations (what to look for and what to disregard) > scaling observations (a way to rank importance or significance) > analysing and processing the data > resolving the problem or accepting/rejecting the hypothesis
CROSS-CULTURAL STUDIES
DESCRIPTIVE PHENOMENOLOGY
INTERPRETIVE PHENOMENOLOGY
CODING-ANALYSIS
● Phenomenological analysis: identify "meaning units" > transform the meaning units into transferable language and then into psychological meanings (intentionality) > use the transformed meaning units as the basis for explaining the psychological structure of the phenomenon > arrive at the general structure
● Colaizzi's method (returning to participants to validate); Van Kaam method (intersubjective agreement using expert judges)
● OPEN CODING: a tool for initial analysis of the data; reading the text several times and dividing it into segments that summarize an idea; tentative labels are assigned at this stage
● AXIAL CODING: the next step is organizing and connecting the codes (axes); the tentative labels (20-30) are grouped to arrive at core themes (3-5); requires inductive reasoning and looking for causal relationships
● SELECTIVE CODING: finding the core category among the categories; creating a scheme or map that connects them to the core category; story lining; an overall explanation
GROUNDED THEORY
● Derived largely from the field of anthropology- studying a phenomenon within the
context of its culture
● The researcher must be deeply immersed in the social culture over an extended period of time (usually 8 months to 2 years) and should engage, observe, and record the daily life of the studied culture and its social participants within their natural setting
● The primary mode of data collection is participant observation
● Advantages
- sensitiveness to the context- natural setting
- the rich and detailed understanding it generates
- Minimal respondent bias
● Disadvantages:
- an extremely time- and resource-intensive approach
- findings are specific to a given culture and less generalizable to other cultures
● The classic example: Jane Goodall's study of primate behaviours; she lived with chimpanzees at Gombe National Park in Tanzania, observing how chimpanzees seek food and shelter, how they socialize, their communication patterns and their mating behaviours
NARRATIVES
THEMATIC ANALYSIS
● TA was first developed by Gerald Holton, a physicist and historian of science, in the
1970s- currently the most common form of analysis method used in Qualitative research
● Pinpointing, examining and recording patterns or themes within the data
● Themes are patterns across data sets that are important to the description of a phenomenon and are associated with a specific research question
● The theme is the meaningful 'essence' of the data
● It is done in 6 phases to establish meaningful patterns:
1. Familiarization with the data (reading and re-reading, notetaking)
2. Generating initial codes (meaningful parts; a cyclical process of going back and forth)
3. Searching for themes among the codes (identifying meaningful phrases/sentences)
4. Reviewing themes (searching for data that supports or refutes the proposed theory; collapsing or condensing themes)
5. Defining and naming themes (engaging and descriptive theme names)
6. Producing the final report (important final themes discussed)
CASE STUDY
● Case studies are in-depth investigations of a single person, group, event or community to
describe a phenomenon
● Data source-observations & interviews, diaries, personal journals, medical reports,
documents- case history in clinics
● The information is mainly biographical- past & current life
● Can be analysed by any method- grounded theory/ phenomenology
● Provides rich qualitative information- for further research
● Time consuming, non-generalizable, researcher bias, non-replicable
RESEARCH METHODOLOGY AND STATISTICS - 3
LEVELS OF EVIDENCE
● IV not manipulated
● Subjects not randomised
● No control group
● Provides Level 4 evidence
● Also called observational studies
● Descriptive research: designed to provide a snapshot of the current state of affairs
● Correlational research: designed to discover relationships among variables and to predict future events from present knowledge
TYPICAL DESCRIPTIVE
● Cross sectional survey- single variable research- survey
● IV and DV are not used because causality or assessing relationships between
variables are not an aim
● To get a clearer description of the phenomenon
● Protection against bias
- conceptual & operational definitions of variables
- Sample selection techniques
- Large sample size
COMPARATIVE DESCRIPTIVE
● To predict the value of one variable from the value of another variable
● An initial step in trying to find a causal phenomenon; the variables are therefore called independent and dependent variables
● Statistics used in such research are regression analysis and multivariate statistics
● Example: to determine whether smoking can predict the incidence of lung cancer
EXPERIMENTAL RESEARCH
PRE - EXPERIMENTAL
● The researchers simply find some group of subjects who have experienced an event X
and then measure them on some criterion variable
● The researcher then tries to relate X to O
● No variable is manipulated- No pretested data
● No control/comparison group
● E.g., research to determine whether reading stories to children at night affects their language skills
● Implicit comparison group using norms on the measuring instrument
● The researcher does not know whether subjects already differed from the "norms" prior to experiencing X
ONE GROUP PRE TEST POST TEST
● No pre-tests
● Two groups
● The experimental group is exposed to the treatment and then evaluated
● The other (unmatched) group is evaluated without any treatment
● Eg: effect of praising on learning mathematics in school-going children
QUASI-EXPERIMENTAL
● Quasi means partial or half or resembling
● Due to ethical issues and other constraints (e.g., smoking in pregnant women), treatment manipulation is done (by selection), but random assignment is not done
- Mostly conducted in field settings and not in the laboratory
- Constraints on the researcher's ability to control variables
● Threats to internal validity and external validity:
- History threat: e.g., the COVID-19 situation
- Maturation threat: age not controlled (e.g., in children)
- Test effect threat: social desirability biases, demand characteristics, practice effects
- Selection bias: groups are not comparable
- Regression to the mean: scores closer to the middle the second time
- Attrition: incomplete data due to participants leaving (dropout)
QUASI-EXPERIMENTAL DESIGNS
● Randomness is important! It eliminates possible biases and balances the groups
● Quasi-experimental designs are used when randomness is neither possible nor practical
● Disadvantage: they do not control for all confounding variables, so they cannot completely rule out alternative explanations for the observed results
● Possible solution: matching instead of random assignment - individual matching; ex-post-facto matching; aggregate matching (using statistics)
EX-POST-FACTO RESEARCH
● After-the-fact research / causal-comparative research
● Investigation starts after the fact has occurred
● The researcher tries to find out the causes behind its occurrence by going backwards in history
● Manipulation is not possible
● E.g., earthquake-hit area, causes of delinquency, etc.
● Independent samples design: cases are assigned to groups in a way that should not
create any correlation between the scores in any one group and the scores in any other
group. Also called between subjects designs (randomly selected)
● Correlated samples design: cases are assigned to groups in a way that should produce a positive correlation between the scores in any one group and the scores in any other
● Used when you have a blocking variable (one positively correlated with the DV)
● E.g., age or gender as the blocking variable
● Variability within blocks is less than the variability between blocks
● You match the subjects up in blocks of two, such that within each block the subjects are nearly identical on the blocking variable(s)
● Then, within each block, you randomly assign one case to Treatment 1 and another to Treatment 2 (control)
● The design could be described as matched pairs (a small sketch of this assignment follows below)
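A rough illustration of the matched-pairs assignment described above. The subject IDs, ages (the assumed blocking variable) and group labels are all invented for the sketch; it is not a prescribed procedure, just one way to implement the idea.

```python
import random

# Hypothetical subjects with a blocking variable (age); all values are invented.
subjects = [("S1", 21), ("S2", 22), ("S3", 34), ("S4", 35), ("S5", 48), ("S6", 47)]

# Sort by the blocking variable so adjacent subjects are nearly identical on it,
# then form blocks (matched pairs) of two.
subjects.sort(key=lambda s: s[1])
pairs = [subjects[i:i + 2] for i in range(0, len(subjects), 2)]

assignment = {}
for pair in pairs:
    # Within each block, randomly send one subject to Treatment 1 and the
    # other to Treatment 2 (control).
    shuffled = random.sample(pair, k=len(pair))
    assignment[shuffled[0][0]] = "Treatment 1"
    assignment[shuffled[1][0]] = "Treatment 2 (control)"

print(assignment)
```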
REPEATED MEASURES DESIGN
● Often interchanged with the term 'within subjects’ design
● A subtype of the within subjects design
● Uses the same subjects with every treatment of the research including the control
● Disadvantages: practice effects; boredom and fatigue; dropout
● Crossover design (counterbalancing): ensures that all of the subjects receive all of the treatments; watch for the presence of carryover effects
● The within-subject (intra-subject) variability will be smaller than the between-subject (inter-subject) variability used for the comparison of treatments in a parallel-groups design
LATIN SQUARES
● The crossover design is a type of Latin square
● In a Latin square the number of treatments equals the number of patients
● Another important factor- order of treatment
● Order of treatments included in a balanced way
● The net result is an N x N array (where N is the number of treatments or patients)
● This is most easily shown pictorially with N letters, such that a given letter appears only once in a given row or column, e.g. with four treatments:
● A - CT
● B - BT
● C - CBT
● D - Pharmacotherapy
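A minimal sketch of a 4 x 4 Latin square for the four treatments listed above: each treatment appears exactly once in every row (patient) and every column (treatment order). The cyclic construction used here is just one simple way to build such a square, not the only valid one.

```python
treatments = ["A (CT)", "B (BT)", "C (CBT)", "D (Pharma)"]
n = len(treatments)

# Cyclic Latin square: row i is the treatment list rotated by i positions,
# so each treatment occurs once per row and once per column.
square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

for patient, row in enumerate(square, start=1):
    print(f"Patient {patient}: " + " -> ".join(row))
```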
CHARACTERISTICS
● One or only a few subjects tested ("single-subject designs")
● Subjects are not put into groups, but run as individuals
● Experiments are long, allowing performance to stabilize over time
● Data from subjects are usually not combined but considered separately
● Data are analysed visually, with minimal use of inferential statistics
● Used in educational, clinical and animal research studies
FACTORIAL DESIGN
● Have more than one independent variable
● See the combined effects of two or more IVs on a single DV
● Effect of expertise and suggestion on false memory (Mazzoni, Loftus, Seitz & Lynn, 1999): participants had never been bullied
- Asked to report a recurring dream to (1) another participant or (2) a psychiatrist
- In the experimental condition (a), confederates (1 and 2) say "this type of dream is a result of being bullied as a child"
- In the control condition (b), confederates (1 and 2) say "this type of dream is a result of feeling powerless as a child"
- One week later, in what was presented as a separate study, participants were asked whether they had been bullied as a child
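To illustrate how a 2 x 2 factorial design of this kind is summarized, the sketch below computes cell means, the two main effects and the interaction. The scores are invented for illustration and are not data from the Mazzoni et al. study.

```python
import numpy as np

# Invented DV scores for a 2 x 2 factorial design:
# Factor A = listener (participant vs psychiatrist), Factor B = suggestion (bullying vs control).
cells = {
    ("participant", "bullying"): [4, 5, 6],
    ("participant", "control"):  [3, 3, 4],
    ("psychiatrist", "bullying"): [7, 8, 8],
    ("psychiatrist", "control"):  [4, 5, 4],
}

means = {k: np.mean(v) for k, v in cells.items()}

# Main effect of listener = difference between the two marginal means for Factor A.
listener_effect = (
    (means[("psychiatrist", "bullying")] + means[("psychiatrist", "control")]) / 2
    - (means[("participant", "bullying")] + means[("participant", "control")]) / 2
)

# Main effect of suggestion = difference between the two marginal means for Factor B.
suggestion_effect = (
    (means[("participant", "bullying")] + means[("psychiatrist", "bullying")]) / 2
    - (means[("participant", "control")] + means[("psychiatrist", "control")]) / 2
)

# Interaction: does the effect of the suggestion differ across listeners?
interaction = (
    (means[("psychiatrist", "bullying")] - means[("psychiatrist", "control")])
    - (means[("participant", "bullying")] - means[("participant", "control")])
)

print("Cell means:", means)
print("Main effect (listener):", listener_effect)
print("Main effect (suggestion):", suggestion_effect)
print("Interaction:", interaction)
```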
DISTRIBUTION OF A VARIABLE
● Frequency distribution: a table or polygon showing how many times each score occurs
● Histogram: a graph plotting values of observations on the horizontal axis; each bar shows how many times each value occurred in the data set
● Normal curve
- In an ideal world data would be distributed symmetrically around the centre of all scores -
If we drew a vertical line through the centre of the distribution then it should look the same
on both sides
- This distribution is known as a normal distribution and is characterized by the
bell-shaped curve
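A quick sketch of producing the frequency distribution and histogram described above, using invented scores; matplotlib is assumed to be available.

```python
from collections import Counter
import matplotlib.pyplot as plt

scores = [3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9]  # invented data

# Frequency distribution: how many times each score occurs.
freq = Counter(scores)
print(sorted(freq.items()))

# Histogram: values of observations on the x-axis, counts as bar heights.
plt.hist(scores, bins=range(3, 11), edgecolor="black")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.title("Roughly bell-shaped frequency distribution")
plt.show()
```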
DEVIATION FROM NORMAL
● Mode is the tallest bar-the score that occurs the most is the mode
● Median is the middle score when scores are ranked in order of magnitude
- arrange the scores into ascending order
- its position is (n + 1)/2
- unaffected by extreme scores
MEAN
● Add up all of the scores and then divide by the total number of scores
● Used only with interval or ratio data
● It uses every score
● Disadvantage: it can be influenced by extreme scores
● Mode and median ignore most of the scores in a data set
● Z-score is a numerical measurement that describes a value's relationship to the mean; if the Z-score is 0, the score is identical to the mean
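The central-tendency measures and the z-score described above can be computed directly; a small sketch with invented scores using Python's standard library:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle score, position (n + 1)/2 when ranked
mode = statistics.mode(scores)      # most frequently occurring score
sd = statistics.stdev(scores)       # sample standard deviation (n - 1 in the denominator)

# Z-score: how far a value lies from the mean, in standard-deviation units.
z = [(x - mean) / sd for x in scores]

print(mean, median, mode)
print([round(v, 2) for v in z])     # a z of 0 would mean the score equals the mean
```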
DISPERSION
1. Range
2. Inter-quartile range
3. Percentile ranks
4. Variance
5. Standard deviation
6. Coefficient of variation
RANGE & INTER-QUARTILE RANGE
● Range of scores: take the largest score and subtract the smallest; affected dramatically by extreme scores
● Quartiles are the three values that split the sorted data into four equal parts; first calculate the median, which is also called the second quartile and splits the data into two equal parts
● Interquartile range: cut off the top and bottom 25% of scores and take the range of the middle 50% of scores (Q3 - Q1); half the data are lost
PERCENTILE RANKS
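A short sketch of the dispersion measures listed above (range, quartiles/IQR, percentile rank) on invented scores; note that different quartile conventions exist, so exact values can vary slightly between software packages.

```python
import numpy as np
from scipy import stats

scores = np.array([2, 4, 5, 7, 8, 10, 11, 13, 15, 40])  # invented data; 40 is an extreme score

data_range = scores.max() - scores.min()          # heavily affected by the extreme score
q1, q2, q3 = np.percentile(scores, [25, 50, 75])  # the three quartiles; q2 is the median
iqr = q3 - q1                                     # spread of the middle 50% of scores

# Percentile rank: percentage of scores at or below a given value.
pr_of_10 = stats.percentileofscore(scores, 10)

print(data_range, (q1, q2, q3), iqr, pr_of_10)
```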
VARIANCE
● Deviance: the difference between each score and the mean; some deviations are positive, some negative
● Sum of squared deviances (SS): add up the square of the deviance for each data point in the distribution to get a total dispersion score
● Variance: average this total dispersion. Why N - 1? (see below)
DEGREES OF FREEDOM
● Ball-in-cup analogy: data required for understanding the distribution; the mean takes in all the scores
● If the mean is already in the calculation, we only require N - 1 data points
● Mean = Total / N
● Total = Mean * N
● Example: scores 8, 5, 15, 14, ? with Mean = 10. Since Total = 10 * 5 = 50, the missing fifth score is forced to be 50 - (8 + 5 + 15 + 14) = 8, so only N - 1 scores are free to vary (a quick check in code follows below)
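A small check of the degrees-of-freedom example above: once the mean and four of the five scores are fixed, the last score is forced, so only N - 1 scores are free to vary; the sample variance therefore divides the sum of squares by N - 1 (ddof=1 in numpy).

```python
import numpy as np

known = [8, 5, 15, 14]
mean = 10
n = 5

# The missing fifth score is not free to vary: total = mean * n, so it is forced.
missing = mean * n - sum(known)              # 50 - 42 = 8
scores = np.array(known + [missing])

ss = np.sum((scores - scores.mean()) ** 2)   # sum of squared deviations (SS)
variance = scores.var(ddof=1)                # SS / (N - 1)
sd = scores.std(ddof=1)                      # standard deviation

print(missing, ss, variance, round(sd, 2))
```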
STANDARD DEVIATION
NORMAL DISTRIBUTION
● Gaussian distribution- Carl Friedrich Gauss
● For the standard normal distribution, mean = median = mode = 0 and SD = 1
● The values of skew and excess kurtosis are 0
● Half of the population is less than the mean and half is greater than the mean
● The Empirical Rule allows you to determine the proportion of values that fall within certain distances from the mean:
● ≈68% of the data falls within 1 SD
● ≈95% of the data falls within 2 SD
● ≈99.7% of the data falls within 3 SD
● We can use Z scores to understand where a specific observation falls relative to the
entire distribution
● Can place observations drawn from different normal distributions (diff mean and SD)
on a standard scale and compare them
● This process is called standardization
● To standardize your data- convert the raw scores into Z-scores
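A sketch of standardization and the Empirical Rule on simulated normal data (numpy assumed; the mean of 50 and SD of 10 are arbitrary); the proportions should come out close to 68/95/99.7%.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=100_000)   # simulated raw scores

# Standardization: convert raw scores to z-scores (mean 0, SD 1).
z = (x - x.mean()) / x.std()

for k in (1, 2, 3):
    proportion = np.mean(np.abs(z) <= k)
    print(f"within {k} SD: {proportion:.3f}")    # ~0.683, ~0.954, ~0.997
```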
EXAMPLE - COMPARE APPLES TO ORANGES
● Compare their weights: we have a 110 g apple and a 100 g orange
● Standard deviation of apple weights: 15; standard deviation of orange weights: 25
● If repeated random samples of a given size n are taken from a population- for a
quantitative variable, where the population mean is μ (mu) and the population
standard deviation is σ (sigma) then the mean of all sample means (x-bars) is equal to
population mean μ (mu)
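The statement above about the mean of all sample means can be checked by simulation. The sketch below uses an assumed, deliberately skewed population and a sample size of 30; both choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=5.0, size=1_000_000)  # assumed skewed population
mu = population.mean()                                   # population mean (mu)

n = 30                                                   # sample size
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

# The mean of the sample means (x-bars) should be very close to mu.
print(round(mu, 3), round(np.mean(sample_means), 3))
```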
Making a Formal Statement > Selecting a Significance Level > Deciding the Distribution to
Use > Determining the suitable test to use > Selection of a Random Sample > Collecting
required data > Calculate the probability of the sample > Reject or Accept Null hypothesis
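The sequence of steps above, sketched for a one-sample t-test on invented data with scipy: state H0 (here, a hypothetical population mean of 100), fix the significance level, use the t distribution as the suitable test, compute the probability of the sample, then reject or retain H0.

```python
from scipy import stats

sample = [104, 98, 110, 105, 99, 107, 102, 111, 96, 108]  # invented scores
mu_0 = 100          # formal statement: H0 says the population mean is 100
alpha = 0.05        # significance level chosen before seeing the data

# t distribution is appropriate: small sample, population SD unknown.
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

print(t_stat, p_value)
if p_value < alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")
```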
CONFIDENCE LEVEL
● Confidence interval: denotes the values within which the population parameter is expected to lie; consists of the upper and lower bounds of the estimate you expect to find at a given level of confidence
● Confidence level: the probability that the parameter will fall between a set of values; if the level is 95%, the parameter is missed 1 out of 20 times
● H0 & Ha
● The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true; it is predefined (α may be kept at 0.1, 0.05, or 0.01)
● The P value, or calculated probability, is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis (H0) is true
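A sketch of a 95% confidence interval for a mean, using the t distribution on invented data (scipy assumed); the interval gives the lower and upper bounds within which the population mean is expected to lie at the chosen confidence level.

```python
import numpy as np
from scipy import stats

sample = np.array([104, 98, 110, 105, 99, 107, 102, 111, 96, 108])  # invented data
confidence = 0.95

mean = sample.mean()
sem = stats.sem(sample)   # standard error of the mean

# t-based interval with n - 1 degrees of freedom, centred on the sample mean.
lower, upper = stats.t.interval(confidence, len(sample) - 1, loc=mean, scale=sem)

print(round(lower, 2), round(upper, 2))
```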
DISTRIBUTIONS
● Test statistic- a number calculated from a statistical test of a hypothesis (central value,
variation, sample size, and no. of predictor variables)
● Discrete distributions:
- Uniform (outcomes are equally likely)
- Binomial (success/ failure in n trials)
- Bernoulli (success/ failure in one trial)
- Geometric ( no. of failures before success)
- Poisson (probability of n events)
● Z test-normal distribution; t-test- t distr.; ANOVA- F distr.
● Symmetrical distributions (t and z distributions)
● Asymmetrical (F and chi-square distributions)
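A brief sketch of drawing from the discrete distributions listed above with numpy; the parameters are arbitrary, and note that numpy's geometric distribution counts trials up to the first success rather than failures before it.

```python
import numpy as np

rng = np.random.default_rng(42)

uniform = rng.integers(1, 7, size=10)      # discrete uniform: fair six-sided die
bernoulli = rng.binomial(1, 0.3, size=10)  # Bernoulli: success/failure in one trial
binomial = rng.binomial(10, 0.3, size=10)  # Binomial: successes in n = 10 trials
geometric = rng.geometric(0.3, size=10)    # Geometric: trials until the first success
poisson = rng.poisson(4, size=10)          # Poisson: number of events per interval

print(uniform, bernoulli, binomial, geometric, poisson, sep="\n")
```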
TYPE I AND TYPE II ERRORS
POWER ANALYSIS
● Ability of a test to find an effect is known as its statistical power - probability of
finding an effect when it exists.
● Power of test= 1 − β (Type II error)
● We typically aim to achieve a power of 0.8 (1 - 0.2)
● Power can be calculated and reported for a completed experiment to comment on
the confidence one might have in the conclusions drawn from the results of the study
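A sketch of an a-priori power calculation for an independent-samples t-test using statsmodels (assumed available): solving for the per-group sample size needed to reach power = 0.8 at alpha = 0.05 for an assumed medium effect (d = 0.5), and computing achieved power for a given sample size.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for the per-group sample size given effect size, alpha and desired power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # roughly 64 per group for d = 0.5

# The same object can also return achieved power for a completed study.
achieved = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)
print(round(achieved, 2))
```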
EFFECT SIZE
● A significant result does not always tell us about the strength of the effect
● Bigger effects will be easier to spot and more likely to be significant
● It is a standardized measure of the magnitude of observed effect.
● Most common measures of effect size are
- Cohen’s d- d = 0.2 (small), 0.5 (medium) and 0.8 (large)
- Pearson’s correlation coefficient r = 0.1 (small), 0.3 (medium), 0.5 (large)
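Cohen's d can be computed by hand as the difference between two group means divided by the pooled standard deviation; a small sketch with invented groups:

```python
import numpy as np

group1 = np.array([12, 14, 15, 13, 16, 15, 14])   # invented scores
group2 = np.array([10, 11, 13, 12, 11, 10, 12])

n1, n2 = len(group1), len(group2)
s1, s2 = group1.var(ddof=1), group2.var(ddof=1)

# Pooled standard deviation across the two groups.
pooled_sd = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

d = (group1.mean() - group2.mean()) / pooled_sd
print(round(d, 2))   # interpret against the 0.2 / 0.5 / 0.8 benchmarks
```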
CORRELATIONAL ANALYSIS
● Relationship between a predictor variable and an outcome variable
● We measure what naturally goes on in the world without directly interfering with it
● r is the coefficient of correlation
● Types: Pearson correlation, Spearman correlation, partial, multiple, special
PRODUCT MOMENT CORRELATION
● Also known as Pearson r
● Used with two continuous variables, or with one continuous variable and a categorical variable that has two categories
● It can vary from –1 (a perfect negative relationship) through 0 (no relationship) to +1
(a perfect positive relationship)
● > +0.5 or <-0.5 is strong correlation
● >+0.3 or <-0.3 is moderate correlation
● Between -0.3 to +0.3 weak correlation
● 0 means there is no relationship at all
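A minimal sketch of computing Pearson's r (and its p-value) with scipy on invented paired data; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8])       # invented predictor
exam_score = np.array([52, 55, 61, 60, 68, 70, 75, 79])  # invented outcome

r, p_value = stats.pearsonr(hours_studied, exam_score)
print(round(r, 2), round(p_value, 4))   # r near +1 indicates a strong positive relationship
```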