
Research and Statistics

1. Good research is empirical: primarily concerned with one or more aspects of a real world
situation and deals with concrete data that serves as a foundation for the external validity of
research findings.
2. Good research is replicable: research findings must be validated by replicating the study,
resulting in a solid foundation for decision making

Dimensions of research
Purpose of research
The principal objective or purpose of research in any field of inquiry is to add to what is known about the phenomenon under investigation through the application of scientific methods.
The 3 major purposes are exploring, explaining and describing any problem/event under study.

3 basic types of research

Descriptive research: No control over the variables is intended. The research methods employed include observation, interviews, case study and introspection. The goal is merely to describe the phenomena under study and may involve categorization, frequencies and descriptive statistics.

Correlational research: Finding the direction and strength of the association between the variables under study. Manipulation of the variables is not possible, as we study the covariance and not the causation. The goal is not to predict but to find the degree of association between the dependent and the independent variable; it helps us to understand characteristics and behavior.

Experimental research: We manipulate the independent variable to study the effect on the dependent variable. It helps us to analyse the relationship between the variables, explain changes and test a theory.

Fundamentals of doing research in psychology


1. Research comes from a deterministic world and must be based on empirical evidence.
2. Research must be replicable and falsifiable to determine whether the hypothesis or theory is
true
3. Results must be parsimonious not requiring excessive or impossible ideas.
4. Determinism: events have natural causes. Natural causes can be manipulated or reproduced as
well as understood while supernatural causes cannot be.
5. Empiricism: human senses are the ultimate source of human knowledge. Empiricism does not
rely on intuition, faith or even logic to make its point
6. Replicability: means findings are reliable and reproducible. A single finding might be a fluke, with just the right situation giving significant results. If an experiment cannot be replicated, the findings cannot be taken as true
7. Falsifiability: means experimental hypothesis or theory can be proven false. A falsifiable
theory allows a scientist to determine if something is actually true
8. Parsimony: defined as looking for the simplest explanation for a phenomenon. If your theory can explain why something is happening without making unnecessary assumptions, then you are on the right track. In psychology, this means you try to minimize your reliance on untestable, ill-defined concepts or psychic forces.

Paradigms in Psychology
1. Essentially a worldview, a whole framework of beliefs, values and methods within which
research takes place.
2. Research philosophy combined with research methodology comprises research paradigm
3. Define what falls within and outside the limits of legitimate research and how members of
research communities view both the phenomena that they study and research methodology
that should be employed to study these phenomena
4. Another use of the word is as a worldview
5. The term is used to describe the set of experiences, beliefs and values that affect the way an
individual perceives reality and responds to that perception.
6. Kuhnian phrase: paradigm shift, to denote a change in how a given society goes about
organizing and understanding reality.
7. The word comes from the Greek for "pattern"
8. Concerned with epistemology, ontology, methodology and axiology

Ontological: Nature of our beliefs about reality. Concerned with the notion of what actually exists, the nature of reality and what can be known about it. Comes from the Greek ontos, meaning being. Relates to the nature of reality and its characteristics.

Epistemological: Concerned with the search for the foundations of human knowledge which can offer some assurance of the truth of our knowledge claims. Comes from the Greek root episteme, meaning knowledge, and can be understood as the study of knowledge itself. This word root was originally applied to what the Greeks considered rigorous knowledge, or knowledge gained from studying the world.

Methodological: Strategy, plan of action, process or design that informs one's choice of research methods. Guides the researcher in deciding what type of data is required for the study and which data collection tools will be most apt for the purpose of the study. Methods refers to the specific means of collecting and analysing data, such as questionnaires and open-ended interviews. Which methods to use for a research project will depend on the design of that project and the researcher's theoretical mindset.

Axiological: Refers to ethical issues that need to be considered when planning a research proposal. Considers the philosophical approach to making decisions of value, or the right decisions. Involves defining concepts of right and wrong behavior relating to the research. Considers what value we shall attribute to different aspects of our research, the participants, and the data.

Research designs in psychology


1. Procedures for collecting, analyzing, interpreting and reporting data in research studies.
2. Rigorous research designs are important because they guide the methods decisions that
researchers must make during their studies and set the logic by which they make
interpretations at the end of the studies
3. It is a plan that specifies the sources and type of information relevant to the research problem.
4. It also includes the time and budget of the research

Inductive vs deductive approach

Deductive:
● Involves the search for knowledge in a scientific way.
● The scientific enquiry that we seek today is possible due to two key contributions in establishing and popularizing the "scientific method" of inquiry.
● Developed by Aristotle.
● Based on syllogistic reasoning, establishing a relationship between a major premise, a minor premise and a conclusion.
● Moves from general assumptions to the specific application.
● Starts with a theory, developing hypotheses from that theory and then collecting and analysing data to test those hypotheses.
● Hypotheses are developed based on prior research and theory.

Inductive:
● Replaces the previously held major premise with hypotheses or assumptions.
● The hypothesis is then tested by collection of data and logical analysis of the data.
● Popularized by Francis Bacon.
● Starts with a set of observations, looks for patterns in those observations, and then moves from those specific experiences to a more general set of propositions about those experiences.
● Moves from specific observations to generalizations.
● Utilized to discover patterns and construct generalisations and theories based on the specific observations.
● The study would usually begin with data, then try to find a pattern in it, and then attempt to build a general theory.

Steps to conduct research


● John Dewey (1938) suggested the following steps:
● Identification and definition of problem
● Formulation of a hypothesis
● Collection, organization and analysis of data
● Formulation of conclusions
● Verification, rejection or modification of hypothesis
● The research process consists of a series of actions and steps:
● Defining a research problem
● Reviewing the literature
● Formulating hypothesis
● Preparing the research design and determining the sample design
● Collecting the data and ensuring the right execution of the project
● Analysis of data leading to hypothesis testing, interpretation and generalization
● Preparation of report

Important components of research


Research problem
1. Interrogative testable statement that expresses the relationship between 2 or more variables
2. Offers direction to the whole research
3. Originates from one of the following:
4. Noticeable gap in the existing research of the same field
5. There is some unexplained phenomena that needs to be understood
6. When the results of various research are contradictory or disagree with each other
7. Should be clearly written
8. Must express relationship between 2 or more variables
9. Constructed statement must be testable using empirical methods
10. Statement must be precise and specific
11. Must not pose any ethical or moral judgements.

Operational definitions
1. Detailed explanation of the technical or ambiguous terms and measurements used in the
research process
2. Helps to better address subjective and ambiguous constructs in psychology when they are
used in scientific research
3. Avoiding confusion regarding the intended meaning, addressing the distortion of meaning due
to subjective experiences
4. The insistence that all abstract scientific terms must be operationally defined is called operationalism or operationism. For example, "anxiety" could be operationally defined as a participant's score on a standardized anxiety scale.

Research variables
1. A variable is a property that takes on different values as expectations and circumstances
change.
2. Types of variables
3. Independent variables are manipulated by the experimenter
4. Types of independent variables:
5. Task variables: refer to characteristics which are associated with a behavioral task given to the participant. Includes physical characteristics or complexity of the apparatus used in the experiment.
6. Environment variables: characteristics of the env that tend to produce changes in the DV
7. Subject variables: characteristics of the subject that tend to produce changes in the DV. 2 major types are natural subject variables (age, sex) and induced or instructional subject variables (induced by instructions given by the researcher)
8. Outcome variable: DV that the researcher measured to observe any resulting change made by
the IV. also called response variable
9. Unmeasured variables: all the extraneous variables that are not the focus of the research but affect the relationship under study.
10. 3 types: subject relevant (constitute the characteristics of the participant that affect the researcher's study in an undesired manner, e.g., age, intelligence)
11. Situational relevant: environmental and task variables whose undesired effect is controlled by the researcher
12. Sequence relevant: occur due to different ordinal positions that conditions of the experiment
occupy in a sequence.
13. Connecting variable: intervening/mediator variables that are essential in completing the
relationship between the independent and dependent variable when a direct relationship
cannot be observed in certain cases.
14. Dependent variable is observed and recorded by the experimenter
15. It depends on the behaviour of the participant which is supposed to depend on the independent
variable
16. There are 2 levels of independent variable: experimental and control
17. Control/ extraneous variable
18. Potential independent variable that is held constant during an experiment
19. Qualitative vs quantitative variable
20. Quantitative variables are expressed as numbers
21. Qualitative variables are attributes (e.g., good or bad). They can be compared but not measured (nominal/ordinal).
22. Active vs Attribute Variables
23. Active Variables are those that can be manipulated, changed and controlled experimentally.
24. Attribute variables are those that cannot be manipulated, or controlled rather reflect the
characteristics of the study population. They are thus, the pre-existing qualities of the
population.
25. Continuous vs Discrete variables
26. The Continuous variables exist between a range, say from 30 to 40 the value can be 30.1 to
39.9. A continuous variable is one that may take on an infinite number of intermediate values
along a specified interval.
27. The discrete Variables are the absolute values. A discrete variable, restricted to certain values,
usually consists of whole numbers, such as the family size, number of defective items in a
box.
28. Can be divided into 3 types
29. Constant: when a variable can have only one value
30. Dichotomous: can have 2 values
31. Polytomous: can have many values.
32. Other types of variables
33. Confounding variables also known as a third variable affects the dependent variable despite
the fact that it is not the independent variable being studied. This can cause problems in a
study
34. Extraneous variable is any variable present in the experiment that may cause the relationship
between the independent and dependent variable to be weaker than expected or observed. Any
env clue that may push the participant to behave or act in a certain way is referred to as a
demand characteristic. Experimenter effect is any hint provided by the experimenter to
persuade or sway the results in some way. Situational variables are any variables
corresponding to the noise level, the temperature or anything else present in the situation.
35. To control extraneous variables, several techniques are used. Elimination: noise can be eliminated by using soundproof situations or settings. Constancy: the extraneous variables are held constant in all situations. Balancing: participants are made equal in all aspects in both the control and experimental groups. Counterbalancing: used to control variables occurring as a result of practice or fatigue, together called the order effect. Randomization: each member of the population has an equal chance of being selected; this technique is applied where the extraneous variables are known but their effects can't be controlled by known techniques.
36. Control variable: something that the researcher manipulates to keep constant across conditions, allowing the results to be more homogeneous and/or valid by preventing it from becoming confounded.
37. A moderator variable modifies the strength of the relationship between the 2 variables by
changing how much the independent variable influences the dependent variable.
38. Moderator variable could be anything related to a person’s categorical variables or quantitative
variables
39. Depending on the causation, variables are classified as:
40. Change variables: the independent variables that are manipulated, measured and selected by the researcher to produce some observable effect on dependent variables. Also called stimulus variables. Alternatively called explanatory, predictor, right-hand-side or X variables

NOTE:
1. When the env, the experimenter or the participant hinders the research objective, it becomes necessary to control them:
2. Using deception: creating an artificial situation/story to disguise the procedures or objectives
of the study
3. Placebo effect: real reaction to a fake treatment
4. Controlling expectations of subject and experimenter: in a single-blind design, participants do not know which treatment they are being given, or whether they are being given a placebo. In a double-blind design, both participant and experimenter are unaware of the specific conditions being presented

Hypothesis
1. Speculation or theory based on insufficient evidence that lends itself to further testing and
experimentation.
2. A hypothesis can usually be proven true or false
3. The hypothesis is written in 2 forms
4. Null and alternative hypothesis
5. Null hypothesis is defined as the prediction that there is no interaction between variables. It
says that there is no statistical significance between the 2 variables in the hypothesis. It the
hypothesis that the researcher is trying to disprove or reject
6. Represents the status quo or the prevailing knowledge about a situation
7. Ho: p=1.
8. The null hypothesis must always contain equality.
9. In rare cases it may also contain ≤ or ≥ signs
10. If a researcher is unable to disprove or reject the null hypothesis, then it means that the sample
data results are due to chance factors and are not significant in terms of supporting the idea
being investigated.
11. The two-tailed (non-directional) alternative hypothesis contains the not-equal sign. This indicates we are testing whether or not there is some effect, without specifying the direction of that effect.
12. Alternative hypothesis is inverse or opposite of the null hypothesis
13. States that the results are not due to chance and that they are significant in terms of supporting
the theory being investigated.
14. Strict inequality which is usually the suspicion or claim being tested
15. Also known as research hypothesis and sometimes directional hypothesis
16. A directional alternative hypothesis contains the less-than or greater-than sign, which indicates whether the expected effect is positive or negative
17. Hypothesis testing is the procedure of comparing a null hypothesis and an alternative
hypothesis against each other to determine validity
18. The goal of the research then becomes to test and, if the evidence warrants, reject the null hypothesis and in turn accept the alternative hypothesis

Characteristics of hypothesis

Testable: Must be able to be tested using scientific methods so that data can be gathered.

Falsifiable: The hypothesis must be able to be proven false. Data supporting the falseness of the hypothesis must be able to be collected.

Logical: Must be based on reasoning. If a theory already exists that is related to the hypothesis, deductive reasoning can be used to generate the hypothesis so that it does not disagree with the theory. If a theory does not exist, inductive reasoning can be used to design a hypothesis that is in agreement with general observations.

Positive: A hypothesis must be worded in such a way that it proposes the existence of a relationship between the subjects of the study. Very rarely do scientists set out to show that there is no relationship.

Difficulties in formulating a hypothesis


● Absence of knowledge of a theoretical framework
● When the investigator lacks the ability to utilize the knowledge of the theoretical framework
● When the investigator lacks the ability to utilize the knowledge of the important scientific
research techniques

Aim of a hypothesis
● Difference
● Relationship (statistically significant implies difference in result did not occur by chance)
● Interaction

Rejecting the null hypothesis


● When we perform hypothesis tests, we will use a significance level
● This is the threshold that determines if our study’s results are statistically significant
● That is, if the probability of seeing the observed result is so low that it gives us reason to doubt the null hypothesis
● If we reject, this indicates evidence in favor of the alternative
● If we fail to reject, we do not have sufficient evidence against the null or in favor of the alternative
● So the null stands by default
● When testing for statistical significance, we want a p-value below 0.05
● If we have value below 0.05, reject the null
● If your value is above 0.05, fail to reject the null
● Test of significance
● Statistically significant means the difference between the results did not occur by random chance. Indicated by p
● At the 0.05 level, results this extreme would occur by random chance less than 1 in 20 times
● There are 2 kinds of tests to determine whether the results are significant
● They should never be used at the same time
● Two tailed tests, also known as non directional defined as the standard test of significance to
determine if there is a relationship between variables in either directions
● One tailed test also known as directional defined as a test of significance to determine if there
is a relationship between variables in 1 direction
● Null hypothesis assumes that whatever you are trying to prove did not happen.
● Alternative hypothesis is the one you would believe if null hypothesis is concluded to be
untrue
● Statistical tests allow psychologists to work out the probability that their results could have
occurred by chance and in general psychologists use a probability level of 0.05. This means
that there is a 5% probability that the results occurred by chance
● Level of statistical significance is often expressed as a p-value between 0 and 1
● The smaller the p-value, the stronger the evidence to reject the null hypothesis
● The p-value indicates how likely it is that your data would have occurred by random chance alone, that is, if the null hypothesis were true
● Reported to 3 decimals
● Do not use a 0 before the decimal point for the statistical value p, as it cannot be greater than 1
● p = .000 is not possible and should be written as p < .001
● The opposite of significant is nonsignificant not insignificant
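
As a concrete illustration of the decision rule above, here is a minimal sketch (Python with SciPy; the group scores and variable names are made up for illustration) of comparing a p-value against the 0.05 significance level:

```python
# Hypothetical sketch: comparing two groups with an independent-samples t-test.
# H0: the two group means are equal; H1 (two-tailed): the means differ.
from scipy import stats

# Made-up scores for a control group and an experimental group (illustrative only)
control = [12, 15, 14, 10, 13, 16, 12, 14]
experimental = [17, 19, 15, 18, 16, 20, 18, 17]

alpha = 0.05                                   # significance level
t_stat, p_value = stats.ttest_ind(experimental, control)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Fail to reject the null hypothesis: no significant difference detected.")
```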

Type 1 and type 2 error


● There is always some chance that the decision about the null hypothesis will be incorrect
● Type 1 error is said to occur when a null hypothesis is incorrectly rejected,
● i.e., rejected despite it being true.
● Probability of making type 1 error is represented by alpha
● False positive
● When this happens it causes any additional experiments to be designed based on something
that is not true
● This can also lead to wasted time, as it is unlikely that any additional experiments will turn out the way the scientist expects
● The most likely cause is that the significance level is set incorrectly
● In most situations, alpha is set equal to .05 which means there is 5% chance that differences
observed are due to randomness and not the reason studied
● The significance level is same as alpha
● So if type 1 error has to be reduced, try to reduce the value of alpha
● Some studies use alpha= .01 which means 1% chance of any variation observed is due to
randomness.
● This decreases the likelihood of type 1 error but does make it more likely that significant
differences will be missed.
● Type 2 error is said to occur when a null hypothesis is incorrectly not rejected,
● also referred to as false negative.
● And represented by beta
● When null hypothesis is not rejected in a type 2 error, it can result in scientists ending a
project early because of apparent lack of difference
● This error is an error in power which is related to beta.
● The main enemy of statistical power is the variance in the population
● The best way to avoid this type of error is by increasing the sample size.
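
The trade-off between alpha, beta and sample size can be seen in a small simulation. Below is a minimal sketch (Python with NumPy and SciPy; the effect size, sample sizes and iteration count are arbitrary assumptions) estimating the Type II error rate at two different sample sizes:

```python
# Hypothetical simulation: estimating the Type II error rate (beta) at alpha = .05
# when a real mean difference exists, for two different sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, true_diff, sd, n_sims = 0.05, 0.5, 1.0, 5000   # assumed values

def type2_rate(n):
    misses = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, sd, n)            # control group
        b = rng.normal(true_diff, sd, n)      # treatment group (real effect exists)
        _, p = stats.ttest_ind(a, b)
        if p >= alpha:                        # failed to reject despite a real effect
            misses += 1
    return misses / n_sims

for n in (20, 80):
    print(f"n = {n}: estimated Type II error rate = {type2_rate(n):.2f}")
```

With the larger sample, the estimated Type II error rate drops, matching the point above that increasing sample size is the best protection against this error.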

Collection of data
● Universe or population is the set of individuals from which a statistical sample is drawn for a
study. Any selection of individuals grouped by a common characteristic can be said to be a
population.
● Population or the universe can be finite or infinite
● The population is said to be finite if it is made up of a fixed number of elements so it is
possible to list it in its entirety.
● N used to indicate how many elements or items there are in the case of a finite population
● Sample is the subset of the population that is representative of the entire population
● Sources of data can be:
● Primary sources are those in which the investigator collects the data himself or herself. Gathered first hand. Quite expensive. Data collection is under the direct control and supervision of the investigator.
● More reliable and accurate
● secondary data is the one that makes available data that were already collected by some other
agency
● Collected from other published and unpublished sources like census, govt publications,
internal record, reports, books, websites. Easily available, saves time and cost. Usefulness of
the data may be limited in a number of ways like relevance and accuracy.
Techniques of data collection
● 2 types of techniques used for collecting data
● Census technique: one where data is collected from each and every member of the population.
Data is obtained from each member of the population. More representative, accurate and
reliable
● Sample technique: only a part of the population is studied and conclusions are drawn on that
basis for the entire population.

Sampling methods
● Process of choosing or selecting the subjects of the research study that consists of persons,
events, objects or behaviors.
● Group of subjects in the study
● Sampling is the process by which the researcher chooses his or her sample
● Involves 3 elements
● Selecting the sample
● Collecting the info
● Making an inference about the population
● Values obtained from the study of samples such as average and dispersion are known as
statistics.
● Characteristics of a good sample are:
● Representativeness: the sample selected for the research must be best representative of the
population under study.
● Accurate or unbiased: a sample is free from any influence that causes any differences or
doubts on their true representativeness
● Adequacy: size must be adequate in order for it to truly represent the entire population
● Independence: all members of the sample must be chosen independently of one another and
each member should have an equal chance of being selected in the sample
● Homogeneity: there is complete similarity in the nature of the universe and that of the sample.
If 2 samples from the same population are taken they should yield similar results.
● Size of sample means the number of sampling units selected from the population for
investigation
● If sample size is large, it can be burdensome financially and might require more time.
● If the sample is too small, it might not be adequate enough to rightly represent the population.
● If the population consists of homogenous members, then small sample may serve the purpose
● Larger samples reduce error or the difference between the sample and the population.
● Randomly chosen samples usually require fewer participants
● Real world issues that influence the actual size of a sample, including time, cost and practical
considerations.
● There are 2 types of sampling: probability and non probability

Probability sampling
● Involves choosing subjects randomly to participate in the study.
● Every unit in the population has a chance (non-zero probability) of being selected in the sample, and this chance can be accurately determined.
● 4 basic types: simple random, stratified random, systematic random and cluster random.
● Simple random sampling: every unit has an equal probability of being selected, which avoids bias in the overall choice. Respondents are randomly selected from a sampling frame; with large sampling frames, a table of random numbers or a computerized random number generator is used.
● Stratified sampling: stratification is the process of classifying sampling units of the population into homogeneous units. The sampling frame is divided into homogeneous and non-overlapping subgroups called strata, and a random sample is drawn from each stratum.
● Systematic sampling: the sampling frame is ordered according to some criteria and elements are selected at regular intervals through that ordered list. It involves a random start and then proceeds with the selection of every kth element from that point onwards. k = N/n, where k is the ratio of the sampling frame size N to the desired sample size n, formally called the sampling ratio. The starting point is not automatically the first in the list but is instead randomly chosen from within the first k elements on the list.
● Cluster sampling: used if the population is dispersed over a wide geographic region. Large populations are divided into smaller groups known as clusters, and clusters are then selected randomly to form a sample. Used when the target population is too large or spread out and studying each member would be costly, time consuming and improbable.
● 3 types of cluster sampling: single stage (sampling is applied only once); two stage (first choose a cluster and then draw a sample from the cluster using simple random sampling or another procedure); multistage (a few steps added to the two-stage design).

Non-probability sampling
● Selection of study subjects through non-random methods.
● Often used when time and money are limited or more information is needed about a particular population.
● Frequently utilised in qualitative or exploratory research to gather data and form a hypothesis.
● Types of non-probability sampling: convenience, quota, snowball.
● Convenience sampling: also called accidental or opportunity sampling, in which the sample is drawn from the part of the population that is close to hand, readily available or convenient.
● Quota sampling: the population is segmented into mutually exclusive subgroups, just as in stratified sampling, and then a non-random set of observations is chosen from each subgroup to meet a predefined quota.
● Purposive/judgement sampling: a sample is chosen because there are good reasons to believe that it is representative of the total population.
● Consecutive sampling: the researcher selects and studies a sample for a period of time. After the data from the 1st sample has been collected, the researcher moves to the 2nd sample, and the process continues until the researcher either accepts the null or accepts an alternative.
● Snowball sampling: you start by identifying a few respondents that match the criteria for inclusion in your study and then ask them to recommend others they know who also meet your selection criteria. This hardly leads to representative samples, but it is sometimes the only way to reach a hard-to-reach population or when no sampling frame is available.
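
To make the mechanics concrete, here is a minimal sketch (Python standard library; the population of member IDs and the two strata are made up for illustration) of simple random, systematic and stratified selection:

```python
# Hypothetical sketch of three probability sampling schemes.
import random

random.seed(42)
population = list(range(1, 101))          # imaginary sampling frame of 100 member IDs
n = 10                                    # desired sample size

# Simple random sampling: every member has an equal chance of selection.
simple_random = random.sample(population, n)

# Systematic sampling: pick every k-th element after a random start, k = N / n.
k = len(population) // n
start = random.randrange(k)               # random start within the first k elements
systematic = population[start::k]

# Stratified sampling: split the frame into strata and sample from each stratum.
strata = {"first_half": population[:50], "second_half": population[50:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, n // 2)]

print(simple_random, systematic, stratified, sep="\n")
```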
Sampling errors
● The difference between the results of a sample and the results of the population is called a sampling error
● 3 types
● Random error
● Fluctuation or difference between the population and the sample that are due to chance.
Cannot be eliminated and it will always be there.
● Systematic error: there are many types of systematic error, 2 common ones are:
● Underrepresentation: when a demographic group is underrepresented in the sample because the researcher ignored them or because the researcher couldn't find enough subjects
● Researcher bias and inadequate sample size
● Nonresponse error is a special type of systematic error that occurs when a large group of subjects does not respond. One way to reduce it is to use convenience sampling

Methods of research
Method 1: observation
● Act of meticulously viewing another’s interaction with his or her surroundings
● Supreme technique for nonverbal behavior
● Collecting facts that are direct knowledge of the investigator
● Perception with a purpose, also known as regulated perception
● Procedure of observation: by mechanical or electronic device (audio or visual recording),
checklist and schedules (objectifies the observation, a score will be provided which will
facilitate comparative analysis), time sampling (certain behavior occurs during a sample of
short time intervals), event sampling (records all instances of a particular event or behavior
during specific time period ignoring all other behavior), specimen sampling (Researcher
record the description of the subject’s entire scheme of behavior for a specific period)
● Types of observation
● With respect to role of investigator, participant and non participant
● With respect to the method of observation, direct, indirect
● With reference to control on the system to be observed, controlled or uncontrolled
● You can conduct observational research ranging from being a complete observer, where the researcher is not seen or noticed by the participants and acts as a detached observer. This reduces the Hawthorne effect because participants are more likely to act naturally when they don't realise they are being watched.
● Observer as participant, the researcher is known and recognized by the participants and the
participants are aware of the observer’s research goals.
● There is some interaction with the participants but it is brief. The researcher’s goal is to
maintain as much neutrality as possible.
● Participant as observer: the researcher is fully engaged with the participants in this situation.
Participants aware that he is a researcher
● Complete participant: a fully embedded researcher, almost as if he or she were a spy. Participants fully interact with the researcher but are unaware that they are being observed and studied.
● When performing indigenous fieldwork it is sometimes referred to as going native

Method 2 : Interviews
● Involve social interaction
● Need training in how to interview
● Interview schedule: a set of prepared questions designed to be asked exactly as worded.
● Interviews have a standardized format, which means the same questions are asked to each interviewee in the same order
● Interviews will be recorded by the researchers and the data is written up as a transcript which
can be analyzed at a later date
● Language and special care must be undertaken
● Types of interviews
● Structured interview: also known as a formal interview; questions are asked in a set/standardized order and the interviewer will not deviate from the interview schedule or probe beyond the answers received. Based on structured, close-ended questions. Easy to replicate as a fixed set of closed questions is used. Quick. Not flexible. Lacks detail
● Unstructured: also called discovery interviews; more like a guided conversation. Informal. Interview schedules might not be used, and even if used they will contain open-ended questions that can be asked in any order. Questions may be added or missed as the interview progresses. More flexible. Generates qualitative data with open-ended questions. Increased validity. Probes for deeper understanding. Time consuming. Expensive and extensive
● Group interviews: dozens can be interviewed together, also known as focus groups. The group interacts with each other and should not drift off topic. Requires a highly skilled interviewer. Generates qualitative data. Has increased validity because some participants may feel more comfortable being with others, as they are used to talking in groups in real life. Details are kept confidential and participants' privacy is respected. Less reliable and may sometimes lack validity, as participants may conform to peer pressure and give false answers.

Method 3: Questionnaires
● Only one opportunity to collect data from each informant
● Question needs to be clear, comprehensive and effective
● Closed questions, also called MCQs, are straightforward, quick to answer and lie within the intellectual range of the majority of the population
● Checklists, questions, graded response questions, open ended questions
● A drawback of fixed-alternative questions is that they put answers in people's mouths
● Advisable to conduct a pilot study for testing the questionnaires due to their limitations in true
representation.

Method 4: experimental research


● Sir Ronald Aylmer Fisher considered as Father of modern Statistics and Experimental design.
● Has its origin in agricultural research
● Often considered to be gold standard in research design
● One or more independent variables are manipulated by the researcher, subjects are randomly assigned to different treatments, and the results of the treatments on outcomes are observed
● Internal validity (causality) is high due to its ability to link cause and effect through treatment manipulation while controlling for the spurious effect of extraneous variables
● Best suited for explanatory research, where the goal is to examine cause and effect relationships, rather than for descriptive or exploratory research
● Can be conducted in laboratory or field settings
● Can be grouped into 2 broad categories: true experimental and quasi-experimental
● Both designs require treatment manipulation, but true experiments require random assignment while quasi-experiments do not
● Non-experimental designs include all types of research that do not employ treatment manipulation or random assignment
● Should be objective: the researcher's views and opinions should not affect the results of the study
● More valid and less biased
● Lab experiments are conducted under highly controlled conditions where accurate measurements are possible, with random assignment and standardized procedures. They tend to be high in internal validity but low in external validity, because the artificial setting in which the study is conducted may not reflect the real world. For eg, Milgram's experiment on obedience or Loftus and Palmer's car crash study. Easier to replicate. May produce unnatural behavior. Demand characteristics or experimenter effects may bias the results and become confounding
● Field experiments are done in the everyday env of the participants. The researcher manipulates the independent variable but in a real life setting. Conducted in field settings such as a real organization; can be high in both internal and external validity. Rare because of the difficulties associated with manipulating and controlling for extraneous effects in a field setting. Higher ecological validity than a lab experiment. Less likelihood of demand characteristics affecting the results, as participants may not know they are being studied
● Natural experiments are conducted in the everyday env of the participants but the
experimenter has no control over IV as it occurs naturally in real life. High ecological validity.
Can be used in situations in which it would be ethically unacceptable to manipulate the
independent variable. More expensive and time consuming than lab experiments. No control
over extraneous variable.
● Fisher gave 3 principles: replication, which refers to the idea that the experiment must be repeated multiple times and each treatment applied to multiple experimental units; randomization, which combines the variations caused by extraneous variables under the heading of chance; and local/variance control, which maximizes systematic variance, controls extraneous variables and minimizes error variance.
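
As an illustration of Fisher's randomization principle, here is a minimal sketch (Python standard library; the participant IDs and group split are hypothetical) of randomly assigning subjects to treatment and control conditions:

```python
# Hypothetical sketch: random assignment of participants to experimental conditions.
import random

random.seed(7)
participants = [f"P{i:02d}" for i in range(1, 21)]    # 20 imaginary participant IDs

shuffled = participants[:]
random.shuffle(shuffled)                  # randomization spreads extraneous variation
half = len(shuffled) // 2                 # across groups by chance
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```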

Method 5: quasi experimental


● Prefix quasi means resembling
● Not true experimental research
● Participants are not randomly assigned to conditions or orders of conditions
● Independent variable is manipulated before the dependent variable is measured, thus
quasi-experimental research eliminates the directionality problem.
● Does not eliminate the problem of confounding variables
● quasi experimental are somewhere between correlational studies and true experiments in
terms of internal validity
● Conducted in field settings in which random assignment is difficult or impossible
● Conducted to evaluate the effectiveness of a treatment
● More feasible and often does not have the time and logistical constraints
● Impractical or impossible because research can only effectively be carried out in natural
settings
● True experiments do not always represent real life situations, since all other variables are tightly controlled, which may not create a fully realistic situation. For this reason, external validity is higher in quasi-experimental research
● Reactions of test subjects are more likely to be genuine because it is not an artificial research env
● Matching procedures may be used to help create a reasonable control group, making
generalizations more feasible.
● Results generated can be used to reinforce the findings of case studies by conducting research
that may lend itself to statistical analysis
● Conclusions about causality are less definitive
● May not be meaningful due to lack of randomization and the threats to internal validity
● Pre existing factors and other influences are not considered because variables are less
controlled in quasi-experimental research
● Human error
● Must adhere to ethical standards in order to be valid
● The inability to address cause and effect relationships between independent and dependent variables due to threats to internal validity is what limits the quasi-experimental design
● True experimental design: includes random assignment to experimental groups. Study groups
will be equivalent, produces better results for statistical analysis
● Quasi experimental does not include random assignment to experimental groups. Study
groups are nonequivalent. Statistical analysis can be complicated.

Method 6: field studies


● Defined as the qualitative method of data collection that aims to observe, interact and
understand people while they are in a natural env.
● Involves data collection outside of an experimental or lab setting
● Expensive and time consuming
● Amount and diversity collected can be invaluable
● Collects original or unconventional data via face to face interviews, surveys or direct observation.
● Not applicable to the general public but specific only to the purpose for which it was gathered.
● Though qualitative, it may also involve multiple aspects of quantitative research
● Begins in a specific setting. Cause and effect of a certain behavior is tough to analyze due to
presence of multiple variables in a natural env
● Mostly based on correlation and not entirely on cause and effect
● When looking for correlations, a small sample size makes it difficult to establish a causal relationship between 2 or more variables.
● Role of observer is imp aspect
● Data can be influenced by how the observer interacts with the natural elements of the study
depending on his or her involvement.
● Most field observers are known as complete observers because they observe what is being studied without interfering with it.
● Complete observers are field observers who choose not to participate in their study env
● For ethological and cultural studies, complete observers are used.
● But they can also be participant observers or complete participants
● Field studies can also have an experimental component and be manipulative which means they
use numerical data to describe or test relationships between variables and then investigate
cause and effect relationships.
● Conducted in a real world and natural env where there is no tampering of variables and env is
not doctored
● Data can be collected even about ancillary topics
● Deep understanding, hence the research is extensive, thorough and accurate
● Expensive and time consuming
● Non-resistant to bias
● Nomenclature is very tough to follow

Method 7: focus groups


● Group of people meeting together in an effort by an organization to collect qualitative data.
Purpose of data collection varies but it may be for marketing and product development
● Facilitators guide the conversations of focus groups while scribes record what the participants
are saying
● Coding is how the data is analyzed from the focus group after the scribe transcribes the
discussion from the participants
● Internal validity cannot be established due to lack of controls and findings may not be
generalized to other settings because of small sample size.
● Not used for explanatory or descriptive research but well suited for exploratory research
● Flexible, broad and in-depth
● Lack anonymity and may cause groupthink and run a possibility of experimenter bias.

Method 8: case studies


● In-depth investigation of a single person, group, event or community.
● Data are gathered from a variety of sources
● Originated in clinical medicine
● Case studies are often confined to the study of particular individual
● Info is mainly biographical and relates to events in the individual's past as well as to significant events which are currently occurring in his or her everyday life.
● Case study is not itself a research method but researchers select methods of data collection
and analysis that will generate material suitable for case studies
● Idiographic approach
● Provides info about impractical or unethical situations
● Exploratory research
● Important for psychologists who adopt a holistic viewpoint, i.e., humanistic psychologists
● Researcher bias, cannot be generalized, difficult to replicate and time consuming
● 3 major types
● Intrinsic case study: done to better understand the uniqueness of a case under study
● Instrumental: done to gain insights into an issue or to refine a theory
● Collective or multiple case study: covers multiple cases within and across settings to get deeper insights into the general condition.

Method 9: Narrative analysis


● In Latin, narratio means a narrative or story, and the verb narrare means to tell or narrate
● Chronological
● Meaningful
● Inherently social in that they are produced for a specific audience
● Refers to the family of methods that share a focus on stories
● A story is a specific tale that people tell and narrative is a resource that culture and social
relations make available to us and in turn we use them to help construct our stories.
● The aim is to keep the story intact by ensuring that a story is not over-coded or over-analyzed
● Thematic NA: focus on content within the whole story, what is said
● Rhetorical NA: focus on identifying what oppositions and enthymemes are in stories
● Structural NA: focus is on telling of the story
● Interaction NA: focus is on interactional activity through which stories are constructed.
● Personal NA: focus is on internalized and evolving life stories of individuals
● Visual narrative: focus is on how and when the image was made and who created it and
people’s response to image
● Performance NA: focus is on what is spoken and how, to whom the story is directed, and when and why
● Dialogical NA: focus is on what is told in the story and how and what stories do, what
happened as a result of telling the story (its effects)

Method 10: Ethnography


● Qualitative approach
● Anchored in the functioning of cultures, seeking to understand the social interactions and expressions between people and groups.
● Typically lasts 8 months to 2 years
● Ethno means folk and graphy means describing something
● Attempt to uncover the meaning from an insider’s perspective rather than outsider
● Emic approach: aiming to identify behaviors specific to the group
● Etic approach: aiming to identify trends for generalizations

Method 11: Phenomenological


● Founded by Edmund Husserl; refers to the study of experiences
● The ultimate source of all meaning and value is human lived experience
● The study of structures of consciousness as experienced from the first-person point of view
● Involve intentionality
● There is no absolute reality but only an interpreted reality
● we and the objects of our physical world are inseparable and interactive. This interactive
process is known as intentionality, directedness of experiences toward things in the world, the
property of consciousness that is a consciousness of or about something.

Method 12: Grounded theory


● developed by Glaser and Strauss (1967)
● Constructed in such a way that theory will develop inductively from the data. It is used to
develop theory using qualitative data analysis about a social phenomenon.
● Involves systematically gathering and analyzing data to generate concepts and theories grounded in the data itself rather than in preconceived hypotheses.
● Invoke constant comparative methods
● Develop emergent concepts
● Adopt an inductive- abductive logic

Method 13: symbolic interactionism


● Involves studying the symbolic meanings and interactions that occur within a specific social
context.
● How social realities are constructed and how these constructions shape behavior.

Method 14: cross cultural


● Comparing and contrasting different cultures to understand similarities and differences in
various aspects of human behavior, beliefs, values and practices.
● Involves collecting data from multiple cultural groups and analyzing them to identify patterns,
trends and variations

Ethics in conducting and reporting research


● The APA is one of the oldest organizations of mental health professionals in the world
● Founded in 1892
● Provided guidelines governing various sciences of psych
● Also would determine the ethical standards that should be practiced by anyone working in the
field
● 5 basic general principles are:
● Beneficence and nonmaleficence: means help others while minimizing involvements or
relationships that may negatively impact the ability to help others
● Fidelity and responsibility: take responsibility for your actions and maintain an open professional relationship with others
● Integrity: be honest to the best of your ability.
● Justice: give everyone equal access to an excellent standard of care.
● Respect for people's rights and dignity: understanding and eliminating one's prejudices as a professional, which includes prejudices about culture, ethnic background or ability status
Statistics in psychology
● Refers to the general calculation of items
● Gathering and summarization of quantitative data
● From the Latin status or the Italian statista
● Key types of statistics
On the basis of function: A. Descriptive, B. Inferential
On the basis of distribution of data: A. Parametric, B. Non-parametric

Descriptive
● Descriptive statistics refers to various statistical calculation that are used to describe a data set
as it appears
● It performs 2 operations: organising data and summarising data
● 4 major techniques involve classification, tabulation, graphical presentation and diagrammatic
presentation
● 2 techniques for summarising: measures of central tendency and dispersion
● We make use of 4 types of descriptive stats:
● Measures of frequency (concerned with how many items there are in data sets: frequency, counts and relative frequency)
● Measures of central tendency (mean, median, mode)
● Measures of dispersion (e.g., range)
● Measures of position (percentile rank and quartile rank)
Measures of central tendency
● Refer to middle or average of a data set

Mean
● Average of a data set
● Calculated by taking the sum of the data divided by the size of the data set.
● Rigidly defined, easy to calculate and simple, representative, can be computed even if the detailed distribution is not known, and is least affected by fluctuation of sampling
● Cannot be determined by visual observation, cannot be computed for open-ended class intervals, cannot be computed for qualitative data, and is too much affected by extreme observations.

Median
● Middle value
● Not the middle point but the point that divides distribution in half.
● Calculated differently when the number of observations is even
● Free from effect of extreme values and preferred for extreme values or outliers
● Real value and is better representative value of the series
● Can be estimated through graphic presentation of data
● Calculated even in the case of open ended classes (incomplete series)
● Not based on all the items in the series

Mode
● Value of the variable which occurs most frequently
● Simple and popular
● Less affected by marginal values and ignores them.
● Located graphically using histogram
● Best representative value
● Does not require knowledge of all items and frequencies
● Difficult to identify when all items are identical
● Involves cumbersome procedure of grouping the data
● Is not representative of all the items in a series.

Mean and median equal: they are the same in any set where the terms are consecutive or equally spaced. The mean skews in the direction that the set spreads out. If the terms greater than the median are more spread out than those less than the median, the mean is greater than the median. The inverse is also true.
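
A minimal sketch (Python standard library; the scores are made up) computing the three measures of central tendency on the same small data set:

```python
# Hypothetical sketch: mean, median and mode for a small made-up data set.
import statistics

scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

print("Mean:  ", statistics.mean(scores))     # sum of values / number of values
print("Median:", statistics.median(scores))   # middle value of the sorted data
print("Mode:  ", statistics.mode(scores))     # most frequently occurring value
```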

Outlier
● Value deviates from all other values in a data set
Measures of dispersion
● Also called measures of variability
● Tell us about how scores are arranged in relation to the centre
● Dispersion refers to how the data is spread out, how widely or narrowly is it scattered on a
plot or how much variability is present in the data points when compared to the mean or
average value of all data points
● The dispersion tells us about the variability or distance between the data points and the average value of the data set.
● If there is too much difference between the distances between each data point and the average value, then the data can be considered volatile or unstable

Range (R )
● difference between the largest and the smallest value in the data set
● Also known as distance between highest and lowest value in the data set
● Good measure if the data is at the ordinal level
● Not a good measure if the distribution is highly skewed

Interquartile range
● Best way to determine the consistency of scores within the data set is to identify the quartiles
in the data
● A quartile is one quarter of the data
● It is possible to see the distribution of scores within a data set more clearly
● Difference between the upper and lower quartile is known as interquartile range

Mean and absolute deviation


● Mean or average deviation summarizes how far each number in a data set spreads out from the average or mean
● Deviation is the distance from the center point. This centre point can be the median, mode or mean
● We use the mean or median to find the mean deviation and only the mean to find the SD
● To calculate the dispersion of data, the mean deviation uses the modulus (absolute value) of the deviations
● To calculate dispersion, the SD uses the square of the deviations
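
A minimal sketch (Python with NumPy; the sample values are arbitrary and include a deliberate outlier) contrasting the dispersion measures discussed in this section:

```python
# Hypothetical sketch: range, interquartile range, mean absolute deviation and SD.
import numpy as np

data = np.array([12, 15, 14, 10, 13, 16, 12, 14, 30])   # 30 is a deliberate outlier

data_range = data.max() - data.min()                    # highest minus lowest value
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                           # spread of the middle 50%
mad = np.mean(np.abs(data - data.mean()))               # mean absolute deviation
sd = data.std(ddof=1)                                   # sample standard deviation

print(f"Range: {data_range}, IQR: {iqr}, MAD: {mad:.2f}, SD: {sd:.2f}")
```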

Standard deviation
● Standard means typical or average and deviation means the difference between the score and
their mean
● Stable index of variability; the signs of the deviations are taken care of because the square of the deviations is used
● Also known as root mean square deviation
● If data points are further from the mean, there is higher deviation within the data set; the more spread out the data, the higher the SD
● The wider the curve's width, the larger the data set's standard deviation from the mean
● If all scores in the sample are identical, the SD is 0
● Adding or subtracting a constant from each individual score does not change the SD
● Multiplying or dividing each score by a constant produces an identical change in the SD
● Least affected by fluctuations of sampling
● When we pick small sample from large population, there is an exclusion of many extreme
values which will naturally lower SD

Bessel’s correction
● When the sample variance is used to estimate the population variance, it has a denominator of n − 1, known as Bessel's correction. The uncorrected sample variance tends to underestimate the population variance, and the n − 1 corrects this bias
● Can be used with a very small sample
● With an excessively large sample, the correction will have no practical impact
● Use it when you have a sample and want to estimate the population variance
● When you only need to describe the sample itself and not generalize to the population, you can omit the correction.
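
A minimal sketch (Python with NumPy; the sample values are made up) showing the difference between dividing by n and by n − 1:

```python
# Hypothetical sketch: population-style vs Bessel-corrected (sample) variance.
import numpy as np

sample = np.array([4.0, 7.0, 6.0, 5.0, 9.0])

var_n = sample.var(ddof=0)          # divides by n (describes this sample only)
var_n_minus_1 = sample.var(ddof=1)  # divides by n - 1 (Bessel's correction,
                                    # estimate of the population variance)

print(f"Divide by n:     {var_n:.3f}")
print(f"Divide by n - 1: {var_n_minus_1:.3f}")
```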

● Outliers have a heavier impact on SD.


● SD is preferred over variance since squaring the differences from the mean avoids the issue of negative differences for values below the mean, but it means the variance is no longer in the same unit of measure as the original data
● Taking square root means SD returns to original unit of measure and is easier to interpret and
use in further calculations
● Variance is difficult to grasp and SD is easier to picture and apply.
● Average deviation is similar to SD except it uses absolute values while SD uses squares
● When a data set is normally distributed and there are not many outliers, the SD is generally preferred
● When there are large outliers, the SD registers higher levels of dispersion than the AD, so the AD is preferred.
Normal probability curve
● Continuous distribution of data that has the shape of a symmetrical bell curve
● Also known as bell curve
● Also called a Gaussian distribution after Carl Gauss, who created a mathematical formula for the curve
● Most of the continuous data values in a normal distribution tend to cluster around the mean,
and the further a value is from the mean, the less likely it is to occur.
● Tails are asymptotic, which means that they approach but never quite meet the horizontal (x) axis
● For a perfectly normal distribution the mean, median and mode will be the same value,
visually represented by the peak of the curve
● Unimodal which means that it has only one high point or maximum
● Normal distribution is also asymptotic, the curve lines slowly get closer and closer to zero as
they move away from mean or central line.
● Empirical rule in stats allows researchers to determine the proportion of values that fall within
certain distances from the mean
● The empirical rule is often referred to as the three-sigma or 68-95-99.7 rule
● If the data values in a normal distribution are converted to z-scores, then the empirical rule describes the percentage of the data that falls within specific numbers of standard deviations from the mean for bell shaped curves
● On the normal distribution, each SD is a set distance above or below the mean which is the
distribution centre
● SDs are used to build the empirical rule.
● It specifically states that 68% of all data points fall between +1 and -1 SDs from the mean,
95% of all data are found between +2 and -2 SDs from the average and 99.7 is within +3 and
-3 SDs.
● These principles are used when studying populations to create informed predictions and inferences about human behaviors and tendencies
● 2 main parameters of a normal distribution are mean and standard deviation
● Calculate the mean and SD which determine the shape and probabilities of the distribution
● As parameter value changes, the shape of the distribution changes
● If the mean were to change for a specific statistical model, the overall distribution curve would
move to the right or the left
● Move left if the mean were to decrease and right if the mean were to increase
● The larger the SD, the more spread out the distribution since it measures the amount of
variation or dispersion in a data set relative to its mean.
● Large SD forms normal curve wide and gently sloped
● Small SD is more tightly packed around the mean creating a bell shaped curve that is steep
and thin
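A numerical check of the 68-95-99.7 rule described above, on simulated data (assuming NumPy; the mean of 100 and SD of 15 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.normal(loc=100, scale=15, size=10_000)   # simulated normally distributed scores

    z = (scores - scores.mean()) / scores.std()            # convert to z-scores

    for k in (1, 2, 3):
        pct = np.mean(np.abs(z) <= k) * 100
        print(f"within ±{k} SD: {pct:.1f}%")                # roughly 68%, 95% and 99.7%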
Kurtosis
1. Defines how heavily the tails of distribution differ from the tails of a normal distribution
2. 3 types
3. Mesokurtic: moderate in breadth and curves with a medium peaked height
4. Leptokurtic: more values in the distribution tails and more values close to the mean
5. Sharply peaked with heavy tails
6. Platykurtic- fewer values in the tails and fewer values close to the mean. The curve has a flat
peak and has more dispersed scores with lighter tails
7. Excess kurtosis: metric that compares the kurtosis of a distribution against the kurtosis of a
normal distribution. The kurtosis of a normal distribution equals 3. It is the coefficient of
kurtosis
8. Excess kurtosis= kurtosis -3
9. When excess kurtosis = 0 the distribution is mesokurtic. This means the kurtosis is the same as that of the normal distribution (medium peak). The kurtosis of a mesokurtic distribution is neither high nor low; rather it is considered to be a baseline for the 2 other classifications.
10. Skewness : if the curve becomes asymmetric or extends toward right or left, it is called a
skewed bell curve. Skewness represents an imbalance of a normal distribution.
11. mean , median and mode share the same value in a normal distribution. The bell curve is
perfect with the middle point at the top portion of the curve and equal to the average because
the graph is symmetrical.
12. When a distribution is skewed, there are more outliers on one side of the graph: a negatively skewed graph will have outliers on the left side and a positively skewed graph on the right side. The measures of central tendency will no longer coincide in the middle of the bell graph
13. Skewness formula= 3 (mean-median)/ SD
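A brief sketch computing these quantities (assuming SciPy and NumPy; the data values are invented and deliberately right-skewed):

    import numpy as np
    from scipy import stats

    data = np.array([2, 3, 3, 4, 4, 4, 5, 5, 9, 15])   # right-skewed toy data

    print(stats.skew(data))       # > 0 indicates positive (right) skew
    print(stats.kurtosis(data))   # excess kurtosis (0 = mesokurtic by this convention)

    # Pearson's second skewness coefficient: 3 * (mean - median) / SD
    pearson_skew = 3 * (data.mean() - np.median(data)) / data.std(ddof=1)
    print(pearson_skew)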

Ways to correct the skewness


1. Log transformation: transforms each data point. It compresses large values more than small ones, which can help in reducing the right skewness (positive skewness) in the data (see the sketch after this list).
2. Power transformation: Raises each data point to a power. Common transformations include
square root, cube root, squaring, or cubing. These transformations can effectively reduce
skewness by stretching or compressing the data distribution.
3. Exponential Transformation: Exponentially transforms each data point. It can help in reducing
left skewness (negative skewness) by compressing smaller values more than larger ones.
4. Measures of skewness: tells the direction and extent of asymmetry in a series and permits us
to compare 2 or more series.
5. Can be absolute or relative
6. Absolute: skewness can be measured in absolute terms by taking the difference between mean
and mode.
7. If the mean is greater than the mode, skewness will be positive; if the mode is greater than the mean, skewness will be negative
8. Relative measures: if absolute differences were expressed in relation to some measure of the
spread of values in their respective distribution, the measure would be relative and can be used
directly for comparison.
9. 3 imp measures of relative skewness:
10. Karl Pearson’s coefficient of skewness
11. Bowley’s coefficient of skewness (based on quartiles)
12. Kelly’s coefficient of skewness (based on percentiles)
13. These are used to make comparisons between 2 or more distributions
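A small illustration of the log and power transformations from items 1–2 above (assuming NumPy and SciPy; the income-like data are simulated purely for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    income = rng.lognormal(mean=10, sigma=0.8, size=1_000)   # strongly right-skewed toy data

    print(stats.skew(income))        # large positive skew before transformation

    log_income = np.log(income)      # log transformation compresses large values most
    print(stats.skew(log_income))    # much closer to 0

    sqrt_income = np.sqrt(income)    # a milder power (square root) transformation
    print(stats.skew(sqrt_income))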

T and z scores
1. When we do not know the SD of the distribution, the Student's t distribution (or t-distribution) is a hypothetical distribution used to calculate population parameters when the sample size is small and the population variance is unknown
2. Much like the normal distribution but has fatter tails
3. Used when sample size is 30 or less than 30
4. Population SD is unknown
5. Population distribution should be unimodal and not markedly skewed
6. The variable in t-distribution ranges from -infinity to +infinity
7. Less peaked than the normal distribution at the centre and heavier in the tails.
8. Degree of freedom is n-1
9. William Sealy Gosset in 1908 developed t-test and t-distribution
10. He found major techniques for large samples were not useful when used in case of small
samples
11. Published his findings under the pen name “student”, student’s t-test
12. Degrees of freedom depend on the sample: the bigger the sample, the closer we get to the actual population values. As the sample size, and with it the degrees of freedom, increases, the t distribution approaches the bell shape of the standard normal distribution
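A quick check of that last point (assuming SciPy): the two-tailed critical value of t shrinks toward the standard normal's 1.96 as the degrees of freedom grow.

    from scipy import stats

    # Two-tailed critical values at alpha = 0.05 for increasing degrees of freedom
    for df in (5, 10, 30, 100, 1000):
        print(df, round(stats.t.ppf(0.975, df), 3))

    print("z:", round(stats.norm.ppf(0.975), 3))   # 1.96, the limiting value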

Normality of the data


1. Means checking whether the data makes a bell shaped normal probability curve or not when
plotted graphically
2. In case of t distribution we usually take upto 50 observations so we need a statistical tool that
is powerful enough to detect normality with such small sample sizes.
3. Since sample size less than 50 is taken, Shapiro Wilk is used
4. Useful for detecting minor deviations, but it is biased by sample size: the larger the sample, the more likely you are to get a statistically significant result
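A minimal sketch of the Shapiro-Wilk check (assuming SciPy; the sample is simulated):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    sample = rng.normal(loc=50, scale=10, size=30)   # hypothetical small sample (n = 30)

    stat, p = stats.shapiro(sample)
    # Null hypothesis: the data come from a normal distribution.
    # A p-value above 0.05 means we fail to reject normality.
    print(stat, p)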

Not normality of data


1. If not normally distributed, a non parametric test is used such as the one sample Wilcoxon
signed rank test.
2. Similar to the one sample t-test but focuses on the median rather than the mean.
3. These tests do not make assumptions about the normality of the data, and it is highly advisable to use such non-parametric alternatives in these cases.

Central Limit theorem


1. If the sample size increases, checking the normality becomes less critical, as we know that when the sample size increases, the distribution of the sample means becomes more and more like a normal distribution. This idea is referred to as the CLT.
2. Imp to researchers because it permits us to use sample statistics to make inferences about
population parameters without knowing anything about the shape of the frequency distribution
of that population
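A small simulation of the CLT (assuming NumPy and SciPy; the exponential population is an arbitrary, clearly non-normal choice):

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(3)
    population = rng.exponential(scale=2.0, size=100_000)   # strongly skewed, non-normal population

    # As the sample size n grows, the distribution of sample means becomes more normal:
    # its skewness shrinks toward 0 even though the population itself is skewed.
    for n in (2, 5, 30, 100):
        sample_means = rng.choice(population, size=(5_000, n)).mean(axis=1)
        print(n, round(skew(sample_means), 3))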

Inferential statistics
1. Used to draw conclusions and make predictions about an entire population based on the data
from a representative sample
2. Allows to make inferences beyond the data set collected from the sample
3. Deduces whether a given data sample is similar to the population or not
4. Makes use of random samples for testing and allows us to have confidence that the sample
represents the population
5. Goal is to make generalization about a population
6. Hypothesis testing, confidence intervals, correlation and regression analysis. Statistical tests
like t-tests, ANOVA and ANCOVA provide additional info about data collected for inferential
analysis
7. The 1st basic type is called t-test. Used to compare the average scores between 2 different
groups in a study to see if the groups are different from each other.
8. the 2nd basic type is called an analysis of variance. Use the nickname ANOVA for this test.
Compares the average scores between 3 or more different groups in a study to see if the
groups are different from each other.
9. The Bonferroni correction states that if one is testing 'n' independent hypotheses, one should use a significance level of 0.05/n.
10. Thus if there are 2 independent hypotheses result would be declared significant only if P<
0.025
Parametrics and non parametrics
● Inferential statistics fall into 2 possible categorizations: parametric and non-parametric.
● Parametric tests rely on the assumption that the data you are testing resembles a particular
distribution (often a normal or bell shaped distribution).
● Non parametric tests are referred to as distribution free tests because there are no strict assumptions to check in regards to the distribution of the data
● As a general rule of thumb, when the dependent variable’s level of measurement is nominal or
ordinal, then a non-parametric test should be selected
● When the dependent variable is measured on a continuous scale, then the parametric test
should be selected
● Common tests used to check the normality assumption: Kolmogorov-Smirnov test (KS test), Shapiro-Wilk test, Anderson-Darling test

Parametric vs non-parametric

● Parametric hypothesis tests: the mean and median are both accurate because outliers have been removed so that they do not skew the results of the test
● Parametric: assumes that the population distribution is known. Non-parametric: no assumptions are made about the distribution
● Parametric: the mean is known, or assumed to be known; the population variance is determined in order to draw the sample from the population; the population is estimated with the help of an interval scale and the variables of concern are hypothesized. Non-parametric: the median value is the measure of central tendency
● Parametric: produces high quality, actionable data. Non-parametric: less precise but much easier to facilitate
● Parametric: Pearson correlation. Non-parametric: Spearman correlation
● Parametric: normal probabilistic distribution. Non-parametric: arbitrary probabilistic distribution
● Parametric: population knowledge is required. Non-parametric: population knowledge is not required
● Parametric: used for interval data. Non-parametric: used for nominal/ordinal data
● Parametric: applicable to variables. Non-parametric: applicable to variables and attributes
● Parametric: t-test, z-test. Non-parametric: Mann-Whitney, Kruskal-Wallis

Assumptions of parametric tests:


● Normality: data in each group should be normally distributed
● Independence: data in each group should be sampled randomly and independently
● No outliers: there should be no extreme outliers in the data
● Equal variance: data in each group should have approx equal variance

Types of parametric test

Student’s t-test
● Was developed by Prof W.S. Gosset in 1908
● Published statistical papers under the pen name of student
● This test is used when the samples are small and population variances are unknown
● T-test compares the difference between the 2 means of different (independent) groups to
determine whether the difference is statistically significant
● Also used to compare population mean and sample mean
● Used when samples are small and Population variance are not known
● The test makes various assumptions: samples are randomly selected, the data utilised are quantitative, the variables follow a normal distribution, the sample variances are roughly the same in both groups under study, and the samples are small, mostly lower than 30
● Used for different purposes giving out tests called
● one sample: if there is a group being compared against any standard value. df= n-1
● 2 sample: if the groups are coming from 2 different populations, also known as the independent t-test. df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2
● paired t-test: used to determine whether the mean difference between two dependent (or
paired) groups is statistically significant. df= n-1
● t = (X̄ − assumed mean) / standard error of the mean, where the standard error of the mean = s / √n
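A brief sketch of the three variants (assuming SciPy; all scores are made up):

    import numpy as np
    from scipy import stats

    # Hypothetical anxiety scores for two independent groups
    group_a = np.array([12, 15, 14, 10, 13, 16, 11])
    group_b = np.array([18, 20, 17, 19, 22, 16, 21])

    # One sample t-test against an assumed population mean of 15
    print(stats.ttest_1samp(group_a, popmean=15))

    # Independent (2 sample) t-test
    print(stats.ttest_ind(group_a, group_b))

    # Paired t-test for two dependent samples would be stats.ttest_rel(before, after)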

ANOVA
● When we need to compare more than 2 groups
● Given by Sir Ronald Fisher
● Explain the variation in measurements
● Involves a test of significance of the difference in mean values of the variable between 2
groups
● If there are more than 2 groups, ANOVA is used
● Assumptions: the sampled populations can be reasonably approximated by a normal distribution, all populations have the same SD, individuals in the populations are selected randomly, and the samples are independent
● ANOVA compares variance by means of a simple ratio called F-ratio. Measured as variance
between groups/variance within groups
● The resulting F statistic is then compared with the critical value of F obtained from F tables, as is done with t
● If the calculated value exceeds the critical value for the apt level of alpha, the null hypothesis
will be rejected.
● F= ratio of variances
● F tests can also be used independently of the ANOVA technique to test hypothesis of
variances
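A minimal sketch of the F-ratio comparison (assuming SciPy; the three groups below are invented):

    from scipy import stats

    # Made-up scores for three independent groups
    g1 = [23, 25, 21, 22, 24]
    g2 = [30, 28, 27, 31, 29]
    g3 = [22, 24, 23, 25, 21]

    f_stat, p = stats.f_oneway(g1, g2, g3)
    # F = variance between groups / variance within groups;
    # reject the null hypothesis of equal means if p falls below alpha
    print(f_stat, p)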

2 types of ANOVA

One way: the various experimental groups differ in terms of only one factor at a time, e.g. a study used to assess the effectiveness of 4 diff antibiotics on younger adults
Two way: the various groups differ in terms of 2 or more factors at a time, e.g. a study used to assess 4 diff antibiotics on adults in three diff age groups

Pearson’s correlation coefficient (r )


● Investigates the relationship between 2 quantitative, continuous variables.
● Measure of the strength of association
● Subjects selected for study with pair of values of X and Y are chosen with random sampling
procedure
● Both X and Y variables are continuous
● Both are assumed to follow normal distribution

Z-test
● Used for testing the significance of the difference between 2 means
● Compares sample mean with population mean
● Compares 2 sample means
● Compares sample proportion with population proportion
● Compare 2 sample proportions
● Sample must be random and quantitative
● Should be larger than 30
● Follow normal distribution
● Sample variances should almost be same in both groups of the study
● If SD is known, a z-test can be applied even if sample is smaller than 30
● One tailed and two tailed z -tests
● If the sample mean is larger than the population mean the z value will be positive, and if it is smaller the z value will be negative

Z proportionality test
● Used for testing significant differences between 2 proportions

Post Hoc tests


● Conducted after an ANOVA has found a statistically significant difference between 3 or more groups
● Main purpose is to test the difference between 2 groups at a time
● Allow us to know whether one group is significantly different from another
● Either chosen by the researcher before data analysis takes place or planned contrast testing
will be chosen instead.
● The decision rests upon the researcher
● Many post hoc tests: Least Significant difference (LSD) or protected t test, Tukey’s Honestly
significant difference (HSD)
● Newman-Keuls test
● Duncan’s multiple range test
● Bonferroni often used for clinical trials
● Dunnett’s test for experimental groups within a control group
● Scheffe’s test
● Liberal and conservative tests (Bonferroni is conservative because it does not inflate the probability of committing a type 1 error). A liberal test is Fisher's LSD
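A sketch of pairwise comparisons after a significant ANOVA, using the Tukey HSD listed above (assuming a recent SciPy, 1.8 or later, which provides tukey_hsd; the groups are the same invented data as before):

    from scipy import stats

    g1 = [23, 25, 21, 22, 24]
    g2 = [30, 28, 27, 31, 29]
    g3 = [22, 24, 23, 25, 21]

    # Tests each pair of groups while controlling the family-wise error rate
    result = stats.tukey_hsd(g1, g2, g3)
    print(result)   # table of pairwise mean differences, confidence intervals and p-values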

MANOVA
● Simply an ANOVA with several dependent variables
● Dependent variables should represent continuous measures
● Dependent variables should be moderately correlated

Type of Non parametric test


● Do not assume that the data is normally distributed.
● The non-parametric test you are most likely to come across is the chi-square

Parametric test (means) → Non-parametric test (medians)
● 1 sample t-test → 1 sample sign test, 1 sample Wilcoxon
● 2 sample t-test → Mann-Whitney test
● One way ANOVA → Kruskal-Wallis, Mood's median test
● Factorial DOE with one factor and one blocking variable → Friedman test

1 sample sign test


● Estimate the median of the population and compare it to a reference value or target value.
● Compare the sizes of 2 groups
● Alternative to one sample t test or a paired t test
● Also be used for ordered (ranked) categorical data
● The null hypothesis is the difference between medians is 0
● For a one sample sign test, median for a single sample is analyzed

1 sample wilcoxon signed rank test


● Average of 2 dependent samples
● Sibling of t test
● Used in similar situations as Mann-Whitney U-test but it is used for 2 dependent samples
● You estimate the population median and compare it to target/reference value
● Assumes that your data comes from a symmetric distribution like the cauchy distribution or
uniform distribution
● Used when the differences between pairs of data are non-normally distributed
● 2 diff versions of the test exist
● The Wilcoxon signed test compares your sample median against a hypothetical median
● The Wilcoxon matched pairs signed test computes the difference between each set of matched pairs, then follows the same procedure as the signed rank test to compare the sample against some median
● The null hypothesis for this test is that the medians of 2 samples are equal. Generally used as
a non-parametric alternative to the one sample t test or paired t test, for ordered (ranked)
categorical variables without a numerical scale

Kruskal Wallis
● Use this test instead of a one way ANOVA to find out if 2 or more medians are different
● Ranks of data points used instead of data points themselves
● Your variables should have one independent variable with 2 or more levels, the test is more
commonly used when you have 3 or more levels
● Ordinal scale, ratio scale or interval scale dependent variables
● Independent observations.
● All groups should have same shape distribution

Mann Whitney test ‘U’ test


● Used to compare the differences between 2 independent samples when the sample
distributions are not normally distributed and the sample sizes are small.
● Most common for independent
● Alternative for independent t test
● Compare 2 groups on your variable of interest
● Variable of interest is continuous and have 2 and only 2 groups
● Have independent samples and have a skewed variable of interest
● Also called Mann Whitney Wilcoxon test, Wilcoxon Rank sum test or the wilcoxon Mann
whitney test
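A minimal sketch (assuming SciPy; the two small samples are invented and skewed):

    from scipy import stats

    # Two small independent samples of a skewed variable of interest
    group_a = [3, 5, 4, 6, 12, 4]
    group_b = [8, 9, 11, 7, 10, 14]

    u_stat, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(u_stat, p)

    # The dependent-samples analogue would be stats.wilcoxon(before, after)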

Friedman test
● Used for differences between groups with ordinal dependent variables
● Can also be used for continuous data when the one way ANOVA with repeated measures is not appropriate
● Finding differences in treatment across multiple attempts
● Extension of sign test
● Used when there are multiple treatments
● Null hypothesis: all the treatments have identical effects
● Alternative hypothesis: the treatments have different effects (the samples differ in some way)

Mood’s median test


● 2 independent samples
● Compare medians for 2 samples if they are different
● Alternative to one way ANOVA
● Medians are same for both groups: null
● Medians are diff for both groups: alternative

Spearman rank correlation (rs)


● Use when you want to find correlation between 2 sets of data
● Alternative to pearson
● Ordinal interval or ratio
● Returns a value from -1 (negative correlation between ranks) to +1 (positive correlation between ranks)

Power of a test
● Ability to detect an effect if there is one present
● Reduces the chance of type 2 error
● The more power, the more likely we are to detect an effect if there is one
● Probability of correctly rejecting null hypothesis if it is false
● Power = 1- beta = 1-P (type 2 error)
● Test with highest power considered best
● Contrast with the level of significance, which is the probability of rejecting the null when it is true (type 1 error)
● Power analysis is commonly done using the G*Power software

Factors affecting power


● Alpha value: the cut-off for deeming significance. The standard is 0.05; setting it lower will decrease the amount of power, which in turn can increase the chance of type 2 errors, while setting it higher will increase power but also increase the chance of a type 1 error
● Sample size: larger samples reduce sampling error, so the more data collected, the more power you have. Power is also higher when the SD is small
● Effect size: a large effect size produces a greater difference between groups, so power will be high; a small effect size takes more power to detect
● One tailed tests have higher power than two tailed tests, since a one tailed test concentrates the significance level in a single tail
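G*Power is mentioned above; a comparable calculation can be sketched with the statsmodels library (assuming it is installed; the effect size, alpha and power values are just illustrative choices):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Sample size per group needed to detect a medium effect (d = 0.5)
    # with alpha = 0.05 and 80% power in a two-sided independent t-test
    n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                       alternative="two-sided")
    print(round(n_per_group))   # roughly 64 per group

    # Power actually achieved if only 30 participants per group are available
    print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30))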

Effect size
● Objective and standardized measure of the size
● How much impact our effect has on our test population
● Larger the size, more is the effect
● Effect sizes of different hypotheses can be compared to determine which gives the greatest or the least effect size
● The larger the size the stronger the relationship between 2 variables since it is quantitative
measure of the magnitude of the experimenter effect
● Helps us determine if difference is real or due to chance factor.
● To understand the strength of the difference between 2 groups, a researcher needs to calculate
the effect size
● Effect size calculates the power of a relationship amongst the variables given on the numeric
scale.
● 3 ways to measure the effect size:
● Odd ratio
● The standardized mean difference
● Correlation coefficient
● The standardized mean difference (Cohen's d) is calculated by dividing the difference between the means of 2 groups by the standard deviation
● The larger the effect size, the more imp the effect
● The more imp the effect, the more easily it can be seen by just looking
● Because the effect size is an objective and standardized way of measuring effect, we can use it
to compare different hypothesis tests to each other.
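A short sketch of the standardized mean difference (Cohen's d) using a pooled standard deviation (assuming NumPy; the scores are fabricated):

    import numpy as np

    # Hypothetical scores for a treatment and a control group
    treatment = np.array([78, 82, 85, 88, 80, 84])
    control = np.array([70, 75, 72, 74, 71, 73])

    # Cohen's d: difference between the means divided by the pooled standard deviation
    n1, n2 = len(treatment), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    d = (treatment.mean() - control.mean()) / pooled_sd
    print(d)   # rough benchmarks: ~0.2 small, ~0.5 medium, ~0.8 large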

Degrees of freedom
● 1st appeared in the works of Carl Friedrich Gauss around 1821
● Defined and popularized by William Sealy Gosset in 1908 and Ronald Fisher in 1922
● Defines the number of values in a dataset having the freedom to vary.
● Estimates parameters in statistical analysis or finds the missing or unknown value when
making the final calculation
● Equals sample size minus the number of parameters or relationships
● df=n-P
● If mean of a set of scores is fixed then df is one less than the number of scores df=n-1
● Df for 2 sample t test= (n1+ n2 )-2
● For paired t test is n-1
● For the F test it is more complicated, since variance is calculated both within groups and between groups
● Between groups df = P (total number of groups) − 1
● Within groups df = total number of people in all groups minus the number of groups, N − P
● Chi square: rows and columns are used to calculate df = (number of columns − 1) × (number of rows − 1)
● ANOVA: N-k, N is the data sample size and k is the number of cell means, groups or
conditions.

Regression analysis
● Predicts one variable from another

Simple linear regression


● The regression line is a straight line that attempts to predict the relationship between 2 variables. Also known as the trend line or line of best fit.
● Prediction when a variable (y) is dependent on a 2nd variable (x) based on a regression
equation
● When we predict scores on 1 variable from the scores on a 2nd variable then the variable we
are predicting is called the criterion variable and referred to as Y.
● The variable we are basing our predictions on is called the predictor variable and referred to as X
● When there is one predictor variable, the prediction method is called simple regression
● Error of prediction : value of the point minus the predicted value (value on the line)
● Formula: Y′ = bX + A, where b is the slope (regression coefficient) and A is the intercept
● Assumptions: linearity, homoscedasticity (variance around the regression line is same for all
of the values of X), errors of prediction are distributed normally
● Deciding whether to keep outliers (points that don't follow the rest of the data) is imp in regression analysis: outliers that carry useful information should be included, and those that do not should be excluded
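A minimal sketch of fitting and using the line of best fit (assuming SciPy and NumPy; the hours-studied and exam-score pairs are invented):

    import numpy as np
    from scipy import stats

    # Hypothetical predictor (hours studied, X) and criterion (exam score, Y)
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    y = np.array([52, 55, 61, 60, 68, 70, 75, 78])

    result = stats.linregress(x, y)
    # Y' = bX + A: slope b, intercept A, and r^2 (coefficient of determination)
    print(result.slope, result.intercept, result.rvalue ** 2)

    predicted = result.slope * x + result.intercept
    errors = y - predicted          # errors of prediction (residuals)
    print(errors)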

Multiple linear regression


● Examines how several independent variables affect a dependent variable
● The criterion is predicted by 2 or more predictor variables
● Values of b( b1 and b2) are called regression coefficients or weights

Regression to the mean


● Data that is extremely higher or lower than the mean will likely be closer to the mean if it is
measured a 2nd time.
● Discovered by Francis Galton
● Due to random variance or chance which affects the sample
● The statistic r has a range of −1 ≤ r ≤ +1, and r square is used to calculate the coefficient of determination
● R square indicates how accurately the model's predicted values fit the data
● R square close to 0 means the model is not a good fit and the predictor variables have very little or no explanatory power
● R square close to 1 means a good fit and predictor variables with high explanatory power
Factor analysis
● General linear model (GLM) used in applied and social research
● Foundation for t test, ANOVA and ANCOVA, regression analysis
● Factor analysis is part of GLM
● Reduces a large number of variables into a fewer number of factors
● Extracts maximum common variance from all variables and puts them into a common score
● Also called dimension reduction or data reduction
● What degree individual items are measuring something in common such as a factor
● 4 purposes
● Useful for constructs that cannot be readily observed in nature
● Development of scales
● Dimension reduction
● Evidence of construct validity
● All purposes useful for developing psychological theories
● Assumes relationships are linear
● No perfect multicollinearity: each variable is unique
● Only relevant variables are entered into the analysis
● No outliers
● Sufficient sample size

Types of factor analysis


● Confirmatory and exploratory
● Confirmatory factor analysis is used to confirm predefined components that have already been
explored in literature before and it is applied to sanction the effects and the possible
correlations between a collection of certain factors and variables.
● Usually requires a large sample
● The model is specified in advance, and it produces statistics based upon deduction.
● CFA is used in situations where the researcher has a particular hypothesis on how many
factors there are, and how the observable variables are associated with each component. In
most cases, this hypothesis is founded on past studies or theories and has the purpose of
corroborating that there's a link between the factors and the observed variables.
● EFA is applied in situations where there isn't a fixed idea of the number of factors involved or
the relationship they have with the observed variables.
● The EFA's goal is to investigate the way the factors are structured and to identify the
underlying correlations within the variables.
● This type of factor analysis is not based on previous theories
● aims to uncover structures in large sets of variables through the measuring of latent factors
that affect the variables within a determined data structure.
● It does not require a previous hypothesis on the relationship between factors and variables and
the results are of an inductive nature, based upon observation.
● most used in empirical research and in the development, validation, and adaptation of
measurement instruments in Psychology because it's useful to detect a set of common factors
that explain the responses to test items.

Key Terminologies
1. Factor loading:
● Factor loading is basically the correlation coefficient for the variable and factor. Factor
loadings are merely correlation coefficients.
● Hence they range from −1.0 through 0 to +1.0. The factor loading shows the variance explained by the variable on that particular factor.
● In the SEM approach, as a rule of thumb, 0.7 or higher factor loading represents that the factor
extracts sufficient variance from that variable.

2. Eigenvalues:
● When factor analysis generates the factors, each and every factor has an associated eigenvalue, which gives the total variance explained by that factor.
● Usually, the factors having eigenvalues greater than 1 are useful. Eigenvalues show variance
explained by that particular factor out of the total variance.
● From the communality column, we can know how much variance is explained by the first factor out of the total variance.
● If eigenvalues are greater than zero, then it's a good sign.
● Since variance cannot be negative, negative eigenvalues imply the model is ill-conditioned.
● Eigenvalues close to zero imply there is item multicollinearity, since all the variance can be
taken up by the first component.
● The sum of eigenvalues for all the components is the total variance.

Types of Factoring
1. Principal Component Analysis (PCA):
● It is a statistical procedure that uses an orthogonal transformation that converts a set of
correlated variables to a set of uncorrelated variables.
● PCA is the most widely used tool in exploratory data analysis and in machine learning for
predictive models.
● It is a dimensionality-reduction method that is often used to reduce the dimensionality of large
data sets, by transforming a large set of variables into a smaller one that still contains most of
the information in the large set.
● Principal components are new variables that are constructed as linear combinations or mixture
of the initial variables
● These combinations are uncorrelated and most of the info within the initial variables is
squeezed or compressed into the 1st components
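A compact sketch of PCA as dimension reduction (assuming NumPy and scikit-learn; the questionnaire-item data are simulated from two made-up latent factors):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(4)
    # 100 hypothetical respondents answering 6 correlated questionnaire items,
    # generated from 2 underlying latent variables plus noise
    latent = rng.normal(size=(100, 2))
    loadings = rng.normal(size=(2, 6))
    items = latent @ loadings + rng.normal(scale=0.3, size=(100, 6))

    pca = PCA()
    pca.fit(items)

    # Proportion of total variance "squeezed" into each successive component
    print(pca.explained_variance_ratio_)
    # Variance (eigenvalue) associated with each component
    print(pca.explained_variance_)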

2. Common factor analysis


● Also known as principal factor analysis (PFA) or even called principal axis factoring (PAF)
● Most preferred method after PCA
● Pulls out the common variance among the variables and puts it into the factors
● Does not include unique variance of all variables

3. Maximum Likelihood method


● Estimation method
● Estimation of factor matrices and useful test statistic in the likelihood ratio for rejection of
overly simple factor models
● Consists in obtaining sets of factor loadings successively in such a way that each explains as
much as possible of the population correlation matrix as estimated from the sample
correlation matrix
● According to the Kaiser criterion, the eigenvalue is a good criterion for determining whether to retain a factor
● If the eigenvalue is greater than 1, consider the factor
● If it is less than 1, then we should not consider that factor
● According to the variance extraction rule, the variance should be more than 0.7; if it is less than 0.7, then we should not consider that factor
● Assumptions: there are no outliers in the data, the sample is adequate, and there is no perfect multicollinearity between the variables
● Homoscedasticity: since factor analysis is a linear function of measured variables, it does not require homoscedasticity between the variables
● Linearity: factor analysis is also based on the linearity assumption.

Partitioning the variance in factor analysis


● Factor analysis assumes that variance can be partitioned into two types of variance, common
and unique:
● Common Variance: amount of variance that is shared among a set of items. Items that are
highly correlated will share a lot of variance. Communality (also called h square) is a
definition of common variance that ranges between 0 and 1. Values closer to 1 suggest that
extracted factors explain more of the variance of an individual item
● Unique variance is any portion of variance that's not common
● 2 types of it
● Specific variance: variance that is specific to a particular item
● Error variance: comes from errors of measurement and is basically anything unexplained by common or specific variance (unique variance = 1 − h square)
Experimental Design
● process of deciding how to implement scientific research.
● process whereby a researcher decides how to run a study to answer their research questions.
● There are two types of variables: independent variables and dependent variables.
● A multivariate design in research includes more than one dependent variable, while a factorial
design includes more than one independent variable.
● A between-groups design allows researchers to see effects without worrying about treatments
influencing each other but can be problematic if the groups are not equivalent.
● A within-groups design guarantees equivalent groups but runs the risk that one treatment will
impact the results of future treatments.
● A single-factor design offers simple, clear results, but can only answer simple research
questions, whereas a factorial design can answer complicated research questions, but doesn't
offer simple and clear results.

Randomized block design


● A block design in statistics, also called blocking, is the arrangement of experimental units or
subjects into groups called blocks.
● A block design is typically used to account for or control potential sources of undesired
variation.
● Blocks are usually divided into relatively uniform subsets that are subjected to experimental
conditions.
● By dividing subjects into blocks, the researcher ensures that the variability within blocks is
less than the variability between blocks.
● subjects or experimental units are grouped into blocks with the different treatments to be
tested randomly assigned to the units in each block.
● groups subjects that share the same characteristics together into blocks, and randomly tests the
effects of each treatment on individual subjects within the block.
● reduces bias and errors.
● helps to ensure that results are not misinterpreted and it improves the robustness of statistical
analyses.
● There are benefits in utilizing block randomization: it reduces bias and errors, reduces variability within treatment conditions, helps to ensure that results are not misinterpreted, helps with correlating the effects of the independent variable on the dependent variable, produces a better estimate of treatment effects, and improves the robustness of statistical analyses.
● should be considered when there is an unwanted variable that could affect the outcome of the
experiment.
● Confounding variables can lead to misinterpretation of data. It is also ideal to use block design
experiments to account for spatial effects of an experimental layout.

Repeated measures design


● also known as within-subjects designs, these can seem like oddball experiments
● They don't fit our impression of a typical experiment in several key ways.
● ANOVA is an acronym for Analysis of Variance. It's called Repeated Measures because the
same group of study participants is being measured over and over again.
● For example, you could be studying the glucose levels of the patients at 1 month, 6 months,
and 1 year after receiving nutritional counseling.
● We can analyse data using a repeated measures ANOVA for two types of studies, those that investigate either:
● (1) changes in mean scores over three or more time points, or
● (2) differences in mean scores under three or more different conditions
● In repeated measures ANOVA, the independent variable has categories called levels or related
groups.
● Each level (or related group) is a specific time point
● Advantages include:
● Fewer patients are needed overall
● Higher statistical power
● Disadvantages include:
● Order effects; the possibility that the position of the treatment in the order of treatments
matters. Randomization or counterbalancing can correct this.
● Carryover Effects: administration of one part of the experiment can have trickle down effects
to subsequent parts.
● Subjects can drop out before the second or third part of the experiment, resulting in sample
sizes that are too small to yield meaningful results.
● Used for longitudinal studies

Small subject research design


● It is a research design that is applicable to a small sample or a single individual case.
● An individual subject participates, and an intervention is introduced, and its effect is observed
over time.
● developed from the Case Study Method.
● It is used to test whether a certain intervention, behavior modification or a drug test will have
any impact on the behavior or not.
● It is highly popular in clinical research especially in the areas of behavior modification.
● There are 3 principles of single subject design
● Baseline (A),
● Intervention (B),
● Baseline (A).
● ABA Design: In this type of intervention, a three-stage process is applied wherein a baseline
measurement (A) is taken before treatment (B) is offered, and the treatment is eventually
withdrawn (A).
● The more careful and systematic the repeated measurements, the more valid and reliable the data collection will be.
● determining the status of the subject prior to the intervention by observing various aspects of
one's behavior. A subject must be observed at least 3 to 5 times to establish a baseline.
● A variable is manipulated to study its effect on the behavior. If there are multiple variables,
then they should be manipulated one by one rather than simultaneously.
● The length of each phase must be predetermined. If it is too long, then there is a chance of a carryover effect into the next phase, and if it is too short then sufficient data might not have been gathered.

What is Multiple Baseline Design?


● In this design, multiple baselines are simultaneously established prior to the administration of
treatments.
● It can be applied across:
● Multiple-Baseline Design across behaviors: When the effect of IV is seen across multiple
compatible behaviors within a single individual.
● Multiple-Baseline Design across subjects: When a treatment is applied in sequence to the
same class of behavior in different participants who are in the same environment.
● Multiple-Baseline Design across conditions: The treatment is applied to the same behavior
when multiple participants are in different conditions or environments.

Cohort studies
● used to investigate the causes of disease and to establish links between risk factors and health
outcomes.
● The word cohort means a group of people.
● These types of studies look at groups of people.
● They can be forward-looking (prospective) or backward-looking (retrospective).
● Prospective studies are planned in advance and carried out over a future period of time.
Retrospective cohort studies look at data that already exist and try to identify risk factors for
conditions. Interpretations are limited because the researchers cannot go back and gather
missing data.
● These long-term studies are sometimes called longitudinal studies.
● In a prospective cohort study, researchers raise a question and form a hypothesis about what
might cause a disease.
● Then they observe a group of people, known as the cohort, over a period of time. This may
take several years. They collect data that may be relevant to the disease. In this way, they aim
to detect any changes in health linked to the possible risk factors they have identified.
● Cohort studies are also good at finding relationships between health and environmental factors
such as chemicals in the air, water, and food. These are issues that the World Health
Organization (WHO) helps researchers to investigate with large-scale cohort studies.
● Cohort studies are graded as the most robust form of medical research after experiments such
as randomized controlled trials, but they are not always the best form of observational work.
● They are less suited to finding clues about rare diseases.
● Typically unsuitable for identifying the causes of a sudden outbreak of disease
● They are expensive to run and often take many years to produce results
● They can only offer clues about the causes of disease, rather than definitive proof of links
between risk factors and health. This is true of any observational medical research.
● Participants may leave the cohort, perhaps move away, lose touch, or die from a cause that is
not being studied. This can bias the results

Time Series
● Technique for analyzing time series data, or variables that continually change with time. This
technique is popular in econometrics, mathematical finance, and signal processing.
● Special techniques are used to correct for autocorrelation, or correlation within values of the
same variable across time.
● appropriate for longitudinal research designs that involve single subjects or research units that
are measured repeatedly at regular intervals over time.
● TSA can provide an understanding of the underlying naturalistic process and the pattern of
change over time, or it can evaluate the effects of either a planned or unplanned intervention.
● According to Daniel T. Kaplan and Leon Glass (1995), there are two critical features of a time
series that differentiate it from cross-sectional data-collection procedures:
● Repeated measurements of a given behavior are taken across time at equally spaced intervals.
Taking multiple measurements is essential for understanding how any given behavior unfolds over time, and doing so at equal intervals affords a clear investigation of how the dynamics of that behavior manifest at distinct time scales.
● The temporal ordering of measurements is preserved. Doing so is the only way to fully
examine the dynamics governing a particular process. If we expect that a given stimulus will
influence the development of a behavior in a particular way, utilizing summary statistics will
completely ignore the temporal ordering of the data and likely occlude one's view of
important behavioral dynamics.
● If the future is expected to be similar to the past, time series analysis can be useful in
projecting future events.
● Historical data analysis is likely to yield three distinct curves: trend, cyclical, and seasonal.
● A trend is defined as a consistent shift in data in one direction or the other.
● A cycle will begin with an increase, followed by a decrease, and then resume with an increase.
A cycle has a tendency to repeat itself on a regular basis. Seasonal variations are also cycles,
but they are limited to a single season of the year.

ANOVA
● Locates differences between the group means of multiple levels of a single independent variable.
● Calculates the F-ratio, defined as the score used to determine the level of difference between the means.
● Has many forms
● One way has one independent variable with multiple levels and one dependent variable
● It has 2 flavors
● Between subjects, subjects are placed in mutually exclusive groups and will be compared to
each other
● Repeated measures defined as the study that uses the same group of participants for each level
of the variable.
● With repeated measure you may get a carryover effect
● The carryover effect is defined as the influence that exposure to previous levels or conditions may have on responses in subsequent conditions
● Two-way ANOVA has two independent variables, and three-way ANOVA has three independent variables. Although the number of independent variables is unlimited, factorial ANOVA is limited to one dependent variable.
● Covariance designs: Sometimes, measures of dependent variables may be influenced by
extraneous variables called covariates.
● Covariates are those variables that are not of central interest to an experimental study, but
should nevertheless be controlled in an experimental design in order to eliminate their
potential effect on the dependent variable and therefore allow for a more accurate detection of
the effects of the independent variables of interest.

MANOVA
● Multivariate analysis of variance (MANOVA) is simply an ANOVA with several dependent
variables.
● are appropriate when multiple dependent variables are included in the analysis.
● The dependent variables should represent continuous measures (i.e., interval or ratio data).
● Dependent variables should be moderately correlated.
● If there is no correlation at all, MANOVA offers no improvement over an analysis of variance
(ANOVA)
● if the variables are highly correlated, the same variable may be measured more than once.
● In many MANOVA situations, multiple independent variables, called factors, with multiple
levels are included.
● The independent variables should be categorical (qualitative).
● Unlike ANOVA procedures that analyze differences across two or more groups on one
dependent variable, MANOVA procedures analyze differences across two or more groups on
two or more dependent variables.
● Assumptions:
● Normal distribution, linearity, homogeneity of variances and covariances
● Limitations: Outliers - MANOVA is extremely sensitive to outliers. Multicollinearity and
Singularity - When there is high correlation between dependent variables, one dependent
variable becomes a near-linear combination of the other dependent variables.

Longitudinal
● Follows an individual or one group over a continued period of time to observe the effect of time
● Good causal relationship
● Growth increments and patterns
● Typically takes a much longer period of time to get results
● Expensive and difficult to track
● Fewer participants required
● Controls for group differences and cohort effects
● Confounded by variables that they might experience
● Highly prone to carryover effects

Cross sectional
● A between-subjects quasi-experimental design
● Researcher observes the subject at different ages or at diff points in temporal sequence
● Immediate snapshot comparison of subjects
● Conducted quicker and cost efficiently
● No demand to observe participants for a continued long term duration
● Easier and cost efficient way to collect data since all of the data is collected at once
● Cohort effect
● Mutiple variables can be studied
● Easier to control
● Often requires more participants
● More time consuming

Cross sequential
● Mix of cross sectional and longitudinal designs
● Involves multiple groups multiple times over a set time period
● Much more complicated, expensive and time consuming
● Rarely used

Ex post facto
● After the fact
● Conducting the study after the event had occurred
● Example: studying patients after an event such as a heart attack has already occurred
● No direct control on the manipulation of IV since they have already occurred and random
assignment of participants is not possible

Meta analysis
● A review is searching for databases to find existing literature on a topic
● Meta analysis goes one step further and uses data from existing studies to derive conclusions
about the body of research
● Each study is statistically weighted depending on the number of subjects and number of
variables
● Studies on the same treatment can produce contradictory results

Latin square
● A Latin square design is a type of experimental design used in research, particularly in
situations where controlling for multiple sources of variability is important.
● It's a method for arranging experimental units in a manner that ensures each treatment
condition appears exactly once in each row and each column of a square grid.
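A toy sketch of the arrangement (plain Python; a cyclic construction is one simple way to build such a square, and in practice the rows, columns and treatment labels would also be randomized):

    def latin_square(treatments):
        """Cyclic construction: each treatment appears exactly once
        in every row and every column of the square."""
        n = len(treatments)
        return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

    for row in latin_square(["A", "B", "C", "D"]):
        print(row)
    # ['A', 'B', 'C', 'D']
    # ['B', 'C', 'D', 'A']
    # ['C', 'D', 'A', 'B']
    # ['D', 'A', 'B', 'C']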

SPSS
● Statistical Package for the Social Sciences
● The name now stands for Statistical Product and Service Solutions
● Originally launched in 1968
● Developed by SPSS Inc., later acquired by IBM (International Business Machines Corporation) in October 2009
● 2 types of views are available in SPSS
● Variable view: the data are defined in the form of variables
● Data view: the data appear in rows and columns

Important points
● Normal distribution table is the z score table
● Chi square is calculated for goodness of fit that is to see whether data from a sample matches
the population from which the data was taken
● Expected frequency is the probability count that appears in contingency table calculations
including the chi square test
● Observed frequency are the counts made from the experimental data
● Goodness of fit tests include the chi square, Kolmogorov-Smirnov and Shapiro-Wilk tests
● The Kolmogorov-Smirnov test uses medium to large sample sizes and discrete distributions are not allowed; the Shapiro-Wilk test is used with small samples and its purpose is to test the normality of a random sample.
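A minimal goodness-of-fit sketch (assuming SciPy; the die-roll counts are fabricated):

    from scipy import stats

    # Observed counts from a hypothetical experiment of 60 die rolls
    observed = [12, 8, 9, 11, 6, 14]
    expected = [10, 10, 10, 10, 10, 10]   # expected frequencies under a fair die

    chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
    # Null hypothesis: the observed frequencies match the expected frequencies
    print(chi2, p)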
