ASES311 All Sem
INTRODUCTION TO THE CONCEPT OF PSYCHOLOGICAL TESTING AND ASSESSMENT

Test
A measurement device or technique used to quantify behavior or aid in the understanding and prediction of behavior.

Test Scores
Not perfect measures of a behavior or characteristic, but they do add significantly to the prediction process.

Item
A specific stimulus to which a person responds overtly; this response can be scored or evaluated.
The specific questions or problems that make up a test.

Psychological Test
A psychological or educational test is a set of items that are designed to measure characteristics of human beings that pertain to behavior.
Psychological tests vary by content, format, administration, scoring, interpretation, and technical quality.

TECHNICAL QUALITY OR PSYCHOMETRIC SOUNDNESS

Psychometrics
The science of psychological measurement. The psychometric soundness of a test depends on how consistently and accurately the test measures what it purports to measure.
*Test users are sometimes referred to as psychometrists or psychometricians.

Testing
The process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.

Assessment
The gathering and integration of psychology-related data for the purpose of making a psychological evaluation through tools such as tests, interviews, case studies, behavioral observation, and other methods.

TYPES OF BEHAVIOR (Kaplan and Saccuzzo)

Overt Behavior
An individual's observable activity. Some psychological tests attempt to measure the extent to which someone might engage in or "emit" a particular overt behavior.

Covert Behavior
Takes place within an individual and cannot be directly observed.

Objective of Testing
Typically to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.

Objective of Assessment
Typically to answer a referral question, solve a problem, or arrive at a decision through the tools of evaluation.

Interview
Method of gathering information through direct communication involving reciprocal exchange.
Quality of information obtained in an interview often depends on the skills of the interviewer (e.g., their pacing, rapport, and their ability to convey genuineness, empathy, and humor).

Portfolio
A file containing the products of one's work.
Serves as a sample of one's abilities and accomplishments.

Case History Data
Information preserved in records, transcripts, or other forms.

Behavioral Observation
Monitoring the actions of people through visual or electronic means.

TYPES OF TESTS

Individual Tests
The examiner or test administrator gives the test to only one person at a time, the same way that psychotherapists see only one person at a time.

Group Test
Can be administered to more than one person at a time by a single examiner, such as when an instructor gives everyone in the class a test at the same time.

Ability Test
One can also categorize tests according to the type of behavior they measure.
Contain items that can be scored in terms of speed, accuracy, or both.
On an ability test, the faster or the more accurate your responses, the better your scores on a particular characteristic.
*Measure skills in terms of speed, accuracy, or both.

DIFFERENT TYPES OF ABILITY

Achievement
Refers to previous learning.

Aptitude
Refers to the potential for learning or acquiring a specific skill.

Intelligence
A person's general potential to solve problems, adapt to changing circumstances, think abstractly, and profit from experience.

WHAT IS THE DIFFERENCE BETWEEN ABILITY TESTS AND PERSONALITY TESTS?

Ability Tests
Are related to capacity or potential.

Personality Tests
Are related to the overt and covert dispositions of the individual.

Types of Personality Tests

Structured Personality Tests
Provide a statement, usually of the "self-report" variety, and require the subject to choose between two or more alternative responses such as "True" or "False".

Projective Personality Tests
The stimulus (test materials), the required response, or both are ambiguous. Rather than being asked to choose among alternative responses, the individual is asked to provide a spontaneous response.
*Provide an ambiguous test stimulus; response requirements are unclear.
WHO ARE THE PARTIES?

The Test Developer
Tests are created for research studies, for publication (as commercially available instruments), or as modifications of existing tests.

The Test User
Tests are used by a wide range of professionals.
The Standards contain guidelines for who should administer psychological tests, but many countries have no ethical or legal guidelines for test use.

The Test-Taker
Anyone who is the subject of an assessment or evaluation is a test-taker.
Test-takers may differ on a number of variables at the time of testing (e.g., test anxiety, emotional state).

Society at Large
Test developers create tests to meet the needs of an evolving society.
Laws and court decisions may play a major role in test development, administration, and interpretation.

Other Parties
Organizations, companies, and governmental agencies sponsor the development of tests.
Companies may offer test scoring and interpretation.
Researchers may review tests and evaluate their psychometric soundness.

WHAT TYPES OF SETTINGS?

Educational Settings
Students typically undergo school ability tests and achievement tests.
Diagnostic tests may be used to identify areas for educational intervention.
Educators may also make informal evaluations of their students.

Clinical Settings
Include hospitals, inpatient and outpatient clinics, and private-practice consulting rooms.
Assessment tools are used to help screen for or diagnose behavior problems.

Business and Military Settings
Decisions regarding careers of personnel are made with a variety of achievement, aptitude, interest, motivational, and other tests.

Government and Organizational Credentials
Includes governmental licensing, certification, or general credentialing of professionals.

HOW ARE ASSESSMENTS CONDUCTED?
There are many different methods used. Ethical testers have responsibilities before, during, and after testing.
Obligations include:
Familiarity with test materials and procedures
Ensuring the room is suitable and conducive to testing
Establishing rapport during test administration
Accommodations may need to be made: the adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.

WHERE TO GO FOR INFORMATION ON TESTS?

Test Catalogues
Catalogues distributed by publishers of tests. Usually brief, uncritical descriptions of tests.

Test Manuals
Detailed information concerning the development of a particular test and technical information.

Reference Volumes
Reference volumes like the Mental Measurements Yearbook or Tests in Print provide detailed information on many tests.

Journal Articles
Contain reviews of a test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or an applied context.

LESSON 2: HISTORY

Historical Perspective
We now briefly provide the historical context of psychological testing.

China
It is believed that tests and testing programs first came into being in China as early as 2200 B.C.E.
Testing was instituted as a means of selecting who, of many applicants, would obtain government jobs.

Han Dynasty (206 B.C.E.-220 C.E.)
The use of test batteries was quite common, covering civil law, military affairs, geography, revenue, and agriculture.

Ming Dynasty (1368-1644 C.E.)
A national multistage testing program.
Local level, then provincial capitals for more extensive essay examinations.
Second testing: those with the highest test scores went on to the nation's capital.
Final round: only those who passed this third set of tests were eligible for public office.

Western World
Most likely learned about testing programs through the Chinese.
Reports by British missionaries and diplomats encouraged the English East India Company in 1832 to copy the Chinese system as a method of selecting employees for overseas duty.
Because testing programs worked well for the company, the British government adopted a similar system of testing for its civil service in 1855.
After the British endorsement of a civil service testing system, the French and German governments followed suit.

U.S. Government (1883)
Established the American Civil Service Commission, which developed and administered competitive examinations for certain government jobs.

Wiggins (1973)
The impetus of the testing movement in the Western world grew rapidly at that time.
CHARLES DARWIN AND INDIVIDUAL DIFFERENCES
Perhaps the most basic concept underlying psychological and educational testing pertains to individual differences.
No two snowflakes are identical, no two fingerprints the same. Similarly, no two people are exactly alike in ability and typical behavior.
Interest in individual differences came with the publication of Charles Darwin's book The Origin of Species in 1859.
Darwin argued that chance variation in species would be selected or rejected by nature according to adaptivity and survival value.
He further argued that humans had descended from the ape as a result of such chance genetic variations.
Through this process, he argued, life has evolved to its currently complex and intelligent levels.

Sir Francis Galton
Given the concepts of survival of the fittest and individual differences, Galton set out to show that some people possessed characteristics that made them more fit than others.
He concentrated on demonstrating that individual differences exist in human sensory and motor functioning, such as reaction time, visual acuity, and physical strength.
Galton would be credited with devising or contributing to the development of many contemporary tools of psychological assessment, including questionnaires, rating scales, and self-report inventories.

Psychological testing developed from at least two lines of inquiry:
One based on the work of Darwin, Galton, and Cattell on the measurement of individual differences, and the other based on the work of the German psychophysicists Herbart, Weber, Fechner, and Wundt (more theoretically relevant and probably stronger).
Experimental psychology developed from the latter.

Wilhelm Max Wundt
Founded the first experimental psychology laboratory at the University of Leipzig in Germany.
Wundt and his students tried to formulate a general description of human abilities with respect to variables such as reaction time, perception, and attention span.
The objective is to ensure that any observed differences in performance are indeed due to differences between the people being measured and not to any extraneous variables. Manuals for the administration of many tests provide explicit instructions designed to hold constant or "standardize" the conditions under which the test is administered. This is so that any differences in scores on the test are due to differences in the test-takers rather than to differences in the conditions under which the test is administered.

James McKeen Cattell
One of Wundt's students at Leipzig.
Completed a doctoral dissertation that dealt with individual differences, specifically individual differences in reaction time.
Coined the term "mental test".

OTHER STUDENTS OF WUNDT
Spearman is credited with originating the concept of test reliability as well as building the mathematical framework for the statistical technique of factor analysis.
Victor Henri is the Frenchman who would collaborate with Alfred Binet on papers suggesting how mental tests could be used to measure higher mental processes.

Psychiatrist Emil Kraepelin
An early experimenter with the word association technique as a formal test.
Kraepelin (1912) devised a series of examinations for evaluating emotionally impaired people. Similarly, one of the earliest tests resembling current procedures, the Seguin Form Board Test (Seguin, 1866/1907), was developed in an effort to educate and evaluate the mentally disabled.

Lightner Witmer
Has been cited as the "little-known founder of clinical psychology". Founded the first psychology clinic in the United States at the University of Pennsylvania. In 1907 Witmer founded the journal Psychological Clinic.

THE MEASUREMENT OF INTELLIGENCE
Binet and collaborator Theodore Simon (1905) published a 30-item "measuring scale of intelligence" designed to help identify Paris schoolchildren with intellectual disability.
A representative sample is one that comprises individuals similar to those for whom the test is to be used.
The Binet-Simon Scale (1908) determined a child's mental age.

L. M. Terman
In 1911, the Binet-Simon Scale received a minor revision.
By 1916, Stanford University had revised the Binet test for use in the United States.
Terman's revision is known as the Stanford-Binet Intelligence Scale (1916).

Intelligence
"The aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment" (Wechsler, 1939).

Wechsler-Bellevue Intelligence Scale
Renamed the Wechsler Adult Intelligence Scale (WAIS).
The WAIS has been revised several times since then, and versions of Wechsler's tests have been published that extend the age range of test-takers from early childhood through senior adulthood.

WORLD WAR I
The Army requested the assistance of Robert Yerkes, who was then the president of the American Psychological Association.
Army Alpha - required reading ability.
Army Beta - measured the intelligence of illiterate adults.
The Stanford-Binet Intelligence Scale had appeared at a time of strong demand and high optimism for the potential of measuring human behavior through tests.
World War I and the creation of group tests had then added momentum to the testing movement. Shortly after the appearance of the 1916 Stanford-Binet Intelligence Scale and the Army Alpha test, schools, colleges, and industry began using tests.

RISING TO THE CHALLENGE
The Stanford-Binet test had long been criticized because of its emphasis on language and verbal skills, making it inappropriate for many individuals, such as those who cannot speak or who cannot read.
In addition, few people believed that language or verbal skills play an exclusive role in human intelligence.
Wechsler's inclusion of a nonverbal scale thus helped overcome some of the practical and theoretical weaknesses of the Binet test.
In 1986, the Binet test was drastically revised to include performance subtests.

WORLD WAR II
Personality tests based on fewer or different assumptions were introduced, thereby rescuing the structured personality test.
Projective personality tests provide an ambiguous stimulus and unclear response requirements. Furthermore, the scoring of projective tests is often subjective.
The Rorschach test was first published by Hermann Rorschach of Switzerland in 1921.
The first Rorschach doctoral dissertation written in a U.S. university was not completed until 1932, when Sam Beck, Levy's student, decided to investigate the properties of the Rorschach test scientifically.

SUMMARY OF PERSONALITY TESTS

Woodworth Personal Data Sheet
An early structured personality test that assumed that a test response can be taken at face value.

The Rorschach Inkblot Test
A highly controversial projective test that provided an ambiguous stimulus (an inkblot) and asked the subject what it might be.

The Thematic Apperception Test (TAT)
A projective test that provided ambiguous pictures and asked subjects to make up a story.

The Minnesota Multiphasic Personality Inventory (MMPI)
A structured personality test that made no assumptions about the meaning of a test response. Such meaning was to be determined by empirical research.

The California Psychological Inventory (CPI)
A structured personality test developed according to the same principles as the MMPI.

The Sixteen Personality Factor Questionnaire (16PF)
A structured personality test based on the statistical procedure of factor analysis.

FACTOR ANALYSIS
A method of finding the minimum number of dimensions (characteristics, attributes), called factors, to account for a large number of variables.
We may say a person is outgoing, is gregarious, seeks company, is talkative, and enjoys relating to others. However, these descriptions contain a certain amount of redundancy.
A factor analysis can identify how much they overlap and whether they can all be accounted for or subsumed under a single dimension (or factor) such as extroversion.
EQUAL INTERVALS
A scale has the property of equal intervals if the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units.
ABSOLUTE 0
An absolute 0 is obtained when nothing of the property
being measured exists.
TYPES OF SCALES

Nominal Scales
Really not scales at all; their only purpose is to name objects.
The simplest form of measurement.

Simple Frequency Distribution
Indicates that individual scores have been used and the data have not been grouped.
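To make the idea concrete, here is a minimal sketch (the raw scores below are hypothetical) of how a simple, ungrouped frequency distribution can be tallied:

```python
# Minimal sketch: building a simple (ungrouped) frequency distribution
# from a list of hypothetical raw test scores.
from collections import Counter

scores = [42, 45, 45, 47, 50, 50, 50, 52, 55]  # hypothetical raw scores

frequency_distribution = Counter(scores)
for score in sorted(frequency_distribution, reverse=True):
    print(score, frequency_distribution[score])
```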
HISTOGRAM
A histogram is a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.

An arithmetic mean can also be computed from a frequency distribution. The formula is X̄ = Σ(fX) / n, where Σ(fX) means "multiply the frequency of each score by its corresponding score and then sum."
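As a quick illustration of that formula, here is a minimal sketch using a small, hypothetical frequency distribution:

```python
# Minimal sketch: computing the arithmetic mean from a frequency
# distribution using X-bar = sum(f * X) / n, with hypothetical data.
freq_dist = {50: 3, 47: 5, 45: 8, 42: 4}  # score -> frequency (hypothetical)

n = sum(freq_dist.values())                                 # total number of scores
sum_fx = sum(score * f for score, f in freq_dist.items())   # sum of (f * X)
mean = sum_fx / n
print(f"n = {n}, sum(fX) = {sum_fx}, mean = {mean:.2f}")
```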
BAR GRAPH
Numbers indicative of frequency also appear on the Y-
axis, and reference to some categorization (e.g.,
yes/no/maybe, male/female) appears on the X-axis.
Here the rectangular bars typically are not contiguous.
FREQUENCY POLYGON
Are expressed by a continuous line connecting the points
where test scores or class intervals (as indicated on the
X-axis) meet frequencies (as indicated on the Y-axis).
Whenever you draw a frequency distribution or a frequency polygon, you must decide on the width of the class interval.
The class interval refers to the numerical width of any class in a particular distribution.
RANGE
The range of a distribution is equal to the difference between the highest and the lowest scores.
Provides a quick but gross description of the spread of scores.

VARIANCE
The mean of the squared deviation scores in a distribution: s² = Σ(X - X̄)² / n, where (X - X̄) is each score's deviation from the mean.

VARIABILITY
An indication of how scores in a distribution are scattered or dispersed.

STANDARD DEVIATION
A statistic that measures the degree of spread or dispersion of a set of scores.
The value of this statistic is always greater than or equal to zero.
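A minimal sketch, with made-up scores, showing how the range, variance, and standard deviation relate (using the population formulas summarized above):

```python
# Minimal sketch: range, variance, and standard deviation for a set of
# hypothetical scores, following the formulas above.
scores = [10, 12, 12, 15, 16, 18, 20]  # hypothetical raw scores

n = len(scores)
mean = sum(scores) / n
score_range = max(scores) - min(scores)              # highest minus lowest
variance = sum((x - mean) ** 2 for x in scores) / n  # mean of squared deviations
std_dev = variance ** 0.5                            # square root of the variance

print(f"range = {score_range}, variance = {variance:.2f}, SD = {std_dev:.2f}")
```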
NEGATIVE SKEW
When relatively few of the scores fall at the low end of the distribution.
Negatively skewed examination results may indicate that the test was too easy.

KURTOSIS
Used to refer to the steepness of a distribution in its center.
To the root "kurtic" is added one of the prefixes Platy-, Lepto-, or Meso- to describe the peakedness/flatness of three general types of curves.
Distributions are generally described as:
Platykurtic - relatively flat
Leptokurtic - relatively peaked
Mesokurtic - somewhere in the middle

THE NORMAL CURVE
Asymptotically: it approaches, but never touches, the axis.
The curve is perfectly symmetrical, with no skewness. If you folded it in half at the mean, one side would lie exactly on top of the other.
Because it is symmetrical, the mean, the median, and the mode all have the same value.
50% of the scores occur above the mean and 50% of the scores occur below the mean.
Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
Approximately 68% of all scores occur between the mean and +/-1 standard deviation.
Approximately 95% of all scores occur between the mean and +/-2 standard deviations.

STANDARD SCORES
Why convert raw scores to standard scores?
Standard scores are more easily interpretable than raw scores.
With a standard score, the position of a test-taker's performance relative to other test-takers is readily apparent.
Different systems for standard scores exist:
Z scores
T scores
Stanines
Other standard scores
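As an illustration of the standard-score systems listed above, here is a minimal sketch (with hypothetical raw scores) converting raw scores to z scores and T scores:

```python
# Minimal sketch: converting hypothetical raw scores to z scores and
# T scores (T = 50 + 10z), two of the standard-score systems listed above.
raw_scores = [38, 45, 50, 55, 62]  # hypothetical

n = len(raw_scores)
mean = sum(raw_scores) / n
sd = (sum((x - mean) ** 2 for x in raw_scores) / n) ** 0.5

for x in raw_scores:
    z = (x - mean) / sd          # distance from the mean in SD units
    t = 50 + 10 * z              # T score: mean 50, SD 10
    print(f"raw={x:>3}  z={z:+.2f}  T={t:.1f}")
```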
DEVIATION IQ
Or deviation intelligence quotient.
In most IQ tests, the distribution of raw scores is converted to IQ scores, whose distribution typically has a mean set at 100 and a standard deviation set at 15.
The typical mean and standard deviation for IQ tests result in approximately 95% of deviation IQs ranging from 70 to 130, which is 2 standard deviations below and above the mean.

SCATTERPLOT
Involves simply plotting one variable on the X (horizontal) axis and the other on the Y (vertical) axis.
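A minimal sketch of the deviation IQ conversion described above (IQ = 100 + 15z), using a few illustrative z scores:

```python
# Minimal sketch: converting a z score to a deviation IQ (mean 100, SD 15),
# so that about 95% of IQs fall between 70 and 130.
def deviation_iq(z: float) -> float:
    """Deviation IQ = 100 + 15 * z."""
    return 100 + 15 * z

for z in (-2, -1, 0, 1, 2):
    print(f"z = {z:+d}  ->  deviation IQ = {deviation_iq(z):.0f}")
```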
MIDTERM
Minnesota Multiphasic Personality Inventory (MMPI)

History
Developed in 1939 by Starke R. Hathaway (clinical psychologist) and J.C. McKinley (neuropsychiatrist).
Published in 1943.
Designed to diagnose mental health disorders and assess their severity.
Bender Visual-Motor Gestalt Test

Proponent
Lauretta Bender, 1938

Administration Materials
16 stimulus cards
Two supplemental tests: Motor Test and Perception Test
Observation form (for recording time and different types of test-taking behavior)
Two number 2 pencils with erasers
3-5 sheets of paper

Scoring scale
0 - No resemblance, random drawing, scribbling, lack of design
1 - Slight or vague resemblance
2 - Some or moderate resemblance
3 - Strong or close resemblance, accurate reproduction
4 - Nearly perfect
Raven's Progressive Matrices

Introduction
In the 1930s, John C. Raven introduced a non-verbal cognitive test, minimizing cultural bias.
It relies on visual patterns to assess abstract reasoning and fluid intelligence.
The Standard and Advanced Progressive Matrices versions have sustained its widespread use, highlighting the lasting impact of Raven's pioneering work.
It is a non-verbal intelligence test designed to measure abstract reasoning ability.
The test was developed by John C. Raven in 1938.
Raven was a British psychologist who aimed to create a test that could assess a person's cognitive abilities without relying on language or specific cultural knowledge.

Age Consideration
4 to 90 years of age (the administration time depends on the age of the individual).
Qualification Level: B

Age     Starting Item Set   Item Sets    # of Items   Time Limit
4-8     A                   A, B, C      36           30
9-79    B                   B, C, D, E   48           45
80-90   A                   A, B, C      36           45

Paper Administration
For each examinee:
1 test booklet
1 answer sheet
Pencil with eraser

Culture Fair Intelligence Test (CFIT)

Introduction
Raymond B. Cattell.
Developed to be a measure of intelligence without cultural biases. Aiming at deriving a culture-free intelligence test based on a review of the literature, the author decided on seven sub-steps.

Seven Sub-Steps:
1-2: Mazes, series
3: Classification
4: Progressive Matrices I - relation matrix, first order
5: Progressive Matrices II - relation matrix, second order
6: Progressive Matrices III - sequence matrix
7: Mirror images

Crystallized Intelligence
Represents knowledge acquired through experience.
Thought to reflect the influence of culture and schooling, such as verbal memory and general knowledge.

Fluid Intelligence
Represents the biological ability to acquire knowledge and solve problems.
Thought to reflect intelligence independent of learning, such as reasoning speed, spatial reasoning, and inductive reasoning.

The need for the culture-fair test arises because:
Certain ethnic groups may be naturally favored by the nature of an exam, particularly if the examination contains things or language unique to that group.
Requirements
Level B: Available only if the test administrator has
completed an advanced level course in testing in a
university, or its equivalent in training under the
direction of a qualified superior or consultant.
Age Range
Scale 1: Ages 4 to 8 years, and older mentally handicapped individuals
Scale 2: Ages 8 to 14 years, and average adults
Scale 3: Ages 14 through college students and adults of superior intelligence
Materials Required
CFIT Form A and B test booklet
Stopwatch
Screen
Pencil
Eraser
CFIT Manual
CFIT Technical Manual
Response Sheets
Construct
An informed, scientific concept developed or constructed to explain a behavior, inferred from overt behavior.

Overt Behavior
An observable action or the product of an observable action.

A trait is not expected to be manifested in behavior 100% of the time.
Whether a trait manifests itself in observable behavior, and to what degree it manifests, is presumed to depend not only on the strength of the trait in the individual but also on the nature of the situation (situation-dependent).
The context within which behavior occurs also plays a role in helping us select appropriate trait terms for observed behaviors.
Definitions of trait and state also refer to a way in which one individual varies from another.
Assessors may make comparisons among people who, because of their membership in some group or for any number of other reasons, are decidedly not average.

Assumption 2: Psychological Traits and States Can Be Quantified and Measured
Different test developers consider the types of item content that would provide insight into a trait in order to gauge the strength of that trait.
Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results.
Cumulative Scoring - the assumption that the more the test-taker responds in a particular direction keyed by the test manual as correct or consistent with a particular trait, the higher that test-taker is presumed to be on the targeted ability or trait.

Error Variance
The component of a test score attributable to sources other than the trait or ability measured.
Potential sources of error variance:
1. Assessors
2. Measuring instruments
3. Random errors such as luck

Classical Test Theory
Each test-taker has a true score on a test that would be obtained but for the action of measurement error.

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
All major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual.
Problems arise if the test is used with people for whom it was not intended.
Some problems are more political than psychometric in nature.
Despite the best efforts of many professionals, fairness-related questions and problems do occasionally arise.
In all questions about tests with regard to fairness, it is important to keep in mind that tests are tools; they can be used properly or improperly.

Assumption 7: Testing and Assessment Benefit Society
Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests.
There is a great need for tests, especially good tests, considering the many areas of our lives that they benefit.
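To make the Classical Test Theory statement above concrete, here is a minimal simulation sketch; the true score, error spread, and number of administrations are all hypothetical:

```python
# Minimal sketch of the Classical Test Theory idea that an observed score
# equals a true score plus random measurement error (X = T + E).
import random

random.seed(1)
true_score = 25.0          # hypothetical true score
error_sd = 2.0             # hypothetical spread of measurement error

observed = [true_score + random.gauss(0, error_sd) for _ in range(1000)]
mean_observed = sum(observed) / len(observed)

# Averaged over many (hypothetical) administrations, the random error tends
# to cancel out, so the mean observed score approaches the true score.
print(f"mean observed score ~ {mean_observed:.2f} (true score = {true_score})")
```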
Carryover Effects:
Happen when the test-retest interval is short and the second test is influenced by the first, because test-takers remember or have practiced material from the previous test.
Coefficient alpha
Appropriate for use on tests containing non-dichotomous items.
Calculated to help answer questions about how similar sets of data are, on a scale from 0 (absolutely no similarity) to 1 (perfectly identical).
Statistical tools: Pearson r, Spearman rho.

Test-retest reliability
An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.

Parallel Forms Method
For each form of the test, the means and the error variances are EQUAL; same items, different positionings/numberings.
The true score must be the same for the two tests.

Alternate Forms
Simply different versions of a test that have been constructed so as to be parallel. The tests should contain the same number of items, the items should be expressed in the same form and should cover the same type of content, and the range and difficulty must also be equal.
If there is test leakage, use the form that is not mostly administered.

Inter-item consistency / Internal consistency reliability
Refers to the degree of correlation among all the items on a scale.
Calculated from a single administration of a single form of a test.
Useful in assessing the homogeneity of the test.
A way to measure the validity of the test and each item on the test, such as whether the items in a questionnaire are all measuring the same construct.

Split-Half Reliability
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered ONCE.
Useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.
You cannot just divide the items in the middle, because doing so might spuriously raise or lower the reliability coefficient.

The Spearman-Brown formula
A specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items.
Can also be used to determine the number of items needed to attain a desired level of reliability.

Inter-scorer reliability
The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
Often used when coding nonverbal behavior.

Measures of Inter-Scorer Reliability
Fleiss' Kappa
Determines the level of agreement between TWO or MORE raters when the method of assessment is measured on a CATEGORICAL SCALE.
Cohen's Kappa
Two raters.
Krippendorff's Alpha
Two or more raters; based on observed disagreement corrected for disagreement expected by chance.

How Reliable Is Reliable?
Basic research
It has been suggested that reliability estimates in the range of .70 and .80 are good enough for most purposes.
Some people have argued that it would be a waste of time and effort to refine research instruments beyond a reliability of .90.
Clinical settings
High reliability is extremely important.
A test with a reliability of .90 might not be good enough. For a test used to make a decision that affects some person's future, evaluators should attempt to find a test with a reliability greater than .95.

Increase the Number of Items
The larger the sample of items, the more likely it is that the test will represent the true characteristic.

Factor and Item Analysis
The reliability of a test depends on the extent to which all of the items measure one common characteristic.
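As a concrete illustration of split-half reliability and the Spearman-Brown correction described above, here is a minimal sketch; the item responses are made up, odd-even halves are used rather than a simple first-half/second-half split, and statistics.correlation requires Python 3.10 or later:

```python
# Minimal sketch: split-half reliability with a Spearman-Brown correction.
from statistics import correlation  # Python 3.10+

# rows = examinees, columns = 6 items scored 0/1 (hypothetical data)
items = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

odd_half = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6

r_half = correlation(odd_half, even_half)      # half-test correlation
# Spearman-Brown: estimated reliability of the full-length (2x) test
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```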
VALIDITY
A judgment or estimate of how well a test measures what it purports to measure in a particular context (Cohen and Swerdlik, 2018).
The agreement between a test score or measure and the quality it is believed to measure (Kaplan and Saccuzzo, 2018).

Validation
The process of gathering and evaluating evidence about validity.

Face Validity
Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
Face validity is really not validity at all because it does not offer evidence to support conclusions drawn from test scores.
These appearances can help motivate test-takers because they can see that the test is relevant.

Three Aspects of Validity
Content validity
Criterion-related validity
Construct validity

Content Validity
Whether the test covers the behavior domain to be measured, which is built through the choice of appropriate content areas, questions, tasks, and items.
Content validation is not done by statistical analysis but by the inspection of items. A panel of experts can review the test items.

Test Blueprint
A plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items, and so forth.
Concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes.

Criterion Validity
A judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest, the measure of interest being the criterion.
Tells us just how well a test corresponds with a particular criterion.
Criterion: a standard on which a judgment or decision may be made.

Concurrent Validity
An index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently).

Predictive Validity
An index of the degree to which a test score predicts some criterion measure.

Criterion-Related Validity
When evaluating the predictive validity of a test, researchers must take into consideration the base rate of the occurrence of the variable in question, both as that variable exists in the general population and as it exists in the sample being studied.

Base Rate
The extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion).

Hit Rate
The proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.

Miss Rate
The proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
The category of misses may be further subdivided:
False positive - predicted success does not occur.
False negative - predicted failure, but the person succeeds.
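A minimal sketch of these classification concepts, using a made-up cut score and made-up test/criterion data; note that here a "hit" is counted for any correct classification (true positives and true negatives), which is one common convention:

```python
# Minimal sketch: base rate, hit rate, miss rate, false positives, and
# false negatives when a hypothetical cut score predicts a criterion.
# (test_score, criterion) pairs; criterion 1 = success, 0 = failure (made up).
data = [
    (78, 1), (62, 0), (85, 1), (55, 0), (70, 0),
    (90, 1), (64, 1), (72, 1), (58, 0), (81, 0),
]
cut_score = 70  # hypothetical cut score

hits = misses = false_pos = false_neg = 0
for score, criterion in data:
    predicted_success = score >= cut_score
    if predicted_success and criterion == 1:
        hits += 1                     # predicted success, succeeded
    elif not predicted_success and criterion == 0:
        hits += 1                     # predicted failure, failed
    elif predicted_success and criterion == 0:
        false_pos += 1                # predicted success that did not occur
        misses += 1
    else:
        false_neg += 1                # predicted failure, but succeeded
        misses += 1

base_rate = sum(c for _, c in data) / len(data)
print(f"base rate = {base_rate:.2f}")
print(f"hit rate = {hits / len(data):.2f}, miss rate = {misses / len(data):.2f}")
print(f"false positives = {false_pos}, false negatives = {false_neg}")
```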
Incremental Validity
The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Used to improve the domain.

CONSTRUCT VALIDITY
A judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.

Construct
An informed, scientific idea developed or hypothesized to describe or explain behavior.
Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance.

Convergent Evidence
If scores on the test undergoing construct validation tend to be highly correlated with another established, validated test that measures the same construct.

Discriminant Evidence
A validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not be correlated.

Multitrait-Multimethod Matrix
Useful for examining both convergent and discriminant validity evidence.
Multitrait: two or more traits.
Multimethod: two or more methods.

Factor Analysis
Designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
Used to study the interrelationships among a set of variables.
Identifies the factor or factors in common between test scores on subscales within a particular test.

Factor Loading
Conveys information about the extent to which the factor determines the test score or scores.
Can be used to obtain both convergent and discriminant validity evidence.

Rating
A numerical or verbal judgment that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a Rating Scale.

Rating Error
An intentional or unintentional misuse of the scale.

Leniency Error (Generosity Error)
The rater is lenient in scoring.

Severity Error
The rater is strict in scoring.

Central Tendency Error
The rater's ratings tend to cluster in the middle of the rating scale.
One way to overcome rating errors is to use rankings.

Halo Effect
The tendency to give a high score due to failure to discriminate among conceptually distinct and potentially independent aspects of a ratee's behavior.

*Attempting to define the validity of a test will be futile if the test is NOT reliable.
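To illustrate the Factor Analysis and Factor Loading entries above, here is a minimal sketch assuming NumPy and scikit-learn are available; the five "extroversion-like" item scores are simulated, not data from any real test:

```python
# Minimal sketch: extracting one factor and inspecting factor loadings with
# scikit-learn's FactorAnalysis on simulated item scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people = 200
latent = rng.normal(size=n_people)                 # one underlying trait

# five observed items that all partly reflect the same latent trait
items = np.column_stack(
    [0.8 * latent + 0.4 * rng.normal(size=n_people) for _ in range(5)]
)

fa = FactorAnalysis(n_components=1, random_state=0)
fa.fit(items)

# Loadings convey how strongly the common factor determines each item score.
print("factor loadings:", np.round(fa.components_[0], 2))
```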
FINAL
the work of another psychologist whose professional conduct is in question.
However, we shall immediately discontinue said services as soon as the emergency has ended, and ensure that appropriate ...
F. Cooperating with Ethics Committee
We cooperate with the ethics investigation, proceedings
and requirements of any psychological association we
belong to.
G. Improper Complaints
We refrain from filing ethical complaints with reckless
disregard or willful ignorance of facts that would
disprove allegations of ethical violations. We also refrain
from filing complaints without supporting factual
evidence.
H. Unfair Discrimination Against Complainants and
Respondents
1. We do not discriminate against complainants and respondents of ethical complaints by denying them employment, advancement, admission to academic programs, tenure, or promotion.
2. This does not rule out taking appropriate actions based on
outcomes of proceedings.
II. COMPETENCIES
A. Boundaries of Competence
1. We shall provide services, teach, and conduct research with persons and populations, and in areas, only within the boundaries of our competence, based on our education, training, supervised internship, consultation, thorough study, or professional experience.
2. We shall make appropriate referrals, except as provided in
Standard A.2, Providing Services in Emergencies, where our
existing competencies are not sufficient to ensure effective
implementation or provision of our services.
3. When we plan to provide services, teach, or conduct
research involving populations, areas, techniques, or
technologies that are new to us and/or are beyond our
existing competence, we must undertake relevant education,
training, supervised experience, consultation, or thorough
study.
4. So as not to deprive individuals or groups of necessary services for which we do not have existing competence, we may provide the service, as long as:
a. we have closely related prior training or experience,
and
b. we make a reasonable effort to obtain the
competence required by undergoing relevant
research, training, consultation, or thorough study.
5. In those emerging areas in which generally recognized
standards for preparatory training do not yet exist, but in
which we are required or requested to make available our
services, we shall take reasonable steps to ensure the
competence of our work and to protect our clients/patients,
students, supervisees, research participants, organizational
clients, and others from harm.
6. We shall be reasonably familiar with the relevant judicial
or administrative rules when assuming forensic roles.
Utility Analysis
Defined as a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.

The Cut Score in Use
Also called a cutoff score. We have previously defined a cut score as a (usually numerical) reference point derived as a result of a judgment and used to divide a set of data into two or more classifications, with some ...