Ases311 All Sem

The document provides an overview of psychological testing and assessment, including definitions of key terms such as tests, test scores, and types of tests. It discusses the historical context of psychological testing, the roles of various parties involved, and the different settings in which assessments are conducted. Additionally, it outlines the evolution of intelligence measurement and personality tests, highlighting significant contributors to the field.

ASES311

INTRODUCTION TO THE CONCEPT OF PSYCHOLOGICAL TESTING AND ASSESSMENT

Test
 A measurement device or technique used to quantify behavior or aid in the understanding and prediction of behavior.

Test Scores
 Not perfect measures of a behavior or characteristic, but they do add significantly to the prediction process.

Item
 A specific stimulus to which a person responds overtly; this response can be scored or evaluated.
 The specific questions or problems that make up a test.

Psychological Test
 A set of items designed to measure characteristics of human beings that pertain to behavior.

Psychological tests vary by content, format, administration, scoring, interpretation, and technical quality.

TECHNICAL QUALITY OR PSYCHOMETRIC SOUNDNESS

Psychometrics
 The science of psychological measurement. The psychometric soundness of a test depends on how consistently and accurately the test measures what it purports to measure.

*Test users are sometimes referred to as psychometrists or psychometricians.

Testing
 The process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.

Assessment
 The gathering and integration of psychology-related data for the purpose of making a psychological evaluation through tools such as tests, interviews, case studies, behavioral observation, and other methods.

TYPES OF BEHAVIOR (Kaplan and Saccuzzo)

Overt Behavior
 An individual's observable activity.
 Some psychological tests attempt to measure the extent to which someone might engage in or "emit" a particular overt behavior.

Covert Behavior
 Takes place within an individual and cannot be directly observed.

Objective of Testing
 Typically to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.

Objective of Assessment
 Typically to answer a referral question, solve a problem, or arrive at a decision through the tools of evaluation.

Interview
 A method of gathering information through direct communication involving reciprocal exchange.
 The quality of information obtained in an interview often depends on the skills of the interviewer (e.g., their pacing, rapport, and their ability to convey genuineness, empathy, and humor).

Portfolio
 A file containing the products of one's work.
 Serves as a sample of one's abilities and accomplishments.

Case History Data
 Information preserved in records, transcripts, or other forms.

Behavioral Observation
 Monitoring the actions of people through visual or electronic means.

TYPES OF TESTS

Individual Tests
 The examiner or test administrator gives the test to only one person at a time, the same way that psychotherapists see only one person at a time.

Group Tests
 Can be administered to more than one person at a time by a single examiner, such as when an instructor gives everyone in the class a test at the same time.

Ability Tests
 One can also categorize tests according to the type of behavior they measure.
 Contain items that can be scored in terms of speed, accuracy, or both.
 On an ability test, the faster or the more accurate your responses, the better your scores on a particular characteristic.
*Measure skills in terms of speed, accuracy, or both.

DIFFERENT TYPES OF ABILITY

Achievement
 Refers to previous learning.

Aptitude
 Refers to the potential for learning or acquiring a specific skill.

Intelligence
 A person's general potential to solve problems, adapt to changing circumstances, think abstractly, and profit from experience.

WHAT IS THE DIFFERENCE BETWEEN ABILITY TESTS AND PERSONALITY TESTS?

Ability Tests
 Are related to capacity or potential.

Personality Tests
 Are related to the overt and covert dispositions of the individual.

Types of Personality Tests

Structured Personality Tests
 Provide a statement, usually of the "self-report" variety, and require the subject to choose between two or more alternative responses such as "True" or "False."

Projective Personality Tests
 The stimulus (test materials) or the required response, or both, are ambiguous. Rather than being asked to choose among alternative responses, the individual is asked to provide a spontaneous response.
*Provide an ambiguous test stimulus; response requirements are unclear.

PRELIM
WHO ARE THE PARTIES?

The Test Developer
 Tests are created for research studies, for publication (as commercially available instruments), or as modifications of existing tests.

The Test User
 Tests are used by a wide range of professionals.
 The Standards contain guidelines for who should be administering psychological tests, but many countries have no ethical or legal guidelines for test use.

The Test-Taker
 Anyone who is the subject of an assessment or evaluation is a test-taker.
 Test-takers may differ on a number of variables at the time of testing (e.g., test anxiety, emotional state).

Society at Large
 Test developers create tests to meet the needs of an evolving society.
 Laws and court decisions may play a major role in test development, administration, and interpretation.

Other Parties
 Organizations, companies, and governmental agencies sponsor the development of tests.
 Companies may offer test scoring and interpretation.
 Researchers may review tests and evaluate their psychometric soundness.

WHAT TYPES OF SETTINGS?

Educational Settings
 Students typically undergo school ability tests and achievement tests.
 Diagnostic tests may be used to identify areas for educational intervention.
 Educators may also make informal evaluations of their students.

Clinical Settings
 Include hospitals, inpatient and outpatient clinics, and private-practice consulting rooms.
 Assessment tools are used to help screen for or diagnose behavior problems.

Business and Military Settings
 Decisions regarding the careers of personnel are made with a variety of achievement, aptitude, interest, motivational, and other tests.

Government and Organizational Credentials
 Include governmental licensing, certification, or general credentialing of professionals.

HOW ARE ASSESSMENTS CONDUCTED?
There are many different methods used. Ethical testers have responsibilities before, during, and after testing.
Obligations include:
 Familiarity with test materials and procedures
 Ensuring the room is suitable and conducive to testing
 Establishing rapport during test administration

Accommodations may need to be made: the adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.

WHERE TO GO FOR INFORMATION ON TESTS?

Test Catalogues
 Catalogues distributed by publishers of tests. Usually brief, and uncritical, descriptions of tests.

Test Manuals
 Detailed information concerning the development of a particular test and technical information.

Reference Volumes
 Reference volumes like the Mental Measurements Yearbook or Tests in Print provide detailed information on many tests.

Journal Articles
 Contain reviews of a test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or an applied context.

LESSON 2: HISTORY

Historical Perspective
 We now briefly provide the historical context of psychological testing.

China
 It is believed that tests and testing programs first came into being in China as early as 2200 B.C.E.
 Testing was instituted as a means of selecting who, of many applicants, would obtain government jobs.

Han Dynasty (206 B.C.E.–220 C.E.)
 The use of test batteries was quite common, covering civil law, military affairs, geography, revenue, and agriculture.

Ming Dynasty (1368–1644 C.E.)
 A national multistage testing program.
 Testing began at the local level, then moved to provincial capitals for more extensive essay examinations.
 After the second round of testing, those with the highest test scores went on to the nation's capital.
 In the final round, only those who passed this third set of tests were eligible for public office.

Western World
 Most likely learned about testing programs through the Chinese.
 Reports by British missionaries and diplomats encouraged the English East India Company in 1832 to copy the Chinese system as a method of selecting employees for overseas duty.
 Because testing programs worked well for the company, the British government adopted a similar system of testing for its civil service in 1855.
 After the British endorsement of a civil service testing system, the French and German governments followed suit.

U.S. Government, 1883
 Established the American Civil Service Commission, which developed and administered competitive examinations for certain government jobs.

Wiggins (1973)
 The impetus of the testing movement in the Western world grew rapidly at that time.

CHARLES DARWIN AND INDIVIDUAL DIFFERENCES
 Perhaps the most basic concept underlying psychological and educational testing pertains to individual differences.
 No two snowflakes are identical, no two fingerprints the same. Similarly, no two people are exactly alike in ability and typical behavior.

Interest in individual differences came with the publication of Charles Darwin's book The Origin of Species in 1859.
 Darwin argued that chance variation in species would be selected or rejected by nature according to adaptivity and survival value.
 He further argued that humans had descended from the ape as a result of such chance genetic variations.
 Through this process, he argued, life has evolved to its currently complex and intelligent levels.

Sir Francis Galton
 Given the concepts of survival of the fittest and individual differences, Galton set out to show that some people possessed characteristics that made them more fit than others.
 He concentrated on demonstrating that individual differences exist in human sensory and motor functioning, such as reaction time, visual acuity, and physical strength.
 Galton is credited with devising or contributing to the development of many contemporary tools of psychological assessment, including questionnaires, rating scales, and self-report inventories.

Psychological testing developed from at least two lines of inquiry:
 One based on the work of Darwin, Galton, and Cattell on the measurement of individual differences;
 The other based on the work of the German psychophysicists Herbart, Weber, Fechner, and Wundt (more theoretically relevant and probably stronger).
 Experimental psychology developed from the latter.

Wilhelm Max Wundt
 Founded the first experimental psychology laboratory at the University of Leipzig in Germany.
 Wundt and his students tried to formulate a general description of human abilities with respect to variables such as reaction time, perception, and attention span.

The objective is to ensure that any observed differences in performance are indeed due to differences between the people being measured and not to any extraneous variables. Manuals for the administration of many tests provide explicit instructions designed to hold constant or "standardize" the conditions under which the test is administered. This is so that any differences in scores on the test are due to differences in the test takers rather than to differences in the conditions under which the test is administered.

James McKeen Cattell
 One of Wundt's students at Leipzig.
 Completed a doctoral dissertation that dealt with individual differences, specifically individual differences in reaction time.
 Coined the term "mental test."

OTHER STUDENTS OF WUNDT
 Spearman is credited with originating the concept of test reliability as well as building the mathematical framework for the statistical technique of factor analysis.
 Victor Henri is the Frenchman who would collaborate with Alfred Binet on papers suggesting how mental tests could be used to measure higher mental processes.

Psychiatrist Emil Kraepelin
 An early experimenter with the word association technique as a formal test.
 Kraepelin (1912) devised a series of examinations for evaluating emotionally impaired people. Similarly, one of the earliest tests resembling current procedures, the Seguin Form Board Test (Seguin, 1866/1907), was developed in an effort to educate and evaluate the mentally disabled.

Lightner Witmer
 Has been cited as the "little-known founder of clinical psychology." Founded the first psychology clinic in the United States at the University of Pennsylvania. In 1907 Witmer founded the journal Psychological Clinic.

THE MEASUREMENT OF INTELLIGENCE
 Binet and collaborator Theodore Simon (1905) published a 30-item "measuring scale of intelligence" designed to help identify Paris schoolchildren with intellectual disability.
 A representative sample is one that comprises individuals similar to those for whom the test is to be used.
 The Binet-Simon (1908) Scale determined a child's mental age.

L. M. Terman
 In 1911, the Binet-Simon Scale received a minor revision.
 By 1916, Stanford University had revised the Binet test for use in the United States.
 Terman's revision is known as the Stanford-Binet Intelligence Scale (1916).

Intelligence
 "The aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment" (Wechsler, 1939).

Wechsler-Bellevue Intelligence Scale
 Renamed the Wechsler Adult Intelligence Scale (WAIS).
 The WAIS has been revised several times since then, and versions of Wechsler's test have been published that extend the age range of test takers from early childhood through senior adulthood.

WORLD WAR I
 The Army requested the assistance of Robert Yerkes, who was then the president of the American Psychological Association.
 Army Alpha: required reading ability.
 Army Beta: measured the intelligence of illiterate adults.
 The Stanford-Binet Intelligence Scale had appeared at a time of strong demand and high optimism for the potential of measuring human behavior through tests.

 World War I and the creation of group tests then added momentum to the testing movement. Shortly after the appearance of the 1916 Stanford-Binet Intelligence Scale and the Army Alpha test, schools, colleges, and industry began using tests.

RISING TO THE CHALLENGE
 The Stanford-Binet test had long been criticized because of its emphasis on language and verbal skills, making it inappropriate for many individuals, such as those who cannot speak or who cannot read. In addition, few people believed that language or verbal skills play an exclusive role in human intelligence.
 Wechsler's inclusion of a nonverbal scale thus helped overcome some of the practical and theoretical weaknesses of the Binet test.
 In 1986, the Binet test was drastically revised to include performance subtests.

WORLD WAR II
 Personality tests based on fewer or different assumptions were introduced, thereby rescuing the structured personality test.
 Projective personality tests provide an ambiguous stimulus and unclear response requirements. Furthermore, the scoring of projective tests is often subjective.
 The Rorschach test was first published by Hermann Rorschach of Switzerland in 1921.
 The first Rorschach doctoral dissertation written in a U.S. university was not completed until 1932, when Sam Beck, Levy's student, decided to investigate the properties of the Rorschach test scientifically.

SUMMARY OF PERSONALITY TESTS

Woodworth Personal Data Sheet
 An early structured personality test that assumed that a test response can be taken at face value.

The Rorschach Inkblot Test
 A highly controversial projective test that provided an ambiguous stimulus (an inkblot) and asked the subject what it might be.

The Thematic Apperception Test (TAT)
 A projective test that provided ambiguous pictures and asked subjects to make up a story.

The Minnesota Multiphasic Personality Inventory (MMPI)
 A structured personality test that made no assumptions about the meaning of a test response. Such meaning was to be determined by empirical research.

The California Psychological Inventory (CPI)
 A structured personality test developed according to the same principles as the MMPI.

The Sixteen Personality Factor Questionnaire (16PF)
 A structured personality test based on the statistical procedure of factor analysis.

THEMATIC APPERCEPTION TEST (TAT)
 Developed by Henry Murray and Christiana Morgan in 1935.
 More structured than the Rorschach. Its stimuli consisted of ambiguous pictures depicting a variety of scenes and situations, such as a boy sitting in front of a table with a violin on it.
 Required the subject to make up a story about the ambiguous scene.

MINNESOTA MULTIPHASIC PERSONALITY INVENTORY (MMPI)
 Used empirical methods to determine the meaning of a test response, which helped revolutionize structured personality tests.
 Assesses personality traits and psychopathology.

Traits are relatively enduring dispositions (tendencies to act, think, or feel in a certain manner in any given circumstance) that distinguish one individual from another.

FACTOR ANALYSIS
 A method of finding the minimum number of dimensions (characteristics, attributes), called factors, to account for a large number of variables.
 We may say a person is outgoing, is gregarious, seeks company, is talkative, and enjoys relating to others. However, these descriptions contain a certain amount of redundancy.
 A factor analysis can identify how much they overlap and whether they can all be accounted for, or subsumed, under a single dimension (or factor) such as extroversion.
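The overlap idea can be illustrated numerically. The sketch below is a minimal, principal-components-style illustration (not the full factor-analysis procedure) using a made-up correlation matrix for four hypothetical trait ratings; a single dominant eigenvalue suggests one underlying factor such as extroversion.

```python
# Hypothetical correlation matrix for four overlapping trait ratings
# (outgoing, gregarious, talkative, sociable) -- illustrative values only.
R = [
    [1.00, 0.80, 0.70, 0.75],
    [0.80, 1.00, 0.72, 0.70],
    [0.70, 0.72, 1.00, 0.68],
    [0.75, 0.70, 0.68, 1.00],
]

def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    v = [1.0] * len(matrix)
    norm = 1.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    return norm

# The eigenvalues of a correlation matrix sum to the number of variables,
# so largest eigenvalue / 4 is the share of variance one factor explains.
share = largest_eigenvalue(R) / len(R)
print(round(share, 2))  # most of the variance: the four traits collapse to one factor
```

A real factor analysis would also extract further factors and rotate them; this sketch only shows why highly intercorrelated descriptions are largely redundant.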


LESSON 3: PSYCHOLOGICAL ASSESSMENT

Statistical methods serve two important purposes in the quest for scientific understanding:
1. Statistics are used for purposes of description.
 Numbers provide convenient summaries and allow us to evaluate some observations relative to others.
2. We can use statistics to make inferences,
 which are logical deductions about events that cannot be observed directly.

DESCRIPTIVE STATISTICS
 Methods used to provide a concise description of a collection of quantitative information.

INFERENTIAL STATISTICS
 Methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population.

SCALES OF MEASUREMENT
 Measurement is the application of rules for assigning numbers to objects.

Properties of Scales
 A scale is a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.
*Measurement always involves error.

Error
 Refers to the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.

Three important properties make scales of measurement different from one another:
1. Magnitude
2. Equal intervals
3. An absolute 0

MAGNITUDE
 The property of "moreness."
 A scale has the property of magnitude if we can say that a particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance (Gravetter & Wallnau, 2016; Howell, 2008; McCall, 2001).

EQUAL INTERVALS
 A scale has the property of equal intervals if the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units.

ABSOLUTE 0
 An absolute 0 is obtained when nothing of the property being measured exists.

TYPES OF SCALES

Nominal Scales
 Really not scales at all; their only purpose is to name objects.
 The simplest form of measurement.
 These scales involve classification or categorization based on one or more distinguishing characteristics, where all things measured must be placed into mutually exclusive and exhaustive categories.

Ordinal Scales
 Allow you to rank individuals or objects but not to say anything about the meaning of the differences between the ranks.

Interval Scales
 Have the properties of magnitude and equal intervals but not absolute 0.
 At this level of measurement it is possible to average a set of measurements and obtain a meaningful result.

Ratio Scales
 A scale that has all three properties.
 Has an absolute 0.
 For instance, 0 miles per hour (mph) is the point at which there is no speed at all. If you are driving onto a highway at 30 mph and increase your speed to 60 when you merge, then you have doubled your speed.

Scales of Measurement and Their Properties

Type of Scale | Magnitude | Equal Intervals | Absolute 0
Nominal       | No        | No              | No
Ordinal       | Yes       | No              | No
Interval      | Yes       | Yes             | No
Ratio         | Yes       | Yes             | Yes

DESCRIBING DATA

Distribution
 May be defined as a set of test scores arrayed for recording or study.

Raw Scores
 An unmodified accounting of performance that is usually numerical. A raw score may reflect a simple tally.

FREQUENCY DISTRIBUTIONS
 All scores are listed alongside the number of times each score occurred.
 Displays scores on a variable or a measure to reflect how frequently each value was obtained.

Simple Frequency Distribution
 Indicates that individual scores have been used and the data have not been grouped.
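The scales-of-measurement table can be encoded directly in code. The sketch below (illustrative names, not from any testing library) shows how a scale's properties determine which operations on scores are meaningful.

```python
# Properties of the four scales of measurement, mirroring the table in the text:
# whether each scale has magnitude, equal intervals, and an absolute 0.
SCALE_PROPERTIES = {
    "nominal":  {"magnitude": False, "equal_intervals": False, "absolute_zero": False},
    "ordinal":  {"magnitude": True,  "equal_intervals": False, "absolute_zero": False},
    "interval": {"magnitude": True,  "equal_intervals": True,  "absolute_zero": False},
    "ratio":    {"magnitude": True,  "equal_intervals": True,  "absolute_zero": True},
}

def can_average(scale: str) -> bool:
    """Averaging is meaningful only when intervals are equal (interval/ratio data)."""
    return SCALE_PROPERTIES[scale]["equal_intervals"]

def can_form_ratios(scale: str) -> bool:
    """Statements like 'twice as fast' require an absolute 0 (ratio data only)."""
    return SCALE_PROPERTIES[scale]["absolute_zero"]

print(can_average("ordinal"))    # False: a mean of ranks has no clear meaning
print(can_form_ratios("ratio"))  # True: e.g., 60 mph is twice 30 mph
```

This is why, for example, averaging interval-scale test scores is defensible while "Person A is twice as intelligent as Person B" is not.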


Grouped Frequency Distribution
 Test-score intervals, also called class intervals, replace the actual test scores.
 The number of class intervals used and the size or width of each class interval are for the test user to decide.

*Frequency distributions of test scores can also be illustrated graphically.

GRAPH
 A diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.

Three kinds of graphs used to illustrate frequency distributions:
 Histogram
 Bar graph
 Frequency polygon

HISTOGRAM
 A graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
[Figure: Data from your measurement course test]

BAR GRAPH
 Numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/no/maybe, male/female) appears on the X-axis.
 Here the rectangular bars typically are not contiguous.
[Figure: Frequency distribution of scores from your test]

FREQUENCY POLYGON
 Expressed by a continuous line connecting the points where test scores or class intervals (as indicated on the X-axis) meet frequencies (as indicated on the Y-axis).
 Whenever you draw a frequency distribution or a frequency polygon, you must decide on the width of the class interval, which refers to the numerical width of any class in a particular distribution.

PERCENTILE RANKS
 Percentile ranks replace simple ranks when we want to adjust for the number of scores in a group.
 A percentile rank describes the percentage of people in the comparison group who scored below a particular score.

DESCRIBING DISTRIBUTIONS
 Statistics are used to summarize data. If you consider a set of scores, the mass of information may be too much to interpret all at once. That is why we need numerical conveniences to help summarize the information.
 The arithmetic average score in a distribution is called the mean: X̄ = ∑X / N, where
 N = number of cases
 Sigma (∑) = summation
 An arithmetic mean can also be computed from a frequency distribution. The formula for doing this is X̄ = ∑(fX) / N, where ∑(fX) means "multiply the frequency of each score by its corresponding score and then sum."
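As a quick sketch with made-up scores, the mean can be computed either from raw scores or from a simple frequency distribution, and a percentile rank follows directly from the definition above:

```python
from collections import Counter

scores = [85, 90, 78, 90, 62, 78, 90, 85, 70, 78]  # hypothetical test scores

# Simple frequency distribution: each score alongside how often it occurred.
freq = Counter(scores)

# Arithmetic mean from raw scores: X-bar = (sum of X) / N
mean_raw = sum(scores) / len(scores)

# Same mean from the frequency distribution: X-bar = (sum of f*X) / N
n = sum(freq.values())
mean_freq = sum(f * x for x, f in freq.items()) / n

assert mean_raw == mean_freq  # both routes give the identical mean

# Percentile rank: percentage of scores in the comparison group below a given score.
def percentile_rank(score, group):
    return 100 * sum(s < score for s in group) / len(group)

print(mean_raw)                     # 80.6
print(percentile_rank(85, scores))  # 50.0 -> 5 of the 10 scores fall below 85
```

Note that this uses the simple "percentage below" definition from the text; published tests sometimes use slightly different percentile conventions.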


[Figure: A grouped frequency distribution]
[Figure: Calculating the arithmetic mean from a grouped frequency distribution]

THE MEDIAN
 The middle score in a distribution.

THE MODE
 The most frequently occurring score in a distribution of scores.

MEASURES OF VARIABILITY
 Statistics that describe the amount of variation in a distribution.

VARIABILITY
 An indication of how scores in a distribution are scattered or dispersed.

Some measures of variability include:
 The range
 The interquartile range
 The semi-interquartile range
 The average deviation
 The standard deviation
 The variance

RANGE
 The range of a distribution is equal to the difference between the highest and the lowest scores.
 Provides a quick but gross description of the spread of scores.

THE INTERQUARTILE AND SEMI-INTERQUARTILE RANGES
 A distribution of test scores (or any other data, for that matter) can be divided into four parts such that 25% of the test scores occur in each quarter.
 The dividing points between the four quarters in the distribution are the quartiles.

The Interquartile Range
 A measure of variability equal to the difference between Q3 and Q1.

The Semi-Interquartile Range
 Equal to the interquartile range divided by 2: (Q3 − Q1) / 2.

VARIANCE
 Equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
 The formula used to calculate the variance (s²) using deviation scores is s² = ∑(X − X̄)² / N.

STANDARD DEVIATION
 A statistic that measures the degree of spread or dispersion of a set of scores; the square root of the variance.
 The value of this statistic is always greater than or equal to zero.
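With a small made-up distribution, the range, variance, and standard deviation defined above can be computed directly. Quartiles are estimated here with the standard library, whose interpolation convention is one of several in common use, so textbook quartile values may differ slightly.

```python
import statistics

scores = [4, 6, 7, 8, 8, 9, 10, 12]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n               # 64 / 8 = 8.0

# Range: highest minus lowest score -- a quick but gross spread measure.
spread = max(scores) - min(scores)   # 12 - 4 = 8

# Variance: mean of squared deviations from the mean, s^2 = sum((X - mean)^2) / N
variance = sum((x - mean) ** 2 for x in scores) / n   # 42 / 8 = 5.25

# Standard deviation: square root of the variance; always >= 0.
sd = variance ** 0.5

# Quartiles, interquartile range, and semi-interquartile range.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
semi_iqr = iqr / 2

print(spread, variance, round(sd, 2))   # 8 5.25 2.29
```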


SKEWNESS
 The nature and extent to which symmetry is absent.
 An indication of how the measurements in a distribution are distributed.

POSITIVE SKEW
 When relatively few of the scores fall at the high end of the distribution.
 Positively skewed examination results may indicate that the test was too difficult.

NEGATIVE SKEW
 When relatively few of the scores fall at the low end of the distribution.
 Negatively skewed examination results may indicate that the test was too easy.

KURTOSIS
 Refers to the steepness of a distribution in its center.
 To the root "kurtic" is added one of the prefixes platy-, lepto-, or meso- to describe the peakedness or flatness of three general types of curves:
 Platykurtic: relatively flat
 Leptokurtic: relatively peaked
 Mesokurtic: somewhere in the middle

THE NORMAL CURVE
 A bell-shaped, smooth, mathematically defined curve that is highest at its center.
 From the center it tapers on both sides, approaching the X-axis asymptotically.

Asymptotic
 The curve approaches, but never touches, the axis.
 The curve is perfectly symmetrical, with no skewness. If you folded it in half at the mean, one side would lie exactly on top of the other.
 Because it is symmetrical, the mean, the median, and the mode all have the same value.

The Area Under the Normal Curve
 The normal curve can be conveniently divided into areas defined in units of standard deviation.
 A normal curve has two tails. The area on the normal curve between 2 and 3 standard deviations above the mean is referred to as a tail.
 50% of the scores occur above the mean and 50% of the scores occur below the mean.
 Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
 Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
 Approximately 68% of all scores occur within 1 standard deviation of the mean.
 Approximately 95% of all scores occur within 2 standard deviations of the mean.

STANDARD SCORES
Why convert raw scores to standard scores?
 Standard scores are more easily interpretable than raw scores.
 With a standard score, the position of a test-taker's performance relative to other test-takers is readily apparent.
Different systems for standard scores exist:
 z scores
 T scores
 Stanines
 Other standard scores
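The rule-of-thumb areas quoted above, and the link between a standard score and a percentile rank, can be checked against the standard normal cumulative distribution. This sketch uses only the standard library's error function.

```python
import math

def area_below(z):
    """Cumulative area under the standard normal curve up to z SDs from the mean."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Verify the rule-of-thumb areas for the normal curve.
print(round(area_below(1) - area_below(-1), 2))   # 0.68 -> within 1 SD of the mean
print(round(area_below(2) - area_below(-2), 2))   # 0.95 -> within 2 SDs of the mean
print(round(area_below(1) - area_below(0), 2))    # 0.34 -> mean to +1 SD

# A standard score converts straight to a percentile rank under normality:
# z = +1.0 sits at roughly the 84th percentile.
print(round(100 * area_below(1.0)))               # 84
```

This is why standard scores are so interpretable: once a distribution is assumed normal, any z score maps directly to a position relative to other test-takers.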


Z SCORES
 The type of standard score scale that may be thought of as the "zero plus or minus one" scale.
 Mean set at 0 and standard deviation set at 1: z = (X − X̄) / s.
 Example: Crystal's raw score on the hypothetical Main Street Reading Test was 24, and on the Arithmetic Test it was 42. Converting Crystal's raw scores to z scores based on the performance of other students in her class, suppose we find that her z score on the Reading Test is 1.32 and on the Arithmetic Test is 0.75.

T SCORES
 The scale used in the computation of T scores can be called a "fifty plus or minus ten" scale.
 Mean set at 50 and standard deviation set at 10.
 This standard score system is composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean.

STANINE
 Divided into nine units, the scale was christened a stanine, a term that was a contraction of the words "standard" and "nine."
 Different from other standard scores in that stanines take on whole values from 1 to 9.

DEVIATION IQ
 Or deviation intelligence quotient.
 In most IQ tests, the distribution of raw scores is converted to IQ scores, whose distribution typically has a mean set at 100 and a standard deviation set at 15.
 The typical mean and standard deviation for IQ tests result in approximately 95% of deviation IQs ranging from 70 to 130, which is 2 standard deviations below and above the mean.

CORRELATION AND INFERENCE

Coefficient of Correlation (r)
 A number that provides us with an index of the strength of the relationship between two things.
 Expresses a linear relationship between two (and only two) variables, usually continuous in nature.
 It reflects the degree of concomitant variation between variable X and variable Y.
 The coefficient of correlation is the numerical index that expresses this relationship: it tells us the extent to which X and Y are "co-related."

CORRELATION
 An expression of the degree and direction of correspondence between two things.

Positive (or direct) correlation
 The two variables simultaneously increase or simultaneously decrease.

Negative (or inverse) correlation
 Occurs when one variable increases while the other variable decreases.

Zero correlation
 Absolutely no relationship exists between the two variables.
 Just as it is nearly impossible in psychological work to identify two variables that have a perfect correlation, so it is nearly impossible to identify two variables that have a zero correlation.

Pearson r
 A method of computing correlation when both variables are linearly related and continuous.
 Used with interval/ratio data on both variables.
 Once a correlation coefficient is obtained, it needs to be checked for statistical significance.

Spearman Rho
 A method for computing correlation, used primarily when sample sizes are small or the variables are ordinal in nature.

GRAPHIC REPRESENTATIONS OF CORRELATION

SCATTERPLOT
 Involves simply plotting one variable on the X (horizontal) axis and the other on the Y (vertical) axis.
 Scatterplots of strong correlations feature points tightly clustered together in a diagonal line. For positive correlations the line goes from bottom left to top right.

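The deviation IQ conversion above is just a rescaling of a z score onto the mean-100, SD-15 metric; a short sketch (our own function name):

```python
# Deviation IQ: place a z score on a scale with mean 100 and SD 15.
def deviation_iq(z):
    return 100 + 15 * z

# The 95% range cited above: 2 SDs below and above the mean
low, high = deviation_iq(-2), deviation_iq(2)   # 70 and 130
```

This is why an IQ of 130 is often described as "two standard deviations above the mean": it is exactly z = +2 expressed on the deviation IQ scale.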
ASES311 MIDTERM

WECHSLER INTELLIGENCE SCALE

Proponent
 David Wechsler
 American psychologist and inventor
 Studied at the City College of New York and Columbia University, receiving his doctorate in 1925
 Long association with Bellevue Psychiatric Hospital in New York City, serving as chief psychologist from 1932 to 1967
 Used intelligence tests for adults and children
 The Wechsler-Bellevue Intelligence Scale was developed for adults
 Measures verbal comprehension, perceptual reasoning, working memory and processing speed
 The test provides a standardized score based on the individual's performance
 The WAIS test is age appropriate and is widely used in various settings such as clinical, educational evaluations and vocational settings

Introduction
 First published in 1955 and designed to measure intelligence in adults and older adolescents
 Commonly administered in psychiatric clinics and hospitals
 Comprehensive assessment of an individual's cognitive abilities (intelligence)

Background
 Wechsler described intelligence as "the global capacity of a person to act purposefully, think rationally, and to deal effectively with his environment"
 One of the important things he highlighted: he "did not consider that intellectual performance could deteriorate as a person grew older."

A Historical Perspective
Wechsler - Bellevue 1939
Wechsler - Bellevue II 1946
WAIS 1955
WAIS - R 1981
WAIS - III 1997
WAIS - IV 2008

Age Consideration
It has gone through several revisions with different categories, namely:
1. For adults: 16 - 90 years
2. For school-going children: 6 - 16 years
3. For preschoolers: 2 1/2 - 7 years

Administration
 Level C
 Completion time: 60 - 90 minutes for core sub-tests
 Requirements: paper and pencil or digital
 The test is conducted in a quiet, distraction-free environment to ensure accurate results.
 The WAIS comprises 10 core sub-tests and 5 supplemental sub-tests

ASSESSMENT ADMINISTRATION
"I'll be asking you to do a number of things today."
"Some of the things may be really easy for you, but some may be hard."
"Most people do not answer every question correctly or finish every item, but please try your best."
"Do you have any questions?"

Subtests - Administration Order (See Record Form)
Block Design         Visual Puzzles
Similarities         Information
Digit Span           Coding
Matrix Reasoning     Letter-Number Sequencing
Vocabulary           Figure Weights
Arithmetic           Cancellation
Symbol Search        Picture Completion

ADMINISTRATION GUIDELINES
Demonstration Items - Examiner explains task
Sample Items - Examinee practices
Teaching Items - Examiner teaches if needed and prescribed
 Queries - for responses that are marginal, generalized, functional, or made with hand gestures
 Prompts (e.g., "Do you have an answer?")
 Repetition

RECORDING RESPONSES
Symbol   Use
Q        Administered query
P        Administered prompt
R        Repeated item
DK       Examinee indicated s/he did not know
NR       Examinee did not respond

Test Scoring/Interpretation
 The WAIS-IV is scored on a scale of 45 to 155
 Mean of 100 and a standard deviation of 15
 Test consists of 10 core sub-tests and five supplemental sub-tests

The core sub-tests calculate four composite scores:
 Verbal Comprehension Index (VCI)
   A measure of vocabulary, verbal reasoning, and knowledge acquired from one's environment
 Perceptual Reasoning Index (PRI)
   A measure of perceptual and fluid reasoning, spatial processing, and visual-motor integration.
 Working Memory Index (WMI)
   A measure of working memory abilities, which involve attention, concentration, mental control and reasoning.
   Working memory tasks require the ability to temporarily retain information in memory, perform some operation or manipulation with it, and produce a result

BIOJON 1

 Processing Speed Index (PSI)
   Composed of sub-tests measuring the speed of mental and fine motor control.
   The PSI provides a measure of the person's ability to quickly and correctly scan, sequence, or discriminate simple visual information
   The composite also measures short-term memory, attention, and visual-motor coordination

The supplemental sub-tests can supplement the composite scores or provide additional information about the test taker's cognitive abilities.
 VCI: Similarities, Vocabulary, Information
 PRI: Block Design, Matrix Reasoning, Visual Puzzles
 WMI: Digit Span, Arithmetic
 PSI: Symbol Search, Coding

Computation of Scores
 Raw scores
   The raw score of each sub-test is the number of items the test taker answered correctly
 Scaled scores
   Scaled scores are converted from raw scores using a table provided in the WAIS-IV manual.
   Scaled scores have a mean of 10 and a standard deviation of 3
 Standard scores
   Converted from scaled scores using a table provided in the WAIS-IV manual.
   Mean of 100 and a standard deviation of 15
 Composite scores
   Calculated by averaging the scaled scores for the sub-tests that make up the composite.
   The four composite scores on the WAIS-IV are verbal comprehension, perceptual reasoning, working memory and processing speed

The WAIS scale helps to determine the following:
1. Intellectual Disability: Mild severity
2. Intellectual Disability: Moderate severity
3. Borderline Intellectual Functioning
4. Gifted Intellectual Functioning
5. Autistic Disorder
6. Asperger's Disorder
7. Learning Disability: Reading
8. Learning Disability: Math
9. ADHD
10. TBI (Traumatic Brain Injury)
11. Mild Cognitive Impairment
12. Dementia of the Alzheimer's Type
13. Depression


MINNESOTA MULTIPHASIC PERSONALITY INVENTORY (MMPI)

History
 Developed in 1939 by Starke R. Hathaway (clinical psychologist) and J.C. McKinley (neuropsychiatrist)
 Published in 1943
 To diagnose mental health disorders and assess severity
 To assess personality traits and psychopathology
 One of the most commonly and frequently used psychological tests
 One of the most researched

History of the MMPI (Different Versions)
MMPI-2 (1989)
 The revised edition of the test was released in 1989
 It is designed for all adults over the age of 18 and requires a sixth-grade reading level
 Contains 567 true/false items that typically take around one to two hours to complete
MMPI-2-RF (2008)
 Published in 2008, known as the Minnesota Multiphasic Personality Inventory - 2 Restructured Form
 338 questions, significantly fewer than the MMPI-2
 It takes around 35-50 minutes to complete
MMPI-A (1992)
 Published in 1992
 Geared toward adolescents aged 14 to 18 years old
 It takes about an hour to complete
MMPI-A-RF (2016)
 Published in 2016
 Minnesota Multiphasic Personality Inventory - Adolescent - Restructured Form
 Contains 241 true/false items, less than half the number of items of the original MMPI-A, to help combat the challenges of adolescent attention span and concentration
 One of the most commonly used psychological tools among the adolescent population
MMPI-3 (2020)
 Published in 2020
 The latest version of the test
 The test takes 25 to 50 minutes to complete with 335 items and is available in English, Spanish, and French for Canada formats.

HOW THE MMPI IS USED
By Mental Health Professionals
 To assess personality traits and psychopathology
 To diagnose mental health disorders and assess severity
By Lawyers (Forensics)
 To support forensic evidence in court
By Employers (Job Screening)
 To determine which candidate possesses the characteristics best suited for the role
 An employer can immediately see any red flags during the application process and reduce the risk of turnover or workplace issues.

Other Uses
 To evaluate effectiveness (e.g. of treatment programs)
 Using the MMPI as a pre-test and post-test to determine whether there were positive or negative changes in the client because of the treatment.
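Looping back to the WAIS-IV Computation of Scores described earlier: real scoring converts sums of scaled scores to index scores using norm tables in the manual. As a rough linear stand-in only (our own approximation, not the published procedure), the mean-10/SD-3 subtest metric can be mapped onto the mean-100/SD-15 composite metric like this:

```python
# Hedged sketch: approximate a composite index from subtest scaled scores.
# Assumption (not the WAIS-IV tables): subtests are treated as independent,
# so the sum of n scaled scores has mean 10*n and SD 3*sqrt(n).
def approx_index(scaled_scores, n_subtests):
    total = sum(scaled_scores)
    expected = 10 * n_subtests            # mean of the sum of scaled scores
    z = (total - expected) / (3 * n_subtests ** 0.5)  # crude z for the sum
    return round(100 + 15 * z)            # express on mean-100, SD-15 metric

# Three subtests exactly at the scaled-score mean give an index near 100;
# three subtests one SD above it land well above 100.
at_mean = approx_index([10, 10, 10], 3)
above = approx_index([13, 13, 13], 3)
```

Treat this strictly as a sketch of why "mean 10, SD 3" per subtest coexists with "mean 100, SD 15" per composite; clinical scoring must use the manual's tables.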


What The MMPI - 2 Measures

Number  Abbreviation  Description             What is measured
1       Hs            Hypochondriasis         Concern with bodily symptoms
2       D             Depression              Depressive symptoms
3       Hy            Hysteria                Awareness of problems and vulnerabilities
4       Pd            Psychopathic Deviate    Conflict, struggle, anger, respect for society's rules
5       MF            Masculinity/Femininity  Stereotypical masculine or feminine interests/behaviors
6       Pa            Paranoia                Level of trust, suspiciousness, sensitivity
7       Pt            Psychasthenia           Worry, anxiety, tension, doubts, obsessiveness
8       Sc            Schizophrenia           Odd thinking and social alienation
9       Ma            Hypomania               Level of excitability
0       Si            Social Introversion     People orientation

TEST ADMINISTRATION
Test Description
 567 true - false self report items
 Has 10 clinical scales that are used to indicate different psychological conditions
 Can be administered individually or by group
 Qualification: Level C
 18 years old and older
 6th grade reading level
 Ability to follow standard instructions

Administering the MMPI-2
 Establish rapport before administering
 Follow standard instructions in the manual
 Avoid defining words or helping interpret the meaning of items
 Determine the test ability of the test taker (physical condition, emotional state, and reading and comprehension skills)
 Testing conditions – adequate space, good lighting, comfortable chair and quiet surroundings
 When administering to large groups, special measures to ensure maximal cooperation and care should be considered

Test Forms
Printed Booklet (paper and pencil)
 Takes 60-90 minutes
 Have to manually enter the responses into the scoring software
 Hand scoring is still an option but it takes time and is error prone

Standard Audio Tape/CD (avoids reading items aloud)
 For test takers with visual impairments
 Test taker listens to statements through a CD
 Verbal response

Computerized Administration
 Takes less time
 Same software is used for administration and scoring


MYERS - BRIGGS TYPE INDICATOR

Proponent
 Isabel Briggs Myers
 Katherine Cook Briggs
 Carl Jung

History
 One of the most popular tests in the world
 Based on Carl Jung's work
 Katherine Briggs and her daughter, Isabel Myers, expanded on Jung's work and created the MBTI as we know it today
 The model uses a series of questions to categorize people into one of 16 different personality types
 These 16 personality types are based on the four distinct dichotomies
 The MBTI instrument was first published in 1962

The Key Moments in MBTI
1919 - Isabel Briggs Myers graduates from Swarthmore College. Isabel's mother, Katharine Briggs, starts to research personality type theory
1921 - Carl Jung publishes Psychological Types: The Psychology of Individuation
1943 - Form A of the instrument is copyrighted
1962 - Isabel self-publishes Introduction to Type. Educational Testing Service (ETS) publishes a research version of the MBTI instrument and the MBTI Manual
1968 - Katharine Cook Briggs dies. MBTI questionnaire published in Japan by industrial psychologist Takeshi Ohsawa. It's the first MBTI translation
1969 - Isabel Briggs Myers and clinical psychologist Mary McCaulley start the Typology Lab
1975 - CPP, Inc. (formerly Consulting Psychologists Press) publishes the MBTI instrument. The Typology Lab becomes the Center for Applications of Psychological Type (CAPT). It is the center for research, data collection, information, training and publications
1977 - CAPT publishes the first issue of the Journal of Psychological Type
1980 - Isabel Briggs Myers dies. Peter and Katharine Myers become co-owners of the MBTI copyrights
1985 - MBTI Manual second edition published
1990 - Form K published. It is the precursor to the Step II assessment (Form Q)
1998 - Step I (Form M) updated. MBTI Manual third edition published
2001 - Step II (Form Q) and MBTI Step II Manual published
2007 - MBTI Complete launched
2009 - Step III published
2017 - CPP, Inc. buys OPP Ltd
2018 - CPP, Inc. becomes The Myers-Briggs Company
2019 - New global versions of MBTI Step I and Step II assessments published. New version of MBTI Online launched

TEST DESCRIPTION
Age
 Adolescents and adults
 14 and older
 Depends on the maturity and cognitive development of the individual

Qualification
 Level: B
 Administered by certified MBTI practitioners
 These practitioners have completed training programs accredited by the Myers & Briggs Foundation


 Psychologists, Career Counselors, and other professionals often administer the MBTI in these settings.
 Organizations may also have in-house certified professionals who can administer the MBTI to employees

Administration Time
 15 to 30 minutes to complete the questionnaire

Where MBTI is Administered
 Psychologist or counselor's offices
 Career development workshops
 Corporate team-building sessions
 Educational institutions for student guidance
 Online platforms offering personality assessments

INTROVERSION (I) VS. EXTRAVERSION (E)
 This dimension reflects how individuals direct their energy. Introverts tend to focus inwardly, feeling more energized by spending time alone or in small groups, while Extraverts are energized by interacting with others and the external world.

THINKING (T) VS. FEELING (F)
 This dimension relates to how individuals make decisions. Thinkers tend to make decisions based on logic and objective criteria, focusing on impersonal analysis. Feelers, on the other hand, make decisions based on personal values and the impact on others, considering emotions and empathy.

INTUITION (N) VS. SENSING (S)
 This dimension describes how individuals take in information. Intuitive types are more focused on abstract ideas and possibilities, relying on their intuition and imagination. Sensing types, on the other hand, are more focused on concrete information from their senses and the present reality.

JUDGING (J) VS. PERCEIVING (P)
 This dimension reflects how individuals approach the outside world. Judging types prefer structure and organization, seeking closure and making plans. Perceiving types are more flexible and adaptable, preferring to keep their options open and explore possibilities.


BENDER VISUAL - MOTOR GESTALT TEST II

Proponent
 Lauretta Bender
 1938

Administration Materials
 16 stimulus cards
 Two supplemental tests: Motor Test and Perception Test
 Observation form (for recording time and different types of test-taking behavior)
 Two number 2 pencils with erasers
 3-5 sheets of paper
 Timer or stopwatch

Procedure
Administration of the Bender-Gestalt II involves two phases: the Copy Phase and the Recall Phase. The examinee is shown stimulus cards with different designs.

Start and End Items for Specified Ages From 4 to Adult
 Ages 4 years to 7 years, 11 months: end item 13
 Ages 8 years and older: end item 16

Directions for Administering the Copy Phase
 Although the test has no time limits, use a stopwatch or other timing device to measure how long the examinee takes to complete the items.
 Place the stopwatch out of the examinee's sight in an inconspicuous location, such as your lap.
 Position the drawing paper on the table, centering it vertically in front of the examinee.
 The examinee is asked to copy each of the designs on a blank sheet of paper. Say: "I have a number of cards here. Each card has a different drawing on it. I will show you the cards one at a time. Use a pencil to copy the drawing from each card onto the sheet of paper. Try to make your drawings look just like the drawings on the cards. There are no time limits, so take as much time as you need."

Directions for Administering the Recall Phase
 Although the test has no time limits, the examiner records how long it takes the examinee to reproduce the designs.
 The examinee is asked to redraw the designs from memory. Say: "Now I want you to draw as many of the designs that I just showed you as you can remember. Draw them on the new sheet of paper. Try to make your drawings just like the ones on the cards that you saw earlier. There are no time limits, so take as much time as you need."

Test Scoring
Global Scoring System
 Used to evaluate the examinee's overall representation of each design during the copy and recall phases of administration
 Consists of a 5-point rating scale
 Higher scores indicate better performance.

Rating scale:
0 No resemblance, random drawing, scribbling, lack of design
1 Slight - vague resemblance
2 Some - moderate resemblance
3 Strong - close resemblance, accurate reproduction
4 Nearly perfect
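A tiny sketch of totaling Global Scoring System ratings across designs; the item numbers and ratings below are made up for illustration, and the function name is our own:

```python
# Each design copied by the examinee receives a 0-4 rating; a phase total
# is simply the sum of the per-design ratings (higher = better performance).
copy_phase_ratings = {1: 4, 2: 3, 3: 3, 4: 2, 5: 4}   # design number -> rating

def total_score(ratings):
    assert all(0 <= r <= 4 for r in ratings.values()), "ratings must be 0-4"
    return sum(ratings.values())

print(total_score(copy_phase_ratings))  # 16
```

The same tallying would be repeated separately for the Recall Phase, since the two phases are rated and interpreted independently.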


BIG FIVE (OCEAN)

History
 Gordon Allport and Henry Odbert in 1936
 Identified 4,500 personality traits.
 Raymond Cattell later applied factor analysis to these terms, creating 16 larger categories
 The 16 traits could be encapsulated within five broader dimensions of personality: the Five-Factor Model (FFM)
 Paul Costa and Robert McCrae published the Big Five personality traits in 1985
 Personality across five core dimensions: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism

Big Five Inventory - A (BFI-A)
 The Big Five Inventory-A (BFI-A) gives a measure of an individual's personality. Higher autistic traits are predisposed to specific personality traits
 Authors: J.M. Digman & Lewis Goldberg
 Duration: 5–10 minutes
 44 items
 Age 10+, of average or higher intelligence

Purpose
 The Big Five Personality Test is meant to measure an individual's personality and help them know themselves more
 Being aware of their own personality can help them know which roles or characteristics are best suited to them.
 The test is divided into five dimensions: openness, conscientiousness, extraversion/introversion, agreeableness, and neuroticism

Openness
 Measures the creativity and imagination of an individual: their thoughts about certain things, their preferences, and their openness to new knowledge.

Conscientiousness
 All about mindfulness or being aware in a certain scenario: how an individual would react, having good impulse control, and making organized decisions in a specific situation.

Extraversion/Introversion
 Measures the socialization of an individual in relation to their peers; how well they socialize with other people.

Agreeableness
 Reveals how an individual reacts in different situations, either positive or negative; how they sympathize with others and act as a considerate person.

Neuroticism
 Measures the emotional reactions or emotional stability of an individual: whether their reactions are too strong or lacking in emotion.

ADMINISTRATION
1. Select a Reliable and Valid Questionnaire: Choose a questionnaire that has been well-validated and widely used, such as the NEO-PI-R or the Big Five Inventory (BFI).
2. Explain the Purpose: Provide participants with an explanation of the purpose of the assessment and ensure they understand that there are no right or wrong answers.
3. Instructions: Read or provide written instructions for completing the questionnaire. Explain the response format (e.g., Likert scale) and ensure participants understand how to respond.
4. Scoring: Once the questionnaire is completed, score each trait (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) based on the responses. Some questionnaires may provide a total score for each trait, while others may provide scores for facets of each trait.
5. Interpretation: Interpret the scores in the context of the individual's personality profile. High and low scores on each trait can provide insights into the individual's personality characteristics.
6. Confidentiality: Ensure that responses are kept confidential and that the individual's privacy is respected throughout the process.

Common Settings Where The Questionnaires Could Be Administered:
 Clinical settings
 Workplace settings
 Educational settings

Age: 10 and above
Requirement: Level B
Time: 20 minutes


THE BASIC PERSONALITY INVENTORY (BPI)

Proponent
 Dr. Douglas N. Jackson

Introduction
 To identify sources of maladjustment and personal strengths
 It is composed of 240 true/false self-report items
 Measure of the general domain of psychopathology
 Contains 11 bipolar personality scales and 1 critical item scale, for a total of 12 scales
 The 12 scales measure broad dimensions of personality that relate to an individual's intrapsychic and interpersonal functioning

12 scales:
 Hypochondriasis
 Depression
 Denial
 Interpersonal problems
 Alienation
 Persecutory ideas
 Anxiety
 Thinking disorder


 Impulse disorder
 Social introversion
 Self depreciation
 Deviation

TEST DESCRIPTION
Age: 12 years old and older
Qualification: Level C
Administration Time: 30 - 40 minutes
Where: in public institutions, private psychological, psychiatric, and counseling practices
When: in juvenile and adult correctional facilities or court referrals, personnel screening and selection
Who: research with alcoholism, eating disorders, and juvenile delinquency

Test
 It can be administered either individually or in supervised groups
 It is helpful to familiarize respondents with what is required by reading aloud the instructions on the BPI cover
 Materials: Test Booklet, Answer Sheet, Pencil


RAVEN'S 2 PROGRESSIVE MATRICES

Proponent
 John C. Raven
 This report aims to explore the life and contributions of John C. Raven, shedding light on the visionary mind behind the revolutionary cognitive assessment tool, RPM

Introduction
 In the 1930s, John C. Raven introduced a non-verbal cognitive test, minimizing cultural bias.
 It relies on visual patterns to assess abstract reasoning and fluid intelligence.
 The Standard and Advanced Progressive Matrices versions have sustained its widespread use, highlighting the lasting impact of Raven's pioneering work
 It's a non-verbal intelligence test designed to measure abstract reasoning ability.
 The test was developed by John C. Raven in 1938.
 Raven was a British psychologist who aimed to create a test that could assess a person's cognitive abilities without relying on language or specific cultural knowledge

Age Consideration
 4 to 90 years of age (the administration time depends on the age of the individual)
 Qualification Level: B

Age     Starting Item Set   Item Sets    # of Items   Time Limit
4-8     A                   A, B, C      36           30
9-79    B                   B, C, D, E   48           45
80-90   A                   A, B, C      36           45

Paper Administration
For each examinee:
 1 test booklet
 1 answer sheet
 Pencil with eraser
For the examiner:
 Extra booklet
 Extra answer sheet
 Stopwatch or other timing device
 Scoring template (if hand scoring is preferred)

3 Types of RPM
Raven's Standard Progressive Matrices
 60 items
 A-E = 12 each
 For ages 4 - 90
 Presents items in increasing (progressive) difficulty
 Black patterns w/ white background

Raven's Colored Progressive Matrices
 36 items
 A = 12
 Ab = 12
 B = 12
 For young children and old people
 Colored background to be visually stimulating

Raven's Advanced Progressive Matrices
 48 items
 Set 1 = 12
 Set 2 = 36
 For adults and adolescents of above average intelligence
 Black patterns w/ white background


CULTURAL FAIR INTELLIGENCE TEST

Introduction
 Raymond B. Cattell
 Developed to be a measure of intelligence without cultural biases. Aiming at deriving a culture-free intelligence test based on a review of the literature, the author decided on seven sub-steps

Seven Sub-Steps:
 1-2: Mazes, series
 3: Classification
 4: Progressive Matrices I (relation matrix, first order)
 5: Progressive Matrices II (relation matrix, second order)
 6: Progressive Matrices III (sequence matrix)
 7: Mirror images

Crystallized Intelligence
 Represents knowledge acquired through experience. Thought to reflect the influence of culture and schooling, such as verbal memory and general knowledge.

Fluid Intelligence
 Represents the biological ability to acquire knowledge and solve problems. Thought to reflect intelligence independent of learning, such as reasoning speed, spatial reasoning, and inductive reasoning.

The Need For The Culture-Fair Test Arises Because:
 Certain ethnic groups may be naturally favored by the nature of an exam, particularly if the examination contains things or language unique to that group.

 The test's administration might be biased, possibly giving some sub-cultural groups more weight than others through its guidelines or practices.
 Various cultural views may lead to various analyses and conclusions dependent on how test findings are interpreted.

History
 Late 1920s: Began in the work undertaken by Cattell, sparked by the precise scientific research of Charles Spearman into the nature and accurate measurement of intelligence.
 1930: Resulted in the publication of the Cattell Group and Inventory (particularly intended for use with children), which was revised and recast into non-verbal form to diminish the unwanted and unnecessary effects of verbal fluency in the pure measurement of intelligence
 1940: Another revision of the test appeared. Items had become completely perceptual and were organized into 6 sub-tests, 3 of which have been retained in the present format. Of the 159 items analyzed, 72 of satisfactory validity and reliability were retained for the published edition
 1949: Another revision, which adopted the format consisting of 4 sub-tests (Series, Classification, Matrices and Conditions)
 1961: The primary outcomes of this revision were slight adjustments in the difficulty level and sequencing of a few items. At the same time the samples were expanded to achieve better national representation in the final tables.

Requirements
 Level B: Available only if the test administrator has completed an advanced level course in testing in a university, or its equivalent in training under the direction of a qualified superior or consultant.

Age Range
 Scale 1: Ages 4 to 8 years, and older mentally handicapped individuals
 Scale 2: Ages 8 to 14 years and average adults
 Scale 3: Ages 14 to college students and adults of superior intelligence

Materials Required
 CFIT Form A and B test booklets
 Stopwatch
 Screen
 Pencil
 Eraser
 CFIT Manual
 CFIT Technical Manual
 Response Sheets

Description of the Sub-Tests

1. Series
 The individual is presented with an incomplete, progressive series. The task is to select, from among the choices provided, the answer which best continues the series.
Administration: 13 items, 3 Minutes

2. Classifications
 The individual is presented with 5 figures. In Scale 2, they must select one which is different from the other four. In Scale 3, they must correctly identify two figures which are in some way different from the three others.
Administration: 14 items, 4 Minutes

3. Matrices
 The task is to correctly complete the design or matrix presented at the left of each row.
Administration: 13 items, 3 Minutes

4. Conditions (Topology)
 Requires the individual to select, from the 5 choices provided, the one which duplicates the conditions given in the far left box.
Administration: 10 items, 2.5 Minutes
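The sub-test layout above can be tallied in a small sketch (the data structure is our own; the item counts and time limits are the ones given in the notes):

```python
# CFIT sub-tests with item counts and time limits as listed above.
subtests = {
    "Series":          {"items": 13, "minutes": 3.0},
    "Classifications": {"items": 14, "minutes": 4.0},
    "Matrices":        {"items": 13, "minutes": 3.0},
    "Conditions":      {"items": 10, "minutes": 2.5},
}

total_items = sum(s["items"] for s in subtests.values())      # 50 items
total_minutes = sum(s["minutes"] for s in subtests.values())  # 12.5 minutes
```

The tight 12.5 minutes of actual timed testing is why a stopwatch appears in the required materials: each sub-test is strictly timed even though the whole session is short.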


TEST AND TESTING

Assumptions in Psychological Testing and Assessment

Assumption 1: Psychological Traits and States Exist
Trait:
 "Any distinguishable, relatively enduring way in which one individual varies from another" (Guilford, 1959, p. 6).
 Permits people to predict the present from the past.
 Characteristic patterns of thinking, feeling, and behaving that generalize across similar situations, differ systematically between individuals, and remain rather stable across time.
 Constructs - informed, scientific concepts developed or constructed to describe or explain behavior.
 We can't see, hear, or touch constructs, but we can infer their existence from overt behavior, such as test scores.
States:
 Distinguish one person from another but are relatively less enduring (Chaplin et al., 1988).
 Characteristic patterns of thinking, feeling, and behaving in a concrete situation at a specific moment in time.
 Identify those behaviors that can be controlled by manipulating the situation.

Construct: An informed, scientific concept developed or constructed to explain a behavior, inferred from overt behavior
Overt Behavior: An observable action or the product of an observable action
 A trait is not expected to be manifested in behavior 100% of the time
 Whether a trait manifests itself in observable behavior, and to what degree it manifests, is presumed to depend not only on the strength of the trait in the individual but also on the nature of the situation (situation-dependent)
 The context within which behavior occurs also plays a role in helping us select appropriate trait terms for observed behaviors
 The definitions of trait and state also refer to a way in which one individual varies from another
 Assessors may make comparisons among people who, because of their membership in some group or for any number of other reasons, are decidedly not average

Assumption 2: Psychological Traits and States Can Be Quantified and Measured
 Different test developers consider the types of item content that would provide insight into a trait, to gauge the strength of that trait.
 Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results
 Cumulative Scoring – the assumption that the more the test-taker responds in a particular direction keyed by the test manual as correct or consistent with a particular trait, the higher that test-taker is presumed to be on the targeted ability or trait

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
 Responses on tests are thought to predict real-world behavior. The obtained sample of behavior is expected to predict future behavior.
 The tasks in some tests mimic the actual behaviors that the test user is attempting to understand
 Such tests only yield a sample of the behavior that can be expected to be emitted under non-test conditions

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
 Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.

Assumption 5: Various Sources of Error are Part of the Assessment Process
Error:
 Refers to something that is more than expected; it is a component of the measurement process
 Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test

Error Variance - The component of a test score attributable to sources other than the trait or ability measured

Potential Sources of Error Variance:
1. Assessors
2. Measuring Instruments
3. Random errors such as luck

Classical Test Theory - each test-taker has a true score on a test that would be obtained but for the action of measurement error

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
 All major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual.
 Problems arise if the test is used with people for whom it was not intended.
 Some problems are more political than psychometric in nature
 Despite the best efforts of many professionals, fairness-related questions and problems do occasionally arise
 In all questions about tests with regard to fairness, it is important to keep in mind that tests are tools; they can be used properly or improperly

Assumption 7: Testing and Assessment Benefit Society
 Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests
 There is a great need for tests, especially good tests, considering the many areas of our lives that they benefit.
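Classical test theory, mentioned under Assumption 5 above, models an observed score as a true score plus random error. A small simulation (variable names are our own) shows the error component averaging out over repeated measurements, which is why the true score is defined as the score obtainable "but for" measurement error:

```python
import random

# Observed score = true score + random measurement error (classical model).
random.seed(0)                       # fixed seed for a reproducible sketch
true_score = 100
observed = [true_score + random.gauss(0, 5) for _ in range(2000)]

# Averaged over many hypothetical administrations, the errors cancel out
# and the mean observed score converges toward the true score.
mean_obs = sum(observed) / len(observed)
```

Any single administration, by contrast, can miss the true score by a full error SD or more, which is the statistical motivation for reporting scores with confidence bands rather than as exact values.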


WHAT’S A “GOOD TEST”?
 Reliability: The consistency of the measuring tool; the precision with which the test measures and the extent to which error is present in measurements.
 Validity: The test measures what it purports to measure.
 Other considerations: Administration, scoring, and interpretation should be straightforward for trained examiners. A good test is a useful test that will ultimately benefit individual test-takers or society at large.

Norms
 Norms are the test performance data of a particular group of test-takers that are designed for use as a reference when evaluating or interpreting individual test scores.
 A normative sample is the reference group to which test-takers are compared – a group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers.
 Norm-referenced testing and assessment – a method of evaluation and a way of deriving meaning from test scores by evaluating an individual test-taker’s score and comparing it to the scores of a group of test-takers.
 Norming – refers to the process of deriving norms. Norming may be modified to describe a particular type of norm derivation.
 Race norming – the controversial practice of norming on the basis of race or ethnic background.

Sampling to Develop Norms
Standardization or test standardization
 The process of administering a test to a representative sample of test-takers for the purpose of establishing norms.
STANDARDIZED
 When it has a clear specification of procedures for administration and scoring, typically including normative data.

Sampling
 The process of selecting the portion of the universe deemed to be representative of the whole population.
 Test developers select a population, for which the test is intended, that has at least one common, observable characteristic.
 Stratified sampling: Sampling that includes different subgroups, or strata, from the population.
 Stratified-random sampling: Every member of the population has an equal opportunity of being included in a sample.
 Purposive sample: Arbitrarily selecting a sample that is believed to be representative of the population.
 Incidental/convenience sample: A sample that is convenient or available for use. May not be representative of the population. Generalization of findings from convenience samples must be made with caution.

Developing norms for a standardized test
Having obtained a sample, test developers:
 Administer the test with a standard set of instructions
 Recommend a setting for test administration
 Collect and analyze data
 Summarize data using descriptive statistics, including measures of central tendency and variability

Standardization Sample and Normative Sample
 The test remains standardized based on data from the original standardization sample; it’s just that new normative data are developed based on an administration of the test to a new normative sample.

Types of Norms
We can classify norms as follows:
 Age norms – the average performance of different samples of test-takers who were at various ages when the test was administered
 Grade norms – the average test performance of test-takers in a given school grade
 Developmental norms – a term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life
 National norms – derived from a normative sample that was nationally representative of the population at the time the norming study was conducted
 National anchor norms – an equivalency table for scores on two different tests; allows for a basis of comparison
 Local norms – provide normative information with respect to the local population’s performance on some test
 Norms from a fixed reference group
 Subgroup norms – a normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample
 Percentile norms

Fixed Reference Group Scoring Systems:
The distribution of scores obtained on the test from one group of test-takers is used as the basis for the calculation of test scores for future administrations of the test. EXAMPLE: SAT

Norm-Referenced Versus Criterion-Referenced Evaluation
Criterion
 A standard on which a judgment or decision may be based.
Criterion-referenced testing and assessment
 May be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard.
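The norming steps above can be sketched as follows. The normative sample is invented, and the percentile-rank convention used (scores below, plus half of the ties) is one common choice among several:

```python
# Minimal sketch: summarizing an invented normative sample with descriptive
# statistics, then locating a new test-taker's raw score as a percentile rank.
from statistics import mean, stdev

norm_sample = [12, 15, 15, 18, 20, 21, 21, 23, 25, 30]  # invented raw scores

def percentile_rank(score, sample):
    """Percent of the normative sample scoring below the score (ties count half)."""
    below = sum(1 for s in sample if s < score)
    ties = sum(1 for s in sample if s == score)
    return 100 * (below + 0.5 * ties) / len(sample)

print(round(mean(norm_sample), 1))       # central tendency: 20.0
print(round(stdev(norm_sample), 1))      # variability: 5.3
print(percentile_rank(21, norm_sample))  # a raw score of 21 -> 60.0th percentile
```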

Norm referenced
 Tests involve comparing individuals to the normative group.
Criterion referenced
 Test-takers are evaluated as to whether they meet a set standard.

Culture and Inference
 In selecting a test for use, responsible test users should research the test’s available norms to check how appropriate they are for use with the targeted test-taker population.
 When interpreting test results it helps to know about the culture and era of the test-taker.
 It is important to conduct culturally informed assessment.

PSYCHOLOGICAL ASSESSMENT

RELIABILITY
 Extent to which a score or measure is free from measurement error. Theoretically, reliability is the ratio of true score variance to observed score variance (Kaplan and Saccuzzo, 2011).
 The consistency in measurement; in the psychometric sense it really only refers to something that is consistent—not necessarily consistently good or bad, but simply consistent (Cohen and Swerdlik, 2018).
 Dependability or consistency of the instrument or of scores obtained by the same person when re-examined with the same test on different occasions, or with different sets of equivalent items.

Dichotomous
 Can be answered with only one of two alternative responses
Power Tests
 When the time limit is long enough to allow test-takers to attempt all items
Speed Tests
 Generally contain items of a uniform level of difficulty with a time limit

Classical Test Theory (True Score Theory)
 Assumes that each person has a true score that would be obtained if there were no errors in measurement.
 A major assumption in classical test theory is that errors of measurement are random.

Domain Sampling Model
 Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
 Considers the problem created by using a limited number of items to represent a larger and more complicated construct.

Item Response Theory
 The probability that a person with X ability will be able to perform at a level of Y on a test
 Focus: item difficulty

Measurement Error
 Collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured.
* Can be categorized as being either systematic or random

Random error
 Source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)

Systematic error
 Source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured; has a consistent effect on the true score

RELIABILITY - ERROR
a. Item Sampling/Content Sampling:
 Refers to variation among items within a test as well as to variation among items between tests
 The extent to which a test-taker’s score is affected by the content sampled on a test, and by the way the content is sampled, is a source of error variance

RELIABILITY - SOURCES OF ERROR VARIANCE
b. Test Administration
 Test-taker’s motivation or attention, environment, etc.
c. Test Scoring and Interpretation
 May employ objective-type items amenable to computer scoring of well-documented reliability
 Test reliability is usually estimated using different methods

RELIABILITY ESTIMATES
Time Sampling: The Test–Retest Method
 Estimates are used to evaluate the error associated with administering a test at two different times. This type of analysis is of value only when we measure “traits” or characteristics that do not change over time.

Carryover Effects:
 Happen when the test-retest interval is short, wherein the second test is influenced by the first because test-takers remember or practiced the previous test
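With made-up scores, a test–retest estimate is simply the Pearson correlation between the two administrations (computed by hand here rather than with a stats library):

```python
# Sketch: test-retest reliability as the Pearson correlation between two
# administrations of the same test to the same people. Scores are invented.
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

time1 = [10, 12, 9, 15, 11, 14]  # first administration
time2 = [11, 13, 9, 14, 12, 15]  # second administration, weeks later

print(round(pearson_r(time1, time2), 2))  # 0.93
```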

Practice effects:
 One important type of carryover effect.
 Scores in the second session are higher due to the test-takers’ experience of the first session of testing.

Mortality
 Problems of absences in the second session

Statistical tools: Pearson r, Spearman rho

Test-retest reliability
 Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

Parallel Forms Method
 For each form of the test, the means and the error variances are EQUAL; same items, different positionings/numberings
 The true score must be the same for the two tests.

Alternate Forms
 Simply a different version of a test that has been constructed so as to be parallel
- the tests should contain the same number of items, and the items should be expressed in the same form and should cover the same type of content; range and difficulty must also be equal
- if there is a test leakage, use the form that is not mostly administered

Inter-item consistency / Internal consistency reliability
 Refers to the degree of correlation among all the items on a scale.
 Calculated from a single administration of a single form of a test.
 Useful in assessing the homogeneity of the test, such as whether the items in a questionnaire are all measuring the same construct.
 A way to measure the validity of the test and each item on the test.

Split-Half Reliability
 Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered ONCE
 Useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
 Cannot just divide the items in the middle, because that might spuriously raise or lower the reliability coefficient

The Spearman–Brown formula
 A specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items.
 Can also be used to determine the number of items needed to attain a desired level of reliability.

KR20 Formula
 Used to measure the internal consistency reliability of a test
 Used for items that have varying difficulty

KR21 Formula
 Used for a test where the items are all of about the same difficulty.

Coefficient alpha
 Appropriate for use on tests containing non-dichotomous items.
 Calculated to help answer questions about how similar sets of data are.
 On a scale from 0 (absolutely no similarity) to 1 (perfectly identical)

Inter-scorer reliability
 The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
 Often used when coding nonverbal behavior.

Measures of Inter-Scorer Reliability
Fleiss Kappa
 Determines the level of agreement between TWO or MORE raters when the method of assessment is measured on a CATEGORICAL SCALE
Cohen’s Kappa
 Two raters
Krippendorff’s Alpha
 Two or more raters; based on observed disagreement corrected for disagreement expected by chance

How Reliable Is Reliable?
Basic research
 It has been suggested that reliability estimates in the range of .70 to .80 are good enough for most purposes.
 Some people have argued that it would be a waste of time and effort to refine research instruments beyond a reliability of .90.
Clinical settings
 High reliability is extremely important.
 A test with a reliability of .90 might not be good enough.
 For a test used to make a decision that affects some person’s future, evaluators should attempt to find a test with a reliability greater than .95.

Increase the Number of Items
 The larger the sample of items, the more likely it is that the test will represent the true characteristic.

Factor and Item Analysis
 The reliability of a test depends on the extent to which all of the items measure one common characteristic.
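The Spearman–Brown relationship can be written as r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened; n = 2 corrects a split-half correlation back to full-test length. A small sketch with arbitrary example values:

```python
# Sketch of the Spearman-Brown prophecy formula. The .70 and .90 reliability
# values below are arbitrary examples, not from any particular test.
def spearman_brown(r, n):
    """Estimated reliability when the test is lengthened by factor n."""
    return n * r / (1 + (n - 1) * r)

def lengthening_factor(r_current, r_desired):
    """How many times longer the test must be to reach the desired reliability."""
    return (r_desired * (1 - r_current)) / (r_current * (1 - r_desired))

# Correcting a half-test correlation of .70 to full-test length (n = 2):
print(round(spearman_brown(0.70, 2), 2))         # 0.82

# How much longer must a test with reliability .70 be to reach .90?
print(round(lengthening_factor(0.70, 0.90), 1))  # about 3.9 times as long
```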
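For inter-scorer reliability on a categorical scale, Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance. A sketch with invented ratings (it assumes both raters rated the same cases with the same category set):

```python
# Sketch: Cohen's kappa for two raters, kappa = (p_obs - p_chance) / (1 - p_chance).
from collections import Counter

def cohens_kappa(r1, r2):
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_chance = sum(c1[k] * c2[k] for k in c1) / n ** 2    # chance agreement
    return (p_obs - p_chance) / (1 - p_chance)

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]

print(round(cohens_kappa(rater1, rater2), 2))  # 0.5
```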

 More items = higher reliability
 Minimizing error
 Using only a representative sample to obtain an observed score
 The true score cannot be found

VALIDITY
 A judgment or estimate of how well a test measures what it purports to measure in a particular context (Cohen and Swerdlik, 2018).
 The agreement between a test score or measure and the quality it is believed to measure (Kaplan and Saccuzzo, 2018).
Validation
 The process of gathering and evaluating evidence about validity.

Face Validity
 Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
 Face validity is really not validity at all because it does not offer evidence to support conclusions drawn from test scores.
 These appearances can help motivate test-takers because they can see that the test is relevant.

Three Aspects of Validity
 Content validity
 Criterion-related validity
 Construct validity

Content validity
 Whether the test covers the behavior domain to be measured, which is built through the choice of appropriate content areas, questions, tasks, and items.
 Content validation is not done by statistical analysis but by the inspection of items. A panel of experts can review the test items and rate them in terms of how closely they match the objective or domain specification.
 Content-related evidence for validity of a test or measure considers the adequacy of representation of the conceptual domain the test is designed to cover.
 The only type of evidence besides face validity that is logical rather than statistical.

Test Blueprint
 A plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items, and so forth
 Concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes

Criterion Validity
 A judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.
 Tells us just how well a test corresponds with a particular criterion
 Criterion: a standard on which a judgment or decision may be made

Concurrent validity
 An index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently)

Predictive validity
 An index of the degree to which a test score predicts some criterion measure.

Criterion-Related Validity
 When evaluating the predictive validity of a test, researchers must take into consideration the base rate of the occurrence of the variable in question, both as that variable exists in the general population and as it exists in the sample being studied

Base rate
 The extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion).

Hit rate
 Defined as the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.

Miss rate
 May be defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute

The category of misses may be further subdivided:
False positive
 Predicted success does not occur
False negative
 Predicted failure, but the person succeeds
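Hit, false-positive, and false-negative tallies follow directly from these definitions; the predictions and outcomes below are fabricated for illustration:

```python
# Sketch: tallying hits, false positives, and false negatives from test
# predictions versus actual criterion outcomes (all data invented).
predictions = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]  # 1 = test predicts success
outcomes    = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # 1 = success actually occurred

pairs = list(zip(predictions, outcomes))
hits      = sum(p == o for p, o in pairs)               # correct classifications
false_pos = sum(p == 1 and o == 0 for p, o in pairs)    # predicted success does not occur
false_neg = sum(p == 0 and o == 1 for p, o in pairs)    # predicted failure but succeeds

print(hits / len(pairs), false_pos, false_neg)  # 0.8 1 1
```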

Incremental validity
 The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
 Used to improve the domain

CONSTRUCT VALIDITY
 A judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
Construct
 An informed, scientific idea developed or hypothesized to describe or explain behavior.
 Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance.

Convergent Evidence
 If scores on the test undergoing construct validation tend to be highly correlated with another established, validated test that measures the same construct

Discriminant Evidence
 A validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not be correlated
 Multitrait-multimethod Matrix: useful for examining both convergent and discriminant validity evidence
 Multitrait: two or more traits
 Multimethod: two or more methods

Factor Analysis
 Designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ
 Used to study the interrelationships among a set of variables
 Identifies the factor or factors in common between test scores on subscales within a particular test

Exploratory Factor Analysis:
 Estimating or extracting factors; deciding how many factors must be retained

Confirmatory Factor Analysis:
 Researchers test the degree to which a hypothetical model fits the actual data

Factor Loading:
 Conveys information about the extent to which the factor determines the test score or scores
 Can be used to obtain both convergent and discriminant validity

Rating
 Numerical or verbal judgment that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a Rating Scale
Rating Error:
 Intentional or unintentional misuse of the scale

Leniency Error (Generosity Error)
 Rater is lenient in scoring

Severity Error:
 Rater is strict in scoring

Central Tendency Error:
 Rater’s ratings tend to cluster in the middle of the rating scale
 One way to overcome rating errors is to use rankings

Halo Effect:
 Tendency to give a high score due to failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior

* Attempting to define the validity of the test will be futile if the test is NOT reliable
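Convergent and discriminant evidence are typically examined as correlations: high with an established test of the same construct, low with a test of an unrelated construct. A sketch with fabricated scores:

```python
# Sketch (fabricated scores): convergent evidence shows up as a high correlation
# with a same-construct test; discriminant evidence as a low correlation with an
# unrelated-construct test. Pearson r computed by hand to stay self-contained.
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

new_test        = [1, 2, 3, 4, 5]   # test undergoing construct validation
same_construct  = [2, 4, 6, 8, 10]  # established test of the same construct
other_construct = [5, 1, 4, 2, 3]   # test of an unrelated construct

print(pearson_r(new_test, same_construct))             # 1.0 (convergent)
print(round(pearson_r(new_test, other_construct), 1))  # -0.3 (discriminant)
```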
ASES3111 FINAL

CODE OF ETHICS FOR PHILIPPINE PSYCHOLOGISTS

Republic Act No. 10029
 Also known as the Philippine Psychology Act of 2009
 Mandates the Professional Regulatory Board of Psychology (Board) to monitor the conditions and circumstances affecting the practice of Psychology and Psychometrics in the Philippines
 To adopt such measures as may be deemed lawful and proper for the enhancement and maintenance of high professional, ethical and technical standards of the profession
 The objectives of the Universal Declaration are to provide a moral framework and generic set of ethical principles for Psychology organizations worldwide as a means to:
A. Evaluate the ethical and moral relevance of their Codes of Ethics
B. Guide the development or evolution of their Code of Ethics
C. Encourage global thinking about ethics, while also encouraging action that is sensitive and responsive to local needs and values; and
D. Speak with a collective voice on matters of ethical concern.

PRINCIPLE I: Respect for the Dignity of Persons and Peoples
 Respect for the dignity of persons is the most fundamental and universally found ethical principle across geographical and cultural boundaries, and across professional disciplines.
 Respect for dignity recognizes the inherent worth of all human beings, regardless of perceived or real differences in social status, ethnic origin, gender, capacities, or other such characteristics.
 Respect for the dignity of persons and peoples is expressed in different ways in different communities and cultures.

PRINCIPLE II: Competent Caring for the Well-Being of Persons and Peoples
 Competent caring for the well-being of persons and peoples involves working for their benefit and, above all, doing no harm.
 It includes maximizing benefits, minimizing potential harm, and offsetting or correcting harm.
 It also requires the ability to establish interpersonal relationships that enhance potential benefits and reduce potential harm.
 Another requirement is adequate self-knowledge of how one's values, experiences, culture, and social context might influence one's actions and interpretations.

PRINCIPLE III: Integrity
 Integrity is vital to the advancement of scientific knowledge and to the maintenance of public confidence in the discipline of psychology.
 Integrity is based on honesty, and on truthful, open and accurate communications.
 It includes recognizing, monitoring, and managing potential biases, multiple relationships, and other conflicts of interest that could result in harm and exploitation of persons or peoples.
 Complete openness and disclosure of information must be balanced with other ethical considerations, including the need to protect the safety or confidentiality of persons and peoples, and the need to respect cultural expectations.

PRINCIPLE IV: Professional and Scientific Responsibilities to Society
 Psychology functions as a discipline within the context of human society.
 As a science and a profession, it has responsibilities to society.

GENERAL ETHICAL STANDARDS AND PROCEDURES
I. How we resolve ethical issues in our professional lives and communities
II. How we adhere to the highest standards of professional competence
III. How we respect the rights and dignity of our clients, our peers, our students, and our other stakeholders in the whole profession and scientific discipline
IV. How we maintain confidentiality in the important aspects of our professional and scholarly functions
V. How we ensure truthfulness and accuracy in all our public statements; and
VI. How we observe professionalism in our records and fees

I. RESOLVING ETHICAL ISSUES
A. Misuse of Psychologists’ Works
 In instances where misuse or misrepresentation of our work comes to our attention, we take appropriate and reasonable steps to correct or minimize the effects of such misuse or misrepresentation.
B. Conflicts between Ethics and Law, Regulations or other Governing Legal Authority
 In instances where our code of ethics conflicts with the law, regulations or governing legal authority, our first step is to take appropriate actions to resolve the conflicts while being committed to our code of ethics.
 However, if the conflicts cannot be resolved by such means, we adhere to the law, regulations or governing legal authority.
C. Conflict between Ethics and Organizational Demands
 In instances where our code of ethics conflicts with organizational demands, we make our code of ethics known to the organization.
 We also declare our commitment and adherence to this code when resolving the conflicts.
D. Informal Resolution of Ethical Violations
 When we become aware that another psychologist has violated our code of ethics, we may resolve the issue by bringing it to the attention of that psychologist.
 We do so if informal resolution is sufficient and if the intervention does not violate confidentiality rights.
E. Reporting Ethical Violations
1. If substantial harm to a person or organization is likely, we take further action to report the violation of the code of ethics to appropriate institutional authorities.
2. However, this does not apply when an intervention would violate confidentiality rights or when we are called to review
the work of another psychologist whose professional conduct is in question.
F. Cooperating with Ethics Committees
 We cooperate with the ethics investigations, proceedings and requirements of any psychological association we belong to.
G. Improper Complaints
 We refrain from filing ethical complaints with reckless disregard or willful ignorance of facts that would disprove allegations of ethical violations. We also refrain from filing complaints without supporting factual evidence.
H. Unfair Discrimination Against Complainants and Respondents
1. We do not discriminate against complainants and respondents of ethical complaints by denying them employment, advancement, admissions to academic or other programs, tenure, or promotion.
2. This does not rule out taking appropriate actions based on the outcomes of proceedings.

II. COMPETENCIES
A. Boundaries of Competence
1. We shall provide services, teach, and conduct research with persons and populations in areas only within the boundaries of our competence, based on our education, training, supervised internship, consultation, thorough study, or professional experience.
2. We shall make appropriate referrals, except as provided in Standard A.2, Providing Services in Emergencies, where our existing competencies are not sufficient to ensure effective implementation or provision of our services.
3. When we plan to provide services, teach, or conduct research involving populations, areas, techniques, or technologies that are new to us and/or are beyond our existing competence, we must undertake relevant education, training, supervised experience, consultation, or thorough study.
4. So as not to deprive individuals or groups of necessary services for which we do not have existing competence, we may provide the service, as long as:
a. we have closely related prior training or experience, and
b. we make a reasonable effort to obtain the competence required by undergoing relevant research, training, consultation, or thorough study.
5. In those emerging areas in which generally recognized standards for preparatory training do not yet exist, but in which we are required or requested to make available our services, we shall take reasonable steps to ensure the competence of our work and to protect our clients/patients, students, supervisees, research participants, organizational clients, and others from harm.
6. We shall be reasonably familiar with the relevant judicial or administrative rules when assuming forensic roles.

B. Providing Services in Emergencies
 We shall make available our services in emergency situations to individuals for whom the necessary mental health services are not available, even if we lack the training appropriate to the case, to ensure these individuals are not deprived of the emergency services they require at that time.
 However, we shall immediately discontinue said services as soon as the emergency has ended, and ensure that appropria…
UTILITY
 Refers to the usefulness of some thing or some process.
 Also referred to as test utility
 It refers to how useful a test is
 It refers to the practical value of using a test to aid in decision making

What is Utility?
 Utility in the context of testing and assessment is the usefulness or practical value of testing to improve efficiency
 Refers to anything from a single test to a large-scale testing program that employs a battery of tests.

Factors That Affect a Test’s Utility
Psychometric soundness
 Refers—as you probably know by now—to the reliability and validity of a test.
 A test is said to be psychometrically sound for a particular purpose if reliability and validity coefficients are acceptably high.
 An index of reliability can tell us something about how consistently a test measures what it measures; and an index of validity can tell us something about whether a test measures what it purports to measure.
Cost
 One of the most basic elements in any utility analysis is the financial cost of the selection device
 Cost in the context of test utility refers to disadvantages, losses, or expenses in both economic and noneconomic terms
 The term costs can be interpreted in the traditional, economic sense; that is, relating to expenditures associated with testing or not testing
Benefits
 Judgments regarding the utility of a test may take into account whether the benefits of testing justify the costs of administering, scoring, and interpreting the test.
 Benefit refers to profits, gains, or advantages
 As we did in discussing costs associated with testing (and not testing), we can view benefits in both economic and noneconomic terms.
 In industrial settings, a partial list of such noneconomic benefits—many carrying with them economic benefits as well—would include:
 an increase in the quality of workers’ performance;
 an increase in the quantity of workers’ performance;
 a decrease in the time needed to train workers;
 a reduction in the number of accidents;
 a reduction in worker turnover

Utility Analysis
 Defined as a family of techniques that entail a cost–benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
 Utility analysis is an umbrella term covering various possible methods, each requiring various kinds of data to be inputted and yielding various kinds of output

Expectancy data
 Require little more than converting a scatterplot of test data to an expectancy table
 An expectancy table can provide an indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure—an interval that may be categorized as “passing,” “acceptable,” or “failing.”

Hit Rate
 A correct classification
Miss
 An incorrect classification; a mistake
Hit rate
 The proportion of people that an assessment tool accurately identifies as possessing or exhibiting a particular trait, ability, behavior, or attribute
Miss rate
 The proportion of people that an assessment tool inaccurately identifies as possessing or exhibiting a particular trait, ability, behavior, or attribute
False positive
 A specific type of miss whereby an assessment tool falsely indicates that the testtaker possesses or exhibits a particular trait, ability, behavior, or attribute
False negative
 A specific type of miss whereby an assessment tool falsely indicates that the testtaker does not possess or exhibit a particular trait, ability, behavior, or attribute

Some Practical Considerations
The pool of job applicants
 If you were to read a number of articles in the utility analysis literature on personnel selection, you might come to the conclusion that there exists, “out there,” what seems to be a limitless supply of potential employees just waiting to be evaluated and possibly selected for employment

The complexity of the job
 In general, the same sorts of approaches to utility analysis are put to work for positions that vary greatly in terms of complexity.
 The same sorts of data are gathered, the same sorts of analytic methods may be applied, and the same sorts of utility models may be invoked for corporate positions ranging from assembly line worker to computer programmer

The cut score in use
 Also called a cutoff score; we have previously defined a cut score as a (usually numerical) reference point derived as a result of a judgment and used to divide a set of data into two or more classifications, with some
action to be taken or some inference to be made on the basis of these classifications.
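The expectancy-table idea from the utility discussion — for each test-score interval, the proportion of past test-takers who went on to meet the criterion — can be sketched with fabricated records:

```python
# Sketch: building a tiny expectancy table from fabricated
# (test_score, passed_criterion) records of previous testtakers.
records = [(48, 1), (52, 0), (55, 1), (61, 1), (64, 0), (67, 1), (72, 1), (75, 1)]

def expectancy_table(records, bins):
    """Proportion of testtakers in each score interval who met the criterion."""
    table = {}
    for lo, hi in bins:
        group = [passed for score, passed in records if lo <= score <= hi]
        if group:
            table[(lo, hi)] = sum(group) / len(group)
    return table

for (lo, hi), p in expectancy_table(records, [(40, 59), (60, 69), (70, 79)]).items():
    print(f"{lo}-{hi}: {p:.0%} likely to pass")
```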
 When this type of cut score is set with reference to the performance of a group (or some target segment of a group), it is also referred to as a norm-referenced cut score
 Fixed cut score, which we may define as a reference
point—in a distribution of test scores used to divide a
set of data into two or more classifications—that is
typically set with reference to a judgment concerning a
minimum level of proficiency required to be included in
a particular classification
 Multiple cut scores refers to the use of two or more cut scores with reference to one predictor for the purpose of categorizing test-takers.
 In a multistage (or multiple hurdle) selection process, a cut score is in place for each predictor used; the multiple hurdles may be thought of as one collective element of a multistage decision-making process.
 In a compensatory model of selection, an assumption is made that high scores on one attribute can, in fact, “balance out” or compensate for low scores on another attribute.
 The Angoff method for setting fixed cut scores can be applied to personnel selection tasks as well as to questions regarding the presence or absence of a particular trait, attribute, or ability.
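The multiple-hurdle and compensatory models above can be contrasted in a short sketch; all cut scores, weights, and applicant data here are invented for illustration:

```python
# Sketch contrasting two selection models with invented numbers.
CUT_SCORES = {"ability": 50, "interview": 60}  # one cut score per predictor

def multiple_hurdle(scores):
    """Pass only if EVERY predictor clears its own cut score."""
    return all(scores[k] >= cut for k, cut in CUT_SCORES.items())

def compensatory(scores, weights, cutoff):
    """Pass if the weighted composite clears a single cutoff, so a high score
    on one attribute can offset ("balance out") a low score on another."""
    return sum(scores[k] * w for k, w in weights.items()) >= cutoff

applicant = {"ability": 45, "interview": 80}
print(multiple_hurdle(applicant))                                       # False
print(compensatory(applicant, {"ability": 0.6, "interview": 0.4}, 55))  # True
```

The same applicant fails the hurdle model (ability 45 is below its cut of 50) yet passes the compensatory model, because the strong interview score lifts the weighted composite to 59.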

The Known Group Method


 Entails collection of data on the predictor of interest
from groups known to possess, and not to possess, a
trait, attribute, or ability of interest. Based on an
analysis of this data, a cut score is set on the test that
best discriminates the two groups’ test performance.