Psychological Testing and Measurement (PSY-631) VU
Lesson 23
Item Analysis
So far we have discussed the major concepts related to item analysis. However, there are some other concepts
that you should also be familiar with, though you might use them only at a later, higher level of your studies or
whenever you work on test development.
In this section you will be introduced to the following concepts:
1. Item response theory
2. Item-characteristic curves
3. Cross validation
4. Qualitative analysis of tests
Item Response Theory:
Item response theory (IRT) is an approach that considers, for each individual item in a test, the probability of
answering it correctly or incorrectly. The information regarding each item is plotted graphically. The approach is
also known as Item-Characteristic Curve Theory and Latent Trait Theory. The graph containing information about
an item is called the item-characteristic curve. Decisions regarding the items of a test can be based upon this
information.
Item-Characteristic Curves:
Item difficulty and item discrimination can also be presented graphically. Item-characteristic curves are the
graphs that represent these characteristics of an item. The horizontal axis represents the ability being tested,
whereas the vertical axis shows the probability of a correct response, or the proportion of examinees responding
correctly to the item. In the words of Kaplan and Saccuzzo (2001), it is “a graph prepared as part of the process
of item analysis. One graph is prepared for each item and shows the total test score on the X axis and the
proportion of test takers passing the item on Y axis” (p. 637). The shape or slope of the curve indicates whether
the item is a good one, that is, how well it discriminates high scorers from low scorers. A steep positive slope
indicates that the item discriminates well between the two groups; the more discriminating the item, the steeper
the slope.
A good item has a positive slope. As can be seen from the following graph, the proportion of high scorers
responding correctly is higher than the proportion of low scorers. More of the low scorers are not responding
correctly. It can be said that the item that yielded such a curve is a good item.
[Figure: A good item. X axis: total test score. Y axis: proportion of examinees responding correctly to the item.]
©copyright Virtual University of Pakistan 1
A negative slope like the following one, or a similar one in the same direction, means that the item is not a good
one. The item does discriminate between the two groups, high scorers and low scorers, but in the opposite
direction, which is not acceptable. This graph indicates that more of the low scorers than the high scorers
answered the item correctly; that is, the probability of answering the item correctly is higher for the low scorers
than for the high scorers. This, therefore, is a poor item that needs to be removed or replaced.
[Figure: A weak/poor/bad item. X axis: total test score. Y axis: proportion of examinees responding correctly to the item.]
Another type of item is one that neither most of the top scorers nor most of the low scorers answer correctly.
It is the middle, moderate, scorers who attain the maximum proportion of correct responses. This type of item
is also a bad item.
[Figure: A weak/poor/bad item. X axis: total test score. Y axis: proportion of examinees responding correctly.]
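The three curve shapes described in this section can also be checked numerically. The following is a minimal sketch rather than a standard procedure: it sorts test takers by total score, splits them into equal bands, and reports the proportion in each band who passed the item. The function name `empirical_icc`, the choice of three bands, and all scores are hypothetical values made up for illustration.

```python
def empirical_icc(totals, item_correct, n_bands=3):
    """Proportion passing an item in each total-score band, lowest band first.

    totals       -- total test score of each test taker
    item_correct -- 1 if that test taker passed the item, else 0
    n_bands      -- number of score bands (an illustrative choice)
    """
    # Indices of test takers ordered from lowest to highest total score.
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    band_size = len(order) // n_bands
    curve = []
    for b in range(n_bands):
        start = b * band_size
        # The last band absorbs any leftover test takers.
        end = (b + 1) * band_size if b < n_bands - 1 else len(order)
        members = order[start:end]
        curve.append(sum(item_correct[i] for i in members) / len(members))
    return curve


# Hypothetical data: nine test takers, three items.
totals = [10, 12, 15, 20, 22, 25, 30, 33, 35]
good_item = [0, 0, 1, 0, 1, 1, 1, 1, 1]      # more passes among high scorers
reversed_item = [1, 1, 1, 1, 0, 1, 0, 0, 1]  # more passes among low scorers
peaked_item = [0, 0, 1, 1, 1, 1, 0, 1, 0]    # mostly middle scorers pass

good_curve = empirical_icc(totals, good_item)
reversed_curve = empirical_icc(totals, reversed_item)
peaked_curve = empirical_icc(totals, peaked_item)

print([round(p, 2) for p in good_curve])      # [0.33, 0.67, 1.0]
print([round(p, 2) for p in reversed_curve])  # [1.0, 0.67, 0.33]
print([round(p, 2) for p in peaked_curve])    # [0.33, 1.0, 0.33]
```

A rising sequence corresponds to the positive slope of a good item, a falling sequence to the negative slope, and a sequence that peaks in the middle to the third kind of poor item.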
Cross Validation:
The validity of a test, we know, may be determined from a sample that was used for item selection. However, in
order to have a better estimate of the validity of the test, the entire test needs to be validated on different samples
as well. This process is called cross validation. “The term cross validation refers to a revalidation of a test on a
sample of test takers other than the ones on whom test performance was originally found to be a valid predictor
of some criterion” (Cohen and Swerdlik, 1999, p. 246). The regression equation derived from the original sample
is used to predict performance in a sample of test takers who are different from the ones on whom the test was
validated.
If the validity of a test is computed from the original sample used for item selection, then the validity index is
likely to be higher than the one obtained from a new or different sample of test takers, because of chance
variations in the original sample. Validity is therefore expected to shrink in the process of cross validation.
“The amount of decrease in the strength of the relationship from the original sample
to the sample with which the equation is used is known as shrinkage” (Kaplan & Saccuzzo, 2001).
The factors that may affect the amount of shrinkage:
1. The size of the original item pool
2. Proportion of test items retained
3. Sample size
Shrinkage tends to be greater when the original item pool was large and the proportion of retained items is
small, because the validity coefficient obtained in the original sample is then more likely to be inflated by
chance. Sample size also affects shrinkage: greater validity shrinkage may be expected if smaller samples are used.
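The idea of shrinkage can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the procedure of any cited author: every item is pure random noise unrelated to the criterion, the best few items are selected in an original sample, and the resulting test is then validated on a new sample. The helper names (`pearson_r`, `simulate_sample`, `total_score`), the sample sizes, the pool size, and the random seed are all hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_sample(rng, n_people, n_items):
    """Hypothetical data: item scores and criterion are independent noise."""
    items = [[rng.gauss(0, 1) for _ in range(n_items)] for _ in range(n_people)]
    criterion = [rng.gauss(0, 1) for _ in range(n_people)]
    return items, criterion


def total_score(items, selected):
    """Test score of each person, summed over the selected items only."""
    return [sum(row[j] for j in selected) for row in items]


rng = random.Random(0)        # fixed seed so the run is repeatable
n_items, n_keep = 50, 5       # large item pool, small proportion retained
orig_items, orig_crit = simulate_sample(rng, 30, n_items)
new_items, new_crit = simulate_sample(rng, 30, n_items)

# Item selection: keep the items that correlate best with the criterion
# in the ORIGINAL sample.
item_validity = [
    pearson_r([row[j] for row in orig_items], orig_crit) for j in range(n_items)
]
selected = sorted(range(n_items), key=lambda j: item_validity[j], reverse=True)[:n_keep]

# Validity in the sample used for item selection vs. a fresh sample.
r_original = pearson_r(total_score(orig_items, selected), orig_crit)
r_cross = pearson_r(total_score(new_items, selected), new_crit)
shrinkage = r_original - r_cross

print(f"validity in original sample:         {r_original:.2f}")
print(f"validity in cross-validation sample: {r_cross:.2f}")
print(f"shrinkage:                           {shrinkage:.2f}")
```

Because the items here have no real relationship with the criterion, any validity found in the original sample comes from chance selection alone, which is why it would be expected to fall when the test is tried on new test takers.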
Qualitative Analysis:
Qualitative analysis of a test may also be conducted along with quantitative analysis. After test administration is
over, the test takers may be asked questions about various aspects of the test. These questions can be asked and
answered orally or in writing. Different formats can be adopted for this purpose, e.g. interviews, discussions etc.
The respondents’ responses can be of great help in improving the individual test items, the test format, and the
entire test itself. Cohen and Swerdlik (1999) have pinpointed some areas that may be explored:
• Cultural sensitivity
• Face validity
• Test administrator
• Test environment
• Test fairness
• Test language
• Test length
• Test taker’s guessing
• Test taker’s integrity
• Test taker’s mental/physical state upon entry
• Test taker’s mental/physical state during the test
• Test taker’s overall impressions
• Test taker’s preferences
• Test taker’s preparation
Review of Item Analysis:
Item analysis refers to the methods used for assessing and evaluating the characteristics of test items and of the
test itself. Primarily two characteristics are measured: item difficulty and discriminability.
Item-Difficulty Index:
As mentioned earlier, the item-difficulty index is expressed either as a percentage or as a proportion of the total
number of test takers who answered an item correctly. Item difficulty is calculated separately for every item. It is
denoted by a lowercase italicized ‘p’. A number attached as a subscript to this ‘p’ indicates the number of the item
whose difficulty level is described. For example, p₁ indicates the item difficulty of item number 1, p₂ is the
difficulty level of item number 2, and so on. On occasion you may come across the term ‘facility index’ rather than
difficulty index; it refers to the percentage of test takers choosing the correct response. Both terms refer to the
same procedure.
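The calculation of p for each item can be sketched in a few lines. This is a minimal illustration, assuming that responses are already scored as 1 (correct) or 0 (incorrect); the function name `item_difficulty` and the response data are hypothetical.

```python
def item_difficulty(responses):
    """Return the difficulty index p of every item.

    responses -- one row per test taker, one column per item,
                 scored 1 for a correct answer and 0 otherwise.
    """
    n_takers = len(responses)
    n_items = len(responses[0])
    # p for item i = number of correct answers to item i / number of test takers
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]


# Hypothetical scored responses: four test takers, three items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

p = item_difficulty(responses)
print(p)  # [0.75, 0.75, 0.25]
```

Here p₁ and p₂ are both 0.75 (75% answered correctly), while p₃ is 0.25, so the third item is the hardest of the three.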
Item Discrimination:
A test is supposed to discriminate between those who know and those who do not know; those who score high
and those who score low; those who have acquired a skill and those who have not. A test will not be a good test
if the people who are supposed to know the correct answer fail and those who are not supposed to know
succeed. A good test differentiates between the high and low scorers. If some items are correctly answered by
high scorers and some by low scorers then something is wrong with the test.
Item Discrimination Index:
Every item has its own discriminating power. To see whether an item discriminates between high and low
achievers, a certain percentage of the highest and the lowest scorers on the test are taken. The difference between
the proportions of correct responses in the two groups is then calculated in terms of percentages. Item discrimination index is denoted by a lowercase italicize
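The comparison of high and low achievers described above can be sketched as follows. This is a minimal illustration only: the function name `discrimination_index`, the 30% group fraction, and all scores are hypothetical, since the text does not fix a particular percentage.

```python
def discrimination_index(totals, item_correct, fraction=0.3):
    """Difference between upper-group and lower-group proportions correct.

    totals       -- total test score of each test taker
    item_correct -- 1 if that test taker answered the item correctly, else 0
    fraction     -- share of test takers placed in each extreme group
                    (an illustrative assumption)
    """
    n = len(totals)
    k = max(1, round(fraction * n))          # size of each extreme group
    # Indices of test takers ordered from highest to lowest total score.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: ten test takers with distinct total scores.
totals = [48, 45, 44, 40, 37, 33, 30, 26, 22, 18]
item_a = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]   # favours high scorers
item_b = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]   # favours low scorers

d_a = discrimination_index(totals, item_a)
d_b = discrimination_index(totals, item_b)
print(round(d_a, 2))  # 0.67
print(round(d_b, 2))  # -0.67
```

A positive value means that more high scorers than low scorers answered correctly; a negative value signals an item that discriminates in the wrong direction.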
Lesson 24
Assessment of Intellectual and Cognitive Abilities
Thinking about intelligence and intelligence testing, a number of questions come to one’s mind:
• What is intelligence?
• Why do we try to understand intelligence?
• Why do we measure intelligence?
• Why can we not use the same measure for everyone?
• Can intelligence tests be taken as reliable measures of intellectual ability?
• Are these tests valid?
• Are intelligence tests the only way to measure intellectual ability of a person?
Moving on to a relatively personal side, one may ask:
• Am I an intelligent person?
• Is intelligence about a skill, specific skills, or a general ability that I possess?
• How do I judge using my observation if someone is intelligent or not?
• Is intelligence an inborn ability or is it a learned phenomenon?
None of these questions can be answered in one word, one sentence or one answer.
Before we start our discussion on intelligence testing, we need to understand what intelligence is. This is
important because we can understand the significance, logic, and process of intelligence testing if we have
understood what intelligence means to both the test developer and the user. Different authors and researchers
have proposed a variety of descriptions of intelligence.
Intelligence:
As discussed in earlier courses also, we know that “intelligence is the capacity to understand the world, think
rationally, and use resources effectively when faced with challenges” (Feldman, 2002, p. 261).
Intelligence refers to the ability to adapt, to reason, to solve problems, and think in an abstract manner; it also
includes learning and experiencing new things and understanding from the past experiences.
Intelligence or the intellectual ability of a person is based upon a constant and ongoing interaction between
environmental factors and inherited potentials in order to have better understanding of how to ‘use’ and ‘apply’
the potentials in a meaningful manner.
Modern psychology considers both environment and heredity and their interaction to be influential.
Theories of Intelligence:
One of the earliest contributions to the measurement of intelligence in the 19th century was made by Sir Francis
Galton, a cousin of Charles Darwin. Born into a family of geniuses, he was himself a genius with a very
high IQ. Besides being a geographer, meteorologist, and tropical explorer, he was the “founder of differential
psychology”.
According to Francis Galton (Hereditary Genius, 1869), “gifted individuals” tended to come from families
that had other gifted individuals.
His was the first systematic attempt to measure intelligence by investigating the role of heredity and its impact on
intellectual abilities. He attempted to measure human traits quantitatively in order to determine the distribution
of heredity in them. He also talked about the relationship between intelligence and the size of one’s head, but
the idea received no empirical support.
James McKeen Cattell, an American psychologist, gave more importance to mental processes. He used the term
“mental test” for the first time for devices used to measure intelligence. He developed tasks aimed at measuring
reaction time, word association, keenness of vision, and weight discrimination.
Another name without which the history of intelligence testing will never be complete is that of Alfred Binet,
who developed the first proper formal intelligence test in 1905. His test will be discussed in detail in the following
sessions. All these people talked more about intelligence tests and little about what constituted intelligence itself.
Today we see that there are a number of approaches to explaining and understanding the concept of intelligence.
Spearman’s g-factor theory:
Psychologists have always been interested in whether intelligence is a single, specific ability or a general ability
that contains multiple components and is reflected in all aspects of one’s life. One of the earliest theories of
intelligence proposed the idea that intelligence was a single, general factor. The British psychologist Charles
Spearman, who gave his theory in the early 1900s, observed that people who scored high on one mental test also
tended to score high on others. The same applied to the low scorers. Using “factor analysis”, a statistical
technique, he proposed two factors that can account for individual differences: the g-factor and the s-factor.
The “g” factor referred to “general intelligence” and the “s” factor meant “specific intelligence”.
Spearman and other psychologists with a similar approach believed that there was a single factor, a general factor,
for mental ability, the “g” factor.
This factor, Spearman proposed, can account for the general ability that is common to all people, as observed
from the mental tests, whereas the ‘s’ factor can account for the specific abilities that differ from person to
person. Spearman’s theory impressed many people but not all. There were psychologists who
believed otherwise and proposed alternate explanations of intelligence.
Thorndike’s Social Intelligence:
Thorndike was critical of Spearman’s g-factor approach. He argued that instead of only one ‘g’ factor, there are a
number of factors that make up and influence intelligence; factors that cannot generally be identified directly but
that are expressed in human actions.
According to him intelligence covers three main divisions:
1. Social intelligence, which enables one to understand and manage relationships
2. Abstract intelligence, which enables one to understand and manage ideas such as algebra, mathematics, or
abstract concepts
3. Concrete intelligence, which enables one to manage concrete and mechanical concepts and ideas, e.g. accounting,
economics, architecture, banking etc.
Thurstone’s approach: Primary Mental Abilities:
Some psychologists, such as the American psychologist Louis L. Thurstone (1938), argued that intelligence is not a
general factor but is composed of small independent factors or elements. Thurstone called these factors
“primary mental abilities”. Thurstone and his wife prepared a set of 56 tests for the identification and verification
of these abilities. These were administered to 240 college students. The results were analyzed through factor
analysis. The analysis yielded seven primary mental abilities:
• Verbal comprehension: An ability to understand and define words
• Word fluency: An ability or speed of thinking of verbal material such as rhyming, or naming words in a given
category
• Spatial visualization: Ability to recognize and manipulate objects or things in three dimensions such as
drafting and blue print reading
• Perceptual speed: An ability to quickly perceive and detect the visual details and differentiate between the
similarities and differences between designs
• Reasoning/ inductive reasoning: A logical ability of deriving general ideas from specific information
• Numbers/arithmetic ability: The capability of working easily with numbers, such as doing simple arithmetic
tasks rapidly
• Memory: An ability or capacity of remembering and retaining the material such as words, letters and ability
to recall and associate different words.
Crystallized and Fluid Intelligence: R.B Cattell:
After in-depth research and observation, some psychologists have proposed the idea that there are two types of
intelligence, crystallized intelligence and fluid intelligence.
1. Crystallized intelligence: The capability of using information that has been learnt through experience.
Education and culture affect this type of intelligence. Whatever knowledge, skills, techniques, and arts one
learns over years of one’s life are accumulated. All of these are applied in problem solving situations. This
type of intelligence keeps on increasing with age, or the learning experiences of a person.
2. Fluid intelligence: Largely influenced by biological factors, it is the capability for information processing and
problem solving, which depends more on the neurological development of a person, such as reasoning and
memory, which decline with age. If we are learning a multiplication table, grouping objects according to
some classification system, or solving an equation then we will be using fluid intelligence.
It “reflects information-processing capabilities, reasoning, and memory” (Feldman, 2002, p. 269).
Guilford’s theory of the Structure of Intellect (SOI):
It is a model of intelligence according to which intelligence is the result of the interaction of operations, contents
and products.
Guilford believed that intelligence is a much more complex phenomenon than is usually thought; it is difficult to
define it as a ‘g’ factor or in terms of ‘primary mental abilities’. Guilford talks of 150 such abilities/factors
(Guilford, 1985).
The different components of intelligence are:
Operations: it is the potential of different ways of thinking including:
• Evaluation
• Convergent thinking
• Divergent thinking
• Memory retention
• Memory recording
• Cognition
Contents: A potential of what we think about something. Contents include:
• Visual
• Symbolic
• Semantic
• Behavioral
Products: The results obtained by applying certain operations to certain contents, or the ability of thinking in a
certain manner about a certain thing. Products include:
• Units
• Classes
• Relations
• Systems
• Transformation
• Implication
Multiple Intelligences: Howard Gardner’s Approach:
Howard Gardner (1985) maintained that intelligence does not consist of a single factor. Intelligence, he
proposed, consists of eight independent intelligences. These eight are possessed by all individuals though in
varying degree. The said eight kinds of intelligences include:
1. Linguistic intelligence
2. Logical-mathematical intelligence
3. Spatial intelligence
4. Musical intelligence
5. Bodily-kinesthetic intelligence
6. Interpersonal intelligence
7. Intrapersonal intelligence
8. Naturalistic intelligence
Sternberg’s Triarchic Theory:
According to Robert Sternberg’s triarchic or three-dimensional theory, given in the 1980s, intelligence consists of
three main components:
• Analytic intelligence
• Creative intelligence
• Practical intelligence
No other psychologist has talked about practical intelligence and its significance the way Sternberg has. Practical
intelligence is related to overall success in living (Sternberg, 2000). According to Sternberg, the traditional tests
measure academic success whereas career success requires practical intelligence. Practical intelligence is a learnt
phenomenon. It results from the observation of others’ behavior, unlike academic success that results from
knowledge of a particular information base obtained from reading and listening.
Emotional Intelligence:
A modern approach is to judge the ability of people from their emotional intelligence (Goleman, 1995).
According to Daniel Goleman (1995), emotional intelligence is about the ability to get along with others. Accurate
assessment, evaluation, expression, and regulation of emotions depend on certain underlying skills, and these
skills stem from emotional intelligence.
Emotional intelligence involves the realization and regulation of personal emotions as well as empathy and
understanding of others’ emotions. Social skills, self-awareness, and empathy are important aspects of one’s
emotional sphere, and all three require emotional intelligence.
Piaget’s View of Intelligence:
• Piaget held that intellectual development can be defined in terms of qualitative changes in thinking, which are
clearly apparent in children of particular ages.
• His theory is more concerned with the universal patterns of intellectual development and functioning. He
developed a comprehensive theory that emphasized ‘how’ children acquire knowledge and use it to solve
logical problems.
• He was more interested in how children exhibit intelligence in different stages of life as he proposed the four
stages of cognitive development, which he termed as universal and invariant (occurring in the same
sequence). The stages are: sensorimotor, preoperational, concrete operational and formal operational.
Lesson 25
Measurement of Intelligence
A variety of tests are available to us for the assessment of intelligence. The tests may be chosen according to the
specific purpose for which the test is to be used. The tests may be used independently or in combination with
other tests i.e., as part of a battery of tests. Similarly, we have the option to choose from individual tests or group
tests. The prevalent, modern approaches to measuring intelligence are based upon the contribution of Alfred
Binet.
In this section we will discuss some intelligence tests that are commonly used. First of all we will look at the
historical evolution of the Binet Scale in order to get a feel for how tests are developed and how their evolution
may continue over decades.
The Binet Scale:
French psychologist Alfred Binet and Theodore Simon, in 1905 in France, developed the first formal measure of
intelligence. The purpose of their scale was to assist the education ministry and department in identifying “dull”
students in the Paris school system, so that they could be provided remedial assistance or training. They felt that
the children’s performance could be used as an indicator of their intelligence. In other words, they believed that
intelligence can be measured in terms of performance of a child.
The idea was that children can perform different types of tasks at each age level. With growing age, the tasks that
a child can perform become more difficult and complex in nature. If a child can perform the tasks that children
of his age can perform, then he is an intelligent child; if he cannot perform the way his age mates do, then he
possesses below average intelligence; and if he can perform tasks that children older than him can perform,
then he has above average intelligence. However, when Binet developed his scale, the main focus was to identify
children who could not perform according to their age level. Over the decades that followed 1905, a
number of revisions were made to the original scale. The first intelligence test, Binet and Simon’s scale, could
identify the more intelligent children within a particular age group and could differentiate intelligent children
from the less intelligent ones.
The development of Binet and Simon’s scale:
First of all a number of tasks were developed. Then groups of students who were categorized or labeled as ‘dull’
or ‘bright’ by their teachers were selected. The tasks were presented to them. The tasks that could be completed
by the ‘bright’ students were retained; the rest were discarded. The idea was to retain tasks that could be
completed by the bright students, as these were considered to be indicative of the child’s intelligence. It was
realized that not all tasks could be performed by all children, even the bright ones; tasks were age
related. Children of a given age group could perform specific tasks. Children of a lower age group could not do
tasks meant for a higher level. If a child could perform tasks meant for a higher age group, she was considered to
be of above average intelligence. Conversely, if a child could not perform tasks meant for her age, then she
was considered a dull child. Using the same approach, dull or bright children could be identified with reference
to their age.
Binet’s scale became popular very soon, and work on its translation and adaptation began in various parts of the
world. The U.S. took the lead in this regard. A training school in New Jersey was using it
in 1908 (Goddard, 1910). A modified version was published in 1912 (Kuhlman, 1912). This version included a
widened age range that went down to three months of age.
The major developments took place at Stanford University. The major milestones in the history of this scale are
as follows:
The 1905 Scale:
The original, 1905, scale comprised 30 items arranged according to increasing order of difficulty. The norms
were obtained from a sample of only 50 children who were reported to be ‘normal’ considering their average
school performance. Though normative and validity related information for this scale was not sufficient, this
scale was a major milestone in the history of psychological measurement.
The 1908 Scale:
The 1908 scale was similar to the 1905 scale in that it was an age scale that retained the principle of age
differentiation. The concept of mental age was introduced in this revision. The performance of a test taker was
compared with the average performance of other persons of the same chronological age, as discussed earlier.
The standardization sample for this revision included 203 individuals.
The 1916 revision: The Stanford-Binet Intelligence Scale:
The American psychologist Lewis Terman produced the first Stanford revision of the scale in 1916, known as the
Stanford-Binet Intelligence Scale. The revision was standardized on an American sample and was meant for ages 3
years to 14 years, plus average and superior adults. The standardization sample was increased in size, though it
contained only white, native Californian children. This fact was a reason for criticism against the scale.
This scale had three significant characteristics: (a) it used the concept of IQ and was the first American test to do
so, (b) it included the concept of the ‘alternate item’, i.e., an item that could be used in place of a regular item on
occasions when the original item could not be used properly for any reason, and (c) it provided detailed and
organized instructions for administration and scoring.
The 1937 Revision:
The next revision took place in 1937. It was the result of a project that began in 1926. Terman and Merrill,
Terman’s colleague at Stanford, worked on this revision. The 1937 revision contained new tasks meant for
preschool level and for adult level. It comprised two equivalent forms, ‘L’ and ‘M’ (the initials of the first names
of the authors). The test was appreciated for its validity and reliability. The standardization sample was much
larger in size than the previous samples. It consisted of 3184 individuals, selected from 11 U.S. states. However,
it was criticized for lack of representativeness because the subjects were all whites and more from urban than
rural areas. Nevertheless, it was an improvement over the previous versions.
The Concept of Mental Age:
Children taking the Binet-Simon test were assigned a score that corresponded to the age group they belonged to.
This score indicated their “mental age”. Mental age referred to the average age of children who secured the same
score. Mental age can be understood as the typical intelligence level found for people at a given chronological
age. The mental age of a person can be different from his or her chronological age, i.e., it can be above or below
it. It could reflect whether or not a child was performing at the level at which his age mates were.
The Concept of Intelligence Quotient or IQ:
As a result of problems with depending merely on mental age, a solution was devised in terms of the intelligence
quotient, a concept whereby the chronological age of the person is also given due consideration. It is an indicator
of intelligence that takes into consideration a person’s mental as well as chronological age. The formula for IQ is:
IQ = (MA / CA) × 100
This formula is basically the ratio of a person’s mental age (MA) to chronological age (CA). It is multiplied by
100 to eliminate decimal points. Using this formula means that if the mental and chronological ages of a person
are the same, then he or she will have an IQ of 100. If the mental age is below the chronological age then the IQ
will fall below 100, and vice versa.
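The three cases just described (mental age above, equal to, and below chronological age) can be worked through with a short sketch. The function name `ratio_iq` and the ages used are hypothetical and serve only to illustrate the formula.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical eight-year-old children with different mental ages.
iq_above = ratio_iq(10, 8)   # performs like an average ten-year-old
iq_equal = ratio_iq(8, 8)    # performs like an average eight-year-old
iq_below = ratio_iq(6, 8)    # performs like an average six-year-old

print(iq_above, iq_equal, iq_below)  # 125.0 100.0 75.0
```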
The 1960 Revision:
This revision was followed by another one in 1960. The 1960 edition was being worked upon when Terman died
in 1956. One major change that took place in this edition was that instead of two forms, it comprised a single
form, ‘L-M’. No new items were added and the test included the best items from the two previous forms. This
form of the test used the concept of deviation IQ rather than ratio IQ. The previous test manuals contained ratio
IQ tables, whereas the 1960 edition’s manual included deviation IQ tables. Use of deviation IQ meant that the
performance of a test taker was compared with the performance of other people of the same age level in the
standardization sample. The mean score is taken to be 100 with a standard deviation of 16. The score of a test
taker is converted into a standard score using these values. Using the ratio IQ one could assess a person’s
intelligence but not his standing in comparison to other test takers. But the use of deviation IQ made it possible
to estimate a test taker’s relative position with reference to the standardization sample. However, the criticism
against unrepresentativeness of the test continued.
The 1960 revision did not involve re-standardization or a new normative sample.
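The conversion to a deviation IQ described above can be sketched as a standard-score transformation. This is a minimal illustration assuming that the mean and standard deviation of raw scores for the test taker's age group are known; the function name `deviation_iq` and the age-group norms below are hypothetical, and only the mean of 100 and standard deviation of 16 come from the text.

```python
def deviation_iq(raw_score, age_mean, age_sd, iq_mean=100, iq_sd=16):
    """Convert a raw score to a deviation IQ.

    age_mean, age_sd -- mean and SD of raw scores for the test taker's
                        age group in the standardization sample
    iq_mean, iq_sd   -- mean and SD of the IQ scale (100 and 16 here)
    """
    z = (raw_score - age_mean) / age_sd   # position relative to age mates
    return iq_mean + iq_sd * z


# Hypothetical age-group norms: mean raw score 50, SD 10.
iq_high = deviation_iq(60, age_mean=50, age_sd=10)   # one SD above the mean
iq_mid = deviation_iq(50, age_mean=50, age_sd=10)    # exactly at the mean
iq_low = deviation_iq(45, age_mean=50, age_sd=10)    # half an SD below

print(iq_high, iq_mid, iq_low)  # 116.0 100.0 92.0
```

Unlike the ratio IQ, each value here states directly where the test taker stands relative to others of the same age in the standardization sample.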
The 1972 Revision:
In 1972 an improved normative sample was taken which comprised 2100 subjects. The sample included
nonwhites as well. About 100 subjects for each Stanford- Binet age level were included. Still the sample was
criticized for not taking enough non-whites.
The 1986 Version: (The Stanford Binet: 4th Edition):
This version of the Binet Scale overcame the problems for which it was criticized. A standardization sample of
5000 subjects was used. The subjects belonged to 47 states of the U.S. and the District of Columbia. Geographic
region, community size, ethnic group, age, and gender were considered for stratification of the sample.
The following content areas are covered in the latest scale:
1. Verbal reasoning
2. Abstract/visual reasoning
3. Quantitative reasoning
4. Short-term memory
The subtests of the scale are as follows:
1. Verbal reasoning
i. Vocabulary
ii. Comprehension
iii. Absurdities
iv. Verbal relations
2. Abstract/visual reasoning
i. Pattern analysis
ii. Copying
iii. Matrices
iv. Paper folding and cutting
3. Quantitative reasoning
i. Quantitative subtest
ii. Number series
iii. Equation building
4. Short-term memory
i. Bead memory
ii. Memory of sentences
iii. Memory of digits
iv. Memory of objects
Administration of Stanford-Binet test:
Individual oral administration is used. The examiner begins at the mental level at which he estimates the subject
to be. Items from succeeding levels are then presented. The test ends when the subject reaches a level where no
items are successfully attempted. The administrator establishes the ‘basal age’ and a ‘ceiling’ for each test. Basal age refers
to “the lowest level or point where two consecutive items of approximately equal difficulty are passed” and
ceiling is “the point at which at least three out of four items are missed” (Kaplan & Saccuzzo, 2001, p. 271).
Scores for all 15 tests are obtained and converted into standard age scores, with a mean of 50 and an SD of 8.
Four content-area scores result from the grouping of individual tests into content areas. The mean in this case is
100 and the SD is 16. A composite score is also calculated.
Some Sample Items from Early Versions of the Binet-Simon Scale:
Three years: Shows nose, eyes and mouth. Repeats two digits, describes objects in a picture, gives family name
and repeats a sentence of six syllables.
Four years: Gives own sex, names key, knife, and penny, repeats three digits, compares the length of two lines.
Five years: Compares two weights, copies a square, repeats a sentence of ten syllables and counts four pennies.
Six years: Distinguishes between morning and afternoon, defines objects in terms of their use, copies a shape,
counts 13 pennies and compares faces from the aesthetic point of view.
Seven years: Identifies right hand and left ear and describes a picture, follows precise directions and names four
colors.
Eight years: Compares two remembered objects, counts from 20 to 0, indicates omissions in pictures, gives day
and date and repeats five digits.
Fifteen years (the highest level): Repeats seven digits, gives three rhymes, repeats a sentence of 26 syllables,
interprets a picture and solves a problem from several facts.
The Wechsler Scales:
The Wechsler scales are perhaps the most commonly used intelligence tests. These were developed by
psychologist David Wechsler. Three Wechsler intelligence tests that are available to us at present are:
i. Wechsler Adult Intelligence Scale: third edition or WAIS-III: this scale is meant for ages 16 years to 89
years
ii. Wechsler Intelligence Scale for Children: third edition or WISC-III: this scale is meant for children
aged 6 to 16
iii. Wechsler Preschool and Primary Scale of Intelligence-Revised or WPPSI-R: this scale is for children
aged 3 years to 7 years three months.
The first scale developed by David Wechsler was published in 1939 and is known as the W-B I or Wechsler-Bellevue
scale; Wechsler was employed by Bellevue Hospital in Manhattan. The test was not an age scale like the Binet
scale. Rather, it was a point scale in which credit was given for every correct response. In 1942 another, equivalent
form was developed. Known as the W-B II, this form is rarely talked about. The original scale had
some problems, particularly with standardization. The standardization sample used for this scale
comprised 1081 white subjects, most of them from New York. Wechsler later addressed these initial
flaws, revised the W-B I, and developed the WAIS (Wechsler Adult Intelligence Scale) in 1955. The WAIS was
revised again and the WAIS-R was introduced in 1981. The latest version is the WAIS-III, which was developed in
1997.
The WAIS-III consists of two categories of subtests, verbal and performance. The subtests in each category
are as follows:
Verbal scale:
i. Vocabulary
ii. Similarities
iii. Arithmetic
iv. Digit span
v. Information
vi. Comprehension
vii. Letter-number sequencing
Performance Scale:
i. Picture completion
ii. Digit symbol-coding
iii. Block design
iv. Matrix reasoning
v. Picture arrangement
vi. Symbol search
vii. Object assembly
Administration of the WAIS and WISC, in their latest forms, is time consuming because each requires individual
administration.
Psychometric Properties of WAIS-III:
A standardization sample comprising 2450 subjects was used. The subjects were all adults. Thirteen age groups
were taken, starting from 16-17 and going up to 85-89. The sample’s stratification was done on the basis of
gender, race, education, and geographic region. This information was obtained from the 1995 census of the U.S.
The Meaning of IQ Test Scores:
The commonly followed standards of interpreting IQ scores are as follows:
IQ score       Rating
< 70           Retarded
85             Borderline
100            Average
Above 115      Superior
Above 140      Gifted
Lesson 26
Intelligence Tests
The Kaufman Scales:
The husband and wife duo, Alan and Nadeen Kaufman, have contributed to intelligence testing by developing
the following tests:
i. K-ABC: Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983)
ii. K-BIT: Kaufman Brief Intelligence Test (Kaufman & Kaufman, 1990)
iii. KAIT: Kaufman Adolescent and Adult Intelligence Test (Kaufman & Kaufman, 1993)
K-ABC: Kaufman Assessment Battery for Children
The following global scales are included in this test battery:
i. Sequential processing. Subtests: hand movements, number recall, word order.
ii. Simultaneous processing. Subtests: magic window, face recognition, Gestalt closure, triangles,
matrix analogies, spatial memory and photo series.
iii. Mental processing composite (combining i and ii)
iv. Achievement. Subtests: expressive vocabulary, faces and places, arithmetic, riddles,
reading/decoding, reading/understanding.
v. Nonverbal. Subtests: face recognition, hand movements, triangles, matrix analogies, spatial
memory and photo series.
Leaving aside the nonverbal scale, which reuses subtests from the other scales, the battery contains 16 subtests.
As can be seen from these scales, the battery focuses on information processing. The information
processing approach looks into the way information is processed. A national sample of 2000 American children
was used for standardization of this battery. The ages of the children ranged from 2.5 to 12.5 years. The
sample was stratified on a number of characteristics of the subjects: age, gender, geographic region, parental education,
community size, educational placement, and race. In addition, gifted and talented, mentally retarded, and learning
disabled children were included in proportion to their numbers in the general public.
K-BIT: Kaufman Brief Intelligence Test:
The K-BIT, meant to be used with ages 4 to 90 years, is a quick screening instrument. It is an individually
administered test for the assessment of intellectual functioning. The K-BIT yields three scores
namely, verbal, non-verbal, and composite. The verbal subtest includes 45 Expressive Vocabulary items and 37
Definitions. The non-verbal subtest contains 48 matrices. This test used nearly 20% of the KAIT
standardization sample.
KAIT: Kaufman Adolescent and Adult Intelligence Test:
The KAIT was developed for measuring the intelligence of subjects aged 11 to 85 plus. It consists of a Crystallized
scale and a Fluid scale. The former measures concepts learned from schooling and acculturation. The latter is
about the ability to solve new problems. Three subtests are used in each scale in the Core Battery. An additional
advantage is the possibility of using an Expanded Battery for subjects who are suspected to have neurological
damage. In such cases, any of four specified subtests are added to the original battery. For test takers who are
cognitively impaired to the extent that they cannot complete the whole battery, a brief mental status test is
also included. This test assesses attention and orientation.
Differential Ability Scales:
The Differential Ability Scales or DAS were developed by C. D. Elliot (1990) in Great Britain. The DAS is actually a
revised form of the British Ability Scales (Elliot, Murray, & Pearson, 1979).
There are three major components of the DAS that contain 20 subtests in all.
The Core subtests:
i. Block Building
ii. Verbal Comprehension
iii. Picture Similarities
iv. Naming Vocabulary
v. Early Number Concepts
vi. Copying
vii. Pattern Construction
viii. Recall of Designs
ix. Word Definitions
x. Matrices
xi. Similarities
xii. Sequential and Quantitative Reasoning
Diagnostic subtests:
i. Matching Letter-Like Forms
ii. Recall of Digits
iii. Recall of Objects
iv. Recognition of Pictures
v. Speed of Information Processing
Achievement Test:
i. Basic Number Skills
ii. Spelling
iii. Word Reading
Four core subtests are used with preschoolers aged 2 years and 6 months to 3 years and 5
months. Six core subtests are meant for preschoolers aged 3 years and 6 months to 5 years and 11 months. Six
subtests are for the school level, ages 6 years to 17 years and 11 months.
The standardization sample included 3475 persons, aged 2 years and 6 months to 17 years and 11 months,
representing the target population of all non-institutionalized English- proficient individuals in the U.S. The
subjects’ age, gender, race/ethnic origin, parental education, and geographic region were taken into consideration
in sample selection for the sake of stratification.
Cultural Biases and Intelligence Tests:
Tests used to assess people’s intelligence have frequently been criticized for being biased against particular
groups of people. Culture-fair IQ tests are developed and used to overcome this problem. These tests are designed not to
discriminate against any minority or cultural group; Raven’s Progressive Matrices is one example.
Some Significant Questions Pertaining to the Use of IQ Tests
Is the test a valid test?
Is it a reliable test?
Was it standardized?
Is it being used with people similar to those included in the standardization sample?
Is it being used with people different from those included in the standardization sample?
Are the differences drastic and serious?
Have the consequences been anticipated and weighed against the expected benefits?
Can the cultural background of the test taker affect the results?
Can the test taker’s ethnic origin affect the test results?
Can the administration be problematic due to personal, environmental, physical or other reasons?
All these questions need to be considered, answered, and tackled in case problems are seen or foreseen.
Alternative Formulations
These include:
• Moral intelligence
• Social intelligence
• Emotional intelligence
Moral intelligence:
Given by Coles (1997) and Hass (1998)
• It is the ability to differentiate between right and wrong
• More comprehensively, it is the capacity to make right decisions that are beneficial not only to oneself
but to others as well
Social Intelligence:
Given by Hough (2001) and Riggio, Murphy, & Pirozzolo (2002)
• Manifested as SQ
• Ability to understand and deal with people; salesmen, politicians, teachers, clinicians, and religious leaders
exhibit this type of intelligence
• It is also the ability to understand and deal with one’s own self by identifying one’s thoughts, feelings, attitudes
and behaviors
Emotional intelligence (EI)
• It is the type of social intelligence that involves the ability to cope with one’s own and others’ emotions, to
differentiate between them, and to use this information to guide one’s thoughts and actions.
• Indicated by the EQ of a person.
It includes these aspects:
1. Self-awareness
2. Managing emotions
3. Empathy
4. Handling relationships
Lesson 27
Piagetian approach: Measurement of Cognitive Development
Unlike traditional psychological measurement, the approach discussed here measures cognitive
development but does not employ tests the way we routinely do. This is the Piagetian approach,
introduced by the Swiss psychologist Jean Piaget. Piaget presented his theory of cognitive development
and introduced his own methodology for studying and understanding it. Cognitive development is the process
by which children’s understanding of the world develops as a function of age and experience.
In order to understand the Piagetian approach and methodology, let us refresh our knowledge about cognitive
development. Cognition is the process of knowing as well as what is known. It includes "knowledge" which is
innate/ inborn and present in the form of brain structures and functions. We ‘remember’ the physical
environment in which we were brought up and develop perceptual constructs or knowledge accordingly (seeing,
hearing, sounds etc).
Cognition refers to the ‘mental processes’ that people use to gather/acquire knowledge, and also to the knowledge that
has been gathered/acquired and is subsequently used in mental processes. Cognition and knowledge, therefore, can be
said to have a circular relationship.
[Figure: circular relationship between mental processes and knowledge]
Cognitive development involves:
Language,
Mental imagery,
Thinking,
Reasoning,
Problem solving and
Memory development
Jean Piaget’s Theory of Cognitive Development:
Piaget (1896-1980) was a Swiss psychologist who became interested in epistemology, i.e., knowledge and
knowing, as a result of his study of philosophy and logic. This interest in observation and epistemology laid
the foundation of his theory of cognitive development.
Piaget was influenced by Henri Bergson’s Creative Evolution, unlike most other psychologists, who were
impressed by Darwin’s theory of evolution. Bergson believed in divine agency instead of chance as the force
behind evolution: life possesses an inherent creative impulse. After securing a position in Alfred Binet’s
laboratory in Paris, Piaget got a chance to observe children’s performance, their right and wrong answers. This
work and observation generated an interest in children’s mental processes. The real shift took place when he
started observing his own children from birth onwards. He kept records of their behavior and used them to trace
the origins of children’s thoughts to their behavior as babies; later on he became interested in the thought of
adolescents as well.
These experiences resulted in two significant consequences:
1. Piaget’s theory of cognitive development
2. Piagetian method of study
Piagetian Method of Investigation:
Piaget’s method is known as the clinical approach, which is a form of structured observation. Piaget used to
present problems/tasks to children of different ages and ask them to explain their answers. Their explanations
were further probed through carefully phrased questions.
Piaget’s Theory of Cognitive Development
Cognitive development takes place in four stages in a set sequence. The sequence of stages is invariant. The age
range of each stage is also described, but the age is not invariant: age specification is arbitrary, and different
children may perform the same task at different age levels. The organization of behavior is qualitatively different
in different stages. Children throughout the world pass through this series of four stages of cognitive development
in a fixed order.
Piaget’s Stages of Cognitive Development:
1. Sensorimotor stage
2. Preoperational stage
3. Concrete operational stage
4. Formal operational stage
Sensorimotor Stage: Infancy: Birth-2 years
The child’s thought is egocentric and confined to action schemes. Development is very rapid in this stage but
thought processes are limited to the immediate world of the child. Object permanence and
motor skills develop during this stage. The child has little or no capacity for symbolic representation.
Preoperational Stage: Preschool: 2-7 years
Development of representational thought takes place. The child’s thinking is intuitive, not logical. A significant
aspect of development at this stage is the development of language and symbolic thinking. Thinking remains
egocentric.
Concrete Operational Stage: Childhood: 7-11 years
At this stage the child’s thinking becomes systematic and logical, but only with regard to concrete objects.
Conservation develops and the concept of reversibility is mastered.
Formal Operational Stage: Adolescence and adulthood: 11 years onward
Abstract and logical thought develops at this stage. The person can deal with the abstract and the absent.
Some Piagetian Tasks:
These tasks measure the acquisition of various concepts. The acquisition of concepts is progressive. Children of
different age levels or children belonging to different stages of cognitive development show different levels of
acquisition.
The ‘A-B’ Search Task:
This task is meant for sensorimotor stage children. It is about the acquisition of the concept of object
permanence. In this task two hiding places, A and B, are used. The places are in front of the child. The places
can be something like two place mats or napkins on a table under which an object may be hidden. An object is
hidden under one of the two and the child has to look for it. Children’s responses vary according to their stage of
development. Even within the same stage, children of different age levels give different
responses. Their responses may be something like this:
[Figure: the two hiding places, A and B]
4-8 months: The child will not search for the object even when it is hidden under A in front of him.
8-12 months: The child will try to search for the object and will find it under A. When the object is shifted from
under A to under B, the child will still look for it under A, even though it was hidden in front of the child. This
shows egocentric thinking. It is also known as the ‘A-not-B’ error.
12-18 months: The child can accurately search for the object.
18-24 months: The child can not only search for the object but can also experiment with it and other similar
objects.
Conservation Tasks:
Conservation is the concept that some properties of an object or a mass of matter remain unchanged
(invariant) while other properties are changed. For example, the weight of an object remains the same
when its shape is changed; the number of objects remains unchanged when their arrangement is changed.
Children learn the conservation of mass and number earlier (around 5-6 years of age) than conservation of
weight (around 8-9 years of age).
Conservation of mass: Play dough, plasticine, or clay can be used for this task. Take two balls of the
dough of the same size and ask the child if the two have the same amount of dough/clay in them. Let the child feel them and then
answer. When he says yes, they are the same amount, flatten one ball like a pancake or chapatti and ask if
they still have the same amount of the pliable material in them. Children at different cognitive levels will respond
differently. If the child says the two have different amounts of mass, ask why he thinks so.
Conservation of number: Take ten buttons and arrange them in two parallel rows. Ask the child if there is
the same number of buttons in each row.
When the child says yes, rearrange the buttons, spreading those in one row farther apart so that the row
appears to be longer than the other one. Now ask the child if the two rows contain the same number of
buttons. Children belonging to different levels will respond differently. Those who have not acquired the
concept of conservation of number will say that the longer row contains more buttons.
Conservation of weight: Once again, two play dough or clay balls of the same weight may be used. Ask the child if
the two balls have the same weight. Once the child agrees, change the shape of one of the two balls,
converting it into an oblong. Now ask the child if they are of the same weight. Children belonging to different
levels will respond differently. Those who have not acquired the concept of conservation of weight will say that
the ball and the oblong were of different weight, whereas those who have acquired the concept of conservation
of weight will say that the two objects were of the same weight.
Conservation of volume: Put equal amounts of water in two glasses or jars of the same size.
Ask the child if the two contain the same amount of water. When the child agrees and says yes, take two
other, differently sized, containers and pour the water into them; one will have a lower level of water and the other
a higher level. Ask the child if the two containers have the same amount of water in them. Once again,
children at different levels of cognitive development, with reference to conservation of volume, will give
different answers.
Perspective:
The child is asked to imagine that she is standing at the beginning of a long road, with trees on both sides
of the road. She is asked to draw and tell how the road and the trees would look from her position.
Significant Influences on Cognition:
Socio- Cultural Factor:
• Proposed and debated in the early 1900s, the socio-cultural approach has now regained interest among cognitive
scientists
• It states that cognitive ability does not start with the anatomy/ biology of the individual or only with the
environment: the culture and society into which the individual is born provide the most important
resources/ clues for human cognitive development.
• They provide the context into which the individual begins his experience of the world.
• Social groups help in a person's cognitive development by placing value/importance on learning certain skills,
thereby providing the all-important motivation that the person needs in order to learn and exhibit
those skills or behaviors. This results in cognitive development.
• One perspective on cognitive ability suggests that there is some sort of innate potential existing within an
individual
• Another suggests that there is potential within the socio- cultural context for development of the individual.
The individual is born into a society of potential intellect. Knowledge will develop largely based on the
evolution of intellect within the society and culture.
Motivation, Cognition and Learning:
• It is believed that cognitive ability alone cannot account for achievement; motivation is also important in
acquiring/ attaining cognitive skills and abilities.
• People learn information that corresponds to, and is in accordance with, their view of the world. They learn
skills that are meaningful to them. For example, children who are born into a poor family may not give any attention or
importance to formal education and, as adults, they may pass on similar beliefs and attitudes to their
offspring.
• Motivation determines whether or not one is capable of learning. Whether one learns well or not, depends
on one’s own view and that affects the ability to learn. The motivational condition largely depends on the
way the culture responds to achievements and failures. There are culturally developed attitudes about the
probability of learning successfully after one has initially failed to learn. These attitudes can greatly affect
future learning.
Lesson 28
Individual Tests of Ability for Specific Purposes
A number of tests have been designed for individuals with special needs such as learning disability and/or memory
problems.
Learning Disabilities:
One of the areas that cause most concern for psychologists and educationists is learning disabilities. In all
mainstream schools one may expect to find children who face problems in routine education because of certain
learning disabilities. A child’s average achievement scores (marks) at school may be lower than the expected score
for his age level because of such a disability. If a child of average cognitive development or IQ cannot perform or
achieve what other children of the same intelligence can, then the child may be suspected of having a
learning disability. In such cases, tests that can detect learning disability are employed. A
difference between IQ and achievement amounting to 1.5 to 2 standard deviations is considered to be indicative
of learning disability.
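The discrepancy criterion can be sketched as a small calculation. This is an illustration only: it assumes that IQ and achievement are both expressed on the same standard-score scale (a mean of 100 and an SD of 15 are assumed here), and the function names and the choice of 1.5 SD as the cut-off are hypothetical rather than part of any particular test manual.

```python
def discrepancy_in_sd(iq, achievement, sd=15.0):
    """IQ-achievement gap expressed in standard deviation units.

    Assumes both scores are on the same standard-score scale (assumed SD = 15).
    """
    return (iq - achievement) / sd

def suggests_learning_disability(iq, achievement, sd=15.0, threshold=1.5):
    """True when achievement falls at least `threshold` SDs below IQ."""
    return discrepancy_in_sd(iq, achievement, sd) >= threshold

# Average IQ with achievement two SDs lower: meets the criterion.
print(discrepancy_in_sd(100, 70))             # 2.0
print(suggests_learning_disability(100, 70))  # True

# Average IQ with achievement less than one SD lower: does not meet it.
print(suggests_learning_disability(100, 90))  # False
```

A flag of this kind only marks a discrepancy worth investigating; it is not a diagnosis by itself.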
Illinois Test of Psycholinguistic Abilities (ITPA):
The test is based on modern concepts of information processing. The ITPA is based on the theory that inability to
respond correctly to stimuli does not result from defective output alone. The input has a role to play as well.
Input refers to information entering the information-processing system; it comes from an external stimulus. Our response to
it, or information processing, takes place in three stages:
Stage 1: incoming information is received through senses
Stage 2: analysis or processing of information is done
Stage 3: the response takes place
The Illinois test provides independent measures of all three stages. Three subtests measure the
individual’s ability to receive visual, auditory or tactile inputs. Three further subtests provide independent
measures of processing in these sensory modalities. There are other measures of motor and verbal output as well.
The test is designed for children aged 2-10 years. The ITPA is widely used by educators, psychologists, learning
disability specialists and researchers, but its psychometric properties are widely criticized. The test
provides no validity and reliability data. The test norms were obtained from a middle-class population and the test
contains culturally loaded content. Therefore it may not be appropriate for lower-class or minority groups.
The ITPA subtests include the following:
• Auditory Reception
• Visual Reception
• Auditory Association
• Visual Association
• Verbal Expression
• Manual Expression
• Grammatic Closure
• Visual Closure
• Auditory Sequential Memory
• Visual Sequential Memory
• Auditory Closure
• Sound Blending
Woodcock-Johnson Psycho-Educational Battery – Revised:
The Woodcock-Johnson Psycho-Educational Battery - Revised is a commonly used measure of children's
achievement measuring various aspects of scholastic ability. The test measures cognitive abilities, aptitudes,
achievement and interests. Learning problems can be identified by comparing subjects’ cognitive ability score
with their achievement.
If a difference of 1.5 to 2 SD is found between the cognitive ability and achievement of a child, it is
considered a major discrepancy that is taken to be indicative of learning disability.
The tests of cognitive ability include:
• Picture vocabulary
• Spatial relations
• Memory for sentences
• Concept formation
• Analogies, and
• A variety of mathematical problems
The achievement tests include:
• Letter and word recognition
• Reading comprehension
• Proofing
• Calculation
• Science and
• Social science and humanities
The tests of interest level cover math, language, and physical and social interests. The scores can be described in
terms of percentiles, which can further be converted into standard scores with a mean of 100 and an SD of
15. If a child is at the 50th percentile, or mean, in cognitive score, with an achievement score that is 2 SD below
the cognitive score, this can be taken as indicating learning disability. With normative data from more than 4700
subjects, including members of different races, genders, and urban and rural statuses, these tests have good psychometric
properties.
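The conversion from percentiles to standard scores can be sketched as follows. The sketch assumes that scores are normally distributed, which is an assumption made here for illustration; the test itself uses its own published conversion tables. The function name is hypothetical.

```python
from statistics import NormalDist

MEAN, SD = 100.0, 15.0  # standard-score scale described in the text

def percentile_to_standard_score(percentile, mean=MEAN, sd=SD):
    """Convert a percentile rank (0-100, exclusive) to a standard score,
    assuming a normal distribution of scores."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean + sd * z

# The example from the text: cognitive score at the 50th percentile,
# achievement score 2 SD below the cognitive score.
cognitive = percentile_to_standard_score(50)
achievement = cognitive - 2 * SD

# Percentile rank of that achievement score under the same assumption.
achievement_percentile = NormalDist(MEAN, SD).cdf(achievement) * 100

print(cognitive)                         # 100.0
print(achievement)                       # 70.0
print(round(achievement_percentile, 1))  # 2.3
```

Under the normality assumption, a child who is average in cognitive ability but 2 SD lower in achievement scores at roughly the 2nd percentile in achievement.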
Visiographic Tests:
Visiographic tests require a subject to copy various designs. These tests are useful in cases of many kinds of brain
damage.
Benton Visual Retention Test (BVRT):
This test is used for measuring brain damage and psychological deficit. The BVRT assumes that brain damage
easily impairs visual memory ability. The test is designed for individuals aged 8 and older. The subject has to
reproduce geometric designs that are presented briefly and then removed. The subject loses points for
mistakes and omissions. As the number of errors increases, the subject approaches the organic (brain-damage) range.
Bender Visual Motor Gestalt Test (BVMGT):
The Bender Visual Motor Gestalt Test (BVMGT) is one of the most popular individual tests. The test has nine
geometric figures that the subject has to copy. The test is scored on the basis of errors. Norms are available for
children aged 5-8 years. One or two errors by the age of 9 years are considered normal, but if individuals
over 9 make more errors, this may be an indication of some deficit. Individuals with more errors can be said to
have a mental age of less than 9 years (low intelligence), brain damage, or emotional problems. Though the test has
a number of scoring systems, its reliability is questioned.
Memory-for-Designs Test (MFD):
The MFD is a simple drawing test that takes only 10 minutes to administer. The test measures perceptual
motor coordination of individuals from 8 ½ to 60 years of age. The subjects are shown simple designs for a
short duration and are then asked to reproduce them. The drawings are given scores from 0 to 3 depending on
how close or similar they are to the original designs.
The scores on the 15 drawings can indicate brain injury and brain disease with the help of the reference
tables provided, which are organized by age and intelligence. The test has good psychometric properties, although more
evidence of validity is needed.
Torrance Tests of Creative Thinking (TTCT):
Creativity can be defined as “the ability to be original, to combine known facts in new ways, or to find new
relationships between known facts” (Kaplan & Saccuzzo, 2001, p. 333). The Torrance Tests of Creative
Thinking (TTCT) measure different aspects of creativity including fluency, originality and flexibility.
Fluency: Fluency is the ability to generate a variety of solutions to a problem. People score high on
fluency if their solutions are distinct; the more distinct the solutions, the higher the score. The individual’s
fluency is measured by his/her ability to provide as many solutions as one can find.
Originality: Originality has to do with the uniqueness, novelty, and unusual nature of solutions. One can be
said to be a creative person if one can come up with novel ideas or solutions. The unusual and unique solutions,
which are different from the usual, conventional, and expected ones, add to originality score of an individual.
Flexibility: Flexibility is the ability to shift from one standpoint or strategy to another in finding
solutions to problems. People can be said to have a flexible approach if they do not mind shifting positions in
problem solving. If one strategy does not seem to be working, such people would try other strategies. Flexibility
is measured by gauging a person’s ability to switch to different approaches of problem solution. The test is useful
for applied practitioners but needs more research for enhancing its psychometric properties.
Wide Range Achievement Test- 3 (WRAT-3):
Intelligence tests measure what an individual may achieve, whereas what an individual has actually achieved is
measured through achievement tests. IQ tests are about potential, while achievement tests are about the
use of that potential. Scores on achievement tests may be indicative of people’s intelligence, but not necessarily
always. Factors such as interest, motivation, training, prior knowledge or previous exposure may affect the way
one has used one’s potential, i.e., intelligence. Therefore both IQ and achievement tests are used for assessing
one’s ability, depending on the situation and purpose. Usually achievement tests are used in groups, but some
individual achievement tests are also available. The Wide Range Achievement Test-3 (WRAT-3) is one of the most widely
used achievement tests; it measures grade-level functioning in reading, spelling and arithmetic. The test can
be used for children aged 5 and older. The WRAT-3 is widely criticized with regard to its measurement of grade-level reading ability.
Lesson 29
Group Testing
Group versus Individual Tests:
So far you have learnt about varieties of ability or intelligence tests that are primarily individual tests. These tests
are administered individually and are chosen, used, and interpreted according to the needs of individual subjects.
We also mentioned some tests in the previous sections that can be used in group administration as well. In the
present section we will discuss group tests exclusively. Such tests are designed and developed for group
administration. No doubt, individual tests have their own advantages, which cannot be denied. For example,
there is a certain degree of rapport between the examiner and the test taker. Test administration is comparatively
flexible in the sense that instructions and questions can be repeated and at times rephrased according to the
educational/cultural/anxiety level of the examinee. In addition, if testing is being done for diagnostic purposes,
then individual testing is the best and perhaps the only approach used. However, there are situations where we
prefer to use group testing rather than individual testing.
Whether we use individual or group administration will depend on the purpose of the test. There will be occasions
when the disadvantages of group testing will have little effect on the findings and its advantages will be
considered more important. On the other hand, there will be cases where individual testing will be the preferred
approach. The mode of test administration will therefore be decided on the basis of the purpose for which the
test will be used.
Characteristics of Group Tests:
Group tests are tests that have been designed in such a manner that a large number of test takers can take
them simultaneously. The instructions address the entire group rather than individuals, are delivered only once at
the beginning of the test, and are not repeated for anyone. These tests are usually timed, and the examiner does
not do or say anything that may interrupt the testing process. Group tests are generally paper-and-pencil tests,
although computerized administration is also becoming popular in developed countries.
Group tests mostly have a multiple-choice format. Usually blank circles, or spaces, are provided in front of every
response option. The test taker blackens the circle, places a tick mark in the blank space, or encircles the
option label (a, b, c, d, etc.). The answer sheets are marked with the help of a key. Answer keys can be of
different types, but one common type is designed like a stencil: a sheet like the test’s answer sheet in which
holes or slots are cut at the position of the right option. The stencil is placed on top of the answer sheet. If
the space under a slot has been darkened, then the item is counted as right. The answer sheets can also be marked
by computers, which is the commonly used approach nowadays. Group tests are also administered with the help of
computers. In computerized testing, a mouse or joystick is used to mark the answers. But this approach can be
adopted only when a sufficient number of computers is available.
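The stencil procedure amounts to comparing each marked option with the keyed option and counting the matches. A minimal sketch in Python, assuming a hypothetical five-item test; the names `ANSWER_KEY` and `score_answer_sheet` and the sample responses are illustrative only and do not come from any real test:

```python
# Hypothetical answer key for a five-item multiple-choice test
# (illustrative only; not taken from any published instrument).
ANSWER_KEY = ["b", "d", "a", "c", "b"]


def score_answer_sheet(responses, key=ANSWER_KEY):
    """Count the items whose marked option matches the keyed option.

    `responses` holds one marked option per item; None stands for an
    item that was left blank, which is never counted as right.
    """
    if len(responses) != len(key):
        raise ValueError("answer sheet and key must cover the same items")
    return sum(
        1
        for marked, correct in zip(responses, key)
        if marked is not None and marked == correct
    )


# One examinee's sheet: items 1, 2 and 5 match the key, item 3 is wrong,
# and item 4 was left blank.
sheet = ["b", "d", "c", None, "b"]
print(score_answer_sheet(sheet))  # 3
```

Computerized scoring of answer sheets follows the same logic; only the way the marks are read differs.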
Advantages of Group Tests:
• Such tests save administration time.
• A large number of test takers can be examined simultaneously.
• Group tests are a good source of quick data collection for research projects.
• If quick decisions are to be made, such as screening or school admissions, then group tests are very useful.
• Test administration is easy from the point of view of the examiner, especially because there is little pressure
on the examiner to take notes of individual expressions, explanations, or clarifications on the part of the
examinee.
• The personality of the examiner has little impact on the performance of the examinees, whereas in individual
testing it can have a considerable impact.
• The role of the examiner is minimal. If desired, tape-recorded instructions may also be used.
Disadvantages of Group Tests:
In spite of a number of advantages, the group administered tests entail some disadvantages as well:
• Group tests are very impersonal. The personal touch and rapport that are an ingredient of individual tests are
not found in group tests.
• Group administration is not flexible like individual testing. In individual administration, the examiner may
repeat instructions, or may rephrase them according to the needs of the test taker.
• In group testing all subjects attempt all items, whereas in individual testing the basal and ceiling rules
allow the testing process to be tailored to the ability level of the test taker.
• In group testing the test takers’ motivation level cannot be judged, maintained, or enhanced.
• The examiner is deprived of the verbal and non-verbal cues that can reflect the examinees’ anxiety,
confusion, discomfort, etc. Nor can the examiner do anything to put individual examinees at ease.
• Most group tests require the subjects to have reading skills and the ability or practice in paper-and-pencil
manipulation to record their responses. There may be some test takers in the group who do not have these
abilities and who may remain undetected because of the nature of the testing process. Their scores will
therefore not accurately represent their true potential.
Group Tests and Batteries:
Following is a description of some of the commonly used and popular group administered intelligence tests.
These are just a few of the large number of available intelligence tests. This will also give you a flavor of the
various purposes for which these tests may be used.
Kuhlmann- Anderson Test- 8th Edition (KAT):
KAT is a group intelligence test. It can be used with children from kindergarten to 12th grade. It is primarily a
nonverbal test, and this holds for all grade levels. The test has eight levels that cover these school grades, and
a number of tests are found in each level. KAT is popularly regarded as a very good test of mental ability. The
test has strong psychometric properties: its norms were obtained from a sample of more than 10,000, and it has
high reliability and validity coefficients.
Henmon-Nelson Test (H-NT):
H-NT is also considered to be a good test of mental ability or intelligence. The test has two sets of norms,
grade-wise and age-wise. The 90-item test can be completed in around 30 minutes and can be used for all grade
levels. Rather than considering multiple intelligences, this test provides a single score relating to Spearman’s
g factor. H-NT also has high reliability coefficients, in the .90s, and validity coefficients in the .50s to .90s.
Cognitive Abilities Test (COGAT):
The COGAT yields three scores: verbal, nonverbal, and quantitative. The test has been designed very carefully,
and attempts have been made to remove sources of difficulty such as cultural bias. It was developed specially for
poor readers, those who were poorly educated, and those who had English as a second language. A statistical
analysis of items was done to identify any items that predicted differentially for white and minority students.
Such items were removed from the test. The purpose was to eliminate bias against any group from the test content.
There are three subtests, which can be completed in 32-34 minutes each. However, the test manual recommends that
the testing be spread over 2-3 days. Good psychometric properties have been reported for the test. The most
positive point about this test is that it is a better tool than other tests for the assessment of culturally
diverse, minority, and economically disadvantaged children (Kaplan & Saccuzzo, 2001).
Some research evidence suggests that verbal underachievement can be measured well with this tool (Langdon,
Rosenblatt, & Mellanby, 1998). It has also been reported as a sensitive discriminator for giftedness (Harry,
Adkins, & Sherwood, 1984) and a tool that can make good predictions about future performance (Henry &
Bardo, 1990).
The reliability coefficients reported for COGAT are very high, in the .90s.
Lesson 30
Specific Purposes Tests
The Scholastic Assessment Test (SAT-I):
The Scholastic Assessment Test, or SAT-I, was previously known as the Scholastic Aptitude Test or SAT. First used
in 1926, the test is the most commonly used college entrance test in the U.S. SAT-I has two parts, each a
reasoning test that comprises further subtests.
Verbal Reasoning:
This part consists of 78 questions in all, to be completed in 75 minutes. The distribution of items is as follows:
• Sentence Completion; 19 questions
• Critical Reading; 40 questions
• Analogies; 19 questions
Mathematical Reasoning:
This part contains 60 questions. The questions are distributed among the following subtests:
• Regular Mathematics; 35 multiple choice questions
• Student- Produced Responses; 10 questions
• Quantitative Comparisons; 15 questions
The test norms were obtained from a large representative sample. SAT-II is also available; it comprises Subject
Tests including:
• A direct writing test
• Tests in Asian languages
• English-as-a-Second-Language Proficiency Test
Graduate and Professional School Entrance Tests:
Graduate school entrance tests are widely used for admission to graduate schools and professional degree
programs such as medicine, art, and law.
Graduate Record Examination Aptitude Test (GRE):
The Graduate Record Examination Aptitude Test is the most commonly used graduate-school entrance test. The
GRE measures general scholastic ability and is used, along with grade point average and letters of
recommendation, as part of the general selection process for graduate school. The three sections of the GRE are
verbal (GRE-V), quantitative (GRE-Q), and analytic (GRE-A). The verbal section includes measurement of reasoning,
antonyms, analogies, and paragraph comprehension. GRE-Q purports to measure reasoning, algebra, and geometry. In
addition, the GRE also measures general achievement in at least 20 majors, such as psychology, history, and
chemistry.
Although the psychometric properties of the GRE are not very impressive, it is treated as a relatively strong
instrument. Studies of the relationship between Grade Point Average (GPA) and GRE scores have reported
correlations from .22 to .33. The GRE, in combination with GPA, has been shown to be a good predictor of graduate
success.
Miller Analogies Test:
The Miller Analogies Test (MAT) is the second major, widely used scholastic aptitude test. The MAT is a 50-minute
verbal test that measures a student’s ability to find logical relationships in 100 different analogy problems.
The MAT offers special norms for various fields. Research has indicated that the MAT has an age bias, as its
scores overpredicted GPAs for the 25-34 years group and underpredicted them for the 35-44 years group.
The psychometric properties of the MAT are adequate, but it does not predict research ability, creativity, and
other factors important in graduate school.
Nonverbal Group Ability Tests:
Nonverbal tests are used for the evaluation of individuals without the use of language. The individuals are
usually asked to perform tasks such as drawing, solving a maze, or identifying a problem figure from a set of
figures presented before them.
Raven Progressive Matrices:
Raven Progressive Matrices (RPM) is a nonverbal, multiple-choice measure of general intelligence. In each
test item, the subject is asked to identify the missing element that completes a pattern. The test can be
administered to groups or individuals, from 5-year-olds to older adults. The Raven’s test contains 60 matrices,
each with a missing part, presented in graded difficulty. The subject has to select the appropriate pattern from
a group of eight options.
Research has shown that the test measures general intelligence along with the capacity to think clearly and make
sense of complex data. In spite of criticism of its psychometric properties, RPM is widely used with children,
the language-handicapped, and the culturally deprived. The updated manual of Raven Progressive Matrices provides
a comparison of the performance of children from major cities of the world. The RPM has minimized the
effects of language and culture.
Goodenough-Harris Drawing Test:
The Goodenough-Harris Drawing Test (G-HDT) is the quickest, easiest, and least expensive nonverbal test for
measuring intelligence. The subject is asked to draw a whole human figure. The drawing is scored for each item
included in it, and the subject can earn up to 70 points. The G-HDT scoring follows the age differentiation
principle: older children tend to get more points because of greater accuracy. The test has good psychometric
properties. Scores on the G-HDT can be related to Wechsler IQ scores. The test is most appropriately
used in combination with other tests of intelligence.
IPAT Culture Fair Intelligence Test:
The purpose of nonverbal and performance tests is to remove cultural influences from the assessment of
intelligence and learning. The IPAT Culture Fair Intelligence Test is a paper-and-pencil test with three levels:
ages 4-8 and mentally disabled adults; ages 8-12 and randomly selected adults; and high-school age and
above-average adults.
Group Tests For Specific Purposes:
Along with the large body of group tests for intelligence, academic aptitude, and personnel and occupational
selection, a number of tests are available for specific populations. For example, the Black Intelligence Test of
Cultural Homogeneity (BITCH) is used as a culture-fair intelligence test for African Americans.
Tests for use in industry: Wonderlic Personnel Test:
This test helps in making decisions concerning employment, placement, and promotion. The Wonderlic
Personnel Test (WPT) is based on the popular Otis Self-Administering Tests of Mental Ability. The WPT is a
quick (12-minute) test of mental ability in adults. Its validity is poorly documented; nevertheless, the test is
widely used for employee-related decisions in industry.
Tests for Assessing Occupation Aptitude:
Among the variety of available group tests, the General Aptitude Test Battery (GATB) is a widely used ability
test that measures aptitude for various occupations. The U.S. Employment Service developed the GATB for
employment decisions in government agencies. The test provides scores for motor coordination, form perception,
and clerical perception, along with verbal, numerical, and spatial aptitudes. The test is criticized for its
normative data.
The Differential Aptitude Test, the Bennett Mechanical Comprehension Test, and the Revised Minnesota Paper
Form Board Test are used to assess mechanical ability and clerical competence.
Armed Services Vocational Aptitude Battery:
The Armed Services Vocational Aptitude Battery (ASVAB) was designed for students in grades 11-12 and for
postsecondary students, originally for use by the Defense Department. The test scores can be used in both
educational and military settings. The ten subtests of the ASVAB are: general science, arithmetic reasoning, word
knowledge, paragraph comprehension, numerical operations, coding speed, auto and shop information,
mathematics knowledge, mechanical comprehension, and electronics information. These ten subtests are grouped
into composites: academic composites (academic ability, verbal, and math); four occupational composites
(mechanical and crafts, business and clerical, electronic and electrical, and health and social); and overall
general ability. The ASVAB has very good psychometric properties, and the military has recently started
administering it adaptively in a new computerized format rather than as a traditional paper-based test.
Tests for Choosing Careers:
A number of psychological tests are available to help individuals choose the right career depending
on their interests and aptitudes. The first step in career selection is the evaluation of interests. The Carnegie
Interest Inventory, introduced in 1921, was the first interest inventory; it provided measures for 15 different
interests. Nowadays more than 80 interest inventories are available for measuring interests.
The Strong Vocational Interest Blank (SVIB):
The most widely used interest test is the Strong Vocational Interest Blank, which was developed in 1927. Strong
developed the SVIB using the criterion-group approach: the subject’s interests were matched with those of a
criterion group of people who were happy in their selected careers. The 399 items of the SVIB relate to 54
occupations for men and 32 occupations for women, which are presented separately. Items in the SVIB were weighted
according to how frequently an interest occurred in a particular occupation group.
The test has strong psychometric characteristics, and its most interesting finding is that patterns of interest
remain relatively stable over time. Studies have also shown that patterns of interest are usually established by
the age of 17 years. Though the SVIB has been used widely, it is criticized for gender bias, because it used
different scales for men and women, and for its lack of theoretical grounding.
The Kuder Occupational Interest Survey:
The second most popular interest test is the Kuder Occupational Interest Survey (KOIS).
The test taker has to select the most preferred and the least preferred activity in each of 100 triads of
alternative activities. The KOIS assesses the similarity between the test taker’s interests and the interests of
those who are employed in various occupations.
Separate norms are available for men and women, in addition to separate scales for college majors. The KOIS
therefore helps students in choosing majors. A series of new scales has been added to the KOIS for nontraditional
occupations. In spite of a lack of research data, the KOIS is useful for guidance decisions for high-school and
college students.
Lesson 31
Tests for Special Populations
Scales for Infants:
Brazelton Neonatal Assessment Scale (BNAS):
1. Population: infants
2. Age group: 3 days to 4 weeks
3. Aim: to measure a newborn’s competence
4. Total scores: 47 (20 elicited responses and 27 behavioral items)
5. Areas of functioning covered: social, behavioral, and neurological
6. Examples of factors assessed: reflexes, motor maturity, ability to habituate to sensory stimuli, startle
reactions, cuddliness, responses to stress, hand-mouth coordination
The BNAS is considered a good assessment and research tool. However, its test-retest reliability is not
satisfactory. Also, if prediction of future intelligence is required, this scale cannot be of help.
Gesell Developmental Schedules (GDS):
1. Population: infants and children
2. Age group: 2.5 to 6 years
3. Aim: To measure infants’ and children’s intelligence by assessing the subject’s developmental status
4. Final score: Developmental quotient (DQ), which can be used to calculate IQ. The formula is the same as the
one for IQ, with the value of MA replaced by DQ.
5. Areas: gross motor, fine motor, adaptive, language, personal-social
6. Other names: Gesell Maturity Scale, the Gesell Norms of Development, and the Yale Tests of Child
Development
Originally developed in 1925, this tool has been used widely; however, it has certain shortcomings. For example,
it cannot be used as a predictor of future intelligence. It is also criticized for poor psychometric properties,
an inadequate sample, and weak standardization.
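The Gesell calculation (the IQ formula with DQ in place of MA) can be sketched as follows. This assumes the standard ratio IQ formula, IQ = (MA / CA) × 100, where CA is chronological age. The function name `ratio_iq`, the use of months as the unit, and the example values are assumptions made for illustration and are not taken from the Gesell manual:

```python
def ratio_iq(developmental_level_months, chronological_age_months):
    """Ratio IQ with DQ in place of MA: IQ = (DQ / CA) * 100.

    Assumption: both values are expressed in the same unit (months here).
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * developmental_level_months / chronological_age_months


# Hypothetical child aged 24 months who performs at the 30-month level.
print(ratio_iq(30, 24))  # 125.0
```

A child performing exactly at age level would obtain 100 under this formula.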
Bayley Scales of Infant Development: 2nd Edition (BSID-II):
1. Population: infants
2. Age group: 2- 30 months
3. Aim: To measure infants’ cognitive and motor functions
4. Final score: Two main scores, mental and motor
5. Areas: gross motor, fine motor, adaptive, language, personal-social
The Bayley scale is valued and appreciated for its good standardization. Although this scale, too, cannot
predict future intelligence, it is considered a useful tool that can predict well for children who are mentally
retarded.
Cattell Infant Intelligence Scale (CIIS):
Population: infants
Age group: 2- 30 months
Aim: To measure infants’ intelligence
Areas: gross motor, fine motor, adaptive, language, personal-social
The CIIS is an age scale that uses the concept of mental age and IQ and is designed after the Binet Scale. It is
referred to as a downward extension of Binet’s Scale.
Assessment of the Mentally Retarded:
Vineland Adaptive Behavior Scales (VABS):
The VABS is the latest version of the Vineland Social Maturity Scale, developed by Edgar Doll in the 1930s. Doll
developed it after observing differences among mentally retarded patients. He devised a standardized
record form for the assessment of developmental level. The level was determined by considering how well
individuals managed the following:
• Looking after their practical needs, and
• Taking responsibility in daily living
The VABS is available in three versions. The following domains and subdomains are covered in this tool:
1. Communication
• Receptive
• Expressive
• Written
2. Daily living Skills
• Personal
• Domestic
• Community
3. Socialization
• Interpersonal relationships
• Play and leisure time
• Coping skills
4. Motor skills
• Gross
• Fine
5. Adaptive behavior composite
6. Maladaptive behavior
Issues of Intelligence Testing:
No matter how we understand intelligence and what approach to its measurement we adopt, we should keep
these issues in mind:
• Is intelligence innate or acquired?
• Is intelligence a stable phenomenon?
• Can the gender, race, culture, or region of the subjects affect their scores?
Standardized intelligence testing has been called one of psychology's greatest successes. But intelligence testing
has also been criticized on a number of grounds: for bias related to race, gender, class, and culture; for
minimizing the importance of creativity, character, and practical know-how; and for propagating the idea that
people are born with an unchangeable intellectual potential that determines their success in life. Some issues of
intelligence testing include the following:
Nature versus Nurture:
A number of studies have shown the importance of both heredity and environment in intelligence. Twin studies
have shown that identical twins who were reared apart obtain similar intelligence test scores. Children born to
poverty-stricken parents but adopted and brought up by better-educated, middle-class families tend to have higher
intelligence test scores. Natural mothers with higher IQs tend to have children with high intelligence regardless
of the place and family in which the children are brought up.
The proponents of the nurture view emphasize the role of the prenatal and postnatal environment, socioeconomic
status, educational opportunities, and parental modeling in intelligence test performance. The interactionist
position, however, holds that intelligence is the result of the interaction between heredity and environment.
The Measurement Process:
The process of intelligence testing itself involves a number of issues, such as the instrument used, the
standardization sample, the test administrator, and the accuracy of test scoring. The examinee’s test-taking
attitude and prior coaching, and the test administrator’s training, are factors that may affect intelligence test
scores.
Personality:
Research has shown that several personality and intelligence tests overlap. Wechsler (1958)
believed that tests of intelligence measure traits of temperament and personality such as drive, energy level,
impulsiveness, persistence, and goal awareness.
Longitudinal studies have shown positive relationships between various personality characteristics and
intelligence scores. Higher intelligence scores are found to be associated with a high need for achievement,
competitive striving, self-confidence, and emotional stability. On the other hand, low intelligence scores are
expected among individuals showing passivity, dependence, and maladjustment.
Gender:
Extensive research has been done to find out whether there are gender differences in intelligence. The findings
have revealed that the differences found between genders are the result of psychosocial and physiological factors.
Family Environment:
The family environment includes aspects of both nature and nurture. Twin studies have shown the
significant importance of heredity and family environment. Issues such as parental use of language, parental
stress on achievement, access to resources and exposure to the world, and parental influence over discipline and
policies in the home environment are also matters of interest for researchers.
Studies on maternal age and social class have shown that older mothers tend to have children with higher IQs.
Lesson 32
Personality Testing
Before we discuss the various approaches to the assessment of personality, we need to understand what
personality is. Personality has been defined in many ways:
• The sum total of characteristics on the basis of which people can be differentiated from each other.
• The stability in a person’s behavior across different situations.
• Characteristic ways in which people behave.
• Characteristics that are relatively enduring, and that make us behave in a consistent and predictable way.
Methods of Assessment of Personality:
• Interview
• Observation and behavioral assessment
• Psychological tests
1. Interview:
An interview is a direct, face-to-face encounter and interaction between the psychologist and the person being
assessed. Verbal as well as non-verbal information is available to the psychologist. Interviews are usually used to
supplement information gathered through other sources. The skill of the interviewer is very important, since the
worth and utility of the interview depend on how well the interviewer can draw relevant information from the
interviewee.
2. Behavioral Assessment:
It refers to direct observation of behavior, for investigating, understanding, and describing personality
characteristics. Skill and expertise of the observer are the most significant ingredients of the observation process.
3. Psychological Tests:
Psychological tests are standard measures devised in order to objectively assess personality and behavior. Like
any other type of psychological test, personality tests also have to be valid and reliable. Availability of norms
is an additional characteristic.
Psychological tests are generally of two types:
1. Objective tests/ personality inventories/ self- report measures
2. Projective tests
Objective Tests/ Personality Inventories/ Self- Report Measures:
These are measures wherein the subjects are asked questions about a sample of their behavior.
For example, the MMPI (Minnesota Multiphasic Personality Inventory) is the most frequently used personality test.
It was initially developed to identify people having specific sorts of psychological difficulties, but it can
predict a variety of other behaviors too. It can identify problems and tendencies such as Depression, Hysteria,
Paranoia, and Schizophrenia.
Projective Tests/ Techniques:
Tests in which the subject is first shown an ambiguous stimulus and then has to describe it or tell a story
about it are known as projective tests.
The most famous and frequently used projective tests are:
• Rorschach test, and
• TAT or Thematic Apperception Test
How is The Content of a Personality Test Decided?
What the test will contain and how it will measure personality will be affected by the theoretical orientation of
the test developer. Similarly, the choice of a test to assess personality also depends on how one defines
personality.
1. Psychodynamic Approach:
This approach focuses upon the unconscious determinants of personality, i.e., psychologists belonging to this
approach believe that unconscious forces determine our personality. The unconscious is the part of personality
of which we are not aware. It contains instinctual drives: infantile wishes, desires, demands, and
needs.
A test based on this orientation will therefore try to unfold and explore the unconscious.
2. Trait Approaches:
These are the approaches that propose that there are certain traits that form the basis of an individual’s
personality. These approaches seek to identify the basic traits necessary to describe and understand personality.
Traits are enduring dimensions of personality characteristics that differentiate a person from others. Trait
theories do not treat traits as either present or absent in different people, i.e., as an either/or situation.
Rather, they assume that some people are relatively high on some traits whereas others are low on the same traits.
Trait theories based upon factor analysis:
Factor analysis is a statistical method whereby the relationships among a large number of variables are
summarized into fewer patterns. These patterns are more general in nature.
For example, a researcher prepares a list of traits that people may like in an ideal man.
The extensive list is then administered to a large number of people, who are asked to choose the traits that
describe an ideal man. Through factor analysis, the responses are statistically combined and the traits
associated with one another in the same set (or person) are computed. Thus the most fundamental patterns are
identified. These patterns are called factors.
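A full factor analysis needs a statistics package, but its starting point, finding traits whose ratings rise and fall together, can be illustrated with plain Python. The sketch below is not a factor analysis: it only computes Pearson correlations and groups traits whose correlation exceeds a cutoff, as a crude stand-in. The ratings, the trait names, and the 0.7 cutoff are all made up for illustration:

```python
from math import sqrt

# Hypothetical 1-5 ratings given by six respondents to four traits
# (invented data, for illustration only).
RATINGS = {
    "kind":     [5, 4, 5, 2, 1, 2],
    "caring":   [4, 5, 5, 1, 2, 2],
    "strong":   [5, 1, 2, 5, 1, 2],
    "athletic": [5, 2, 1, 5, 1, 2],
}


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def group_traits(ratings, cutoff=0.7):
    """Put each trait in the first group whose members all correlate
    with it above `cutoff`; otherwise start a new group."""
    groups = []
    for trait, scores in ratings.items():
        for group in groups:
            if all(pearson(scores, ratings[other]) > cutoff for other in group):
                group.append(trait)
                break
        else:
            groups.append([trait])
    return groups


print(group_traits(RATINGS))
# [['kind', 'caring'], ['strong', 'athletic']]
```

In this invented data set the two warmth-related traits and the two strength-related traits form separate groups. A real factor analysis would go further and estimate such groupings as factors, with a loading for every trait on every factor.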
Raymond Cattell’s Sixteen Personality Factors:
After using factor analysis, Cattell proposed that two types of characteristics form our personality: surface
traits and source traits.
Eysenck’s Dimensions of Personality:
According to Eysenck, personality can be understood and described in terms of just two major dimensions:
introversion-extroversion and neuroticism-stability.
On the first dimension, people can be rated on a range from introverts to extroverts; the rest of the traits fall
in between. The second dimension is independent of the first one, and ranges from being neurotic to being stable.
Introverts are quiet, passive, and careful people. Extroverts are outgoing, sociable, and active people. Neurotics
are moody, touchy, and anxious people. Stable people are calm, carefree, and even-tempered.
Eysenck evaluated a number of people along these dimensions. Using the information thus obtained, he could
accurately predict people’s behavior in a variety of situations.
3. Social Cognitive Approach to Personality:
This approach emphasizes the role of people’s cognitions in determining their personalities. Cognitions
include people’s thoughts, feelings, expectations, and values. The approach considers these “inner” variables to
be important in determining one’s personality. It also emphasizes the reciprocity between individuals
and their environment. There exists a web of reciprocity, consisting of the interaction of the environment and
people’s behavior. Our environment affects our behavior, and our behavior in turn influences our environment
and causes modifications in it. The modified environment, in turn, affects our behavior.
Lesson 33
Objective / Structured Tests of Personality
The objective measures of personality are also known as the structured measures.
“Structured measures of personality are characterized by structure and lack of ambiguity. A clear and definite
stimulus is provided, and the requirements of the subject are evident and specific” (Kaplan & Saccuzzo, 2001, p.
406).
Advantages of Structured Tests:
• They are easily administered.
• The subjects can record their own responses using paper and pencil.
• They are easy to score.
• They can be group administered.
• Their scoring is uniform for all and the scorers’ likes, dislikes, or theoretical orientations do not interfere
with the results.
• They are time- economical.
Some Popular Personality Inventories:
Woodworth Personal Data Sheet:
Developed in: During the First World War. Its final form was published after the war
(Woodworth, 1920).
Developed for: Identifying recruits who were likely to break down in combat. The recruits who reported
many symptoms were called for interview. They were the ones who were most likely to be
rejected.
Forms: Single
Format: It was like a paper-pencil psychiatric interview. The items were chosen from psychiatrists’
questions asked in screening interviews and lists of known symptoms of emotional
disorders.
Items: 116 questions with a ‘yes’ ‘no’ format.
Score: A single score was obtained from the data sheet as it was designed as a global measure of
functioning.
Individual/ group: The Woodworth was used for mass screening. Many new tests followed in its footsteps.
Mooney Problem Checklist:
Developed in: 1950
Developed for: Identifying problems experienced/ faced by the subject.
Items: Checklist of problems. The problems included in the checklist were chosen from problems
reported in statements of around 4000 high-school students as well as in clinical case
history data.
Score: The checked items indicate the problems experienced by the respondent.
Weak points: For the sake of interpretation, one has to rely on the reported problems. There is no way to
check if the reports are true or not. All that counts is face validity of responses.
Minnesota Multiphasic Personality Inventory (MMPI):
The MMPI was developed by S. R. Hathaway and J. C. McKinley and first published in the 1940s (Hathaway &
McKinley, 1940, 1942, 1943, 1951). It was meant to be used with people 14 years of age and above. It was first
published by the University of Minnesota Press in 1943.
Developed in: 1940s
Developed for: It was initially developed to identify people having specific sorts of psychological
difficulties or to detect major psychiatric or psychological disorders. It can predict a variety
of other behaviors too. It can identify problems and tendencies like Depression, Hysteria,
Paranoia, and Schizophrenia. It can be said that the main goal is to differentiate abnormal
persons from normal.
Forms: MMPI, MMPI-2, and MMPI-A
Items: It is a self-report measure. It contains statements that have a true/false format. The subject
has to tell if the different statements are ‘true’ about them or ‘false’. There are a total of 566
items in MMPI and 567 in MMPI-2.
Difference between MMPI and MMPI-2:
The MMPI had a total of 566 items, of which 16 were repeated items.
In MMPI-2 a number of items were dropped: the 16 repeated items of the MMPI, 77 items
from items 399 to 550, and 13 items from the clinical scales. 460 items were retained from the
original test. 107 new items were added in MMPI-2, bringing the total to 567: two critical items added to
identify severe pathology, 81 items added to new content scales, and 24 items added for
experimental purposes. These 24 were the unscored items.
MMPI-A is the version developed for adolescents. It was observed over the years that the
response/score pattern of adolescents on earlier versions reflected some problems. This
group tended to score higher than adults on the clinical scales. Therefore a need was felt
for a scale specifically for adolescents. MMPI-A contains 478 items, 88 items fewer than the
original MMPI. It can be used in school settings, educational counseling, and psychiatric
setups. The format and scales are the same as in the other versions, and it follows a true/false
format.
Similarities between MMPI and MMPI-2:
The use and interpretation are the same for both tests.
MMPI scales: It contains three validity scales and ten clinical scales.
Score: The scores on the MMPI scales are used to plot a profile of the subject which shows
tendencies/pathologies on which a subject is high or low.
Validity scales: • Lie scale, 15 items, to detect naïve attempts to ‘fake good’.
• K scale, 30 items, to identify defensiveness.
• F scale, 64 items, to detect attempts to ‘fake bad’.
Clinical scales: • Hypochondriasis, 33 items, for physical complaints
• Depression, 60 items, for detecting depression
• Hysteria, 60 items, for detecting immaturity
• Psychopathic deviate, 50 items, for authority conflict
• Masculinity-femininity, 60 items, for masculine or feminine interests
• Paranoia, 40 items, for identifying suspicion and hostility
• Psychasthenia, 48 items, for detecting anxiety
• Schizophrenia, 78 items, for detecting alienation and withdrawal
• Hypomania, 46 items, indicates high energy and elated mood level
• Social introversion, 70 items, yields a score for shyness and introversion.
Individual/ group: It has been used in both ways.
Prerequisite: The subject should have an IQ within the normal range. Also, MMPI requires a grade 6
reading ability, whereas MMPI-2 requires a grade 8 reading ability.
Strong points: The inventory includes validity scales/ lie scale that help identify faking. It can be identified
if people have been wrongly reporting pathological inclinations/symptoms or intentionally
avoiding pathological content. The standardization of all versions was done on large
samples.
Weak points: It is a lengthy test and requires a long duration of time for administration.
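The item counts quoted above for the MMPI versions can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names and dictionary keys are illustrative labels, not part of any test manual.

```python
# Item counts as quoted in the text; keys are illustrative labels only.
MMPI_ITEMS = 566
MMPI_A_ITEMS = 478

dropped = {"repeated": 16, "items_399_to_550": 77, "clinical_scales": 13}
added = {"critical": 2, "content_scales": 81, "experimental_unscored": 24}

# Items of the original MMPI that survive into MMPI-2.
retained = MMPI_ITEMS - sum(dropped.values())

# Retained items plus the three groups of added items.
mmpi2_items = retained + sum(added.values())

# How much shorter MMPI-A is than the original MMPI.
shorter_by = MMPI_ITEMS - MMPI_A_ITEMS

print(retained, mmpi2_items, shorter_by)  # 460 567 88
```

The three listed groups of added items sum to 107, which together with the 460 retained items gives the 567 items reported for MMPI-2.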
California Psychological Inventory-Revised Edition (CPI):
The CPI (Gough, 1987) is one of the widely used personality inventories. It follows the pattern of the MMPI and
many of its items are the same as the ones in the MMPI.
Developed for: Personality assessment in normally adjusted individuals.
Scales: CPI has 20 scales. Each of its 20 scales is grouped into one of four classes.
What the classes of scales measure:
Class I: Poise, self-assurance, and interpersonal effectiveness.
• High score: Resourceful, active, competitive, outgoing, spontaneous, self-confident, at
ease in interpersonal situations.
Class II: Socialization, maturity, and responsibility.
• High score: Honest, dependable, conscientious, calm, practical, cooperative. Also,
alertness to social issues.
Class III: Achievement potential and intellectual efficiency.
• High score: Efficient, organized, capable, forceful, knowledgeable, mature, and sincere.
Class IV: Interest modes.
• High score: Socially well adapted, responsive to others’ needs.
Items: 462 items
Individual/ group: Can be used in both settings.
Advantages: CPI can be used with normal subjects.
Cattell’s Sixteen Personality Factor Questionnaire (16 PF):
At least five editions of 16PF are available. This test/questionnaire covers the following primary source traits:
A. Cool - Warm
B. Concrete thinking - Abstract thinking
C. Affected by feelings - Emotionally stable
E. Submissive - Dominant
F. Sober - Enthusiastic
G. Expedient - Conscientious
H. Shy - Bold
I. Tough-minded - Tender-minded
L. Trusting - Suspicious
M. Practical - Imaginative
N. Forthright - Shrewd
O. Self-assured - Apprehensive
Q1: Conservative - Experimenting
Q2: Group-oriented - Self-sufficient
Q3: Undisciplined self-conflict - Following self-image
Q4: Relaxed - Tense
Shortcomings of Structured Tests:
• These tests are of less help if in depth information about the subject’s personality is required.
• Their format is fixed and cannot be molded according to the respondent’s needs. For example it is difficult
to find out if the subject is facing problems in understanding the wording of test items, and if so no
alterations can be made in the wording or format. This is more so if the test is being group administered.
• The accurate endorsement of the responses depends on the skill and understanding of the subject as well as
the skill of the psychologist/ examiner in test administration.
• Response bias: at times people may have a tendency to mark all items in the same pattern e.g., all true or all
false responses.
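The response-bias problem in the last point can be screened for mechanically. The following is a minimal sketch, not a published scoring rule: the function names and the 0.95 cut-off are assumptions chosen for illustration, and a flagged protocol would still need the examiner's judgment.

```python
from collections import Counter


def dominant_response_share(responses):
    """Share of answers taken up by the single most frequent response."""
    if not responses:
        raise ValueError("at least one response is required")
    _, count = Counter(responses).most_common(1)[0]
    return count / len(responses)


def looks_like_response_set(responses, threshold=0.95):
    """Flag protocols marked almost entirely one way (threshold is an assumed cut-off)."""
    return dominant_response_share(responses) >= threshold


all_true = ["T"] * 20
mixed = ["T", "F"] * 10
print(looks_like_response_set(all_true))  # True
print(looks_like_response_set(mixed))     # False
```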
Lesson 34
Projective Personality Tests
Projective tests are tests in which the subject is first shown an ambiguous stimulus and then has to
describe it or tell a story about it. The two most famous and frequently used projective tests are:
I. Rorschach test, and
II. TAT or Thematic Apperception Test
Rorschach Ink Blot Test:
• The test consists of inkblots. These have no definite shape.
• The shapes are symmetrical, and are presented to the subject on separate cards.
• Some cards are black and white and some are colored.
Procedure of Rorschach administration:
The subject is shown the stimulus card and then asked what the figures represent to them. The responses
are recorded.
Using a complex set of clinical judgments, the subjects are classified into different personality types. The skill and
the clinical judgment of the psychologist or the examiner are very important.
Thematic Apperception Test (TAT)
Developed in: The TAT was developed by Christiana D. Morgan and Henry Murray in 1935 while they were
working at the Harvard Psychological Clinic.
Population: Originally the TAT was developed for patients in psychoanalysis to obtain raw data.
Developed for: The main objective of the TAT is to evaluate a person's patterns of thought, attitudes, observational
capacity, and emotional responses to ambiguous test materials.
Scoring: The interpretive system of the TAT looks at the individual/person described in the story, that person's
needs, and the demands of the environment created by the storyteller.
Among the variety of scoring interpretations available, the scoring system is based on Murray’s personality
theory.
Items: The test contains 30 black-and-white picture cards showing a variety of situations, including human
figures.
Some cards are suggested for use with adult males or females and some are used with children.
Forms: In clinical practice the TAT is used depending on the client’s need and situation. The practitioner
may present the prescribed number of 20 cards or, depending on the client’s story-telling
capacity, as few as 1-2 or as many as 30 cards.
Strong Points: The test has great intuitive appeal. It helps to identify the emotions and motivations that the
storyteller projects onto ambiguous stimuli.
The test can be used with ample liberty of administration and scoring for practitioners.
Weak Points: The psychometric properties of the TAT are debated, like those of other projective techniques. It has
a general lack of standardization in administration, scoring, and interpretation procedures.
Other Picture-Story Tests:
Children’s Apperception Test (CAT)
Developed in: The CAT was developed by Leopold Bellak in 1949.
Population: The CAT was developed for children ages 3-10 years.
Developed for: The main objective of CAT is to measure the personality traits, attitudes, and psychodynamic
processes evident in children.
Scoring: Scoring of the Children's Apperception Test is not based on objective scales; it must be
performed by a trained test administrator or scorer. The scorer's interpretation should take
into account: the story's primary theme; the story's hero or heroine; the needs or drives of the
hero or heroine; the environment in which the story takes place; the child's perception of the
figures in the picture; the main conflicts in the story; the anxieties and defenses expressed in
the story; the function of the child's superego; and the integration of the child's ego.
Items: The test contains black-and-white picture cards and uses animal figures instead of humans.
Forms: In addition to the original CAT, an alternative version called CAT-H has also been published.
The Picture Story Test
Developed in: The Picture Story Test was developed by Symonds in 1949.
Population: The test was developed for use with adolescents.
Developed for: The main objective of Picture Story Test is to elicit stories related to specific situations like
coming home late, leaving home and planning for the future.
Items: The test contains 20 picture cards.
The Education Apperception Test and the School Apperception Method
Developed in: The Education Apperception Test was developed by Thompson and Sones in 1973 and the
School Apperception Method was designed by Solomon and Starr in 1968.
Developed for: These two picture instruments were designed to tap children’s attitude toward school and
learning.
The Michigan Picture Test
Developed in: The Michigan Picture Test was developed in 1953.
Developed for: It is used to elicit various responses, ranging from conflicts with authority figures to feelings
of personal inadequacy.
Population: The test was developed for use with children between the ages of 8 and 14 years.
Items: The test contains 16 pictures.
Make A Picture Story Method
Developed in: Make A Picture Story Method was developed in 1952.
Developed for: The test helps to indicate the thoughts and feelings that the test taker projects.
Items: The test contains 67 cut-up figures of people and animals that may be presented on any of 22
pictorial backgrounds. A number of figures with blank faces and different background
settings, such as a living room, street, nursery, stage, and bridge, are available for use.
Tests Using Pictures as Projective Stimuli:
The Hand Test
Developed in: The Hand Test was developed in 1983 by Wagner.
Items: The test contains nine cards with pictures of hands on them and a tenth, blank card.
Scoring: The test taker is asked what the hands on each card might be doing. When presented with
the blank card, the test taker is instructed to imagine a pair of hands on the card and then
describe what they might be doing. The responses are interpreted according to 24 categories
such as affection, dependence and aggression.
The Rosenzweig Picture-Frustration Study
Developed in: The Rosenzweig Picture-Frustration Study was originally developed in 1947 by Rosenzweig.
Developed for: The test is based on the assumption that the test taker will identify with the person being
frustrated.
Forms: The test is available in forms for children, adolescents, and adults.
Scoring: The test taker's task is to fill in the response of the cartoon figure being frustrated. The responses
are scored in terms of the type of reaction elicited and the direction of the aggression
expressed.
Words as Projective Stimuli:
The first attempt at using words as a projective measure was made by Galton in 1879. Afterwards Cattell and
Bryant in 1889, Kraepelin in 1896, and Jung in 1910 used words as tests. The task in these tests is to interpret
the responses to the words.
Word Association Tests
Developed in: The test was developed by Rapaport, Gill and Schafer in 1946.
Developed for: The Word Association Test tries to evaluate responses with respect to variables like
popularity, reaction time, content, and test-retest responses.
Scoring: The length of the response time is recorded each time. The examinee is asked to clarify the
relationship that exists between the original word and the response word.
Items: The test consists of 60 words that are presented to the examinee in two parts. In the first part, the
examinee is asked to respond quickly with the first word that comes to mind. In the second part, the examinee is
again presented with the words and asked to reproduce the original responses.
The Kent-Rosanoff Free Association Test
Developed in: The test was developed in 1910.
Developed for: The test attempts to standardize the responses of individuals to specific words. The purpose
of the test is to identify the individuality of responses that may be influenced by
psychopathology and many other variables.
Items: The test consists of 100 stimulus words.
Sentence Completion Tests:
Sentence completion tests are another group of tests that use verbal material as projective stimuli. A number of
standardized tests are available for use.
Rotter Incomplete Sentence Blank (RISB)
Developed in: The RISB is a standardized test developed in 1950.
Population: The test was developed for use with populations from grade 9 through adulthood.
Scoring: The manual of the RISB suggests that responses on the test be interpreted according to
several categories: family attitudes, social and sexual attitudes, general attitudes, and character
traits. Each response is evaluated on a 7-point scale ranging from “need for therapy” to
“extremely good adjustment”.
Items: The test consists of 40 incomplete sentences.
Strong Points: The test may be used for obtaining diverse information relating to an individual’s interests,
educational aspirations, future goals, fears, conflicts, needs and so forth.
Weak Points: The sentence completion test is the most vulnerable of all the projective methods to faking by
an examinee intent on making a good or bad impression.
Production Figure Drawings:
The use of drawings in clinical and research settings has extended beyond the area of personality assessment.
Artistic productions are used as a source of information about intelligence, neurological intactness, visual-motor
coordination, cognitive development, and learning disabilities. Figure drawings are an appealing source of
diagnostic data.
Draw A Person test (DAP)
Developed in: The DAP is based on the work of Karen Machover (1949).
Scoring: The drawings of a person made by the examinee are evaluated for various characteristics, including
placement, size of the figure, pencil pressure used, symmetry, line quality, shading, the presence
of erasures, facial expressions, posture, clothing, and overall appearance.
Items: The test needs a simple pencil and 8 ½ by 11 inch paper, and the person is asked to draw a person.
The House-Tree-Person test (HTP)
Developed in: The HTP was developed and popularized by Buck in 1948.
Developed for: The drawings of a house, a tree, and a person are used as a reflective source of psychological
functioning.
Items: A pencil and white paper are required for the test, and the person is instructed to draw a picture of
a house, a tree, and a person.
Advantages of Projective Tests:
• In depth investigation
• Flexible nature
• Subject’s liberty to respond in whatever way
• Psychologist has access to nonverbal cues
• Used for psychodynamic examination
Disadvantages of Projective Tests:
• The psychologist has to be highly skillful
• These tests may be very time consuming
Lesson 35
Personality: Measurement of interests and attitudes
The First World War brought about many changes in the way psychologists were functioning. Many new
avenues of action and research were opened. New tests were developed, the scope of psychology became wider,
and the application of this discipline became more practically oriented. One of the areas thus explored and
worked upon was the study and assessment of interests. As a result many new tests were developed in the
following years.
Strong Vocational Interest Blank (SVIB):
E. K. Strong, Jr., and colleagues studied the activities of persons belonging to different professions. They
observed that different professionals had different likes and dislikes for different activities. Their interests
followed different patterns. It was observed that the hobbies of people working in the same profession were also
similar. They tended to indulge in similar pastime activities. Using the criterion-group approach, Strong developed a
test to see if the interests of a test taker matched the interests and values of people who were happy
in their chosen careers. The criterion group included persons belonging to various professions. The test is called the
Strong Vocational Interest Blank or SVIB. The 399 items of the SVIB are related to 54 occupations for men and 32
occupations for women, presented separately. Items in the SVIB were weighted according to the frequency of
occurrence of an interest in a particular occupational group as compared to how frequently it occurred in the
general population.
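The idea of weighting an item by comparing an occupational group with the general population can be illustrated with a small sketch. The text does not give Strong's actual weighting formula, so the simple difference of endorsement proportions used here is an assumption, as are the function names and the example figures.

```python
def interest_item_weight(p_occupation, p_general):
    """Illustrative weight: endorsement rate in the occupational group
    minus the rate in the general population (an assumed formula)."""
    for p in (p_occupation, p_general):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    return p_occupation - p_general


def occupational_scale_score(liked_items, weights):
    """Sum the weights of the items the test taker marked as liked."""
    return sum(weights[item] for item in liked_items)


# Hypothetical endorsement rates: (occupational group, general population).
rates = {"repairing engines": (0.80, 0.30), "writing poetry": (0.10, 0.25)}
weights = {item: interest_item_weight(*pair) for item, pair in rates.items()}
score = occupational_scale_score(["repairing engines"], weights)
```

An interest that is common in the occupational group but rare in the general population receives a large positive weight, while an interest that is rarer in the group than in the population receives a negative one.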
The test has strong psychometric characteristics. The normative samples used for the development of this
test were adequately sized: around 300 people were used in each criterion group. The raw scores obtained on the SVIB
were converted into standard scores, with a mean of 50 and an SD of 10. Research has shown that patterns of interest
are usually established by age 17 years. It was also observed that patterns of interest remain relatively stable over
time. In a study of Stanford University students who took this test first in the 1930s and then again later, it was
found that their interests remained relatively unchanged even after 22 years. Though the SVIB has been used
widely, it is criticized for gender bias, having used different scales for men and women. It was also often
criticized for not having a theoretical basis to explain why people
belonging to different professions were likely to have similar interests.
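The conversion of raw scores into standard scores with a mean of 50 and an SD of 10 can be sketched as follows. This is a generic linear transformation, not the SVIB's own scoring program; using the population standard deviation of the reference group and the sample raw scores shown are assumptions made for illustration.

```python
from statistics import mean, pstdev


def to_standard_scores(raw_scores, target_mean=50.0, target_sd=10.0):
    """Linearly rescale raw scores to the target mean and SD.

    Assumes the reference group's population SD (pstdev) is the right denominator.
    """
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)
    if group_sd == 0:
        raise ValueError("all raw scores are identical; SD is zero")
    return [target_mean + target_sd * (x - group_mean) / group_sd for x in raw_scores]


raw = [10, 20, 30]  # hypothetical raw scores
standard = to_standard_scores(raw)
```

A raw score equal to the group mean maps to 50, and a raw score one SD above the mean maps to 60.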
Holland (1975) presented his theory of vocational choice. According to that theory people’s personality is
expressed in their interests. Furthermore, considering people’s interests we can classify them into one or more
of the following six categories of personality factors:
a. Realistic
b. Investigative
c. Artistic
d. Social
e. Enterprising
f. Conventional
These factors were free of gender bias and could be used with both men and women. There was a similarity
between Holland’s factors and the patterns of interests yielded by research with the SVIB. This appealed to Campbell,
who incorporated this theory in his version of the SVIB.
The Strong-Campbell Interest Inventory (SCII):
The SVIB had certain features for which it was criticized. A new version was developed by D. P. Campbell
and was named the Strong-Campbell Interest Inventory or SCII.
Developed in: 1974
Population: Both men and women. It is a gender free version.
Special feature: Unlike the SVIB, the SCII does not have separate forms for men and women. The gender bias
in the SVIB was removed and the items from the male and female forms were merged into one
form. The scales in this form are free of any gender bias.
Developed for: The measurement of interest.
Items: There are 325 items. The response options include ‘like’, ‘dislike’, or ‘indifferent’. The present
version of SCII contains seven parts:
1. Occupations: 131 items
2. School subjects: 36 items
3. Activities: 51 items
4. Amusements: 39 items
5. Types of people: 24 items
6. Preference between two activities: 30 items
7. Your characteristics: 14 items
Forms: Now it has one form for everyone, male or female.
Scoring: Several scores are obtained for each profile.
1. General themes based on Holland’s (1999) six personality types
2. Scoring for the administrative indexes
3. A person’s basic interests
4. Occupational scales
Strong feature: The test follows Holland’s theory of vocational choice and therefore has a theoretical basis.
Also, it is not gender biased.
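As a quick consistency check on the SCII item counts listed under "Items", the seven parts can be summed. The sketch below uses only the figures given in the text; the dictionary is just a convenient container.

```python
# Number of items in each of the seven parts of the SCII, as listed in the text.
scii_parts = {
    "Occupations": 131,
    "School subjects": 36,
    "Activities": 51,
    "Amusements": 39,
    "Types of people": 24,
    "Preference between two activities": 30,
    "Your characteristics": 14,
}

total_items = sum(scii_parts.values())
print(len(scii_parts), total_items)  # 7 325
```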
The Kuder Occupational Interest Survey (KOIS):
As mentioned earlier, in lecture 30, another popular interest test is the Kuder Occupational Interest Survey
(KOIS).
The test taker has to select the most preferred and least preferred activity in each of 100 triads of alternative
activities. A triad is a set of three alternatives. The KOIS assesses the similarity between the test taker’s interests
and those of people who are employed in various occupations.
This instrument has separate norms for men and women. Separate scales for college majors are also available.
The KOIS helps in two major ways. Firstly, it suggests which occupational group may be best suited to a
person’s interests, and secondly it can assist students in choosing their majors. A series of new scales has been
added to the KOIS for nontraditional occupations. In spite of a lack of research data, the KOIS is useful for guidance
decisions for high-school and college students.
The Jackson Vocational Interest Survey (JVIS):
Developed in: The JVIS was developed by D. N. Jackson. The version revised in 1995 is commonly in use.
Population: The test is used for the career education and counseling of high-school and college students.
Developed for: It can be used for career planning of adults and for those seeking mid-life career changes.
Items: It contains 289 statements about job-related activities. The JVIS can be completed in around
45 minutes. The items follow a forced choice format and the respondent has to select one
interest that he/she prefers over the other.
Forms and scoring: Available in both hand-scored and machine-scored forms.
Strong Points: The test has strong psychometric properties. It carefully avoided gender bias.
The Minnesota Vocational Interest Inventory (MVII):
Population: The MVII is designed for men who are not oriented toward college. Skilled and semiskilled
trades are emphasized.
Developed for: The MVII has been used extensively by the military and by guidance programs for individuals
not going to college.
Items: The MVII has nine basic interest areas, and 21 specific occupational scales. The basic interest
areas include areas like mechanical interests, electronics, and food service. The occupational
scales cover occupations like plumber, carpenter and truck driver.
The Career Assessment Inventory (CAI):
Developed in: The test was developed by Charles B. Johansson in 1976.
Population: The test was developed for American citizens with less than four years of postsecondary
education. This segment of the population comprises around 80% of U.S. citizens
(Kaplan & Saccuzzo, 2001). The test requires a grade 6 reading level.
Developed for: Measuring interests.
Items: Test taker is evaluated on Holland’s six occupational theme scales.
Basic interests in 22 areas are also assessed.
89 occupation scales are used.
Strong Points: The test has good validity and reliability. The test developer also tried to make the CAI
culturally fair and eliminate gender bias.
The Self -Directed Search (SDS):
Developed in: J. L. Holland developed the Self-Directed Search.
Developed for: The test attempts to simulate the counseling process by allowing respondents to list
occupational aspirations, indicate occupational preference in six areas, and rate abilities and
skills in these areas.
Items: The test has 228 items. There are six scales that have 11 items each that describe activities.
Competencies are assessed by 66 items. Another six scales with 14 items each evaluate
occupations.
The respondents list their occupational aspirations, and indicate occupational preferences in
six areas. They also rate their abilities and skills in these areas.
Scoring: The SDS is a self-administered, self-scored, and self-interpreted vocational interest inventory.
The test takers score the inventory and calculate six summary scores. These scores and
further codes reflect areas of highest interest.
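How six summary scores can be turned into a code reflecting the areas of highest interest is sketched below. The sketch assumes the six areas are Holland's six themes (R, I, A, S, E, C) and that the code is formed from the three highest-scoring themes; the function name and the example scores are hypothetical, and ties are resolved simply by the order in which the themes are listed.

```python
def holland_summary_code(summary_scores, length=3):
    """Return the letters of the highest-scoring themes, highest first."""
    ranked = sorted(summary_scores.items(), key=lambda pair: pair[1], reverse=True)
    return "".join(letter for letter, _ in ranked[:length])


# Hypothetical summary scores for the six themes.
scores = {"R": 12, "I": 30, "A": 25, "S": 41, "E": 18, "C": 9}
print(holland_summary_code(scores))  # SIA
```

Here the test taker's strongest areas would be Social, Investigative, and Artistic, in that order.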
Eliminating Gender Bias in Interest Measurement:
The advocates of women’s rights justifiably pointed out discrimination against women in early interest
inventories. The Association for Evaluation in Guidance appointed the Commission on Sex Bias in Measurement,
which concluded that interest inventories contributed to the policy of guiding young men and women into
gender-typed careers.
The SVIB had a separate form for women, but the careers listed for women tended to be lower in status and to command
lower salaries. Because career choices for many women are complex, interest inventories alone may be
inadequate and more comprehensive approaches are needed.
Lesson 36
Measurement of Attitudes, Opinions, Locus of Control, Multidimensional Health and Self-efficacy
Measurement of Attitudes:
Besides measuring different personality characteristics, personality dynamics, interests, etc., psychologists
assess and investigate many other aspects of personality too. Attitudes and opinions are two such aspects. An
attitude can be defined as “a tendency to react favorably or unfavorably toward a designated class of stimuli, such
as a national or ethnic group, a custom, or an institution” (Anastasi & Urbina, 2007, pp. 418-419). Another related
aspect is ‘opinions’. Attitudes and opinions are closely related since one’s opinions are determined and directed
by one’s attitudes. The measures commonly used for the measurement of attitudes are called attitude scales.
There are different types of scales available that we can use. However in most attitude or opinion surveys one
needs to develop attitude scales or other instruments according to the theme of the research. One can develop
scales following the formats available. Some of the scales whose format is commonly followed are as follows:
Thurstone (1931) Scale / Equal-Appearing Intervals:
In this scale the scale development is most important. A large number of attitude-related statements are
developed. The statements can be positive or negative toward the object of attitude. A panel of judges rates
each statement from one to eleven. One means highly negative on the subject and eleven indicates highly
positive. The ratings of all judges are processed and the mean rating for each statement is computed. Respondents
are then given scores according to the mean values obtained from the judges: they receive the scale value of each
item they agree with.
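The Thurstone procedure described above can be sketched in Python. This is only a minimal illustration: the judge ratings are made-up data, the function names are hypothetical, and averaging the scale values of the endorsed items into a single figure is an assumption, since the text only says that respondents receive the scale value of each item they agree with.

```python
from statistics import mean

def scale_values(judge_ratings):
    """Mean judge rating (1 = highly negative, 11 = highly positive) per statement."""
    return {stmt: mean(ratings) for stmt, ratings in judge_ratings.items()}

def respondent_score(values, agreed_items):
    """Return the scale values of the endorsed statements and their average.

    Averaging the endorsed values is an assumed way of summarizing them.
    """
    endorsed = [values[stmt] for stmt in agreed_items]
    return endorsed, mean(endorsed)

# Hypothetical ratings from three judges for three statements
judge_ratings = {
    "S1": [2, 3, 1],
    "S2": [6, 5, 7],
    "S3": [10, 11, 9],
}
values = scale_values(judge_ratings)
endorsed, attitude = respondent_score(values, ["S2", "S3"])
```

A respondent who agrees only with statements that the judges rated as favorable ends up with a high summary value, and one who agrees only with unfavorable statements ends up with a low one.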
Likert (1932) Scale:
In this type of scale a number of statements are developed regarding the object of attitude. The statements are
both favorable and unfavorable. They are rated according to the following scale:
5 = Strongly agree
4 = Agree
3 = Undecided
2 = Disagree
1 = Strongly disagree
The value of a chosen response is the score of the person on that item. The total or overall score is calculated
from the score on individual items.
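A minimal Python sketch of Likert scoring follows. The item names and responses are invented for illustration. Flipping the scores of unfavorable statements before summing is a common convention, but it is an assumption here because the text does not state it; the `reverse_keyed` argument is therefore optional.

```python
def likert_total(responses, reverse_keyed=(), points=5):
    """Sum item scores on a 1..points scale.

    Items named in reverse_keyed are flipped before summing
    (an assumed convention, not stated in the text).
    """
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= points:
            raise ValueError(f"{item}: response {value} is outside 1..{points}")
        total += (points + 1 - value) if item in reverse_keyed else value
    return total

# Hypothetical responses to three statements
responses = {"q1": 5, "q2": 4, "q3": 2}
plain_total = likert_total(responses)          # 5 + 4 + 2
keyed_total = likert_total(responses, {"q3"})  # q3 is flipped: 2 becomes 4
```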
Some other popular scales include the Guttman (1950) scale, the Semantic Differential Scale (Osgood et al.,
1957), and the Social Distance Scale (Bogardus, 1925).
Locus of Control:
Locus of control refers to a person’s perceptions or beliefs about the location of responsibility for his or her life:
circumstances, happenings, events, conditions. The perception of who is in charge of one’s life, who decides
one’s fate, and who is responsible for whatever the person is experiencing, is determined by the person’s locus of
control (LOC). People’s perceptions of success and failure, of health and illness, ability or inability, all reflect
their locus of control. In other words, the concept of LOC refers to perceived control: the perception of how
much a person feels in control of life (Lefcourt, 1982). The earliest formal investigations of the concept were
reported by Julian Rotter (1966, 1975).
Rotter showed that people have different views about things that happen to them. People have their own beliefs
or generalized expectations about where the control of life, or events, resides.
Rotter’s original formulation of LOC had a dualistic approach. People’s beliefs regarding who is responsible for
events, and who influences life, were classified along a bipolar dimension. Rotter showed that people have
different views about the source of control, and the things happening to them; these beliefs, and therefore
people holding them, were seen as falling into two categories namely, internal and external, that could be
measured with the I-E scale. Later on Levenson and others added to this construct as well as its measure.
However, perhaps the most quoted contribution in this regard is in the assessment of Multidimensional Health
Locus of Control.
Measurement of Multidimensional Health Locus of Control (MHLC):
Health locus of control can be measured by using specifically designed instruments. Its quantification gives an
edge to this construct over many other constructs pertaining to health beliefs. The Multidimensional Health
Locus of Control (MHLC) scales were developed by B. S. Wallston, K. A. Wallston, and their colleagues
(Wallston, Wallston, Kaplan, & Maides, 1976; Wallston, Wallston, & DeVellis, 1978). These are the most
widely used measures for quantifying HLC.
Kirscht and his colleagues (Dabbs & Kirscht, 1971; Kirscht, 1972) were the ones who produced the first
published version of an LOC measure that was specifically meant for use in the domain of health and illness.
Due to some inherent flaws in this measure, however, it could not gain popularity and a need was felt for more
precise measures.
B. S. Wallston and K. A. Wallston have so far made highly significant contributions to the measurement of HLC.
Multidimensional Health Locus Of Control (MHLC) Scales:
The MHLC follows the 6-point Likert response pattern, and includes three scales. In conceiving these scales,
Levenson’s (1973, 1981) multidimensional approach was followed, in which the dimension of externality was
split into two components; hence the scales for powerful others health locus of control (PHLC) and chance
health locus of control (CHLC). Instead of treating external health locus of control (EHLC) as a single dimension
that covers the influence of powerful others and that of chance as one and the same, the new versions treated
PHLC and CHLC as two separate components. PHLC, which measures the powerful others health locus of control, pertains to a
person’s beliefs about the influence of other people on her health. The people who are believed to have the
power to determine one’s health may include the family, friends, doctors, hospital staff and others.
The extent to which a person believes in the influence of chance, luck, or fate is measured by the CHLC Scale.
CHLC measures the beliefs about health or illness being related to chance, fate or luck, instead of being related
to one’s own responsibility.
The three MHLC scales include:
1. Internal Health Locus of Control (IHLC)
2. Powerful Others Health Locus of Control (PHLC)
3. Chance Health Locus of Control (CHLC)
A six-point rating scale is used with the MHLC items, where one can make a choice from response options
ranging from ‘strongly disagree’ to ‘strongly agree’. The scale contains three subscales and eighteen statements in
all. Each subscale carries six items/ statements. The items pertaining to the three subscales have been mixed up.
The scoring procedure is very simple. The values of the marked responses to each item in a subscale are added
up. The sum indicates the person’s score on that subscale. The subscale on which the person obtains the highest
score indicates the type of health locus of control that the person has. One may choose either Form A or
Form B. The IHLC Scale assesses the internal health locus of control, that is, the extent to which a person believes that
her health or illness is determined by internal factors. K. A. Wallston and B. S. Wallston (1982) have asserted
that the dimensions measured by the scales are more or less statistically independent. Therefore a low IHLC
score does not necessarily indicate that the person believes in the influence of external factors, powerful others,
or chance. A low IHLC score may be understood to mean that the person’s belief in the influence of internal
factors is low (or may be non-existent in some cases).
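The MHLC scoring procedure can be sketched in Python as follows. The item-to-subscale key used here is a placeholder that simply deals items 1 to 18 out in rotation; the actual assignment of items to subscales must be taken from the published scoring instructions. The respondent data are also invented.

```python
SUBSCALES = ("IHLC", "PHLC", "CHLC")

# Placeholder key (assumption): items 1..18 dealt out in rotation.
# The real item-to-subscale assignment comes from the published scoring key.
ITEM_KEY = {item: SUBSCALES[(item - 1) % 3] for item in range(1, 19)}

def score_mhlc(responses):
    """Sum the 1-6 ratings of the six items belonging to each subscale."""
    if set(responses) != set(ITEM_KEY):
        raise ValueError("expected responses to items 1..18")
    totals = dict.fromkeys(SUBSCALES, 0)
    for item, rating in responses.items():
        if not 1 <= rating <= 6:
            raise ValueError(f"item {item}: rating {rating} is outside 1..6")
        totals[ITEM_KEY[item]] += rating
    return totals

# Hypothetical respondent: rates every IHLC item 5, every PHLC item 3, every CHLC item 2
ratings_by_scale = {"IHLC": 5, "PHLC": 3, "CHLC": 2}
responses = {item: ratings_by_scale[ITEM_KEY[item]] for item in range(1, 19)}
totals = score_mhlc(responses)
highest = max(totals, key=totals.get)
```

Each subscale total can range from 6 to 36. Because the three dimensions are described as more or less statistically independent, it seems sensible to report all three totals rather than only the highest one.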
The Measurement of Self Efficacy:
Self-efficacy, as the very name suggests, is the perception of one’s own ability to produce some desired
outcomes.
Researchers have used the construct of self-efficacy for assessing the impact of people’s perceptions of personal
control and capability on their behavior in a variety of situations. A divergent range of self-efficacy measures is
available to researchers interested in investigating the relationship between thought and action. Although a
considerable majority of studies in this regard have investigated the influence of perceived self-efficacy on
people’s health-related behaviors, measures like collective teacher self-efficacy, and teacher self-efficacy scales
have also been devised.
For a health psychology researcher, primarily two types of measures of self-efficacy are available. These
instruments can be used to assess health related self-efficacy in two ways, including:
1. General perceived self-efficacy
2. Perceived self-efficacy pertaining to specific health behaviors.
The measures included in the above mentioned categories may be used in their original form as well as with
alterations made according to the nature of problem under investigation.
1. General Perceived Self-efficacy (GSE) scale:
The most widely used measure of self-efficacy, the General Self-Efficacy (GSE) Scale, was developed by Matthias
Jerusalem and Ralf Schwarzer in 1981. The original version, in German, comprised 20 items, but the
later version consisted of only 10 items (Schwarzer & Scholz, 2000; Schwarzer & Jerusalem, 1995). It is this
10-item version that researchers studying self-efficacy use in recent research.
The GSE scale is available in at least 26 different languages. The scale was originally devised for predicting both
coping and adaptation; how people coped with daily hassles, and how they adapted after having undergone
stressful life events. This scale gauges a generalized sense of self-efficacy, indicating the overall global confidence
that a person has about personal ability to cope with a wide range of situations that may be new, novel, taxing or
demanding. GSE focuses upon a sense of competence that is broad and stable, rather than being only domain-
specific (Schwarzer & Scholz, 2000).
The response format of GSE scale is uniform for all 10 items, consisting of a 4-point scale. The response
options range from ‘not at all’ (definitely not) marked as 1, to ‘exactly true’ marked as 4. The final composite
score is obtained by adding up the responses to all 10 items. The final score may range from 10 to 40. If the
person marks ‘not at all’ in response to the entire range of items, he will be understood to be standing at the
lowest possible level of self-efficacy. This can be taken to indicate a lack of self-efficacy. Conversely, a
score of 40 will mean the person has the highest possible level of feeling self-efficacious. On average, the scale
can be completed in 4 minutes. Some people may take more or less than the average time.
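The GSE composite score described above amounts to a simple sum, sketched here in Python. The function name and the sample responses are hypothetical.

```python
def score_gse(responses):
    """Add up ten responses, each coded 1 ('not at all') to 4 ('exactly true')."""
    if len(responses) != 10:
        raise ValueError("the 10-item GSE needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be 1, 2, 3, or 4")
    return sum(responses)

lowest = score_gse([1] * 10)   # 'not at all' on every item
highest = score_gse([4] * 10)  # 'exactly true' on every item
mixed = score_gse([3, 3, 4, 2, 3, 3, 4, 3, 2, 3])  # hypothetical respondent
```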
Lesson 37
Alternate Approaches to Personality Assessment: Behavioral and Cognitive-Behavioral
Testing
Behavioral Assessment:
At times mere psychological testing is not sufficient for making a good judgment about a person’s personality. In
fact, even when using intelligence, achievement, or aptitude tests, one may not be sure whether to rely on
test results alone. There are occasions when one seeks other evidence to support the information
yielded by the test.
This is where the significance of behavioral assessment cannot be ignored.
Behavioral assessment provides us with first hand, direct, and tangible evidence about a person. What personality
tests measure are traits, characteristics, or qualities that are assumed to underlie one’s behavior. Behavioral
assessment, on the other hand, yields information regarding a sample of behavior that is assumed to represent a
person’s behavior in various situations. It is always better and safer to supplement test results with behavioral
information when some sort of decision making, diagnosis, or screening is involved. Behavioral assessment can
be used as an independent approach as well. However, perhaps the best strategy would be to use a battery of
procedures by employing psychometric tools along with behavioral assessment.
Following is a brief description of the approaches and procedures used for behavioral assessment.
Behavioral Observation:
The very first and the most basic tool/ method used for behavioral assessment is observation. Although all
psychologists may use this procedure it is more commonly used by developmental, educational, child, and school
psychologists. These psychologists observe the behavior of interest and keep a record of it.
The psychologists may employ various technical devices for recording the behavior of interest. Audio or video
recordings add to the accuracy of observation by capturing the moments and those significant aspects of the
subject’s behavior that the observer may miss while taking down observational notes. The psychologist may
employ trained staff for observational record keeping.
An offshoot of such observation is self-observation, whereby a subject herself reports her own behavior.
A similar approach is ‘self-monitoring’, in which the subject keeps a record of her own behavior as it happens,
e.g., cigarettes smoked by a smoker during the day, binge eating episodes of a person with bulimia, or seconds of spot
running done by an overweight man in a day.
Recording Observed Behavior:
For recording the behavior in question a number of approaches may be used:
a. Narrative records: the observer may take notes while observing and record everything that could
be of interest. The advantage is that detailed notes are taken, but there is always a chance that a lot might be
missed while the observer is writing down the notes. A better approach is to take very short notes and fill
in the gaps when the observation session is over.
b. As mentioned earlier, audio/ video recordings can be made.
c. Behavioral rating scales: one can record observed behavior in terms of codes rather than recording
narratives. This procedure not only saves recording time but also makes inter-rater uniformity possible when
multiple observers are gathering information. Rating scales are designed in such a manner that one can
record presence/absence, frequency, intensity and other aspects of behavior. Observers may develop their
own scales or recording instruments, but may also choose from the available scales. Some of the available
scales include the Play Performance Scale for Children (Lansky et al., 1985, 1987), the Walker Problem
Behavior Identification Checklist (Walker, 1983), the Behavior Rating Profile (Brown & Hammill, 1978), and
the Social Skills Rating System (Gresham & Elliott, 1990), to name just a few.
Situational Performance Measure:
It is a form of observation in which a person’s behavior is observed under specific circumstances. These
circumstances can be real or simulated. For example a candidate for lecturer’s position may be asked to deliver a
model lecture in front of a real class; an applicant for heavy-duty vehicle driving may be asked to drive a truck on a
busy real-life road; a would-be astronaut has to perform under a simulated state of weightlessness. The
subject’s behavior in the chosen situation may be observed by one or more observers.
One form of this type of observation is the use of situational stress tests when a person’s behavior is observed
under certain type, level or amount of stress, anxiety, or frustration. Such an approach may be used for jobs
where the prospective candidate is expected to experience psychological pressure, e.g., armed forces, bureaucracy.
The U.S. Office of Strategic Services (OSS, 1948) has been reported to have used situational stress tests during
World War II for the selection of candidates for military intelligence and other positions.
Cognitive- Behavioral Assessment:
Very similar to the behavioral approach is the cognitive-behavioral approach to assessment. The only
difference is in terms of the cognitive component.
Why Is Cognitive-Behavioral Testing Needed?
Cognitive-behavioral testing is based on the cognitive-behavioral approach, which is a relatively modern
approach as compared to other approaches to testing.
In cognitive-behavioral testing, just as in cognitive-behavioral therapy, the focus is on an individual’s own cognition, behavior,
and related physiological responses. The problem behavior itself is targeted for its understanding and
treatment. Therefore the core of interest is the symptom behavior rather than the unconscious determinants or
underlying causes that need to be explored, reached, and deciphered.
According to Kaplan and Saccuzzo (2001), in comparison to traditionally used tests, cognitive-behavioral tests
target the ‘disordered behavior’ rather than the underlying cause. This approach is based on the behavior model
and herein the symptom (reported as problem) is the focus of treatment. The analysis of disordered behavior is
the goal of cognitive- behavioral assessment.
There are four steps involved in cognitive-behavioral assessment (Kaplan & Saccuzzo, 2001):
1. Identification of critical behaviors.
2. Determining whether the critical behaviors in question are in excess or in deficit.
3. Evaluation of the frequency, duration, or intensity of the behavior being considered.
4. Based on step 3, the frequency, duration, or intensity of the critical behavior is decreased or increased.
If the behavior was in excess then attempts are made to decrease it, and if it was in deficit then an
increase is aimed for.
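The decision logic of steps 2 to 4 can be illustrated with a short Python sketch. Everything specific in it is hypothetical: the behaviors, the observed daily frequencies, and especially the "acceptable range" used to decide between excess and deficit, which in practice is a clinical judgment rather than a fixed number. The "within range" outcome is likewise an added assumption to make the function complete.

```python
def plan_change(observed_per_day, acceptable_range):
    """Classify a critical behavior and name the direction of change to aim for."""
    low, high = acceptable_range
    if observed_per_day > high:
        return "excess", "decrease"
    if observed_per_day < low:
        return "deficit", "increase"
    return "within range", "maintain"  # assumed third outcome, not in the text

# Hypothetical critical behaviors: (observed daily frequency, assumed acceptable range)
behaviors = {
    "hand washing": (40, (5, 15)),
    "initiating conversation": (0, (2, 10)),
}
plans = {name: plan_change(freq, rng) for name, (freq, rng) in behaviors.items()}
```

The same comparison could be made on duration or intensity instead of frequency.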
In order to give you a flavor of what cognitive-behavioral assessment is like, some of the assessment methods and
procedures used for this purpose are described here.
The Fear Survey Schedule (FSS):
It is a self-report procedure used for various clinical purposes. Primarily ratings on fear are taken on a rating
scale. Initially introduced by Akutagawa (1956), the FSS is available in different versions after having undergone a
number of revisions. Originally it had 50 items. Today the different versions have 50 to 122 items, employing
either 5- point or 7- point scales.
The FSS items involve fear provoking situations and avoidance behaviors. The aim is to identify such situations
and avoidance behavior in case of the subject being assessed. The items have been derived from clinical
observations of actual cases (Wolpe & Lang, 1964) and laboratory experimental studies (Geer, 1965).
Irrational Beliefs Test (IBT):
People often hold irrational beliefs. Such beliefs do not have a logical or realistic basis, but people simply cannot
separate the belief from their cognitive system. A number of cognitive-behavioral tests are available for testing
the irrational beliefs that people hold. One such test is the Irrational Beliefs Test or IBT developed by R. A. Jones
(1968). The test contains 100 items and follows a 5-point scale format. The subject has to indicate level of
agreement or disagreement with each item. Half of the items pertain to presence and the other half to the
absence of particular irrational beliefs.
Kanfer and Saslow’s Functional Approach:
Kanfer and Saslow (1969) played a leading role in the initiation of the cognitive-behavioral
approach to assessment. Their functional approach, which is a behavior-analytic approach, focuses on excesses
and deficits in people’s behavior. Rather than labeling people with traditional
psychopathological categories such as schizophrenic, psychotic, or neurotic, their behavior is analyzed in terms of
excesses and deficits.
According to Kaplan & Saccuzzo (2001), “a behavioral excess is any behavior or class of behaviors described as
problematic by an individual because of its inappropriateness or because of excesses in its frequency, intensity, or
duration” (p. 480). On the other hand “behavioral deficits are classes of behavior described as problematic
because they fail to occur with sufficient frequency, with adequate intensity, in appropriate form, or under
socially expected conditions” (p.481).
The functional approach proposes that the same laws operate in the development of normal and disordered
behaviors. The difference occurs only in extremes. Therefore in the analysis the psychologist first identifies the
excesses and deficits and then tries to help the client decrease or increase them accordingly.
Lesson 38
Testing and Assessment in Health Psychology
Health psychology is one of the most popular areas of psychology. The main reason for its growing popularity is
perhaps the fact that it has practical relevance to most people’s lives. With the growing body of research literature
and evidence on health-related issues, the number of available research and assessment tools is also increasing.
This section discusses some of the tests and scales that a psychologist may use while assessing subjects or
clients, or when carrying out health psychology research.
The State-Trait Anxiety Inventory (STAI):
The STAI is based on the State-Trait Anxiety theory of Charles D. Spielberger. The theory, and the inventory,
assume that anxiety is of two types: state anxiety and trait anxiety. State anxiety is an emotional reaction that
may vary from situation to situation, whereas trait anxiety is a personality characteristic that may be found to be
stable across situations. The STAI has two scales and therefore yields two scores, the A-State and the A-Trait.
The inventory has a 4-point scale format and the two scales have 20 items each.
The STAI has been, and may be, used for the assessment of anxiety in patients suffering from various health
conditions and undergoing surgical or other stressful procedures.
The Ways of Coping Scale:
As the very name suggests, the Ways of Coping Scale (Lazarus, 1995; Folkman & Lazarus, 1980) assesses the way
people cope with stress. It is one of the most popularly used tools in health psychology. It is a checklist in which
the subject indicates the items (thoughts and behaviors) that apply to them. It contains 68 items and the
following seven subscales:
a. Problem solving
b. Growth
c. Wishful thinking
d. Advice seeking
e. Minimizing threat
f. Seeking support
g. Self-blame
The subscales, research suggests, can be divided into two broad categories:
• Problem-focused strategies, i.e., cognitive and behavioral strategies for coping with stress. These strategies are
attempts made to solve the problem.
• Emotion-focused strategies, i.e., ways of dealing with the emotional response to stress. Such strategies do
not help in resolving the problem.
Coping Inventory:
The Coping Inventory (Horowitz & Wilner, 1980) contains items that have been derived from clinical interview
data. Its 33 items fall into three categories:
a. Activities and attitudes people adopt for avoiding stress.
b. Strategies for working through stressful events.
c. Socialization responses.
The Social Support Questionnaire (SSQ):
The SSQ developed by I. G. Sarason and co-workers (1983) is an instrument that measures social support and
related aspects. It contains 27 items, each of which has two parts. For every item the respondent has to
indicate two things, which ultimately yield two scores:
i. Listing the persons that the respondent can count on for support in given circumstances. These responses
yield the number (N) score. The number of people listed across all 27 items is used to calculate an average (N)
score.
ii. Indicating overall level of satisfaction with these supports. This leads to the satisfaction (S) score. This score
may range from 1 to 6 for each item, 1 being very dissatisfied and 6 very satisfied. The
satisfaction scores from all items are used to get an average (S) score.
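The two SSQ scores can be sketched in Python as simple averages over items. The data structure (one pair of listed persons and satisfaction rating per item) and the sample answers are assumptions made for illustration, and only three items are shown to keep the example short.

```python
from statistics import mean

def score_ssq(items):
    """items: list of (people_listed, satisfaction) pairs, one pair per SSQ item.

    Returns the average number (N) score and the average satisfaction (S) score.
    """
    for _, satisfaction in items:
        if not 1 <= satisfaction <= 6:
            raise ValueError("satisfaction ratings run from 1 to 6")
    n_score = mean(len(people) for people, _ in items)
    s_score = mean(satisfaction for _, satisfaction in items)
    return n_score, s_score

# Three hypothetical items only; the full SSQ has 27
items = [
    (["mother", "friend A"], 6),
    (["friend A"], 4),
    (["mother", "spouse", "friend B"], 5),
]
n_score, s_score = score_ssq(items)
```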
Scales for Locus of Control Related to Specific Health Conditions:
a. Drinking Locus of Control Scale:
A 25-item scale measuring drinking locus of control, developed by Donovan and O’Leary (1978), is available. It
follows a forced-choice format. The items involve pairing of internal and external control
alternatives.
b. Weight Locus of Control (WLOC) Scale:
This scale assesses the internal and external determinants of one’s weight. The scale, designed by Saltzer (1982),
uses a 6-point Likert scale format and is a 4-item measure.
c. Perceived Behavioral Control Measure:
Armitage and Conner (1999) developed a measure to assess perceived behavioral control. The measure includes
items like “Whether or not I eat a low fat diet is entirely up to me”.
d. Desired Control Scale:
This 70-item rating scale was developed by Reid and Ziegler (1981). It uses a 5-point response scale. The ratings
range from ‘strongly agree’ to ‘strongly disagree’.
The scale comprises two subscales, with 35 items each. The subscales include:
i. Desire of outcomes
ii. Beliefs and attitudes
e. Health Engagement Control Strategies (HECS):
This scale, reported by Wrosch, Schulz, and Heckhausen (2002), comprises 9 items and uses a 5-point rating
scale. It contains items like “I invest as much time and energy as possible to improve my health”.
The rating options range from “almost never true” to “almost always true”.
Assessment of Perceived Self-Efficacy Pertaining to Specific Health Behaviors:
Whereas the GSE scale has been found to be a good predictor of a general sense of personal competence across
various situations, numerous studies have used the construct of self-efficacy for assessing its impact on specific
health behaviors as well. Such studies have investigated the potential influence of self-efficacy on the initiation
of health practices. Such practices include indulging in healthy lifestyles, avoiding or quitting unhealthy
behaviors, and coping with specific health conditions and / or illnesses.
The main approach of assessment in this regard remains the same as that adopted in the measurement of GSE.
However unlike the GSE measure, the specific health behavior measures focus upon the health condition in
question alone, rather than on a global ability to handle a wide range of stressful situations in general. The
researchers can simply replace the original items with items pertaining to specific health conditions, or devise
similar measures on their own.
Many studies have used very brief scales comprising only 4-5 items. In some cases even single item
measures have been used. What needs to be kept in mind while devising and using such measures is the rule that
the item or items should bear appropriate wording. The words used in the item / items should be theory-based
and should convey exactly what the researcher wants to find out. The wording must very clearly include the
mention of both the health action and the perceived barrier or condition for action. In this regard an ‘if-then’
sentence formation has been suggested.
The semantic structure recommended for health related research is as follows: ‘I am certain that I can do XX,
even if YY (barrier)’ (Luszczynska & Schwarzer, 2005). Scanning the available research literature, one can find
mention of at least a dozen different measures of perceived self-efficacy related to specific health conditions.
Some of these measures are altered forms of the scales developed by Schwarzer and colleagues,
whereas the others have been designed and developed by independent researchers. Following is a brief
description of specific health-condition related measures of self-efficacy that have been devised and used by
researchers. These measures are available for those interested in exploring the relationship between self-efficacy
and the health conditions and / or practices being investigated.
a. Measures for Assessing Exercise-Related Self-Efficacy:
Developed by Schwarzer and Renner (2000), the exercise self-efficacy scale primarily focuses on the extent to
which a person feels capable of overcoming barriers to adopting or maintaining the habit of exercising. The
scale has a 4-point format, ranging from ‘definitely not’ to ‘exactly true’.
A similar measure, self-efficacy for regular exercise, has been reported by Lorig et al. (1996). The Exercise
Regularly Scale assesses people’s confidence in regularly doing certain physical activities such as gentle exercise,
or aerobic exercise including walking, swimming, or bicycling. The subjects can choose from the ten response
options for each item, starting from ‘not at all confident’ to ‘totally confident’.
b. The Nutrition Self-Efficacy Scales:
Researchers have also developed and used self-efficacy measures specifically involving proper nutrition related
behaviors. The Nutrition Self-efficacy Scale by Anderson, Winett, and Wojcik (2000) offers a choice from ten
response options for each item, ranging from ‘very sure I cannot’ to ‘very sure I can’. The items assess the level
of confidence of a person in terms of how certain he is that he can engage in behavior involving the use of
nutritious foods, such as taking a slice of bread containing fiber to school or work.
Schwarzer and Renner (2000) have also reported the use of a Nutrition Self-efficacy Scale. The said scale once
again involves the assessment of how certain a person is that she can overcome barriers to healthy eating as well
as the extent to which the person can manage to stick to healthful foods despite personal or situational / social
impediments. The response options, four in all per item, range from ‘definitely not’ to ‘exactly true’.
c. Habit Cessation and Abstinence Self-Efficacy:
Besides measures for gauging adoption of healthy behaviors, tools for assessing self-efficacy related to
confidence in the ability to refrain from unhealthy behavior are also available. One of the earliest reports in this
regard has been made by Annis (1987). The Situational Confidence Questionnaire is meant for examining
alcohol abstinence self-efficacy. This instrument, with its 6-point scale format, provides response options in
terms of percentages ranging from 0% to 100%. The mid-range response options include 20, 40, 60 and 80
percent. A response of 0% indicates ‘not at all confident’, while 100% means ‘very confident’ in resisting the
urge to drink heavily even when circumstances are favorable for drinking a lot.
Schwarzer and Renner (2000) report a similar scale that aims to assess the level of certainty with which a person
feels in control of his own drinking behavior. The four response options range from ‘definitely not’ (1) to
‘exactly true’ (4).
Dijkstra and De Vries (2000) have reported on a measure of self-efficacy that can be used with those trying to
quit smoking. The Smoking Cessation Self-efficacy Scale is a 7-point scale in which the response options range
from -3 to +3, from ‘not at all sure I am able to’ to ‘very sure I am able to’. The scale assesses how much a
person feels she can refrain from smoking in different situations.
d. Health-Protective Behaviors and Adherence to Medical Advice Self-Efficacy:
Luszczynska and Schwarzer (2003) have reported on the use of two scales for measuring self-efficacy pertaining
to breast self-examination (BSE). The first scale, the Preaction BSE Self-Efficacy Scale, can help examine the extent
to which a woman feels able to perform regular BSE in spite of possible odds, besides a tendency to
procrastinate and reschedule the plan. The scale offers five response options ranging from ‘definitely not’ to
‘exactly true’. The other scale, Maintenance BSE Self-Efficacy Scale, also contains the same response options.
This scale can be used to gauge the self-efficacy felt in maintaining the regular habit of BSE.
A scale for measuring adherence self-efficacy has been used by Mohr, Boudewyn, Likosky, Levine, and Goodkin
(2001). The Adherence Self-efficacy Scale assesses self-efficacy related to self-injection. The response options go
from ‘I will not have any problems’ (1), to ‘I will not be able to tolerate it at all’ (6).
Lesson 39
Measuring Personal Characteristics for Job Placement
Imagine if you had to advise a friend in job selection. Suppose that friend has two job options available with
similar salary packages and located at the same distance from home.
What factors do we generally consider while taking such a decision?
Personal interest and aptitude? The skills and ability that the person has? The work place and the work setting?
The prospective boss and colleagues? Or maybe all of these factors?
You are already familiar with the role and significance of personal interests in career choice. We have discussed
in detail the various tests and tools that can be used for the assessment of personal interests. But tests of interests
are not the only measures that help in identifying whether a person is suitable for the job or not. In other words
we have available a variety of other tests also for choosing the best person for the job.
Also, a number of assessment tools have been developed to assist you if you were to find out if you had the skills
required for a job, or if the job or the work place was what you were made for.
Psychologists have developed a variety of measures that can gauge the suitability of individuals for a particular
job by taking into account their personal characteristics as well as the features of the work setting. Different
psychologists have proposed different theories in this regard. Based on these theories a number of assessment
tools have been developed.
Osipow’s Vocational Dimensions Approach: The Trait Factor Approach:
One psychologist who is best known for the use of the trait factor approach in job decision making is Samuel
Osipow. His approach is a global one, i.e., it considers a number of traits, or aspects, of a
person’s personality. In this approach a battery of tests is used for assessment.
The battery includes a variety of tests such as the Kuder Occupational Interest Survey (Kuder, 1979), the
Strong-Campbell Interest Inventory (Campbell, 1974), the Seashore Measures of Musical Talents (Lezak, 1983), and the Purdue
Pegboard (Fleishman & Quaintance, 1984).
This approach gives quite comprehensive information regarding the traits and interests of the person. However, it
is criticized for not taking the work environment much into account. Nevertheless, this approach is found to be
very useful in helping people make occupational decisions.
Roe’s Career-choice Theory: The California Occupational Preference Survey:
The core feature of Roe’s theory is its emphasis on the ‘person’ or ‘nonperson’ orientation found in people.
According to Roe, this orientation plays a significant role in people’s career choice. In simpler terms, we
can say that whether one likes to be with other people or not affects one’s career choice. People-oriented
individuals look for jobs where they are in contact with other people, e.g. arts, entertainment,
or other services.
Individuals who are not people-oriented seek jobs that involve little interpersonal contact, e.g.
lab work, science and technology, field exploration etc. Roe drew some very interesting conclusions from
an extensive examination of the personalities of scientists working in different areas of study.
Roe proposes that the career choices people make in life are a result of their childhood experiences of
relationships with their families. That is to say, people who had different types of relationships with
their families as children will make different career choices.
According to Roe, whether people, as children, were reared in a warm family environment or a cold and aloof
one determines whether they are interested in other people or not. Children brought up in a family environment that
is warm and accepting grow into people-oriented adults. On the other hand, children who experienced a cold
and aloof environment turn into adults who are interested in things rather than people (Roe & Klos, 1969; Roe &
Siegelman, 1964). Roe and Klos (1969) proposed the idea that occupational roles can be divided into two classes
according to two independent continua.
The First Continuum: The extremes go from “orientation to purposeful communication” to “orientation to
resource utilization”
The Second Continuum: The extremes go from “orientation to interpersonal relations” to “orientation to
natural phenomena”
People make career choices according to where they stand on these two continua.
Lesson 40
Achievement and Educational Tests
Psychologists use tests in educational settings for a variety of reasons.
Tests may be used for an assessment of achievement, for diagnostic purposes, for onward transmission of test
results to some other agency, for selection for a program, for entrance into an institution, or for screening before
being chosen for specific skill acquisition training.
Tests may be used for evaluation of achievement, that is, what students have learnt in a program of study. Tests may
also be used for diagnostic purposes. This usually happens when school teachers suspect some psychological or
behavioral problem, learning difficulty, some deficiency, or a similar problem. In such a case the child is referred
to the school counselor or some other professional outside the school.
At times the parents themselves might approach the school counselor for help. In such cases the child may be
assessed for the identification of the problem.
The schools also use tests for selecting students with a certain level of intellectual ability, specific aptitudes, or
skills. This is when the student is to be selected for a program of study or training that requires specific aptitude,
orientation, or interest. Many institutions have their own admission tests that are used for selecting candidates
for admission to their institution. On the other hand some agencies or institutions develop admission tests that
are used by most institutions both nationally and internationally for admission purposes e.g. GRE, SAT, MAT
etc.
However, the tests most commonly used in academic settings are achievement tests.
Achievement Tests:
Achievement tests are meant to assess if students or trainees have learnt whatever they were supposed to learn at
the end of a course or program of instruction.
These tests measure the students’ achievement alongside the effectiveness of a program.
What, How, and When of Achievement Tests:
The assessment of achievement involves three basic decisions.
What Is To Be Assessed?
This decision pertains to:
a. The course content to be covered by the assessment tool.
b. The instructional objectives that specify the expected and desired outcomes of the teaching-learning process.
How To Assess?
This decision pertains to:
a. The type of the assessment tool.
b. The administration procedure.
c. The number, format, and difficulty level of test items.
When To Assess?
This decision involves answers to these questions:
a. At what time during the academic session will the assessment take place?
b. Once in a term, or more than once?
This decision will affect the content area to be covered in assessment.
Of all the above mentioned issues and decisions, the most significant is to cover in the test the content area that
the students have been taught keeping in mind the objectives specified for every component of the content.
Teacher Made Achievement Tests:
Teacher made achievement tests are the most common type of achievement tests.
Teachers all over the world, in all educational institutions, are busy throughout the year either teaching or
assessing their students. Teachers are free to design and develop their tests the way they like.
A teacher made test can be either objective or subjective. On occasions it may be a combination of both.
Objective and subjective types of items have their own advantages and disadvantages.
Objective or Forced Choice Type Of Items:
• These are difficult to develop but easy to score.
• These allow the teacher to cover a wide range of content area.
• There is always a chance of selecting the right answer simply by guessing while actually not knowing the right
answer. But this can be controlled. If the items are MCQs with 4-5 options per item, and the options are
carefully developed then guessing can be controlled to a large extent.
• Another advantage of objective type items is uniformity of scoring across examiners. No matter who does
the scoring the students will be receiving the same score.
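The point about guessing can be illustrated with a little arithmetic. The following is only a rough sketch, assuming that a student guesses blindly, that every option is equally attractive, and that guesses on different items are independent; the function names are hypothetical and chosen for illustration.

```python
from math import comb


def chance_score(n_items: int, n_options: int) -> float:
    """Expected number of correct answers from blind guessing alone."""
    return n_items / n_options


def prob_at_least(n_items: int, n_options: int, k: int) -> float:
    """Binomial probability of getting at least k items right by blind guessing.

    Assumes equally attractive options and independent guesses.
    """
    p = 1 / n_options
    return sum(
        comb(n_items, j) * p**j * (1 - p) ** (n_items - j)
        for j in range(k, n_items + 1)
    )


if __name__ == "__main__":
    # A 20-item test with 4 options versus 5 options per item.
    for options in (4, 5):
        print(options, chance_score(20, options), prob_at_least(20, options, 10))
```

Under these assumptions, adding a fifth option lowers both the expected chance score and the probability of reaching half marks by guessing, which is one way to see why 4-5 carefully written options keep guessing under control.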
The essential requirement for availing these benefits of objective tests is care in writing test items. The stem of
every item should be clearly stated, should not be ambiguous, and should convey what the examiner wants to
convey.
Even more important than this is the selection of appropriate response options. A good MCQ item is one in
which every option appears plausible. Therefore only those who know the course content are
able to select the right choice.
Subjective or Descriptive Tests:
On the other hand subjective or descriptive tests also have their advantages.
• The nature of the items is such that the examiner can test the in-depth knowledge of the students. However,
marking and evaluation of such examination papers may be problematic.
• The inter-examiner uniformity of scoring is doubtful in such tests.
• Examiners’ personal or ideological biases may interfere with the objectivity in evaluation required of a just
teacher.
It is ultimately for the examiners to decide what format they prefer to use and what would best suit the
course content.
Other Varieties of Achievement Tests:
Other than teacher made achievement tests, we have available a variety of standardized achievement tests that are
used at national and international level.
We have discussed a number of such tests in earlier sections. Let us very briefly have a look at three of these tests
which are most commonly used.
Standardized Achievement Tests:
The Scholastic Assessment Test (SAT-I):
The Scholastic Assessment Test or SAT-I, previously known as the Scholastic Aptitude Test or SAT, was first used in
1926. The test is the most commonly used college entrance test in the U.S.
SAT-I has two parts that contain the Verbal Reasoning and Mathematical Reasoning tests. These comprise
further subtests.
SAT-II is also available.
Graduate Record Examination Aptitude Test (GRE)
GRE is one of the most well-known tests across the globe. It is the most commonly used graduate-school
entrance test.
GRE measures general scholastic ability and contains three sections:
• Verbal (GRE-V),
• Quantitative (GRE-Q) and
• Analytic (GRE-A).
Miller Analogies Test (MAT)
MAT is the second major, widely used, scholastic aptitude test.
It is a verbal test that measures student’s ability to find logical relationships for 100 different analogy problems.
Achievement Versus Aptitude Tests:
Going through this brief description, you must have noticed that these tests are discussed as
achievement tests although they have the term ‘aptitude’ attached to their names. If you have taken the indigenous
GAT you must have realized that many of the items were more about your aptitude and ability than
achievement in the conventional sense.
That is why one question commonly arises in the minds of most students of psychological testing, i.e., what is the
difference between achievement tests and aptitude tests?
In most situations these terms are used interchangeably.
If one analyzes logically, one can understand that it is not possible to cover in one test all of the content that
students from different institutions, regions, and countries have studied.
Kaplan and Saccuzzo (2001, p. 343) have given a very good comparative description of the features of the two
types of tests.
Grading, Percent Score, And Related Interpretive Issues:
School/college tests usually use the grading system. Scores are also given in terms of percent. Grades make it
easier to understand the relative position of students.
In large scale tests, like GRE or GAT, the results are communicated in terms of percentile ranks. These describe
a candidate’s position in relation to those scoring above as well as those scoring below him or her.
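A percentile rank can be computed in a few lines. The sketch below is one common convention, not necessarily the one used by any particular testing agency: it counts the candidates scoring below the given score plus half of those with the same score, and expresses this as a percentage of the group. The function name and the sample scores are made up for illustration.

```python
def percentile_rank(score: float, all_scores: list[float]) -> float:
    """Percentage of the group scoring below `score`, counting ties as half.

    This "below plus half of equal" rule is an assumed convention;
    other definitions of percentile rank exist.
    """
    below = sum(s < score for s in all_scores)
    equal = sum(s == score for s in all_scores)
    return 100 * (below + 0.5 * equal) / len(all_scores)


if __name__ == "__main__":
    scores = [40, 50, 50, 60, 70, 80, 90, 90, 95, 100]  # hypothetical group
    print(percentile_rank(70, scores))
```

Unlike a percent score, which depends only on the candidate's own answers, the percentile rank changes whenever the comparison group changes.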
Lesson 41
Multicultural Testing
In the present and the following sections we will be discussing some specific issues that psychologists might
come across when working in different situations and with different types of subjects.
Multicultural Testing:
At times psychologists work with subjects or clients who come from a variety of cultural backgrounds, and
their cultural background may interfere with their test performance. In such situations a need is felt for
tests that can be used with all people and that are biased neither against nor in favor of any specific cultural origin.
Multicultural testing refers to tests and testing procedures that are not affected by the cultural background of the
test taker. Such situations may arise in testing scenarios where immigrants belonging to different cultural
backgrounds settle in the developed countries and are to be tested on the same variables using the same tests. For
example:
• When measurement of IQ or personality is to be done.
• When screening, short listing, or selection for jobs is to be done.
• When diagnosis of maladjustment or mental illness is to be done.
Also, there are situations where the same tests are meant to be used with people based in different parts of the
world and having different cultural origins and experiences.
Even people belonging to subcultures within a large society may experience cultural disadvantage.
The main idea is that the nature of many standardized tests is such that certain segments of the population may
be at a disadvantage because of their origin, in other words, cultural disadvantage. In such situations there is a
need to have tests that are free of cultural bias. Such tests may be called multicultural tests. Their content, as well
as their administration, scoring procedures, and scores, are not affected by the cultural origin of the test taker.
Multicultural testing is also known as transcultural testing or cross-cultural testing.
Factors That May Cause Cultural Bias:
The issue of cultural differences arises when people from one culture have to live in a culture very different from
their own culture. There are a few factors that may put one culture at a disadvantage or advantage in comparison
to another. These disadvantages become significant when people have to take psychological tests developed in
cultures other than their own.
Anastasi and Urbina (2007) describe these as parameters along which cultural differences may be found. Such
variables include:
a. Language: People are at a disadvantage if they can use only the language spoken in their own culture and not
the one used in the culture wherein they have to adjust. As a consequence they will be handicapped if
psychological tests administered to them are in the language that they are not familiar with.
b. Reading Ability: People will still be at a disadvantage if they cannot read. Most tests require a certain level of
reading ability. People may be familiar with the language that the test has been designed in, but they will remain
handicapped if they cannot read the test items.
c. Speed: The speed required for completing a test may also cause problems. In some cultures life is very fast
and people are familiar with a sense of urgency to meet deadlines. On the other hand, the tempo of life is slower
in some cultures and people are used to waiting patiently for expected outcomes (e.g. rural and agricultural societies)
rather than striving for immediate and rapid outputs.
Therefore persons coming from such cultures may find it difficult to cope with the demands of speed-based
tests.
d. Familiarity With The Format, Style, And Contents Of Tests: At times people may not be familiar with
certain forms of test items and formats of tests. Consequently they find it hard to attempt certain types of items,
may take longer than allocated time, and may also make mistakes because of not being able to understand what
they were supposed to do.
For example, people may find it difficult to attempt MCQ type questions if they have not seen such items
previously. Even problems/items involving figures for assessing spatial reasoning may be a totally new
experience for test takers who have never seen or made geometric drawings.
Multicultural testing includes tests that are free of the bias that may arise out of cultural disadvantage stemming
from variables such as language, reading ability, or speed.
In order to tackle the above mentioned issues, sources of bias, and disadvantages, certain measures are taken.
Multicultural tests generally do not involve reading or writing, verbal ability, or test taking speed.
As far as familiarity with the format, style, and contents of tests is concerned, it is an issue that is controlled by
using performance and drawing based items. As far as familiarity with geometric drawings is
concerned, it is controlled by avoiding the use of designs and patterns that most people cannot relate to.
Some Multicultural Tests
The Leiter International Performance Scale- Revised (LIPS-Revised)
The LIPS-Revised (Roid & Miller, 1997) is an individually administered test that has undergone many
revisions since its first version was published. It measures intellectual ability.
Developed in: 1940
Population: Can be used with all age groups. Its 1997 version was standardized on a sample of 2000, aged
2 to 20 years, both atypical and normal subjects from the U.S.
Special features:
• This scale does not involve verbal instructions as such.
• It is individually administered and follows a difficulty level sequence i.e., the easiest item is administered first.
• There is no time limit.
• Easels are used to present the graphic stimulus materials. The picture cards that the subject considers to be the appropriate response are placed in the provided response tray.
Measures: The scale covers four domains:
a) Reasoning
b) Visualization
c) Attention
d) Memory
Tasks in domains: The scale involves various tasks meant for various age levels
Reasoning and Visualization: matching, form completion, design analogies, sequential
ordering, paper folding, figure rotation, and classification.
Attention and Memory: Sustained and divided attention measures; immediate and
delayed memory tasks.
Raven Progressive Matrices:
One of the most popularly used nonverbal and culture free tests of general intelligence is the Raven Progressive
Matrices (RPM).
As you already know, it uses a multiple choice format. In each test item, the subject is asked to identify the
missing element that completes a pattern. The test can be administered to groups or individuals, from 5-year-olds to
older adults. There are 60 matrices with a missing part, presented in graded difficulty. The subject selects the
appropriate pattern from a group of eight options.
Goodenough-Harris Drawing Test:
The Goodenough-Harris Drawing Test is the quickest, easiest, and least expensive nonverbal test for measuring
intelligence. The subject is asked to draw a whole human figure. The test is scored for each item included in the
drawing. The subject gets credit for inclusion of elements such as individual body parts, proportion, perspective,
clothing details etc.
The G-HDT scoring follows the age differentiation principle; older children tend to get more points because of
greater accuracy. It is not a test of the subject’s drawing or artistic skill. What is considered important is the
development of conceptual thinking and accuracy of observation.
In the revised scale the test is not limited to the drawing of a man alone. The subject is also asked to draw a picture of a
woman and of one’s own self. The self-scale is used as a projective test of personality (Anastasi & Urbina, 2007).
The test has good psychometric properties. As previously mentioned, the scores on the G-HDT can be related
to Wechsler IQ scores. The test can be more appropriately used in combination with other tests of intelligence.
IPAT Culture Fair Intelligence Test:
R. B. Cattell directed the development of this test. The IPAT Culture Fair Intelligence Test is a paper-and-pencil test
available at three levels:
• Age levels 4-8 years and mentally disabled adults,
• Age levels 8-12 and randomly selected adults, and
• High-school age and above-average adults.
Lesson 42
Adaptive Testing and Other Issues
Computer Based Administration
Have you ever thought that different test takers have different attitudes toward the test that they are taking?
They have different levels of motivation in taking the tests. Their abilities and aptitudes are different, and so is
their test taking approach.
Subjects’ response characteristics are also different, and they affect their score or performance on a test.
Psychologists involved in test development have been working on the possibility of tests as well as test
administration procedures that take such differences into consideration.
In this section, we will discuss some of these issues.
Adaptive Testing
As said earlier, psychologists have been working on the possibility of tailoring tests to the
individual response characteristics of the test takers.
The idea is that people should not be at a disadvantage because of their specific response characteristics. What
happens usually is that people start taking the test and then they go on attempting all the items with increasing
difficulty level no matter if the easier items were attempted correctly or not. As a consequence many people
might score worse than what they could have achieved had the test been used according to individual response
characteristics of the test takers.
Adaptive testing refers to the testing procedure whereby test item coverage is adjusted according to the response
characteristics of individual subjects. There are different procedures available for this purpose. One such
procedure described by Anastasi & Urbina (2007) involves Two-Stage Adaptive testing with three
measurement levels. The authors give the example of a hypothetical test that comprises 70 items in all. Ten items
are placed in a routing test while the remaining 60 items are divided into three measurement tests of 20 items each.
These three measurement tests are of varying difficulty levels: easy, intermediate, and difficult. All subjects will
attempt the ten items in the routing test but not all of the other 60 items. They will take one of the three
measurement tests depending on their performance on the routing test. Therefore everyone will be given 30
items in all, but the last 20 items will vary from person to person. So one can expect
that if one could do the difficult items in the routing test then one will get the ‘difficult’ measurement test,
whereas if one could only do the easy items then one will get the ‘easy’ measurement test.
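The routing step of the two-stage procedure can be sketched in code. This is only an illustration: the text gives the item counts (10 routing items, 20 items per measurement test) but not the cut-off scores, so the cut-offs of 4 and 7 used below, like the function names, are assumptions made for the example.

```python
ROUTING_ITEMS = 10        # items in the routing test (from the example above)
MEASUREMENT_ITEMS = 20    # items in each of the three measurement tests


def select_measurement_test(routing_correct: int,
                            low_cut: int = 4,
                            high_cut: int = 7) -> str:
    """Pick the second-stage test from the routing-test score.

    low_cut and high_cut are hypothetical cut-offs, not taken from the source.
    """
    if not 0 <= routing_correct <= ROUTING_ITEMS:
        raise ValueError("routing score must be between 0 and 10")
    if routing_correct >= high_cut:
        return "difficult"
    if routing_correct >= low_cut:
        return "intermediate"
    return "easy"


def items_administered() -> int:
    """Every subject takes the routing test plus one measurement test."""
    return ROUTING_ITEMS + MEASUREMENT_ITEMS


if __name__ == "__main__":
    for score in (2, 5, 9):
        print(score, select_measurement_test(score), items_administered())
```

Whatever cut-offs are chosen, each subject attempts 30 of the 70 items, and only the final 20 differ from person to person.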
The authors (Anastasi & Urbina, 2007) have described an alternative to this two-stage model. This second model
is the Pyramidal Testing Model since it progresses in the form of a pyramid. In this model everyone begins
with an item of intermediate difficulty level.
If one manages to answer the item correctly then one is given the next item of a higher level. If, on the contrary,
one fails in the first item then one is routed downward to an item of lower difficulty level. This procedure is
repeated until the test taker manages to answer the desired number of items. It can be seen in both of these
models that the test takers are treated according to their response pattern. These procedures can be used as they are
as well as in modified forms. Although these procedures for adaptive testing can be carried out manually using
simple paper and pencil, they are quite tedious. Computerized adaptive testing is a convenient option for
psychologists. All that is required is the availability of suitable software and the skill on the
part of the psychologist.
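The pyramidal routing rule is simple enough to trace in a short program. The sketch below is a simplified illustration, assuming seven difficulty levels, a move of exactly one level after each answer, and that a subject already at the hardest or easiest level stays there; none of these details are specified in the text, and the function name is hypothetical.

```python
def pyramidal_path(responses: list[bool], n_levels: int = 7) -> list[int]:
    """Return the difficulty level of each item administered.

    responses: True for a correct answer, False for an incorrect one, in order.
    Levels run from 1 (easiest) to n_levels (hardest); testing starts at the
    middle level. Staying put at the top or bottom level is an assumption.
    """
    level = (n_levels + 1) // 2          # start at intermediate difficulty
    path = []
    for correct in responses:
        path.append(level)
        step = 1 if correct else -1      # up after success, down after failure
        level = min(n_levels, max(1, level + step))
    return path


if __name__ == "__main__":
    print(pyramidal_path([True, True, False, True]))
```

Here the length of `responses` plays the role of the desired number of items, so two subjects answer the same number of items but may see quite different difficulty levels.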
Computer Based Administration:
As in all other fields of life, computers have been playing a very significant role in psychometrics. Availability of
computers has facilitated psychologists in a number of ways. They have made things possible and easier,
whether it is computerized administration, scoring, item analysis, analysis of data obtained from large
standardization samples, or adaptive testing. There are situations when group testing involves participants in
large numbers or tests that involve tedious procedures.
In such situations computers can be, and are, used for administration and scoring, for example with
multilevel batteries, different forms of educational testing, aptitude testing on its own or for career guidance, and
achievement testing.
Today all major achievement and ability tests are administered and/or scored with the help of computers e.g.
GRE, SAT, MAT, GAT, IELTS etc. Computers have made it possible to devise new ways and instruments of
testing and assessment.
Alternatives to Psychological Tests
Can we use other ways of assessment rather than using psychological tests?
Interviews as Assessment Tools:
Psychological tests are just one form of instrument that a psychologist may use for assessing
people’s personality, IQ, ability, aptitude or any other variable of interest. Interviews are another such
instrument. Interviews provide an opportunity to have a direct, face to face, interaction with the person being
examined. Interviews can be used as an alternative to psychological tests.
Kaplan and Saccuzzo (2001) have highlighted similarities between a test and an interview.
According to them the two have these common features:
• Method for gathering data
• Used to make predictions
• Evaluated in terms of reliability
• Evaluated in terms of validity
• Group or individual
• Structured or unstructured
Types of Interviews That Are Used For Assessment:
a. Evaluation Interview: This interview helps psychologists assess and understand why the student/client/
individual has come to them.
b. Structured Clinical Interview: Structured interviews follow a fixed and set pattern of questions and
procedures. This pattern may be decided by the clinic/hospital/institution or may be recommended by some
other agency e.g. the use of DSM according to a sequence of steps.
c. Case History Interview: This interview may be more detailed as compared to other types as it aims at
in-depth information. It usually takes a developmental approach. These interviews are relatively flexible though focused.
d. Mental Status Examination: This interview is more fixed and focused and used more commonly in
psychiatric settings. Usually it is used when some psychiatric, neurological, or emotional problem is suspected.
e. Employment Interview: These interviews are used by employers for the selection of the right people for the
available jobs. Such interviews may be structured, unstructured, or both, depending upon the nature of the
organization, the employer, and the position for which the interview is being held.
Interviewing Skills Required In Psychologists:
• Practice and training
• Command over language and vocabulary
• Overcoming personal complexes
• Empathy
• Flexibility and acceptance of the other person’s opinion
• Control over own emotional reactions
• Cultural sensitivity
• Note taking skills and technological assistance
Lesson 43
Social and Ethical Considerations in Testing
The use of psychological tests may seem to be a simple and straightforward thing, but it may involve a variety
of social and psychological issues. Psychology, being a very well organized
discipline, takes care of these issues. Psychologists have developed rules and regulations for carrying out research
and all sorts of investigation, including psychological testing and assessment.
Psychologists are expected to follow a strict code of ethics in the endeavors they take up.
The most commonly followed ethical standards are the ones developed and published by the
American Psychological Association (APA).
With the increasing application and popularity of Psychology there has been a growing concern about the way
psychologists operate. This becomes more significant and relevant when psychological research and assessment
are under discussion.
Although psychological research and assessment may be treated as two different areas, one can see that they
overlap a lot. All psychological research involves some form of instrument for data collection. These
instruments are, in fact, most of the time some form of psychological test. Therefore the ethical standards set
for psychological research also apply to psychological testing. Before going into the details of ethics involved in
psychological testing let us have a look at some agencies or sets of ethical standards available for psychologists
involved in psychological testing:
APA Ethics Code:
This document by the APA covers most aspects of psychological testing ranging from confidentiality,
development and use of psychological assessment techniques to legal and forensic contexts.
Principles for the Validation and Use of Personnel Selection Procedures:
This document containing guidelines for the validation and use of assessment procedures employed for
personnel selection was developed in 1987 by Society for Industrial and Organizational Psychology (SIOP).
The RUST Statement/Responsibilities of Users of Standardized Tests:
The American Counseling Association (ACA) adopted this statement in 1989.
“Ability Testing: Uses, Consequences, and Controversies”:
This book, by Wigdor and Garner (1982), covers all aspects of ability testing.
This publication, a two volume book, is actually the report of a project that investigated the use of ability tests in
various settings. This four year project looked into the use of such tests in a variety of settings ranging from
schools to job selection.
Board on Testing and Assessment / BoTA:
This board, established in 1993, primarily works on the use of psychological tests and other tools of assessment
as tools of public policy. This board was created in the U.S. with support from the departments of Defense,
Education, and Labor (Anastasi & Urbina, 2007).
Ethical Issues in Psychological Testing
Ethical Issues and Ethical Standards in Psychological Testing:
As mentioned earlier, a number of agencies have attempted to propose and set ethical standards for test use.
These standards pertain to various aspects of testing, from test development to application of tests in a variety of
situations for attaining a wide range of objectives.
In the forthcoming sections we will be discussing the general ethics that test developers and test users need to
keep in mind while doing psychological assessment. Most of the time we refer to, and follow, APA standards.
On occasions some aspects of these standards or guidelines may be found in test manuals as well, particularly
with reference to the person using or administering the test.
1. The Training and Eligibility of the Test User:
• The person using or administering the test should be properly trained and experienced.
• We know that the psychologist or the person responsible for test administration
should have appropriate qualifications and training.
• The standards of qualification may vary from place to place or institution to institution.
• However, generally the academic institutions or professional/psychological associations specify the
minimum qualification for test use and administration.
• As far as training is concerned, the psychologist or the administrator should have completed a sufficient
number of hours of supervised training before carrying out independent test administration.
• Once again, the number of hours of training may vary from place to place and institution to institution.
• Training and qualification are specifically required in case of intelligence tests, particularly individually
administered tests, and personality tests.
• This issue becomes even more significant when the interpretation of projective tests is in question.
• In the case of achievement tests, especially objective tests, a compromise may be made on the qualification
of the person responsible for test administration.
• In this regard we should consider what the Ethics Code states. According to the code, psychologists
should “provide only those services and use only those techniques for which they are qualified by
education, training, or experience” (APA, 1992, p. 1599).
2. Human Rights and Test Use:
• All individuals have a right to decide if they want to be tested or not.
• People should not be tested if they refuse to be tested.
• Legal or forensic scenarios may be an exception where testing is directed by a court of law.
3. Invasion of Privacy:
Confidentiality is an essential ingredient of counseling and clinical encounters.
Unless allowed by the subject/participant/test taker, the psychologist should not disclose or make public the
following:
• That the person was tested
• The scores of the person
• The interpretation of the test results
• The diagnosis, if any
As said earlier, legal situations are an exception. Also, in case of achievement tests the test results are generally
understood to be made public.
4. Confidentiality, Honesty, and Openness:
• The psychologist should explain the nature and purpose of the test to the subject.
• The subject also needs to be informed about the possible use of test results.
• The test results should not be used for any purpose other than those mentioned to the subject.
• Explaining the nature of a test beforehand may be problematic in the case of some
tests.
• For example, in the case of projective tests like the Rorschach, TAT, or WAT, test performance may be
affected by information about the true nature of the test. In such situations the required information
may be provided immediately after the test is over.
5. Care with Labeling:
• Unless essential, the psychologist should try to avoid labeling.
• People may be diagnosed with a certain problem, or their scores may be indicative of certain tendencies.
However, labeling should not be done unless it was the main objective of testing.
• Certain labels have a social stigma attached to them, e.g. schizophrenic, or PWA (patient with AIDS).
Such labels may damage the interests of the test taker in many ways: socially, psychologically, and
even financially if the person is refused employment.
• Therefore the psychologist should try to avoid labeling if possible.
6. The Issue of Divided Loyalties:
At times psychologists carry out an assessment because it was requested and paid for by an organization. In such
situations the services of the psychologists are hired by the organization, but their profession demands care
and protection of the test taker as well. If there is a clash between the interests of the organization and those of
the test taker, then the psychologists have to make rational decisions.
In such situations the following steps may be taken (Kaplan & Saccuzzo, 2001):
• The psychologist must inform the clients beforehand about the purpose for which the test results will be
used.
• The clients should also be informed about the limits of confidentiality.
• The results should be explained to the client or his representative.
• The organization should be provided with only as much information as was required.
• In case of adverse decisions, the person’s right to know the results should take precedence over test security.
7. Issues Pertaining To the Test Developers:
• The test developers should be fair and objective.
• They should use test content that is not gender biased or culturally biased.
• If the test is to be used with diverse populations then it should be culture free.
8. Issues Pertaining To the Test User in Diverse Populations:
• The test users or administrators should be fair to the test takers.
• Tests developed and standardized in other cultures should not be used blindly with subjects belonging to
very different cultures.
• The tests should either be culture free, or be translated, adapted, and standardized for the indigenous culture.
Psychological Testing and Measurement (PSY-631) VU
Lesson 44
Assessment and Psychological Testing in Clinical & Counseling Settings
Psychological testing is a part of psychological assessment. Assessment is more than testing. It involves
behavioral observation, interviews, and examination of case history. In the counseling and clinical settings
psychological tests may be used as independent tools or as part of a complete assessment package.
Tests used in Clinical and Counseling Psychology:
In these settings psychological tests are used for diagnosis, for induction into treatment groups or hospitals, for
general assessment, and for gauging the rate of recovery. All intelligence and personality tests may be used. For
example, the HTP can depict psychopathology.
Some tests are used for diagnosing specific learning disabilities, e.g. the Kaufman Test of Educational Achievement
(K-TEA).
The counseling or clinical psychologists commonly use tests for the following purposes:
• General assessment of ability/IQ
• General assessment of personality
• Diagnosis of intellectual deficits
• Diagnosis of mental disorders
• Assessment of aptitude
• Neuropsychological assessment
• Assessment of learning disabilities
You are familiar with many of the tests used for the above-mentioned purposes. In addition to these tests,
behavioral assessment also proves to be an important tool.
Neuropsychological Testing:
In this section we will discuss areas that need special attention.
Neuropsychological testing is an area of psychological assessment that is quite complicated, particularly when it
concerns the diagnosis of brain damage. The assessment may involve an extensive battery of tests. The
person may be tested in a number of areas: cognitive ability, verbal ability, spatial relations, etc.
A number of tests are required for making an exact diagnosis of the problem.
Some of the tests used for neuropsychological testing include the following:
Bender-Gestalt Test and Benton Visual Retention Test: These two tests are quite commonly used for
neuropsychological testing. However, a single test may not prove to be an accurate instrument. Therefore
batteries of tests are preferred for this purpose.
Halstead-Reitan Neuropsychological Test Battery (HRB; Reitan & Wolfson, 1993) and the Luria-Nebraska
Neuropsychological Battery: Batteries like these are preferred over single tests because, rather than
providing information in one particular area, they can provide information in a variety of areas.
According to Anastasi & Urbina (2007) these batteries are useful tools because:
• They provide measures of all significant neuropsychological skills.
• These and similar standardized batteries can detect brain damage with a high degree of success.
• Such a battery can help identify and localize the impaired brain areas.
• Differentiation between particular syndromes associated with cerebral pathology can be made.
Behavioral Assessment:
The behavioral assessment procedures include the following:
• Self-report by the client
• Direct observation of behavior
• Physiological measures
Self-report By The Client:
• Self-reports can be made in various forms such as inventories and checklists.
• Clinical interviews can also be one of the procedures used for this purpose.
• One of the most commonly used tools of this kind is the Beck Depression Inventory (BDI).
• The BDI involves self-ratings on 21 items that help assess the severity of depression.
• The Alcohol Use Inventory (Horn, Wanberg, & Foster, 1990) is another such instrument.
• Some instruments involve multiple informants.
The Social Skills Rating System (SSRS; Gresham & Elliott, 1990):
Positive and problematic behaviors of students in educational and family settings can be evaluated with this
instrument.
There are separate forms for parents, teachers, and students themselves.
Behavior Assessment System for Children (BASC):
• This is one of the most comprehensive instruments of its kind.
• It includes behavior rating scales for teachers as well as parents.
• The children can be given a self-report questionnaire.
• This system also contains a form that can be used for coding and recording classroom behavior.
• An additional structured interview is also available for taking a developmental history from parents.
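Self-rating inventories such as the BDI described above are usually scored by adding up the ratings given to the
individual items. The following is only a minimal sketch of that arithmetic, not the official scoring procedure.
It assumes that each of the 21 items is rated on a scale from 0 to 3; the function name total_score and the
example ratings are hypothetical and are used for illustration only. Interpretation of the resulting total should
always follow the test manual.

```python
from typing import Sequence

N_ITEMS = 21     # number of items, as stated in the text
MIN_RATING = 0   # assumed lowest rating for an item
MAX_RATING = 3   # assumed highest rating for an item


def total_score(ratings: Sequence[int]) -> int:
    """Return the sum of the item ratings after basic validation."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for position, rating in enumerate(ratings, start=1):
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"item {position}: rating {rating} is out of range")
    return sum(ratings)


# Hypothetical responses for illustration only (not real data).
example_ratings = [1, 0, 2, 1, 0, 0, 1, 3, 0, 1, 2, 0, 1, 1, 0, 2, 1, 0, 0, 1, 0]
print(total_score(example_ratings))
```

Checking the number of items and the range of each rating before summing guards against incomplete or wrongly
entered response sheets, which would otherwise produce a misleading total.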
Direct Observation Of Behavior:
Behavior may be recorded by the psychologist, parents, teachers, or any other designated person.
The observation takes place in a naturalistic setting. The observation may be recorded in the form of narratives,
checklists, rating scales, record forms, or similar tools.
Physiological Measures:
Depending on the nature of the problem, a number of physiological measures may be used.
Such measures are employed in cases of anxiety, sleep disorders, or similar problems.
These may include measures of cardiovascular activity, cerebral functioning, and electrodermal and
electro-ocular activity.
Behavioral assessment and clinical judgment
Evaluation of Various Assessment Techniques:
All techniques have their advantages and disadvantages. Psychologists may choose whichever method best
serves their purpose and has the fewest limitations.