The Construction of Unidimensional Tests

This article reviews methods for constructing unidimensional psychological tests. Classical item analysis aims to maximize average item-test correlations by selecting items highly correlated with the total test score. However, with fallible items, maximum correlations may not be achieved. Additionally, item reliabilities needed to correct correlations for attenuation are difficult to obtain accurately. While classical item analysis increases homogeneity, it does not ensure strict unidimensionality. The article evaluates several other methods against criteria of providing rational item selection procedures, explicit unidimensionality criteria, and known sampling distributions of dimensionality indices.

Psychological Bulletin

1961. Vol. 58, No. 2, 122-131

THE CONSTRUCTION OF UNIDIMENSIONAL TESTS


JAMES LUMSDEN
University of Western Australia

It is the purpose of this article to review methods which have been suggested, either directly or indirectly, for the construction of unidimensional tests. No general survey of this topic appears to have been made previously but much help was obtained from critiques by Loevinger (1948), Guttman (1950a, 1950b, 1950c), and White and Saltz (1957).

Definition of unidimensional tests. A unidimensional test may be defined simply as a test in which all items are measuring the same thing. A set of high jumps or a set of broad jumps is unidimensional. A mixture of high jumps and broad jumps is not. In psychological tests, however, items which appear to be of the same sort often turn out on closer investigation to be measuring different things so that this simple definition will not suffice for the construction of tests.

A more precise definition is given by considering the answer pattern that would be yielded by a unidimensional test with infallible items. If the items are arranged in order of difficulty placing the easiest first it will be found that a person who fails the first will fail all the other items; a person who passes the first and fails the second will fail all the subsequent items and so on. That is, the pattern of responses for five items could only be one of the forms shown in Table 1.

TABLE 1
PATTERNS OF RESPONSES FOR A UNIDIMENSIONAL TEST OF FIVE INFALLIBLE ITEMS

Total                Item
Scores     1    2    3    4    5
  0        F    F    F    F    F
  1        P    F    F    F    F
  2        P    P    F    F    F
  3        P    P    P    F    F
  4        P    P    P    P    F
  5        P    P    P    P    P

Note: P = Pass, F = Fail.

With fallible items where the result may be affected by fluctuations in the ability of the subjects or in the difficulty of the items a perfect answer pattern may not be found even when the items do systematically measure the same thing. For our purposes, however, it is sufficient to take the answer pattern of Table 1 as providing a working definition of unidimensionality remembering that with fallible items the answer pattern will be disturbed by random error.

Criteria for evaluation. In evaluating the methods, major consideration will be given to the extent to which a method provides for:

1. A rational procedure for item selection
2. A criterion of unidimensionality
3. An index or measure of unidimensionality

A rational procedure for item selection is essential. Any method which provides no adequate indication of the most likely items to be discarded from the pool and which relies on a blind trial-and-error procedure to discover the unidimensional set of items will be hopelessly uneconomical for practical test construction. In general the method should be convergent so that the homogeneity of the item set increases as the procedure is applied and items are progressively removed from the original pool. Minor departures from this principle at critical stages (usually the beginning) are permissible so long as the number of trials to reach a convergent state of affairs is not too large.

A criterion of unidimensionality is necessary so that checks can be made from time to time and decisions made either to continue culling of the item pool or to stop culling because a homogeneous set of items has been obtained.

An index of the closeness of approximation to unidimensionality is also required. Failure to find a set of items which meets a strict criterion of unidimensionality is certainly possible and, indeed, very likely. The set of items in question may, however, be more unidimensional than any other measuring instruments available and would be preferable to a completely heterogeneous set of items alleged to measure the same attribute. The index of unidimensionality may be related to the procedure for selecting items and/or to the proposed criterion of unidimensionality, or, on the other hand, may be quite independent of either of these. It would be desirable for the sampling distribution of the index of unidimensionality to be known.

It should be noted that the index of unidimensionality is not quite the same as the measures of reproducibility discussed by White and Saltz (1957). Reproducibility as defined by White and Saltz confounds reliability and dimensionality since the measures are affected by random errors as well as by systematic differences in item content. An index of unidimensionality appropriate to the definition used here should be independent of random error.

Methods to be reviewed. Explicit consideration will be given only to classical item analysis, Loevinger's technic of homogeneous tests, the independence criterion method, Guttman's answer pattern method, and factor analysis. Most other methods are special cases of one or other of the listed methods and for the purpose of this review it is unnecessary to consider them. For example, criticisms of the Guttman procedure will apply also to the Cornell technique (Guttman, 1947a) and H technique (Stouffer, Borgatta, Hays, & Henry, 1952). Certain related techniques such as the Thurstone attitude scaling methods give tests of unidimensionality as a by-product but as test construction methods they are subject to the same criticisms as classical item analysis and the independence criterion.

CLASSICAL ITEM ANALYSIS

Classical item analysis using an internal criterion attempts among other things to increase the average item-test correlation by selecting from the item pool those items which have the highest item-test correlation. It is well-known that this procedure tends to increase the homogeneity of the test.

From Table 1 it will be clear that for infallible items forming a unidimensional test the item-test correlation will be the maximum permitted by the shape of the distribution of test scores. With the answer pattern of Table 1 there is no overlap in the distribution of test scores for those who pass and those who fail a given item. The difference in mean test scores of passers and failers is thus a maximum and the biserial correlation between item and test is consequently maximized. It would appear then that if the culling of items proceeds to the point where the item-test
correlations are all maximized the resulting test would be unidimensional. There are a number of difficulties which make this program unlikely to succeed.

With fallible items the maximum item-test biserial will not be reached. One solution would be to correct the obtained biserials for attenuation using estimates of the reliability of item and test scores. Accurate estimates of the reliability of a single item are not easily obtainable. Assuming that this difficulty can be overcome a test would be regarded as unidimensional if the biserial correlations between item and test approached the maximum after correction for attenuation.

Even granted the assumption that accurate estimates of item reliabilities can be obtained the method is not satisfactory. Consider the set of items with factor constitutions as follows:

where a, b, and c represent different orthogonal common factors, m, n, and p, q, r are loadings; and $e_1$, $e_2$, $e_3$, $e_4$, and $e_5$ are error factors.

Lumsden (1957) has shown that Items 1 and 2 form a unidimensional subtest and that Items 3, 4, and 5 with differing loadings on c are not unidimensional. Yet the method of maximizing item-test correlations will eliminate Items 1 and 2 first and no unidimensional test will be discovered. The only way out of this impasse would be to try sets at random which would make the procedure nonrational.

For this method the criterion of unidimensionality would be maximum biserial after correction for attenuation. No sampling distribution of corrected biserials appears to be available so that the significance of departures from the perfect fit cannot be assessed. This is specially important in this case since the estimates of item reliability on which correction is based are likely themselves to be quite unreliable.

The logical measure of unidimensionality would be average corrected biserial. This would need to be considered relative to the maximum obtainable biserial (biserial r has a maximum of 1.0 only when the continuous variable is normally distributed). A ratio of corrected biserial to its maximum similar to Loevinger's $H_t$ suggests itself but the absence of a knowledge of its sampling distribution would restrict its value.

An obvious possibility would be to use the Kuder-Richardson Formula 20 with correction for variation in item difficulty suggested by Horst (1953). This statistic is, however, affected by random as well as systematic variance and is, therefore, a measure of reproducibility rather than an index of unidimensionality. There would seem nothing to prevent the development of an index based on the ratio of obtained K-R 20 to the maximum K-R 20 for items with a given amount of random error.¹

¹ I am indebted to John Ross (University of Sydney) for this suggestion.

A search of the literature has not revealed any writer who has advocated the use of classical item analysis techniques as described above in order to produce unidimensional tests. Thorndike attempted to demonstrate the "homogeneity of intellect CAVD" by correlating scores on subgroups of
items with scores on the total set of items and correcting the obtained r's for attenuation. Evidence was presented (Thorndike, Bergman, Cobb, & Woodyard, 1926, p. 566) that these corrected correlations approximated 1.0 and Thorndike concluded that this demonstrated the homogeneity of CAVD tests. The logic of Thorndike's procedure is impeccable if applied to single items or to randomly selected subgroups of items but his subgroups were arranged so as to have, like the total set, equal numbers of Completion, Arithmetic, Vocabulary, and Directions items. Thorndike was thus merely able to show that the composite score obtained from his subsets was similar to the total score obtained from the complete set but not that the subsets or the total set were homogeneous in the sense used here. It is only fair to point out that Thorndike was mainly concerned to show that his easier sets of items and his harder sets gave the same sort of results as the total set.

Wherry and Gaylord (1943) suggest as an alternative to factor analysis an iterative procedure based on classical item analysis. In this procedure each item is correlated with total score; those items with the highest correlations are selected and a new total formed; all items (including those not selected in the first stage) are then correlated with the new total and the procedure is continued until a stable group of items is obtained. White and Saltz (1957) commend this method but it would not appear to avoid any of the difficulties of classical item analysis.

LOEVINGER'S TECHNIC OF HOMOGENEOUS TESTS

Loevinger's procedure is closely related to classical item analysis and indeed she indicates (Loevinger, 1947, p. 26) that the earlier work by Thorndike on the CAVD tests may have been influential in the development of her procedure.

The procedure is based on two statistics: the "homogeneity of an item with a test" and the "homogeneity of a test." The first of these is to be used as a tool for item selection and is a development of Long's (1934) index of overlapping. The formula for this is given by Loevinger as:

$$H_{it} = 1 - \frac{2\,(\text{"passes" below or tied with "fails"})}{PQ - (\text{"passes" one above "fails"})}$$

where P is the number passing the item and Q is the number failing the item. It is clear that for a perfectly unidimensional test as defined by Table 1 $H_{it}$ will equal 1.0 since there will be no subjects who pass an item who will have scores below or tied with subjects who fail the item. Using this statistic to cull a mixed set of items will, however, be subject to all the difficulties encountered with classical item analysis.

The index of unidimensionality is provided by the "homogeneity of a test," $H_t$. Loevinger notes that for a perfectly heterogeneous test $p_{i/j} = p_i$ (i.e., probability of passing an Item i having passed another Item j is the same as the overall probability of passing Item i). For a perfectly homogeneous test as defined by Table 1, $p_{i/j} = 1.0$ for all cases where $p_i > p_j$ (i.e., where Item i is easier than Item j). From this it will be seen that $p_{i/j}$ has a minimum value of $p_i$ for all cases where $p_i > p_j$.
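
As a rough illustration of the two limiting cases just described, the following Python sketch estimates the conditional probabilities of passing Item i given a pass on Item j from a 0/1 response matrix. The function name `conditional_pass`, the use of NumPy, and the two-item "independent" data set are assumptions made for this example only; they are not taken from Loevinger or from the article.

```python
import numpy as np

def conditional_pass(responses):
    """Return (cond, p) where cond[i, j] estimates P(pass item i | passed item j).

    responses: subjects x items array of 0/1 scores.
    Assumes every item is passed by at least one subject (no division by zero).
    """
    x = np.asarray(responses, dtype=float)
    n_subjects = x.shape[0]
    p = x.mean(axis=0)                # p[i]: proportion passing item i
    joint = (x.T @ x) / n_subjects    # joint[i, j]: proportion passing both i and j
    cond = joint / p[np.newaxis, :]   # divide column j by p[j]
    return cond, p

# Table 1 pattern: a subject with total score s passes exactly the s easiest items.
table1 = np.array([[1 if item < score else 0 for item in range(5)]
                   for score in range(6)])
cond_homog, p_homog = conditional_pass(table1)

# Hypothetical heterogeneous case: two items answered independently of each other.
independent = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
cond_heter, p_heter = conditional_pass(independent)

# For the Table 1 pattern, every easier item i (given a pass on harder item j) has cond = 1.
for i in range(5):
    for j in range(i + 1, 5):
        print(f"p[{i + 1}/{j + 1}] = {cond_homog[i, j]:.2f}   p[{i + 1}] = {p_homog[i]:.2f}")

# For the independent items, the conditional probability equals the marginal one.
print("independent items:", cond_heter[0, 1], "vs marginal", p_heter[0])
```

With real, fallible items the conditional probabilities would be expected to fall somewhere between these two limits, which is the range the homogeneity index is meant to summarize.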
Loevinger then considers the sum:

$$S = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(p_{i/j} - p_i)$$

where m is the number of items and the item pairs are all such that $p_i > p_j$.

This sum will have a maximum value given by

$$S_{max} = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(1 - p_i)$$

for a perfectly homogeneous test and a value of zero for a perfectly heterogeneous test. To provide an index with the formal properties of a minimum of zero and a maximum of 1.0, Loevinger divides $S$ by $S_{max}$ to give:

$$H_t = \frac{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(p_{i/j} - p_i)}{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(1 - p_i)}$$

Loevinger provides a formula for estimating $H_t$ from sample statistics but points out that the sampling distribution is unknown and that the estimate is not even known to be unbiased.

INDEPENDENCE CRITERION

Lazarsfeld (1950), Tucker (1952), and Lord (1952) have pointed out that with a unidimensional test the probability of success on one item is independent of success in any other item for subjects with the same true score. This is at first sight paradoxical because it would seem obvious that items which are measuring the same thing should be highly correlated. But when only subjects of the same true ability are considered then items which are measuring this ability and nothing else can differ only through error and will exhibit no systematic variance. If we take subjects who are exactly 6 feet tall then different measures of height will vary only through error so that the measurements will be independent, uncorrelated. The independence criterion is undoubtedly valid and is more general than any other. It makes no assumptions about the distribution of ability or rectilinearity of regression.

The criterion suggests a procedure for constructing unidimensional tests. It would be possible to obtain results on a pool of items from a large group of subjects, to choose a number of subjects with the same total score, and then to determine say by $\chi^2$ whether the items are independent or not. If certain items turned out not to be independent these could be rejected, new totals worked for all subjects in the original group, a new group with the same total score determined, and the $\chi^2$ test repeated.

The true scores on the test are not, however, known and the estimates obtained from the raw test scores are not satisfactory. O'Neil (1954) has shown that, for subjects with the same obtained score, items, even if unidimensional, tend not to be independent but to be negatively correlated. If there are only two items for example then for subjects with an obtained score of 1 the items have a tetrachoric correlation of -1.0 since if a subject has the first item right he must have the second one wrong and vice versa. In mathematical terms $p_{i/j} = 0$ instead of $p_i$ as required by the independence criterion. This effect is known to decrease as the number of items is increased and it is possible that the independence criterion may be workable for fairly large groups of items. With infallible items there is, of course, no problem since true scores will then equal the obtained scores and the quoted example could not occur (if the items were unidimensional).

Even if this problem is overcome, the culling of items is likely to prove arduous. All items in the pool are likely to be correlated on the first trial. In the absence of any knowledge about the number of items in the unidimensional set it is impossible to say whether the unidimensional items will be more or less intercorrelated than the items it is desired to reject. No rational, convergent procedure of item culling is available using the independence criterion.

No special index of unidimensionality is suggested for this method. This, of course, does not matter, since if the method was otherwise suitable an index could be borrowed from one of the other methods.

ANSWER PATTERN METHODS

The Guttman procedure (1944) is the most important of the answer pattern methods and is the only one discussed here. Some earlier writings by Walker (1931, 1936, 1940) and Ferguson (1941) have the first explicit discussions of the relationship between answer pattern and other test characteristics but no suggestions for test construction were made.

The answer pattern procedure consists essentially of inspecting the answer pattern and removing items so that the remaining items have patterns which are as near as possible to those of Table 1. It is clear that for infallible items this procedure could be easily carried out and that a simple inspection of the answer patterns would provide a clearcut criterion of unidimensionality. For items which exhibit slight departures from unidimensionality the procedure would be to eliminate items until the closest possible approximation consistent with retaining a sufficient number of items was obtained. For this, some measure of the closeness of approximation to unidimensionality is required. Guttman uses the coefficient of reproducibility which is the proportion of responses which can be correctly predicted from the total raw score. For a perfectly unidimensional test it will be seen from Table 1 that the reproducibility coefficient will have the value 1.0. Guttman suggests that a test may be regarded as a "scale" (i.e., as unidimensional) if the coefficient of reproducibility exceeds .90.

The coefficient of reproducibility has been criticised severely by Festinger (1947) and Jackson (1949) because it does not allow for the chances of obtaining high values when the items are heterogeneous (e.g., with only a few items of widely differing difficulties). Guttman (1947b) replied to criticism claiming that such factors as the number of answer categories and the range of difficulty were taken into account before calculating the coefficient of reproducibility. Guttman does not give explicit rules but improvements to the reproducibility coefficient have been suggested by Jackson (1949) and Green (1954) which overcome some of the problems.

The reproducibility coefficient, however modified, does not permit of a distinction being made between random and systematic scale discrepancies. Guttman claims (1950a, 1950b, 1950c) that the distinction may be made by examining the patterns of scale discrepancies and presents tables (p. 161) which purport to represent scale patterns for a perfect scale, a scale with random error, and a scale with systematic error. Evidence for random error in an item is said to be provided when scale errors are distributed randomly around the
cutting point for the item; evidence for systematic error when the scale errors are grouped in a systematic fashion. While this claim is undoubtedly correct (such systematic groupings are the basis of all the statistical analyses proposed for the problem) it is difficult to see how these groupings may be discovered by inspection and distinguished from random errors when the random errors are fairly large.

Guttman (1950b) has explicitly denied any intention to use scale analysis for the selection of items. His scalogram was designed merely to discover approximate cutting points for attitude scale items. Guttman indeed claims that the task of scale analysis is to discover scales rather than to construct them and states that if a universe of attributes is scalable then any subset of items from that universe is scalable. Item culling is by this argument unnecessary. The difficulty is that a test constructor (or discoverer) does not know precisely what "universe of attributes" he is sampling. Without precise definition he may sample a number of related universes. Item culling procedures are designed to distinguish between groups of items selected from different universes.

It may be seen then that the answer pattern method provides no rational culling plan for use with fallible items. The index of unidimensionality provided by the plan is the coefficient of reproducibility which, despite improvements on the early Guttman form, does not distinguish between systematic and random error.

FACTOR ANALYSIS

It is difficult to give due credit to whoever first suggested the use of factor analysis in the construction of unidimensional tests. The idea is sufficiently obvious to be thought at least implicit in the writings of Spearman, Thurstone (1947), and other early factorists. The factor analyses of test items by McNemar (1942), Burt and John (1943), and others clearly suggest it. Papers on related topics by Ferguson (1941), Wherry and Gaylord (1943, 1944), Carroll (1945), and Loevinger (1948) discuss with varying degrees of completeness the possibility of factor analyzing items in test construction.

Under restrictions which appear plausible for ability test items it is easy to show (vide Lumsden, 1957) that for a unidimensional test the matrix of tetrachoric item intercorrelations is of unit rank. One factor analytic procedure for constructing unidimensional tests is to extract a single factor from the item intercorrelations, cull out the items which have large residuals, reanalyze, and continue until a satisfactory fit to a single factor solution is obtained. Wolfle (1940) in a well-known jibe at Brown and Stephenson (1933) said: "if one removes all tetrad differences which do not satisfy the criterion, the remaining ones do satisfy it" (p. 9). That is exactly what is done in this factor analytic technique of constructing unidimensional tests. The difference between the two situations is, of course, that Brown and Stephenson had asserted that their tests, all of them, would meet the tetrad difference criterion, while here it is merely hoped that a subset of items will meet the criterion.

The procedure is quite simple. But is the culling procedure rational? Will the set of items converge to unidimensionality?

It is evident that convergence of the factor analytic procedure to a unidimensional subset of items cannot be guaranteed. If the unidimensional set is much less numerous than the heterogeneous items in the pool then it is probable that the unidimensional set will not have sufficient influence on the nature of the first factor extracted to prevent the occurrence of large residuals among the unidimensional set. These items will be discarded first and the procedure will not converge to a single factor solution.

If, however, the items are carefully preselected on empirical and a priori grounds, it seems likely that the state of affairs of the preceding paragraph will not occur. If items are deliberately made parallel or if there is evidence for parallelism then it would follow that the dimension of any unidimensional test and the dimensions of the heterogeneous items in the total pool will normally be highly correlated. In this circumstance the influence of the unidimensional set on the first factor extracted may well be greater than the actual numbers of items suggest, and the method may therefore be expected to converge. The procedure of preselecting will also tend to increase the size of the unidimensional set in the pool and this will also increase the probability of convergence.

Lumsden (1959) found that four subsets of number series items selected on a priori grounds converged rapidly and that three of them met a fairly stringent test of unidimensionality when cross-validated with a fresh group of subjects.

One procedure that should almost guarantee convergence (if a sizable unidimensional set exists) is to carry out a preliminary complete centroid analysis and then to select for further analysis those items which appear in narrow strips (i.e., roughly co-linear) in the factor space. This appears to be the procedure advocated by Cattell (1957) for his factor homogeneous scale except that he would require the additional restriction that the factor have significance in a more general factor space than that provided by the item intercorrelations. The complete centroid procedure with rotation could indeed be used without further analysis except that the problems of estimating communalities and determining goodness of fit are more complicated than for the unit rank case.

The criterion of unidimensionality suggested for item culling is the size of the residuals. This must be considered with relation to the sampling distribution of residuals. Unfortunately there is no exact solution to this problem. Many methods have been suggested (Cattell, 1952) but none can be regarded as satisfactory. A reasonable solution for test construction purposes would be to use one of the simpler procedures (e.g., standard error of average r) and apply it rather severely. Increased availability of automatic computing services may permit the use of maximum likelihood methods of factorizing which provide a test for rank.

An index of unidimensionality appropriate to the method is the ratio of first factor variance to total bipolar factor variance after a complete centroid analysis with subjects who were not used for item selection. In most cases the ratio of first to second factor variance would seem to give a reasonably useful index. This index has no fixed maximum value and little is known about the extent to which it may be affected by errors of sampling or of measurement.

DISCUSSION

It seems clear that none of the methods examined can be regarded as
satisfying all three of the main criteria. Only factor analysis appears to offer a rational procedure for item selection. The criteria and indices of unidimensionality are unsatisfactory for all methods.

This review has considered each of the methods as if they were complete, self-consistent creations of a single writer. With the exception of the Guttman answer pattern method and the Loevinger method this is not so. The various "natural" criteria and indices suggested for each of the methods are not necessary consequences of the choice of item selection method. Combinations of different elements from different methods are possible and this circumstance justifies a modified optimism. Thus a modification of the coefficient of reproducibility which produced an acceptable index of unidimensionality would not be cogent evidence for adopting an answer pattern method but would greatly improve all methods.

Greatest emphasis has been deliberately placed on item selection rationale since this topic appears to have been relatively neglected in the literature of the problem. Great advances appear unlikely unless the development of criteria and indices of unidimensionality is closely related to item selection procedures.

SUMMARY

Five methods of constructing unidimensional tests (classical item analysis, Loevinger's procedure, the independence criterion method, the answer pattern method, and factor analysis) have been considered with respect to their provision for: a rational procedure for item selection, a criterion of unidimensionality, and an index of unidimensionality.

It has been argued that only factor analysis provides a rational procedure for item selection. No method has a fully satisfactory criterion of unidimensionality. The index of unidimensionality suggested for the factor analytic method is the ratio of first to second factor variance. This suffers from the absence of any knowledge of sampling fluctuations, but this weakness is shared by the only reasonable alternative, the coefficient of reproducibility.
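
The factor analytic culling loop summarized above (fit a single factor, discard the items with large residuals, reanalyze) might be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not in the article: an iterated principal-axis fit is swapped in for the centroid method, one item is dropped per cycle, the residual tolerance of .05 is arbitrary, and the correlation matrix is built from hypothetical loadings rather than from tetrachoric correlations of real items. All function names are invented for the example.

```python
import numpy as np

def one_factor_loadings(r, n_iter=100):
    """Iterated principal-axis loadings for a single factor (stand-in for the centroid method)."""
    r = np.asarray(r, dtype=float)
    off = r - np.diag(np.diag(r))                 # off-diagonal correlations only
    h2 = np.abs(off).max(axis=1)                  # rough starting communalities
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(off + np.diag(h2))   # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        h2 = lam ** 2                             # update communalities from the loadings
    return lam if lam.sum() >= 0 else -lam

def residuals(r, lam):
    """Off-diagonal residual correlations after removing one factor."""
    resid = np.asarray(r, dtype=float) - np.outer(lam, lam)
    np.fill_diagonal(resid, 0.0)
    return resid

def cull_to_one_factor(r, tol=0.05, min_items=3):
    """Drop the worst-fitting item each cycle until every residual is within tol."""
    r = np.asarray(r, dtype=float)
    kept = list(range(r.shape[0]))
    while len(kept) > min_items:
        sub = r[np.ix_(kept, kept)]
        resid = residuals(sub, one_factor_loadings(sub))
        if np.abs(resid).max() <= tol:
            break                                 # satisfactory single-factor fit
        kept.pop(int(np.argmax(np.abs(resid).sum(axis=1))))
    return kept

def first_to_second_ratio(r):
    """Crude proxy for the first-to-second factor variance ratio (eigenvalues of r)."""
    vals = np.linalg.eigvalsh(np.asarray(r, dtype=float))
    return vals[-1] / vals[-2]

# Hypothetical pool: items 0-4 load on factor a only; items 5 and 6 also load on factor b.
loadings = np.array([[.8, 0], [.7, 0], [.6, 0], [.7, 0], [.8, 0], [.5, .6], [.5, .6]])
r_pool = loadings @ loadings.T
np.fill_diagonal(r_pool, 1.0)

kept = cull_to_one_factor(r_pool)
r_kept = r_pool[np.ix_(kept, kept)]
print("retained items:", kept)
print("first/second ratio:", first_to_second_ratio(r_kept))
```

In this artificial pool the preselected majority dominates the first factor, so the loop removes one member of the doublet that shares the second factor and then stops. A pool in which the unidimensional items are a small minority would be expected to behave as the article warns, with no guarantee of convergence.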
REFERENCES

BROWN, W., & STEPHENSON, W. A test of the theory of two factors. Brit. J. Psychol., 1933, 23, 352-370.
BURT, C., & JOHN, E. A factorial analysis of the Terman-Binet tests. Brit. J. educ. Psychol., 1943, 12, 156-161.
CARROLL, J. B. The effect of difficulty and chance success on correlations between items or between tests. Psychometrika, 1945, 10, 1-19.
CATTELL, R. B. Factor analysis. New York: Harper, 1952.
CATTELL, R. B. Personality and motivation structure and measurement. New York: World Book, 1957.
FERGUSON, G. A. The reliability of mental tests. Univer. London Press, 1941.
FESTINGER, L. The treatment of qualitative data by "scale analysis." Psychol. Bull., 1947, 44, 146-161.
GREEN, B. F. Attitude measurement. In G. Lindzey (Ed.), Handbook of social psychology. Cambridge, Mass.: Addison-Wesley, 1954.
GUTTMAN, L. A basis for scaling qualitative data. Amer. sociol. Rev., 1944, 9, 139-150.
GUTTMAN, L. The Cornell technique for scale and intensity analysis. Educ. psychol. Measmt., 1947, 7, 247-279. (a)
GUTTMAN, L. On Festinger's evaluations of scale analysis. Psychol. Bull., 1947, 44, 451-465. (b)
GUTTMAN, L. The basis for scalogram analysis. In S. A. Stouffer (Ed.), Measurement and prediction. Princeton: Princeton Univer. Press, 1950. (a)
GUTTMAN, L. The problem of attitude and opinion measurement. In S. A. Stouffer (Ed.), Measurement and prediction. Princeton: Princeton Univer. Press, 1950. (b)
GUTTMAN, L. Relation of scalogram analysis to other techniques. In S. A. Stouffer (Ed.), Measurement and prediction. Princeton: Princeton Univer. Press, 1950. (c)
HORST, P. Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychol. Bull., 1953, 50, 371-374.
JACKSON, J. M. A simple and more rigorous technique for scale analysis. In, A manual of scale analysis. McGill University, 1949. (Mimeo)
LAZARSFELD, P. F. The logic and mathematical foundation of latent structure analysis. In S. A. Stouffer (Ed.), Measurement and prediction. Princeton: Princeton Univer. Press, 1950.
LOEVINGER, JANE. A systematic approach to the construction and evaluation of tests of ability. Psychol. Monogr., 1947, 61(4, Whole No. 285).
LOEVINGER, JANE. The technic of homogeneous tests compared with some aspects of "scale analysis" and factor analysis. Psychol. Bull., 1948, 45, 507-529.
LONG, J. A. Improved overlapping methods for determining the validities of test items. J. exp. Educ., 1934, 2, 262-268.
LORD, F. M. A theory of test scores. Psychometric Monogr., 1952, No. 7.
LUMSDEN, J. A factorial approach to unidimensionality. Aust. J. Psychol., 1957, 9, 105-111.
LUMSDEN, J. The construction of unidimensional tests. Unpublished master's thesis, University of Western Australia, 1959.
McNEMAR, Q. The revision of the Stanford-Binet scale. Boston: Houghton Mifflin, 1942.
O'NEIL, W. M. A problem of testing a set of items for unidimensionality. Paper read at British Psychological Society (Australian Branch), Perth, 1954.
STOUFFER, S. A., BORGATTA, E. F., HAYS, D. G., & HENRY, A. F. A technique for improving cumulative scores. Public opin. Quart., 1952, 16, 273-291.
THORNDIKE, E. L., BERGMAN, E. D., COBB, M. V., & WOODYARD, E. The measurement of intelligence. New York: Teachers' College, Columbia Univer., 1926.
THURSTONE, L. L. Multiple factor analysis. Chicago: Univer. Chicago Press, 1947.
TUCKER, L. R. A level of proficiency scale for a unidimensional skill. Amer. Psychologist, 1952, 7, 408. (Abstract)
WALKER, D. A. Answer-pattern and score scatter in tests and examinations. Brit. J. Psychol., 1931, 22, 73-86.
WALKER, D. A. Answer-pattern and score scatter in tests and examinations. Brit. J. Psychol., 1936, 26, 301-308.
WALKER, D. A. Answer-pattern and score scatter in tests and examinations. Brit. J. Psychol., 1940, 30, 248-260.
WHERRY, R. J., & GAYLORD, R. H. The concept of test and item reliability in relation to factor pattern. Psychometrika, 1943, 8, 247-269.
WHERRY, R. J., & GAYLORD, R. H. Factor pattern of test items as a function of the correlation coefficient: Content, difficulty and constant error factors. Psychometrika, 1944, 9, 237-244.
WHITE, B. W., & SALTZ, E. Measurement of reproducibility. Psychol. Bull., 1957, 54, 81-99.
WOLFLE, D. Factor analysis to 1940. Psychometric Monogr., 1940, No. 3.

(Received January 4, 1960)
