Binomial Test Models and Item Difficulty

Wim J. van der Linden

Twente University of Technology

In choosing a binomial test model, it is important to know exactly what conditions are imposed
on item difficulty. In this paper these conditions are examined for both a deterministic and a
stochastic conception of item responses. It appears that they are more restrictive than is generally
understood and differ for both conceptions. When the binomial model is applied to a fixed examinee,
the deterministic conception imposes no conditions on item difficulty but requires instead that all
items have characteristic functions of the Guttman type. In contrast, the stochastic conception
allows non-Guttman items but requires that all characteristic functions must intersect at the same
point, which implies equal classically defined difficulty. The beta-binomial model assumes identical
characteristic functions for both conceptions, and this also implies equal difficulty. Finally, the
compound binomial model entails no restrictions on item difficulty.

In educational and psychological testing, binomial models are a class of models increasingly
being applied. For example, in the area of criterion-referenced measurement or mastery testing,
where tests are usually conceptualized as samples of items randomly drawn from a large pool or do-
main, binomial models are frequently used for estimating examinees’ mastery of a domain and for de-
termining sample size. Despite the agreement among several writers on the usefulness of binomial
models, opinions seem to differ on the restrictions on item difficulties implied by the models. Mill-
man (1973, 1974) noted that in applying the binomial models, items may be relatively heterogeneous
in difficulty. Wilcox (1976, 1977) adopted the same position for both the binomial model and the
beta-binomial model. Huynh (1976a) stated that it is the exchangeability of all domain items that is
automatically assumed in the binomial model, implying similarity of item difficulties; he observed
that the beta-binomial model is suitable when a separate sample of items is given to each examinee
(Huynh, 1976b, 1977). Lord and Novick (1968, chap. 23) as well as Hambleton, Swaminathan, Algina,
and Coulson (1978) made the same observation.
In another paper, Huynh (1976c) does not even mention assumptions of the beta-binomial
model. The condition of equal item difficulty for applying the beta-binomial model is also mentioned
by Mellenbergh, Koppelaar, and van der Linden (1977). No item difficulty restrictions are mentioned
by Fhanér (1974). Kriewall (1972) first says that for a given examinee all items are of equal difficulty
by assumption (p. 6); but when giving assumptions for applying the binomial model to item sampling
(pp. 10-12), he does not mention any restriction on item difficulty. Subkoviak (1976) does not impose
explicit restrictions on item difficulties but says instead that a constant probability of a correct
response across items for a fixed person is a condition for the binomial model.

APPLIED PSYCHOLOGICAL MEASUREMENT
Vol. 3, No. 3 Summer 1979 pp. 401-411
© Copyright 1979 West Publishing Co.

Downloaded from the Digital Conservancy at the University of Minnesota, http://purl.umn.edu/93227.
May be reproduced with no cost by students and faculty for academic use. Non-academic reproduction
requires payment of royalties through the Copyright Clearance Center, http://www.copyright.com/

This paper carefully examines the restrictions on item difficulties that must be met when bi-
nomial models are applied to domain-referenced testing. This is done for both a deterministic and a
stochastic conception of item responses. In brief, the former supposes that for a given domain an ex-
aminee responds successfully over repeated independent trials (replications) with a probability equal
to 1 for some items and to 0 for the remaining part of the domain. The stochastic conception is based
on the idea that responding to items is a stochastic process. The probabilities associated with success-
ful outcomes of this process may have values between 0 and 1. The distinction between these two con-
ceptions is justified by the finding that both lead to different restrictions regarding item difficulties.
First, however, there must be a consideration of the formal assumption of binomial models and of
some definitions and aspects of item-sampling theory.

Binomial Models and Item Sampling

Whenever the formal definition of Bernoulli trials applies to a series of experiments or trials with
the outcomes "success" and "failure," the binomial model offers the correct probability distribution
for the number of successes, X, in a series of n trials. Bernoulli trials are defined as trials that (1) have
two possible outcomes, "success" and "failure"; (2) have a probability of success constant for all
trials; and (3) are stochastically independent. The first assumption is evident. The second and third
assumptions allow the derivation of the binomial model with only the aid of the product and sum rule
for probabilities (see, for instance, Hogg & Craig, 1972, p. 87). Denoting the probability of success at
any trial by λ, the binomial density can be written as

\[
P(X = x \mid n, \lambda) = \binom{n}{x} \lambda^{x} (1 - \lambda)^{n - x}, \qquad x = 0, 1, \ldots, n. \tag{1}
\]

When trials conform to the first and the third property, but not to the second, they are said to be
Poisson trials (Feller, 1968, p. 218). In that case, the compound binomial model offers the correct
description of the number of successes, X, in a series of n trials. This probability distribution is given by
the following generating function:

\[
\sum_{x=0}^{n} P(X = x) \, t^{x} = \prod_{g=1}^{n} (Q_g + P_g t), \tag{2}
\]

where Q_g = 1 - P_g and P_g denotes the probability of success at the gth trial (Lord, 1965; Lord &
Novick, 1968, p. 525).
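
The generating function of the compound binomial model can be expanded numerically by multiplying out one factor (Q_g + P_g t) per trial. The following is a minimal Python sketch of that expansion; the function names and the example probabilities are illustrative assumptions, not part of the original paper.

```python
from math import comb


def compound_binomial_pmf(p):
    """Coefficients of prod_g (Q_g + P_g * t); entry x is P(X = x)."""
    coeffs = [1.0]  # the polynomial "1", before any trial is multiplied in
    for p_g in p:
        q_g = 1.0 - p_g
        nxt = [0.0] * (len(coeffs) + 1)
        for x, c in enumerate(coeffs):
            nxt[x] += c * q_g      # failure on this trial: count stays at x
            nxt[x + 1] += c * p_g  # success on this trial: count moves to x + 1
        coeffs = nxt
    return coeffs


def binomial_pmf(n, lam):
    """Binomial density with constant success probability lam."""
    return [comb(n, x) * lam**x * (1 - lam) ** (n - x) for x in range(n + 1)]


# Hypothetical success probabilities for three Poisson trials.
unequal = compound_binomial_pmf([0.2, 0.5, 0.8])
# With a constant probability the trials are Bernoulli trials.
equal = compound_binomial_pmf([0.5, 0.5, 0.5])
```

With equal probabilities the expansion reduces to the ordinary binomial density, which is the sense in which the binomial model is a special case of the compound binomial model.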
In some applications of the binomial model, it also makes sense to consider a probability
distribution of the binomial parameter λ. Representing the probability density of λ by f(λ), it follows that
the number of successes, X, is distributed according to

\[
h(X) = \int_{0}^{1} P(X \mid n, \lambda) \, f(\lambda) \, d\lambda. \tag{3}
\]

Because of its flexible form and mathematical advantages, a choice is often made of the two-parameter
beta density,

\[
f(\lambda) = \frac{\lambda^{v - 1} (1 - \lambda)^{w - n}}{B(v, w - n + 1)}, \qquad 0 \le \lambda \le 1, \tag{4}
\]

with the complete beta function in the denominator, and v > 0, and w > n - 1 (Lord & Novick, 1968,
p. 520). The result obtained in this way is known as the beta-binomial model. From Equation 3 it is
clear that h(X) may be considered a mixture of independent binomial distributions, each weighted by
f(λ). Therefore, the beta-binomial model can only apply to situations in which the conditional
binomial distributions P(X | n, λ) are realized independently for all values of λ.
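
As an illustration of the mixture, the beta-binomial probabilities can be evaluated in closed form. The sketch below assumes the beta density is parametrized with exponents v - 1 and w - n and normalizing constant B(v, w - n + 1), which is the form consistent with the conditions v > 0 and w > n - 1 stated above; the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the complete beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(n, v, w):
    """h(x) for x = 0..n, mixing binomial(n, lam) over the assumed beta density.

    Integrating lam**x * (1 - lam)**(n - x) against the density gives
    comb(n, x) * B(v + x, w - x + 1) / B(v, w - n + 1).
    """
    if v <= 0 or w <= n - 1:
        raise ValueError("requires v > 0 and w > n - 1")
    denom = log_beta(v, w - n + 1)
    return [
        comb(n, x) * exp(log_beta(v + x, w - x + 1) - denom)
        for x in range(n + 1)
    ]


pmf = beta_binomial_pmf(10, 2.0, 12.0)
mean_score = sum(x * p for x, p in enumerate(pmf))
```

Under this parametrization the mean of the mixing density is v / (v + w - n + 1), so the expected number of successes is n times that value. Choosing v = 1 and w = n gives a uniform mixing density and a uniform distribution of X.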
In test theory, the item-sampling model is well known. In this model, score X_a of person a is
considered the score on a test of n items randomly drawn from a population or domain. Sampling may be
real or hypothetical, with or without replacement, stratified or simple, and from a finite or infinite
domain. When persons are also randomly sampled, matrix sampling is possible. The starting point for
matrix sampling is a matrix ||Y_ga|| that arises by taking the Cartesian product of the populations of
items and persons, where Y_ga is a stochastic variable with possible values 1 and 0 representing the
responses of person a to item g. Since Y_ga is random over replications for a given g and a, it is in fact the
classical test theory propensity distribution at item level (Lord & Novick, 1968, pp. 29-30). A matrix
sample is now a realization of the stochastic submatrix obtained by drawing randomly and
independently a number of rows and columns of ||Y_ga||. As a consequence, the same sample of items is
administered to each person in a sample of persons. Sampling plans in which each randomly selected
person is given a separately drawn sample of items are also possible.
In item sampling theory, it is usual to define

\[
\zeta_a \equiv \mathcal{E}_g Y_{ga} \tag{5}
\]

as the person parameter of interest. In terms of the matrix ||Y_ga||, ζ_a is the expectation for a given
person or column across rows and replications. (Note that Equation 5 should, in fact, have contained two
expectation signs, one across items and the other across replications. In this paper, however, the
notation proposed in Lord and Novick, 1968, pp. 34-35, has been followed and the latter has been
omitted.) It can readily be shown that ζ_a may also be interpreted as the expected relative test score
across samples of items for person a. Since the items are not necessarily parallel, randomly drawn
samples from a domain are to be considered nominally parallel tests and ζ_a is the relative generic true
score.
For nominally parallel measurements, it is usual to assume the following analysis of variance
decomposition:

\[
Y_{ga} = \mu + (\zeta_a - \mu) + (\pi_g - \mu) + \alpha_{ga} + e_{ga}, \tag{6}
\]

where μ is the generic true score expected across persons,
π_g is the classical item difficulty,
α_ga is the person x test interaction, and
e_ga is the specific measurement error (Lord &
Novick, 1968, p. 176).
This model is of importance because it demonstrates that the generic measurement error

\[
\epsilon_{ga} \equiv Y_{ga} - \zeta_a = (\pi_g - \mu) + \alpha_{ga} + e_{ga} \tag{7}
\]

behaves differently from the specific measurement error of the classical test model. For instance, it is
known that the generic measurement errors for two randomly drawn persons, a and b, are not
independent across nominally parallel measurements but have a covariance equal to the variance of item
difficulties

\[
\sigma(\epsilon_{ga}, \epsilon_{gb}) = \sigma^{2}(\pi_g) \tag{8}
\]

(Lord & Novick, 1968, p. 181). In terms of matrix sampling, this means that a matrix sample yields
correlated errors when used for estimating the person parameters (Equation 5), unless all samples
possess equal difficulty in the classical meaning of the word.
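
The covariance result can be checked exactly on a small population matrix. The Python sketch below uses a hypothetical matrix of success probabilities. It assumes that two persons are drawn independently with replacement and that their responses come from independent replications, so that only the expected parts of the generic errors contribute to the covariance across items.

```python
from fractions import Fraction as F

# P[g][a]: success probability of person a on item g (hypothetical values).
P = [
    [F(9, 10), F(7, 10), F(4, 10)],
    [F(6, 10), F(5, 10), F(2, 10)],
    [F(3, 10), F(1, 10), F(1, 10)],
    [F(8, 10), F(3, 10), F(5, 10)],
]
n_items, n_persons = len(P), len(P[0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Relative generic true score per person and classical difficulty per item.
zeta = [mean(P[g][a] for g in range(n_items)) for a in range(n_persons)]
pi = [mean(P[g][a] for a in range(n_persons)) for g in range(n_items)]
mu = mean(pi)


def error_covariance(a, b):
    """Covariance across items of the expected generic errors of persons a and b."""
    return mean((P[g][a] - zeta[a]) * (P[g][b] - zeta[b]) for g in range(n_items))


# Average over all ordered pairs of persons (drawing with replacement).
avg_cov = mean(
    error_covariance(a, b) for a in range(n_persons) for b in range(n_persons)
)
var_pi = mean((p - mu) ** 2 for p in pi)
```

Here avg_cov and var_pi coincide exactly, because averaging the deviations P[g][a] - zeta[a] over persons leaves the item effect pi[g] - mu. The covariance vanishes only when every item has the same classical difficulty.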

A Deterministic Conception of Item Responses

Consider first the assumptions regarding item difficulty implied by the binomial models for a
deterministic conception of item responses, i.e., when it is assumed that a person produces correct
answers to some items of the domain with a probability equal to 1 and to the others with a probability
equal to 0. The population matrix from item-sampling theory now contains, not the stochastic
variables Y_ga, but the deterministic values y_ga. Sampling from this matrix means sampling from a pool of
0’s and 1’s and is entirely equivalent to sampling from the well-known vase with red and white
marbles. (An explicit reference to this analogy can sometimes be found in the literature on mastery
testing, e.g., see Kriewall, 1972). In classical test theory, a true score is defined as the observed score
expected across replications. Since the expected value of a constant is the constant itself, the deter-
ministic conception of item responses entails the equality of observed and true item scores. There is
no measurement error (in the classical meaning of the word), and the only error possibly involved is
estimation error when using a sample of item responses to estimate the proportion of items a person
has correct in a given domain.
The idea of item responses as deterministic events, only having a probability of success equal to 1
or 0, is akin to the hypothesis of learning as an all-or-none process. According to this hypothesis, a
student is not able to produce any correct response up to a certain point in the learning process; how-
ever, having passed this point, the situation has fully changed and he/she will always produce the cor-
rect answer. In the parlance appertaining to this hypothesis, knowledge is treated as an all-or-none
state: A student who has passed the critical point in the learning process "knows" the item; the others
"do not know it."
The deterministic view also underlies the so-called state models for mastery testing (see Besel,
1973; Dayton & Macready, 1976; Emrick, 1971; Emrick & Adams, 1969; Macready & Dayton, 1977;
Meskauskas, 1976). For instance, Macready and Dayton (1977), in formulating their models for
mastery testing, took the idea that a person is either a master with a true item score vector of 1’s or a
nonmaster with a true item score vector of 0’s. This is clearly a deterministic starting point, since true
item scores equal to 1 or 0 imply probabilities of correct responses having the same values of 1 and 0,
respectively. According to Macready and Dayton, however, occasional disturbances, such as for-
getting and guessing, prevent this true state from showing itself fully; they next adopted parameters
to correct these probabilities to make them better fit real test data.
Lord and Novick (1968, p. 236), in their discussion of matrix-sampling theory, indicated that
they did not consider the item response variable Y_ga as random across replications and replaced it by
the constant y_ga. Although they stated that this was done to simplify their thinking and that they did
not expect to arrive at incorrect conclusions, this amounts to replacing a stochastic view of item
responses by a deterministic view. It is the purpose of this paper, however, to show that this choice is not
without consequences and leads to different conditions for applying binomial models to
domain-referenced testing or to item-sampling situations.

Latent Trait Conceptualization

Since the concern of this paper is in the implications of binomial models not only for item diffi-
culty defined according to classical test theory (i.e., as the expected item response across replications
and persons) but also for the difficulty parameter from latent trait theory, it is worth noting how the
deterministic conception entails a special form for the characteristic curves of all items of the domain.
To show this, a latent trait point of view has been adopted and the discussion has been restricted to
domains for which an underlying continuum can be assumed and the probability of a successful re-
sponse is a nondecreasing function for each item in the domain. Items of this type are sometimes
called monotonic items; it seems safe to assume them here, since nonmonotonic items are only oc-
casionally found in the area of attitude measurement and not in achievement testing.
From a latent trait point of view, a deterministic conception of item responses amounts to the
idea that these characteristic curves are degenerated to a Guttman form: Up to a certain point, the
probability of a correct response is equal to 0; thereafter, it is equal to 1. Denoting the latent
continuum by θ, the Guttman item characteristic curve for item g is defined as

\[
P_g(1 \mid \theta) =
\begin{cases}
0, & \theta < b_g, \\
1, & \theta \ge b_g,
\end{cases} \tag{9}
\]

where b_g is the point at which the curve shows its jump. In latent trait theory, b_g is interpreted as the
difficulty parameter of item g. To be clear, it has not been shown that Equation 9 always
holds. What has been pointed out is that whenever a latent trait point of view is appropriate, the
deterministic conception of item responses must take the form of Equation 9. This is necessary in order to
analyze whether the use of binomial models entails any restrictions on the latent trait theory item
difficulty parameter, b_g. The results of such an analysis would have practical meaning only when the
latent trait point of view is indeed appropriate.
Some authors (e.g., Millman, 1974) have advocated that item domains need not necessarily be
homogeneous and that binomial models are excellently suited to analyze sampling from
heterogeneous domains. Therefore, unless otherwise indicated, it will be assumed that θ from Equation 9 is
a vector representing the complete collection of all latent variables underlying the domain. In that
case, Equation 9 shows a multidimensional generalization of a Guttman characteristic curve.
Multidimensional item characteristic curves are usually called item characteristic functions, and this
custom will be adopted to indicate when and when not to consider the multidimensional case.

Domain Sampling for a Fixed Person

An analysis will now be made of the situation of a fixed person, a, with a latent vector θ_a and
items randomly drawn with replacement from a finite domain or without replacement from an
infinite one. For this person, the number of items for which he/she has a correct response can, in
principle, be counted; and a proportion of correct responses may be defined as

\[
\tau_a \equiv \frac{1}{N} \sum_{g=1}^{N} y_{ga}, \tag{10}
\]

where N denotes the domain size and, in case of an infinite domain, the limit for N to infinity should
be added. Note how τ_a results from applying the definition of relative generic true score (Equation 5) to
deterministic values y_ga instead of to random variables Y_ga. Although the population matrix ||y_ga||
and the parameter τ_a defined on it are deterministic quantities, sampling creates a chance mechanism
generating the probability distribution of the number of 1's in a sample of size n. It will be clear that
this distribution is the binomial Equation 1 with λ replaced by τ_a, since the responses to randomly
drawn items can be considered outcomes of Bernoulli trials: There are only two possible outcomes,
and random sampling guarantees equal probabilities for these outcomes at each trial and stochastic
independence between all trials.
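
The urn argument can be verified by exact enumeration. The following Python sketch uses a hypothetical domain of five deterministic item responses for one person, enumerates every ordered sample of three items drawn with replacement, and compares the resulting distribution of the number of 1's with the binomial density whose parameter is the proportion of 1's in the domain.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical domain: deterministic responses of one person to N = 5 items.
y = [1, 0, 1, 1, 0]
n = 3  # sample size

# Proportion of items in the domain the person answers correctly.
tau = Fraction(sum(y), len(y))

# Enumerate every equally likely ordered sample drawn with replacement.
counts = [0] * (n + 1)
for sample in product(y, repeat=n):
    counts[sum(sample)] += 1
sampling_dist = [Fraction(c, len(y) ** n) for c in counts]

# Binomial density with the proportion tau as its parameter.
binomial = [comb(n, x) * tau**x * (1 - tau) ** (n - x) for x in range(n + 1)]
```

The two distributions agree exactly, whatever the arrangement of 0's and 1's in the domain. Only the proportion of 1's matters, which is why no condition on the individual items is needed in this case.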
From this conclusion it follows that for a deterministic view of item responses, a fixed person,
and item sampling of the above type, the binomial model does not impose any restriction on the item
difficulties. The Guttman item characteristic functions may display their jump anywhere in the latent
space without invalidating the description of the item responses as outcomes of a Bernoulli process.
And the individual parameters in the difficulty vector b_g, which represent the item difficulty for each
separate dimension of the complete latent space and together indicate the place where this jump
occurs, are not restricted in their possible values. For a fixed person and a deterministic conception, the
classical definition of item difficulty as the expected item response across replications and persons
degenerates to the expected value of the constant y_ga, which is equal to y_ga itself. In the previous
reasoning, no assumptions were made regarding these constants. Therefore, although this definition has a
degenerate meaning here, it may be said that applying the binomial model in the present case does
not impose any restriction on classical item difficulty either. Applying the compound binomial model
is never a requirement. The reason for applying this model to sampling deterministic responses is not
variation in item difficulties but stratified, instead of simple, random sampling.

Matrix Sampling

The case of a population of persons and a domain of items with random sampling from both,
assuming that sampling takes the form of matrix sampling, will be treated next. The persons and items
are drawn independently, and the same test is administered to all persons. An extension of the
binomial model Equation 1 to the beta-binomial model in Equations 3 and 4, with λ replaced by τ from
Equation 10, seems to be obvious. It was seen earlier, however, that the beta-binomial model only
applies to situations in which the conditional distributions P(X | n, τ) are realized independently for all
values of τ; and this requirement is not met when matrix sampling is used. From the equivalence of τ
to the relative generic true score ζ and Equations 7 and 8, it follows that the distributions

\[
P(X \mid n, \tau) \tag{11}
\]

are not independent and that the beta-binomial model does not apply. Equation 8 shows that the
covariance between any two distributions is equal to the variance of the classical item difficulties and
that this covariance only vanishes when all items of the domain have equal difficulty. For this reason,
Lord and Novick (1968, p. 524) have observed that the beta-binomial model may be applied to matrix
sampling only if all items in the domain are of equal difficulty. It is important, however, to add that
this is only a necessary condition for applying the beta-binomial model. A covariance equal to zero by
no means implies that the distributions are independent. Note, too, that the combination of equal
item difficulty (in the classical meaning of the word) and Guttman characteristic functions imposes a
restriction on the difficulty parameters of the latter as well.
In order for a domain of items with these characteristic functions to yield equal classical
difficulties, the following inequality,

\[
\theta_a < b_g \le \theta_b, \tag{12}
\]

must hold for g = 1, 2, ..., N, θ_a and θ_b being, respectively, the largest person vector below and the
smallest vector not below the difficulty vector of any item g from the domain. In practice, it follows
from Equation 12 that all item characteristic functions must be identical when the beta-binomial
model is applied to a large population of examinees or to a small population with a slight dispersion
in θ.
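
A unidimensional example makes the inequality concrete. The Python sketch below assumes the Guttman curve takes the value 1 at the jump point itself (a boundary convention chosen here for illustration) and uses a hypothetical population of four persons. It computes the classical difficulty of each item as the proportion of persons who answer it correctly.

```python
def guttman(theta, b):
    """Guttman item response: 1 from the jump point b upward, else 0."""
    return 1 if theta >= b else 0


def classical_difficulty(b, thetas):
    """Proportion of persons in the population who answer the item correctly."""
    return sum(guttman(t, b) for t in thetas) / len(thetas)


# Hypothetical person population on a single latent continuum.
thetas = [-1.0, 0.0, 0.5, 2.0]

# All jump points fall in the same gap between adjacent persons: (0.5, 2.0].
inside = [classical_difficulty(b, thetas) for b in (0.6, 1.0, 2.0)]

# A person (theta = 0.5) lies between the two jump points.
straddling = [classical_difficulty(b, thetas) for b in (0.3, 1.0)]
```

The items in the first set all have classical difficulty 0.25 although their jump points differ, while the items in the second set differ in classical difficulty. As the population becomes denser on the continuum, the gap between adjacent persons shrinks and the jump points are forced together.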
The necessary condition for applying the beta-binomial model to matrix sampling, namely that all
items must possess (approximately) identical characteristic functions, is stringent and in practice will
never be met. There is a sampling plan, however, that deviates somewhat from the idea of matrix
sampling but offers in principle the same information and permits the application of the
beta-binomial model without this condition: Independence between the distributions of Equation 11
is guaranteed when a separately drawn sample of items is administered to each person in the sample
(see Lord & Novick, 1968, p. 524). It is therefore possible to avoid the necessity of fulfilling conditions
that in practice can never be met by using this relatively simple experimental procedure.

A Stochastic Conception of Item Responses

Suppose that a stochastic conception of item responses (i.e., for a given person and item) is now
adopted. Item responses are seen as the outcomes of a stochastic process dependent upon several per-
son and item characteristics. The probability of a correct response is not restricted to the possible
values 0 and 1 but may adopt every value on the (real) interval from 0 to 1. Earlier, it was seen that the
deterministic conception is akin to the view of learning as an all-or-none process and knowledge as an
all-or-none state. The stochastic conception, on the contrary, seems to be allied to the view that
learning is a process in which a student improves his/her knowledge gradually. It is not the case that a student
either "knows" or "does not know" the items; according to this view, it seems more natural to
conceive of knowledge as a continuum underlying the item responses. A student's position on this
continuum represents the amount of mastery he/she possesses with regard to the cognitive skills
needed for solving the items and influences his/her probability of a successful response.
In terms of matrix-sampling theory, this means that the population matrix is no longer viewed as
a deterministic matrix with cells (a, g) containing one element of {0, 1} but as a stochastic matrix of
which the cells contain probability distributions on {0, 1} or, equivalently, probabilities P_g(0 | θ_a) and
P_g(1 | θ_a) with possible values in the interval [0, 1] and P_g(1 | θ_a) = 1 - P_g(0 | θ_a). As a consequence,
sampling from this matrix is to be seen, not as sampling correct responses, but as sampling probabilities of
correct responses. The item characteristic functions are not necessarily of the Guttman type but may adopt
any monotonically increasing form. For the sequel, it is superfluous to assume a model to explain the
probability P_g(1 | θ_a). One of the logistic or normal ogive models could illustrate the unidimensional
case, and one of their multidimensional generalizations the multidimensional case; but the
conclusions apply to any model that describes the probability of success as a function of the complete latent
space.

Domain Sampling for a Fixed Person

First, consider the case in which randomly drawn items are administered to a fixed person with
vector θ_a. Under the deterministic conception, the population matrix degenerated to a matrix with 1's
and 0's representing the items which the student will always have correct (items he/she "knows") and
not correct (items he/she "does not know"), respectively, over replications. It was possible to count
the items of the former type and define a proportion of correct responses, as has been done in
Equation 10. This cannot be repeated for the stochastic conception, inasmuch as the population matrix
now contains probability distributions and there are no correct responses which can be counted.
Therefore, it is meaningless to define for this person a proportion of items he/she "knows." It makes
sense, however, to use the probability distributions in the population matrix to introduce instead an
expected proportion of items, ν_a, responded to correctly when the entire domain is administered to
examinee a:

\[
\nu_a \equiv \frac{1}{N} \sum_{g=1}^{N} P_g(1 \mid \theta_a). \tag{13}
\]

(Again, the limit for N to infinity must be added when the domain is infinite.) Equation 13 defines the
proportion correct true score (Lord & Novick, 1968, chap. 23) for the whole population matrix, which
is not surprising, since the proportion correct true score is an expected proportion correct according
to the classical true score definition.
A second conspicuous difference from the deterministic conception is that simple random
sampling of items is a superfluous chance mechanism, because item responses are already stochastic
events. In addition, the probability that a certain item is successfully responded to by a person is
equal to the joint probability of drawing this item and a successful response to it. Since the probabil-
ity of being drawn is equal for all items in simple random sampling, it is a scale factor and can be ex-
cluded from consideration.
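
The scale-factor remark can be written out in a line. In the sketch below, N is the domain size, θ_a is the latent vector of the fixed person, P_g(1 | θ_a) is the probability of a correct response to item g, and ν_a is the expected proportion correct over the whole domain; the event labels are informal shorthand introduced here for illustration.

\[
P(\text{item } g \text{ is drawn and answered correctly}) = \frac{1}{N} \, P_g(1 \mid \theta_a),
\qquad
\sum_{g=1}^{N} \frac{1}{N} \, P_g(1 \mid \theta_a) = \nu_a .
\]

The factor 1/N is the same for every item, so comparing items on the joint probability is equivalent to comparing them on P_g(1 | θ_a) alone.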
In order to be allowed to consider the responses to randomly drawn items for a person with
vector θ_a as outcomes of a series of Bernoulli trials, all items of the domain must possess equal success
probabilities, that is,

\[
P_g(1 \mid \theta_a) = P(1 \mid \theta_a), \tag{14}
\]

or using Equation 13,

\[
P_g(1 \mid \theta_a) = \nu_a, \tag{15}
\]

for g = 1, 2, ..., N.
Since a series of Bernoulli trials is a necessary and sufficient condition for the binomial
distribution, the number of successes, X, in a sample of size n only follows a binomial distribution, which can
be written as

\[
P(X = x \mid n, \nu_a) = \binom{n}{x} \nu_a^{x} (1 - \nu_a)^{n - x}, \tag{16}
\]

if the restriction formulated in Equations 14 and 15 is met. This restriction says that for a fixed
examinee with vector θ_a, all item characteristic functions of the domain must intersect each other for
θ = θ_a.
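
The effect of unequal success probabilities can be seen by conditioning on the items actually administered. The Python sketch below is an illustration under stated assumptions: it uses the Rasch model for the item characteristic curves, a hypothetical person at θ = 0, and two hypothetical sets of three items. For a fixed set of items the number correct follows the compound binomial distribution, which is compared with the binomial distribution that has the mean success probability as its parameter.

```python
from math import comb, exp


def rasch(theta, b):
    """Rasch success probability for ability theta and item difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def compound_binomial_pmf(probs):
    """Distribution of the number correct over independent, unequal trials."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for x, c in enumerate(coeffs):
            nxt[x] += c * (1.0 - p)
            nxt[x + 1] += c * p
        coeffs = nxt
    return coeffs


def binomial_pmf(n, p):
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def variance(pmf):
    m = sum(x * p for x, p in enumerate(pmf))
    return sum((x - m) ** 2 * p for x, p in enumerate(pmf))


theta_a = 0.0  # hypothetical fixed person


def compare(difficulties):
    """Return (compound binomial pmf, binomial pmf at the mean probability)."""
    probs = [rasch(theta_a, b) for b in difficulties]
    nu = sum(probs) / len(probs)
    return compound_binomial_pmf(probs), binomial_pmf(len(probs), nu)


exact_equal, binom_equal = compare([0.5, 0.5, 0.5])
exact_unequal, binom_unequal = compare([-1.0, 0.0, 1.0])
```

With equal difficulties the two distributions coincide. With unequal difficulties the number correct on the fixed item set has a smaller variance than the binomial distribution with the same mean, so the binomial description of these item responses no longer holds.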
Assume for a moment that θ may be considered unidimensional and analyze the consequences
for the difficulty parameters of the logistic models known from latent trait theory. It is clear that
Equations 14 and 15 involve the restriction that all difficulty parameters of the Rasch model must be
equal. According to the Rasch model, the logistic curves can only vary in location; and to meet the
condition that they intersect each other for θ = θ_a, they must all have the same location or value for
their difficulty parameter. The situation differs, however, for the two-parameter Birnbaum model.
According to this model, the logistic curves can vary both in location and slope (or discrimination),
and these curves have identical forms only if the restriction of not one but two intersections is
imposed. Therefore, applying the binomial model simultaneously to persons with scores θ_a and θ_b implies that
for the two-parameter Birnbaum model, all items must have equal values not only for their difficulty
parameters but for their discrimination parameters as well. An example of this application is found in
mastery testing with an indifference zone (Fhanér, 1974; Kriewall, 1972; Wilcox, 1976), where
optimal cutting scores and test length are determined by analyzing simultaneously the binomially
distributed measurement errors for persons with a minimum mastery level ν_a = ν(θ_a) and a maximum
nonmastery level ν_b = ν(θ_b). The same analysis shows that in order to apply the binomial model
simultaneously to three persons with different latent scores, the item characteristics of the three-parameter
logistic model must be identical.
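
The argument about intersections can be illustrated for the two-parameter logistic model. Two curves with discriminations a1, a2 and difficulties b1, b2 give equal probabilities exactly where a1(θ - b1) = a2(θ - b2), which has a single solution when the slopes differ. The Python sketch below uses hypothetical item parameters; the function names are illustrative.

```python
from math import exp


def two_pl(theta, a, b):
    """Two-parameter logistic curve with discrimination a and difficulty b."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def crossing_point(a1, b1, a2, b2):
    """The single theta at which two 2PL curves with different slopes meet."""
    if a1 == a2:
        raise ValueError("equal slopes: the curves coincide or never cross")
    return (a1 * b1 - a2 * b2) / (a1 - a2)


# Hypothetical items given as (discrimination, difficulty).
item1 = (1.0, 0.0)
item2 = (2.0, 0.5)

theta_star = crossing_point(*item1, *item2)
gap_at_crossing = abs(two_pl(theta_star, *item1) - two_pl(theta_star, *item2))
gap_elsewhere = abs(two_pl(2.0, *item1) - two_pl(2.0, *item2))
```

A person located at the crossing point has the same success probability on both items, but any second person located elsewhere does not. Requiring equal probabilities at two distinct points therefore leaves no room for the two curves to differ.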
Since the classical definition of item difficulty boils down to the probability P_g(1 | θ_a) for a fixed
person and a stochastic conception of item responses, it follows from Equation 14 that applying the
binomial model in this case requires all items of the domain to be of the same classical difficulty. This
is different from the deterministic conception, for which no assumptions of classical item difficulty
are involved.

Matrix Sampling

Finally, suppose that persons are randomly sampled and that the question of the assumptions of
item difficulty implied by the beta-binomial model is raised. Since the conditions for applying the
binomial model are necessary conditions for applying the beta-binomial model, the conditions
indicated above must be met. An extra requirement is now that Equation 14 must be in force not only
for one point θ = θ_a but for all points θ of the latent space. All item characteristic functions must,
therefore, intersect each other at all points θ. This implies that all item characteristic functions
must be identical, regardless of the number of dimensions that make θ complete or the model that
may be adopted to describe these functions.
Since identical characteristic functions mean that the items are completely equivalent, the beta-
binomial model implies that all items have equal difficulty, according to the classical, or any other,
definition. In this case, a stringent condition for applying a binomial model is again encountered.
Note that now the necessity of fulfilling this condition cannot be avoided by leaving matrix sampling
and the administering of a separately drawn sample of items to each person. The reason is that the
equality of the item characteristic functions follows from the requirement that each person must have
the same probability of success for each item, and this cannot be reached by changing the sampling
plan. The compound binomial model allows items to vary in their probability of success for each per-
son, and it seems wise to use this less stringent model for the conditional part in Equation 3 when con-
fronted with a domain of unequal item characteristic functions.
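
The effect of unequal success probabilities can be made concrete by comparing variances. The sketch below is an illustration under stated assumptions rather than a result from the paper: it considers one fixed person and a fixed set of administered items, assumes the responses are independent given the person, and uses three hypothetical success probabilities. The helper names are invented for this example. For such a case the number-correct score has variance equal to the sum of p(1 - p) over items, whereas a binomial with the mean probability has variance n times mean(p) times (1 - mean(p)); the two differ by the sum of squared deviations of the item probabilities from their mean.

```python
def binomial_variance(probs):
    """Variance of a binomial with n = len(probs) and the mean success probability."""
    n = len(probs)
    mean_p = sum(probs) / n
    return n * mean_p * (1.0 - mean_p)


def compound_binomial_variance(probs):
    """Variance of the number-correct score for independent items with unequal probabilities."""
    return sum(p * (1.0 - p) for p in probs)


def spread_of_probabilities(probs):
    """Sum of squared deviations of the item probabilities from their mean."""
    mean_p = sum(probs) / len(probs)
    return sum((p - mean_p) ** 2 for p in probs)


# Hypothetical success probabilities of one person on three items.
probs = [0.5, 0.7, 0.9]

print("binomial variance:         ", round(binomial_variance(probs), 4))
print("compound binomial variance:", round(compound_binomial_variance(probs), 4))
print("difference explained by spread:", round(spread_of_probabilities(probs), 4))
```

The two variances agree only when the spread term is zero, that is, when the person has the same probability of success on every item.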
Downloaded from the Digital Conservancy at the University of Minnesota, http://purl.umn.edu/93227.
May be reproduced with no cost by students and faculty for academic use. Non-academic reproduction
requires payment of royalties through the Copyright Clearance Center, http://www.copyright.com/

Discussion

Although binomial models are widely used for solving testing problems, the item difficulty as-
sumptions are not generally understood. The argument of this paper has shown that binomial models
involve rather strong assumptions, which may be summarized as follows:

1. Binomial Model (One-Person Case). Under the deterministic conception, the items of the domain
need not have equal difficulties, whether difficulty is defined classically or in latent
trait theoretic terms. The stochastic conception, however, entails the condition of equal
values for the classical, as well as the Rasch, item difficulty parameter.
2. Beta-Binomial Model (Group-of-Persons Case). If the sample of items is administered to all persons
in the sample, either conception involves equal item difficulty for both the classical and
latent trait definitions. However, if the deterministic conception is adopted, this condition can be
avoided by giving separate samples of items to each person, whereas it cannot if the stochastic
conception is adopted.
It is clear that the most important difference is between a deterministic and a stochastic conception
of item responses. The two conceptions lead to different conditions under which binomial models can be
applied. The present writer believes that the stochastic view has greater validity than the deterministic
view. As indicated earlier, the latter is akin to the conception of learning as an all-or-none process and
to state-models for mastery testing, which have been criticized in another paper (van der Linden, 1978).
Here, it suffices to say that delusions, fatigue, fluctuations in intellectual capacity and attention,
reading errors, slips of the pen, guessing, and the like compel a conception of item responses as outcomes
of a stochastic process. This idea is present in classical test theory when it uses so-called
propensity distributions to define true score. Only in modern test theory is it thoroughly explored:
Latent trait theory conceives of item responses as stochastic events and, in fact, applies the propensity
distribution at the item level. It also provides models that can be used for explaining the distributions.
As indicated earlier, defining a proportion of the domain an examinee knows does not make
sense for the stochastic conception of item responses. In the literature about criterion-referenced
measurement, it is, however, usual to consider this proportion as a typical criterion-referenced
measure and to contrast it with norm-referenced measures like percentile scores (for a classical
paper, see Ebel, 1962). Defining a proportion instead of an expected proportion corresponds to the
idea that sampling from a domain of items is equivalent to sampling from a vase with red and white
marbles. The proportion of items an examinee knows is comparable with the proportion of red
marbles the vase contains and can be estimated accordingly. As noted above, unlike the color of
marbles, item responses are not deterministic events; a response to an item is therefore not comparable
with the color of a marble. The important difference is that the former is liable to all kinds of
stochastic influences, whereas the latter is free of them. Consequently, a distinction must be made
between a proportion, which could in principle be observed when the student responds to the total
domain, and a true or expected proportion. Only the latter is the person parameter in which
criterion-referenced measurement should be interested.
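
The distinction between an observable proportion and an expected proportion can be sketched with a small calculation. This is an illustrative example only: the five-item domain, the success probabilities, and the function names are hypothetical, and responses are assumed to be independent given the person. Under the deterministic conception every probability is 0 or 1, so the proportion observed over the whole domain cannot vary; under the stochastic conception the same expected proportion is accompanied by a nonzero standard deviation of the observed proportion.

```python
from math import sqrt


def expected_proportion(probs):
    """Expected proportion correct over the whole domain for one person."""
    return sum(probs) / len(probs)


def sd_observed_proportion(probs):
    """Standard deviation of the observed domain proportion, assuming independent responses."""
    return sqrt(sum(p * (1.0 - p) for p in probs)) / len(probs)


# Hypothetical five-item domain under each conception of item responses.
deterministic = [1.0, 1.0, 1.0, 0.0, 0.0]  # the "marbles": known or not known
stochastic = [0.9, 0.8, 0.6, 0.4, 0.3]     # success probabilities strictly between 0 and 1

for label, probs in (("deterministic", deterministic), ("stochastic", stochastic)):
    print(label,
          "expected proportion:", round(expected_proportion(probs), 3),
          "sd of observed proportion:", round(sd_observed_proportion(probs), 3))
```

Both hypothetical examinees have the same expected proportion, yet only for the deterministic one does a single pass through the domain reveal it without error.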
The robustness of the binomial models with respect to assumptions of item difficulty has not
been considered in this paper. It is conceivable that the conditions for applying binomial models are,
in practice, less restrictive, because numerical results may be relatively insensitive to the degree to
which these conditions are violated. Analysis of robustness should further clarify this point.
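
One way such a robustness analysis might begin is sketched below. It is not an analysis from this paper: the test length, cutoff score, mean success probability, and the two-level pattern of item probabilities are all hypothetical, the function names are invented, and responses are assumed independent for a fixed person and a fixed set of items. The code compares the probability of reaching the cutoff under the binomial model with the probability obtained when item success probabilities differ around the same mean.

```python
from math import comb


def pass_probability_unequal(probs, cutoff):
    """P(number correct >= cutoff) for independent items with unequal success probabilities."""
    pmf = [1.0]  # distribution of the score after zero items
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # item answered incorrectly
            nxt[k + 1] += mass * p      # item answered correctly
        pmf = nxt
    return sum(pmf[cutoff:])


def pass_probability_binomial(n, p, cutoff):
    """P(number correct >= cutoff) under a binomial with parameters n and p."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(cutoff, n + 1))


def gap(spread, n=10, cutoff=8, mean_p=0.7):
    """Absolute difference in pass probability when half the items are easier and half harder."""
    probs = [mean_p + spread] * (n // 2) + [mean_p - spread] * (n // 2)
    return abs(pass_probability_unequal(probs, cutoff)
               - pass_probability_binomial(n, mean_p, cutoff))


for spread in (0.0, 0.1, 0.2):
    print(f"spread = {spread:.1f}  gap in pass probability = {gap(spread):.4f}")
```

In this particular configuration the gap is zero without spread and grows as the item probabilities move apart; whether such gaps matter in practice is exactly the question a fuller robustness study would have to answer.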


References

Besel, R. Using group performance to interpret individual responses to criterion-referenced tests.
Paper presented at the annual meeting of the American Educational Research Association, New
Orleans, LA, March 1973. (EDRS No. ED 076 658)
Dayton, C. M., & Macready, G. B. A probabilistic model for validation of behavioral hierarchies.
Psychometrika, 1976, 41, 189-204.
Ebel, R. L. Content standard test scores. Educational and Psychological Measurement, 1962, 22, 15-25.
Emrick, J. A. An evaluation model for mastery testing. Journal of Educational Measurement, 1971,
8, 321-326.
Emrick, J. A., & Adams, E. N. An evaluation model for individualized instruction (Report RC 2674).
Yorktown Hts., NY: IBM, Thomas J. Watson Research Center, October 1969.
Feller, W. An introduction to probability theory and its applications (Vol. 1). New York: John Wiley &
Sons, Inc., 1968.
Fhanér, S. Item sampling and decision-making in achievement testing. British Journal of Mathematical
and Statistical Psychology, 1974, 27, 172-175.
Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. Criterion-referenced testing and
measurement: A review of technical issues and developments. Review of Educational Research,
1978, 48, 1-47.
Hogg, R. V., & Craig, A. T. Introduction to mathematical statistics. New York: Macmillan, 1972.
Huynh, H. On the reliability of decisions in domain-referenced testing. Journal of Educational
Measurement, 1976, 13, 253-264. (a)
Huynh, H. Statistical consideration of mastery scores. Psychometrika, 1976, 41, 65-79. (b)
Huynh, H. On mastery scores and efficiency of criterion-referenced tests when losses are partially
known. Paper presented at the annual meeting of the American Educational Research Association,
San Francisco, April 1976. (c)
Huynh, H. Two simple classes of mastery scores based on the beta-binomial model. Psychometrika,
1977, 42, 601-608.
Kriewal, T. E. Aspects and applications of criterion-referenced tests. Illinois School Research, 1972, 9,
5-18.
Lord, F. M. A strong true-score theory, with applications. Psychometrika, 1965, 30, 239-270.
Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading, MA: Addison-Wesley,
1968.
Macready, G. B., & Dayton, C. M. The use of probabilistic models in the assessment of mastery.
Journal of Educational Statistics, 1977, 2, 99-120.
Mellenbergh, G. J., Koppelaar, H., & van der Linden, W. J. Dichotomous decisions based on
dichotomously scored items: A case study. Statistica Neerlandica, 1977, 31, 161-169.
Meskauskas, J. A. Evaluation models for criterion-referenced testing: Views regarding mastery and
standard-setting. Review of Educational Research, 1976, 46, 133-158.
Millman, J. Passing scores and test lengths for domain-referenced measures. Review of Educational
Research, 1973, 43, 205-216.
Millman, J. Criterion-referenced measurement. In W. J. Popham (Ed.), Evaluation in education.
Berkeley, CA: McCutchan, 1974.
Subkoviak, M. J. Estimating reliability from a single administration of a criterion-referenced test.
Journal of Educational Measurement, 1976, 13, 265-276.
van der Linden, W. J. Forgetting, guessing, and mastery: The Macready and Dayton models revisited
and compared with a latent trait approach. Journal of Educational Statistics, 1978, 3, 305-318.
Wilcox, R. R. A note on the length and passing score of a mastery test. Journal of Educational
Statistics, 1976, 1, 359-364.
Wilcox, R. R. Estimating the likelihood of false-positive and false-negative decisions in mastery
testing: An empirical Bayes approach. Journal of Educational Statistics, 1977, 2, 289-307.

Acknowledgment

Thanks are due to Fred N. Kerlinger, Gideon J. Mellenbergh, and both reviewers for their most useful
comments and to Betsy Becker for purifying some parts of the English text.

Author's Address

Send requests for reprints or further information to Wim J. van der Linden, Onderafdeling Toegepaste
Onderwijskunde, T. H. Twente, Postbus 217, 7500 AE Enschede, The Netherlands.
