Scientific Studies of Reading: To Cite This Article: Janice M. Keenan, Rebecca S. Betjemann & Richard K. Olson
Publisher: Routledge
Reading Comprehension Tests Vary in the Skills They Assess: Differential Dependence on Decoding and Oral Comprehension
Janice M. Keenan, Psychology Department, University of Denver
Rebecca S. Betjemann, Institute for Behavioral Genetics, University of Colorado
Richard K. Olson, Department of Psychology, University of Colorado
Published online: 25 Jul 2008.
To cite this article: Janice M. Keenan , Rebecca S. Betjemann & Richard K. Olson
(2008) Reading Comprehension Tests Vary in the Skills They Assess: Differential
Dependence on Decoding and Oral Comprehension, Scientific Studies of Reading, 12:3,
281-300, DOI: 10.1080/10888430802132279
SCIENTIFIC STUDIES OF READING, 12(3), 281–300
Copyright © 2008 Taylor & Francis Group, LLC
ISSN: 1088-8438 print / 1532-799X online
DOI: 10.1080/10888430802132279
but not the other tests, both when development was measured by chronological age
and by word reading ability. We discuss the serious implications for research and
clinical practice of having different comprehension tests measure different skills and
of having the same test assess different skills depending on developmental level.
comparability of tests. In addition, because there is the suggestion from Nation and
Snowling (1997) and Francis et al. (2005) that tests using a cloze format may be
more influenced by decoding skill than other comprehension tests, it is important
to examine additional tests to try to provide further insight into this issue. The pres-
ent study therefore not only examines different tests, but also expands the range of
test formats examined so that we can begin to determine whether it is the cloze for-
mat or some other aspect of the test that determines whether it is more a measure of
decoding than comprehension skill.
Another question that we address in this study is the extent to which there are
developmental differences in what a test measures. Is it possible for the same test
to be assessing different skills depending on the age or the decoding ability level of
the reader? By examining test performance across different ages and different lev-
els of reading skill, we are able to answer this question. In sum, the goal of this re-
search is to determine if what we are calling reading comprehension varies with
the specific test being used, and if what the specific test measures varies with de-
velopmental level. If so, then we must begin to face the problems both for research
and clinical practice inherent in referring to all of them as measures of the same
construct.
The reading comprehension tests we compare in this article are all used in our be-
havioral genetic study of reading comprehension (cf. Keenan, Betjemann, Wads-
worth, DeFries, & Olson, 2006) conducted as part of a larger study of learning
disabilities (DeFries et al., 1997; Olson, 2006). We have been using multiple mea-
sures of reading comprehension in the hope that across all these measures we are
capturing most of the variance associated with individual differences in compre-
hension skill. Because there was no research available about the particular compre-
hension skills assessed by any of the tests when we were designing our test battery,
we attempted to cover a range of comprehension skills by covering a range of test
formats, reasoning that different formats involve different task demands and differ-
ent task demands are likely to tap a broader range of skills.
There are many test formats used in constructing reading comprehension
tests. Among them are (a) whether reading is oral or silent, (b) the length of the
passage, and (c) the particular type of comprehension assessment. The reading
comprehension tests in our battery are the Gray Oral Reading Test–3 (GORT;
Wiederholt & Bryant, 1992), the Qualitative Reading Inventory–3 (QRI; Leslie
& Caldwell, 2001), the Woodcock–Johnson Passage Comprehension subtest
(WJPC) from the Woodcock–Johnson Tests of Achievement–III (Woodcock,
McGrew, & Mather, 2001), and the Reading Comprehension subtest from the
Peabody Individual Achievement Test (PIAT; Dunn & Markwardt, 1970), which
TABLE 1
How Our Reading Comprehension Tests Instantiated Various Test Format Options

                     GORT   QRI   PIAT   WJPC
Reading
  Oral                 X     X
  Silent                            X      X
Text
  Single sentence                   X      X
  Short passage                            X
  Medium passage       X
  Long passage               X
Assessment
  Picture selection                 X
  Cloze                                    X
  Multiple-choice      X
  Short answer               X
  Retell                     X
Note. GORT = Gray Oral Reading Test–3; QRI = Qualitative Reading Inventory–3; PIAT = Pea-
body Individual Achievement Test; WJPC = Woodcock–Johnson Passage Comprehension subtest.
METHOD
Participants
The sample consisted of 510 children. Because they were taking these tests as part
of a behavioral genetic study of comprehension skills, all were twins (180 identical,
290 fraternal) and their siblings (n = 40).1 The children ranged in age from 8 to
18 years, with the median at 10.5. They were recruited from 27 different school
districts in Colorado by first identifying twins from school birth-date records and
then sending letters to the families requesting participation. For inclusion in our
analyses, the participants had to have English as their first language, no uncor-
rected sensory deficits, and Full-Scale IQ greater than 85 as measured by the
Wechsler Intelligence Scale for Children–Revised (Wechsler, 1974) or the Wechs-
ler Adult Intelligence Scale–Revised (Wechsler, 1981).

1We use the PIAT rather than the PIAT–R to maintain continuity with earlier data collection on the
One potential problem of using twins as participants is possible noninde-
pendence of the data when related individuals constitute the sample. Although
whatever potential bias created by using twins and siblings would be constant
across the analyses of each test because the same participants took all the tests, we
also took steps to ensure that the results were not specific to a twin sample. The
analyses were redone twice, once using one of the randomly selected twins of each
pair, and a second time using the other member; siblings were excluded from these
analyses to maintain independence of observations. The results were identical to
the original analyses that included all individuals, both in terms of significance and
in terms of the obtained values (to the second decimal place in almost all cases).
Reported here are the values from the full sample.
Measures
passages, both narrative and expository, and then retell the passage and answer
comprehension questions; and the KNOW-IT Test (Barnes & Dennis, 1996;
Barnes, Dennis, & Haefele-Kalvaitis, 1996), in which children are first taught a
knowledge base relevant to the story that they will hear, then listen to the long
story, and then answer short-answer literal and inferential comprehension ques-
tions.
straint. The other test was the Timed Oral Reading of Single Words (Olson,
Forsberg, Wise, & Rack, 1994), which assessed word recognition accuracy for a
series of increasingly difficult single words presented on the computer screen. For
responses to be scored as correct, they had to be initiated within 2 sec. The test’s
age-adjusted correlation with PIAT word recognition is .88. The composite mea-
sure of word recognition used in our analyses was created by combining the
age-adjusted z scores for the two measures.
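The composite construction described above can be sketched as follows. This is a sketch with simulated data; the variable names and the linear age adjustment are our assumptions, not the study's exact procedure (here "age-adjusted z score" is taken to mean the standardized residual after regressing each raw score on age):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(8, 18, n)
# Simulated word-reading measures that improve with age (illustrative only).
piat_word = 2.0 * age + rng.normal(0, 5, n)
timed_words = 1.5 * age + rng.normal(0, 4, n)

def age_adjusted_z(score, age):
    """Regress score on age, then standardize the residuals (z scores)."""
    X = np.column_stack([np.ones_like(age), age])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)

# Composite word recognition: combine the two age-adjusted z scores.
word_composite = age_adjusted_z(piat_word, age) + age_adjusted_z(timed_words, age)
```

Because age is partialed out of each measure before combining, the composite is uncorrelated with age by construction.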
RESULTS
Descriptive Statistics
The first column of Table 2 presents descriptive statistics for all of the standardized
tests for our full sample. For each of the different tests, it is clear that the means and
standard deviations for our sample are comparable to the population norms, or
slightly above. In addition, distribution plots of each measure on our sample
showed normal distributions. Table 2 also presents the means and standard devia-
tions on each of the standardized tests broken into two subgroups: one defined by
chronological age and one defined by reading ability, where reading ability was
based on raw scores from the PIAT Word Recognition test. These subgroups were
not used in the regression analyses reported in the results—all analyses were per-
formed using age and reading ability as continuous variables. However, for pur-
poses of illustrating our developmental findings later in Figure 2, we used median
splits on age and reading age, and we present the means of these groups here for
comparison. The QRI measures are not in this table because the QRI is not a stan-
TABLE 2
Standard Score Means and Standard Deviations for Each Standardized Measure

                          Full Sample     Younger Half    Older Half      Lower Word Reading  Higher Word Reading
GORT-3 Comprehension      10.99 (3.08)    11.68 (3.24)    10.25 (2.71)    9.87 (2.54)         12.10 (3.17)
WJ Passage Comprehension  102.18 (10.41)  103.48 (9.96)   100.77 (10.7)   97.54 (9.30)        106.87 (9.36)
PIAT Comprehension        107.44 (12.6)   105.57 (12.10)  109.46 (12.90)  104.28 (13.15)      110.68 (11.22)
PIAT Word Recognition     105.50 (12.21)  104.67 (11.94)  106.39 (12.46)  99.56 (10.80)       111.47 (10.53)

Note. Standard deviations are in parentheses. GORT = Gray Oral Reading Test–3; WJ = Woodcock–John-
son; PIAT = Peabody Individual Achievement Test.
dardized test. Note, however, that we standardized each of the two QRI measures
before conducting any of our analyses by taking the raw scores and standardizing
them across the full sample, regressed on age and age squared; these standardized
residuals were then used in all analyses.
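The age regression used to standardize the QRI scores can be sketched as follows (the quadratic form follows the description above; the scores themselves are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
age = rng.uniform(8, 18, n)
# Invented QRI raw scores with a curvilinear age trend plus noise.
qri_raw = 10 + 3.0 * age - 0.08 * age**2 + rng.normal(0, 2, n)

# Regress raw scores on age and age squared; keep the standardized residuals.
X = np.column_stack([np.ones(n), age, age**2])
beta, *_ = np.linalg.lstsq(X, qri_raw, rcond=None)
resid = qri_raw - X @ beta
qri_standardized = (resid - resid.mean()) / resid.std(ddof=1)
```

The standardized residuals have mean 0, standard deviation 1, and no linear or quadratic association with age, which is what makes them usable alongside the age-normed standardized tests.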
TABLE 3
Intercorrelations Among the Reading Comprehension Tests

              GORT   QRI–Retell   QRI–Qs   PIAT   WJPC
GORT          1.0
QRI–Retell    .31    1.0
QRI–Qs        .38    .41          1.0
PIAT          .51    .45          .44      1.0
WJPC          .54    .48          .45      .70    1.0
Note. GORT = Gray Oral Reading Test–3; QRI = Qualitative Reading Inventory–3; PIAT = Pea-
body Individual Achievement Test; WJPC = Woodcock–Johnson Passage Comprehension subtest.
Factor Analysis
An exploratory principal components factor analysis using oblique rotation, to al-
low for correlations between the factors, was performed. The analysis included not
only the five reading comprehension assessments but also the listening compre-
hension composite and the word and nonword decoding measures. Two factors
emerged with eigenvalues greater than 1. We refer to these factors as Comprehen-
sion and Decoding, and the correlation between them was r = .52. Table 4 shows
the pattern matrix of factor loadings. All the reading comprehension tests load on
the comprehension factor, but the factor loadings for the PIAT and the WJPC are
lower (.37, .43) than the other tests (.62–.79). Furthermore, the PIAT and the
WJPC, but not the other reading comprehension measures, also load highly on the
decoding factor; in fact, they load considerably higher on decoding than on the
comprehension factor.
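The eigenvalue-greater-than-1 criterion for factor retention can be illustrated on simulated data. This sketch uses unrotated principal components of the correlation matrix (the paper's oblique rotation requires a dedicated factor-analysis package), and all loadings and noise levels are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Two correlated latent abilities, echoing the r = .52 factor correlation.
latent = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], n)
compr, decod = latent[:, 0], latent[:, 1]

# Eight invented indicators: comprehension-weighted tests, decoding tests,
# and two tests (PIAT/WJPC-like) loading on both latents.
data = np.column_stack([
    compr + rng.normal(0, 0.7, n),                # GORT-like
    compr + rng.normal(0, 0.8, n),                # QRI retell-like
    compr + rng.normal(0, 0.7, n),                # QRI questions-like
    0.4 * compr + decod + rng.normal(0, 0.6, n),  # PIAT-like
    0.4 * compr + decod + rng.normal(0, 0.6, n),  # WJPC-like
    decod + rng.normal(0, 0.5, n),                # word decoding
    decod + rng.normal(0, 0.6, n),                # nonword decoding
    compr + rng.normal(0, 0.6, n),                # listening comprehension
])

R = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]          # descending order
n_factors = int((eigenvalues > 1.0).sum())
```

With two correlated latents driving the eight indicators, two eigenvalues exceed 1 and the remaining six fall well below it, reproducing the two-factor retention decision.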
Regression Analyses
A pair of hierarchical regressions was run for each of the five reading comprehen-
sion tests to determine how much of the variance in performance on each test was
accounted for uniquely by word decoding and by listening comprehension and
how much was shared variance. For each pair of regressions, word decoding was
entered as the first step followed by the listening comprehension composite in one
analysis (Model 1), whereas in the other analysis (Model 2), the order of entry was
reversed.
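The partition into unique and shared variance that such a pair of hierarchical regressions yields can be sketched as follows (simulated data; all effect sizes are invented):

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 from an OLS regression of y on the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - (y - X @ beta).var() / y.var()

rng = np.random.default_rng(3)
n = 500
decoding = rng.normal(size=n)
listening = 0.5 * decoding + rng.normal(0, 0.9, n)   # correlated skills
reading = 0.6 * decoding + 0.5 * listening + rng.normal(0, 0.7, n)

r2_dec = r_squared(reading, decoding)              # Model 1, step 1
r2_lis = r_squared(reading, listening)             # Model 2, step 1
r2_full = r_squared(reading, decoding, listening)  # step 2 of either model

unique_listening = r2_full - r2_dec   # Model 1 increment
unique_decoding = r2_full - r2_lis    # Model 2 increment
shared = r2_full - unique_listening - unique_decoding
```

The two step-2 increments give each predictor's unique contribution, and subtracting both from the full-model R² leaves the variance the two predictors share.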
TABLE 4
Pattern Matrix Showing the Factor Loadings of the Reading
Comprehension Tests (in Bold), the Word and Nonword Decoding
Composites, and the Listening Comprehension Composite
Note. GORT–3 = Gray Oral Reading Test–3; QRI = Qualitative Reading Inventory–3; PIAT =
Peabody Individual Achievement Test; WJPC = Woodcock–Johnson Passage Comprehension subtest.
PIAT and the WJPC than the other three measures. The other finding is that this is
because most of the variance in these two tests is accounted for by word decoding
and its shared variance with listening comprehension. Only 5% of the variance
on the PIAT and only 7% of the variance on the WJPC is accounted for independ-
ently by listening comprehension skills. Thus, the answer to our question of
whether reading comprehension tests differ in the degree to which they are as-
sessing component skills appears to be yes because the PIAT and WJPC are more
sensitive to individual differences in decoding skill than are the GORT or the
QRI measures.
This split in sensitivity to word decoding is similar to the difference found by
Nation and Snowling (1997) between the Suffolk and the Neale reading tests.
What is interesting about our finding, however, is that it is not just the test using the
cloze format (the WJPC) that is so heavily influenced by decoding skill; the PIAT,
which uses multiple-choice selection of pictures representing the meaning of the
TABLE 5
Regression Analyses from the Full Sample Predicting Comprehension from Word
Decoding and Listening Comprehension on Each of the Five Reading
Comprehension Assessments
                     GORT           QRI–Retell     QRI–Qs         PIAT           WJPC
                     R2     ΔR2     R2     ΔR2     R2     ΔR2     R2     ΔR2     R2     ΔR2
Model 1
 1. Decoding         .197*          .165*          .150*          .540*          .542*
 2. Listening Comp   .293*  .096*   .305*  .141*   .321*  .171*   .587*  .047*   .611*  .069*
Model 2
 1. Listening Comp   .218*          .259*          .287*          .246*          .291*
 2. Decoding         .293*  .075*   .305*  .047*   .321*  .033*   .587*  .341*   .611*  .319*
Note. Comp = comprehension. GORT = Gray Oral Reading Test–3; QRI = Qualitative Reading Inven-
tory–3; PIAT = Peabody Individual Achievement Test; WJPC = Woodcock–Johnson Passage Comprehension
subtest. *p < .01.
FIGURE 1 The proportion of total variance in each of the reading comprehension tests that
was accounted for independently by word decoding skill, by listening comprehension skill, or
shared. Note. GORT = Gray Oral Reading Test–3; QRI R = Qualitative Reading Inventory–3
Retellings; QRI Q = Qualitative Reading Inventory–3 Comprehension Questions; PIAT = Pea-
body Individual Achievement Test; WJPC = Woodcock–Johnson Passage Comprehension
subtest.
sentence, shows the same pattern as the WJPC’s cloze-test format shows. This in-
dicates that other factors besides the format of the test item are responsible for this
pattern of results. We offer our analysis of what we think these are in the Discus-
sion section.
The GORT is the only test that we examined that was also examined by Cutting
and Scarborough (2006), and our findings on the GORT converge remarkably well
with theirs. Both studies report the same amount of variance independently ac-
counted for by decoding skill (.075 in both) and nearly the same amount for lis-
tening comprehension/oral language (.096 in ours, .093 in theirs).
There were only differences between the studies in the amount of shared variance,
with theirs being larger because their oral language measure focused more on vo-
cabulary tests than on the oral discourse measures we used, and vocabulary is an
important component of word decoding skill.
Developmental Differences
Because our sample included children across a broad age range, we could deter-
mine not only whether there were differences between the tests in what predicted
reading comprehension, but also whether this differed with developmental level.
Developmental differences were assessed both as a function of chronological age
and of reading ability, defined by raw scores on the PIAT Word Recognition test.
To determine if word decoding skill differentially predicted reading compre-
hension as a function of chronological age, we again ran the Model 2 regression
analysis in which listening comprehension is entered as the first step. The amount
of variance accounted for after this first step represents that explained by listening
comprehension and its shared variance with word decoding, shown in the first row
of Table 6. Then to determine whether the amount of variance accounted for by
word decoding interacted with age, we entered an interaction term for the interac-
tion of decoding and age. We created the interaction term by multiplying each
child’s chronological age by their composite word decoding z score. The second
row of Table 6 shows the additional variance accounted for in each test by the inter-
action of age and decoding. As can be readily seen, the amount of variance ac-
counted for by the interaction is much larger for the PIAT (.31) and WJPC (.27) than
for the other tests (all ≤ .06), although all were significant because of our large sam-
ple. We tested the significance of the differences in variance explained by the inter-
action of age and word decoding across the tests by using Fisher Z tests. There is not
a significant difference between the PIAT and the WJPC in the amount of variance
accounted for by the interaction of word decoding and age; Z is less than 1 (p = .19).
However, comparing either the PIAT or the WJPC against the other three tests, the Z
statistics were always greater than 5 with p < .001, whereas those three tests were not
significantly different from each other. Thus, the interaction term analyses show that
there are developmental differences across tests in what is being assessed.
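The interaction-term step can be sketched as follows, using simulated data in which decoding's weight shrinks with age (all weights here are invented, not estimated from the study):

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - (y - X @ beta).var() / y.var()

rng = np.random.default_rng(4)
n = 500
age = rng.uniform(8, 18, n)
decoding = rng.normal(size=n)                     # composite decoding z score
listening = 0.4 * decoding + rng.normal(0, 0.9, n)
# Invented test on which decoding matters more for younger children.
weight = (18 - age) / 10.0                        # 1.0 at age 8, 0.0 at age 18
score = weight * decoding + 0.5 * listening + rng.normal(0, 0.5, n)

# Step 1: listening comprehension alone.
r2_step1 = r_squared(score, listening[:, None])
# Step 2: add the interaction term, each child's decoding z times age.
interaction = decoding * age
r2_step2 = r_squared(score, np.column_stack([listening, interaction]))
delta_r2 = r2_step2 - r2_step1
```

When decoding's influence varies with age, the decoding-by-age product picks up variance beyond listening comprehension, which is the pattern the second row of Table 6 quantifies for each test.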
Perhaps the easiest way to see these developmental differences across tests is to
examine the top half of Figure 2. Although the regression analyses examining
whether word decoding interacts with age were conducted across the full sample,
for purposes of illustrating developmental trends across the tests, the top half of
Figure 2 displays the amount of variance accounted for in each test by decoding
and listening separately for the two halves of our sample, using a median split on
TABLE 6
The Percentage of Variance Accounted for on Each of the Five Reading
Comprehension Tests by the Interactions of Word Decoding Either
With Chronological Age or With Reading Age (Raw Score on PIAT Word)
After First Accounting for Listening Comprehension and Its Shared
Variance With Decoding
Note. Values with different subscripts are significantly different from each other at p < .01. GORT
= Gray Oral Reading Test–3; QRI = Qualitative Reading Inventory–3; PIAT = Peabody Individual
Achievement Test; WJPC = Woodcock–Johnson Passage Comprehension subtest. *p < .01.
FIGURE 2 The proportion of total variance in each of the reading comprehension tests that was ac-
counted for independently by word decoding skill, by listening comprehension skill, or shared for
groups defined by chronological age (top figures), and reading age on Peabody Individual Achievement
Test (PIAT) Word Reading (bottom figures). Note. GORT = Gray Oral Reading Test–3; QRI R = Quali-
tative Reading Inventory–3 Retellings; QRI Q = Qualitative Reading Inventory–3 Comprehension
Questions; W-J = Woodcock–Johnson Passage Comprehension subtest.
age where the younger group’s mean age was 9.1 years and the older group’s was
13.1 years. It is readily apparent from this figure that decoding skill accounts for
more variance when children are younger than when they are older, a finding that is
well established in the literature (Hoover & Tunmer, 1993). What is new in our re-
sults is that there are such large discrepancies across tests in these developmental
differences. As this figure shows, and as the values for the interaction terms of age
with decoding (second row of Table 6) showed, there are dramatic differences
across tests as a function of age in the amount of variance accounted for by word
decoding. These developmental differences are large on the PIAT and WJPC,
whereas on the GORT and QRI measures, they are small.
Because reading problems are defined relative to expectations for what is ap-
propriate for age and grade level, it seemed equally important to look at possible
differences in what the tests were measuring as a function of word reading ability,
independent of whether that was above or below expectations. The same analyses
performed for chronological age were therefore repeated using raw scores on the
PIAT word recognition test instead of chronological age. The bottom row of Table
6 shows the differences between the tests in how the amount of variance accounted
for by decoding interacts with a child’s word decoding ability. The pattern for read-
ing age analyses is similar to the results we previously reported for chronological
age. Again, the amount of variance accounted for by the interaction is much larger
for the PIAT (.261) and WJPC (.255) than for the other tests (all ≤ .06), although all
were again significant because of our large sample. Fisher Z tests assessing the
significance of the difference in the amount of variance accounted for by the inter-
action terms again showed that the PIAT was not significantly different than the
WJPC (p = .45), but both were significantly different than the other three tests (all
Z statistics, p < .001).
Again, the easiest way to see these developmental differences across tests is to
examine Figure 2, where the bottom half displays the amount of variance ac-
counted for in each test by decoding and listening separately for the two halves of
our sample defined by a median split on PIAT word reading raw scores. The differ-
ence between the two ability groups is most evident in the larger amount of vari-
ance independently accounted for by decoding skill in the children with lower
word reading ability. This is most apparent on the PIAT and WJPC where the
amount of variance accounted for independently by decoding declines from .48 to
.12 for the PIAT and from .36 to .16 for the WJPC. Thus, again we are seeing evi-
dence that the PIAT and WJPC are different than the other tests not only in terms of
how much of their variance is accounted for by decoding skill, but also because
what they measure depends on developmental level. If children are young or have
low reading ability, these tests are more assessments of decoding skill, whereas for
more advanced readers, they also assess listening comprehension skills and shared
variance with decoding.
DISCUSSION
interchangeability of word reading tests may underlie the tendency to assume simi-
lar comparability of comprehension tests. Word reading instruments tend to corre-
late very highly; as noted in the Method section, our two measures of word reading
correlate r = .88. However, the intercorrelations we observed among our five read-
ing comprehension measures were much lower. Even our most highly correlated
tests (the PIAT and the WJPC) correlate less highly than most word reading mea-
sures. The modest correlations we observed among reading comprehension tests
suggest that the assumption that reading comprehension tests are all fairly compa-
rable is not correct.2 They suggest that these tests are not all measuring the same
thing—a point that was substantiated by all our subsequent analyses.
The results of our factor analysis showed that two of the reading comprehension
tests, the PIAT and the WJPC, load highly on decoding, whereas the GORT and the
QRI measures do not. The regression analyses showed that most of the variance on
these two tests is accounted for by individual differences in decoding skill,
whereas decoding plays a rather small role in accounting for performance on either
the GORT or the QRI.
The analyses examining developmental differences as a function of chronologi-
cal age and of reading age further supported the conclusion that the tests are not all
measures of the same skill. Although our findings replicate previous research
showing more influence from decoding skills for younger and less skilled readers
(Catts, Hogan, & Adolf, 2005; Hoover & Tunmer, 1993; Kirby, 2006), they also
extend this finding in two important ways. One way is that they show that the very
same test, particularly the PIAT and the WJPC, can measure different skills de-
pending on developmental level. The other is that they show that there are greater
differences between what reading comprehension tests are measuring when chil-
dren are younger, less skilled or reading disabled. For less skilled and younger
readers, the PIAT and WJPC differ dramatically from the GORT and the QRI,
whereas for more skilled and older children, the differences between the tests are
much smaller.
2Modest correlations can also be interpreted as reflecting lowered reliability of the measures. It
should be noted, however, that because comprehension involves so many different component pro-
cesses, the lower correlations are likely to reflect differential assessment of the components. For exam-
ple, even within the same test, modest correlations can occur between performance on different pas-
sages because the reader may have knowledge about one topic and not another.
describe our variations that parallel their sentence forms but that use different vo-
cabulary.
On the PIAT (and the PIAT–R), children read a single sentence; then the sen-
tence is removed and four pictures are presented for the child to show comprehen-
sion of the sentence by selecting the picture that best represents the sentence’s
meaning. The four choices depict alternatives that would correspond to incorrect
decodings of one of the words in the sentence. So, if the sentence were The patients
were amazed by the giraffe in the lobby, then the wrong answer pictures would de-
pict events using a word similar to patients, like parents, and a word similar to gi-
raffe, like graffiti. Thus, the child would be required to select among pictures of pa-
tients amazed by the giraffe, patients amazed by graffiti, parents amazed by the
giraffe, and parents amazed by graffiti. In short, correct word decoding is the es-
sence of choosing between the alternatives.
On the WJPC, the correct response also often hinges on correct decoding of a
single word. To illustrate the importance of one word, we use Xs to substitute for
that word. For example, consider this passage: I thought that the painting was too
XXXX. I did not, however, feel like arguing about the __________. The typical re-
sponse in examples like this is for the child to fill in the blank by saying “painting”;
so that the completion would be I did not feel like arguing about the painting. How-
ever, that is incorrect. The correct response is size because the word Xed out was
enormous. A child who has the comprehension skills to know that the blank must
refer back to something in the first sentence demonstrates that knowledge by say-
ing painting. However, the test scores the child as not having those skills because
the assessment of comprehension for this item rests on decoding that one word,
enormous.
three sentences). In our view, there are two reasons why using one- or two-sentence
passages results in a reading comprehension test that assesses decoding skill more
than comprehension. One is that the assessment of comprehension in a short text is
likely to be based on the successful decoding of a single word. This was illustrated
previously in our example of a cloze item where failure to decode just one word,
enormous, leads to an incorrect response, and in the PIAT where decoding confu-
sions appear to be the sole basis for constructing the alternatives on each test item.
Another reason that short passages tend to be more influenced by decoding is
that decoding problems are likely to be more catastrophic in short passages than in
longer passages. In a single sentence, there frequently are no other words for the
child to use to help determine the correct decoding of difficult words, such as ma-
gician. In a longer passage, however, the text is likely to describe events, such as
pulling a rabbit out of a hat, which would allow the child to use this context to de-
termine the correct decoding. Our speculations about this are reinforced by our
findings that decoding skill accounted for much less variance on the QRI mea-
sures, where the passages are quite long and decoding problems can often be recti-
fied by context.
CONCLUSION
We believe that our findings have important implications both for research and
clinical assessment. For research, it means that the answers to research questions
could vary as a function of the specific test used to assess comprehension. To illus-
trate, suppose one was interested in assessing the extent to which decoding skill
and comprehension skill are associated with similar genes (e.g., Keenan et al.,
2006). If one used the PIAT or the WJPC to assess comprehension, the answer
would more likely be that the same genes appear to be involved in word reading
and comprehension, especially if the data were from young or less skilled readers,
because these comprehension tests assess mainly decoding skill in these readers.
In fact, such a conclusion has been reported by Byrne et al. (2007), who assessed
comprehension with the WJPC in twins tested at the end of first grade. If they had
used the QRI, we contend that their findings would at least have more potential to
show different genes associated with decoding and comprehension because what
ACKNOWLEDGMENTS
This research was supported by a grant from NIH HD27802 to the Colorado
Learning Disabilities Research Center, for which R. Olson is PI and J. Keen-
an is a co-PI. Rebecca Betjemann was supported by NIMH training grant T32
MH016880-25. Portions of these data were presented at the Midwestern Psycho-
logical Association Meeting, 2005 and the Society for the Scientific Study of
Reading, 2006.
We thank Laura Roth, Amanda Miller, and Sarah Priebe for discussions of the
data; Laura Roth for comments on the manuscript; all the participants and their
families; and all the testers and scorers.
REFERENCES
Barnes, M. A., & Dennis, M. (1996). Reading comprehension deficits arise from diverse sources: Evidence from readers with and without developmental brain pathology. In C. Cornoldi & J. Oakhill (Eds.), Reading comprehension difficulties: Processes and intervention (pp. 251–278). Mahwah, NJ: Erlbaum.
Barnes, M. A., Dennis, M., & Haefele-Kalvaitis, J. (1996). The effects of knowledge availability and knowledge accessibility on coherence and elaborative inferencing in children from six to fifteen years of age. Journal of Experimental Child Psychology, 61, 216–241.
Bowyer-Crane, C., & Snowling, M. J. (2005). Assessing children's inference generation: What do tests of reading comprehension measure? British Journal of Educational Psychology, 75, 189–201.
Byrne, B., Samuelsson, S., Wadsworth, S., Hulslander, J., Corley, R., DeFries, J. C., et al. (2007). Longitudinal twin study of early literacy development: Preschool through Grade 1. Reading and Writing: An Interdisciplinary Journal, 20, 77–102.
Catts, H. W., Hogan, T. P., & Adlof, S. M. (2005). Developmental changes in reading and reading disabilities. In H. W. Catts & A. G. Kamhi (Eds.), The connections between language and reading disabilities. Mahwah, NJ: Erlbaum.
Cutting, L. E., & Scarborough, H. S. (2006). Prediction of reading comprehension: Relative contributions of word recognition, language proficiency, and other cognitive skills can depend on how comprehension is measured. Scientific Studies of Reading, 10, 277–299.
Davis, F. B. (1944). Fundamental factors of comprehension in reading. Psychometrika, 9, 185–197.
DeFries, J. C., Filipek, P. A., Fulker, D. W., Olson, R. K., Pennington, B. F., & Smith, S. D. (1997). Colorado Learning Disabilities Research Center. Learning Disabilities, 8, 7–19.
Dunn, L. M., & Markwardt, F. C. (1970). Examiner's manual: Peabody Individual Achievement Test. Circle Pines, MN: American Guidance Service.
Francis, D. J., Fletcher, J. M., Catts, H. W., & Tomblin, J. B. (2005). Dimensions affecting the assessment of reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension and assessment (pp. 369–394). Mahwah, NJ: Erlbaum.
Hoover, W. A., & Tunmer, W. E. (1993). The components of reading. In G. B. Thompson, W. E. Tunmer, & T. Nicholson (Eds.), Reading acquisition processes (pp. 1–19). Adelaide, Australia: Multilingual Matters.
Keenan, J. M., & Betjemann, R. S. (2006). Comprehending the Gray Oral Reading Test without reading it: Why comprehension tests should not include passage-independent items. Scientific Studies of Reading, 10, 363–380.
Keenan, J. M., Betjemann, R. S., & Olson, R. K. (2007). How do the specific measures used for decoding and comprehension influence the assessment of what a reading comprehension test measures? Manuscript in preparation.
Keenan, J. M., Betjemann, R. S., Wadsworth, S. J., DeFries, J. C., & Olson, R. K. (2006). Genetic and environmental influences on reading and listening comprehension. Journal of Research in Reading, 29, 79–91.
Kirby, J. (2006, July). Naming speed and fluency in learning to read: Evaluation in terms of the simple view of reading. Paper presented at the annual meeting of the Society for the Scientific Study of Reading, Vancouver, British Columbia.
Leslie, L., & Caldwell, J. (2001). Qualitative Reading Inventory–3. New York: Addison Wesley Longman.
Markwardt, F. C. (1989). Peabody Individual Achievement Test–Revised. Bloomington, MN: Pearson Assessments.
Markwardt, F. C. (1997). Peabody Individual Achievement Test–Revised (normative update). Bloomington, MN: Pearson Assessments.
Nation, K., & Snowling, M. (1997). Assessing reading difficulties: The validity and utility of current measures of reading skill. British Journal of Educational Psychology, 67, 359–370.
Olson, R. K. (2006). Genes, environment, and dyslexia: The 2005 Norman Geschwind memorial lecture. Annals of Dyslexia, 56(2), 205–238.
Olson, R., Forsberg, H., Wise, B., & Rack, J. (1994). Measurement of word recognition, orthographic, and phonological skills. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 243–277). Baltimore: Brookes.
Pearson, P. D., & Hamm, D. N. (2005). The assessment of reading comprehension: A review of practices, past, present, and future. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension and assessment (pp. 13–69). Mahwah, NJ: Erlbaum.
RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading comprehension. Santa Monica, CA: RAND.
Rimrodt, S., Lightman, A., Roberts, L., Denckla, M. B., & Cutting, L. E. (2005, February). Are all tests of reading comprehension the same? Poster presented at the annual meeting of the International Neuropsychological Society, St. Louis, MO.
Wechsler, D. (1974). Examiner's manual: Wechsler Intelligence Scale for Children–Revised. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1981). Examiner's manual: Wechsler Adult Intelligence Scale–Revised. New York: Psychological Corporation.
Wiederholt, J. L., & Bryant, B. R. (1992). Examiner's manual: Gray Oral Reading Test–3. Austin, TX: Pro-Ed.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III tests of achievement. Itasca, IL: Riverside.