Hattie 2002
International Journal of
Educational Research 37 (2002) 449–481
Chapter 3
Abstract
This chapter examines the extent to which the composition of classes affects learning
outcomes. The aim is to explore peer effects when students are organized into classes on the
basis of ability, ethnicity, or gender, as well as the effects of multigrade and multi-age classes
and class size. The argument is defended that these composition factors affect only the
probability that differential instruction and learning occur and that, at best, their influences
are indirect. Teachers appear not to change their teaching activities when class composition is
changed, and the power of peer effects is rarely realized. Any direct effects of class
composition are less related to learning outcomes and more related to equity and expectation
effects by teachers and other participants (students, parents, and principals). Whether a school
tracks by ability or not, reduces class sizes, implements multigrade/multi-age or single-level
classes, or has coeducational or single-sex classes, appears less consequential than whether it
attends to the nature and quality of instruction in the classroom, whatever the between-class
variability in achievement. The learning environments within the classroom, and the
mechanisms and processes of learning that they foster, are by far the more powerful. Good
teaching can occur independently of the class configuration or homogeneity of the students
within the class.
© 2003 Published by Elsevier Ltd.
The purpose of this chapter is to examine the extent to which the composition of
classes affects learning outcomes. Particular attention is placed on the tracking of
students by ability, ethnicity, or gender between classes and how this affects their
learning outcomes. Tracking is traditionally viewed as a response to the diversity of
students’ instructional needs, and the typical claim is that when students are placed
into homogeneous classes, teachers can better adapt the materials, level, and pace of
instruction to the needs of individual students. The chapter will also review the
effects of combination classes (usually multiple years in one class), single-sex classes,
and class size. The major argument is that these composition factors have little effect,
primarily because teachers rarely change their instructional methods when the peer
composition of the classes changes. The nature and quality of instruction are more
powerful than the class variation in achievement.
1. Tracking
Of all compositional practices, tracking is by far the most widely used and the most
researched. Because of variants in tracking practices and terminology, it is
often difficult to derive estimates of the extent of tracking (see Harlen & Malcolm,
1999). In the United States, it is often claimed that about 20–40% of middle schools
assign students to all classes on the basis of ability, and a further 40% use some
between-class tracking, primarily in reading and mathematics (Epstein & MacIver,
1990; Lounsbury & Clark, 1990; Wheelock, 1992). Data from the National
Educational Longitudinal Study (NELS) of 25,000 students in nearly 1000 schools
show that about 86% of public school students in United States middle and high
schools are placed in tracked classes. Approximately 80% of middle schools use
tracking, although 36% of these are considering ‘detracking’—that is, creating, or
reverting to, untracked classes (George & Shewey, 1994; Mills, 1998; Valentine,
Clark, Irvin, Keefe, & Melton, 1993).
There is much evidence of tracking within certain subjects. Loveless (1998)
estimated that 39% of all United States schools have students tracked into three
classes (high, middle, low) for all subjects, 18% have two classes for all subjects, 11%
have three classes for some subjects and untracked for others, 10% have two classes
for some subjects and untracked for others, 7% have one subject tracked
and untracked for others, and only 14% have all untracked classes. Tracking is more
likely to be used in high schools with rolls of more than 200 students, which is not
surprising given that smaller schools are unlikely to have a sufficient number of
classes to consider tracking. Secondary schools with more than 500 students are
almost certain to be tracked in the United States (Loveless, 1999a, b). Tracking is
also common in areas with much ‘bright flight,’ as it is seen as a way of holding on to
parents and students who seek advantage from the perceived higher-achieving
schools (Oakes, 1992).
There are many forms of tracking, although the fundamental concern relates to
whether there are heterogeneous or homogeneous classes. Such grouping is typically
formed on the basis of ability or achievement, although students have been assigned
on the basis of combinations of achievement, IQ, and teacher judgements. ‘XYZ skill
grouping’ has been used to refer to students grouped together for purposes of
instruction in, usually, three levels—high-, middle-, and low-tracked classes
(Mosteller, Light, & Sachs, 1996). Another form is the ‘Joplin Plan,’ which is a
more specific form of arranging homogeneous classes, usually in a specific subject.
For example, imagine that there are students grouped according to age across three
grade levels and that the reading levels of these students range from Level 1 to Level
9. For the purposes of reading, students are grouped by reading level regardless of
age. When the reading class is over, the students return to their original classes for
J.A.C. Hattie / Int. J. Educ. Res. 37 (2002) 449–481 451
(−0.12). Noland and Taylor reported that researchers who appeared to favor
tracking tended to find evidence for increased achievement (number of effects=349,
effect size=0.08), whereas researchers who appeared to oppose tracking found
heterogeneous grouping more beneficial to students (number of effects=346, effect
size=−0.24). Noland and Taylor (1986) concluded that tracking, ‘‘while favored by
most teachers and entrenched in the public schools of the United States, does
not (except in some very specific circumstances) improve student achievement
and has potentially serious negative self-concept consequences… [Instead,] we
ought to be seeking policies and programs which enhance educational outcomes
and which promote fairness in educational processes. Ability grouping does neither’’
(pp. 29–30).
Mosteller et al. (1996) reported a meta-analysis based on 10 studies. The average
effect size, weighted by sample size, was zero, and the effects for the high, medium,
and low classes were also close to zero. Overall, they concluded that there was ‘‘little
evidence that skill grouping has a major impact, either positive or negative, on
students’ cognitive learning’’ (p. 812).
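A sample-size-weighted average of the kind reported here can be sketched as follows. This is only a minimal illustration and not the procedure used by Mosteller et al.: the function name and the three example studies are hypothetical, and many meta-analyses weight by inverse variance rather than by raw sample size.

```python
def weighted_mean_effect_size(effect_sizes, sample_sizes):
    """Return the mean effect size, weighting each study by its sample size."""
    if len(effect_sizes) != len(sample_sizes) or not effect_sizes:
        raise ValueError("effect_sizes and sample_sizes must be non-empty and the same length")
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / total_n


# Hypothetical studies (not taken from Mosteller et al.): a small study with a
# positive effect is offset by a larger study with a small negative effect.
example_effects = [0.30, -0.10, 0.00]
example_ns = [50, 150, 100]
example_mean = weighted_mean_effect_size(example_effects, example_ns)
```

In this invented example the weighted mean is approximately zero even though one study shows a moderate positive effect, which is how an overall near-zero average can coexist with variation among individual studies.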
Slavin (1987) also presented a meta-analysis of 14 studies relating to the Joplin
Plan, most of which relate to grouping for reading. The median effect size was 0.45,
which is quite remarkable in the tracking literature. Further, the effects were
consistently high; the effect for high-tracked students was 0.46, for middle 0.43, and
for low 0.42. Slavin noted that one critical feature of the Joplin Plan (which we note
is often not present in tracking) is frequent, careful assessment of student
performance levels and provision of materials appropriate to these levels regardless
of students’ year levels. The adaptation of instructional pace and level to student
needs is considerable, and there is more movement especially up (and occasionally
down) the levels. Kulik and Kulik (1987) reported an average effect of 0.23 from 16
studies based on the Joplin Plan.
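One common way to read effect sizes such as these is to convert them to a percentile rank. The sketch below assumes normally distributed scores with equal variances in both groups, an assumption that is not stated in the studies reviewed here; the function name is illustrative only.

```python
import math


def percentile_of_average_treated_student(effect_size):
    """Percentile rank, within the comparison group, of the average student
    in the treated group, assuming normal scores with equal variances."""
    # Standard normal cumulative distribution function evaluated at the effect size.
    return 100.0 * 0.5 * (1.0 + math.erf(effect_size / math.sqrt(2.0)))


# Effect sizes reported in the text for the Joplin Plan.
slavin_median = percentile_of_average_treated_student(0.45)
kulik_average = percentile_of_average_treated_student(0.23)
zero_effect = percentile_of_average_treated_student(0.0)
```

Under these assumptions, an effect size of 0.45 places the average Joplin Plan student at roughly the 67th percentile of the comparison group, and an effect size of 0.23 at roughly the 59th; an effect size of zero leaves the student at the 50th percentile.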
Slavin (1990) conducted a similar meta-analysis for secondary schools, where
tracking is often adopted as part of a whole-school approach. Across the 29 studies,
the typical effect size was 0.02. This near-zero finding was the case in schools where
all subjects were tracked and in schools where only some subjects were tracked. The
effects for high (0.01), average (−0.08), and low (−0.02) achievers were not different
from zero, and the average effect for both reading and maths was 0.01. Slavin
concluded that comprehensive between-class tracking has little or no effect on the
achievement of secondary students, at least as measured by standardized tests.
Further, there was ‘‘little support for the proposition that high achievers gain from
grouping whereas low achievers lose’’ (p. 486). Tracking is equally ineffective in all
subjects (except that there may be a negative effect of ability grouping in social
studies), and thus it appears that it ‘‘simply does not matter whom students sit next
to in a secondary class’’ (p. 491). In ‘‘study after study, including randomised
experiments of a quality rarely seen in educational research, [there is] no positive
effect of ability grouping in any subject or at any grade level, even for the high
achievers most widely assumed to benefit from grouping’’ (p. 491).
Gamoran (1987a, b) was critical of Slavin’s (1990) review because it did not
distinguish between school and class organization. ‘‘Grouping does not produce
achievement: instruction does’’ (p. 341). Hence, the extent that the effects of
grouping are mediated through teachers’ instructional behavior is critical and, he
argued, the study of grouping alone provides little information of value. This may be
the case; however, the overwhelming summation of these meta-analyses indicates
that this is highly unlikely—there is too little variance in the overall close-to-zero
findings. If teachers are using differential instruction, it does not appear to be having
an effect on the achievement outcomes of the students.
amounts. My estimate of the effect size of tracking in this study is 0.20, and thus
disagrees with Gamoran and Mare’s conclusion that the ‘‘net track effect on
mathematics achievement is substantial’’ (p. 1172).
These regression-based methods are aimed at controlling for pre-existing
differences in student achievement, but this method cannot be as powerful as
randomly assigning students to the various tracks. In a rare study employing random
assignment of students to track, Mason, Schroeter, Combs, and Washington (1992)
placed 34 average-achieving grade 9 students into high-track pre-algebra classes with
their high-achieving peers. Several of these average achieving students performed
better than their high-achieving peers and ‘‘took substantially more advanced
mathematics during high school’’ (p. 597). The high-achieving students suffered no
decrease in computation or problem-solving achievement and scored higher in
concepts than their cohort ‘average’ peer groups from previous years.
On the basis of a review of survey and ethnographic research, Gamoran and
Berends (1987) concluded that, when appropriate controls for prior achievement are
incorporated, most of tracking’s influence on academic achievement disappears.
They summarized data from 10 American data sets used in 16 studies (including
NELS, Project Talent, Youth in Transition, and HSB). These data spanned 29 years,
although no time trends were apparent. Gamoran and Berends noted that, by
reducing the pace and complexity of classroom work, teachers believe they are
gearing instruction to the ability levels of the students. It may also assist in
controlling students’ behavior: ‘‘Teachers used structured written work as a device to
quiet a class or keep it calm’’ (Metz, 1978, p. 103), particularly in low-tracked classes.
Low-track students appear to prefer such a pace as it is ‘‘less taxing and creates a
sense of routine. Moreover, low-track students preferred written work because it was
more private. In the oral instructional engagements in the higher tracks, mistakes
were more visible’’ (Gamoran & Berends, p. 423). Such slower-paced instruction
means that important parts of the curriculum may be introduced later for low-track
students, which can have a cumulative negative effect on these students’ later
chances of attaining desirable educational outcomes, such as access to more challenging
upper-school courses and university entrance. Such pacing also destines the students
to remain in the lower tracks, as they are now even further behind their middle-
tracked peers.
Hoffer (1992) used a sample of 5945 grade 8 and 9 students from over 100 schools
(based on the Longitudinal Study of American Youth database). The science effect
sizes for the low-track versus ungrouped classes were −0.40 for grade 8 and −0.17
for grade 9 students and 0.08 for the high-track versus ungrouped classes for
students in both grades. The math effect sizes were −0.36 and −0.32 for the low
group and 0.26 and 0.18 for the high group (grade 8 and grade 9, respectively).
Hence, differential effects were found in both subjects, but more so in math than in
science. Hoffer noted, however, that there were more students in the high track than
the low track, concluding that the ‘‘net effects of grouping turns out to be about
zero… Ability grouping in seventh- and eighth-grade mathematics and science is
clearly not an optimal arrangement compared with the non-grouped alternative, for
low-group students are significant losers’’ (p. 221). After testing many other models
for moderators, Hoffer reported finding no conditions under which
grouping benefits all students (or at least helps some and does not hurt any).
In one of the few studies to use HLM to ascertain the effects at both the student
and the class level, Bode (1993) used performance in mathematics for 1319 grade 9
students in 79 classes from 61 schools. There were no advantages to being in either a
tracked or untracked class. Two of the best predictors were a high level of effort (rather
than prior achievement) and the amount of time spent in small-group instruction in
heterogeneous rather than homogeneous classes. There were no effects relating to
class, teacher, or instructional programs.
Perhaps the most influential in-depth study of teaching and learning in tracked
classes is Oakes’ (1985) Keeping track: How schools structure inequality. Her study
was based on an intensive qualitative analysis of 25 junior and senior high schools.
The major finding was that many low-track classes are deadening, non-educational
environments. Oakes (1992) concluded that ‘‘the best evidence suggests that, in most
cases, tracking fails to foster the outcomes schools value’’ (p. 13). Tracking fosters
friendship networks linked to students’ group membership, and these peer groups
may contribute to ‘polarised’ track-related attitudes among high school students,
with high-track students becoming more enthusiastic and low-track students more
alienated (Oakes, Gamoran, & Page, 1992). In subsequent evaluations, Oakes (1993)
commented that tracking limits ‘‘students’ schooling opportunities, achievements,
and life chances. Students not in the highest tracks have fewer intellectual challenges,
less engaging and supportive classrooms, and fewer well-trained teachers’’ (p. 27).
Shanker (1993), then President of the American Federation of Teachers, in a
commentary on Oakes’ research, was more earthy: ‘‘Kids in these [lower] tracks often
get little worthwhile work to do; they spend a lot of time filling in the blanks in
workbooks or ditto sheets. And because we expect almost nothing of them, they
learn very little’’ (p. 34). In a similar qualitative design, Page (1991) provided a
detailed account of daily activities of eight low-track classes and found that teachers
and students came to understandings about how to not push each other too hard so
that they could cope, that low tracks were used as ‘holding tanks’ for students with
the most severe behavior problems, and that teachers focused on remediation
through dull, repetitious seatwork (see also, Camarena, 1990; Gamoran, 1993).
The nature of instruction is different in the various tracks. Gamoran, Nystrand,
Berends, and LePore (1995), in a two-year study of instructional methods in 92 high-,
regular-, and low-tracked classes in 25 secondary schools, found that it was less the
interactive style than the content of the interactions that favored higher-track over
regular- and lower-track classes. In the high-track classes, there was more instruction
relating to the subject matter, whereas in low-track classes, instruction was more
often fragmented, emphasising isolated bits of information instead of sustained
inquiry (Page, 1991). In studies by Gamoran (Gamoran, 1989; Nystrand &
Gamoran, 1988), students in low-track grade 9 and grade 10 English classes
answered true–false, multiple-choice, and fill-in-the-blank questions four to five
The differential achievements relating to racial and social class groups are among
the most contentious issues related to the tracking debate. Oakes and Wells (1996)
claimed that tracking exists to guarantee the unfair distribution of privilege, in that
white and wealthy students benefit from access to high-status knowledge that low-
income students and students of color are denied. Oakes et al. (1990) analysed 1200
public and private primary and secondary schools in the United States, and found
that minority students were seven times more likely to be identified as low-ability
than as high-ability. Those schools that track often explain this ethnic subdivision by
reference to past achievement, and thereby argue that tracking can maximise
opportunities to alter this. If tracking leads to proportionally more lower
socioeconomic students or students from particular ethnic groups being placed in
lower tracks, then the use of tracking may serve to increase divisions along class,
race, and ethnic lines (Haller & Davis, 1980; Rosenbaum, 1980). In his survey of
tracking policy in California and Massachusetts, Loveless (1999b) concluded that
there are massive contradictions, in that detracking is taking place in low-
achievement schools, in poor schools, and in urban areas; whereas suburban
schools, schools in wealthy communities, and high-achieving schools are staying with
tracking—indeed embracing it. ‘‘This runs counter to the notion of elites imposing a
counterproductive policy on society’s downtrodden. If tracking is bad policy,
society’s elites are irrationally reserving it for their own children’’ (Loveless, 1999b,
p. 154). Braddock (1990) found that schools with more than 20% of their rolls from
minority groups are more likely to track than those with fewer minority students.
Oakes et al. (1992) found that Asian students were more likely to be assigned to
advanced courses than were Hispanic students with equivalent test scores. A
disproportionate number of low socioeconomic status and disadvantaged
minority students occupy the lower tracks and non-college tracks (National
Center for Educational Statistics, 1985; Oakes et al., 1992; Persell, 1977; Van Fossen,
Jones, & Spade, 1987). Students of average ability from advantaged families are
more likely to be assigned to higher tracks because of actions by their parents, who
are often effective managers of their children’s schooling (Alexander, Cook &
McDill, 1978; Baker & Stevenson, 1986; Dornbusch, 1994; Lareau, 1987; Useem,
1991, 1992). Further, schools with a larger proportion of minority and lower
socioeconomic students are less likely to have sufficient higher-level courses,
which affects the probabilities of students entering higher classes. Moreover, the
higher-track programs in these schools are often less rigorous than higher-track
classes in schools with fewer minorities and higher socioeconomic students (Oakes
et al., 1992).
1.4. Detracking
Table 1
Summary of achievement related effect sizes in 6 meta-analyses
na=not available.
Table 2
Summary of effect sizes relating to low-, middle-, and high-ability tracks in 7 meta-analyses
Table 3
Stem and leaf display of effect sizes for primary, intermediate, and secondary schools
0.4 8 8
0.3 14 3 134
0.2 6 0 06
0.1 03 03569 003569
0.0 3468 4 334468
0.0 000004555 00233 0000119 000000000001123345559
0.1 5 0148 01458
0.2 1378 289 1237889
0.3 228 228
0.4
0.5
0.6 018 018
0.7 01 01
Count 31 7 21 59
basis of the studies presented to feel confident that schools were undertaking a
worthwhile activity by considering or using tracking. The effects of many teaching-
related, instructional innovations substantially overwhelm the effects of tracking.
Further, the trade-off is not between closing the gap between low-ability and high-
ability students versus raising overall student test scores—it is between policy makers
attending to classroom organization practices versus improving what happens once the
classroom door is closed. Whether a school tracks or not appears less consequential
than whether it attends to the nature and quality of instruction in the classroom,
whatever the within-class variability in achievement. It is almost certain that there are
conditions of learning (such as specific and challenging goals, the presence of feedback,
and structure in the activities) that are far more powerful. It is likely that procedures
other than tracking can optimise the advantages of peer influences in learning, such as
reciprocal teaching and scaffolding. Good teaching is likely to be a more
powerful factor and can be independent of the class configuration.
The relevance of the curriculum to the class is important, and greater relevance is
not necessarily a consequence of more homogeneity within the class. As Slavin
(1987) concluded, ‘‘for ability grouping to be effective at the primary level, it must
create true homogeneity on the specific skill being taught and instruction must be
closely tailored to students’ levels of performance’’ (p. 323). Perhaps the best
example of this is the Joplin Plan (Floyd, 1954), which involves tracking students for
reading across grade levels. Thus, all students in the school are timetabled for
reading at the same time, and then groups are formed based on reading ability across
grades. The average effect of the Joplin Plan, based on 14 studies, is 0.45. Hence,
tracking can occur in a way that allows expectations to be raised and allows
movement across reading levels, without disrupting the whole class structure or
establishing classes that teachers do not want to teach. The other example of
curriculum tailoring relates to gifted classes, where the effects are much greater than
for high-tracked classes. It is the curriculum relevance and not just the reduction of
heterogeneity that seems to make the difference.
The effects of tracking on self-esteem also are near-zero overall, albeit slightly
positive for lower-tracked and slightly negative for higher-tracked students. Higher-
tracked students become slightly less satisfied with themselves when taught with their
intellectual peers; slower students may gain slightly in self-confidence when they are
taught with slower learners. This is known as the ‘big-fish-little-pond’ effect. Marsh
(1984a, b, 1987; Marsh & Hattie, 1995; Marsh & Parker, 1984) has extensively
documented these effects, which he claims are particularly evident with selective
schooling. Marsh hypothesised that students compare their own academic ability
with the academic abilities of their peers and use this social comparison impression
as one basis for forming their own academic self-concept. The effect occurs when
equally able students have lower academic self-concepts if they compare themselves
to more able students, and higher academic self-concepts if they compare themselves
with less able students. For example, if average-ability students are in a high-ability
class, then their academic abilities will be below the average of other students in the
class, and this will lead to academic self-concepts that are below average. Conversely,
if these students attended a low-ability class, then their abilities would be above
average in that class and this would lead to academic self-concepts that are above
average. Similarly, the academic self-concepts of below-average and above-average
students will depend on their academic ability but also will vary with the type of class
they attend. According to this model, academic self-concept will be correlated
positively with individual achievement (brighter children will have higher academic
self-concepts) but negatively related to class-average achievement (the same children
will have lower academic self-concepts in a class where the average ability is high).
Thus, equally able students tend to have lower academic self-concepts if they attend
academically selective classes (or schools) than if they attend classes in which the
average ability level is lower. Marsh (1987) argued that ‘‘for at least some children, the
early formation of a self-image as a poor student may be more detrimental than
the possible benefits of attending a high-ability school’’ (p. 292).
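The direction of the two relations in this model can be made concrete with a toy calculation. The sketch below is not Marsh's statistical model: it assumes a simple linear form in standardized units, and the function name and both weights are hypothetical values chosen only so that the signs match the pattern described above.

```python
def predicted_self_concept(own_achievement, class_mean_achievement,
                           own_weight=0.5, class_weight=0.25):
    """Toy linear model: academic self-concept rises with a student's own
    achievement and falls with the class-average achievement.

    Both weights are hypothetical; they only need to be positive so that
    the two signs match the pattern described in the text.
    """
    return own_weight * own_achievement - class_weight * class_mean_achievement


# The same average-ability student (achievement 0.0) placed in two different classes.
in_high_ability_class = predicted_self_concept(0.0, class_mean_achievement=1.0)
in_low_ability_class = predicted_self_concept(0.0, class_mean_achievement=-1.0)
```

With these illustrative weights, the same student has a below-average predicted self-concept in the high-ability class and an above-average one in the low-ability class, while a more able student in the same class always has the higher predicted self-concept.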
There is a final conundrum in this research. The empirical evidence leads to a
conclusion that there is a close to zero effect from tracking, but the qualitative
literature indicates that there may be quite different teaching and interactions in the
low versus high tracked classes. The qualitative evidence indicates that low-track
classes are more fragmented, less engaging, and less often taught by well-trained
teachers. Clearly, if these lower-tracked classrooms were more stimulating and
challenging, and were taught by well-trained teachers, there might be gains from tracking
for these students. It seems that the quality of teaching and the nature of the student
interactions are the key issues, rather than the compositional structure of the classes.
2. Combination classes
Combination classes are classes that group students of more than one year level
who are taught in the same classroom by the same teacher (also called ‘multigrade’,
3. Single-sex classes
classes at the primary level; although there is little reason to suspect that there would
be meaningful differences at this level. There are more powerful effects due to the
quality of teaching and teacher expectations than to whether a class is all one sex or
mixed.
4. Class size
The research on the effects of class size has been among the more voluminous in
educational research, with consistent findings. The earliest empirical studies were
published at the turn of the last century (Rice, 1902), although there are earlier
claims in the Talmud that the maximum size of bible classes should be no more than
25 students. The very first meta-analysis was related to class size. Glass and Smith
(1978) synthesized 77 studies, leading to a total of 725 effect sizes. The average effect
size was 0.09, but more importantly, there was a non-linear effect. Reducing class
sizes from 40 or more to 20 students led to close-to-zero increments in achievement,
but when class sizes dropped to 15 students or lower, there were larger effects on
achievement. Smith and Glass (1980) also synthesized 59 studies covering 371 effects
relating to class size and non-achievement based outcomes such as interpersonal
regard, quality of instruction, teacher attitude, and school climate. Table 4 presents
the effect sizes with a class-size of 30 as the anchor point at the 50th percentile of
effects for both the achievement and non-achievement outcomes (calculated from
Glass & Smith, 1978). Thus, if a student from a class of 30 was placed into a class of
20 students, he or she would experience achievement benefits superior to 54% of
students, and non-achievement benefits superior to 69% of students, who are taught
in the class of 30.
Table 4
Summary of effect sizes for various reductions of class size on achievement and attitudes
Class size from    Class size to    Achievement    Attitudes
30                 5                0.84           0.41
30                 10               0.26           0.52
30                 15               0.13           0.33
30                 20               0.04           0.19
30                 25               0.00           0.09
30                 30               0              0

Smith and Glass (1980) concluded that achievement, attitude, teacher morale, and
student satisfaction gains were ‘‘appreciable’’ in smaller classes, provided we
recognise that ‘small’ means 10–15 students, with negligible gains from reducing
class sizes from as high as 40 to 20 students. This effect was greater in secondary than in
primary schools, but the same across all subjects and across various ability levels.
Hedges and Stock (1983) reanalysed Glass and Smith’s (1978) set of studies using
slightly more rigorous statistical methods, and found no differences from the earlier
conclusions. A more telling criticism of the Glass and Smith meta-analysis was that
the studies were of short duration, included one-on-one tutoring, and were in some
cases non-class related (e.g., tennis). Slavin (1989a, b) found only eight studies that
met his inclusion criteria of lasting at least 1 year, involving a substantial reduction
in class size, and involving random assignment or matching of students across larger
and smaller classes. He concluded that substantial reductions in class size have a
small positive effect on students (effect size=0.13) and the effect was not cumulative
and disappeared within a few years.
McGiverin, Gilman, and Tillitski (1989) conducted a meta-analysis of 10 studies
of Indiana’s Prime Time project. This project aimed to reduce class sizes to 14 in 24
grades 1–3 classes, and the students were followed over three years. They reported
that grade 3 students who had been in smaller classes for two years had significantly
higher achievement test scores than did students in larger classes, with an overall
average effect size of 0.34 (see also Chase, Mueller, & Walden, 1986; Malloy &
Gilman, 1989; Mueller, Chase, & Walden, 1988). It is difficult to credit this effect to
class size, however, as the study had few controls. It is not clear that small classes
were kept small for the entire day and, while the average class size for the ‘smaller’
classes was set at 18, actual ‘small’ class sizes ranged from 18 to 31, and classes of 24
were considered small if there was a teacher aide to assist the teacher.
Hanushek (1986, 1998, 1999; Hanushek et al., 1996) has long maintained that
there is little evidence to support the benefits on student learning of smaller classes.
In a series of summaries of the literature, he found 78 separate estimates of class size
effects based on value-added results. Of these, 12% were statistically significant and
favored smaller classes, and 8% were statistically significant and negative; 21% were
positive but not statistically significant, and 26% were negative but not statistically
significant. Hence, there is ‘‘little reason to
believe that smaller classes systematically lead to improvements in student
achievement’’ (Hanushek, 1999, p. 148). When he added further studies, Hanushek
concluded that ‘‘more studies actually suggest that small classes are harmful… [and that
overall, there is] no consistent or clear indication that overall class size reductions
will lead to improved student performance’’ (p. 149).
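The tally behind this conclusion can be checked with simple arithmetic. The sketch below uses only the four percentages given above; it assumes that the 8% and 26% figures refer to statistically significant and non-significant negative estimates respectively, and the variable names are illustrative.

```python
# Percentages of the 78 value-added estimates as reported in the text.
# Assumption: "8% negative" is read as statistically significant, and
# "26% negative" as not statistically significant.
reported_shares = {
    "significant_positive": 12,
    "significant_negative": 8,
    "nonsignificant_positive": 21,
    "nonsignificant_negative": 26,
}

favor_smaller = (reported_shares["significant_positive"]
                 + reported_shares["nonsignificant_positive"])
favor_larger = (reported_shares["significant_negative"]
                + reported_shares["nonsignificant_negative"])
# Share not covered by the four reported categories; the text does not
# say how these remaining estimates were classified.
unaccounted = 100 - sum(reported_shares.values())
```

On this reading, 33% of the estimates favor smaller classes, 34% favor larger classes, and 33% fall outside the four reported categories, which is consistent with the claim that the direction of the effect is not systematic.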
The Wisconsin Student Achievement Guarantee in Education (SAGE) program
was a five-year project that included reducing class size (Molnar et al., 1999). From a
series of regressions and an HLM analysis, they concluded that the effect size from
the first-year SAGE students for class size reductions was about 0.2, and higher for
African-American students. Interestingly, they noted no differences between class
sizes of 15 with one teacher and class sizes of 30 with two teachers, concluding that
this ‘‘suggests that the benefits of reducing class size may be achievable without the
attendant capital costs of building additional classrooms’’ (p. 177).
Project STAR (Student–Teacher Achievement Ratios) began in Tennessee in 1985
(for a history of this innovation, see Ritter & Boruch, 1999). This project involved a
random assignment of students entering kindergarten into regular classes (22–24
students), regular classes with teacher aides, or small classes (14–16 students). The
allocation was done across 331 classes (from 79 schools) and the students stayed in
their class conditions for the next 3 years, after which they moved into regular-sized
classes (Finn & Achilles, 1990; Finn, Folger, & Cox, 1991; Word et al., 1990). Finn
(1998) demonstrated that smaller classes benefited students in grade 1 through grade
4 academically, and there were improvements in the students’ expenditure of effort,
initiative taking, and reduced disruptive and inattentive behavior in comparison to
students in larger classes. The effect sizes were all positive in favor of the small
classes, and greater for minority (close to double) compared to white students for all
achievement areas; zero effects were found for motivation and self-concept. Finn and
Achilles (1990) reported that the difference between minorities and whites in mastery
rates on the grade 1 reading test was ‘‘reduced from 14.3% in regular classes to 4.1%
in small classes’’ (p. 568). Across all comparisons, the smaller class advantage in
grade 1 was approximately 0.15–0.18; for grade 2, 0.22–0.27; and for grades 3 and 4,
0.19–0.26. These overall effects (0.15–0.27) are not that different from what would
have been predicted on the basis of Glass’s meta-analysis.
Hanushek (1999) is highly critical of the large attrition rate in the Project STAR
data. He noted that slightly less than half of the original students in the experiment
remained in the study until the end of the third grade. Nye, Hedges, and
Konstantopoulos (1999), in a 5-year follow-up study, found that students who left
the small classes had higher achievement than those who left the larger classes,
suggesting that the observed differences are probably not due to attrition.
Hanushek’s more cogent criticism, which is as yet unanswered, is that although
randomisation was used, unlike randomisation in other areas (e.g., medical science),
it was not blind in this study. That is, teachers, parents, school officials, and students
were obviously aware of the assignment to small or larger classes. Hence, the results
could have been related to more resources going to the smaller classes, and other
‘‘more direct motivation and incentives of teachers and principals that could bias the
results of the different treatment groups’’ (Hanushek, 1999, p. 153). He also noted
the high likelihood of school effects influencing the conclusions. The students were
randomly assigned but the schools had to volunteer to participate. There were 79
schools with kindergarten experiments, and half (40) showed advantages for small
classes and the other half for regular classes. His conclusion, therefore, is that ‘‘it is
only slightly better than an even bet from the STAR data that the small class
achievement will exceed that of the regular and the regular with aide classes in any of
the sampled schools’’ (p. 159). Other econometric studies also show small effects
from reducing class size, with effects clustering around 0.00 to 0.10 (Boozer & Rouse,
1995; Hanushek et al., 1996; Krueger, 1997).
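The ‘‘even bet’’ reading of the school-level split can be illustrated with a simple sign test. The sketch below is not Hanushek's own calculation; it assumes that each of the 79 schools is an independent trial with a 50% chance of favoring small classes under the null hypothesis, and the function and variable names are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Figures taken from the text: 79 schools, 40 of which favored small classes.
schools = 79
schools_favoring_small = 40

share_favoring_small = schools_favoring_small / schools
# Assumption: under the null hypothesis each school is a fair coin flip.
tail_probability = binomial_upper_tail(schools_favoring_small, schools)

print(f"share of schools favoring small classes: {share_favoring_small:.3f}")
print(f"P(at least 40 of 79 under a fair coin):  {tail_probability:.3f}")
```

Under these assumptions, a split of 40 to 39 is what chance alone would produce about half of the time, which is consistent with the description of the school-level result as only slightly better than an even bet.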
At grade 5, all of the students in the study returned to regular-sized classes. The
Lasting Benefits Study followed many of these students, some through to grade 11
(Finn et al., 1991). The effect sizes in favor of those who had begun in smaller classes
were primarily in the 0.10–0.15 range, indicating that there were positive effects of
this early age intervention, even when the small-class intervention was disbanded.
The effects for student engagement in learning (initiative taking, lack of disruption,
attentiveness) were greater in the smaller classes (effect size=0.13) a year after the
students returned to normal classes (Finn & Achilles, 1999). Wenglinsky (1997), in
an analysis of production functions based on Project STAR, also reported positive
effects for small classes at grade 5 but not at grade 9. Pate-Bain, Boyd-Zaharias,
Cain, Word, and Binkley (1997) followed a cohort through to grade 11, and
concluded that students who had been in the smaller classes appeared to have
maintained academic achievement advantages. There is more compelling evidence
that students from small classes were less likely to fail a grade level, less likely to be
suspended, and more likely to take more advanced courses than their peers who were
in regular and regular/aide classes.
A key issue in this class size debate is whether the teachers change the instruction
when moving from larger to smaller classes, and whether this has an effect on
learning outcomes. Glass, Cahen, Smith, and Filby (1982) reported evidence that,
too often, the nature of the instruction did not change when classes were reduced
from 40 to 20 students. A poor teacher with 30 students may remain a poor teacher
with 15. Shapson, Wright, Eason, and Fitzgerald (1980), in an unusual study,
randomly assigned teachers and students in grade 5 to one of four class sizes: 16, 23,
30, or 37 students. The students were randomly reassigned in grade 6 and, as well as
achievement measures, ratings were made of teacher–student interactions and
classroom behaviors. The teachers expressed more positive attitudes with the smaller
classes and were more pleased with the ease of managing and teaching in the smaller-
class setting. However, ‘‘the observation of classroom process variables revealed very
few effects of class size. Class size did not affect the amount of time teachers spent
talking about course content or classroom routines. Nor did it affect the choice of
audience for teachers’ verbal interactions; that is, when they changed class sizes,
teachers did not alter the proportion of their time spent interacting with the whole
class, with groups, or with individual pupils’’ (pp. 149–150). No differences were
found in student satisfaction or affective measures, teacher activities, subject
emphasis, classroom atmosphere, or the quality measures.
Bourke (1986) found that as class size became larger, so did the amount of noise
tolerated, non-academic management, and teacher lecturing or explaining. As class
size became smaller, there was an increase in the amount of homework assigned and
graded, teacher probes after a question, instances of the teacher directly interacting
with students, and positive teacher response to answers from students. Thus, in
smaller classes, less time is spent on classroom management, and there are more
protracted interactions with students. Blatchford, Edmonds, and Martin (2003)
completed an observational study of 21 small (average size of 19 students) and 18
large (average size of 33) reception classes and found that students in larger classes
were more distracted from work and more often off task. But they found no support
for the claim that peer relations were better in smaller classes; indeed, there was a
slight tendency for worse peer relations, in terms of aggression, asocial and excluded
students in the smallest classes. They concluded that teachers were more likely to
facilitate more peer-related contacts in larger classes. In a larger study of 122 small
and 112 larger classes they found that there was greater teacher-to-child
(effect-size=0.86) and child-to-teacher (effect-size=0.83) but fewer child-to-child
interactions (effect-size=0.34) in larger classes (see also Blatchford & Martin, 1998).
Evertson and Folger (1989) reported that students in smaller classes initiated more
contacts with the teacher for purposes of clarification, gave more answers to
questions that were open to the whole class, more often contacted the teacher
privately for help, were more on-task, and spent less time waiting for the next
assignment (see also Achilles, Kiser-Kling, Aust, & Owen, 1994; Kiser-Kling, 1995).
Overall, however, in the Project STAR analyses, teachers tended not to change their
fundamental teaching strategies when given a small class (Finn & Achilles, 1999).
In the Project STAR findings on classroom behaviors, Finn and Achilles (1999)
have noted that students in small classes were more engaged in learning behaviors
and exhibited less disruptive behavior than did students in large classes. They also
noted that ‘‘despite dozens of earlier studies, the classroom processes that distinguish
small from large classes have proven elusive’’ (p. 102). Like many other researchers,
they did not find differences between teachers of small and large classes in the overall
structure of lessons, teaching practices, or content coverage (e.g., Bohrnstedt,
Stecher, & Wiley, 2000; Molnar et al., 1999; Stasz & Stecher, 2002). Instead, Finn
and Achilles proposed that the ‘‘key to the academic benefits of small classes lies in
student behavior. It is proposed that students become more engaged academically
and more engaged socially when class sizes are reduced, and this increased
engagement in the classroom is a compelling explanation for increased learning in all
subject areas’’ (p. 3). It is the students, not the teachers, who cause any changes.
Finn, Pannozzo, and Achilles (submitted for publication) argued that there were
four principles that accounted for the positive effects in favor of smaller class sizes.
First, students are more in ‘‘the firing line’’ as they are more visible to the teacher,
and cannot avoid being noticed. Hence students are more engaged in learning.
Second, when students feel they are part of a smaller group they tend to feel more
responsible and this enhances their motivation to respond. Third, groups become
more efficient as the group size decreases because each participant exerts more effort.
Students believe that their contribution will have greater impact on group
functioning, and is more likely to be evaluated or rewarded. Fourth, smaller groups
may encourage member participation because they are more unified in their purposes
and actions than are larger groups, and because individual members often feel that
they are more closely affiliated with the group, receiving guidance and support from
other group members. That is, there is a greater sense of belonging; more
cohesiveness or a sense of community.
Betts and Shkolnik (1999) conducted an intensive analysis of the Longitudinal
Study of American Youth, which includes surveys by teachers, principals, students,
and parents about student and teacher behavior in the classroom. Class size
variations induced little change in how teachers allocated their time between new
material, review, discipline, routine tasks, and testing. In smaller classes, teachers did
not increase the proportion of time spent on new materials but allocated more time
to reviewing activities. They found that smaller class sizes led to teachers devoting
less time to group instruction and more time to individual instruction. Their
evidence, however, demonstrates that teachers could make small classes ‘‘considerably
more effective if they did not reduce group instruction to the extent that they
do’’ (p. 209). Overall, they argued that, because teachers reallocate their time to such
a small extent, this ‘‘may explain why it has been so hard in most past research to
identify a positive and significant impact of class size reduction on student
achievement’’ (p. 209).
Rice (1999) completed a similar analysis but used the NELS database. She found
that class size does not appear to influence the instructional strategies in science
classes, and there are very small effects in mathematics classes. The effects were more
pronounced in classes of higher-ability students, suggesting that the teachers do not
change their instructional practices for lower-achieving students no matter what the
class size. Hargreaves, Galton, and Pell (1998) also concluded from their British
study of classroom interaction differences that the successful teachers of the larger
classes had ‘‘difficulty in maximising the opportunities offered in the small class
setting, largely because they were unfamiliar with having to cope with such small
numbers’’ (p. 791). A related set of reasons is that teachers often use many groups
within their classes, and this can lead to some of the benefits claimed for reducing
class size. Blatchford, Baines, Kutnick, and Martin (2001) have noted that, for many
teachers, it is the size of groups within the class and not the number of students that
is more influential in achieving greater outcomes.
There is little evidence that instruction methods change when class size is reduced,
although a large part of any improvement relating to smaller class sizes can be
explained by improvements in student task engagement (Finn & Achilles, 1999;
Finn, Pannozzo, & Voelkl, 1995; McFadden, Marsh, Price, & Hwang, 1992; Steele,
1992). Reducing class size increases the probability that these more positive teacher
interventions will occur, but it does not guarantee them—and, too often, teachers do
not change their habits of instruction when their class sizes are reduced.
Brewer, Krop, Gill, and Reichardt (1999) estimated that reducing class
sizes to 18 students in grades 1–3 in the United States, as President Clinton then
proposed, would require hiring an additional 100,000 teachers at a cost of $US5–6
billion per year. Per student costs were about $US500 for each year the students were
in smaller classes. To reduce again from 18 to 15 students would cost a further
$US5–6 billion per year. There are also the costs of classroom space, changes to
buildings, teacher training, and so on. This investment could, instead, be used to
raise teachers’ salaries by $20,000 per year.
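The figures above can be cross-checked with simple division. The sketch below uses only the numbers quoted in the text; the derived ratios are back-of-the-envelope implications, not values reported by Brewer et al., and all names are hypothetical.

```python
# Figures quoted in the text (all in $US per year unless noted).
ANNUAL_COST_RANGE = (5_000_000_000, 6_000_000_000)  # cost of reducing classes to 18
ADDITIONAL_TEACHERS = 100_000                       # teachers to be hired
COST_PER_STUDENT = 500                              # per student per year
SALARY_RAISE = 20_000                               # alternative use, per teacher


def implied_range(divisor: int) -> tuple[int, int]:
    """Divide the low and high annual cost estimates by the same quantity."""
    low, high = ANNUAL_COST_RANGE
    return low // divisor, high // divisor


# Implied annual cost attached to each additional teacher.
cost_per_teacher = implied_range(ADDITIONAL_TEACHERS)
# Implied number of students placed in smaller classes.
students_covered = implied_range(COST_PER_STUDENT)
# Implied number of teachers who could receive the salary increase instead.
teachers_raised = implied_range(SALARY_RAISE)

print(f"cost per additional teacher: {cost_per_teacher}")
print(f"students covered:            {students_covered}")
print(f"teachers receiving a raise:  {teachers_raised}")
```

On these assumptions the same budget corresponds to roughly $US50,000–60,000 per additional teacher, 10–12 million students in smaller classes, or a $20,000 raise for 250,000–300,000 teachers.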
In summary, the research on class size indicates that very small gains are made to
achievement as a consequence of reducing class sizes (even down to 15). The costs of
reducing class sizes to such an extent are extremely large, and it is likely that these
resources could be used more effectively to achieve greater achievement gains and
higher-quality teaching by implementing alternative innovations. There
is also little evidence to claim that more and/or positive peer interactions accompany
the benefits in achievement until class sizes are reduced to around 15 students.
Given the major finding of this chapter, that very few meaningful compositional
effects accrue directly from different classroom configurations, it is appropriate to
suggest that such configurations have few implications for peer effects on learning.
This is particularly underlined when it is also noted that changing configurations is
unlikely to be accompanied by changes in instructional methods. The rhetoric that
6. Conclusion
The research cited shows that there is a small advantage for many of the classroom
configurations. The best estimate of the effect of tracking on achievement
outcomes is 0.05, of combination classes (perhaps) 0.10, of single-sex classes 0.05,
and of smaller classes 0.05–0.15. Across these configurations, the effects average
about 0.10 at best. These estimates increase only when teachers change their
instruction to adapt more fully to the students in their classes. This change does not
mean changing the pace of instruction or lowering the expectations of what the
students can accomplish, but a dramatic change in the nature of the activities, and a
renewed vigor towards implementing appropriately challenging tasks.
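The overall average quoted above can be checked against the four estimates. The sketch below assumes a simple unweighted mean, which the chapter does not specify, and evaluates it at the lower bound, midpoint, and upper bound of the class-size range; the names are hypothetical.

```python
from statistics import mean

# Effect-size estimates quoted in the text.
fixed_effects = {
    "tracking": 0.05,
    "combination classes": 0.10,
    "single-sex classes": 0.05,
}
class_size_range = (0.05, 0.15)  # smaller classes: reported as a range


def unweighted_average(class_size_effect: float) -> float:
    """Average the three point estimates with one value for class size.

    Assumption: every configuration receives equal weight.
    """
    return mean(list(fixed_effects.values()) + [class_size_effect])


averages = {
    "lower": unweighted_average(class_size_range[0]),
    "midpoint": unweighted_average(mean(class_size_range)),
    "upper": unweighted_average(class_size_range[1]),
}

for label, value in averages.items():
    print(f"{label}: {value:.4f}")
```

Under this equal-weighting assumption the average falls between roughly 0.06 and 0.09, so a figure of about 0.10 sits at the upper end, in keeping with the qualifier ‘‘at best’’.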
As noted above, the trade-off is not between closing the gap between low-ability
and high-ability students versus raising overall student test scores via implementing
various class-level configurations. Rather, it is between policy makers attending to
classroom organization practices versus improving what happens once the classroom
door is closed. Whether a school tracks or not, or implements combination or single-
sex classes appears less consequential than whether it attends to the nature and
quality of instruction in the class, whatever the between-class variability of
achievement. Good teaching is more powerful and appears to be independent of
the class configuration or homogeneity of the students within the class. The most
important implication is that the major cost of attending to class-level configurations
relates to the false belief that something educationally sound has been accomplished.
Attention needs to be directed at more careful curriculum specification, higher
quality teaching, and higher expectations that students can meet appropriate
challenges—and these occur once the classroom door is closed and not by
reorganising which students are behind those doors.
It is likely that any compositional effects from changing between-classroom
configurations have more influence on attitudes than on achievement. Most
powerfully, tracking, for example, reinforces low expectations for both teachers and
students and has differential effects on enjoyment of school—lower for low tracks
and higher for high tracks. Tracking does tend to polarise student attitudes into pro-
and anti-school camps (Gamoran & Berends, 1987): ‘‘Whereas high-track students
tend to accept the school’s demands as the normative definition of behavior, low-
track students resist the school’s rules and may even attempt to subvert them’’ (p.
427). This polarisation is often echoed in the students’ perceptions about their peers
as well. The ‘haves’ become less likely to want to ‘associate’ with their less well-
endowed peers and the ‘have nots’ are less likely to want to ‘associate’ with those
with privilege to opportunities seemingly denied to them—by ability, by social
stratification, and by institutional practices. Both high- and low-track students, like
their teachers, view the top tracks as offering a better education, more opportunity,
and more prestige.
Of most interest for further research is why teachers and schools persist
with these compositional arrangements in the face of the consistent message that they make
little if any positive difference, and why teachers do not change their
instruction to meet the (homogeneous) needs of students when the composition of
the class changes. It is time to move away from the debate about composition
effects, as it may well be a distraction, to more critical concerns that have a higher
likelihood of positively influencing students’ learning. At best, class-based composition
affects only the probability that differential instruction and learning occur, and
at most any influence is indirect. The direct effects are less related to learning
outcomes and more related (typically negatively) to equity and expectations effects
by teachers and other participants (students, parents, and principals). Any changes
are more likely caused by peer influences than by teacher changes. A major cost of
deciding to change the variability of the class, therefore, is the false assumption that
something has been done that can benefit the students merely by a grouping
composition effect. Teachers do not appear to employ differential instructional
methods when they switch from heterogeneous to homogeneous classes (or vice
versa). It is likely that teachers consider that changing the composition of the class is
the outcome of school reform, whereas it is only a structural change that aims to
enhance the probability that teaching-related effects are changed. At best, teachers
could concentrate on the teaching- and peer-related causes of learning and realize
these can occur within all classes without the damning equity effects that seem, too
often, to accompany between-class grouping. It is worth moving beyond debates
about class composition to enhancing the quality of teaching regardless of the
compositional effects of the students.
References
Achilles, C. M., Kiser-Kling, K., Aust, A., & Owen, J. (1994). A study of reduced class size in primary
grades of a fully Chapter 1-eligible school: Success starts small. Paper presented at the annual meeting of
the American Educational Research Association, San Francisco, CA.
Alexander, K. L., Cook, M. A., & McDill, E. L. (1978). Curriculum tracking and educational
stratification: Some further evidence. American Sociological Review, 43, 47–66.
Argys, L. M., Rees, D. I., & Brewer, D. J. (1996). Detracking America’s schools: Equity at zero cost?
Journal of Policy Analysis and Management, 15(4), 623–645.
Baker, D. P., & Stevenson, D. L. (1986). Mothers’ strategies for children’s school achievement: Managing
the transition to high school. Sociology of Education, 59(1), 156–166.
Ball, S. J. (1981). Beachside comprehensive. Cambridge: Cambridge University Press.
Bates, P. (1992). Beyond tracking. Equity Coalition for Race, Gender, & National Origin (ERIC
Document Reproduction Service No. ED 397 207).
Bellanca, J., & Swartz, E. (Eds.). (1993). The challenge of detracking: A collection. Palatine, IL: IRI/
Skylight Publishing (ERIC Document Reproduction Service No. ED 369 573).
Betts, J. R., & Shkolnik, J. L. (1999). The behavioral effects of variations in class size: The case of math
teachers. Educational Evaluation & Policy Analysis, 21(2), 193–213.
Bigelow, B. (1993). Getting off track: Classroom examples for an anti-tracking pedagogy. Rethinking
Schools, 7(4), 18–20.
Blatchford, P., Baines, E., Kutnick, P., & Martin, C. (2001). Classroom contexts: Connections between
class size and within class grouping. British Journal of Educational Psychology, 71, 283–298.
Blatchford, P., Edmonds, S., & Martin, C. (2003). Class size, pupil attentiveness and peer relations.
British Journal of Educational Psychology, 73, 15–36.
Blatchford, P., & Martin, C. (1998). The effects of class size on classroom processes: It’s a bit like a
treadmill working hard and getting nowhere fast. British Journal of Educational Studies, 46(2),
118–137.
Boaler, J., Wiliam, D., & Brown, M. (2000). Students’ experiences of ability grouping—disaffection,
polarisation and the construction of failure. British Educational Research Journal, 26(5),
631–648.
Bode, R. K. (1993). Hierarchical linear modeling of class ability range on student mathematics
achievement. Paper presented at the annual meeting of the American Educational Research Association,
Atlanta, GA (ED 360 317).
Bohrnstedt, G. W., Stecher, B. M., & Wiley, E. W. (2000). The California class size reduction evaluation:
Lessons learned. In M. C. Wang, & J. D. Finn (Eds.), How small classes help teachers do their best (pp.
201–225). Philadelphia, PA: Temple University Center for Research in Human Development and
Education.
Boozer, M., & Rouse, C. (1995). Intraschool variation in class size: Patterns and implications (ERIC
Document Reproduction Service No. ED 385 935).
Bourke, S. (1986). How smaller is better: Some relationships between class size, teaching practices and
student achievement. American Educational Research Journal, 23, 558–571.
Braddock, J. H. (1990). Tracking: Implications for student race-ethnic subgroups (Technical Report No. 1).
Baltimore: Center for Research on Effective Schooling for Disadvantaged Students.
Brewer, D. J., Krop, C., Gill, B. P., & Reichardt, R. (1999). Estimating the cost of national class size
reductions under different policy alternatives. Educational Evaluation & Policy Analysis, 21(2), 179–192.
Brown, P., & Goren, P. (1993). Ability grouping and tracking: Current issues and concerns. Achieving
national education goals. Washington, DC: National Governors’ Association Center for Policy
Research. (ERIC Document Reproduction Service No. ED 406 458).
Burns, R. B., & Mason, D. A. (1995). Organizational constraints on the formation of elementary school
classes. American Journal of Education, 103(2), 185–212.
Camarena, M. (1990). Following the right track: A comparison of tracking practices in public and
Catholic schools. In R. Page, & L. Valli (Eds.), Curriculum differentiation: Interpretive studies in US
secondary schools. Albany, NY: State University of New York Press.
Chase, C. I., Mueller, D. J., & Walden, J. D. (1986). PRIME TIME: Its impact on instruction and
achievement. Final report. Indianapolis, IN: Indiana Department of Education.
Coley, R. J. (1991). A long track record. Educational Testing Service Policy News, 4(1), 1–7.
Crespo, M., & Michelna, J. (1981). Streaming, absenteeism, and dropping out. Canadian Journal of
Education, 6, 40–55.
Dawson, M. M. (1987). Beyond ability grouping: A review of the effectiveness of ability grouping and its
alternatives. School Psychology Review, 16(3), 25–46.
Dornbusch, S. M. (1994). Off the track. Paper presented at the biennial meeting of the Society for
Research on Adolescence, San Diego.
Epstein, J. L., & MacIver, D. J. (1990). Education in the middle grades: National practices and trends.
Columbus, OH: National Middle School Association.
Evertson, C. M., & Folger, J. K. (1989). Small class, large class: What do teachers do differently?
Paper presented at the annual meeting of the American Educational Research Association, San
Francisco.
Finley, M. K. (1984). Teachers and tracking in a comprehensive high school. Sociology of Education, 57,
233–243.
Finn, J. D. (1998). Class size and students at risk: What is known? What is next? Washington, DC: US
Department of Education, Office of Educational Research and Improvement, National Institute on the
Education of At-Risk Students.
Finn, J. D., & Achilles, C. M. (1990). Answers about questions about class size: A statewide experiment.
American Educational Research Journal, 27, 557–577.
Finn, J. D., & Achilles, C. M. (1999). Tennessee’s class size study: Findings, implications, misconceptions.
Educational Evaluation and Policy Analysis, 21(2), 97–109.
Finn, J. D., Folger, J., & Cox, D. (1991). Measuring participation among elementary grade students.
Educational and Psychological Measurement, 51, 393–402.
Finn, J. D., Pannozzo, G. M., & Voelkl, K. E. (1995). Disruptive and inattentive-withdrawn behavior and
achievement among fourth graders. The Elementary School Journal, 95, 421–434.
Finn, J. D., Pannozzo, G. M., & Achilles, C. M. (submitted for publication). The ‘‘Whys’’ of class size:
Student behavior in small classes, Review of Educational Research.
Floyd, C. (1954). Meeting children’s reading needs in the middle grades: A preliminary report. Elementary
School Journal, 55, 99–103.
Gamoran, A. (1986). Instructional and institutional effects of ability grouping. Sociology of Education,
59(4), 185–198.
Gamoran, A. (1987a). The stratification of high school learning opportunities. Sociology of Education, 60,
135–155.
Gamoran, A. (1987b). Organization, instruction, and the effects of ability grouping: Comment on Slavin’s
‘‘best-evidence synthesis’’. Review of Educational Research, 57(3), 341–345.
Gamoran, A. (1989). Measuring curriculum differentiation. American Journal of Education, 97, 129–143.
Gamoran, A. (1992). Social factors in education. In M. C. Alkin (Ed.), Encyclopaedia of Educational
Research (pp. 1222–1229). New York: Macmillan.
Gamoran, A. (1993). Alternative uses of ability grouping in secondary schools: Can we bring high-quality
instruction to low-ability classes? American Journal of Education, 102(1), 1–22.
Gamoran, A., & Berends, M. (1987). The effects of stratification in secondary schools: Synthesis of survey
and ethnographic research. Review of Educational Research, 57, 415–435.
Gamoran, A., & Mare, R. D. (1989). Secondary school tracking and educational inequality:
Compensation, reinforcement, or neutrality? American Journal of Sociology, 94(5), 1146–1183.
Gamoran, A., Nystrand, M., Berends, M., & LePore, P. C. (1995). An organizational analysis of the
effects of ability grouping. American Educational Research Journal, 32, 687–715.
George, P. S., & Shewey, K. (1994). New evidence for the middle school. Columbus, OH: National Middle
School Association ED 396 839.
Gilbert, A., & Yerrick, R. (2001). Same school, separate worlds: A sociocultural study of identity,
resistance, and negotiation in a rural, lower track science classroom. Journal of Research in Science
Teaching, 38(5), 574–598.
Gillibrand, E., Robinson, P., Brawn, R., & Osborn, A. (1999). Girls’ participation in physics in single sex
classes in mixed schools in relation to confidence and achievement. International Journal of Science Education,
21(4), 349–362.
Glass, G. V., Cahen, L. S., Smith, M. L., & Filby, N. N. (1982). School class size: Research and policy. Beverly
Hills, CA: Sage.
Glass, G. V., & Smith, M. L. (1978). Meta-analysis of research on the relationship of class-size and
achievement. Laboratory of Educational Research, University of Colorado.
Goodlad, J. I., & Anderson, R. H. (1987). The nongraded elementary school. New York: Teachers College
Press (Original work published 1963).
Haller, E. J., & Davis, S. A. (1980). Does socioeconomic status bias the assignment of elementary school
students to reading groups? American Educational Research Journal, 17(4), 409–418.
Hallinan, M. T., & Williams, R. A. (1989). Interracial friendship choices in secondary schools. American
Sociological Review, 54, 67–78.
Hanushek, E. A. (1986). The economics of schooling: Production and efficiency in public schools. Journal
of Economic Literature, 24, 1141–1177.
Hanushek, E. A. (1998). The evidence on class size (ERIC Document Reproduction Service No. ED 443
158).
Hanushek, E. A. (1999). Some findings from an independent investigation of the Tennessee STAR
experiment and from other investigations of class size effects. Educational Evaluation & Policy Analysis,
21(2), 143–163.
Hanushek, E. A., Rivkin, S. G., & Taylor, L. L. (1996). Aggregation and the estimated effects of school
resources. Review of Economics & Statistics, 78(4), 611–627.
Hargreaves, L., Galton, M., & Pell, A. (1998). The effects of changes in class size on teacher–pupil
interaction. International Journal of Educational Research, 29, 779–795.
Harlen, W., & Malcolm, H. (1999). Setting and streaming: A research review. Revised Edition. Scottish
Council for Research in Education, Edinburgh.
Hattie, J. (1992). Measuring the effects of schooling. Australian Journal of Education, 36(1), 5–13.
Hattie, J.A. (1999). Influences on student learning. Inaugural Professorial Address, University of Auckland.
http://www.arts.auckland.ac.nz/edu/staff/jhattie/ermindex.html.
Hedges, L. V., & Stock, W. (1983). The effects of class size: An examination of rival hypotheses. American
Educational Research Journal, 20, 63–65.
Hoffer, T. B. (1992). Middle school ability grouping and student achievement in science and mathematics.
Educational Evaluation and Policy Analysis, 14(3), 205–227.
Ingels, S. J. (1988). Symposium on the national education longitudinal study of 1988 (NELS:88) and the
NELS:88 field test (New Orleans, LA, April 5–9, 1988): ED 297 006.
Jaeger, R. M., & Hattie, J. A. (1996). Artifact and artifice in education policy analysis: It’s not all in the
data. School Administrator, 53, 24–25, 28–29.
Kiser-Kling, K. (1995). Life in a small teacher-pupil ratio class. Unpublished Ed. D dissertation,
University of North Carolina, Greensboro.
Krueger, A. B. (1997). Experimental estimates of education production functions (Working paper #6051).
Cambridge, MA: National Bureau of Economic Research.
Kruse, A.-M. (1987). Sagde du konssegregering—med vilje? Paedagogik med rode stromper. Kobenhavn:
Danmarks Laererhojskole.
Kruse, A.-M. (1989). Hvorfor pigeklasser? In A. Hilden, & A.-M. Kruse (Eds.), Pigernes skole (pp.
249–263). Skive: Klim.
Kruse, A.-M. (1990). Konsadskilt undervisning som konsbevidst paedagogik. In H. Jacobsen, & L.
Hojgaard (Eds.), Skolen er kon (pp. 36–81). Viborg: Ligestillingsridet.
Kruse, A.-M. (1992). We have learnt not just to sit back, twiddle our thumbs and let them take over.
Single-sex settings and the development of a pedagogy for girls and a pedagogy for boys in Danish
schools. Gender and Education, 4(12), 81–103.
Kruse, A.-M. (1994). Hvordan er det med der forskelle pa piger og drenge? Interview med Harriet Bjerrum
Nielsen. Tidsskrift for borne & ungdomskultur, 34, 51–65.
Kruse, A.-M. (1995). Single-sex settings: Pedagogies for girls and boys in Danish schools. Paper presented
at the UNESCO colloquium, Institute of Education, London.
Kruse, A.-M. (1996). Approaches to teaching girls and boys. Current debates, practices, and perspectives
in Denmark. Women’s Studies International Forum, 19(4), 429–445.
Kulik, C.-L. C., & Kulik, J. A. (1982a). Effects of ability grouping on secondary school students: A meta-
analysis. American Educational Research Journal, 19(3), 415–428.
Kulik, C.-L. C., & Kulik, J. A. (1982b). Research synthesis on ability grouping. Educational Leadership,
39, 619–621.
Kulik, C.-L. C., & Kulik, J. A. (1984). Effects of ability grouping on elementary school pupils: A meta-
analysis. Paper presented at the annual meeting of the American Psychological Association, Toronto,
Ont., Canada.
Kulik, J. A., & Kulik, C.-L. C. (1985). Effects of inter-class ability grouping on achievement and self-
esteem. Paper presented at the annual convention of the American Psychological Association, Los
Angeles, CA.
Kulik, J. A., & Kulik, C.-L. C. (1987). Effects of ability grouping on student achievement. Equity and
Excellence, 23, 22–30.
Lareau, A. (1987). Social class differences in family school relationships: The importance of cultural
capital. Sociology of Education, 60, 73–85.
Lockwood, A. T. (1990). Mixed reviews: Two interviews. National Center on Effective Secondary School
Newsletter, 5(1), 9–11.
Lockwood, A. T. (1996). Tracking: Conflicts and resolutions. Controversial issues in education. Thousand
Oaks, CA: Corwin Press.
Lounsbury, J. H., & Clark, D. C. (1990). Inside grade eight: From apathy to excitement. Reston, VA:
National Association of Secondary School Principals. ED 327 318.
Loveless, T. (1993). Organizational coupling and the implication of tracking reform (Technical Report
No.143). Chicago, IL: Midwest Administration Center, the University of Chicago.
Loveless, T. (1998). The tracking and ability grouping debate (Vol. 2, Number 8). Washington, DC:
Thomas B. Fordham Foundation (http://www.edexcellence.net).
Loveless, T. (1999a). The tracking and ability grouping. Washington, DC: Thomas B. Fordham
Foundation (http://www.edexcellence.net) (ERIC Document Reproduction Service
No. ED 422 454).
Loveless, T. (1999b). The tracking war: State reform meets school policy. Washington, DC: Brookings
Institution Press.
Malloy, L., & Gilman, D. (1989). The cumulative effects on basic skills achievement of Indiana’s
PRIME TIME: A state sponsored program of reduced class size. Contemporary Education, 60,
169–172.
Marsh, H. W. (1984a). Self-concept, social comparison and ability grouping: A reply to Kulik and Kulik.
American Educational Research Journal, 21, 799–806.
Marsh, H. W. (1984b). Self-concept: The application of a frame of reference model to explain paradoxical
results. Australian Journal of Education, 28, 165–181.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational
Psychology, 79, 280–295.
Marsh, H. W., & Hattie, J. A. (1995). Theoretical models in self-concept. In B. Bracken (Ed.), Handbook
on self-concept (pp. 38–92). New Jersey: Erlbaum.
Marsh, H. W., & Parker, J. W. (1984). Determinants of student self-concept: Is it better to be a relatively
large fish in a small pond even if you don’t learn to swim as well? Journal of Personality and Social
Psychology, 47, 213–231.
Marsh, H. W., & Rowe, K. J. (1996). The effects of single-sex and mixed-sex mathematics classes within a
coeducational school: A reanalysis and comment. Australian Journal of Education, 40(2), 147–162.
Mason, D. A., & Burns, R. B. (1995). Teachers’ views of combination classes. Journal of Educational
Research, 89(1), 36–45.
Mason, D. A., & Burns, R. B. (1996). “Simply no worse and simply no better” may simply be wrong: A
critique of Veenman’s conclusion about multigrade classes. Review of Educational Research, 66(3),
307–322.
Mason, D. A., & Doepner, R. W. (1998). Principals’ views of combination classes. Journal of Educational
Research, 91(3), 160–172.
Mason, D. A., Schroeter, D. D., Combs, R. K., & Washington, K. (1992). Assigning average-achieving
eighth graders to advanced mathematics classes in urban junior high. Elementary School Journal, 92(5),
587–599.
McFadden, A. C., Marsh, G. E., Price, B. J., & Hwang, Y. (1992). A study of race and gender bias in the
punishment of school children. Education and Treatment of Children, 15, 140–146.
McGiverin, J., Gilman, D., & Tillitski, C. (1989). A meta-analysis of the relation between class size and
achievement. Elementary School Journal, 90(1), 47–56.
Metz, M. (1978). Classrooms and corridors: The crisis of authority in desegregated secondary schools.
Berkeley: University of California Press.
Milligan, S., & Thomson, K. (1992). Listening to girls. Melbourne: Curriculum Corporation.
Mills, R. (1998). Grouping students for instruction in middle schools. Champaign, IL: ERIC Clearinghouse
(http://www.ed.gov/database/Eric Digests/ed419631.html).
Molnar, A., Smith, P., Zahorik, J., Palmer, A., Halbach, A., & Ehrle, K. (1999). Evaluating the SAGE
program: A pilot program in targeted pupil-teacher reduction in Wisconsin. Educational Evaluation &
Policy Analysis, 21(2), 165–177.
Mosteller, F., Light, R. J., & Sachs, J. (1996). Sustained inquiry in education: Lessons from skill grouping
and class size. Harvard Educational Review, 66(4), 797–842.
Mueller, D. J., Chase, C. I., & Walden, J. D. (1988). Effects of reduced class size in primary classes.
Educational Leadership, 45(7), 48–50.
National Center for Education Statistics. (1985). High school and beyond: An analysis of course-taking
patterns in secondary schools as related to student characteristics. Washington, DC: US Government
Printing Office.
Noland, T. K., & Taylor, B. L. (1986). The effects of ability grouping: A meta-analysis of research
findings. Paper presented at the annual meeting of the American Educational Research Association, San
Francisco.
Nye, B., Hedges, L. V., & Konstantopoulos, S. (1999). The long-term effects of small classes: A five-year
follow-up of the Tennessee class size experiment. Educational Evaluation & Policy Analysis, 21(2),
127–142.
Nystrand, M., & Gamoran, A. (1988). A study of instruction as discourse. Madison, WI: Wisconsin Center
for Education Research.
Oakes, J. (1981). A question of access: Tracking and curriculum differentiation in a national sample of
English and mathematics classes. A study of schooling in the United States: Technical Report Series, No.
24 (ERIC Document Reproduction Service No. ED 214 892).
Oakes, J. (1985). Keeping track: How schools structure inequality. New Haven, CT: Yale University Press.
Oakes, J. (1992). Can tracking research inform practice? Technical, normative, and political
considerations. Educational Researcher, 21(4), 12–21.
Oakes, J. (1993). Creating middle schools: Technical, normative, and political considerations. Elementary
School Journal, 93(5), 461–480.
Oakes, J., Gamoran, A., & Page, R. (1992). Curriculum differentiation, opportunities, outcomes, and
meanings. In P. Jackson (Ed.), Handbook of research on curriculum (pp. 570–608). New York:
Macmillan.
Oakes, J., & Guiton, G. (1995). Matchmaking: The dynamics of high school tracking decisions. American
Educational Research Journal, 32, 3–33.
Oakes, J., Ormseth, T., Bell, R., & Camp, P. (1990). Multiplying inequalities: The effects of race, social
class, and tracking on opportunities to learn mathematics and science. Santa Monica, CA: Rand.
Oakes, J., & Wells, A. S. (1996). Beyond the technicalities of school reform: Lessons from detracking schools.
Los Angeles: Center X, Graduate School of Education and Information Studies, UCLA.
Oakes, J., & Wells, S. (1998). Detracking for high student achievement. Educational Leadership, 55(6),
38–41.
Page, R. N. (1991). Lower track classrooms: A curricular and cultural perspective. New York: Teachers
College Press.
Parker, L. H. (1985). A strategy for optimizing the success of girls in mathematics: Report of a project of
national significance. Canberra: Commonwealth Schools Commission.
Parker, L. H., & Rennie, L. (1997). Teachers’ perceptions of the implementation of single-sex classes in
coeducational schools. Australian Journal of Education, 41(2), 119–133.
Pate-Bain, H., Boyd-Zaharias, J., Cain, V. A., Word, E., & Binkley, M. E. (1997). STAR follow-up studies,
1996–1997: The student/teacher achievement ratio (STAR) project (ERIC Document Reproduction
Service No. ED 419 593).
Persell, C. (1977). Education and inequality: The roots and results of stratification in America’s schools. New
York: Free Press.
Radebaugh, B. F. (1993). Democratizing educational research or why is our nation still at risk after ten years
of educational reform? (ERIC Document Reproduction Service No. ED 366 104).
Rice, J. M. (1902). Education research: A test in arithmetic. The Forum, 34, 281–297.
Rice, J. K. (1999). The impact of class size on instructional strategies and the use of time in high school
mathematics and science courses. Educational Evaluation & Policy Analysis, 21(2), 215–229.
Ritter, G. W., & Boruch, R. F. (1999). The political and institutional origins of a randomized controlled
trial on elementary school class size: Tennessee’s project STAR. Educational Evaluation & Policy
Analysis, 21(2), 111–123.
Rosenbaum, J. E. (1980). Track misperceptions and frustrated college plans: An analysis of the effects
of tracks and track perceptions in the national longitudinal survey. Sociology of Education, 53(2),
74–88.
Rothenberg, J. J., McDermott, P., & Martin, G. (1998). Changes in pedagogy: A qualitative result of
teaching heterogeneous classes. Teaching and Teacher Education, 14(6), 633–642.
Rowe, K. J. (1988). Single-sex and mixed-sex classes: The effects of class type on student achievement,
confidence and participation in mathematics. Australian Journal of Education, 32, 180–202.
Russo, R. P. (1988). The national education longitudinal study of 1988: Teacher survey (ERIC Document
Reproduction Service No. ED 293 893).
Schwartz, F. (1981). Supporting or subverting learning: Peer group patterns in four tracked schools.
Anthropology and Education Quarterly, 12, 99–121.
Shanker, A. (1993). Public vs. private schools. National Forum. Phi Kappa Phi Journal, 73(4), 14–17.
Shapson, S. M., Wright, E. N., Eason, G., & Fitzgerald, J. (1980). An experimental study of effects of class
size. American Educational Research Journal, 17, 141–152.
Signorella, M. L., Frieze, I. H., & Hershey, S. W. (1996). Single-sex versus mixed-sex classes and gender
schemata in children and adolescents. Psychology of Women Quarterly, 20(4), 599–607.
Slavin, R. E. (1984). Meta-analysis in education: How has it been used? Educational Researcher, 13(8), 6–15,
24–27.
Slavin, R. E. (1987). Ability grouping and student achievement in elementary schools: A best-evidence
synthesis. Review of Educational Research, 57, 293–336.
Slavin, R. E. (1989a). Grouping for instruction in the elementary school. Hillsdale, NJ: Erlbaum.
Slavin, R. E. (1989b). School and classroom organization. Hillsdale, NJ: Erlbaum.
Slavin, R. E. (1990). Achievement effects of ability grouping in secondary schools: A best-evidence
synthesis. Review of Educational Research, 60(3), 417–499.
Smith, M. L., & Glass, G. V. (1980). Meta-analysis of research on class size and its relationship to attitudes
and instruction. American Educational Research Journal, 17, 419–433.
Spear, R. C. (1994). Teachers’ perceptions of ability grouping practices in middle level schools. Research in
Middle Level Education, 18(1), 117–130.
Stasz, C., & Stecher, B. (2002). Before and after class-size reduction: A tale of two teachers. In M. C.
Wang, & J. D. Finn (Eds.), Taking small classes one step further (pp. 19–50). Greenwich, CT:
Information Age Publishing.
Steedman, J. (1983). Examination results in mixed and single sex schools: Findings from the national child
development study: Report of the Equal Opportunities Commission of the UK. Manchester: Equal
Opportunities Commission.
Steele, C. (1992). Race and the schooling of Black Americans. Atlantic Monthly, 269, 68–78.
Trussell-Cullen, A. (1994). Whatever happened to times tables? Every parent’s guide to New Zealand
education. Auckland, NZ: Reed Books.
Tuckman, B. W., & Bierman, M. (1971). Beyond pygmalion: Galatea in the schools. Paper presented at the
annual meeting of American Educational Research Association, New York.
Urdan, T., Midgley, C., & Wood, S. (1995). Special issues in reforming middle level schools. Journal of
Adolescence, 15(1), 9–37.
Useem, E. L. (1991). Student selection into course sequences in mathematics: The impact of parental
involvement and school policies. Journal of Research on Adolescence, 1(3), 231–250.
Useem, E. L. (1992). Middle schools and math groups: Parents’ involvement in children’s placement.
Sociology of Education, 65(4), 263–279.
Valentine, J., Clark, D. D., Irvin, J. L., Keefe, J. W., & Melton, G. (1993). Leadership in middle level
education: A national survey of middle level leaders and schools. (Vol. I). Reston, VA: National
Association of Secondary School Principals (ERIC Document Reproduction Service No. ED 356 535).
Van Fossen, B. E., Jones, J. D., & Spade, J. D. (1987). Curriculum tracking and status maintenance.
Sociology of Education, 60, 104–122.
Veenman, S. (1995). Cognitive and noncognitive effects of multigrade and multi-age classes: A best-
evidence synthesis. Review of Educational Research, 65(4), 319–381.
Veenman, S. (1996). Effects of multigrade and multi-age classes reconsidered. Review of Educational
Research, 66(3), 323–340.
Welner, K. G., & Oakes, J. (1996). (Li)Ability grouping: The new susceptibility of school tracking systems
to legal challenges. Harvard Educational Review, 66(3), 451–470.
Wenglinsky, H. (1997). When money matters: How educational expenditures improve student performance
and how they don’t. Princeton, NJ: The Educational Testing Service, Policy Information Center.
Wheelock, A. (1992). Crossing the tracks: How “untracking” can save America’s schools. New York: New Press
(ERIC Document Reproduction Service No. ED 353 349).
Wheelock, A. (1994). Alternatives to tracking and ability grouping. Arlington, VA: American Association
of School Administrators.
Wiatrowski, M. D., Hansell, S., Massey, C. R., & Wilson, D. L. (1982). Curriculum tracking and
delinquency. American Sociological Review, 47, 151–160.
Willis, S., & Kenway, J. (1986). On overcoming sexism in schooling: To marginalize or mainstream.
Australian Journal of Education, 30, 132–149.
Word, E., Johnston, J., Bain, H., Fulton, D. B., Boyd-Zaharias, J., Lintz, M. N., Achilles, C. M., Folger,
J., & Breda, C. (1990). Student/teacher achievement ratio (STAR): Tennessee’s K–3 class-size study.
Nashville, TN: Tennessee State Department of Education.
John A. C. Hattie is Professor of Education at the University of Auckland. His interests are models of
teaching and learning, psychometrics, meta-analysis and theories of self (concept, regulation, efficacy).