Assessment in education is commonly used to certify the amount that individual students have learned and to provide an accountability measure for students and educational systems as a whole. Formative assessment, in which the assessment is integrated within instruction and aimed at increasing learning, can replace summative assessment in many situations.
Issues, Examples, and Challenges in Formative Assessment
Earl Hunt, James W. Pellegrino
Assessment in education is often in the news. There has been a tremendous push for continuous assessment in U.S. schools to ensure that students are being prepared adequately and to hold both students and teachers accountable for the quality of student preparation. However, we are concerned that the testing methods being used are not the best choices for meeting the accountability goal and may even harm the system. We are not alone in our concerns. A report commissioned by the National Research Council's Board on Testing and Assessment (Heubert and Hauser, 1999) warned that the tests appropriate for tracking students and the tests appropriate for evaluating system performance are not necessarily the same.

This sort of distinction is central to our thesis. We want to distinguish between summative tests, which evaluate a student's capabilities at a particular time, and formative tests, which are intended to assist a student (or teacher) in improving a student's capability. We also distinguish between disruptive testing, in which evaluation takes place outside the context of normal instruction, and integrated testing, in which testing is conducted unobtrusively, as part of normal classroom activity. Disruptive testing, sometimes derisively referred to as "drop-in-from-the-sky" testing, has traditionally been used for certifying student accomplishment and predicting future performance. This is the sort of testing that is at the center of debates about accountability. We hope to shift the focus of the discussion. In this chapter, we describe some new developments in formative assessment and present some challenges for the educational community.
Preparation of this chapter was partially supported by a National Science Foundation grant, REC 9972999, to the University of Washington.
Our main message is that many of the current assessment practices that serve certification and prediction functions well are not well suited for improving learning. Alternative approaches to assessment, rooted in cognitive theories of knowledge and learning, need to be deployed on a much wider scale than is currently the case. Developments in technology make such an approach to assessment increasingly feasible.
Continuous Formative Assessment
Formative assessments should take place during learning. Furthermore, we argue that they should be built on a realistic model of the way in which learning builds on learning. Instead of regarding a student as a point in a geometric space, as is the case in summative assessment, we believe students should be thought of as being in a particular state of knowledge. The knowledge state is defined by possession of facts, procedures, and (possibly problematical) understanding of the relation between facts and procedures. To illustrate, consider three students, all of whom know, correctly, that an object weighs less on the moon than it does on earth. One student might explain this by the fact that the moon has less mass than the earth, another by the fact that air pressure does not push down on the moon as it does on earth. The third might assert that gravity is not as great on the moon as it is on earth but be unable to explain the relation between mass and gravitational force. Each of these students would be able to answer a question about the relative weight of an object on earth and the moon. Nevertheless, they would be in different knowledge states.

To assess a student's knowledge state, it is necessary to engage in a conversation with the student (or a class) and tailor instruction to the answers that the student gives. This is not simply a rediscovery of the Socratic dialogue in the modern classroom, for two reasons. One reason, which is not trivial, is that information and evaluation have to be returned to the student in a manner that he or she is willing to receive. Socrates was a bit sarcastic, and we note that his alumni did not rise up when the authorities in Athens decided to poison him. The second reason is a logistical one. Socrates seems to have had rather few students. The modern classroom contains anywhere from twenty to sixty students, so a one-on-one dialogue is often impractical. Nevertheless, there are a variety of techniques for engaging students in in-class formative assessments. We describe two here. But first we have some comments about in-class formative assessment in general.

Formative assessments work, in the sense that they improve learning. Within a classroom, they can produce major improvements in learning. Black and Wiliam (1998) report effect sizes between .4 and .7 standard deviation units, which are large effects for educational research programs. Consistent with these findings, both the National Research Council (1996) and the National Council of Teachers of Mathematics (1989, 2000) have recommended that teachers use formative assessments in order to become aware of the preconceptions and problem-solving techniques that their students bring into the classroom.
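To make the contrast with a single ability score concrete, the following minimal sketch, written in Python purely for illustration, represents the three students above as knowledge states rather than as points on a scale. The class name, field names, and explanation strings are our own inventions and are not drawn from any of the systems discussed later in this chapter.

```python
# Illustrative sketch only: a knowledge state as a structured record rather than a
# single score. The class name, field names, and explanation strings are invented.
from dataclasses import dataclass, field

@dataclass
class KnowledgeState:
    facts: set = field(default_factory=set)            # propositions the student holds
    explanations: dict = field(default_factory=dict)   # fact -> the explanation offered for it

FACT = "an object weighs less on the moon than on earth"

students = {
    "A": KnowledgeState({FACT}, {FACT: "the moon has less mass than the earth"}),
    "B": KnowledgeState({FACT}, {FACT: "air pressure does not push down on the moon"}),
    "C": KnowledgeState({FACT}, {FACT: "gravity is weaker on the moon (relation to mass unknown)"}),
}

# All three would answer the weight question correctly, so a single score cannot
# separate them; their explanations differ, and so does the instruction they need.
for name, state in students.items():
    print(name, "->", state.explanations[FACT])
```

The point of the sketch is simply that the same correct answer can rest on different explanatory structures, and it is those structures that a formative assessment must uncover.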
In spite of the research findings showing the efficacy of formative assessment, it has not been adopted widely in classroom practice (Black and Wiliam, 1998), probably for several reasons. Formative assessment requires that the assessor know in advance both the material that students are supposed to grasp and the different alternative and problematical ways in which students may fail to grasp it. In order to obtain this information, a teacher either has to have a great deal of experience evaluating students' reasoning or be aware of the research literature on student beliefs and misbeliefs. The depth of this literature varies greatly across disciplines. In the general case, requiring all teachers to be experts at formative assessment is not feasible.

A second problem with formative assessment is that it takes time. There is no point in formative assessment by a teacher if the teacher cannot identify, analyze, and respond to the problems of individual students. To the extent that the logistics of a situation require that an instructor give preset instruction, formative assessment is likely to be of little use unless it can be provided as an exercise for the individual student, with a minimum amount of supervision by the instructor. The extremes here are probably the constraints on instructors of first- and second-year university courses, who may have upwards of three hundred people in a single lecture class, and on high school instructors, who may see five sections of thirty people each on a single day, in addition to any administrative duties.

For these reasons, we believe that some of the best formative instruction programs are likely to be those in which a substantial part of the formative assessment is off-loaded to a robot instructor: computer technology. We hasten to say that we do not at any time propose that instruction be handed over to a teaching machine. The purpose of the technology is to gather information about the student that can be summarized and presented to a teacher so that the teacher can adjust instruction to the dominant ideas present in the class. We once again refer to an industrial idea: inventory. A formative assessment program should provide a continuous inventory of a class's educational status, taken during ongoing activities, just as a bar code system provides a continuous, unobtrusive inventory in a large store. By contrast, drop-in-from-the-sky (summative) testing is analogous to closing the store for inventory, an expensive process that modern businesses do not use. While this information is being gathered, the assessment program may also provide some instruction, as part of its conversation with the student. This may be a benefit, but it is not the primary purpose of the assessment.

Two examples will illustrate somewhat different approaches to the assessment problem. We especially urge readers to note the model of knowledge organization on which these programs are based.

DIAGNOSER. DIAGNOSER (Hunt and Minstrell, 1996) is a collection of interactive computer programs used to facilitate facet-based instruction, an approach based on the idea that students already have their own ideas about the topic of instruction. We and our colleagues have designed, and are designing, a number of DIAGNOSER modules covering different topics in science and mathematics, ranging from a middle school earth science
assessment module to a closely related program, DIANA, developed by David Madigan and Andrew Schaffner for use in a university-level statistics course (Schaffner and others, 1997). Most of our experience is with the physical sciences, and more particularly introductory physics. Therefore, most of the examples here are drawn from that field.

DIAGNOSER was constructed to be a tool in a larger program, facet-based instruction, developed by James Minstrell (Minstrell and Stimpson, 1996; Hunt and Minstrell, 1994). Therefore, understanding DIAGNOSER requires understanding the teaching program within which it is used. Facet-based instruction is predicated on the assumption that students are not blank slates to be written on by either a process of behavior modification or clearly delivered didactic lectures. Students come to most topics with ideas, and the instructor should work with those ideas. On the other hand, students seldom have fully developed naive theories (self-generated explanations) of a topic. Instead, they have sets of working rules of thumb for understanding situations and selecting actions. These are referred to as facets. An important point to remember is that most students do not assign high priority to consistency between facets. It is much more important that facets provide easily understood, adequate local explanations of a phenomenon than that they fit into a consistent framework.

Facet-based instruction proceeds in two stages: identification of student facets and the presentation of situations that move students away from using a collection of limited facets toward the use of the interconnected schemas characteristic of expert reasoning in a variety of fields. In this method of teaching, assessment is seamlessly integrated within instruction and is not a separate activity intended to evaluate the results of instruction.

Facet-based instruction and the DIAGNOSER assessment technique work because students seldom have their own unique set of facets. Instead, they share facets widely. For example, students in introductory physics classes often enter instruction with the belief (facet) that air pressure has something to do with weight, since air presses down on objects. Another widely held facet is that if two bodies of different sizes and speeds collide, the larger, faster body exerts more force than the smaller, slower one. Although neither of these facets is consistent with actual physical principles, they are roughly satisfactory explanations for understanding a variety of situations.

A DIAGNOSER module consists of a collection of question sets dealing with a topic, roughly equivalent to a unit in a science or mathematics course. For example, modules have been written to cover the nature of gravity, scales and measurements, experimental design, and water cycles. The first step in module design is to construct a list of facets that an instructor might encounter during the relevant unit. Collecting facets requires a careful analysis of both the material to be taught and the way that students are likely to deal with this material. In our work, facets have been gathered in three ways: by examining the relevant literature when it exists, by consulting with experienced teachers, and by examining student
responses to open-ended elicitation questions intended to reveal students' initial ideas about a topic. Which of these methods is best depends on the topic and the circumstance. However, developers should be aware that this is an extremely important and time-consuming step in the development cycle.

The next stage is to develop the question sets for a module. Each question set consists of one or more phenomenological questions, terminating with a reasoning question. Phenomenological questions refer to a concrete situation or mathematical problem. The questions are presented in multiple-choice form, where each of the allowed responses is keyed to a facet. For example, one of the questions in our Nature of Gravity and Surrounding Media module asks whether a block of material that weighs twenty kilograms in normal room conditions would weigh more or less if (1) weighed in a vacuum or (2) weighed in a special room in which the air pressure was doubled. A person who believes that air pressure is partially responsible for weight (a facet that has been observed) should say that the block is lighter in a vacuum and heavier under double air pressure. (In fact, the opposite is true, due to the effect of buoyancy.)

When students complete the phenomenological questions, they are presented with a reasoning question. This is done without scoring or correcting a student's answers to the phenomenological questions. The reasoning question offers a list of possible explanations that might be given for the phenomena observed in the phenomenological questions. Each of the alternatives offered is basically a restatement of a facet, although the student does not know this. Because the reasoning alternatives have been tied to the alternatives offered for the phenomenological questions, the program can determine whether a student is responding consistently to both the phenomenological and reasoning questions. It is possible for a person to be consistent in his or her reasoning and still be incorrect. For instance, in the weight example, a person might say that the block would be heavier under air pressure and give as a reason the fact (more properly, facet) that weight is partly due to air pressure pressing downward.

After the student answers the reasoning question, the program enters a feedback mode. First, it comments on whether the questions were answered correctly and whether the answer to the reasoning question was consistent with the answers to the phenomenological questions. This sequence has been chosen because, in addition to teaching content material, DIAGNOSER attempts to reward two types of behavior. First, the program insists that a student have a reason for answering phenomenological questions. Our impression is that students quickly learn this. They stop answering phenomenological questions rapidly and intuitively, for they know that they are going to be asked to explain themselves. Second, the program compliments students for consistent reasoning, even if it is erroneous. This feedback is intended to work on one of the chief differences that several observers have noticed between naive and formal science: naive reasoning does not require that an explanation give a consistent account for all evidence; scientific reasoning does.
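The question-set machinery just described can be summarized in a short sketch. This is our own schematic reconstruction in Python, not DIAGNOSER code; the question text, answer alternatives, and facet labels are invented for illustration and follow the weight-in-a-vacuum example.

```python
# Schematic reconstruction of a facet-keyed question set (not actual DIAGNOSER code).
# Each multiple-choice alternative is keyed to a facet; the reasoning alternatives
# restate the same facets, so consistency between the two answers can be checked.

PHENOMENOLOGICAL = {
    "question": "A block weighs twenty kilograms in normal room air. In a vacuum it would weigh...",
    "choices": {
        "less": "air_pressure_contributes_to_weight",   # facet, not correct physics
        "more": "air_buoyancy_reduces_weight",          # facet consistent with correct physics
        "the same": "medium_has_no_effect_on_weight",   # facet
    },
}

REASONING = {
    "question": "Which statement best explains your answer?",
    "choices": {
        "air presses down on objects and adds to their weight": "air_pressure_contributes_to_weight",
        "air buoys objects up slightly, as water does": "air_buoyancy_reduces_weight",
        "the surrounding medium has no effect on weight": "medium_has_no_effect_on_weight",
    },
}

def diagnose(phenom_answer: str, reasoning_answer: str) -> dict:
    """Report the facet displayed and whether the two answers are consistent and correct."""
    phenom_facet = PHENOMENOLOGICAL["choices"][phenom_answer]
    reasoning_facet = REASONING["choices"][reasoning_answer]
    return {
        "facet": reasoning_facet,
        "consistent": phenom_facet == reasoning_facet,
        "correct": phenom_facet == "air_buoyancy_reduces_weight",
    }

# A student can be consistent yet incorrect: the case the feedback mode singles out.
print(diagnose("less", "air presses down on objects and adds to their weight"))
```

The sample call shows the case the feedback mode rewards and then challenges: a student who reasons consistently from the "air pressure contributes to weight" facet and is therefore consistent but incorrect.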
The final step in DIAGNOSER is the prescription. This is a suggestion for further work, based on the facets of reasoning that the student displayed while answering the phenomenological and reasoning questions. The suggestion, or thought experiment, is seldom phrased as a direct statement of the right answer. Rather, it is a further challenge to the student's thinking. In the weight example, a student who displayed the "air pressure causes weight" facet might be asked whether things weigh more when they are weighed in water. Most students, being familiar with buoyancy in water, say that objects weigh less in water than in air. The student would then be reminded that air is a medium, just as water is, and that in fact it does exert a buoyant force.

At the end of a session, students and teachers are given a summary of their performance. This includes the number of questions correctly answered and, more important, the facets a student displayed and the extent to which the student used consistent reasoning. This information is intended to assist the teacher in choosing further instruction rather than to serve as a report of student progress. In work in progress, DIAGNOSER is being extended to be part of an instructional Web site that provides support to teachers and students as they prepare for the conventional examinations associated with a statewide assessment program in science and mathematics. This integrates DIAGNOSER with the dominant summative assessment program rather than presenting it as an alternative to conventional assessment.

SMART. The second set of examples differs from the work just described by focusing on larger units of scientific inquiry and instruction that have the character of more extended problem-based and project-based learning situations. Frequently, such situations engage students in individual and collaborative problem-solving activities organized around a complex real-world scenario. Such inquiry-based activities have been recommended in various standards for mathematics and science learning and teaching (National Council of Teachers of Mathematics, 1989, 2000; National Research Council, 1996). An example of embedding assessment strategies within such extended inquiry activities can be found in work pursued by the Cognition and Technology Group at Vanderbilt (CTGV) on the development of a conceptual model for integrating curriculum, instruction, and assessment in science and mathematics (Barron and others, 1995, 1998; CTGV, 1994, 1997). The resultant Scientific and Mathematical Arenas for Refining Thinking (SMART) Model involves frequent opportunities for formative assessment by both students and teachers and includes an emphasis on self-assessment to help students develop the ability to monitor their own understanding and find resources to deepen it when necessary (Brown, Bransford, Ferrara, and Campione, 1983; Stiggins, 1994).
The SMART Model involves the explicit design of multiple cycles of problem solving, self-assessment, and revision in an overall problem-based to project-based learning environment. Activity in the problem-based learning portion of SMART typically begins with a video-based problem scenario such as the Stones River Mystery, which tells the story of a group of high school students who, in collaboration with a biologist and a hydrologist, are monitoring the water in Stones River (Sherwood and others, 1995). The video shows the team visiting the river and conducting various water quality tests. Students in the classroom are asked to assess the water quality at a second site on the river. They are challenged to select tools that they can use to sample macroinvertebrates and test dissolved oxygen, to conduct these tests, and to interpret the data relative to previous data from the same site. Ultimately, they find that the river is polluted due to illegal dumping of restaurant grease and then must decide how to clean up the pollution.

The problem-based learning activity has three sequential modules: macroinvertebrate sampling, dissolved oxygen testing, and pollution cleanup. Each follows the same cycle of activities: initial selection of a method for testing or cleanup, feedback on the initial choice, revision of the choice, and a culminating task. The modules are preliminary to the project-based activity, in which students conduct actual water quality testing at a local river. In executing this activity, they are provided with a set of criteria by which an external agency will evaluate written reports and accompanying videotaped presentations.

Within each activity module, selection, feedback, and revision make use of the SMART WWWeb site, which organizes the overall process and supports three high-level functions. First, it provides individualized feedback to students and serves as a formative evaluation tool. Like DIAGNOSER, the feedback suggests aspects of students' work that are in need of revision and classroom resources that students can use to help them revise. The feedback does not tell students the right answer. Instead, it sets a course for their independent inquiry.

The Web feedback is generated from data that individual students enter. As an example, when students begin working on macroinvertebrates, they are given a catalogue of sampling tools and instruments. Many of these are bogus and collect the wrong kind of sample; others are legitimate and will gather a representative sample of macroinvertebrates. The catalogue items are designed to include contrasting cases that help students discover the need to know certain kinds of information. Students are asked to choose a tool and justify their choice. To help them make their choices, they are provided with resources, some of them on-line, that they can use to find out about river ecosystems, macroinvertebrates, and water quality monitoring. Once students have made an initial set of choices, they use SMART WWWeb. They enter their catalogue choices and select justifications for their choices. Once students have submitted their catalogue order on-line, SMART
WWWeb sends them individualized feedback. As in the DIAGNOSER examples, the catalogue items and foils are designed to expose particular misconceptions. The feedback that students receive from SMART WWWeb highlights why the selected tool is problematic and suggests helpful resources (sections of on-line and off-line resources, hands-on experiments, and peers). This form of feedback has been used in similar work on mathematics problem solving, and the results suggest that it can be an effective stimulus for guided inquiry and revision by students.

The second function of SMART WWWeb is to collect, organize, and display in SMART Lab the data collected from multiple distributed classrooms. Data displays are automatically updated as students submit new data to the database. The data in SMART Lab consist of students' answers to problems and explanations for their answers. Each class's data can be displayed separately from the distributed classrooms' data. This feature enables the teacher and the class to discuss different solution strategies and, in the process, address important concepts and misconceptions. These discussions provide a rich source of information for the teacher on how the students are thinking about a problem and are designed to stimulate further student reflection.

The third section of SMART WWWeb is Kids Online, which consists of explanations by student actors. The explanations are text based with audio narration, and they are full of errors by design. Students are asked to evaluate the explanations critically and provide feedback to the student actor. The errors contained in the explanations seed thinking and discussion on concepts that students frequently misconceive. At the same time, students learn important critical evaluation skills.

The ability of students and teachers to make progress through the various cycles of work and revision and achieve an effective solution to the larger problem depends on a variety of resource materials carefully designed to assist the learning and assessment process. Students who use these resources and tools learn significantly more than students who go through the same instructional sequence for the same amount of time but without the benefit of the tools and the embedded formative assessment activities. Furthermore, their performance in a related project-based learning activity is significantly enhanced (Barron and others, 1995, 1998).
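To make the first of these three functions concrete, the sketch below is our own Python illustration of feedback keyed to catalogue choices. It is not code from the SMART project; the tool names, misconception tag, and resource suggestions are invented.

```python
# Illustrative sketch only (not SMART WWWeb code): feedback keyed to catalogue choices.
# Bogus tools carry a tag for the misconception they are designed to expose, and the
# feedback points toward resources rather than stating the right answer.

CATALOGUE = {
    "surface dip net": {
        "legitimate": False,
        "misconception": "macroinvertebrates live mainly at the water's surface",
        "resources": ["riffle-habitat reading", "stream-bed sampling video"],
    },
    "kick net": {
        "legitimate": True,
        "resources": [],
    },
}

def wwweb_feedback(tool: str, justification: str) -> str:
    """Return individualized feedback for one catalogue choice."""
    item = CATALOGUE[tool]
    lines = [f"You chose the {tool} because: {justification}."]
    if item["legitimate"]:
        lines.append("This tool can gather a representative sample. Go on to plan your test.")
    else:
        # Name the assumption behind the choice and suggest resources, but give no answer.
        lines.append(f"Your choice assumes that {item['misconception']}.")
        lines.append("Before revising your order, look at: " + ", ".join(item["resources"]) + ".")
    return "\n".join(lines)

print(wwweb_feedback("surface dip net", "it is easy to use from the bank"))
```

The design point carried over from the description above is that the revision cycle, not the verdict, does the instructional work.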
Problems of Formative Assessment
The DIAGNOSER and SMART approaches attempt to integrate assessment and instruction. In DIAGNOSER, assessment carries the burden and instruction is woven in. In the system being developed, the Web site will contain a variety of suggestions to the teacher about classroom activities similar to SMART that can be used to respond to the belief states uncovered by DIAGNOSER. In SMART, instruction carries the burden. However, it would be fairly easy to develop programs like SMART that include explicit DIAGNOSER-like assessment sessions as part of their activities.
DIAGNOSER, SMART, and similar programs either implicitly or explicitly see education as a progression through a set of belief states rather than a progression along a dimension of ability. Suppose that a domain covers a topics and, for notational simplicity, that students exhibit one of k belief states on each topic. Remember, too, that within each topic the k belief states can be partially ordered and that the partial ordering may not be sufficient to reveal an underlying dimension of knowledge or ability. There are thus k^a (k to the power a) possible belief states in this domain. A perfect assessment will reveal which belief state a person is in. An imperfect but useful assessment will provide evidence that can be evaluated using a sophisticated form of Bayesian reasoning, referred to as evidentiary reasoning, to estimate the probability that a person is in a particular belief state. The trick is to find situations that provide the necessary a posteriori probabilities. We cannot stress too strongly that this is a very demanding exercise.

Both classical test theory (CTT) and item response theory (IRT) benefit from the fact that when we are trying to estimate a point on a line, we can obtain an accurate estimate by aggregating over individual estimates that individually are not all that accurate. Therefore, reasonable but individually unreliable items can be combined to produce a highly reliable (and valid) test. For example, if individual items have an average validity correlation of .3, a thirty-item test will have an estimated validity coefficient of .93. This makes it feasible to create large banks of items with known characteristics, which can be drawn on to create alternative forms of tests.

Assessments based on evidentiary reasoning make much higher demands on individual items, for several reasons. One is that we are not concerned with just the probability that the student will get the item right, given the student's ability level; we are concerned with the probability that a particular type of answer will be given, given that the student is in a certain state of knowledge. In an analytic model, there are simply more parameters to be estimated per item. Good items become precious things. Based on considerable knowledge gathered through our research efforts in this field, we warn that creating good items is far from a simple process. You cannot just look at curriculum-based standards, consult a few experts (usually teachers), and write down the ways in which students are likely to (mis)understand instruction. The research effort required to generate good items for evidentiary reasoning requires many more cycles of item evaluation and modification than is the case for items intended to be aggregated into a total test score.
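The contrast can be put in numbers. The short Python check below assumes, for illustration, that a figure like the .93 quoted above comes from the Spearman-Brown prophecy formula applied to an average inter-item correlation of .3; it also shows how quickly the count of possible joint belief states grows. The particular values of a and k are arbitrary.

```python
# Illustrative arithmetic only. We assume, for this check, that the .93 figure above
# comes from the Spearman-Brown prophecy formula with average item correlation .3.

def spearman_brown(r_item: float, n_items: int) -> float:
    """Estimated coefficient of a test built from n_items items with average correlation r_item."""
    return n_items * r_item / (1 + (n_items - 1) * r_item)

print(round(spearman_brown(0.3, 30), 2))  # 0.93: aggregation rescues individually weak items

# Diagnosis, by contrast, must locate a student among k^a joint belief states
# (one of k states on each of a topics); the values below are arbitrary examples.
a_topics, k_states = 10, 4
print(k_states ** a_topics)               # 1048576 joint states for a modest ten-topic unit
```

Aggregation lets weak items average into a strong score, whereas diagnosis must discriminate among a combinatorially large set of states, which is why each item has to carry so much more information.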
SMART is one of a number of instructional programs in which classroom activities are augmented by student-computer interactions. There are some programs in which the laboratories themselves are conducted in this way, which means that the student is interacting with a model of the world, in which beakers are never broken and the specimen never rots, rather than with an actual world. The proper role of live laboratories and computer-oriented laboratories is an interesting and important topic but is not germane to this discussion. What is germane is the role of computer interaction in recording student activities.

Students can certainly use computers to record their understanding of a laboratory exercise, including hypotheses, observations, data analysis, and conclusions. Hand-held computers make this method of recording possible for live laboratories and field trips. There are several ways that this can be done. They vary from checking off preset alternatives (the multiple-choice test goes to the field), through fill-in-the-blanks questionnaires, to simply writing a narrative report. Free-form, constructed answers are also part of the response format of the new, expanded DIAGNOSER.

It is widely believed that field notes of this sort contain a wealth of information. In particular, the rich information in extended responses, including but not limited to field notes, should provide insight into a student's approach to situations that require extended inquiry and problem solving, just the sorts of situations that cannot be assessed by drop-in-from-the-sky testing. The problem is to find a way to analyze these reports within the logistical constraints of the school setting. There have been suggestions that advances in computer-based analyses of texts may assist us here. However, these advances certainly do not constitute a solution to the general problem of computer comprehension of text. They are better described as highly sophisticated word counts (Landauer and Dumais, 1997). These techniques do a good job of grading essays that are several hundred words in length. Whether they or similar techniques can be used to classify the briefer texts that would appear in a laboratory notebook or a response to a DIAGNOSER question remains to be seen.
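To indicate what "highly sophisticated word counts" look like in practice, the sketch below compares two short student responses with a reference answer by reducing raw word counts to a low-dimensional space, in the spirit of latent semantic analysis. It is a toy written for this chapter under our own assumptions, not the grading systems referred to above, and the sentences are invented.

```python
# Toy illustration in the spirit of latent semantic analysis (not the systems cited above).
# Responses are reduced to word counts, projected onto a few latent dimensions,
# and compared with a reference answer by cosine similarity.
import numpy as np

reference = "air exerts a buoyant force so the block weighs more in a vacuum"
responses = [
    "removing the air removes the buoyant force so the block weighs more",
    "air pressure presses down so the block weighs less in a vacuum",
]

docs = [reference] + responses
vocab = sorted({word for doc in docs for word in doc.split()})
counts = np.array([[doc.split().count(word) for word in vocab] for doc in docs], dtype=float)

# Keep the two largest singular dimensions: the "latent semantic" space of this tiny corpus.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
latent = U[:, :2] * S[:2]

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for text, vector in zip(responses, latent[1:]):
    print(round(cosine(latent[0], vector), 2), "-", text)
```

Whether scores of this kind discriminate reliably on responses only a sentence or two long is exactly the open question raised above.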
Conclusion
In this chapter, we have considered a range of issues related to assessment and its use in the educational context. Our main message is that many of the current assessment practices that serve certification and prediction functions well are not well matched to the very important purpose of improving education, especially at the locus where learning takes place. Alternative approaches to assessment, rooted in cognitive theories of knowledge and learning, need to be deployed on a much wider scale than is currently the case. Technology developments make such an approach to assessment increasingly feasible.

Technology is rapidly changing the nature of learning and learning environments and will have an impact on assessment design and use in three important ways. First, new technology-based learning environments, and especially the learning affordances they provide for students, will create pressure to assess aspects of cognition and learning whose assessment has never before been attempted. Part of the problem will be the simple reality that assessment will of necessity require the use of technology. The knowledge and skill of interest for assessment cannot be decoupled from the design of technology-enhanced learning environments.
Second, these environments will create unprecedented opportunities to track student performance at levels of detail never before available for classroom-based formative assessment. Such information will make it possible to engage in ongoing, systemic analysis of student performance and learning, over extended periods of time and on multiple aspects of the curriculum. This will bring with it major challenges as to how to process the information in ways that assist the learning process by providing effective feedback to both teacher and learner. Some form of automatic summarization will be essential, for we will not inform people by overwhelming them with more information than they have time to process.

Third, the possibility exists that we can shift from a disconnected system of assessment practices, in which we have different designs for different purposes, to a synthetic design that ensures comprehensiveness and consistency. Imagine a situation in which the audit function of external summative assessments is no longer needed because all the information needed to assess students at the levels of description used for current summative assessments is derivable from the data streams being generated by students within classrooms. The metaphor here is the shift in retail sales environments ranging from supermarkets to department stores. No longer do stores close down once or twice a year for an inventory audit of the items in the store. Rather, through automated checkout and the use of bar codes for all items, a continuous stream of information is available to monitor inventory and the flow of items. What we envision is a process for assessment similar in function to the data collection and ongoing inventory process that technology and bar coding have enabled for retail sales and other industries.

Implementing such advances in technology will allow attainment of some of the goals for assessment mentioned in this chapter. Rich sources of information will be continuously available across wide segments of the curriculum and for individual learners over extended periods of time. Although this may sound Orwellian, it is not much different from current uses of technology to monitor a wide range of personal information, such as the Web sites one visits, the products one buys, and one's choices in books, music, and videos. The real issue is not whether this assessment reality is feasible in the future but how we design for this possibility and explore the options it provides for effectively using assessment information to aid in student learning.

The chief constraint on our doing so is the cost and quality of the required behavioral research. There is a need for ongoing, and of necessity labor-intensive and costly, research on students' belief states, how they should be assessed, and how they can be altered to move toward an expert understanding of a content domain. This research is required if new technologies are to be used to reach the goals of accountability for and improvement of education without using assessments that run the risk of doing more harm than good.
References
Barron, B., and others. "Creating Contexts for Community-Based Problem Solving: The Jasper Challenge Series." In C. N. Hedley, P. Antonacci, and M. Rabinowitz (eds.), Thinking and Literacy: The Mind at Work. Mahwah, N.J.: Erlbaum, 1995.
Barron, B. J., and others. "Doing with Understanding: Lessons from Research on Problem- and Project-Based Learning." Journal of the Learning Sciences, 1998, 7, 271-312.
Black, P., and Wiliam, D. "Inside the Black Box: Raising Standards Through Classroom Assessment." Phi Delta Kappan, 1998, 80, 139.
Brown, A. L., Bransford, J. D., Ferrara, R. A., and Campione, J. C. "Learning, Remembering, and Understanding." In J. H. Flavell and E. M. Markman (eds.), Handbook of Child Psychology, Vol. 3: Cognitive Development. (4th ed.) New York: Wiley, 1983.
Cognition and Technology Group at Vanderbilt. "From Visual Word Problems to Learning Communities: Changing Conceptions of Cognitive Research." In K. McGilly (ed.), Classroom Lessons: Integrating Cognitive Theory and Classroom Practice. Cambridge, Mass.: MIT Press, 1994.
Cognition and Technology Group at Vanderbilt. The Jasper Project: Lessons in Curriculum, Instruction, Assessment, and Professional Development. Mahwah, N.J.: Erlbaum, 1997.
Heubert, J. P., and Hauser, R. M. (eds.). High Stakes: Testing for Tracking, Promotion, and Graduation. Washington, D.C.: National Academy Press, 1999.
Hunt, E., and Minstrell, J. "A Collaborative Classroom for Teaching Conceptual Physics." In K. McGilly (ed.), Classroom Lessons: Integrating Cognitive Theory and Classroom Practice. Cambridge, Mass.: MIT Press, 1994.
Hunt, E., and Minstrell, J. "Effective Instruction in Science and Mathematics: Psychological Principles and Social Constraints." Issues in Education: Contributions from Educational Psychology, 1996, 2, 123-162.
Landauer, T. K., and Dumais, S. T. "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge." Psychological Review, 1997, 104, 211-240.
Minstrell, J., and Stimpson, V. "A Classroom Environment for Learning: Guiding Students' Reconstruction of Understanding and Reasoning." In L. Schauble and R. Glaser (eds.), Innovations in Learning: New Environments for Education. Mahwah, N.J.: Erlbaum, 1996.
National Council of Teachers of Mathematics. Curriculum and Evaluation Standards for School Mathematics. Reston, Va.: National Council of Teachers of Mathematics, 1989.
National Council of Teachers of Mathematics. Principles and Standards for School Mathematics. Reston, Va.: National Council of Teachers of Mathematics, 2000.
National Research Council. National Science Education Standards. Washington, D.C.: National Academy Press, 1996.
Schaffner, A., and others. "Benchmark Lessons and the World Wide Web: Tools for Teaching Statistics." In Proceedings of the Second International Conference on the Learning Sciences. Mahwah, N.J.: Erlbaum, 1997.
Sherwood, R. D., and others. "Problem-Based Macro Contexts in Science Instruction: Theoretical Basis, Design Issues, and the Development of Applications." In D. Lavoie (ed.), Toward a Cognitive-Science Perspective for Scientific Problem Solving. Manhattan, Kan.: National Association for Research in Science Teaching, 1995.
Stiggins, R. Student-Centered Classroom Assessment. Columbus, Ohio: Merrill, 1994.
EARL HUNT is professor emeritus at the University of Washington. He has specialized in the study of human and artificial intelligence. In addition to doing basic research in these fields, he has been involved in studies of cognition in the workforce and in educational applications of cognitive science.
JAMES W. PELLEGRINO is Liberal Arts and Sciences Distinguished Professor of Cognitive Psychology and Education at the University of Illinois at Chicago. His research and development interests focus on children's and adults' thinking and learning and the implications of cognitive research and theory for assessment and instructional practice.