Evaluation in Practice
A Methodological Approach
SECOND EDITION
Richard D. Bingham
Cleveland State University
Claire L. Felbinger
American University
Seven Bridges Press
135 Fifth Avenue
New York, NY 10010-7101
Bingham, Richard D.
Evaluation in practice : a methodological approach / Richard D.
Bingham, Claire L. Felbinger. — 2nd. ed.
p. cm.
Includes bibliographical references.
ISBN 1-889119-57-1
1. Evaluation research (Social action programs)—United States. 2.
Social sciences—Research—Methodology. I. Felbinger, Claire L. II.
Title.
H62.5 U5 B56 2002
300'.7'2—dc21 2001006312
CIP
Contents
Acknowledgments
Preface
PART I
Introduction
CHAPTER 1
The Process of Evaluation
CHAPTER 2
Evaluation Designs
CHAPTER 3
Measurement
CHAPTER 4
Performance Measurement and Benchmarking
PART II
Experimental Designs
CHAPTER 5
Pretest-Posttest Control Group Design
Improving Cognitive Ability in Chronically Deprived Children
Harrison McKay, Leonardo Sinisterra, Arline McKay, Hernando Gomez, and Pascuala Lloreda
Explanation and Critique
CHAPTER 6
Solomon Four-Group Design
Evaluation of a Multimethod Undergraduate Management Skills Development Program
Marian M. Extejt, Benjamin Forbes, and Jonathan E. Smith
Explanation and Critique
CHAPTER 7
Posttest-Only Control Group Design
Community Posthospital Follow-up Services
Ann Solberg
Explanation and Critique
PART III
Quasi-Experimental Designs
CHAPTER 8
Pretest-Posttest Comparison Group Design
Conservation Program Evaluations: The Control of Self-Selection Bias
Tim M. Newcomb
Explanation and Critique
CHAPTER 9
Interrupted Time-Series Comparison Group Design
Regulatory Strategies for Workplace Injury Reduction: A Program Evaluation
Garrett E. Moran
Explanation and Critique
CHAPTER 10
Posttest-Only Comparison Group Design
The Effects of Early Education on Children’s Competence in Elementary School
Martha B. Bronson, Donald E. Pierson, and Terrence Tivnan
Explanation and Critique
PART IV
Reflexive Designs
CHAPTER 11
One-Group Pretest-Posttest Design
Nutrition Behavior Change: Outcomes of an Educational Approach
Patricia K. Edwards, Alan C. Acock, and Robert L. Johnson
Explanation and Critique
CHAPTER 12
The Simple Time-Series Design
A Little Pregnant: The Impact of Rent Control in San Francisco
Edward G. Goetz
Explanation and Critique
PART V
Cost-Benefit and Cost-Effectiveness Analysis
CHAPTER 13
Cost-Benefit Analysis
The Costs and Benefits of Title XX and Title XIX Family Planning Services in Texas
David Malitz
Explanation and Critique
CHAPTER 14
Cost-Effectiveness Analysis
A Cost-Effectiveness Analysis of Three Staffing Models for the Delivery of Low-Risk Prenatal Care
Elaine A. Graveley and John H. Littlefield
Explanation and Critique
PART VI
Other Designs
CHAPTER 15
Patched Designs
Attitude Change and Mental Hospital Experience
Jack M. Hicks and Fred E. Spaner
Explanation and Critique
CHAPTER 16
Meta-Evaluation Designs
How Effective Is Drug Abuse Resistance Education? A Meta-Analysis of Project DARE Outcome Evaluations
S.T. Ennett, N.S. Tobler, C.L. Ringwalt, and R.L. Flewelling
Explanation and Critique
PART VII
On Your Own
CHAPTER 17
Are Training Subsidies for Firms Effective? The Michigan Experience
Harry J. Holzer, Richard N. Block, Marcus Cheatham, and Jack H. Knott
CHAPTER 18
Changing the Geography of Opportunity by Expanding Residential Choice: Lessons from the Gautreaux Program
James E. Rosenbaum
CHAPTER 19
Changes in Alcohol Consumption Resulting from the Elimination of Retail Wine Monopolies: Results from Five U.S. States
Alexander C. Wagenaar and Harold D. Holder
PART VIII
Dilemmas of Evaluation
CHAPTER 20
Fifth-Year Report: Milwaukee Parental Choice Program
John F. Witte, Troy D. Sterr, and Christopher A. Thorn
CHAPTER 21
School Choice in Milwaukee: A Randomized Experiment
Jay P. Greene, Paul E. Peterson, and Jiangtao Du
CHAPTER 22
Two Sides of One Story
Chieh-Chen Bowen
Index
About the Authors
Acknowledgments
We would like to thank the following individuals, organizations, and publishers who have kindly granted permission to include the following materials in this book:

“Improving Cognitive Ability in Chronically Deprived Children,” by Harrison McKay, Leonardo Sinisterra, Arline McKay, Hernando Gomez, and Pascuala Lloreda, reprinted with permission from Science 200, no. 4339 (21 April 1978): 270–78. Copyright 1978 by American Association for the Advancement of Science.

“Evaluation of a Multimethod Undergraduate Management Skills Development Program,” by Marian M. Extejt et al., in Journal of Education for Business (March/April 1996): 223–31. Reprinted with permission of the Helen Dwight Reid Educational Foundation. Published by Heldref Publications, 1319 18th St. N.W., Washington, D.C. 20036-1802. Copyright 1996.

“Community Posthospital Follow-up Services,” by Ann Solberg, in Evaluation Review: A Journal of Applied Social Research 7, no. 1 (February 1983): 96–109, copyright © 1983 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“Conservation Program Evaluations: The Control of Self-Selection Bias,” by Tim M. Newcomb, in Evaluation Review: A Journal of Applied Social Research 8, no. 3 (June 1984): 425–40, copyright © 1984 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“Regulatory Strategies for Workplace Injury Reduction: A Program Evaluation,” by Garrett E. Moran, in Evaluation Review: A Journal of Applied Social Research 9, no. 1 (February 1985): 21–33, copyright © 1985 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“The Effects of Early Education on Children’s Competence in Elementary School,” by Martha B. Bronson, Donald E. Pierson, and Terrence Tivnan, in Evaluation Review: A Journal of Applied Social Research 8, no. 5 (October 1984): 615–29, copyright © 1984 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“Nutrition Behavior Change: Outcomes of an Educational Approach,” by Patricia K. Edwards, Alan C. Acock, and Robert L. Johnson, in Evaluation Review: A Journal of Applied Social Research 9, no. 4 (August 1985): 441–59, copyright © 1985 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“A Little Pregnant: The Impact of Rent Control in San Francisco,” by Edward G. Goetz, in Urban Affairs 30, no. 4 (March 1995): 604–12, copyright © 1995 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“The Costs and Benefits of Title XX and Title XIX Family Planning Services in Texas,” by David Malitz, in Evaluation Review: A Journal of Applied Social Research 8, no. 4 (August 1984): 519–36, copyright © 1984 by Sage Publications, Inc. Reprinted by permission of Sage Publications, Inc.

“A Cost-Effectiveness Analysis of Three Staffing Models for the Delivery of Low-Risk Prenatal Care,” by Elaine A. Graveley and John H. Littlefield, in American Journal of Public Health 82, no. 2 (February 1992): 180–84. Reprinted with permission from the American Public Health Association.

“Attitude Change and Mental Hospital Experience,” by Jack M. Hicks and Fred E. Spaner, in Journal of Abnormal and Social Psychology 65, no. 2 (1962): 112–20.

“How Effective Is Drug Abuse Resistance Education? A Meta-Analysis of Project DARE Outcome Evaluations,” by Susan T. Ennett, Nancy S. Tobler, Christopher L. Ringwalt, and Robert Flewelling, in American Journal of Public Health 84, no. 9 (September 1994): 1394–1401. Reprinted with permission from the American Public Health Association.

“Are Training Subsidies for Firms Effective? The Michigan Experience,” by Harry J. Holzer, Richard N. Block, Marcus Cheatham, and Jack H. Knott, in Industrial and Labor Relations Review 46, no. 4 (July 1993): 625–36. Reprinted with permission from Cornell University.

“Changing the Geography of Opportunity by Expanding Residential Choice: Lessons from the Gautreaux Program,” by James E. Rosenbaum, in Housing Policy Debate 6, no. 1 (1995): 231–69. Copyright 1995 by Fannie Mae Foundation. Reprinted with permission from the Fannie Mae Foundation.

“Changes in Alcohol Consumption Resulting from the Elimination of Retail Wine Monopolies,” by Alexander C. Wagenaar and Harold D. Holder, reprinted with permission from Journal of Studies on Alcohol 56, no. 5 (September 1995): 566–72. Copyright by Alcohol Research Documentation, Inc., Rutgers Center of Alcohol Studies, Piscataway, NJ 08854.

“Fifth-Year Report: Milwaukee Parental Choice Program,” by John F. Witte, Troy D. Sterr, and Christopher A. Thorn, typescript (December 1995). Reprinted with permission from John F. Witte.

“The Effectiveness of School Choice in Milwaukee,” by Jay P. Greene, Paul E. Peterson, and Jiangtao Du, in Learning from School Choice, edited by Paul E. Peterson and Bryan C. Hassel (Brookings Institution Press, 1998). Reprinted with permission from the Brookings Institution.
Preface
In teaching program and policy evaluation courses in various graduate public administration programs, we have found that students have serious difficulty in critiquing evaluations done by others. Many students have problems distinguishing one type of design from another. If the design used by a researcher is not a clone of one of the designs found in the textbook, many students are not able to classify or categorize it.

In addition, students often believe that merely because an article was published in a scholarly or professional journal, it is perfect. They fail to see how an author did or did not control for various methodological problems.

We conceive of an evaluation course as a methods course in which students learn the techniques of program and policy evaluation and the methodological problems encountered when conducting evaluation research. Part of the learning process involves the ability to assess the evaluations of others, not solely to find fault with the work of others but to develop a keen understanding of the difficulties of conducting evaluations. Peter Rossi and Katharine Lyall make this point clearly:

It is easy enough to be a critic: All pieces of empirical research are more or less flawed. There are simply too many points at which mistakes can be made or unforeseen events intrude to permit perfection in either design or execution. We believe that critics have a responsibility to make two kinds of judgments about criticisms they offer: First it is necessary to make distinctions, if possible, between those mistakes that constitute serious flaws and those that are less serious defects. Admittedly, this is a matter of judgment, yet we do believe that some sort of weighting ought to be suggested by responsible critics as a guide to those who have not digested the enormous volume of memoranda, working papers, analyses, and final reports produced by the experiment. Second, it is only fair to distinguish between defects that arise from incorrect planning and other errors of judgment and those that arise out of events or processes that could not have been anticipated in advance. This is essentially a distinction between “bad judgment” and “bad luck,” the former being a legitimate criticism and the latter calling for sympathetic commiseration (1978, 412–13).

We would like to take Rossi and Lyall’s comments one step further. According to the dictionary, a critic is “one who expresses a reasoned opinion on any matter as a work of art or a course of conduct, involving a judgment of its value, truth, or righteousness, an appreciation of its beauty or technique, or an interpretation” (Webster’s 1950, 627).

Thus, to be a critic requires positive judgments as well as negative. We consider it important to point to cases in which researchers have developed interesting approaches to overcome problems of difficult evaluation design. Chapter 8 of this book is illustrative. Tim Newcomb devised a unique comparison group in his evaluation of an energy conservation program in Seattle. As “critics,” we are delighted when we run across this kind of creative thinking.

It should be clear by now that the purpose of Evaluation in Practice is to illustrate the techniques of different research designs and the major design problems (termed problems of internal validity) encountered in real-world evaluation research. Each chapter, 5 through 16, presents a brief introduction to an evaluation design and uses an evaluation as an example of that design. The article is then followed by an explanation and critique of the work.

One caveat: The book does not cover all forms of evaluation: monitoring, assessment, process evaluations, and the like (with the exception of brief discussions in chapter 1). We are concerned with impact evaluations. This does not suggest that these functions are not important, only that the purpose of this book is much more modest.

Evaluation in Practice is written for students in graduate professional degree programs and for general graduate courses in the social sciences. Although we both teach in graduate public administration programs, this book is not that narrowly construed. The articles in the book are taken from a variety of disciplines, illustrating the commonality of evaluation. The book provides examples of both policy and program evaluations from a multitude of disciplines. What is important is the methodology and not the substantive fields represented by the articles. The textual material in the chapters is somewhat limited, as our concern is with practical applications of the various designs. A list of supplementary readings follows the introductions to many of the chapters for those interested in more detailed explanations of the design being discussed. The book has equal applicability in public policy courses, public administration, planning, urban studies, sociology, political science, criminal justice, education, social welfare, and the health professions, to name a few.

The book makes no assumptions about the reader’s background beyond an undergraduate degree. A number of the articles use statistical techniques, but it is not necessary that the student be well versed in statistics. Unless we specify otherwise, the student can assume that the statistics are appropriately applied. It is the design methodology that is important in Evaluation in Practice, not statistics.

Organization of the Book

The book is composed of eight parts: introduction, experimental designs, quasi-experimental designs, reflexive designs, cost-benefit and cost-effectiveness analyses, other designs, on your own, and dilemmas of evaluation.

Part I consists of four chapters. The introductory chapter addresses process evaluations, output evaluations, and outcome evaluations. The chapter is based on the premise that evaluation is a continuous process in which different types of evaluation of a particular program are appropriate at different times and for different audiences. The second chapter introduces four basic evaluation designs and discusses threats to validity. The third chapter presents the basics of measurement and discusses the reliability and validity of measurement. The last chapter in the section is concerned with two particular types of measurement in vogue today: performance measurement and benchmarking.

Part II consists of three chapters discussing and illustrating three experimental evaluation designs. The feature common to each is the random assignment of subjects into experimental and control groups. The chapters cover the pretest-posttest control group design, the Solomon Four-Group Design, and the posttest-only control group design.

Part III covers quasi-experimental evaluation designs and is composed of four chapters. They discuss the pretest-posttest comparison (not control) group design, regression-discontinuity design, interrupted time-series comparison group design, and the posttest-only comparison group design.

Part IV also concerns quasi-experimental designs but is differentiated from the designs in part III in that the part IV groups are compared only with each other. The two types of evaluation design illustrated in this section are the one-group pretest-posttest design and the simple time-series design.

Part V covers two variations, or expansions, of other designs, which are included for their uniqueness. These are cost-benefit and cost-effectiveness analyses.

Part VI also covers unique variations on basic designs. They are patched designs and meta-evaluation designs.

Part VII requires students to work on their own. The three chapters in this section present three interesting policy and program evaluations with no explanation and critique. It is up to the students to evaluate the articles on their own and to apply what they have learned thus far.

The book concludes with three articles in Part VIII. This section illustrates the dilemmas of evaluation as it presents two articles that use the same data set to evaluate a parental choice program in education but come to differing conclusions. Our colleague Chieh-Chen Bowen provides the critique.

Finally, we would like to thank a number of people for their assistance on this book. First, we thank the Urban Center at Cleveland State University for institutional support. But we especially appreciate the comments and guidance we received from a number of our colleagues: Joe Wholey of the University of Southern California, Roger Durand of the University of Houston at Clear Lake, Patricia Shields of Southwest Texas State University, and especially Paul Culhane of Northern Illinois University. Their comments on our proposal for this second edition were absolutely critical to the book. Also thanks to American University students Jane Chan and Robin Gluck for assistance with manuscript production. Finally, we appreciate the assistance of the editorial and publishing services of David Estrin and the help of Bob Gormley, Katharine Miller, and Ted Bolen of Chatham House Publishers. Of course, we retain the ultimate responsibility.

We have high hopes for this second edition. We are confident that if students thoroughly understand the readings and discussion presented here, they will be capable of rationally reviewing most of the program or policy evaluations or evaluation designs that they are likely to encounter during their professional careers.

References

Rossi, Peter H., and Katharine C. Lyall. 1978. “An Overview Evaluation of the NIT Experiment.” In Evaluation Studies Review Annual, vol. 3. Beverly Hills, Calif.: Sage Publications.

Webster’s New International Dictionary. 1950. 2d ed. Springfield, Mass.: Merriam.
PART I
Introduction
CHAPTER 1
The Process of Evaluation
The evaluation of agency programs or legislative policy is the use of scientific methods to estimate the successful implementation and resultant outcomes of programs or policies for decision-making purposes. Implicit in this definition are the many levels on which a program or policy can be evaluated and the many potential audiences that may be interested in utilizing the evaluation. No single approach or method is common to all evaluations. We hope that during the course of this book, students of evaluation will become acquainted with the multiplicity of approaches to evaluation and develop the ability to choose appropriate designs to maximize internal validity and meet consumers’ evaluation needs.

Good evaluations use scientific methods. These methods involve the systematic process of gathering empirical data to test hypotheses indicated by a program’s or policy’s intent. Empirical data are observable, measurable units of information. In evaluation, one does not just “feel” that a program is operating effectively; the data demonstrate that this is or is not the case. Hypotheses are assertions about program impacts. In other words, hypotheses state changes that should occur to program recipients as a direct result of the program. For example, children in a nutrition program should be healthier after the program than they were before. These hypotheses should be linked with the policy’s or program’s intent. Typically, intent refers to policymakers’ hopes regarding the program’s outcomes. Unfortunately, identifying intent is not always a simple process. Bureaucratic agencies attempt to translate legislative intent into procedures and regulations. When the intent is unclear or contradictory, the translation can be manipulated into a program that does not resemble anything the legislators had in mind. Even with clear intent, poor program design can obscure original intent. In the best of all worlds, evaluators look for broad-based goals and for clearly stated measurable objectives by which these goals can be reached. Nevertheless, it is the job of the evaluator to try to break through these often unclear guidelines and to provide usable findings in a timely manner in the hopes of informing the decision-making process.

Types of Evaluation

Several general types of evaluation correspond to what some evaluators consider to be successive hierarchical levels of abstraction.
One type of evaluation is not “better” than another; each is appropriate to a different set of research questions. Timothy Bartik and Richard Bingham refer to this as a continuum of evaluations (1997, 247). This continuum is illustrated in figure 1.1. Evaluation is both a science and an art. The artful part is the successful matching of the level of the evaluation and the appropriate scientific design within the resource and time constraints of the consumer of the evaluation.

FIGURE 1.1
Continuum of Evaluation
Source: Bartik and Bingham 1997, 248.

Process Evaluations

The first sentence of this chapter mentions measuring the implementation of programs. Implementation is the process by which a program or policy is designed and operated. Process evaluations focus on the means by which a program or policy is delivered to clients. Karen Sue Trisko and V.C. League identify five levels, or approaches, to evaluation (which Bartik and Bingham adapt to their continuum), two of which are process or, as they also refer to them, formative evaluations. Although Trisko and League refer to these as levels, the term approach seems to be more appropriate in the evaluation context. The first process approach is monitoring daily tasks. In this approach to evaluation, fundamental questions of program operation are the focus of inquiry. Indeed, questions such as “Are contractual obligations being met?” or “Are staff adequately trained for their jobs?” are pursued at this level. Basically, these evaluations look to uncover management problems or to confirm that none are occurring. They deal with the behavior and practices of the program’s staff. Work analysis, resource expenditure studies, management audits, procedural overhauls, and financial audits are indicative of evaluations at this point in the continuum.

These evaluations often involve an inspection of the fundamental goals and objectives of the program. Indeed, evaluations cannot occur in the absence of direction concerning what the program is supposed to do. It is shocking how many programs operate in the absence of written goals and objectives! Sometimes policymakers cannot satisfy all relevant political players unless they avoid being specific about goals and objectives. Even when such directives exist, evaluators often find that the actual operation does not seem to fit the intent of the written guidelines. This may be because the goals are outdated, at which point staff need to reevaluate the goals and objectives in light of the current environment. Sometimes programs develop a life of their own, and evaluators can point out ways in which the program can get back on track.

Even if the purpose of the evaluation is not to assess process, evaluators find it necessary to gather an inventory of goals and
objectives to determine the predicted impact of the program. They then must reconcile the written material with what they observe before a full-blown evaluation can occur. Joseph Wholey and his colleagues at the Urban Institute (Horst et al. 1974) termed the process by which this is done “evaluability assessment.” Evaluability assessments are process evaluations that are performed so that the evaluator and the evaluation client can come to agreement on which aspects of the program will be part of the final product and what resources are necessary to produce the desired document. The benefits of process evaluations should not be underestimated.

The second approach to process evaluation concerns assessing program activities and client satisfaction with services. This approach is concerned with what is happening to program participants. Among the questions considered at this level are “What is done to whom and what activities are actually taking place?” or “How could it be done more efficiently?” or “Are the clients satisfied with the service or image of the service?”

Both the first and second aspects of process evaluations involve subjective measures at times and also require staff and client involvement to complete. Some researchers, such as Donald Campbell and Julian Stanley (1963, 6–12), refer to process evaluations as “pre-experiments.” But, once again, the value of process evaluations should not be understated. It makes little sense to attempt to assess the impact of a program if it is run incorrectly or if the consumer being served is not known. The results of these evaluations can be as basic as pointing out to an agency that it does not have evaluable goals or objectives or informing it that its accounting procedures are shoddy. If one ever wonders why it seems so difficult to evaluate the activities of federal bureaucrats, one need only look to the enabling legislation of many of our major national programs. Putting together Congressional majorities in controversial legislation often leads to murky legislative intent.

Process evaluations frequently focus on the way a program is implemented. Program managers may be interested professionally in organizing and running a fine-tuned bureau and in periodically assessing the efficiency of operations. During the course of these evaluations, better ways of doing business may be discovered in addition to blatant inefficiencies. For instance, staff may suggest alternative ways of organizing the service production, or they may point out innovative techniques or tools discovered in their own professional development. Those who fund the activities are also concerned about efficiency (minimizing waste), although sometimes to the detriment of effectiveness (obtaining the desired effect). During times of fiscal austerity, evaluations of this type can be beneficial. Unfortunately, evaluations are often the first activities to be cut during budget slashing. To understand how important process evaluations are as precursors to impact evaluations, see case 1.1; it is one of our favorite evaluation stories as told by Michael Patton.

Impact Evaluations

The next two approaches are referred to as impact, outcome, or summative evaluations. Impact evaluations focus on the end results of programs. The first impact evaluation, enumerating outcomes, looks at whether the program’s or policy’s objectives have been met. Here we are interested in quantifying what is happening to program participants. Questions may be “What is the result of the activities conducted by the program?” or “What happened to the target population because of those activities [was it the expected outcome]?” or “Should different activities be
CASE STUDY
commonalities among results, measures, and trends in the literature. They reuse the extant research findings.

Meta-evaluations are quite similar to literature reviews. In science, one is interested in the cumulative nature of research. Literature reviews attempt to make sense out of the research that precedes the current effort. Literature reviews are considered qualitative exercises. Equally equipped researchers can disagree on the interpretation of research results. Persons involved in meta-evaluations try to quantify the review, presumably making a more objective statement. Harris Cooper (1982) argues that meta-analytic procedures can be systematic and scientific. A number of quantitative methods have been developed, most notably by Gene Glass (1976; 1978) and Richard Light (1979; see also Light and Smith 1971). Often evaluations focus on whether the average, or mean, measure of the evaluation criterion variable for the group that participated in the program is any different from that of a similar group that did not participate. Glass and his associates found conceptually similar criterion, or dependent, variables across studies and developed what they call the “effect size” between the means of experimental and control groups, standardized by the standard deviation. They estimate the size of the total effect of these programs by aggregating across studies. Light, in contrast, uses the original data, as opposed to group means; pools the findings in different sites; and reanalyzes the data. He includes only those measures that are identical across studies. This technique reduces the number of studies he can aggregate. However, the findings are not subject to measurement differences. An example of a meta-evaluation is in chapter 16. The idea of systematically aggregating evaluation results has received considerable research attention. William Bowen and Chieh-Chen Bowen (1998, 72) identify seven steps in meta-evaluations:

• Conceptualize the relationship under consideration;
• Gather a set of studies that have tested the specified relationship;
• Design a coding sheet to record characteristics of the conditions under which each study was conducted;
• Examine each study and, using the coding sheet, record the conditions under which it was conducted;
• Compute the “effect size” for each study;
• Statistically analyze the characteristics and effect sizes for all the studies;
• Write a research report.

Utilization of Evaluation Findings

Regardless of what type of evaluation is performed, it is not useful unless it is completed in time for decision makers to use the findings as input for their decision-making process. It makes little sense to perform an elegantly designed evaluation whose results arrive a day after the decision on the program’s fate is made. This is one difference between evaluation research and academic research. Academic researchers will usually sacrifice timeliness for the elegance of a properly constructed, controlled design. Evaluation researchers hope to construct the “best” design, but they err on the side of expedience when a decision deadline is near. That does not mean that evaluation research is necessarily slipshod. Rather, evaluation research is not worth much if it is not utilized. Evaluators must recognize the trade-offs made in the interest of providing material in a timely manner.

How can one determine whether the results of an evaluation are utilized? In the 1970s, evaluation researchers spent a great
CHAPTER 1 THE PROCESS OF EVALUATION 9
Point 3 evaluations are concerned with the outcomes of programs—policy evaluations. When programs are highly controversial or when new programs are competing for scarce dollars, policy evaluations affect whether the general policy should continue. When prevailing policies are being questioned, fundamental questions such as “Should the government really be involved in this?” or “Does this really matter?” come to the fore. The evaluations of the 1970s income maintenance experiments are of this sort, as instituting the proposals would have constituted a major shift in national welfare policy. These evaluations are typically long-term and are the most costly to conduct. Again, decision makers at this point may utilize the information from the evaluation—in the case of income maintenance they did. They cannot help, however, letting their feelings be shaped by ideological concerns and by information from other sources.

The feedback loop at point 4 is illustrative of how evaluations of the past affect decision making in the present and can be considered a personal meta-evaluation exercise. Nachmias and Felbinger suggest that utilization research cast at this point would require longitudinal case studies. The loop is descriptive; the research suggested seems unmanageable. Nachmias and Felbinger hoped to end attempts to evaluate the utilization of specific results and to measure the behavior of decision makers. Evaluators were still concerned, however, about the value decision makers place on research results. Michael Patton and his colleagues (1978) suggest that evaluators should concern themselves with something they can control—the conduct of evaluations. They suggest that evaluators should be concerned with designing utilization-focused evaluations. William Tash and Gerald Stahler (1982, 180–89) followed a step-by-step method to enhance the utilization of their research results:

1. Identify specific user audiences and tailor recommendations to these groups.
2. Formulate recommendations in cooperation with the user audience.
3. Direct specific recommendations toward a broader policy and program model.
4. Assess the impact of recommendations over time.
5. Present the recommendations in an empathetic way.

Tash and Stahler caution against gearing evaluations toward one general “imaginary” user audience. They argue that this type of casting tends to produce recommendations that are not specific enough to provide managerial or policymaking direction. The recommendations often do not take into account the needs of mid- and lower-level bureaucrats, especially when the recommendations call for them to alter their actions. Therefore, once the users are specified, it may be necessary to produce multiple versions of the same evaluation report to address the concerns of different groups.

Tash and Stahler also suggest that evaluations, and especially the recommendation formulation process, be viewed as a joint venture that would make the role of staff more participatory. As such, the process would enhance utilization because those who participate have a stake in the process. In addition, staff involvement can lead to constructive and doable changes; otherwise recommendations may not make sense in the given context for reasons unknown to the evaluator. Regular meetings regarding interim results are also suggested.

Tash and Stahler suggest that by casting the recommendations more broadly, the research findings can be applied to other contexts. Although this may seem to contradict step 2, they suggest that broad implications can be the focus of an additional section of the report.
When possible, both strategically and financially, plans should be made to conduct a follow-up assessment sometime down the road to determine the extent to which the recommendations have been followed. Unfortunately, unless the users are committed to this reassessment, evaluators seldom know the extent to which their proposals have been followed. Reassessment may be more likely, however, if the evaluating unit is part of the evaluated organization.

In step 5, Tash and Stahler refer to the “style” of the recommendation presentations. Style demonstrates the respect the evaluator has for the evaluated and their environment. Respect counters the belief that all evaluators are interlopers who are quick to find fault with staff. Tash and Stahler suggest that rather than merely providing a laundry list of recommendations, an evaluator may present recommendations in the form of case studies that hypothetically explain the intent of the recommendations. Regardless of the presentation, though, it makes sense that utilization would be enhanced by the common-sense method of presenting the results as you would like to receive them. This does not mean that the evaluator is co-opted by the organization, but neither is she or he aloof toward it.

Harry Hatry, Kathryn Newcomer, and Joe Wholey (1994, 594) suggest that to get the most out of evaluations, those at higher levels in the organization should “create incentives for—and remove disincentives to—performance-oriented management and management-oriented evaluations.” Some of these incentives are the following:

• Regardless of who sponsors the evaluation, evaluators should seek input from program managers and staff on evaluation objectives and criteria. Where appropriate, evaluators should include the program manager and key program staff on the evaluation team, or as reviewers of the evaluation design and draft reports. Program managers should be kept aware of the progress of evaluations and be given the opportunity to review evaluation findings before they are made public.
• The legislative body, chief executive, or agency head might mandate periodic program evaluations, or at least regular monitoring and reporting of program outcomes.
• The legislative body, chief executive, or agency head could ask program managers to set target levels of performance in terms of key service quality and outcome indicators at the beginning of the year—and to report progress in achieving those targets quarterly and at the end of the year.
• To the extent feasible, the chief executive or agency head should take steps to build achievement of program results into performance appraisal systems for managers and supervisors.
• The chief executive or agency head could develop performance contracts with program managers and with contractors to deliver specific results—both outputs and outcomes.
• To encourage managers and staff to identify opportunities to improve efficiency and effectiveness, legislators or executives could permit agencies to retain a share (say 50 percent) of any savings they achieve from improved service delivery that results in lower costs. The agency should be given considerable flexibility in the use of such funds. The savings probably should be provided only for a limited amount of time, such as one or two years, even though the improvement lasts for many years.
• Legislators or chief executives could give agencies and programs the option of
12 PART I INTRODUCTION
considerations and trade-offs are all a part of the art and duty of any evaluation. A number of ethical considerations are involved in program evaluation. First, the evaluator should be aware of the purposes for which an evaluation is commissioned. Evaluations for covert purposes should be avoided. Those commissioned to whitewash or to make a program look good are examples of unethical evaluations because the clients dictate the results. Results should emerge from carefully constructed research; they should not be dictated from above. Evaluations commissioned to kill programs are also unethical for the same reason. Another covert purpose to be avoided arises when the consumer of the evaluation clearly wants to target current employees for replacement. Although some evaluations may show that the program is good or bad or that someone should be replaced, one should not design evaluations with a preconceived notion of what the results will be. Unscrupulous evaluators who are in their profession just for the money and who will tailor their results are often referred to as “Beltway Bandits.” This phrase refers to the unscrupulous consulting firms that sprang up on the Washington, D.C., Beltway, a highway that encircles the District of Columbia, during the heyday of federal funding of social program evaluations. Not all these firms were unethical. Evidence suggests, however, that almost anyone at the time could get away with calling him- or herself an evaluator regardless of training. Over time, the reputation of these “bandits” became known, and they were no longer granted contracts.

Another ethical concern is confidentiality. The evaluator often has access to information of a confidential nature over the course of a study. Some of this information could be damaging to individuals who supply the information or to their superiors or subordinates. Also, some of the information (such as an individual’s income) need not be divulged, although access to that information is necessary to arrive at aggregate statistics. An evaluator has to be conscious of confidentiality concerns whether or not confidentiality was assured.

As mentioned earlier, the results of an evaluation can have a real, personal impact on employees, clients, and agencies. Care should be taken to perform evaluations professionally and empathetically. These are ethical considerations encountered in the business of evaluation.

References

Bartik, Timothy J., and Richard D. Bingham. 1997. “Can Economic Development Programs Be Evaluated?” In Dilemmas of Urban Economic Development: Issues in Theory and Practice, ed. Richard D. Bingham and Robert Mier. Thousand Oaks, Calif.: Sage Publications, 246–77.
Bowen, William M., and Chieh-Chen Bowen. 1998. “Typologies, Indexing, Content Analysis, and Meta-Analysis.” In Handbook of Research Methods in Public Administration, ed. Gerald J. Miller and Marcia L. Whicker. New York: Marcel Dekker, 51–86.
Campbell, Donald T., and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.
Cooper, Harris M. 1982. “Scientific Guidelines for Conducting Integrative Research.” Review of Educational Research 52, no. 2 (Summer): 291–302.
Glass, Gene V. 1976. “Primary, Secondary, and Meta-Analysis.” Educational Researcher 5: 3–8.
Glass, Gene V. 1978. “Integrating Findings: The Meta-Analysis of Research.” In Review of Research in Education, ed. L.S. Schulman. Itasca, Ill.: Peacock, 351–79.
Hatry, Harry P., Kathryn E. Newcomer, and Joseph S. Wholey. 1994. “Conclusion: Improving Evaluation Activities and Results.” In Handbook of Practical Program Evaluation, ed. Wholey, Hatry, and Newcomer. San Francisco: Jossey-Bass, 590–602.
Horst, Pamela, Joe N. Nie, John W. Scanlon, and Joseph S. Wholey. 1974. “Program Management and the Federal Evaluation.” Public Administration Review 34, no. 4 (July/August): 300–308.
Light, Richard J., ed. 1983. Evaluation Studies Review Annual, vol. 8, entire issue.
Light, Richard J., and Paul V. Smith. 1971. “Accumulating Evidence: Procedures for Resolving Contradictions among Different Research Studies.” Harvard Educational Review 41 (November): 429–71.
Light, Richard J. 1979. “Capitalizing on Variation: How Conflicting Research Findings Can Be Helpful for Policy.” Educational Researcher 8 (October): 3–11.
Nachmias, David, and Claire L. Felbinger. 1982. “Utilization in the Policy Cycles: Toward a Conceptual Framework.” Policy Studies Review 2, no. 2 (Fall): 300–308.
Patton, Michael Q. 1978. Utilization-Focused Evaluation. Beverly Hills, Calif.: Sage.
Strosberg, M. A., and Joseph S. Wholey. 1983. “Evaluability Assessment: From Theory to Practice in the Department of Health and Human Services.” Public Administration Review 43, no. 1: 66–71.
Tash, William R., and Gerald J. Stahler. 1982. “Enhancing the Utilization of Evaluation Findings.” Community Mental Health Journal 18, no. 3 (Fall): 180–89.
Trisko, Karen Sue, and V. C. League. 1978. Developing Successful Programs. Oakland, Calif.: Awareness House, chap. 7.
CHAPTER 2
Evaluation Designs
In the preface and chapter 1, we explicitly noted that this book does not cover all types of evaluation, in particular process evaluations, although they are important, even critically important. But the focus here is on outcome evaluations (or impact evaluations, terms we will use synonymously) because they are really the only way to determine the success of programs in the public and nonprofit sectors. In the private sector there is always the bottom line—the profit and loss statement. Did the company show a profit? How much? The outcome evaluation is the profit and loss statement of the public sector. Did the program work? How effective is it compared to other alternatives?

Very often, the trouble with new public programs is that they are highly touted as being responsible for the improvement in some service delivery because politicians want to take credit for some improvement in that condition, regardless of cause and effect. President Bill Clinton wanted to take credit for a balanced budget occurring two years ahead of the planned balanced budget; however, it was a robust economy that deserved the credit. Moreover, President Clinton wanted to take credit for the robust economy, when he actually had very little to do with it.

The same behavior occurs in state and local governments and in nonprofit agencies. Mayor Rudolph Giuliani of New York City was proud of his community policing initiatives and quick to claim that these initiatives had led to reduced crime in New York City. Of course it may be something else, such as demographics (fewer teenage males), that really “caused” the reduction in crime. Yet community policing has been given credit for reducing crime, and mayors all over the country have jumped on the bandwagon and are initiating community policing activities in their communities merely based on these two events occurring at the same time. New York’s crime rate declined at the same time that community policing efforts were initiated and the number of teenage males was reduced.

Now, we do not want to be seen as throwing cold water on a potentially good program. Community policing may be everything it is believed to be. We are simply urging caution. A new program should be evaluated before it is widely cloned.

The fact remains, however, that the public sector has the propensity to copy and clone programs before they are known to work or not work. Examples are numerous, including boot camps, midnight basketball,
neighborhood watches, and the DARE (Drug Abuse Resistance Education) program (Butterfield 1997). DARE has been adopted by 70 percent of the school systems in the United States, yet evaluations of the program (one is included in this book) call the program’s efficacy into question. So much pressure is placed on school officials to “do something about the drug problem” that programs like DARE are adopted more for their political purposes than for what they do—because they may do something. But if they are doing nothing, they are using up portions of school days that could be better spent on traditional learning.

So this is why we are emphasizing outcome evaluations. Before the public sector, or the nonprofits, run around helter-skelter adopting the latest fad just to seem progressive, it is first important to see if programs have their desired outcomes, or at least are free of undesirable outcomes.

The Problem of Outcome Evaluations

The questions “Does the program work?” and “Is it the program that caused the change in the treatment group or is it something else?” are at the heart of the problem of doing outcome evaluations. In other words, the evaluator wants to compare what actually happened with what “would have happened if the world had been exactly the same as it was except that the program had not been implemented” (Hatry, Winnie, and Fisk 1981, 25). Of course this task is not possible because the program does exist, and the world and its people have changed ever so slightly because of it. Thus, the problem for evaluators is to simulate what the world would be like if the program had not existed. Would those children have ever achieved the eleventh-grade reading level without that reading program? Would all of those people have been able to quit smoking without the American Cancer Society’s smoking cessation program? How many of those attending the Betty Ford Clinic would have licked alcoholism anyway if the clinic had not existed? The answers to those and similar questions are impossible to determine exactly, but evaluation procedures can give us an approximate idea of what would have happened.

The Process of Conducting Outcome Evaluations

How does one go about conducting an outcome evaluation? A pretty simple step-by-step process makes it fairly clear. That process consists of the following:

1. Identify Goals and Objectives
2. Construct an Impact Model
3. Develop a Research Design
4. Develop a Way to Measure the Impact
5. Collect the Data
6. Complete the Analysis and Interpretation

Identify Goals and Objectives

It is difficult to meaningfully evaluate program outcomes without having a clear statement of an organization’s goals and objectives. Most evaluators want to work with clearly stated goals and objectives so that they can measure the extent to which those goals and objectives are achieved. For our purposes we will define a program’s goal as the broad purpose toward which an endeavor is directed. The objectives, however, have an existence and reality. They are empirically verifiable and measurable. The objectives, then, measure progress towards the program’s goals.

This seems simple enough, but in reality many programs have either confusing or nonexistent goals and objectives. Or, since many goals are written by legislative bodies, they are written in such a way as to defy interpretation by mere mortals. Take the goals set for the Community Action Program by Congress il-
CHAPTER 2 EVALUATION DESIGNS 17
FIGURE 2.1
Evaluation Designs
Source: Adapted from Richard D. Bingham and Robert Mier, eds. 1997. Dilemmas of Urban Economic
Development, Urban Affairs Annual Reviews 47. Sage, 253.
group that is as similar as possible to the program recipients. Both groups are measured before and after the program, and their differences are compared [(A2 – A1) – (B2 – B1)]. The use of a comparison group can alleviate many of the threats to validity (other explanations that may make it appear that the program works when it does not), as both groups are subject to the same external events. The validity of the design depends on how closely the comparison group resembles the program recipient group.

Experimental Design

A distinction must be made between experimental designs and the quasi-experimental design discussed above. The concern here is with the most powerful and “truly scientific” evaluation design—the controlled randomized experiment. As Rossi and Freeman have noted, randomized experiments (those in which participants in the experiment are selected for participation strictly by chance) are the “flagships” in the field of program evaluation because they allow program personnel to reach conclusions about program impacts (or lack of impacts) with a high degree of certainty. The experimental design discussed here is the pretest-posttest control group design.

Pretest-Posttest Control Group Design

The pretest-posttest control group design shown in figure 2.1d is the most powerful evaluation design presented in this chapter. The only significant difference between this design and the comparison group design is that participants are assigned to the program and control groups randomly. The participants in the program group receive program assistance, and those in the control group do not. The key is random assignment. If the number of subjects is sufficiently large, random assignment assures that the characteristics of subjects in both groups are likely to be virtually the same prior to the initiation of the program. This initial similarity, and the fact that both groups will experience the same historical events, mature at the same rate, and so on, reasonably ensures that any difference between the two groups on the postprogram measure will be the result of the program.

When to Use the Experimental Design

Experimental designs are frequently used for a variety of treatment programs, such as pharmaceuticals, health, drug and alcohol abuse, and rehabilitation. Hatry, Winnie, and Fisk (1981, 42) have defined the conditions under which controlled, randomized experiments are most likely to be appropriate. The experimental design should be considered when there is likely to be a high degree of ambiguity as to whether the outcome was caused by the program or something else. It can be used when some citizens can be given different services from others without significant harm or danger. Similarly, it can be used when the confidentiality and privacy of the participants can be maintained. It can be used when some citizens can be given different services from others without violating moral or ethical standards. It should be used when the findings are likely to be generalizable to a substantial portion of the population. And finally, the experimental design should be used when the program is an expensive one and there are substantial doubts about its effectiveness. It would also be helpful to apply this design when a decision to implement the program can be postponed until the evaluation is completed.

A Few General Rules

Hatry, Winnie, and Fisk also pass on a few recommendations about when the different types of designs should be used or avoided (1981, 53–54). They suggest using the experimental design whenever practical. It is costly, but one well-executed conclusive evaluation is much less costly than several poorly executed inconclusive evaluations.

They recommend that the one-group pretest-posttest design be used sparingly. It is not a strong evaluation tool and should be used only as a last resort.

If the experimental design cannot be used, Hatry, Winnie, and Fisk suggest that the simple time-series design and one or more quasi-experimental designs be used in combination. The findings from more than one design will increase the confidence that can be placed in the conclusion.

Finally, they urge that whatever design is initially chosen, it should be altered or changed if subsequent events provide a good reason to do so. In other words, remain flexible.

Threats to Validity

Two issues of validity arise concerning evaluations. They are issues with regard to the design
of the research and with the generalizability of the results. Campbell and Stanley have termed the problem of design the problem of internal validity (1963, 3). Internal validity refers to the question of whether the independent variables did, in fact, cause the dependent variable.

But evaluation is concerned not only with the design of the research but with its results—with the effects of research in a natural setting and on a larger population. This concern is with the external validity of the research.

Internal Validity

As was discussed earlier, one reason experimental designs are preferred over other designs is their ability to eliminate problems associated with internal validity. Internal validity is the degree to which a research design allows an investigator to rule out alternative explanations concerning the potential impact of the program on the target group. Or, to put it another way, “Did the experimental treatment make a difference?”

In presenting practical program evaluation designs for state and local governments, Hatry, Winnie, and Fisk constantly alert their readers to look for external causes that might “really” explain why a program works or does not work. For example, in discussing the one-group pretest-posttest design, they warn:

Look for other plausible explanations for the changes. If there are any, estimate their effect on the data or at least identify them when presenting findings. (1981, 27)

For the simple time-series design, they caution:

Look for plausible explanations for changes in the data other than the program itself. If there are any, estimate their effects on the data or at least identify them when presenting the findings. (31)

Concerning the pretest-posttest comparison group design, they say:

Look for plausible explanations for changes in the values other than the program. If there are any, estimate their effect on the data or at least identify them when presenting the findings. (35)

And even for the pretest-posttest control group design, they advise:

Look for plausible explanations for the differences in performance between the two groups due to factors other than the program. (40)

For many decision makers, issues surrounding the internal validity of an evaluation are more important than those of external validity (the ability to generalize the research results to other settings). This is because they wish to determine specifically the effects of the program they fund, they administer, or in which they participate. These goals are quite different from those of academic research, which tends to maximize the external validity of findings. The difference often makes for heated debate between academics and practitioners, which is regrettable. Such conflict need not occur at all if the researcher and consumers can reach agreement on design and execution of a project to meet the needs of both. With such understanding, the odds that the evaluation results will be used are improved.

Campbell and Stanley (1963) identify eight factors that, if not controlled, can produce effects that might be confused with the effects of the experimental (or programmatic) treatment. These factors are Hatry, Winnie, and Fisk’s “other plausible explanations.” These threats to internal validity are history, maturation, testing, instrumentation, statistical regression, selection, experimental mortality, and selection-maturation interaction. Here we discuss only seven of these
well documented that teens commit the most crimes. Government figures show that the most common age for murderers is eighteen, as it is for rapists. More armed robbers are seventeen than any other age. And right now the number of people in their late teens is at the lowest point since the beginning of the Baby Boom. Thus, some feel that the most likely cause of the recent reduction of the crime rate is the maturation of the population (“Drop in Violent Crime” 1997).

One other short comment: The problem of maturation does not pertain only to people. A large body of research conclusively shows the effects of maturation on both private corporations and public organizations (for example, Aldrich 1979; Kaufman 1985; Scott 1981).

Testing

The effects of testing are among the most interesting internal validity problems. Testing is simply the effect of taking a test (pretest) on the score of a second testing (posttest). A difference between preprogram and postprogram scores might thus be attributed to the fact that individuals remember items or questions on the pretest and discuss them with others before the posttest. Or the pretest may simply sensitize the individual to a subject area—for example, knowledge of political events. The person may then see and absorb items in the newspaper or on television relating to the event that the individual would have ignored in the absence of the pretest. The Solomon Four-Group Design, to be discussed in chapter 6, was developed specifically to measure the effects of testing.

The best-known example of the effect of testing is the “Hawthorne Effect” (named after the facility where the experiment was conducted). The Hawthorne experiment was an attempt to determine the effects of varying light intensity on the performance of individuals assembling components of small electric motors (Roethlisberger and Dickson 1939). When the researchers increased the intensity of the lighting, productivity increased. They reduced the illumination, and productivity still increased. The researchers concluded that the continuous observation of workgroup members by the experimenters led workers to believe that they had been singled out by management and that the firm was interested in their personal welfare. As a result, worker morale increased and so did productivity.

Another phenomenon similar to the Hawthorne effect is the placebo effect. Subjects may be as much affected by the knowledge that they are receiving treatment as by the treatment itself. Thus, medical research usually involves a placebo control. One group of patients is given neutral medication (i.e., a placebo, or sugar pill), and another group is given the drug under study.

Instrumentation

In instrumentation, internal validity is threatened by a change in the calibration of a measuring instrument or a change in the observers or scorers used in obtaining the measurements. The problem of instrumentation is sometimes referred to as instrument decay. For example, a battery-operated clock used to measure a phenomenon begins to lose time—a clear case of instrument decay. But what about a psychologist who is evaluating a program by making judgments about children before and after a program? Any change in the psychologist’s standards of judgment biases the findings. Or take the professor grading a pile of essay exams. Do not all of the answers soon start to look alike?

Sometimes the physical characteristics of cities are measured by teams of trained observers. For example, observers are trained to judge the cleanliness of streets as clean, lightly littered, moderately littered, or heavily littered. After the observers have been in the field for a while, they begin to see the streets as
average, all the same (lightly littered or moderately littered). The solution to this problem of instrument decay is to retrain the observers.

Regression Artifact

Variously called “statistical regression,” “regression to the mean,” or “endogenous change,” a regression artifact is suspected when cases are chosen for inclusion in a treatment based on their extreme scores on a variable. For example, the most malnourished children in a school are included in a child nutrition program. Or students scoring highest in a test are enrolled in a “gifted” program, whereas those with the lowest scores are sent to a remedial program. So what is the problem? Are not all three of these programs designed precisely for these classes of children?

The problem for the evaluation researcher is to determine whether the results of the program are genuinely caused by the program or by the propensity for a group over time to score more consistently with the group’s average than with extreme scores. Campbell and Stanley view this as a measurement problem wherein deviant scores tend to have large error terms:

Thus, in a sense, the typical extremely high scorer has had unusually good “luck”

Likewise, any improvement in scores from the remedial program may be due to the program itself or to a regression artifact. Campbell and Stanley argue that researchers rarely account for regression artifacts when subjects are selected for their extremity, although they may acknowledge that such a factor exists.

In chapter 9, Garrett Moran attempts to separate the impact of regression artifact from the program impact of a U.S. government mine safety program aimed at mines with extremely high accident rates.

Selection Bias

The internal validity problem involving selection is that of uncontrolled selection. Uncontrolled selection means that some individuals (or cities or organizations) are more likely than others to participate in the program under evaluation. It also means that the evaluator cannot control who will or will not participate in the program.

The most common problem of uncontrolled selection is target self-selection. The target volunteers for the program, and volunteers are likely to be different from those who do not volunteer. Tim Newcomb was faced with this problem in trying to determine the impact of the home weatherization program
aimed at volunteer low-income people (see
(large positive error) and the extremely
chapter 8). Those receiving weatherization
low scorer bad luck (large negative error).
did, in fact, cut their utility usage. But was
Luck is capricious, however, so on a
that because of the program or the fact that
posttest we expect the high scorer to de-
these volunteers were already concerned with
cline somewhat on the average, the low
their utility bills? Newcomb controlled for se-
scorers to improve their relative stand-
lection bias by comparing volunteers for the
ing. (The same logic holds if one begins
weatherization program with a similar group
with the posttest scores and works back
of volunteers at a later time.
to the pretest.) (1963, 11)
In terms of our gifted and remedial pro-
Experimental Mortality
grams, an evaluator would have to determine
if declining test scores for students in the The problem of experimental mortality is in
gifted program are due to the poor function- many ways similar to the problem of self-
ing of the program or to the natural tendency selection, except in the opposite direction.
for extreme scorers to regress to the mean. With experimental mortality, the concern is
why subjects drop out of a program rather than why they participate in it. It is seldom the case that participation in a program is carried through to the end by all those who begin the program. Dropout rates vary from project to project, but unfortunately, the number of dropouts is almost always significant. Subjects who leave a program may differ in important ways from those who complete it. Thus, postprogram measurement may show an inflated result because it measures the progress of only the principal beneficiaries.

External Validity

Two issues are involved with external validity—representativeness of the sample and reactive arrangements. While randomization contributes to the internal validity of a study, it does not necessarily mean that the outcome of the program will be the same for all groups. Representativeness of the sample is concerned with the generalizability of results. One very well-known, and expensive, evaluation conducted a number of years ago was of the New Jersey Income Maintenance Experiment (Rees 1974). That experiment was designed to see what impact providing varying amounts of money to the poor would have on their willingness to work. The study was heavily criticized by evaluators, in part because of questions about the generalizability of the results. Are poor people in Texas, or California, or Montana, or Louisiana likely to act the same as poor people in New Jersey? How representative of the rest of the nation are the people of New Jersey? Probably not very representative. Then are the results of the New Jersey Income Maintenance Experiment generalizable to the rest of the nation? They are not, according to some critics.
Sometimes evaluations are conducted in settings which do not mirror reality. These are questions of reactive arrangements. Psychologists, for example, sometimes conduct gambling experiments with individuals, using play money. Would the same individuals behave in the same way if the gambling situation were real and they were using their own money? With reactive arrangements, the results might well be specific to the artificial arrangement alone (Nachmias and Nachmias 1981, 92–93).

Handling Threats to Validity

The reason that experimental designs are recommended so strongly for evaluations is that the random assignment associated with experimental evaluations cancels out the effect of any systematic error due to extrinsic variables which may be related to the outcome. The use of a control group accounts for numerous factors simultaneously without the evaluator's even considering them. With random assignment, the experimental and control groups are, apart from chance differences, equivalent in all relevant ways. Since they will have, on average, the same characteristics and are also under identical conditions during the study except for their differential exposure to the program or treatment, other influences will not be confounded with the effect of the program.
How are the threats to validity controlled by randomization? First, with history, the control and experimental groups are both exposed to the same events occurring during the program. Similarly, maturation is neutralized because the two groups undergo the same changes. Given that both groups have the same characteristics, we might also expect that mortality would affect both groups equally; but this might not necessarily be the case. Using a control group is also an answer to testing. The reactive effect of measurement, if present, is reflected in both groups. Random assignment also ensures that both groups have equal representation of extreme cases
so that regression to the mean affects both groups alike. And, finally, the selection effect is negated (Nachmias and Nachmias 1981, 91–92).

Other Basic Designs

The brief description of several evaluation designs covered earlier in this chapter will be amplified and expanded upon in later chapters, and variations on the basic designs will be discussed. For example, a design not yet covered, the Solomon Four-Group Design, a form of experimental design, will be discussed in chapter 6. Two variations of designs covered in the first edition of Evaluation in Practice do not receive chapters of their own in this edition, however: the factorial design and the regression-discontinuity design. We do not have a chapter on either design because we were not able to identify good examples of the procedures in the literature. Instead, they are discussed here.
A factorial design, a type of experimental design, is used to evaluate a program that has more than one treatment or component. It is also usually the case that each level of each component can be administered with various levels of other components. In a factorial design, each possible combination of program variables is compared—including no program. Thus, a factorial evaluation involving three program components is similar to three separate experiments, each investigating a different component.
One of the major advantages of the factorial design is that it allows the researchers to identify any interaction effect between variables. David Nachmias stated:

In factorial experiments the effect of one of the program variables at a single level of another of the variables is referred to as the simple effect. The overall effect of a program variable averaged across all the levels of the remaining program variables is referred to as its main effect. Interaction describes the manner in which the simple effects of a variable may differ from level to level of other variables. (1979, 33)

Let us illustrate with an overly simplistic example. Figure 2.2 illustrates a program with two components, A and B. In an evaluation of the operation of a special program to teach calculus to fifth graders, component A is a daily schedule of 30 minutes of teaching, with a1 indicating participation in the program and a2 indicating nonparticipation. Component B is a daily schedule of 30 minutes of computerized instruction, with b1 indicating participation and b2 indicating nonparticipation. The possible conditions are the following:

a1b1: teaching and computer
a1b2: teaching only
a2b1: computer only
a2b2: neither teaching nor computer

Now suppose that, relative to a2b2, a1b2 improves the students' knowledge of calculus by one unit, that a2b1 also improves the students' knowledge of calculus by one unit, but that a1b1 improves the students' knowledge of calculus by three units. This is an example of an interaction effect. The combination of receiving both components of the program produces greater results than the sum of each of the individual components.
Of course in the real world, factorial evaluations are seldom that simple. In an

FIGURE 2.2
The 2² Factorial Design

Treatment B \ Treatment A   a1 (yes)   a2 (no)
b1 (yes)                    a1b1       a2b1
b2 (no)                     a1b2       a2b2
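The arithmetic behind the calculus example can be made explicit. The following is a minimal Python sketch, not the authors' procedure, that takes the hypothetical gains from the example (a2b2 as the zero baseline, one unit each for a1b2 and a2b1, three units for a1b1) and computes simple effects, main effects, and the interaction in the sense of the Nachmias definitions quoted above. The function name `factorial_effects` and the dictionary layout are illustrative assumptions, and the interaction is measured here as the difference between the two simple effects of A.

```python
from typing import Dict, Tuple

# Cell means keyed by (level of A, level of B). Hypothetical values taken
# from the calculus example, with a2b2 (neither component) as baseline 0.
Cells = Dict[Tuple[str, str], float]


def factorial_effects(cells: Cells) -> Dict[str, float]:
    """Return simple, main, and interaction effects for a 2 x 2 design.

    Assumes levels "a1"/"a2" for component A and "b1"/"b2" for
    component B, where a1 and b1 mean participation.
    """
    # Simple effects: the effect of one factor at a single level of the other.
    a_at_b1 = cells[("a1", "b1")] - cells[("a2", "b1")]
    a_at_b2 = cells[("a1", "b2")] - cells[("a2", "b2")]
    b_at_a1 = cells[("a1", "b1")] - cells[("a1", "b2")]
    b_at_a2 = cells[("a2", "b1")] - cells[("a2", "b2")]

    # Main effects: simple effects averaged across levels of the other factor.
    main_a = (a_at_b1 + a_at_b2) / 2
    main_b = (b_at_a1 + b_at_a2) / 2

    # Interaction: how much the simple effect of A changes between levels of B.
    interaction = a_at_b1 - a_at_b2

    return {
        "A at b1": a_at_b1,
        "A at b2": a_at_b2,
        "B at a1": b_at_a1,
        "B at a2": b_at_a2,
        "main A": main_a,
        "main B": main_b,
        "interaction": interaction,
    }


example: Cells = {
    ("a1", "b1"): 3.0,  # teaching and computer
    ("a1", "b2"): 1.0,  # teaching only
    ("a2", "b1"): 1.0,  # computer only
    ("a2", "b2"): 0.0,  # neither
}

if __name__ == "__main__":
    for name, value in factorial_effects(example).items():
        print(f"{name}: {value}")
```

With these numbers the simple effect of teaching is 2 units when computerized instruction is also given and 1 unit when it is not, so the interaction is 1 unit: exactly the amount by which the combined gain of 3 exceeds the sum (1 + 1) of the separate gains. Each main effect is 1.5 units, which on its own would hide the fact that the components work better together.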
To perform the analysis, two regression lines must be estimated—one for the treated eligibles and the other for the untreated ineligibles. For each, the market value of the home (Y) is regressed on income (X). Although it is intuitively pleasing in this example that both lines are upward sloping (i.e., that market value increases with income), the measure of program impact is at the intercepts (a) of the regression lines. If a_t ("a" for treated eligibles) is statistically significantly higher than a_ut ("a" for untreated ineligibles), then the program is successful. If the two intercepts do not differ significantly, the program is ineffective (the a's are the same); if a_ut is significantly higher than a_t, the program is dysfunctional.
Langbein describes the rationale of the design as follows:

It assumes that subjects just above and just below the eligibility cut point are equivalent on all potentially spurious and confounding variables and differ only in respect to the treatment. According to this logic, if the treated just-eligibles have homes whose value exceeds those of the untreated just-ineligibles, then the treatment should be considered effective. The linear nature of the regression makes it possible to extrapolate these results to the rest of the treated and untreated groups (1980, 95–96).

Thus, the design controls for possible threats to internal validity by assuming that people with incomes of $9,999 are similar in many respects to those who earn $10,001.
Several cautions apply to anyone considering this design. First, if the eligibility cut-off for the program in question is the same as that for other programs, multiple-treatment effects should be considered so that the results are not ambiguous. Second, the effects of self-selection or volunteerism among eligibles should be estimated. A nested experiment within the design can eliminate this threat to internal validity. Third, when people with the same eligibility are on both sides of the cut point line, a fuzzy cut point has been used. If the number of such misclassifications is small, the people in the fuzzy gap can be eliminated. One should use this technique cautiously, however. The further apart on the eligibility criterion the units are, the less confident one can be that the treated and untreated are comparable. A variation of a fuzzy cut point occurs when ineligibles "fudge" their scores on the eligibility criterion. This non-random measurement error is a threat to the validity of the design. These effects should be minimized (by careful screening) and at least estimated. Fourth, because linear regression fits straight lines, a visual examination of the bivariate distributions helps ensure that nonlinear trends are not being masked. An associated caution is that the program's effect is being measured at only a relatively small band of eligibility/ineligibility; consequently, a visual inspection of overall trends is advisable before one claims programmatic impact.

The Need for an Evaluability Assessment

As mentioned in chapter 1, evaluability assessments are made to determine the extent to which a credible evaluation can be done on a program before committing full resources to the evaluation. The process was developed by Joseph Wholey (1979) in the 1970s and has since been widely adopted by others (in particular Rutman 1980). An evaluability assessment addresses two major concerns: program structure, and the technical feasibility of implementing the evaluation methodology.
The concern with program structure is essentially addressed by conducting a mini-process evaluation. The first step in conducting an evaluability assessment is to prepare a document model of the program. Here, the evaluator prepares a description of the program as it is supposed to work according to available documents. These documents include legislation, funding proposals, published brochures, annual reports, minutes of meetings, and administrative manuals. This is the first step for the evaluator in attempting to identify program goals and objectives and to construct the impact model (description of how the program operates).
The second step in conducting the evaluability assessment is to interview program managers and develop a manager's model. Here the evaluator presents the managers and key staff members with the document model and then conducts interviews with them to determine how their understanding of the program coincides with the document model. The evaluator then reconciles the differences between the document model and the manager's model.
The evaluator then goes into the field to find out what is really happening. The document model and manager's model are not usually enough to yield a real understanding of the program's complex activities and the types of impacts it may be producing. Through the fieldwork the evaluator tries to understand in detail the manner in which the program is being implemented. She or he compares the actual operations with the models she or he has developed. With this work complete, the evaluator is now in a position to identify which program components and which objectives can seriously be considered for inclusion in the impact study.
Finally, the evaluator considers the feasibility of implementing the proposed research design and measurements. Is it possible to select a relevant comparison group? Can the specified data collection procedures be implemented? Would the source of the data and the means of collecting it yield reliable and valid information (Rutman 1980)?
Evaluability assessments will increase the probability of achieving useful and credible evaluations. This was clearly illustrated by one of our favorite evaluation stories, told by Michael Patton (case 1.1 in chapter 1).

FIGURE 2.3
Regression-Discontinuity Design: The Impact of Subsidized Interest Credit on Home Values
Source: Langbein, 1980, 95.
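The intercept comparison behind the regression-discontinuity design in Figure 2.3 can be sketched in a few lines of Python. Everything below is hypothetical: the data are simulated, not Langbein's; eligibility is assumed to mean income below a $10,000 cut point; and income is centered at the cut point so that each fitted intercept is the predicted home value right at the cut point. The names `ols`, `simulate`, and `discontinuity_gap` are illustrative. The sketch only estimates the gap between the intercepts a_t and a_ut from ordinary least squares lines fitted separately to each group; a real analysis would also test whether that gap is statistically significant, which is omitted here.

```python
import random
from statistics import fmean
from typing import List, Tuple

CUT_POINT = 10_000  # hypothetical eligibility cut point (income in dollars)


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def simulate(n: int, effect: float, seed: int = 1) -> List[Tuple[float, float, bool]]:
    """Simulate (income, home value, treated) records. Households below the
    cut point are eligible and treated; treatment adds `effect` to value."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        income = rng.uniform(5_000, 15_000)
        treated = income < CUT_POINT
        value = 40_000 + 3.0 * income + (effect if treated else 0.0)
        value += rng.gauss(0, 2_000)  # noise unrelated to treatment
        data.append((income, value, treated))
    return data


def discontinuity_gap(data: List[Tuple[float, float, bool]]) -> float:
    """Fit separate lines for treated eligibles and untreated ineligibles,
    with income centered at the cut point, and return a_t - a_ut."""
    treated = [(inc - CUT_POINT, val) for inc, val, t in data if t]
    untreated = [(inc - CUT_POINT, val) for inc, val, t in data if not t]
    a_t, _ = ols([x for x, _ in treated], [y for _, y in treated])
    a_ut, _ = ols([x for x, _ in untreated], [y for _, y in untreated])
    return a_t - a_ut


if __name__ == "__main__":
    gap = discontinuity_gap(simulate(n=2_000, effect=8_000))
    print(f"estimated a_t - a_ut: {gap:,.0f}")
```

Because the fitted lines are straight, the sketch inherits the caution raised earlier: if the true relationship between income and home value were curved, a linear fit could manufacture a gap at the cut point, which is why a visual check of the bivariate distributions matters.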
Note
1. Although a number of authors in the evaluation field use the terms control group and comparison group interchangeably, they are not equivalent. Control groups are formed by the process of random assignment. Comparison groups are groups that are matched to be comparable in important respects to the experimental group. In this book, the distinction between control groups and comparison groups is strictly maintained.
References
Aldrich, Howard E. 1979. Organizations and Environments. Englewood Cliffs, N.J.: Prentice Hall.
Bartik, Timothy J., and Richard D. Bingham. 1997. "Can Economic Development Programs Be Evaluated?" In Dilemmas of Urban Economic Development: Issues in Theory and Practice, ed. Richard D. Bingham and Robert Mier. Thousand Oaks, Calif.: Sage, 246–77.
Butterfield, Fox. 1997. "Study Suggests Shortcomings in Crime-Fighting Programs." [Cleveland] Plain Dealer (April 17), 3–A.
Campbell, Donald T. 1984. "Foreword." In Research Design for Program Evaluation: The Regression-Discontinuity Approach, by William M. Trochim. Beverly Hills, Calif.: Sage, 29–37.
Campbell, Donald T., and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Boston: Houghton Mifflin.
______. 1963. "Experimental and Quasi-Experimental Designs for Research on Teaching." In Handbook of Research on Teaching, ed. N.L. Gage. Chicago: Rand McNally, 171–247.
"Drop in Violent Crime Gives Rise to Political Debate." 1997. [Cleveland] Plain Dealer (April 14), 5–A.
Hatry, Harry P., Richard E. Winnie, and Donald M. Fisk. 1981. Practical Program Evaluation for State and Local Governments. 2d ed. Washington, D.C.: Urban Institute.
Horst, Pamela, Joe Nay, John W. Scanlon, and Joseph S. Wholey. 1974. "Program Management and the Federal Evaluator." Public Administration Review 34, no. 4: 300–308.
Kaufman, Herbert. 1985. Time, Chance, and Organizations: Natural Selection in a Perilous Environment. Chatham, N.J.: Chatham House.
Langbein, Laura Irvin. 1980. Discovering Whether Programs Work: A Guide to Statistical Methods for Program Evaluation. Glenview, Ill.: Scott, Foresman.
Nachmias, David. 1979. Public Policy Evaluation: Approaches and Methods. New York: St. Martin's.
Nachmias, David, and Chava Nachmias. 1981. Research Methods in the Social Sciences. 2d ed. New York: St. Martin's.
Patton, Michael Q. 1978. Utilization-Focused Evaluation. Beverly Hills, Calif.: Sage.
Posavac, Emil J., and Raymond G. Carey. 1997. Program Evaluation: Methods and Case Studies. 5th ed. Upper Saddle River, N.J.: Prentice Hall.
Rees, Albert. 1974. "An Overview of the Labor-Supply Results." Journal of Human Resources 9 (Spring): 158–80.
Roethlisberger, Fred J., and W. Dickson. 1939. Management and the Worker. Cambridge, Mass.: Harvard University Press.
Rossi, Peter H., and Howard E. Freeman. 1985. Evaluation: A Systematic Approach. 3d ed. Beverly Hills, Calif.: Sage.
Rutman, Leonard. 1980. Planning Useful Evaluations: Evaluability Assessment. Beverly Hills, Calif.: Sage.
Scott, W. Richard. 1981. Organizations: Rational, Natural, and Open Systems. Englewood Cliffs, N.J.: Prentice Hall.
Skidmore, Felicity. 1983. Overview of the Seattle-Denver Income Maintenance Experiment Final Report. Washington, D.C.: U.S. Department of Health and Human Services, May.
Wholey, Joseph S. 1979. Evaluation: Promise and Performance. Washington, D.C.: Urban Institute.

Supplementary Readings
Judd, Charles M., and David A. Kenny. 1981. Estimating the Effects of Social Interventions. Cambridge, England: Cambridge University Press, chaps. 3 and 5.
Mohr, Lawrence B. 1988. Impact Analysis for Program Evaluation. Chicago: Dorsey, chaps. 4 and 6.
CHAPTER 3
Measurement
Program evaluation is an empirical enterprise. The logic of empirical inquiry concerns the investigation of measurable, observable phenomena. Empirical knowledge is objective knowledge. While the choice of what one studies may be subjective, the method by which one does it is not. That is why empirical research is straightforward. Once the subject of the investigation is identified, there are rules by which the investigation proceeds. Measurement is a key component of empirical inquiry. In its broadest sense, "measurement is the assignment of numerals to objects or events according to rules" (Stevens 1951, 1).

Conceptualization and the Process of Operationalization

Concepts are the abstract or underlying properties of variables we wish to measure. The more abstract the concept, the more difficult it is to measure. In evaluation research, some concepts are very easy to measure. For our purpose right now, let us assume that "measure" means how we see the concept in reality. For example, if we think that gender has an impact on the outcome of a treatment, gender is the concept and we would measure it by separating people by how gender is manifest—females and males. We do not "see" gender; rather, we see men and women.
In the physical sciences, concepts are easily measured since they are physically "seen"—mass can be weighed, length can be measured, white blood cells can be counted. Some would say that it is unfortunate that in evaluation and the social sciences the gap between many concepts and how those concepts are observed in the world is sometimes quite wide. We would rather agree with E.L. Thorndike, who said, "If something exists, it exists in some amount. And if it exists in some amount, it can be measured" (Isaac and Michael 1981, 101). That is what makes evaluation an art, a science, and a theoretically stimulating exercise.
Once you have identified a concept of interest, the first step in measurement is to define it conceptually. Let us take a concept to which most people can relate—political participation. An appropriate conceptual definition might be, "any effort by citizens to affect the outcome of political decisions." Now, let us assume that our study of political participation is confined to industrialized democracies in North America. That being the case,
we may wish to narrow our definition a bit to "any legal effort by citizens to affect the outcome of political decisions." This stipulative definition makes sense in this context. It specifically excludes illegal means of participating, such as political assassinations, bombings, airplane hijackings, or kidnappings. This would be appropriate in the North American context. In some countries, however, excluding these illegal forms would miss an important part of the concept of political participation as it is practiced in those countries.
That brings us to the operational definition. When we operationalize a concept, we define how it is seen in reality. Given our stipulative definition, how do we know political participation when we see it? Sidney Verba, Norman Nie, and Jae-On Kim (1979) identify several variables that operationalize political participation in their classic studies: voting, contacting elected or appointed officials, contributing money for a candidate, running for elective office, trying to persuade someone of your political views, and campaigning for a candidate. When you operationalize a concept, you name the variable that you subsequently will measure.1
Measurement refers to the rules for assigning numerals to values of a variable. According to Fred Kerlinger,

measurement is the game we play with objects and numerals. Games have rules. It is, of course, important for other reasons that the rules be "good" rules, but whether the rules are "good" or "bad," the procedure is still measurement. (1973, 427)

In order to specify the rules, we need to identify the unit of analysis of the phenomenon we are measuring. The unit of analysis is the general level of social phenomena that is the object of the observation measured at an identified level of aggregation (e.g., individual persons, neighborhoods, cities, nations). The variable must be both conceptually and operationally measured at the same level of aggregation or unit of analysis.
For example, let us go back to political participation. If we are interested in explaining the political participation level of people, the unit of analysis would be the individual, and participation could be measured in a number of ways. One measure of participation at the individual level is to count how many times an individual voted in the past five years. The variable is "voting frequency." Another way is to create an index by adding up the different ways a person has reported that they have participated, using the Verba, Nie, and Kim items. The variable is a "participation index." A third way is to ask someone if they voted in the last presidential election. The variable is "voted in last presidential election."
Recall that measurement refers to the rules for assigning numbers to values of a variable. For voting frequency, consider the following rule: For each individual in the study, look at the last five years of election records and count the number of times each individual voted. Each voting incident increases the voting frequency by 1. An alternate rule could be as follows: Count the number of times each individual voted in the past five years and divide it by the number of elections for which they were eligible to vote. This alternative measurement controls for a person moving into or out of a voting district during the period and also controls for eligibility to vote by virtue of age.
A variety of ways exist for constructing the participation index. One way is to tally the number of Verba, Nie, and Kim items each individual had done in the past year. The result would be an index which would vary from 0 (no participation) to 6 (participated in all forms of political participation). Another rule, which Verba, Nie, and Kim have argued is a "better" measure, would be to do a factor analysis of the degrees of each item (number
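The measurement rules for voting frequency and for the simple participation index can be written down as explicit procedures. The Python sketch below is illustrative only: the `Citizen` record, its field names, and the sample people are hypothetical, and the six activity labels simply paraphrase the Verba, Nie, and Kim items listed earlier. It shows how two different rules applied to the same records produce two different variables.

```python
from dataclasses import dataclass, field
from typing import List, Set

# The six participation items listed earlier (labels are a paraphrase).
ITEMS = (
    "voting",
    "contacting officials",
    "contributing money",
    "running for office",
    "persuading others",
    "campaigning",
)


@dataclass
class Citizen:
    # Hypothetical record for one individual, the unit of analysis here.
    name: str
    elections_eligible: int  # elections in the past five years they could vote in
    elections_voted: int  # elections in the past five years they voted in
    activities: Set[str] = field(default_factory=set)  # items done in the past year


def voting_frequency(c: Citizen) -> int:
    """Rule 1: each voting incident increases voting frequency by 1."""
    return c.elections_voted


def voting_rate(c: Citizen) -> float:
    """Rule 2: votes cast divided by elections the person was eligible for."""
    if c.elections_eligible == 0:
        raise ValueError(f"{c.name} was not eligible in any election")
    return c.elections_voted / c.elections_eligible


def participation_index(c: Citizen) -> int:
    """Tally of distinct items done; ranges from 0 (none) to 6 (all)."""
    return len(c.activities & set(ITEMS))


people: List[Citizen] = [
    Citizen("long-time resident", 6, 3, {"voting", "campaigning"}),
    Citizen("recent arrival", 2, 2, {"voting"}),
]

if __name__ == "__main__":
    for p in people:
        print(p.name, voting_frequency(p), voting_rate(p), participation_index(p))
```

Under the first rule the long-time resident scores higher (3 votes against 2); under the second, which adjusts for eligibility, the recent arrival does (1.0 against 0.5). Neither rule is the single correct measure; each defines a different variable, which is Kerlinger's point about measurement as a game with rules.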
Another Random Scribd Document
with Unrelated Content
despotisms established over them; and in the effort to reconstruct
the Union, the great mass of the people disfranchised, and the right
of suffrage given to the freed slaves, because it was alleged that the
Southern people were still rebellious, and so wedded to the idea of
secession, notwithstanding the bitter experience of the war, that
they could not be trusted with the right to vote and hold office. All of
this was done with Mr. Greeley's full knowledge and sanction.
It has been shown how long, how earnestly, and how anxiously
the question was discussed in Virginia, and that secession was
resorted to by that state only when a war of coercion had been
proclaimed, and she had been required to furnish troops to carry it
on. The state of Virginia believed, with Mr. Greeley, that it would be
a grievous wrong to "rush upon carnage to defy and defeat" the
right of the Cotton States to withdraw from the Union; and she
determined to do what he had declared his purpose of doing: that is
"resist all coercive measures." The ordinance of secession was
submitted to the popular vote at an election held more than one
month after its adoption by the convention, and it was ratified by an
overwhelming majority, thus showing beyond dispute that it was
"the echo of an unmistakable popular fiat." Did not "those who
rushed upon carnage to defy and defeat" "a judgment thus
rendered, a separation so backed," "place themselves clearly in the
wrong?"
Yet Virginia was the first of the seceding states invaded by the
Federal army; her towns and plains were devastated by a long and
cruel war; her people plundered, imprisoned and murdered; her
territory severed, and a new state erected within her limits, in
violation of the Constitution of the United States. Subsequently a
military despotism was thrust upon them, and the freed slaves were
vested with the right of suffrage and the capacity to hold office,
while such wide measures of disfranchisement were adopted that
enough men competent to fill the petty offices of state, even with
those whose fears and cupidity led them to apostatize and the influx
of adventurers could not be found in all the limits of that old
commonwealth which has been designated "the mother of states
and of statesmen."
In the case of Maryland, Kentucky and Missouri, the people were
overrun by Federal troops owing to the peculiar nature of their
situation, and they were deprived of the opportunity of freely
discussing and deliberating upon the questions involved, though the
legislature of Missouri did pass an ordinance of secession. Did not
those people, under such circumstances, have the right individually
to resist so flagrant an outrage upon their rights and liberties? They
were not only deprived of the liberty of peaceably assembling to
discuss their grievances, but it was sought to deprive them of the
right "to keep and bear arms," as expressly guaranteed by the
second amendment to the Constitution, in order that they might
have the means always of defending their liberties and rights, and
the only resource they had was to unite as individuals in the defence
of the common cause, and of their own violated homes and liberties.
It has been said that the Confederate states began the war by
firing upon Fort Sumter. If those states had the right to withdraw
from the Union and the United States had no right to resist or coerce
them then the attempt to maintain a garrisoned fort in one of the
most important harbors of the Confederacy, was an act of war. This
had, nevertheless, been patiently borne with, for nearly three
months after the secession of South Carolina, in whose principal
harbor the Fort was situated, and it was only when the Government
of the United States had given notice of its intention to supply Fort
Sumter "peaceably, if possible, otherwise by force," and the vessels
for that purpose had appeared off the harbor, that the attack began.
The commissioners sent to Washington to effect a peaceable
settlement of all questions had then been denied an audience, and
informed that the authorities at Washington would hold no
intercourse with them.
The war was thus inevitable, and the Federal authorities were
quietly preparing for it, in order to entrap the border states. The
threat to supply Fort Sumter indicated a purpose of war; was then
the Confederate Government to wait until the measures of the
Government at Washington had been so completely taken that the
former would find itself helpless in the hands of its enemy? The port
of Charleston was necessary to it as an inlet for obtaining supplies
and arms for its defence, was it then to allow the port, which could
block the entrance to that harbor, to be placed in a condition to
render the blockade complete, the harbor worthless and Charleston
untenable?
There can be no question of the right of the Confederate
Government to force a surrender of the fort, which had been
refused, and that it was fully warranted in pursuing the course it did.
I must confess that, at the time, I deeply deplored and condemned
the attack on Fort Sumter, on the score of policy, because I regarded
the threat of the Washington Government as designed to provoke a
commencement of the conflict by the firing of the first shot, and not
intended really to be carried into effect. It is now manifest that war
had already been resolved upon, and the firing of the first gun on
Fort Sumter was not its commencement. The war was begun by the
attempt to hold the forts in the Confederate harbors.
It has been alleged that the Southern States had previously
controlled the policy of the government, and that they seceded
because they had now lost that power. There had never been a
president elected from any of the Cotton States that established
the Confederate Government, except from Louisiana, of which state
General Taylor was a nominal resident; he was really a native of Virginia
and an officer in the army, and he lived but a little over a year after
his inauguration. These Cotton States had furnished comparatively
few cabinet ministers, and they had in the main been opposed to the
policy pursued by the government in regard to the most important
branches of legislation, such as internal improvements, the public
lands, tariff, etc. Their leading interest, the culture of cotton, had
received no fostering care whatever from the government, and
South Carolina had been complaining for more than thirty years that
her interest had been sacrificed to Northern cupidity by high tariffs,
and at one time she had taken steps to nullify the laws on that
subject. In no sense could the state which initiated secession be
said to be actuated by disappointment at the loss of Federal power.
It is true that they had lost the power to protect themselves in
the Union, as the Constitution had been so flagrantly violated, and
that they were now threatened with submission; and for this they seceded.
The state of Virginia had given four of the Southern presidents to
the Union, and Tennessee the other two. Washington had been the
unanimous choice of all of the states; Jefferson, Madison and
Monroe had been national men in their policy and had received the
support of a large majority of the Northern vote, Monroe being
accepted without opposition at his last election and receiving all of
the votes, North and South, but one northern electoral vote. Monroe
was the last Virginian elected or nominated as President. It is true
Tyler had succeeded to the office by the death of Harrison, but he
had not received the vote of Virginia even as vice-president.
Virginia had voted against Clay, Harrison, Taylor and Scott, all
natives of the state, when they were candidates for the presidency,
and she had cast her vote three times against Mr. Clay, and in the
cases of Harrison, Taylor and Scott, her vote had been cast for
Northern men against them. All of the presidents she had given had
been re-elected, because there was nothing sectional or local in their
policy, while no Northern president had been re-elected, though
three out of the six had been candidates again. In the election of
1860, the state of Virginia cast its vote for Bell and Everett, by a
plurality vote over Breckinridge and Lane, and Douglas and
Johnson, showing that in this election she was not liable to the
charge of sectionalism, even if that charge could be brought against
the supporters of Breckinridge and Lane, which is by no means
admitted. No interest of Virginia had at any time been fostered by
the action of the government, in any stage of its history, and the
government had not even taken steps to obtain from foreign
countries a diminution of the enormous duties placed on her leading
staple, tobacco, but her statesmen, when in office, had pursued a
policy looking to the general welfare and prosperity. If she had
furnished many statesmen to the common councils, it was because
of the general confidence in their patriotism, and freedom from all
selfish ambition and narrow-minded notions of policy.
Her history from the beginning of the controversy with Great
Britain had been one of sacrifices for the benefit of all of the states.
She had promptly sent troops to Massachusetts on the
commencement of the war of the Revolution in that state; all of its
battlefields were red with the blood of her sons; and that war had
been terminated on her own soil. With a territory larger than that of
all of the other states at the conclusion of peace, she had
surrendered an empire beyond the Ohio river, for the sake of Union
and for the common benefit; and subsequently, she had consented
to the erection of the state of Kentucky within her remaining
territory.
Left as she was by the acknowledgement of the independence of the
states, she would have been amply able to take care of herself and
erect a powerful government of her own; yet she had contracted her
power and narrowed her limits for what she considered the common
good.[A]
[A] Note—The following extract is from the "History of the American Civil War"
by Professor Draper, a Northern Union man, which shows the nature of Virginia's
sacrifices: "At the time of the Declaration of Independence, Virginia was the most
powerful of the colonies; she occupied a central position and had in Norfolk one of
the best harbors on the Atlantic. She had a vast western territory, an imposing
commerce, and in the production and export of tobacco not only a source of
wealth, but from the mercantile connections it gave her in Europe, a means of
refinement. It was through this circumstance that so many of her young men were
educated abroad. When the epoch of separation from the mother country had
come, and the question of Confederation arose, she might have asserted her
colonial supremacy; she might have been the central power. Many of her ablest
men subsequently thought that in her voluntary equalization with the feeblest
colonies, the spontaneous surrender of her vast domain, the self-abnegation with
which she sacrificed all her privileges on the altar of the Union, she had made a
fatal mistake. In her action there was something very noble."
Injurious Effect of Misinformation
In connection with this claim of the slave power were the most
shocking misrepresentations of the condition of the slaves
themselves and of the social relations of the Southern people, in
order to array the prejudices of the world against their cause. This
course of misrepresentation had long been pursued before the war,
and was not confined to American writers, but many works appeared
from the British press containing libels upon the society of the
Southern states and false views of slavery as established there. Such
works in both countries were evidently written by persons with
prejudiced minds or who knew little practically of slavery as it
existed in the South. Such was the intolerance of the public
sentiment which had been fostered in both countries upon the
subject, that no candid and impartial account of the workings of
domestic slavery as it existed in the Southern states would be
received with the slightest favor, whilst the exaggerated accounts of
cruelty practiced by the slave-owners, and consequent sufferings of
the slaves were eagerly accepted as the truth.
A very striking evidence of this prejudice was furnished by the
reception given to the works of two female writers not many years
since. The one, Miss Harriet Beecher, later Mrs. Stowe, wrote a
work of fiction called "Uncle Tom's Cabin," containing
misrepresentations of slavery and slanders upon Southern society.
Drawing upon a fertile imagination and pandering to the prejudices
of the uninformed, she published the book, which had a great run in
Europe as well as America, and was translated into almost all of the
continental languages. The incidents contained in the book were
either erroneous in point of fact or greatly exaggerated, but the
book itself was still more untrue as a picture of Southern society and
slavery, and would have been a misrepresentation if every fact
contained in it had been true in isolated cases. But the book was
received as a true and faithful picture of society and slavery in the
South, not merely by the agitators of abolition, but by that very
considerable class of persons in the world who allow others to do
their thinking, and when the authoress visited Great Britain, she was
treated with great attention and extensively feted by the nobility,
gentry and others. The view of Southern slavery which she drew is
perhaps accepted by nine-tenths of the otherwise well informed
persons in Europe.
In remarkable contrast to Miss Beecher's case, was that of Miss
Murray, a lady of talents and refinement, who held the position of
maid of honor to Queen Victoria. Miss Murray visited the United
States as a tourist with all of her predilections against slavery, but
she happened to be one of those persons who, not satisfied with
hearsay report, took the necessary trouble to inform herself
intelligently upon the subject. In the course of her travels, she went
into the Southern states, where she remained for some time as a
guest on some of the plantations. She had the opportunity of
observing the workings of domestic slavery as it actually existed and
in all of its details, and she availed herself of that opportunity to
make her own reflections. In letters to friends at home she gave the
result of her actual observations and upon her return to England was
induced to publish her letters. These letters represented slavery in
the Southern states in a very different light from that in which it was
accustomed to be presented to the British public, and the
consequence was that Miss Murray was notified by the ministry that
it was not desirable that she should longer occupy the relation she
held to the Queen, as the views she expressed in regard to slavery
were not consonant with the policy of the British government; so she
was retired.
This illustrates the difference in the reception of two works on
the subject of slavery given by the British public: one a work of
fiction from a prejudiced writer, the other a matter-of-fact account of
an eye witness of what she undertook to describe.
If British ministers could thus view the subject and be guilty of
the injustice they perpetrated in Miss Murray's case, what could be
expected of the great mass of British readers? It is hard to conceive
how the glory or prosperity of a nation could be advanced by giving
currency to fallacies, or suppressing the truth in regard to the actual
condition of African slavery in the Southern states.
It would seem that as Great Britain had had so much to do with
fostering the institution in those states, it would be rather gratifying,
than otherwise, to its ministers and its people, to know that the
descendants of those who had been ravished from their native
country by the cupidity of their predecessors, were in a contented
and comfortable condition. But such was not then "the policy of the
government" and perhaps the philanthropic disciple of Exeter Hall
who callously passed by the misery, want and immorality at her own
door in the great city of London, while she shed tears over the
imaginary woes pictured by Miss Beecher, would have been as
indignant as the British ministers with Miss Murray for attempting
to disabuse her of the delusion which caused those tears to flow.
Such is, and perhaps ever will be, the character of human
philanthropy, that it troubles itself more about the sufferings which
exist a long way off or only in imagination, than those which are
before its eyes. We weep over the trials of the hero or heroine in a
novel or a play, while we pass the miserable child of want and sin in
the street with perfect indifference. If slavery did not have its evils
and its wrongs, it would not be a human institution, and as long as
"man's inhumanity to man makes countless thousands mourn," so
long will evils and wrongs exist in every relation of human society.
These exist in the relation of governor and governed, parent and
child, husband and wife, master and servant, employer and
employed, neighbor and neighbor, and are not excluded even from
the church.
It is not pretended therefore that some masters did not abuse
their servants, but these were rare instances, more perhaps than in
any other relation of life; and, if for no other reason, the great mass
of masters were induced to treat their slaves well, because it was
their interest to do so. Let any one compare the condition of the
African in his native land, with that of the slaves of the South before
the violent abolition of slavery, and then say whether that institution,
which had produced such a vast improvement in his condition was
so great a wrong after all.[B]
[B] Note—Professor Draper, in his "History of the American Civil War" thus
represents the condition of the negro in his native land. "The Negro in Africa."
"On the west coast of Africa, the true negro-land, the thermometer not
infrequently stands at 120° in the shade. For months together it remains, night
and day, above 80°. The year is divided into the dry and the rainy season; the
latter setting in with an incessant drizzle, continues until May. It culminates in the
most awful thunderstorms and overwhelming rains. This is particularly the case in
the mountains. When the dry season has fairly begun a pestiferous miasm is
engendered from the vast quantities of vegetable matter brought down into the
low lands by torrents. From the fevers thus arising the negroes themselves suffer
severely.
"Moisture and heat, thus so fatal in their consequences to man, give to that
country its amazing vegetable luxuriance. For hundreds of square miles there is an
impenetrable jungle, infested with intolerable swarms of musquitoes. The interior
is magnificently wooded. The mangrove thickets that line the river banks upon the
coast are here replaced by a dark evergreen verdure, interspersed with palms and
aloes. A rank herbage obstructs the course of the streams. The crocodile,
hippopotamus, pelican find here a suitable abode. Monkeys swarm in the woods;
in the more gloomy recesses live the chimpanzee, gorilla and other anthropoid
apes approaching man most closely in stature and habits of life. In the open land
—the prairie of equatorial Africa—game is infrequent; there are a few antelopes
and horned cattle, but no horses. Man—or perhaps more truly woman—is the only
beast of burden.
"Plantains, sweet potatoes, cassava, pumpkins, ground-nuts, Indian corn, the
flesh of the deer, antelope, bear, snake, furnish to the negro, his food. He lives in a
hut constructed of bamboo or flakes of bark, thatched with matting or palm
leaves. His villages are often palisaded. Too lazy, except when severely pressed, to
attend to the labors of the field, he compels his wives to plant the roots or seeds,
and gather the scanty harvest. In hunting and in war, his main occupation, he
relies upon cunning and will follow his prey with surprising agility, crawling like a
snake prone upon the ground. He has little or no idea of property in land; slaves
are his currency; he makes his purchase and pays his debts with them. 'A slave is
a note of hand that may be discounted or pawned. He is a bill of exchange that
carries himself to his destination, and pays a debt bodily. He is a tax that walks
corporeally into the chieftain's treasury.'
"Ferocious in his amours, the African negro has no sentiment of love. The more
wives he possesses the richer he is. If he inclines to traffic, each additional father-
in-law is an additional trading connection; if devoted to war, an ally. His animal
passions too often disdain all such mercenary suggestions; he brings home new
wives for the sake of new gratifications. Fond of ornaments, his prosperity is
displayed in thick bracelets and anklets of iron or brass. An old European hat or a
tattered dress-coat, without any other article of clothing is a sufficient badge of
kingship. He inclines to nocturnal habits. He will spend all the night lolling with his
companions on the ground at a blazing fire, though the thermometer may be at
more than 80°, occupying himself in smoking native tobacco, drinking palm wine
and telling stories about witches and spirits. He is an inveterate gambler, a jester
and a buffoon. He knows nothing of hero-worship; his religion is a worship of
fetiches.
"They are such objects as the fingers and tails of monkeys, human hair, skin,
teeth, bones, old nails, copper chains, claws and skulls of birds, seeds of plants.
He believes that evil spirits walk at the sunset hour by the edge of forests; he
adores the devil, who is thought to haunt burial-grounds and, in mortal terror of
his enmity, leaves food for him in the woods. He welcomes the new moon by
dancing in her shine. Whatever misfortune or sickness befalls him, he imputes to
sorcery and punishes the detected wizard or witch with death. He determines guilt
by the ordeal of fire: the accused who can seize a red-hot copper ring without
being burned is innocent. His medicine-man—a wind raiser and rain-maker—
pursues his main business of exorcism in a head-dress of black feathers, with a
string of spirit-charms around his neck and a basket of snake-bone incantations.
The more advanced tribes have already risen to idol worship; they adore
grotesque figures of the human form, and following the course through which
intelligence in other races has passed, they have wooden gods who can speak and
nod and wink.
"In this deplorable, this benighted condition, the negro nevertheless shows
tokens of a capacity for better things. He is an eager trader, and knows the value
of his ebony, bar-wood, beeswax, palm oil, ivory. He has learned how to cheat;
nay, more, not infrequently he can out-cheat the white man. He can adulterate the
caoutchouc and other products he brings down to the coast and pass them off as
pure. His color secures him from the detection of a blush when he lies. Though
utterly ignorant of any conception of art, he is not unskillful in the manufacture of
cooking pots and tobacco pipes of clay; he has a bellows-forge of his own
invention; he can reduce iron from its ores and manufacture it. He makes shields
of elephants' hide, cross-bows, and other weapons of war. But in the construction
of musical instruments his skill is chiefly displayed. From drums of goat-skins, from
harps and gourds, he extracts their melancholy sounds and disturbs the nocturnal
African forests with his plaintive melodies.
"It has been affirmed by those who have known them well, that the equatorial
negro tribes do not increase but tend to die out spontaneously. This is attributed
to infanticide and to the ravages of miasmic fever, which in its most malignant
form will often destroy its victim in a single day. Even though quinine be taken as
a prophylactic no white man can enter their country with impunity. The night dews
are absolutely mortal."