DIPLOMA IN MONITORING AND EVALUATION
MODULE 3
Module three of the Diploma in Monitoring and Evaluation
CATI
CAPACITY AFRICA TRAINING INSTITUTE
TABLE OF CONTENTS
Choosing questions and planning for Evaluation
Information Gathering and Synthesis
Qualitative and Quantitative Evaluation Design
Selecting appropriate Design
Collecting and Analyzing Data
Collecting and use of Archival Data
Refining Project based on Evaluation Research
MODULE 3 MONITORING AND EVALUATION
Chapter 1
In this Module, we'll discuss the first, and perhaps most important, step in evaluation research: deciding
exactly what to evaluate. Each of the rest of the sections in the chapter will deal in detail with one of the
steps you'll need to take to design, implement, and use the evaluation. The goal of the chapter is to provide
guidelines that are useful to grassroots or community-based organizations as well as students or academic
researchers.
Every evaluation, like any other research, starts with one or more questions. Sometimes, the
questions are simple and easy to answer. (Will we serve something close to the 50 people we
expect to?) Often, however, the questions can be complex and the answers less easy to find.
(Which, or which combination, of the three parts of our intervention will affect which of the two
behavior changes we seek within participants?) The questions you ask will guide not only your
evaluation, but your program as well. By your choice of questions, you're defining what it is you're
trying to change.
For example, what's the real goal of a program to introduce healthier foods in school lunches? It
could be simply to convince children to eat more fruits, vegetables, and whole grains. It could be
to get them to eat less junk food. It could be to encourage weight loss in kids who are overweight
or obese. It could be to educate them about healthy eating, and to persuade them to be more
adventurous eaters.
The evaluation questions you ask both reflect and determine your goals for the program. If you
don't measure weight loss, for instance, then clearly that's not what you're aiming for. If you only
look at an increase in children's consumption of healthy foods, you're ignoring the fact that if they
don't cut down on something else (junk food, for instance), they'll simply gain weight. Is that still
better than not eating the healthy foods? You answer that question by what you choose to examine
- if it is better, you may not care what else the children are eating; if it's not, then you will care.
You choose your evaluation questions by analyzing the community problem or issue you're
addressing, and deciding how you want to affect it. Why do you want to ask this particular
question in relation to your evaluation? What is it about the issue that is the most pressing to
change? What indicators will tell you whether that change is taking place? Is that all you're
concerned with? The answer to each of these and other questions helps to define what it is you're
Academics and other researchers may approach choosing research questions differently from
those involved in community programs. In addition to their practical and social applications,
they may choose problems to research simply because they are interesting, or because they tie
into other work that they or their colleagues are doing. Community service workers and others
directly involved in programs, on the other hand, are concerned specifically with improving what
they're doing so they can help to enhance the quality of life for the participants in their programs,
and often for the community as a whole. Since we assume that most people using this chapter of
the Tool Box are likely to be practitioners in the community, let's look at some of the reasons
If you're running, or about to run, a program to affect a community issue or problem, you
Is there a cause-and-effect relationship (i.e., does one action or condition directly cause
another) between a particular action and a particular change? Usually, you'll be concerned with
this in terms of your program. (Does our smoking-cessation support group help members to
community. (Does a smoking ban in public buildings, bars, and restaurants lead to a decrease in
Will the program that worked in the next town, or the one that we read about in a professional
Some of the same differences between the concerns of researchers and the concerns of
practitioners may hold here. Those interested primarily in research may simply be moved by
curiosity or by the urge to solve a difficult problem. As a practitioner, on the other hand, you'll
want to know the effects of what you're doing on the lives of participants or the community.
• Your experience with an issue and its consequences in a particular population or
community
• The similarity of the issue to other issues in your community, or the issue's interaction with
other issues
Your interest as a community worker has to be considered in relation to your evaluation and the
purpose of your program. Your basic intent is probably to improve things for the population or
the community, but in what ways and by what means? Are you trying out some new things in the
hope of making an already-successful program more successful? Are you importing a promising
practice to see if it works with your population? Are you trying to solve a particularly difficult
professional problem?
A community mediation program found that it was having little success in cases involving
adolescents. After conferring with other similar programs - all of which were struggling with the
same issue - mediators in the program devised a number of strategies to try to reach youth. The
overall question they were concerned with - "Will these strategies make it possible to mediate
successfully where teens are involved?" - was one with real consequences.
SOCIETY?
Media reports about or community attempts to address the issue are clear indicators that it is
socially important. If it affects a particular group - violence in a given neighborhood, a high rate
of heart disease among middle-aged black males - it has an obvious impact on the community
and society. If your program or intervention has the potential to help resolve the issue in other
the importance of your analysis increases even further. If addressing the issue can lead to
All of this affects your evaluation and the questions you ask. If the issue is one of social
importance, then your evaluation of your work is socially important as well. Are you addressing
the aspects of your program or intervention that are of the greatest value to participants, the
The real question here is not whether the issue is important to the field - if it's important to the
community, that's what matters. However, you should explore whether there's evidence from the
field to apply to the issue. Is what you're doing likely to be more effective than other approaches
that have been tried? If your approach isn't effective, are there other approaches out there that hold
more promise? Can the published material about the issue help you understand it better, and give
COMMUNITY?
Consider whether there is evidence that the issue occurs with a variety of populations and under
a range of conditions. Also consider whether the observations or methods used to determine the
issue's existence are accurate and whether they can be used in different situations and with
different groups. Your evaluation may give you valuable information to pass on to practitioners
If evaluation shows that your program or intervention is successful, that's obviously valuable
information, especially if what you're evaluating is innovative and hasn't been tried before. Even
if the evaluation turns up major problems with the intervention, that's still important information
for others - it tells them what won't work, or what barriers have to be overcome in order to make
it work.
Some of those who might use your results include individuals and groups affected by the issue;
service providers and others who have to deal with the problem (in the case of youth violence,
for instance, this last group might include police, school officials, small business owners,
parents, and medical personnel, among others); advocates and community activists; and public
Who has to change in order to address the issue? The focus of the intervention will tell you
Some possibilities:
• Those in direct personal contact with those directly affected: parents, spouses and children,
• Those who serve or otherwise deal with those directly affected: medical professionals,
• Administrators and others who serve or deal with those indirectly affected: hospital or
You know why you're running your program. Evaluating it should just be a matter of deciding
whether things are better when you evaluate than they were before you started, right? Well,
actually...wrong. It's not that simple. First of all, you need to determine what "things" you are
actually looking at (remember the school lunch example?). Second, you will need to consider
how you will determine what you're doing right, and what you need to change. Here's a partial
• It helps you understand what effects different parts of your effort are having. By
framing questions carefully, you can evaluate different parts of your effort. If you add an
element after the start of the program, for instance, you may be able to see its effect
separate from that of the rest of the program...if you focus on examining it. By the same
token, you can look at different possible effects of the program as a whole. (Do adult basic
education learners read more as a result of being in a program? Are they more likely to
• It makes you clearly define what it is you're trying to do. What you decide to evaluate
defines what you hope to accomplish. Choosing evaluation questions at the start of a
program or effort makes clear what you're trying to change, and what you want your results
to be.
• It shows you where you need to make changes. Carefully choosing questions and making
them specific to your real objectives should tell you exactly where the program is doing
well and where the program isn't having the intended effect.
• It highlights unintended consequences. When you find unusual answers to the questions
you choose, it often means that your program has had some effects you didn't expect.
Sometimes these effects are positive - not only did people in the heart-healthy exercise
program gain in fitness, but a majority of them report changing their diet for the better and
losing weight as well - sometimes negative - obese children in a healthy eating program
actually gained weight, even though they were eating a healthier diet - and sometimes
neither. Like the side effects of medication, the unintended consequences of a program can
be as important as the program itself. (In the case of the exercise program,
the changes in diet might do as much as or more than the exercise to maintain heart
health, for instance, and may point toward changing the focus of the program in some
way.)
• It guides your future choices. If you find that your program is particularly successful in
certain ways and not in others, for example, you may decide to emphasize the successful
areas more, or to completely change your approach in the unsuccessful areas. That, in turn,
program, thus making it more likely that it will meet community needs.
• It provides focus for the evaluation and the program. Choosing evaluation questions
carefully keeps you from becoming scattered and trying to do too many things at once,
• It determines what needs to be recorded in order to gather data for evaluation. A clear
choice of evaluation questions makes the actual gathering of data much easier, since it
usually makes obvious what kinds of records must be kept and what areas need to be
examined.
Evaluation questions, since they help shape your work, should be chosen and the evaluation
planned when planning the overall program or effort. That gives you time and room for a
participatory process, and gives you the chance to use the evaluation as an integral part of the
program. As the program unfolds, you might find yourself adjusting or adding questions to reflect
the reality of what is happening, but unless your original questions were misguided (you were
wrong about what behavior had to change in order to produce certain results, for instance), they
Now let's discuss reality for many community-based and grassroots programs. They're often
understaffed and underfunded. Staff members may be underpaid, and may often work many
more hours a week than they're paid for, because of their dedication to social justice and social
change. Most or all program staff may even be volunteers, with full-time jobs and family
responsibilities aside from their work in the program. Initial evaluation in these circumstances is
often anecdotal - i.e., based on participants' comments and stories about their progress and staff
members' personal, informal observations. A formal evaluation will probably wait until there's
funding for it, or until someone has the time to coordinate or take charge of it.
In that case, the "when" becomes "as soon as you can." You may be dealing with a program that
has just started, or with one that's been operating for a long time. You may know that changes
need to be made, or it may seem that the program is in fact meeting its goals. Whatever the
situation, evaluation questions need to be chosen, and an evaluation planned that will give you
the information you need to improve your work. Even with a program that's been going on for a
while, the questions can still help you define or redefine your work, and will certainly help you
EVALUATION?
If you've consulted other sections of the Tool Box concerned with evaluation, you probably
know that we advocate that all stakeholders be involved in planning the evaluation. We believe
that the best evaluation is participatory. That means that there is representation of the views and
knowledge of people affected by the issue to be addressed. The list of potential participants is
essentially the same as that under "Whose problem is it?" in the first part of this section: those
directly affected and their close contacts; those who work with those directly affected, or who
deal directly or indirectly with them and the issue; and public officials. To these groups, we
might add other concerned citizens, and those indirectly affected by the issue. (A shop owner
may not be a victim of neighborhood violence, but fear of that violence might nonetheless keep
Evaluations that involve all stakeholders have a number of advantages over those conducted in a
vacuum by outside evaluators or agency or program staff. They're more likely to reflect the real
needs of the community, and they bring to bear the community's knowledge of its own context -
history, relationships, culture, etc. - without which a program and its evaluation can go astray.
Participation can range from simple consultation before the fact to complete involvement in
every aspect of an evaluation - assessment, planning, data gathering, analysis, and passing on the
information. In general, the greater the involvement of stakeholders, the better, but in-depth
involvement of the stakeholders may not always be possible. There are time disadvantages to
participatory evaluation - it takes longer - and there are logistical concerns, as well. Participants
may have nothing in their backgrounds to prepare them for research, so training in a number of
areas may be necessary, requiring skill, careful planning, and yet more time. The level of
participation your evaluation can sustain, therefore, relies to some extent on your time
Choosing questions
When you choose evaluation questions, you're really choosing a research problem - what you
want to examine with your research. (Evaluation, whether formal or informal, is in fact research.)
You have to analyze the issue and your program, consider various ways they can be looked at,
and choose the one(s) that most nearly tell you what you want to know about what you're doing.
Are you just trying to determine whether you're reaching the right people in sufficient numbers
with your program? Do you want to know how well an intervention is working with specific
populations? What kinds of behavior changes, if any, are taking place as a result? What are the
actual outcomes for the community? Each of these - as well as each of the many other things
you might want to know - implies a different set of evaluation questions. To find the questions
that best suit your evaluation, there is a series of steps you can follow.
A problem is a difference between some ideal condition (all people 10 years of age or older
should be able to read; people should be able to find a decent job) and some actual condition in
the community or society (a 25% illiteracy rate among those attending a particular high school;
50% unemployment among minority youths in a particular city). This may mean the absence of
some positive factor (qualified teachers and adequate educational facilities; entry-level jobs that
are reachable from minority neighborhoods) or the presence of some negative factor (students'
difficulty with English; discrimination against minority job applicants), or some combination of
these.
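The idea of a problem as a gap between an ideal and an actual condition can be made concrete with a little arithmetic. The following is a minimal sketch in Python that uses the two figures from the examples above. The function name `discrepancy` and the choice of 0% as the ideal rate are illustrative assumptions, not part of any standard method.

```python
def discrepancy(ideal_rate: float, actual_rate: float) -> float:
    """Return the gap between actual and ideal, in percentage points."""
    return abs(actual_rate - ideal_rate)

# Actual rates come from the examples in the text.
# The ideal of 0% for each condition is an assumption made for illustration.
conditions = {
    "illiteracy at a particular high school": (0.0, 25.0),
    "unemployment among minority youths": (0.0, 50.0),
}

gaps = {
    name: discrepancy(ideal, actual)
    for name, (ideal, actual) in conditions.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.0f} percentage points from the ideal")
```

A number like this only states the size of the gap. Whether the gap is important still depends on who is affected and how much it matters to them.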
• Describe the ideal condition, including the positive factors present and the negative factors
absent. What would it look like if everything were as you'd want it to be?
• Describe the actual conditions that constitute the problem of interest, including the negative
conditions present and the positive conditions absent. What are conditions really like?
• Describe the actual problem in terms of what you're hoping to change. What positive
factors do you want to produce and/or what negative factors do you want to eliminate?
To be sure that this is a problem you really should be addressing, consider its importance
• Is the discrepancy between ideal and actual conditions of the kind and size to be considered
important?
• Who experiences these consequences (e.g., program participants; their families, friends, and
peers; service providers, policymakers, and others)? How many people are affected?
• How often and for how long are they affected? What is the intensity of the effect?
• How much does the fact that the problem is experienced to this degree by these people
matter to them?
You might also ask whether the effects of the problem matter to society, but in fact, that
shouldn't make a difference. If they matter to the people who experience them, they're
important. Society doesn't always consider a problem important if it's only a problem for a
minority, or for a group that's generally ignored (the poor, the homeless).
In light of these factors, decide whether the problem is important to the evaluation.
Whose behavior, by its presence or absence, contributes to the problem? Are they in the
For each of them, consider the types of behavior that, by their presence or absence, contribute to
WHAT ARE THE CHANCES THAT YOUR EFFORT CAN HAVE ANY EFFECT ON EACH
OF THEM?
Based on the above analysis, choose behavior changes to target in specific people. Where you
can, specify the desired levels of change in targeted behaviors and outcomes (those changes in
writing, basic skills, etc. - for minority job seekers aged 18-24. Or you might instead or
in addition target policy makers, with the goal of having them offer tax incentives to
This is a way of defining your work. If you're planning the evaluation as you plan the program -
as you would in the ideal situation - then the questions you're asking the evaluation to examine
reflect the problems you're trying to solve, and this kind of analysis is important. If you're
starting an evaluation of a program that has been in place for some time, then you're going to
have to do some figuring after the fact about what consequences you think (hope) the program is
having, and what they will lead to. You may be talking about changes in specific participant
behaviors, about behaviors that act as indicators of other changes, or about results of another sort
(participants gaining employment, for instance, which may have a direct relationship to
Make sure that the expected changes would constitute a solution or substantial
If you conclude that they would not result in a substantial contribution, revise your choice of
problem and/or your selection of targeted people and actions as necessary. If you think that what
you're looking at in an evaluation doesn't address the problem, then you should be looking at
something else. If the objectives you've chosen do constitute all or a substantial part of a
SETTING
Now that you've chosen your questions, there may be other factors to consider, such as the
settings in which the evaluation will be conducted. If your program is relatively small and/or has
only one site, this wouldn't be an issue. However, if you don't have the resources - whether
There are some situations in which the choice of setting may be important:
methods
MULTIPLE SITES
Multiple sites can present a challenge for an evaluation, because, although every effort may be made to make
the program at all sites exactly the same, it will seldom be so. If the program relies on human
be differences from site to site depending on the people staffing each. (The exception is when the
same people staff all sites, providing the same services at each site at different times or on
different days.) Even if all are equally competent, no two staff members or teams will do things
in exactly the same way or relate to participants in exactly the same way, and the differences can
be reflected in differences in outcomes. If methods or other factors vary from site to site, that
Furthermore, the physical character of a site can influence not only program effectiveness, but
also the recruitment of participants and whether or not they remain in the program long enough
for it to have some effect (often called "retention"). The site's layout, comfort, apparent safety
and security, and - often most important - how easy it is to get to, all affect whether participants
Where you do have the capacity to evaluate all sites, it will be helpful to build into the evaluation a
method of comparing them. This will allow you to identify and adopt at all sites methods,
conditions, or activities that seem to make one site particularly successful, and to identify and
change at all sites methods, conditions, or activities that seem to create barriers to success at
others.
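One simple way to build a comparison into the evaluation is to record the same measure at every site and rank the sites by it. The sketch below is an illustration only: the `SiteResult` class, the use of completion rate as the measure, and all of the figures are hypothetical and would need to be replaced by whatever your program actually records.

```python
from dataclasses import dataclass

@dataclass
class SiteResult:
    name: str
    enrolled: int    # participants who started at the site
    completed: int   # participants who met the program's completion standard

    @property
    def completion_rate(self) -> float:
        # Guard against a site with no enrollment.
        return self.completed / self.enrolled if self.enrolled else 0.0

def rank_sites(sites):
    """Sort sites from highest to lowest completion rate."""
    return sorted(sites, key=lambda s: s.completion_rate, reverse=True)

# Hypothetical figures, for illustration only.
sites = [
    SiteResult("Site A", enrolled=40, completed=30),
    SiteResult("Site B", enrolled=50, completed=20),
    SiteResult("Site C", enrolled=25, completed=20),
]

ranked = rank_sites(sites)
for site in ranked:
    print(f"{site.name}: {site.completion_rate:.0%} completed")
```

A ranking like this only shows where to look. Finding out which methods, conditions, or activities explain the difference still takes observation and conversation at the sites themselves.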
If you can't evaluate each site separately, you'll have to decide which one(s) will give you the
information that will most help in adjusting and improving your program. If you're most
concerned with assessing your overall effectiveness, this may mean evaluating the site(s) closest
interaction, etc. If, on the other hand, your chief consideration is learning whether a particular
new or unusual method or situation is working, you may find yourself evaluating the site(s) least
If sites appear only minimally different, some other considerations that may come into play
are:
• The number and character of participants at the site. Participants at a particular site may be
experiencing the effects of the issue more severely, or may have a particular important
• The ability and willingness of participants and staff to support the evaluation research. If
staff at a particular site are unable or unwilling to record observations, attendance, and
• The stability of the population at the site. If participants at a site come and go at a rapid rate
- unless that's the program's intent - it can be difficult to gain information that contributes to
an accurate evaluation.
• An exception, of course, occurs here if one point of the evaluation is to find out why
participants stay for so short a time, and to try to develop methods or create conditions to
assist them to remain in the program long enough to reach their goals.
Sites with different methods, conditions, activities, or services
Programs sometimes are organized so that different methods are used or different services
provided at different sites. In other cases, conditions may vary from site to site because of the sites'
geographical locations or the available space. The ideal situation is to evaluate all sites and
compare the effects of the different methods, conditions, or services. When that's not possible,
If the methods, services, or conditions at a particular site are new or innovative, you may want to
evaluate them, rather than those that have a track record. There may be a particular method or
service that you want to evaluate, in which case the decision about which site to choose is
obvious. The decision should be based on what makes the most sense for your program, and
what will give you the best information to improve its effectiveness.
When you have the capacity to choose more than one site to evaluate, it often makes sense to
choose two or three sites that are different - especially if each is representative of other sites in
the program or of program initiatives - so that you can compare their effectiveness. Even where
sites are essentially similar, you'll get more information by evaluating as many as you can.
PARTICIPANTS
Another factor to consider is the participants whose behavior, activity, or circumstances will be
evaluated. If your program is relatively small, this might not be an issue - the participants will
simply be all those in the program. However, if you don't have the resources - whether finances,
time, or personnel - to evaluate the whole program, there are some situations in which the choice
• If your program includes different groups of participants (groups that are in different
Multiple groups
There are a number of reasons why there might be multiple groups of participants in a
program. You might start different groups at different times, either because the program has a
rolling start schedule (when there are enough people for a class/training group, one will begin),
or because the program is aimed at different groups (for example, 5-year-olds, 8-year-olds, and
14-year-olds). You might also be trying different strategies with different groups.
The Brookline Early Education Project (BEEP), a program aimed at school readiness for children
aged pre-birth through 5, recruited pregnant families in three cohorts over the course of three
years. In addition, families in each cohort were assigned to one of three levels of service. Thus,
there were actually nine different groups among BEEP participants, even though, by the third year,
Once again, if there's no problem in evaluating the whole program, participants will simply include
Evaluate your work with only one group, with the expectation that work with the others will be
evaluated in the future. In this case, you'd probably want to choose the one for whom you
consider the program most crucial. They might be at greater risk (of heart attack, of school
failure, of homelessness, etc.) or might be experiencing the issue at a high level of intensity
(daily shooting incidents in the neighborhood, high rates of teen pregnancy, massive
unemployment).
Include a small number (2-4) of groups in your evaluation. You might want to choose groups
strategies). On the other hand, depending on the focus of your evaluation, you might want
groups that are essentially similar, to see whether your work is consistent in its effects.
Choose a few participants from each group to focus your evaluation on. While this won't give
you a complete picture, it should give you enough information to tell where your program is
accomplishing its goals and where it needs improvement. The differences in the ways
participants in different groups respond to the program (assuming there are differences) can also
Cultural factors can have an enormous effect on participants' responses to a program. They can
govern conceptions of social roles, family responsibilities, acceptable and unacceptable behavior,
attitudes toward authority (and who constitutes authority), allowable topics of conversation,
morality, the role of religion - the list goes on and on. In planning a program that involves
members of different populations and cultures, you essentially have three choices:
• Plan your program and implement it in the same way for everyone. If the program involves
population group but by when they sign up, what time of day they can attend, what they
• Plan your program to be as culturally sensitive as possible, and try to screen out anything
that might be offensive to or difficult for any group. In this instance, you might be prepared
• Divide participants by cultural group and plan different culturally sensitive approaches for
each. Your overall approach might be the same for everyone, but the way you apply it
In any of these instances, it would probably be important to understand how well your approach
is working with members of the various populations. If you can evaluate the whole program,
make sure that you include enough members of each group so that you can compare results (and
their opinions of the program) among them. If your evaluation possibilities are limited, then your
choices are similar to those for multiple groups of other kinds, and will depend on what exactly is
There are interactions between the choice of sites and the choice of participants here. You may
be concerned about the effects of your program on a particular population, which may be largely
concentrated at one site. In that case, if you have limited resources, you may want to evaluate
Regardless of other considerations, you may want to set some guidelines about whom you
include in the evaluation. How long do people have to be in the program, for instance, before
they're included? In other words, what constitutes participation? (This also sets a criterion for
who should be counted as a drop-out: anyone who starts, but leaves before meeting the standard
for participation.) What about those whose attendance is spotty - a few days here, a few days
there, sometimes with weeks in between? Do they have to have attended a certain number of
These issues can be more complex than they seem. People may start and drop out of a program
numerous times, and then finally come back and complete it. Many others start programs
numerous times, and never complete them. It's usually impossible to tell the difference until
someone actually gets to the point of completion, whatever that means for the particular
program.
stay in a program right up till the end and then drop out. This may have to do with the fear of
having to cope with success and a change in self-image, or it may simply be a pattern the person
has learned to follow, and will have to unlearn before being able to complete the program.
Should any or all of these people be included in or excluded from an evaluation, either before
(because of their history in the program) or after the fact? That's a decision you'll have to make,
based on what their inclusion or exclusion will tell you. Just be sure that your evaluation clearly
describes the criteria that you decide to use for your participants.
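Writing the inclusion criteria down as an explicit rule makes them easy to apply consistently and to report. The sketch below shows one possible rule. The threshold of 8 sessions, the status labels, and the attendance records are all hypothetical, and your own standard for participation may be defined quite differently.

```python
MIN_SESSIONS = 8  # hypothetical participation standard; set your own

def classify(sessions_attended: int, still_enrolled: bool,
             min_sessions: int = MIN_SESSIONS) -> str:
    """Label a person's status under one explicit participation rule."""
    if sessions_attended >= min_sessions:
        return "participant"      # met the standard, so included in the evaluation
    if still_enrolled:
        return "not yet counted"  # below the standard, but has not left
    return "drop-out"             # started, but left before meeting the standard

# Hypothetical records: (sessions attended, still enrolled?)
records = {
    "P1": (12, False),
    "P2": (3, True),
    "P3": (3, False),
}

statuses = {
    pid: classify(sessions, enrolled)
    for pid, (sessions, enrolled) in records.items()
}
print(statuses)
```

A rule this simple does not capture people who leave and return several times before completing. If that pattern is common in your program, the criteria need to say how those people are counted.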
RESEARCHER
Up to this point, we've largely ignored the evaluation difficulties faced by evaluators not directly
connected with the organization or institution running the program they're evaluating. If you've
been hired or designated by the organization or a funder to evaluate the program, you have to
establish trust, both with the organization and its staff and with participants, if you hope to get
accurate information to work with. You also have to learn enough in a short period about the
community, the organization, the program, and the participants to devise a good evaluation plan,
If you're an independent researcher - a graduate student, an academic, or a journalist - you face
even greater obstacles. First, you have to find a place to conduct your research - a program to
evaluate - that fits in with your research interests. Then, you have to convince the organization
running that program to allow you to do the research. Once you've jumped that hurdle, you're
still faced with all the same tasks as an outside evaluator: establishing trust, understanding the
context, etc.
Let's look first at the process you as an independent researcher might follow in order to choose and
gain access to a setting appropriate to your interests. Once you've gained that access, you've
become an outside evaluator, so from that point on, the course of preparing for the evaluation will
CHOOSE A SETTING
If you're an academic or student, you can probably find an appropriate program by asking
colleagues, professors, and other researchers at your institution. If none of them knows of one
offhand, someone can almost certainly put you in touch with human service agencies and
others who will. Other possible sources of information include the Internet, funders, professional
associations, health and human service coalitions, and community organizations. Public funding
information is often available on the web, in libraries, or in newspaper archives. The wider you
spread your net, the more likely you are to find the program you're looking for.
The right program will obviously vary depending on your research interests, but some
• Does the setting include people who are actually experiencing the problem that is of
interest to you?
• Is the setting similar to others of this type? (If not, its program might not be useful to
others dealing with the issue, even if it works well in its own context.)
• Does the setting provide support for the research? Will staff, participants, and others help
with data gathering, be forthcoming about context questions, and cooperate with you?
• Does the setting have the resources to maintain the program after your evaluation is done?
• Does the setting permit the changes in operation required by the research? If the planning
of the evaluation and choosing of questions point to doing things differently, can and will
public transportation or on foot from the areas from which participants are drawn, and
insecure about his educational background, for example.) Accessibility can be the
determining factor in whether participants consider a program, or whether they stay in it.
• Is the setting stable? Are the program and organization stable enough that you know they'll
be able to support their work at the current level, at least until the evaluation is completed?
Once you've found an appropriate setting, you'll have to convince the organization to collaborate
with you on an evaluation. The next three steps are directed toward that goal.
Just as you wouldn't go to a job interview without doing some research about the employer, you
shouldn't try to gain the cooperation of an organization without knowing something about it - its
mission, its goals, whom it serves, who the director and board members are, etc. If someone told
you about the organization, she may have, or may know someone who has, much of the
information you need. If the organization maintains a website, much of that information will be
available there. If it's incorporated, the office of the Secretary of State in the state of incorporation and/or
other state offices will have information about the officers (i.e., the Board of Directors) and other
aspects of the organization. Funding agencies may also have information that's a matter of public
Find out whom (by name as well as position) you should talk to about conducting a
Depending on the organization, this could be the board president, the executive director, or the
program director (if the program you're interested in is only part of a larger organization). In any
case, it might be wise to involve the program director even if he's not the final decision-maker,
since his cooperation will be crucial for the completion of your research.
• If you can, get a personal introduction. It's always best if you come recommended by
• If you can't get a personal introduction, it's usually best to send a letter requesting a meeting
• Before the meeting, send a proposal outlining what you want to do. This should be
substantive enough to help the organization decide whether it wants to work with you, but
not so specific that it doesn't allow for collaborative planning of the evaluation.
There are several purposes for this meeting, besides the ultimate one of getting permission
and support for your project (or at least an agreement to continue to discuss the
• Establishing your credentials - the experience, educational background, and any other
factors that equip you to conduct this evaluation. This might include references from
• Explaining what you want to do and why, what form the evaluation results are likely to
take, what you'll do with them, who'll have access, etc. This explanation should also cover
• Explaining what you need from the organization and/or program - participation of
participants and staff, for instance, any logistical support, access to records, or access to
program activities
• Explaining what you're offering in return - your services for a comprehensive formal
evaluation, any stipends, equipment or materials, other support services, or whatever else
• Clarifying the organization's needs, and discussing how they fit with your own - and how
Assuming that your presentation has been convincing, and you're now the program evaluator, the
rest of the steps here apply to both independent researchers and outside evaluators.
This may play out differently for outside evaluators than it does for independent researchers, but
it's equally important for both. It means finding out all you can about the community, the
organization, the program, and the participants beforehand - the social structure of the
community and where participants fit in it, the history of the issue in question, how the
organization is viewed, relationships among groups and individuals, community politics, etc.
If you're an outside evaluator, you can pick the brains of program administrators, staff, and
participants about the community, the organization, and the issue. Ask them to steer you to
others - community leaders, officials, longtime residents, clergy, and trusted members of
particular groups - who can give you their perspectives as well. If possible, get to know the
community physically: walk and/or drive around it, visit businesses, parks, restaurants, the
library. Understanding how the issue plays out in the community, the nature of relationships
among groups and individuals, and what life is like in the neighborhoods where participants live
If you're an independent researcher, learn as much about the context as you can before you
contact the program. Websites (for the organization and/or the community) and libraries are two
possible sources of information, as are community and organization literature and people who
know the community. Learning about the community, the organization, and the participants
beforehand will both help you determine whether this program fits with your research and help
you advocate for its cooperation with your project. Once you have that cooperation, you can
follow the same path as an outside evaluator (since that's what you are) to learn as much about the
PARTICIPANTS
This can be the most difficult part of an evaluation for someone from outside the organization.
There's no magic bullet or predictable timeline, but there are several things you can do:
• Be yourself. Don't feel you have to act a certain way: deal with people in the program as
you do with friends and acquaintances in other circumstances. People can tell when you're
• Don't assume you know more than anyone else just because you're the professional.
• Share freely what you do know, but don't tie yourself to any one process or method,
• Ask administrators, staff, and participants what they want from the evaluation, and discuss
• Don't be afraid to say "I don't know, but I'll find out," and then do.
• Follow through on whatever you say you'll do. Don't promise anything you can't deliver
We've discussed above the involvement of all stakeholders to the extent possible. Involving
participants, program staff, and other stakeholders in participatory planning and research can
often get you the most accurate data, and may give you entry to people and places you normally
might not have. On the other hand, participatory planning and research, as we've explained, takes
time and energy. If you have limited time, you may not be able to set up a fully participatory
project. You can, however, still consult with stakeholders, and involve them in ways that don't
necessarily involve training or large amounts of your time. They can help you line up interviews
with participants or other important informants, for instance, and/or act as informants themselves
At least the people in charge of the program, and probably those implementing it as well, will
expect to be part of the planning of the evaluation. They are, after all, the ones who need to know
whether their work is effective, and how to improve it. Involving participants as well, in roles
ranging from informants about context to actual researchers, is likely to enrich the quantity and
Plan the evaluation, in collaboration with stakeholders
That collaboration should be at the highest level of participation possible, given the nature of the
program, the time available, and the capacity of those involved (if program participants are
five-year-olds, they probably have relatively little to contribute to evaluation planning...but their
The actual planning involves ten different areas, each of which will be the subject of one of
Once the planning is done, it's time to get started on conducting the evaluation. And when you're
finished - having analyzed the information and planned and made the changes that were needed -
it's time to start the process again, so that you can determine whether those changes had the
effects you intended. Evaluation, like so much of community work, is a process that goes on as
long as the work itself does. It's absolutely essential to the continued improvement of your
program.
IN SUMMARY
Choosing evaluation questions - the areas in your work you'll examine as part of your evaluation
of your program - is key to defining exactly what it is you're trying to accomplish. For that
reason, those questions should be chosen carefully as part of the planning process for the
program itself, so that the questions can guide your work as well as your evaluation of it. The
more stakeholders can be involved in that choice and planning, the more likely you are to
create a program that successfully meets its goals in serving the community.
Choosing those questions well entails understanding the context of the program - the community,
participants, the culture of any groups involved, the history of the issue and of the social
structure of the community and the organization - and (if you're an outside evaluator without ties
to the program) establishing trust with administrators, staff members, and participants. That trust
will enable you to conduct a participatory evaluation that draws on the knowledge and talents of
all stakeholders, and to plan an evaluation that fits the goals of the program and accurately
analyzes its strengths and weaknesses. With that analysis in hand, you'll be able to make changes
to improve the program. Then you're ready to start the whole process again, so you can evaluate
Chapter 2
Suppose you wanted to design a house that used very little energy, took few resources to build
and maintain, and was affordable for most families. You might have some original ideas about
how this could be done, but you’d want to find out what ideas others had as well. You’d
probably read about earth-bermed houses (houses that are built into a hillside or earth mound),
solar panels or windmills for producing electricity, efficient insulating windows, waste-water
recycling, and non-toxic building materials that reuse waste wood and metal. You’d talk to
people who built or owned energy-efficient houses, to hear about the realities of living green.
You’d learn about the barriers to some environmentally-friendly strategies, as well as ways to get
around those barriers. There’s a huge amount of information out there, and it would make sense
to gather as much of it as possible, so that you could put together the information, incorporate
appropriate elements into your design, and get new ideas based on what’s already been done.
The same is true if you’re designing an intervention or program to deal with a community health
or other issue, or an evaluation of that program. Others have also undoubtedly tried to address
that issue, some with success and some without. Knowing what they did, how they did it, and
what the results were can help you decide how to design your effort. You might be able to find a
method here and a technique there that fit together into exactly the program that will suit
the people and conditions in your community. Or you might realize that something you’d
intended to do simply hasn’t worked in a number of other instances, and so wouldn’t be likely to
Gathering and using others’ ideas doesn’t mean that you can’t use your own or come up with
something new. New ideas tend to come out of what others have attempted. Most artists start out
imitating others before they develop their own styles. Einstein didn’t just chance on relativity; he
built on work that others had already done in physics. You can usually innovate more effectively
This section looks at gathering all the information you can about your community issue and
about attempts to address it, and putting that information together to design an evaluation to
address your questions. Although this chapter is about evaluation, much of the material in these
sections applies to planning both the intervention (or program) and the evaluation: the two really can’t
be separated.
An evaluation is a research project: we are trying to discover what works and under what
conditions. The steps for designing and using an evaluation – the subject of this chapter – are
essentially the same as those for designing the program you’re evaluating. The elements that you
borrow from others’ successful efforts, and those that you create yourself, will give you an
intervention and related evaluation questions. Although this section talks about program design, it
Information gathering refers to gathering information about the issue you’re facing and the ways
other organizations and communities have addressed it. The more information you have about the
issue itself and the ways it has been approached, the more likely you are to be able to devise an
There are obviously many sources of information, and they vary depending on what you’re looking
for. In general, you can consult existing sources or look at “natural examples,” examples of actual
programs and interventions that have addressed the issue. We’ll touch on where to find both here,
and then go into more detail about them later in the section.
• Existing sources. This term refers to published material of various kinds that might shed
light either on the issue or on attempts to deal with it. These can be conveniently divided
into scholarly publications, aimed primarily at researchers and the academic community;
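mass-market sources, written in a popular style and aimed at the general public; and
statistical information, much of it published by government agencies.
• Natural examples. These are actual programs and interventions in organizations and
communities that have addressed your issue. Studying them can tell you what worked for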
them and what didn’t, and why. By giving you insight into how issues play out in your or
other communities, they can provide nuts-and-bolts ideas about how to (or how not to)
conduct a successful program or intervention. For the most part, information sources here
are the people who are involved in efforts to address issues similar to yours, or those who
can steer you to them. Additionally, there are a number of natural examples (such as single
case studies) that have been written about descriptively in the literature of community
Synthesis is from the Greek; it means putting together. Its English meaning is the same: the
putting together of something out of two or more different sources. Synthetic fabrics, for
instance, are called that because they’re constructed from a number of different chemical
building blocks.
In this section, we’re talking about ideas. Synthesis here refers to analyzing what you’ve learned
from your information gathering, and constructing a coherent program or approach by taking
ideas from a number of sources and putting them together to create something new that meets the
Synthesizing in this way requires identifying the functional elements of each idea or program
that you’ve looked at that seems to hold lessons for your work. Functional elements are the core
components of each program – the methods, framework, activities, techniques, and other aspects
– that make up the specific program you’re examining. Once you’ve separated these parts out,
you can put those that meet your needs together with what you’ve learned about the issue and
your own ideas to build a program that speaks specifically to your situation.
As we’ve mentioned, the activities of information gathering and synthesis are needed
both to create the original program and to develop an evaluation of it that will help you
maintain and improve it. The two really start in the same place, with what you think
will address the issue – what shape the program or intervention should take, with whom
it should be applied, and what behaviors or conditions it aims to change. This also
informs what its short- and long-term goals should be, and by what means you’ll try to
achieve those goals. Once these are determined, they in turn determine your evaluation
questions. You can’t construct an evaluation without knowing exactly what you’re
trying to evaluate.
WHY GATHER AND SYNTHESIZE INFORMATION?
If you’re in the process of starting a program to address a community issue, such as violence or
early childhood education, you probably know quite a bit about that issue already. You’ve dealt
with it, perhaps, in a variety of ways, and you have some pretty good ideas about what kind of
program would work. Why take the time and trouble, for you and for others engaged in a
participatory planning effort, to read a lot of material written by others and to track down people
who’ve run programs? If you’re inclined to think this way, there are a lot of good reasons why
you should think again. Gathering information beforehand and putting together what you’ve
learned could be the most important things you do to make your program effective. Here’s why:
• It will help you avoid reinventing the wheel. A lot of different organizations have likely
approached this issue before you. Some might have been successful and some might not
have, but all of them have probably learned something that would be useful to you in the
process. You don’t have to make the same mistakes someone else did if you know about
them, and you don’t have to make up something from scratch that may or may not work,
It’s certainly not a bad thing if you have some of the same good ideas that others have had, but it
helps to know that they are good ideas. And there’s a chance that you might have some of the same
bad ideas others have had, in which case it helps even more to know that they’re bad ideas. It will
save you a huge amount of trouble, and perhaps be the difference between creating a program that
does its job well and one that fails miserably and disappears. Square wheels don’t roll – someone
• It will help you to gain a deep understanding of the issue so that you can address it
properly. The first step in figuring out how to deal with an issue is to know what you’re
dealing with. The better you understand it – its causes, how it occurs, how people react
when they’re affected by it, what its consequences are for individuals and the community,
and who can influence it – the more likely it is that you’ll be able to determine how to
approach it.
• You need all the tools possible to create the best program you can. Foremost among the
tools you need to plan and implement a program or intervention are information,
information, and information. Just as with the issue itself, the more you know about what
works for whom, how to make things happen, and how to establish or eliminate certain
conditions, the more likely that you’ll be able to plan a successful program that addresses
all aspects of the issue and leaves nothing to chance. Various kinds of professional and
interpersonal skills may help you implement a program, but if what you’re implementing
• It’s likely that most solutions aren’t one size fits all. The more information you gather,
the greater the variety of approaches, methods, and frameworks you’ll have to choose from.
Putting together the right combination will help you to successfully address the particular
• It can help you to be culturally sensitive. Not only can you learn more about the
culture(s) of the people you’re working with, but you can probably find a number of
approaches that have worked with the cultural group you hope will benefit. Perhaps even
more important, you can learn to avoid costly mistakes that may take a lot of time and
• Knowing what’s been done in a variety of other circumstances and understanding the
issue from a number of different viewpoints may give you new insights and new ideas
for your program. As we discussed at the beginning of this section, new ideas seldom
spring from nowhere. They’re stimulated by your own experience and the ideas and
experience – good and bad, positive and negative – of others. Look to the experience of
other fields, communities, and countries. The more different ideas you’re exposed to, and
the more ways you can put them together, the greater chance there is that you’ll come up
with something new that’s more effective than what’s gone before.
Information gathering and synthesis is crucial to the success of the program and to the relevance
and effectiveness of the evaluation. It should start at the beginning of any effort, and contribute
to the initial planning. It should also go on throughout the life of the program, so that you can
Major adjustments should generally come at the end of an evaluation cycle, when you have solid
information about what worked and what didn’t. That doesn’t mean that you can’t make smaller
adjustments in the course of the program to improve results along the way.
There’s a tension here between continually changing a program to make it better and obtaining
accurate evaluation results. If you change a method or activity in midstream, your evaluation will
How much changing you do in the course of a program depends on your intent. If your first
responsibility is to find out what works best, so you can pass it on, then it’s important not to make
changes until an evaluation has been completed. If your primary responsibility is to the current
participants in the program, then you should make whatever changes are necessary whenever
There can be ethical issues involved here. In medical experiments with new therapies or drugs,
for example, some participants are given the new treatment and others aren’t (all participants
consent to this arrangement, and to not knowing which group they’ll be assigned to). If the new
treatment proves to be harmful, there is an ethical obligation for the researchers to stop
administering it. If, on the other hand, it quickly proves remarkably effective, researchers usually
feel ethically bound to extend it to others in the study as soon as they can prove its positive
effects. Not all programs necessarily pose ethical problems that are as clear-cut as those
The assumption throughout this chapter is that the whole process – planning, design,
• Others affected by the program – police, medical staff, teachers, etc.
• Local officials
• Community activists
INFORMATION GATHERING
determined by the skills and experience of the participants. If there are academics or other
professional researchers involved, it would probably make the most sense for them – or others
with research experience – to review the evaluation literature. Members of the affected
population might be the best ones to collect information about the history of the issue in the
community, and about how it currently affects people. Program directors and staff would
probably have the best contacts in the field, and thus the best chance to find information about
other similar programs. Those with Internet access and computer experience might be the logical
on-line searchers, or might act as technical support for others to help them find what they’re
looking for. Those with knowledge of law and legislation might be the ones to examine
policies.
There’s also the possibility that training could be provided to the whole group, or to particular
individuals, to allow them to pursue various lines of inquiry. There’s no reason, for instance, that
people without research experience couldn’t learn to understand and interpret demographic
information or contact programs in other places. (There are some limitations here: levels of
related education, materials or computers, and/or inability to connect with other people might all
SYNTHESIS
It is especially important that all participants in the process be involved in putting together the
information. Training new participants to synthesize information will pay dividends in the end,
because they may be able to see things in the information that aren’t obvious to experienced
researchers. They may know things about the community that shed light on which elements of
In any case, information gathering and synthesis, like any other part of the process, should reflect
There are a number of steps to gathering and putting together the information you need. Most of
these can be group activities, part of the participatory process. The actual information gathering
Not surprisingly, the first step in gathering information is determining what information to
• Details about the issue. These might include its immediate and root causes; its general
different stages; its history; and the history of attempts to address it.
• How the issue has been dealt with elsewhere. Best practices or approaches for which there
is an evidence base; other approaches that have been at least partially effective; and what
hasn’t worked, which may give you at least as much important information as what has.
• People who can help. This category encompasses experts in the field and people or
organizations that have run or been involved in successful attempts to address the issue.
• Who is affected locally, and how? This really comprises two questions: a) what population
groups – geographical, ethnic, cultural, racial, class, etc. – are particularly affected by this
issue? And b) what other groups are affected, but less visibly? These might include those
who work with the first group(s) in the community (teachers, for example, or social
workers), those who depend on them, and those on whom they depend.
• The importance of the issue to the community. Again, this implies a double question:
o How important does the community perceive the issue to be? and
o How much and in what ways does the issue actually affect the community as a
whole?
• Community needs related to the issue. What has to be added to or removed from the
community in order to improve the situation? What kinds of approaches will the
• Other context information. Community history, relationships among groups and
• Who, if anyone, has some influence or control over changing the situation? Public officials
and other policymakers are often in this position. Business leaders, landlords, government
enforcement agencies, schools, employers, hospitals and health personnel, and members of
the affected group itself might also be in the position to change the situation (by learning
As mentioned above, these encompass existing (i.e., published) sources and natural (i.e.,
experiential) examples. Published sources can be divided into scholarly, mass-market, and
statistical, each of which can provide different information and a different perspective on the
issue and attempts to address it. Depending on what you decide you’re looking for, you might
The single largest storehouse of information available is the Internet. Many scholarly articles are
published online and accessible – often free, sometimes for a fee – to anyone who’s interested.
Virtually all U.S. laws and regulations at every level of government are easily found, most on
several websites. General knowledge on just about anything is widely available, as are lists of
best practices and successful organizations and the websites of those organizations. Census data
and other similar statistical information are also on view. Add to these the information provided
by such all-encompassing sources as Wikipedia (recently, for all its quirks, found to be just about
as accurate across its million-plus entries as the Encyclopedia Britannica), and you have a nearly-
As always, you have to be cautious: most of cyberspace is unedited, and the quality of
information varies. If you stick to reasonably reliable sites, you’re likely to find almost whatever
EXISTING SOURCES
• Doctoral dissertations - these are accessible to researchers through university libraries and
• Papers and reports delivered at academic and professional conferences - these are often
Scientific American
• Newspaper archives
• Direct contact with academics and other researchers who’ve done work on the issue you’re
• Internet listservs and newsgroups relating to the issue or the field in question
• Widely available books, often marketed as “self-help” or “life-changing,” to the public at
large
• Articles in popular magazines, both those devoted to science or behavior and those of
general interest
• Community reports, such as community report cards, self-studies, and needs assessments,
all of which should be obtainable through the appropriate municipal offices, and sometimes
• Organizational and agency data, usually a matter of public record if the agency is public or
publicly funded
In addition to these sources, the broadcast media often present stories about critical issues or
about successful efforts to address them. In most cases, such stories only skim the surface, since
they have to fit into short time slots (public broadcasting, on both radio and TV, breaks this mold
more than other media outlets). They can, however, serve as introductions to further research,
raising the importance of one or more aspects of an issue, or providing information about
Natural examples
• Program directors
• Funders (particularly public agencies, because their transactions, including whom they
• People who work in collaboration with programs – police, medical staff, teachers, etc.
• Experts – some of them the same academics and other researchers referenced under
scholarly sources – who have experience with your issue and efforts to address it
Don’t be afraid to range far and wide in your search for successful models or new ideas. Step
outside your own field and your own region, and see what’s been done elsewhere. A model from
social work or urban design might work in public health, or vice-versa. There’s enough overlap
among fields that deal with human health and development that you can often find exactly what
you need in seemingly odd places.
• Who will gather what information? As we’ve discussed, the ideal group is multi-sectoral
and diverse in backgrounds and skills. Information gathering should be assigned according
to participants’ skills, interests, and contacts in the community. We’ve suggested, for
approaching key informants in the community. This doesn’t mean, however, that in a given
group, these and other apparently logical roles couldn’t or shouldn’t be varied, depending
• How will the information be gathered? Another issue is just how the information will be
gathered. Finding and reading written material is relatively straightforward: it’s in the
library or on the web, and you can read and take notes on the relevant parts of it. Getting
information directly from other people, however, can be more complicated. Will you
public meetings? How will you contact people you don’t know – by letter, by phone,
how much time you have, exactly what information you need, the depth of the information
• What adjustments will be made for particular gaps in experience or skills? People who
don’t read, write, and/or speak the language proficiently may have to devise imaginative
writing for just about everyone in the group who isn’t an academic. In many cases, most or
all of the group may need orientation or training before information gathering can begin.
You’ll need to work out what the needs are as a group, and devise ways to meet them.
• What’s the timeline for information gathering? While information gathering should
continue throughout the life of a project, the initial phase should have a time limit, so that
action isn’t delayed for too long. The time limit depends on your time constraints, the
seriousness and intensity of the issue, the community’s perception of urgency, and whether
there are external time restrictions (student interns who are only available until the end of
the summer, for instance). Having a clear deadline will focus the group’s activities, and
COLLECT INFORMATION
When your plan is completed, it’s time to put it into practice. You’ll have to conduct any training
that is necessary, and make sure that all the relevant tasks are assigned appropriately. You may
also want to set up regular meetings throughout the information-gathering process, in order to
give the group the chance to review progress, make suggestions, and report on what they’ve been
finding. In addition to providing support for those new to research, these meetings, by providing
a preview of the results of the process, will save everyone having to digest an overwhelming
The process of synthesis involves breaking the information down into its component parts,
sifting through those parts to see which fit together best for your situation, and then integrating
• What’s known about the issue itself? What personal and environmental factors contribute
to the problem? What are its root causes? Do you have the resources to address them, or
are they beyond your scope (e.g., global economic forces or climate change)? Does the
issue have a number of different effects, and if so, what are they? What are the likely
consequences for the community as a whole if the issue is not resolved? (An
environmental health risk can not only kill or sicken individuals, but might also affect
business productivity, insurance availability and rates, hospital costs, the housing market,
or even – as in the case of the Love Canal neighborhood in Niagara Falls, NY – the
• The community context of the issue. What are the specific local effects of the issue?
Exactly who is affected? Exactly how are they affected? What are the consequences for
those individuals? For their families, friends, neighbors, and others they have dealings
with? For the community as a whole? What has been the community’s experience with
this issue in the past? How, if at all, has it been addressed? What local conditions would
change if the issue were addressed, and how would they change? Are there underlying
conditions that have to change before the issue can be addressed? Whose attitudes and/or
behaviors need to change to have an effect on the issue (for example, among policy
• Successful and unsuccessful attempts to address the issue. These may have been gleaned
both from the literature on best practices, and directly or at second hand from those
involved in them. Here, it’s important to separate out the elements of various approaches.
What specific procedures – methods and intervention components – were used? What
kinds of training – feedback, role play, modeling, etc. – were provided to participants?
Was information provided to participants about when, why, and how to act? Were there
positive or negative consequences that helped to establish or maintain change (or its
removed? What was the overall philosophy behind the approach? What aspects of the
issue did it address? What kind(s) of community was it tried in? What population groups
(in terms of culture, age, social class, etc.) were involved? Who was the approach to
benefit? What were the specific results in the short term? In the long term? What makes
participant characteristics, and broader environmental factors – were critical? Is there a
The existence of a model unsuccessful program doesn’t indicate that if you do the opposite of
everything that program did, you’ll be successful. Even if it failed spectacularly, much of the
program may have been potentially effective, but one or two elements – the way participants
were approached, recruited, or treated, a particular method – negated what could have worked.
By the same token, most elements of the program may have been fine, but its basic premise
might have been mistaken or ineffective – “Just say no” as a way of preventing AIDS among
teens, for instance. It’s important to try to figure out why the program was unsuccessful. A true
model unsuccessful program is one that did everything wrong, but those are few and far between.
Lisbeth Schorr (Common Purpose) makes a useful distinction between “what works” and
the conditions under which what works actually works. Sometimes the presence of a charismatic
leader or champion motivates staff and/or participants to succeed. When the leader or original
staff members leave, some such programs collapse, while others are able to renew themselves by
In looking for programs to draw from, you need to understand the intervention components and
elements that make those programs work. Also, try to understand the conditions that allow an
intervention to be successful.
Analyze the elements you’ve found to determine which of them would be appropriate for
• What has been used specifically with your population in your circumstances? Have the
successful programs you’ve looked at been context-specific (i.e., intended for their specific
communities and populations)? Can they be adapted to your context if they weren’t
• What can be adapted, if it wasn’t originally aimed at your population? (Techniques used
with children or adolescents that could be modified for use with adults, for instance, or
vice-versa.)
• What’s missing? What aspects of the issue in your community are not addressed by what
you’ve found? Are they important enough that they need to be addressed?
• Does what you’ve found confirm or contradict what you thought you already knew?
• Are there factors in your particular situation that make the issue substantially different for
you and your participants than for any other programs or approaches you’ve found out
• What does your information tell you about the possibility of successfully addressing the
issue’s root causes (e.g., income inequality, social exclusion, lack of power)?
• In general, did most or all successful programs direct their change efforts at the same group
of people (policy makers, for example), or was there a variety? If the latter, what do you
• Perhaps most important, what’s your definition of success, and which of the programs you
learned about came closest to achieving it? What components and elements of those
Answering these questions will give you a good sense of which components of other programs
may work for you. Combined with what you already know, the answers can give you ideas for new
elements to add, or confirm (or warn you away from) ideas that you
already had.
We don’t want to imply that simply taking a lot of different program components and playing
mix-and-match will provide you with an effective way to address a community issue. You
have to start with a clear framework informed by your vision and mission, and put together a
program that’s coherent and makes sense. All the elements have to fit; if they fit well enough,
you’ll end up with a whole that’s greater than the sum of its parts. If the elements don’t fit
together, or aren’t part of a program with a well-defined framework, the chances are you’ll end
up with a mess.
KEEP AT IT
Information gathering and knowledge synthesis should continue throughout the course of the
program. While you may wait for the results of an initial evaluation before changing something, you
should always be looking for improvements and better approaches. No program or effort is perfect:
everything can be improved. As long as you keep trying to learn more and grow in your
understanding of your work, it will continue to get better. If you become complacent (i.e., you feel
you know what you’re doing and can relax), your program may start to lose its effectiveness.
IN SUMMARY
Gathering the information that already exists about your issue and attempts to address it is one of
the most important aspects of planning a program or evaluation. By putting together what’s
known about the issue and the history of the successes and failures of various approaches to it,
you can build a program structure that includes your own innovations and elements that have
worked for others in similar situations. This synthesis also allows you to avoid ineffective
approaches and to incorporate ideas and methods that have been particularly appropriate,
Information gathering and synthesis should continue throughout the life of the program. The
more information you have, and the more carefully you put it together, the better your chances of
Chapter 3
The local community health center was starting a program to encourage regular physical activity
among people with high blood pressure. The program had one simple objective: to engage
participants in 45 minutes of regular, moderate aerobic exercise at least four times a week over
the course of six months. The hope was that this regimen would lower participants’ blood
pressure, and lead to weight loss and an overall sense of greater well-being. A related aim was
that participants would continue the exercise on their own after the program ended.
The Center had gathered a group of 50 people who were willing to take part. After physical
checkups, all the participants attended workshops on diet, the mechanics and dangers of high
blood pressure, and how to start and maintain injury-free regular exercise. They also received
counseling about the kinds of exercise they might undertake – walking, bicycling, swimming,
etc. – and about ways to make exercising pleasant. To help participants to integrate exercise into
their lives, the center decided to ask them to exercise at their own convenience, using whatever
activities they chose, as long as they maintained the 45-minute, four-day-a-week pattern. They
were also asked to keep journals of the frequency and nature of their exercise, and to meet, in
groups of five, with Health Center counselors once a month for checks on blood pressure,
The center then had to decide how to evaluate the program. The performance goal of the
program – what participants were actually supposed to do – was maintaining the exercise
schedule over the six-month period. Since each participant was tending to that individually, it
would be hard to actually watch them all exercise whenever they did so. But the center needed to
know whether or not they had. The other program goals – lower blood pressure, good weight
loss – also had to be observed somehow. How could the center design a system to find out
whether the behavior of physical activity was occurring and whether this resulted in lower blood
Once you’ve determined your evaluation questions and gathered information about what to look
for, you have to find a way to look for it. That’s what observation systems are all about. Like the
center in the example, you’ll need to find ways to observe the behavior, the conditions, and the
changes – or lack of changes – in them that will answer your evaluation questions. This section is
An observational system is the way you get information about your program – what it and its
participants and implementers are actually doing, and what seems to be occurring as a result.
“Observation” here may mean actual observation – watching people, conditions, activity, or
results to see what happens – but it may also refer to less direct ways of monitoring a program’s
operation and outcomes. Its varieties include monitoring the behavior of individuals and groups
to see the results at different levels. Some methods of observation that might prove useful in
• Direct observation. This is the purest and most verifiable form – watching people or
use and neighborhood sense of ownership of a public park, for instance, you might
directly observe how much and how people use the park by visiting and observing on
different days, in different types of weather, and under different circumstances over a
activity would probably be, or they may be staff members who work with participants,
recording what happens. In either case, they are taking measures as outside observers, not
as participants themselves.
• Participant observation. A participant observer becomes part of the action, and observes
resident directly involved in the effort, or might be someone who becomes part of the life
of the park for the purposes of observation. He might jog there daily, or join a weekly
volleyball game and get to know others who use the park on a regular basis. His own
notes about what is observed in the park might also become part of his recording.
• Self-reports. Some of what you’re trying to achieve may simply not be visible at all, at
least not to you. Changes in what people do in private, such as their use of contraceptives,
may not be (or should not be) observed directly by an outsider. Similarly, when the goal
healthy eating in the community, it will not be feasible to directly observe this for
everyone. In such situations, we ask people to report on their own behavior. Thus, an
first-person reporting. Since such reporting may be subject to bias, we usually try to also
use other forms of evidence (e.g., observing weight loss as a product of the behaviors of
others who have direct experience with the people or conditions you’re concerned with.
Teachers, probation officers, park rangers, public health nurses, social workers – even
reports, like self-reports, may be gathered by interviews, journals, surveys, checklists, and
the like.
• Electronic or mechanical observation. The observer in this case isn’t a person (although
on camera, audio recorder, heart monitor, pedometer, GPS (global positioning system)
tracker, or other piece of equipment. A camera operated by a tripwire is often used, for
particular path.
• Tests of various kinds. Depending on what you’re measuring, this category could cover
blood tests and the like. They might also include tests of new program methods and
• Public and other records. Police reports, census data, employment statistics, public
health information – all of these and more could give you information on community-level
indicators that will help you determine the outcomes of your work.
result of a behavior, rather than the behavior itself. For instance, if interested in
environmental pollution, we might observe the amount of debris or toxins on the ground
or in the water, rather than the behavior of illegal dumping of toxins or materials. Similarly,
an initiative interested in preventing childhood obesity might use school records of height
and weight to measure obesity – in addition to direct observations of school lunches and
In addition to specifying what kinds of observation you’ll use, the design of an observational
system should also cover when, where, how often, by whom, and under what circumstances
observations will take place, as well as just what will be looked at. All of these depend on what
you’re observing and what information you hope to gain from your evaluation (back to those
Among other considerations, will you look at the process of your effort or program – the steps
you took in setting it up, and whether they were faithful to what you intended? Will you look at
what you actually did – the number of participants you had, the methods you used, the time
everything took, how long participants stayed, etc.? Are you interested in which parts of what
you did were successful and which were not? And what do you want to know about outcomes –
Designing an observational system entails thinking carefully about what you need to know, and
creating a system that is most likely to get you that information as accurately and easily as
possible. We’ll discuss the design process in detail, including the issues mentioned in the last
• It can help you get reliable information. Designing a system that standardizes the
methods, times, and other aspects of the observation will mean that the information you
get from different observers and places is likely to be accurate and consistent, and
• It can help you find out exactly what you need to know, without wasted effort. You
can design a system that examines what you’re interested in, and ignores what you’re not.
That means that you don’t have to sort out unnecessary data, and that you’ll have the right
means of collecting the data you need to address your evaluation questions.
• It can ensure that observations are made. A consistent system that’s designed and
accepted by those who will do the observing, whatever form it takes, makes it far more
likely that observations will be made when, where, and how they’re supposed to.
• It can make it easier to analyze your data. A consistent, rational system of observation
can give you good information for scientific analysis, whether that analysis is quantitative
• It can help you avoid haphazard evaluation. A well-designed observational system will
allow you to collect information systematically, and not leave you with a mass of
disconnected data that are not necessarily related to what you want to know.
• It will make it easier to justify your findings. The more accurate your information, the
more reliable the conclusions that can be drawn from it. If your observational system is
designed and implemented well, it’s much easier to argue that your information is reliable
and accurate, and a good base for the conclusions you reach.
• It can help you gain credibility with funders and policymakers. The people who
control funds and policy are particularly concerned with accountability. If you can present
them with a useful evaluation based on data collected through a well-designed and
• It can let you pass on your practices with confidence. A well-designed observational
system makes it possible to feel that your evaluation results tell the truth. If those results
show that your program is highly effective, you can pass on what you do as a best
practice to colleagues in the field and others, without worrying that you may be urging
them to use methods or assumptions that might not work very well.
• It can give you the best information possible about what’s working in your program,
As we’ve discussed, an observational system refers not only to direct observation, but to any
method of examining and recording the process, activities, and outcomes of your program. An
observational system is intrinsic to your evaluation, since that is what will tell you what actually
went on. Therefore, the ideal is to design that system before you actually start implementing the
program, so that you can monitor the program throughout its existence.
That’s the ideal. The reality for many community workers – especially those who work in small,
community-based organizations – is that evaluation begins whenever the time, energy, and
resources are available, which is often months or years after the program has started. Whenever
it begins, the observational system should be designed to fit the evaluation questions you’re
asking. Because the observations must be consistent and reliable, it’s well worth taking the time
It’s best if you can observe through a whole program cycle, from beginning to end. Some
programs don’t have a cycle, and the observation may focus on the behavior or results of
individual participants rather than the program as a whole. In these situations, evaluation may
begin as new participants begin the program and are observed from the beginning.
If you’ve been recording events, keeping journals, etc., before you start evaluating, you may have
information that you can incorporate into the results of your observation. If your design calls for
specific firsthand observations, their definition may be precise enough that similar observations
recorded in journals kept by staff may not meet the criteria to be included. If staff journals or
records are part of the system you’re putting in place, however, you may be able to use all or
most of the information you have (and, indeed, you could design the observational system so you
can).
The real danger here is that you’ve missed something important already by the time your
observational system is operational. It may be that the early part of the program is crucial, at
least for some participants or for some changes, and you’ll be starting to observe after those
changes have been made. Start your observations early so that your system can pick them up.
We’ve stated many times the Community Tool Box bias toward participatory process, and
system, a system will function best if it’s designed by a group that includes those who will
actually be the observers. If they’re part of the planning, they’ll be familiar with the system,
know exactly what information it’s meant to observe, and understand their roles within the
observational system.
In a community-based or smaller organization, it’s likely that time will be a factor – there are
probably too few people already doing too many jobs. If that’s the case, the level and nature of
the observational system has to be one that the staff can actually handle, whether they’re the
observers, or whether they’re facilitating the observation for outside evaluators or volunteers. If
they help to plan and set up the system, they’ll have far more incentive to make sure it works
The design of the observational system should specifically include the people who will actually
do the observing, who are often either staff members or members of the group that will benefit
from the program. In addition to these, it helps to include researchers or others who understand
observational systems, and can help to design a system that specifically meets the needs of the
evaluation or research project. It might also be beneficial to include individuals from the
group(s) that will be observed, to help with cultural issues and provide feedback on their
response to the design. The observational design team, therefore, might consist both of members
of the overall evaluation planning group, and others specifically recruited to work on an
observational system.
• Outside evaluators or research consultants
• Participants or beneficiaries
• Volunteer observers
If, for some reason, the design group doesn’t include anyone with research experience, a
training that includes information about different methods of observation, and about which
methods are likely to produce which kinds of information, could help greatly to inform the
design process. If the group does include researchers, that information could be presented as
part of the discussion about design and the various possibilities, rather than in a training or
workshop format.
So, you’ve decided on evaluation questions and planned an evaluation. Now it’s time to
determine how you’ll get the data you need to answer your evaluation questions.
Remember these? You decided what it was you wanted to know, in order to determine whether
your program was effective. Let’s go back to the example at the beginning of this section, the local
community health center program. The center was starting a physical activity program for people
with high blood pressure. Its objective was to have the participants engage in 45 minutes of
• That participants’ blood pressure would decrease
• That participants would continue the exercise routine after the six-month program ended.
• Did participants engage in the recommended exercise routine for the period of the
program?
• Did participants’ blood pressure decrease by the end of the six months?
• Did those participants who needed to lose weight do so by the end of six months?
• How well attended were the workshops? Did participants find them helpful? Did
participants who attended all or most of the workshops achieve better results than those
who didn’t?
• Did participants experience a sense of greater well-being by the end of six months?
• Did participants continue the exercise routine (and maintain their lower blood pressure)
The answers to these questions might be only a few of those that the center wanted, but let’s
stick with them for now, and use them as examples as we go through this part of the section.
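As a rough illustration of how the first two questions might be checked once participants’ journals and monthly readings are in hand, here is a minimal Python sketch. The data layout, the function names (weekly_adherence, bp_change), and the sample figures are all hypothetical. Only the 45-minute, four-times-a-week target and the before-and-after blood pressure comparison come from the example.

```python
from collections import defaultdict
from datetime import date

# Target taken from the health center example: 45 minutes, four times a week.
MIN_MINUTES = 45
MIN_SESSIONS_PER_WEEK = 4


def weekly_adherence(log):
    """Return the share of logged weeks in which the exercise target was met.

    `log` is a list of (date, minutes) journal entries for one participant
    (a hypothetical layout). Weeks with no entries at all are not counted,
    so a fuller analysis would need to supply the complete list of program weeks.
    """
    sessions = defaultdict(int)
    weeks_seen = set()
    for day, minutes in log:
        week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week number)
        weeks_seen.add(week)
        if minutes >= MIN_MINUTES:
            sessions[week] += 1
    if not weeks_seen:
        return 0.0
    met = sum(1 for w in weeks_seen if sessions[w] >= MIN_SESSIONS_PER_WEEK)
    return met / len(weeks_seen)


def bp_change(first_reading, last_reading):
    """Change in (systolic, diastolic) pressure; negative values mean a decrease."""
    return (last_reading[0] - first_reading[0],
            last_reading[1] - first_reading[1])


# Made-up journal for one participant covering two weeks.
journal = [
    (date(2024, 1, 1), 45), (date(2024, 1, 2), 50),
    (date(2024, 1, 4), 45), (date(2024, 1, 6), 60),   # first week: four full sessions
    (date(2024, 1, 8), 45), (date(2024, 1, 10), 30),  # second week: one full session
]
print(weekly_adherence(journal))        # 0.5
print(bp_change((150, 95), (138, 88)))  # (-12, -7)
```

Because the journals are self-reports, a result like this would normally be read alongside the monthly blood pressure and weight checks rather than on its own.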
You may be concerned about your own process – how well you actually plan and implement
your program. You may also, like the center, have a specific time frame in which you hope for
results. You might also have benchmarks – smaller achievements along the way to a larger goal
– that you’re concerned with recording. All of this should figure into your observational design.
Depending on the kind of program or effort you’re engaged in, and the nature of your evaluation
• Participants’ behavior. This could be anything from the aggressive behaviors of children
• Someone else’s behavior. The ultimate test of whether a high school peer mediation
training program is working, for instance, may not be the behavior of the mediators on
whose training the program focuses, but the behavior of the students with whom they
work. There’s also a possibility here of looking at the ways in which participants are
house where drug dealing occurs, building affordable housing, cleaning up a polluted
condition itself isn’t visible or observable, either because it’s private, or because it takes
place on a level that can’t be observed directly, you may have to measure its products or
results. It would be virtually impossible to observe directly the rate at which adolescents
practiced safe sex, but it would be possible to learn the rate of STD infections among
them, and the number of teen pregnancies before, during, and after a safe-sex peer
education program.
When products or effects are all you can observe, you have to be sure that you’ve chosen the
right ones to look at. They should be, to the extent possible, obvious results of the behavior or
condition you’re interested in, and you should take into account – and try to correct for –
• Participants’ knowledge or attitudes. Like participants’ behavior, the possible range here
is enormous, from scores on a knowledge test to nearly anything else you might think of.
• Someone else’s knowledge or attitudes. For instance, an advocacy program would be
concerned with changing the attitudes of legislators and the public, but might not have
direct contact with those whom it hoped to influence. This might use repeated public
• Goal attainment. Some programs have a particular aim that is their only reason for
existence. This might be the passage or repeal of a law, the building of a school, the
freedom of one or more political prisoners, etc. The only evaluation question in that case
may be whether the goal has been achieved (or to what degree). In this situation, a goal
attainment scale can be used to assess the degree of attainment (e.g., from 5 = most
whether particular individuals or groups interact or engage each other. For instance, if
the goal is increasing parent-child interaction, each party’s talking to and responding to
the other might be measured. As above, interactions among program participants, staff, or
All of the above possibilities might have to do with either program goals – i.e., what a program
wants to accomplish – or process and implementation – how a program goes about setting up
There are also some areas of observation that relate specifically to program process and
implementation:
• Planning. Measurement here may focus on who was involved in planning what parts of
the program, how the plan was developed, what its content was, satisfaction, etc.
• Timeline. When did the planning, implementation, and evaluation of the program each
begin? How long did each take? Were deadlines met, and, if not, why not?
• Numbers of participants. How many participants did you have? What was the average
amount of time they spent in the program? How many dropped out before completing
the program? How did those numbers compare to what you expected?
• Methods. What methods did you use in the program or intervention? How were they
used?
• Program implementation. What did you actually do? This would include the program
activity, its frequency and duration, the number of participants it served, where it took
In addition to identifying what you want to look at, you’ll have to define it carefully, so that
observers know exactly what to look for. You have to be certain that all observations of a
particular behavior, for instance, refer to the same phenomenon (e.g., specific features that define
whether the behavior occurred), even if the observations are made by different people. If they don’t
use the same measurement, you can’t really count on the information you get. Setting the limits
of observations in each category – what’s included, what’s excluded, and where the boundaries
are – will help to eliminate disagreement and make the observation more reliable.
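One common way to check whether a definition is tight enough is to have two observers record the same sessions independently and then compare their records. The Python sketch below computes simple percent agreement and Cohen's kappa, a standard statistic that corrects for the agreement expected by chance. Neither measure is named in the text above, and the function names and sample codes are hypothetical.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Share of observation intervals on which both observers recorded the same code."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("need two non-empty records of equal length")
    matches = sum(1 for a, b in zip(obs_a, obs_b) if a == b)
    return matches / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(obs_a)
    p_observed = percent_agreement(obs_a, obs_b)
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    # Chance agreement: product of the two observers' marginal proportions, per code.
    p_chance = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both observers used one identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up records: did the target behavior occur in each of ten intervals?
observer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
observer_2 = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "no"]
print(percent_agreement(observer_1, observer_2))       # 0.8
print(round(cohens_kappa(observer_1, observer_2), 2))  # 0.6
```

Low agreement on a category usually suggests that its definition or boundaries need tightening before the observations can be relied on.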
To continue the health center example, let’s look at what the Center needs to observe. In order
to determine whether participants are actually doing their exercise regularly, the center has to
find a way to observe people’s behavior (e.g., activity logs, self-reports on how often they
engaged in physical activity). To find out whether they’ve lost weight, the center has to observe
an outcome of behavior (e.g., weight, body mass index). To learn whether they’re experiencing
an increased sense of well-being, the center has to obtain self-reports. And to learn whether
they continue their exercise past the end of the program, it has to find a way to observe behavior
Earlier, we discussed some methods of observation. We’ll return to those here, and examine
• Direct observation. Direct observation involves either the “fly on the wall” approach,
where the observer is anonymous and generally unnoticed, or – more often in service
member who works with participants and records, sometimes with their help, their
behaviors and aspects of the situation. Anonymous observers are particularly good in
situations where any people being observed are equally anonymous – conditions, large
events, or activity like the use of that neighborhood park we talked about earlier, where
One common method of direct observation – whether the observer is a program staffer or a
program participant – is through keeping a journal or activity log. The observer writes down or
otherwise records, soon after the occurrence, an account of what happened and events related to
it and often reactions to those experiences as well. The journal or activity log then becomes a
picture of the flow of the program, detailing the progress through it of particular participants,
The nature of journals or report logs will obviously vary with the nature of the program, and not
all programs or efforts lend themselves to this kind of observation. But, especially in situations
where several people have written journals or logs that cover the same period and events, they
event, activity, culture, etc. that’s being observed, and experience it firsthand. Thus, in a
health or human service program, the observer might be an actual participant (i.e., a
member of the group at whom the program is aimed), or an evaluator who joins
participants in their activities. In the case of the park, for instance, a neighborhood
resident who already visits the park regularly might volunteer to track how he and others
actually use it – when various people or groups come, what they do once they’re there,
which parts of the park they frequent, who interacts with whom, etc.
increase income through creation of small businesses by participants. Members of the staff took
part in each workshop, and participated in such activities as training in lending and loans, all the
while observing the activities and other participants. At the end of each day, the staff conducted
group discussions where they relayed what they had seen, and asked participants to analyze
what they had done. The staff in this instance functioned as participant observers, and their
participation added greatly to their ability to help their clients, low-income women from small
• Self-reports. When the object of your observation is participants’ behavior that takes
place away from the program (the amount of time participants spend reading to their
children, for example), you often have to rely on the observations of participants
themselves. This reliance has, as you might expect, both advantages and disadvantages.
On the one hand, participants obviously know their own actions well. On the other, they
can also leave out things that might be embarrassing or report in ways that they think others
want to see. In addition, since they haven’t had prior experience as or been trained as
observers, they might miss, or dismiss as unimportant and not report, behaviors or
An obvious remedy would be to train self-reporters as observers, and in some cases – medical
trials, for instance – that’s both reasonable and common. In other situations, however, it would
go too far toward telling participants “the right answers,” and thereby possibly changing what
they report toward what they think you want to hear. There is also a huge advantage to self-
report: when they’re honest and represent a real change in behavior or experience for the
reporters, they’re far more powerful than anything another person could say about their
experience.
Self-reports, at least as defined here, imply an array of possible techniques for data collection.
Individual and group interviews, focus groups, public meetings, surveys and questionnaires,
journals, checklists, and even casual conversation might all be ways for participants to convey
• Second-hand reports. These are reports about participant behavior or about conditions
that come from people associated with those participants or conditions, but not directly
connected to your program or effort. They might be service workers, teachers, health
the same cautions as those from people who work closely with participants.
personal biases can all keep reports from being objective. These observers may also need
training.
electronically, as are observations of health conditions inside the human body (using
problem, but you have to be sure that whatever equipment you use is working properly,
important that whoever interprets the information the equipment provides is trained to do
so, and understands the limits and appropriate uses of that information.
• Tests or other similar observation tools. Education and health organizations often use
various kinds of tests as observation tools. In a human service context, they are generally
medicine, tests may be used to observe health status (e.g., screenings for elevated blood
cholesterol) and the effects of treatment. They can be very useful in all these
circumstances, but they’re also very specific, and don’t allow much room for intuition. In
addition, the results of tests of skills, knowledge, or intellectual ability may be influenced
by nervousness, lack of sleep, personal problems, or other factors that have little to do
• Public records and the like. If you’re using community-level indicators, such as rates of
infant mortality or injuries due to motor vehicle crashes, as one way of looking at
outcomes, you’ll have to use records, census data, and other similar material to get the
To continue with our previous example, the local health center would have to use a variety of
these observation methods. The beginning and ongoing observations of blood pressure and
weight at the monthly counseling sessions would take place with the use of instruments – a blood
pressure cuff and a weighing scale – as well as by direct visual observation. (While an obvious
reduction in fat may not indicate weight loss if the fat is replaced by muscle, it does indicate an
increase in fitness, which may be equally beneficial. Fitness levels could also be mechanically
The amount and type of exercise each participant engaged in would be self-observed and
self-reported through journals and interviews. Feelings of well-being would be self-reported, but
could also be observed by counselors trained to look for changes over time in posture,
would come through one or more follow-up visits some time after the program ended, with
interviews, blood pressure and weight measurements, and more direct visual observation of
fitness levels. (Participants might also agree to continue keeping journals for a set period of time
after the program ended, thus providing a self-report of their ongoing levels of physical activity.)
The question here is whether you need to start observing at the very beginning of the program
(you almost always do), and how often you should observe throughout the course of the
evaluation.
• Pre- and post- observation. This means making your observations at the beginning and
the end of the evaluation period or the program. It’s the equivalent of what many schools
do with standardized testing. They test reading scores at the beginning and end of each
year, and then compare the two to determine how much the students have advanced.
Although this type of observation may tell you whether anything changed during the
program, it won’t give you strong evidence of how the change took place, what caused it, or
This explanation assumes only before-and-after observation. Most of the possibilities here
include before-and-after observation, but add other observations to it. For most kinds of
evaluation, you should start observing at the beginning, or even well before the beginning (to
understand whether any changes may in fact be part of an already-existing trend). If you
conduct your first observation partway through, you won’t know if changes occurred before then. A
major change may occur toward the beginning in some interventions, toward the end in others,
steadily throughout the intervention in still others, and in a few not till after the intervention is
over. It’s important to know just where you started from in order to fully understand what
you’re seeing. It may be that a long intervention is no more effective than a short one, or that a
short one makes no difference at all. You can only tell by knowing where you started from and
If your program or effort is one with a specific, one-time goal – for example, the passage of a
law, or the clean-up of a particular space – the temptation may be to evaluate it only by whether
you reach your goal or not (i.e., a single observation at the end of the effort). This would be a
mistake, because it wouldn’t take into account the parts of your effort that were successful and
why, whether or not you reached your goal. That’s a piece of information you’ll need the next
time – and there will be a next time – you or others in the community take on a similar effort.
• At regular intervals during the evaluation period. You might choose any period from
once an hour to once a month or more, depending on what you’re observing. The
• At irregular intervals during the evaluation period. The reason for this schedule might
be logistical (you observe when you can); might have to do with making sure that
observations aren’t expected, so that you get a true picture of what you’re looking at; or
might be an attempt to look at the program or effort randomly, again to try to get an
accurate picture.
• At specific times during the evaluation period. In this case, you might be concerned to
see what happens or what participants are doing at different, identifiable times that imply
morning, afternoon, and evening, at each of the four seasons, in rain, clouds, sun, and
snow, and on days when there were special events in the park, to see who uses the park
and how under different conditions and at different times. If you’re monitoring the
process and progress of the program, it’s important to make sure you observe each stage
of it – the planning, the preparation, the implementation, the evaluation, and any follow-up
– to make sure you get a full picture of what you did and how you did it. This will give
you the information you need to analyze in order to make adjustments in how you conduct
your work.
• Continuously. When the observer is a staff member working with program participants
(or one or more of the participants themselves), it may be possible to make ongoing
observations. The observer in this situation might observe directly using checklists, keep
a journal, ask participants to keep records, video- or audiotape sessions, or record what
happens in some other way, so that there’s an ongoing, day-to-day account of the
At the local health center, some of the observation – particularly that of monitoring participants’
blood pressure and weight – would be done at regular intervals, during the monthly meetings.
There would also be some continuous observation: that of participants keeping track of their
exercise programs in journals. And finding out whether participants continued with their
routines would be a one- or two-time follow-up – perhaps six months and a year after the
program ended.
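The advice above to start observing before the program begins, so that an already-existing trend is not mistaken for a program effect, can be illustrated with a short sketch. The monthly weights below are hypothetical numbers made up for illustration, and the function and variable names are assumptions, not part of any standard evaluation tool.

```python
from statistics import mean

def slope(values):
    """Least-squares slope of values observed at equally spaced times 0, 1, 2, ..."""
    times = range(len(values))
    t_bar, v_bar = mean(times), mean(values)
    numerator = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    denominator = sum((t - t_bar) ** 2 for t in times)
    return numerator / denominator

# Hypothetical average participant weight (kg), one observation per month.
baseline = [92.0, 91.5, 91.0, 90.5]       # four months BEFORE the program
during = [90.0, 89.5, 89.0, 88.5, 88.0]   # five months DURING the program

pre_post_change = during[-1] - during[0]  # what a pre/post-only design would see
baseline_trend = slope(baseline)          # kg per month before the program
program_trend = slope(during)             # kg per month during the program

print(f"Pre/post change: {pre_post_change:+.1f} kg")
print(f"Trend before program: {baseline_trend:+.2f} kg/month")
print(f"Trend during program: {program_trend:+.2f} kg/month")
```

In this made-up case, a pre/post comparison alone shows a loss of 2 kg, but the baseline observations show that weight was already falling at the same rate before the program started. Only the earlier observations reveal that the change may be part of an existing trend.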
If you want to be sure you know what observers are referring to in their reports, you have to be
specific about what you want them to look for. The planning group, or a subgroup of it – the
ideal would be a group that included a high proportion of people who will actually be researchers
and observers – should set out identification standards for each element to be observed. These
would explain what it looked like, when it was likely to occur, who would probably be involved,
etc. For instance, to observe bullying or interpersonal violence on a playground would require
clear definitions of this behavior, examples and non-examples, and scoring instructions. As a
result, observers could all start with the same guidance about what they were looking for.
Unless all the observation is to be done by those directly involved in the planning (not
• What it’s important to record, and why. People who have no acquaintance with
research might not realize how important it is to record such details as the date, time,
evaluation length, place, and circumstances of any observation, a description of who was
involved and for how long, whether there were unexpected people or conditions present,
etc. An early morning observation might provide a different set of observations than a
late afternoon or evening one, for example. The presence of other people in a situation or
interview – relatives, friends, and program staff – can change the character of the
understand that the context of the observation can be as important as its content.
• The definitions and descriptions of the behaviors, conditions, events, or situations to
much difference unless those who’ll do the observations are familiar with them.
result of their reactions to being observed. Observers have to be aware of that possibility,
and make their own behavior as invisible as they can, so as not to influence participants’
behaviors.
In addition to human observers, the presence of audio or video equipment can often have an
effect. One way to offset it is to wait to start collecting data until participants are used to the
presence of the equipment. It’s also important to get participants’ permission beforehand to use
recording equipment.
• Observer bias. Especially in situations where the observers are also program staff, their
relationships with participants, or simply with the effort as a whole, may affect their
reports or observations. If they particularly like or dislike a participant, that may have
some influence on how they interpret or describe that person’s behavior. If they’re
heavily invested in the success of the program being evaluated, they may – intentionally
or unintentionally – put the best possible light on what they see. Whether or not they’re
program staff, observers can also be influenced by their personal assumptions, their
circumstances. If they can be helped to recognize these biases and understand why and
how they should be acknowledged or eliminated, there’s a better chance that they’ll
• Observer drift. Sometimes, after people have been observing for a while, their
observations tend to take on a regularity based on the rules they make up rather than
shared definitions based on a standard. They might tend to rate the behavior of certain
participants in ways based more on past experience than on what they see, for instance.
You may also have to correct for observer effects, bias, or drift over the course of the
evaluation. That’s part of devising checks for accuracy and reliability based on a standard.
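The details identified above as important to record (date, time, length, place, circumstances, who was involved, and any unexpected people or conditions) could be captured in a simple record structure so that every observer logs the same context. This is a minimal sketch; the class name, field names, and the time-of-day boundaries are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List

@dataclass
class ObservationRecord:
    """One observation, with the context needed to interpret it later."""
    observer: str
    started_at: datetime              # date and time of the observation
    duration_minutes: int             # how long the observation lasted
    place: str
    circumstances: str                # e.g., weather, special events
    people_involved: List[str] = field(default_factory=list)
    unexpected_conditions: str = ""   # unexpected people or conditions present
    behavior_counts: Dict[str, int] = field(default_factory=dict)

    def time_of_day(self) -> str:
        """Rough label, since morning and evening observations may differ."""
        hour = self.started_at.hour
        if hour < 12:
            return "morning"
        if hour < 17:
            return "afternoon"
        return "evening"

# Hypothetical example: one early-morning observation of park use.
record = ObservationRecord(
    observer="Observer A",
    started_at=datetime(2024, 5, 6, 7, 30),
    duration_minutes=45,
    place="Neighborhood park, north entrance",
    circumstances="Clear weather, no special events",
    people_involved=["joggers", "dog walkers"],
    behavior_counts={"jogging": 12, "walking a dog": 5},
)
print(record.time_of_day(), record.behavior_counts)
```

Keeping context fields required (rather than optional) is one way to make sure observers do not skip them.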
If your information is to be reliable, it’s important that when two observers record a particular
behavior, they mean exactly the same thing. This is a matter of training (see above), and also one
of checking, either at the beginning or periodically, to make sure that all observers are seeing
things in the same way. A participatory design of the system will help here. If the observers are
involved in defining what they all will be observing, there’s a much better chance they’ll all see
it similarly.
• Use an external standard. One way to define what you’re looking for is to use a
standard that’s used and accepted by all observers. “Behavior X looks like this, occurs in
these circumstances, lasts for this long, and has these after-effects or results.” The use of
external standards often employs a checklist or something similar. The observer checks
off components of a behavior or condition, thus documenting what he sees in a way that
matches how a different observer would score the same situation. Such
Research teams and laboratories commonly use standards to assure agreement in identifying
various conditions. Each condition is described in detail, with various possible markers, such as
• Check for inter-rater reliability. Inter-rater reliability is the research term for assessing
whether all observers interpret the same things – behaviors, conditions, events – in the
same way. One way to address it is to check observers against one another. Two or more
are exposed to the same situations or information, and then their scores, such as counts of
instances of bullying, are compared. If they all say essentially the same thing, then
agree 80% or more of the time, observations are deemed reliable. If they disagree about
what they saw, then you have to find the source of the disagreement. They may define
terms differently, or their backgrounds may bias them toward seeing the same thing in
different ways. Whatever the case, you have to uncover differences, and find a way to
• Use random third-party checks. A researcher, program director, or someone else who
has a clear idea of what information is important and what various conditions or
behaviors look like can observe in randomly chosen situations along with a regular
observer to see if their observations match reasonably well. If they disagree once, it may
Here’s this section’s version of “keep at it.” Just like your program, your evaluation, including
Now you’re ready to start collecting and analyzing data. With careful planning and good training,
you should be able to get the information you need for your evaluation.
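The inter-rater reliability check described above, in which observations are deemed reliable when observers agree 80% or more of the time, can be sketched as a simple percent-agreement calculation. The ratings below are hypothetical, and the function name is an assumption made for this example.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of situations that two observers scored identically (0.0 to 1.0)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both observers must rate the same non-empty set of situations")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical scores from two observers watching the same ten playground incidents.
observer_1 = ["bullying", "none", "none", "bullying", "none",
              "bullying", "none", "none", "none", "bullying"]
observer_2 = ["bullying", "none", "bullying", "bullying", "none",
              "bullying", "none", "none", "none", "none"]

agreement = percent_agreement(observer_1, observer_2)
reliable = agreement >= 0.80  # the 80% threshold mentioned in the text

# Incidents where the observers disagreed, to review when looking for the source
# of the disagreement (different definitions, different backgrounds, etc.).
disagreements = [i for i, (a, b) in enumerate(zip(observer_1, observer_2)) if a != b]

print(f"Agreement: {agreement:.0%}, reliable: {reliable}, review incidents: {disagreements}")
```

Simple percent agreement does not correct for agreement that would occur by chance; statistics such as Cohen's kappa are often used when that matters.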
IN SUMMARY
In order to conduct an evaluation that allows you to see your program or effort clearly and to
adjust and improve it, you have to have a way of collecting accurate and useful information
about it. The observational system you use is the way you look at what you’re doing – at your
own process, at participants’ behavior and their progress and results, at conditions that affect
your effort or that your effort is trying to change – to gain the information that you’ll analyze to
evaluate your work. That system has to be feasible within your resources, and has to fit with the
The design of observational systems is best carried out as a participatory process, particularly
one involving both researchers or evaluators and those who’ll do the actual data collection. That
involvement will give them a clear understanding of the system itself, of what information is
needed, and of the pitfalls to data collection that they might encounter along the way. The result
should be a more reliable system, and, ultimately, more accurate data for your evaluation.
Chapter 4
When you hear the word “experiment,” it may call up pictures of people in long white lab coats
peering through microscopes. In reality, an experiment is just trying something out to see how or
why or whether it works. It can be as simple as putting a different spice in your favorite dish, or
as complex as developing and testing a comprehensive effort to improve child health outcomes
in a city or state.
Academics and other researchers in public health and the social sciences conduct experiments to
understand how environments affect behavior and outcomes, so their experiments usually
involve people and aspects of the environment. A new community program or intervention is an
experiment, too, one that a governmental or community organization engages in to find a
better way to address a community issue. It usually starts with an assumption about what will
work (sometimes called a theory of change), but that assumption is no guarantee. Like any
experiment, a program or intervention has to be evaluated to see whether it works and under
what conditions.
In this section, we’ll look at some of the ways you might structure an evaluation to examine
whether your program is working, and explore how to choose the one that best meets your needs.
These arrangements for discovery are known as experimental (or evaluation) designs.
Every evaluation is essentially a research or discovery project. Your research may be about
determining how effective your program or effort is overall, which parts of it are working well
and which need adjusting, or whether some participants respond to certain methods or conditions
differently from others. If your results are to be reliable, you have to give the evaluation a
structure that will tell you what you want to know. That structure – the arrangement of
MEANT TO ANSWER.
• What component(s) and element(s) of the program or intervention were responsible for
the change?
• What are the unintended effects of an intervention, and how did they influence the
outcomes?
• Will the program that worked in another context, or the one that you read about in a
professional journal, work in your community, or with your population, or with your
issue?
If you want reliable answers to evaluation questions like these, you have to ask them in a way
that will show you whether you actually got results, and whether those results were in fact due to
your actions or the circumstances you created, or to other factors. In other words, you have to
create a design for your research – or evaluation – to give you clear answers to your questions.
An evaluation may seem simple: if you can see progress toward your goal by the end of the
evaluation period, you’re doing OK; if you can’t, you need to change. Unfortunately, it’s not that
simple at all. First, how do you measure progress? Second, if there seems to be none, how do you
know what you should change in order to increase your effectiveness? Third, if there is progress,
how do you know it was caused by (or contributed to by) your program, and not by something else?
And finally, even if you’re doing well, how will you decide what you could do better and what
elements of your program can be changed or eliminated without affecting success? A good
design for your evaluation will help you answer important questions like these.
Some specific reasons for spending the time to design your evaluation carefully include:
• So your evaluation will be reliable. A good design will give you accurate results. If you
design your evaluation well, you can trust it to tell you whether you’re actually having an
effect, and why. Understanding your program to this extent makes it easier to achieve and
maintain success.
• So you can pinpoint areas you need to work on, as well as those that are successful.
A good design can help you understand exactly where the strong and weak points of your
program or intervention are, and give you clues as to how they can be further
• So your results are credible. If your evaluation is designed properly, others will take
your results seriously. If a well-designed evaluation shows that your program is effective,
you’re much more likely to be able to convince others to use similar methods, and to
• So you can identify factors unrelated to what you’re doing that have an effect –
histories, crucial local or national events, the passage of time, personal crises, and many
other factors can influence the outcome of a program or intervention for better or worse.
A good evaluation design can help you to identify these, and either correct for them if
• So you can identify unintended consequences (both positive and negative) and
correct for them. A good design can show you all of what resulted from your program
or intervention, not just what you expected. If you understand that your work has
consequences that are negative as well as positive, or that it has more and/or different
• So you’ll have a coherent plan and organizing structure for your evaluation. It will
be much easier to conduct your evaluation if it has an appropriate design. You’ll know
better what you need to do in order to get the information you need. Spending the time to
choose and organize an evaluation design will pay off in the time you save later and in
Once you’ve determined your evaluation questions and gathered and organized all the
information you can about the issue and ways to approach it, the next step is choosing a design
for the evaluation. Ideally, this all takes place at the beginning of the process of putting together
a program or intervention. Your evaluation should be an integral part of your program, and its
That’s the ideal; now let’s talk about reality. If you’re reading this, the chances are probably at
intervention that’s been running for some time – months or even years.
Even if that’s true, the same guidelines apply. Choose your questions, gather information, choose
a design, and then go on through the steps presented in this chapter. Evaluation is important
enough that you won’t really be accomplishing anything by taking shortcuts in planning it. If
your program has a cycle, then it probably makes sense to start your evaluation at the beginning
of it – the beginning of a year or a program phase, where all participants are starting from the
If that’s not possible – if your program has a rolling admissions policy, or provides a service
whenever people need it – and participants are all at different points, that can sometimes present
research problems. You may want to evaluate the program’s effects only with new participants,
or with another specific group. On the other hand, if your program operates without a particular
beginning and end, you may get the best picture of its effectiveness by evaluating it as it is,
starting whenever you’re ready. Whatever the case, your design should follow your information
If you’re a regular Tool Box user, and particularly if you’ve been reading this chapter, you know
that the Tool Box team generally recommends a participatory process, one that involves both research
and community partners, including all those with an interest in or who are affected by the
program, in planning and implementation. Choosing a design for evaluation presents somewhat
of an exception to this policy, since scientific or evaluation partners may have a much clearer
understanding of what is required to conduct research, and of the factors that may interfere with
it.
As we’ll see in the “how-to” part of this section, there are a number of considerations that have
to be taken into account to gain accurate information that actually tells you what you want to
know. Graduate students generally take courses to gain the knowledge they need to conduct
research well, and even some veteran researchers have difficulty setting up an appropriate
research design. That doesn’t mean a community group can’t learn to do it, but rather that the
time they would have to spend on acquiring background knowledge might be too great. Thus, it
makes the most sense to assign this task (or at the very least its coordination) to an individual or
small group with experience in research and evaluation design. Such a person can not only help
you choose among possible designs, but explain what each design entails, in time, resources, and
necessary skills, so that you can judge its appropriateness and feasibility for your context.
• The challenges to the research, and the ways they can be resolved or reduced
• The kinds of research designs that are generally used, and what each design entails
what the structure of your program will support, what participants will consent to, and
We’ll begin this part of the section with an examination of the concerns research designs should
address, go on to considering some common designs and how well they address those concerns,
and end with some guidelines for choosing a design that will both be possible to implement and
Note: in this part of the section, we’re looking at evaluation as a research project. As a result,
we’ll use the term “research” in many places where we could just as easily have said, for the
purposes of this section, “evaluation.” Research is more general, and some users of this section
The most important consideration in designing a research project – except perhaps for the value
of the research itself – is whether your arrangement will provide you with valid information. If
you don’t design and set up your research project properly, your findings won’t give you
information that is accurate and likely to hold true in other situations. In the case of an
evaluation, that means that you won’t have a basis for adjusting what you do to strengthen and
improve it.
Here’s a far-fetched example that illustrates this point. If you took children’s heights at age six,
then fed them large amounts of a specific food for three years – say carrots – and measured them
again at the end of the period, you’d probably find that most of them were considerably taller at
nine years than at six. You might conclude that it was eating carrots that made the children
taller because your research design gave you no basis for comparing these children’s growth to
There are two kinds of threats to the validity of a piece of research. They are usually referred
to as threats to internal validity (whether the intervention produced the change) and threats to
external validity (whether the results are likely to apply to other people and situations).
These are threats (or alternative explanations) to your claim that what you did caused changes in
the direction you were aiming for. They are generally posed by factors operating at the same
time as your program or intervention that might have an effect on the issue you’re trying to
address. If you don’t have a way of separating their effects from those of your program, you
can’t tell whether the observed changes were caused by your work, or by one or more of these
other factors. They’re called threats to internal validity because they’re internal to the study –
they have to do with whether your intervention – and not something else – accounted for the
difference.
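One common way to separate a program’s effects from those of other factors is to compare participants with a similar group that did not receive the program. The sketch below applies that idea to the carrot example: the growth figures are hypothetical numbers invented for illustration, and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical growth (cm) between ages six and nine for five children per group.
carrot_group_growth = [17.0, 18.5, 16.0, 19.0, 17.5]   # fed large amounts of carrots
comparison_growth = [17.5, 18.0, 16.5, 18.5, 17.5]     # ordinary diet

# Without a comparison group, all of the growth looks like a carrot effect.
naive_effect = mean(carrot_group_growth)

# With a comparison group, only the difference between groups can be
# attributed to carrots; the rest is ordinary growth (maturation).
adjusted_effect = mean(carrot_group_growth) - mean(comparison_growth)

print(f"Growth in carrot group: {naive_effect:.1f} cm")
print(f"Difference from comparison group: {adjusted_effect:.1f} cm")
```

In this made-up data, both groups grew by the same average amount, so the apparent effect of carrots disappears once the comparison is made.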
education, etc. – and external events that occur during the research period – a disaster, an
election, conflict in the community, a new law – may influence whether or not there’s
• Maturation. This refers to the natural physical, psychological, and social processes that
take place as time goes by. The growth of the carrot-eating children in the example above
passed from adolescence to adulthood, the development of arthritis in older people, or
participants becoming tired during learning activities towards the end of the day.
existence, or of their taking part in it, may affect participants’ behavior or attitudes, as
– can change over time, or different ones may not give the same results. By the same
token, observers – those gathering information – may change their standards over time, or
• Regression toward the mean. This is a statistical term that refers to the fact that, over
time, the very high and very low scores on a measure (a test, for instance) often tend to
drift back toward the average for the group. If you start a program with participants who,
by definition, have very low or high levels of whatever you’re measuring – reading skill,
backgrounds, etc. – their scores may end up closer to the average over the course of the
• The selection of participants. Those who choose participants may slant their selection
toward a particular group that is more or less likely to change than a cross-section of the
population from which the group was selected. (A good example is that of employment
training programs that get paid according to the number of people they place in jobs.
They’re more likely to select participants who already have all or most of the skills they
need to become employed, and neglect those who have fewer skills... and who therefore
most need the service.) Selection can play a part when participants themselves choose to
enroll in a program (self-selection), since those who decide to participate are probably
particular group may, simply by coincidence, share a characteristic that will set their
results on your measures apart from the norm of the population you’re drawing from.
Selection can also be a problem when two groups being compared are chosen by different
standards. We’ll discuss this further below when we deal with control or comparison groups.
• The loss of data or participants. If too little information is collected about participants,
or if too many drop out well before the research period is over, your results may be based
on too little data to be reliable. This problem also arises when two groups are being compared. If
their losses of data or participants are significantly different, comparing them may no
• The nature of change. Often, change isn’t steady and even. It can involve leaps forward
and leaps backward before it gets to a stable place – if it ever does. (Think of looking at
the performance of a sports team halfway through the season. No matter what its record is
at that moment, you won’t know how well it will finish until the season is over.) Your
measurements may take place over too short a period or come at the wrong times to track the true course of the change.
• A combination of the effects of two or more of these. Two or more of these factors
may combine to produce or prevent the changes your program aims to produce. A
language-study curriculum that is tested only on students who already speak two or more
languages runs into problems with both participants’ history – all the students have
experience learning languages other than their own – and selection – you’ve chosen a group that is unusually likely to succeed at language learning.
These are factors that affect your ability to apply your research results in other circumstances – to
increase the chances that your program and its results can be reproduced elsewhere or with other
populations. If, for instance, you offer parenting classes only to single mothers, you can’t
assume, no matter how successful they appear to be, that the same classes will work as well with
men.
Threats to external validity (or generalizability) may be the result of the interactions of other
factors with the program or intervention itself, or may be due to particular conditions of the
program.
Some examples:
• Interaction of testing or data collection and the program or intervention. An initial test
or observation might change the way participants react to the program, making a
difference in final outcomes. Since you can’t assume that another group will have the same reaction, the results may not hold elsewhere.
• Interaction of selection procedures and the program or intervention. If the participants
selected are particularly sensitive to the methods or purpose of the
program, it can’t be assumed to be effective with participants who are less sensitive or less ready for it.
Parents who’ve been threatened by the government with the loss of their children due to child
abuse may be more receptive to learning techniques for improving their parenting, for example, than parents who are under no such pressure.
• The effects of the research arrangements. Participants may change behavior as a result of
being observed, or may react to particular individuals in ways they would be unlikely to react to others.
A classic example here is that of a famous baboon researcher, Irven DeVore, who, after years of
observing troops of baboons, realized that they behaved differently when he was there than
when he wasn’t. Although his intent was to observe their natural behavior, his presence itself
constituted an intervention, making the behavior of the baboons he was observing different
from what it would have been in his absence.
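• The effects of multiple interventions. Outcomes may differ if participants receive the intervention in a different
context, or are exposed to another intervention before or at the same time as the one being evaluated.
This may occur when participants are receiving services from different sources.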
Given the range of community programs that exist, there are many possibilities here. Adults
might be members of a high school completion class while participating in a substance abuse
recovery program. A diabetic might be treated with a new drug while at the same time
participating in a nutrition and physical activity program to deal with obesity. Sometimes,
the sequence of treatments or services in a single program can have the same effect, with one
influencing how participants respond to those that follow, even though each treatment is being evaluated separately.
Many books have been written on the subject of research design. While they contain too much
material to summarize here, there are some basic designs that we can introduce. The important
differences among them come down to how many measurements you’ll take, when you will take them, and how many groups are involved.
Program evaluations generally look for the answers to three basic questions:
• Was there any change, for better or worse, in the behavior or conditions you were trying to influence?
• Was whatever change took place – or the lack of change – caused by your program,
intervention, or effort?
• What, in your program or outside it, actually caused or prevented the change?
As we’ve discussed, changes and improvement in outcomes may have been caused by some or
all of your intervention, or by external factors. Participants’ or the community’s history might
have been crucial. Participants may have changed as a result of simply getting older and more
mature or more experienced in the world – often an issue when working with children or
adolescents. Factors in the environment can often facilitate or prevent change as well. Understanding exactly where the change came
from, or where the barriers to change reside, gives you the opportunity to adjust your program to make it more effective.
If all you had to do was to measure whatever behavior or condition you wanted to influence at
the beginning and end of the evaluation, choosing a design would be an easy task. Unfortunately,
it’s not quite that simple – there are those nasty threats to validity to worry about. We have to take them into account when choosing a design.
Research designs, in general, differ in one or both of two ways: the number and timing of the
measurements they use; and whether they look at single or multiple groups. We’ll look at some common designs below.
Researchers usually refer to your first measurement(s) or observation(s) – the ones you take before you
start your program or intervention – as a baseline measure or baseline observation, because it establishes
the starting level against which later change can be compared.
• Independent variables are the program itself and/or the methods or conditions that the researcher
– in this case, you – wants to evaluate. They’re called variables because they can change – you
might have chosen (and might still choose) other methods. They’re independent because their
existence doesn’t depend on whether something else occurs: you’ve chosen them.
• Dependent variables are whatever may or may not change as a result of the presence of the
independent variable. (If you’re evaluating a number of different methods or conditions, each of them is an
independent variable.) Whatever you’re trying to change is the dependent variable. (If you’re
aiming at change in more than one behavior or outcome, each type of change is a different
dependent variable.) They’re called dependent variables because changes in them depend on
the independent variable(s).
• Measures are just that – measurements of the dependent variables. They usually refer to
procedures that have results that can be translated into numbers, and may take the form of
community assessments, observations, surveys, interviews, or tests. They may also count
incidents or measure the amount of the dependent variable (number or percentage of children
who are overweight or obese, violent crimes per 100,000 population, etc.)
Observations might involve measurement, or they might simply record what happens in specific
circumstances: the ways in which people use a space, the kinds of interactions children have in
a classroom, the character of the interactions during an assessment. For convenience,
researchers often use “observation” to refer to any kind of measurement and we’ll use the same
convention here.
Before we go any further, it is helpful to have an understanding of some basic research terms, such as those defined above.
The simplest design is also probably the least accurate and desirable: the pre (before) and post
(after) design. It consists of observing whatever you’re
concerned with in one group – the infant mortality rate, unemployment, water pollution –
applying your intervention to that group or community, and then observing again. This type of
design assumes that a difference in the two observations will tell you whether there was a change
over the period between them, and also assumes that any positive change was caused by the
intervention.
In most cases, a pre-post design won’t tell you much, because it doesn’t really address any of the
research concerns we’ve discussed. It doesn’t account for the influence of other factors on the
dependent variable, and it doesn’t tell you anything about trends of change or the progress of
change during the evaluation period – only where participants were at the beginning and where
they were at the end. It can help you determine whether certain kinds of things have happened –
whether there’s been a change in the level of educational attainment or the amount of
environmental pollution in a river, for instance – but it won’t tell you why. Despite its
limitations, taking measures before and after the intervention is far better than no measures.
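The arithmetic behind a pre-post comparison is simple, and a short sketch makes its limits visible. The Python below is only an illustration: the scores are invented, and the function name `pre_post_change` is our own. It reports how much each participant changed and the average change, and nothing more.

```python
from statistics import mean

# Invented scores for the same five participants, before and after a program.
pre_scores = [52, 47, 60, 55, 49]
post_scores = [58, 50, 63, 61, 48]


def pre_post_change(pre, post):
    """Return each participant's change (post minus pre) and the average change."""
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same participants")
    changes = [after - before for before, after in zip(pre, post)]
    return changes, mean(changes)


changes, average_change = pre_post_change(pre_scores, post_scores)
print("Change per participant:", changes)
print("Average change:", average_change)
```

Notice what the result leaves out: it says nothing about what happened between the two observations, or about why the scores moved.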
Even looking at something as seemingly simple to measure pre and post as blood pressure (in a
heart disease prevention program) is questionable. Blood pressure may be lower at the final
observation than at the initial one, but that tells you nothing about how much it may have gone
up and down in between. If the readings were taken by different people, the change may be due
in part to differences in their skill, or to how relaxed each was able to make participants feel.
Familiarity with the program could also have reduced most participants’ blood pressure from the
pre- to the post-measurement, as could some other factor that wasn’t specifically part of the program.
Interrupted time series design with a single group (simple time series)
An interrupted time series uses repeated measures before and after delayed implementation of
the independent variable (e.g., the program, etc.) to help rule out other explanations. This
relatively strong design – with comparisons within the group – addresses most threats to internal
validity.
The simplest form of this design is to take repeated observations, implement the program or
intervention, and observe a number of times during the evaluation period, including at the end.
This method is a great improvement over the pre- and post- design in that it tracks the trend of
change, and can therefore help you see whether it was actually the independent variable that caused
any change. It can also help to identify the influence of external factors, such as when the dependent variable changes noticeably before the intervention even begins.
Another possibility for this design is to implement more than one independent variable, either by
trying two or more, one after another (often with a break in between), or by adding each to what
came before. This not only gives a picture of the progress of change, but can show very clearly
what causes change. That gives an evaluator the opportunity not only to adjust the program, but also to drop elements that have no effect.
There are a number of variations on the interrupted time series theme, including varying the
observation times; implementing the independent variable repeatedly; and implementing one
independent variable, then another, then both together to evaluate their interaction.
In any variety of interrupted time series design, it’s important to know what you’re looking for.
In an evaluation of a traffic fatality control program in the United Kingdom that focused on
reducing drunk driving, monthly measurements seemed to show only a small decline in fatal
accidents. When the statistics for weekends, when there were most likely to be drunk drivers on
the road, were separated out, however, they showed that the weekend fatality rate dropped
sharply with the implementation of the program, and stayed low thereafter. Had the researchers
not realized that that might be the case, the program might have been stopped, and the reduction in weekend fatalities would have been lost.
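The point about separating out weekend figures can be shown with a few lines of Python. The numbers below are invented for illustration and are not the United Kingdom data; the field layout and the helper names `split` and `percent_change` are assumptions of this sketch.

```python
from statistics import mean

# Invented monthly counts: (month, period, weekday_fatalities, weekend_fatalities)
records = [
    (1, "before", 80, 20), (2, "before", 82, 22), (3, "before", 81, 21),
    (4, "after", 81, 10), (5, "after", 80, 9), (6, "after", 82, 11),
]


def split(rows, value):
    """Apply `value` to each row and separate the results by period."""
    before = [value(row) for row in rows if row[1] == "before"]
    after = [value(row) for row in rows if row[1] == "after"]
    return before, after


def percent_change(before, after):
    """Percent change in the average from the 'before' to the 'after' period."""
    return 100 * (mean(after) - mean(before)) / mean(before)


total_change = percent_change(*split(records, lambda row: row[2] + row[3]))
weekend_change = percent_change(*split(records, lambda row: row[3]))

print(f"All days: {total_change:.1f}%")
print(f"Weekends: {weekend_change:.1f}%")
```

With these made-up figures, the overall decline looks modest while the weekend decline is large, which is the pattern the example in the text describes.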
Interrupted time series design with multiple groups (multiple baseline/time series)
This has the same possibilities as the single time series design, with the added wrinkle of using
repeated measures with one or more other groups (so-called multiple baselines). By using
multiple baselines (groups), the external validity or generality of the findings is enhanced – we
can see if the effects occur with different groups or under different conditions.
This multiple time series design – typically staggered introduction of the intervention with
different groups – offers several possibilities:
• You can try a method or program with two or more groups from the same population.
• You can try a particular method or program with different populations, to see if it’s effective with each.
• You can vary the timing or intensity of an intervention with different groups.
• You can try the same two or more interventions with each of two groups, but reverse the order in which they receive them.
A common way to evaluate the effects of an independent variable is to use a control group. This
group is usually similar to the participant group, but either receives no intervention at all, or
receives a different intervention with the same goal as that offered to the participant group. A
control group design is usually the most difficult to set up – you have to find appropriate groups,
observe both on a regular basis, etc. – but is generally considered to be the most reliable.
The term control group comes from the attempt to control outside and other influences on the
dependent variable. If everything about the two groups except their exposure to the program
being evaluated averages out to be the same, then any differences in results must be due to that
exposure. The term comparison group is more modest; it typically refers to a community or group
watched for similar levels of the problem/goal and relevant characteristics of the community.
The gold standard here is the randomized control group, one that is selected totally at random,
either from among the population the program or intervention is concerned with – those at risk
for heart disease, unemployed males, young parents – or, if appropriate, the population at large.
A random group eliminates the problems of selection we discussed above, as well as issues that might arise from other differences between the groups.
A control group that’s carefully chosen will have the same characteristics as the intervention
group (the focus of the evaluation). If, for instance, the two groups come from the same pool of
people with a particular health condition, and are chosen at random either to be treated in the
conventional way or to try a new approach, it can be assumed that – since they were chosen at
random from the same population – both groups will be subject, on average, to the same outside
influences, and will have the same diversity of backgrounds. Thus, if there is a significant
difference in their results, it is fairly safe to assume that the difference comes from the intervention.
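Random assignment itself is mechanically simple. The following is a minimal sketch, assuming you already have a list of eligible people drawn from the same pool; the names in `pool` are placeholders, and the function `assign_at_random` and its `seed` parameter are our own inventions. The seed is there only so that the assignment can be reproduced and checked later.

```python
import random


def assign_at_random(people, seed=None):
    """Shuffle a copy of the list and split it into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(people)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Placeholder identifiers for ten eligible people from the same pool.
pool = [f"person_{i}" for i in range(1, 11)]
intervention_group, control_group = assign_at_random(pool, seed=42)

print("Intervention:", intervention_group)
print("Control:", control_group)
```

Because chance alone decides who lands in which group, neither staff preferences nor participants’ own choices can slant the selection.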
The difficulty for governmental and community-based organizations is to find or create a
randomized control group. If the program has a long waiting list, it may be able to create a
control by selecting those to first receive the intervention at random. That in itself creates
problems, in that people often drop off waiting lists out of frustration or other reasons. Being
included in the evaluation may help to keep them, on the other hand, by giving them a closer
connection to the program. One program with a long
waiting list addressed the problem by offering those on the waiting list a different option.
They received videotapes to use at home, along with biweekly tutoring by advanced students
and graduates of the program. Thus, they became a comparison group with a somewhat
different intervention that, as expected, was less effective than the program itself, but was more
effective than none, and kept them on the waiting list. It also gave them a head start once they
got into the classes, with many starting at a middle rather than a beginning level.
When there’s no waiting list or similar group to draw from, community organizations often end
up using a comparison group – one composed of participants in another place or program and
whose members’ characteristics, backgrounds, and experience may or may not be similar to
those of the participant group. That circumstance can raise some of the same problems related to
selection seen when there is no control group. If the only potential comparisons involve very
different groups, it may be better to use a design, such as an interrupted time series design, that
doesn’t involve a control group at all, where the comparison is within (not between) groups.
Groups may look similar, but may differ in an important way. Two groups of participants in a
substance abuse intervention program, for instance, may have similar histories, but if one
program is voluntary and the other is not, the results aren’t likely to be comparable. One group
will probably be more motivated and less resentful than the other, and composed of people who
already know they have a potential problem. The motivation and determination of their
participants, rather than the effectiveness of the two programs, may influence the amount of
change observed.
This issue may come up in a single-group design as well. A program that may, on average,
seem to be relatively ineffective may prove, on close inspection, to be quite effective with
certain participants – those of a specific educational background, for instance, or with particular
life experiences. Looking at results with this in mind can be an important part of an evaluation.
CHOOSING A DESIGN
This section’s discussion of research designs is in no way complete. It’s meant to provide an
introduction to what’s available. There are literally thousands of books and articles written on
this topic, and you’ll probably want more information. There are a number of statistical
methods that can compensate for less-than-perfect designs, for instance. Few community groups
have the resources to assemble a randomized control group, or to implement two or more similar programs to see which works better.
Given this, the material that follows is meant only as broad guidelines. We don’t attempt to be
specific about what kind of design you need in what circumstances, but only try to suggest some
things to think about in different situations. Help is available from a number of directions: Much
can be found on the Internet (see the “Resources” part of this section for a few sites); there are
numerous books and articles (the classic text on research design is also cited in “Resources”);
and universities are a great resource, both through their libraries and through faculty and
graduate students who might be interested in what you’re doing, and be willing to help with
your evaluation. Use any and all of these to find what will work best for you. Funders may also
be willing either to provide technical assistance for evaluations, or to include money in your grant for that purpose.
Your goal in evaluating your effort is to get the most reliable and accurate information possible,
given your evaluation questions, the nature of your program, what your participants will consent
to, your time constraints, and your resources. The important thing here is not to set up a perfect
research study, but to design your evaluation to get real information, and to be able to separate
the effects of external factors from the effects of your program. So how do you go about
choosing the best design that will be workable for you? The steps are in the first sentence of this
paragraph.
What do you need to know? If the intent of your evaluation is simply to see whether something
specific happened, it’s possible that a simple pre-post design will do. If, as is more likely, you
want to know both whether change has occurred, and if it has, whether it has in fact been caused
by your program, you’ll need a design that helps to screen out the effects of external influences.
For many community programs, a control or comparison group is helpful, but not absolutely
necessary. Think carefully about the frequency and timing of your observations and the amount
of different kinds of information you can collect. With repeated measures, you can get quite
an accurate picture of the effectiveness of your program from a simple time series design. Single
group interrupted time series designs, which are often the most workable for small organizations,
can give you a very reliable evaluation if they’re structured well. That generally means obtaining
multiple baseline observations (enough to set a trend) before the program begins; observing often
and documenting your observations carefully (often with both quantitative – expressed in
numbers – and qualitative – expressed in records of incidents and of what participants did and
said – data); and including during-intervention and follow-up observations to see whether the effects are maintained.
In many of these situations, a multiple-group interrupted time series design is quite possible. If you have several groups,
each working toward the same goals, you have the opportunity to stagger the introduction of the
intervention across the groups. This comparison within (and across) groups allows you to screen
out such factors as the facilitator’s ability and community influences (assuming all participants
come from the same general population). You could also try different methods or timing with different groups.
In some cases, the real question is not whether your method or program works, but whether it
works better than other methods or programs you could be using. Teaching a skill – for instance,
employment training, parenting, diabetes management, or conflict resolution – often falls into
this category. Here, you need a comparison of some sort. While evaluations of some of these –
medical treatment, for example – may require a control group, others can be compared to data
from the field, to published results of other programs, or, by using community-level indicators, to measurements in other communities.
There are community programs where the bottom line is very simple. If you’re working to
control water pollution, your main concern may be the amount of pollution coming out of
effluent pipes, or the amount found in the river. Your only measure of success may be keeping
pollution below a certain level, which means that regular monitoring of water quality is the only
evaluation you need. There are probably relatively few community programs where evaluation
is this easy – you might, for instance, want to know which of your pollution-control activities is
most effective – but if yours is one, a simple design may be all you need.
Consider the nature of your program
What does your program look like and what is it meant to do? Does it work with participants in
groups, or individually, for instance? Does it run in cycles – classes or workshops that begin and
end on certain dates, or a time-limited program that participants go through only once? Or can
participants enter whenever they are ready and stay until they reach their goals? How much of
the work of the program is dependent on staff, and how much do participants do on their own?
How important is the program context – the way staff, participants, and others treat one another,
the general philosophy of the program, the physical setting, the organizational culture? (The
culture of an organization consists of accepted and traditional ways of doing things, patterns of
relationships, how people dress, how they act toward and communicate with one another, etc.)
• If you work with participants in groups, a multiple-group design – either interrupted time
series or control group – might be easier to use. If you work with participants
individually, perhaps a simple time series or a single group design would be appropriate.
• If your program is time-limited – either one-time-only, or with sessions that follow one
another – you’ll want a design that fits into the schedule, and that can give you reliable
results in the time you have. One possibility is to use a multiple group design, with
groups following one another session by session. The program for each group might be
adjusted, based on the results for the group before, so that you could test new ideas each
session.
• If your program has no clear beginning and end, you’re more likely to need a single
group design that considers participants individually, or by the level of their baseline
performance. You may also have to compensate for the fact that participants may be entering and leaving the program at different times.
A proverb says that you never step in the same river twice, because the water that flows past a
fixed point is always changing. The same is true of most community programs. Someone
coming into a program at a particular time may have a totally different experience than a similar
person entering at a different time, even though the operation of the program is the same for
both. A particular participant may encourage everyone around her, and create an
overwhelmingly positive atmosphere different from that experienced by participants who enter
the program after she has left, for example. It’s very difficult to control for this kind of
difference over time, but it’s important to be aware that it can, and often does, exist, and may affect your results.
If the organizational or program context and culture are important, then you’ll probably
want to compare your results with participants to those in a control group in a similar setting.
There is, of course, a huge range of possibilities here: nearly any design can be adapted to nearly
any situation in the right circumstances. This material is meant only to give you a sense of how to begin thinking about design.
In addition to the effect that it might have on the results of your evaluation, you might find that a
lot of observation can raise protests from participants who feel their privacy is threatened, or
from already-overworked staff members who see adding evaluation to their job as just another
burden. You may be able to overcome these obstacles, or you may have to compromise – fewer
or different kinds of observations, a less intrusive design – in order to be able to conduct the
evaluation at all.
There are other reasons that participants might object to observation, or at least intense
observation. Potential for embarrassment, a desire for secrecy (to keep their participation in the
program from family members or others), even self-protection (in the case of domestic violence,
for instance) can contribute to unwillingness to be a participant in the evaluation. Staff members may have concerns of their own.
There are ways to deal with these issues, but there’s no guarantee that they’ll work. One is to
inform participants at the beginning about exactly what you’re hoping to do, listen to their
objections, and meet with them (more than once, if necessary) to come up with a satisfactory
approach. Staff members are less likely to complain if they’re involved in planning the
evaluation, and thus have some say over the frequency and nature of observations. The same is
true for participants. Treating everyone’s concerns seriously and including them in the planning makes cooperation more likely.
As we mentioned above, the important thing here is to choose a design that will give you
reasonably reliable information. In general, your design doesn’t have to be perfect, but it does
have to be good enough to give you a reasonably good indication that changes are actually taking
place, and that they are the result of your program. Just how precise you can be is at least
partially controlled by the limits on your time placed by funding, program considerations, and
other factors.
• Program structure. An evaluation may make the most sense if it’s conducted to correspond with a regular program cycle.
• Funding. If you are funded only for a pilot project, for example, you’ll have to conduct
your evaluation within the time span of the funding, and soon enough to show that your
program is successful enough to be re-funded. A time schedule for evaluation may be part of your grant or contract.
• Participants’ schedules. A rural education program may need to stop for several months a year.
• The availability of professional evaluators. Perhaps the evaluation team can only work at certain times.
Strategic planners often advise that groups and organizations consider resources last: otherwise
they’ll reject many good ideas because they’re too expensive or difficult, rather than trying to
find ways to make them work with the resources at hand. Resources include not only money, but
also space, materials and equipment, personnel, and skills and expertise. Often, one of these can
substitute for another: a staff person with experience in research can take the place of money that
would be used to pay a consultant, for example. A partnership with a nearby university could get you expertise that you couldn’t otherwise afford.
The lesson here is to begin by determining the best design possible for your purposes, without
regard to resources. You may have to settle for somewhat less, but if you start by aiming for
what you want, you’re likely to get a lot closer to it than if you assume you can’t possibly get it.
IN SUMMARY
The way you design your evaluation research will have a lot to do with how accurate and reliable
your results are, and how well you can use them to improve your program or intervention. The
design should be one that best addresses key threats to internal validity (whether the intervention
caused the change) and external validity (the ability to generalize your results to other situations, communities, and populations).
Common research designs – such as interrupted time series or control group designs – can be
adapted to various situations, and combined in various ways to create a design that is both
appropriate and feasible for your program. It may be necessary to seek help from a consultant, a
university partner, or simply someone with research experience to help identify a design that fits
your needs.
A good design will address your evaluation questions, and take into consideration the nature of
your program, what program participants and staff will agree to, your time constraints, and the
resources you have available for evaluation. It often makes sense to consider resources last, so
that you won’t reject good ideas because they seem too expensive or difficult. Once you’ve
chosen a design, you can often find a way around a lack of resources to make it a reality.
Chapter 5
COLLECTING AND ANALYZING DATA
In previous sections of this chapter, we’ve discussed studying the issue, deciding on a research
design, and creating an observational system for gathering information for your evaluation. Now
it’s time to collect your data and analyze it – figuring out what it means – so that you can use it
to draw some conclusions about your work. In this section, we’ll examine how to do just that.
Essentially, collecting data means putting your design for collecting information into operation.
You’ve decided how you’re going to get information – whether by direct observation,
interviews, surveys, experiments and testing, or other methods – and now you and/or other
observers have to implement your plan. There’s a bit more to collecting data, however. If you are
conducting observations, for example, you’ll have to define what you’re observing and arrange
to make observations at the right times, so you actually observe what you need to. You’ll have to
record the observations in appropriate ways and organize them so they’re optimally useful.
Recording and organizing data may take different forms, depending on the kind of information
you’re collecting. The way you collect your data should relate to how you’re planning to analyze
and use it. Regardless of what method you decide to use, recording should be done concurrently
with data collection if possible, or soon afterwards, so that nothing gets lost and memory doesn’t
fade.
Some of the things you might do with the information you collect include:
• Making copies of all recording forms, records, audio or video recordings, and any
other collected materials, to guard against loss, accidental erasure, or other problems
• Entering narratives, numbers, and other information into a computer program, where they
can be made ready for analysis. This might, for instance, include entering numerical observations
into a chart, table, or spreadsheet, or figuring the mean (average), median (midpoint),
and/or mode (most frequent value) of a set of numbers.
• Transcribing (making an exact, word-for-word text version of) the contents of audio or
video recordings
• Coding data (translating data, particularly qualitative data that isn’t expressed in numbers, into a form that can be analyzed)
• Organizing data in ways that make them easier to work with. How you do this will
depend on your research design and your evaluation questions. You might group observations by
individuals or groups of participants, by time, by activity, etc. You might also want to
group observations in several different ways, so that you can study interactions among
different variables.
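The organizing and summarizing steps in the list above can be sketched in a few lines of Python. The observations below are invented, and the record layout (`group`, `week`, `score`) and the helper `mean_score_by` are assumptions made for illustration; a spreadsheet would do the same job.

```python
from collections import defaultdict
from statistics import mean, median, mode

# Invented observations: one record per group per week.
observations = [
    {"group": "A", "week": 1, "score": 4},
    {"group": "B", "week": 1, "score": 3},
    {"group": "A", "week": 2, "score": 6},
    {"group": "B", "week": 2, "score": 5},
    {"group": "A", "week": 3, "score": 6},
    {"group": "B", "week": 3, "score": 7},
]


def mean_score_by(records, field):
    """Group the records by one field and return the average score for each value."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[field]].append(record["score"])
    return {key: mean(values) for key, values in grouped.items()}


scores = [record["score"] for record in observations]
summary = {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}

by_group = mean_score_by(observations, "group")  # organized by participant group
by_week = mean_score_by(observations, "week")    # the same data organized by time

print(summary)
print(by_group)
print(by_week)
```

Grouping the same observations in more than one way, here by group and by week, is what lets you look for interactions among variables.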
There are two kinds of variables in research. An independent variable (the intervention) is a
condition implemented by the researcher or community to see if it will create change and
improvement. This could be a program, method, system, or other action. A dependent variable
is what may change as a result of the independent variable. A dependent
variable could be a behavior, outcome, or other condition. A smoking cessation program, for
example, is an independent variable that may change group members’ smoking behavior, the dependent variable.
Analyzing information involves examining it in ways that reveal the relationships, patterns,
trends, etc. that can be found within it. That may mean subjecting it to statistical operations that
can tell you not only what kinds of relationships seem to exist among variables, but also to what
level you can trust the answers you’re getting. It may mean comparing your information to that
from other groups (a control or comparison group, statewide figures, etc.), to help draw some
conclusions from the data. The point, in terms of your evaluation, is to get an accurate
assessment in order to better understand your work and its effects on those you’re concerned
with.
There are two kinds of data you’re apt to be working with, although not all evaluations will
necessarily include both. Quantitative data refer to the information that is collected as, or can
be translated into, numbers, which can then be displayed and analyzed mathematically.
Qualitative data are collected as descriptions, anecdotes, opinions, quotes, interpretations, etc.,
and are generally either not able to be reduced to numbers, or are considered more valuable or
informative if left as narratives. As you might expect, quantitative and qualitative information
need to be analyzed differently.
QUANTITATIVE DATA
Quantitative data are typically collected directly as numbers. Some examples include:
• Numbers or percentages of people with certain characteristics in a population (people
with diabetes, unemployed, Spanish-speaking, under age 14, grade of school completed,
etc.)
Data can also be collected in forms other than numbers, and turned into quantitative data for
analysis. Researchers can count the number of times an event is documented in interviews or
records, for instance, or assign numbers to the levels of intensity of an observed event or
behavior. For instance, community initiatives often want to document the amount and intensity
of environmental changes they bring about – the new programs and policies that result from their
efforts. Whether or not this kind of translation is necessary or useful depends on the nature of
what you’re observing and on the kinds of questions your evaluation is meant to answer.
Quantitative data are usually subjected to statistical procedures such as calculating the mean or
average number of times an event or behavior occurs (per day, month, or year). These
operations, because numbers are “hard” data and not interpretation, can give definitive, or nearly
definitive, answers to different questions. Various kinds of quantitative analysis can indicate
changes in a dependent variable related to frequency, duration, timing (when particular things
happen), intensity, level, etc. They can allow you to compare those changes to one another, to
changes in another variable, or to changes in another population. They might be able to tell you,
at a particular degree of reliability, whether those changes are likely to have been caused by your
intervention or program, or by another factor, known or unknown. And they can identify
relationships among different variables, which may or may not mean that one causes another.
QUALITATIVE DATA
Unlike numbers or “hard data,” qualitative information tends to be “soft,” meaning it can’t
always be reduced to something definite. That is in some ways a weakness, but it’s also a strength.
A number may tell you how well a student did on a test; the look on her face after seeing her
grade, however, may tell you even more about the effect of that result on her. That look can’t be
translated to a number, nor can a teacher’s knowledge of that student’s history, progress, and
experience, all of which go into the teacher’s interpretation of that look. And that interpretation
may be far more valuable in helping that student succeed than knowing her grade or numerical
score.
Qualitative data can sometimes be changed into numbers, usually by counting the number of
times specific things occur in the course of observations or interviews, or by assigning numbers
or ratings to what was observed or said.
The challenges of translating qualitative into quantitative data have to do with the human factor.
Even if most people agree on what 1 (lowest) or 5 (highest) means in regard to rating
“satisfaction” with a program, ratings of 2, 3, and 4 may be very different for different people.
Furthermore, the numbers say nothing about why people reported the way they did. One may
dislike the program because of the content, the facilitator, the time of day, etc. The same may be
true when you’re counting instances of the mention of an event, such as the onset of a new
policy or program in a community based on interviews or archival records. Where one person
might mention a change in a program because he considers it important, another may omit it
because she considers it unimportant.
Qualitative data can sometimes tell you things that quantitative data can’t. It may reveal why
certain methods are working or not working, whether part of what you’re doing conflicts with
participants’ culture, what participants see as important, etc. It may also show you patterns – in
behavior, physical or social environment, or other factors – that the numbers in your quantitative
data don’t, and occasionally even identify variables that researchers weren’t aware of.
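Analysis of qualitative data generally depends more on people’s opinions, knowledge,
assumptions, and inferences (and therefore biases) than does analysis of quantitative data.
The interpretation of people’s statements or other communication, the spotting of trends – all
of these can be influenced by the way the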
researcher sees the world. Be aware, however, that quantitative analysis is influenced by a
number of subjective factors as well. What the researcher chooses to measure, the accuracy of
the observations, and the way the research is structured to ask only particular questions can all
influence the results, as can the researcher’s understanding and interpretation of the subsequent
analyses.
WHY SHOULD YOU COLLECT AND ANALYZE DATA FOR YOUR EVALUATION?
Part of the answer here is that not every organization – particularly small community-based
ones – has the resources to conduct a formal evaluation. They may have to be content with less
formal evaluations, which can still be useful. Even an informal evaluation will involve some data
gathering and analysis. This data collection and sense making is critical to an initiative, and
has a number of advantages:
• The data can show whether there was any significant change in the dependent
variable(s) you hoped to influence. Collecting and analyzing data helps you see whether
your intervention brought about the desired changes.
The term “significance” has a specific meaning when you’re discussing statistics. The level of
significance of a statistical result describes how likely it is that a result like yours would turn
up by chance alone. Generally, researchers don’t consider a result significant unless there is
less than a 5% probability that a difference at least as large as the one observed would occur
by chance if the intervention had no real effect (called the .05 level of significance). The level
of significance is built into the statistical formulas: once you get a
mathematical result, a table (or the software you’re using) will tell you the level of
significance.
Thus, if data analysis finds that the independent variable (the intervention) influenced the
dependent variable at the .05 level of significance, it means that a change this large would be
expected less than 5% of the time if chance alone were at work. The .05 level is generally
considered a reasonable standard, and the .01 level (less than a 1% probability of a chance
result) is considered stronger evidence still. Significance at the .05 level doesn’t mean that the
program works on 95% of participants, or that it will work 95% of the time. Nor, strictly
speaking, does it mean there is a 95% probability that your program caused the change. It means
only that chance alone is an unlikely explanation for the changes in the dependent variable(s).
• They can uncover factors that may be associated with changes in the dependent
variable(s). Data analyses may help discover unexpected influences; for instance, that
the effect was twice as large for those participants who also were a part of a support
group.
• They can show connections between or among various factors that may have an
effect on the results of your evaluation. Some types of statistical procedures look for
connections (correlations) among variables: some variables may change when others do.
These changes may be similar – i.e., both variables move in the same direction (as one
rises, the amount of reading people do also increases). Or the opposite may be observed – i.e., the
two variables change in opposite directions (as the amount of exercise they engage in
increases, people’s weight decreases). Correlations don’t mean that one variable causes
another or that they both have the same cause, but they can provide valuable information
about how variables are related.
• They can help shed light on the reasons that your work was effective or, perhaps,
less effective than you’d hoped. By combining quantitative and qualitative analysis,
you can often determine not only what worked or didn’t, but why. The effect of cultural
issues, how well methods are used, and the appropriateness of your approach for the
population – these as well as other factors that influence success can be highlighted by
careful data collection and analysis. This knowledge gives you a basis for adapting and
changing what you do to make it more likely you’ll achieve the desired outcomes in the
future.
• They can provide you with credible evidence to show stakeholders that your
program is working. Stakeholders, such as funders and community boards, want to know their
investments are well spent. Showing evidence of intermediate outcomes (e.g., new programs
and policies) helps make that case.
• Their use shows that you’re serious about evaluation and about improving your
work. Being a good trustee or steward of community investment includes regular review
of your data and progress.
• They can show the field what you’re learning, and thus pave the way for others to
improve community efforts and, ultimately, quality of life for people who benefit.
As far as data collection goes, the “when” part of this question is relatively simple: data
collection should start no later than when you begin your work – or before you begin in order to
establish a baseline or starting point – and continue throughout. Ideally, you should collect data
for a period of time before you start your program or intervention in order to determine if there
are any trends in the data before the onset of the intervention. Additionally, in order to gauge
your program’s longer-term effects, you should collect follow-up data for a period of time
after the program ends.
The timing of analysis can be looked at in at least two ways: One is that it’s best to analyze your
information when you’ve collected all of it, so you can look at it as a whole. The other is that if
you analyze it as you go along, you’ll be able to adjust your thinking about what information you
actually need, and to adjust your program to respond to the information you’re getting. Which of
these approaches you take depends on your research purposes. If you’re more concerned with a
summative evaluation – finding out whether your approach was effective – you might be more
inclined toward the first. If you’re oriented toward improvement – a formative evaluation – we
recommend gathering information along the way. Both approaches are legitimate, but ongoing
data collection and review can particularly lead to improvements in your work.
The “who” question can be more complex. If you’re reasonably familiar with statistics and
statistical procedures, and you have the resources in time, money, and personnel, it’s likely that
you’ll do a somewhat formal study, using standard statistical tests. (There’s a great deal of
software – both for sale and free or open-source – available to help you.) If that’s not the
case, you have some choices:
• You can hire or find a volunteer outside evaluator, such as from a nearby college or
university.
• You can conduct a less formal evaluation. Your results may not be as sophisticated as if
you subjected them to rigorous statistical procedures, but they can still tell you a lot about
your program. Just the numbers – the number of dropouts (and when most dropped out),
for instance, or the characteristics of the people you serve – can give you important and
usable information.
• You can try to learn enough about statistics and statistical software to conduct a formal
evaluation yourself.
• You can collect the data and then send it off to someone, such as a university program, to
analyze it for you.
• You can collect and rely largely on qualitative data. Whether this is an option depends to
a large extent on what your program is about. You wouldn’t want to conduct a formal
evaluation of the effectiveness of a new medication using only qualitative data, but you might
be able to draw some reasonable conclusions about use or compliance patterns from
qualitative information.
• If possible, use a randomized or closely matched control group for comparison. If your
control is properly structured, you can draw some fairly reliable conclusions simply by
comparing its results to those of your intervention group. Again, these results won’t be as
reliable as if the comparison were made using statistical procedures, but they can point
you in the right direction. It’s fairly easy to tell whether or not there’s a major difference
between the numbers for the two or more groups. If 95% of the students in your class
passed the test, and only 60% of those in a similar but uninstructed control group did, you
can be pretty sure that your class made a difference in some way, although you may not
be able to tell exactly what it was that mattered. By the same token, if 72% of your
students passed and 70% of the control group did as well, it seems pretty clear that your
instruction had essentially no effect, if the groups were starting from approximately the
same place.
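The informal comparison in the last bullet can also be checked more formally. The following is a minimal Python sketch of a two-proportion z-test, a standard normal-approximation test that this module does not itself name. The pass rates come from the example above; the group size of 100 students per group is an assumption made purely for illustration, since the example gives no group sizes, and the function name two_proportion_p_value is hypothetical.

```python
import math


def two_proportion_p_value(passed_a, n_a, passed_b, n_b):
    """Two-sided p-value for the difference between two pass rates,
    using the pooled two-proportion z-test (normal approximation)."""
    rate_a = passed_a / n_a
    rate_b = passed_b / n_b
    pooled = (passed_a + passed_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both pass rates are 0% or 100%; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal curve.
    return math.erfc(abs(z) / math.sqrt(2))


GROUP_SIZE = 100  # assumed; the example in the text does not give group sizes

# 95% of the class passed versus 60% of the control group.
p_large_gap = two_proportion_p_value(95, GROUP_SIZE, 60, GROUP_SIZE)

# 72% of the class passed versus 70% of the control group.
p_small_gap = two_proportion_p_value(72, GROUP_SIZE, 70, GROUP_SIZE)

print(f"95% vs 60%: p = {p_large_gap:.6f}")
print(f"72% vs 70%: p = {p_small_gap:.3f}")
```

Under these assumed group sizes, the 95% versus 60% gap falls well below the .05 level, while the 72% versus 70% gap does not. With much smaller groups the same percentages could lead to a different conclusion, which is one reason a formal test is more reliable than comparing percentages by eye.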
Who should actually collect and analyze data also depends on the form of your evaluation. If
you’re doing a participatory evaluation, much of the data collection – and analysis – will be
done by community members or program participants themselves. If you’re conducting an
evaluation in which the observation is specialized, the data collectors may be staff members,
professionals, highly trained volunteers, or others with specific skills or training (graduate
students, for example). Analysis also could be accomplished by a participatory process. Even
where complicated statistical procedures are necessary, participants and/or community members
might be involved in sorting out what those results actually mean once the math is done and the
results are in. Another way analysis can be accomplished is by professionals or other trained
individuals, depending upon the nature of the data to be analyzed and the methods of analysis.
Whether your evaluation includes formal or informal research procedures, you’ll still have to
collect and analyze data, and there are some basic steps you can take to do so.
• Clearly define and describe what measurements or observations are needed. The
definition and description should be clear enough to enable observers to agree on what
they’re observing.
• Select and train observers. Particularly if this is part of a participatory process, observers
need training to know what to record and to recognize key behaviors, events, and conditions.
• Conduct observations at the appropriate times for the appropriate period of time. This
may include reviewing archival material; conducting interviews, surveys, or focus groups; etc.
• Record data in the agreed-upon ways. These may include pencil and paper, computer
(using a laptop or handheld device in the field, entering numbers into a program, etc.),
and audio or video recording.
• Organize the data you’ve collected. How you do this depends on what you’re planning to
do with it, and on what you’re interested in.
• Enter any necessary data into the computer. This may mean simply typing comments, or
entering information (possibly including audio and video) into a database, spreadsheet, or GIS
(geographic information system) program.
• Transcribe any audio- or videotapes. This makes them easier to work with and copy.
• Sort your information in ways appropriate to your interest. This may include sorting by
individual, by group, by time, or by activity.
• When possible, necessary, and appropriate, transform qualitative into quantitative data.
This might involve, for example, counting the number of times specific issues were
mentioned in interviews.
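The last step, turning qualitative material into counts, can be sketched in a few lines of Python. This is only an illustration: the interview codes and theme names below are hypothetical, and it assumes a reviewer has already read each interview and tagged the themes that appear in it.

```python
from collections import Counter

# Hypothetical coded interviews: each inner list holds the themes a reviewer
# tagged in one interview. Theme names are invented for illustration.
coded_interviews = [
    ["transportation", "cost"],
    ["cost", "childcare", "cost"],
    ["transportation"],
    ["childcare", "cost"],
]

# How many times each theme was mentioned across all interviews.
mention_counts = Counter(
    theme for interview in coded_interviews for theme in interview
)

# How many interviews mentioned each theme at least once.
interview_counts = Counter(
    theme for interview in coded_interviews for theme in set(interview)
)

print(mention_counts.most_common())
print(interview_counts.most_common())
```

Keeping both counts is a deliberate choice: one talkative participant can inflate the number of mentions, while the number of interviews that raise a theme shows how widely it is shared.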
We’ve referred several times to statistical procedures that you can apply to quantitative data. If
you have the right numbers, you can find out a great deal about whether your program is causing
or contributing to change and improvement, what that change is, whether there are any expected
or unexpected connections among variables, how your group compares to another you’re
measuring, etc.
There are other excellent possibilities for analysis besides statistical procedures, including:
• Simple counting, graphing, and visual inspection of frequency or rates of behavior, events,
etc.
• Using visual inspection of patterns over time to identify discontinuities (marked increases
or decreases)
• Calculating the mean (average), median (midpoint), and/or mode (most frequent) of a
series of measurements or observations. What was the average blood pressure, for
instance, of people who exercised 30 minutes a day at least five days a week, as opposed
to those who exercised less?
• Using qualitative information to observe (and track changes in) the people or situation.
Journals can be particularly revealing in this area because they record people’s experiences
and reflections over time.
• Finding patterns in qualitative data. If many people refer to similar problems or barriers,
these may be important in understanding the issue and determining what works or doesn’t.
• Comparing results to goals. One measure of success might be meeting a goal for planning
or program implementation, for example.
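The mean, median, and mode mentioned in the list above can be computed with Python's standard statistics module. The sketch below is illustrative only: the blood pressure readings are made-up numbers, and the summarize helper is a hypothetical name chosen for this example.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), made up for illustration.
frequent_exercisers = [118, 122, 120, 118, 124, 118, 127]
infrequent_exercisers = [135, 128, 140, 135, 132, 135, 147]


def summarize(readings):
    """Return the mean, median, and mode of a list of measurements."""
    return {
        "mean": statistics.mean(readings),      # average
        "median": statistics.median(readings),  # midpoint
        "mode": statistics.mode(readings),      # most frequent value
    }


frequent_summary = summarize(frequent_exercisers)
infrequent_summary = summarize(infrequent_exercisers)

print("Exercised five or more days a week:", frequent_summary)
print("Exercised less often:", infrequent_summary)
```

Looking at all three measures together helps guard against a single unusually high or low reading pulling the average in a misleading direction.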
Depending on the nature of your research, results may be statistically significant (at the .05
level or better, as discussed earlier), or simply important or unusual. They may or may not
be what you expected.
There are a number of different kinds of results you might be looking for.
• Differences within people or groups. If you have repeated measurements for
individuals/groups over time, you can see whether there are marked increases or decreases
following the introduction of the intervention. When the effects are seen when, and only
when, the intervention is introduced – and if the intervention is staggered (delayed) across
people or groups – this increases our confidence that the intervention, and not something
else, is producing the observed effects.
• Differences between or among two or more groups. If you have one or more randomized
control groups in a formal study (groups that are drawn at random from the same
population as the group in your program, but are not getting the same program or
intervention, or are getting none at all), then the statistical significance of differences
between or among the groups should tell you whether your program has any more
influence on the dependent variable(s) than what’s experienced by the other groups.
• Changes in dependent variables. Even without a comparison group, many statistical
procedures can tell you whether changes in dependent
variables are truly significant (or not likely due to chance). These results may say nothing
about the causes of the change (or they may, depending on how you’ve structured your
evaluation), but they do tell you what’s happening, and give you a place to start.
Correlation between variables doesn’t tell you that one necessarily causes the other, but simply
that changes in one have a relationship to changes in the other. Among American teenagers,
for instance, there is probably a fairly high correlation between an increase in body size and an
understanding of algebra. This is not because one causes the other, but rather the result of the
fact that American schools tend to begin teaching algebra in the seventh, eighth, or ninth grades,
a time when many 12-, 13-, and 14-year-olds are naturally experiencing a growth spurt.
On the other hand, correlations can reveal important connections. A very high correlation
between, for instance, the use of a particular medication and the onset of depression might lead
to the withdrawal of that medication, or at least a study of its side effects, and increased
awareness and caution among doctors who prescribe it. A very high correlation between gang
membership and having a parent with a substance abuse problem may not reveal a direct
cause-and-effect relationship, but may tell you something important about who is more at risk for substance
abuse.
• Correlations. Correlation means that there are connections between or among two or
more variables. Correlations can sometimes point to important relationships you might
not have predicted. Sometimes they can shed light on the issue itself, and sometimes on
the effects of a group’s cultural practices. In some cases, they can highlight potential
causes of an issue or condition, and thus pave the way for future interventions.
• Patterns. In both quantitative and qualitative information, patterns often emerge: certain
health conditions seem to cluster in particular geographical areas; people from a particular
group behave in similar ways; etc. These patterns may not be specifically what you were
looking for or expected to find, but they may either be important in themselves or shed
light on the areas you’re interested in. In some cases, you may need to subject them to
statistical procedures (regression analysis, for example) to see whether, in fact, they’re random
or reflect real patterns.
• Obvious findings. Through examination of your data and application of logic, some findings
may stand out. If 70% of a group of
overweight participants in a healthy eating and physical activity program lowered their
weight and blood pressure significantly, compared to only 20% of a similar group not in
the program, you can probably assume that the program was effective. If there’s
no change whatsoever in education outcomes after two years of your education program,
then you’re either running an ineffective program, or you’re simply not reaching those
who are most likely to have poorer outcomes (which can also be interpreted to mean
that the program is ineffective).
Not all important findings will necessarily tell you whether your program worked, or
what the most effective method is. It might be obvious from your data collection, for
instance, that, while violence or roadway injuries may not be seen as a problem
citywide, they are much higher in one or more particular areas, or that the rates of
diabetes are markedly higher for particular groups or those living in areas with greater
disparities of income. If you have the resources, it’s wise to look at the results of your
research in a number of different ways to find out how to improve your program.
Once you’ve organized your results and run them through whatever statistical or other analysis
you’ve planned for, it’s time to figure out what they mean for your evaluation. Probably the most
common question that evaluation research is directed toward is whether the program being
evaluated works or makes a difference. In research terms, that often translates to “What were the
effects of the independent variable (the program, intervention, etc.) on the dependent variable(s)
(the behavior, conditions, or other factors it was meant to change)?” There are a number of
possible answers to this question:
• Your program had exactly the effects on the dependent variable(s) you expected and
hoped it would. Statistics or other analysis showed clear positive effects at a high level of
significance for the people in your program and – if you used a multiple-group design –
none, or far fewer, of the same effects for a similar control group and/or for a group that
received a different intervention with the same purpose. Your early childhood education
program, for instance, greatly improved developmental outcomes for children in the
program and their later success in school.
• Your program had no effect. Your program produced no significant results on the
dependent variable, whether alone or compared to other groups. This would mean no
change as a result of your program or intervention.
• Your program had a negative effect. For instance, intimate partner violence increased (or
at least appeared to) as a result of your intervention. (It is relatively common for reported
events, such as violence or injury, to increase when the intervention results in improved
reporting.)
• Your program had the effects you hoped for and other effects as well.
o These effects might be positive. Your youth violence prevention program, for
instance, might have resulted in greatly reduced violence among teens, and might
also have resulted in significantly improved academic performance for the kids
involved.
o These effects might be neutral. The same youth violence prevention program
might also lead to changes that neither help nor harm participants.
o These effects might be negative. (These effects are usually called unintended
consequences.) A program to prevent HIV/AIDS might lower rates of unprotected sex but
might also increase conflict and instances of partner violence.
• Your program had no effect or a negative effect and other effects as well. As with
programs with positive effects, these might be positive, neutral, or negative.
If your analysis gives you a clear indication that what you’re doing is accomplishing your
purposes, interpretation is relatively simple: You should keep doing it, while trying out ways to
make it even more effective, or while aiming at other related issues as well.
If your analysis shows that your program is ineffective or negative, however – or, for that matter,
if a positive analysis leaves you wondering how to make your successful efforts still more
successful – interpretation becomes more complex. Are you using an absolutely wrong
approach? Are you using an approach that could be effective, but is poorly implemented? Is there a
particular contributing factor you’re failing to take into account? Are there barriers to success
in the population from which participants are drawn? Are there particular components or elements you
can change to make your program more effective, or should you start again from scratch?
Careful and insightful interpretation of your data may allow you to answer questions like these.
You may be able to use correlations, for instance, to generate hypotheses about your results. If
positive or negative changes in particular variables are consistently associated with positive or
negative changes in other variables, the two may be connected. (The word “may” is important
here. The two may be connected, but they may not, or both may be related to a third variable
that you’re not aware of or that you consider trivial.) Such a connection can point the way
toward a factor (e.g., access to support) that is causing the changes in both variables, and that
must be addressed to make your program successful. Correlations may also indicate patterns in
your data, or may lead to an unexpected way of looking at the issue you’re addressing.
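To make "consistently associated" concrete, the sketch below computes a Pearson correlation coefficient in plain Python. The coefficient runs from -1 (the variables move in opposite directions) through 0 (no linear relationship) to +1 (they move together). The data are hypothetical numbers invented for illustration, and pearson_r is a helper name chosen for this example. As the paragraph above stresses, even a strong correlation does not show that one variable causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data: weekly hours of exercise and weight change (kg) over
# three months for six participants. All values are made up.
exercise_hours = [0, 1, 2, 3, 4, 5]
weight_change_kg = [1.0, 0.5, -0.5, -1.0, -2.0, -2.5]

r = pearson_r(exercise_hours, weight_change_kg)
print(f"correlation between exercise and weight change: {r:.2f}")
```

A value close to -1, as in this made-up data, says that more exercise went along with more weight loss. It does not rule out a third factor, such as access to support, that drives both.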
You can often use qualitative data to understand the meaning of an intervention, and people’s
reactions to the results. The observation that participants are continually suffering from a variety
of health problems may be traced, through qualitative data, to nutrition problems or to
cultural restrictions (some Muslim women may be unwilling – or unable because of family
prohibition – to accept certain kinds of care).
Once you have organized your data, both statistical results and anything that can’t be analyzed
statistically need to be analyzed logically. This may not give you convincing information but it
will almost undoubtedly give you some ideas to follow up on, and some indications of
connections and avenues you might not yet have considered. It will also show you some
additional results – people reacting differently than before to the program, for example. The
numbers can tell you whether there is change, but they can’t always tell you what causes it or
why (although they sometimes can), or why some people benefit while others don’t. Those are
questions that logical analysis and qualitative information can help answer.
Analyzing and interpreting the data you’ve collected brings you, in a sense, back to the
beginning. You can use the information you’ve gained to adjust and improve your program or
intervention, evaluate it again, and use that information to adjust and improve it further, for as
long as it runs. You have to keep up the process to ensure that you’re doing the best work you
can and encouraging changes in individuals, systems, and policies that make for a better and
healthier community.
You have to become a cultural detective to understand your initiative.
IN SUMMARY
The heart of evaluation research is gathering information about the program or intervention
you’re evaluating and analyzing it to determine what it tells you about the effectiveness of what
you’re doing, as well as about how you can maintain and improve that effectiveness.
Collecting quantitative data – information expressed in numbers – and subjecting it to a visual
inspection or formal statistical analysis can tell you whether your work is having the desired
effect, and may be able to tell you why or why not as well. It can also highlight connections
(correlations) among variables, and call attention to factors you may not have considered.
Collecting and analyzing qualitative data – descriptions of people, events, and circumstances –
can provide insight into how participants experience the issue you’re
addressing, what barriers and advantages they experience, and what you might change or add to
improve what you do.
Once you’ve gained the knowledge that your information provides, it’s time to start the process
again. Use what you’ve learned to continue to evaluate what you do by collecting and analyzing
data.
Chapter 7
You’re evaluating your teen pregnancy prevention program, and you’d like to know whether it
will result in a reduction in pregnancy rates among young girls in the community. There are
statistics on pregnancy rates for the state and county collected by the Public Health Service, but
none at the community level, and none that separate the rates for girls under 16, the population
you’re most concerned with. You could do a community survey to try to find out the local rate,
but there are two problems with that idea. The first is that you have neither the time nor the
resources to conduct the survey, and the second is that you’re unlikely to get an accurate picture.
There might be another way of getting the information, however. A number of other agencies in
the community work with youth, and one of them, or a combination, might have the figures
you’re looking for. Rather than generating or collecting it yourself, you can save a great deal of
time and effort by using information that already exists.
We have previously discussed ways to find existing information to help you conduct a
community assessment of assets and needs, but now our goal is somewhat different, because
we’re seeking data that you can use to evaluate your work. This means that it needs to be in a
form that can be analyzed, and may have to refer to a very specific population, issue, and/or
method. As a result, it may be harder to find, and may have to be converted in some way once
you do find it. In many situations, however, using existing information still can be much easier
than collecting the data yourself. In this section, we’ll try to help you make use of archival
data.
Archival data refer to information that already exists in someone else’s files. Originally generated
for reporting or research purposes, it’s often kept because of legal requirements, for reference, or
as an internal record. In general, because it’s the result of completed activities, it’s not subject to
change.
Some researchers make a distinction between archival and secondary data. They see archival
data as information specifically collected for bureaucratic procedures and the like –
applications, reports, etc. – that can then be made usable for other purposes. Secondary data
refer to research information, collected as a result of studies and similar efforts that can then be
used by others either as comparison data or as part of new research. For the purposes of this
section, we’ll include both of these types of data in our discussion, and not distinguish between
them.
Archival data can exist almost anywhere that information is collected.
Some of the most common sources (we’ll look at these and others later in more detail) are:
• Research organizations
• Schools and education departments
Archives are often stored as paper files or on electronic storage – computer disks, CDs, DVDs,
etc. – and may include photographs and audio and video recordings as well. They may also take the
form of websites, which, of course, may include various media and text, all in the same place.
Many organizations have archives so large that they store most of the material off-site, either
with a data storage firm, or in their own or a rented facility. Some archives are made available
to the public.
As explained above, much of the data you’re likely to use for evaluation purposes will probably
be more focused than data you’d use for an assessment of the level of a problem. Evaluation
information would be more likely than assessment data to come in the form of study results, for
example, than as narrative history or original documents. There’s a good deal of overlap: census
data, for instance, could be used in both assessment and evaluation. In general, however, the
possibilities below would refer to the types of data available, including information for a
• Demographics of the population (e.g., age, education, income)
• Behavior
There are sometimes good reasons for using original data, including that the information you
need just isn’t available elsewhere. Additionally, if a researcher collects original data, he or she
On the other hand, if the information you need, or something very close to it, already
exists, there are several good reasons to find and use it.
• It’s easier and less time-consuming than collecting all the data yourself. This is
probably the most obvious and most common reason for taking advantage of archival
data. Especially if you’re looking for a large amount of information or information about
a large group of people, you may be able to save yourself an enormous amount of time
• Archival data may have already been processed by people with more statistical
advanced degree (and often not even then), the chances are that you don’t have a
flawless grasp of data analysis. You can hire someone or find a volunteer to help you,
but if the hard work has already been done, it will make your work that much easier.
Even with raw data, the basic organization and preparation (transcription of
interviews, entry of numbers into a spreadsheet or specific software, etc.) may have
• It’s quite possible that you can find more information than you’d be able to gather
if you did it yourself. The archival data you find may be more sweeping or more
specific than what you’d be able to gather. It may involve more people than you’d be
able to,
• Archival data could touch on important areas you have not considered, or identify
patterns or relationships you wouldn’t have looked for. In cases like these, the use of
pre-existing data might change your whole view of your work, and help bring you to a
• It may eliminate the need to correct for problems, such as improper sampling, lack of
• Archival data allows the possibility of looking at the effects of your work over time.
Is the change in your population part of a trend that seems to be reflected in data from a
similar population or the entire state or nation? You may not have the capacity to collect
data over a long enough period to answer such questions, but if the data already exist, it
• Archival data can make it possible for small organizations with limited resources to
simply have neither the money nor the personnel to gather large amounts of data – but,
WHEN SHOULD YOU COLLECT AND USE ARCHIVAL DATA?
• When it’s available. This is the key question. If you know the data exist and you can get
access, use it. If the data doesn’t exist, if finding it would take more time and effort than
it’s worth, or if you have no access to it, then using archival data isn’t possible.
• When it’s relevant. As with its availability, the relevance of the data to what you’re
trying to find out is a key issue. All the archival data in the world won’t do you any good
• When you don’t have the time and/or resources to collect the data yourself. Whether
it’s a matter of the size and scope of your organization, time pressure from a funder to
produce an evaluation, or some other factor, archival data may be the only source of the
• When it can inform your evaluation. There are large amounts of archival data
available almost everywhere. The mere fact that it exists doesn’t mean that it will do you
any good. You have to be selective about what you gather and use. Make sure it is
actually what you need, that it refers to the population and/or other elements of your
program that will make it truly useful to you. If not, you’ve not only wasted your effort,
but the resulting evaluation won’t give you a realistic picture of your work and how to
improve it.
As you search out and collect archival data, there are several questions you should ask.
To answer this question, you first might think about what information you need for your
• Data on past participants. You may want to compare the results for current participants
with data on past participants, especially if you’ve changed your methods or the
Community-level indicators show trends for the community as a whole. You can often use them
to find out whether your efforts have had any effect in the community. If, for example, you are
conducting a program to reduce alcohol use among youth, one indicator of success might be a
reduction in the number of weekend and nighttime one-car crashes involving teens.
In many cases, you might choose community-level indicators according to the data that’s
available. For example, if there are reliable figures for exactly the kinds of car crashes
described, then using those figures as a community-level indicator would probably make sense.
with. You may want to see how well various characteristics of your participants match
those of the general population or you may simply want to understand the context of the
evaluation better. You may also be looking for information to choose community-level
determine success of an initiative or intervention in the community; examples of
Both general and specific information might include several categories to choose from. These
categories include:
• Demographics:
• Behavior:
o Incidence (new cases) and prevalence (existing cases) of specific health conditions
o Development outcomes (e.g. those completing primary education, high school; those with
disabilities)
o General health and well-being characteristics of the population or community
o Access to health and human services (e.g. those with access to clean water and sanitation)
o Knowledge and awareness of issues (e.g. survey data on public concern with violence)
culture. If you can see whether and where your group’s efforts are fitting with
participants’ cultures, it will help you to determine whether that’s an issue, and where
• Data on a similar group that can be used as a control or comparison. This might be a
group from the same population that signed up for but did not experience the program, or
• Results of previous studies. You’d probably be most interested in studies that looked at
the same issue and population group you’re addressing. These can provide a standard of
comparison, as well as some sense of what kinds of results might be reasonable to
expect.
Suppression of Statistics
An issue that you should be aware of and prepared to encounter during your research is suppression
of statistics. When research deals with small populations or data pools, in order to protect the
privacy of individuals, it is sometimes necessary to suppress data. In other words, when the number
of cases in a category (e.g., females in Wyandotte County who died from lung cancer in 2004) is
small enough that disclosing the data might allow a specific individual to be identified, steps are
The most common method of preventing the identification of specific individuals is through cell
suppression. This means not providing counts in individual cells where doing so would potentially
allow identification of a specific person. Cell suppression can also be done by combining cells from
different small groups to create larger groupings that reduce the risk of identifying individuals.
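As an illustration only, the two approaches just described (blanking out small cells, or combining small groups into a larger one) can be sketched in a few lines of Python. The threshold of 6, the "#" marker, the function names, and the counts are all assumptions made for this sketch; each agency sets its own suppression rules.

```python
# Assumed rule for this sketch: counts below 6 are not published.
SUPPRESSION_THRESHOLD = 6
SUPPRESSION_MARKER = "#"


def suppress_small_cells(counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace every count below the threshold with the suppression marker."""
    return {
        group: (n if n >= threshold else SUPPRESSION_MARKER)
        for group, n in counts.items()
    }


def combine_cells(counts, groups_to_merge, new_label):
    """Merge several small groups into one larger group before publishing."""
    merged = {g: n for g, n in counts.items() if g not in groups_to_merge}
    merged[new_label] = sum(counts[g] for g in groups_to_merge)
    return merged


# Hypothetical counts, invented for illustration.
counts = {"Group A": 14, "Group B": 4, "Group C": 2}

print(suppress_small_cells(counts))
combined = combine_cells(counts, ["Group B", "Group C"], "Groups B and C")
print(suppress_small_cells(combined))
```

In this made-up case, the two small groups are hidden when reported separately, but their combined count reaches the assumed threshold and can be shown. The trade-off is that the combined figure no longer describes either group on its own.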
The table below, from the Kansas Department of Health and Environment's Bureau of
Epidemiology and Public Health Informatics, shows a break-down by race of deaths due to chronic
liver disease and cirrhosis in Wyandotte County, KS in 2010. Because the numbers for
African-American and Other are small enough that it might be possible to identify individuals from
those statistics, the data is suppressed, as indicated by the #.
Death Statistics for Chronic liver disease & cirrhosis for Wyandotte County, 2010
          White   African-American   Other   Total
Number    14      #                  #       20
# Indicates numbers below 6
The National Association of County and City Health Officials (NACCHO) has a useful tip sheet
that explores this and other challenges of data collection and analysis in jurisdictions with small
populations.
In addition to the question of confidentiality, low numbers in a given category can also be an issue
when considering the stability of data. In other words, when there are low numbers or incidences in
the data you are researching, it is more difficult to accurately calculate rates and it can give an
inaccurate picture of the categories you are researching. For instance, if the number of lung cancer
deaths in 2004 was 20, and in 2005 it was 30, statistically that is a 50% rise over one year, which is
quite a substantial fluctuation; however, it may be that it is simply a normal variation in reporting.
Because the numbers reported are so small, even minor changes can seem substantial, and this can
result in unreliable or unstable data. The table below, from the Kansas Department of Health and
Environment's Bureau of Epidemiology and Public Health Informatics, shows a break-down by race
of deaths due to breast cancer in 2010 in Wyandotte County, KS. Because the numbers available for
White, African-American, and Other are too small to allow for an accurate, reliable calculation of
the rates for that year, the information is suppressed, indicated by the @.@ symbols.
          White   African-American   Other   Total
Number    15      6                  #       23
Rate      @.@     @.@                @.@     15.5
However, there are a couple of strategies that can be used to help avoid or address these problems
of instability.
One way to increase the reliability of data where you are dealing with small data sets is to combine
multi-year data (for instance, results of cancer deaths in a community for three years instead of
one). A drawback to this option is that looking at multi-year data limits the ability to monitor
program interventions and identify new trends. Rolling year averages (e.g., looking at data for
1997-2000 one year, and 1998-2001 the following year) may overcome this drawback and should
Another way to decrease the possibility of statistical instability is to expand the geographic area
you are investigating by looking at regional health assessments conducted by collaborating
neighboring jurisdictions, or in the example above, expanding from county to state. A drawback to
this option is that you may then be examining results for a geographical area that does not
necessarily apply to your assessment. Analyzing data at the regional level may also mask
interesting local variations in the data.
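The instability problem and the multi-year strategy described above can be made concrete with a short Python sketch. The function names are hypothetical, and the counts for 2003 and 2006 are invented for illustration; only the 2004 and 2005 figures come from the lung cancer example in this section.

```python
def percent_change(old, new):
    """Percentage change from one count to the next."""
    return (new - old) / old * 100


def rolling_averages(counts_by_year, window):
    """Average yearly count over each run of `window` years.

    Assumes the dictionary holds one count for every consecutive year.
    Returns a dict keyed by (first_year, last_year) of each window.
    """
    years = sorted(counts_by_year)
    averages = {}
    for i in range(len(years) - window + 1):
        span = years[i:i + window]
        total = sum(counts_by_year[y] for y in span)
        averages[(span[0], span[-1])] = total / window
    return averages


# 2004 and 2005 are from the example in the text; 2003 and 2006 are made up.
deaths = {2003: 26, 2004: 20, 2005: 30, 2006: 24}

# Single-year comparison: a small absolute change looks like a large swing.
print(percent_change(deaths[2004], deaths[2005]))

# Three-year rolling averages, recalculated as each new year is added.
for span, average in rolling_averages(deaths, window=3).items():
    print(span, round(average, 1))
```

With these illustrative figures, the three-year averages for 2003-2005 and 2004-2006 differ far less than the single-year counts for 2004 and 2005 do, which is the smoothing the rolling approach aims for. The cost is the one already noted: a real change in any single year is harder to see.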
In some cases, you might know for certain that it exists; in others, you’ll have to search
Public records.
Government records at all levels – including federal, state, county, and local. Copies of
• Census Bureau. In most developed countries, the census covers a broad range of
• Federal and state departments and ministries. From environmental data to farming
practices and subsidies to poverty statistics to public health issues, the federal
• Various levels of the court system. In the U.S., where civil and criminal trials and their
• Police records. Arrests, domestic disputes, injury reports, and other information can be
• Securities and Exchange Commission and other business regulators. The SEC and other
• County commissions, agencies, and authorities. County Extension Services in the U.S.
Sometimes, government agencies are reluctant to share information, even though it’s public.
The Freedom of Information Act (FOIA) deals with this issue in the U.S. It allows for access
to a wide range of federal government records. Similar laws at the state level do the same for
state documents.
Some of these organizations aren’t, and don’t pretend to be, politically neutral. They have
agendas, conservative or liberal, and some of them interpret their research in light of those
agendas. It’s important to be aware of the bias of any archival data that you use if you want
reliable data. However, many organizations with a political stance nonetheless try to make
• Academia. Much research in health, human services, social issues, education, the
them. This includes theses and dissertations for advanced degrees, as well as the results
of funded research. Web search engines, such as Google Scholar, can help locate research
information.
• News media. Newspapers, magazines, and radio and TV outlets all keep archives, often
going back to the founding of the publication or station. These are often available to the
public – sometimes online – either free or for a fee. Although they are unlikely to
contain detailed study results, they often have summaries of important studies, and may
serve to point you in the right direction to find what you need.
• Foundations and other private funders. These organizations fund studies of all kinds,
and many publish or otherwise make available the results as a condition of funding.
• Hospitals and other health care providers are sometimes university-related, and may
conduct studies of various health issues. They also may collect, as an administrative
• Mental health providers may have data on particular types of conditions, or on who is
cover such areas as demographics and the location and character of community issues.
Depending on its nature, some of the research carried out or administrative data gathered by
universities, health and mental health providers, and human service organizations may have
some restrictions on them because of confidentiality. These restrictions usually only cover
access to individual records and identification of study participants, and generally don’t pose a
barrier to obtaining aggregate results of studies, assessments, or surveys with no identification
of individuals.
• Advocates and watchdog organizations may collect data (either locally, statewide, or
anything that pertains to their causes – and they’re usually willing to share it.
• Community activists. These folks tend to focus on specific issues, but if your issues are
similar to theirs, they may have a great deal of information that’s useful to you.
land-use maps and patterns (perhaps including population distribution by race, ethnicity,
age, etc.), environmental information, and other similar material you might find useful.
• Businesses and corporations, particularly large ones, often collect information on their
The question here is not only where to find archival information, but where to find it most
quickly and easily. Some of this material will be published, some only available from the
organizations that collected it. Looking in the right place first can save you a lot of time and
trouble.
Unless it’s brand new, your own organization should have an archive of administrative records,
past evaluations, assessments, and other data that might be helpful to you. Don’t ignore this
The Internet
Most public documents are either on the Web or can be found and/or ordered through a
website. The place to start is usually the website of the government agency most likely to have
collected the data. The Resources portion of this section contains a list of U.S. government
websites. In the U.S., states and most cities and towns have websites as well, with links to state
or municipal agencies and departments. (The URLs for state websites typically take the same form:
http://www.[state abbreviation].gov. Municipal websites can easily be found by searching the
name and state of the community.) States or provinces and communities in most of the
Many of the other sources of information mentioned above are likely to have websites also.
Whether their data is available on those sites is another matter, and depends to some extent on
what kind of information you’re seeking. Watchdog organizations and some think tanks are
likely to post at least some of the results of their research on websites because they want it to be
informative websites, since they’re trying to attract businesses and residents to an area.
Health providers and academics, on the other hand, may post their research on a website, but
only after it’s been published in a journal or book, or presented at a conference. That means that
you’re not apt to find very recent data (from the past year, for example). Local health and
human service providers and schools rarely conduct formal research, and rarely post any
administrative data on their websites, for two reasons: confidentiality, to which we’ve already
referred, and the fact that most of that data are intended for internal use, and therefore not seen
as useful to anyone outside the organization. Business websites generally include material only
of interest to potential customers. Community activists may or may not have websites at all.
As always when using the Internet, you should be cautious about where you find your
information. There are enormous numbers of reliable websites...and huge numbers of unreliable
ones as well. If you’re not sure of a website or of the information you get from it, try to find
that information elsewhere as well. In general, you can rely on websites when you know where
they get their information, and when you trust the reputation and integrity of the site’s owner.
Often, the best way to find information from health and human service organizations, schools,
and businesses, as well as from advocates and community activists, is to go to them directly. If
you do, be prepared to explain exactly what you’re looking for, what you plan to use it for, and
what you can offer in return. Unless the organization is willing to let you comb through its files
– confidentiality is often a barrier to that – someone will have to spend some time finding what
you need. It’s only fair to offer something in return, whether it’s payment, data analysis
services, advocacy for the cause, or something else the other organization needs.
If you’re asking another similar organization for data so you can use it as a comparison or
control group, the request has to be extremely tactful. In a sense, you’ll be telling the staff of
that organization that you expect your results to be better than theirs. Depending upon how they
see their work – and how they perceive you and your organization – they may take this as an
opportunity to find better methods to serve their participants, or as a grave insult. If it’s the
latter, they’re hardly likely to agree to the use of their data. You’ll have to frame the request in
the right way, and offer a good exchange as well. It will help if you’re dealing with an
organization with which you already have a good relationship of mutual respect.
Libraries
Librarians have always been world-class experts at finding what library users needed. With
current technology, they’ve become even better. Many have an encyclopedic knowledge of not
only what’s available in the library itself, but what’s on the web as well. They may be familiar
with sources of archival information you’d never think of, and be able to help you find what you
need quickly and with minimum effort. When in doubt, head to (or communicate with) an
available library.
WHAT ARE YOU PLANNING TO DO WITH THE DATA ONCE YOU HAVE IT?
This question has to do not only with what form you need the data in, but also just what data
you actually need. If you’re planning to use it as a comparison to the group participating in the
program you’re evaluating – whether as a formal control group or as baseline data – you’ll need
information on the variables you’re planning to look at, as they relate to the population you’re
working with, or at least a population that’s reasonably similar. If, for example, you’re
evaluating a chronic disease prevention program intended to benefit Latinos, and you’ve found
archival data on physical activity and nutrition among Native Americans, you can’t compare
your results with those of the archival data because the groups are likely to be too different.
If you’re planning to subject your data to statistical analysis, you’ll want information that either
is, or can be made, quantitative. If the information you’re collecting on your participants is
largely qualitative, then the archival data should be qualitative as well. Furthermore, the
information you get either should determine or should match the way you collect your own data,
It’s difficult to imagine evaluating a program or approach without actually collecting your own
data on participants. You might be able to find data on those participants from an earlier time,
which you can then use as a baseline. You might be able to find appropriate data on a similar
group that you can use as a comparison or control. But you can’t find data elsewhere on what
those participants are currently experiencing, and that’s what you’re evaluating in almost every
case.
The “almost” here refers to a situation where you’re evaluating a program in retrospect –
looking back at it after it’s underway or been completed. It may be possible in that case to find
archival data that will allow you to determine the program’s effectiveness in terms of process,
outcomes, or both.
Although you’ll probably collect information on the participants in the program you’re
evaluating, there are a number of ways you might use archival data:
• To better understand the context of your evaluation. These might be ethnographic data
(see Section 6 of this chapter), oral histories, assessment information, interviews, etc.
You’d use it to get a clearer picture of the community in a number of ways, and to help
you interpret the results of your evaluation. It might, for instance, give you insight into
why a particular approach did or didn’t work, or why some participants stayed in the
• To identify areas to address. Along with a clearer picture of the community goes a
• To establish a baseline against which to measure your results. For this purpose, you’d
need recent information about where the population you’re working with stands on the
dependent variables or outcomes you’re concerned with. That would tell you where the
participants started from (on average), so that you could see from the measures you used
in your evaluation whether and how much they might have improved as a result of your
work.
There are two kinds of variables (things that may change) in research. An independent variable
is a program, intervention, or other condition set up by the researcher to see if it will create change and improvement. A dependent variable
is a behavior, condition, or other element that may change as a result of the independent
variable. A violence prevention program, for example, is an independent variable that may
change community members’ engagement in violent behavior and associated injuries, the
dependent variables.
• To identify already-existing trends that may affect the results of your evaluation study.
The fact that there’s been a change in participants between the beginning and end of your
evaluation doesn’t necessarily mean that you’ve caused it. Among other things, it may
be part of an ongoing trend toward change that started well before your program did, and
may continue after it. Archival data might show such a trend over a number of measures
You might find, for example, that even though community-level indicators moved in the right
direction – the sale of tobacco products went down, say – they still compared unfavorably with
the state or national averages for the same indicators. That knowledge might be important in
future goal-setting and in using your evaluation results to gain community support or funding.
• To establish a standard of comparison against which to measure your efforts. There are
two ways that you could use archival data for this purpose. One is to use census,
statewide, and/or community-wide data to compare with that of the population you’re
working with. That comparison can give you a sense of how serious the issue is for your
group, compared to the general public. The second way is to use similar data to compare
your outcomes with the data on the larger population. This might work especially well
when you’re using community-level indicators (e.g., rate of injuries, percentage of girls
• To act as a control or comparison group. One of the best ways to learn whether or not
your program had an effect is to compare the participants you’re working with to those
in another group that received no program or a different one. The best alternative here is
to create a group from the same population as participants – so that all participants will
have approximately the same background, environmental influences, cultural norms, etc.
– and to conduct the same observations on both groups at the same times, so that the
only difference between them is the program that one of them is exposed to. In practice,
creating or finding a perfect control group is often difficult. Archival data may be able
Often, the most likely possibility is a group that was part of another program with the same
goal as yours, but using different methods. This has the advantage not only of providing a
control, but of letting you infer whether your approach works as well as, not as well as, or
• To provide data for a longitudinal study. If you think your program might have a
long-term effect, or if you think it will interact with the effects of past events,
circumstances, or programs, you might want to conduct a longitudinal study – one that
looks at participants over a longer period of time – for your evaluation. You may not
have the time or resources to collect data over a period of years, but you may be able to
find archival information that allows you to draw some conclusions about long-term
effects.
There are at least two circumstances where you might be able to use archival data for a
longitudinal perspective. The first is one in which you’re looking at the effect of an issue on
the population for a length of time before your program began. This might make it easier to see
program results in context, and to understand whether the program broke a cycle and started
real change. The second circumstance is when you’re looking back at the effects of a program
that was completed some time ago. In some circumstances, the effects of a program multiply
or accelerate over time. Particularly if your program was aimed at changes throughout the
community (reducing intimate partner violence, for instance), you may be able to find archival
data that tells you whether the effects of your program continued, kept growing, or trailed off
IN SUMMARY
Most government agencies and departments, community-based health and human service
providers, advocacy organizations, universities, and many other entities keep archival records of
information. You may be able to use these as part of the data for your evaluation, saving time
and trouble. Especially for small organizations with limited resources, the use of archival data
can make it possible to produce an evaluation that provides the information needed to accurately
assess a program’s effectiveness and make the changes necessary to improve it.
Chapter 8
RESEARCH
activity among those with higher risk for heart disease. The evaluation showed mixed results. A
small number of participants (15%) had very good outcomes. They had marked increases in
physical activity and improved nutrition. Their fitness improved and they lost weight. As
predicted, their blood pressure dropped, their pulse rates went down, and they reported feeling
more energized.
They reported high levels of satisfaction with the program and results.
A large majority of the original group (70%) exercised, but not as regularly as hoped. The health
benefits for this group varied, with several reducing blood pressure at least slightly, and the rest
A final group (15%) consisted of dropouts – several participants left the program, most within a
short time – and other people who simply never managed to exercise on any schedule at all.
There was virtually no change in their weight, blood pressure, or sense of well-being...except
What could the Community Health Center do with these results? It knew that, while the
intervention apparently worked if people stuck with it, the program was only partially
successful. How could it use the evaluation to improve the program, and so improve the health
of those it served?
This chapter so far has discussed the elements of conducting a research-based evaluation. But
evaluation itself is only a means to an end: a tool to help you see what is happening so you can
improve the effectiveness of your work. In this section, we’ll examine how you can use your
Some key reflection questions that you and your group might consider:
• What are we seeing? (e.g., amount and kind of activities implemented; results shown –
• What does it mean? (e.g., was the introduction of the intervention associated with
changes)
• What are the implications for improvement? (e.g., do the results suggest that the
The reflection questions you ask will depend on the nature of your intervention, but the above
set of questions is a good starting point. Consider holding a meeting or brief retreat where the
evaluation results can be presented through graphs and charts, and key questions can be
discussed. Such a meeting might benefit from an experienced facilitator to keep the process
Refining the intervention is the process of making your work more effective by using data
Depending on what you’ve learned from this data, you might want to:
To continue with our example from above, the Community Health Center staff and selected
participants met to review the results. They felt that the evaluation had shown that if people
exercised regularly, they could lower their blood pressure, lose weight, and improve their overall
health.
A key implication of the findings was how to help people establish and stay with an exercise
routine.
Further dialogue about results of the evaluation left the Community Health Center with additional
questions:
• How can we increase the number of participants who actually adopt and continue regular
• Why did some people who didn’t exercise regularly reduce their blood pressure, and should
• What other factors, if any, besides exercise seem to help participants exercise regularly and
By focusing on the key reflection questions – What are we seeing? What does it mean? What are
the implications for improvement? – the center should be able to refine its program to get even
better results for more participants.
It will be important for you to meet with other members of your group to review the data,
identify key areas for improvement, and brainstorm and come to consensus on how to address
issues that have been raised. Careful attention to your evaluation results can help inform which
INTERVENTION?
Refining the intervention is the primary purpose of an evaluation. If you find out that your
intervention wasn’t effective, you have three choices: you can quit; you can blindly try another
approach; or you can use your evaluation research to guide you towards a more effective
intervention.
Using evaluation results is vital: it points you in the direction that your research tells you is apt
to be most helpful. Using research to help you choose your course of action also establishes you
as a credible and practical organization, one that’s concerned with what works. That kind of
reputation is likely to increase your opportunities for getting funding and other resources, and to
gain and sustain your community support. Most importantly, it helps the group succeed in
The short answer to this question is “constantly.” Monitoring and evaluation should go on
throughout the life of the program or project, and should be used to adapt and adjust what you
do on an ongoing basis. In practical terms, it’s wise to reevaluate your work regularly – once a
year is typical – and make any major changes at that time. Of course, you can and should make
minor adjustments throughout the year, based on your monitoring and on feedback from
There are, in addition, some specific times when adjusting your work can be especially
helpful:
• When what you’re doing isn’t working. If it’s obvious that your work isn’t having the
Make sure that you allow enough time for a program or intervention to have an effect before
you make a judgment that it isn’t working. Nothing happens overnight, and the more difficult
the issue you’re addressing, the longer it’s likely to take to influence intended outcomes. You
have to walk a line between cutting a program off before it’s had time to work and letting it go
• When participants are dropping out at a high rate. What are you doing – or what are
the external factors – that might be causing participants to leave your program? How can
you change the intervention to ensure that people experience it long enough to
benefit?
exercise program used as an example – are only designed to run for a limited period, but
may run again and again, with new participants each time. If such a program is
continually evaluated, you’ll get – and should use – information each time that will help
• When funders or participants ask you to adjust some aspect(s) of your program.
• When funding or other resources are reduced. You may be faced with eliminating
parts of your program, cutting numbers of participants, or other unpleasant choices. Your
evaluation research can help you find the best way to make cuts without losing your
effectiveness, by keeping intact the elements of the program that make the most
difference.
• When the issue or goal changes. Sometimes there is a shift in priority issues for the
community following a rise in unemployment or violence. Your research can tell you
The best plan here is to involve a number of stakeholders, depending to some extent on who has
• Participants. These are the folks who experience both the intervention itself and its
effects, and they are likely to have ideas about what would make it better, easier for them
to participate, or more relevant for them. Participants should be your partners in refining
programs and interventions, since they have an inside perspective on whether they are
working.
• Staff members, paid or volunteer. Like participants, staff members have a unique
perspective on the intervention. Not only do they see the way it works every day, but
they’ll also have to carry out any changes. If they can claim ownership of those changes
by participating in the planning process for them, they’re far more likely to understand
• People who are directly or indirectly involved in supporting the work. Depending
upon the nature of your issue, these might include educators, government officials,
make any changes successful, it’s important that they have input into the planning of
those changes. They’ll need to understand and support them if the adjusted intervention
is to go well.
• Those who led and participated in the evaluation. They’ll have a good handle on what
the evaluation showed, and a grasp of what might need changing and how.
For example, the Community Health Center put together a team to look at the evaluation results
and make some recommendations for changes in the program. The team included a variety of
participants who had experienced different outcomes, a health care provider, a Center board
member, and a staff member from the university that conducted the evaluation. They went over
some of the research that the Center had used in developing the program, and carefully studied
participant interviews and other evaluation material, as well as the records kept by program
staff.
Changes in interventions should be focused on one or more of the three aspects of evaluation:
process (both your process – the activities implemented, whether you did what you intended, etc. – and
participants’ process – what they actually did), impact, and outcomes. You have to
examine each of these separately, and ultimately integrate them to decide what adjustments you
Each aspect of the evaluation builds on what comes before. In order to have the impact you
want, you have to put together and run your program well, and that’s a matter of process. If your
process didn’t go properly, then you haven’t really conducted the program you planned. If
you didn’t get the impact you hoped for, it may be because you simply didn’t do what
you planned, and the first adjustments should be to the process, to ensure that the intervention is
implemented as intended.
Similarly, to get the outcomes you intend, the program has to have an impact on the appropriate
risk and protective factors or other environmental conditions. If the program had the impact you
envisioned, but not the outcomes, then adjustments need to take place at the impact level,
perhaps in the risk and protective factors and/or conditions that influence outcomes.
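The reasoning above can be written out as a short decision sketch. This is only an illustration in Python, not part of the original module: the function name `where_to_adjust`, its three yes/no inputs, and the labels it returns are all hypothetical. The case where the process went as planned but there was no impact is labeled "plan", on the assumption that the planning or the chosen methods are then the likely problem.

```python
def where_to_adjust(process_as_planned: bool,
                    impact_achieved: bool,
                    outcomes_achieved: bool) -> str:
    """Work backward from outcomes to suggest where to look first.

    All names and labels here are hypothetical, for illustration only.
    """
    if outcomes_achieved:
        # Goals were met: keep the program and look for refinements.
        return "refine"
    if impact_achieved:
        # Impact without outcomes: revisit which risk and protective
        # factors or conditions the program targets.
        return "impact"
    if not process_as_planned:
        # No impact, and the program was not run as intended:
        # fix the implementation first.
        return "process"
    # Ran as planned but had no impact: assumed to point to the plan
    # itself or to the methods chosen.
    return "plan"


if __name__ == "__main__":
    # A program that was not implemented as intended and showed no impact.
    print(where_to_adjust(process_as_planned=False,
                          impact_achieved=False,
                          outcomes_achieved=False))
```

A real review will rarely reduce to three yes/no answers, so treat the result as a starting point for discussion with your group rather than a verdict.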
PROCESS
An evaluation of the process of your effort compares what you planned to do with what you
actually did.
Process has a number of elements to which evaluation might be applied. They encompass both
logistics (the handling of details, such as finding space and buying materials) and program
These elements can include:
• Community participation. Were you able to involve members and sectors of the
community that you intended to? Were you able to make good contacts and establish
• Community assessment. Did you conduct an assessment of the situation in the way you
• Program planning. Was the planning participatory? Did it include research into best
practices and successful interventions? Did it result in an approach that everyone felt
would work?
• Staff hiring and/or volunteer recruitment. Did you hire staff and/or recruit volunteers
• Staff and/or volunteer training. Were staff and/or volunteers oriented and trained
before they started, so that they knew what they were doing when they began work? Was
engage those from the groups intended? Were you able to recruit the number and type of
participants intended?
• Implementation strategy. Here, you’re determining both what you actually did in
implementing the program, and what participants actually did. Did you structure the
program as planned? Did you use the methods you intended to? Did you arrange the
amount and intensity of services, other activities, or conditions as intended? Did you
obtain and use the materials and equipment you expected to? Did relationships develop
• Evaluation strategy. Did you conduct the evaluation as planned? Did you gather data
• Timelines and benchmarks. Did you complete or start each of these elements in the
time you planned for? Did you complete key milestones or accomplishments as planned?
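One way to make the planned-versus-actual comparison concrete is to record a planned and an actual figure for each element and flag the gaps that are too large to call trivial. The sketch below is a hypothetical Python illustration: the element names, the numbers, and the 10 percent tolerance are assumptions, not figures from any real program.

```python
from dataclasses import dataclass


@dataclass
class ProcessElement:
    name: str        # what was planned, e.g. "participants recruited"
    planned: float   # the figure in the original plan
    actual: float    # the figure actually achieved


def shortfalls(elements, tolerance=0.10):
    """Return (name, gap) pairs for elements that fell short of plan.

    The gap is the shortfall as a fraction of the planned figure.
    Gaps at or below `tolerance` are treated as trivial (an assumed cutoff).
    """
    flagged = []
    for element in elements:
        if element.planned <= 0:
            continue  # nothing was planned, so there is nothing to compare
        gap = (element.planned - element.actual) / element.planned
        if gap > tolerance:
            flagged.append((element.name, round(gap, 2)))
    return flagged


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    plan = [
        ProcessElement("participants recruited", planned=50, actual=38),
        ProcessElement("staff trained", planned=6, actual=6),
        ProcessElement("sessions held", planned=24, actual=23),
    ]
    for name, gap in shortfalls(plan):
        print(f"{name}: {gap:.0%} below plan")
```

Numbers like these only show where the plan and the reality diverged. The questions in the list above are still needed to understand why.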
If all or most things went as planned, and any that didn’t were trivial, you’ve essentially
done what you set out to do. If they didn’t, there are a number of possible reasons for changes
• It took more time than you expected to complete one or more important tasks (finding
• It was harder than you expected to accomplish a particular task. This may be a matter of
time spent, but it may also mean that you simply didn’t have the skills or personnel to do
• Something you had good reason to expect didn’t happen (e.g., funding or support that
• Someone or some organization you depended on didn’t come through (e.g., a hired staff
member became ill and did not finish the work on time)
• Partway through, you found that the methods you had planned didn’t work well, and you
• A funder or community advisory board asked you to change some of what you were
doing
• Partway through, you became aware of a new method that seemed to be extremely
• You discovered a more successful way of doing things in the course of the work, and
adopted it
• You underestimated the resources necessary to carry out your original plan, and had to
• You encountered disaster (e.g., the site burned down, the program coordinator became
• You simply didn’t pay attention to following the plan, and/or didn’t do your job as an
organization
The deviation from your plan may have made very little difference at all, or it may have
made all the difference. Some differences might be positive – a delay might make it possible to
find a more stable funding source; a change in method might make for a more effective program
– but they’re still differences. It’s worthwhile to understand what changed to make sense of the
Perhaps you implemented your process according to plan, and your program ran as intended. By
contrast, the process may have been filled with difficulties – opposition, no community support,
difficulty recruiting participants, missed deadlines. Does that mean that all the work you put
It’s likely that the answer is the opposite. If you were able to carry off implementing a program
even though your plans were disrupted, it’s a good bet that the clear vision of what
Taking a close look at how you managed to overcome the obstacles in your way will help you
understand how to avoid them in the future. (Avoiding all obstacles is unusual in any
community work. The key is learning how to anticipate and overcome them.)
It’s also possible that the process leading up to the program went as planned, but the
implementation didn’t turn out as expected. In that case, it was probably your plan that was at
fault.
• You didn’t assess the situation or take some important aspects of preparation into
account
• You didn’t properly understand some aspect(s) of what you had to do to be successful
• You didn’t properly communicate some aspect(s) of what you had to do to staff,
• You underestimated the amount of money or other resources you would need
• You didn’t have proper fiscal control
• You ignored something important (treating participants with respect, for instance)
• You didn’t factor in enough time for some aspect(s) of what you had to do (i.e., you
planned for a given time period and carried that out, but it was too short)
• You didn’t provide some important support for participants (travel, child care, stipend),
Finding out why your plan didn’t produce the intervention you expected can be helpful.
Understanding what you need to plan for, and how to do it, can make your future work both
more efficient and more effective.
IMPACT
Your program or initiative’s impact is the effect it had on the environmental conditions, events, or
In most – but not all – cases, the immediate impact of the program is not the same as the eventual
intended results. Generally, a program aims only to influence one or more particular behaviors or
conditions – risk or protective factors. The assumption is that such influence will then lead to a
The intended impact of the Health Center’s exercise program, for example, is the adoption by
participants of regular exercise, a protective factor in reducing risk for chronic diseases. The goals
of the program, however, are actually better heart health, and, ultimately, a longer and
higher-quality life. Impact is the intermediate step – the influence you have on a behavior or other
Your process might have gone perfectly – you might have done exactly what you set out to do –
and might still have had no impact on the risk and protective factors you targeted. By the same
token, you may have ended up running a program markedly different from the one you planned,
and still have had the impact you hoped for. The results of the process evaluation will tell you
how closely you stuck to your plan in setting up and running your program. The results of your
impact evaluation will tell you whether your program produced the changes you intended.
In all these cases, evaluation should involve feedback from both participants who had good
results and those who didn’t. What worked particularly well for those who had success?
What were barriers to those for whom the program didn’t work well? It’s not always easy to
get participants to describe the positives and negatives – but it’s the best way to find out.
Your program worked as you planned if the behaviors and risk and/or protective factors changed
in the ways you intended. The big question that remains in this case is whether the changes your
program influenced led to the ultimate outcomes you were working toward. We’ll consider that
If your program actually had a negative impact on the targeted behaviors or risk and/or
protective factors – the intervention aimed to increase childhood immunizations, and fewer
children were immunized, for example – it is important to look more deeply into what is
happening.
Some possibilities:
• You underestimated or ignored cultural influences that were powerful enough that your
• You didn’t take into account cultural influences in participants’ lives that made it difficult
to achieve intended results - these factors could include poverty, or competing demands
• The cultural incompetence of the organization or some staff members worked against your
goals
• The structure and/or methods of the program led to unanticipated negative consequences
• The program was seen by participants as something that was being imposed on them, and
Just as you might find that your process went well and your program still didn’t influence the
risk and protective factors you meant to, it’s possible that you created exactly the changes you
intended in risk and protective factors, and the program still didn’t achieve the outcomes
OUTCOMES
The outcomes of an intervention are the changes that actually took place as a result of it. The goal
of an intervention is usually not just a change in behavior or circumstances, but the changes in
community health and development that occur as a result of that immediate change. A tobacco
control program, for instance, aims to help participants avoid or quit smoking: that’s its impact.
Its real goals – the hoped-for outcomes of the program – are reduced rates of heart disease, lung
cancer, and other smoking-related diseases for participants and their family members.
The ultimate outcomes may take years to assess, but others – like the blood pressure goals of the
Health Center exercise program, or the results of a job training course – can be determined at or
soon after the end of the intervention. Outcomes are the true measure of the success of the
intervention, because they are the reason it was conducted in the first place. However, the impact
made – such as changes in community programs or policies – can be an important intermediate
If the program produced the outcomes you intended, congratulations: you’ve achieved the goals
of your effort. This isn’t the time to consider your work complete, however. How can you make
• Can you expand or strengthen parts of the program that worked particularly well?
• Are there evidence-based methods or best practices out there that could make your work
• Would targeting more or different behaviors or risk and protective factors lead to greater
success?
• How can you reach people who dropped out early or who didn’t really benefit from your
work?
• How can you improve your outreach? Are there marginalized or other groups you’re not
reaching?
• Can you add services – either directly aimed at program outcomes or related services such
• Can you improve the efficiency of your process, saving time and/or money without
Good interventions are dynamic: they keep changing and experimenting, always reaching for
If the intervention produced only some, or some lower level, of the desired outcomes, you may be
headed in the right direction. The program may also have greater effects in the long run, as
participants incorporate the changes they’ve made into their everyday lives.
Some possible reasons for the program’s effect not being as great as planned:
• The program’s message didn’t reach participants or speak to them in a powerful way
• There were intervening factors – attendance or lack of support services – that made the
• The program didn’t approach participants in the right way – it was too formal, the
• There were conflicts among participants or between participants and staff
For example, let's say that the Health Center’s exercise program wasn’t by any means a failure,
but it was only modestly successful. How might the Health Center use its evaluation information
First, the Center could examine what participants said about the program. What enabled the
members of the most successful group to exercise? Why weren’t members of the much larger
group able to establish regular effective exercise routines? And for members of the third group –
those who didn’t exercise at all or dropped out quickly – what might have gotten them more
motivated?
Perhaps those in the first group attended all the sessions and found exercise partners who
challenged one another to do a little more (or to eat a little better). Perhaps those in the other
Based on the evaluation, the program’s designers decided that they should arrange for exercise
partners or groups for everyone. It seemed from the evaluation that both the social situation and
the challenge that exercising with others presented made exercise more likely and more fun, and
promoted a more vigorous workout. They also decided to develop a much more formal nutrition
component to the program, and to incorporate a buddy system into that component as well, in the
hope that participants could help one another develop recipes and stick to a reasonable eating
plan.
If the program produced no outcomes at all, you may have to make big changes.
It can be very difficult to admit that you’ve been taking the wrong direction, especially after
investing a lot of time and effort in planning and implementing a program. It’s tempting to
believe that if you just work harder, or recruit different participants, or use better materials,
you’ll get the results you want. It takes courage to conclude that the results call for a major redesign
of the effort.
The program may have produced unintended outcomes, either positive or negative. If they’re
positive, you might want to understand how they came about so that you can continue to produce
them. If they’re negative, you’ll probably want to learn more so you can seek to eliminate them.
Most of the reasons for unintended outcomes are similar to those for lack of outcomes.
A positive unintended outcome in a youth violence prevention program, for example, might be
better school performance; a negative example in the same program might be an increase in
school dropout. Teens in the program might improve their school performance because they
admire a staff member with college education, and want either to be like him or to impress him.
Or they may see college and an escape from the neighborhood as their best way out of the cycle
of violence.
Those who drop out of school as a result of the program may also do so because they see it as a
way to avoid violence: school – or the trip to and from school – may be especially dangerous
because of the presence of youth from other neighborhoods or rival gangs. Conversely, they may
see dropping out of school in favor of work as a non-violent road to financial success, as opposed
Given all this, how do you approach your evaluation research to decide what you need to refine
and how? A good general approach is to work backward from outcomes, asking “but why?” at each
step to find out why each previous phase failed to produce the results you wanted.
• Examine the outcomes. If your intervention achieved the intended outcomes, it has done
its job. Now you can consider how to maintain these effects or refine your program (see
above). You should still examine the results for process and impact, and make changes
where they’ll gain you greater effectiveness or efficiency. But chances are the program
doesn’t need major changes, unless you want to enlarge your goals, or unless you’ve
found an alternative approach that could lead to even more impressive outcomes.
• Examine the impact. If your evaluation research shows no outcomes, or outcomes that fall
short of what you intended, the next area to examine is the impact of your program on the
If the program had the impact you expected, but no outcomes, perhaps you’ve chosen the wrong
behaviors or factors to target, and need to rethink your problem analysis and related intervention.
There are other plausible explanations: your intervention wasn’t in place long enough, the effects
are delayed, your measures are insensitive to what is being achieved, etc.
• Examine the process. The next step here is to understand how well you planned, prepared
for, and implemented your intervention. If the reasoning and assumptions behind your
planning were accurate, and if you set up and implemented your program based on them,
you should have the impact you were aiming for, and that impact should lead to the
outcomes you intended. If your program didn’t go as planned, that could be a good part,
if not all, of the reason for your lack of outcomes. Your process evaluation can show you
where you need to adjust and improve your implementation to have a better chance to get
If your program did go as planned – you met your deadlines and did what you intended to do in
the way you intended to do it – and you failed to achieve your goals, there’s a good chance that
your planning was the problem. You may have aimed at the wrong risk and/or protective
factors, as mentioned above, or you may have chosen ineffective methods to influence the right
ones, or both. There are other possibilities that could be picked up by a process evaluation as
well, many of which have already been suggested – treatment of participants, language or other
communication issues, lack of cultural competence, etc. Identifying and correcting such
• Keep making adjustments. Make your adjustments and refinements, run and evaluate the
intervention, and make further adjustments and refinements to improve your work. This
IN SUMMARY
The purpose of an evaluation and the research that goes into it is not just to tell you whether or
not your intervention has been a success. The real value of evaluation research lies in its ability
to help you identify and correct problems – as well as to celebrate progress. Evaluation can
pinpoint the strengths of your program, and help you to protect and enhance those strengths and
By examining the three elements of an intervention – process, impact, and outcomes – your
evaluation can tell you whether you did what you had planned; whether what you did had the
influence you expected on the behaviors and factors you intended to influence; and whether the
changes in those factors led to the intended outcomes. That knowledge can show you what you
might change to improve your program, as well as the overall effectiveness of the intervention.
And, the information can be used to celebrate the accomplishments you are making along the
way.
ASSIGNMENT:
2. Using archival data has its own bottlenecks. Name five and explain how to overcome them.
3. Why is research an important component of monitoring and evaluation? Give and explain four reasons.