Program Evaluation Studies

TK Logan and David Royse

A variety of programs have been developed to address social problems such as drug addiction, homelessness, child abuse, domestic violence, illiteracy, and poverty. The goals of these programs may include directly addressing the problem origin or moderating the effects of these problems on individuals, families, and communities. Sometimes programs are developed to prevent something from happening such as drug use, sexual assault, or crime. These kinds of problems and programs to help people are often what attracts many social workers to the profession; we want to be part of the mechanism through which society provides assistance to those most in need. Despite low wages, bureaucratic red tape, and routinely uncooperative clients, we tirelessly provide services that are invaluable but also at various times may be or become insufficient or inappropriate. But without conducting evaluation, we do not know whether our programs are helping or hurting, that is, whether they only postpone the hunt for real solutions or truly construct new futures for our clients. This chapter provides an overview of program evaluation in general and outlines the primary considerations in designing program evaluations.

Evaluation can be done informally or formally. We are constantly, as consumers, informally evaluating products, services, and information. For example, we may choose not to return to a store or an agency again if we did not evaluate the experience as pleasant. Similarly, we may mentally take note of unsolicited comments or anecdotes from clients and draw conclusions about a program. Anecdotal and informal approaches such as these generally are not regarded as carrying scientific credibility. One reason is that decision biases play a role in our “informal” evaluation. Specifically, vivid memories or strongly negative or positive anecdotes will be overrepresented in our summaries of how things are evaluated. This is why objective data are necessary to truly understand what is or is not working.

By contrast, formal evaluations systematically examine data from and about programs and their outcomes so that better decisions can be made about the interventions designed to address the related social problem. Thus, program evaluation involves the use of social research methodologies to appraise and improve the ways in which human services, policies, and programs are conducted. Formal evaluation, by its very nature, is applied research.

Formal program evaluations attempt to answer the following general question: Does the program work? Program evaluation may also address questions such as the following: Do our clients get better? How does our success rate compare to those of other programs or agencies? Can the same level of success be obtained through less expensive means? What is the experience of the typical client? Should this program be terminated and its funds applied elsewhere?

Ideally, a thorough program evaluation would address more complex questions in three main areas: (1) Does the program produce the intended outcomes and avoid unintended negative outcomes? (2) For whom does the program work best and under what conditions? and (3) How well was a program model developed in one setting adapted to another setting?

Evaluation has taken an especially prominent role in practice today because of the focus on evidence-based practice in social programs. Social work, as a profession, has been asked to use evidence-based practice as an ethical obligation (Kessler, Gira, & Poertner, 2005). Evidence-based practice is defined differently, but most definitions include using program evaluation data to help determine best practices in whatever area of social programming is being considered. In other words, evidence-based practice includes using objective indicators of success in addition to practice or more subjective indicators of success.

Formal program evaluations can be found on just about every topic. For instance, Fraser, Nelson, and Rivard (1997) have examined the effectiveness of family preservation services; Kirby, Korpi, Adivi, and Weissman (1997) have evaluated an AIDS and pregnancy prevention middle school program. Morrow-Howell, Becker-Kemppainen, and Judy (1998) evaluated an intervention designed to reduce the risk of suicide in elderly adult clients of a crisis hotline. Richter, Snider, and Gorey (1997) used a quasi-experimental design to study the effects of a group work intervention on female survivors of childhood sexual abuse. Leukefeld and colleagues (1998) examined the effects of an HIV prevention intervention with injecting drug and crack users. Logan and colleagues (2004) examined the effects of a drug court intervention as well as the costs of drug court compared with the economic benefits of the drug court program.

Basic Evaluation Considerations

Before beginning a program evaluation, several issues must be considered. These issues are decisions that are critical in determining the evaluation methodology and goals. Although you may not have complete answers to these questions when beginning to plan an evaluation, these questions help in developing the plan and must be answered before an evaluation can be carried out. We can sum up these considerations with the following questions: who, what, where, when, and why.

First, who will do the evaluation? This seems like a simple question at first glance. However, this particular consideration has major implications for the evaluation results. Program evaluators can be categorized as being either internal or external. An internal evaluator is someone who is a program staff member or regular agency employee, whereas an external evaluator is a professional, on contract, hired for the specific purpose of evaluation. There are advantages and disadvantages to using either type of evaluator. For example, the internal evaluator probably will be very familiar with the staff and the program. This may save a lot of planning time. The disadvantage is that evaluations completed by an internal evaluator may be considered less valid by outside agencies, including the funding source. The external evaluator generally is thought to be less biased in terms of evaluation outcomes because he or she has no personal investment in the program. One disadvantage is that an external evaluator frequently is viewed as an “outsider” by the staff within an agency. This may affect the amount of time necessary to conduct the evaluation or cause problems in the overall evaluation if agency staff are reluctant to cooperate.


Second, what resources are available to conduct the evaluation? Hiring an outside evaluator can be expensive, while having a staff person conduct the evaluation may be less expensive. So, in a sense, you may be trading credibility for less cost. In fact, each methodological decision will have a trade-off in credibility, level of information, and resources (including time and money). Also, the amount and level of information as well as the research design will be determined, to some extent, by what resources are available. A comprehensive and rigorous evaluation does take significant resources.

Third, where will the information come from? If an evaluation can be done using existing data, the cost will be lower than if data must be collected from numerous people such as clients and/or staff across multiple sites. So having some sense of where the data will come from is important.

Fourth, when is the evaluation information needed? In other words, what is the timeframe for the evaluation? The timeframe will affect costs and design of research methods.

Fifth, why is the evaluation being conducted? Is the evaluation being conducted at the request of the funding source? Is it being conducted to improve services? Is it being conducted to document the cost-benefit trade-off of the program? If future program funding decisions will depend on the results of the evaluation, then a lot more importance will be attached to it than if a new manager simply wants to know whether clients were satisfied with services. The more that is riding on an evaluation, the more attention will be given to the methodology and the more threatened staff can be, especially if they think that the purpose of the evaluation is to downsize and trim excess employees. In other words, there are many reasons an evaluation is being considered, and these reasons may have implications for the evaluation methodology and implementation.

Once the issues described above have been considered, more complex questions and trade-offs must be addressed in planning the evaluation. Specifically, six main issues guide and shape the design of any program evaluation effort and must be given thoughtful and deliberate consideration.

1. Defining the goal of the program evaluation

2. Understanding the level of information needed for the program evaluation

3. Determining the methods and analysis that need to be used for the program evaluation

4. Considering issues that might arise and strategies to keep the evaluation on course

5. Developing results into a useful format for the program stakeholders

6. Providing practical and useful feedback about the program strengths and weaknesses as well as providing information about next steps

Defining the Goal of the Program Evaluation

It is essential that the evaluator has a firm understanding of the short- and long-term objectives of the evaluation. Imagine being hired for a position but not being given a job description or informed about how the job fits into the overall organization. Without knowing why an evaluation is called for or needed, the evaluator might attempt to answer a different set of questions from those of interest to the agency director or advisory board. The management might want to know why the majority of clients do not return after one or two visits, whereas the evaluator might think that his or her task is to determine whether clients who received group therapy sessions were better off than clients who received individual counseling.

In defining the goals of the program evaluation, several steps should be taken. First, the program goals should be examined. These can be learned through examining official program documents as well as through talking to key program stakeholders. In clarifying the overall purpose of the evaluation, it is critical to talk with different program “stakeholders.” Scriven (1991) defines a program stakeholder as “one who has a substantial ego, credibility, power, futures, or other capital invested in the program . . . . This includes program staff and many who are not actively involved in the day-to-day operations” (p. 334). Stakeholders include both supporters and opponents of the program as well as program clients or consumers or even potential consumers or clients. It is essential that the evaluator obtain a variety of different views about the program. By listening and considering stakeholder perspectives, the evaluator can ascertain the most important aspects of the program to target for the evaluation by looking for overlapping concerns, questions, and comments from the various stakeholders. However, it is important that the stakeholders have some agreement on what program success means. Otherwise, it may be difficult to conduct a satisfactory evaluation.

It is also important to consult the extant literature to understand what similar programs have used to evaluate their outcomes as well as to understand the theoretical basis of the program in defining the program evaluation goals. Furthermore, it is critical that the evaluator works closely with whoever initiated the evaluation to set priorities for the evaluation. This process should identify the intended outcomes of the program and which of those outcomes, if not all of them, will be evaluated. Taking the evaluation a step further, it may be important to include the examination of unintended negative outcomes that may result from the program. Stakeholders and the literature will also help to determine those kinds of outcomes.

Once the overall purpose and priorities of the evaluation are established, it is a good idea to develop a written agreement, especially if the evaluator is an external one. Misunderstandings can and will occur months later if things are not written in black and white.

Understanding the Level of Information Needed for the Program Evaluation

The success of the program evaluation revolves around the evaluator’s ability to develop practical, researchable questions. A good rule to follow is to focus the evaluation on one or two key questions. Too many questions can lengthen the process and overwhelm the evaluator with too much data that, instead of facilitating a decision, might produce inconsistent findings. Sometimes, funding sources require only that some vague undefined type of evaluation is conducted. The funding sources might neither expect nor desire dissertation-quality research; they simply might expect “good faith” efforts when beginning evaluation processes. Other agencies may be quite demanding in the types and forms of data to be provided. Obviously, the choice of methodology, data collection procedures, and reporting formats will be strongly affected by the purpose, objectives, and questions examined in the study.

It is important to note the difference between general research and evaluation. In research, the investigator often focuses on questions based on theoretical considerations or hypotheses generated to build on research in a specific area of study. Although program evaluations may focus on an intervention derived from a theory, the evaluation questions should, first and foremost, be driven by the program’s objectives. The evaluator is less concerned with building on prior literature or contributing to the development of practice theory than with determining whether a program worked in a specific community or location.

There are actually two main types of evaluation questions. There are questions that focus on client outcomes, such as, “What impact did the program have?” These kinds of questions are addressed by using outcome evaluation methods. Then there are questions that ask, “Did the program achieve its goals?” “Did the program adhere to the specified procedures or standards?” or “What was learned in operating this program?” These kinds of questions are addressed by using process evaluation methods. We will examine both of these types of evaluation approaches in the following sections.

Process Evaluation

Process evaluations offer a “snapshot” of the program at any given time. Process evaluations typically describe the day-to-day program efforts; program modifications and changes; outside events that influenced the program; people and institutions involved; culture, customs, and traditions that evolved; and sociodemographic makeup of the clientele (Scarpitti, Inciardi, & Pottieger, 1993). Process evaluation is concerned with identifying program strengths and weaknesses. This level of program evaluation can be useful in several ways, including providing a context within which to interpret program outcomes and so that other agencies or localities wishing to start similar programs can benefit without having to make the same mistakes.

As an example, Bentelspacher, DeSilva, Goh, and LaRowe (1996) conducted a process evaluation of the cultural compatibility of psychoeducational family group treatment with ethnic Asian clients. As another example, Logan, Williams, Leukefeld, and Minton (2000) conducted a detailed process evaluation of the drug court programs before undertaking an outcome evaluation of the same programs. The Logan et al. study used multiple methods to conduct the process evaluation, including in-depth interviews with the program administrative personnel, interviews with each of five judges involved in the program, surveys and face-to-face interviews with 22 randomly selected current clients, and surveys of all program staff, 19 community treatment provider representatives, 6 randomly selected defense attorney representatives, 4 prosecuting attorney representatives, 1 representative from the probation and parole office, 1 representative from the local county jail, and 2 police department representatives. In all, 69 different individuals representing 10 different agency perspectives provided information about the drug court program. Also, all agency documents were examined and analyzed, observations of various aspects of the program process were conducted, and client intake data were analyzed as part of the process evaluation. The results were all integrated and compiled into one comprehensive report.

What makes a process evaluation so important is that researchers often have relied only on selected program outcome indicators such as termination and graduation rates or number of rearrests to determine effectiveness. However, to better understand how and why a program such as drug court is effective, an analysis of how the program was conceptualized, implemented, and revised is needed. Consider this example: say one outcome evaluation of a drug court program showed a graduation rate of 80% of those who began the program, while another outcome evaluation found that only 40% of those who began the program graduated. Then suppose the graduates of the second program were more likely to be free from substance use and criminal behaviors at the 12-month follow-up than the graduates from the first program. A process evaluation could help to explain the specific differences in factors such as selection (how clients get into the programs), treatment plans, monitoring, program length, and other program features that may influence how many people graduate and stay free from drugs and criminal behavior at follow-up. In other words, a process evaluation, in contrast to an examination of program outcome only, can provide a clearer and more comprehensive picture of how drug court affects those involved in the program. More specifically, a process evaluation can provide information about program aspects that need to be improved and those that work well (Scarpitti, Inciardi, & Pottieger, 1993). Finally, a process evaluation may help to facilitate replication of the drug court program in other areas. This often is referred to as technology transfer.

A different but related process evaluation goal might be a description of the failures and departures from the way in which the intervention originally was designed. How were the staff trained and hired? Did the intervention depart from the treatment manual recommendations? Influences that shape and affect the intervention that clients receive need to be identified because they affect the fidelity of the treatment program (e.g., delayed funding or staff hires, changes in policies or procedures). When program implementation deviates significantly from what was intended, this might be the logical explanation as to why a program is not working.

Outcome or Impact Evaluation

Outcome or impact evaluation focuses on the targeted objectives of the program, often looking at variables such as behavior change. For example, many drug treatment programs may measure outcomes or “success” by the number of clients who abstain from drug use. Questions always arise, though. For instance, an evaluation might reveal that 90% of those who graduate from the program abstain from drug use 30 days after the program was completed. However, only 50% report abstaining from drug use 12 months after the program was completed. Would key stakeholders involved all consider that a success or failure of the program? This example brings up three critical issues in outcome evaluations.

One of the critical issues in outcome evaluations is related to understanding for whom the program works best and under what conditions. In other words, a more interesting and important question, rather than just asking whether a program works, would be to ask, “Who are those 50% of people who remained abstinent from drug use 12 months after completing the program, and how do they differ from the 50% who relapsed?” It is not unusual for some evaluation questions to need a combination of both process and impact evaluation methodologies. For example, if it turned out that results of a particular evaluation showed that the program was not effective (impact), then it might be useful to know why it was not effective (process). In such cases, it would be important to know how the program was implemented, what changes were made in the program during the implementation, what problems were experienced during the implementation, and what was done to overcome those problems.

Another important issue in outcome evaluation has to do with the timing of measuring the outcomes. Outcome effects are usually measured after treatment or postintervention. These effects may be either short term or long term. Immediate outcomes, or those generally measured at the end of the treatment or intervention, might or might not provide the same results as one would get later in a 6- or 12-month follow-up, as highlighted in the example above.

The third important issue in outcome evaluation has to do with what specific measures were used. Is abstinence, for example, the only measure of interest, or is reduction in use something that might be of interest? Refraining from criminal activity or holding a steady job may also be an important goal of a substance abuse program. If we only measure abstinence, we would never know about other kinds of outcomes the program may affect.

These last two issues in outcome evaluations have to do with the evaluation methodology and analysis and are addressed in more detail below.

Determining the Methods and Analysis That Need to Be Used for the Program Evaluation

The next step in the evaluation process is to determine the evaluation design. There are several interrelated steps in this process, including determining the (a) sources of data, (b) research design, (c) measures, (d) analysis of change, and (e) cost-benefit assessment of the program.

Sources of Data

Several main sources of data can be used for evaluations, including qualitative information and quantitative information.

Qualitative Data Sources

Qualitative data sources are often used in process evaluations and might include observations, analysis of existing program documents such as policy and procedure manuals, in-depth interview data, or focus group data. There are, however, trade-offs when using qualitative data sources. On the positive side, qualitative evaluation data provide an “in-depth” snapshot of various topics such as how the program functions, what staff think are the positive or negative aspects of the programs, or what clients really think of the overall program experiences. Reporting clients’ experiences in their own words is a characteristic of qualitative evaluations.

Interviews are good for collecting qualitative or sensitive data such as values and attitudes. This method requires an interview protocol or questionnaire. These usually are structured so that respondents are asked questions in a specific order, but they can be semistructured so that there are fewer topics, and the interviewer has the ability to change the order based on a “reading” of the client’s responses. Surveys can request information of clients by mail, by telephone, or in person. They may or may not be self-administered. So, besides considering what data are desired, evaluators must be concerned with pragmatic considerations regarding the best way in which to collect the desired data.

Focus groups also offer insight into certain aspects of the program or program functioning; participants add their input, and input is interpreted and discussed by other group members. This discussion component may provide an opportunity to uncover information that might otherwise remain undiscovered such as the meaning of certain things to different people. Focus groups typically are small informal groups of persons asked a series of questions that start out very general and then become more specific. Focus groups are increasingly being used to provide evaluative information about human services. They work particularly well in identifying the questions that might be important to ask in a survey, in testing planned procedures or the phrasing of items for the specific target population, and in exploring possible reactions to an intervention or a service.


On the other hand, qualitative studies tend to use small samples, and care must be used in analyzing and interpreting the information. Furthermore, although both qualitative and quantitative data are subject to method bias and threats to validity, qualitative data may be more sensitive to bias depending on how participants are selected to be interviewed, the number of observations or focus groups, and even subtleties in the questions asked. With qualitative approaches, the evaluator often has less ability to account for alternative explanations because the data are more limited. Making strong conclusions about representativeness, validity, and reliability is more difficult with qualitative data compared to something like an average rating of satisfaction across respondents (a quantitative measure). Yet, an average rating does not tell us much about why participants are satisfied with the program or why they may be dissatisfied with other aspects of the program. Thus, it is often imperative to use a mixture of qualitative and quantitative information to evaluate a program.

Quantitative Data Sources

Two main types of quantitative data sources can be used for program evaluations: secondary data and original data.

Secondary Data. One option for obtaining needed data is to use existing data. Collecting new data often is more expensive than using existing data. Examining the data on hand and already available always is a good first step. However, the evaluator might want to rearrange or reassemble the data, for example, dividing it by quarters or combining it into 12-month periods that help to reveal patterns and trends over time. Existing data can come from a variety of places, including the following:

Client records maintained by the program: These may include a host of demographic and service-related data items about the population served.

Program expense and financial data: These can help the evaluator to determine whether one intervention is much more expensive than another.

Agency annual reports: These can be used to identify trends in service delivery and program costs. The evaluator can compare annual reports from year to year and can develop graphs to easily identify trends with clientele and programs.

Databases maintained by the state health department and other state agencies. Public data such as births, deaths, and divorces are available from each state. Furthermore, most state agencies produce annual reports that may reveal the number of clients served by program, geographic region, and on occasion, selected sociodemographic variables (e.g., race or age).

Local and regional agencies. Planning boards for mental health services, child protection, school boards, and so forth may be able to furnish statistics on outpatient and inpatient services, special school populations, or child abuse cases.

The federal government. The federal government collects and maintains a large amount of data on many different issues and topics. State and national data provide benchmarks for comparing local demographic or social indicators to national-level demographic or social indicators. For instance, if you were working as a cancer educator whose objective is to reduce the incidence of breast cancer, you might want to consult . That Web site will furnish national-, state-, and county-level data on the number of new cancer cases and deaths. By comparison, it will be possible to determine if the rate in one county is higher than the state or national average. Demographic information about communities can be found at

Foundations. Certain well-established foundations provide a wealth of information about problems. For example, the Annie E. Casey Foundation provides an incredible Kids Count Data Book that provides an abundance of child welfare-related data at the state, national, and county level. By using their data, you could determine if infant mortality rates were rising, teen births were increasing, or high school dropouts were decreasing. You can find the Web site at .
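
As a rough illustration of the earlier suggestion to rearrange existing data by quarters, the sketch below regroups a list of client intake dates into calendar quarters so that trends over time become easier to see. The intake dates and the function names (quarter_label, count_by_quarter) are hypothetical and invented only for this example; real agency records would need their own cleaning and privacy safeguards first.

```python
from collections import Counter
from datetime import date


def quarter_label(d: date) -> str:
    """Return a label such as '2004-Q1' for the calendar quarter containing d."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"


def count_by_quarter(intake_dates):
    """Count records per calendar quarter, returned in chronological order."""
    counts = Counter(quarter_label(d) for d in intake_dates)
    # Labels sort chronologically because the four-digit year comes first.
    return dict(sorted(counts.items()))


# Hypothetical intake dates, invented purely for illustration.
intake_dates = [
    date(2004, 1, 15), date(2004, 2, 3), date(2004, 5, 20),
    date(2004, 8, 9), date(2004, 9, 30), date(2004, 11, 2),
]
quarterly_counts = count_by_quarter(intake_dates)
```

The same grouping idea could be applied to 12-month periods by labeling each record with its year instead of its quarter.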

If existing data cannot be used or cannot answer all of the evaluation questions, then original data must be collected.

Original Data Sources. There are many types of evaluation designs from which to choose, and no single one will be ideal for every project. The specific approach chosen for the evaluation will depend on the purpose of the evaluation, the research questions to be explored, the hoped-for or intended results, the quality and volume of data available or needed, and staff, time, and financial resources.

The evaluation design is a critical decision for a number of reasons. Without the appropriate evaluation design, confidence in the results of the evaluation might be lacking. A strong evaluation design minimizes alternative explanations and assists the evaluator in gauging the true effects attributable to the intervention. In other words, the evaluation design directly affects the interpretation that can be made regarding whether an intervention should be viewed as the reason for change in clients’ behavior. However, there are trade-offs with each design in the credibility of information, causality of any observed changes, and resources. These trade-offs must be carefully considered and discussed with the program staff.

Quantitative designs include surveys, pretest-posttest studies, quasi-experiments with nonequivalent control groups, longitudinal designs, and randomized experimental designs. Quantitative approaches transform answers to specific questions into numerical data. Outcome and impact evaluations nearly always are based on quantitative evaluation designs. Also, sampling strategies must be considered as an integral part of the research design. Below is a brief overview of the major types of quantitative evaluation designs. For an expanded discussion of these topics, refer to Royse, Thyer, Padgett, and Logan (2005).

Research Design

Cross-Sectional Surveys

A survey is limited to a description of a sample at one point in time and provides us with a “snapshot” of a group of respondents and what they were like or what knowledge or attitudes they held at a particular point in time. If the survey is to generate good generalizable data, then the sampling procedures must be carefully planned and implemented. A cross-sectional survey requires rigorous random sampling procedures to ensure that the sample closely represents the population of interest. A repeated survey is similar to a cross-sectional study but collects information at two or more points in time from the same respondents. A repeated (longitudinal) survey is effective at measuring changes in facts, attitudes, or opinions over a course of time.
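
One simple way to carry out the random sampling that a cross-sectional survey calls for is a simple random sample drawn without replacement from a complete list of clients (the sampling frame). The sketch below is only one possible approach; the function name draw_simple_random_sample, the frame of 500 client ID numbers, and the sample size of 50 are all assumptions made for illustration, and other sampling strategies (e.g., stratified sampling) may suit a given evaluation better.

```python
import random


def draw_simple_random_sample(client_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement from a sampling frame."""
    frame = list(client_ids)
    if sample_size > len(frame):
        raise ValueError("sample_size cannot exceed the size of the sampling frame")
    # A fixed seed makes the draw reproducible, so it can be documented and audited.
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame: ID numbers for 500 clients.
sampling_frame = range(1, 501)
survey_sample = draw_simple_random_sample(sampling_frame, 50, seed=2024)
```

Recording the seed alongside the sampling frame lets another evaluator reproduce exactly which clients were selected.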


Pretest-Posttest Designs (Nonexperimental)

Perhaps the most common quantitative evaluation design used in social and human service agencies is the pretest-posttest. In this design, a group of clients with some specific problem or diagnosis (e.g., depression) is administered a pretest prior to the start of intervention. At some point toward the end or after the intervention, the same instrument is administered to the group a second time (the posttest). The one-group pretest-posttest design can measure change, but the evaluator has no basis for attributing change solely to the program. Confidence about change increases and the design strengthens when control groups are added and when participants are randomly assigned to either a control or experimental condition.
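
To make the idea of measuring change in a one-group pretest-posttest design concrete, the following sketch computes the average change in scores and a paired t statistic for that change. The choice of a paired t statistic is an assumption for illustration, not something the design itself prescribes, and the scores and the function name paired_change are hypothetical. As noted above, even a large change here cannot be attributed solely to the program.

```python
import math
import statistics


def paired_change(pretest, posttest):
    """Return (mean change, paired t statistic) for matched pre/post scores."""
    if len(pretest) != len(posttest) or len(pretest) < 2:
        raise ValueError("need matched scores for at least two clients")
    # Change score for each client: posttest minus pretest.
    diffs = [post - pre for pre, post in zip(pretest, posttest)]
    mean_change = statistics.mean(diffs)
    sd_change = statistics.stdev(diffs)  # sample standard deviation
    if sd_change == 0:
        raise ValueError("all change scores are identical; t is undefined")
    t_stat = mean_change / (sd_change / math.sqrt(len(diffs)))
    return mean_change, t_stat


# Hypothetical depression scores for five clients (lower is better).
pretest_scores = [20, 24, 18, 30, 26]
posttest_scores = [15, 20, 17, 22, 21]
mean_change, t_stat = paired_change(pretest_scores, posttest_scores)
```

A negative mean change in this example would indicate that scores fell between pretest and posttest; interpreting the t statistic requires comparing it with a t distribution on n - 1 degrees of freedom.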

Quasi-Experimental Designs

Also known as nonequivalent control group designs, quasi-experiments generally use comparison groups whereby two similar groups are selected and followed for a period of time. One group typically receives some program or benefit, whereas the other group (the control) does not. Both groups are measured and compared for any differences at the end of some time period. Participants used as controls may be clients who are on a waiting list, those who are enrolled in another treatment program, or those who live in a different city or county. The problem with this design is that the control or comparison group might not, in fact, be equivalent to the group receiving the intervention. Comparing Ocean View School to Inner City School might not be a fair comparison. Even two different schools within the same rural county might be more different than similar in terms of the learning milieu, the proportion of students receiving free lunches, the number of computers and books in the school library, the principal’s hiring practices, and the like. With this design, there always is the possibility that whatever the results, they might have been obtained because the intervention group really was different from the control group. However, many of these issues can be addressed, either by collecting the relevant information and controlling for it in the statistical analysis or, at the least, by taking it into account when interpreting the results. Even so, this type of study does not provide proof of cause and effect, and the evaluator always must consider other factors (both known and measured and unknown or unmeasured) that could have affected the study’s outcomes.

Longitudinal Designs

Longitudinal designs are a type of quasi-experimental design that involves tracking a particular group of individuals over a substantial period of time to discover potential changes due to the influence of a program. It is not uncommon for evaluators to want to know about the effects of a program after an extended period of time has passed. The question of interest is whether treatment effects last. These studies typically are complicated and expensive in time and resources. In addition, the longer a study runs, the higher the expected rate of attrition from clients who drop out or move away. High rates of attrition can bias the sample.

Randomized Experimental Designs

In a true experimental design, participants are randomly assigned to either the control or treatment group. This design provides a persuasive argument about causal effects of a program on participants. The random assignment of respondents to treatment and control groups helps to ensure both groups are equivalent across key variables such as age, race, area of residency, and treatment history. This design provides the best evidence that any observed differences between the two groups after the intervention can be attributed to the intervention, assuming the two groups were equal before the intervention. Even with random assignment, group differences preintervention could exist, and the evaluator should carefully look for them and use statistical controls when necessary.
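A minimal sketch of simple random assignment follows, assuming each participant has a unique identifier and that two groups of roughly equal size are wanted. The function name, the ID range, and the seed are illustrative assumptions, not part of any particular evaluation protocol.

```python
import random


def randomly_assign(participant_ids, seed=None):
    """Shuffle participants and split them into treatment and control groups.

    A seed makes the assignment reproducible so it can be audited later.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    # With an odd number of participants, the control group gets the extra one.
    return {"treatment": shuffled[:midpoint], "control": shuffled[midpoint:]}


# Twenty hypothetical client IDs, numbered 1 through 20.
groups = randomly_assign(range(1, 21), seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone can still produce unequal groups, the resulting groups would normally be compared on key baseline variables before outcomes are analyzed.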

One word of warning about random assignment is that key program stakeholders often view random assignment as unethical, especially if they view the treatment program as beneficial. One outcome of this difficulty of accepting random assignment is that staff might have problems not giving the intervention they believe is effective to specific needy clients or to all of their clients instead of just to those who were randomly assigned. If they do succumb to this temptation, then the evaluation effort can be unintentionally sabotaged. The evaluator must train and prepare all of those individuals involved in the evaluation to help them understand the purpose and importance of the random assignment. Random assignment, more than any other procedure, provides the evidence that the treatment really does benefit the clients.

Sampling Strategies and Considerations

When the client population of interest is too large to obtain information from each individual member, a sample is drawn. Sampling allows the evaluator to make predictions about a population based on study findings from a set of cases. Sampling strategies can be very complex. If the evaluator needs the type of precision afforded by a probability sample in which there is a known level of confidence and margin of error (e.g., 95% confidence, plus or minus 3 percentage points), then he or she might need to hire a sampling consultant. A consultant is particularly recommended when the decisions about the program or intervention are critical such as in drug research or when treatments could have potentially harmful side effects. However, there is a need to recognize the trade-offs that are made when determining sampling strategy and sample size. Large samples can be more accurate than smaller ones, yet they usually are much more expensive. Small samples can be acceptable if a big change or effect is expected. As a rule, the more critical the decision, the larger (and more precise) the sample should be.
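To give a sense of what a target such as 95% confidence, plus or minus 3 percentage points, implies, the sketch below applies the standard sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². It assumes simple random sampling from a large population, the most conservative proportion (p = 0.5), and z = 1.96 for 95% confidence. The function name is hypothetical, and real designs (clustering, stratification, expected nonresponse) would change the answer, which is one reason a sampling consultant may be needed.

```python
import math


def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Minimum simple-random-sample size for estimating a proportion.

    margin_of_error is expressed as a proportion (0.03 = 3 percentage points).
    No finite-population correction is applied.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


n_three_points = sample_size_for_proportion(0.03)
n_five_points = sample_size_for_proportion(0.05)
print(n_three_points, n_five_points)
```

Under these assumptions, tightening the margin of error from 5 to 3 percentage points nearly triples the required sample, which illustrates the cost trade-off described above.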

There are two main categories of sampling strategies from which the evaluator can choose: probability sampling and nonprobability sampling. Probability sampling imposes statistical rules to ensure that unbiased samples are drawn. These samples normally are used for impact studies. Nonprobability or convenience sampling is less complicated to implement and is less expensive. This type of sampling often is used in process evaluations.

With probability sampling, the primary idea is that every individual, object, or institution in the population under study has a chance of being selected into the sample, and the likelihood of the selection of any individual is known. Probability sampling provides a firm basis for generalizing from the sample to the population. Nonprobability samples severely reduce the evaluator’s ability to generalize the results of the study to the larger population.

The evaluator must balance the need for scientific rigor against convenience and often limited resources when determining sample size. If a major decision is being based on data collected, then precision and certainty are critical. Statistical precision increases as the sample size increases. When differences in the results are expected to be small, a larger sample guards against confounding variables that might distort the results of a treatment.

Measures

The next important method decision is to determine how best to measure the variables of interest needed to answer the evaluation questions. These will vary from evaluation to evaluation, depending on the questions being asked. In one project, the focus might be on the outcome variable of arrests (or rearrests) so as to determine whether the program reduced criminal justice involvement. In another project, the outcome variable might be number of hospitalizations or days of hospitalization.

Once there is agreement on the outcome variables, objective measures for those variables must be determined. Using the example of the drug court program above, the decisions might include the following: How will abstinence be measured? How will reduction in substance use be measured? How will criminal behavior be measured? How will employment be measured? This may seem simple at first glance, but there are two complicating factors. First, there are a variety of ways to measure something as simple as abstinence. One could measure it by self-report or by actually giving the client a drug test. When looking at reduction of use, the issue of measurement becomes a bit more complicated. This will likely need to be self-report and some kind of comparison (either the same measures must be used with the same clients before and after the program [this being the best way] or the same measure must be used with a control group of some kind [like program dropouts]).

The second complicating factor in measurement is determining what other constructs need to be included to better understand “who benefits from the program the most and under what circumstances” and how those constructs are measured. Again, using the drug court program as an example, perhaps those clients who are most depressed, have the most health problems, or have the most anxiety do worse in drug court programs because the program may not address co-occurring disorders. If this is the case, then it will be important to include measures of depression, anxiety, and health. However, there are many different measures for each of these constructs, and different measures use different timeframes as points of reference. In other words, some depression measures ask about 12-month periods, some ask about 2-week periods, and some ask about 30-day periods.

Is one instrument or scale better than another for measuring depression? What are the trade-offs relative to shorter or longer instruments? (For example, the most valid instrument might be so long that clients will get fatigued and refuse to complete it.) Is it better to measure a reduction in symptoms associated with a standardized test or to employ a behavioral measure (e.g., counting the number of days that patients with chronic mental illness are compliant with taking their medications)? Is measuring attitudes about drug abuse better than measuring knowledge about the symptoms of drug addiction? Evaluators frequently have to struggle with decisions such as these and decide whether it is better to use instruments that are not “perfect” or to go to the trouble of developing and validating new ones.

When no suitable instrument or available data exist for the evaluation, the evaluator might have to create a new scale or at least modify an existing one. If an evaluator revises a previously developed measure, then he or she has the burden of demonstrating that the newly adapted instrument is reliable and valid. Then, there are issues such as the reliability of data obtained from clients. Will clients be honest in reporting actual drug and alcohol use? How accurate are their memories?

A note must be made here about a special case of program evaluation: evaluating prevention programs. Evaluation of prevention programs is especially challenging because the typical goal of a prevention program is to prevent a particular problem or behavior from developing. The question then becomes, “How do you measure something that never occurs?” In other words, if the prevention program is successful, the problem will not develop, but it is difficult to determine with any certainty that the problem would have developed in the first place absent the prevention program. Thus, measures become very important as well as the design (such as including a control group).

Evaluators use a multitude of methods and instruments to collect data for their studies. A good strategy is to include multiple measures and methods if possible, especially when random assignment is not possible. That way, one can possibly look for convergence of conclusions across methods and measures.

Analysis of Change

After the data are collected, the evaluator is faced with a sometimes difficult question of how to determine whether change had occurred. And, of course, there are several considerations within this overall decision as well. One of the first issues to be decided is what the unit of analysis will be.

The unit of analysis refers to the persons or things being studied or measured in the evaluation of a program. Typically, the basic unit of analysis consists of individual clients but also may be groups, agencies, communities, schools, or even states. For example, an evaluator might examine the effectiveness of a drug prevention program by looking for a decrease in drug-related suspensions or disciplinary actions in high schools in which the program was implemented. In that instance, schools are the primary unit of analysis. Another evaluator might be concerned only with the attitudes toward drugs and alcohol of students in one middle school; in that situation, individuals would be the unit of analysis. The smallest unit of analysis from which data are gathered often is referred to as a case. The unit of analysis is critical for determining both the sampling strategy and the data analysis.

The analysis will also be determined by the research design such as the number of groups to be analyzed, the type of dependent variable (categorical vs. continuous), the control variables that need to be included, and whether the design is longitudinal. The literature on similar program evaluations is also useful to examine so that analysis plans can consider what has been done in the past. The analysis phase of the evaluation is basically the end product of the evaluation activities. Therefore, a careful analysis is critical to the evaluation, the interpretation of the results, and the credibility of the results. Analysis should be conducted by somebody with adequate experience in statistical methods and statistical assumptions, and limitations of the study should be carefully examined and explained to program stakeholders.

Cost-Benefit Analysis

While assessing program outcomes is obviously necessary to gauge the effectiveness of a program, a more comprehensive understanding of program “success” can be attained by examining program costs and economic benefits. In general, economic costs and benefits associated with specific programs have received relatively limited attention. One of the major challenges in estimating costs of some community-based social programs is that standard cost estimation procedures do not always reflect the true costs of the program. For example, a drug court program often combines both criminal justice supervision and substance abuse treatment in a community-based environment. And in order for drug court programs to work effectively, they often use many community and outside agency resources that are not necessarily directly paid for by the program. For example, although the drug court program may not directly pay for the jail time incurred as part of client sanctions, jail time is a central component in many drug court programs. Thus, jail costs must be considered a drug court program cost.

A comprehensive economic cost analysis would include estimates of the value of all resources used in providing the program. When resources are donated or subsidized, the out-of-pocket cost will differ from the opportunity cost of the resources for a given program. Opportunity costs take into account the forgone value of an alternative use for program resources. Other examples of opportunity costs for the drug court program may include the time and efforts of judges, police officers, probation officers, and prosecutors.


Including costs for which the program may not explicitly pay presents an interesting dilemma. The dilemma primarily stems from the trade-off in presenting only out-of-pocket expenditures for a program (thus the program will have a lower total cost) or accurately reflecting all of the costs associated with the program regardless of whether those costs are paid out of pocket (implying a higher total program cost). Furthermore, when agencies share resources (e.g., shared overhead costs), the correct proportion of these resources that are devoted specifically to a program must be properly specified. To date, there has been little discussion in the literature about estimating the opportunity cost of programs beyond the out-of-pocket costs. Knowing which costs to include and what value to place on certain services or items that are not directly charged to the program can be complicated.

A comprehensive analysis of economic benefits also presents challenges. The goal of an economic benefit analysis is to determine the monetary value of changes in a range of program outcomes, mainly derived from changes in client behavior as a result of participating in the program. When estimating the benefits of a program such as drug court, one of the most obvious and important outcomes is the reduction in criminal justice costs (e.g., reduced incarceration and supervision), and these are traditionally the only sources of benefits examined in many drug court evaluations. However, drug court programs often have a diverse set of goals in addition to reducing criminal justice costs. For example, drug court programs often focus on helping the participants become more productive in society. This includes helping participants take responsibility for their financial obligations such as child support. In addition, employment is often an important program goal for drug court clients. If the client is working, he or she is paying taxes and is less likely to use social welfare programs. Thus, the drug court program potentially reduces several different categories of costs that might have accrued had program participants not received treatment. These “avoided” costs or benefits are important components to a full economic evaluation of drug court programs.

So, although the direct cost of the program usually is easily computed, the full costs and the benefits are more difficult to convert into dollars. For example, Logan et al. (2004) found that the average direct cost per drug court treatment episode for a graduate was $3,319, whereas the opportunity cost per episode was $5,132. These differences in costs due to agency collaboration highlight the importance of clearly defining the perspective of the cost analysis. As discussed earlier, the trade-off in presenting only out-of-pocket expenditures for a program or accurately reflecting all of the costs associated with the program regardless of whether those costs are paid out of pocket is an important distinction that should be considered at the outset of every economic evaluation. On the benefit side of the program, results suggest that the net economic benefit was $14,526 for each graduate of the program. In other words, this translates to a return of $3.83 in economic benefit for every dollar invested in the drug court programs for graduates. Obviously, those who dropped out of the program before completing did not generate as large a return. However, results suggest that when both graduates and terminators were examined together, the net economic benefit of any drug court experience amounted to $5,446 per participant. This translates to a return of $2.71 in economic benefit for every dollar invested in the drug court programs.
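The relationship between net benefit and return per dollar can be sketched as below, taking the net benefit per graduate as $14,526 and the opportunity cost per episode as $5,132. The sketch assumes that return per dollar means total (gross) benefit divided by opportunity cost. That definition is an inference that happens to reproduce the reported graduate figure, not a description of the method used by Logan et al. (2004), and the function name is hypothetical.

```python
def return_per_dollar(net_benefit, cost):
    """Gross economic benefit generated per dollar of cost.

    Assumed definition: (net benefit + cost) / cost.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    gross_benefit = net_benefit + cost
    return gross_benefit / cost


# Figures for drug court graduates quoted in the text above.
graduate_opportunity_cost = 5132.0
graduate_net_benefit = 14526.0

graduate_return = return_per_dollar(graduate_net_benefit, graduate_opportunity_cost)
print(round(graduate_return, 2))  # 3.83
```

The per-participant cost behind the $2.71 figure for graduates and terminators combined is not given in the text, so that ratio is not reproduced here.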

When looking at the cost-benefit analysis of programs or comparing these costs and benefits across programs, it is important to keep in mind that cost-benefit analysis may be done very differently, and a careful assessment of the methods must be undertaken to ensure comparability across programs.
