Chapter 10

class: center, middle, inverse
background-image: url(https://www.unomaha.edu/university-communications/downloadables/campus-icon-the-o/uno-icon-color.png)
background-position: 95% 90%
background-size: 10%

# Evaluation and Policy Analysis

[Justin Nix](https://jnix.netlify.app)  
*School of Criminology and Criminal Justice*  
*University of Nebraska Omaha*

.white[November 21, 2024]

???

---
class: top

# Why Do We Need Evaluation?

---
class: top

# A Brief History of Evaluation Research

## the Great Society and the War on Poverty

???

circa 1960s: Evaluation research in the social sciences isn't *new*, but taxpayer desires to know if their money was being spent wisely prompted a flurry of evaluation studies evaluating programs meant to alleviate poverty and the problems that stem from it.

## "Nothing works?"

???

**Robert Martinson's** 1974 work on the shortcomings of prisoner rehabilitation programs prompted the "nothing works" doctrine.
- Highly influential, and inspired a wave of strong sentencing reform and cancellation of rehabilitation programs

## Utility, feasibility, propriety, and accuracy

???

1981 Joint Committee on Standards published the list of features all evaluations should have:

1. **Utility**: should serve the practical information needs of intended users
2. **Feasibility**: Should be realistic, prudent, diplomatic and frugal
3. **Propriety**: evaluation should be conducted legally, ethically, and with due regard for those involved (including those affected by its results)
4. **Accuracy**: evaluation should reveal and convey technically adequate info about the features that determine worth or merit of the program

---
class: top

# Evaluation Basics

???

**Inputs** are what goes into the program (in our context, often clients/participants and staff).

**Program process** - the complete treatment or service delivered by the program
- can be simple or complicated, short or long...but the idea is the same: it is designed to have some impact on the cases as inputs go into the model and outputs are produced.

**Outputs** are the services delivered, or new products produced, by the program process. ***Note***: these are often WAY easier to measure than outcomes.
- e.g., clients served, managers trained, arrests made

**Outcomes** reflect the impact - good and/or bad - of the program process on cases processed. 
- e.g., improved test scores, lower crime rates, reduced poverty

Variation in both outputs and outcomes in turn influences the inputs to the program through a **feedback process** which often involves **program stakeholders**.
- e.g., not enough clients being served? too many negative side effects from a trial med?

---
class: top

# Evaluation Alternatives

## Is the program **needed**?

???

A **needs assessment** attempts to provide systematic, credible evidence of the need for a program. Key questions include the nature and scope of the problem, as well as the target population in need of the intervention.

- What is the magnitude of this problem in this community?
- How many people in this community are in need of this program?
- What are the demographic characteristics of these people?
- Is the proposed program or intervention appropriate for this population?

---
class: top

# Evaluation Alternatives

## **Can** the program be evaluated?

???

Evaluation research is pointless is the program itself can't be evaluated.

Often rely on qualitative methods; researchers might interview program managers or key staff to determine what data are available to analyze

Program sponsors might be interviewed about the importance they attached to different goals.

Can have an *action research* component, if researchers use their findings to encourage program managers or other key staff to make changes to current operations (e.g., collect more data or collect same data in a different way).

---
class: top

# Evaluation Alternatives

## Is the program operating **as planned**?

???

What actually happens? Evaluators are often asked to document the extent to which **implementation** has taken place.

Key concerns here are **program coverage** and **delivery**.

- Is the program reaching the target population? (coverage)
  - E.G., gun buyback programs often don't recover guns that will be used in crimes
- Is it operating as expected? (delivery)
  - E.G., the St. Louis Consent to Search Program looked promising, but died after key staff turned over and they removed the promise not to prosecute. 
- What resources are being expended?

***If a program has not been implemented as intended, should we even bother asking whether it had the intended outcomes?***

---
class: top

# Evaluation Alternatives

## Is the program operating **as planned**?

### Case study: [DARE](https://www.ojp.gov/pdffiles1/Digitization/152055NCJRS.pdf)

???

In 1994, RTI did a process evaluation of D.A.R.E. that included 3 goals:

1. Assess the organizational structure and operation of representative DARE programs nationwide.
2. Review and assess factors that contribute to effective implementation. 
3. Assess how DARE and other school-based drug prevention programs are tailored to meet the needs of specific populations.

They did site visits, interviews, discussion groups, and surveys of DARE program coordinators and advisers.

Compared to coordinators of other alcohol/drug prevention programs, DARE coordinators were more likely to indicate satisfaction with their curriculum, teaching, admin requirements, student receptivity, and effects on students. 
- This would suggest DARE was operating as designed.

OTOH, site visits helped identify implementation problems, including insufficient numbers of officers to carry out the program as planned, and a lack of Spanish-language DARE books in a largely Hispanic school.

www.dare.org

---
class: top

# Evaluation Alternatives

## Did the program **work**?

???

This most closely relates to a lot of the material we've covered thus far. Some program or policy is the independent variable, and its intended effects are the dependent variable.

Ideally, we'd use an experimental design or a strong causal identification strategy.

But that can be difficult with social programs, where the usual practice is to let people decide for themselves whether they want to participate in a program, or to establish eligibility criteria that ensure people who enter the program are different from those who don't.
- Either way: *selection bias has likely been introduced*.

---
class: top

# Evaluation Alternatives

## Did the program **work**?

### Case study: [D'Amico & Fromme (2002)](https://doi.org/10.1046/j.1360-0443.2002.00115.x)

???

This study compared the impact of a new "Risk Skills Training Program" to an abbreviated DARE program and a control group
- **Participants**: Young people between the ages 14 and 19
- **Outcomes**: positive and negative "alcohol expectancies," perception of peer risk taking, and actual alcohol consumption

---
class: top

# Evaluation Alternatives

## Did the program **work**?

### Case study: [D'Amico & Fromme (2002)](https://doi.org/10.1046/j.1360-0443.2002.00115.x)

???

Note that your textbook completely butchers the results of this study.

**Results**
- At post-test, the RSTP group showed no difference on negative alcohol expectancies, whereas DARE and control groups showed improvement!
- But at 6-month follow up, no meaningful differences in any of the groups.

---
class: top

# Evaluation Alternatives

## Did the program **work**?

### Case study: [D'Amico & Fromme (2002)](https://doi.org/10.1046/j.1360-0443.2002.00115.x)

???

**Results**
- Control group increased their alcohol consumption

*Notice also how the groups were not equivalent at pretest!*

---
class: top

# Evaluation Alternatives

## Is the program **worth it**?

---
class: top

# Evaluation Alternatives

## Is the program **worth it**?

### Case study: [Therapeutic communities](https://doi.org/10.1016/S0149-7189%2802%2900006-X)

???

"The core principles and methods of the TC are especially relevant to the treatment of mentally ill chemical abusers (MICAs). These include: providing a highly structured daily regimen, fostering personal responsibility and self-help in addressing life's difficulties, and using peers as role models and guides. The modified TC program for MICAs involves three fundamental alterations: increased flexibility, decreased intensity, and greater individualization. Nevertheless, the central TC feature remains: the modified TC seeks to develop a culture where clients learn through self-help and affiliation with the community to foster change in themselves and others" (p. 137)

This study was a CBA of a modified therapeutic community at three sites in Brooklyn. 342 individuals who'd been referred by homeless shelters or psychiatric facilities were **SEQUENTIALLY** assigned to a TC or "treatment as usual," which in this case could have been a variety of things:
- Other residential programs, discharge without follow up, intensive case management, continue at the referral site, or drop out and wind up back on the streets.

Employment status, criminal behavior, and use of health care services were measured for each group 3 months prior to treatment and 3 months post-treatment

Earnings from employment in each period were adjusted for costs incurred by criminal behavior and use of health care services.

---
class: top

# Evaluation Alternatives

## Is the program **worth it**?

### Case study: [Therapeutic communities](https://doi.org/10.1016/S0149-7189%2802%2900006-X)

Modified TC Costs (1994 Dollars) (French et al., 2002: Table 3)

.medium[| Resource Category       | Accounting Cost | % of Total | Economic Cost | % of Total |
|-------------------------|-----------------|-----------:|---------------|-----------:|
| Labor                   | $1,415,601      | 67.3%      | $1,415,601    | 65.5%      |
| Miscellaneous           | $327,131        | 15.5%      | $327,131      | 15.1%      |
| Supplies                | $189,667        | 9.0%       | $189,667      | 8.8%       |
| Contracted Services     | $160,450        | 7.6%       | $160,450      | 7.4%       |
| Buildings and Facilities| $0              | 0.0%       | $51,077       | 2.4%       |
| Equipment               | $11,855         | 0.06%      | $16,193       | 0.75%      |
| **Total annualized cost**   | **$2,104,704**     |            | **$2,160,120**  |            |
| **Average annual cost (per client)** | **$28,062**       |            | **$28,802**      |            |
| **Average weekly cost (per client)** | **$540**           |            | **$554**         |            |
| **Average episode cost** | **$19,845**|            | **$20,361**|            |]

???

**Results**: The average cost of TC treatment for a client was about $20K.  
But the economic benefit based on earnings to the average TC client, relative to the average TAU client, was $274K. 
- So after adjustments for costs the benefit to cost ratio was 13:1.
- In other words: for every dollar spent on this program, taxpayers saved $13
- (closer to a 5:2 ratio after removing extreme outliers)

**Limitations**: While the majority of subjects (N=281, 82%) completed at least one of the follow-up interviews, **only 218 subjects (64%) completed both the 6-month and 12-month follow-up interviews**. The study cohort in this paper was derived from **the 218 subjects that had both baseline and full 12-month follow-up data**. The sample consisted of 186 clients (146 modified TC clients and 40 TAU clients) who completed both the 6-month and 12-month follow-up interviews and had no missing values for any of the variables used in the benefit calculations.

---
class: top

# Design Decisions

## Black box evaluation or program theory?

???

**Do we care how the program gets results?**

Some might argue that we should simply be testing the input-output model - whether cases changed as a result of exposure to the program.

If an investigation of the program process is conducted, a program theory might be developed - describing what has been learned about *how* the program has its effect.

If researchers have sufficient knowledge before the evaluation begins, they can perhaps perform a *theory-driven evaluation*, which could guide the process in a more productive way.

---
class: top

# Design Decisions

## Researcher or stakeholder orientation?

???

**Whose goals matter most?**

In social science research, the researcher typically defines research questions, applicable theories, and outcomes to be investigated.

In program evaluation, program sponsors or government agencies often define the research questions, and findings get reported to them (rather than, or at least prior to, being published in a journal).

***What to do as a researcher if the program sponsor pushes back on your proposed research design?***
- What responsibility do you have to politicians and taxpayers when evaluating government-funded programs?

If you adopt a *stakeholder approach*, you conduct the evaluation in a way that will be most *useful* to stakeholders. This might involve program participants engaging with the researchers to help design, conduct, and report the research (i.e., *action* or *participatory research*).

A *social science approach* OTOH emphasizes the importance of researcher expertise. S/he is given more autonomy to develop the most unbiased program evaluation possible.

Or you might take a more *integrative approach*.

---
class: top

# Evaluation in Action

## Case study: [POP in Jersey City](https://doi.org/10.1111/j.1745-9125.1999.tb00496.x)

???

***What does POP refer to?***

**Anthony Braga** and colleagues are able to randomly assign 56 violent crime "hot spots" to receive POP or business as usual. There are 28 matched pairs, and a coin flip determines which gets treatment and which is control.

- Officers in POP spots worked with community to identify problems and come up with solutions. Most solutions involved targeting social disorder problems (order maintenance policing, securing vacant lots, and picking up litter).

- Citizen calls for police service are examined before and after the onset of the experiment.
- As well as incident report data and physical observation of each spot

**Results**: Generally, crime and disorder fell in the POP spots, without being displaced into surrounding areas.

---
class: top

# When Experiments are not Feasible

## Case study: [Boot camps](https://onlinelibrary.wiley.com/doi/full/10.4073/CSR.2003.1)

???

Boot camps started in 1983 in Georgia, and are also known as "shock incarceration programs."

***Do you think Boot Camps reduce recidivism?***

The idea is that a strict, military-style punishment can serve as an effective alternative to extended periods of traditional incarceration. 
- But do they work? Part of the challenge of answering this question is that boot camps have been implemented in a variety of ways.

**Doris MacKenzie** and colleagues evaluated boot camp program efficacy in eight states
- Characterized by strict rules, discipline, and a military-like atmosphere; mandatory participation in military drills and PT; separation of program participants from other inmates.

They used a non-equivalent control group design to evaluate:

- In each state, a sample of male boot camp program graduates served as the "treatment group," and two other samples were used as comparison groups (prison parolees, probationers, and boot camp dropouts). Importantly, individuals in all samples had initially met the eligibility requirements for boot camp.

- 2 year follow-ups, with recidivism measured as arrests and revocations for new crimes or technical violations.

**Results** were mixed. Programs with more intensive long-term follow-up supervision showed less recidivism, but traditional boot camps without such long-term supervision did not.

**Limitation**: NO RANDOM ASSIGNMENT!!

---
class: top

# When Experiments are not Feasible

## Case study: [Drinking and homicide in Eastern Europe](https://doi.org/10.1177/1088767907310851)

???

***Do you think drinking alcohol causes people to be violent?***

- Lots of research in the US and elsewhere suggests a correlation for sure. 
- E.G., a large percentage of homicides are committed under the influence of alcohol

- Research also suggests the relationship between alcohol and violence is *culturally mediated*

**Results of Bye (2008)**: Effects of alcohol consumption on homicide rates were more pronounced in countries with more detrimental drinking patterns:

> Implying that the relative risk will depend on the patterns of drinking and of behavior associated with drinking in a particular society.

---
class: top
background-image: url(https://static.wixstatic.com/media/f5df24_6755bbd35fac4ae09d0d8227126d091f~mv2.jpg/v1/fill/w_2396,h_1192,al_c,q_90/f5df24_6755bbd35fac4ae09d0d8227126d091f~mv2.webp)
background-position: 50% 65%
background-size: 97%

# The Hierarchy of Evidence

---
class: middle, center, inverse

# Have a great day! 😄

## *There are no perfect [evaluation] studies. And there cannot be, for there is no agreement on what constitutes perfection.*
<img src="perfect-study.png" width="30%" style="display: block; margin: auto;" />
## (Michael Patton, 1997:23)