Case 2.2: Causal Inference on Prison Reform Programs

Summary

This document analyzes whether prison reform programs reduce recidivism. It evaluates a vocational training and coping skills program for parolees in the southeastern United States from 2013 to 2015, using matching and exploratory data analysis to assess the program's effectiveness and the likelihood of re-offense.

Full Transcript


Case 2.2: Causal inference
Do prison reform programs reduce recidivism?

Goals (3 min)

In previous cases we discussed the importance of randomization in the design of experiments for causal analysis. We mentioned that randomization is the gold standard if we want to avoid erroneous or skewed results in our experiments. However, in many instances, conducting an experiment with a purely random treatment group and control group is not feasible (for example, due to ethical concerns or resource constraints). In this case, you will learn how matching can help to alleviate this problem.

from plotly.offline import init_notebook_mode, plot
import numpy as np
import numpy.random as nr
from numpy.linalg import norm
import plotly.express as px
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Introduction (5 min)

Business Context. Recidivism is a measure of a former prisoner's likelihood to be re-arrested, re-convicted, or returned to prison with or without a new sentence during a three-year period following the prisoner's release. It has been used to study the performance and effectiveness of privately and publicly managed prisons. Recidivism rates among prisoners in the United States are high, with more than 60% of released prisoners re-offending within three years of release. Consequently, hundreds of millions of dollars per year have been invested in prison reform programs designed to reduce re-offense. However, implementing these programs is costly and evaluating them can be complex. Currently, policymakers are interested in evaluating a training program designed to reduce recidivism among paroled inmates. They have come to you for assistance in determining whether it is worthwhile.

Business Problem. The policymakers want you to answer the following: "Will there be a reduction in recidivism if the proposed program were to be widely adopted?"

Analytical Context. The new program was piloted for a subset of parolees (recently released prisoners) from two prisons. New parolees at these two prisons were given the option to volunteer to participate in the recidivism reduction program. If we determine that the program is worthwhile, it would subsequently be scaled to all parolees from the twenty-five eligible prisons in the federal penal system.

The case is structured as follows: you will (1) analyze the setup of the program and identify potential observational study design problems; (2) learn about matching and use it to correct for some of these problems; and finally (3) conduct exploratory data analysis to determine which features seem to account for most of the corrections/differences between the policymakers' study and our matched study.

Initial analysis of the program (35 min)

The program we are evaluating in this case study was focused on vocational training and coping skills and was applied to $n = 7105$ parolees from 2013 to 2015 in the southeastern United States. Participants were followed for 30 months after completing the program and identified as a "responder" if they did not commit a serious offense during that period, and a "non-responder" if they did. The data collected in the study has the form $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$, where $X_i$ is a vector containing several characteristics of parolee $i$ and $Y_i \in \{0, 1\}$ is a binary variable recording whether parolee $i$ re-offended (1 meaning re-offended, i.e. a non-responder; 0 meaning did not re-offend, i.e. a responder).
Parolee characteristics are as follows:

Variable             Description
ID                   Unique prisoner identifier
Max severity (1-5)   Severity of most serious charge in current conviction
Ave severity (1-5)   Average severity across all charges in current conviction
Sentence time        Sentence length (months) assigned
Sentence served      Time served (months)
Education (1-4)      Coded so that: 1 = less than High School, 2 = High School, 3 = Some College, 4 = College Degree
Repeat offender      Binary indicator of previous conviction
Substance use        Binary indicator of substance abuse
Support              Binary indicator that parolee is being provided with transitional support (e.g., halfway house)

study_data_pd = pd.read_csv('rec_study.csv')
study_data_pd.head()

   max_severity  ave_severity  sentence_time  sentence_served  repeat_offender  substance_use  support  education  reoffend
0             1      1.000000      15.281172         4.032279                0              0        1          1         1
1             1      1.000000      19.816019         1.625926                0              0        0          1         0
2             2      1.333333      59.293718        38.978041                0              1        1          1         1
3             2      1.333333      72.026361        25.082745                1              0        0          4         0
4             1      1.000000      10.618477         9.412762                0              0        0          4         0

Exercise 1: (3 min)

What is one potential issue you see with the program design that might dilute the conclusiveness of your results?

Answer. Because enrollment in the program was voluntary, the sample studied in the experiment was not randomized among all eligible prisoners. Thus, care must be taken in generalizing any effects observed in the participating parolees.

Exercise 2: (12 min)

Conduct a basic exploratory data analysis to see if any of the covariates appear to be related to the probability of re-offense. What is the average rate of re-offense of the participants in the study?

Answer. All variables are either numeric, or numeric categories with clear directionality in their labels (e.g., for Education, the higher the label number, the more educated the participant). This means that a correlation matrix is a good initial tool which will quickly give us basic insight into all the variables at once:

# Correlations
study_data_pd.corr()

                 max_severity  ave_severity  sentence_time  sentence_served
max_severity         1.000000      0.817755       0.793862         0.515839
ave_severity         0.817755      1.000000       0.475187         0.308140
sentence_time        0.793862      0.475187       1.000000         0.715039
sentence_served      0.515839      0.308140       0.715039         1.000000
repeat_offender      0.137198      0.094026       0.127827         0.057331
substance_use        0.082750      0.076825       0.057291         0.038360
support             -0.001907     -0.004226       0.006061         0.023929
education           -0.007973     -0.001395      -0.006468         0.000696
reoffend            -0.242499     -0.092318      -0.379868        -0.092872

                 repeat_offender  substance_use   support  education  reoffend
max_severity            0.137198       0.082750 -0.001907  -0.007973 -0.242499
ave_severity            0.094026       0.076825 -0.004226  -0.001395 -0.092318
sentence_time           0.127827       0.057291  0.006061  -0.006468 -0.379868
sentence_served         0.057331       0.038360  0.023929   0.000696 -0.092872
repeat_offender         1.000000       0.027728 -0.005602   0.007474 -0.023067
substance_use           0.027728       1.000000  0.009827  -0.007708 -0.030576
support                -0.005602       0.009827  1.000000   0.005533  0.002608
education               0.007474      -0.007708  0.005533   1.000000 -0.001031
reoffend               -0.023067      -0.030576  0.002608  -0.001031  1.000000
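As an optional aside (not part of the original case code), the same matrix can be easier to scan as an annotated heatmap; here is a minimal sketch using the seaborn and matplotlib imports from above:

# Optional sketch: render the correlation matrix as an annotated heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(study_data_pd.corr(), annot=True, fmt=".2f",
            cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation matrix of study covariates")
plt.show()

A view like this makes it easy to spot which covariates move together, for example max_severity and ave_severity at 0.82.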
Since correlations are best suited for analyzing linear relationships, let's also create a matrix of 2D scatterplots for each pair of variables, so that we can visually pick up on any potential non-linear relationships:

sns.pairplot(study_data_pd)

From this, we can see that the likelihood of re-offense increases with the following variables:

1. Maximum severity of the crime (i.e., the worst crime they were convicted of)
2. Substance use
3. Whether or not they were previously a repeat offender

However, we also see that it decreases as the gap between sentence time and time served increases. Possible explanations include that increased severity and prior offenses indicate a pattern of criminal behavior and therefore an increased propensity for re-offense. A (positive) gap between sentence time and time served indicates that the subject was released early, which is often an indicator of "good behavior", which in turn may be an indicator of a desire for change, self-control, or other positive predictors of response.

study_data_pd['reoffend'].mean()

0.4751583391977481

The average rate of re-offense is 47.5%, which is significantly lower than the national average of 60%. A reduction of nearly 12.5 percentage points, if it held throughout the entire system, would be a significant reduction translating into tens of millions of dollars saved on incarceration costs along with immeasurable other societal benefits.

Exercise 3: (5 min)

A reduction of 12.5 percentage points would be remarkable if such results held across all prisoners released across the entire prison system. Recall that policymakers are interested in the value of the program if it were instituted across the entire system. Can you identify any sources of bias that might prevent the study from generalizing to the entire prison system?

Answer. A major issue is that the type of prisoner who would opt into such a program may have a strong desire to avoid future incarceration and therefore be less likely to re-offend. Put another way, the prisoners in the study may not be representative of the population of released prisoners in terms of their response, because the factors influencing their decision to join the study may also influence their likelihood of response.

Another potential source of bias is the fact that prisoners were only enrolled from two of the twenty-five eligible prisons. If these two prisons somehow differ from the other twenty-three in terms of the likelihood that prisoners will respond to the program (in ways not captured in the measured covariates), then this could also bias our results. Let's assume from now on that the two sampled prisons are not systematically different from the others.

Exercise 4: (10 min)

Given that the set of prisoners who opt into the program could be biased, suggest three possible approaches to estimate the response rate in the entire population of released prisoners. Discuss the costs and benefits of these approaches.

Answer. Three potential approaches are:

1. Run a follow-up study where subjects are randomly enrolled into the program and required to participate as part of the terms of their release. This would certainly provide a representative sample, but could be extremely costly and would delay full-scale implementation of the program.

2. Interview prisoners to identify factors influencing their decision to enroll in the study. Use qualitative or quantitative analyses to discern whether those who are more likely to respond (and not re-offend) were more likely to enroll. This approach might be reasonable if it is very difficult or impossible to learn anything about the entire population of prisoners any other way. However, this would also mean that it is very difficult or impossible to validate whether the potentially influential factors that were abundant among the prisoners in the study were relatively absent in the general population of prisoners. This is an example of potential confirmation bias.
3. Use the covariates of the prisoners that did not enroll in the study (measured at release) to identify and adjust for differences between those enrolled in the study and those who did not. In our analysis, we will use matching, which is an example of this type of approach.

Causal analysis via matching (25 min)

The core idea of matching is to group untreated individuals with treated individuals (here, the treatment is the recidivism reduction program) based on similarities in measured covariates, in order to estimate the effect of treatment across the entire population. Assume that in addition to the in-study data $(X_1, Y_1), \dots, (X_n, Y_n)$, we have data from $m = 55799$ prisoners released during the study period that either opted out of the study or were not eligible because they were not at one of the two prisons included in the study. Let us denote the characteristics of these out-of-study prisoners by $X'_1, X'_2, \dots, X'_m$.

Informally, for each out-of-study prisoner $j \in \{1, \dots, m\}$ with characteristics $X'_j$, our objective is to find an in-study prisoner $i \in \{1, \dots, n\}$ with characteristics $X_i$, such that $X_i$ and $X'_j$ are as close as possible according to some measure of distance. The intuition here is that if prisoner $i$ re-offended (i.e. $Y_i = 1$) and $X_i$ is close to $X'_j$, then it is reasonable to estimate that the out-of-study prisoner $j$ would re-offend as well. Of course, this analysis is highly dependent on the measure we choose, and some thought should go into selecting an appropriate measure. For the purposes of this case, we will use the Euclidean distance. Recall that given a pair of vectors $X = (x_1, \dots, x_p)$ and $X' = (x'_1, \dots, x'_p)$, the Euclidean distance between $X$ and $X'$ is the value

$$d(X, X') := \sqrt{(x_1 - x'_1)^2 + (x_2 - x'_2)^2 + \dots + (x_p - x'_p)^2}.$$

Thus, for each out-of-study prisoner $j$, our matching algorithm should find the in-study prisoner $i$ with $d(X_i, X'_j)$ as small as possible. After matching, we will compute the mean recidivism rate for the matched sample. This form of matching is called nearest-neighbor matching with one neighbor (1-NN matching) and is among the most popular and intuitive variants of matching.

The characteristics of the out-of-study prisoners are stored in the csv file rec_popn.csv. Let's first read the rec_study.csv data into a dataframe called in_study_data, then read rec_popn.csv into a dataframe called out_of_study_data:

in_study_data = pd.read_csv('rec_study.csv')
n, p = np.shape(in_study_data)
out_of_study_data = pd.read_csv('rec_popn.csv')
m, q = out_of_study_data.shape

Before proceeding with the matching, it is important that we properly scale the different features of our data! (Why?) There exist various methods for feature scaling; some popular options are min-max normalization, standardization, and scaling to unit length.
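To make these options concrete, here is a minimal sketch of all three on a toy vector (illustrative values only; the case itself uses unit-length scaling, applied below):

# Sketch: three common feature-scaling options on a toy vector
x = np.array([15.3, 19.8, 59.3, 72.0, 10.6])

min_max = (x - x.min()) / (x.max() - x.min())  # rescales values to [0, 1]
standardized = (x - x.mean()) / x.std()        # zero mean, unit variance
unit_length = x / norm(x)                      # vector scaled to Euclidean norm 1

print(min_max, standardized, unit_length, sep="\n")

Scaling matters here because the Euclidean distance would otherwise be dominated by features measured on large scales (e.g., sentence_time in months) at the expense of the binary indicators.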
Here we will scale each feature so that the columns of our data have unit length. We can do this using the function norm from numpy.linalg:

in_study_norm = norm(in_study_data.iloc[:,0:8], axis=0)
in_study_data.iloc[:,0:8] /= in_study_norm
out_of_study_data /= in_study_norm

In order to get a graphical idea of what our matching will do, let us pick 100 random out-of-study prisoners to merge into our data and make a simple scatter plot:

# Select 100 random out-of-study prisoners
np.random.seed(0)
random_subset = np.random.randint(0, m-1, 100)

# Merge them with the in-study data; out-of-study prisoners have no
# reoffend value, so code them as 2 to get a third hue category
all_data = pd.concat([in_study_data, out_of_study_data.iloc[random_subset,:]], sort=True)
all_data = all_data.fillna(2)
all_data['reoffend'] = all_data['reoffend'].astype('int')

# Plot the sentence_time and sentence_served features of the prisoners
%matplotlib notebook
plt.figure(figsize=(10, 6))
sns.scatterplot(x='sentence_time', y='sentence_served', hue='reoffend', data=all_data, palette='Paired')

Notice that, if we judge only by the sentence time vs. sentence served features, there are a few out-of-study prisoners that we could consider as potential re-offenders (the ones closer to the orange dots) or potential responders (the ones closer to the blue dots). But if we zoom in on some parts of the plot, there are instances where it is hard to tell how to classify them.

The tqdm package (2 min)

Sometimes you will have to use for-loops, even if they are not efficient. You can use the tqdm package to get an estimate of how long your code will take to run:

from tqdm.auto import tqdm, trange

for j in trange(1000000, desc="Iterating"):
    if j % 50000 == 0:
        print("Finished " + str(j) + " iterations")

Finished 0 iterations
Finished 50000 iterations
...
Finished 950000 iterations

Exercise 5: (15 min)

Write code to create the matched data set for the first 10000 rows of the normalized out-of-study data using the normalized in-study data. (It may take a while to run, so you can use tqdm to track your progress.) Once you have completed the matching, compute the estimated matched mean.

Hint: You could run a loop using the formula provided, but this might be expensive (try it yourself). If you know a little about matrices, you could use the following to speed up your computation: if $X$ is an $n \times k$ matrix and $Y$ is an $m \times k$ matrix, the squared Euclidean distance between the $i$-th row of $X$ and the $j$-th row of $Y$ equals the $ij$-entry of the matrix

$$M = \mathrm{norm}(X) \cdot \mathrm{ones}(m)^\top - 2\, X Y^\top + \mathrm{ones}(n) \cdot \mathrm{norm}(Y)^\top,$$

where $\mathrm{norm}(X)$ is the column vector of squared norms of each row of $X$ and $\mathrm{ones}(p)$ is an all-ones column vector of length $p$.
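Before applying the hint at full scale, it can be reassuring to verify the identity numerically. The following sketch (toy sizes, not the case data) compares the vectorized formula against a direct double loop:

# Sanity-check the squared-distance identity on small random matrices
nr.seed(1)
A = nr.rand(4, 3)  # stands in for X: 4 rows, 3 features
B = nr.rand(5, 3)  # stands in for Y: 5 rows, 3 features

normA = (norm(A, axis=1)**2).reshape((4, 1))  # squared row norms of A
normB = (norm(B, axis=1)**2).reshape((5, 1))  # squared row norms of B
M_fast = normA.dot(np.ones((1, 5))) - 2*A.dot(B.T) + np.ones((4, 1)).dot(normB.T)

# Direct computation of all pairwise squared Euclidean distances
M_slow = np.array([[np.sum((A[i] - B[j])**2) for j in range(5)]
                   for i in range(4)])

print(np.allclose(M_fast, M_slow))  # expect: True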
Answer.

study_outcome = in_study_data['reoffend']

# We use the provided hint
X = np.array(in_study_data.iloc[:,0:8])
Y = np.array(out_of_study_data)
onesn = np.ones((n,1))
onesm = np.ones((m,1))

# Remember that we need to square the norms here
normX = norm(X, axis=1).reshape((n,1))**2
normY = norm(Y, axis=1).reshape((m,1))**2

# The formula given for M
M = normX.dot(np.transpose(onesm)) - 2*X.dot(np.transpose(Y)) + onesn.dot(np.transpose(normY))
M.shape

(7105, 55799)

matches = []  # record the match of each out-of-study prisoner here

# Our goal is to calculate the percentage of the out-of-study population that
# would re-offend after the matching. This is accumulated in matched_mean:
matched_mean = 0

# Finally, we look for the minimum value down each column
for j in trange(m, desc="Number of records processed"):
    vec = M[:,j]
    i = np.argmin(vec)  # index of the nearest in-study prisoner
    matches.append(i)
    matched_mean += study_outcome[i]

print('The mean of re-offense after matching is', matched_mean/m)

The mean of re-offense after matching is 0.555189161096077

We see from the results that the estimated benefit to the population outside the study is significantly worse than for the in-study population; in fact, it is closer to being on par with the average among the untreated (60%). Thus, we might hesitate to recommend that the policy be widely adopted across the entire population. Unfortunately, this is a common phenomenon in these types of studies. Let's see if we can dig into this result and understand why the results differ. Along the way, we'll explore ways to improve our matching criterion and to speed up the matching algorithm.

Looking at differentials in certain features (20 min)

To begin, let's consider inclusion in the study as a binary classification problem, wherein a subject is labeled "1" if they were included in the study and "0" if they were not. To explore differences between the study and non-study populations, we can consider building a model that predicts whether or not a subject was in the study using their baseline covariates (a sketch of such a model appears after Exercise 6.1 below). As a first step in building such a predictive model, though, let's do some basic exploratory data analysis and look at the distributions of the covariates.

Exercise 6: (20 min)

6.1 Create a new column on both the datasets rec_study.csv and rec_popn.csv; call it sample_ind. This column should take the value "1" for the rec_study.csv data and "0" for the rec_popn.csv data. Merge them into a single dataset using an outer merge. Note: do not use the normalized data for this exercise.

Answer. One possible solution is given below:

in_study_data = pd.read_csv('rec_study.csv')
out_of_study_data = pd.read_csv('rec_popn.csv')
in_study_data['sample_ind'] = 1
out_of_study_data['sample_ind'] = 0
df_merged = pd.merge(in_study_data, out_of_study_data, how='outer')
df_merged.head(-5)

       max_severity  ave_severity  sentence_time  sentence_served  repeat_offender  substance_use  support  education  reoffend  sample_ind
0                 1      1.000000      15.281172         4.032279                0              0        1          1       1.0           1
1                 1      1.000000      19.816019         1.625926                0              0        0          1       0.0           1
2                 2      1.333333      59.293718        38.978041                0              1        1          1       1.0           1
3                 2      1.333333      72.026361        25.082745                1              0        0          4       0.0           1
4                 1      1.000000      10.618477         9.412762                0              0        0          4       0.0           1
...             ...           ...            ...              ...              ...            ...      ...        ...       ...         ...
62894             2      1.500000      78.036900        14.733427                0              1        0          4       NaN           0
62895             2      1.333333      38.725135        38.685805                0              0        1          3       NaN           0
62896             4      2.500000      92.901046         8.347756                1              0        1          1       NaN           0
62897             1      1.000000      15.977986         9.667846                1              1        1          2       NaN           0
62898             4      2.000000     157.926831       143.709528                1              1        0          2       NaN           0

[62899 rows x 10 columns]
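With sample_ind in place, we can already sketch the classification model mentioned above. This is an illustration, not part of the original case code, and it assumes scikit-learn is available (it is not imported elsewhere in this notebook):

# Sketch: logistic regression predicting study inclusion from baseline
# covariates (assumes scikit-learn is installed; illustration only)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

covariates = ['max_severity', 'ave_severity', 'sentence_time', 'sentence_served',
              'repeat_offender', 'substance_use', 'support', 'education']

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(df_merged[covariates], df_merged['sample_ind'])

# With standardized inputs, coefficient magnitudes give a rough ranking of
# which covariates are most associated with selection into the study
coefs = model.named_steps['logisticregression'].coef_[0]
for name, coef in sorted(zip(covariates, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:16s} {coef:+.3f}")

The fitted probabilities from such a model are propensity scores, which would open the door to propensity-score matching as an alternative to matching on raw covariates.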
6.2 Create two plots: a boxplot of max severity for the two datasets, and a density plot of max severity for each severity bucket (1-5) for the two datasets (use sns.kdeplot()). What can you conclude?

Answer. One possible solution is given below:

sns.boxplot(y='max_severity', x='sample_ind', data=df_merged)
plt.title("Boxplot of In study vs. Out of Study")

sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 1, 'max_severity'], shade=True, label="In study", color="b")
sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 0, 'max_severity'], shade=True, label="Out of study", color="y")
plt.title("Density plot of max_severity for In study vs. Out of Study");

It can be seen that the distribution of max severity is shifted to the right among non-study participants; i.e., criminals convicted of less severe crimes were more likely to enroll in the study.

6.3 Compare the distributions of the remaining variables for in-study vs. out-of-study. Also, look at the distribution of the sentence gap: sentence time minus sentence served.

Answer. One possible solution is given below:

sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 1, 'sentence_time'], shade=True, label="In study", color="b")
sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 0, 'sentence_time'], shade=True, label="Out of study", color="y")
plt.title("Density plot of sentence_time for In study vs. Out of Study");

sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 1, 'sentence_served'], shade=True, label="In study", color="b")
sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 0, 'sentence_served'], shade=True, label="Out of study", color="y")
plt.title("Density plot of sentence_served for In study vs. Out of Study");

df_merged['sentence_gap'] = df_merged['sentence_time'] - df_merged['sentence_served']
sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 1, 'sentence_gap'], shade=True, label="In study", color="b")
sns.kdeplot(df_merged.loc[df_merged['sample_ind'] == 0, 'sentence_gap'], shade=True, label="Out of study", color="y")
plt.title("Density plot of sentence_gap for In study vs. Out of Study");

Similarly, if we look at the distribution of the sentencing gap, i.e., time sentenced minus time served, we see a difference between the distributions of the study and non-study populations. A QQ plot of the gap distributions (which can be reconstructed as sketched below) shows the right tail lying below the 45-degree line, indicating that the right tail of the in-study population is larger than that of the out-of-study population. This suggests that those with a large gap (i.e., those released before their full sentence was served, often for "good behavior") were more likely to enter the study.
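A QQ plot like the one described can be produced from the merged data as follows. This is a sketch; the axis orientation (in-study quantiles on the horizontal axis) is an assumption:

# Sketch: QQ plot of the sentence gap, in-study vs. out-of-study
qs = np.linspace(0.01, 0.99, 99)
gap_in = df_merged.loc[df_merged['sample_ind'] == 1, 'sentence_gap']
gap_out = df_merged.loc[df_merged['sample_ind'] == 0, 'sentence_gap']

q_in = np.quantile(gap_in, qs)
q_out = np.quantile(gap_out, qs)

plt.figure(figsize=(6, 6))
plt.scatter(q_in, q_out, s=10)
lims = [min(q_in.min(), q_out.min()), max(q_in.max(), q_out.max())]
plt.plot(lims, lims, linestyle='--', color='k')  # 45-degree reference line
plt.xlabel("In-study quantiles of sentence_gap")
plt.ylabel("Out-of-study quantiles of sentence_gap")
plt.title("QQ plot of sentence gap");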
Conclusions (2 min)

We examined matching as a means of analyzing the effectiveness of an observational recidivism study. We saw that even though the program looked promising on the study population, this was primarily due to factors driving selection into the study. After adjusting for these factors, the program no longer appeared to be effective. We implemented the matching algorithm based on the Euclidean distance between covariates.

Takeaways (5 min)

In this case, we saw that one must take great care in analyzing observational data and pay particular attention to the factors driving the construction of such studies. We learned that:

1. Sampling and treatment assignment methodology are key, as a lack of prudence in these steps can lead to significant differences between the in-study and out-of-study data points.

2. Matching is one potential approach to analyzing such data. It is conceptually simple, but also crude in that it does not weight covariates in terms of their importance for sample selection.

3. Several measures of distance can be used. In the future, it will be good to look at matching algorithms that weight covariates differently based on their sensitivity to the objective variable (see the sketch below for one weighted variant).

Throughout this case we also covered some mathematical notions and manipulations that can be intimidating at first. Come back to this notebook at your own pace if you feel you need to. Remember to ask your instructors as many questions as possible!
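As a pointer toward takeaway 3, here is a minimal sketch of one standard weighted variant: Mahalanobis-distance matching, which weights covariates by the inverse covariance of the pooled data rather than treating them all equally. It assumes scipy is available and, to keep memory modest, matches only the first 5000 out-of-study prisoners; it is an illustration, not part of the original case:

# Sketch: Mahalanobis-distance matching (assumes scipy; illustration only)
from scipy.spatial.distance import cdist

# Raw covariates; the Mahalanobis distance is invariant to feature scaling
X_in = in_study_data.iloc[:, 0:8].to_numpy()
X_out = out_of_study_data.iloc[:, 0:8].to_numpy()[:5000]  # subset for memory

# The inverse covariance of the pooled covariates defines the weighting
VI = np.linalg.inv(np.cov(np.vstack([X_in, X_out]), rowvar=False))
D = cdist(X_out, X_in, metric='mahalanobis', VI=VI)  # (5000, n) distances

nn = D.argmin(axis=1)  # nearest in-study match per out-of-study prisoner
print("Mahalanobis-matched re-offense rate:",
      in_study_data['reoffend'].to_numpy()[nn].mean())

Note that this weights covariates by their joint variability, not by their sensitivity to the outcome; outcome-aware weighting, as suggested in the takeaway, would require a different construction (e.g., a propensity or prognostic model).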
