Reintegrative Shaming Experiments (RISE)
Recidivism patterns in the Canberra Reintegrative Shaming Experiments (RISE)
Lawrence W Sherman, Heather Strang and Daniel J Woods
Research School of Social Sciences, Australian National University
November 2000
Hypotheses
A central hypothesis of the experiments is that there will be less repeat offending after a conference than after court. The hypothesis draws on Braithwaite's theory of reintegrative shaming (1989) which argues that formal court justice stigmatises offenders as well as offences and makes it difficult for them to lead lives as responsible members of the community: the shame and mobilisation of a community of care engendered by a restorative intervention like conferencing should provide an opportunity for offenders to confront the consequences of their actions and allow the harm caused by the offence to be repaired. It also draws on Sherman's (1993) research which found that formal justice can result in more repeat offences than diversion to informal processing, especially when the court experience makes offenders generally more defiant of conventional society.
In addition, there is empirical research showing that people who feel they have been treated fairly by the justice system are more likely to comply with the law in future (Tyler 1990, Barnes 1999). RISE findings to date indicate that offenders randomly assigned to a conference expressed much more satisfaction with the procedural fairness of their treatment than offenders randomly assigned to court. We were particularly interested in examining whether previous research findings about this link would be replicated in the RISE experiments.
This hypothesis has taken longer than anticipated to investigate because of the pace at which cases were referred by police into the experiments and because of the necessity of leaving a window of opportunity for offenders to reoffend. We wished to leave two years between the date on which offenders were randomly assigned to court or to a conference and the date by which their reoffending behaviour was measured. This has been achieved in the drink driving experiment. However, new case assignments for the other three experiments ceased only in June 2000. Our earliest cases have five years of reoffending behaviour which can be measured, while the latest have only a few months. In order to compensate for the wide variation in 'time at risk' and to ensure that a fair assessment is made of offenders' post-assignment offending behaviour compared with their pre-assignment offending behaviour, we have limited the analysis to the minimal time period that we have on the maximum number of cases. Any other approach would lead to misleading comparisons of experimental results across experiments using very different periods of followup measurement of repeat offending.
Measures of repeat offending
In this report, official criminal history data are used to calculate reoffending behaviour. The Australian Federal Police have made these data available for all offenders in each of the four experiments. We have included self-report criminal history questions in the interview schedule administered to each offender two years after their random assignment, but sufficient data are not yet available to allow their meaningful interpretation. We will report on the results of these interviews in future publications and reports posted on the Australian Institute of Criminology website, where all prior RISE reports may be found.
Explaining different effects across the four experiments
In investigating what factors may contribute to the varying levels of reoffending by court-assigned and conference-assigned offenders across the four experiments, many theories can be tested. These include differences in the kinds of offenders who were selected by police to be included in the RISE tests, as well as theories of the offenders' perceptions of the quality of justice. The latter theories may be examined based on the extensive data collected in the offenders' responses at their interview obtained directly after their final case disposition. We have collected data on their perceptions of procedural justice (how fair they saw each procedure to be), of substantive justice (their perceptions of the substantive sanction they received), of restorative justice (their perceptions about the repair of harm) and their emotional reactions to the treatment they experienced. These findings are based upon interview response rates of 85 percent for drink driving offenders, 75 percent for juvenile personal property offenders, 79 percent for juvenile property (shoplifting apprehended by store security) offenders and 68 percent for youth violent crime offenders. We have also collected observational data from almost all conferences and about 85 percent of the court appearances offenders made when they were assigned to court.
While a full exploration of the reasons that restorative justice conferences produced different effects for different experiments must await the completion of data collection on all initial interviews and observations of justice processes, the current report presents a basic comparison of a key difference across the experiments: the kinds of people in each experiment. This analysis examines differences in the length and types of prior offence histories they brought to the experiment, as well as their gender, employment status, and other risk factors.
Analysing repeat offending effects
There are many ways that repeat offending can be analysed. The prevalence of offenders with any repeat offences, the frequency (rate) of offending by each offender, the time-to-failure (length of time until next criminal apprehension) and other measures may all be used. In experimental designs, it is also possible to examine only the differences between two groups after they have received different treatments, or to examine the differences in offending rates from before to after the treatment. When experiments consider before-after differences in offending rates, they face the further choice of presenting such differences within treatment groups, or differences between the treatment groups in the magnitude of their before-after differences. Each of these designs has different strengths and limitations, and each is more or less appropriate under the circumstances in which the present report is written: completion of case intake without completion of case treatment or followup interviews or recidivism measurement.
Given the circumstances of data collection at the present stage of RISE, the most appropriate design to use in the present analysis is the analysis of before-after differences in offending rates. This approach has the virtue of controlling for differences between the groups in prior offending rates, which can (and did) occur because of relatively small sample sizes in which such baseline differences can occur by chance. When the full samples become available for analysis, these differences will not be as likely to occur with such magnitude, as randomisation tends to even out such differences (on average) as sample sizes grow larger.
The report also employs offending rates rather than the prevalence of repeat offending. The latter measure would be distorted by the differences between treatment groups in some experiments in the prevalence of prior offending. It is also arguably more important for public policy to stress the effects of policies on offending rates, rather than on the proportion of offenders with any repeat offences. While the latter measure may be easier to comprehend, it may mask differences in the volume of crime in the community, differences that are far more important to victims and potential victims of crimes. Put another way, the use of offending frequency rates as the primary measure of repeat offending draws attention to the number of criminal events occurring in the community, rather than the number of active offenders residing in the community. Of the two, the former has the more direct effect on public safety.
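The contrast between prevalence and frequency can be illustrated with a minimal sketch. The offence counts below are invented for illustration and are not RISE data; the function names are likewise hypothetical.

```python
def prevalence(offence_counts):
    """Proportion of offenders with at least one offence in the period."""
    return sum(1 for c in offence_counts if c > 0) / len(offence_counts)


def frequency_rate(offence_counts):
    """Mean number of offences per offender in the period."""
    return sum(offence_counts) / len(offence_counts)


# Hypothetical one-year offence counts for two groups of five offenders
group_a = [0, 0, 1, 1, 1]
group_b = [0, 0, 1, 2, 6]

# Both groups have the same share of repeat offenders...
print(prevalence(group_a), prevalence(group_b))
# ...but group_b accounts for three times as many criminal events.
print(frequency_rate(group_a), frequency_rate(group_b))
```

In this made-up example the two groups are indistinguishable on prevalence, yet differ threefold in the volume of offending, which is the kind of difference a prevalence measure can mask.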
The report uses two methods to present the differences in the rates of offending by offenders assigned to court and conference. One method shows the differences in rate of offending one year before to one year after the assignment of the case to court or conference, within each group. The other method examines the difference of differences. Using both approaches results in three tests of statistical significance ('P' values, or probability that the result is due to chance) reported for each graph showing repeat offending rates. One P value refers to the difference between the rates of offending among offenders randomly assigned to conferences. Another refers to such differences among offenders randomly assigned to court. The third refers to the difference between those two before-after differences.
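The three tests described above can be sketched as follows. The report does not state which test statistic was used, so this sketch makes two assumptions: it treats each offender's before-after change as the unit of comparison, and it obtains P values from a standard normal approximation, which is only reasonable for large samples. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sided_p(z):
    """Two-sided P value from a standard normal approximation (an assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def within_group_test(before, after):
    """Mean before-to-after change within one group, with its P value."""
    changes = [a - b for b, a in zip(before, after)]
    z = mean(changes) / (stdev(changes) / sqrt(len(changes)))
    return mean(changes), two_sided_p(z)


def difference_of_differences_test(before_1, after_1, before_2, after_2):
    """Difference between the two groups' mean changes, with its P value."""
    c1 = [a - b for b, a in zip(before_1, after_1)]
    c2 = [a - b for b, a in zip(before_2, after_2)]
    se = sqrt(stdev(c1) ** 2 / len(c1) + stdev(c2) ** 2 / len(c2))
    diff = mean(c1) - mean(c2)
    return diff, two_sided_p(diff / se)


# Hypothetical offences per offender per year (not RISE data)
conf_before = [3, 2, 4, 1, 2, 3, 2, 4]
conf_after = [1, 1, 2, 0, 1, 2, 2, 1]
court_before = [2, 3, 1, 4, 2, 3, 2, 3]
court_after = [2, 3, 2, 3, 2, 4, 1, 3]

conf_change, conf_p = within_group_test(conf_before, conf_after)
court_change, court_p = within_group_test(court_before, court_after)
dd, dd_p = difference_of_differences_test(
    conf_before, conf_after, court_before, court_after
)
print(conf_change, conf_p)    # first P value: change within conference group
print(court_change, court_p)  # second P value: change within court group
print(dd, dd_p)               # third P value: difference of differences
```

With small samples such as those in some of the experiments, a t distribution (or a permutation test) would be a more cautious choice than the normal approximation used here.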
The report uses the standard that if one treatment group shows a significant before-after change in rates but the other does not, this contrast shows a treatment effect. Alternatively, if there is a difference of differences between the two groups that is not likely to be due to chance, then that is also evidence of a treatment effect. Either or both of these kinds of differences provide strong evidence that there is a different effect of court and conference on offending rates.
What does statistical 'significance' mean?
In this report the term 'statistical significance' is treated as a concept meaning 'statistically discernible from chance'. We assume, with most statisticians, that the cutoff point for such discernibility is arbitrary and that the probability that a result is due to chance can be interpreted by the reader. Thus the conventional significance level of .05 means that there is a 95 percent chance that a result is not a coincidence; a .15 level means that there is an 85 percent chance. In policy terms, an 85 percent chance of being right may be acceptable. Significance levels therefore are more of a heuristic guide to the interpretation of the results than a bright line between a difference and no difference. The magnitude of effect (how big a difference is), in contrast, receives more weight in our analysis, especially as measured by the statistic for effect size known as Cohen's d. Where this is large, even borderline statistical significance in the 80 percent range should give one confidence that this is an important result.
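For readers unfamiliar with the effect size statistic, a minimal sketch follows. It assumes the common textbook form of Cohen's d (difference in means divided by the pooled standard deviation); the report does not say which variant was computed, and the change scores below are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_var)


# Hypothetical before-to-after changes in offences per year (not RISE data)
conference_changes = [-2, -1, -2, -1]
court_changes = [0, 1, 0, -1]

d = cohens_d(conference_changes, court_changes)
print(round(d, 3))  # -2.121
```

Unlike a P value, this quantity does not shrink or grow with sample size alone, which is why it is useful for judging how big a difference is.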
Another statistic, Somers' D, is used to describe whether the magnitude of the differences in offender characteristics across experiments is due to chance. These differences are called 'categorical' or nominal data, which simply compare percentages of cases in different categories. This statistic is a measure of how much error can be reduced by predicting that a case will be in a certain category. When the statistic is larger, that suggests that the difference across experiments is substantial enough to warrant further investigation as a possible reason for the different effects of diversionary conferencing.
What does 'assigned' treatment mean?
The report refers often to the level of difference on these measures between court-assigned and conference-assigned offenders. It should be noted that, in order to preserve the equivalence between the two treatment groups, all analysis is presented on the basis of assigned treatment rather than treatment actually delivered. There was a very low deliberate misassignment rate in RISE (three percent), but there were a number of cases where it proved not possible to give the assigned treatment to offenders.1 The logic of using the treatment that was randomly assigned, rather than the one that was actually delivered, is that the delivery of treatment may have been partly a function of the offenders' behaviour. If the offender never showed up for a conference, for example, the conference treatment was never delivered. Removing that offender from the 'assigned' conference sample, however, would bias the results in favour of conferences. That bias would occur because any ill-behaved or defiant offenders assigned to conference would weed themselves out, leaving the 'delivered' conference offenders a group likely to have a lower repeat offending rate than the full, randomly assigned sample. In short, using 'assigned' treatments preserves the level playing field between the two treatments, rather than letting other circumstances stack the deck against one or the other of the two approaches.
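The bias described above can be shown with a small, entirely hypothetical example. The records and the helper `mean_rate` are illustrative assumptions, not RISE data or RISE code.

```python
from statistics import mean

# Hypothetical records: (assigned, delivered, offences in the following year)
records = [
    ("conference", "conference", 0),
    ("conference", "conference", 1),
    ("conference", "conference", 1),
    ("conference", "none", 5),  # never attended, so no conference was delivered
    ("court", "court", 1),
    ("court", "court", 2),
    ("court", "court", 0),
    ("court", "court", 4),
]


def mean_rate(rows, group, key_index):
    """Mean offences per offender, grouped on column key_index
    (0 = treatment as assigned, 1 = treatment as delivered)."""
    return mean(r[2] for r in rows if r[key_index] == group)


court_rate = mean_rate(records, "court", 0)
as_assigned = mean_rate(records, "conference", 0)
as_delivered = mean_rate(records, "conference", 1)

print(court_rate, as_assigned)  # 1.75 1.75: no difference as assigned
print(round(as_delivered, 2))   # 0.67: conference looks better as delivered
```

In this invented data the two assigned groups offend at the same rate. Dropping the one offender who never attended makes conferences appear far more effective, even though the apparent gain comes only from who was excluded.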
When does the measured effect occur?
All repeat offending rates are calculated from the day that each offender is randomly assigned to a RISE treatment, rather than from the day that treatment is actually completed. This decision has several bases. One is that the offenders are aware of whether they will go to court or conference from the date we employ, and there may thus be effects of that awareness on their offending behaviour. Such 'placebo' effects have been found in medicine, with anticipation of a treatment having effects even when treatment is not delivered. Secondly, there were sometimes substantial delays between apprehension and final treatment, both in court and in conference.2 This creates uneven time periods between treatment groups, and makes fair comparisons impossible. The 'intention to treat' is actually an indication of a policy of treating people this way, and in that sense is a better test of what would happen with such a policy (of trying to implement the treatment) than only examining cases of successfully completed treatments. (The terms 'pre-RISE' and 'post-RISE' refer to the 365 day periods before and after the RISE random assignment dates.)
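The pre-RISE and post-RISE windows can be sketched as a simple date filter. How the assignment day itself is treated, and whether the offence that brought the offender into RISE counts toward the pre-RISE period, are not stated in the report; the boundary choices below are assumptions, and the dates are invented.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=365)


def pre_post_counts(assignment_date, offence_dates):
    """Count offences in the 365 days before and the 365 days after
    random assignment. The assignment day itself is excluded from both
    windows (an assumption made for this sketch)."""
    pre = sum(
        1 for d in offence_dates
        if assignment_date - WINDOW <= d < assignment_date
    )
    post = sum(
        1 for d in offence_dates
        if assignment_date < d <= assignment_date + WINDOW
    )
    return pre, post


# Hypothetical offender assigned on 1 March 1998
assigned = date(1998, 3, 1)
offences = [
    date(1997, 1, 15),  # more than 365 days before: ignored
    date(1997, 6, 10),  # pre-RISE
    date(1998, 2, 20),  # pre-RISE
    date(1998, 9, 5),   # post-RISE
    date(1999, 4, 1),   # more than 365 days after: ignored
]

print(pre_post_counts(assigned, offences))  # (2, 1)
```

Because the windows are anchored on the assignment date, they have the same length for every offender regardless of how long the court or conference took to occur.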
What is the difference between 'cases' and 'offenders'?
RISE was based on the random assignment of cases, rather than offenders, to court or conference treatments. A 'case' for these purposes was all of the offenders who were apprehended together for the same criminal offence. The case was the preferred unit of analysis because it is inherent in the theory of restorative justice and diversionary conferencing: that all offenders involved in committing an offence should share responsibility for it. While this was not always the theory of the court (which sometimes treated offenders separately and sometimes together), it was a consistent standard both for eligibility for RISE and for the completion of diversionary conferences. In most cases, conferences were held with all known offenders present in the room and with all known victims of the crime, as well as victim supporters and offender supporters.
Strictly speaking, the analysis of repeat offending is done most precisely when the unit of analysis is the case rather than the offender. The offenders assigned to each treatment vary in number more than the cases assigned to each treatment, which are almost equal. But in this report we employ the offender as the unit of analysis rather than the case. This makes the analysis somewhat easier to follow, as it allows us to avoid using mean offending rates per offender per case as the principal measure of repeat offending. That measure, while technically the best test of repeat offending, is best reserved for the complete analysis of all randomly assigned cases when two-year followup periods are available on all cases.
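The difference between the two units of analysis can be sketched as follows, using invented data. The case-level figure is computed here as the mean across cases of each case's mean offences per offender, which is one plausible reading of 'mean offending rates per offender per case'; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (case_id, offences) pairs, one per offender (not RISE data)
offenders = [("A", 2), ("A", 4), ("A", 6), ("B", 1), ("C", 0), ("C", 2)]


def offender_level_rate(rows):
    """Mean offences per offender, every offender weighted equally."""
    return mean(n for _, n in rows)


def case_level_rate(rows):
    """Mean across cases of the per-case mean offences per offender,
    so that every case is weighted equally."""
    by_case = defaultdict(list)
    for case_id, n in rows:
        by_case[case_id].append(n)
    return mean(mean(counts) for counts in by_case.values())


print(offender_level_rate(offenders))  # 2.5
print(case_level_rate(offenders))      # 2
```

The two figures diverge whenever cases with several co-offenders differ in offending from single-offender cases, which is why the choice of unit matters.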
-
1 For Drink Driving, 96 percent assigned to court were treated in court (the remainder were abandoned), 90 percent assigned to conference were treated by conference (five percent went to court, the remainder were abandoned).
For Juvenile Personal Property, 84 percent assigned to court were treated in court (the remainder were cautioned, abandoned or not proceeded with), 69 percent assigned to conference were treated by conference (12 percent went to court, the remainder were cautioned or abandoned).
For Juvenile Property (Shoplifting apprehended by store security), 89 percent assigned to court were treated in court (the remainder were cautioned or abandoned), 89 percent assigned to conference were treated by conference (five percent went to court, the remainder were cautioned or abandoned).
For Youth Violence, 90 percent assigned to court were treated in court (the remainder were either cautioned, abandoned or not proceeded with), 81 percent assigned to conference were treated by conference (nine percent went to court, the remainder were cautioned or abandoned).
-
2 Average days until final treatment:
Drink driving - court = 54 days, conference = 60 days
Juvenile personal property - court = 74 days, conference = 106 days
Juvenile property (shoplifting apprehended by store security) - court = 37 days, conference = 63 days
Youth violence - court = 120 days, conference = 111 days
