A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform
==============

_Epistemic status_: Dwelling on the negatives.

From 2013 to 2021, Open Philanthropy donated $200M to criminal justice reform. My best guess is that, from a utilitarian perspective, this was likely suboptimal. In particular, I am fairly sure that it was possible to realize sooner that the area was unpromising, and to act on that realization earlier.

In this post, I first present the background for Open Philanthropy's grants on criminal justice reform, and the abstract case for considering it a priority. I then estimate that criminal justice grants were distinctly worse than other grants in the global health and development portfolio, such as those to GiveDirectly or AMF.

I speculate about why Open Philanthropy donated to criminal justice in the first place, and why it continued donating. I end up uncertain about to what extent this was a sincere play based on considerations around the value of information and learning, and to what extent it was determined by other factors, such as the idiosyncratic preferences of Open Philanthropy's funders, human fallibility and slowness, paying too much to avoid social awkwardness, “worldview diversification” being an imperfect framework imperfectly applied, or it being tricky to maintain a balance between conventional morality and expected utility maximization. In short, I started out skeptical that a utilitarian, left alone, would spontaneously start exploring criminal justice reform in the US as a cause area, and to some degree I still think so after further investigation, though I retain significant uncertainty.

I then outline my updates about Open Philanthropy. Personally, I updated downwards on Open Philanthropy’s decision speed, rationality and degree of openness, from an initially very high starting point. I also provide a shallow analysis of Open Philanthropy’s _worldview diversification_ strategy and suggest that they move to a model where regular rebalancing roughly equalizes the marginal expected values of the grants in each cause area. Open Philanthropy is doing that for its global health and development portfolio anyway.

Lastly, I brainstorm some mechanisms which could have accelerated and improved Open Philanthropy's decision-making, and suggest red teams and monetary bets or prediction markets as potential avenues of investigation.

Throughout this piece, my focus is on thinking clearly and expressing myself clearly. I understand that this might come across as impolite or unduly harsh. However, I think that providing uncertain and perhaps flawed criticism is [still worth it](https://forum.effectivealtruism.org/users/negativenuno), in expectation. I would like to note that I still respect Open Philanthropy and think that it’s one of the best philanthropic organizations around.

_Open Philanthropy staff reviewed this post prior to publication._

## Index

1. Background information
2. What is the case for Criminal Justice Reform?
3. What is the cost-effectiveness of criminal justice grants?
4. Why did Open Philanthropy donate to criminal justice in the first place?
5. Why did Open Philanthropy keep donating to criminal justice?
6. What conclusions can we reach from this?
7. Systems that could have optimized Open Philanthropy’s impact
8. Conclusion

## Background information

From 2013 to 2021, Open Philanthropy distributed $199,574,123 to [criminal justice reform](https://www.openphilanthropy.org/giving/grants?field_focus_area_target_id_selective=726) \[0\]. In 2015, they [hired Chloe Cockburn](https://www.openphilanthropy.org/blog/incoming-program-officer-criminal-justice-reform-chloe-cockburn) as a program officer, following a [“stretch goal”](https://www.openphilanthropy.org/blog/open-philanthropy-project-update-us-policy) for the year. They elaborated on their method and reasoning in [The Process of Hiring our First Cause-Specific Program Officer](https://www.openphilanthropy.org/blog/process-hiring-our-first-cause-specific-program-officer).

In that blog post, they described their expansion into the criminal justice reform space as substantially a “_bet on Chloe_”. Overall, the post was very positive about Chloe (more on red teams below). But the post expressed some reservations because “_Chloe has a generally different profile from the sorts of people GiveWell has hired in the past. In particular, she is probably less quantitatively inclined than most employees at GiveWell. This isn’t surprising or concerning - most GiveWell employees are Research Analysts, and we see the Program Officer role as calling for a different set of abilities. That said, it’s possible that different reasoning styles will lead to disagreement at times. We think of this as only a minor concern_.” In hindsight, it seems plausible to me that this relative lack of quantitative inclination played a role in Open Philanthropy making comparatively suboptimal grants in the criminal justice space \[1\].

In mid-2019, Open Philanthropy published a blog post titled [GiveWell’s Top Charities Are (Increasingly) Hard to Beat](https://www.openphilanthropy.org/blog/givewells-top-charities-are-increasingly-hard-beat). It explained that, with [GiveWell’s expansion into researching more areas](https://blog.givewell.org/2019/02/07/how-givewells-research-is-evolving/), Open Philanthropy expected that there would be enough room for more funding for charities that were as good as GiveWell’s top charities. Thus, causes like Criminal Justice Reform looked less promising.

In the months following that blog post, Open Philanthropy donations to Criminal Justice reform spiked, with multi-million, multi-year grants going to [Impact Justice](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/impact-justice-restorative-justice-project-2019) ($4M), [Alliance for Safety and Justice](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/alliance-safety-justice-general-support-2019) ($10M), [National Council for Incarcerated and Formerly Incarcerated Women and Girls](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/national-council-incarcerated-and-formerly-incarcerated-women-and-girls-general-support-december-2019) ($2.25M), [Essie Justice Group](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/essie-justice-group-general-support-december-2019) ($3M), [Texas Organizing Project](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/texas-organizing-project-criminal-justice-reform-2019) ($4.2M), [Color Of Change Education Fund](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/color-of-change-education-fund-criminal-justice-reform-2019) ($2.5M) and [The Justice Collaborative](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/the-justice-collaborative-general-support-2019) ($7.8M).

<p><img src="https://i.imgur.com/4OkAgAS.png" class="img-medium-center"></p>

Initially, I thought that might be because of an expectation of winding down. However, other Open Philanthropy cause areas also show a similar pattern of going up in 2019, perhaps at the expense of spending on Global Health and Development for that year:

<p><img src="https://i.imgur.com/iMxBdkt.png" class="img-medium-center"></p>

In 2021, Open Philanthropy spun out its Criminal Justice Reform department as a new organization: Just Impact. Open Philanthropy seeded Just Impact with $50M. Their [parting blog post](https://www.openphilanthropy.org/blog/our-criminal-justice-reform-program-now-independent-organization-just-impact) explains their thinking: that Global Health and Development interventions have significantly better cost-effectiveness.

## What is the case for Criminal Justice Reform?

_Note:_ This section briefly reviews my own understanding of this area. For a more canonical source, see Open Philanthropy’s [strategy document](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/criminal-justice-reform-strategy) on criminal justice reform.

There are around 2M people in US prisons and jails. Some are highly dangerous, but a glance at a [map of prison population rates](https://ourworldindata.org/grapher/prison-population-rate) per 100k people suggests that the US incarcerates a significantly larger share of its population than most other countries.

Outlining a positive vision for reform is still an area of active work, but a first approximation might be as follows:

Criminals should be punished in proportion to an estimate of the harm they have caused, times a factor to account for a less than 100% chance of getting caught, to ensure that crimes are not worth it in expectation. This is in opposition to otherwise disproportionate jail sentences caused by pressures on politicians to appear tough on crime. In addition, criminals then work to provide restitution to the victim, if the victim so desires, per some restorative justice framework \[2\].
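
As a toy sketch of that proportionality idea (the function and the example numbers are mine, purely for illustration, and are not from Open Philanthropy's strategy):

```python
def deterrent_penalty(harm_caused, p_caught):
    """Scale the penalty by 1 / P(caught) so that the *expected*
    penalty at the time of the crime still exceeds the harm caused."""
    return harm_caused / p_caught

# If only 1 in 4 such crimes is solved, a crime causing $5,000 of harm
# would call for a penalty equivalent to $20,000 for the crime not to be
# worth it in expectation.
penalty = deterrent_penalty(5_000, 0.25)
print(penalty)  # 20000.0
```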

In a best-case scenario, criminal justice reform could achieve somewhere between a 25% reduction in incarceration in the short-term and a 75% reduction in the longer term, bringing the incarceration rate down to only twice that of Spain \[4\], while maintaining the crime rate constant. Say that $2B to $20B, or 10x to 100x the amount that Open Philanthropy has already spent, would have a 1 to 10% chance of succeeding at that goal \[5\].

## What is the cost-effectiveness of criminal justice grants?

### Estimation strategy

In this section, I come up with some estimates of the impact of criminal justice reform, and compare them with some estimates of the impact of GiveWell-style global health and development interventions.

Throughout, I am making the following modelling choices:

1. I am primarily looking at the impact of systemic change.
2. I am looking at the first-order impacts.
3. I am using subjective estimates.

I am primarily looking at the impact of systemic change because many of the largest Open Philanthropy donations were aiming for systemic change, and their individual cost-effectiveness was extremely hard to estimate. For completeness, I do estimate the impacts of a standout intervention as well.

I am looking at the first-order impacts on prisoners and GiveWell recipients, rather than at the effects on their communities. My strong guess is that the story the second-order impacts would tell—e.g., harms to the community from death or reduced earnings in the case of malaria, harms from absence and reduced earnings in the case of imprisonment—wouldn’t change the relative values of the two cause areas.

After presenting my estimates, I discuss their limitations.

### Simple model for systemic change

Using what I consider to be optimistic assumptions over first-order effects, I come up with the following [Squiggle](https://www.squiggle-language.com/playground) model:

```
initialPrisonPopulation = 1.5M to 2.5M
// Data for the 2022 prison population has not yet been published,
// though this estimate is perhaps too wide.
reductionInPrisonPopulation = 0.25 to 0.75
badnessOfPrisonInQALYs = 0.2 to 6 // 80% as good as being alive to 5 times worse than living is good
counterfactualAccelerationInYears = 5 to 50
probabilityOfSuccess = 0.01 to 0.1 // 1% to 10%
counterfactualImpactOfGrant = 0.5 to 1 // other funders, labor cost of activism
estimateQALYs = initialPrisonPopulation
  * reductionInPrisonPopulation
  * badnessOfPrisonInQALYs
  * counterfactualAccelerationInYears
  * probabilityOfSuccess
  * counterfactualImpactOfGrant
cost = 2B to 20B
costPerQALY = cost / estimateQALYs
costPerQALY
```

That model produces the following distribution:

<p><img src="https://i.imgur.com/SDpeIg3.png" class="img-medium-center"></p>

Note: `mean(cost)/mean(estimateQALYs)` is equal to $8160/QALY

This model estimates that criminal justice reform buys one QALY \[6\] (quality-adjusted life year) for $76k, on average. But the model is very uncertain, and its 90% confidence interval is $1.3k to ~$290k per QALY. It assigns a 50% chance to the cost being less than ~$19k per QALY. For a calculation that instead looks at more marginal impact, see [here](https://gist.github.com/NunoSempere/1718dbadfba4012d252d6b6118b72194).

_EDIT 22/06/2022_**:** Commenters pointed out that the mean of `cost / estimateQALYs` in the chart above isn't the right quantity to look at. `mean(cost)/mean(estimateQALYs)` is probably a better representation of "expected cost per QALY"; that quantity is $8160/QALY for the above model. If one looks at `1/mean(estimateQALYs/cost)`, this is $5k per QALY. Overall I would instead recommend looking at the 95% confidence intervals, rather than at the means. See [this comment thread](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal?commentId=crJ9FLTKikzadqL36) for discussion. I've added notes below each model.
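
The gap between these summary statistics is easy to reproduce with a quick Monte Carlo sketch (Python here rather than Squiggle; the lognormal parameters are my own illustrative stand-ins, not the model's fitted distributions):

```python
import random
from statistics import mean

random.seed(0)

# Toy stand-ins for the uncertain cost and QALY distributions above.
n = 100_000
cost = [random.lognormvariate(22.6, 0.7) for _ in range(n)]   # roughly $2B to $20B
qalys = [random.lognormvariate(14.0, 1.5) for _ in range(n)]  # very uncertain QALY totals

mean_of_ratios = mean(c / q for c, q in zip(cost, qalys))     # mean(cost/estimateQALYs)
ratio_of_means = mean(cost) / mean(qalys)                     # mean(cost)/mean(estimateQALYs)
inverse_mean_efficiency = 1 / mean(q / c for c, q in zip(cost, qalys))  # 1/mean(estimateQALYs/cost)

# With heavy right tails, these three summaries differ substantially,
# which is why quantiles are more informative here than any single mean.
print(mean_of_ratios, ratio_of_means, inverse_mean_efficiency)
```

For independent heavy-tailed distributions, the mean of the ratio exceeds the ratio of the means, which in turn exceeds the inverse of the mean efficiency, mirroring the $76k vs. $8160 vs. $5k figures above.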

### Simple model for a standout criminal justice reform intervention

Some grants in criminal justice reform might beat systemic reform. I think this might be the case for closing [Rikers](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/justleadershipusa-close-rikers-campaign-2016), bail reform, and prosecutorial accountability:

* Rikers is a large and particularly bad prison.
* Bail reform seems like a well-defined objective that could affect many people at once.
* Prosecutorial accountability could get a large multiplier over systemic reform by focusing on the prosecutors in districts that hold very large prison populations.

For instance, for the case of Rikers, I can estimate:

```
initialPrisonPopulation = 5000 to 10000
reductionInPrisonPopulation = 0.25 to 0.75
badnessOfPrisonInQALYs = 0.2 to 6 // 80% as good as being alive to 5 times worse than living is good
counterfactualAccelerationInYears = 5 to 20
probabilityOfSuccess = 0.07 to 0.5
counterfactualImpactOfGrant = 0.5 to 1 // other funders, labor cost of activism
estimatedImpactInQALYs = initialPrisonPopulation
  * reductionInPrisonPopulation
  * badnessOfPrisonInQALYs
  * counterfactualAccelerationInYears
  * probabilityOfSuccess
  * counterfactualImpactOfGrant
cost = 5000000 to 15000000
costPerQALY = cost / estimatedImpactInQALYs
costPerQALY
```

<p><img src="https://i.imgur.com/HaiTaBM.png" class="img-medium-center"></p>

Note: `mean(cost)/mean(estimatedImpactInQALYs)` is $837/QALY

### Simple model for GiveWell charities

**Against Malaria Foundation**

Using a similar estimation for the Against Malaria Foundation:

```
costPerLife = 3k to 10k
lifeDuration = 30 to 70
qalysPerYear = 0.2 to 1 // feeling unsure about this
valueOfSavedLife = lifeDuration * qalysPerYear
costEffectiveness = costPerLife / valueOfSavedLife
costEffectiveness
```

<p><img src="https://i.imgur.com/yg4gMOI.png" class="img-medium-center"></p>

Note: `mean(costPerLife)/mean(valueOfSavedLife)` is $245/QALY

Its 95% confidence interval is $90 to ~$800 per QALY, and I likewise validated this with Simple Squiggle. Notice that this interval is _**disjoint**_ with the estimate for criminal justice reform of $1.3k to $290k.

**GiveDirectly**

One might argue that AMF is too strict a comparison and that one should instead compare criminal justice reform to the marginal global health and development grant. Recently, my colleague Sam Nolan quantified the uncertainty in [GiveDirectly’s estimate of impact](https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis). He arrived at a final estimate of ~$120 to ~$960 _per doubling of consumption for one year_.

<p><img src="https://i.imgur.com/5O22S7Q.png" class="img-medium-center"></p>

The conversion between a doubling of consumption and a QALY is open to some uncertainty. For instance:

* GiveWell estimates the two as roughly equal, based on the different weights given to saving people of different ages—a factor of ~0.8 to 1.3, based on some eyeballing of [this spreadsheet](https://docs.google.com/spreadsheets/d/1GgpddvJ4j7gSi20IHUImFh7mYfil5Q7uGzgxeEYZcUA/edit#gid=1495605316).
* GiveWell recently [updated](https://www.openphilanthropy.org/research/technical-updates-to-our-global-health-and-wellbeing-cause-prioritization-framework/) their weightings to give a DALY (similar to a QALY) a value of ~2 doublings of income.
* Commenters pointed out that few people would trade half their life to double their income, and that for them a conversion factor around 0.2 might be more appropriate. But they are much wealthier than the average GiveDirectly recipient.

Using a final adjustment of 0.2 to 1.3 QALYs per doubling of consumption (which has a mean of 0.6 QALYs/doubling), I come up with the following model and estimate:

```
costPerDoublingOfConsumption = 118.4 to 963.15
qalysPerDoublingOfConsumption = 0.2 to 1.3
costEffectiveness = costPerDoublingOfConsumption / qalysPerDoublingOfConsumption
costEffectiveness
```
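
As a quick cross-check of the "mean of 0.6 QALYs/doubling" figure (my own calculation, interpreting `0.2 to 1.3` the way Squiggle does, as a lognormal distribution whose 90% confidence interval is [0.2, 1.3]):

```python
import math

low, high = 0.2, 1.3
z90 = 1.6448536269514722  # z-score for the 95th percentile of a standard normal

# Lognormal parameters implied by the 90% confidence interval [low, high].
mu = (math.log(low) + math.log(high)) / 2
sigma = (math.log(high) - math.log(low)) / (2 * z90)

# Analytic mean of a lognormal: exp(mu + sigma^2 / 2).
mean = math.exp(mu + sigma**2 / 2)
print(round(mean, 2))  # ≈ 0.6
```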

<p><img src="https://i.imgur.com/2xxnJel.png" class="img-medium-center"></p>

Note: `mean(costPerDoublingOfConsumption)/mean(qalysPerDoublingOfConsumption)` is $690/QALY

This has a 95% confidence interval between $160 and $2700 per QALY.

### Discussion

My estimate for the impact of AMF ($90 to $800 per QALY) **does not overlap** with my estimate for systemic criminal justice reform ($1.3k to $290k per QALY). I think this is informative, and good news for uncertainty quantification: even though both estimates are very uncertain—they span two and three orders of magnitude, respectively—we can still tell which one is better.

When comparing GiveDirectly ($160 and $2700 per QALY; mean of $900/QALY) against one standout intervention in the space ($200 to $19K per QALY, with a mean of $5k/QALY), the estimates do overlap, but GiveDirectly is still much better in expectation.

_EDIT 22/06/2022._ Using the better mean, the above paragraph would be: When comparing GiveDirectly ($160 and $2700 per QALY; mean of $690/QALY) against one standout intervention in the space ($200 to $19K per QALY, with a mean of $837/QALY), the estimates do overlap, but GiveDirectly is still better in expectation.

One limitation of these estimates is that they only model first-order effects. GiveWell does have [some estimates](https://docs.google.com/spreadsheets/d/11HsJLpq0Suf3SK_PmzzWpK1tr_BTd364j0l3xVvSCQw/edit#gid=1364064522) of second-order effects (avoiding malaria cases that don’t lead to death, longer-term income increases, etc.). However, for the case of criminal justice interventions, these are harder to estimate. Nonetheless, my strong sense is that the second-order effects of death from malaria or cash transfers are similar to or greater than the second-order effects of temporary imprisonment, and don’t change the relative value of the two cause areas all that much.

Some other sources of model error might be:

* QALYs being an inadequate modelling choice: QALYs intuitively have a bound of 1 QALY/year, and might not be the right way to think about certain interventions.
* I ignored the cost to the US of keeping someone in prison, as opposed to how that money could otherwise have been spent.
* I didn’t model the increased productivity of someone outside prison.
* I didn’t estimate recidivism or increased crime from lower incarceration.
* I didn’t estimate the cost of pushback, such as lobbying for opposite policies.
* My estimates of the cost of reform were pretty optimistic.

Of these, I think that not modelling the cost to the US of keeping someone in prison and not modelling recidivism are among the weakest aspects of my current model. For a model which tries to incorporate these, see the appendix. So overall, there is likely a degree of model error. But I still think that the small models point to something meaningful.

We can also compare the estimates in this post with other estimates. A [lengthy report](https://www.openphilanthropy.org/files/Focus_Areas/Criminal_Justice_Reform/The_impacts_of_incarceration_on_crime_10.pdf) commissioned by Open Philanthropy on the impacts of incarceration on crime mostly concludes that the **marginal** reduction in crime through more incarceration is non-existent—because the effects of reduced crime while prisoners are in prison are offset by increased crime when they get out, proportional to the length of their sentence. But the report reasons about short-term effects and marginal changes, e.g., based on RCTs or natural experiments, rather than considering longer-term incentive landscape changes following systemic reform. So for the purposes of judging systemic reform rather than marginal changes, I am inclined to almost completely discount it \[7\]. That said, my unfamiliarity with the literature is likely one of the main weaknesses of this post.

Open Philanthropy’s [own initial casual cause estimations](https://docs.google.com/document/d/1GsE2_TNWn0x6MWL1PTdkZT2vQNFW8VFBslC5qjk4sgo/edit#) are much more optimistic. In a 2020 [interview](https://www.youtube.com/watch?t=2564&v=q4Z0Z-A_O5A&feature=youtu.be) with Chloe Cockburn, she mentions that Open Philanthropy estimates criminal justice reform to be around 1/4th as valuable as donations to top GiveWell charities, but that her own estimate is higher, based on subjective factors \[8\].

For illustration, here are a few grants that I don’t think meet the funding bar of being comparable to AMF or GiveDirectly, based on casual browsing of their websites:

* $600k: [Essie Justice Group — General Support](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/essie-justice-group-general-support)
* $500k: [LatinoJustice — Work to End Mass Incarceration](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/latino-justice-work-end-mass-incarceration)
* $261k: [The Soze Agency — Returning Citizens Project](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/soze-agency-returning-citizens-project)
* $255k: [Mijente — Criminal Justice Reform](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/mijente-criminal-justice-reform)
* $200k: [Justice Strategies — General Support](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/justice-strategies-general-support)
* $100k: [ReFrame Mentorship — General Support](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/reframe-mentorship-general-support-2017)
* $100k: Cosecha, general support. ([part 1](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/pico-action-fund-general-support-cosecha), [part 2](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/pico-national-network-general-support-cosecha))
* $10k: [Photo Patch Foundation — General Support](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/photo-patch-foundation-general-support-2019)

The [last one](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/photo-patch-foundation-general-support-2019) struck me as being both particularly bad and relatively easy to evaluate: A letter [costs $2.5](https://donorbox.org/patching-relationships-with-letters-photos-2), about the same as [deworming several kids](https://docs.google.com/spreadsheets/d/12C6jPuPuTiq2U8pWCBoT9Az2kunEXGkm_k7cqSIKm_8/edit#gid=215029904) at $0.35 to $0.97 per deworming treatment. But sending a letter intuitively seems significantly less impactful.

Conversely, larger grants, such as [a $2.5M grant](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/color-of-change-education-fund-criminal-justice-reform-2019) to [Color Of Change](https://colorofchange.org/), are harder to casually evaluate. For example, that particular grant was given to support [prosecutorial accountability campaigns](https://www.winningjustice.org/) and to support Color Of Change’s [work with the film Just Mercy](https://web.archive.org/web/20200717023842/https://untiljusticeisreal.colorofchange.org/). And because the grant was 50% of Color of Change’s [budget for one year](https://projects.propublica.org/nonprofits/organizations/204496889), I imagine it also subsidized its subsequent activities, such as the [campaigns currently featured on its website](https://colorofchange.org/) \[10\], or the $415k [salary](https://projects.propublica.org/nonprofits/organizations/204496889) of its president \[11\]. So to the extent that the grant’s funds were used for prosecutorial accountability, they may have been more cost-effective, and to the extent that they were used for other purposes, less so. Overall, I don’t think that estimating the cost-effectiveness of larger grants as the cost-effectiveness of systemic change would be grossly unfair.

## Why did Open Philanthropy donate to criminal justice in the first place?

_Epistemic status: Speculation._

I will first outline a few different hypotheses about why Open Philanthropy donated to criminal justice, without regard to plausibility:

1. The Back of the Envelope Calculation Hypothesis
2. The Value of Information Hypothesis
3. The Leverage Hypothesis
4. The Strategic Funder Hypothesis
5. The Progressive Funders Hypothesis
6. The “Politics is The Mind Killer” Hypothesis
7. The Non-Updating Funders Hypothesis
8. The Moral Tension Hypothesis

I obtained this list by talking to people about my preliminary thoughts when writing this draft. After outlining them, I will discuss which of these I think are most plausible.

### The Back of the Envelope Calculation Hypothesis

As highlighted in Open Philanthropy blog posts, early on it wasn’t clear that GiveWell was going to find as many opportunities as it later did. It was plausible that the funding bar could have gone down with time. If so, and if one had a rosier outlook on the tractability and value of criminal justice reform, it could plausibly have been competitive with other areas.

For instance, per [Open Philanthropy’s estimations](https://www.openphilanthropy.org/focus/us-policy/criminal-justice-reform/criminal-justice-reform-strategy#What_were_doing_and_why):

> _Each grant is subject to a cost-effectiveness calculation based on the following formula:_
>
> _Number of years averted x $50,000 for prison or $100,000 for jail \[our valuation of a year of incarceration averted\] / 100 \[we aim to achieve at least 100x return on investment, and ideally much more\] - discounts for causation and implementation uncertainty and multiple attribution of credit > $ grant amount. Not all grants are susceptible to this type of calculation, but we apply it when feasible._

That is, Open Philanthropy’s lower bound for funding criminal justice reform was $500 to $1,000 per year of prison/jail avoided. Per this lower bound, criminal justice reform would be roughly as cost-effective as GiveDirectly. But this bound is much more optimistic than my estimates of the cost-effectiveness of criminal justice reform grants above.
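
As a rough sketch of how that quoted formula cashes out (the function name, the multiplicative treatment of the discounts, and the example numbers are my own assumptions, not an actual grant's figures):

```python
def passes_bar(years_averted, grant_amount, facility="prison",
               discount=1.0, target_multiplier=100):
    """Check a grant against the quoted formula:
    years averted x valuation / 100, minus discounts, must exceed the grant."""
    valuation = {"prison": 50_000, "jail": 100_000}[facility]
    value_after_bar = years_averted * valuation / target_multiplier * discount
    return value_after_bar > grant_amount

# At the 100x bar, $50,000 / 100 = $500 per prison-year averted, so a
# $1M grant must avert more than 2,000 prison-years to clear the bar.
print(passes_bar(years_averted=2_500, grant_amount=1_000_000))  # True
print(passes_bar(years_averted=1_500, grant_amount=1_000_000))  # False
```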

### The Value of Information Hypothesis

In 2015, when Open Philanthropy hadn’t invested as much into criminal justice reform, it might have been plausible that relatively little investment might have led to systemic reform. It might have also been plausible that, if the area were found promising, an order of magnitude more funding could have been directed to it.

Commenters on a draft pointed out a second type of information gain: Open Philanthropy might gain experience in grantmaking, learn information, and acquire expertise that would be valuable for other types of giving. In the case of criminal justice reform, I would guess that the specific cause officers—rather than Open Philanthropy as an institution—would gain most of the information. I would also guess that the lessons learnt haven’t generalized to, for instance, pandemic prevention funding advocacy. So my best guess is that the information gained would not make this cause worth it if it otherwise would not have been. But I am uncertain about this.

### The Leverage Hypothesis

Even if systemic change itself is not cost-effective, criminal justice reform and adjacent issues attract a large amount of attention anyway. By working in this area, one could gain leverage, for instance:

* Leverage over other people’s attention and political will, by investing early in leaders who will be in a position to channel somewhat ephemeral political will.
* Leverage over grantmaking in the area, by seeding _Just Impact_.

### The Strategic Funder Hypothesis

My colleagues raised the hypothesis that Open Philanthropy might have funded criminal justice reform in part because they wanted to look less weird. E.g., “Open Philanthropy/the EA movement has donated to global health, criminal justice reform, preventing pandemics and averting the risks of artificial intelligence” sounds less weird than “...donated to global health, preventing pandemics and averting the risks of artificial intelligence”.

### The Progressive Funders Hypothesis

Dustin Moskovitz and Cari Tuna likely have other goals beyond expected utility maximization. Some of these goals might align with the mores of the current left-wing of American society. Or, alternatively, their progressive beliefs might influence and bias their beliefs about what maximizes utility.

On the one hand, I think this would be a mistake. Cause impartiality is one of EA’s major principles, and I think it catalyzes an important part of what we’ve found out about doing good better. But on the other hand, these are not my billions. On the third hand, it seems suboptimal if politically motivated giving were post hoc argued to be utility-optimal. If this were the case, I would have appreciated their research being upfront about it.

### The “Politics is The Mind Killer” Hypothesis

In the domain of politics, reasoning degrades, and principal-agent problems arise. And so another way to look at the grants under discussion is that Open Philanthropy flew too close to politics, and was sucked in.

To start, there is a selection effect whereby people who think an area is the most promising go into it. In addition, there is a principal-agent problem: people working inside a cause area are not really incentivized to look for arguments and evidence that they should be replaced by something better. My sense is that people will tend to give very, very optimistic estimates of impact for their own cause area.

These considerations are general arguments, and they could apply to, for instance, community building or forecasting, with similar force. Though perhaps the warping effects would be stronger for cause areas adjacent to politics.

### The Moral Tension Hypothesis

My sense is that Open Philanthropy’s funders lean a bit more towards conventional morality, whereas philosophical reflection leans more towards expected utility maximization. Managing the tension between these two approaches seems pretty hard, and it shouldn’t be particularly surprising that a few mistakes were made from a utilitarian perspective.

### Discussion

In conversation with Open Philanthropy staff, they mentioned that the first three hypotheses—Back of the Envelope, Value of Information, and Leverage—sounded most true to them. In conversation with a few other people, mostly longtermists, some thought that the Strategic Funder and the Progressive Funders hypotheses were more likely.

I would make a distinction between what the people who made the decision were thinking at the time, and the selection effects that chose those people. And so, I would think that early on, Open Philanthropy leadership was mainly thinking about back-of-the-envelope calculations, value of information, and leverage. But I would also expect them to have done so under some constraints. And I expect some of the other hypotheses—particularly the “progressive funders hypothesis” and the “moral tension hypothesis”—to explain those constraints at least a little.

I am left uncertain about whether and to what extent Open Philanthropy was acting sincerely. It could be that criminal justice reform was just a bet that didn’t pay off. But it could also be the case that some factor put a thumb on the scale and greased the choice to invest in criminal justice reform. In the end, Open Philanthropy is probably heterogeneous; it seems likely that some people were acting sincerely, and others with a bit of motivated reasoning.

## Why did Open Philanthropy keep donating to criminal justice?

_Epistemic status: More speculation_

### The Inertia Hypothesis

Open Philanthropy wrote about [GiveWell’s Top Charities Are (Increasingly) Hard to Beat](https://www.openphilanthropy.org/blog/givewells-top-charities-are-increasingly-hard-beat) in 2019. They stopped investing in criminal justice reform in 2021, after giving an additional $100M to the cause area. I’m not sure what happened in the meantime.

In a 2016 blog post explaining worldview diversification, Holden Karnofsky [writes](https://www.openphilanthropy.org/blog/worldview-diversification#When_and_how_should_one_practice_worldview_diversification):

> _Currently, we tend to invest resources in each cause up to the point where it seems like there are strongly diminishing returns, or the point where it seems the returns are clearly worse than what we could achieve by reallocating the resources - whichever comes first_

Under some assumptions explained in that post, namely that the amounts given to each cause area are balanced to ensure that the values of the marginal grants to each area are similar, worldview diversification would be approximately optimal even from an expected value perspective \[12\]. My impression is that this monitoring and rebalancing did not happen fast enough in the case of criminal justice reform.
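
As a toy illustration of that rebalancing condition, here is a minimal sketch in Python. The two diminishing-returns curves and all the numbers are hypothetical, purely for illustration, and are not Open Philanthropy's estimates: a greedy allocator that always funds the cause with the highest current marginal value ends up roughly equalizing marginal values across causes, which is the optimality condition described above.

```python
# Hypothetical diminishing-returns curves: the value of the next dollar
# in each cause area, as a function of dollars already allocated.
# These functions and numbers are illustrative, not Open Philanthropy's.
marginal_value = {
    "global_health": lambda x: 1000 / (1 + x / 50e6),
    "criminal_justice": lambda x: 300 / (1 + x / 20e6),
}

def allocate(budget: float, marginal_value: dict, step: float = 1e6) -> dict:
    """Greedily give each step of funding to whichever cause currently has
    the highest marginal value; at the optimum, marginal values equalize."""
    allocation = {k: 0.0 for k in marginal_value}
    for _ in range(int(budget / step)):
        best = max(marginal_value, key=lambda k: marginal_value[k](allocation[k]))
        allocation[best] += step
    return allocation

alloc = allocate(200e6, marginal_value)
final_marginals = {k: f(alloc[k]) for k, f in marginal_value.items()}
print(alloc)
print(final_marginals)  # roughly equal across causes
```

If the marginal values are allowed to drift apart instead, moving a dollar from the low-marginal-value cause to the high one increases total value, which is the sense in which a stale allocation is suboptimal.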

Incongruous as it might sound, it is also possible that optimizing the allocation of an additional $100M might not have been the most valuable thing for Open Philanthropy’s leadership to be doing. For instance, exploring new areas, convincing or coordinating with additional billionaires, or optimizing other parts of Open Philanthropy’s portfolio might have been more valuable.

### The Social Harmony Hypothesis

Firing people is hard. When you have structured your bet on a cause area as a bet on a specific person, I imagine that resolving that bet negatively would be awkward \[14\].

### The Soft Landing Hypothesis

Abruptly stopping funding can be very detrimental to a charity, so Open Philanthropy felt the need to give a soft roll-off lasting a few years. On the one hand, this is understandable. But on the other hand, it seems that Open Philanthropy might have given two soft landings: one of $50M in 2019, and another $50M in 2021 to spin off _Just Impact_.

<p><img src="https://i.imgur.com/4OkAgAS.png" class="img-medium-center"></p>

### The Chessmaster Hypothesis

There is probably some calculation or some factor that I am missing. There is nothing disallowing Open Philanthropy from making moves based on private information. In particular, see the discussion on information gains above. Information gains are particularly hard for me to estimate from the outside.

## What conclusions can we reach from this?

### On Open Philanthropy’s [Observe–Orient–Decide–Act](https://en.wikipedia.org/wiki/OODA_loop) loops

Open Philanthropy took several years and spent an additional $100M on a cause that they could have known was suboptimal. That feels like too much time.

They also arguably gave two different “golden parachutes” when leaving criminal justice reform. The first, in 2019, gave a number of NGOs in the area generous parting donations. The second, in 2021, gave the outgoing program officers $50 million to continue their work.

This might make similar experimentation—e.g., hiring a program officer for a new cause area, and committing to it only if it goes well—much more expensive. It’s not clear to me that Open Philanthropy would have agreed beforehand to give $100M in “exit grants”.

### On Moral Diversification

Open Philanthropy’s donations to criminal justice were part of its global health and development portfolio, and thus, in theory, not subject to Open Philanthropy’s worldview diversification framework. But in practice, I get the impression that one of the reasons it took so long to notice that criminal justice reform was likely suboptimal might have had to do with worldview diversification.

In [Technical Updates to Our Global Health and Wellbeing Cause Prioritization Framework](https://www.openphilanthropy.org/blog/technical-updates-our-global-health-and-wellbeing-cause-prioritization-framework), Peter Favaloro and Alexander Berger write:

> _Overall, having a single “bar” across multiple very different programs and outcome measures is an attractive feature because equalizing marginal returns across different programs is a requirement for optimizing the overall allocation of resources_
>
> _Prior to_ [_2019_](https://www.openphilanthropy.org/blog/givewells-top-charities-are-increasingly-hard-beat)_, we used a “100x” bar based on the units above, the scalability of direct cash transfers to the global poor, and the roughly 100x ratio of high-income country income to_ [_GiveDirectly_](https://www.givedirectly.org/) _recipient income. As of 2019, we tentatively switched to thinking of “roughly 1,000x” as our bar for new programs, because that was roughly our estimate of the unfunded margin of the top charities recommended by GiveWell_
>
> _We’re also updating how we measure the DALY burden of a death; our new approach will accord with GiveWell’s moral weights, which value preventing deaths at very young ages differently than implied by a DALY framework. (_[_More_](https://www.openphilanthropy.org/blog/technical-updates-our-global-health-and-wellbeing-cause-prioritization-framework#New_moral_weights)_)_
>
> _This post focuses exclusively on how we value different outcomes for humans within Global Health and Wellbeing; when it comes to other outcomes like_ [_farm animal welfare_](https://www.openphilanthropy.org/focus/us-policy/farm-animal-welfare) _or_ [_the far future_](https://www.openphilanthropy.org/focus/global-catastrophic-risks)_, we practice_ [_worldview diversification_](https://www.openphilanthropy.org/blog/worldview-diversification) _instead of trying to have a single unified framework for cost-effectiveness analysis. We think it’s an open question whether we should have more internal “worldviews” that are diversified over within the broad Global Health and Wellbeing remit (vs everything being slotted into a unified framework as in this post)._

Speaking about Open Philanthropy’s portfolio rather than about criminal justice: instead of strict worldview diversification, one could compare these different cause areas as best one can, strive to figure out [better comparisons](https://forum.effectivealtruism.org/posts/3hH9NRqzGam65mgPG/five-steps-for-quantifying-speculative-interventions), and set the marginal impact of grants in each area to be roughly equal. This would better approximate expected value maximization, and it is in fact not too dissimilar to (part of) the original reasoning for [worldview diversification](https://www.openphilanthropy.org/blog/worldview-diversification). As explained in the original post, worldview diversification makes the most sense in some contexts and under some assumptions: diminishing returns to each cause, and similar marginal values of more funding.

But somehow, I get the _**weak**_ impression that worldview diversification (partially) started as an [approximation to expected value](https://www.openphilanthropy.org/blog/worldview-diversification), and ended up being more of a [peace pact](https://www.openphilanthropy.org/blog/update-cause-prioritization-open-philanthropy#Allocating_capital_to_buckets_and_causes) between different cause areas. This peace pact disincentivizes comparisons between giving in different cause areas, which then lets their marginal values drift out of sync.

Instead, I would like to see:

* further analysis of alternatives to moral diversification,
* more frequent monitoring of whether the assumptions behind moral diversification still make sense,
* and a more regular rebalancing of the proportion of funds assigned to each cause according to the value of their marginal grants \[13\].

### On Open Philanthropy’s Openness

After a shallow investigation and reading a few of its public writings, I’m still unsure why exactly Open Philanthropy invested a relatively large amount into this cause area. My impression is that there are some critical details about this that they have not yet written about publicly.

### On Open Philanthropy’s Rationality

I used to implicitly model Open Philanthropy as a highly intelligent unified agent to which I should likely defer. I now get the impression that there might be a fair amount of politicking, internal division, and some suboptimal decision-making.

I think that this update was larger for me than it might be for others, perhaps because I initially thought very highly of Open Philanthropy. Others who started from a more moderate position should make a smaller update, if any.

I still believe that Open Philanthropy is likely one of the best organizations working in the philanthropic space.

## Systems that could improve Open Philanthropy’s decision-making

While writing this piece, the uncomfortable thought struck me that if someone had realized in 2017 that criminal justice was suboptimal, it might have been difficult for them to point this out in a way which Open Philanthropy would have found useful. I’m also not sure people would have been actively incentivized to do so.

Once the question is posed, it doesn’t seem hard to design systems that incentivize people to bring potential mistakes to Open Philanthropy’s attention. Below, I consider two options, and I invite commenters to suggest more.

### Red teaming

When investing substantial amounts in a new cause area, putting a large monetary bounty on red teams seems a particularly cheap intervention. For instance, one could put a prize on the best red-teaming effort, and a larger bounty on any red-teaming output that leads to a change in plans. The recent [Criticism Contest](https://forum.effectivealtruism.org/posts/8hvmvrgcxJJ2pYR4X/announcing-a-contest-ea-criticism-and-red-teaming) is a one-off example which could in theory address Open Philanthropy.

### Forecasting systems

Per [this recent writeup](https://forum.effectivealtruism.org/posts/RjNFyJS3jPb4DA7wA/how-accurate-are-open-phil-s-predictions), Open Philanthropy has predictions made and graded by each cause’s officer, who average about one prediction per $1 million moved. The focus of their prediction setup seems to be on learning from past predictions, rather than on using predictions to inform decisions before they are made. And it _seems_ like staff tend to make predictions on individual grants, rather than on strategic decisions.

This echoes the findings of a previous report on [Prediction Markets in the Corporate Setting](https://forum.effectivealtruism.org/posts/dQhjwHA7LhfE8YpYF/prediction-markets-in-the-corporate-setting): organizations are hesitant to use prediction setups in situations where this would change their most important decisions, or where this would lead to social friction. But this greatly reduces the usefulness of predictions. And in fact, we do know that Open Philanthropy’s prediction setup failed to avoid the pitfalls outlined in this post.

Instead, one could have a forecasting system which is not restricted to Open Philanthropy staff, which has real-money bets, and which focuses on using predictions to change decisions, rather than on learning after the fact. Such a system would ask things such as:

* whether a key belief underlying the favourable assessment of a grant will later be estimated to be false,
* whether Open Philanthropy will regret having made a given grant, or
* whether Open Philanthropy will regret some strategic decision, such as going into a cause area, or having set up such-and-such disbursement schedule.

These questions might be operationalized as:

* _“In year \[x\], what probability will \[some mechanism\] assign to \[some belief\]?”_
* _“In year \[x\], what will Open Philanthropy’s best estimate of the value of grant \[y\] be?”_ + _“In year \[x\], what will Open Philanthropy’s bar for funding be?”_
* Or, simpler still, asking directly: _“In year \[x\], will Open Philanthropy regret having made grant \[y\]?”_ or _“In year \[x\], will Open Philanthropy regret having made decision \[y\]?”_

There would be a few challenges in creating such a forecasting system in a way that would be useful to Open Philanthropy:

1. It would be difficult to organize this at scale.
2. If open to the public, and if Open Philanthropy was listening to it, it might be both easy and profitable for outsiders to manipulate it.
3. If structured as a prediction market, it might not be worth it to participate unless the market also yielded interest.
4. If Open Philanthropy had enough bandwidth to create a forecasting system, it would also have been capable of monitoring the criminal justice reform situation more closely (?)
5. It would be operationally or legally complex.
6. Prediction markets are mostly illegal in the US.

In 2018, the best way to structure this may have been as follows: Open Philanthropy decides on a probability and a metric of success, and offers a trusted set of advisors the chance to bet against the metric being satisfied. Note that the metric can be fuzzy, e.g., “Open Phil employee X will estimate this grant to have been worth it”.
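
The payout arithmetic of such a scheme can be sketched as follows; the probabilities, stakes, and function names below are hypothetical, chosen purely for illustration. An advisor who disagrees with Open Philanthropy's stated probability takes the other side at those odds:

```python
def settle_bet(op_probability: float, advisor_bets_yes: bool,
               stake: float, outcome: bool) -> float:
    """Return the advisor's profit (negative = loss) for a bet taken
    against Open Philanthropy's stated probability.

    Betting 'yes' at price p costs `stake` and, if the metric resolves
    true, pays back stake / p (net profit stake * (1 - p) / p);
    betting 'no' mirrors this at price 1 - p.
    """
    price = op_probability if advisor_bets_yes else 1 - op_probability
    won = outcome == advisor_bets_yes
    return stake * (1 - price) / price if won else -stake

# An advisor who thinks OP's 70% estimate is too optimistic bets 'no' with $1,000:
profit_if_metric_fails = settle_bet(0.7, advisor_bets_yes=False, stake=1000, outcome=False)
profit_if_metric_holds = settle_bet(0.7, advisor_bets_yes=False, stake=1000, outcome=True)
print(profit_if_metric_fails, profit_if_metric_holds)
```

Under these odds, an advisor only profits in expectation if their probability estimate is actually better than Open Philanthropy's, which is what makes the scheme informative.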

With time, advisors who can predict how Open Philanthropy will change its mind would acquire more money and thus more independent influence in the world. This isn’t [bullet-proof](https://www.lesswrong.com/posts/6bSjRezJDxR2omHKE/real-life-examples-of-prediction-systems-interfering-with)—for instance, advisors would have an incentive to make Open Philanthropy be wrong so that they can bet against them—but it’d be a good start.

Note that the pathway to impact of making monetary bets wouldn’t only be to change Open Philanthropy’s decisions—which past analysis suggests would be [difficult](https://forum.effectivealtruism.org/posts/dQhjwHA7LhfE8YpYF/prediction-markets-in-the-corporate-setting)—but also to transfer wealth to altruistic actors that have better models of the world.

<p><img src="https://i.imgur.com/HTwilgL.png" class="img-medium-center"></p>

The TarasBob [method](https://twitter.com/TarasBob/status/1498396274068008965) for maximizing predictive accuracy

In July 2022, there still aren’t great forecasting systems that could deal with this problem. The closest might be Manifold Markets, which allows for the fast creation of different markets and the transfer of funds to charities, which gives some monetary value to its tokens. In any case, because setting up such a system might be laborious, one could instead just offer to set such a system up upon request.

I am also excited about a few projects that will provide possibly scalable prediction markets, which are set to launch in the next few months and could be used for this purpose. My [forecasting newsletter](https://forecasting.substack.com/) will have announcements when these projects launch.

## Conclusion

Open Philanthropy spent $200M on criminal justice reform, $100M of which came after their own estimates concluded that it wasn’t as effective as other global health and development interventions. I think Open Philanthropy could have done better.

And I am left confused about why Open Philanthropy did not in fact do better. Part of this may have been their unique approach of worldview diversification. Part of this may have been the political preferences of their funders. And part of this may have been their more optimistic [Fermi estimates](https://docs.google.com/document/d/1GsE2_TNWn0x6MWL1PTdkZT2vQNFW8VFBslC5qjk4sgo/edit#). I oscillate between thinking “I, a young grasshopper, do not understand”, and “this was clearly suboptimal from the beginning, and obviously so”.

Still, Open Philanthropy did end up parting ways with their criminal justice reform team. Perhaps forecasting systems or red teams would have accelerated their decision-making on this topic.

## Acknowledgements

<p><img src="https://i.imgur.com/7yuRrge.png" class="img-frontpage-center"></p>

Thanks to Linch Zhang, Max Ra, Damon Pourtahmaseb-Sasi, Sam Nolan, Lawrence Newport, Eli Lifland, Gavin Leech, Alex Lawsen, Hauke Hillebrandt, Ozzie Gooen, Aaron Gertler, Joel Becker and others for their comments and suggestions.

This post is a project by the [Quantified Uncertainty Research Institute](https://quantifieduncertainty.org/) (QURI). The language used to express probability distributions throughout the post is [Squiggle](https://www.squiggle-language.com/), which is being developed by QURI.

# Appendix: Incorporating savings and the cost of recidivism

_Epistemic status_: These models are extremely rough, and should be used with caution. A more trustworthy approach would use the [share of the prison population by type of crime](https://bjs.ojp.gov/content/pub/pdf/p20st.pdf), the [chance of recidivism for each crime](https://www.prisonpolicy.org/graphs/sex_offense_recidivism_2019.html), and the [cost of new offenses by type](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835847/). Nonetheless, the general approach might be as follows:

```
// First section: Same as before
initialPrisonPopulation = 1.8M to 2.5M
// Data for 2022 prison population has not yet been published,
// though this estimate is perhaps too wide.
reductionInPrisonPopulation = 0.25 to 0.75
badnessOfPrisonInQALYs = 0.2 to 6 // 80% as good as being alive to 5 times worse than living is good
accelerationInYears = 5 to 50
probabilityOfSuccess = 0.01 to 0.1 // 1% to 10%.
estimateQALYs = initialPrisonPopulation
  * reductionInPrisonPopulation
  * badnessOfPrisonInQALYs
  * accelerationInYears
  * probabilityOfSuccess
cost = 2B to 20B
costEffectivenessPerQALY = cost / estimateQALYs

// New section: Costs and savings
numPrisonersFreed = initialPrisonPopulation
  * reductionInPrisonPopulation
  * accelerationInYears
  * probabilityOfSuccess
savedCosts = numPrisonersFreed * (14k to 70k)
savedQALYsFromCosts = savedCosts / 50k
probabilityOfRecidivism = 0.3 to 0.7
numIncidentsUntilCaughtAgain = 1 to 10
// uncertain; look at what percentage of different
// types of crimes are reported and solved.
costPerIncident = 1k to 50k
lostCostsFromRecidivism = numPrisonersFreed * probabilityOfRecidivism
  * numIncidentsUntilCaughtAgain * costPerIncident
lostQALYsFromRecidivism = lostCostsFromRecidivism / 50k
costPerQALYIncludingCostsAndIncludingRecidivism = truncateLeft(cost
  / (estimateQALYs + savedQALYsFromCosts - lostQALYsFromRecidivism), 0)
// ^ truncateLeft needed because division is very numerically unstable.

// Display
// costPerQALYIncludingCostsAndIncludingRecidivism
// ^ increase the number of samples to 10000 and uncomment this line
```
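
The crime-type-weighted refinement mentioned above could be structured roughly as follows. This is a sketch in Python rather than Squiggle, and every number in it is a hypothetical placeholder, not a value taken from the linked sources:

```python
# Placeholder inputs -- hypothetical stand-ins, not real estimates.
# share: fraction of the prison population; recidivism: probability of
# re-offending after release; cost: social cost per new offense in USD.
crime_types = {
    "violent":  {"share": 0.55, "recidivism": 0.4, "cost": 100_000},
    "property": {"share": 0.15, "recidivism": 0.6, "cost": 10_000},
    "drug":     {"share": 0.15, "recidivism": 0.5, "cost": 5_000},
    "other":    {"share": 0.15, "recidivism": 0.5, "cost": 10_000},
}

def expected_recidivism_cost_per_prisoner(crime_types: dict) -> float:
    """Population-weighted expected cost of new offenses per prisoner freed."""
    assert abs(sum(t["share"] for t in crime_types.values()) - 1.0) < 1e-9
    return sum(t["share"] * t["recidivism"] * t["cost"]
               for t in crime_types.values())

cost_per_freed_prisoner = expected_recidivism_cost_per_prisoner(crime_types)
print(cost_per_freed_prisoner)
```

Replacing the point estimates with distributions per crime type would then recover something like the Squiggle model above, but with the recidivism term disaggregated instead of using a single pooled `probabilityOfRecidivism`.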

A review from Open Philanthropy on [the impacts of incarceration on crime](https://www.openphilanthropy.org/files/Focus_Areas/Criminal_Justice_Reform/The_impacts_of_incarceration_on_crime_10.pdf) concludes by saying that "_The analysis performed here suggests that it is hard to argue from high-credibility evidence that at typical margins in the US today, decarceration would harm society_". But “high-credibility evidence” does a lot of the heavy lifting: I have a pretty strong prior that incentives matter, and the evidence is weak. In particular, the evidence provided is a) mostly at the margin, and b) mostly based on short-term change. So I’m slightly convinced that for small changes, the effect in the short term—e.g., within one generation—is small. But if prison sentences are marginally reduced in length or in quantity, I still end up with the impression that crime would marginally rise in the longer term, as crimes become marginally more worth it. Conversely, if sentences are reduced more than marginally, common sense suggests that crime will increase, as observed in, for instance, [San Francisco](https://www.wsj.com/articles/san-francisco-crime-chesa-boudin-progressive-prosecutor-11637961667) (note: or not; see [this comment](https://applieddivinitystudies.com/sf-crime-2/) and/or [this investigation](https://applieddivinitystudies.com/sf-crime-2/)).

## Footnotes

\[0\]. This number is different from the $138.8M given on Open Philanthropy's [website](https://www.openphilanthropy.org/focus/criminal-justice-reform/), which is probably not up to date with their [grants database](https://www.openphilanthropy.org/grants/).

\[1\]. Note that this paragraph is written from my perspective doing a postmortem, rather than aiming to summarize what they thought at the time.

\[2\]. Note that restorative justice is normally suggested as a total replacement for punitive justice. But I think that pushing back punitive justice until it is incentive-compatible and then applying restorative justice frameworks would also work, and would encounter less resistance.

\[3\]. Subjective estimate, based on the US having many more guns, a second amendment, a different culture, and more of a drug problem.

\[4\]. Subjective estimate; I think it would take 1-2 orders of magnitude more investment than the already given $2B.

\[5\]. Note that QALYs refers to a specific [construct](https://en.wikipedia.org/wiki/Quality-adjusted_life_year). This has led people to [come up](https://forum.effectivealtruism.org/s/2nMw7ASQNQ35iAz4T) with extensions and new definitions, e.g., the WALY (wellbeing-adjusted), HALY (happiness-adjusted), DALY (disability-adjusted), and SALY (suffering-adjusted) life years. But throughout this post, I’m stretching that definition and mostly thinking about “QALYs as they should have been”.

\[6\]. Initially, Squiggle was making these calculations using Monte Carlo simulations. However, operations multiplying and dividing lognormals can be done analytically. I extracted the functionality to do so into Simple Squiggle, and then helped the main Squiggle branch compute the model analytically.

Simple Squiggle does validate the model as producing an interval of $1.3k to $290k. To check this, feed `1000000000 * (2 to 20) / ((1000000 * 1.5 to 2.5) * 0.25 to 0.75 * 0.2 to 6 * 5 to 50 * 0.01 to 0.1 * 0.5 to 1)` into it.
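
To illustrate why these operations are analytic, here is a sketch in Python, using the convention (as in Squiggle) that `a to b` denotes the 90% credible interval of a lognormal. The key fact is that the product or quotient of independent lognormals is again lognormal: log-means add or subtract, and log-variances always add.

```python
import math

Z90 = 1.6448536269514722  # standard normal quantile for the 95th percentile

def lognormal_from_ci(low: float, high: float) -> tuple:
    """Interpret `low to high` as the 90% CI of a lognormal and return
    its (mu, sigma) parameters in log-space."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma

def multiply(a: tuple, b: tuple) -> tuple:
    # Product of independent lognormals: log-means add, log-variances add.
    return a[0] + b[0], math.hypot(a[1], b[1])

def divide(a: tuple, b: tuple) -> tuple:
    # Quotient: log-means subtract, log-variances still add.
    return a[0] - b[0], math.hypot(a[1], b[1])

def ci90(params: tuple) -> tuple:
    mu, sigma = params
    return math.exp(mu - Z90 * sigma), math.exp(mu + Z90 * sigma)

# e.g. (2 to 20) * (5 to 50), computed with no sampling at all:
product = multiply(lognormal_from_ci(2, 20), lognormal_from_ci(5, 50))
low, high = ci90(product)
print(low, high)  # the 90% CI of the product
```

Because no Monte Carlo sampling is involved, the result is exact (up to floating point) and fast, which is the advantage of the analytic path described in the footnote.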

\[7\]. To elaborate on this: as far as I understand, to estimate the impact of incarceration, the report's best sources of evidence are randomized trials or natural experiments, e.g., harsher judges randomly assigned, arbitrary threshold changes resulting from changes in guidelines or policy, etc. But these methods will tend to estimate short-term changes, rather than longer-term (e.g., intergenerational) changes.

And I would give substantial weight to lighter sentencing in fact making it more worth it to commit crime. See Lagerros' [Unconscious Economics](https://www.lesswrong.com/posts/PrCmeuBPC4XLDQz8C/unconscious-economics).

This topic also has a very large number of degrees of freedom (e.g., see p. 133 on lowballing the cost of murder on account of it being rare), which I am inclined to be suspicious about.

The report has a "devil's advocate case". But I think that it could have been much harsher, by incorporating hard-to-estimate long-term incentive changes.

\[8\]. Excerpt, with some light editing to exclude stutters:

> With a lot of hedging and assumptions and guessing, I think that we can show that we were at around 250x, versus GiveWell, which is at more like 1000x \[9\]. So according to Open Philanthropy, if you're just like, what's the place where I can put my dollar that does the most good, you should give to GiveWell, I think.
>
> That said, I would say, well, first of all, if you feel that now's a particularly unique and important time to be working on this, when there is a lot of traction, that puts a thumb on the scale more towards this. Deworming was very important 10 years ago, and will be very important in 10 years. I think that's different than this issue, where you have these moments where we can actually make a lot of change, where a boost of cash is good.
>
> And then second, there is a lot that's not captured in that 250x.
>
> And then third, that 250x is based on the assumption that a year of freedom from prison is worth $50k, and a year of freedom from jail is worth $100k. I think a jail bed gone empty for a year could be worth $250k, for example.
>
> So, I'm telling you this; I don't say this to normal people, I have no idea what I'm talking about. But for EA folks, I think we're closer to 1000x than I've been able to show thus far. But if you want to be like "I'm helping the most that I can be certain about", yeah, for sure, go give your money to deworming, that's still probably true.

\[9\]. "1000x" (resp. 250x) refers to being 1000 times (resp. 250 times) more cost-effective than giving a dollar to someone with $50k of annual income; see [here](https://www.openphilanthropy.org/blog/technical-updates-our-global-health-and-wellbeing-cause-prioritization-framework#New_moral_weights).

\[10\]. As I was writing this, it featured campaigns calling for common carriers to drop Fox, and for Amazon and Twitch to carry out [racial equity audits](https://web.archive.org/web/20220531204618/https://colorofchange.org/). But these have since cycled through.

\[11\]. It rose from $216k in 2016 to $415k in 2019. Honestly, I'm not even sure this is unjustified; he could probably be a very highly paid political consultant, and a high salary is in fact a strong signal that his funders think that he shouldn't be one.

\[12\]. This excludes considerations around how much to donate each year.

\[13\]. A side effect of spinning off Just Impact with a very sizeable initial endowment is that the careers of the Open Philanthropy officers involved appear to continue progressing. Commenters pointed out that this might make it easier to hire talent. But coming from a forecasting background which has some emphasis on proper scoring rules, this seems unappealing to me.

\[14\]. Technically, according to the shape of the values of their grants and the expected future shape, not just the values of the marginal grant.

I also considered suggesting a ruthless Hunger Games-style fight between the representatives of different cause areas, with the winner getting all the resources regardless of diminishing returns. But I concluded that this was likely not possible in practice, and also that the neartermists would probably be in better shape.

Cancellation insurance
======================

> I like to deliver the predictability you need in these troubled times
>
> —[Peter Wildeford](https://twitter.com/peterwildeford/status/1545068907505000453)

I am up for offering some amount of insurance for being [cancelled](https://en.wikipedia.org/wiki/Cancel_culture), i.e., losing one's job as a result of inane culture war fights[^1]. I think that this could be, in expectation, a mutually beneficial tradeoff in the case where my counterparty is very risk-averse[^2], and so is happy to pay a healthy risk premium.

![](.images/4f5ad06760dfacbf6020b51e6e95e497af2a74f6.png)

I will bet on your success on Manifold Markets
==============================================

If you create a market on [Manifold Markets](https://manifold.markets/) on the outcome of an undertaking I will bet on its eventual success or failure. Because project creators or owners tend to be too optimistic about their success, this means that I will most often bet against your success.
|
||||
|
||||
This betting allows you to obtain a rough probability for the success of your project. Prediction markets aren't perfect, but a 5% is markedly different from a 50% or from a 90%.
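Why would I do this? Betting against overpriced optimism is positive in expectation. A minimal sketch, assuming a simple binary market where each share pays out $1 on the side that resolves true (the numbers are illustrative, not any particular market):

```python
# Minimal sketch: expected profit from betting NO against an optimistic market.
# Assumes a binary market where a share pays $1 if its side resolves true.

def expected_profit_per_share(market_price: float, my_probability: float) -> float:
    """Expected profit from buying one NO share at (1 - market_price),
    given my own probability that the project succeeds."""
    cost = 1.0 - market_price                  # price of a NO share
    payout_probability = 1.0 - my_probability  # chance the NO share pays out $1
    return payout_probability * 1.0 - cost

# A creator-dominated market at 80% when I believe the true chance is 50%:
edge = expected_profit_per_share(market_price=0.80, my_probability=0.50)
print(round(edge, 2))  # → 0.3 expected profit per $0.20 NO share
```

If the market price already matches my probability, the expected profit is zero and I have no reason to bet.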
<p><img src="https://i.imgur.com/KjRzuiW.png" alt="" class="img-medium-center" /></p>

Some examples of past markets in this spirit:

- [Will Petra Kosonen submit her thesis by July 6?](https://manifold.markets/JoelBecker/will-petra-kosonen-submit-her-thesi)
- [Will QURI receive a grant from the SFF in the first half of this year?](https://manifold.markets/Nu%C3%B1oSempere/will-quri-receive-a-grant-from-the)
- [Will I receive a grant of $50,000 USD before June 1st, 2022?](https://manifold.markets/TimothyRooney/will-i-receive-a-grant-of-50000-usd)
- [Will I find a new job by the end of August 2022?](https://manifold.markets/dukeGartzea/will-i-find-a-new-job-by-the-end-of)

To let me know about a new such market you want me to bet on, you can find me on [Twitter](https://twitter.com/NunoSempere).
blog/2022/07/09/maximum-vindictiveness-strategy/index.md
The Maximum Vindictiveness Strategy
===================================

I've recently been thinking about what the appropriate response to someone fucking with you should be.

<p><img src="https://upload.wikimedia.org/wikipedia/commons/d/df/Gadsden_flag_with_apostrophe.svg" alt="Gadsden flag with apostrophe" class="img-medium-center" /></p>

On the one hand, you have the "roll over and submit" strategy, favored by, for instance, Scott Aaronson, who was [apologetic](https://scottaaronson.blog/?p=2119) even after being trodden on, or by Aaron Swartz, who [killed himself](https://wikiless.org/wiki/Aaron_Swartz?lang=en). On the other extreme, you have the "maximum vindictiveness strategy", implemented by, for instance, Peter Thiel, who—acting within the bounds of legality—utterly destroyed [Gawker](https://wikiless.org/wiki/Gawker?lang=en).

In the middle you'd have Scott Alexander, who didn't react quite so passively to Cade Metz [threatening to dox him](https://astralcodexten.substack.com/p/statement-on-new-york-times-article). Scott Alexander wrote about his plight, deleted his blog, and drove some proportion of the rationalist/EA spheres to unsubscribe from the NYT, but stopped far short of the maximum legally allowed amount of vindictiveness. For instance, he could have created a cademetzisanasshole.com page, publicly warned people against taking interviews with Metz, etc.

Some considerations here:

- Implementing the maximum vindictiveness strategy could dissuade malicious actors from targeting you.
- But it has a cost once you are targeted: Peter Thiel could just hire some really badass lawyers, but for me to have a close to comparable effect, I'd have to spend 5-10% of my waking hours implementing revenge.

Ultimately, I think that I am the sort of person who would take the maximum vindictiveness route. In particular, because of its cost after the fact, maximum vindictiveness is probably an under-provided public good.
blog/2022/07/12/forecasting-newsletter-june-2022/index.md
Forecasting Newsletter: June 2022
==============

## Highlights

* Sequoia Capital on [forecasting and scenario planning](https://www.sequoiacap.com/wp-content/uploads/sites/6/2022/06/Forecasting_Sequoia-Capital-2022.pdf)
* The GPI workshop on longterm forecasting happened; notes below
* Forecasters were very [surprised](https://twitter.com/mishayagudin/status/1544121506409730049) by a recent large jump in ML models’ ability to do math
* Arb Research [compiles and scores](https://arbresearch.com/files/big_three.pdf) the forecasting track record of the big three science fiction writers

## Index

* Notes from the 2022 GPI Workshop On Longterm Forecasting
* Prediction Markets & Forecasting Platforms
* Blog Posts and Research
* In the News

You can sign up for this newsletter on [substack](https://forecasting.substack.com) or browse past newsletters [here](https://forum.effectivealtruism.org/s/HXtZvHqsKwtAYP6Y7). If you have a content suggestion or want to reach out, you can leave a comment or find me on [Twitter](https://twitter.com/NunoSempere). Thanks to [Nathan Young](https://twitter.com/NathanpmYoung) for help writing this edition.
## Notes from the 2022 GPI Workshop On Longterm Forecasting

Between the 29th and the 30th of June, the [Global Priorities Institute](https://globalprioritiesinstitute.org/) (GPI) organized a workshop on longterm forecasting and existential risk in Oxford. This section gives my thoughts and shares [the slides](https://drive.google.com/drive/folders/1cxnCxrKahRk43FKK6UxgxWSsZ7OwmA4X?usp=sharing) ([a](https://web.archive.org/web/20220711161527/https://drive.google.com/drive/folders/1cxnCxrKahRk43FKK6UxgxWSsZ7OwmA4X?usp=sharing)) for the presentations whose speakers gave me consent to do so. I was jetlagged throughout the conference, so I'm surely missing some stuff.

### Talks

(I recommend going through the slides of the talks that sound interesting, and ignoring the rest.)

In the opening talk ([slides](https://drive.google.com/file/d/1lmELDGmZFrpVsB57DnegDvm_jD53Jihb/view?usp=sharing) ([a](https://web.archive.org/web/20220711161527/https://drive.google.com/file/d/1lmELDGmZFrpVsB57DnegDvm_jD53Jihb/view?usp=sharing))), Benjamin Tereick went through GPI's reasons for existing and explained that GPI has recently begun getting into forecasting, from a very academic angle. He then briefly covered some topics similar to the [Future Indices](https://cset.georgetown.edu/wp-content/uploads/CSET-Future-Indices.pdf) ([a](https://web.archive.org/web/20220711161536/https://cset.georgetown.edu/wp-content/uploads/CSET-Future-Indices.pdf)) report about how to forecast for the long term, for instance by using short-term proxies.

Javier Prieto presented Open Philanthropy's calibration results on their grant forecasts ([slides](https://docs.google.com/presentation/d/1_JpirzX2kf3WfXwr3xrJFunWno2xsQLPfuq3LqNkS8c/edit)), covering content similar to [this blog post](https://forum.effectivealtruism.org/posts/RjNFyJS3jPb4DA7wA/how-accurate-are-open-phil-s-predictions).



Open Philanthropy's calibration. Notice the 95% predictions, which happen around 60% of the time.
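The calibration in that chart is conceptually simple to compute: bucket predictions by stated confidence and compare against observed frequencies. A hedged sketch with made-up predictions, not Open Philanthropy's actual data or pipeline:

```python
# Sketch: bucket predictions by stated confidence, then compare stated
# confidence with observed frequency. The predictions below are invented.
from collections import defaultdict

predictions = [  # (stated probability, did it resolve true?)
    (0.95, True), (0.95, False), (0.95, True), (0.95, True), (0.95, False),
    (0.50, True), (0.50, False),
]

buckets = defaultdict(list)
for stated, resolved_true in predictions:
    buckets[stated].append(resolved_true)

for stated, outcomes in sorted(buckets.items()):
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%}: observed {observed:.0%} over {len(outcomes)} predictions")
```

With the invented data above, the 95% bucket resolves true only 60% of the time, which is the kind of overconfidence the chart shows.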
Blanka Havlíčková talked about [Confido](https://confido.tools/) ([a](https://web.archive.org/web/20220711161543/https://confido.tools/)) ([slides](https://docs.google.com/presentation/d/1F1XAUF7TrJP_UlUTM8AupYmO9EzgqNe_/edit?usp=sharing&ouid=105745069467002633267&rtpof=true&sd=true) ([a](https://web.archive.org/web/20220711161543/https://docs.google.com/presentation/d/1F1XAUF7TrJP_UlUTM8AupYmO9EzgqNe_/edit?usp=sharing&ouid=105745069467002633267&rtpof=true&sd=true))), an online tool meant to make eliciting forecasts significantly more approachable and fast.

I presented on [Squiggle](https://squiggle-language.com/) ([slides](https://docs.google.com/presentation/d/1MUeCXkdqgCfC_PEY1aNVOJAvuWQ29izK5kHa0rQFjmk/edit?usp=sharing) ([a](https://web.archive.org/web/20220711161559/https://docs.google.com/presentation/d/1MUeCXkdqgCfC_PEY1aNVOJAvuWQ29izK5kHa0rQFjmk/edit?usp=sharing)), [slides content](https://docs.google.com/document/d/1jGwF79RlmAFcUbNHJEJcC_FdTxwfR6NjpQbhOOvhMJo/edit?usp=sharing) ([a](https://web.archive.org/web/20220711161605/https://docs.google.com/document/d/1jGwF79RlmAFcUbNHJEJcC_FdTxwfR6NjpQbhOOvhMJo/edit?usp=sharing))), an estimation tool meant to make hardcore forecasting and evaluation setups more feasible. Our [VS Code extension](https://marketplace.visualstudio.com/items?itemName=QURI.vscode-squiggle) might be of immediate interest to readers.

Charlie Giattino talked about how Our World in Data could be useful for forecasting existential risks ([slides](https://docs.google.com/presentation/d/1nWdFbAsb5UrRp1fh2dIF3q_AUN2cc6_uI1hPZP-Hl2o/edit#slide=id.g13a8313b7ad_0_1)). I particularly appreciated the [one slide](https://twitter.com/NunoSempere/status/1542149704670232576/photo/1) with his thoughts on how best to produce and present forecasts so that policymakers will pay attention to them and find them useful.

David Manheim briefly talked about his experience organizing a biorisk forecasting tournament ([slides](https://docs.google.com/presentation/d/1UMriFXBIJrivyGkDhkAtJJxhcauIFwrpnrkbTIfmpFo/edit#slide=id.p) ([a](https://web.archive.org/web/20220711170036/https://docs.google.com/presentation/d/1UMriFXBIJrivyGkDhkAtJJxhcauIFwrpnrkbTIfmpFo/edit#slide=id.p))). He emphasized that most of the credit should go to Juan Cambeiro.

Nathan Young ([slides](https://docs.google.com/document/d/1nAyH7WqY4BrjrXMBbDG_iOuoP46tRchI93HhBwJx9nM/edit) ([a](https://web.archive.org/web/20220711194754/https://docs.google.com/document/d/1nAyH7WqY4BrjrXMBbDG_iOuoP46tRchI93HhBwJx9nM/edit?usp=sharing))) talked about his struggles with and solutions for the question generation process. He proposes—and has gotten funding from the FTX Future Fund for—a question creation platform.

David Rhys Bernard talked about approximating long-term forecasts. One ingenious method involved getting forecasters to make predictions about long-term datasets about to be released. This allows for rapid feedback for forecasters making long-term predictions. Eva Vivalt talked about forecasting counterfactuals and her work on the [Social Science Prediction Platform](https://socialscienceprediction.org/) ([a](https://web.archive.org/web/20220711161627/https://socialscienceprediction.org/)). But I can't find either of their slides.

In addition, about half of the presenters didn't give me consent to share their research and/or slides, which I'd say is a pity, because some were interesting.
### Discussions

_**To what extent might lessons from short-term, geopolitically flavored forecasting not generalize to long-term existential forecasting?**_ The overall mood was, I think, that forecasting is not perfect, but still worth using. Personally, I notice that short-term forecasting has a pretty strong prior/bias towards "things will remain the same", and I'm not sure I buy that strong prior for technological forecasting.

Clay Graubard pointed out that back in the day, Tetlock initially answered skeptics' suspicions by pointing out that there was a "goldilocks zone" of forecasts less than a few years out for which we have good past data and good information, and that forecasting was meaningfully better within that goldilocks zone. But existential risk seems like a pretty different beast, and pretty far from that goldilocks zone.

Still, we can use forecasters to predict short-term proxies for long-term impacts, we can update on evidence like good Bayesians even if we aren't directly incentivized, or we can try speculative reward methods.

_**To what extent is forecasting an adequate tool for interacting with policymakers, in contrast with other tools, like scenario planning?**_ A report from Perry World House, discussed below, interviews a number of policymakers, who tend to appreciate explicit probabilities. But at least one workshop participant felt that other tools, like scenario planning or "horizon scanning", were more suitable.

_**Could we bet against Open Philanthropy’s forecasts?**_ After Javier’s talk, I tried to convince him to allow my forecasting group—Samotsvety Forecasting—to bet against their forecasts. The case for doing this is simply that allowing people to put their money where their mouth is creates incentives for accuracy. Conversely, decoupling forecasting from any real reward—as Open Philanthropy seems to currently do—makes the forecasting process more totemic. In any case, betting seems unlikely to happen.

I also thought it was suboptimal that Open Philanthropy’s predictions were about specific grants, rather than about strategic decisions.

_**To what extent do more expensive forecasting methods produce better or more legible predictions?**_ There is an academic discipline devoted to studying and improving forecasting methods. But more complex and innovative forecasting methods have bigger costs, and there is a case to be made that object-level forecasting work—obtaining better models of the world about important topics and translating those better models into predictions—is more important than investing in a marginal forecasting improvement.

Ultimately, it tugs on my heartstrings when forecasting is used for utility maximization. Forecasting leads to better estimation of the consequences of actions, and that in turn can be used to make better decisions. Right now, enabled by a past abundance of funding, there are many groups working on this broad area. Some might be doomed from the start, but we’ll hopefully produce enough value that it will be worth it.
## Prediction Markets & Forecasting Platforms

### Polymarket

Polymarket hired an [ex-CFTC head](https://www.bloomberg.com/news/articles/2022-05-19/polymarket-names-cryptodad-board-chair-months-after-cftc-probe#sapqmn) back in May. This follows in the footsteps of Kalshi, which previously hired a [CFTC commissioner](https://kalshi.com/blog/former-cftc-commissioner-brian-quintenz-joins-our-board) ([a](https://web.archive.org/web/20220711161404/https://kalshi.com/blog/former-cftc-commissioner-brian-quintenz-joins-our-board)). I don't like the [revolving door](https://wikiless.org/wiki/Revolving_door_(politics)?lang=en) dynamics here.

Prediction markets like Polymarket or Kalshi haven't yet sustainably solved the "sucker problem": for the research behind a bet to be worth it, one has to be at least somewhat confident that one’s counterparty will not know more. Polymarket sometimes achieves this on politics questions, for instance when betting against Trump supporters, but it has otherwise been using VC money. One answer to the sucker problem would be for those who want the information to subsidize the markets, but I've yet to see that in practice. In the meantime, Polymarket got some funds from [UMA](https://discourse.umaproject.org/t/revised-funding-request-for-liquidity-mining-program-extension-from-polymarket/1716) ([a](https://web.archive.org/web/20220711161409/https://discourse.umaproject.org/t/revised-funding-request-for-liquidity-mining-program-extension-from-polymarket/1716)), the oracle it uses for resolving its markets, for the purpose of incentivizing trading.
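The sucker problem can be put in numbers: trading is only worth it if the expected winnings from your informational edge, plus any subsidy, exceed the cost of the research behind the bet. A minimal sketch with invented figures:

```python
# Sketch of the "sucker problem": trading is only worth it if the expected
# winnings from your edge (plus any subsidy) exceed your research cost.
# All figures below are invented for illustration.

def worth_trading(market_price, my_probability, stake, research_cost, subsidy=0.0):
    # Expected value per $1 staked on YES at market_price: my_probability / market_price - 1
    edge_per_dollar = my_probability / market_price - 1.0
    return edge_per_dollar * stake + subsidy > research_cost

# Market at 40%, I believe 50%, staking $100 after $40 of research:
print(worth_trading(0.40, 0.50, stake=100, research_cost=40))              # expected edge ~$25 < $40, not worth it
print(worth_trading(0.40, 0.50, stake=100, research_cost=40, subsidy=20))  # a $20 subsidy tips the balance
```

This is why a subsidy from whoever wants the information can rescue an otherwise dead market: it pays informed traders to show up.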
I appreciated Polymarket's coverage of [Boris Johnson's PM survival chances](https://polymarket.com/market/will-boris-johnson-remain-prime-minister-of-the-united-kingdom-through-august) ([a](https://web.archive.org/web/20220711161419/https://polymarket.com/market/will-boris-johnson-remain-prime-minister-of-the-united-kingdom-through-august)).

### Manifold



Manifold continues [shipping features](https://news.manifold.markets/) ([a](https://web.archive.org/web/20220711161346/https://news.manifold.markets/)), but its user growth has been [stalling](https://manifold.markets/stats) ([a](https://web.archive.org/web/20220711161337/https://manifold.markets/stats)). Partly as a result, I am offering [to bet on people's success or failure if they create a market on Manifold Markets](https://nunosempere.com/blog/2022/07/05/i-will-bet-on-your-success-or-failure/) ([a](https://web.archive.org/web/20220711161337/https://nunosempere.com/blog/2022/07/05/i-will-bet-on-your-success-or-failure/)).

At the same time, Manifold has received a $500k donation from FTX to build [prediction markets for charity](https://news.manifold.markets/p/above-the-fold-10000-for-charity) ([a](https://web.archive.org/web/20220711161356/https://news.manifold.markets/p/above-the-fold-10000-for-charity)), where people bet real money but the money goes to charity.

### Metaculus

Two comments from Metaculus [this month](https://metaculusextras.com/top_comments?start_date=2022-06-01&end_date=2022-07-01) (a) are worth highlighting:

* [Dan Hendrycks](https://www.metaculus.com/questions/7024/ai-to-beat-humans-on-metaculus/#comment-96276=) brings Metaculus’ attention to a [benchmark to help track ML model forecasting ability](https://arxiv.org/abs/2206.15474) ([a](https://web.archive.org/web/20220711161253/https://arxiv.org/abs/2206.15474)).
* Jim1776 mentions that [in Lex Fridman's latest podcast, Demis Hassabis states that DeepMind is in the middle of scaling up Gato](https://www.metaculus.com/questions/3479/date-weakly-general-ai-system-is-devised/#comment-96221=).

Tamay organized an [AI Progress Essay Contest](https://www.metaculus.com/project/ai-fortified-essay-contest/) ([a](https://web.archive.org/web/20220711161314/https://www.metaculus.com/project/ai-fortified-essay-contest/)). He summarizes the results on [Twitter](https://nitter.net/tamaybes/status/1538979307908997122). Metaculus also has a small [humanitarian conflict tournament](https://www.metaculus.com/tournament/humanitarian/).

Metaculus is also looking to [hire people](https://apply.workable.com/metaculus/) ([a](https://web.archive.org/web/20220711161313/https://apply.workable.com/metaculus/)) for a bunch of positions, including that of CTO (Chief Technology Officer).
### Odds and ends

Forecasters—including those from Hypermind, Metaculus and Samotsvety, as well as myself personally—were very surprised by a recent jump in performance on the [MATH dataset](https://bounded-regret.ghost.io/ai-forecasting-one-year-in/) ([a](https://web.archive.org/web/20220711161433/https://bounded-regret.ghost.io/ai-forecasting-one-year-in/)); it generally exceeded our 95% confidence intervals. Some Twitter threads about this [here](https://twitter.com/mishayagudin/status/1544121506409730049) ([a](https://web.archive.org/web/20220711161437/https://twitter.com/mishayagudin/status/1544121506409730049)), [here](https://twitter.com/eli_lifland/status/1542580443635146753) ([a](https://web.archive.org/web/20220711161442/https://twitter.com/eli_lifland/status/1542580443635146753)) and [here](https://twitter.com/JacobSteinhardt/status/1543979109738369031).



Eli Lifland's predictions on the MATH dataset.

The jump was caused by a new Google AI model, Minerva, which [reaches 50.3% on the MATH dataset](https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html) ([a](https://web.archive.org/web/20220711161448/https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html)). A [previous model](https://arxiv.org/pdf/2112.15594.pdf) ([a](https://web.archive.org/web/20220711161451/https://arxiv.org/pdf/2112.15594.pdf)) from the beginning of this year reached a performance of 81%, but it was allowed to use programming libraries and, I think, skipped the geometry questions.

The FTX Future Fund's grants and regrants for forecasting can be seen [here](https://ftxfuturefund.org/our-grants/?_search=forecasting) ([a](https://web.archive.org/web/20220711161453/https://ftxfuturefund.org/our-grants/?_search=forecasting)) and [here](https://ftxfuturefund.org/our-regrants/?_search=forecasting) ([a](https://web.archive.org/web/20220711161504/https://ftxfuturefund.org/our-regrants/?_search=forecasting)) respectively. I'd be excited about more people applying!

Avraham Eisenberg [calls the integrity of Kleros into question](https://deepfivalue.substack.com/p/the-kleros-experiment-has-failed) ([a](https://web.archive.org/web/20220711161423/https://deepfivalue.substack.com/p/the-kleros-experiment-has-failed)). Kleros aims to be a decentralized jury system, where evidence is presented to jurors who have an incentive to resolve cases fairly because of Keynesian beauty contest dynamics (much like in reciprocal scoring). I used to be a fan of Kleros because I thought it could enable pretty decentralized prediction market resolutions. But now Eisenberg alleges that one of the Kleros founders successfully ran a 51% attack to resolve cases in his favor.

I liked [this analysis](https://www.gjopen.com/comments/1465987) of Putin's health on Good Judgment Open. Apparently, he underwent cancer surgery in April. The poster assigns an 8% chance to him losing power by the end of the year, which ultimately isn't all that high.

Kalshi has [some markets](https://kalshi.com/hurricanes) ([a](https://web.archive.org/web/20220711161517/https://kalshi.com/hurricanes)) about this year's hurricane season, and suggests that they could be used as an insurance mechanism.

INFER continues to use some pretty adversarial framings, e.g., on [this page](https://www.infer-pub.com/challenges/32-international-competitiveness-in-ai) ([a](https://web.archive.org/web/20220711161517/https://www.infer-pub.com/challenges/32-international-competitiveness-in-ai)) with questions about the "Global AI Race".

Richard Hanania covers [Kalshi prediction markets](https://richardhanania.substack.com/p/finally-real-money) ([a](https://web.archive.org/web/20220711161427/https://richardhanania.substack.com/p/finally-real-money)).

Hedgehog Markets, a crypto prediction market previously known for implementing ["no-loss"](https://hedgehogmarkets.gitbook.io/hedgehog-markets/mainnet-user-guide/no-loss-competitions) ([a](https://web.archive.org/web/20220712122943/https://hedgehogmarkets.gitbook.io/hedgehog-markets/mainnet-user-guide/no-loss-competitions)) markets, has now launched [peer-to-peer markets](https://p2p.hedgehog.markets/) ([a](https://web.archive.org/web/20220712123053/https://p2p.hedgehog.markets/)). I find the interface a bit clunky to use, but I'm happy it exists.

Aver, another crypto prediction market, launched its [public beta](https://app.aver.exchange/) ([a](https://web.archive.org/web/20220712123213/https://app.aver.exchange/)), which does use real money.
## Blog Posts and Research

Arb Research [compiles and scores](https://arbresearch.com/files/big_three.pdf) ([a](https://web.archive.org/web/20220711161231/https://arbresearch.com/files/big_three.pdf)) the track record of the ‘big three’ science fiction writers of the second half of the twentieth century: Asimov, Clarke and Heinlein. Holden Karnofsky summarizes this as ["the track record of futurists seems fine"](https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/) ([a](https://web.archive.org/web/20220711161641/https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/)).

Miles Brundage writes that [AGI Timeline Research/Discourse Might Be Overrated](https://forum.effectivealtruism.org/posts/SEqJoRL5Y8cypFasr/why-agi-timeline-research-discourse-might-be-overrated) ([a](https://web.archive.org/web/20220711161642/https://forum.effectivealtruism.org/posts/SEqJoRL5Y8cypFasr/why-agi-timeline-research-discourse-might-be-overrated)):

> Research and discourse on AGI timelines aren't as helpful as they may at first appear, and a lot of the low-hanging fruit (i.e. motivating AGI-this-century as a serious possibility) has already been plucked.

In [the comments](https://forum.effectivealtruism.org/posts/SEqJoRL5Y8cypFasr/why-agi-timeline-research-discourse-might-be-overrated?commentId=n6PE24wHMdQ5fCwY5) ([a](https://web.archive.org/web/20220711161653/https://forum.effectivealtruism.org/posts/SEqJoRL5Y8cypFasr/why-agi-timeline-research-discourse-might-be-overrated?commentId=n6PE24wHMdQ5fCwY5)), Carl Shulman gives some intervention types that are sensitive to timelines.

I really liked [this blog post](https://www.lesswrong.com/posts/uDz4ydD8dZBdm9PgZ/forecasts-are-not-enough) ([a](https://web.archive.org/web/20220711161752/https://www.lesswrong.com/posts/uDz4ydD8dZBdm9PgZ/forecasts-are-not-enough)) by Ege Erdil on why "forecasts are not enough". The key quote is:

> Physics is a domain in which it's particularly easy to cut out external interference from an experiment and ensure that an understanding of just a few nodes in a causal network and their interactions will be sufficient to make good predictions about the results. If you have a similar good understanding of some subset of a complicated causal network, though, it's possible you really never get to express this in forecast quality in a measurable way if the real world never allows you to ask "local questions".
>
> It would, however, be a mistake to conclude from this that \[a forecasting approach\] understand\[s\] earthquakes better than the expert or that I'm more reliable on questions about e.g. how to reduce the frequency of earthquakes. I likely know nothing about such questions and you better listen to the expert since he knows much more about what actually causes earthquakes than I do. That also means for anyone who wants to take action to affect earthquakes in the real world the expert's advice will be more relevant, but for someone who just wants to sell earthquake insurance my approach is probably what they will prefer.

I published [A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal):

> Open Philanthropy spent $200M on criminal justice reform, $100M of which came after their own estimates concluded that it wasn’t as effective as other global health and development interventions. I think Open Philanthropy could have done better.

A particularly interesting result from that post was that the 90% confidence intervals of my estimates for the cost-effectiveness of the AMF and of criminal justice reform did not overlap, even though both were very wide. Though see [the comments](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal#comments) for some pushback.
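The kind of Monte Carlo comparison behind that claim can be sketched in a few lines. The lognormal parameters below are invented for illustration, not the post's actual numbers:

```python
# Sketch: compare the 90% CIs of two noisy cost-effectiveness estimates via
# Monte Carlo. The lognormal parameters are invented, not taken from the post.
import random

random.seed(0)

def ci90(samples):
    """Empirical 90% confidence interval (5th to 95th percentile)."""
    s = sorted(samples)
    n = len(s)
    return s[int(0.05 * n)], s[int(0.95 * n)]

amf = [random.lognormvariate(3.0, 1.0) for _ in range(100_000)]   # higher-value intervention
cjr = [random.lognormvariate(-1.0, 1.0) for _ in range(100_000)]  # lower-value intervention

amf_lo, amf_hi = ci90(amf)
cjr_lo, cjr_hi = ci90(cjr)
overlap = amf_lo <= cjr_hi
print(overlap)  # both intervals are wide, yet they need not overlap
```

When even the pessimistic tail of one estimate beats the optimistic tail of the other, the comparison is robust to a lot of uncertainty.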


Perry World House has a new short report, [A Roadmap to Implementing Probabilistic Forecasting Methods](https://global.upenn.edu/sites/default/files/perry-world-house/PWH-2022-Forecasting%20Report%20June%202022.pdf) ([a](https://web.archive.org/web/20220711161659/https://global.upenn.edu/sites/default/files/perry-world-house/PWH-2022-Forecasting%20Report%20June%202022.pdf)):

> The National Intelligence Council recently announced that it will once again be piloting a crowdsourced probabilistic geopolitical forecasting platform, after previous attempts to institutionalize this kind of intelligence gathering foundered for bureaucratic reasons. Crowdsourced geopolitical forecasting is a powerful complement to traditional analysis, but just creating a platform is not enough.
>
> The IC will need to make choices about the platform and how it communicates forecasts to both IC leaders and policymakers.

Overall, I thought that it was a good, relatively short introductory report, and I appreciated the summary of interviews with policymakers, who generally tend to appreciate explicit probabilities. I think I caught two mistakes: describing Good Judgment Open as "open source", and characterizing the Cosmic Bazaar as a success. My impression is that the Good Judgment Open source code is [nowhere](https://github.com/orgs/CultivateLabs/repositories?type=all) (a) to be [found](https://github.com/goodjudgment) ([a](https://web.archive.org/web/20220711161936/https://github.com/goodjudgment)). And my sources tell me that the Cosmic Bazaar tends to have inoffensive questions, rather than questions which could lead to better decisions.

There is a new [dataset containing thousands of forecasting questions and an accompanying news corpus](https://arxiv.org/abs/2206.15474) ([a](https://web.archive.org/web/20220711161253/https://arxiv.org/abs/2206.15474)), meant to test ML forecasting prowess.

Ben Garfinkel wrote a post [On Deference and Yudkowsky's AI Risk Estimates](https://forum.effectivealtruism.org/posts/NBgpPaz5vYe3tH4ga/on-deference-and-yudkowsky-s-ai-risk-estimates) ([a](https://web.archive.org/web/20220711161755/https://forum.effectivealtruism.org/posts/NBgpPaz5vYe3tH4ga/on-deference-and-yudkowsky-s-ai-risk-estimates)), highlighting some of [Eliezer Yudkowsky](https://en.wikipedia.org/wiki/Eliezer_Yudkowsky)'s past failed estimates. The [comments](https://forum.effectivealtruism.org/posts/NBgpPaz5vYe3tH4ga/on-deference-and-yudkowsky-s-ai-risk-estimates#comments) section was pretty heated.
## In the News
|
||||
|
||||
[Epoch](https://epochai.org/blog/announcing-epoch) ([a](https://web.archive.org/web/20220711161806/https://epochai.org/blog/announcing-epoch)) is a new organization working on "working on investigating trends in Machine Learning and forecasting the development of Transformative Artificial Intelligence". They are [hiring](https://epochai.org/careers) ([a](https://web.archive.org/web/20220711161811/https://epochai.org/careers)) for research and management roles, with salaries ranging from $60k to $80k.
|
||||
|
||||
The [New York Times](https://web.archive.org/web/20220711175715/https://www.nytimes.com/2022/06/28/business/recession-probability-us.html) has a short article comparing different experts' probabilities of a recession. I thought that prediction markets and forecasting platforms were [much more informative here](https://metaforecast.org/?query=recession) ([a](https://web.archive.org/web/20220711161834/https://metaforecast.org/?query=recession)), because they give a bottom-line probability, rather than a hard-to-aggregate litany of experts.
|
||||
|
||||
[Here](https://www.sequoiacap.com/wp-content/uploads/sites/6/2022/06/Forecasting_Sequoia-Capital-2022.pdf) ([a](https://web.archive.org/web/20220711161817/https://www.sequoiacap.com/wp-content/uploads/sites/6/2022/06/Forecasting_Sequoia-Capital-2022.pdf)) is a presentation to Sequoia executives on Forecasting and Scenario planning. They are drawing analogies to the 2000 and 2008 bubbles. The presentation seems to have pretty good models of the world, and I would recommend it to readers who are at all interested or affected by the startup funding situation.
|
||||
|
||||
A [few](https://www.elperiodico.com/es/politica/20220531/predicciones-elecciones-andalucia-2022-el-periodico-13730009) ([a](https://web.archive.org/web/20220711161826/https://www.elperiodico.com/es/politica/20220531/predicciones-elecciones-andalucia-2022-el-periodico-13730009)) small [Spanish newspapers](https://www.diariocordoba.com/andalucia/2022/06/18/ganara-elecciones-andalucia-2022-son-66832548.html) ([a](https://web.archive.org/web/20220711161824/https://www.diariocordoba.com/andalucia/2022/06/18/ganara-elecciones-andalucia-2022-son-66832548.html)) featured prediction markets on their coverage of the elections in Andalusia, a region in Spain. The forecasts come from a [new play-money prediction market](https://thepredictionmarket.com/) ([a](https://web.archive.org/web/20220711161858/https://thepredictionmarket.com/)) from the University of Zurich, which I was previously completely unaware of.
India recently did a U-turn on wheat exports: it first planned to export substantial amounts, and then needed to import wheat to deal with a bad crop. An [Indian newspaper](https://theprint.in/india/from-tomatoes-to-wheat-indian-crop-forecasting-is-in-the-grip-of-a-raja-todar-mal-problem/985907/) ([a](https://web.archive.org/web/20220711161841/https://theprint.in/india/from-tomatoes-to-wheat-indian-crop-forecasting-is-in-the-grip-of-a-raja-todar-mal-problem/985907/)) makes the case that this was because of an over-reliance on an archaic system:
> India’s U-turn on wheat exports is a result of incorrect estimates derived from an archaic crop forecasting system devised 4 centuries ago by emperor Akbar’s finance minister.
>
> “When it comes to yield estimates, the budgets are so low that local revenue officials seldom visit the field for CCEs. There is hardly any use of ground truthing aided with satellites or remote sensing. Decisionmakers still rely on a system developed by Raja Todar Mal,” the official added.
CoinDesk has an [introductory article](https://www.coindesk.com/layer2/2022/06/04/forecasting-prediction-markets-and-the-age-of-better-information/) ([a](https://web.archive.org/web/20220711161845/https://www.coindesk.com/layer2/2022/06/04/forecasting-prediction-markets-and-the-age-of-better-information/)) on prediction markets by friends of the newsletter, Clay Graubard and Andrew Eaddy.
---
Note to the future: All links are added automatically to the Internet Archive, using this [tool](https://github.com/NunoSempere/longNowForMd) ([a](https://web.archive.org/web/20220711161908/https://github.com/NunoSempere/longNowForMd)). "(a)" for archived links was inspired by [Milan Griffes](https://www.flightfromperfection.com/) ([a](https://web.archive.org/web/20220711161909/https://www.flightfromperfection.com/)), [Andrew Zuckerman](https://web.archive.org/web/20220408093057/https://www.andzuck.com/), and [Alexey Guzey](https://guzey.com/) ([a](https://web.archive.org/web/20220711161922/https://guzey.com/)).
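The core of that archiving convention can be sketched in a few lines. This is not the tool's actual code, just a minimal illustration, with a hypothetical function name and an assumed snapshot timestamp, of how a markdown link plus a Wayback Machine snapshot timestamp yields the "(a)" form used throughout this newsletter:

```python
def add_archive_link(markdown_link: str, timestamp: str) -> str:
    """Append an '(a)' archived-copy link after a markdown link.

    markdown_link: e.g. '[text](https://example.com)'
    timestamp: a Wayback Machine snapshot timestamp, e.g. '20220711161908'
    """
    # Extract the URL between the last '(' and the last ')'
    url = markdown_link[markdown_link.rindex("(") + 1 : markdown_link.rindex(")")]
    # Wayback Machine snapshot URLs follow the /web/<timestamp>/<url> pattern
    archived = f"https://web.archive.org/web/{timestamp}/{url}"
    return f"{markdown_link} ([a]({archived}))"
```

For example, `add_archive_link("[tool](https://github.com/NunoSempere/longNowForMd)", "20220711161908")` produces the link-plus-"(a)" pair as it appears in the text above.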
---
> There's a storm comin' that the weatherman couldn't predict
>
> — Eminem, [Cinderella Man](https://youtu.be/ZWouG1bo6uk?t=55)