From ec1ee19f46feebf2694ae1379d215cd73033d0a6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nu=C3=B1o=20Sempere?= <nuno.sempere@gmail.com>
Date: Wed, 31 Oct 2018 19:34:54 +0100
Subject: [PATCH] Update Write-up.md

---
 ESPR-Evaluation/Write-up.md | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ESPR-Evaluation/Write-up.md b/ESPR-Evaluation/Write-up.md
index f4e7369..3a18708 100644
--- a/ESPR-Evaluation/Write-up.md
+++ b/ESPR-Evaluation/Write-up.md
@@ -1,6 +1,6 @@
 # ESPR-Evaluation Writeup
 
-(Epistemic status: Cognitive dissonance. On the one hand )
+(Epistemic status: Cognitive dissonance.)
 
 ## Introduction
 I have spent the last 2-4 months thinking about how to evaluate the impact of the European Summer Camp on Rationality (ESPR) [1], a selective program affiliated with CFAR (Center for Applied Rationality) which takes brilliant highschoolers and teach thems a variety of rationality techniques. Here are the highlights of what I have found, as well as some remarks on what CFAR could do if it was interested in measuring impact with a randomized controlled trial (an RCT).
@@ -10,6 +10,7 @@ The question I am answering here is not "Should I donate to CFAR?" but "Which pl
 
 ## Current evidence
 
+### Logical model
 There isn't much evidence on how effective ESPR is, besides it's logical model. In the words of a student which came back this year as a Junior Counselor:
 
 >... ESPR (teaches) smart people not to make stupid mistakes. Examples: betting, prediction markets decrease overconfidence. Units of exchange class decreases likelihood of spending time, money, other currency in counterproductive ways. The whole asking for examples thing prevents people from hiding behind abstract terms and to pretend to understand something when they don't. Some of this is learned in classes. A lot of good techniques from just interacting with people at espr.
@@ -22,25 +23,29 @@ There isn't much evidence on how effective ESPR is, besides it's logical model.
 >
 >espr also increased positive impact participants will have on the world in the future by introducing them to effective altruism ideas. I think last year’s batch would have been affected more by this because I remember there being more on x-risk and prioritizing causes and stuff [3].
 
-Thus, because CFAR has a similar logical model, the current evidence on ESPR, i.e, a literature review, would simply be the evidence CFAR has on itself. I've mainly studied [CFAR's 2015 Longitudinal Study](http://www.rationality.org/studies/2015-longitudinal-study) together with the more recent [Case Studies](http://rationality.org/studies/2016-case-studies) and the [2017 CFAR Impact report](http://www.rationality.org/resources/updates/2017/cfar-2017-impact-report).
+On the other hand, when reading CFAR's own [Rationality Checklist](http://www.rationality.org/resources/rationality-checklist), I notice that acquiring the mental movements it mentions seems more like a long-term project than a skill one can pick up in 4-13 days. This is something which CFAR itself also underscores. However, CFAR and ESPR still have a very similar logical model, so beyond that logical model, the current evidence on ESPR, i.e., a literature review, would simply be the evidence CFAR has on itself.
 
-I find myself confused, in the sense that I don't find it satisfactory, and I wouldn't go about collecting evidence in the same way. On the other hand, I respect these people, and I may be under the effects of tunnel vision after having been reading about RCTs for a couple of months. Alternatively, it could be that their Data Analyst is mostly a normal member of staff / ops person [4], and that justifying their impact is not a priority for this relatively young organization.
+### The studies CFAR has conducted
 
-With regards to the first study, it notes that a control group would be a difficult thing to implement, because it would be necessary to find people who would like to come to the program and forbidding them to do so. The study tries to compensate for the lack of a control by being statistically clever. It seems to be rigorous enough for a study which is not an RCT.
+I've mainly studied [CFAR's 2015 Longitudinal Study](http://www.rationality.org/studies/2015-longitudinal-study) together with the more recent [Case Studies](http://rationality.org/studies/2016-case-studies) and the [2017 CFAR Impact report](http://www.rationality.org/resources/updates/2017/cfar-2017-impact-report). I am not aware of any other studies, besides a low-powered, unpublished and unfindable 2012 RCT.
+
+I find myself confused, in the sense that I don't find this evidence satisfactory, and I wouldn't go about collecting evidence in the same way. On the other hand, I respect these people, and I may be under the effects of tunnel vision after having read about RCTs for a couple of months. Alternatively, it could be that their Data Analyst is normally a regular member of staff / ops person [4], and that justifying their impact is not a priority for this relatively young organization.
+
+With regard to the first study, it notes that a control group would be difficult to implement, because it would require finding people who would like to come to the program and forbidding them from doing so. The study tries to compensate for the lack of a control by being statistically clever. It seems to be rigorous enough for a study which is not an RCT.
 
 But I feel like that is only partially sufficient. The magnitude of the effect found could be wildly overestimated; MIT's Abdul Latif Jameel Poverty Action Lab provides the following slides [5]:
 
 ![](https://nunosempere.github.io/ESPR-Evaluation/Pre-post-1.jpg)
 ![](https://nunosempere.github.io/ESPR-Evaluation/Pre-post-2.jpg)
 
-I find them scary; depending on the method used to test your effect, you can get an effect size that is 4-5 times as great as the effect you find with an RCT, or about as great, in the other direction. The effects the CFAR study finds, f.ex. the one most prominently displayed in CFAR's webpage, an increased life satisfaction of 0.17 standard deviations (i.e., going from 50 to 56.75%) are small enough for me to worry about such inconveniences.
+I find them scary; depending on the method used to test your effect, you can get an effect size that is 4-5 times as great as the effect you find with an RCT, or about as great but in the opposite direction. The effects the CFAR study finds, e.g. the one most prominently displayed on CFAR's webpage, an increase in life satisfaction of 0.17 standard deviations (i.e., going from the 50th to the 56.75th percentile), are small enough for me to worry about such inconveniences.
 
 Recently, CFAR has moved away from that more rigorous kind of study to Case Studies and Student Profiles. This annoys me, because asking participants for counterfactual estimations is such a swamp of complexity and complications that the error bars are bound to be incredibly wide, and thus most of the impact probably comes from the uncertainty. Additionally, it is just very easy to get very positive reviews of mostly anything; searching for "nonviolent communication testimonials" brings up [this webpage](https://www.rachellelamb.com/testimonials/). In other words, I would expect to find similar texts at mostly any level of impact. 
 
-Finally, one their three Organization Case Studies (Arbital) is now a failed project, but this doesn't change my mind much, because learning that a sparky person who attended CFAR founded a project to improve some aspect of the world didn't give me much information to begin with.
-
+Finally, one of their three Organization Case Studies (Arbital) is now a failed project, but this doesn't change my mind much, because learning that a sparky person who attended CFAR founded a project to improve some aspect of the world didn't give me much information to begin with.
+
 ### A note on perverse incentives
-To the extent that OpenPhilantropy prefers these and other weak forms of evidence *now*, rather than stronger evidence two-five years later, OpenPhilantropy might be giving ESPR perverse incentives. Note that with 20-30 students per year, even after we start an RCT, there must pass a number of years before we can amass some meaningful statistical power (see the power calculations). On the other hand, taking a process of iterated improvement as an admission of failure would also be pretty shitty.
+To the extent that Open Philanthropy prefers Case Studies and other weak forms of evidence *now*, rather than stronger evidence two to five years later, Open Philanthropy might be giving ESPR perverse incentives. Note that with 20-30 students per year, even after we start an RCT, a number of years must pass before we can amass meaningful statistical power (see the power calculations). On the other hand, taking a process of iterated improvement as an admission of failure would also be pretty shitty.
 
 The questions designing a RCT poses are hard, but the bigger problem is that there's an incentive to not ask them at all. But that would be agaist CFAR's ethos.