diff --git a/ESPR-Evaluation/1-Current-Evidence.md b/ESPR-Evaluation/1-Current-Evidence.md
new file mode 100644
index 0000000..5d7d60f
--- /dev/null
+++ b/ESPR-Evaluation/1-Current-Evidence.md
@@ -0,0 +1,95 @@
+# On the cost effectiveness of ESPR, for Nick Beckstead, by Nuño Sempere.
+
+## Introduction
+
+> There is a certain valuable way of thinking, which is not yet taught in schools, in this present day. This certain way of thinking is not taught systematically at all. It is just absorbed by people who grow up reading books like Surely You’re Joking, Mr. Feynman or who have an unusually great teacher in high school.
+
+> Most famously, this certain way of thinking has to do with science, and with the experimental method. The part of science where you go out and look at the universe instead of just making things up. The part where you say “Oops” and give up on a bad theory when the experiments don’t support it.
+
+> But this certain way of thinking extends beyond that. It is deeper and more universal than a pair of goggles you put on when you enter a laboratory and take off when you leave. It applies to daily life, though this part is subtler and more difficult. But if you can’t say “Oops” and give up when it looks like something isn’t working, you have no choice but to keep shooting yourself in the foot. You have to keep reloading the shotgun and you have to keep pulling the trigger. You know people like this. And somewhere, someplace in your life you’d rather not think about, you are people like this. It would be nice if there was a certain way of thinking that could help us stop doing that.
+
+- Eliezer Yudkowsky, https://www.lesswrong.com/rationality/preface
+
+## The evidence on CFAR.
+The evidence for/against CFAR interests me, because I take it as likely that it is very much correlated with the evidence on ESPR.
+For example, if reading programs in India show that dividing students by initial level improves their learning outcomes, then you'd expect similar processes to be at play in Kenya. Thus, if the evidence on CFAR were robust, we might be able to afford to be less rigorous when it comes to ESPR.
+
+I've mainly looked over the [CFAR 2015 Longitudinal Study](http://www.rationality.org/studies/2015-longitudinal-study) and the more recent [Case studies](http://rationality.org/studies/2016-case-studies) and [2017 CFAR Impact report](http://www.rationality.org/resources/updates/2017/cfar-2017-impact-report).
+
+With regard to the first, I consider the data to be weak evidence on causal questions about the effects of the workshop. The study notes that a control group would be difficult to implement, since it would require finding people who would like to come to the program and forbidding them to do so. The study tries to compensate for the lack of a control by being statistically clever, and to a certain extent, achieves this.
+
+I feel like the above is only partially sufficient, that is: it concludes that there is probably some kind of effect, but its magnitude could be wildly overestimated. Thus, I feel that an RCT can be delayed on the strength of the evidence that CFAR currently has, but not indefinitely. I suggest teaming up with MIT's JPAL, i.e., [The Abdul Latif Jameel Poverty Action Lab](https://www.povertyactionlab.org/), which specializes in designing and implementing evaluations. JPAL could suggest a design like the following: we randomly admit people for either this year or the next, and take as the control the group which has been left waiting.
+
+With regard to the second and third documents, I feel that they provide strong intuitions for why CFAR's logical model is not totally bullshit.
+This would be something like: CFAR students are taught rationality techniques + have an environment in which they can question their current decisions and consider potentially better choices -> they go on to do more good in the world, e.g. by switching careers.
+
+> Eric described the mindset of people at CFAR as “the exact opposite of learned helplessness”, and found that experiencing more of this mindset, in combination with an increased ability to see what was going on with his mind, was particularly helpful for making this shift.
+
+## ESPR as distinct from CFAR.
+
+It must be noted that ESPR gets little love from the main organization, being mainly run by volunteers, with some instructors coming in to give classes. Eventually, it might make sense to institute ESPR as a different organization with a focus on Europe instead of as an American side project.
+
+## ESPR's Logical model.
+I think that the logical model underpinning ESPR is fundamentally solid, i.e., as solid as CFAR's. In the words of a student who came back this year as a Junior Counselor:
+
+> ESPR [teaches] smart people not to make stupid mistakes. Examples: betting, prediction markets decrease overconfidence. Units of exchange class decreases likelihood of spending time, money, other currency in counterproductive ways. The whole asking for examples thing prevents people from hiding behind abstract terms and to pretend to understand something when they don't. Some of this is learned in classes. A lot of good techniques from just interacting with people at espr.
+
+> I've had conversations with otherwise really smart people and thought “you wouldn't be stuck with those beliefs if you'd gone through two weeks of espr”
+
+> ESPR also increases self-awareness. A lot of espr classes / techniques / culture involves noticing things that happen in your head. This is good for avoiding stupid mistakes and also for getting better at accomplishing things.
+
+> It is nice to be surrounded by very smart, ambitious people. This might be less relevant for people who do competitions like IMO or go to very selective universities. Personally, it is a fucking awesome and rare experience every time I meet someone really smart with a bearable personality in the real world. Being around lots of those people at espr was awesome. Espr might have made a lot of participants consider options they wouldn't seriously have before talking to the instructors like founding a startup, working on ai alignment, everything that galit talked about etc
+
+> espr also increased positive impact participants will have on the world in the future by introducing them to effective altruism ideas. I think last year’s batch would have been affected more by this because I remember there being more on x-risk and prioritizing causes and stuff [1].
+
+> I spent 15 mins
+> =)
+
+Additionally, ESPR gives some of its alumni the opportunity to come back as Junior Counselors, who take on a position of some responsibility, an aspect not present in CFAR workshops.
+
+[1]. I am not sure I share this impression. In particular, this year, being in Edinburgh, we didn't bring in an FHI person to give a talk. We did have an AI risk panel, and EA/x-risk were an important (~10%) focus of conversations. However, I will make a note to bring someone from the FHI next year. We also continued grappling with the boundaries between presenting an important problem and indoctrinating and mindfucking impressionable young persons.
+
+## Perverse incentives
+
+As with CFAR's, I think that the profiles in the following section provide useful intuitions. However, while perhaps narratively compelling, there is no control group, which is supremely shitty. These profiles may not allow us to falsify any hypothesis, i.e., to meaningfully change our priors. The evidence is weak enough that, on its basis alone, I would feel uncomfortable saying that ESPR should be scaled up.
+
+To the extent that Open Philanthropy prefers these and other weak forms of evidence *now*, rather than stronger evidence two to three years later, Open Philanthropy is giving ESPR perverse incentives. Note that with 20-30 students per year, even after we start an RCT, a number of years must pass before we can amass meaningful statistical power. Furthermore, seeing the process of iterated improvement as an admission of failure would also be catastrophic.
+
+## Student profiles.
+
+Alumni only have a tip-of-the-nose perspective. Because some of the effects of ESPR are similar to rationally adulting, they are difficult to separate from growing up.
+
+Stag Lynn: Left university and is fucking around the rationality community in search of status.
+Owen Shen: Would still be into EA/rat. Stressful position in the second year.
+Jordan Alexander: Would still be into EA/rat.
+Roan Talbut: Mental health?
+Haardik Kumar: ??
+
+Quaratulain Zainabab: Hero license.
+Luke Raskopf: Went to work for CFAR as a result of fucking around as a volunteer at ESPR.
+Stan: Hero license.
+Raúl Alfredo Alef Pineda Reyes: Went to MIT, minor celebrity.
+Andrew Lin: Similar counterfactual.
+Wendi Fan: ??
+Reka Tron: Hero license.
+Emily Beaty: Hero license.
+Tyler Zhu: ??
+Caleb Ji: ??
+
+Lennie: Changed his mind, introduced to EA. Too soon to tell.
+Yulia: Big impact. Mental health.
+Steven Qu: Too soon to tell.
+Rachana: Mental health.
+Andrea Laguna: Mental health.
+
+I am not answering "difference in expected impact", but "people I like".
+
+Go over the list again.
+
+## Randomized controlled trial.
+Necessary to have the full buy-in.
+
+## Alternatives to ESPR: The cheapest option.
+One question which interests me is: what is the cheapest version of the program which is still cost effective? What happens if you just record the classes, send them to bright people, and answer their questions? What if you set up a course on edX?
+Interventions based at universities and high schools are likely to be much cheaper, given that neither board, nor flights, nor classrooms would have to be paid for. Is there a low-cost, scalable approach?
+
+I'm told that some of the CFAR instructors have strong intuitions that in-person teaching is much more effective, based on their own experience and perhaps also on a small 2012 RCT, which is either unpublished or unfindable.
+
+Still, I want to test this assumption, because, almost by definition, to do so would be pretty cheap. As a plus, we can take the population who takes the cheaper course to be a second control group.
diff --git a/ESPR-Evaluation/3-Power-calculations.md b/ESPR-Evaluation/3-Power-calculations.md
new file mode 100644
index 0000000..8579f3f
--- /dev/null
+++ b/ESPR-Evaluation/3-Power-calculations.md
@@ -0,0 +1,176 @@
+# Power calculations
+
+Using R, we will do some power calculations.
+Necessary library: pwr, which loads with library(pwr).
+Necessary functions: pwr.t2n.test (unequal group sizes) and pwr.t.test (equal group sizes).
+See: https://www.statmethods.net/stats/power.html
+
+## Year 1, pessimistic projections
+With n_treatment = 20, n_control = 20, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 20
+n2 = 20
+d = 1.051997
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+## Year 1, optimistic projections
+With n_treatment = 30, n_control = 60, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 30
+n2 = 60
+d = 0.7328756
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+With n = ?, power = 0.9, sig.level = 0.05, minimal detectable effect = 0.5
+
+Two-sample t test power calculation
+
+n = 85.03128
+d = 0.5
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+NOTE: n is number in *each* group
+
+
+## Year 2, pessimistic projections
+With n_treatment = 40, n_control = 40, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 40
+n2 = 40
+d = 0.7339255
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+## Year 2, optimistic projections
+With n_treatment = 60, n_control = 120, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 60
+n2 = 120
+d = 0.5153056
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+
+## Year 3, pessimistic projections
+With n_treatment = 60, n_control = 60, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 60
+n2 = 60
+d = 0.5967207
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+## Year 3, optimistic projections
+With n_treatment = 90, n_control = 180, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 90
+n2 = 180
+d = 0.4200132
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+## Year 4, pessimistic projections
+With n_treatment = 80, n_control = 80, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 80
+n2 = 80
+d = 0.5156619
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+## Year 4, optimistic projections
+With n_treatment = 120, n_control = 240, power = 0.9, sig.level = 0.05, minimal detectable effect = ?
+
+t test power calculation
+
+n1 = 120
+n2 = 240
+d = 0.3633959
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+## Population necessary to detect an effect size of 0.2 with significance level = 0.05 and power = 0.9
+
+In the sections above, the free variable was d, the minimal detectable effect.
+With n = ?, power = 0.9, sig.level = 0.05, minimal detectable effect = 0.2
+
+Two-sample t test power calculation
+
+n = 526.3332
+d = 0.2
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+NOTE: n is number in *each* group
+
+Here the free variable is n, the population of the treatment group,
+so n = population of the treatment group = population of the control group
+necessary to detect an effect of 0.2.
+
+## Population necessary to detect an effect size of 0.5 with significance level = 0.05 and power = 0.9
+
+Two-sample t test power calculation
+
+n = 85.03128
+d = 0.5
+sig.level = 0.05
+power = 0.9
+alternative = two.sided
+
+NOTE: n is number in *each* group
+
+## Population necessary to detect an effect size of 0.2 with significance level = 0.10 and power = 0.9
+
+Two-sample t test power calculation
+
+n = 428.8664
+d = 0.2
+sig.level = 0.1
+power = 0.9
+alternative = two.sided
+
+NOTE: n is number in *each* group
+
+
+## Population necessary to detect an effect size of 0.5 with significance level = 0.10 and power = 0.9
+
+Two-sample t test power calculation
+
+n = 69.19719
+d = 0.5
+sig.level = 0.1
+power = 0.9
+alternative = two.sided
+
+NOTE: n is number in *each* group
+
+
+## Conclusions.
+Even after 4 years, under the most optimistic population projections (i.e., every participant answers our surveys every year, and 60 students who didn't get selected also do), we wouldn't have enough power to detect an effect size of 0.2 standard deviations with significance level = 0.05. However, it seems feasible to detect, within 3 years, the kinds of effects which would justify the upwards of $150,000 per year that ESPR costs.
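As a rough cross-check on the figures in this file, the sketch below uses the normal approximation to the two-sample t test rather than the noncentral-t computation that `pwr` performs, so its answers come out slightly smaller than the R output. The function names are illustrative and not part of the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def z_sum(sig_level=0.05, power=0.9):
    """Return z_(1 - alpha/2) + z_(power) for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - sig_level / 2) + nd.inv_cdf(power)


def minimal_detectable_effect(n1, n2, sig_level=0.05, power=0.9):
    """Approximate minimal detectable effect (Cohen's d) for group sizes n1, n2."""
    return z_sum(sig_level, power) * sqrt(1 / n1 + 1 / n2)


def n_per_group(d, sig_level=0.05, power=0.9):
    """Approximate size of each of two equal groups needed to detect effect d."""
    return 2 * (z_sum(sig_level, power) / d) ** 2


if __name__ == "__main__":
    # Year 4, optimistic projections: 120 treated, 240 controls.
    print(round(minimal_detectable_effect(120, 240), 3))
    # Population per group needed to detect d = 0.2.
    print(round(n_per_group(0.2)))
```

Under these assumptions the Year 4 optimistic case gives a minimal detectable effect of roughly 0.36, and detecting d = 0.2 takes roughly 525 people per group, close to the exact t-based figures reported in this file.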
+The minimum effect which justifies the costs of ESPR should be determined beforehand, as should the axis along which we measure. I would also suggest expanding the RCT to SPARC once its feasibility has been tested at ESPR.
diff --git a/ESPR-Evaluation/4-Measurements.md b/ESPR-Evaluation/4-Measurements.md
new file mode 100644
index 0000000..3bbc106
--- /dev/null
+++ b/ESPR-Evaluation/4-Measurements.md
@@ -0,0 +1,228 @@
+# Measurements
+
+## Difficulties
+
+The changes which ESPR could induce in the students are, in some sense, fuzzy and soft. There is some tension between measuring what is easiest to measure and measuring what we're actually interested in, and we firmly choose the latter. For example, when measuring openness, we don't care about questions such as:
+
+I see Myself as Someone Who...
+- Is original, comes up with new ideas
+- Is curious about many different things
+- Is ingenious, a deep thinker
+- Has an active imagination
+- Is inventive
+- Values artistic, aesthetic experiences
+- Prefers work that is routine
+- Likes to reflect, play with ideas
+- Has few artistic interests
+- Is sophisticated in art, music, or
+literature
+
+From John, O. P., & Srivastava, S. (1999). The Big-Five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), *Handbook of personality: Theory and research* (Vol. 2, pp. 102–138). New York: Guilford Press.
+
+Instead, we want to ask things such as:
+- When was the last time you tried out something new?
+- How often do you try something new?
+- How much have you explored vs specialized in the last year?
+- When was the last time you did something which you thought had a <=5% chance of succeeding?
+
+Note that the 2015 CFAR Longitudinal study takes a different approach:
+> "We relied heavily on existing measures which have been validated and used by psychology researchers, especially in the areas of well-being and personality.
+> These measures typically are not a perfect match for what we care about, but we expected them to be sufficiently correlated with what we care about for them to be worth using"
+
+For example, they used the questions written above, but they'd be insufficient to capture the effects of CoZE, one of the highest impact activities in a CFAR Workshop.
+
+## Things we want to measure.
+**- and ways to measure them.**
+
+Recommend a song which lasts roughly as long as it should take to complete the survey.
+
+Every time you lie or exaggerate, a kitten dies. By answering this survey, you help make the world a better place.
+
+If you find yourself fatigued by the length of the survey, feel free to take a break and come back. It is also preferable to skip to the end and turn in what you have rather than abandon the survey. Some questions are
+explicitly marked 'bonus' or 'optional', meaning they are especially skippable.
+
+1. Demographic information:
+Ask for consent for aggregation / doing a study on this. ✓
+Can we include your survey data in a public dataset? ✓
+Ask for the email. Followup survey. ✓
+Ikea: birthdate: dd.mm.yyyy + initials + first letter of the country you were born in.
+Age / Sex assigned at birth / Gender / Country (if many, the one you most identify with) / Ethnic group (most identify) / sexual orientation ✓
+
+1. Choices influenced by ESPR.
+- Average prestigiousness of the universities to which they apply / to which they get in.
+- % people who are not going to university.
+- Do you feel like you've made a life-changing choice in the last year?
+If you have: Write a brief tweet.
+- Do you feel like your life has significantly changed in the last year?
+If you have: Write a brief tweet.
+- Do you feel like the course of your life has significantly changed in the last year?
+If you have: Write a brief tweet.
+
+1. Self-Confidence / Modern Survival Skills.
+- I think I could do pretty well in a Zombie Apocalypse. ✓
+- It wouldn't bother me excessively if I woke up in a random city in the world with nothing but my clothes.
+
+1. Decisiveness.
+- To what extent do you agree with the statement: I am a decisive person.
+- To what extent would your friends agree with the statement: You are a decisive person.
+- When was the last time you did something which you thought had a <=5% chance of succeeding? Why did you attempt it? Also, describe it. ✓
+- To what extent do you struggle with doing things you've decided to do? ✓
+
+1. Openness to new experiences?
+- When was the last time you tried out something new? ✓
+- How often do you try something new?
+
+1. People, connections.
+- What is the approximate number of people who you interacted with in the past week?
+- What is the approximate number of people you'd be willing to confide in about something personal?
+- What is the approximate number of people who would let you crash at their place if you needed somewhere to stay?
+Their numerical responses were then capped at 300, log transformed, and averaged into a single measure of social support.
+
+1. Attitudes towards EA.
+There are no right or wrong answers. Our philosophical positions are very diverse, and even include Nietzschean philosophy.
+- Do you know what Effective Altruism is?
+  - Yes / No, but I've heard of it / No.
+- Do you self-identify as an Effective Altruist?
+- Has Effective Altruism caused you to make donations you otherwise wouldn't?
+- Do you expect Effective Altruism to cause you to make donations in the future which you otherwise won't?
+- If that is the case, what % of your earnings do you expect to donate to EA charities (like Against Malaria Foundation, Malaria Consortium, Schistosomiasis Control Initiative, Evidence Action's Deworm the World Initiative, GiveWell, 80,000 Hours, etc.) over your life?
+- What's your overall opinion of Effective Altruism?
+- If you had to distribute 1 billion dollars to different charities, on the basis of which criteria would you do it?
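The social-support composite described under "People, connections" above (cap at 300, log-transform, average) could be computed along these lines. The source does not say how answers of zero were handled or which log base was used, so `log(1 + x)` with the natural log is an assumption here, as are the function and argument names.

```python
from math import log1p

CAP = 300  # cap stated in the text


def social_support(interacted, confide, crash):
    """Average of the capped, log-transformed answers to the three questions."""
    answers = (interacted, confide, crash)
    if any(a < 0 for a in answers):
        raise ValueError("counts must be non-negative")
    # log1p(x) = ln(1 + x); assumed so that an answer of 0 stays defined.
    transformed = [log1p(min(a, CAP)) for a in answers]
    return sum(transformed) / len(transformed)
```

Capping before the log keeps one implausibly large answer (say, 10,000 people) from dominating the average, and the log compresses the difference between, for example, 50 and 100 contacts relative to the difference between 0 and 5.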
+
+1. Attitudes towards existential risk.
+- Are you familiar with the term "existential risk"?
+- Without searching the internet, looking at Wikipedia, etc., how would you describe the concept in a short tweet?
+- If you had heard about it, how much of a threat do you think it poses?
+- In percentage points, how likely do you judge it that your career will in some way be related to existential risk? And directly related?
+
+1. Attitudes towards AI Safety
+- Are you familiar with the field of AI Safety?
+- Have you read any papers related to the field?
+- If you knew about it beforehand, how much of a threat do you think it poses?
+- How would you describe the concept in a short tweet?
+- In percentage points, how likely do you judge it that your career will in some way be related to AI Safety? And directly related?
+
+
+**Software upgrade**
+
+1. Introspective power. Internal Design. Habits.
+- I understand myself. ✓
+- I have fiddled with the different parts of myself.
+- I work to change the parts of myself which I don't like. ✓
+- I purposefully create habits. ✓
+- When was the last time you did this?
+
+1. Position towards emotions.
+- Emotions as your allies.
+- To what extent do you agree with the following:
+- Emotions are my allies.
+- Emotions often give me useful information.
+- Emotions often hinder me.
+- I would prefer to feel less.
+- I often ignore my emotions.
+- I am in touch with my emotions.
+
+1. Life optimization
+- I have in place mechanisms for constant, iterated improvement of my life.
+- Write a short tweet about it.
+- Units of exchange: I often explicitly consider the tradeoffs between money, time, prestige, etc., when making decisions.
+- When was the last time you did that (if ever)?
+- Write a short tweet about it.
+- Think about your current set of skills, your habits, the things you spend your time on, how you interact with other people, the intellectual questions that you find engaging, the goals you’re aiming towards, and the challenges that you’re currently facing going forward. Next, think about how you were one year ago on each of these dimensions. How different are you now from how you were one year ago?
+- Not at all different / Slightly different / Somewhat different / Very different / Extremely different.
+- [optional] In about one tweet, what is one difference that stands out as being particularly large or significant?
+- Can you think of any changes that you’ve made in the past month to your daily routines or habits in order to make things go better? These can be tiny changes (e.g., adjusted the curtains on my bedroom window so that less light comes in while I’m sleeping) or large ones. Spend about 60 seconds recalling as many examples of these kinds of changes as you can and listing them here. (If you want to skip this question, leave it blank. If you spend the 60 seconds and no specific examples come to mind, write "none.")
+
+1. Mental illness.
+I actually don't care about the "post-ESPR depression".
+While having a mental illness sucks, there is no right or wrong answer. Some of the best people I know face depression, Asperger's, etc.
+- Have you been diagnosed with a mental illness?
+- Do you think you have a mental illness?
+- If so, which?
+
+1. Goal clarity
+With regard to my goals,
+- I know what my goals are.
+- I feel that different parts of myself are aligned.
+- I feel that the different parts of myself are more aligned than 1y ago.
+- When an internal conflict arises, I have adequate tools to resolve it.
+Note: I copied the first-person phrasing from somewhere else.
+
+1. Communication
+- I can nonviolently communicate with the people I care about.
+- When I talk to people, they perceive that I'm speaking in good faith.
+- I successfully assert my needs to others.
+- The last time I had a discussion, it was resolved gracefully.
+- When I debate with people, there is often a satisfying conclusion.
+
+1. Stupid mistakes.
+- How often do you make stupid mistakes?
+- When was the last stupid mistake you made?
+- Write a short tweet about it.
+- Did you implement any measures to avoid making that specific stupid mistake in the future?
+- If so, write a short tweet about it.
+
+1. Life satisfaction
+- How satisfied are you with your life as a whole?
+- To what extent do you agree with the statement: I am winning at life?
+- Stuckness: I feel like my life is stuck.
+
+1. Effective Approaches to Working on Projects
+When I decide that I want to do something (like doing a project, developing a new regular practice, or changing some part of my lifestyle), I …
+- plan out what specific tasks I will need to do to accomplish it.
+- try to think in advance about what obstacles I might face, and how I can get past them.
+- seek out information about other people who have attempted similar projects to learn about what they did.
+- end up getting it done.
+The four items were averaged into a single measure of effective approaches to projects.
+
+1. Probabilities / Calibration.
+- Are you comfortable using probabilities? Do you use them in your daily life?
+- To what extent do you agree with the following: Thinking in terms of probabilities is a valuable tool in my skill repertoire.
+- When was the last time you explicitly assigned a probability to something? Write a short tweet about it.
+
+- "Calibration" is the practice of knowing how certain you are, even when you're not certain.
+For example, a bookie who says they're 90% certain of the outcome of each of a hundred horse races, and who is right about ninety out of those hundred horse races, is perfectly calibrated.
+
+In these questions, you will be asked a question and then asked to give a calibration percent. The percent represents your probability that the answer is right. Suppose the question is "What country is the city of Paris located in?" and you are absolutely sure it is France. In that case, your calibration percent is 100 - you are 100% sure it is France.
+
+But suppose you think there's a fifty-fifty chance it's either France or Germany. In that case, you might still answer France, but your calibration percent is only 50 - you are only 50% sure it's France.
+
+Or suppose you have no idea, so you pick a country totally at random. In that case, you might think that if there are about one hundred possible countries, and it could be any of them, there's only about a 1% chance you're right. Therefore, you would put down a calibration percent of 1. Please answer on a scale from 0% (definitely false) to 100% (definitely true).
+
+- Are you smiling right now?
+- After each question: Without checking a source, estimate your subjective probability that the answer you just gave is correct.
+- Which is heavier, a virus or a prion?
+- I'm thinking of a number between one and ten, what is it?
+- What year was the fast food chain "Dairy Queen" founded? (Within five years)
+- Alexander Hamilton appears on how many distinct denominations of US Currency?
+- Without counting, how many keys are on a standard IBM keyboard released after 1986, within ten?
+- What's the diameter of a standard soccer ball, in cm, within 2?
+- How many calories are in a Reese's peanut butter cup, within 20?
+- What is the probability that supernatural events (including God, ghosts, magic, etc) have occurred since the
+beginning of the universe?
+- What is the probability that there is a god, defined as a supernatural intelligent entity who created the universe?
+- What is the probability that any of humankind's revealed religions is more or less correct?
+
+
+
+**LOOK OVER PAST YEAR POST-SURVEYS. LESS WRONG COMMUNITY SURVEY**
+**REMEMBER YOU HAVE NOTES AT THE END OF THE CFAR FILE**
+
+Heroic Narrativomancy
+Attachment Theory
+Internal Double Crux
+Romantic Epistemology
+Circling
+TAPs
+Bayesian Probability
+Defense Against the Dark Arts
+Happiness / Illusion of the Self
+Everything we teach at ESPR is wrong.
+Sum Math Class. Models.
+CoZE
+Seeking sensibility (Luke) = ?
+
+## Attribution
+I cribbed some questions from the 2016 LessWrong Diaspora Survey and the CFAR 2015 longitudinal study.