feat: savepoint

This commit is contained in:
Nuno Sempere 2023-03-10 19:07:21 +00:00
parent 88ce1d0ed7
commit f6f2b63696
17 changed files with 323 additions and 33 deletions

View File

@ -1,5 +1,5 @@
Why do social movements fail: Two concrete examples.
=====================================================
Status: Time-capped analysis.

View File

@ -1,4 +1,4 @@
A computable version of Solomonoff induction
==========================================
Thinking about [Just-in-time Bayesianism](https://nunosempere.com/blog/2023/02/04/just-in-time-bayesianism/) a bit more, here is a computable approximation to Solomonoff Induction, which converges to the Turing machine generating your trail of bits in finite time. <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script> <!-- Note: to correctly render this math, compile this markdown with /usr/bin/markdown -f fencedcode -f ext -f footnote -f latex $1 where /usr/bin/markdown is the discount markdown binary https://github.com/Orc/discount http://www.pell.portland.or.us/~orc/Code/discount/ -->

View File

@ -1,81 +1,157 @@
Use of &ldquo;I'd bet&rdquo; on the EA Forum is mostly metaphorical
==========================================================
Epistemic status: much ado about nothing.
**tl;dr**: I look at people saying "I'd bet" on the EA Forum. I find that they mostly mean this metaphorically. I suggest reserving the word "bet" for actual bets, offer to act as a judge for the next 10 bets that people ask me to judge, and mention that I'll be keeping an eye on people who offer bets on the EA Forum to consider taking them. Usage of the construction "I'd bet" is a strong signal of belief only if it is occasionally tested, and I suggest we make it so.
Inspired by [this manifold market created by Alex Lawsen](https://manifold.markets/AlexL/has-nuno-sempere-got-an-automated-s)—which hypothesized that I have an automated system to detect where people offer bets—and by [this exchange](https://forum.effectivealtruism.org/posts/xBfp3HGapGycuSa6H/i-m-less-approving-of-the-ea-community-now-than-before-the?commentId=Gafn8GSkTge6q6ezf#46wAwgBFBmHBmoLt4)—where someone said "I would bet all the money I have (literally not figuratively) that X" and then didn't accept one such bet—I wrote a small script[^1] to search for instances of the word "bet" on the EA Forum:
```
$ data='{ "query": "{ comments( input: {terms: {limit: 5000}} ){ results{ postId, htmlBody, postedAt } } }" }'
$ response="$(curl 'https://forum.effectivealtruism.org/graphql/' \
-X POST -H 'content-type: application/json' --data "$(echo $data)")"
$ echo "$response" | jq .data.comments.results |
jq 'map(select(.htmlBody | contains(" bet ")))' |
jq 'reverse' | ack --passthru ' bet '
```
That script fetches the last 5k comments from the EA Forum, and searches for instances of the string " bet ".
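For a quick tally rather than a full listing, the response can also be reduced to a count. The snippet below is a rough sketch: it counts occurrences of the string rather than distinct comments, and the inline `response` is a tiny illustrative stand-in for the JSON that the script above actually fetches.

```shell
# Count occurrences of " bet " in the raw response. This is a rough upper
# bound on the number of matching comments, since one comment can contain
# several matches. The response below is an illustrative stand-in.
response='{"data":{"comments":{"results":[{"htmlBody":"I would bet on it"},{"htmlBody":"no wagers here"}]}}}'
n="$(printf '%s' "$response" | grep -o ' bet ' | wc -l | tr -d ' ')"
echo "$n"
```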
### 29 usages of the word "bet" in the last 5k comments.
In the last 5k comments, 29 mention the word "bet". I'll just paste them here, and then give some thoughts.
> I'd **bet** that the TEMS utility is lower than the aggregate TEMS costs, especially if you account for TEMS EOL costs in novel ways.
> ---
> I'll **bet** those victims didn't do so well on an IQ test the day after their lives were destroyed.
> ---
> ...if you're feeling brave you can even decide to just look at the variants in some small subset of genes that you **bet** are more likely to have a role in determining your trait, but even after those extra assumptions, you still have many candidate variants.
> ---
> I appreciate pushback. I don't think this issue is being brigaded. If it were we'd see a load of chat on twitter. I think that explains too much. I just would **bet** it isn't the case. Some people are making new burners but I'd guess that's it.
> ---
> We want to make bets on expertises that could come in handy in unpredictable ways. (CoI: I'm making one such **bet** myself as we speak!).
> ---
> This isn't something we need to centrally decide, you can just... start doing it. Start a fund that makes decisions democratically! I **bet** you'd get funding from the EAIF even.
> ---
> I assume these "trained raters" were grad students who had thought about the problem for a couple days or something, and I **bet** that if you actually genuinely studied this you could get good at it, but probably very few people are in that reference class.
> ---
> We don't really know where on subjective wellbeing scales people construe wellbeing to go from positive to negative. My best **bet** is around 2.5 on a 0 to 10 scale.
> ---
> Yeah I'd **bet** this is true. I think there are tradeoffs here though (and I have also talked to women who like the status quo and I assume men do). It's not clear to me that the obvious path forward is.
> ---
> I think I'm gonna make a **bet** on a skillset that isn't correlated with the rest of the movement
> ---
> I think it's probably disproportionately common for the times when your actions were followed by bad outcomes (even if that wasn't caused by your action, or was you making a good **bet** but getting unlucky) to become visible and salient.
> ---
> I don't necessarily endorse this but I think it's valuable for us to know what we think on stuff like this - I **bet** we disagree a surprising amount.
> ---
> Personally I am able to take healing from this post, and I **bet** I'm not the only one who is finding something positive to take out of it
> ---
> We don't know that any complainants would have been paying enough attention to notice announcements, but I **bet** at least some would have, and the Time journalist could have noticed at least.
> ---
> I would **bet** this is a big part of "being a good researcher". Writing this as someone who isn't doing an AI PhD.
> ---
> I wouldn't be surprised if the situation is quite a bit different in authoritarian regimes like China and Russia, where I **bet** there's much less animal liberation activity and, in China at least, a lot more state capacity to crack down on biohazardous practice
> ---
> I'd place a **bet** that the majority of the people who are concerned about this commitment know their content, and that the majority of the people who support it don't.
> ---
> Because IMO -isms are not going to erode this community - I would **bet** almost everyone here is against them, a lot of us actively. But losing EA values - good epistemic, avoiding virtue signalling and poseuring, avoiding group-think, etc. might.
> ---
> I would like for all involved to consider this, basically, a **bet** on "making and publishing this pledge" being an effective intervention on ... something. I'm not sure whether the something is "actual racism and sexism and other bigotry within EA," or "the median EA's discomfort at their uncertainty about whether racism and sexism are a part of EA," or what. But (in the spirit of the E in EA) I'd like that **bet** to be more clear, so since you were willing to leave a comment above: would you be willing to state with a little more detail which problem this was intended to solve, and how confident you (the group involved) are that it will be
a good intervention?
> ---
> Although I can't say for sure, I would also **bet** that there's dozens of unofficial rationalist events (and a few unofficial EA events) that he attended in the last five years, given that he was literally hanging out in the miri/cfar reception area for hours per week, right until the time he was officially banned.
> ---
> Also, for Nuno I'll ask when OP gonna let us **bet** against them.
> ---
> There's a phenomenon where a gambler places their money on 32, and then the roulette wheel comes up 23, and they say "I'm such a fool; I should have **bet** 23". More useful would be to say "I'm such a fool; I should have noticed that the EV of this gamble is negative." Now at least you aren't asking for magic lottery powers. Even more useful would be to say "I'm such a fool; I had three chances to notice that this **bet** was bad: when my partner was trying to explain EV to me; when I snuck out of the house and ignored a sense of guilt; and when I suppressed a qualm right before placing the bet. I should have paid attention in at least one of those cases and internalized the arguments about negative EV, before gambling my money." Now at least you aren't asking for magic cognitive powers. My impression is that various EAs respond to crises in a manner that kinda rhymes with saying "I wish I had **bet** 23", or at best "I wish I had noticed this **bet** was negative EV"
> ---
> Let's say you have to **bet** on whether a machine will turn on or off, however the machine is a 'frustrator', once you **bet** that it will turn off it will turn on and vise versa. You aren't predicting the outcome (you literally can't) you are causin the opposite outcome. Humans are the same, if I see that someone predicts I will do X, I might do Y just to assert my autonomy (or maybe not because of mindgames). So let's say a new prediction market opens on the machine frustrator, no one wants to be the first to make a bet, since you're guaranteed to lose money if no one else participates. Even if people are irrational enough to **bet** on it, we don't learn anything new and we just wasted a bunch of electricity and manhours for nothing. It's a waste of time and money to **bet** on frustrators, especially since you can get a 10% return on the stockmarket.
> ---
> When the president promises not to manipulate the market about event X some people will trust him 90% and **bet** accordingly while others will trust him 10% and **bet** according to that. But that's not the same thing as people having 10% credence event X will happen, it can be that you think the president is trustworthy but the event itself is unlikely [...]
> ---
> Compared to other interventions I would not **bet** a substantial amount on proofs for deep learning systems given its important, neglectedness, and tractability
> ---
> The large underclass has almost no money to **bet** that it will, while the small upperclass bets a large chunk of their money that it won't. Predictably, more money is betted on it not increasing welfare and when the market closes, everyone gets their money back and the government decides not to implement it.
> ---
> I'd **bet** you that multiple EA communities and orgs improved or are working on improving their processes after reading your post (and the comments as by EA NYC). Community builders talk (as you know) so if I had to bet, I'd **bet** changes would have reached the EA Forum (for grassroots implementation) or the CEA Groups Team (for top-down implementation) within a few months of now.
> ---
> If you look at their notable recipients, and imagine a similar cohort producing a similar amount of impact in a less capitalistic direction, it seems at least conceivable that such a **bet** could be well worth it. That's my 2cts.
> ---
> However, maybe I'm blindspotted, but I can't find a better topic to **bet** on - would solve all problems solvable with resources I don't think I can find a non-emotional way to convince people to switch from we should not search to we should search (for infinite energy).
### Thoughts on this status quo
@ -90,17 +166,17 @@ I think it would be better if there was some way of conveying enough belief in a
To that effect, here are some proposed improvements:
- If you would like to offer a bet for real, I would suggest...
- That you propose an operationalization. This can be a clear, unambiguous indicator, e.g., "who will be the president of the US in 2025 according to Wikipedia", "what will US mortality be according to Our World In Data in 2030". Or it can be an approximate or somewhat ambiguous resolution, such that there is some room for noise, and accept that sometimes the resolution will be wrong. For example, "we will solve this bet by mutual accord", "this bet is resolved by Nu&ntilde;o's best guess as to whether life in 2030 is better for the median American than life in 2025", "this bet resolves positively if three of these five indicators are positive". The core thing here is that there can be some ambiguity, but also a method for arriving at a resolution despite that ambiguity.
- That you propose the odds and the amount you'd be willing to bet. A $100:$10k bet isn't the same as a $10:$10 bet. This helps people consider whether the effort to engage is worth it.
- On my side:
- I am making the following offer: for the amount of $5 for each resolution, I will judge the next 10 bets of more than $100 that people ask me to judge[^2].
- I will be listening for instances of the "bet" on the EA Forum, and will consider taking the bets that I think are wrong or uncalibrated.
- Additionally, I would find it convenient if people reserved the word "bet" for situations where they are willing to bet, and otherwise used phrases like "I guess that", "I am confident in", "I imagine that", "it seems to me that", "I believe quite strongly that", "I am taking a chance on", and so on. This would make it easier for me to programmatically search the EA Forum for bet offers.
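To make that programmatic search concrete, here is a minimal sketch of the kind of filter I have in mind. The heuristic (requiring a dollar amount near the word "bet") is a hypothetical illustration, not a system I actually run.

```shell
# Hypothetical heuristic: treat text as a real bet offer only if it mentions
# "bet" together with an explicit dollar amount.
looks_like_offer() {
  printf '%s' "$1" | grep -qi 'bet' && printf '%s' "$1" | grep -Eq '\$[0-9]+'
}

looks_like_offer 'I would bet my $60 against your $30 that X' && echo "offer"
looks_like_offer 'I would bet this is true' || echo "metaphorical"
```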
One mildly subtle point here is that the more often people end up making bets when one is offered, the stronger the signal of offering a bet is. Conversely, if people who offer bets turn out to have meant it metaphorically, never agree on a resolution, or initially meant it but then backed out, then there is less point in trying to accept someone's bet, and the signal of offering a bet is weaker.
## Guidelines in action
### Example transformations
Together, these guidelines would change:
@ -112,7 +188,7 @@ to either
or to
> I strongly believe that Napoleon would agree with me on X
or
@ -126,6 +202,19 @@ or
> .... I'd bet my $60 against your $30 that if we did a survey on Positly of at least 50 people, we'd find that the average neutral point is between 2 and 4 on a 0 to 10 scale.
<br>
Anyways, that's all I have for today. I might post this in the EA forum in a few days. In the meantime, comments are welcome.
PS: You can subscribe to these posts [here](https://nunosempere.com/.subscribe/).
[^1]: Readers who think this is cool may want to learn more about Linux, and then maybe buy a Linux computer next time they are switching laptops. I'd recommend a [Lenovo Thinkpad](https://www.lenovo.com/us/en/d/linux-laptops-desktops/).
[^2]: To be clear, I think that, e.g., spending an hour researching the outcome of a bet costs me way more than $5, and I am providing this pretty much as a public service. The $5 are meant so that a) people get into the habit of paying for resolutions, which I think makes them more sustainable, and b) together with the $100 limit, so that I don't get summoned for trivial bets.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>

View File

@ -0,0 +1,122 @@
Winners of the Squiggle Experimentation and 80,000 Hours Quantification Challenges
==================================================================================
In the second half of 2022, we of [QURI](https://quantifieduncertainty.org/) announced the [Squiggle Experimentation Challenge](https://forum.effectivealtruism.org/posts/ZrWuy2oAxa6Yh3eAw/usd1-000-squiggle-experimentation-challenge) and a [$5k challenge to quantify the impact of 80,000 hours' top career paths](https://forum.effectivealtruism.org/posts/noDYmqoDxYk5TXoNm/usd5k-challenge-to-quantify-the-impact-of-80-000-hours-top). For the first contest, we got three long entries. For the second, we got five, but most were fairly short. This post presents the winners.
## [Squiggle Experimentation Challenge](https://forum.effectivealtruism.org/posts/ZrWuy2oAxa6Yh3eAw/usd1-000-squiggle-experimentation-challenge)
### Objectives
From the [announcement post](https://forum.effectivealtruism.org/posts/ZrWuy2oAxa6Yh3eAw/usd1-000-squiggle-experimentation-challenge): 
> [Our] team at QURI has [recently released Squiggle,](https://forum.effectivealtruism.org/posts/TfPdb2aMKzgWXFvc3/announcing-squiggle-early-access) a very new and experimental programming language for probabilistic estimation. We're curious about what promising use cases it could enable, and we are launching a prize to incentivize people to find this out.
### Top Entries
[**@Tanae**](https://forum.effectivealtruism.org/users/tanae?mention=user)**'s** [**Adding Quantified Uncertainty to GiveWell's Cost-Effectiveness Analysis of the Against Malaria Foundation**](https://forum.effectivealtruism.org/posts/4Qdjkf8PatGBsBExK/adding-quantified-uncertainty-to-givewell-s-cost)
Tanae adds uncertainty estimates to each step in GiveWell's estimate for AMF in the Democratic Republic of Congo, and ends up with this endline estimate for lives saved (though not other effects):
<img src="https://i.imgur.com/QBAO2Ui.png" class='.img-medium-center'>
[**@drwahl**](https://forum.effectivealtruism.org/users/drwahl?mention=user)**'s** [**Cost-effectiveness analysis for the Lead Exposure Elimination Project in Malawi**](https://forum.effectivealtruism.org/posts/BK7ze3FWYu38YbHwo/squiggle-experimentation-challenge-cea-leep-malawi)
Dan creates a probabilistic estimate for the effectiveness of the Lead Exposure Elimination Project in Malawi. In the process, he gives some helpful, specific improvements we could make to [Squiggle](https://www.squiggle-language.com/). In particular, his feedback motivated us to make Squiggle faster, first from part of his model not being able to run, then to his model running in 2 mins, then in 3 to 7 seconds.
[**@Erich\_Grunewald**](https://forum.effectivealtruism.org/users/erich_grunewald?mention=user)**'s** [**How many EA billionaires five years from now?**](https://forum.effectivealtruism.org/posts/Ze2Je5GCLBDj3nDzK/how-many-ea-billionaires-five-years-from-now) 
Erich creates a Squiggle model to estimate the number of future EA billionaires. His estimate looks like this:
<img src="https://i.imgur.com/3Hq9KuH.png" class='.img-medium-center'>
That is, he is giving a 5-10% probability of negative billionaire growth, i.e., of losing a billionaire, as, in fact, happened. In hindsight, this seems like a neat example of quantification capturing some relevant tail risk. 
Perhaps if people had looked to this estimate when making decisions about earning to give or personal budgeting in light of FTX's largesse, they might have made better decisions. But this particular estimate wasn't incorporated into the way people made choices. Rather, my impression is that it was posted on the EA Forum and then forgotten about. Perhaps it would have required more work and vetting to make it useful.
### Results
| Entry | Estimated relative value (normalized to 100%) | Prize |
|-----------------------------------------------------------------------------------------------------------|------------------------------------------------|-------|
| Adding Quantified Uncertainty to GiveWell's Cost Effectiveness Analysis of the Against Malaria Foundation | 67% | $600 |
| CEA LEEP Malawi | 26% | $300 |
| How many EA Billionaires five years from now? | 7% | $100 |
Judges were Ozzie Gooen, Quinn Dougherty, and Nuño Sempere. You can see our estimates [here](https://docs.google.com/spreadsheets/d/1-8NguS_DWEhaxn-q7lf8KSsV3OGEeQYq-VWRFvd2QcA/edit). Note that per the contest rules, we judged these prizes before October 1, 2022—so before the downfall of FTX, and winners received their prizes shortly thereafter. Previously I mentioned the results in [this edition](https://forecasting.substack.com/p/forecasting-newsletter-september-57b) of the Forecasting Newsletter.
## [$5k challenge to quantify the impact of 80,000 hours' top career paths](https://forum.effectivealtruism.org/posts/noDYmqoDxYk5TXoNm/usd5k-challenge-to-quantify-the-impact-of-80-000-hours-top)
### Objectives
With this post, we hoped to elicit estimates that could be built upon to estimate the value of 80,000 Hours' [top 10 career paths](https://80000hours.org/career-reviews/#our-priority-paths). We were also curious about whether participants would use Squiggle or other tools when given free rein to choose their tools.
### Entries
[**@Vasco Grilo**](https://forum.effectivealtruism.org/users/vascoamaralgrilo?mention=user)**'s** [**Cost-effectiveness of operations management in high-impact organisations**](https://forum.effectivealtruism.org/posts/LWN6qFhCtPDEJJpeG/cost-effectiveness-of-operations-management-in-high-impact)
Vasco Grilo looks at the cost-effectiveness of operations, first looking at various ways of estimating the impact of the EA community and then sending a brief survey to various organizations about the “multiplier” of operations work, which is, roughly, the ratio of the cost-effectiveness of one marginal hour of operations work to the cost-effectiveness of one marginal hour of their direct work. He ends up with a pretty high estimate for that multiplier, of between ~4.5 and ~13.
[**@10xRational**](https://forum.effectivealtruism.org/users/10xrational?mention=user)**'s** [**Estimating the impact of community building work**](https://forum.effectivealtruism.org/posts/gKywAZWEWe4WWfoEx/quantitatively-estimating-the-impact-of-working-in-community)
@10xRational gives fairly granular estimates of the value of various community-building activities in terms of first-order effects of more engaged EAs, and second-order effects of more donations to effective charities and more people working on 80,000 Hours' top career paths. @10xRational ends up concluding that 1-on-1s are particularly effective.
[**@charrin**](https://forum.effectivealtruism.org/users/charrin?mention=user)**'s** [**Estimating the Average Impact of an ARPA-E Grantmaker**](https://forum.effectivealtruism.org/posts/ydMnQtgptfvEZHRHy/estimating-the-average-impact-of-an-arpa-e-grantmaker)
@charrin looks at the average impact of an ARPA-E grantmaker, in terms of how much money they have influence over, and what the value of their projects—lowballed as their market cap—is. The formatting was bare-bones, but I thought this was valuable because of its concreteness.
[**@Joel Becker**](https://forum.effectivealtruism.org/users/joel-becker?mention=user)**'s** [**Quantifying the impact of grantmaking career paths**](https://forum.effectivealtruism.org/posts/3tR7gpqYWzByPDwqL/quantifying-the-impact-of-grantmaking-career-paths)
Joel looks at the impact of grantmaking career paths, and decomposes the problem into the probability of getting a job, the money under management, and the counterfactual improvement. He then applies adjustments for non-grantmaking impact, and then translates his numbers to basis points of existential risk averted. A headline number is “a mean estimate of $5.7m for the Open Philanthropy-equivalent resources counterfactually moved by grantmaking activities of the prospective marginal grantmaker, conditional on job offer.”
[**@Duncan Mcclements**](https://forum.effectivealtruism.org/users/duncan-mcclements-1?mention=user)**'s** [**Estimating the marginal impact of outreach**](https://forum.effectivealtruism.org/posts/yeMzJATjqxLioGM6K/estimating-the-marginal-impact-of-outreach)
Duncan fits a standard economics model, in R, to estimate the impact of outreach. According to his assumptions, he concludes that
> “these results counterintuitively imply that the current marginal individual would be having substantially higher marginal impact working to expand effective altruism than working on maximising the reduction in existential risk today, with 99.7% confidence”. 
Note that 99.7% confidence is the probability given by the model. And one disadvantage of that econ-flavoured approach is that most of the probability of the conclusion going the other way will come from model error.
### Results
| Entry | Estimated relative value (normalized to 100%) | Prize (scaled up to $5k) |
|--------------------------------------------------------------------------|------------------------------------------------|-------------------------------------------------|
| Cost-effectiveness of operations management in high-impact organisations | 25% (25.0636%) | $1253 (25.0636% of $5k, to the nearest dollar) |
| Estimating the impact of community building work | 24% | $1214 |
| Estimating the Average Impact of an ARPA-E Grantmaker | 22% | $1094 |
| Quantifying the impact of grantmaking career paths | 18% | $912 |
| Estimating the marginal impact of outreach | 11% | $528 |
Judges for this challenge were Alex Lawsen, Sam Nolan, and Nuño Sempere. They each gave their relative estimates, which can be seen [here](https://docs.google.com/spreadsheets/d/1ga9sP1A3dGdKwcnVQ2-HKVBB3kHqI16hB71muUuwW-k/edit#gid=1803722829), and these were averaged to determine what proportion of the prize each participant should receive. We've recently contacted the winners, and they should be receiving their prizes in the near future.
With most of these posts, we appreciated the estimation strategies, as well as the initial estimation attempts. But we generally thought that the posts were far from complete estimates, and there is still much work between now and estimating the relative or absolute values of 80,000 Hours' top career paths in a way which would be decision-relevant.
## Lessons learnt
### From the estimates themselves
From the estimates for the Squiggle experimentation prize, we got some helpful comments that we used to make Squiggle better and faster. I also thought that @Erich\_Grunewald's [How many EA billionaires five years from now?](https://forum.effectivealtruism.org/posts/Ze2Je5GCLBDj3nDzK/how-many-ea-billionaires-five-years-from-now) was ultimately a good example of quantified estimates capturing some tail risk.
Judging from the entries to the 80,000 Hours estimation challenge, we probably underestimated the difficulty of producing comprehensive estimates for 80,000 Hours' top career paths. The strategies the submissions proposed nonetheless remain ingenious and valid, and they could be built upon.
### On expected participation for estimation prizes
Participation for both prizes was relatively low, even though the expected monetary prize seemed pretty high. Both challenges got around 1k views on the EA Forum (1139 views for the Squiggle experimentation prize and 1690 for the 80,000 hours quantification challenge). They were also advertised in the Forecasting Newsletter, on Twitter, and on relevant Discords. The Squiggle experimentation prize was additionally advertised in the [Squiggle announcement post](https://forum.effectivealtruism.org/posts/TfPdb2aMKzgWXFvc3/announcing-squiggle-early-access).
My sense is that similar contests with similar marketing should expect a similar number of entries.
Note also that the prizes were organized when EA had comparatively more money, due to FTX.
### On judging and reward methods
For the first prize, we asked judges to estimate relative values. Then we converted these to our predetermined prize amounts. I thought that this was inelegant, so for the second prize, we instead scaled the prizes in proportion to the estimated value of the entries.
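As a sketch of that second scaling rule, using the rounded percentages from the results table above (so the dollar figures are approximate, not the exact payouts):

```shell
# Scale a $5k pot in proportion to the judges' estimated relative values.
# The percentages below are the rounded ones from the results table.
pot=5000
prizes="$(printf '25 24 22 18 11' |
  awk -v pot="$pot" '{ for (i = 1; i <= NF; i++) s += $i
                       for (i = 1; i <= NF; i++) printf "%.0f\n", $i / s * pot }')"
echo "$prizes"
```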
A yet more elegant method might be to have a variably-sized pot that scales with the estimated value of the submissions. This, for example, does not penalize participants for telling other people about the prize, as a fixed-pot prize does. It's possible that we might try that method in subsequent contests. One possible downside is that it adds some uncertainty for participants, but that uncertainty can be mitigated by giving clear examples and their corresponding payout amounts.
It remains unclear whether a more incentive-compatible prize design meaningfully improves the outcome of a prize. It might for larger contests, though, so thinking about it doesn't seem completely useless.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>

Binary file not shown.


View File

@ -0,0 +1,25 @@
What happens in Aaron Sorkin's *The Newsroom*
=============================================
WILL MACAVOY is an aging news anchor who, together with his capable but amoral executive producer, DON KEEFER, is creating a news show that optimizes for viewership, sacrificing newsworthiness and journalistic honour in the process. Unsatisfied with this, his boss, CHARLIE SKINNER, hires MacAvoy's idealistic yet supremely capable ex-girlfriend, MACKENZIE MCHALE, to be the new executive producer. She was recently wounded in Afghanistan and is physically and mentally exhausted, but SKINNER is able to see past that, trust his own judgment, and make a bet on her.
Over the course of three seasons, MACKENZIE MCHALE imprints her idealistic and principled journalistic style on an inexperienced news team, whom she mentors and cultivates. She also infects MACAVOY and DON KEEFER, who, given the chance, also choose to report newsworthy events over populistic gossip. All the while, CHARLIE SKINNER insulates that budding team from pressure from the head honchos to optimize for views and to not antagonize powerful political figures, like the Koch brothers. His power isn't infinite, but it is ENOUGH to make the new team, despite trials and tribulations, flourish.
-><iframe title="You know how? We just decided to." src="https://video.nunosempere.com/videos/embed/e8cc8e0b-1605-41fc-a809-27b294197d23" allowfullscreen="" sandbox="allow-same-origin allow-scripts allow-popups" width="1200" height="700" frameborder="0"></iframe><-
Towards the end of the series, the work of the underlings ends up convincing the head honchos, LEONA and REESE LANSING, that having news reporting that is not crap is something that they, too, desire, and that they are willing to sacrifice some profits to nourish. This becomes relevant when the parent company confronts a hostile takeover, and the LANSINGS have to make a conscious choice to exert their efforts to preserve their news division, which they have come to cherish as a valuable public good.
A theme of the series is that ignoring flawed incentives and the siren call of cynicism, and instead doing the right thing, is a choice that people in positions of power have the ability to make and propagate.
Inspired by:
- [What Happens in Batman Begins](http://www.aaronsw.com/weblog/batmanbegins)
- [What Happens in The Dark Knight](http://www.aaronsw.com/weblog/tdk)
- [What Happens in The Dark Knight Rises](http://www.aaronsw.com/weblog/tdkr)
- [and more](http://www.aaronsw.com/weblog/fullarchive)
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>

View File

@ -0,0 +1,63 @@
Estimation for sanity checks
============================
I feel very warmly about using relatively quick estimates to carry out sanity checks, i.e., to quickly check whether something is clearly off, whether some decision is clearly overdetermined, or whether someone is just bullshitting. This is in contrast to Fermi estimates, which aim to arrive at an estimate for a quantity of interest, and which I also feel warmly about but which aren't the subject of this post. In this post, I explain why I like quantitative sanity checks so much, and I give some examples.
### Why I like this so much
I like this so much because:
- It is very defensible. There are some cached arguments against more quantified estimation, but sanity checking cuts through most—if not all—of them. "Oh, well, I just think that estimation has some really nice benefits in terms of sanity checking and catching bullshit, and in particular in terms of defending against scope insensitivity. And I think we are not even at the point where we are deploying enough estimation to catch all the mistakes that would be obvious in hindsight after we did some estimation" is both something I believe and also just a really nice motte to retreat to when I am tired, don't feel like defending a more ambitious estimation agenda, or don't want to alienate someone socially by having an argument.
- It can be very cheap: a few minutes, a few Google searches. This means that you can practice quickly and build intuitions.
- These checks are useful, as we will see below.
### Some examples
#### Photo Patch Foundation
The [Photo Patch Foundation](https://photopatch.org/) is an organization which has received a [small amount of funding](https://www.openphilanthropy.org/grants/photo-patch-foundation-general-support-2019/) from Open Philanthropy:
> Photo Patch has a website and an app that allows kids with incarcerated parents to send letters and pictures to their parents in prison for free. This diminishes barriers, helps families remain in touch, and reduces the number of children who have not communicated with their parents in weeks, months, or sometimes years.
It takes [little digging](https://donorbox.org/patching-relationships-with-letters-photos-2) to figure out that their costs are $2.5/photo. If we take the [AMF numbers at all seriously](https://forum.effectivealtruism.org/posts/4Qdjkf8PatGBsBExK/adding-quantified-uncertainty-to-givewell-s-cost), it seems very likely that this is not a good deal. For example, for $2.5 you can deworm several kids in developing countries, or buy [a bit more](https://www.againstmalaria.com/DollarsPerNet.aspx) than one malaria net. Or, less intuitively, trading a 0.5% chance of saving a statistical life for sending a photo to a prisoner seems like a pretty bad trade.
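A back-of-the-envelope version of that comparison (the ~$2/net figure is a rough reading of the AMF page linked above; treat it as an assumption):

```python
# Compare the cost of sending one photo with a ballpark cost of a malaria
# net. $2.5/photo is from Photo Patch's donation page; ~$2/net is an
# approximate assumed figure, used here only for a sanity check.
cost_per_photo = 2.50
cost_per_net = 2.00  # assumption: approximate dollars per malaria net
nets_forgone_per_photo = cost_per_photo / cost_per_net
print(nets_forgone_per_photo)  # each photo costs about 1.25 nets
```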
One can then do [somewhat more elaborate estimations](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal) about criminal justice reform.
#### Sanity-checking that supply chain accountability has enough scale
At some point in the past, I looked into [supply chain accountability](https://forum.effectivealtruism.org/posts/ME4zE34KBSYnt6hGp/new-cause-proposal-international-supply-chain-accountability), a cause area related to improving how multinational corporations treat labor. One quick sanity check is, well, how many people does this affect? You can check, and per [here](https://static.inditex.com/annual_report_2021/es/documentos/informe-de-gestion-integrado-2021.pdf)[^1], Inditex—a retailer which owns brands like Zara, Pull&Bear, Massimo Dutti, etc.—employed 3M people in its supply chain, as of 2021.
So the scale is large enough that this may warrant further analysis. Once this simple sanity check is passed, one can then go on and do some more complex estimation about how cost-effective improving supply chain accountability is, like [here](https://www.getguesstimate.com/models/14645).
[^1]: Sum the number of people in p. 391-392 [here](https://static.inditex.com/annual_report_2021/es/documentos/informe-de-gestion-integrado-2021.pdf): 19546 + 49647 + 90363 + 383032 + 435469 + 845778 + 134970 + 92146 + 652808 + 381607 + 8499 + 4989 = 3098854.
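The footnote's arithmetic is easy to verify programmatically:

```python
# Inditex supply-chain headcounts, as summed in the footnote from
# pp. 391-392 of the 2021 annual report.
headcounts = [19546, 49647, 90363, 383032, 435469, 845778,
              134970, 92146, 652808, 381607, 8499, 4989]
total = sum(headcounts)
print(total)  # 3098854, i.e., roughly 3M people
```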
#### Sanity checking the cost-effectiveness of the EA Wiki
In my analysis of the EA Wiki, I calculated how much the person behind the EA Wiki [was being paid per word](https://forum.effectivealtruism.org/posts/kTLR23dFRB5pJryvZ/external-evaluation-of-the-ea-wiki#Costs_per_word_compared_to_other_industries), and found that it was in the ballpark of other industries. If it had been egregiously low, my analysis could have been shorter, and maybe concluded that this was a really good bargain. If the amount had been egregiously high, maybe I would have had to dig in about why that was.
As it was, the sanity check was passed, and I went on to look at [other considerations](https://forum.effectivealtruism.org/posts/kTLR23dFRB5pJryvZ/external-evaluation-of-the-ea-wiki#Evaluating_outcomes).
#### Optimistic estimation for early causes
Occasionally, I've seen some optimistic cost-effectiveness estimates by advocates of a particular cause area or approach (e.g., [here](https://forum.effectivealtruism.org/posts/CcNY4MrT5QstNh4r7/cost-effectiveness-of-foods-for-global-catastrophes-even), [here](https://forum.effectivealtruism.org/posts/HqEmL7XAuuD5Pc4eg/evaluating-strongminds-how-strong-is-the-evidence), or [here](https://forum.effectivealtruism.org/posts/XpeamS2yTNhagxAip/remote-health-centers-in-uganda-a-cost-effective)). One possible concern here is that because it's the advocates who are doing these cost-effectiveness estimates, they might be biased upwards. But even if they are biased upwards, they are not completely uninformative: they show that there is at least some set of assumptions and parameters, chosen by someone who is trying their best, under which the proposed intervention looks great. And then further research might reveal that the initial optimism is or isn't warranted. But that first hurdle isn't trivial.
#### Other examples
- You can see the revival of LessWrong pretty clearly if you look at the [number of votes per year](https://i.imgur.com/sPA5IAZ.png). Evaluating the value of that revival is much harder, but a first sanity check is to see whether there was any revival at all.
- When evaluating small purchases, sometimes the cost of the item is much lower than the cost of thinking about it, or the cost of the time one would spend using the item (e.g., for me, the cost of a hot chocolate is smaller than the cost of sitting down to enjoy a hot chocolate). I usually take this as a strong sign that the price shouldn't be the main consideration for those types of purchase, and that I should remember that I am no longer a poor student.
- Some causes, like neglected diseases, are not going to pass a cost-effectiveness sanity check, because they affect too few people.
- If you spend a lot of time in front of a computer, or having calls, the cost of better computer equipment and a better microphone is most likely worth it. I wish I'd internalized this sooner.
- Raffles and lotteries (e.g., "make three forecasts and enter a lottery to win $300", or "answer this survey to enter a raffle to win $500") are usually not worth it, because they don't reveal the number of people who enter, and it's usually very high.
- etc.
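For instance, the raffle point above can be made concrete with a quick expected-value calculation (the entrant count and time cost are assumptions, since raffles rarely reveal their numbers):

```python
# Expected winnings from entering a "make three forecasts, win $300"
# lottery, under an assumed number of entrants. That the entrant count
# is hidden and usually high is the crux of why these tend not to be
# worth it.
prize = 300
entrants = 2000           # assumption: plausibly in the thousands
minutes_to_enter = 10     # assumption: time to make three forecasts
expected_winnings = prize / entrants
print(expected_winnings)  # $0.15 for ten minutes of effort
```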
### Conclusion
I explained why I like estimates as sanity checks: they are useful, cheap, and very defensible. I then gave several examples of dead-simple sanity checks, and in each case pointed to more elaborate follow-up estimates.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>

View File

@ -15,5 +15,8 @@
- [Straightforwardly eliciting probabilities from GPT-3](https://nunosempere.com/2023/02/09/straightforwardly-eliciting-probabilities-from-gpt-3)
- [Inflation-proof assets](https://nunosempere.com/2023/02/11/inflation-proof-assets)
- [A Bayesian Adjustment to Rethink Priorities' Welfare Range Estimates](https://nunosempere.com/2023/02/19/bayesian-adjustment-to-rethink-priorities-welfare-range-estimates)
- [A computable version of Solomoff induction](https://nunosempere.com/2023/03/01/computable-solomoff)
- [Use of the construction "I'd bet" on the EA Forum is mostly metaphorical](https://nunosempere.com/2023/03/02/metaphorical-bets)
- [A computable version of Solomonoff induction](https://nunosempere.com/2023/03/01/computable-solomonoff)
- [Use of &ldquo;I'd bet&rdquo; on the EA Forum is mostly metaphorical](https://nunosempere.com/2023/03/02/metaphorical-bets)
- [Winners of the Squiggle Experimentation and 80,000 Hours Quantification Challenges](https://nunosempere.com/2023/03/08/winners-of-the-squiggle-experimentation-and-80-000-hours)
- [What happens in Aaron Sorkin's *The Newsroom*](https://nunosempere.com/2023/03/10/aaron-sorkins-newsroom)
- [Estimation for sanity checks](https://nunosempere.com/2023/03/10/estimation-sanity-checks)

View File

@ -1,15 +0,0 @@
#!/bin/sh
# List this year's posts as markdown links, using each post's first line as its title.
year=2023
echo "## In $year..."
echo
for dir in */*/*
do
index_path="$(pwd)/$dir/index.md"
title="$(head -n 1 "$index_path")"
url="https://nunosempere.com/$year/$dir"
echo "- [$title]($url)"
done

View File

@ -3,3 +3,6 @@ This file is used for testing the werc framework. The symbols below are probably
---
In a village of La Mancha, the name of which I have no desire to call to mind, there lived not long ago
........
...