feat: add a few posts

Nuno Sempere 2022-09-28 22:33:40 +00:00
parent eb9266b92e
commit 0221e0242e
31 changed files with 922 additions and 3 deletions

Simple estimation examples in Squiggle
==============
This post goes through several simple estimates, written in [Squiggle](https://www.squiggle-language.com/), a new estimation language. My hope is that it might make it easier to write more estimates of a similar sort, encourage wider adoption of Squiggle itself, and ultimately lead to better decisions.
## Initial setup
One can use Squiggle in several ways. This blog post will cover using it on its website and in a Google Spreadsheet. An upcoming blog post will cover using it in more complicated setups.
### [squiggle-language.com](https://www.squiggle-language.com/playground)
The simplest way to use Squiggle is to open [squiggle-language.com/playground](https://www.squiggle-language.com/playground). You will see something like the following:
<p><img src="https://i.imgur.com/lqEe46S.png" class="img-medium-center"></p>
You can write your model in the editor on the left side, and the results will be displayed on the right side. As you make edits, the URL in your browser bar will change, so that you can copy it and use it to refer people to your model.
### [Squiggle <> Google docs](https://docs.google.com/spreadsheets/d/1uKtQDlOQ0UYJznlzDNAwSdvwIB6bYv4tT8Z5gIQi7Fc/edit#gid=0)
When working with multiple models, I've also found it useful to use Squiggle in Google Sheets. To do so, make a copy of [this spreadsheet](https://docs.google.com/spreadsheets/d/1uKtQDlOQ0UYJznlzDNAwSdvwIB6bYv4tT8Z5gIQi7Fc/edit#gid=1407581126), and allow app permissions. Edit the “Main” sheet, and click on “Squiggle” > “Feed into Squiggle” to compute models. If you have difficulties, read the “Instructions” sheet, or leave a comment.
<p><img src="https://i.imgur.com/yCskQSU.png" class="img-medium-center"></p>
So without further ado, the simple example models:
## Partially replicating [Dissolving the Fermi Paradox](https://arxiv.org/abs/1806.02404) (complexity = 1/10)
Page 2 of the paper defines the factors for the [Drake equation](https://en.wikipedia.org/wiki/Drake_equation#Equation):
<p><img src="https://i.imgur.com/e5eAKsg.png" class="img-medium-center"></p>
Page 10 of the paper gives its estimates for the factors of the [Drake equation](https://en.wikipedia.org/wiki/Drake_equation#Equation):
<p><img src="https://i.imgur.com/F53wXSL.png)" class="img-medium-center"></p>
Because Squiggle doesn't yet have the log-uniform probability distribution, we're going to have to define it first. A [log-uniform](https://en.wikipedia.org/wiki/Reciprocal_distribution) is a probability distribution whose log is a uniform distribution.
```
// Define the log-uniform
// (<https://en.wikipedia.org/wiki/Reciprocal_distribution>)
loguniform(a, b) = exp(uniform(log(a), log(b)))
// Estimates
rate_of_star_formation = loguniform(1,100)
fraction_of_stars_with_planets = loguniform(0.1, 1)
number_of_habitable_planets_per_star_system = loguniform(0.1, 1)
fraction_of_habitable_planets_in_which_any_life_appears = 1
// ^ more on this below
fraction_of_planets_with_life_in_which_intelligent_life_appears = loguniform(0.001, 1)
fraction_of_intelligent_planets_which_are_detectable_as_such = loguniform(0.01, 1)
longevity_of_detectable_civilizations = loguniform(100, 10000000000)
// Expected number of civilizations in the Milky Way;
// see footnote 3 (p. 5)
n = rate_of_star_formation *
 fraction_of_stars_with_planets *
 number_of_habitable_planets_per_star_system *
 fraction_of_habitable_planets_in_which_any_life_appears *
 fraction_of_planets_with_life_in_which_intelligent_life_appears *
 fraction_of_intelligent_planets_which_are_detectable_as_such *
 longevity_of_detectable_civilizations
// Display
n
```
This produces the [following estimate](https://www.squiggle-language.com/playground#code=eNqdkk1vnDAQhv%2FKiBNE2wVU9bJtc0qOvXSvqJaBAUYxNrGHEFr1v9d4P7QfidotF4TH7%2FPO8M6vyHVm2o59L%2B0cbRqpHK7C2WNNbGy0YTv6E9LEJNX2eaS2VbhlS7qNNlGawgM2pBG4Q1Cm%2FTBqaoztodC%2BFn%2FpmAe3SVPU64meaMCa5NrYNl2%2B0u9Y0WBNJZWoyXloOTIZfZ8U2rP2qFiuoEzgK%2BDrEB%2FOfDmWyWqxjMsk8YJg%2BOiYesnoCm39S5hGOJZWLBq5oD3mhJyv8izz2sbKaqke7jsxEXdiUFIju3NNts5XkHuRHvsS7SLpZEksS4UHhRh8IRi72TH27xBOba8ZpMXUUdUJqWehqEEhhwF9c56Wh2l%2FQG8sgp%2BKO3JQojLTOfXACuMExpFKmlEpalHzJf2s1yx7o9tT8dFj16xFUSNjtRtGOuHGqruC7pnK6BZfiOcFeiKr6IUU%2FQyRXXTkE%2FPa7Pgco38dvBpr2OUCpoFzCumwpN9IPc0wyflzkDlEaIxhbRjhI8TDGj4t4XrTdzbortAAf9mYcOeWBbmDS%2BotC3Gt%2Fp%2Fgryk3xxzm%2FqdQ96k9kPPgefnhhY5%2B%2FwEFqbCw):
<p><img src="https://i.imgur.com/rKqyWoZ.png" class="img-medium-center"></p>
The estimate is fairly wide, but the model gives a 10%-ish chance that there is, in expectation, less than one civilization in the Milky Way. After updating on a bunch of observations, the paper raises that probability, hence the conclusion that the Fermi paradox has been “dissolved”.
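One way to read that probability off the model directly (a small sketch; append it to the code above) is to evaluate the cumulative distribution function at 1:
```
// Sketch: probability that the expected number of civilizations
// in the Milky Way is below 1
p_fewer_than_one_civilization = cdf(n, 1)
```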
Why did we set *fraction_of_habitable_planets_in_which_any_life_appears* to 1? Well, the paper considers an estimate of 1 - exp(-r), where r is distributed as a lognormal(1,50). But because r ranges from very small to very large numbers, the results get collapsed to either 0 or 1 when going through 1 - exp(-r), which produces some numerical errors when multiplying by 0. In addition, that estimate has been [questioned](https://forum.effectivealtruism.org/posts/kvZshdx5FzTPjyhxG/the-fermi-paradox-has-not-been-dissolved).
So following a similar move in the paper, we can set that factor to a high value (in this case, to 1, meaning that all planets capable of life do host life). And then, when we notice that the probability of no other life in the Milky Way is still significant, the Fermi paradox will still have been somewhat dissolved, though to a lesser extent.
From here on, we could tweak the rate of life formation to other plausible estimates, like:
- 0.01 to 1, per [this post](https://forum.effectivealtruism.org/posts/kvZshdx5FzTPjyhxG/the-fermi-paradox-has-not-been-dissolved#comments).
- 0.001 to 1
- 10^-100 to 0.1
- 1 - (10^(-36) to 1), which approximates 1 - exp(-lognormal(1,50))
- (10^(-36) to 1), which flips the above
We could also [question the other estimates](https://forum.effectivealtruism.org/posts/kvZshdx5FzTPjyhxG/the-fermi-paradox-has-not-been-dissolved#The_origin_of_life), or notice that the uncertainty over the rate of life formation probably dominates other uncertainties.
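As a minimal sketch, swapping in the first of these alternatives only requires changing one line of the model above:
```
// One of the alternative estimates above: a lognormal
// with a 90% confidence interval from 0.01 to 1
fraction_of_habitable_planets_in_which_any_life_appears = 0.01 to 1
```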
## Expected value for a list of things (complexity = 2/10)
Moving on to a slightly more complicated example, I was recently talking with someone who was thinking about various career pathways. To estimate the effectiveness of each opportunity, I created the following [spreadsheet](https://docs.google.com/spreadsheets/d/1BlTejl2qh9gOoQVVIkdimdj3oE5vgeUUfxMUigP-qOg/edit#gid=0) (project name and notes censored): 
<p><img src="https://i.imgur.com/RW8H2Wd.png" class="img-medium-center"></p>
That is, I decomposed each job opportunity into:
* value if successful, 
* probability of it being successful,
* effort required. 
I estimated each of the values, combined them as strings using a Google Sheets function, and then fed the result to Squiggle using a copy of [this Google spreadsheet](https://docs.google.com/spreadsheets/d/1uKtQDlOQ0UYJznlzDNAwSdvwIB6bYv4tT8Z5gIQi7Fc/edit?usp=sharing). The whole thing took 10 mins. And the complexity in this example comes from integrating Squiggle with Google docs, which was annoying to set up but which shouldn't be as annoying to use.
One interesting feature of these estimates is that the value estimates were [relative](https://forum.effectivealtruism.org/posts/hrdxf5qdKmCZNWTvs/valuing-research-works-by-eliciting-comparisons-from-ea), rather than using a concrete unit like QALYs. That is, the thing that has meaning is the ratio between the different values in that column, not the absolute values. In this case, this makes estimates much, much quicker to produce, but more error-prone, and harder to interpret, improve, or disagree with.
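As an illustrative sketch of that decomposition (with made-up numbers rather than the censored ones from the spreadsheet), a single opportunity might look like:
```
// Made-up numbers for one hypothetical opportunity
value_if_successful = 1 to 20 // relative value, unitless
probability_of_success = 0.05 to 0.5
effort_required = 0.5 to 2 // years
relative_value_per_unit_of_effort = value_if_successful *
 probability_of_success / effort_required
mean(relative_value_per_unit_of_effort)
```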
## Probability of a nuclear explosion in London (complexity = 3/10)
A [logic model](https://en.wikipedia.org/wiki/Logic_model) (I also use pathway to impact as a synonym for this) is a chain between inputs and outcomes, which at its simplest might look as follows:
<p><img src="https://i.imgur.com/ev5fz2m.png" class="img-medium-center"></p>
When doing quick estimates, we can keep this pathway to impact implicit. But when working with slightly more complex models, it can be worth writing it out explicitly. I've been using [excalidraw](https://excalidraw.com/) for this, and I would recommend it.
For example, when calculating the chance that a nuclear bomb hits London, we can draw a logic model like:
<p><img src="https://i.imgur.com/Zzckq5Y.png" class="img-medium-center"></p>
To get a sense of the historical frequency, we can use Laplace's rule of succession. Per Wikipedia on [Russia–NATO relations](https://en.wikipedia.org/wiki/Russia%E2%80%93NATO_relations), the one direct NATO/Russia incident was the [shootdown of a Russian plane by Turkish forces in 2015](https://en.wikipedia.org/wiki/2015_Russian_Sukhoi_Su-24_shootdown). Putting numbers on this, we have:
```
num_years_NATO_Russia_relations = 2022 - 1991
num_years_with_direct_confrontation = 1
laplace(successes, failures) = (successes + 1)/(successes + failures + 2) // this is a function!
[
  laplace(num_years_with_direct_confrontation, num_years_NATO_Russia_relations),
  laplace(0, num_years_NATO_Russia_relations)
] // this is an array
// [ 0.0588, 0.0303 ] = [ 5.88%, 3.03% ]
```
That is, if we count the Turkish incident as a “confrontation”, the base rate implied by Laplace's rule would be 5.88%, and if we don't, it would be 3.03%. See also some more discussion about possible confrontations [here](https://twitter.com/NunoSempere/status/1547245909930639361). When I think of a direct NATO/Russia confrontation, I'm thinking more of NATO's Article 4 being invoked.
In any case, plugging in some numbers I think are plausible, we would have something like:
```
chance_of_direct_nato_confrontation = 0.005 to 0.05
and_use_of_nuclear_weapons_in_Europe = 0.01 to 0.2
and_use_of_nuclear_weapons_in_London = 0.05 to 0.3
use_of_nuclear_weapons_in_London = chance_of_direct_nato_confrontation *
 and_use_of_nuclear_weapons_in_Europe *
 and_use_of_nuclear_weapons_in_London
yearly_probability = mean(use_of_nuclear_weapons_in_London)
monthly_probability = 1 - (1 - yearly_probability)^(1/12) // roughly
```
This produces an estimate of ~0.02% per year, or 0.0016% per month. We can then transform these into expected hours lost:
```
remaining_life_expectancy_in_years = 40 to 60
days_in_a_year = 365
hours_in_a_day = 24
expected_lost_hours = monthly_probability * remaining_life_expectancy_in_years * days_in_a_year * hours_in_a_day
{
   expected_lost_hours: expected_lost_hours,
   mean_expected_lost_hours: mean(expected_lost_hours)
} // this is a record
```
The last four lines are a record. They tell Squiggle two things:
* To build a record, an object which contains other objects, similar to how one would do so in JavaScript.
* To display only that record, because it isn't assigned to any variable.
This produces the following output:
<p><img src="https://i.imgur.com/0tTT6bP.png" class="img-medium-center"></p>
From here, we could refine these estimates, or add some probability of getting out of London before a bomb hits. 
In this model, we had a few moving parts, so now seems like a good time to mention that we can [link to a squiggle model directly](https://www.squiggle-language.com/playground/#code=eNqNUctugzAQ%2FBWLU0BpC3kdkHrMrbdcra4cMGDJrKmxlaIo%2F17bpIc2lsLFsmZ3dmZ3rsnYqcvJ9j3TU1I2TI58HbBjLYzSSWm0dYhAYQSTpy8r2lbyk9EC26RMqo5hxUE1UAvNKwPIjIJKYaMVGmaEQvJO8tc83xOj%2FGdPkWENdgwstJXkTMOFs0HhCALhaLUa%2BMwqZtLmGedDYf2rdBfaUqS4gLFkg4xQJGSR7SWts7b3N7mKnGDQ6szOQgozOUc9Z7h6xk4p9s5f98AuyAtZ%2Bedxdvq5Kt6KTeqVNe%2BZCxVbkKLhwL8Ht7u7xOQlPHV0s3a5v%2BUhp1izKYizUHOl7cEF2Smr77BrcOhm52fPw3gNUo0GQpNfK2I3Iwt8ZOSfekb%2BCnvNqz%2F748plBFv71oibMgaG5shCZQxch%2FB9fhClhGQjFZfIjWJy%2BwGOw1Uj) by copying the link which the playground produces when edited.
## [Bilateral Ceasefire between Russia & Ukraine](https://www.metaculus.com/questions/10050/bilateral-ceasefire-between-russia--ukraine/#comment-95768), using Laplace's rule of succession (complexity = 4/10)
Metaculus has a question on a [Bilateral Ceasefire between Russia & Ukraine](https://www.metaculus.com/questions/10050/bilateral-ceasefire-between-russia--ukraine/#comment-95768). The question asks about two different things:
* Whether there will be a ceasefire
* Whether it will last for at least 30 days.
We can use [Laplace's rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession) for both cases, first to estimate the likelihood that a ceasefire starts, and then to estimate whether it will last. For the latter, we will make the Laplace prediction, but also look at a [list of ceasefires](https://en.wikipedia.org/wiki/Category:Ceasefires) to estimate the [historical frequency of success](https://docs.google.com/spreadsheets/d/1ufNP_iuTghzKGUCRPMoLY4-c9PibiNlVFEaAxyAad-A/edit#gid=315186780).
For the likelihood that a ceasefire will start:
```
// Likelihood that a ceasefire will start
numSuccesses = 0 // no ceasefire so far
numFailures = 138 // days since the 24th of February
numFutureTrials = 172 // days until end of year
laplaceSuccessByNthTrial(successes, failures, numFutureTrials) = {
 numTrialsAlready = successes + failures
 numTrialsAtTheEnd = numTrialsAlready + numFutureTrials
 pAllFailures = (failures + 1) / ( numTrialsAtTheEnd + 1)
 // e.g., 10 trials, 10 failures, 0 successes
 // then probability of one additional failure is:
 // 11/12
 // and the probability of three additional failures is:
 // 11/12 * 12/13 * 13/14 = 11 / 14 =  (failures + 1) / ( numTrialsAtTheEnd + 1)
 // e.g.: 3/4 * 4/5 * 5/6 * 6/7 * 7/8 = 3/8
 1 - pAllFailures
}
chance_of_ceasefire = laplaceSuccessByNthTrial(numSuccesses, numFailures, numFutureTrials)
```
And for the likelihood that a ceasefire, once started, lasts, we are going to do some textbook outside view aggregation with the “mixture” function.
```
// Likelihood that a ceasefire lasts
// per laplace
laplace(successes, failures) = (successes + 1)/( successes + failures + 2)
chance_ceasefire_lasts_laplace = laplace(0,0)
// per dataset of ceasefires
// (<https://docs.google.com/spreadsheets/d/1ufNP_iuTghzKGUCRPMoLY4-c9PibiNlVFEaAxyAad-A/edit#gid=315186780>)
chance_ceasefire_lasts_historical = 0.3130434783
// per dataset of ceasefires, restricted to Soviet conflicts (same link)
chance_ceasefire_lasts_historical_russian = 0.4
// per brief literature review (same link)
chance_ceasefire_lasts_lit_review = 0.6392156863
// overall guess
distribution_ceasefire_lasts = mixture(
  chance_ceasefire_lasts_laplace,
  chance_ceasefire_lasts_historical,
  chance_ceasefire_lasts_historical_russian,
  chance_ceasefire_lasts_lit_review,
  [0.25, 0.3, 0.3, 0.15]
)
// instead of mixture(...), we can also use mx(...), which is the same but shorter.
chance_ceasefire_lasts = mean(distribution_ceasefire_lasts)
```
[Putting this all together](https://www.squiggle-language.com/playground/#code=eNqdVe9v0zAQ%2FVdORUgpZHXd9BcRQyqw8QGYJjqQEEWVmziNhWsX29lWEP875zRNu9JuE19c63Lvvbvzs%2Fu7YXN9My4WC2ZWjThj0vKwjJ2lwmnTiJ0pMCKUcILJ8c9CzOeSj50Rat6IG4TAB%2FGDS5FrnYLLmQMGCWeWZ8JwuBFSgnXMuIlSxWJcJAm3lls4hTYgVumdZKshY6ZMPGdCFqbMo9HQZ6ZsZcEKlXBU4dDpuhx0Bud8ZgosHdawwiHqymClJXTQqaGFckICV6lHrbjXmSjJlpIlvCrr9erC5SU4sJtCQyxpXUoIewJNVPg9UeDj68hIGs7SFcZrPDyvCe6muqucn2E1p%2F%2FCn%2B8reeRyJOXOVIINK2bTJhAIDnD7Tx6LM%2BCteSsE2gZXppTbbWftbcFVPs5YwdLoGZsJKdzKT00rDixFWwitmNzAQdi4AlFKaKfaM5WWB7XH4XLDD7HYfRp4BrRDaOR%2FI0K7%2Fjgp9lnu%2Fqv9GCLSRbou6eHaI31c%2B2SA64AMkTQiQ59N4eTOsCfqz0QlOUPnTXU23dr1FI7aZ9fqa98cNZG34QOXSDLrbJW35GYjW9v3kFu9OYNdF9ImCQ7aEredZt1hrTotVaeVxLbZoB22mzvFpMwhwvmzrbEWys%2FBy9y5pY0JSXViW3Ot8eVoJXpB7NI73eacO0tSQovs4nIqiqt5%2Fuv9u89vPl1%2B1B%2B%2Bdk%2BSF5diJi7kl%2FMzNrpdjVh6MiIcrfNkLtLTiPbosD8Ytl8dLz8XFh8xkaDR8MVpRTRqd6PuYBg91EEIuOBdSRzHE9Ew1tcCcxKtMolBi8NlCzwaoX48Rn1qCmsFU2UV3R3xmRE8QxrHDfOuQFkUunkUPaKmVbqn7UcvOrTXH%2FY3zelrJMX3d17gmU9UKnxDs8JfvH0uJFiIW19AcL8RQniw10ekbMZxNHXbWgjf2q1OL%2FSnVy%2B09%2F3oWHwrnKngvnY3Bv7EbSFdzVRdj6yQd675ocv%2F7EjlJa0paWPwhfeRoNNr9Z9Wkm%2BFxUGuJqr86zhAHR8KhjvJe4rxkfgu5FBj8T3fyjev8ecvmSzUuQ%3D%3D), we get:
```
// Result
chance_successful_ceasefire = chance_of_ceasefire * chance_ceasefire_lasts
// result: 0.256 = 25.6%
// Display
{
 chance_of_ceasefire: chance_of_ceasefire,
 chance_ceasefire_lasts: chance_ceasefire_lasts,
 chance_successful_ceasefire: chance_successful_ceasefire
}
```
<p><img src="https://i.imgur.com/iUmJSPk.png" class="img-medium-center"></p>
## Conclusion
We have looked at some simple models and produced some estimates of things we might care about. But these examples don't exhaust the range of things we can estimate.
In particular, we haven't yet looked at:
* Estimates of variables over time
* Functions which produce estimates
* Using Squiggle as a programmer rather than as a normal user.
An upcoming blog post will look at these complications. Readers might also want to browse the [Squiggle EA Forum tag](https://forum.effectivealtruism.org/topics/squiggle), which has begun to fill up with some neat estimates.
## Acknowledgements
<p><img src="https://i.imgur.com/7yuRrge.png" class="img-frontpage-center"></p>
This post is a project by the [Quantified Uncertainty Research Institute](https://quantifieduncertainty.org/) (QURI).

Forecasting Newsletter: August 2022.
==============
## Highlights
* CFTC [asking for public comments](https://comments.cftc.gov/PublicComments/CommentList.aspx?id=7311) about allowing Kalshi to [phagocytize](https://www.youtube.com/watch?v=ygkvXT1BmaA) PredictIt's niche
* $25k [tournament](https://news.manifold.markets/p/manifold-markets-forecasting-tournament) by Richard Hanania on Manifold Markets.
* [pastcasting.com](https://www.pastcasting.com/) allows users to forecast on already-resolved questions whose resolutions they don't know, which hopefully results in faster feedback loops and faster learning
* Hedgehog Markets now has [automatic market-maker](https://amm.hedgehog.markets/)-based markets
* Jonas Moss looks at [updating just on the passage of time](https://forum.effectivealtruism.org/posts/kbZjCF8kmCWgLbzxC/updating-on-the-passage-of-time-and-conditional-prediction)
## Index
* Prediction Markets & Forecasting Platforms
* Blog Posts and Research
* In The News
You can sign up for this newsletter on [Substack](https://forecasting.substack.com), or browse past newsletters [here](https://forum.effectivealtruism.org/s/HXtZvHqsKwtAYP6Y7). If you have a content suggestion or want to reach out, you can leave a comment or find me on [Twitter](https://twitter.com/NunoSempere).
## Prediction Markets & Forecasting Platforms
### Manifold Markets
Manifold Markets [partnered](https://news.manifold.markets/p/manifold-markets-forecasting-tournament) ([a](https://web.archive.org/web/20220909113910/https://news.manifold.markets/p/manifold-markets-forecasting-tournament)) with Richard Hanania's [Center for the Study of Partisanship and Ideology](https://www.cspicenter.com/p/introducing-the-salemcspi-forecasting) ([a](https://web.archive.org/web/20220909113929/https://www.cspicenter.com/p/introducing-the-salemcspi-forecasting)) (some updates [here](https://www.cspicenter.com/p/salem-tournament-5-days-in) ([a](http://web.archive.org/web/20220814093212/https://www.cspicenter.com/p/salem-tournament-5-days-in))).
> The Salem Center at the University of Texas is hiring a new research fellow for the 2023-2024 academic year. This position will pay $25,000 and not require teaching or in-person residency. Rather, it will provide an academic job and financial support for a researcher to do whatever they want with their time, in order to advance their career or work on other projects.
>
> Unlike a typical fellowship, you will not apply to this one by sending us letters of recommendation and a CV listing all of your publications and awards, and then relying on our subjective judgements about other people's subjective judgments about your work. Rather, you will participate in a forecasting tournament on economics and social and political issues. At the end of the process, we will interview the top five finalists and decide among them.
Substack now [supports Manifold market embeds](https://news.manifold.markets/p/above-the-fold-anyone-want-a-loan) ([a](http://web.archive.org/web/20220823055758/https://news.manifold.markets/p/above-the-fold-anyone-want-a-loan)), which look much like the version in the EA Forum. But now, users who are logged into Manifold Markets in the same browser can bet there directly:
### Metaculus
Nikos Bosse [summarizes Metaculus's Keep Virginia Safe Tournament](https://www.metaculus.com/notebooks/11162/the-keep-virginia-safe-tournament-202122-project-summary/) ([a](http://web.archive.org/web/20220808155107/https://www.metaculus.com/notebooks/11162/the-keep-virginia-safe-tournament-202122-project-summary/)). I would have found it interesting to read some speculation about what decisions were changed back at the Virginia Department of Health as a result of this tournament.
I appreciated the comments on [this Metaculus question](https://www.metaculus.com/questions/5320/chinese-annexation-of-half-of-taiwan-by-2050) ([a](http://web.archive.org/web/20220909113953/https://www.metaculus.com/questions/5320/chinese-annexation-of-half-of-taiwan-by-2050/)) on China annexing at least half of Taiwan by 2050. Some examples:
* [blednotik](https://www.metaculus.com/questions/5320/chinese-annexation-of-half-of-taiwan-by-2050/#comment-99163) on how hard Russia is to sanction.
* [nextbigfuture](https://www.metaculus.com/questions/5320/chinese-annexation-of-half-of-taiwan-by-2050/#comment-98615): "Apple CEO Tim Cook, other CEOS and the heads of Vanguard etc... would be on the phone to Biden, Pelosi, Schumer telling them... what are we paying you for. The chips must flow".
Metaculus is [still hiring](https://apply.workable.com/metaculus/).
### Polymarket
Polymarket now [supports deposits from Coinbase](https://nitter.grimneko.de/PolymarketHQ/status/1555207858630918146) ([a](https://web.archive.org/web/20220909114016/https://nitter.grimneko.de/PolymarketHQ/status/1555207858630918146)), and is [trying out an order book](https://lobster.polymarket.com/market/mlb-who-will-win-atlanta-braves-v-boston-red-sox-scheduled-for-august-9-710-pm-et) ([a](http://web.archive.org/web/20220909114039/https://lobster.polymarket.com/market/mlb-who-will-win-atlanta-braves-v-boston-red-sox-scheduled-for-august-9-710-pm-et)).
### PredictIt
PredictIt CEO goes on [Star Spangled Gamblers](https://starspangledgamblers.com/2022/08/08/predictit-ceo-were-not-quitting/) ([a](http://web.archive.org/web/20220809031554/https://starspangledgamblers.com/2022/08/08/predictit-ceo-were-not-quitting/)) to give encouragement to the PredictIt community after the CFTC decided to withdraw its no-action letter. PredictIt veteran Domah is [skeptical](https://nitter.it/Domahhhh/status/1556865659799322624) ([a](https://web.archive.org/web/20220909114137/https://nitter.it/Domahhhh/status/1556865659799322624)).
Various news media covered the downfall of PredictIt, for example:
* [National Review](https://www.nationalreview.com/2022/08/the-cloudy-future-of-political-futures/amp/) ([a](http://web.archive.org/web/20220829151106/https://www.nationalreview.com/2022/08/the-cloudy-future-of-political-futures/amp/))
* [Slate](https://slate.com/business/2022/08/predictit-cftc-shut-down-politics-forecasting-gambling.html?via=rss_socialflow_twitter) ([a](http://web.archive.org/web/20220814230649/https://slate.com/business/2022/08/predictit-cftc-shut-down-politics-forecasting-gambling.html?via=rss_socialflow_twitter))
* [Bloomberg](https://www.bloomberg.com/news/articles/2022-08-06/predictit-betting-on-us-elections-nixed-by-american-regulators#xj4y7vzkg)
* [Y Combinator](https://news.ycombinator.com/item?id=32353795) ([a](http://web.archive.org/web/20220806042452/https://news.ycombinator.com/item?id=32353795))
* [Politico](https://www.politico.com/news/2022/08/09/no-future-regulator-orders-political-betting-market-to-shut-down-in-u-s-00050238) ([a](http://web.archive.org/web/20220905183640/https://www.politico.com/news/2022/08/09/no-future-regulator-orders-political-betting-market-to-shut-down-in-u-s-00050238))
* [The Washington Post](https://www.washingtonpost.com/opinions/2022/08/25/predictit-gambling-political-prediction-markets/) ([a](http://web.archive.org/web/20220827024822/https://www.washingtonpost.com/opinions/2022/08/25/predictit-gambling-political-prediction-markets/))
In the previous edition of this newsletter, I mentioned that I assigned a 60% chance that Kalshi caused the previous fall from grace of Polymarket, and a 40% chance that they caused PredictIt's demise. 
I've gotten some pushback on that, and a [simple calculation](https://www.squiggle-language.com/playground/#code=eNqNkl9rgzAUxb%2FKxaeu%2F7TCYBT62Iexl4GvQojtrV6aJi6JDFf63Xct7QojVZ%2BMyfGcX871HLnKfGfN6SRtG629bXB%2B3druyRt73yFNnqTKvhoqS4WZt6TLaB3lurZkrDhK5SoStvGVQueEdOzBK9hAskzBG368%2FhNr2qHQSGVVmMZ20hUsoN8v17mOY%2Fi0ppAFKfItmAO%2F4p52%2Ft1DgcwFR1IK96DoiOBMrluUVqDzslDkKj7ZQJqsmEfJWkmmqB9%2B4vYx08STNEnTRSeFGaQvgfCbmA7wcSUGaRG08XAnhwd5fbN%2Bfv%2FeeqbwnHYs2Qiq0AgHZjzN9SRZvnFH3az7OP8qNM5jZ8oA2tiTVPTDgxMHueOfrksc6Go2ArzLqu9BPaUPRMUhxKB3sLrBep%2F6n4fx1zCkmI%2FhDNgE2rzkOrr8AjRymo0%3D) ([a](https://web.archive.org/web/20220909114211/https://www.squiggle-language.com/playground/#code=eNqNkl9rgzAUxb%2FKxaeu%2F7TCYBT62Iexl4GvQojtrV6aJi6JDFf63Xct7QojVZ%2BMyfGcX871HLnKfGfN6SRtG629bXB%2B3druyRt73yFNnqTKvhoqS4WZt6TLaB3lurZkrDhK5SoStvGVQueEdOzBK9hAskzBG368%2FhNr2qHQSGVVmMZ20hUsoN8v17mOY%2Fi0ppAFKfItmAO%2F4p52%2Ft1DgcwFR1IK96DoiOBMrluUVqDzslDkKj7ZQJqsmEfJWkmmqB9%2B4vYx08STNEnTRSeFGaQvgfCbmA7wcSUGaRG08XAnhwd5fbN%2Bfv%2FeeqbwnHYs2Qiq0AgHZjzN9SRZvnFH3az7OP8qNM5jZ8oA2tiTVPTDgxMHueOfrksc6Go2ArzLqu9BPaUPRMUhxKB3sLrBep%2F6n4fx1zCkmI%2FhDNgE2rzkOrr8AjRymo0%3D)) just based on Laplace's law shows that the probability is probably higher.
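As a rough reconstruction of the shape of that calculation (the counts below are illustrative assumptions, not necessarily the linked model's inputs):
```
// Illustrative Laplace's-rule sketch, not the linked model's exact inputs:
// two major US political prediction markets (Polymarket, PredictIt) hit by
// regulatory action in the ~2 years since Kalshi's CFTC approval
laplace(successes, failures) = (successes + 1)/(successes + failures + 2)
laplace(2, 0) // 0.75, i.e., higher than my 60%/40% guesses
```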
### Kalshi
![](https://i.imgur.com/0Dv3xAE.jpeg)
The US CFTC is asking for public comments about [allowing Kalshi to host politics prediction markets](https://comments.cftc.gov/PublicComments/CommentList.aspx?id=7311) ([a](http://web.archive.org/web/20220828210656/https://comments.cftc.gov/PublicComments/CommentList.aspx?id=7311)). I particularly liked [this comment](https://comments.cftc.gov/PublicComments/ViewComment.aspx?id=69666&SearchText=) ([a](https://web.archive.org/web/20220909114108/https://comments.cftc.gov/PublicComments/ViewComment.aspx?id=69666&SearchText=)) by a JP Morgan executive, and [this one](https://comments.cftc.gov/PublicComments/ViewComment.aspx?id=69617%5C) ([a](https://web.archive.org/web/20220909114122/https://comments.cftc.gov/PublicComments/ViewComment.aspx?id=69617&SearchText=)) by a representative of the "Center for Effective Bribery". I drafted my own comment, but the CFTC's website isn't playing nice; I'll report back next month.
### Good Judgment Inc and Good Judgment Open
Good Judgment releases a [report](https://goodjudgment.com/wp-content/uploads/2022/08/FF1FF2-Climate-report-final.pdf) ([a](https://web.archive.org/web/20220909114223/https://goodjudgment.com/wp-content/uploads/2022/08/FF1FF2-Climate-report-final.pdf)) on the probabilities that Superforecasters™ assign to various climate risks.
I appreciated these two comments on Good Judgment Open:
* [belikewater](https://www.gjopen.com/comments/1491315) ([a](https://web.archive.org/web/20220909114239/https://www.gjopen.com/comments/1491315)) considers the [chance](https://www.gjopen.com/comments/1491891) ([a](https://web.archive.org/web/20220909114844/https://www.gjopen.com/comments/1491891)) of an electrical blackout lasting at least one hour and affecting 60 million or more people in the US and/or Canada before April 2023, and pegs it at 2%.
* [orchidny](https://www.gjopen.com/comments/1493360) ([a](https://web.archive.org/web/20220909114357/https://www.gjopen.com/comments/1493360)) considers the chance that Donald Trump will be criminally charged with or indicted for a federal and/or state crime in the US.
### Odds and ends
[pastcasting.com](https://www.pastcasting.com/) is a new website which allows users to forecast on already-resolved questions they don't have prior knowledge about, in order to get quick feedback. Would recommend!
Hedgehog Markets now has [automatic market-maker](https://amm.hedgehog.markets/)-based markets ([a](http://web.archive.org/web/20220714024403/https://amm.hedgehog.markets/)). This is a type of prediction market that I've come to know and love, because it moves the game a bit from user vs. user to platform vs. user. They also have a neat piece that covers [the recent history of prediction markets](https://scribe.citizen4.eu/decentralized-prediction-markets-past-to-present-23ba1c5b186a) ([a](https://web.archive.org/web/20220909114417/https://scribe.citizen4.eu/decentralized-prediction-markets-past-to-present-23ba1c5b186a)). I'd say that Hedgehog Markets has matured a fair bit since launch, and I would encourage readers to [explore their markets](https://hedgehog.markets/) a bit.
Hypermind has a [small contest](https://mailchi.mp/hypermind/russia-sanctions-challenge?e=1d84f56c13) ([a](https://web.archive.org/web/20220909114441/https://us13.campaign-archive.com/?u=f028e1d3668703556410a42ec&id=d69d549943&e=1d84f56c13)) on Russian sanctions. In the question creation phase, with a $4k prize pool, participants can propose questions and bet on which questions will be chosen. Then a $10k tournament will take place on the basis of those questions.
[Yolorekt](https://www.yolorekt.finance) ([a](https://web.archive.org/web/20220909114503/https://www.yolorekt.finance/)), from [YOLO](https://www.urbandictionary.com/define.php?term=Yolo) ([a](http://web.archive.org/web/20220901050316/https://www.urbandictionary.com/define.php?term=Yolo)) and [wreck](https://dictionary.cambridge.org/us/dictionary/english/wreck) ([a](http://web.archive.org/web/20190730205128/https://dictionary.cambridge.org/us/dictionary/english/wreck)), is a new crypto prediction market. I'm mentioning it because I find the name funny, but it's probably on the scammer side.
A consultant reviews [some Oracle products related to forecasting in the transportation industries](https://jbf-consulting.com/review-of-oracle-transportation-management-platform-otm-roadmap/) ([a](http://web.archive.org/web/20220810215248/https://jbf-consulting.com/review-of-oracle-transportation-management-platform-otm-roadmap/)). I found this a neat look into that industry, which seems much more professionalized than I expected.
## Blog Posts and Research
Ajeya Cotra, a researcher known for producing a [Biological Anchors estimate](https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines) ([a](http://web.archive.org/web/20220806162221/https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines)) of the time until AGI, posted a [two-year update on her personal AI timelines](https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines) ([a](http://web.archive.org/web/20220906194336/https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines)). These have become shorter.
Friend of the newsletter Eli Lifland has been upping his publishing pace, starting with a [Personal forecasting retrospective: 2020-2022](https://www.foxy-scout.com/retro/) ([a](http://web.archive.org/web/20220821081309/https://www.foxy-scout.com/retro/)).
[Comparing expert elicitation and model-based probabilistic technology cost forecasts for the energy transition](https://www.pnas.org/doi/full/10.1073/pnas.1917165118) ([a](http://web.archive.org/web/20220611150709/https://www.pnas.org/doi/full/10.1073/pnas.1917165118)), h/t Dan Carey:
> We conduct a systematic comparison of technology cost forecasts produced by expert elicitation methods and model-based methods. Our focus is on energy technologies due to their importance for energy and climate policy 
>
> We show that, overall, model-based forecasting methods outperformed elicitation methods 
>
> However, all methods underestimated technological progress in almost all technologies, likely as a result of structural change across the energy sector due to widespread policies and social and market forces.
Stephanie Losi writes [The Silence of Risk Management Victory](https://riskmusings.substack.com/p/the-silence-of-risk-management-victory) ([a](http://web.archive.org/web/20220901154711/https://riskmusings.substack.com/p/the-silence-of-risk-management-victory)), giving past examples of scenarios which might have led to catastrophe if not for preventative measures. See also [the preparedness paradox](https://en.wikipedia.org/wiki/Preparedness_paradox) ([a](http://web.archive.org/web/20220827083518/https://en.wikipedia.org/wiki/Preparedness_paradox)).
[Social Perception of Forecasters: People See Forecasts of Future Outcomes as Cues to Forecasters' Desires, Attitudes, and Identity](https://journals.sagepub.com/doi/pdf/10.1177/1948550618789607):
> While people's forecasts of future outcomes are often guided by their preferences (“desirability bias”), it has not been explored yet whether people infer others' preferences from their forecasts.
>
> Across 3 experiments and overall 30 judgments, forecasters who thought that a particular future outcome was likely (vs. unlikely) were perceived as having a stronger preference for this outcome
Holden Karnofsky looks at [AI strategy nearcasting](https://forum.effectivealtruism.org/posts/ktEzS3pkfeqPNh6r5/ai-strategy-nearcasting) ([a](http://web.archive.org/web/20220908052328/https://forum.effectivealtruism.org/posts/ktEzS3pkfeqPNh6r5/ai-strategy-nearcasting)), defined as "trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today's."
The Quantified Uncertainty Research Institute, the NGO for which I work, recently released an "early access" version of [Squiggle](https://forum.effectivealtruism.org/posts/ZrWuy2oAxa6Yh3eAw/usd1-000-squiggle-experimentation-challenge) ([a](http://web.archive.org/web/20220824162645/https://forum.effectivealtruism.org/posts/ZrWuy2oAxa6Yh3eAw/usd1-000-squiggle-experimentation-challenge)), a language for probabilistic estimation. We are also [hiring](https://quantifieduncertainty.org/careers) ([a](http://web.archive.org/web/20220907233928/https://quantifieduncertainty.org/careers))!
Nathan Barnard looks at how [forecasting could have prevented intelligence failures](https://thegoodblog.substack.com/p/intelligence-failures-and-a-theory) ([a](http://web.archive.org/web/20220902204310/https://thegoodblog.substack.com/p/intelligence-failures-and-a-theory)), speculating that better forecasting would lead to better outcomes by allowing nations to better know when to hold 'em and when to fold 'em. I am sympathetic to the general argument, but a bit uncertain about the extent to which Tetlock-style forecasting could have provided better guidance in the specific historical case studies mentioned, as opposed to on average, across many such cases.
[This blog post](https://www.realclimate.org/index.php/archives/2021/02/dont-climate-bet-against-the-house/) ([a](http://web.archive.org/web/20220815134913/https://www.realclimate.org/index.php/archives/2021/02/dont-climate-bet-against-the-house/)), via [Stat Modeling](https://statmodeling.stat.columbia.edu/2022/08/11/bets-as-forecasts-bets-as-probability-assessment-difficulty-of-using-bets-in-this-way/) ([a](http://web.archive.org/web/20220829033825/https://statmodeling.stat.columbia.edu/2022/08/11/bets-as-forecasts-bets-as-probability-assessment-difficulty-of-using-bets-in-this-way/)), covers some recent bets on climate change.
Nostalgebraist [picks a beef](https://nostalgebraist.tumblr.com/post/692086358174498816/idk-who-needs-to-hear-this-but-metaculus-is) ([a](http://web.archive.org/web/20220809164313/https://nostalgebraist.tumblr.com/post/692086358174498816/idk-who-needs-to-hear-this-but-metaculus-is)) with [Metaculus](https://nostalgebraist.tumblr.com/post/692246981744214016/more-on-metaculus-badness) ([a](http://web.archive.org/web/20220827092332/https://nostalgebraist.tumblr.com/post/692246981744214016/more-on-metaculus-badness)).
Nikos Bosse and Sam Abbott [argue that one currently neglected strategy of making forecasting more useful is to focus on making domain experts better forecasters](https://forum.effectivealtruism.org/posts/WFbf2d4LHjgvWJCus/cause-exploration-prizes-training-experts-to-be-forecastersThree) ([a](http://web.archive.org/web/20220826124413/https://forum.effectivealtruism.org/posts/WFbf2d4LHjgvWJCus/cause-exploration-prizes-training-experts-to-be-forecastersThree)).
An article in Nature presents a [standardized and comparable set of short-term forecasts](https://www.nature.com/articles/s41597-022-01517-w) ([a](http://web.archive.org/web/20220820063200/https://www.nature.com/articles/s41597-022-01517-w)) on COVID-19 in the US.
Issues of the Technological Forecasting and Social Change journal can be seen [here](https://www.sciencedirect.com/journal/technological-forecasting-and-social-change/issues) ([a](http://web.archive.org/web/20191224192411/https://www.sciencedirect.com/journal/technological-forecasting-and-social-change/issues)). I only briefly skimmed it, and I don't expect it to be particularly good, but it might be of interest to some in the community.
### Technical content
Jonas Moss looks at [updating just on the passage of time](https://forum.effectivealtruism.org/posts/kbZjCF8kmCWgLbzxC/updating-on-the-passage-of-time-and-conditional-prediction) ([a](https://web.archive.org/web/20220909114605/https://forum.effectivealtruism.org/posts/kbZjCF8kmCWgLbzxC/updating-on-the-passage-of-time-and-conditional-prediction)). In particular, he works out the details for questions about hazard rates, like "Will Putin stay in power until August 11th 2030?", given a few possible forms of the hazard rate (constant, Weibull, [Gompertz–Makeham](https://en.wikipedia.org/wiki/Gompertz%E2%80%93Makeham_law_of_mortality) ([a](http://web.archive.org/web/20220830122559/https://en.wikipedia.org/wiki/Gompertz%E2%80%93Makeham_law_of_mortality))).
I found it amusing that his more complicated Gompertz-Makeham model gave essentially the same answer as a much simpler constant hazard rate model:
![](https://i.imgur.com/mREixOS.png)
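As a minimal sketch of the simplest of those models: with a constant yearly hazard rate h, the chance of "surviving" t more years is (1 - h)^t (the h below is an illustrative assumption, not Moss's number):
```
// Constant-hazard-rate sketch; h is an illustrative assumption
h = 0.05 // assumed yearly hazard rate
p_survives(t) = (1 - h)^t
p_survives(8) // e.g., from 2022 through 2030; ~0.66
```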
Ege Erdil presents [Variational Bayesian Methods](https://www.lesswrong.com/posts/MFm3A4ihz9s5j2cCo/variational-bayesian-methods) ([a](http://web.archive.org/web/20220903082102/https://www.lesswrong.com/posts/MFm3A4ihz9s5j2cCo/variational-bayesian-methods)), and interprets naïve k-means clustering as a Bayesian approximation.
There was some neat back and forth on continuous prediction markets at the [Eth Research Forum](https://ethresear.ch/t/continuous-prediction-markets/12993) ([a](http://web.archive.org/web/20220716043542/https://ethresear.ch/t/continuous-prediction-markets/12993)). In particular, [Bo Waggoner](https://www.bowaggoner.com/) proposes the following [scheme](https://papers.nips.cc/paper/2015/hash/7af6266cc52234b5aa339b16695f7fc4-Abstract.html) ([a](http://web.archive.org/web/20210726022240/https://papers.nips.cc/paper/2015/hash/7af6266cc52234b5aa339b16695f7fc4-Abstract.html)):
> The market maker offers to sell “shares” in any given point on the real line. The payoff of a share drops off with distance between its center point and the actual outcome, in the shape of a Gaussian. E.g. if you bought a share of 500k, and the outcome is 300k, your share pays out something like e^{-(500k - 300k)^2 / 2sigma^2} where sigma is a constant chosen ahead of time.
I think this is ingenious, because it allows users to bet for or against a distribution without having to specify all of it, while remaining resilient against small perturbations.
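To make the quoted payoff rule concrete, here is a small sketch using the numbers from the quote's example, with an arbitrarily chosen sigma:
```
// Gaussian payoff of a share, per the quoted scheme;
// sigma is an assumed constant
sigma = 100k
payoff(center, outcome) = exp(-(center - outcome)^2 / (2 * sigma^2))
payoff(500k, 300k) // ~0.14 with this sigma
```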
## In the News
Fraser Nelson writes about [fallible forecasts in the UK's recent history](https://www.spectator.co.uk/article/why-liz-truss-is-right-to-say-forecasts-are-not-destiny-) ([a](http://web.archive.org/web/20220810102427/https://www.spectator.co.uk/article/why-liz-truss-is-right-to-say-forecasts-are-not-destiny-)).
[Zuckerberg: Company's pandemic-era forecast was too rosy](https://www.reuters.com/technology/metas-mark-zuckerberg-companys-pandemic-era-forecast-was-too-rosy-2022-07-29/) ([a](http://web.archive.org/web/20220901164537/https://www.reuters.com/technology/metas-mark-zuckerberg-companys-pandemic-era-forecast-was-too-rosy-2022-07-29/)). "Zuckerberg told staffers the world's biggest social media company had planned for growth too optimistically, mistakenly expecting that a bump in usage and revenue growth during COVID-19 lockdowns would be sustained."
[Hungary's weather chief sacked over the wrong forecast](https://www.bbc.com/news/world-europe-62640051) ([a](http://web.archive.org/web/20220904034607/https://www.bbc.com/news/world-europe-62640051)).
---
Note to the future: All links are added automatically to the Internet Archive, using this [tool](https://github.com/NunoSempere/longNowForMd) ([a](http://web.archive.org/web/20220711161908/https://github.com/NunoSempere/longNowForMd)). "(a)" for archived links was inspired by [Milan Griffes](https://www.flightfromperfection.com/) ([a](http://web.archive.org/web/20220814131834/https://www.flightfromperfection.com/)), [Andrew Zuckerman](https://www.andzuck.com/) ([a](http://web.archive.org/web/20220316214638/https://www.andzuck.com/)), and [Alexey Guzey](https://guzey.com/) ([a](http://web.archive.org/web/20220901135024/https://guzey.com/)).
---
> When you lose — and you _sure can_ lose, with N large, you can lose real big. Q.E.D.
Paul Samuelson, in [Why we should not make mean log of wealth big though years to act are long](http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/Kelly%20Resources/Samuelson1979.pdf) ([a](http://web.archive.org/web/20220113223222/http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/Kelly%20Resources/Samuelson1979.pdf)), points out that Kelly-betting is not [statewise dominant](https://en.wikipedia.org/wiki/Stochastic_dominance#Statewise_dominance) over more risk-averse approaches, and thus its preferability depends on one's risk profile.

An experiment eliciting relative estimates for Open Philanthropy's 2018 AI safety grants
==============
## Summary
I present the design and results of an experiment eliciting relative values from six different researchers for the nine large AI safety grants Open Philanthropy made in 2018. 
The specific elicitation procedures I used might be usable for **rapid evaluation setups**, for going from zero to some evaluation, or for identifying disagreements. For weighty decisions, I would recommend more time-intensive approaches, like explicitly modelling the pathways to impact.
## Background and motivation
This experiment follows up on past work around relative values ([1](https://forum.effectivealtruism.org/posts/9hQFfmbEiAoodstDA/simple-comparison-polling-to-create-utility-functions), [2](https://forum.effectivealtruism.org/posts/hrdxf5qdKmCZNWTvs/valuing-research-works-by-eliciting-comparisons-from-ea), [3](https://utility-function-extractor.quantifieduncertainty.org/)) and more generally on work to better [estimate values](https://forum.effectivealtruism.org/s/AbrRsXM2PrCrPShuZ). The aim of this research direction is to explore a possibly scalable way of producing estimates and evaluations. If successful, this would bring utilitarianism and/or longtermism closer to producing practical guidance around more topics, which has been a recurring thread in my work in the last few years. 
## Methodology
My methodology was as follows:
1. I selected a group of participants whose judgment I consider to be good.
2. I selected a number of grants which I thought would be suitable for testing purposes.
3. Participants familiarized themselves with the grants and with what exactly they ought to be estimating.
4. Participants made their own initial estimates using two different methods:
    1. Method 1: Using a utility function extractor app.
    2. Method 2: Making a “hierarchical tree” of estimates.
5. For each participant, I aggregated and/or showed their two estimates side by side, and asked them to make a best guess estimate.
6. I took their best guess estimates, and held a discussion going through each grant, making participants discuss their viewpoints when they had some disagreements.
7. After holding the discussion, I asked participants to make new estimates.
Overall, the participants took about [two to three hours](https://www.squiggle-language.com/playground#code=eNqrVirOyC8PLs3NTSyqVLIqKSpN1QELuaZkluQXwUQy8zJLMhNzggtLM9PTc1KDS4oy89KVrJQ0NAwNFEryFYwNNBW0FTSMwBwTCMcUIWFmgCygqW9moFQLAMYeI4o%3D) each to complete this process, roughly divided as follows (a rough Squiggle version of this sum appears after the list):
1. 10 to 30 mins to familiarize themselves with the estimation target and to re-familiarize themselves with the grants
2. 20 to 40 mins to do the two initial estimates
3. 5 to 30 mins to give their first best guess estimate after seeing the result of the two different methods
4. 1h to hold a discussion
5. 5 to 30 mins to give their resulting best guess estimate 
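A rough Squiggle version of that sum, reconstructed here rather than copied from the linked playground model, so treat it as approximate:
```
// Reconstructed per-participant time budget, in minutes
familiarization = 10 to 30
initial_estimates = 20 to 40
first_best_guess = 5 to 30
discussion = 60
final_best_guess = 5 to 30
total_minutes = familiarization + initial_estimates +
 first_best_guess + discussion + final_best_guess
total_hours = total_minutes / 60
```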
The rest of this section goes through these steps individually.
### Selection of participants
I selected participants by asking friends or colleagues whose judgment I trust, and who had some expertise or knowledge of AI safety. In particular, I selected participants who would be somewhat familiar with Open Philanthropy grants, because otherwise the time required for research would have been too onerous.
The participants were Gavin Leech, Misha Yagudin, Ozzie Gooen, Jaime Sevilla, Daniel Filan, and another participant who prefers to remain anonymous. Note that one participant didn't participate in all the rounds, which is why some summaries contain only five datapoints.
### Selection of grants
The grants I selected were:
* [AI Impacts — General Support (2018)](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/ai-impacts-general-support-2018): $100,000
* [Machine Intelligence Research Institute — AI Safety Retraining Program](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/machine-intelligence-research-institute-ai-safety-retraining-program): $150,000
* [Open Phil AI Fellowship — 2018 Class](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/open-phil-ai-fellowship-2018-class): $1,135,000
* [Ought — General Support (2018)](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/ought-general-support): $525,000
* [Oxford University — Research on the Global Politics of AI](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/oxford-university-global-politics-of-ai-dafoe): $429,770
* [Stanford University — Machine Learning Security Research Led by Dan Boneh and Florian Tramer](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/stanford-university-machine-learning-security-research-dan-boneh-florian-tramer): $100,000
* [UC Berkeley — AI Safety Research (2018)](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/university-of-california-berkeley-artificial-intelligence-safety-research-2018): $1,145,000
* [Wilson Center — AI Policy Seminar Series](https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/wilson-center-ai-policy-seminar-series): $400,000
These are all the grants that Open Philanthropy made to reduce AI risk in 2018 above a threshold of $10k, according to their [database](https://www.openphilanthropy.org/grants/?q=&focus-area=potential-risks-advanced-ai&yr=2018). The grants were made long enough ago that we now have some information about their success.
I shared a [briefing](https://docs.google.com/document/d/1sTCwFUA7_G46YzUp4p4U_OvpYd9tdmq7D8IRdL63BeA/edit#heading=h.tf7bismm62hi) with the participants summarizing the nine Open Philanthropy grants above, with the idea that it might speed the process along. 
In hindsight, this was suboptimal, and might have introduced some anchoring bias. Some participants complained that the summaries had a subjective component; they said they used the source links but did not pay much attention to my opinions.
On the other hand, other participants said they found the subjective estimates useful. And because the briefing was written in good faith, I am personally not particularly worried about it. Even if there are anchoring issues, we may not care much about them if we think that the output is accurate, in the same way that we may not care about forecasters anchoring on the base rate.
If I were redoing this experiment, I would probably limit myself even more to expressing only factual claims and finding sources. A better scheme might have been to share a writeup with a minimal subjective component, then strongly encourage participants to make their own judgments before looking at a separate writeup with more subjective summaries, which they could optionally use to adjust their estimates.
### Estimation target
I asked participants to estimate “_the probability distribution of the relative ex-post counterfactual values of Open Philanthropy's grants_”.
* _**the distribution**_: inputs are distributions, using Guesstimate-like syntax, where “1 to 10” represents a lognormal distribution with its 90% confidence interval ranging from 1 to 10 (see the small example after this list).
* estimates are _**relative**_: we don't necessarily have a set absolute comparison point, like percentage points of reduction in x-risk. This means that estimates were expressed in the form “grant A is x to y times more valuable than grant B”.
* estimates are _**ex-post**_ (after the fact), because estimating the ex-ante expected value of something that has already happened is a) more complicated, and b) prone to hindsight bias.
* estimates are of the _**counterfactual value**_ because estimating the Shapley value is a headache. And if we want to arrive at cost-effectiveness, we can just divide by the grant cost, which is known.
* estimates are about the value _**of the grants**_, as opposed to the value of the projects, because some of the projects could have gotten funding elsewhere. And so the value of a grant might be small, lie in OpenPhil acquiring influence, or have more to do with seeding a field than with the projects themselves.
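As a small example of that syntax, with hypothetical quantities:
```
// "1 to 10": a lognormal whose 90% confidence interval goes from 1 to 10
// (hypothetical relative value of grant A vs. grant B)
value_of_grant_A_relative_to_grant_B = 1 to 10
mean(value_of_grant_A_relative_to_grant_B)
```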
More detailed instructions to participants can be seen [here](https://docs.google.com/document/d/1VNnFtKKoMqJcqMD_4XFy9-86LJkv3p_rIGF3VDWRTac/edit#). In elicitation setups such as this, I think that specifying the exact subject of discussion is valuable, so that participants are talking about the same thing.
Still, there were some things I wasn't explicit about:
* Participants were not intended to consider the counterfactual cost of capital. So, for example, a neutral grant that didn't have further effects on the world should have been rated as having a value of 0. However, I wasn't particularly explicit about this, so it's possible that participants were thinking something else.
* I don't remember being clear about whether participants should have estimated relative values or relative _expected_ values. Looking at the intervals below, they are pretty narrow, which might be explained by participants thinking about expected value instead.
### Elicitation method #1: Utility function extractor application
The first method was a “utility function extractor”; the app can be found [here](https://utility-function-extractor.quantifieduncertainty.org/). The idea here is to make possibly inconsistent pairwise comparisons between pairs of grants, and extract a utility function from them. Prior work and explanations can be found [here](https://forum.effectivealtruism.org/posts/hrdxf5qdKmCZNWTvs/valuing-research-works-by-eliciting-comparisons-from-ea) and [here](https://forum.effectivealtruism.org/posts/9hQFfmbEiAoodstDA/simple-comparison-polling-to-create-utility-functions).
An example of the results for one user looks like this:
![](https://i.imgur.com/vQ0WYXa.png)
I first processed each participant's utility function extractor results into a table like this one:
![](https://i.imgur.com/j7GhxSB.png)
and then processed it into proper distributional aggregates using [this package](https://github.com/quantified-uncertainty/utility-function-extractor/tree/master/packages/utility-tools). One difficulty I ran into is that I hadn't considered that some of the estimates could be negative, because I was using the geometric mean as an aggregation method. This wreaked havoc on the distributional aggregates, particularly when the estimates for one particular element were sometimes positive and sometimes negative.
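A minimal sketch of the problem, with made-up numbers: the geometric mean behaves well for positive estimates, but stops being a meaningful aggregate once some estimates are negative.
```
// Made-up numbers; the geometric mean of positive estimates behaves well...
geomean3(a, b, c) = (a * b * c)^(1/3)
geomean3(2, 4, 8) // 4
// ...but with a negative estimate the product flips sign, and its cube root
// is no longer a meaningful aggregate:
// geomean3(-2, 4, 8)
```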
### Elicitation method #2: Hierarchical tree estimates
The second method involved creating a hierarchical tree of estimates, using [this Observable document](https://observablehq.com/@nunosempere/relative-value-comparisons-within-clusters-public). The idea here is to express relationships between the grants using a “hierarchical model”, where grants belonging to a category are compared to a reference grant, and reference grants are then compared to a greater reference element (“one year of Paul Christiano's work”).
The interface I asked participants to use looked as follows:
![](https://i.imgur.com/01zIRPk.png)
A participant mentioned that this part was painful to fill in. Using a visualization scheme which the participants didn't have access to at the time, participant results can be displayed as follows:
![](https://i.imgur.com/LsaL7yK.png)
In this case, the top-most element is “percentage reduction in x-risk”. I asked some participants for their best guess for this number, and the one displayed gave 0.03% per year of Paul Christiano's work.
### Elicitation method #3: Individual aggregate estimates
After presenting participants with their estimates from the two different methods, I asked them to give their best guesses after reflection. Their answers, normalized to add up to 100, can be summarized as follows:
![](https://i.imgur.com/WkKpz2A.png)
Researcher #6 only reported his estimates using one method (the utility function extractor) and then participated in the discussion round, which is why he isn't shown in this table.
So, for example, researcher #4 is saying that the first grant, for research on the Global Politics of AI at the University of Oxford (GovAI), was the most valuable grant, holding 71% of the total value of the grants. The estimate also implies that the grant to GovAI is 71/21.2 ≈ 3.3 times as valuable as the next most valuable grant, to Michael Cohen and Dmitri Krasheninnikov.
### Elicitation method #4: Discussion and new individual estimates
After holding a discussion round for an hour, participants' estimates shifted to the following[\[1\]](#fnpmfo0q7i4di):
![](https://i.imgur.com/xleSkdf.png)
To elicit these estimates, I asked participants to divide approximately 100 units of value between the different grants. Some participants found this elicitation method more convenient and less painful than the previous pairwise comparisons. 
## Observations and reflections
### Initial estimates from the same researcher using two different methods did not tend to overlap
Consider two estimates, expressed as 90% confidence intervals:
* 10 to 100
* 500 to 1000
These estimates do not overlap: the highest value of the first interval is below the lowest value of the second.
When analyzing the results, I was very surprised to see that in many cases, estimates made by the same participant about the same grant using the first two methods—the utility function extractor and hierarchical tree—did not overlap:
![](https://i.imgur.com/8BYTnyP.png)
In the table above, for example, the first light red “FALSE” square under “Researcher 1” and to the side of “Oxford University…” indicates that the 90% estimates initially produced by researcher 1 about that grant do not overlap.
### Estimates between participants after holding a discussion round were mostly in agreement
The final estimates made by the participants after discussion were fairly concordant[\[2\]](#fnqbjzronh3oi):
![](https://i.imgur.com/xleSkdf.png)
For instance, if we look at the first row, the 90% confidence intervals[\[3\]](#fnacizl98aof) of the normalized estimates are 0.1 to 1000, 48 to 90, -16 to 54, 41 to 124, 23 to 233, and 20 to 180. These all overlap! If we visualize these 90% confidence intervals as lognormals or loguniforms, they would look as follows[\[4\]](#fnclvpudp11e): 
![](https://i.imgur.com/LNqcXxv.png)
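Footnote 4 links to the actual Squiggle model; a minimal sketch of the same idea looks as follows (Squiggle interprets `x to y` as a 90% confidence interval, and falls back to a normal when the lower end is negative):

```
r1 = 0.1 to 1000
r2 = 48 to 90
r3 = -16 to 54 // negative low end, so this becomes a normal rather than a lognormal
r4 = 41 to 124
r5 = 23 to 233
r6 = 20 to 180
mx(r1, r2, r3, r4, r5, r6)
```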
### Discussion of the shape of the results
Many researchers assigned most of the expected impact to one grant, similar to a power law or an 80/20 Pareto distribution, though a bit flatter. There was a tail of grants widely perceived to be close to worthless. There was also disagreement about the extent to which grants could have negative value.
The estimates generally seem to me to have been too narrow; in many cases they span merely an order of magnitude. This might be partially explained by ambiguity about whether participants were estimating relative expected values or actual (realized) values.
### Thoughts on accuracy
The fact that the estimates end up clustering together could be a result of:
* Participants rationally coming to agree as a result of acquiring the same knowledge.
* Social pressure, group-think, human biases, or other effects. Not all of these might be negative: for example, if the group correctly identifies the most knowledgeable person about each grant and then defers to them, this could make the estimates better.
Overall I think that convergence is a weak but positive signal of accuracy. For example, per Aumann's agreement theorem, participants shouldn't expect to “agree to disagree”, so to the extent that irrational disagreement is not happening, convergence is good.
One way to find out whether this aggregate is converging to something like the truth would be to have a separate group, or a separate person known to have good judgment, make their own estimates independently, and then compare them with these estimates. This would require an additional time investment.
### What was the role of Squiggle?
I used Squiggle in the utility function extractor and in the hierarchical method, interpreting distributions using Squiggle syntax. I then also used it for aggregating the estimates, both to aggregate the many estimates made by one participant, and to arrive at an aggregate of all participants' estimates.
### Thoughts on scaling up this type of estimation
I'm estimating that the experiment took 20 to 40 hours:
```
hours_per_participant = 2 to 5
participants = 5 to 6
participant_hours = hours_per_participant * participants
organizer_hours = (2 to 4) + (2) + (0.3 to 2) + (4 to 15) + (0.2 to 0.5) // preparation + hosting + nagging + writeup + paying
participant_hours + organizer_hours
```
![](https://i.imgur.com/bRfX2lv.png)
So for 9 grants, this is 2.6 to 4.9 hours per grant. Perhaps continued investment could bring this down to one hour per grant. I also think that time might scale roughly linearly with the number of grants, because grants can be divided into buckets, and then we can apply the relative value method to each bucket. Then we can compare buckets at a small additional cost—e.g., by comparing the best grants from each bucket.
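As a rough sketch of that scaling argument, where the bucket size comes from this experiment but the cross-bucket overhead multiplier is a number I'm making up for illustration:

```
grants = 300
bucket_size = 9 // as in this experiment
buckets = grants / bucket_size
hours_per_bucket = 20 to 40 // from the estimate above
cross_bucket_overhead = 1.05 to 1.2 // hypothetical multiplier for comparing across buckets
total_hours = buckets * hours_per_bucket * cross_bucket_overhead
total_hours
```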
I'm not actually sure how many grants the EA ecosystem has, but I'm guessing something like 300 to 1000 grants per year[\[5\]](#fn6fjejaxnj27). Given this, it would take half to two FTEs (full-time equivalents) to evaluate all grants, which is lower than I expected:
```
hours_per_participant = 2 to 5
participants = 5 to 6
participant_hours = hours_per_participant * participants
organizer_hours = (2 to 4) + (2) + (0.3 to 2) + (4 to 15) + (0.2 to 0.5) // preparation + hosting + nagging + writeup + paying
hours_per_grant = (participant_hours + organizer_hours) / 9
grants_per_year = 300 to 1000
hours_per_person_per_year = (30 to 50) * 52
ftes_to_evaluate_all_grants = grants_per_year * hours_per_grant / hours_per_person_per_year
ftes_to_evaluate_all_grants
```
![](https://i.imgur.com/pzNLX5j.png)
~1 FTE per year seems low enough to be doable. However, note that this work would be spread among many people, which would bring additional attention, context-switching, and coordination costs.
### Relative estimates as an elicitation method vs as an output format
There is a difference between relative estimates as an elicitation method (as presented here) and relative estimates as an output format (where we have the relative values of projects, and transformations between these and reference units, like QALYs, fractions of the future, etc.). It's possible that relative values as an output format remain promising even if relative values as a (rapid) elicitation method prove less so.
### Relative estimates of value seem a bit more resilient to shifts in what we care about
One advantage of relative values as a format might be that they are more resilient to shifts in what we care about (sometimes called “[ontological crisis](https://www.lesswrong.com/tag/ontological-crisis)”). Thanks to Ozzie Gooen for this point. For instance, raw estimates of value may change as we switch from DALYs, to QALYs, to fractions of the future, to other units, or as we realize that the future is larger or smaller than we thought. But relative values would perhaps remain more stable.
### Thoughts on alternative value estimation methods
The main alternative to relative values that I'm considering is estimates made directly in a unit of interest, such as percentage or basis points of existential risk reduction, or QALYs saved. In particular, I'm thinking of setups which decompose impact into various steps and then estimate the value or probability of each step.
**A concrete example**
For instance, per [AI Governance: Opportunity and Theory of Impact](https://forum.effectivealtruism.org/posts/42reWndoTEhFqu6T8/ai-governance-opportunity-and-theory-of-impact#Prioritization_and_Theory_of_Impact), the pathway to impact for the GovAI center would be something like this:
![](https://i.imgur.com/fuRWkA9.png)
(Higher quality image [here](https://i.imgur.com/CS1mg13.png).)
Giving some _very_ quick numbers to this, say:
* a 12% chance of AGI being built before 2030,
* a 30% chance of it being built in Britain by then, if so,
* a 90% chance of it being built by DeepMind, if so,
* an initial 50% chance of it going well, and
* GovAI efforts shifting the probability of it going well from 50% to 55%.
Punching those numbers into a calculator, a rough estimate is that GovAI reduces existential risk by around 0.162%, or 16.2 [basis points](https://en.wikipedia.org/wiki/Basis_point).
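For concreteness, the same point-estimate calculation written out in Squiggle (we make it distributional below):

```
agi_before_2030 = 0.12
agi_in_britain_if_agi_before_2030 = 0.3
agi_by_deepmind_if_agi_in_britain = 0.9
increased_survival_probability = 0.55 - 0.50
agi_before_2030 * agi_in_britain_if_agi_before_2030 * agi_by_deepmind_if_agi_in_britain * increased_survival_probability
// ≈ 0.00162, i.e., ~0.162%, or ~16.2 basis points
```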
The key number here is the 5% improvement (from 50% to 55%). I'm getting this estimate mostly because Allan Dafoe being the “Head of Long-term Strategy and Governance” at DeepMind seems like a promising signal. It nicely corresponds to the “having people in places to implement safety strategies” part of GovAI's pathway to impact. But that estimation strategy is very crude, and I could imagine a better estimate ranging from <0.5% to more than 5%.
To avoid the class of problems around using point estimates rather than distributions that [Dissolving the Fermi Paradox](https://arxiv.org/abs/1806.02404) points out, we can rewrite these point estimates as distributions:
```
t(d) = truncateLeft(truncateRight(d, 1), 0) // clamp probability distributions to [0, 1]
agi_before_2030 = t(0.01 to 0.3) // should really be using beta distributions, though
agi_in_britain_if_agi_before_2030 = t(0.1 to 0.5)
agi_by_deepmind_if_agi_in_britain = t(0.8 to 1)
increased_survival_probability = t(0.001 to 0.1) // changed my mind while making this estimate distributional
value_of_govai = t(agi_before_2030 * agi_in_britain_if_agi_before_2030 * agi_by_deepmind_if_agi_in_britain * increased_survival_probability)
value_of_govai_in_percentage_points = value_of_govai * 100
value_of_govai_in_percentage_points
```
This produces an estimate of 0.52% of the future, or 52 basis points, which is around 3x higher than our initial estimate of 16.2 basis points. But we shouldn't be particularly surprised to see these estimates vary by ~1 order of magnitude.
We could make a more granular estimate by thinking about how many people would be involved in that decision, how many would have been influenced by GovAI, etc. 
In any case, in [this post](https://forum.effectivealtruism.org/posts/cKPkimztzKoCkZ75r/how-many-ea-2021-usds-would-you-trade-off-against-a-0-01), Linch estimates that we should be prepared to pay [$100M to $1B](https://forum.effectivealtruism.org/posts/cKPkimztzKoCkZ75r/how-many-ea-2021-usds-would-you-trade-off-against-a-0-01?commentId=ooEuFiZKQwwacb7MJ) for a 0.01% reduction in existential risk, or $5.2B to $52B for the existential risk reduction of 0.52% that I quickly estimated GovAI to produce. Because GovAI's budget is much lower, it seems like an outstanding opportunity, conditional on that estimate being correct.
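A quick distributional sketch of that dollar conversion, using Linch's range (the basis-point figure is the output of the model above):

```
dollars_per_basis_point = 100M to 1000M // i.e., $100M to $1B per 0.01% of x-risk reduced
basis_points_reduced = 52 // from the distributional estimate above
dollars_per_basis_point * basis_points_reduced // centered on roughly $5B to $50B
```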
**How does that example differ from the relative estimates method?**
In this case, both the relative values method and the explicit pathway to impact method end up concluding that GovAI is an outstanding opportunity, but the explicit estimate method seems much more legible, because its moving parts are explicit and thus can more easily be scrutinized and challenged. 
Note that GovAI has a [very clearly written](https://forum.effectivealtruism.org/posts/42reWndoTEhFqu6T8/ai-governance-opportunity-and-theory-of-impact#Prioritization_and_Theory_of_Impact) explanation of its theory of impact, which other interventions may not have. And producing a clear theory of impact, of the sort which could be used for estimation, might be too time-consuming for any given small grant. But I am optimistic that we could have templates which we could then reuse.
### Future work
Future work directions might involve:
* Finding more convenient and scalable ways to produce these kinds of estimates
* Finding better ways to **visualize**, present and interrogate these estimates
* Checking whether these estimates align with expert intuition
* Applying these estimation methods to areas where there was previously very little estimation being done
* Further experimenting with more in-depth and high-quality estimation methods than the one used here
* Using relative estimates as a way to identify disagreements
I still think relative values are meaningful for creating units, such as “quality-adjusted sentient life year”. But otherwise, I'm most excited about purely relative estimates as a better method for aiding relatively low-level decisions, and estimates based on the pathway to impact as a more expensive estimation option for more important decisions.
One reason for this view is that I have become more convinced that variables of interest (like basis points of existential risk reduction) can be meaningfully estimated directly, although at some expense. Previously, I thought that producing such endline estimates might be too expensive.
It's possible that relative value estimates could also be used for other purposes, such as creating evaluations of grants where there previously were none, or aligning the intuitions of senior and junior grantmakers. But I don't consider this particularly likely, perhaps because the people who could do this kind of thing have more valuable projects to implement.
## Acknowledgements
<p><img src="https://i.imgur.com/7yuRrge.png" class="img-frontpage-center"></p>
Thanks to Gavin Leech, Misha Yagudin, Ozzie Gooen, Jaime Sevilla, Daniel Filan, and one other anonymous participant for participating in this experiment. Thanks to them and to Eli Lifland for their comments and suggestions throughout and afterwards. Thanks to Hauke Hillebrandt, Ozzie Gooen and Nick Beckstead for encouragement around this research direction.
This post is a project by the [Quantified Uncertainty Research Institute](https://quantifieduncertainty.org/) (QURI). The language used to express probability distributions throughout the post is [Squiggle](https://www.squiggle-language.com/), which is being developed by QURI.
## Appendix: More details
You can find more detailed estimates in [this Google Sheet](https://docs.google.com/spreadsheets/d/1ivaFYy_07X5JDZ0qj8kkgTr790gPDJ8Asc2I7UVLjHA/edit?usp=sharing). For each participant, their sheet shows:
* The results for each method
* The results for an aggregate of both methods
* The best guess of the participant after seeing the results for each method and an aggregate
* The best guess of the participant after discussing with other participants
You can also find more detailed aggregates in [this Google Sheet](https://docs.google.com/spreadsheets/d/13inKETvESvcOu8UX2uyM7nlUvUNbECEugt3ec_YqnoY/edit#gid=253364323), which include the individual distributions and the medians in the table in the last section.
Note that there are various methodological inelegancies:
* Researcher #2 did not participate in the discussion, and only read the notes
* Researcher #6 only used the utility function extractor method
* Various researchers at times gave idiosyncratic estimate types, like 80% confidence intervals, or medians instead of distributions.
In part because the initial estimates were not congruent, I procrastinated on hosting the discussion session, which was held around a month after the initial experiment, if I recall correctly. If I were redoing the experiment, I would hold its different parts closer together.
1. **[^](#fnrefpmfo0q7i4di)**
Note that in the first case I am displaying the means, and in the other, the medians. This is because a) means of very wide distributions are fairly counterintuitive, and on various occasions I don't think that participants thought much about this, and b) because of a methodological accident, participants provided means in the first case and medians in the second.
Note also that medians are a pretty terrible aggregation method.
2. **[^](#fnrefqbjzronh3oi)**
Note that the distributions aren't necessarily lognormally distributed, which is why the medians may look off. See [this spreadsheet](https://docs.google.com/spreadsheets/d/13inKETvESvcOu8UX2uyM7nlUvUNbECEugt3ec_YqnoY/edit?usp=sharing) for details.
3. **[^](#fnrefacizl98aof)**
80% for researcher #5, for idiosyncratic reasons.
4. **[^](#fnrefclvpudp11e)**
Squiggle model [here](https://www.squiggle-language.com/playground/#code=eNqdkMFOwzAQRH9l5VMiBZQ4BRVLHPmCHDGKAnWTFYkNa5sWRfl34gJqi5Dcdk6r8WqfZ0ZmO7Op%2FDA09MmEI6%2BynfWwQmfo10GNDpu%2BevfYtr2qHKFumWArtPP47B0abWteO1Nb3MI9jFLDrKN3AY9Sf%2FtB434M0s2gBEhGyqqGXjpF4DZGsgyO9w5PClgswRm4y%2Fc7U3YWoiOlYpCr4jZQbhaXUtbGUzRJERgFvxgyFx9j8HzHWP6p66wo%2BBHti5cBw8vy%2Fyinw4yOstT2LfEa14aGpDdtkl8XaQZhKvLXNE0PvvBz50nqKWRm0xfkbtQi).
5. **[^](#fnref6fjejaxnj27)**
Open Philanthropy grants for 2021: 216; Long-Term Future Fund grants for 2021: 46; FTX Future Fund public grants and regrants: 113 so far, so an expected ~170 by the end of the year. In total this is 375 grants so far, and I'd wager the number will grow year by year.

View File

@ -0,0 +1,148 @@
$5k challenge to quantify the impact of 80,000 hours' top career paths
==============
## Motivation
80,000 hours has identified a number of promising career paths. They have a fair amount of analysis behind their recommendations, and in particular, they have a list of [top ten priority paths.](https://80000hours.org/career-reviews/#our-priority-paths) 
However, 80,000 hours doesn't quite[^1] have quantitative estimates of these paths' value. Although their usefulness would not be guaranteed, quantitative estimates could make it clearer:
* how valuable their top career paths are relative to each other
* how valuable their top career paths are relative to options further down their list
* at which level of personal fit one should switch between different career paths[^2]
* where the expected impact is coming from, and which variables we are most uncertain about
* eventually, whether certain opportunities are valuable in themselves or for the value of information or career capital that they provide
* etc.
**The Prize**
Following up on the [$1,000 Squiggle Experimentation Challenge](https://forum.effectivealtruism.org/posts/ZrWuy2oAxa6Yh3eAw/usd1-000-squiggle-experimentation-challenge) and the [Forecasting Innovation Prize](https://forum.effectivealtruism.org/posts/8Nwy3tX2WnDDSTRoi/announcing-the-forecasting-innovation-prize), we are offering a prize of $5k for quantitative estimates of the value of [80,000 hours' top 10 career paths](https://80000hours.org/career-reviews/#our-priority-paths).
## Rules
**Step 1:** Make a public post online between now and December 1, 2022. Posts on the EA Forum (link posts are fine) are encouraged.
**Step 2:** Complete [this submission form.](https://airtable.com/shrFX656rcOXaDioh)
**Further details**
* Participants can use units or strategies of their choice—these might be QALYs, percentage points of reduction in existential risk, [basis points of the future](https://forum.effectivealtruism.org/posts/xmmqDdGqNZq5RELer/shallow-evaluations-of-longtermist-organizations#Notes), [basis points of existential risk reduced](https://forum.effectivealtruism.org/posts/cKPkimztzKoCkZ75r/how-many-ea-2021-usds-would-you-trade-off-against-a-0-01), career-dependent units, etc. Contestants could also use some other method, like [relative values](https://forum.effectivealtruism.org/posts/hrdxf5qdKmCZNWTvs/valuing-research-works-by-eliciting-comparisons-from-ea), estimating proxies, or some original option.
* We are specifically looking for quantitative estimates that attempt to capture some magnitude reasonably close to the real world, similar to the units above[^3]. So, for example, assigning valuations from 0 to 5 stars would not fulfil the requirements of the contest, but estimates in terms of the units above would qualify.
* Participants are free to estimate the value of one, several, or all ten career paths.
* Participants are free to use whatever tool or language they want to produce these estimates. Some possible tooling might be: Excel, [Squiggle](https://www.squiggle-language.com/), [Guesstimate](https://getguesstimate.com/), probabilistic languages or libraries (e.g., Turing.jl, PyMC3, Stan), [Causal](https://causal.app/), working directly in a popular programming language, etc.
* Participants _can_ provide point estimates of impact, but they are encouraged to provide their estimates as distributions instead. 
* Participants are free to estimate the impact of a marginal person, of a marginal person with a good fit, the average value, etc. Participants are welcome to provide both average and marginal value—for example, they could provide a function which provides an estimate of marginal value at different levels of labor and capital.
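As an illustration of that last point, here is a minimal sketch of a marginal-value function in Squiggle, where both the functional form and the numbers are hypothetical:

```
// marginal value (in arbitrary units) of one additional person,
// as a function of people already working on the path
marginalValue(people) = (10 to 100) / (1 + 0.1 * people)
marginalValue(20) // marginal value of the 21st person
```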
We provide some examples of possible rough submissions in an appendix. We are also happy to comment on estimation strategies: feel free to leave a comment on this post or to send a message to Nuño Sempere using the EA forum message functionality.
## Judging
The judges will be Nuño Sempere, Eli Lifland, Alex Lawsen and Sam Nolan. They will judge in their personal capacities, and their stances do not represent their organizations.
Judges will estimate the quality and value of the entries, and we will distribute the prize amount of $5k[^4] in proportion to an equally weighted aggregate of those subjective estimates[^5].
To reduce our operational burden, we are looking to send out around three to five prizes. If there are more than five submissions, we plan to implement a lottery system. For example, a participant who would have won $100 would instead get a 10% chance of receiving $1k.
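Such a lottery preserves expected value; e.g., for the hypothetical $100 winner:

```
lottery = mx(0, 1000, [0.9, 0.1]) // 90% chance of nothing, 10% chance of $1k
mean(lottery) // ≈ 100, the same expected value as the $100 prize
```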
## Acknowledgements
<p><img src="https://i.imgur.com/7yuRrge.png" class="img-frontpage-center"></p>
This contest is a project of the Quantified Uncertainty Research Institute, which is providing the contest funds and administration. Thanks in advance to Eli Lifland, Alex Lawsen and Sam Nolan for their good judgments. Thanks to Ozzie Gooen for comments and suggestions.
## Appendix: Example models
### Example I: "[Founder of new projects tackling top problems](https://80000hours.org/career-reviews/founder-impactful-organisations/)"
The following is a crude example estimate for the career path of [Founder of new projects tackling top problems](https://80000hours.org/career-reviews/founder-impactful-organisations/), written in [Squiggle](https://www.squiggle-language.com/playground#code=eNq1lN9P2zAQx%2F8Vq0%2FplCZuuwCqtIdKMLQfXceKNqGFB5M6iYdjZ%2FYFGhD%2F%2B85OC90QUB72kF%2F23dd3n7vLbc%2BW%2BnrRVBUzbW8CpuGhXzpaCtBmsyKUAMHk4ncjikLyBRihit6kF8dkXnPDQGjFJMlKJiVXBbepwneV8Xk%2BBWDoropT01iYquUxB8DP941a4oO8IxccWJCEZEz7BAUhoBFNCGhCo3FIhv0HrbXrVM1NwZS48efO89OSHxuNehuxIUW%2FB7VxJ3bQiaUKl48siIoBJzonoqpZBqlqOTOy3ZZ%2BiLFaBQm9dDoJpZchGdHua0hnIUlm3Qa%2B%2FqTRHqZCo5G%2F0%2BQcTyzEFT8Uhmcg2%2B9MNpjJyfTzmf3KzaGWkhk8YBgHwz3qdEb7dB06u%2BSK5EZXxOqKkwWryBctmSJ8Hb2NUqW34vXis0aCqKXgpos7VcSFNfThIgL3RBQH%2FomgDvyp4xG%2B4m2T4nno%2Fd6GzhbvI%2FQ8T1Xf07vipiVIvChlS4SyNSa3JBfthJQAtZ3Eca5NU0U8zzFpzJ5J7CNhqwijjWttwcbHN9XHUSNOV2%2BLs7LOTj7F99aKWzsQdsAGmVa%2FGpW53AY6H1T3qWGD2ZLVkrd%2FpYvcu1onqZIi54gJYW%2BXtGPirfaxcl26m66YFoXhBYJNFWhgcl2sIwWG1xhWY2wp6g%2FKVw%2BVdu3xN47lzk3srZ9uRr%2F9ckt5s%2Beawxs8AcnvPQL8LyVyLaAkFVuJqqmwIhAshYXQrfSRDpZcZWj2TRTl9tYucH%2Bg8oytEPI2uv%2FAGYInSYduEGb9VwGH4Bnk7qfk9Nxwa5wh%2FFnejzIxOCQ4TZYrK9wQuM6Ekm%2FwkpIbHr1UMggeFW3rp3cobC1Zm6pb4ozRfOc6TNYer%2FFBlzu8end%2FAB%2BtOwA%3D).
```
// Operational challenges
chanceOfAttainingTrustAndGettingFunding = beta(5, 30) // t(0.05 to 0.3, 1)
chanceOfGettingAnOrganizationOffTheGround = beta(10, 10) // t(0.3 to 0.8, 1)
// Estimate of impact
yearlyOrganizationFunding = mx(50k to 500k, 200k to 10M, 5M to 50M, [0.65, 0.25, 0.05])
giveDirectlyValueOfQALYsPerDollar = 1/(160 to 2700)
// ^ taken from some of Sam Nolan's estimates:
// <https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis>
organizationValueMultiplier = mx(
 [0.1 to 1, 1 to 8, 8 to 80, 80 to 320, 320 to 500k],
 [4, 8, 4, 2, 1]
)
// very roughly inspired by:
// https://forum.effectivealtruism.org/posts/GzmJ2uiTx4gYhpcQK/
// effectiveness-is-a-conjunction-of-multipliers
shapleyMultiplier = 0.2 to 0.5
lifetimeOfOrganization = mx(2 to 7, 5 to 50)
// Aggregate
totalValueOfEntrepreneurshipInQALYs = chanceOfAttainingTrustAndGettingFunding *
 chanceOfGettingAnOrganizationOffTheGround *
 yearlyOrganizationFunding *
 giveDirectlyValueOfQALYsPerDollar *
 organizationValueMultiplier *
 lifetimeOfOrganization *
 shapleyMultiplier
// Aggregate with maximums
t(dist, max) = truncateRight(dist, max)
totalValueOfEntrepreneurshipInQALYsWithMaxs =
 chanceOfAttainingTrustAndGettingFunding *
 chanceOfGettingAnOrganizationOffTheGround *
 t(yearlyOrganizationFunding, 500M) *
 giveDirectlyValueOfQALYsPerDollar *
 t(organizationValueMultiplier, 10M) *
// ^ overall estimate really sensitive to the maximum here.
 lifetimeOfOrganization *
 t(shapleyMultiplier, 1)
// Display
{
   totalValueOfEntrepreneurshipInQALYsWithMaxs:
     totalValueOfEntrepreneurshipInQALYsWithMaxs
}
```
On its own, the estimate might be too obscure, so it is better accompanied by an explanation of the estimation strategy it uses. That strategy is:
* To estimate the chance of getting funding and then getting an organization off the ground
  * This is based on subjective guesses. Perhaps Charity Entrepreneurship, or EA Funds if it kept data, could have better estimates
* To estimate the value that an organization produces. This is the weakest part of the model, and it would be better if it were based on specific steps. Instead, we are using more of a "black box" model, and estimating:
  * The funding that the organization would receive
  * The QALYs per dollar that a reference organization—GiveDirectly—produces, taken from [Sam Nolan's estimate thereof](https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis)
  * The advantage over GiveDirectly that the new organization would have. We are getting this estimate from [this EA forum post](https://forum.effectivealtruism.org/posts/GzmJ2uiTx4gYhpcQK/effectiveness-is-a-conjunction-of-multipliers)
* To estimate some other factors to go from the above to the total output, again based on pretty subjective estimates:
  * The lifetime of the organization
  * The "Shapley multiplier", which penalizes efforts that require more people. In this case, we are saying that the founder gets between 20% and 50% of the impact.
We also have to take care that not only the 90% confidence intervals but also the overall shape of the estimates is correct. For this reason, we have a step where we truncate some of them.
As mentioned, a key input of the model is the multiplier of impact over GiveDirectly, but this is based on black-box reasoning, and is a possible point of improvement. For example, we could instead estimate how many QALYs, or what percentage of the future, a speculative area like AI safety research is worth.
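A hedged toy sketch of what such an estimate could look like, where every number is made up for illustration:

```
fraction_of_future_lost_if_ai_goes_badly = 0.1 to 1 // hypothetical
ai_safety_research_share_of_the_solution = 0.001 to 0.1 // hypothetical
value_of_ai_safety_as_fraction_of_future = fraction_of_future_lost_if_ai_goes_badly * ai_safety_research_share_of_the_solution
value_of_ai_safety_as_fraction_of_future
```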
### Example II: Value of global health charities
There are various distributional models of global health charities in the EA forum that participants may want to take some inspiration from, e.g.:
* Dan Wahl's estimate of the cost-effectiveness [of LEEP](https://danwahl.net/blog/leep-cea)
* Sam Nolan's [cost-effectiveness models](https://forum.effectivealtruism.org/users/hazelfire)
The advantage of these is that they can be pretty clean. The disadvantage is that they come from a different cause area.
### Example III: Value of the Centre for the Governance of AI
[Here](https://forum.effectivealtruism.org/posts/EPhDMkovGquHtFq3h/an-experiment-eliciting-relative-estimates-for-open#A_concrete_example), I give an estimate for the value of the Centre for the Governance of AI (GovAI) in terms of basis points of existential risk reduced. It might serve as a source of inspiration. One disadvantage is that it only considers one particular pathway to impact that GovAI might have, and it doesn't consider other pathways that might be more important—e.g., field-building.
### Example IV: Value of ALLFED
Historically, one of the few longtermist organizations which has made an attempt to estimate their own impact quantitatively is ALLFED. A past estimate of theirs can be seen [here](https://www.getguesstimate.com/models/9782). My sense is that the numeric estimates might have been on the optimistic side (some alternative numbers [here](https://forum.effectivealtruism.org/posts/xmmqDdGqNZq5RELer/shallow-evaluations-of-longtermist-organizations#Disagreements_and_Uncertainties)). But the estimation strategy of dividing their influence and impact depending on different steps might be something to take inspiration from.
 
[^1]: 80,000 hours, when thinking about their own impact, internally uses "discounted impact-adjusted peak year" (DIPY). But this seems like a fairly coarse unit.
[^2]: This is actually more nuanced. There might be some frustration about people quickly/naïvely jumping to whatever cause or sub-cause has the best apparent marginal value at each point in time rather than committing to something. But this might be counterproductive if people have more impact staying in one place, or if impact is a combination of people working on different areas. For a specific example, suppose that impact is a [Cobb–Douglas](https://en.wikipedia.org/wiki/Cobb%E2%80%93Douglas_production_function) function of work in different areas, and that there are some coordination inefficiencies. Then focusing on attaining the optimal proportion of people in each area might be better than aiming to estimate marginal values through time.
[^3]: The criterion isn't exactly to have a unit such that 2x on that unit is twice as good. For example, percentage reductions of existential/catastrophic risk in the presence of several such risks aren't additive, but we would accept such estimates. Similarly, relative values can only be translated into magnitudes in an "additive" unit with a bit of work, but we would also accept such estimates.
[^4]: Having a fixed pot is slightly less elegant than deciding beforehand on an amount to reward for a given level of quality, but deciding beforehand would come with an added operational burden/uncertainty.
[^5]: For example, if we get two submissions and we estimate the first one to be twice as valuable as the second one, the first submission would receive $3.33k and the second would receive $1.67k. Instead, if the first submission's individual estimates were estimated to be twice as valuable, but also were twice as many in number as those of the second submission, the first one would receive $4k and the second one would receive $1k.

View File

@ -1,7 +1,7 @@
Use a less coarse analysis of AMF beneficiary age and consider counterfactual deaths
===================================================================================
**tl;dr**: GiveWell considers a fairly coarse division of beneficiary age, and groups children of 0 to 5 years old together. This may lead to inaccurate or inexact calculations. In addition, GiveWell doesn't completely account for counterfactual mortality: where a beneficiary is saved from dying of malaria but dies later anyway.
Following up on [Use distributions to more parsimoniously estimate impact](https://nunosempere.com/blog/2022/09/15/use-distributions-to-more-parsimoniously-estimate-impact/), I was looking at the population analysis of the AMF distributions, because a [previous attempt](https://forum.effectivealtruism.org/posts/4Qdjkf8PatGBsBExK/adding-quantified-uncertainty-to-givewell-s-cost) at adding uncertainty to the analysis was messier than I would have wished.
@ -39,7 +39,7 @@ As far as I understand—and let me reiterate that I could be wrong here—GiveW
This is relevant because malaria is probably not the top cause of death in the regions considered:
<figure>
![](https://i.imgur.com/Oc3JtrT.jpeg)
<figcaption>Cumulative probability of dying by given age. a: all cause mortality; b: malaria specific mortality</figcaption>
</figure>
@ -62,7 +62,7 @@ The solution is to apply a discount factor corresponding to the probability that
A quick and dirty adjustment would be to add \*(1-B40/100) to row B56 (resp. rows C to J) of the AMF calculation, as in [this spreadsheet](https://docs.google.com/spreadsheets/d/1Gtd5ga6UREGFjnz-V4HiXAZLhOM_ifrv4CchUGSsRf4/edit?usp=sharing). This ends up making a difference of 2.5% (in the case of DRC) to 22.5% (in the case of Chad).
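In Squiggle, the proposed discount might look roughly as follows, where the mortality range reuses the 2.5% to 22.5% spread above and the baseline number of lives saved is hypothetical:

```
p_dies_anyway = 0.025 to 0.225 // probability a child saved from malaria would have died anyway
lives_saved_unadjusted = 100 // hypothetical baseline
lives_saved_adjusted = lives_saved_unadjusted * (1 - p_dies_anyway)
lives_saved_adjusted
```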
Besides the issue with child mortality being higher for younger children, I would also argue that children who would have otherwise died of malaria are differentially more likely to die of other causes. For instance, a sickly child is likelier to be saved from malaria, but also likelier to die of other causes. It's possible that this might add another 2 to 20% difference.
Conversely, the [Pryce et al.](http://ncbi.nlm.nih.gov/pmc/articles/PMC6418392/) Cochrane Review on which GiveWell's estimates are based does estimate the reduction in *overall mortality*. This diminishes the impact of my argument above, because if a child was saved from malaria but died anyway of something else, it would be counted correctly in the tally of overall mortality. However, the lengths of the follow-ups in the analyses considered by the review are: 2 years, 12 months, 6 months, 24 months, 12 months, 8 months, 12 months, 12 months, 12 months, 2 years, 1 year, 6 months, 2 years, 1 year, 12 months, 2 years, 2 years, 13 months, 6 months, 15 weeks, 1 year, 6 months, and 10 months, which isn't enough to capture all of the under-5 mortality.