feat: New content and font

This commit is contained in:
Nuno Sempere 2022-04-07 14:49:49 +00:00
parent d9f5219eee
commit aac615d4b0
40 changed files with 641 additions and 3 deletions

Binary file not shown.

Binary file not shown.

0
_werc/config Normal file → Executable file
View File

View File

@ -1,5 +1,11 @@
<br class="doNotDisplay doNotPrint" />
<div style="margin-right: auto;"><a href="http://werc.cat-v.org">Powered by werc</a></div>
<div style="margin-right: auto;">Powered by <a href="http://werc.cat-v.org/">werc</a>, <a href="https://alpinelinux.org/">Alpine Linux</a> and <a href="https://nginx.org/en/">nginx</a></div>
<div><form action="/_search/" method="POST"><input type="text" id="searchtext" name="q"> <input type="submit" value="Search"></form></div>
<!-- TODO: wait until duckduckgo indexes site
<form action="https://duckduckgo.com/" method="get">
<input type="hidden" name="sites" value="nunosempere.com">
<input type="search" name="q">
<input type="submit" value="Search">
</form>
-->

View File

@ -0,0 +1,165 @@
Samotsvety Nuclear Risk Forecasts — March 2022
==============
_Thanks to Misha Yagudin, Eli Lifland, Jonathan Mann, Juan Cambeiro, Gregory Lewis, @belikewater, and Daniel Filan for forecasts. Thanks to Jacob Hilton for writing up an earlier analysis from which we drew heavily. Thanks to Clay Graubard for sanity checking and to Daniel Filan for independent analysis. This document was written in collaboration with Eli and Misha, and we thank those who commented on an earlier version._
## Overview
In light of the war in Ukraine and fears of nuclear escalation[\[1\]](#fn1tt4cl1ut8si), we turned to forecasting to assess whether individuals and organizations should leave major cities. We aggregated the forecasts of 8 excellent forecasters for the question _**What is the risk of death in the next month due to a nuclear explosion in London?**_ Our aggregate answer is 24 micromorts (7 to 61) when excluding the most extreme on either side[\[2\]](#fnl2youss95ij). A micromort is defined as a 1 in a million chance of death. Chiefly, we have a low baseline risk, and we think that escalation to targeting civilian populations is even more unlikely. 
For San Francisco and most other major cities[\[3\]](#fncfroiodaps), we would forecast 1.5-2x lower probability (12-16 micromorts). We focused on London as it seems to be at high risk and is a hub for the effective altruism community, one target audience for this forecast.
Given an estimated 50 years of life left[\[4\]](#fnpt69xmy0uqf), this corresponds to ~10 hours lost. The forecaster range without excluding extremes was <1 minute to ~2 days lost. Because of productivity losses, hassle, etc., we are currently not recommending that individuals evacuate major cities. 
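The conversion from micromorts to expected time lost is plain arithmetic. The following is a minimal Python sketch of it, assuming the 50 remaining years stated above and a 365-day year; the function name is ours and purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: leap years are ignored


def expected_hours_lost(micromorts, remaining_years=50):
    """Expected hours of life lost for a given risk of death in micromorts."""
    probability_of_death = micromorts * 1e-6  # 1 micromort = 1 in a million
    return probability_of_death * remaining_years * HOURS_PER_YEAR


aggregate_hours = expected_hours_lost(24)         # the aggregate forecast
high_end_days = expected_hours_lost(112.5) / 24   # highest individual forecast (footnote 2)
print(f"{aggregate_hours:.1f} hours, {high_end_days:.1f} days")
```

With these inputs, the aggregate works out to roughly 10.5 hours and the highest individual forecast to roughly 2 days, in line with the figures quoted above.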
## Methodology
We aggregated the forecasts from eight excellent forecasters between the 6th and the 10th of March. [Eli Lifland](https://www.elilifland.com/), [Misha Yagudin](https://forum.effectivealtruism.org/users/misha_yagudin), [Nuño Sempere](https://nunosempere.com/), [Jonathan Mann](https://jonathanmann.github.io/) and [Juan Cambeiro](https://twitter.com/juan_cambeiro)[\[5\]](#fntfhokbz7tk) are part of Samotsvety, a forecasting group with a good track record: we won CSET-Foretell's first two seasons, and have great track records on various platforms. The remaining forecasters were [Gregory Lewis](https://www.fhi.ox.ac.uk/team/lewis-gregory/)[\[6\]](#fnmz4wxunrpnc), @belikewater, and [Daniel Filan](https://danielfilan.com/), who likewise had good track records. 
The overall question we focused on was: _**What is the risk of death in the next month**_[\[7\]](#fnazu0qsuph3h) _**due to a nuclear explosion in London?**_. We operationalized this as: “If a nuke does not hit London in the next month, this resolves as 0. If a nuke does hit London in the next month, this resolves as the percentage of people in London who died from the nuke, subjectively down-weighted by the percentage of reasonable people that evacuated due to warning signs of escalation.” We roughly borrowed the question operationalization and decomposition from [Jacob Hilton](https://docs.google.com/document/d/17q-Ok4EVV42IscLMFOLztht7i0iLiALx0DFcX3xLn-A/edit?pli=1#heading=h.9vfmnuhgbjzv).
We broke this question down into:
1. What is the chance of nuclear warfare between NATO and Russia in the next month?
2. What is the chance that escalation sees central London hit by a nuclear weapon conditioned on the above question?
3. What is the chance of not being able to evacuate London beforehand?
4. What is the chance of dying if a nuclear bomb drops in London?
However, different forecasters preferred different decompositions. In particular, there were some disagreements about the odds of a tactical strike in London given a nuclear exchange in NATO, which led to some forecasters preferring to break down (2.) into multiple steps. Other forecasters also preferred to first consider the odds of direct Russia/NATO confrontation, and then the odds of nuclear warfare given that. 
## Our aggregate forecast
![](images/edc67c8614df4e8216a052ca5d623084edc5791c.png)
We use the aggregate with min/max removed as our all-things-considered forecast for now given the extremity of outliers. We aggregated forecasts using the geometric mean of odds[\[8\]](#fnt1dm5d62pkl).
Note that we are forecasting one month ahead and it's quite likely that the crisis will get less acute/uncertain with time. Unless otherwise indicated, we use “monthly probability” for our and readers' convenience.
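For readers unfamiliar with the geometric mean of odds, here is a minimal Python sketch of that pooling method, with and without trimming the extremes. The example forecasts are hypothetical and are not the individual Samotsvety numbers; the function names are ours.

```python
import math


def geometric_mean_of_odds(probabilities):
    """Pool probabilities by averaging their log-odds, then converting back."""
    log_odds = [math.log(p / (1 - p)) for p in probabilities]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds / (1 + pooled_odds)


def trimmed_geometric_mean_of_odds(probabilities):
    """Drop the single lowest and single highest forecast before pooling."""
    trimmed = sorted(probabilities)[1:-1]
    return geometric_mean_of_odds(trimmed)


# Hypothetical forecasts for illustration only.
example_forecasts = [1e-7, 5e-6, 2e-5, 3e-5, 6e-5, 1e-4]
print(geometric_mean_of_odds(example_forecasts))
print(trimmed_geometric_mean_of_odds(example_forecasts))
```

Because the mean is taken in log-odds space, a single extreme forecast pulls the pooled value much further than it would pull a median, which is one reason to look at the trimmed aggregate as well.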
## Comparisons with previous forecasts
We compared the decomposition of our forecast to [Jacob Hilton's](https://docs.google.com/document/d/17q-Ok4EVV42IscLMFOLztht7i0iLiALx0DFcX3xLn-A/edit?pli=1#) to understand the main drivers of the difference. We compare to Jacob's revised forecast, which he made after reading comments on his document. Note that Jacob forecasted on the time horizon of the whole crisis and then estimated that 10% of the risk was incurred in the upcoming week. We guess that he would put roughly 25% of the risk in the month over which we forecasted (adjusting down some from weekly \* 4), and assume so in the table below. The numbers we assign to him are also approximate in that our operationalizations are a bit different from his.
![](images/afb763248cc2ad79ba5948d1c1f24ff644d33a5a.png)
We are ~an order of magnitude lower than Jacob. This is primarily driven by (a) a ~4x lower chance of a nuclear exchange in the next month and (b) a ~2x lower chance of dying in London, given a nuclear exchange.
(a) may be due to having a lower level of baseline risk before adjusting up based on the current situation. For example, [Luisa Rodríguez's analysis](https://forum.effectivealtruism.org/posts/PAYa6on5gJKwAywrF/how-likely-is-a-nuclear-exchange-between-the-us-and-russia) puts the chance of a US/Russia nuclear exchange at 0.38%/year. We think this seems too high for the post-Cold War era, after new [de-escalation methods have been implemented](https://en.wikipedia.org/wiki/Moscow%E2%80%93Washington_hotline#Background) and lessons have been learnt from close calls. Additionally, we trust the superforecaster aggregate the most out of the estimates aggregated in the post.
(b) is likely driven primarily by a lower estimate of London being hit at all given a nuclear exchange. Commenters mentioned that targeting London would be a good example of a [decapitation strike](https://en.wikipedia.org/wiki/Decapitation_strike#In_nuclear_warfare). However, we consider it less likely that the crisis would escalate to targeting massive numbers of civilians, and in each escalation step, there may be avenues for de-escalation. In addition, targeting London would invite stronger retaliation than meddling in Europe, particularly since the UK, unlike countries in Northern Europe, is a nuclear state. 
A more likely scenario might be Putin saying that if NATO intervenes with troops, he would consider Russia to be "existentially threatened" and that he might use a nuke if they proceed. If NATO calls his bluff, he might then deploy a small tactical nuke on a specific military target while maintaining lines of communication with the US and others using the [red phone](https://en.wikipedia.org/wiki/Moscow%E2%80%93Washington_hotline). 
## Appendix A: Sanity checks
We commissioned a sanity check from [Clay Graubard](https://twitter.com/ClayGraubard?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor), who has been following the situation in Ukraine more closely. His somewhat rough comments can be found [here](https://docs.google.com/document/d/18TJLYBXLb3XhNr6BsAM1044V5Zvtxo_xl8B7q6sYjW4/edit).
Graubard estimates the likelihood of nuclear escalation in Ukraine to be fairly low (3%: 1 to 8%), but didn't have a nuanced opinion on escalation beyond Ukraine to NATO (a very uncertain 55%: 10 to 90%). Taking his estimates at face value, this gives a 1.3%/yr chance of nuclear warfare between Russia and NATO, which is in line with our 0.8%/yr estimate. 
He also highlighted further sources of uncertainty, like the likelihood that the US would send [anti-long range ballistic missile interceptors](https://carnegieendowment.org/2020/11/19/new-u.s.-missile-defense-test-may-have-increased-risk-of-nuclear-war-pub-83273), which the UK itself doesn't have. He also pointed out that in case of a nuclear bomb dropping in a highly populated city, Putin might choose to give a warning. 
Daniel Filan also independently wrote up his own thoughts on the matter: his more engagingly written reasoning can be found [here](https://docs.google.com/document/d/10UeqFuhrdew21DCENeQfSfZTKvWrKaspoE79w0msPf4/edit) (shared with permission): he arrives at an estimate of ~100 micromorts. We also incorporated his forecasts into our current aggregate.
## Appendix B: Tweaking our forecast
Here are a few models one can play around with by copy-and-pasting them into the [Squiggle alpha](https://playground.squiggle-language.com/dist-builder).
### Simple model
```
russiaNatoNuclearexchangeInNextMonth = 0.00067
londonHitConditional = 0.18
informedActorsNotAbleToEscape = 0.25
proportionWhichDieIfBombDropsInLondon = 0.78
probabilityOfDying = russiaNatoNuclearexchangeInNextMonth * londonHitConditional * informedActorsNotAbleToEscape * proportionWhichDieIfBombDropsInLondon
remainingLifeExpectancyInYears = 40 to 60
daysInYear=365
lostDays=probabilityOfDying*remainingLifeExpectancyInYears*daysInYear
lostHours=lostDays*24
lostHours ## Replace with mean(lostDays) to get an estimate in days instead
```
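For readers who prefer not to use the Squiggle playground, the simple model can be approximated in Python with a small Monte Carlo simulation. This is a sketch under one assumption worth flagging: we take Squiggle's `40 to 60` to mean a lognormal distribution whose 5th and 95th percentiles are 40 and 60, which is our understanding of the syntax. The helper name is ours.

```python
import math
import random


def sample_90ci_lognormal(low, high, rng):
    """Sample a lognormal whose 5th and 95th percentiles are low and high."""
    z95 = 1.6448536269514722  # 95th percentile of the standard normal
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z95)
    return rng.lognormvariate(mu, sigma)


# The four point estimates from the simple model above.
probability_of_dying = 0.00067 * 0.18 * 0.25 * 0.78

rng = random.Random(0)  # fixed seed so reruns give the same numbers
n_samples = 20_000
lost_hours = [
    probability_of_dying * sample_90ci_lognormal(40, 60, rng) * 365 * 24
    for _ in range(n_samples)
]
mean_lost_hours = sum(lost_hours) / n_samples
print(f"{probability_of_dying * 1e6:.1f} micromorts, "
      f"mean lost hours: {mean_lost_hours:.1f}")
```

The product of the four point estimates comes to about 23.5 micromorts, which matches the aggregate of 24 micromorts reported in the overview.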
### Overcomplicated models
These models have the advantage that the number of informed actors not able to escape and the proportion of Londoners who die in the case of a nuclear explosion are modelled by ranges rather than by point estimates. However, the estimates come from individual forecasters, rather than representing an aggregate (we weren't able to elicit ranges when our forecasters were convened).
**Nuño Sempere**
```
firstYearRussianNuclearWeapons = 1953
currentYear = 2022
laplace(firstYear, yearNow) = 1/(yearNow-firstYear+2)
laplacePrediction= (1-(1-laplace(firstYearRussianNuclearWeapons, currentYear))^(1/12))
laplaceMultiplier = 0.5 # Laplace tends to overestimate stuff
russiaNatoNuclearexchangeInNextMonth=laplaceMultiplier*laplacePrediction
londonHitConditional = 0.16 # personally at 0.05, but taking the aggregate here.
informedActorsNotAbleToEscape = 0.2 to 0.8
proportionWhichDieIfBombDropsInLondon = 0.6 to 1
probabilityOfDying = russiaNatoNuclearexchangeInNextMonth*londonHitConditional*informedActorsNotAbleToEscape*proportionWhichDieIfBombDropsInLondon
remainingLifeExpectancyInYears = 40 to 60
daysInYear=365
lostDays=probabilityOfDying*remainingLifeExpectancyInYears*daysInYear
lostHours=lostDays*24
lostHours ## Replace with mean(lostDays) to get an estimate in days instead
```
![](images/5f001f3c45c8a48a083871362bde46eb55862e81.png)
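The Laplace step in this model can be checked by hand. Below is a minimal Python sketch of the same arithmetic: Laplace's rule of succession with no events observed since 1953, converted from an annual to a monthly probability, then halved by the model's multiplier. The function names are ours.

```python
def laplace_annual_probability(first_year, year_now):
    """Laplace's rule of succession with zero events observed so far."""
    return 1 / (year_now - first_year + 2)


def annual_to_monthly(annual_probability):
    """Monthly probability that compounds to the annual one over 12 months."""
    return 1 - (1 - annual_probability) ** (1 / 12)


annual = laplace_annual_probability(1953, 2022)  # 1/71
monthly = annual_to_monthly(annual)
adjusted_monthly = 0.5 * monthly  # the model's laplaceMultiplier
print(f"annual: {annual:.4f}, monthly: {monthly:.5f}, "
      f"adjusted: {adjusted_monthly:.5f}")
```

The adjusted monthly probability comes out near 0.0006, in the same ballpark as the 0.00067 used in the simple model.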
**Eli Lifland**
Note that this model was made very quickly out of interest and I wouldn't be quite ready to endorse it as my actual estimate (my current actual median is 51 micromorts so ~21 lost hours).
```
russiaNatoNuclearexchangeInNextMonth=.0001 to .003
londonHitConditional = .1 to .5
informedActorsNotAbleToEscape = .1 to .6
proportionWhichDieIfBombDropsInLondon = 0.3 to 1
probabilityOfDying = russiaNatoNuclearexchangeInNextMonth*londonHitConditional*informedActorsNotAbleToEscape*proportionWhichDieIfBombDropsInLondon
remainingLifeExpectancyInYears = 40 to 60
daysInYear=365
lostDays=probabilityOfDying*remainingLifeExpectancyInYears*daysInYear
lostHours=lostDays*24
lostHours ## Replace with mean(lostDays) to get an estimate in days instead
```
![](images/71160022319350cdba174346ddbdfae0ac80b88e.png)
## Footnotes
1. **[^](#fnref1tt4cl1ut8si)**
See e.g. [here](https://forum.effectivealtruism.org/posts/2KRqH5wsymqvhGQge/how-are-you-keeping-it-together) and [here](https://forum.effectivealtruism.org/posts/TkLk2xoeE9Hrx5Ziw/nuclear-attack-risk-implications-for-personal-decision)
2. **[^](#fnrefl2youss95ij)**
3.1 (0.0001 to 112.5) including the most extreme to either side.
3. **[^](#fnrefcfroiodaps)**
Excluding those with military bases
4. **[^](#fnrefpt69xmy0uqf)**
This could be adjusted to consider life expectancy and quality of life _conditional_ on nuclear exchange
5. **[^](#fnreftfhokbz7tk)**
who is also a Superforecaster®
6. **[^](#fnrefmz4wxunrpnc)**
Likewise a Superforecaster®
7. **[^](#fnrefazu0qsuph3h)**
By April the 10th at the time of publication
8. **[^](#fnreft1dm5d62pkl)**
See [When pooling forecasts, use the geometric mean of odds](https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds). Since then, the author has proposed a [more complex method](https://forum.effectivealtruism.org/posts/biL94PKfeHmgHY6qe/principled-extremizing-of-aggregated-forecasts) that we haven't yet fully understood, and that is more at risk of overfitting. Some of us also feel that aggregating the deviations from the base rate is more elegant, but that method has likewise not been tested as much.

View File

@ -0,0 +1,191 @@
Valuing research works by eliciting comparisons from EA researchers
==============
**tl;dr:** 6 EA researchers each spent ~1-2 hours estimating the value (relative counterfactual values) of 15 very different research documents. The results varied highly between researchers and within similar comparisons differently posed to the same researchers. This variance suggests that EAs might have relatively undeveloped assessments of the value of different projects.
## Executive Summary
Six EA researchers I hold in high regard—Fin Moorhouse, Gavin Leech, Jaime Sevilla, Linch Zhang, Misha Yagudin, and Ozzie Gooen—each spent 1-2 hours rating the value of different pieces of research. They did this rating using a [utility function extractor](https://utility-function-extractor.quantifieduncertainty.org/research), an app that [presents the user with pairwise comparisons](https://forum.effectivealtruism.org/posts/9hQFfmbEiAoodstDA/simple-comparison-polling-to-create-utility-functions) and aggregates these comparisons to produce a utility function.
This method revealed a wide gap between different researchers' conceptions of research value. Sometimes, their disagreement ranged over several orders of magnitude. Results were also inconsistent at the individual level: a test subject might find A to be x times as valuable as B, and B to be y times as valuable as C, but A to be something very different from x\*y times as valuable as C.
It seems clear that individual estimates, even those of respected researchers, are likely very noisy and often inaccurate. Future research will further investigate ways to better elicit information from these people and recommend best guesses for the all-things-considered answers. It is also likely that researchers spending more time would have produced better estimates, and we could also experiment with this in the future.
My guess is that EA funders also have inconsistent preferences and similarly wide-ranging disagreements. That is one of the reasons I am excited about augmenting or partially automating them.
Current aggregate estimates look as follows:
![](images/c7addcb750e48d25f9c34e2083d96bddffe2e300.png)
## Motivation
EAs make important decisions based on how valuable different projects seem. For example, EAs can distribute funding based on expectations of future value. In fact, I estimate that the group I studied will cumulatively grant several millions of dollars, both in terms of advising various funds and because they are influential in the longtermist funding space.
Estimating the value of past projects seems easier than estimating the value of future projects, but even that is relatively tricky. We at the Quantified Uncertainty Research Institute are interested in helping to encourage more estimation of previous and future projects, and we are trying to find the best ways of doing so.
The most straightforward experiment we could do was survey a few researchers on their relative estimates of the value of different projects. We did this with six researchers.
My original plan was to create a [unit of research value](https://forum.effectivealtruism.org/posts/3hH9NRqzGam65mgPG/five-steps-for-quantifying-speculative-interventions) based on the aggregate estimates of a group of researchers. Initially, I expected the estimates to be consistent and that the aggregate could be a good best-guess at a “ground truth”. We could then build evaluations and shared assessments on top of them. For instance, forecasting systems could estimate how valuable this trusted group would find a new project and fund it according to their estimate.
However, ratings turned out to be very inconsistent, which made me more sceptical that the individual or aggregate opinion could be a good best guess. Instead, I would now prefer to improve elicitation and aggregation methods before building a forecasting system on top.
Further, core decision-makers might be similarly inconsistent and might be making mistakes accordingly. In that case, further work in this area might also be considered promising.
## Methodology
I asked six researchers to use [the application](https://utility-function-extractor.quantifieduncertainty.org/research) described in [Simple comparison polling to create utility functions](https://forum.effectivealtruism.org/posts/9hQFfmbEiAoodstDA/simple-comparison-polling-to-create-utility-functions) to compare 15 pieces of research. These pieces ranged from [a comment](https://forum.effectivealtruism.org/posts/3PjNiLLkCMzAN2BSz/when-setting-up-a-charity-should-you-employ-a-lawyer?commentId=YNKNcp6nKqxqkZgCu) on the EA Forum to Shannon's foundational text, "The Mathematical Theory of Communication".
![](images/09f6245837ac758b2d952e4a0c4a2b9613f43f7b.png)
The app presents the user with pairwise comparisons. Each comparison asks the user how valuable the first element is, compared to the second (e.g., 10 times as valuable, 0.01 times as valuable). The app internally uses [merge sort](https://en.wikipedia.org/wiki/Merge_sort) to ensure that there can be no cyclical comparisons—so that the user cannot express a preference that A > B > C > A. Readers are encouraged to [play around with it](https://utility-function-extractor.quantifieduncertainty.org/research).
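To illustrate why a merge-sort-driven elicitation cannot produce cyclical orderings, here is a small Python sketch. It is not the app's actual code: the function `elicit_ordering`, the callback `ask_ratio`, and the example values are all hypothetical. Each comparison asks for a ratio, the ratio decides which element is merged first, and every element ends up in exactly one position of the final list.

```python
def elicit_ordering(items, ask_ratio):
    """Order items from least to most valuable using merge sort.

    ask_ratio(a, b) should return how many times as valuable a is as b.
    Returns the ordering and the list of (a, b, ratio) comparisons made.
    """
    comparisons = []

    def merge(left, right):
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            ratio = ask_ratio(left[i], right[j])
            comparisons.append((left[i], right[j], ratio))
            if ratio <= 1:  # left element is no more valuable: it goes first
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    def sort(seq):
        if len(seq) <= 1:
            return list(seq)
        mid = len(seq) // 2
        return merge(sort(seq[:mid]), sort(seq[mid:]))

    return sort(list(items)), comparisons


# A simulated, perfectly consistent user with hypothetical "true" values.
true_values = {"A": 1.0, "B": 100.0, "C": 10.0, "D": 0.1}
ordering, comparisons = elicit_ordering(
    true_values, lambda a, b: true_values[a] / true_values[b]
)
print(ordering, len(comparisons))
```

Note that the ordering is cycle-free by construction, but the recorded ratios can still be mutually inconsistent in magnitude when a real user answers, which is what the results below explore.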
## Results
### Visualization of results
For individual researchers, results can be visualized as follows:
![](images/4d8364726d8879bab0ace9ae58db66e3d0125da3.png)
The green lines represent how much more valuable the element to the right is than the element to the left. The table below the graph uses the geometric mean to combine the user's guesses into an average guess. See the appendix for the method behind this.
When combining the results of all the individuals using the geometric mean—see the appendix for the method—we get a table such as the following:
![](images/2b9614c75de9ed5262f8a3c25950917c3b951f4c.png)
The coefficient of variation is the standard deviation divided by the geometric mean. “OOM range” stands for “order of magnitude range”, where an order of magnitude is a difference of 10x. The method to calculate the relative values is in the first appendix. 
To create such a table, we need a reference element, which by construction has a value of “1”. In this experiment, that reference element was [Categorizing Variants of Goodhart's Law](https://arxiv.org/abs/1803.04585). I picked this because it seemed like a very high-quality research output, but not unattainably so, and it was somewhat around the midpoint of value. Further research could determine which element or combination of elements is optimal to pick as a reference point. Further, many reference points could be used, which would have the advantage that, e.g., all guesses would have a coefficient of variation.
### Consistency in ordering
In the app, users stated their value ranges for the differences between elements. In a preliminary analysis, we simplified this data by simply calculating the ordering for each evaluator. The different orderings were as follows:
![](images/b2b65f1e73a16b3e65bd37352b647e7b7df2a672.png)
These are pretty consistent. Some of the most salient differences are:
* Linch Zhang indicated that [What are some low-information priors that you find useful…](https://forum.effectivealtruism.org/posts/SBbwzovWbghLJixPn/what-are-some-low-information-priors-that-you-find), which he wrote, was pretty valuable, whereas other readers did not. Misha Yagudin felt that it was not that valuable at all. 
    * Though note that Zhang is on the lower end of the crowd for his other piece of writing included in the table: [The motivated reasoning critique of effective altruism](https://forum.effectivealtruism.org/posts/pxALB46SEkwNbfiNS/the-motivated-reasoning-critique-of-effective-altruism)
* Fin Moorhouse indicated that the [Database of orgs relevant to longtermist/x-risk work](https://forum.effectivealtruism.org/posts/twMs8xsgwnYvaowWX/database-of-orgs-relevant-to-longtermist-x-risk-work) was more valuable than others thought.
* There were significant disagreements on [Reversals in Psychology](https://www.gleech.org/psych).
### Inconsistency in relative values between users
Yet, what we care about is not relative ordinal position—A is in the first position, but B is in the fifth position. Instead, we care about relative value—A is 10x better than B. The results are as follows:
![](images/2b9614c75de9ed5262f8a3c25950917c3b951f4c.png)
### Inconsistency within the same researcher
Consider Misha Yagudin's results:
![](images/0789cbb6695c45b020ae9090b08fc6ba5d0bb0fe.png)
Zooming in, we see that element #M is 2x as valuable as element #L, #L is 100x as valuable as #K, and #K is 2x as valuable as #J. So overall, #M should be 2\*100\*2 = 400x as valuable as #J. However, Yagudin evaluates it as only 33x as valuable in a face-to-face comparison.
![](images/0789cbb6695c45b020ae9090b08fc6ba5d0bb0fe.png)
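The within-researcher inconsistency above can be quantified by comparing the ratio implied by a chain of comparisons with the ratio given directly. A minimal Python sketch using the numbers quoted for elements #J through #M (the function name is ours):

```python
import math


def implied_ratio(chain):
    """Multiply step-by-step ratios to get the implied end-to-end ratio."""
    return math.prod(chain)


chain_j_to_m = [2, 100, 2]  # #K vs #J, #L vs #K, #M vs #L
direct_m_vs_j = 33          # the direct comparison of #M against #J

implied = implied_ratio(chain_j_to_m)
inconsistency_factor = implied / direct_m_vs_j
print(implied, round(inconsistency_factor, 1))
```

Here the chained estimate exceeds the direct one by a factor of about 12, i.e. roughly one order of magnitude.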
Gavin Leech was generally consistent.
![](images/2bf104abc7763c2667dc0137d667ccca07b45277.png)
This was because he was paying particular attention to producing consistent estimates. On the other hand, the distance between, for example, #H and #K, was 10 when calculated one way, but 10,000\*1\*5=50,000 when calculated another way.
![](images/2bf104abc7763c2667dc0137d667ccca07b45277.png)
It would be interesting to calculate the coefficients of variation for each user in future iterations and see which user is the most inconsistent (or whether they are comparably so) and which item elicits the most inconsistency in the users.
It's also worth noting again that participants didn't end up spending all that much time on this, and it's likely that they would have been able to produce more consistent estimates if they had thought longer about it. 
### Raw data
The data and results are in [this git repository](https://github.com/QURIresearch/utility-function-extractor/tree/8ce1a4a8572ec692bc82d39a1bd983216fb0f136/data/results/research). It includes both visualizations and the raw data needed to generate them. 
## Conclusions
### Judgmental assessment
This exercise can be understood as a measure for _noise reduction_. Each researcher knows something about the value of research, but their assessments are noisy and incomplete. So using different comparisons, first by the same researcher and then aggregating the comparisons of many researchers, reduces noise and gets us closer to the ground truth. 
However, this exercise can also be understood as a “garbage in, garbage out” situation. We might have expected that respected researchers had somewhat consistent intuitions about relative values, both individually and between themselves.
And comparisons were indeed somewhat consistent, but not as much as I hoped. So even though aggregation is, in general, a powerful tool that can make, e.g., forecasters' predictions more accurate, I am less sure about how close this aggregation comes to estimating a “ground truth” in this case.
One particularly worrying difference in opinions is the difference in the range of values. Moorhouse's range is 5.1 orders of magnitude, whereas Leech's is 12.6 (the participants' average is 7.6). 
I am hesitant to extract too many conclusions from this, given that participants only spent roughly an hour on this per person. But at the very least, this work suggests that you and I might hold intuitions that differ much more than we would ever realize without a formal elicitation like this.
### Alternatives and future work
We could perhaps improve this elicitation method by applying lessons from forecasting. We could have a setup similar to the [Delphi method](https://en.wikipedia.org/wiki/Delphi_method) where:
1. The participants give their first estimates.
2. They reflect on their estimates and tweak them until they become more self-consistent.
3. Only then do the participants talk with each other and share and justify their estimates.
4. Finally, we could aggregate all relative value comparisons.
A different approach would be to rely on one central authority to think hard about different projects' value and produce more coherent and legible comparisons.
Both of these methods would require more upfront investment than this current experiment.
It could be that there has just been very little public discussion of this, so there is still low-hanging fruit that could be valuable to the EA community. It is also possible that I am not familiar enough with the relevant elicitation and social choice literature. My impression is that the relevant literature mostly asks for many better/worse binary comparisons, or works with datasets of prices, to run lots of regressions to try to estimate the relevance of different features (thanks to Eva Vivalt for a brief discussion on this). This is less efficient than trying to come up with a utility function from scratch using relative value comparisons, but it does require less sophistication from its users.
Further work and clarification in this area could be highly valuable. We could deploy rudimentary tools like the utility function extractor to sanity-check funding decisions. We could also generally invest in more powerful estimation infrastructure and then apply it at scale to relative value estimations to produce combined guesses which are better than our individual, imperfect guesses.
## Acknowledgements
<p><img src="images/7385a0f4bc3ff0ac194d9b0054b8a3b0fa9cae77.png" alt="QURI logo" class="img-frontpage-center"></p>
This post is a project by the [Quantified Uncertainty Research Institute](https://quantifieduncertainty.org/). It was written by Nuño Sempere. Thanks to Ozzie Gooen and Gavin Leech for comments and suggestions, and to Fin Moorhouse, Gavin Leech, Jaime Sevilla, Linch Zhang, Misha Yagudin and Ozzie Gooen for participation in this experiment, and for permission to share their results. 
## Appendix I: Methods to extract relative values
We have several relative comparisons of the sort “A is 10 times as valuable as B”, and “C is 0.1 times as valuable as D”. The way I am calculating relative values is by choosing a reference value, say A, and then taking the geometric mean of all monotonic (either all increasing or all decreasing) paths from the element of interest to A.
So, for instance, suppose we have element B. The monotonic paths from B to A could be:
* B is 3 times as valuable as X, X is 2 times as valuable as Y, Y is 10 times as valuable as A. This implies that B is 3\*2\*10=60 times as valuable as A.
* B is 5 times as valuable as N, N is 4 times as valuable as A. This implies that B is 5\*4=20 times as valuable as A.
* B is 25 times as valuable as A.
So, in this case, I would take the geometric mean and say that B is (60\*20\*25)^(1/3) ~ 31 times as valuable as A.
The coefficient of variation is calculated as the standard deviation of the list, divided by the geometric mean. For example, the coefficient of variation is the same for (1, 2, 3) as for (0.1, 0.2, 0.3). This makes it possible to compare the variation of elements in very different orders of magnitude.
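The worked example and the scale-invariance claim can be reproduced with Python's standard library. This is a sketch only: the function names are ours, and the use of the population standard deviation is an assumption, since the text does not say which variant is used.

```python
import math
import statistics


def relative_value(path_estimates):
    """Geometric mean of the values implied by each monotonic path."""
    return statistics.geometric_mean(path_estimates)


def coefficient_of_variation(values):
    """Standard deviation divided by the geometric mean.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return statistics.pstdev(values) / statistics.geometric_mean(values)


paths_b_to_a = [3 * 2 * 10, 5 * 4, 25]  # the three paths from B to A above
print(round(relative_value(paths_b_to_a), 1))

cv_small = coefficient_of_variation([0.1, 0.2, 0.3])
cv_large = coefficient_of_variation([1, 2, 3])
print(math.isclose(cv_small, cv_large))
```

Scaling every value by the same constant scales both the standard deviation and the geometric mean by that constant, so their quotient is unchanged.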
## Appendix II: Current and future improvements to the utility function extractor as a result of this experiment
After finishing this experiment, I added several improvements to the utility function extractor that I expect to be useful in future related projects.  
**Distributions over point estimates**
I modified the utility function extractor to allow for uncertain estimates using foretold syntax, e.g., of the sort “1 to 10”, or “normal(5,2)”, or “mm(normal(5,2), 3 to 100)”, rather than point estimates. This is a fairly obvious improvement but would have been harder before [Squiggle](https://github.com/foretold-app/squiggle) was more production-ready. For ease of composability, I send the query to an API endpoint, [https://server.loki.red/squiggle](https://server.loki.red/squiggle).
**Clearer instructions to participants**
Some participants were confused as to what “value” referred to. If I repeat this experiment, I will make sure that users consistently choose one of counterfactual, Shapley, or total values. We could also consider quality, value and counterfactuals separately.
I also added a prompt for participants to write their names to identify their estimates as theirs.
**A more conservative estimation of the number of comparisons**
Initially, I gave the expected number of comparisons needed to build the utility function. But users who went over this mean reported being confused by this.
I then added a reference to the maximum number of steps needed, but this still confused users. Finally, I gave up and made the “expected” value of steps a more conservative number.
**Better visualizations**
I also slightly improved the visualizations.
**Saving database to a server**
Initially, I had problems with the comparisons not being saved to my server. This works now, but I changed my experimental procedure to ensure that I could get the results even if they did not reach the server by providing a way to display them on the web page itself.
**Remaining improvements**
Some remaining improvements might be:
* Allow people to modify their initial guesses. This is manually allowed, but the interface for doing this is by directly editing a JSON object.
* Add a measure of how inconsistent comparisons are within one participant. This can be the same coefficient of variation explained above, but for within-participant comparisons. Later, the reference point could be chosen as the point that minimizes the geometric mean of the coefficients of variation.
* Allow participants to come back another day and do the same exercise and view their comparisons side by side.
* Structure an experiment so that participants can talk to each other and take time for reflection.
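The within-participant inconsistency measure from the list above could be sketched roughly as follows. This is an illustration under my own assumptions, not the extractor's code: comparisons are stored as a hypothetical dictionary mapping `(a, b)` to the stated ratio value(a)/value(b), implied values are computed directly or through one intermediate item, and the coefficient of variation uses the population standard deviation.

```python
import math
from statistics import mean, pstdev


def ratio(comparisons, a, b):
    """Return value(a)/value(b) if the pair was compared in either order, else None."""
    if (a, b) in comparisons:
        return comparisons[(a, b)]
    if (b, a) in comparisons:
        return 1 / comparisons[(b, a)]
    return None


def implied_values(comparisons, items, item, reference):
    """Estimates of value(item)/value(reference): direct, or via one intermediate item."""
    estimates = []
    direct = ratio(comparisons, item, reference)
    if direct is not None:
        estimates.append(direct)
    for mid in items:
        if mid in (item, reference):
            continue
        first = ratio(comparisons, item, mid)
        second = ratio(comparisons, mid, reference)
        if first is not None and second is not None:
            estimates.append(first * second)
    return estimates


def inconsistency(comparisons, items, reference):
    """Geometric mean of the per-item coefficients of variation for one reference."""
    cvs = []
    for item in items:
        if item == reference:
            continue
        estimates = implied_values(comparisons, items, item, reference)
        if len(estimates) >= 2:
            cvs.append(pstdev(estimates) / mean(estimates))
    if not cvs:
        return 0.0
    return math.prod(cvs) ** (1 / len(cvs))


def best_reference(comparisons, items):
    """Reference item that minimizes the geometric mean of the coefficients of variation."""
    return min(items, key=lambda ref: inconsistency(comparisons, items, ref))


items = ["A", "B", "C"]
consistent = {("A", "B"): 2, ("B", "C"): 3, ("A", "C"): 6}
inconsistent = {("A", "B"): 2, ("B", "C"): 3, ("A", "C"): 12}
score = inconsistency(inconsistent, items, "C")
```

A perfectly consistent participant scores zero for every reference point, since every path to the reference gives the same implied value.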

Binary file not shown.

After

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 106 KiB

View File

@ -0,0 +1,74 @@
Forecasting Newsletter: April 2222
==============
## Highlights
* Keine Davon to become German Chancellor despite prediction markets' confidence to the contrary
* Netflix releases Korean soap opera: Forecasting Love And Weather.
* Hague to allow Treaty on Accuracy to stand
## Index
* Highlights
* Prediction Markets & Forecasting Platforms
* In The News
* Long Content
* Hard To Categorize
You can sign up for this newsletter on [substack](https://forecasting.substack.com/), or browse past newsletters [here](https://forum.effectivealtruism.org/s/HXtZvHqsKwtAYP6Y7). If you have a content suggestion or want to reach out, you can leave a comment or find me on [Twitter](https://twitter.com/NunoSempere).
I have received an offer I couldn't refuse from a premier Substack competitor, so this newsletter will be moving to [onlyfans.com/forecasting](https://onlyfans.com/forecasting) starting next month (I had some troubles with verification this month). Although I understand that it might be awkward for some readers, the signup bonus alone made this the utility-maximizing move. I am also excited about incorporating OnlyFans paying functionality to streamline my consulting and allow readers to solicit calibrated forecasts.
## Prediction Markets & Forecasting Platforms
Palantir, a controversial (approval rating: 22%, source: Poll aggregation by FiveFourtyTwo) defence contractor headed by semiquincentennial entrepreneur, past antipope and presidential candidate Peter Thiel, has launched its first assassination market in collaboration with the UN's Security Council. Participants will have the possibility to anonymously bet on the date of the death or disappearance of the elusive globetrotter terrorist and hacker known only as "Morpheus". In an unusually emotional speech, UN Security Council head-honcho Malia Ngo profusely thanked Thiel, saying that it "warms \[her\] heart to see that human innovation can help contain such disruptions to the normal functioning of civilization."
Ought, the machine learning research lab, has been acquired by Metacortex. Metacortex predicts (confidence: 79%, source: Metacortex proprietary systems) that it will be able to successfully tightly integrate Ought's autonomous research, forecasting and decision-making capabilities into its AI-based defence and deterrence products. Metacortex's stock market valuation rose 0.12% on intra-minute trading after the announcement.
As the Argentina-UCS cold war continues, [Mary Ann Island](https://es.wikipedia.org/wiki/Islas_Bridges), a small island previously administered by Argentina, has been invaded by a confederacy of independent traders seeking to exploit ambiguity in some prediction markets' resolution criteria. Some high-volume prediction markets were set up to give advance warning of a possible invasion of any part of Argentina but neglected to specify that the invading party had to be the UCS as an exercise in diplomatic tact. The island itself is unpopulated and known for its large population of rabbits, but otherwise unremarkable.
## In the News
The International Court of Justice in the Hague has allowed the Treaty on Accuracy, and in particular, its harsh punitive measures, to stand. The Commentators, Litterateurs And Pundits Society (CLAPS) had previously argued that not differentiating between an assertion of fact, an unfounded opinion and a calibrated forecast was a permitted exercise of "free speech", whereas Chief Prosecutor Michael Townsend successfully argued before the court that readers have a symmetric right to true facts and that this right justifies restrictions in journalistic freedoms. To comply with the new regulations, this newsletter shall (probability estimation: 95%, source: personal estimate) here onwards incorporate probabilistic estimates of statements with less than 98% probability; a third party service will ensure and incentivize calibration.
Great Britain's GDP is now 2^10 times larger than that of continental Europe. Since it replaced its ceremonial monarchy with a futarchy-based decentralized parliamentary system set to optimize "hedons", Great Britain's economy has been doubling every four months, which stands in sharp contrast to an average doubling time of one year in Honduras, one and a half in the Mars colony, two years in continental Europe, five years in developing nations, or ten in the United Catholic States of America. Nonetheless, the methods of The Great DAO of Great Britain remain controversial (50.1% approval rate among eligible voters.) For example, despite Metacortex's highly accurate simulations conclusively (99.9%+) having shown that acting decisively against rebel Scottish separatists was a necessary move to preserve Great Britain's prosperity, a group of revisionist historians recently argued that obliterating Edinburgh with a kinetic orbital strike was "morally wrong" and a display of "excessive force".
Succession troubles in the Arab Emirates intensify, as prediction markets and calibrated proprietary systems predict that a less charismatic brother would reign more effectively than the current heir apparent. Current reigning monarch Abdulaziz bin Salman still holds the power to appoint his heir, but choosing an in-expectation-worse successor might (probability estimate: 75%, source: personal estimation) lead to a loss of legitimacy and public unrest (e.g., protests), though it would probably not topple the regime (20% that it will, source: personal estimation.)
As foreseen by prediction markets and pundits alike, Keine Davon has been elected leader of the CDU, and is widely expected to become the German Chancellor in the upcoming elections this June (e.g., FiveFourtyTwo currently gives this a 97% probability). I'd personally give it 95%+ probability; however, prediction markets are currently sitting at 85% because of a small minority of ardently delusional deniers who expect the candidacy to be rendered illegal after judicial review.
UN Secretary-General Yan Zhang vows to move prediction markets to at least a 30% implied probability that the Spanish military junta will not be in power by the end of the decade. Prediction markets rose to 35% upon announcement (source: Metacortex), up from an early estimate of 28%. The move is widely considered to be an attempt by Zhang to distract attention away from an embezzlement scandal, in which famine prediction systems were manipulated to show increasing risk in areas that were actually safe, leading to the deployment of additional funds which could then safely be stolen.
<p><img src="images/02034183e0de16f853fc004dab47ef5a01ca677f.png" alt="QURI logo" class="img-frontpage-center"></p>
Netflix releases a new Korean soap opera, [Forecasting Love and Weather](https://en.wikipedia.org/wiki/Forecasting_Love_and_Weather), which tells the gripping tale of how a young man with an affinity and talent for weather forecasting falls in love with an analytical woman of comparable forecasting prowess. "It was as if an occult hand had reached into Korean society and made forecasting cool and mainstream", mentions a spokesbeing for the Korean Forecasting Congregation. It further seems that a lot of [attention to detail](https://www.soompi.com/article/1512045wpp/real-life-meteorological-administration-spokesperson-explains-how-forecasting-love-and-weather-was-made-realistic) went into making the show realistic.
Mars Emperor [Tim Chu](https://www.lesswrong.com/posts/LYXb2fLkGDRXoAx7M/timothy-chu-origins-chapter-1) vows to colonize Andromeda. Prediction markets rose to 99% upon announcement, up from an early estimate of 0.5% (source: Metacortex.)
## Recent blog posts
[Sand Teal Cortex](https://slatestarcodex.com/) investigates the story of the Chinese precogs who are rumoured to have recently been making waves in the prediction and stock markets (quantified in later sentences). In short, in the 2050s, the then-communist Chinese regime started an embryo editing and selection program (99%+; this is well documented) for a variety of traits, e.g., for charisma, military-strategic ability, mathematical talent, etc. Most of these experiments otherwise never went anywhere that we know of (30%, the fact that there isn't public information doesn't update me much either way, and this contradicts [theoretical models](https://www.gwern.net/Embryo-selection#limits-to-iterated-selection-the-paradox-of-polygenicity)). However, after an unknown number of generations, humans optimized for correlates of predictive prowess reportedly displayed truly uncanny predictive ability (70%; reports are unclear, but again theoretical models suggest that gains in the absence of ethical constraints can be massive). After the fall of the Chinese communist regime, these precogs are speculated to have begun to use those abilities for profit (35%; here we enter the realms of speculation). This would, so the theory goes, explain a recent very noticeable upwards blip in the accuracy of various prediction markets.
In particular, since a couple of days ago, global financial markets have begun acting strangely, in a way that suggests that some entity has been exponentially growing the fraction of total market power it controls (40%; I'm deferring to the experts here, but don't have detailed models myself.) Prediction markets on the topic don't have much liquidity yet, but in the meantime, superforecasting systems give \[rest of sentence interdicted on the authority of Guardian Samuel Kuehlruhe\].
_**Trigger warning**: Reading the next paragraph is grossly illegal in the UTS and allied jurisdictions. If you're an emulated being, consult your TOS or face termination at your own risk before proceeding. Honestly, I thought that this was worth reporting on, but at least get a VPN, plz._
Rootclaim has a new feature analyzing the reasons for Peter Thiel's extraordinary longevity. They find that the most likely hypotheses are a combination of cryogenic stasis (75%), speculative medical procedures (85%) (e.g., blood transfusion from younger Thiel clones (45%)), and replacement by clones once the original Thiel becomes too decrepit (35%). One can only hope (20%; informal estimation) that articles such as this will halt, or at least decelerate, the seemingly inevitable rise of the Thielian church.
## Long Content
![](images/57ec1c1ce52c9b097c46dedb0ca9233fda286405.png)
[Robin Hanson To Represent Sweden At 2021 Olympic Games In Tokyo](https://calbears.com/news/2021/5/27/mens-swimming-diving-robin-hanson-heading-to-tokyo). To settle a bet about whether he would have found a career in sports more meaningful than his intellectual career, Robin Hanson has agreed to spin up universe afea6ef9628fcb91771abc9f799cf15. You can bet on the outcome [here](https://polymarket.com/market/will-robin-hanson-find-a-swimming-career-more-meaningful-than-an-intellectual-career). [United Nations Security Council Resolution 26280](https://en.wikipedia.org/wiki/United_Nations_Security_Council_Resolution_26280) requires us to inform you that if there are two or more Robin Hansons in your universe, you might be in a simulation (probability depends on the specific [anthropic question being asked](https://www.lesswrong.com/posts/LARmKTbpAkEYeG43u/anthropics-different-probabilities-different-questions) and on how much credence one lends to the [simulation hypothesis](https://www.simulation-argument.com/).)
T. Greer of [The Scholar's Stage](https://scholars-stage.blogspot.com/) speculates (implied probability estimate: 7%, source: Scholar's Stage) that Russia has systematically been misleading US analysts as to the efficacy of various forecasting methodologies. He proposes this as an explanation as to why superforecasters are better at predicting geopolitical events, but monetary prediction markets are better at everything else. The idea is that the KGB would have carried out their own experiments to determine which forecasting method is more accurate, and then changed its own actions in low-stakes events in the geopolitical arena to make superforecasting appear superior, so that its rivals would have access to worse probability elicitation measures in situations where it truly mattered.
---
This newsletter is generously sponsored by Metacortex and the Cult of Tim Chu. They cover server costs for around twenty subjective hours a month, which is just barely enough to write this newsletter, so I rely on subscribers to exist beyond that. Please become a paying subscriber. Please become a paying subscriber. [Please become a paying subscriber](https://forecasting.substack.com/).
---
> They said that conquering Afghanistan had been tried before, that it was a fool's errand. But if you are a strong enough optimizer, base-rates don't apply. That's why Afghanistan is now a paradise on Earth, and that's how I got a nation-sized impenetrable fortress.
—Peter Thiel

Binary file not shown.

After

Width:  |  Height:  |  Size: 116 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 130 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 58 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 68 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 61 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 82 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 78 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 45 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 118 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 105 KiB

View File

@ -0,0 +1,163 @@
Forecasting Newsletter: March 2022
==============
## Highlights
* [Comparing top forecasters and domain experts](https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts) finds that past studies were mostly not comparing apples to apples and that the assertion that superforecasters were 30% better than intelligence analysts was unjustified.
* [Samotsvety's Nuclear Forecasts](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022) got picked up in the [Spanish press](https://english.elpais.com/science-tech/2022-03-26/is-it-possible-to-predict-the-future-of-the-war-in-ukraine-online-forecasting-communities-think-so.html) and criticized by a [pessimistic nuclear expert](https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/).
* [Forecasting Wiki launched](https://forecasting.wiki/wiki/Main_Page)
* Polymarket is inflating its volume by incentivizing wash trading. (edit: wrong, see [here](https://twitter.com/NunoSempere/status/1511425326701854720))
## Index
* The state of forecasting
* Notable news
* Platform by platform
* Relevant research
You can sign up for this newsletter on [substack](https://forecasting.substack.com/), or browse past newsletters [here](https://forum.effectivealtruism.org/s/HXtZvHqsKwtAYP6Y7).
## The state of forecasting
On account of getting a plug on one of Spain's most-read newspapers, this newsletter has reached 1,000 subscribers:
![](images/4d6eb12702b96fa17bc7715d995c4e9de83eba60.png)
You can find a market on when it will reach 2000 [here](https://manifold.markets/Nu%C3%B1oSempere/when-will-my-newsletter-reach-2000).
So I thought I would summarize the state of forecasting as I see it, striving to be informative to new readers. If you're already familiar with the key points, you might want to skip to the next section.
The main problem is bullshit or lack of epistemic virtue and ability. The US misled itself into thinking that Iraq still had weapons of mass destruction or that [everything would be okay in Afghanistan](https://forecasting.substack.com/p/looking-back-at-2021?s=w) ([a](http://web.archive.org/web/20220304222048/https://forecasting.substack.com/p/looking-back-at-2021?s=w)). People were not expecting covid to last so long. And everyone keeps expecting a better brand of politician to show up.
What is the alternative? The alternative is to develop better models of the world and then use those better models to make better decisions.
But how do we know which models of the world are good? How do we differentiate real understanding from fake understanding? It's tricky, but to a first approximation, we make our hypotheses about the world output predictions, and we [reduce our confidence in the hypotheses that make worse predictions](https://arbital.com/p/bayes_rule/?l=1zq) ([a](http://web.archive.org/web/20220402005820/https://arbital.com/p/bayes_rule/?l=1zq)). The book _Superforecasting_ is a neat introduction to the practices involved. E.T. Jaynes' _Probability Theory: The Logic of Science_ is a hardcore introduction to the math behind it. Both books are probably available for free in the [z library](https://b-ok.org/) ([a](http://web.archive.org/web/20210129172145/https://b-ok.org/)).
![](images/357eb016aaa462558a014aa83debc25d46192d28.png)
A graphical representation of [Bayes' rule](https://en.wikipedia.org/wiki/Bayes%27_theorem), from [Arbital](https://arbital.com/p/bayes_waterfall_diagram/?l=1x1&pathId=84358).
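As a toy illustration of reducing confidence in hypotheses that predict worse, here is a minimal Python sketch of a Bayesian update. The hypothesis names and numbers are made up for the example.

```python
def update(priors, likelihoods):
    """Return posterior probabilities via Bayes' rule.

    priors: dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Two hypothetical models of the world, initially equally credible.
priors = {"model_a": 0.5, "model_b": 0.5}

# model_a gave the observed outcome 80%; model_b gave it only 20%.
likelihoods = {"model_a": 0.8, "model_b": 0.2}

posterior = update(priors, likelihoods)

# Observing a second outcome with the same likelihoods shifts confidence further.
posterior_after_two = update(posterior, likelihoods)
```

The model that assigned the higher probability to what actually happened gains credence at the expense of the other, and repeated observations compound the effect.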
You could keep track of your probabilities in a spreadsheet. But it would also be convenient to collaborate and compete with others. And here come various forecasting platforms, like [Metaculus](https://www.metaculus.com/) ([a](https://web.archive.org/web/20220401114829/https://www.metaculus.com/questions/)), [Manifold Markets](https://manifold.markets/) ([a](https://web.archive.org/web/20220328122934/https://manifold.markets/)), Good Judgment Open, or INFER. These forecasting platforms struggle both to entice forecasters into tracking their probabilities on their sites and to secure funding from decision-makers who want to use probabilities to make better decisions.
Besides forecasting platforms, we also have real-money prediction markets, where participants bet their own money on their degree of belief. These can either be based on cryptocurrencies, like [Polymarket](https://polymarket.com/) ([a](http://web.archive.org/web/20220401003928/https://polymarket.com/)), [Insight prediction](https://insightprediction.com/) ([a](http://web.archive.org/web/20220401195423/https://insightprediction.com/)), [Hedgehog](https://hedgehog.markets/) ([a](http://web.archive.org/web/20220322035748/https://hedgehog.markets/)), or be regulated, like Betfair, Kalshi, Nadex or PredictIt. Historically, prediction markets have focused on sports, but in recent times, they have also hosted more informative markets, e.g., on covid, the invasion of Ukraine, and various US political developments.
To my new Spanish readers, I would recommend that you start forecasting on [Metaculus](https://www.metaculus.com/questions/?show-welcome=true) and only consider trying prediction markets once you've proven to be good on platforms that don't risk real money.
Something that has been on my mind is that forecasting platforms tend to either have institutional partnerships or be nice to use. But generally not both. I think this can be explained by older websites using worse technology but having had more time to develop partnerships:
![](images/64c123da69f1a003597c0ce0f222e7c9989bb602.png)
I generally tend to take a _technology maximalist_ perspective toward that tradeoff in this newsletter. I tend to express the view that platforms with better technology will outcompete the others because they will be able to move and experiment faster, add new features, and retain more users.
Recently, two interesting developments have been affecting the forecasting ecosystem. First, the war between Russia and Ukraine has sparked broader interest in whether forecasting platforms or prediction markets have anything to say about it:
![](images/c2892cda125e9c804203100d2016ab29cf72bde3.png)
Popularity of the search term "Metaculus" in Google trends. h/t Metaculus user [UgandaMaximum](https://www.metaculus.com/accounts/profile/116440/)
And secondly, the [FTX Future Fund](https://ftxfuturefund.org/) ([a](http://web.archive.org/web/20220321183544/https://ftxfuturefund.org/)), a very large philanthropic funder, has expressed interest in forecasting. Platforms and individuals in the space have been scrambling to present proposals that might please it.
And with this, we are left to discuss recent developments:
## Notable news
[Pricing existential risk](https://www.project-syndicate.org/commentary/nuclear-war-existential-risk-in-stock-market-pricing-by-willem-h-buiter-2022-03) (see also: [existential risk](https://wikiless.org/wiki/Global_catastrophic_risk?lang=en) ([a](http://web.archive.org/web/20220404233146/https://wikiless.org/wiki/Global_catastrophic_risk?lang=en))): All investments go to zero in the case of existential risk, so it's hard to price it correctly. In particular, one can't just substitute riskier assets with less risky assets. Still, the higher the existential risk is, the more one should frontload consumption. And if stocks are roughly worth the discounted value of dividends and other payments, higher existential risk should reduce their value. But the market may not have realized this yet. I thought that the article was great, but I would have appreciated a more comprehensive treatment.
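To make the dividend argument concrete, here is a toy Python sketch. The model is my own simplification and is not taken from the article: a constant yearly dividend, a constant discount rate, and a constant yearly probability of a catastrophe after which all payments stop. All numbers are illustrative.

```python
def present_value(dividend, discount_rate, annual_risk):
    """Value of a perpetual dividend stream that ends if a catastrophe occurs.

    Each year's payment is received only if no catastrophe has happened yet:
    PV = sum over t >= 1 of dividend * (1 - risk)^t / (1 + rate)^t
       = dividend * (1 - risk) / (rate + risk)
    """
    return dividend * (1 - annual_risk) / (discount_rate + annual_risk)


def present_value_numeric(dividend, discount_rate, annual_risk, years=5000):
    """Same quantity, computed by summing the series as a sanity check."""
    factor = (1 - annual_risk) / (1 + discount_rate)
    return sum(dividend * factor ** t for t in range(1, years + 1))


no_risk = present_value(1.0, 0.05, 0.0)     # the plain perpetuity, dividend / rate
with_risk = present_value(1.0, 0.05, 0.01)  # lower, since payments may be cut short
```

In this toy model a catastrophe risk acts like an extra discount rate, so even a small yearly risk takes a noticeable bite out of the valuation.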
[The Forecasting Wiki](https://forecasting.wiki/wiki/Main_Page) ([a](http://web.archive.org/web/20220331053135/https://forecasting.wiki/wiki/Main_Page)) is getting started. As advertised on their website, they have a meetup on April 24th, as well as a Discord channel. 
[Global Guessing](https://twitter.com/GlobalGuessing) continues to do a great job following developments in the Ukraine war through shifts in probabilities. For example:
![](images/ec09dfbc3b8498f5f5cea784815e18119327a1c6.jpg)
Global Guessing's tracking of probabilities about the Ukraine conflict.
## Platform by platform
Metaculus [continued publishing questions on the Ukraine conflict](https://manifoldmarkets.substack.com/p/above-the-fold-bountiful-manifold?s=r) ([a](http://web.archive.org/web/20220404233248/https://manifoldmarkets.substack.com/p/above-the-fold-bountiful-manifold?s=r)), [estimated low meat production](https://forum.effectivealtruism.org/posts/2b9HCjTiFnWM8jkRM/forecasts-estimate-limited-cultured-meat-production-through) ([a](http://web.archive.org/web/20220331152933/https://forum.effectivealtruism.org/posts/2b9HCjTiFnWM8jkRM/forecasts-estimate-limited-cultured-meat-production-through)) and organized a [small White Hat cybersecurity tournament](https://www.metaculus.com/tournament/white-hat/) ([a](http://web.archive.org/web/20220316141130/https://www.metaculus.com/tournament/white-hat/)), which got picked up by [Lawfare](https://www.lawfareblog.com/come-compete-white-hat-cyber-forecasting-challenge) ([a](http://web.archive.org/web/20220330064955/https://www.lawfareblog.com/come-compete-white-hat-cyber-forecasting-challenge)).
Per SimonM, the most insightful comments on Metaculus were:
* [orion.tjungarryi](https://www.metaculus.com/questions/9939/kyiv-to-fall-to-russian-forces-by-april-2022/#comment-85701) looks at the relationship between population and how long cities hold out, to figure out whether Kiev would fall. The larger the cities, the longer they tend to hold.
* [haukurth](https://www.metaculus.com/questions/10057/will-russia-control-chernihiv-on-june-1/#comment-87889): "It's a full time job now to constantly degrade Russian chances on various Metaculus questions."
* [aqsalose](https://www.metaculus.com/questions/10246/russian-coup-or-regime-change-by-2024/#comment-86610=) calculates a base rate for regime change in Russia. Based on historical precedent, Putin's grip on power doesn't look too bad in the short term.
* [Joker](https://www.metaculus.com/questions/9939/kyiv-to-fall-to-russian-forces-by-april-2022/#comment-84323) also looks at the base rate of sieges—they last longer than a month. Based on this, he gave a 1% chance of Kiev falling at a time when the Metaculus aggregate was at ~65%.
I also liked Richard Hanania's Metaculus notebook on [Why Forecasting War is Hard](https://www.metaculus.com/notebooks/10226/why-forecasting-war-is-hard/) ([a](http://web.archive.org/web/20220317133356/https://www.metaculus.com/notebooks/10226/why-forecasting-war-is-hard/)).
Good Judgement Inc is hiring a [Director of Sales](https://www.linkedin.com/jobs/view/2972722902/) ([a](https://web.archive.org/web/20220404233402/https://www.linkedin.com/jobs/view/2972722902/)).
Manifold Markets [discusses their market mechanics](https://manifoldmarkets.substack.com/p/above-the-fold-market-mechanics) ([a](http://web.archive.org/web/20220315225021/https://manifoldmarkets.substack.com/p/above-the-fold-market-mechanics)) (technical). Prediction markets need a way to match bets between users. In modern times, they do so by having users bet against a central automated market-maker, but different algorithms determine the specifics. Manifold Markets tells how they started with Dynamic Parimutuel, considered the logarithmic market scoring rule, and ended up with a less elegant constant product market maker.
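For readers who prefer code, here is a rough Python sketch of two of the mechanisms mentioned: a textbook constant product market maker for a binary market, and the price and cost functions of the logarithmic market scoring rule. This is a generic illustration with hypothetical class and function names, not Manifold's actual implementation, and it ignores fees and liquidity provision.

```python
import math


class ConstantProductMarket:
    """Binary market that keeps yes_pool * no_pool constant across trades."""

    def __init__(self, yes_pool, no_pool):
        self.yes = float(yes_pool)
        self.no = float(no_pool)

    def probability(self):
        """Implied probability of YES: the scarcer YES shares are, the higher it is."""
        return self.no / (self.yes + self.no)

    def buy_yes(self, amount):
        """Spend `amount` on YES and return the number of YES shares received."""
        k = self.yes * self.no
        # The payment mints `amount` YES and `amount` NO shares into the pools...
        self.yes += amount
        self.no += amount
        # ...and the buyer withdraws YES shares until the product is back to k.
        new_yes = k / self.no
        shares = self.yes - new_yes
        self.yes = new_yes
        return shares


def lmsr_cost(q_yes, q_no, b):
    """LMSR cost function; a trade costs cost(after) - cost(before)."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))


def lmsr_price_yes(q_yes, q_no, b):
    """Instantaneous LMSR price of YES given outstanding shares and liquidity b."""
    e_yes, e_no = math.exp(q_yes / b), math.exp(q_no / b)
    return e_yes / (e_yes + e_no)


market = ConstantProductMarket(100, 100)
start_probability = market.probability()
shares = market.buy_yes(10)
end_probability = market.probability()
```

In both mechanisms a buyer moves the price against themselves, and larger pools (or a larger `b`) make the price harder to move.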
Manifold also [implemented loans on the first M$20 bet on any market](https://manifoldmarkets.substack.com/p/above-the-fold-borrow-away?s=r) ([a](http://web.archive.org/web/20220405134025/https://manifoldmarkets.substack.com/p/above-the-fold-borrow-away?s=r)), [applied to the FTX Fund](https://manifoldmarkets.substack.com/p/predicting-for-good-charity-prediction?s=r) ([a](http://web.archive.org/web/20220404234202/https://manifoldmarkets.substack.com/p/predicting-for-good-charity-prediction?s=r)), and [awarded some bounties to active community members](https://manifoldmarkets.substack.com/p/above-the-fold-bountiful-manifold?s=r) ([a](http://web.archive.org/web/20220404233248/https://manifoldmarkets.substack.com/p/above-the-fold-bountiful-manifold?s=r)).
INFER is organizing a tournament for [EA university groups](https://www.infer-pub.com/2022-ea-college-forecasting-tournament) ([a](http://web.archive.org/web/20220405134334/https://www.infer-pub.com/2022-ea-college-forecasting-tournament)). I would recommend joining; I enjoyed their team functionality.
[Insight predictions](https://insightprediction.com/markets/206) ([a](http://web.archive.org/web/20220404154746/https://insightprediction.com/markets/206)) continues to have the guts to ask the important questions, such as: "Will Russia Conquer the Donbass by the End of July 2022?". Though liquidity (the opportunity to trade on both sides of a question) is a bit thin.
![](images/eb0db333c24fcfab0eb74133238e1fe5c6a925a3.png)
The ¿founder? of Insight Predictions also [objected](https://forum.effectivealtruism.org/posts/xpkpXq57mXmLbgkSC/forecasting-newsletter-february-2022?commentId=jCn8ri7ux7Q28WmTP) ([a](https://web.archive.org/web/20220320031942/https://forum.effectivealtruism.org/posts/xpkpXq57mXmLbgkSC/forecasting-newsletter-february-2022#comments)) to me characterizing Insight as possibly but most likely not a scam in a previous newsletter. One of the key elements that made me suspicious was that he had previously remained anonymous. But he has now de-anonymized himself, and he turns out to be [Douglas Campbell](https://twitter.com/TradeandMoney), who previously served in Obama's Council of Economic Advisors. So there's that.
[Kalshi](https://kalshi.com/events/FED-22MAY/markets/FED-22MAY-T0.75) ([a](http://web.archive.org/web/20220405134451/https://kalshi.com/events/FED-22MAY/markets/FED-22MAY-T0.75)) and [Polymarket](https://polymarket.com/market/will-the-fed-set-interest-rates-above-1-after-their-scheduled-june-meeting) ([a](http://web.archive.org/web/20220315184508/https://polymarket.com/market/will-the-fed-set-interest-rates-above-1-after-their-scheduled-june-meeting)) offer markets on interest rate hikes by the US Federal Reserve. This seems like an interesting hedge.
Hypermind has a small [$5k tournament on African developments](https://prod.hypermind.com/ngdp/en/welcomeHA.html) ([a](http://web.archive.org/web/20211128100119/https://prod.hypermind.com/ngdp/en/welcomeHA.html)).
![](images/ae398c69f1958af3f7c93b64ab0de9bef84bad65.png)
Polymarket has been offering rewards for trading. Trading incurs a fee, but trading rewards are higher, which incentivizes wash trading (trading back-and-forth at high volumes.) The thing is, Polymarket developers are not stupid, so I'm guessing that they are doing this because they want the volume to be as high as possible ¿possibly to impress or appease investors? The non-nefarious explanation is that they deeply want to attract new traders and keep the engagement of old ones, and are ok paying wash traders as the cost of doing business.
In any case, I have downgraded [my estimates](https://github.com/QURIresearch/metaforecast/commit/067832b72f44420049330cbdc07269605e785160) ([a](https://web.archive.org/web/20220405151057/https://github.com/QURIresearch/metaforecast/commit/067832b72f44420049330cbdc07269605e785160)) of Polymarket prediction quality as a function of volume for Metaforecast. [Metaforecast](https://metaforecast.org/) ([a](http://web.archive.org/web/20220314154524/https://metaforecast.org/)) itself is doing great, with a bit over 15k views a month. I've also recently hired an [extremely competent developer](https://github.com/berekuk) ([a](http://web.archive.org/web/20220124211739/https://github.com/berekuk)) to continue working on the project. So far, he has been leaving the codebase in a much better position, solidifying and professionalizing parts that were previously held together with duct tape. [Feature ideas](https://github.com/QURIresearch/metaforecast/issues) are welcome!
[Spose](https://spose.app/) ([a](http://web.archive.org/web/20220331211539/https://spose.app/)) (pronounced like "I suppose", I'm guessing) is a smallish platform to "casually forecast serious stuff". They ask one very short-term question every day.
## Research
![](images/be76720e2918d0be6803b4839a2dbc1b498389a3.png)
Source: [goodjudgment.com](https://goodjudgment.com/) frontpage.
[Comparing top forecasters and domain experts](https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts) ([a](http://web.archive.org/web/20220315164806/https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts)) reviews the idea that the very best generalist forecasters can beat experts at predicting events _in their own domain of expertise_.
In particular, there is an oft-cited refrain that "superforecasters are 30% better than experts with access to classified information". But the authors find that a large share of the difference may boil down to different aggregation methods: _"The forecaster prediction market performed about as well as the intelligence analyst prediction market; and in general, prediction pools outperform prediction markets in the current market regime (e.g. low subsidies, low volume, perverse incentives, narrow demographics)."_
The CEO of Good Judgment Inc answers [in the comments](https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts?commentId=ohy9WutsY6o8LjGud) ([a](http://web.archive.org/web/20220405134724/https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts?commentId=ohy9WutsY6o8LjGud)): _"These claims about Superforecasting are eye-catching. However, it's difficult to draw any conclusions when most of the research cited doesn't in fact include Superforecasters"_. But this seems inconsistent with the eye-catching 30% claim on Good Judgment's own website.
My forecasting group recently estimated the [risks of nuclear war](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022) ([a](http://web.archive.org/web/20220319084525/https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022)). We arrived at a 24 in a million chance that an "informed and unbiased" Londoner would be hit by a nuclear blast in the next month. This estimate was picked up by [Scott Alexander](https://astralcodexten.substack.com/p/mantic-monday-31422?s=r) ([a](http://web.archive.org/web/20220321223923/https://astralcodexten.substack.com/p/mantic-monday-31422?s=r)) and the [Spanish press](https://english.elpais.com/science-tech/2022-03-26/is-it-possible-to-predict-the-future-of-the-war-in-ukraine-online-forecasting-communities-think-so.html) ([a](http://web.archive.org/web/20220403005302/https://english.elpais.com/science-tech/2022-03-26/is-it-possible-to-predict-the-future-of-the-war-in-ukraine-online-forecasting-communities-think-so.html)).
Now a subject matter expert who served as deputy staff director of the Senate Committee on Foreign Relations where he worked on approval of the New START agreement, [critiziced our estimates](https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/) ([a](http://web.archive.org/web/20220326095536/https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/)). Our answer can be seen in [the comments](https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/?commentId=PRkbcuTRDi6s2seLj) ([a](https://web.archive.org/web/20220405134835/https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/?commentId=PRkbcuTRDi6s2seLj)).
![](images/1fdd63c407a3fd9083bfee556bbed6fc95d13d9c.png)
[Why short-range forecasting can be useful for longtermism](https://forum.effectivealtruism.org/posts/zjMeGcgWpvDcm3CkH/why-short-range-forecasting-can-be-useful-for-longtermism) ([a](http://web.archive.org/web/20220322085048/https://forum.effectivealtruism.org/posts/zjMeGcgWpvDcm3CkH/why-short-range-forecasting-can-be-useful-for-longtermism))
> I argue that advances in short-range forecasting (particularly in quality of predictions, number of hours invested, and the quality and decision-relevance of questions) can be robustly and significantly useful for existential risk reduction, even without directly improving our ability to forecast long-range outcomes, and without large step-change improvements to our current approaches to forecasting itself (as opposed to our pipelines for and ways of organizing forecasting efforts).
>
> To do this, I propose the hypothetical example of a futuristic EA Early Warning Forecasting Center. The main intent is that, in the lead up to or early stages of potential major crises (particularly in bio and AI), EAs can potentially (a) have several weeks of lead time to divert our efforts to respond rapidly to such crises and (b) target those efforts effectively.
In [Cryptoepistemology](https://www.lesswrong.com/posts/sDk3RziupmzShN2RN/cryptoepistemology) ([a](http://web.archive.org/web/20220307222715/https://www.lesswrong.com/posts/sDk3RziupmzShN2RN/cryptoepistemology)), davidad maps different theories of justified beliefs to different styles of cryptographic proof.
![](images/1884d4bff206fcdeca154b51b2fbaa121872ee89.png)
Lastly, I really enjoyed two prediction-market related April Fool's jokes: [Using prediction markets to generate LessWrong posts](https://www.lesswrong.com/posts/stefz96G9ycfMhjD2/using-prediction-markets-to-generate-lesswrong-posts) ([a](http://web.archive.org/web/20220404092319/https://www.lesswrong.com/posts/stefz96G9ycfMhjD2/using-prediction-markets-to-generate-lesswrong-posts)) and [Anti-Corruption Market](https://www.lesswrong.com/posts/px8ha4wSXcmfejEF9/anti-corruption-market) ([a](http://web.archive.org/web/20220404092344/https://www.lesswrong.com/posts/px8ha4wSXcmfejEF9/anti-corruption-market)). I'm also pretty proud of my own April Fool's: [Forecasting Newsletter: April 2222](https://forecasting.substack.com/p/forecasting-newsletter-april-2222?s=w) ([a](https://web.archive.org/web/20220405155605/https://forecasting.substack.com/p/forecasting-newsletter-april-2222?s=w)).
---
Note to the future: All links are added automatically to the Internet Archive, using this [tool](https://github.com/NunoSempere/longNowForMd) ([a](http://web.archive.org/web/20220304021930/https://github.com/NunoSempere/longNowForMd)). "(a)" for archived links was inspired by [Milan Griffes](https://www.flightfromperfection.com/) ([a](http://web.archive.org/web/20220304021952/https://www.flightfromperfection.com/)), [Andrew Zuckerman](https://www.andzuck.com/) ([a](http://web.archive.org/web/20220211080149/https://www.andzuck.com/)), and [Alexey Guzey](https://guzey.com/) ([a](http://web.archive.org/web/20220304022034/https://guzey.com/)).
---
> y en el mundo, en conclusión, 
>
> todos sueñan lo que son, 
>
> aunque ninguno lo entiende.
English translation:
> and in the world, in conclusion, 
>
> they all dream what they are 
>
> although none of them understands it
Fragment of Segismundo's monologue in _La vida es sueño_, by the Spanish playwright Calderón de la Barca.


@ -0,0 +1,32 @@
A quick note on the value of donations
======================================
The value you get from money is higher the less money you have. So if you live on $50k a year, $100 is worth much less than if you earn $500 a year.
If you eyeball a map of GDP per capita, then Europe and the US are living on around $50k a year, Latin America on around $10k a year, and central Africa on around $1k a year:
<iframe src="https://ourworldindata.org/grapher/gdp-per-capita-worldbank" loading="lazy" style="width: 100%; height: 600px; border: 0px none;"></iframe>
But these are averages, so the poorest people in Africa are earning less:
<iframe src="https://ourworldindata.org/grapher/share-of-population-in-extreme-poverty?country=BGD~BOL~MDG~IND~CHN~ETH~COD" loading="lazy" style="width: 100%; height: 600px; border: 0px none;"></iframe>
To a first approximation, the value you get from your money is roughly logarithmic. So the value of $100 for someone earning $50k a year is
Δrich = log(50,000 + 100) - log(50,000) = 0.00199...
whereas the value of $100 for someone earning $500 a year is
Δpoor = log(500 + 100) - log(500) = 0.182...
The units are arbitrary, but the gist is that an additional $100 is worth Δpoor/Δrich = 0.182/0.00199 ≈ 90 times as much to the much poorer person.
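The arithmetic above can be checked with a short Python sketch. It assumes natural logarithms (the text doesn't specify a base, and the ratio comes out the same in any base), and the function name `log_utility_gain` is just an illustrative label, not something from the original note.

```python
import math


def log_utility_gain(income: float, gift: float) -> float:
    """Gain in log-utility from adding `gift` to a yearly `income`.

    Assumes utility is log(income), using the natural logarithm.
    """
    return math.log(income + gift) - math.log(income)


# Figures from the text: $100 given to someone on $50k/year vs. $500/year.
delta_rich = log_utility_gain(50_000, 100)
delta_poor = log_utility_gain(500, 100)
ratio = delta_poor / delta_rich

print(f"delta_rich = {delta_rich:.5f}")
print(f"delta_poor = {delta_poor:.3f}")
print(f"ratio      = {ratio:.0f}")
```

Changing the income or gift values shows how quickly the ratio shrinks as the two incomes get closer together.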
[GiveDirectly](https://www.givedirectly.org/financials/) acts on this: it sends money to some of the people for whom it would make the biggest difference. They spend a bit over 10% on delivery costs, have a good reputation amongst people who care about it, and are recommended by GiveWell, the hardcore charity evaluator:
<div class="flourish-embed flourish-chart" data-src="visualisation/7232617?728238"><script src="https://public.flourish.studio/resources/embed.js"></script></div>
So donations to GiveDirectly are probably a good way to start on the path to making the world a better place, or at least vastly, vastly better than many alternatives. For further resources, see:
- [GiveWell](https://www.givewell.org/)
- [80,000 Hours](https://80000hours.org/)
- [Effectiveness is a Conjunction of Multipliers](https://forum.effectivealtruism.org/posts/GzmJ2uiTx4gYhpcQK/effectiveness-is-a-conjunction-of-multipliers)


@ -2,6 +2,10 @@
### Most recent pieces
- 2022/04/05: [Forecasting Newsletter: March 2022](https://nunosempere.com/blog/2022/04/05/forecasting-newsletter-march-2022/)
- 2022/04/01: [Forecasting Newsletter: April 2222](https://nunosempere.com/blog/2022/04/01/forecasting-newsletter-april-2222/) (April Fools')
- 2022/03/17: [Valuing research works by eliciting comparisons from EA researchers](https://nunosempere.com/blog/2022/03/17/valuing-research-works-by-eliciting-comparisons-from-ea/)
- 2022/03/10: [Samotsvety Nuclear Risk Forecasts — March 2022](https://nunosempere.com/blog/2022/03/10/samotsvety-nuclear-risk-forecasts-march-2022/)
- 2022/03/05: [Forecasting Newsletter: February 2022](https://nunosempere.com/blog/2022/03/05/forecasting-newsletter-february-2022/)
- 2022/02/18: [Five steps for quantifying speculative interventions](https://forum.effectivealtruism.org/posts/3hH9NRqzGam65mgPG/five-steps-for-quantifying-speculative-interventions)
- 2022/02/08: [We are giving $10k as forecasting micro-grants](https://forum.effectivealtruism.org/posts/oqFa8obfyEmvD79Jn/we-are-giving-usd10k-as-forecasting-micro-grants)
@ -12,7 +16,6 @@
- [Simple comparison polling to create utility functions](https://forum.effectivealtruism.org/posts/9hQFfmbEiAoodstDA/simple-comparison-polling-to-create-utility-functions)
- [A Model of Patient Spending and Movement Building](https://forum.effectivealtruism.org/posts/FXPaccMDPaEZNyyre/a-model-of-patient-spending-and-movement-building)
- [An estimate of the value of Metaculus questions](https://forum.effectivealtruism.org/posts/zyfeDfqRyWhamwTiL/an-estimate-of-the-value-of-metaculus-questions)
- [Building Blocks of Utility Maximization](https://forum.effectivealtruism.org/posts/8XWi8FBkCuKfgPLMZ/building-blocks-of-utility-maximization)
### 2022/02/20

Binary file not shown.

After

Width:  |  Height:  |  Size: 162 KiB

BIN
gossip/computer-setup.jpg Executable file → Normal file

Binary file not shown.

Before

Width:  |  Height:  |  Size: 162 KiB

After

Width:  |  Height:  |  Size: 41 KiB


@ -1,3 +1,7 @@
_2022/04/01_: I post a really inside-jokeish [April fools' newsletter](https://nunosempere.com/blog/2022/04/01/forecasting-newsletter-april-2222).
_2022/03/17_: I am working on a doc outlining things I'm confused about with respect to OpenPhil. If this is something that may interest you, feel free to [reach out](mailto:nunosempere@protonmail.com).
_2022/03/08_: Scott Alexander wants your [Google Drive documents](https://forum.effectivealtruism.org/posts/xapRLBTpMYokrpd9q/we-re-announcing-a-usd100-000-blog-prize?commentId=kgjCJNiKh5NEWDLPu)
_2022/03/07_: Misha and Gavin post the results of a new investigation that finds that [Good Judgment claims about superforecasters vs experts were greatly exaggerated](https://forum.effectivealtruism.org/posts/qZqvBLvR5hX9sEkjR/comparing-top-forecasters-and-domain-experts)

BIN
photo.jpg

Binary file not shown.

Before

Width:  |  Height:  |  Size: 253 KiB

After

Width:  |  Height:  |  Size: 112 KiB

0
sitemap.txt Normal file → Executable file