Compare commits

...

6 Commits

21 changed files with 1110 additions and 1 deletions

15
.subscribe/index.html Normal file

@ -0,0 +1,15 @@
<form method="post" action="https://listmonk.nunosempere.com/subscription/form" class="listmonk-form">
<div>
<h3>Subscribe</h3>
<input type="hidden" name="nonce" />
<p><input type="email" name="email" required placeholder="E-mail" class="subscribe-input" /></p>
<p><input type="text" name="name" placeholder="Name (optional)" class="subscribe-input" /></p>
<p>
<input id="8b4b1" type="checkbox" name="l" checked value="8b4b1b32-7df5-4ccb-a99e-60edd4f4b40c" />
<label for="8b4b1" style="font-size: 18px">samotsvety.org</label>
</p>
<p><input type="submit" value="Subscribe" class="subscribe-button"/></p>
</div>
</form>


@ -0,0 +1,470 @@
Prediction Markets in The Corporate Setting
==============
What follows is a report that Misha Yagudin, Nuño Sempere, and Eli Lifland wrote back in October 2021 for [Upstart](<https://wikiless.org/wiki/Upstart_(company)?lang=en>), an AI lending platform that was interested in exploring forecasting methods in general and prediction markets in particular.
We believe that the report is of interest to EA as it relates to the [institutional decision-making](https://forum.effectivealtruism.org/tag/institutional-decision-making) cause area and because it might inform EA organizations about which forecasting methods, if any, to use. In addition, the report covers a large number of connected facts about prediction markets and forecasting systems which might be of interest to people interested in the topic.
Note that since this report was written, Google has started a new [internal prediction market](https://cloud.google.com/blog/topics/solutions-how-tos/design-patterns-in-googles-prediction-market-on-google-cloud). Note also that this report mostly concerns company-internal prediction markets, rather than external prediction markets or forecasting platforms, such as Hypermind or Metaculus. However, one might think that the concerns we raise still apply to these. 
This writeup was originally posted on the [EA Forum](https://forum.effectivealtruism.org/posts/dQhjwHA7LhfE8YpYF/prediction-markets-in-the-corporate-setting), where there was some interesting discussion in the comments.
## Executive Summary
* We reviewed the academic consensus on and corporate track record of prediction markets.
* We are much more sure about the fact that prediction markets fail to gain adoption than about any particular explanation of why this is.
* The academic consensus seems to overstate their benefits and promisingness. Lack of good tech, the difficulty of writing good and informative questions, and social disruptiveness are likely to be among the reasons contributing to their failure.
* We don't recommend adopting company-internal prediction markets for these reasons. We see room for exceptions: using them in limited contexts or delegating external macroeconomic questions to them.
* We survey some alternatives to prediction markets. Generally, we find that these alternatives offer a better balance of pros and cons.
## Introduction
This section:
* Defines prediction markets
* Outlines their value proposition
### What are prediction markets
Prediction markets are markets in which contracts are traded that have some value if an event happens, and no value if an event doesn't happen. For example, a share of "Democrat" in a prediction market on the winner of the 2024 US presidential election will pay $1 if the winner of the 2024 election is a Democrat, and $0 if the winner is not.
Prices in prediction markets can be interpreted as probabilities. For example, the expected value of a "Democrat" contract in the previous market is $\$1 \cdot p + \$0 \cdot (1 - p)$, where $p$ is the chance that a Democrat will win. To the extent that the market is efficient, one expects the expected value of a contract to be equal to its current value. So if one observes a contract price of \$0.54, one can deduce the expected probability by setting $\$0.54 = \$1 \cdot p + \$0 \cdot (1 - p)$, and thus $p = 0.54 = 54\%$. It is also in this sense that one says that "the market as a whole expects" Democrats to win with 54% probability.
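As a minimal sketch of the price-to-probability reading above, the following Python snippet inverts the expected-value relation for a binary contract. The function names are ours and purely illustrative, and the calculation assumes an efficient market with no fees, spreads, or time discounting.

```python
def expected_value(p: float, payout_yes: float = 1.0, payout_no: float = 0.0) -> float:
    """Expected value of a binary contract if the event has probability p."""
    return payout_yes * p + payout_no * (1.0 - p)


def implied_probability(price: float, payout_yes: float = 1.0, payout_no: float = 0.0) -> float:
    """Probability at which the contract's expected value equals its price."""
    if payout_yes == payout_no:
        raise ValueError("payouts must differ to identify a probability")
    return (price - payout_no) / (payout_yes - payout_no)


if __name__ == "__main__":
    # The $0.54 "Democrat" contract from the text (pays $1 or $0).
    p = implied_probability(0.54)
    print(f"implied probability: {p:.0%}")
```

With the default payouts of $1 and $0 the implied probability is simply the price; the general form only matters for contracts with other payoffs.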
One might expect markets _not_ to be efficient, for instance after remembering Keynes' adage that "markets can remain irrational longer than you can remain solvent." And we do see inefficiencies in modern prediction markets, sometimes glaring. However, note that, unlike the stock market, prediction markets have hard deadlines after which the market comes in contact with reality and gets resolved.
Besides binary prediction markets, there are also markets with multiple options (e.g., "Who will win the 2024 election?", with multiple contracts, only one of which will pay out in the end) and markets that pay out proportionally to some yet unknown number (e.g., "How many Senate seats will Republicans control after the 2022 elections?", which pays out proportionally to the number of seats).
Prediction markets thrive in some niches:
* Bookmakers are offering odds and taking bets for notable sports events.
* Markets like PredictIt and BetFair attract quite a bit of money for major political events \[cf. predict-it-growth\].
* Some financial markets can be thought of as prediction markets.
\[cf. predict-it-growth\]: US Presidential elections volumes on Betfair [have been growing](https://www.lesswrong.com/posts/4XXnMXfTrYXqpugwB/growth-of-prediction-markets-over-time?commentId=t2jmTnhMB9x2GzhrP) at an implied ~35%/year.
### Value proposition
The core value proposition of prediction markets is that they may produce accurate, calibrated and useful probabilities. They create an incentive for participants to seek information that would give them an edge. And when those participants trade on a prediction market they reveal some information about their degree of conviction. Finally, prediction markets provide an aggregate of the differing perspectives, namely the current price. At their best, the market mechanism aggregates more information than what could fit in the working memory of any one individual.
Ideally, prediction markets would elicit knowledge that wouldn't have otherwise been shared—e.g., the knowledge that a given deadline is unrealistic—and that knowledge would be used to make better decisions. This is the core pathway to impact, and if prediction markets don't end up changing decisions, their impact will normally be negligible, no matter how accurate they otherwise are.
In addition to producing accurate forecasts which drive better decisions, prediction markets could also be used to manage risk. For instance, a company fearing rising taxes or an increased regulatory burden might make a bet to hedge against that risk, i.e., make a bet on the side of the undesirable event so that if it happens, the company mitigates part of its downside. As of today, few prediction markets would have high enough volume and liquidity to allow for meaningful hedging, with the possible exception of Nadex and FTX, the latter of which has recently seen behaviour consistent with a large actor hedging against the chance of the Tokyo Olympics being cancelled. Note that this point, while perhaps of interest to Upstart more generally, doesn't feature in the rest of the report, which focuses on internal prediction markets.
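To illustrate the hedging idea, here is a small sketch under stated assumptions: the company loses a fixed amount if the undesirable event happens, and it can buy contracts that pay $1 each in that case. The function name and all figures are hypothetical, and the sketch ignores fees, liquidity limits, and counterparty risk.

```python
def hedged_outcomes(loss_if_event: float, contracts: float, price: float,
                    payout: float = 1.0) -> tuple[float, float]:
    """Return (net result if the event happens, net result if it does not).

    The company buys `contracts` contracts at `price` each; every contract
    pays `payout` if the undesirable event occurs and nothing otherwise.
    """
    cost = contracts * price
    if_event = -loss_if_event + contracts * payout - cost
    if_no_event = -cost
    return if_event, if_no_event


if __name__ == "__main__":
    # Hypothetical numbers: a $1M loss if the event happens, contracts at $0.30.
    for n in (0, 400_000, 1_000_000):
        bad, good = hedged_outcomes(1_000_000, n, 0.30)
        print(f"{n:>9,} contracts: event {bad:>12,.0f}  no event {good:>10,.0f}")
```

Buying as many $1 contracts as the potential loss equalizes the two outcomes: the company pays the premium in either case and no longer bears the event risk. This is only possible if the market is liquid enough to absorb the position, which is the constraint noted above.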
Besides better decisions and risk hedging, one can also speculate that prediction markets may have a range of social benefits. For example, Robin Hanson speculates that prediction markets, if widely used, might create a visible expert or social consensus with clear incentives for honest contribution for arbitrary questions. For example, input from prediction markets can help people operate under shared and trustworthy assumptions like "we will not return to the office in the next half a year." Widespread prediction markets might also decrease the influence of especially persuasive individuals and groups: it would be harder for them to manufacture consensus just through good oratory without having a good track record or putting money on the line.
Further, Robin Hanson developed a proposal for governance called Futarchy \[cf. hanson-2013\]. In a futarchy, conditional prediction markets are used for estimating welfare (e.g. GDP or market capitalization) conditional on taking decisions under consideration, then the decision leading to the highest welfare is chosen. \[cf. crypto-futarchy\]
But overall, better information on its own has very little value if it doesn't eventually end up changing any decisions. One can thus calculate the value of information as the difference in value between the decisions before and after gaining that information. In particular, the value of information may be lower than the cost to acquire it, and this could be a reason for the lack of adoption of prediction markets.
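A minimal sketch of this calculation follows. It computes the value of *perfect* information for a toy decision, which is an upper bound on what a real (imperfect) prediction market could provide. The actions, states, payoffs, and probabilities are hypothetical and chosen only for illustration.

```python
def expected_payoff(action_payoffs: dict, probs: dict) -> float:
    """Expected payoff of one action, given state probabilities."""
    return sum(probs[state] * action_payoffs[state] for state in probs)


def value_of_perfect_information(payoffs: dict, probs: dict) -> float:
    """Expected gain from learning the true state before choosing an action."""
    # Without information: commit to the single best action in expectation.
    best_without = max(expected_payoff(payoffs[a], probs) for a in payoffs)
    # With perfect information: pick the best action separately in each state.
    best_with = sum(probs[s] * max(payoffs[a][s] for a in payoffs) for s in probs)
    return best_with - best_without


if __name__ == "__main__":
    # Hypothetical project decision; payoffs in arbitrary units.
    payoffs = {
        "keep_plan": {"on_time": 100.0, "late": -50.0},
        "add_staff": {"on_time": 70.0, "late": 40.0},
    }
    probs = {"on_time": 0.6, "late": 0.4}
    voi = value_of_perfect_information(payoffs, probs)
    cost_of_market = 25.0  # hypothetical cost of running the market
    print(f"value of information: {voi:.1f}, worth it: {voi > cost_of_market}")
```

If one action is best in every state, the value of information is zero, which is the case described above in which better forecasts do not change any decision.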
\[cf. hanson-2013\]: Hanson, Robin. “Shall We Vote on Values, But Bet on Beliefs?” _Journal of Political Philosophy_ 21, no. 2 (June 2013): 151–178. [https://doi.org/10.1111/jopp.12008](https://doi.org/10.1111/jopp.12008).
\[cf. crypto-futarchy\]: In 2016, one of the most ambitious futarchy-based projects, the [DAO](https://en.wikipedia.org/wiki/The_DAO_(organization)), was launched and failed. But more crypto organizations have tried out futarchy-inspired governance mechanisms [before and since](https://en.wikipedia.org/wiki/Decentralized_autonomous_organization#List_of_DAOs). It seems plausible to us that in the coming years a larger number of organizations might try similar models.
## Track record
This section:
* Gives examples of high-profile companies using prediction markets
* Outlines the academic consensus on prediction markets, and the limitations of this consensus
### High-profile companies that have used prediction markets
In our literature review, we have found that a large number of high-profile for-profit organizations have used prediction markets. These include Arcelor Mittal, Best Buy, Boeing, CNBC, Chevron, Chrysler, Deutsche Bank, Electronic Arts, Ford, General Electric, Goldman Sachs, Google, Hewlett Packard, Intel, J&J, Koch Industries, Lockheed Martin, MITRE Corp, Microsoft, Motorola, Nokia, PayPal, Procter and Gamble, Qualcomm, Siemens, Yahoo and Yandex, among others.
Sources for some of the above are given in the following table. The rest are taken from Table 1 of [Cowgill and Zitzewitz, 2015](https://academic.oup.com/restud/article-abstract/82/4/1309/2607345).
| Company | Source | Notes |
|------------------------------|----------------------------------|-------------------------------------------------------------------------------------------------------|
| Eli Lilly | The End Of Management, The Times | Large American pharmaceutical corporation |
| Ford & others | Cowgill and Zitzewitz, 2015 | |
| Goldman Sachs, Deutsche Bank | Wolfers and Zitzewitz, 2004 | High volume markets, website only remains on the Internet Archive |
| Google | Cowgill, Wolfers, et al., 2009 | |
| Hewlett Packard | Chen and Plott, 2002 | Markets were thinly traded, but still performed better than HP's own predictions |
| Koch Industries | Cowgill and Zitzewitz, 2013 | Early draft of the 2015 paper mentions Koch Industries, though this was removed in the final version. |
| Microsoft | Prediction Markets at Microsoft | Very clear pdf; very much worth downloading and reading. |
| Nokia | Hankins and Lee, 2011 | |
| Siemens | Ortner, 1998 | Prediction markets predicted deadlines better than other processes. |
| Yahoo | Bloomberg | |
| Yandex | Interview with Yandex employee. | |
### Academic consensus
In practice, prediction markets have done well at predicting elections, sports outcomes, or at anticipating future data releases such as COVID infection numbers. Multiple academic studies—as well as analogies with the more liquid stock markets—give reason to believe that prediction markets should produce accurate probabilities. To sample some representative quotes from the academic literature:
> Ortner (1998) described an experiment at Siemens in which an internal market predicted that the firm would definitely fail to deliver on a software project on time, even when traditional planning tools suggested that the deadline could be met.
>
> An internal market at Hewlett-Packard produced more accurate forecasts of printer sales than the firm's internal processes (Chen and Plott, 2002).
>
> In each case, the firms ran real money exchanges, with only a relatively small trading population (20–60 people), and subsidized participation in the market, by either endowing traders with a portfolio or matching initial deposits. The predictive performance of even these very thin markets was quite striking. (Wolfers and Zitzewitz, 2004)
>
> Despite theoretically adverse conditions, we find these markets are relatively efficient, and improve upon the forecasts of experts at all three firms by as much as a 25% reduction in mean squared error (Cowgill and Zitzewitz, 2015)
Already in 2008, a group of prestigious economists, including four Nobel Prize recipients, published _The Promise of Prediction Markets_ \[cf. arrow-2008\], which outlined their common optimism about prediction markets as a tool for producing forecasts with lower prediction error than conventional forecasting methods, and urged the Commodity Futures Trading Commission as well as US state and federal legislatures to establish safe-harbour rules to encourage the use and research of prediction markets.
In addition, there are elegant theoretical reasons to think that prediction markets may perform optimally. For instance, per Beygelzimer et al. (2012) \[cf. beygelzimer-2012\], if bettors are following a Kelly betting strategy, the market learns at the optimal rate, i.e., the market price reacts as if updating according to Bayes' Law.
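The Kelly-bettor claim can be illustrated with a toy model. This is our own minimal sketch of the standard setup, not code from the paper: it assumes each bettor holds a fixed belief, stakes the Kelly fraction of their wealth, and that the market clears at a single price. Under those assumptions the clearing price is the wealth-weighted average belief, and after resolution each bettor's wealth is rescaled by how much probability they assigned to what actually happened, which has the same form as a Bayesian update over bettors.

```python
def market_price(wealth: list[float], beliefs: list[float]) -> float:
    """Clearing price with Kelly bettors: the wealth-weighted average belief."""
    return sum(w * b for w, b in zip(wealth, beliefs)) / sum(wealth)


def settle(wealth: list[float], beliefs: list[float], outcome: bool) -> list[float]:
    """Wealth of each bettor after the market resolves.

    A Kelly bettor with belief b facing price q ends up with w * b / q if the
    event happens and w * (1 - b) / (1 - q) if it does not.
    Assumes 0 < q < 1, i.e. bettors do not all agree on certainty.
    """
    q = market_price(wealth, beliefs)
    if outcome:
        return [w * b / q for w, b in zip(wealth, beliefs)]
    return [w * (1 - b) / (1 - q) for w, b in zip(wealth, beliefs)]


if __name__ == "__main__":
    wealth = [1.0, 1.0, 1.0]     # hypothetical starting wealth
    beliefs = [0.2, 0.5, 0.8]    # hypothetical fixed beliefs
    for round_number in range(3):  # the event happens three times in a row
        print(f"round {round_number}: price = {market_price(wealth, beliefs):.3f}")
        wealth = settle(wealth, beliefs, outcome=True)
```

Total wealth is conserved in each round, and wealth shifts toward the bettors whose beliefs matched the outcomes, so the price drifts toward their beliefs over repeated rounds.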
\[cf. arrow-2008\]: Arrow, Kenneth J., Robert Forsythe, Michael Gorham, Robert Hahn, Robin Hanson, John O. Ledyard, Saul Levmore, et al. “The Promise of Prediction Markets.” _Science_ 320, no. 5878 (May 16, 2008): 877. [https://doi.org/10.1126/science.1157679](https://doi.org/10.1126/science.1157679).
\[cf. beygelzimer-2012\]: Beygelzimer, Alina, John Langford, and David Pennock. “Learning Performance of Prediction Markets with Kelly Bettors.” ArXiv:1201.6655 \[Cs, q-Fin\], January 31, 2012. [http://arxiv.org/abs/1201.6655](http://arxiv.org/abs/1201.6655).
### What is left unsaid in the academic literature
However, in most cases, prediction markets are only used for a short time, and early adopters chose not to incorporate markets into their business after initial trials. For instance, Hewlett Packard's prediction market was in collaboration with CalTech academics, and once the collaboration ended, it seems that so did the use of prediction markets within HP. Similarly, we know that the use of prediction markets at Google ceased after their main advocates left the organization.
In contrast with the literature, which is generally very optimistic, an internal Microsoft document offers a more candid view of the advantages and disadvantages of prediction markets. Further, it lists observed pros and cons of using prediction markets in various contexts: project schedules, external reviews, sales, etc. It can be found [here](https://users.wfu.edu/strumpks/PMConf_2007/HenryBerg(PredictionPoint%20KC%20071101).pdf) ([archive link](https://web.archive.org/web/2020*/https://users.wfu.edu/strumpks/PMConf_2007/HenryBerg(PredictionPoint%20KC%20071101).pdf)). Although outdated and terse, it also has healthier epistemic incentives. We have also gathered excerpts related to prediction markets from interviews by Tyler Cowen of industry and academic experts at \[private\]. These are also worth reading.
To express this point explicitly, the academic literature on prediction markets gives a distorted perspective on their efficacy \[cf. page-2020\] and suitability for companies. This is probably because of perverse academic incentives which cause academics to overstate the impact and promisingness of their research. In particular, the academic literature emphasizes how prediction markets are interesting and promising, but places less emphasis on how they are underdeveloped as a technology. Academics who study prediction markets for a time and then move on to other, more fruitful areas also do not generally write up the reasons for their change in focus; academic bloggers like Hanson are an exception.
Further, we would like to note that while prediction markets performed better than sales and scheduling techniques used 20 years ago, they will face much stronger competition now because forecasting tools have improved a lot (e.g., through data science methods, but also through tools like Fibonacci point estimation \[cf. fibonacci\]). We dive into specialized SaaS systems later in the report.
Proponents like Robin Hanson argue that prediction markets are not popular because they are too politically disruptive: they expose hypocrisy, of which there is a lot, and they remove excuses. They can be thought of as a very direct, socially awkward person who doesn't navigate interpersonal relationships well and misses social conventions around excuses, failure, or the need to convince stakeholders. This is unpleasant for people interacting with them. While this person might be right, they are bad at coalition politics (and hence are ostracized) and/or they are harmful because their interactions, while candid, are not productive in a more socially competent world.
To give a recent example, forecasters at platforms like [pandemic.metaculus.com](https://pandemic.metaculus.com) were communicating fairly accurate predictions about the coronavirus pandemic, while the US CDC was, to our taste, fairly misleading. This can be partially explained by the different roles forecasters and bureaucrats play. Forecasters are just seeking truth, whereas bureaucrats are coordinating a large number of people who may have very different beliefs and decisions to make. Just being candid by saying that "we are unsure but this and that seems likely" works for the Metaculus crowd advising friends and relatives. But it might not work for a government agency communicating to millions of indifferent and millions of anxious people looking for guidance or needing some coaxing to do anything at all. Of course, less charitable interpretations of the CDC's behavior are also possible.
Another guess as to why prediction markets fail to gain adoption is that it's hard to incorporate them into the making of decisions. Partly, social forces outlined in previous paragraphs are to blame. But also, it's just hard to operationalize questions such that they are useful for making decisions. Even if one did, it might be difficult to create a forecasting system around those questions which has low enough overhead for them to be worth it. This suggests trying to use some of prediction markets' advantages while minimizing overhead. In a later section, we outline how this might be possible.
However, there are many other hypotheses as to why prediction markets fail to be useful or to get adopted; some are outlined in a later section. Some of these hypotheses may apply to Upstart; they seem difficult or expensive to rigorously falsify, and would likely require high overhead and coordination costs to anticipate, diagnose and overcome.
The rest of this report
* provides details about how prediction markets ought to be structured in order to increase their usefulness,
* presents hypotheses about the failure of prediction markets in the corporate setting,
* outlines alternative information aggregation mechanisms,
* and concludes with recommendations.
\[cf. page-2020\]: Cf. [this review](https://twitter.com/page_eco/status/1308049931278512128) of experimental markets which [finds](https://twitter.com/page_eco/status/1308054640911556609) that markets are much better at incorporating public information than private information. And one of the selling points of the internal prediction market is to elicit information from everyone across the organization.
\[cf. fibonacci\]: cf. [Fibonacci Agile Estimation](https://www.productplan.com/glossary/fibonacci-agile-estimation/)
## Requirements and challenges for a well-functioning prediction market
We believe the most important challenges are:
1. Maintaining a prediction market given current technologies
* Our research finds that current technologies add substantial overhead to maintaining prediction markets.
2. Writing questions that are both informative for decision-making and attractive to traders
* In particular, questions must a) target areas where employees have useful knowledge to share, b) concern topics where management has significant uncertainties about how to proceed, and c) be suitable for clear operationalization.
* There are many failure modes questions can fall into, as detailed below.
3. Attracting enough predictors to get accurate forecasts while also preventing the market from taking up too much of the employees' time and attention.
4. Managing the social effects and potential backlash of prediction markets.
However, we are fairly uncertain because it is difficult to observe the reasons for the lack of adoption by corporations. In any case, we note that the practical evidence for the usefulness of prediction markets is not so overwhelming that they have penetrated industry. In particular, if prediction markets were known to be highly useful, it would seem difficult to keep this knowledge limited to any one company, given imperfect employee retention.
### Categorization scheme
The requirements and challenges for well-functioning prediction markets can be broken down into the following categories:
1. The markets must have a low enough cost to create and maintain.
2. The markets must provide more value to decision-makers than the cost to create them and to subsidize predictions on them.
3. The markets must be attractive enough to traders to elicit accurate predictions.
4. The markets must not have large negative side-effects, such as costs to the company's dynamics and morale.
What follows is a summary of the most important challenges, followed by details on each requirement and challenges to achieving it. We quote in full several requirements and challenges from Zvi Mowshowitz's [Prediction Markets: When Do They Work?](https://thezvi.wordpress.com/2018/07/26/prediction-markets-when-do-they-work/) and Tyler Cowen's [Why don't more businesses use prediction markets?](https://marginalrevolution.com/marginalrevolution/2006/03/why_dont_busine.html).
### The market must have a low enough cost to create and maintain
The market must be low enough cost to create and maintain. As documented above, the creation step has happened at many companies so the main plausible bottleneck seems to be the maintenance step. Challenges for maintaining prediction markets include:
1. Chicken-and-egg problems of an immature technology:
* On the engineering side: Prediction markets and forecasting platforms are not mature as a technology. They are not streamlined, they are clunky \[cf. clunky-ux\] to use, and there are not many people familiar with the practice. This is unlike, e.g., Scrum or other development frameworks. In particular, it's hard work to incorporate prediction markets into decision-making or into the "rhythm-of-the-business", and this hard work hasn't been done yet. If "prediction markets as a service" was an established product, they might be worth buying, but they aren't.
* On the social side: Practically nobody is experienced at judgmental forecasting or at prediction markets. On the one hand, this makes it harder and more expensive to acquire the related skills \[cf. superforecaster-selection\]. But on the other hand, introducing a new metric to rank current managers or to decide whom to promote to managerial positions might be perceived as disruptive, unfair, and might otherwise be subject to the costs of innovation.
* On the rationality-waterline side: On well-functioning financial and prediction markets, participants self-select from the general population by being good at beating the market, and come from a huge pool of candidates. Corporate prediction markets don't have the luxury of a large pool to select from, and a lack of necessary skills, such as probabilistic thinking, shrinks an already thin market.
2. Invisible improvements: Prediction markets might lead to a significant improvement in decision-making, say, to decisions that are 20% better. But that improvement might be too small to be perceivable, so the methodology gets abandoned after an initial experimental period. Peter Thiel's _Zero to One_ suggests that startups need a 10x improvement in their product for that difference to be so noticeable as to produce a "wow" factor which is conducive to a monopoly. Prediction markets as they are right now don't seem to reach that threshold. Note also that in the case of a startup offering prediction markets as a service, it might only capture a fraction of the improvements in decision quality that its clients can detect, so it might never reach profitability \[cf. personal-communication\].
\[cf. clunky-ux\]: It seems plausible that the lack of software with a great UX (user experience) might be one particularly significant bottleneck. For example, two of the authors couldn't stick with forecasting on Metaculus because of the messy UX around forecasting distributions and keeping track of forecasts. Current software (both prediction markets and forecasting platforms) isn't shiny and pleasant to use, nor is it very collaborative or socially rewarding. Because competent people have many demands on their time, not finding platforms attractive could plausibly prevent them from engaging.
\[cf. superforecaster-selection\]: In contrast, superforecasters are selected in two stages: they predict on at least 100 questions on GJ Open and pass a 3-month long trial period. And from [research into superforecasting](https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/#easy-footnote-10-1260) we know that talent-spotting is responsible for the most notable jump of accuracy.
One can also speculate that part of Robinhood's success in introducing millions to day trading can be attributed to its superior UX. 
\[cf. personal-communication\]: In personal communications, a celebrated data scientist consultant/executive said that introducing "big data" to already competitive traditional companies leads to limited gains (~1-2%).
### The questions provided by the market must provide more value to decision-makers than the cost to create and predict on them
For a market to be useful, good predictions must provide substantial value to decision-makers. The time to create and predict on questions is non-negligible and the benefits need to be large enough to outweigh the time costs \[cf. caveat-misha\]. Additionally, perhaps the bottleneck for improving the company isn't necessarily better decisions after all.
To give a few examples:
* Sales are important, but forecasting sales might not be as important. In particular, forecasting sales might not meaningfully help increase them. Salespeople already have well-developed incentive structures to encourage them to sell more. Sales forecasts might be very important for just-in-time manufacturing, but this is only crucial for a few industries.
* Forecasting project scheduling is important. But it might not restore confidence in likely-to-be-missed deadlines, and measures taken to catch up to deadlines would or would not be implemented regardless of the forecast.
The following are some specific hypotheses as to why it may be hard to create questions that provide substantial value to decision-makers:
1. No matter what they pretend, businesses are not much interested in forecasting many future variables. Successful businesses find product markets they can control for a long time. They do a few things well, and let a surprisingly large number of tasks slide. \[cf. company-model\]
2. For many (but not all) variables of interest, we already have implicit betting markets in the form of resource prices. These may be explicit, like aluminium prices, or slightly fuzzy, like the market salary for software engineers. Firms can look at those prices in outside markets for the information they need, rather than to potential internal prediction markets.
* Note that this applies to some questions of interest, but not to others. For instance, most of the information about whether "we should move to Columbus/Texas/.../outside the Bay area" might be contained in the market salary and property prices at those locations, which are external markets. But for markets that more intimately depend on the firm's specific details—e.g, whether a project will be done on time, or how much of a product will be sold in a given timeframe—an external market might not be desirable or feasible. In this latter case, one might then use an internal information aggregation method, or use a SaaS to generate an estimate.
3. Large corporate companies are far more constrained than most outsiders imagine. Interest groups must be courted, coordinated, and sometimes fought every step of the way. When it comes to choice, there are fewer degrees of freedom than one might think. The real question is not what to do, but rather having the will and effectiveness to do it. Prediction markets don't help much in this regard.
4. There is a challenge about which markets to choose. Markets about schedules might lead to self-fulfilling prophecies or undermine management. Markets about intimately decision-relevant factors might not see much participation, or participants might not have as much information. Markets about external events are likely to attract more interest, but also be less useful.
5. The time taken for specialized employees to familiarize themselves with prediction markets is also costly. For example, if 20 employees who are paid $50 to $500 per hour spend one to four hours on a given prediction market, costs can quickly balloon. Also note that most people are unfamiliar with probabilities, so prediction market benefits may not accrue until participants become more familiar with probabilistic thinking. Note that under this objection, prediction markets could still add new information, just not enough to justify their cost vis-à-vis a product manager.
* Note that we need to compare the costs of prediction markets with the costs of alternatives, some of which may be fairly costly as well. The default direct substitute for eliciting information and building consensus is meetings. It is well-known that they can be very wasteful (as they have quadratic costs in the number of participants). We are unsure about how much prediction markets will help with reducing the costs of meetings. It's plausible that some disagreements which arise during meetings could be resolved faster by deferring to whatever prediction markets say. But at least initially, prediction markets and regular procedures will co-exist, so initial costs might be pretty high.
6. Prediction markets and similar forecasting tournaments can have an addictive quality that could further hurt productivity. The addictive qualities of gambling are well-documented and prediction markets share some of these addictive features. Eli has experience with being sucked into spending more time predicting on forecasting platforms than seems best, due to trying to increase his score. A similar thing may happen with trying to increase one's profits in the market. Even if employees aren't spending much raw time on prediction markets, they may use up too large a portion of their attention \[cf. attention\].
7. The regulatory landscape is hostile towards prediction markets, and this increases their costs for risk-averse actors. In particular, real-money prediction markets have very attractive incentive structures, but fall prey to onerous regulations pertaining to gambling or the trading of securities.
* Even if internal prediction markets are not outlawed right now, they might become outlawed in the future. Delegating a fraction of one's decision-making capacity to a procedure that might get outlawed might be unnecessarily risky. It also doesn't feel outside the realm of possibility that a ruthless competitor will lobby legislators to be hard on gambling, or frame prediction markets in a negative light to the public.
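To make the cost range in point 5 concrete, here is a back-of-the-envelope sketch that uses only the figures quoted there (20 employees, $50 to $500 per hour, one to four hours per market). The function names are illustrative, and counting pairs of participants is just one common way to express the "quadratic" cost of meetings; that framing is an assumption rather than something the sources state.

```python
def market_wage_cost(n_employees, hourly_wage, hours_per_market):
    """Wage cost of running one prediction market, ignoring all other overhead."""
    return n_employees * hourly_wage * hours_per_market


def pairwise_channels(n_participants):
    """Number of distinct pairs in a meeting: n * (n - 1) / 2, i.e. quadratic in n."""
    return n_participants * (n_participants - 1) // 2


low = market_wage_cost(20, 50, 1)     # cheapest case from the text
high = market_wage_cost(20, 500, 4)   # most expensive case from the text
print(f"Wage cost per market: ${low:,} to ${high:,}")
print(f"Pairs in a 20-person meeting: {pairwise_channels(20)}")
```

Under these assumptions a single market costs somewhere between $1,000 and $40,000 in wages, which is why the number of markets and the seniority of participants matter so much.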
\[cf. company-model\]: Perhaps when companies are small, there isn't much need for prediction markets as they add overhead and there isn't much need for more information aggregation. When a company is large, it has often already found a niche and should mostly keep doing the same thing. It may be tricky to find the middle ground where prediction markets can provide real value.
That said, it's hard to know whether such a model is true in practice, and e.g., perhaps small companies could use prediction markets or forecasting tournaments to better make early strategic decisions.
\[cf. caveat-misha\]: Though note that at Google, employees participated in a number of prediction markets as their 20% projects. This might have been fairly low cost if one thinks that the time would otherwise have been wasted. On the other hand, we know the number of traders, and a [quick guesstimate](https://www.getguesstimate.com/models/19293) arrives at an implied cost of wages of between $50k and $15M, with a point estimate of $1.5M.
\[cf. attention\]: For more on the importance of managing attention, see this [blog post by a CTO on managing attention](https://www.benkuhn.net/attention/) or this [book on maintaining enough focused work](https://www.goodreads.com/book/show/25744928-deep-work). Further, prediction markets are uniquely demanding: their nature requires constant vigilance as profits and losses are immediate. In contrast, on other forecasting platforms, one can update predictions semi-regularly without noticeable loss in performance.
### The market must be attractive enough to traders to elicit accurate predictions
A potential failure mode of prediction markets is not generating enough interest from traders to get accurate predictions. In particular, some requirements to be attractive to traders—from [an experienced trader](https://thezvi.wordpress.com/)—are:
**1\. Questions must be well-defined**
* Resolving questions when unforeseen edge cases arise will lead to needless controversy and will put participants off from participating in future markets.
* It's very unpleasant for traders not to get paid when they think they ought to per their reasonable interpretation of resolution criteria.
* A bit more abstractly, the risk of not getting paid because of a misinterpretation or ambiguity in the rules decreases traders' expected reward and therefore discourages participation.
Here is an example of how it can be unexpectedly hard to write a well-defined question for a seemingly straightforward matter: "Who will control US Senate after the elections?". "Control" is a colloquial term so some kind of operationalization is needed. "Would Democrats have >50% of seats in the US Senate?" sounds simple enough. However, it suffers from a few problems, namely:
* In the case of a 50/50 split, the VP breaks ties, so one might want to redefine control as ">50% seats or 50% seats + the VP"
* How should one count independent politicians? Today Wikipedia states that Democrats are in majority despite having only 48 senators. This is because two independent senators, Angus King of Maine and Bernie Sanders of Vermont, caucus with the Democrats.
* How should one count "rebellious" senators? In particular, Joe Manchin of West Virginia can't or doesn't want to vote on measures that would jeopardize the Democratic Party's standing in West Virginia, which is fairly conservative, and Kyrsten Sinema of Arizona likewise feels attached to bipartisanship. As a result, the Democratic Party cannot implement some of its priorities.
So a seemingly straightforward question faces many possible technicalities. This could be circumvented by taking a reasonable yet ultimately arbitrary stance on confusing details, e.g., by defining "control" as ">50% of seats". But then one might find oneself talking about and trying to forecast a strange slice of reality.
In any case, unforeseen corner cases lead to heated online discussions with no clear right way to resolve a question. And such a scenario is not particularly unlikely \[cf. metaculus-2021\] unless one drafts questions very carefully.
\[cf. metaculus-2021\]: And corner cases are not particularly unlikely, see [this question](https://www.metaculus.com/questions/7240/what-is-a-counterparty-risk-of-polymarket/) about Polymarket hitting an edge case. As mentioned above, corner cases are bad as they make traders less certain about their interpretation of resolution and hence push for higher edges for trading.
**2\. Questions resolve soon** \[cf. hedgehog-markets\]
* When traders make bets, they lock in funds that could have been used for other purposes (e.g., for other bets, for investing in an index fund, for getting crypto yields, etc.).
* If markets are sufficiently deep and liquid, this issue can be mitigated, because participants can exit their positions after profiting from discovering new information earlier than others.
\[cf. hedgehog-markets\]: Recent developments in crypto prediction markets make this point somewhat moot, because the funds at stake can be invested to generate yield using a crypto contract, and then given out to the winner. However, this mechanism seems far away from being incorporated into prediction market platforms usable by corporate entities.
**3\. Questions are likely to be resolved**
* Some markets resolve ambiguously or don't resolve at all in certain circumstances. For example,
* if markets are conditional on another event and the conditional doesn't happen
* e.g. [_Conditional on President Trump being convicted of "incitement of insurrection," what will the Senate's average Bipartisan Index score be from 2021-2022?_](https://www.cset-foretell.com/questions/108); or
* if resolution criteria are no longer evaluable, e.g., a resolution referencing a rating that stops existing or changes dramatically
* e.g. [_What will U.S. holiday season retail sales be for 2020 relative to the 2019 holiday season?_](https://www.gjopen.com/questions/1672-what-will-u-s-holiday-season-retail-sales-be-for-2020-relative-to-the-2019-holiday-season) was voided because the data provider changed methodology and mentioned that methodology may be adjusted again.
* Such questions lock in money and offer a much reduced expected return, and hence disincentivize participation. More precisely, participants win (or lose) nothing if the question is voided. If this happens with non-negligible probability, expected returns are reduced proportionally to that chance.
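As a worked illustration of the last point, with made-up numbers rather than data from any platform: if a trader expects a return `r` on a question that resolves, and the question is voided with probability `v` (stake refunded, nothing won or lost), the expected return falls to `(1 - v) * r`, while the capital stays locked for just as long.

```python
def expected_return(return_if_resolved, p_void):
    """Expected return when a voided question refunds the stake (zero gain or loss)."""
    return (1 - p_void) * return_if_resolved + p_void * 0.0


# Hypothetical numbers: a 10% expected edge and a 25% chance that the question is voided.
edge = 0.10
p_void = 0.25
print("Expected return if the question always resolves:", edge)
print("Expected return allowing for voiding:", expected_return(edge, p_void))
```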
**4\. Sources of disagreements and profits (also known as "suckers at the table")**
* Prediction markets require sources of disagreements. These can either be:
* Direct subsidies
* VC money willing to subsidize a market to capture market share, like on Polymarket.
* An external actor willing to lose money to elicit more accurate probabilities.
* Traders not looking to make money, but who still seek to extract value
* Actors hedging against risk, like airlines or countries hedging against movement in oil prices.
* Actors who want to manipulate prices, like parties who want to inflate their perceived chances of winning.
* Unsophisticated traders
* Inexperienced or naïve traders,
* Gamblers who bet based on emotional reasons (e.g., on their favourite team, or on principle),
* Traders who think they know something, but are mistaken.
**5\. Limited hidden information**
Limited hidden information, or lack of insider trading, is also a factor that influences whether traders are attracted to questions. However, we will delegate discussion to a footnote. \[cf. insider-trading\]
\[cf. insider-trading\]: Zvi considers:
Limited Hidden Information
* Insider trading does add more information to the market.
* However, insider trading—or the suspicion that it could happen—drives other traders away
* "The first season of Survivor, there was a market on who would win. The production crew found out. Then there was no market."
* If some group of actors can control the outcome, it might also not be worth it to bet against them.
* Access to information might be seen as unfair or, in more extreme cases, as favouritism. (We elaborate on that in the section about social costs.)
This last point interacts strangely with well-subsidized markets in the corporate setting because:
* All employees are "inside traders", and this is well-known and expected from the start
* It is not a problem if only insider traders participate, since they would be doing so against the liquidity pool set-up by the employer. And the employer would get the information regardless, which fulfills the purpose of the prediction market.
* Insider-trading can be noticed, and other participants could update on the moves of others. However, note that this leads to an arms-race dynamic in terms of sneakiness and counter-sneakiness.
A 1998 article by Bainbridge, [Insider Trading: An Overview](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=132529), provides a list of 261 papers that have discussed insider trading, and then succinctly summarizes the arguments for and against allowing insider trading as follows:
* In favour of deregulating insider trading: Insider trading causes the market price of the affected security to move toward the price that the security would command if the inside information were publicly available. If so, both society and the firm benefit through increased price accuracy. Secondly, insider trading might be an efficient way of compensating entrepreneurially inclined managers for having produced information and value. If so, the firm benefits directly (and society indirectly) because managers have a greater incentive to produce additional value to the firm.
* In favour of regulating insider trading: Traditionally the argument was based on fairness issues, which had little traction in the law and economics communities. Instead, the economic argument in favour of mandatory insider trading prohibitions has typically rested on features of the economics of property rights in information: a ban would reduce adverse selection costs and perverse incentives, increase liquidity, improve confidence in the market, reduce interference in corporate plans, and motivate large shareholders to monitor management instead of seeking to profit from inside information.
But even if a market attracts as many dedicated traders as one can inside a company, this can still not be enough to produce accurate probabilities:
1. Most employees may have no rational basis on which to bet. If someone knows the truth, but is otherwise locked out from credibly signalling that knowledge to management, something is wrong with the organization of the company \[cf. anonymous\]. The small prizes from corporate prediction markets won't be enough to elicit that knowledge from them in any case.
2. Too small a scale: Prediction markets at companies are too small. For example, maybe they are not liquid enough, and thus inefficient, or they have such a small number of questions that scoring is not really possible. In particular, we might expect overconfident forecasters to make some gains at the beginning, and for the market to not be able to realize this until overconfident predictors have gone bust for the first time. This might result in the initial estimates from the prediction markets being biased by overconfident players.
\[cf. anonymous\]: In that case, an anonymous market might indeed reveal said information because betting on an anonymous market is low threat vs. openly speaking up. On the other hand, making markets credibly anonymous would add additional overhead.
### The market must not have too large negative side-effects, such as costs to the company's dynamics and morale
Prediction markets, being an infrequently used tool, may also have unintentional negative side effects:
1. As mentioned in the above section, prediction markets may be generally too politically disruptive, lacking the social intelligence to present their conclusions in the right way. To add a few specific examples:
* It interferes with coalition politics. To quote Robin Hanson: "You have a couple of job candidates and you want to hire the best one for the company supposedly right? Well, I think actually when a person volunteers to be in charge of a hiring committee they don't intend to pick the best person for the company. They intend to pick the best person for their coalition in the company. Forcing these metrics of who is best for the company would interfere with their plan to pick someone decent for the company but even better for their coalition. \[...\] You would be uncomfortable setting up this process that didn't give you the flexibility to pretend to do A while really doing B."
* There is also general cultural pressure. Per [The Elephant in the Brain](https://en.wikipedia.org/wiki/The_Elephant_in_the_Brain) our behaviours and beliefs are optimized for living in a social group and very often we are self-deceptive and unaware of our motives. Prediction markets might reveal these hypocrisies: the disagreement between the likely outcome and our stated goals. This has the potential to be very unpleasant and awkward for everyone involved.
* For example, sometimes companies hire consultants to deal with workplace problems. These consultants sometimes serve more like a face-saving device than an actual solution (which might be fine). Having a prediction market around their effect will be awkward. \[cf. consultants\]
2. Prediction markets threaten the hierarchical control of top managers. It would become too obvious that most managers are less capable of planning/predicting the future than their confident behaviour suggests.
* And this behaviour might not be irrational, bad or harmful (if not revealed) as leadership is not about making accurate quantitative forecasts.
3. Prediction markets make a big chunk of the bettors into "losers." Yet within a company, morale is very important. Businesses proceed by soliciting feedback, and by reshaping their plans to pretend that everyone is on board and has an ego stake in the final outcome. Prediction markets make this coordination more difficult. Once people make bets, they start rooting for their bet to win and for the other bet to lose. They move away from maximizing the value of the firm and develop an oppositional mentality vis-a-vis other employees. Furthermore, it is disruptive to have a running tally on who are the winners and losers each day.
4. When reward systems are created, employees view them as a means to distribute further privileges to insiders and favourites. Prediction markets would be viewed the same way. Who else is going to win all those bets? Do corporations really need more insider favouritism?
* For example, companies might have both quantitatively oriented employees and non-quantitatively oriented ones, and adding prediction markets might be perceived or construed as a way to dole out rewards to the former.
\[cf. consultants\]: See also: [Too Much Consulting?](https://www.overcomingbias.com/2012/01/why-so-much-consulting.html)
## Other Information Aggregation Mechanisms
### External platforms
Forecasting tournaments such as Metaculus, forecasting services such as Good Judgment or Maby forecasting, or public prediction markets such as Polymarket seem like a superior alternative to internal prediction markets for questions that don't require large amounts of context.
Metaculus is cheaper; a tournament or many questions costs on the order of $1000 to $5000, and may provide access to the inner reasoning of forecasters if incentives are structured correctly. However, quality and engagement may depend on the ability to present interesting questions to forecasters. Metaculus can produce somewhat sensible long-term forecasts.
Polymarket is much more expensive, on the order of $500 to $3000 in "sacrificed" liquidity needed to entice trades per question. So far, it is also limited to short-term questions. However, the concept of winning money attracts very sharp sharks to even otherwise extremely dry questions. They are only just experimenting with including external questions.
Hypermind is a French forecasting platform/prediction market whose chief advantage is that its legal status is clear. Kalshi is based in the US, and its legal status is also clear, but it is newer and thus might see lower volume. FTX might be particularly suitable for hedging for relatively large amounts, but it's unclear how one would go about coordinating this.
Good Judgment offers a degree of legibility, forecaster professionalism and experience which is not really available anywhere else. In particular, variance in forecast quality is likely to be lower—Good Judgment forecasts are likely to be of consistently high quality. However, it is much more expensive and inflexible. They may also be slightly overconfident. Forecasts can be confidential.
Other forecasting consultancies such as Maby forecasting, Azul Foresight or our own Samotsvety Forecasting are likely to be cheaper, but also more capacity constrained and less formidable than Good Judgment (but still fairly formidable). For instance, the two Maby forecasters were top Good Judgment superforecasters, but are still likely to be somewhat less effective in the absence of a broader forecasting team. They are also likely to have their own strengths; I view Azul Foresight as being strong on red-team aspects of forecasting.
### Specialized machine learning/data-analysis systems
Unlike prediction markets, startups that aim to predict a very narrow slice of reality have seen some success and high valuations. By restricting predictions to only one specific metric, current ML/data analysis pipelines can be thrown at that metric and the marginal cost of an additional prediction becomes very low. This is in contrast to prediction markets run by humans, in which the marginal cost of new predictions generally remains high.
Some instances of this kind of service in the management space:
* Evidence Based Scheduling: [FogBugz](https://fogbugz.com/evidence-based-scheduling/), [Liquid Planner](https://www.liquidplanner.com/). See also: [Evidence Based Scheduling](https://www.joelonsoftware.com/2007/10/26/evidence-based-scheduling/) from Joel on Software.
* Employee retention/turnover: [Peakon](https://peakon.com/solutions/experience-and-retention/) specializes in this. [This guide](https://bonus.ly/employee-retention-guide/employee-retention-tools) to employee retention tools is written by a company that offers an adjacent service.
* Sales/revenue forecasting. Searching for those keywords in Google uncovers plenty of alternatives. As a highlight, [People.ai](https://people.ai/) recently raised $100M on a $1.1 billion valuation, and was acquired by SalesForce. [HSBC](https://www.business.hsbc.uk/en-gb/tomorrow-ready-programme/optimise-your-cash-flow-and-finance) also offers cash flow forecasting.
These are generally SaaS startups. Once they get started, they can improve their predictive power by learning from their different clients, and give each client better predictions than if each only had access to their own data. On the negative side, they often come bundled with other products, like general HR software, and they might make clients dependent on the startup.
On the more elaborate end, the [Makridakis 5](https://en.wikipedia.org/wiki/Makridakis_Competitions#Fifth_competition,_started_on_March_3,_2020,_ended_on_July_1,_2020) competition can also be viewed as a [sanity check](https://www.sciencedirect.com/science/article/abs/pii/S0169207021001023?via%3Dihub) on Walmart's own sales forecasting methodology, and as a more expensive and elaborate forecasting product.
### Internal forecasting competitions
Forecasting competitions are the main alternative to prediction markets. Unlike prediction markets, they don't have to be zero-sum competitions, but on the other hand, incentives are more loosely aligned. Nonetheless, most of the analysis above about the lack of adoption inside companies still applies.
Right now, the platforms which can support ongoing forecasting would be Cultivate Labs, Foretold, and maybe private instances of Metaculus or of Maby's platform. If money is not much of a concern, Cultivate Labs would be convenient, and does have the capacity to provide "forecasting competitions as a service". Otherwise, depending on in-house forecasting capabilities, creating one's own forecasting platform might be possible. Based on our experience with Foretold, I'd recommend against it because it is easy to underestimate the amount of hassle.
Note that—unlike prediction markets—[forecasting platforms](https://arxiv.org/abs/2106.11248) are very difficult to incentivize correctly. In particular, rewarding only the top forecasters incentivizes high-variance strategies. And although the hope for forecasting competitions might be that one also gets access to the comments, this is normally strictly disincentivized by scoring rules.
Further, if one tries to have a proper scoring rule \[cf. proper-scoring\] for a forecasting competition, this in our experience ends up producing dynamics that are not too different from prediction markets. One can also take care to design cooperative scoring rules, but these are generally underdeveloped and there isn't much literature studying their effects. There is also a tension between incentivizing effort and incentivizing collaboration.
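A toy sketch may help connect the two points above. It uses the Brier score as an example of a proper scoring rule (the choice of rule, the 70% probability, and the two forecasters are illustrative assumptions, not taken from the cited paper). It checks that honest reporting minimizes the expected score, and then shows that if only the top scorer on a single question is rewarded, a forecaster who rounds the same belief up to certainty usually wins.

```python
def brier(report, outcome):
    """Brier score for a binary question; lower is better."""
    return (report - outcome) ** 2


def expected_brier(report, true_p):
    """Expected Brier score of `report` when the event occurs with probability `true_p`."""
    return true_p * brier(report, 1) + (1 - true_p) * brier(report, 0)


TRUE_P = 0.7    # hypothetical true probability of the event
HONEST = 0.7    # forecaster who reports their true belief
EXTREME = 1.0   # forecaster who rounds the same belief up to certainty

# 1. Properness: among a grid of possible reports, the honest one minimizes expected Brier.
grid = [i / 100 for i in range(101)]
best_report = min(grid, key=lambda r: expected_brier(r, TRUE_P))

# 2. Winner-take-all: probability that the extreme report gets the strictly better score.
p_extreme_wins = sum(
    prob
    for outcome, prob in ((1, TRUE_P), (0, 1 - TRUE_P))
    if brier(EXTREME, outcome) < brier(HONEST, outcome)
)

print("Report minimizing expected Brier:", best_report)
print("Expected Brier (honest, extreme):",
      expected_brier(HONEST, TRUE_P), expected_brier(EXTREME, TRUE_P))
print("P(extreme forecaster beats honest one):", p_extreme_wins)
```

In this single-question toy, the honest forecaster has the better expected score, yet the overconfident one takes the prize whenever the event happens. This is only an illustration of the incentive problem, not a model of any real tournament.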
Lastly, consider that forecasting platforms like Metaculus or Good Judgment Open are paying significantly below market rate, or getting essentially free labour. In the case of Metaculus, this is done by providing a product in some ways better than PredictionBook for keeping track of one's predictions, by incentivizing users with reputational benefits, and by appealing to users' altruistic motives. In the case of Good Judgment Open, users are enticed by the chance to attain the title of "superforecaster". It's unclear whether forecasting competitions within a company would be able to replicate a similarly strong appeal without recourse to monetary incentives.
**Judgemental forecasting might be better than prediction markets in some cases**
In _Superforecasting_, Tetlock writes:
> The results were clear-cut each year. Teams of ordinary forecasters beat the wisdom of the crowd by about 10%. Prediction markets beat ordinary teams by about 20%. And superteams beat prediction markets by 15% to 30%. I can already hear the protests from my colleagues in finance that the only reason the superteams beat the prediction markets was that our markets lacked liquidity… they may be right. It is a testable idea, and one worth testing.
We guess that this might be due to prediction markets having much higher overhead than judgemental forecasting. On prediction markets, winning/losing is instantaneous, so markets give a premium on constant vigilance. Vigilance is stressful and takes a lot of attention, which is especially detrimental to the productivity of cognitive workers.
A forecaster, on the other hand, can update his forecast every other week, and doesn't lose many points due to not responding to breaking news fast enough. Or, if a fraction of forecasters catches any particular new development on time, this can be incorporated into the aggregate earlier by weighing more recent forecasts more.
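One simple way to weigh recent forecasts more is exponential decay by forecast age. The sketch below is an assumption about how such an aggregate could be built (the half-life parameter and the example numbers are hypothetical), not a description of how any particular platform does it.

```python
def recency_weighted_mean(forecasts, half_life_days=7.0):
    """Aggregate (age_in_days, probability) pairs; a forecast's weight halves every `half_life_days`."""
    weights = [0.5 ** (age / half_life_days) for age, _ in forecasts]
    total = sum(weights)
    return sum(w * p for w, (_, p) in zip(weights, forecasts)) / total


# Hypothetical forecasts: three two-week-old ones at 40%, one fresh one at 80% after breaking news.
forecasts = [(14, 0.40), (14, 0.40), (14, 0.40), (0, 0.80)]
simple_mean = sum(p for _, p in forecasts) / len(forecasts)

print("Unweighted mean:", simple_mean)
print("Recency-weighted mean:", recency_weighted_mean(forecasts))
```

With these numbers the weighted aggregate moves most of the way towards the fresh forecast, even though only one of four forecasters has reacted to the news.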
\[cf. proper-scoring\]: I.e., one in which inputting one's true belief is incentivized.
### Delphi Method
The Delphi Method is a well-known method for collaborative forecasting. Its chief characteristics are:
* Participants are anonymous to each other
* There are a series of rounds in which:
* Participants write down their quantified perspectives and/or the reasons for these.
* Participants read others' perspectives and/or a statistical summary thereof.
* Participants then make their all-things-considered forecast.
This is thought to help counteract various biases of normal face-to-face meetings—e.g., more extroverted or self-confident participants dominating the conversation. For a brief introduction, see pages 7 to 13 of "Analysis of the Future: The Delphi Method", an early paper by a RAND corporation researcher on the topic.
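The "statistical summary" step can be as simple as reporting the median and quartiles of the anonymous estimates back to the group. The following sketch assumes that choice of summary and uses hypothetical numbers; the function name is illustrative.

```python
import statistics


def summarize_round(estimates):
    """Anonymous statistical summary shown to participants between rounds."""
    q1, median, q3 = statistics.quantiles(estimates, n=4)
    return {"q1": q1, "median": median, "q3": q3}


# Hypothetical probabilities (in %) from five anonymous participants.
round_1 = [20, 35, 40, 60, 90]
print("Round 1 summary:", summarize_round(round_1))

# After reading the summary and each other's reasons, participants revise their estimates.
round_2 = [30, 35, 40, 50, 60]
print("Round 2 summary:", summarize_round(round_2))
print("Final group forecast (median):", statistics.median(round_2))
```

The median is used here because it is insensitive to a single extreme participant; other summaries would work equally well in principle.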
### Automatic Prediction Markets, Pseudo Prediction Markets
An automatic prediction market is one in which participants merely enter their honest probabilities, and their bets are calculated based on those probabilities. This allows one to track the accuracy of the different participants, by looking at their increasing or decreasing budgets. It might also avoid the zero-sum strategic trading aspect of current prediction markets if all participants bet against the house (more technically, if each participant bets against an automated market-maker whose liquidity is given by the house).
For instance, in current prediction markets, a common trading pattern is to flip shares close to $0, e.g., to buy at $0.02 and sell at $0.04, even if one ultimately strongly suspects that the underlying event will not happen. This might be profitable, but may not contribute to increased market accuracy, and, crucially, would not happen in an automatic prediction market design.
The Kelly criterion can be used as the rule for automatically determining betting amounts. Per [Learning Performance of Prediction Markets with Kelly Bettors](https://arxiv.org/abs/1201.6655), the market price in a market with participants willing to bet according to the Kelly criterion corresponds to the average of participants' private forecasts, weighted by their budget.
As some participants do better, their opinion is counted more within the aggregate. This automatic prediction market design could be combined with the Delphi method to produce a setup like the following:
* Participants reveal their initial predictions (and these are recorded)
* Participants outline the reasons for their predictions, with each participant being allocated an equal amount of time.
* Participants update their predictions (and these are recorded)
* Perhaps secretly: A prediction market is simulated using the methods from [Learning Performance of Prediction Markets with Kelly Bettors](https://arxiv.org/pdf/1201.6655.pdf). This produces a principled probability aggregate that takes into account the past accuracy of participants.
For a slower version more faithful to the original Delphi method, participants can anonymously write down both their predictions and their reasoning, and then anonymously update after reading everyone else's reasoning. One would then run two prediction markets for all questions, the one before reading others' probabilities and reasoning, and the one after.
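A minimal sketch of the simulated market described above, under the simplifying assumption that every participant bets the full Kelly amount on a binary question at the clearing price. In that setting the price is the budget-weighted average of stated probabilities, and each budget is rescaled by the ratio of the participant's probability to the market price for the outcome that occurred. The function names and numbers are hypothetical, and this is one reading of the cited paper's simplest case rather than a reference implementation.

```python
def market_price(budgets, beliefs):
    """Clearing price with Kelly bettors: the budget-weighted average of stated probabilities."""
    return sum(b * p for b, p in zip(budgets, beliefs)) / sum(budgets)


def settle(budgets, beliefs, outcome):
    """Budgets after resolution, assuming everyone bet the Kelly amount at the market price.

    Assumes the price is strictly between 0 and 1. A participant who stated 0 or 1
    and was wrong ends up with a budget of zero.
    """
    price = market_price(budgets, beliefs)
    if outcome:
        return [b * p / price for b, p in zip(budgets, beliefs)]
    return [b * (1 - p) / (1 - price) for b, p in zip(budgets, beliefs)]


# Hypothetical participants with equal starting budgets.
budgets = [100.0, 100.0, 100.0]
beliefs = [0.9, 0.6, 0.3]   # stated probabilities for one question

print("Aggregate probability:", market_price(budgets, beliefs))
budgets = settle(budgets, beliefs, outcome=True)
print("Budgets after the event happens:", budgets)

# If the same probabilities were stated on a later question, the aggregate would shift
# towards the participant who was more accurate, because their budget is now larger.
print("Aggregate with updated budgets:", market_price(budgets, beliefs))
```

Total budget is conserved by the update, so the scheme only redistributes weight between participants. Running it once on the pre-discussion predictions and once on the post-discussion ones gives the two simulated markets mentioned above.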
### Low-tech options: Surveys and Interviews
Besides prediction markets, forecasting tournaments or Delphi methods, the option of doing very low-tech stuff (like sending out surveys or asking people what their informed opinion is on some topic) still remains. One can also just hang a large post-it sheet in a public place and use a simple system like [BitBets](https://forum.effectivealtruism.org/posts/B8aWmfnWESxSQCxEL/bitbets-a-simple-scoring-system-for-forecaster-training) to give out rewards.
With regards to surveys specifically, [this](https://forum.effectivealtruism.org/posts/DCcciuLxRveSkBng2/a-review-of-two-books-on-survey-making) might be a good introduction to biases and pitfalls to take into account when designing surveys. We would also recommend abandoning Google Surveys and using [TypeForm](https://www.typeform.com/), which has beautiful typography and more functionalities.
## Conclusion
We:
* Considered the academic consensus on prediction markets
* Covered the corporate track record of prediction markets
* Discussed hypotheses for the lack of adoption of prediction markets
* Discussed alternative information aggregation mechanisms
At this point, because we do not get to observe the reasons for the lack of adoption of prediction markets, and because we are analyzing a complex question, we have slightly different overall views:
### Nuño Sempere
Nuño thinks that Upstart should not jump to using prediction markets, because the reasons for their failure are poorly understood. That is, we are much more sure about the fact that prediction markets fail to gain adoption than about any particular explanation of why this is.
To some extent, this conclusion depends on to what extent one thinks that something like the [efficient market hypothesis](https://en.wikipedia.org/wiki/Efficient-market_hypothesis) holds in management techniques. And I tend to think that it holds enough to not make prediction markets an attractive intervention \[cf. misha-disagrees\]. In particular, I think that if prediction markets were as good as advertised, they would have been adopted after initial trials, and then quickly spread. More specifically, I think that if prediction markets/forecasting competitions among employees were worth it, companies such as Google or Microsoft—which we know have experimented with prediction markets—would already be using them. I'm also very hesitant about suggesting interventions that may disrupt (in a negative way) current mechanisms.
Moreover, I observe that machine-learning or model-based or data-analysis solutions on forecasting weather, pandemics, supply chain, sales, etc. are happily adopted, and the startups that produce them reach quite high valuations. When trying to explain why prediction markets are not adopted, this makes me favor explanations based on high overhead, low performance and low applicability over Robin Hanson-style explanations based on covert and self-serving status moves.
Although prediction markets seem like an interesting technology, developing them to maturity seems too costly for a company centered on something else. Going on a tangent, I'm also not sure that a startup that developed prediction markets would be able to capture enough of a percentage of the value of decision improvements to become profitable.
Many of the reasons for the lack of adoption of prediction markets also seem like they would apply to forecasting tournaments with proper incentives, unless one takes particular care to make them collaborative.
The above is more strongly worded than Misha and Eli's more nuanced perspectives below, with which I also partially agree. This might be a reaction of being in shock at the strong contrast between the optimism in the academic literature and the realities of prediction markets being hard to usefully implement in practice.
Like Eli below, I am also in favour of starting with small interventions and titrating one's way towards more significant ones. In particular, I feel attracted to:
* Delphi-like automatic prediction markets built on top of dead-simple polls, and/or
* forecasting setups using collaborative scoring rules—e.g., in which all employees (perhaps including \[private\]) play against \[private\]'s initial guess.
\[cf. misha-disagrees\]: Misha holds the opinion that the efficient market hypothesis obviously doesn't hold (a lot of good techniques are not used or widely known; management was and is improving a lot; there are a bunch of startups delivering value through improving management). And he feels that is not really relevant for head-to-head comparison between different management techniques.
### Misha Yagudin
Misha thinks that Upstart shouldn't adopt company-wide prediction markets because prediction markets are disruptive to the social fabric and their effects on morale and culture are unclear. Further, Upstart is fairly small and markets shine when there are a lot of self-selected skilled participants. It seems better to select good managers who are excellent at conventional tools.
At the same time, I think Upstart might want to consider niche markets or, even better, forecasting tournaments (because these, as far as I can tell, take less attention and might get fairly accurate). The following conditions should be met for these forecasting systems to feel reasonable to me:
* should not mess up the social fabric too much;
* should target people who might be a good cultural fit
    * once again, so as not to disturb the social fabric too much, and to get people who might be better at probabilistic reasoning
    * e.g. engineers and especially data scientists
* should focus participants' attention on something worthy
First, from the Microsoft report and other sources, we know that prediction markets are quite accurate at predicting how long something will take.
If there are teams that are already using flexible approaches to development (like Scrum), it might be a fun idea to try a market or forecasting competition on top of it. This market is likely to be fairly active as new information comes regularly. And it is pretty fair because everyone syncs during stand-ups. Further, it's unlikely to affect managers in any way because they are already exposed by effort estimates produced via Scrum.
Second, another Microsoft success was with predicting something fuzzy like external product reviews. This is an understandable success because employees have different perspectives and contribute to different aspects of the product.
Something similar might be found in Upstart's workflow, for example, markets about how valuable new data sources will be (in terms of improved metrics). Further, this is what developers should care about; and developing a sense for better data seems beneficial.
### Eli Lifland
Eli feels fairly uncertain about what Upstart should do since this is a complicated issue. He would be more certain if he had a better model backed by more evidence on why existing corporate prediction markets have usually disappeared.
His best guess for what Upstart should do is:
1. To the extent that Upstart is interested in public predictions like macroeconomic trends and thinks more perspectives could benefit, solicit forecasts from platforms like Metaculus.
2. For internal predictions, start with interventions that take the least amount of employee time and are the least likely to damage morale, then work your way up from there as long as it continues to be helpful.
    * For example, maybe start with using the Delphi method for one-time predictions on \[private\], or the value of projects for trying to improve Upstart's ML model.
    * If these go well, consider expanding to a small prediction market with similarly benign questions, before adding questions about e.g. project timelines.
3. A lot of the value that prediction markets could provide may be achieved through cultivating a culture of making quantified predictions to justify decisions and encouraging feedback/input from all members of the team on these types of predictions. You may already be doing this. This could be an 80/20 solution that brings a lot of the benefits without as much downside risk.
One uncertainty he has is about how much weight to give to the evidence that many companies have started using prediction markets but then stopped; it's unclear to what extent this was because prediction markets actually don't provide more benefits than costs, versus other factors such as managers not prioritizing maintaining them because they undermine the managers' value, or the improvements being hard to measure.
One's belief about this is likely correlated with one's belief about how close to maximally efficient for-profit businesses are given the efficient market hypothesis. Eli feels they are fairly inefficient but is not sure whether they are efficient to such an extent that the lack of adoption of prediction markets is enough evidence to recommend not trying them out.

View File

@ -0,0 +1,182 @@
Samotsvety Nuclear Risk Forecasts — March 2022
==============
_Thanks to Misha Yagudin, Eli Lifland, Jonathan Mann, Juan Cambeiro, Gregory Lewis, @belikewater, and Daniel Filan for forecasts. Thanks to Jacob Hilton for writing up an earlier analysis from which we drew heavily. Thanks to Clay Graubard for sanity checking and to Daniel Filan for independent analysis. This document was written in collaboration with Eli and Misha, and we thank those who commented on an earlier version._
This writeup was originally posted on the [EA Forum](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022), which has some good discussion.
## Overview
In light of the war in Ukraine and fears of nuclear escalation[\[1\]](#fn1tt4cl1ut8si), we turned to forecasting to assess whether individuals and organizations should leave major cities. We aggregated the forecasts of 8 excellent forecasters for the question _**What is the risk of death in the next month due to a nuclear explosion in London?**_ Our aggregate answer is 24 micromorts (7 to 61) when excluding the most extreme on either side[\[2\]](#fnl2youss95ij). A micromort is defined as a 1 in a million chance of death. Chiefly, we have a low baseline risk, and we think that escalation to targeting civilian populations is even more unlikely. 
For San Francisco and most other major cities[\[3\]](#fncfroiodaps), we would forecast 1.5-2x lower probability (12-16 micromorts). We focused on London as it seems to be at high risk and is a hub for the effective altruism community, one target audience for this forecast.
Given an estimated 50 years of life left[\[4\]](#fnpt69xmy0uqf), this corresponds to ~10 hours lost. The forecaster range without excluding extremes was <1 minute to ~2 days lost. Because of productivity losses, hassle, etc., we are currently not recommending that individuals evacuate major cities. 
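The conversion from micromorts to expected hours lost can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function name and the constant are ours, not part of the original analysis.

```python
HOURS_PER_YEAR = 365 * 24


def expected_hours_lost(micromorts: float, remaining_years: float) -> float:
    """Expected hours of life lost for a risk of death given in micromorts.

    One micromort is a one-in-a-million chance of death.
    """
    probability_of_death = micromorts * 1e-6
    return probability_of_death * remaining_years * HOURS_PER_YEAR


# Aggregate forecast and its range, with 50 years of remaining life assumed.
for label, micromorts in [("low", 7), ("central", 24), ("high", 61)]:
    hours = expected_hours_lost(micromorts, remaining_years=50)
    print(f"{label}: {micromorts} micromorts is about {hours:.1f} hours lost")
```

With 24 micromorts and 50 remaining years this gives roughly 10.5 hours, consistent with the ~10 hours quoted above.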
## Methodology
We aggregated the forecasts from eight excellent forecasters between the 6th and the 10th of March. [Eli Lifland](https://www.elilifland.com/), [Misha Yagudin](https://forum.effectivealtruism.org/users/misha_yagudin), [Nuño Sempere](https://nunosempere.com/), [Jonathan Mann](https://jonathanmann.github.io/) and [Juan Cambeiro](https://twitter.com/juan_cambeiro)[\[5\]](#fntfhokbz7tk) are part of Samotsvety, a forecasting group with a good track record: we won CSET-Foretell's first two seasons, and have great track records on various platforms. The remaining forecasters were [Gregory Lewis](https://www.fhi.ox.ac.uk/team/lewis-gregory/)[\[6\]](#fnmz4wxunrpnc), @belikewater, and [Daniel Filan](https://danielfilan.com/), who likewise had good track records.
The overall question we focused on was: _**What is the risk of death in the next month**_[\[7\]](#fnazu0qsuph3h) _**due to a nuclear explosion in London?**_. We operationalized this as: “If a nuke does not hit London in the next month, this resolves as 0. If a nuke does hit London in the next month, this resolves as the percentage of people in London who died from the nuke, subjectively down-weighted by the percentage of reasonable people that evacuated due to warning signs of escalation.” We roughly borrowed the question operationalization and decomposition from [Jacob Hilton](https://docs.google.com/document/d/17q-Ok4EVV42IscLMFOLztht7i0iLiALx0DFcX3xLn-A/edit?pli=1#heading=h.9vfmnuhgbjzv).
We broke this question down into:
1. What is the chance of nuclear warfare between NATO and Russia in the next month?
2. What is the chance that escalation sees central London hit by a nuclear weapon conditioned on the above question?
3. What is the chance of not being able to evacuate London beforehand?
4. What is the chance of dying if a nuclear bomb drops in London?
However, different forecasters preferred different decompositions. In particular, there were some disagreements about the odds of a tactical strike in London given a nuclear exchange in NATO, which led to some forecasters preferring to break down (2.) into multiple steps. Other forecasters also preferred to first consider the odds of direct Russia/NATO confrontation, and then the odds of nuclear warfare given that. 
## Our aggregate forecast
<img src='https://i.imgur.com/oZHAz7q.png' class='.img-medium-center'>
We use the aggregate with min/max removed as our all-things-considered forecast for now given the extremity of outliers. We aggregated forecasts using the geometric mean of odds[\[8\]](#fnt1dm5d62pkl).
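As an illustration of this aggregation method, here is a minimal Python sketch: it drops the single lowest and highest forecast, then takes the geometric mean of odds of the rest. The function names and the example forecasts are hypothetical; the individual forecasts behind the table above are not reproduced here.

```python
import math


def geometric_mean_of_odds(probabilities):
    """Pool probabilities by averaging their log-odds, then convert back."""
    log_odds = [math.log(p / (1 - p)) for p in probabilities]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds / (1 + pooled_odds)


def trimmed_aggregate(probabilities):
    """Drop the lowest and highest forecast, then pool the remainder."""
    if len(probabilities) < 3:
        raise ValueError("need at least three forecasts to trim both extremes")
    trimmed = sorted(probabilities)[1:-1]
    return geometric_mean_of_odds(trimmed)


# Hypothetical forecasts, for illustration only (not the real ones).
hypothetical_forecasts = [1e-7, 5e-6, 1e-5, 2e-5, 3e-5, 5e-5, 8e-5, 1e-3]
print(f"trimmed aggregate: {trimmed_aggregate(hypothetical_forecasts):.2e}")
```

Trimming before pooling matters here because the geometric mean of odds is pulled strongly by forecasts that are very close to 0 or 1.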
Note that we are forecasting one month ahead and it's quite likely that the crisis will get less acute/uncertain with time. Unless otherwise indicated, we use “monthly probability” for our and readers' convenience.
## Comparisons with previous forecasts
We compared the decomposition of our forecast to [Jacob Hilton's](https://docs.google.com/document/d/17q-Ok4EVV42IscLMFOLztht7i0iLiALx0DFcX3xLn-A/edit?pli=1#) to understand the main drivers of the difference. We compare to Jacob's revised forecast, which he made after reading comments on his document. Note that Jacob forecasted on the time horizon of the whole crisis, then estimated that 10% of the risk was incurred in the upcoming week. We guess that he would put roughly 25% of the risk in the month over which we forecasted (adjusting down some from weekly \* 4), and assume so in the table below. The numbers we assign to him are also approximate in that our operationalizations are a bit different from his.
<img src='https://i.imgur.com/dNBEHAT.png' class='.img-medium-center'>
We are ~an order of magnitude lower than Jacob. This is primarily driven by (a) a ~4x lower chance of a nuclear exchange in the next month and (b) a ~2x lower chance of dying in London, given a nuclear exchange.
(a) may be due to having a lower level of baseline risk before adjusting up based on the current situation. For example, while [Luisa Rodríguez's analysis](https://forum.effectivealtruism.org/posts/PAYa6on5gJKwAywrF/how-likely-is-a-nuclear-exchange-between-the-us-and-russia) puts the chance of a US/Russia nuclear exchange at 0.38%/year, we think this seems too high for the post-Cold War era, after new [de-escalation methods have been implemented](https://en.wikipedia.org/wiki/Moscow%E2%80%93Washington_hotline#Background) and lessons have been learnt from close calls. Additionally, we trust the superforecaster aggregate the most out of the estimates aggregated in the post.
(b) is likely driven primarily by a lower estimate of London being hit at all given a nuclear exchange. Commenters mentioned that targeting London would be a good example of a [decapitation strike](https://en.wikipedia.org/wiki/Decapitation_strike#In_nuclear_warfare). However, we consider it less likely that the crisis would escalate to targeting massive numbers of civilians, and in each escalation step, there may be avenues for de-escalation. In addition, targeting London would invite stronger retaliation than meddling in Europe, particularly since the UK, unlike countries in Northern Europe, is a nuclear state. 
A more likely scenario might be Putin saying that if NATO intervenes with troops, he would consider Russia to be "existentially threatened" and that he might use a nuke if they proceed. If NATO calls his bluff, he might then deploy a small tactical nuke on a specific military target while maintaining lines of communication with the US and others using the [red phone](https://en.wikipedia.org/wiki/Moscow%E2%80%93Washington_hotline). 
## Appendix A: Sanity checks
We commissioned a sanity check from [Clay Graubard](https://twitter.com/ClayGraubard?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor), who has been following the situation in Ukraine more closely. His somewhat rough comments can be found [here](https://docs.google.com/document/d/18TJLYBXLb3XhNr6BsAM1044V5Zvtxo_xl8B7q6sYjW4/edit).
Graubard estimates the likelihood of nuclear escalation in Ukraine to be fairly low (3%: 1 to 8%), but didn't have a nuanced opinion on escalation beyond Ukraine to NATO (a very uncertain 55%: 10 to 90%). Taking his estimates at face value, this gives a 1.3%/yr chance of nuclear warfare between Russia and NATO, which is in line with our 0.8%/yr estimate.
He also highlighted further sources of uncertainty, like the likelihood that the US would send [anti-long range ballistic missile interceptors](https://carnegieendowment.org/2020/11/19/new-u.s.-missile-defense-test-may-have-increased-risk-of-nuclear-war-pub-83273), which the UK itself doesn't have. He also pointed out that in case of a nuclear bomb dropping in a highly populated city, Putin might choose to give a warning.
Daniel Filan also independently wrote up his own thoughts on the matter: his more engagingly written reasoning can be found [here](https://docs.google.com/document/d/10UeqFuhrdew21DCENeQfSfZTKvWrKaspoE79w0msPf4/edit) (shared with permission): he arrives at an estimate of ~100 micromorts. We also incorporated his forecasts into our current aggregate.
We also got [reviewed by a nuclear expert](https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/nuclear-expert-comment-on-samotsvety-nuclear-risk-forecast-2). Their estimate is an order of magnitude larger, but as we point out in [a response](https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/nuclear-expert-comment-on-samotsvety-nuclear-risk-forecast-2?commentId=PRkbcuTRDi6s2seLj), this comes down to thinking that “core EAs” would not be able to evacuate on time (3x difference) and using suboptimal aggregation methods for Luisa Rodríguez's collection of estimates (another 1.5x to 3x reduction). After the first adjustment, their forecast is within our intra-group range; after the second adjustment, it is really close to our estimate. Some of our forecasters found this reassuring (but updated somewhat on other disagreements). Given that experts are generally more pessimistic and given selection effects, we think overall that review is a good sign for us.
Update 2022/05/11: Zvi Mowshowitz also gives [his own estimates](https://thezvi.substack.com/p/ukraine-post-8-risk-of-nuclear-war?s=r), and ends up between the Samotsvety forecast and the above-mentioned nuclear expert.
Update 2022/05/01: Peter Wildeford [looks](https://www.pasteurscube.com/are-nuclear-close-calls-getting-rarer/#fn6) at the chance of nuclear war.
## Appendix B: Tweaking our forecast
Here are a few models one can play around with by copy-and-pasting them into the [Squiggle alpha](https://playground.squiggle-language.com/dist-builder).
### Simple model
```
russiaNatoNuclearexchangeInNextMonth = 0.00067
londonHitConditional = 0.18
informedActorsNotAbleToEscape = 0.25
proportionWhichDieIfBombDropsInLondon = 0.78
probabilityOfDying = russiaNatoNuclearexchangeInNextMonth *
londonHitConditional *
informedActorsNotAbleToEscape *
proportionWhichDieIfBombDropsInLondon
remainingLifeExpectancyInYears = 40 to 60
daysInYear=365
lostDays=probabilityOfDying*remainingLifeExpectancyInYears*daysInYear
lostHours=lostDays*24
lostHours ## Replace with mean(lostDays) to get an estimate in days instead
```
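For readers who would rather not use Squiggle, the simple model above can be approximated in plain Python. This is a sketch under one stated assumption: that `40 to 60` denotes a lognormal distribution whose 5th and 95th percentiles are 40 and 60, which is our reading of Squiggle's `to` operator and may not match the alpha version exactly. The helper names are ours.

```python
import math
import random
from statistics import NormalDist, fmean

HOURS_PER_YEAR = 365 * 24
Z_95 = NormalDist().inv_cdf(0.95)  # about 1.645


def lognormal_from_90_ci(low, high, rng):
    """Sample a lognormal whose 5th and 95th percentiles are low and high.

    Assumption: this is what `low to high` means in the Squiggle model.
    """
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return rng.lognormvariate(mu, sigma)


def simulate_lost_hours(n_samples=20_000, seed=0):
    """Monte Carlo version of the simple model, returning lost hours."""
    rng = random.Random(seed)
    # Point estimates copied from the simple model above.
    probability_of_dying = 0.00067 * 0.18 * 0.25 * 0.78
    return [
        probability_of_dying * lognormal_from_90_ci(40, 60, rng) * HOURS_PER_YEAR
        for _ in range(n_samples)
    ]


samples = simulate_lost_hours()
print(f"mean lost hours: {fmean(samples):.1f}")
```

The product of the four point estimates is about 24 micromorts, so the mean should land near the ~10 hours quoted in the overview.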
### Overcomplicated models
These models have the advantage that the number of informed actors not able to escape, and the proportion of Londoners who die in the case of a nuclear explosion are modelled by ranges rather than by point estimates. However, the estimates come from individual forecasters, rather than representing an aggregate (we weren't able to elicit ranges when our forecasters were convened).
**Nuño Sempere**
```
firstYearRussianNuclearWeapons = 1953
currentYear = 2022
laplace(firstYear, yearNow) = 1/(yearNow-firstYear+2)
laplacePrediction= (1-(1-laplace(firstYearRussianNuclearWeapons, currentYear))^(1/12))
laplaceMultiplier = 0.5 # Laplace tends to overestimate stuff
russiaNatoNuclearexchangeInNextMonth=laplaceMultiplier*laplacePrediction
londonHitConditional = 0.16 # personally at 0.05, but taking the aggregate here.
informedActorsNotAbleToEscape = 0.2 to 0.8
proportionWhichDieIfBombDropsInLondon = 0.6 to 1
probabilityOfDying = russiaNatoNuclearexchangeInNextMonth *
londonHitConditional *
informedActorsNotAbleToEscape *
proportionWhichDieIfBombDropsInLondon
remainingLifeExpectancyInYears = 40 to 60
daysInYear=365
lostDays=probabilityOfDying*remainingLifeExpectancyInYears*daysInYear
lostHours=lostDays*24
lostHours ## Replace with mean(lostDays) to get an estimate in days instead
```
<img src='https://i.imgur.com/ugrKyeC.png' class='.img-medium-center'>
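The Laplace step in the model above can be reproduced outside Squiggle. The Python sketch below mirrors the same formulas: an annual probability from Laplace's rule of succession, converted to a monthly probability, then halved by the model's multiplier. The function names are ours.

```python
def laplace_annual_probability(first_year: int, current_year: int) -> float:
    """Rule of succession with no events observed since first_year."""
    return 1 / (current_year - first_year + 2)


def annual_to_monthly(annual_probability: float) -> float:
    """Monthly probability that compounds to the given annual probability."""
    return 1 - (1 - annual_probability) ** (1 / 12)


annual = laplace_annual_probability(1953, 2022)  # 1/71
monthly = annual_to_monthly(annual)
adjusted = 0.5 * monthly  # the model's laplaceMultiplier

print(f"annual:   {annual:.4%}")
print(f"monthly:  {monthly:.4%}")
print(f"adjusted: {adjusted:.4%}")
```

The adjusted monthly figure comes out at roughly 0.06%, close to the 0.067% point estimate used in the simple model.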
**Eli Lifland**
Note that this model was made very quickly out of interest and I wouldn't be quite ready to endorse it as my actual estimate (my current actual median is 51 micromorts so ~21 lost hours).
```
russiaNatoNuclearexchangeInNextMonth=.0001 to .003
londonHitConditional = .1 to .5
informedActorsNotAbleToEscape = .1 to .6
proportionWhichDieIfBombDropsInLondon = 0.3 to 1
probabilityOfDying = russiaNatoNuclearexchangeInNextMonth *
londonHitConditional *
informedActorsNotAbleToEscape *
proportionWhichDieIfBombDropsInLondon
remainingLifeExpectancyInYears = 40 to 60
daysInYear=365
lostDays=probabilityOfDying*remainingLifeExpectancyInYears*daysInYear
lostHours=lostDays*24
lostHours ## Replace with mean(lostDays) to get an estimate in days instead
```
<img src='https://i.imgur.com/tE2XiXy.png' class='.img-medium-center'>
## Footnotes
1. **[^](#fnref1tt4cl1ut8si)**
See e.g. [here](https://forum.effectivealtruism.org/posts/2KRqH5wsymqvhGQge/how-are-you-keeping-it-together) and [here](https://forum.effectivealtruism.org/posts/TkLk2xoeE9Hrx5Ziw/nuclear-attack-risk-implications-for-personal-decision)
2. **[^](#fnrefl2youss95ij)**
3.1 (0.0001 to 112.5) including the most extreme to either side.
3. **[^](#fnrefcfroiodaps)**
Excluding those with military bases
4. **[^](#fnrefpt69xmy0uqf)**
This could be adjusted to consider life expectancy and quality of life _conditional_ on nuclear exchange
5. **[^](#fnreftfhokbz7tk)**
who is also a Superforecaster®
6. **[^](#fnrefmz4wxunrpnc)**
Likewise a Superforecaster®
7. **[^](#fnrefazu0qsuph3h)**
By April the 10th at the time of publication
8. **[^](#fnreft1dm5d62pkl)**
See [When pooling forecasts, use the geometric mean of odds](https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds). Since then, the author has proposed a [more complex method](https://forum.effectivealtruism.org/posts/biL94PKfeHmgHY6qe/principled-extremizing-of-aggregated-forecasts) that we haven't yet fully understood, and is more at risk of overfitting. Some of us also feel that aggregating the deviations from the base rate is more elegant, but that method has likewise not been tested as much.

View File

@ -0,0 +1,84 @@
Samotsvety's AI risk forecasts
==============
_Crossposted to_ _[the EA Forum](https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts)_, [_LessWrong_](https://www.lesswrong.com/posts/YMsD7GA7eTg2BafQd/samotsvety-s-ai-risk-forecasts) _and_ [_Foxy Scout_](https://www.foxy-scout.com/samotsvetys-ai-risk-forecasts/)
## Introduction
In [my review of What We Owe The Future](https://forum.effectivealtruism.org/posts/9Y6Y6qoAigRC7A8eX/my-take-on-what-we-owe-the-future) (WWOTF), I wrote:
> Finally, I've updated some based on my experience with [Samotsvety forecasters](https://samotsvety.org) when discussing AI risk… When we discussed the report on power-seeking AI, I expected tons of skepticism but in fact almost all forecasters seemed to give >=5% to disempowerment by power-seeking AI by 2070, with many giving >=10%.
In the comments, [Peter Wildeford asked](https://forum.effectivealtruism.org/posts/9Y6Y6qoAigRC7A8eX/my-take-on-what-we-owe-the-future?commentId=cB2FnhFRJujCpF6Dn#comments):
> It looks like Samotsvety also forecasted AI timelines and AI takeover risk - are you willing and able to provide those numbers as well?
We separately received a request from the [FTX Foundation](https://ftxfoundation.org/) to forecast on 3 questions about AGI timelines and risk.
I sent out surveys to get Samotsvety's up-to-date views on all 5 of these questions, and thought it would be valuable to share the forecasts publicly.
A few of the headline aggregate forecasts are:
1. 25% chance of misaligned AI takeover by 2100, barring pre-[APS-AI](https://docs.google.com/document/d/1smaI1lagHHcrhoi6ohdq3TYIZv0eNWWZMPEy8C8byYg/edit#heading=h.14onymzb0y9) catastrophe
2. 81% chance of [Transformative AI](https://www.openphilanthropy.org/research/some-background-on-our-views-regarding-advanced-artificial-intelligence/) (TAI) by 2100, barring pre-TAI catastrophe
3. 32% chance of AGI being developed in the next 20 years
## Forecasts
In each case I aggregated forecasts by removing the single most extreme forecast on each end, then taking the [geometric mean of odds](https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds).
To reduce concerns of in-group bias to some extent, I calculated a separate aggregate for those who weren't highly-engaged EAs (HEAs) before joining Samotsvety. In most cases, these forecasters hadn't engaged with EA much at all; in one case the forecaster was aligned but not involved with the community. Several have gotten more involved with EA since joining Samotsvety.
Unfortunately I'm unable to provide forecast rationales in this post due to forecaster time constraints, though I might in a future post. I provided my personal reasoning for relatively similar forecasts (35% AI takeover by 2100, 80% TAI by 2100) in [my WWOTF review](https://forum.effectivealtruism.org/posts/9Y6Y6qoAigRC7A8eX/my-take-on-what-we-owe-the-future#Underestimating_risk_of_misaligned_AI_takeover).
### WWOTF questions
| | Aggregate (n=11) | Aggregate, non-pre-Samotsvety-HEAs (n=5) | Range |
|--------------------------------------------------------------------------------------------|------------------|------------------------------------------|----------|
| What's your probability of misaligned AI takeover by 2100, barring pre-[APS-AI](https://docs.google.com/document/d/1smaI1lagHHcrhoi6ohdq3TYIZv0eNWWZMPEy8C8byYg/edit#heading=h.14onymzb0y9) catastrophe? | 25% | 14% | 3-91.5% |
| What's your probability of Transformative AI (TAI) by 2100, barring [pre-TAI](https://www.openphilanthropy.org/research/some-background-on-our-views-regarding-advanced-artificial-intelligence/) catastrophe? | 81% | 86% | 45-99.5% |
### FTX Foundation questions
For the purposes of these questions, FTX Foundation defined AGI as roughly “AI systems that power a comparably profound transformation (in economic terms or otherwise) as would be achieved in \[a world where cheap AI systems are fully substitutable for human labor\]”. See [here](https://docs.google.com/document/d/1I2_pN42wkHJph7QjJnAHevgUu7JIg5fdugQHZbnEZi8/edit) for the full definition used.
Unlike the above questions, these are not conditioning on no pre-AGI/TAI catastrophe.
| | Aggregate (n=11) | Aggregate, non-pre-Samotsvety-HEAs (n=5) | Range |
|-----------------------------------------------------------------------------------------------------------|------------------|------------------------------------------|--------|
| What's the probability of [existential catastrophe](https://forum.effectivealtruism.org/topics/existential-catastrophe-1) from AI, conditional on AGI being developed by 2070? [^1] | 38% | 23% | 4-98% |
| What's the probability of AGI being developed in the next 20 years? | 32% | 26% | 10-70% |
| What's the probability of AGI being developed by 2100? | 73% | 77% | 45-80% |
## Who is Samotsvety Forecasting?
Edited to add: Our track record is now online [here](https://samotsvety.org/track-record/).
[Samotsvety Forecasting](https://samotsvety.org/) is a forecasting group that was started primarily by [Misha Yagudin](https://forum.effectivealtruism.org/users/misha_yagudin), [Nuño Sempere](https://forum.effectivealtruism.org/users/nunosempere), and myself predicting as a team on [INFER](https://www.infer-pub.com/teams/31) (then Foretell). Over time, we invited more forecasters who had very strong track records of accuracy and sensible comments, mostly on [Good Judgment Open](https://www.gjopen.com/) but also a few from INFER and [Metaculus](https://www.metaculus.com/). Some strong forecasters were added through social connections, which means the group is a bit more EA-skewed than it would be without these additions. A few Samotsvety forecasters are also [superforecasters](https://en.wikipedia.org/wiki/Superforecaster).
## How much do these forecasters know about AI?
Most forecasters have at least read Joe Carlsmith's report on AI x-risk, [Is Power-Seeking AI an Existential Risk?](https://arxiv.org/abs/2206.13353). Those who are short on time may have just skimmed the report and/or watched the [presentation](https://forum.effectivealtruism.org/posts/ChuABPEXmRumcJY57/video-and-transcript-of-presentation-on-existential-risk). We discussed the report section by section over the course of a few weekly meetings.
~5 forecasters also have some level of AI expertise, e.g. I did some [adversarial robustness research](https://scholar.google.com/citations?user=Q33DXbEAAAAJ&hl=en) during my last year of undergrad, then worked at [Ought](https://ought.org/) applying AI to improve open-ended reasoning.
## How much weight should we give to these aggregates?
My personal tier list for how much weight I give to AI x-risk forecasts to the extent I defer:
1. Individual forecasts from people who seem to generally have great judgment, and have spent a ton of time thinking about AI x-risk forecasting e.g. [Cotra](https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines), Carlsmith
2. Samotsvety aggregates presented here
3. A superforecaster aggregate (I'm biased re: quality of Samotsvety vs. superforecasters, but I'm pretty confident based on personal experience)
4. Individual forecasts from AI domain experts who seem to generally have great judgment, but haven't spent a ton of time thinking about AI x-risk forecasting (this is the one I'm most uncertain about, could see anywhere from 2-4)
5. Everything else I can think of I would give little weight to [^2] [^3]
## Acknowledgments
Thanks to Tolga Bilge, Juan Cambeiro, Molly Hickman, Greg Justice, Jared Leibowich, Alex Lyzhov, Jonathan Mann, Nuño Sempere, Pablo Stafforini, and Misha Yagudin for making forecasts.
[^1]: Unlike the WWOTF question, this includes any existential catastrophe caused by AI and not just misaligned takeovers (this is a non-negligible consideration for me personally and I'm guessing several other forecasters, though I do give most weight to misaligned takeovers).
[^2]: Why do I give little weight to Metaculus's views on AI? Primarily because of the [incentives](https://forum.effectivealtruism.org/posts/S2vfrZsFHn7Wy4ocm/bottlenecks-to-more-impactful-crowd-forecasting-2#Failure_modes1) to make very shallow forecasts on a ton of questions (e.g. probably <20% of Metaculus AI forecasters have done the equivalent work of reading the Carlsmith report), and secondarily that forecasts aren't aggregated from a select group of high performers but instead from anyone who wants to make an account and predict on that question.
[^3]: Why do I give little weight to AI expert surveys such as [When Will AI Exceed Human Performance? Evidence from AI Experts](https://arxiv.org/abs/1705.08807)? I think most AI experts have incoherent and poor views on this because they don't think of it as their job to spend time thinking and forecasting about what will happen with very powerful AI, and many don't have great judgment.

View File

@ -0,0 +1,328 @@
Samotsvety Nuclear Risk update October 2022
==============
This writeup was originally posted on the [EA Forum](https://forum.effectivealtruism.org/posts/2nDTrDPZJBEerZGrk/samotsvety-nuclear-risk-update-october-2022), which has some good discussion.
After recent events in Ukraine, [Samotsvety](https://samotsvety.org/) convened to update our probabilities of nuclear war. In March 2022, at the beginning of the Ukraine war, we were at ~[0.01%](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022) that London would be hit with a nuclear weapon in the next month. Now, we are at ~0.02% for the next 1-3 months, and at 16% that Russia uses any type of nuclear weapon in Ukraine in the next year. 
Expected values are more finicky and more person-dependent than probabilities, and readers are encouraged to enter their own estimates, for which we provide a [template](https://www.squiggle-language.com/playground/#code=eNqVU9tu00AQ%2FZVRnqBqklZAhSz6ADSCiKitlIQKyS8be2yPupk1e2mwqv4740tayFV9sbyzZ86ZMzP72HOFWU3Dcqls1Yu8DXjahEYpeWPXEWLypPT0d6A81zj1ljjvRb3hEK5DolFZCPdWEWPMNjhSc4euu7lDVRp2Y563CLgESfvUh4y07hPHXLMYj%2BAL5eWDUKIlk4LJwNMSgRxozDwEdiUmlBGmcZfWiV9%2Fnt3EjC5RWnkyfBO8oxQ7xW%2F0gGv1OyV13WO6XcVL9szUfHNODEsXJKC0wI%2F6OokZ4DU17DAB3gB354nIGx4KupBueyPoBeXzH%2FPpV%2FJVw9CwjvZIblv8J33T3WH3J1B7e5V6Z268LFXigRi0cR4KE6yTHcGlYLTYmlCGoz8yVq84qcb8S5w7qef9Wd2Ki7POQlIozoUzVZVrQZfw7uJDzKU1aUi8VPK9ph7zlaok%2FaLOPv%2B4ka0WGmemrrjEL5gZu6NH2BSD6UTKbSgFsrdvzcjfnEMftqnfdl075rXhePHVZe3yVcebpt5asxDBCnIjz0ScKu2MjFBei5J3RCkZV3FiZSwJZNJ%2FIzouJAUo1zL0YSz3qJy8BGMhxfXhWVe81lNTkMjf4Rx6nvF%2F6J9KBxRFRzljU6YL9kGYOW%2BxK2N1CirzaOW0XvsVsQQ6jsFgUC9Sc7oiV%2Bq6C491I%2BDog4yOIk5booO7Hx2%2B7ij2bUi092atvblu0XaoRcb81Hv6C9nOCyo%3D). We'd guess that readers would lose 2 to 300 hours by staying in London in the next 1-3 months, but this estimate is at the end of a garden of forking paths, and more pessimistic or optimistic readers might make different methodological choices. We would [recommend leaving if Russia uses a tactical nuclear weapon in Ukraine](https://forum.effectivealtruism.org/posts/2nDTrDPZJBEerZGrk/samotsvety-nuclear-risk-update-october-2022#Estimating_the_value_of_leaving_London_or_other_major_cities).
Since March, we have also added our track record to [samotsvety.org/track-record](https://samotsvety.org/track-record/), which might be of use to readers when considering how much weight to give to our predictions. 
_Update 2022-10-04: Changed our estimates as a result of finding an aggregation error. You can see the previous version of our post_ [_here_](https://web.archive.org/web/20221003195959/https://forum.effectivealtruism.org/posts/2nDTrDPZJBEerZGrk/samotsvety-nuclear-risk-update-october-2022)_. We also noticed that because of the relatively low number of estimates, the aggregates are fairly sensitive to each individual forecast, so we are working on incorporating more forecasts._
_Update 2022-10-19: These estimates seem a bit out of date now; see_ [_this comment_](https://forum.effectivealtruism.org/posts/2nDTrDPZJBEerZGrk/samotsvety-nuclear-risk-update-october-2022?commentId=fYGxRsRCfzM4nWvN9#comments) _and_ [_these forecasts from the Swift Centre_](https://www.swiftcentre.org/will-russia-use-a-nuclear-weapon/)_._
## Question decomposition
We have updated our decomposition to the following:
1. What is the probability that Russia will use a nuclear weapon in Ukraine in the next MONTH?
2. Conditional on Russia using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next MONTH after the initial nuclear weapon use?
3. Conditional on the nuclear conflict expanding to NATO, what is the chance that London would get hit, one MONTH after the first non-Ukraine nuclear bomb is used?
For each of those questions, we also asked forecasters for their yearly probabilities. Following up on previous feedback, we also asked forecasters for their core reasons behind their forecasts, and we'll present those alongside their probabilities.
We also asked a range of questions about counterfactuals:
* Conditional on Russia NOT using a nuclear weapon in Ukraine, what is the probability of a nuclear conflict outside Ukraine in the next MONTH?
* Conditional on Russia NOT using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next YEAR?
* Conditional on Russia NOT dropping a nuclear weapon in Ukraine in October, what is the probability that London will be hit with a nuclear weapon in October?
As well as a sanity check:
* What is the unconditional probability of London being hit with a nuclear weapon in October?
## Summaries
### Summary tables
_For ≤ 1 month staggering times between each step_
| Event | Conditional on previous step | Unconditional probability |
|-----------------------------------------------------------------------------------------------|------------------------------|---------------------------|
| Russia uses a nuclear weapon in Ukraine in the next month | — | 5.3% |
| Nuclear conflict scales beyond Ukraine in the next month after the initial nuclear weapon use | 2.5% | 0.13% |
| London gets hit, one month after the first non-Ukraine nuclear bomb is used? | 14% | 0.02% |
_For ≤ 1 year staggering times between each step_
| Event | Conditional on previous step | Unconditional probability |
|----------------------------------------------------------------------------------------------|------------------------------|---------------------------|
| Russia uses a nuclear weapon in Ukraine in the next year | — | 16% |
| Nuclear conflict scales beyond Ukraine in the next year after the initial nuclear weapon use | 9.6% | 1.6% |
| London gets hit, one year after the first non-Ukraine nuclear bomb is used? | 23% | 0.36% |
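
The "Unconditional probability" column in both tables is the running product of the conditional probabilities at each step. A minimal Python sketch of that arithmetic follows, using the unrounded aggregates reported in the forecaster sections of this post; the function and variable names are ours, chosen for illustration, and are not part of the original analysis.

```python
from itertools import accumulate
from operator import mul


def chain(conditionals):
    """Running product of step-by-step conditional probabilities."""
    return list(accumulate(conditionals, mul))


# Unrounded aggregates as reported in the forecaster sections of this post.
monthly = [0.053025, 0.0254, 0.1424]
yearly = [0.16388, 0.095685, 0.232015]

for label, steps in [("month", monthly), ("year", yearly)]:
    for conditional, unconditional in zip(steps, chain(steps)):
        print(f"{label}: conditional {conditional:.2%}, unconditional {unconditional:.3%}")
```

Because each step is conditional on the previous one, small changes in any single aggregate propagate multiplicatively to the final London figure.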
### Visualizations
This time, we are also experimenting with providing a few visualizations. Their advantage is that they may be more intuitive; the disadvantage is that they may gloss over the shape of our uncertainty, and thus mislead. Reader beware.
For the forecast with one month between each escalation step, we have:
<img src='https://i.imgur.com/UUZ0MSx.png' class='.img-medium-center'>
<img src='https://i.imgur.com/3POz1Zr.png' class='.img-medium-center'>
<img src='https://i.imgur.com/wXigRlV.png' class='.img-medium-center'>
<img src='https://i.imgur.com/W0I3ztj.png' class='.img-medium-center'>
### A forecaster's perspective
In order to understand at what level we are forecasting here, we are providing forecasters' comments. One forecaster provided his comments in a more self-contained form (rather than question by question), so I'm presenting those comments here, lightly edited:
> _In general, nuclear rhetoric has been used extensively before and it seems that it was fairly successful at achieving its intended goals without having to use the weapons (e.g., Germany was hesitant to send weapons to Ukraine). I think such bluffing might be wearing off but Moscow is very good at maintaining ambiguity._
>
> * _Nonetheless, previously stated “red lines” have already been crossed in this war without nuclear escalation. E.g., cross-border raids into Belgorod and strikes against Crimea._
>
> _Being ambiguous about one's willingness to use these weapons is what we have seen in the past and is what we see now. E.g., Zvi_ [_previously summarized_](https://thezvi.substack.com/p/ukraine-post-12)_, when discussing a recent_ [_speech by Putin_](https://en.kremlin.ru/events/president/transcripts/69390)_:_
>
> \> What I heard were several instances of drawing a distinction between Russia and its territorial integrity, and the territories under occupation. He said that the call-ups would be sufficient for the operation. He declared his intention to keep the territory, if he can maintain physical control. Then, he went back to saying that Ukraine was getting weapons that could threaten Russia, explicitly including Crimea as part of Russia but not Donbass, whereas Ukraine's normal forces can obviously already threaten Donbass or Kherson. He framed his threats of nuclear use in response to claimed Western nuclear blackmail and what he says are Western attempts to get Ukraine to invade clearly Russian territories.
>
> _Using nukes doesn't feel like a good choice._
>
> * _Using one on a battlefield can't be all that helpful. The frontline is ~1,000 km; troops are not concentrated. I guess the main benefits can come from “scaring troops,” “being credibly nuclear,” and maybe destroying key infrastructure._
> * _Breaking the nuclear taboo is likely to alienate parties that are ~neutral right now, most of all India. This effect is greater the more damage is done with nukes (e.g., “just testing” vs. using a very small nuke on a battlefield vs. attacking key infrastructure vs. endangering civilians)._
> * _Using nuclear weapons would also alienate various parties in Russia:_
>     * _IIRC, most people disapprove of the use of nuclear weapons._
>     * _Likewise, elites might be legitimately more scared: it's one thing to be cut off from EU/US: you can still live lavishly in Russia. It's another thing to endanger yourself and your loved ones with the salient possibility of nuclear war._
>     * _Even military planners, I think, would not be happy about stretching the nuclear doctrine that far._
>
> _Consider what will happen if the Ukrainian offensive continues. Russia is losing cities in Lugansk. I feel that Ukrainians are_ [_calling Putin's nuclear bluff_](https://thezvi.substack.com/p/ukraine-post-12)_. And this gives Putin few good options to work with._
>
> * _It seems like the most likely option is Russia just trying to sustain the conflict by pouring more resources and will into it. But it also might just lose in the end. I think “partial mobilisation” can be seen through that lens._
> * _Maybe Putin's move is just to wait until the winter, when the European energy crisis will be most acutely felt?_
> * _I think the nuclear pretext might be important for Western leadership, because they can't just make a deal with Putin right now; he is far beyond redemption. But making deals to “avoid nuclear holocaust” (while also giving citizens cheaper gas) might be manageable._
>
> _If things go nuclear:_
>
> * _I think it might be with the “least” scary nuke, because every escalation step, every credibly ambiguous situation could be turned into concessions, pauses, etc. Giving up intermediate steps is not wise._
> * _Other forecasters discussed just “testing,” nuking a small island, or just dumping it in the Black Sea._
> * _I am worried about the multi-step conditional probabilities we are using here. While I think we have some ability to model the present situation, if the nuclear taboo were to be broken, we would be in uncharted territory._
> * _In this case, people would still push for de-escalation and would try to avoid a Russia-NATO conflict (and especially a full-out Russia-NATO nuclear war). It's just hard to think about._
>     * _(A) Because evidently previous diplomatic efforts would have failed catastrophically, and it's unclear if there would be any remaining diplomatic tricks up their sleeves;_
>     * _(B) we haven't been at this level of tension for a while, and we just don't know how everyone would react;_
>     * _(C) the situation is likely to worsen for Putin (both internally and externally), and Putin might be likely to increase risk-taking as his likelihood of attaining a “win” diminishes._
>
> _I feel uncomfortable about my estimation process for a few reasons:_
>
> * _We are in the territory where the “proven technique” of carefully crafting base-rates is less applicable._
> * _There is a good GJOpen “rule of thumb”: if a decision depends on one person, don't go below 5%. This is because other people are not transparent to us, we don't know their constraints and we don't know the bulk of their incentives. In this case:_
>     * _It's not inconceivable that the decision to invade Ukraine in late February was misinformed (and ~unilateral). Relevant actors might be misinformed now, and they might be misinformed in surprising-to-us ways due to Putin being partly “siloed.”_
## Forecaster probabilities and comments
See a later section for a comment on our aggregation method.
### Russia using a nuclear weapon in Ukraine
_**What is the probability that Russia will use a nuclear weapon in Ukraine in the next MONTH?**_
* Aggregate probability: 0.053025 (5.303%)
* All probabilities: 0.27, 0.04, 0.02, 0.001, 0.09, 0.08, 0.07
_**What is the probability that Russia will use a nuclear weapon in Ukraine in the next YEAR?**_
* Aggregate probability: 0.16388 (16%)
* All probabilities: 0.38, 0.11, 0.11, 0.005, 0.42, 0.2, 0.11
_**Conditional on Russia using a nuclear weapon in Ukraine in the next year, will it be a tactical nuclear weapon?**_
* Aggregate probability: 0.96356 (96%)
* All probabilities: 0.97, 0.93, 0.97, “Yes”, 0.98, 0.95, 0.8
_**Forecaster comments**_
These have been lightly edited. Reading them is probably indicative of the level at which we are thinking, which has the flavor of “we have a lot of uncertainty about this.”
> _This is a particularly dangerous time. Many of the gambles Putin has taken so far have gone badly and now he stands a real risk of losing power as the war drags on and he has nothing to show for it. Even still, for Putin, even without moral guardrails, the risks of using nuclear weapons of any kind should still outweigh the benefits if he is seeing things clearly. If things continue to deteriorate, the situation may change, but for now, it seems that although Putin has been weakened, he still has a very good chance of remaining in power if he can simply get to a stalemate in the territories he now controls. Although I've frontloaded a lot of the risk into the next month, if a nuclear weapon is going to be used, there will probably be some build-up before it is deployed with warning signs along the way. It is likely Putin will try to prepare his population, and, while declaring territories within Ukraine to be part of Russia may provide some pretense of a justification, each stage of escalation brings heightened risk. At each stage, it makes sense to escalate slowly to attempt to extract the maximum possible concessions before taking on the increased risk of further escalation. I would expect to see nuclear tests or warning shots before seeing nuclear attacks, and for the first nuclear attack, tactical nuclear weapons would be the most logical starting point._
> _I think the use of nuclear weapons tactically would be a lot easier for Putin to explain to the Russian people. Perhaps strategic use could come afterwards, if he is in a desperate situation._
>
> _I think that Putin is 100% committed to conquering Ukraine. His "special military action" has largely failed so far, so he is expanding his military efforts with a "partial" mobilization. If that fails, or perhaps in combination with increased military mobilization, it looks possible to me that he could detonate a tactical nuclear weapon in the mistaken belief that it would make NATO countries back off at least from territory that Russia currently controls. In reality, I think detonating a tactical nuclear weapon would have the opposite effect, though._
 
> _\[My uncertainty is\] primarily methodological and from skewing to uncertainty. The main errors in the_ [_Superforecaster post-mortem_](https://goodjudgment.com/wp-content/uploads/2022/03/1570-Post-Mortem-v2.pdf) _for predicting invasion were overreliance on certain base rates and underestimating Putin's willingness to take major risks. I'm hesitant to make the same mistakes twice._
>
> _I also think Putin and Kremlin officials are less analyzable than most seem to think. I still don't have a compelling explanation for why Putin wants Ukraine so badly and why he's taken so much risk up until this point, which to me says my mental model of their decision-making isn't good enough to do much with._
> _Plausible scenarios exist where Putin uses a tactical nuke, probably to scare Ukraine, divide NATO, etc._ 
 
> _I would be higher with my first two estimates if they included an attack on a nuclear plant that could lead to a radiation disaster. This might be Putin's preferred method because he could keep a level of ambiguity as to Russia being responsible. That said, Putin's reason for using a tactical nuclear weapon might precisely be to let Ukraine and the world know how serious he is about not backing down. I think Putin wants to win the Ukraine War at pretty much any cost._
>
> _\> \[…\] I think Putin would almost definitely use a tactical nuke instead of a strategic one because it would make Ukraine and America/NATO more fearful of the situation without as high of a chance of a nuclear apocalypse (when compared to a strategic nuke being detonated in Ukraine)._
> _Putin has established a land bridge to Crimea, which is a major strategic goal for Russia. In recent speeches, he has explicitly said that Russia will use everything it has on the table to protect the newly annexed region._
> _Using nuclear weapons would drastically upend the current geopolitical order. But I don't have enough confidence to confidently reject that outcome._
### Nuclear conflict escalating beyond Ukraine after Russia uses a nuclear weapon in Ukraine
**Conditional on Russia using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next MONTH after the initial nuclear weapon use?**
* Aggregate probability: 0.0254 (2.5%)
* All probabilities: 0.15, 0.09, 0.0013, 10^(-5), 0.01, 0.3, 0.05
**Conditional on Russia using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next YEAR after the initial nuclear weapon use?**
* Aggregate probability: 0.095685 (9.6%)
* All probabilities: 0.2, 0.15, 0.0151, 10^(-5), 0.15, 0.4, 0.1
**Forecaster comments**
> I think nuclear war happening as a result of Russia using a tactical nuke in Ukraine is not extremely unlikely because the world would be in somewhat unprecedented territory, so this could make for a catastrophe as a result of miscalculations on one or both sides.
> If Russia uses a nuclear weapon, the west probably would not respond with a nuclear strike, but would probably try to use other channels which I won't speculate about publicly. Depending on the type, scale, and impact of the attack, a nuclear response is possible. If there is no Russian nuclear attack there is a minuscule chance of either a preemptive strike (based on intelligence that Russia is likely to launch a nuclear attack) or a false signal based on something that looks like an attack triggering a nuclear strike against Russia. The fact of heightened tensions makes these kinds of accidents more likely than they would otherwise be.
> I don't think Russia nuking Ukraine raises the global nuclear risk by much. I think most of the risk still comes from accidental launches due to false alarms, which I think is probably at an elevated risk currently.
> I think that the [MAD](https://en.wikipedia.org/wiki/Mutual_assured_destruction) precludes nuclear conflict scaling up. And I think that if nuclear conflict were to expand following Russia detonating a nuclear weapon in Ukraine (or elsewhere), then that would likely happen close to immediately.
> Payload and target of tactical nukes are all widely variable; if one is used, I'd imagine those parameters would be chosen to minimize the risk of a nuclear response.
>
> NATO isn't currently personally involved in the war; it's hard to imagine them deciding to send troops or especially to send nukes in response to a hit on a military target or a demonstration blast on Snake Island or the Black Sea.
>
> It's possible Putin miscalculates or actually wants nuclear war, but to me the most likely outcome is negotiations (for better or for worse).
> I have high confidence that nuclear weapons will not be used outside this conflict.
>
> I don't have high confidence that nuclear weapons will not be used in areas close to the strategic landscape (e.g., areas supporting either side in NATO, Belarus, inner Russia, etc.)
> No one wants it to escalate. Escalating to NATO is suicidal, just clearly a loss for Putin and folks.
>
> Also, I expect revolt of elites or something. As they would feel that this is totally suicidal, not worth it. I expect a lot of people to fear that nuclear war would mean guaranteed death or misery for their families etc. 
### London being hit with a nuclear weapon, conditional on nuclear conflict escalating beyond Ukraine
**Conditional on the nuclear conflict expanding to NATO, what is the chance that London would get hit, one MONTH after the first non-Ukraine nuclear bomb is used?** 
* Aggregate probability: 0.1424 (14%)
* All probabilities: 0.4, 0.15, 0.9985, 0.05, 0.02, 0.002, 0.5
**Conditional on the nuclear conflict expanding to NATO, what is the chance that London would get hit, one YEAR after the first non-Ukraine nuclear bomb is used?**
* Aggregate probability: 0.232015 (23%)
* All probabilities: 0.45, 0.3, 0.9985, 0.05, 0.12, 0.01, 0.5
**What is the unconditional probability of London being hit with a nuclear weapon in October?**
* Aggregate probability: 0.00066 (0.066%)
* All probabilities: 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, 0.001
**Forecasters' comments**
 
> If nuclear conflict expands outside of Ukraine, it seems quite likely that London would get hit because I think that the UK would be the second choice of a Russian nuclear attack—the first choice being America. I also think that in the case of a nuclear war, it is a likely scenario that Russia launches a general nuclear attack on most, if not all of, NATO.
> Barring accidents and other unlikely circumstances, London will only be a target in the event of full-scale nuclear war. At each stage of escalation, prior to full-scale war, there would be attempts to take off-ramps. But it is possible, even if unlikely, that predetermined nuclear response protocols could kick in, or that, in the fog of war, mistakes and miscalculations could result in rapid escalation.
> If there is a nuclear exchange between NATO and Russia, London will be hit very quickly.
> If a nuclear conflict does expand to NATO, I would still hold out some hope that it doesn't turn into an all-out nuclear war. Thus, my forecast for London getting hit in the event of nuclear conflict with NATO is relatively low. And, if the nuclear conflict expanded to NATO, I'd expect that if London were to get hit, then it would happen within a month. My forecast for the unconditional chance of London getting hit in October is about 10% of my forecast for any nuclear conflict in October and is barely above my forecast conditional on Russia not dropping a nuclear weapon in Ukraine.
> Conflict likely wouldn't expand to the exchange of strategic nukes after a tactical nuke exchange. Large cities are where the leaders making decisions are. It's one thing to kill soldiers and civilians but it's another to put your own life on the line. Unlike other questions, we have a fairly strong historical track record here for mutually assured destruction during the Cold War. Time has passed and tactical nukes are a key difference, but I think the core concept still applies.
>
> London getting targeted is also a very foreseeable scenario; I'd be surprised if NATO's military systems aren't ready and sophisticated enough to detect and shoot down a missile or submarine.
>
> There are also layers of complication from assassination, coups, and civil unrest. The risk to Putin feels much more personal than in other scenarios.
> Escalation is still possible, e.g. maybe Putin just really hates the West and that's his true motivation, or maybe conflict simply keeps escalating once nukes are exchanged. But that type of dramatic escalation feels unlikely.
> Escalation beyond Ukraine doesn't help Russia achieve its strategic goals.
> Hard to see intermediate escalation.
## Comparison vs other sources
A few other sources which have forecasts on this are:
* Back in 2019, [Luisa Rodríguez's analysis](https://forum.effectivealtruism.org/posts/PAYa6on5gJKwAywrF/how-likely-is-a-nuclear-exchange-between-the-us-and-russia) put the chance of a US/Russia nuclear exchange at 0.38%/year (if taking the arithmetic mean of her samples), or 0.13%/year if taking the geometric mean of odds.
* Back in March, [we gave](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022) a 0.067%/month to a “NATO/Russia nuclear exchange killing at least one person in the next month”, and an 18% probability of London being hit with a nuclear weapon after that, for an implied 0.012% monthly probability.
* Back at the end of March, [Peter Scoblic](https://forum.effectivealtruism.org/posts/W8dpCJGkwrwn7BfLk/nuclear-expert-comment-on-samotsvety-nuclear-risk-forecast-2) gave a **heavily caveated** 5% to a “NATO/Russia nuclear exchange killing at least one person in the next month”, and a likewise heavily caveated 65% probability to London being hit with a nuclear weapon after that, for an implied 3.2% monthly probability.
* [Zvi](https://thezvi.substack.com/p/ukraine-post-8-risk-of-nuclear-war) and [Daniel Filan](https://danielfilan.com/2022/03/10/prob_smart_londoner_dies_of_russian_nuke.html) also gave their probabilities using our decomposition. 
* Metaculus has several questions on nuclear weapons, such as:
    * [Will there be at least one fatality due to deliberate nuclear detonation by 2024?](https://www.metaculus.com/questions/7407/deliberate-nuclear-detonation-by-2024/) (7%)
    * [Will there be an offensive nuclear detonation on a nation's capital by 2024, if an offensive nuclear detonation occurs anywhere by 2024?](https://www.metaculus.com/questions/8127/nuclear-detonation-on-a-capital-by-2024/) (20%)
    * [Will the first offensive nuclear detonation by 2024 be against a battlefield target, if there's an offensive detonation by then?](https://www.metaculus.com/questions/8585/bt-as-the-first-nuclear-detonation-by-2024/) (53%)
    * [Will at least one nuclear weapon be detonated in Ukraine before 2023?](https://www.metaculus.com/questions/12591/nuclear-detonation-in-ukraine-by-2023/) (7%)
    * [Will a Russian nuclear weapon be detonated in the US before 2023?](https://www.metaculus.com/questions/12593/2022-russian-nuclear-detonation-in-the-us/) (<1%; note that Metaculus doesn't accept probabilities below 1%)
    * [Will a non-test nuclear detonation cause at least 1 fatality before 2024?](https://www.metaculus.com/questions/7404/nuclear-detonation-fatality-by-2024/) (12%)
    * [Will >2 countries offensively detonate nuclear weapons by 2024, if any offensive detonation of a country's nuclear weapon occurs by then?](https://www.metaculus.com/questions/8145/conditional-2-countries-detonate-by-2024/) (35%)
    * [Will >2 countries have nuclear weapons offensively detonated on or over their territories by 2024, if any country offensively detonates a nuclear weapon by then?](https://www.metaculus.com/questions/8146/conditional-2-countries-attacked-by-2024/) (49%)
* Manifold Markets also has [a few markets](https://manifold.markets/search?s=24-hour-vol&f=open&q=nuclear) on this, such as:
    * [Will a nuclear weapon be launched in combat by the end of 2023?](https://manifold.markets/AndyMartin/will-a-nuclear-weapon-be-launched-i-015e44ed91f5) (7%)
    * [Will Russia give a nuclear ultimatum to Ukraine and/or it's Western allies during 2022?](https://manifold.markets/Nostradamnedus/will-russia-give-a-nuclear-ultimatu) (80%)
There is internal discord within Samotsvety about the degree to which the magnitude of the difference between our current and former probabilities is indicative of a lack of accuracy. We updated our endline monthly probability of London being hit with a nuclear weapon by ~2x (~0.02% vs 0.067% \* 18% ≈ 0.012%). The difference was higher before correcting an aggregation error, so I've moved discussion to a footnote[\[1\]](#fn2tohbl1ecsm).
In addition, a [former senior U.S. government official](https://en.wikipedia.org/wiki/Andrew_C._Weber) previously gave me a 20% probability of Russia using nuclear weapons by the end of the year, and at the time I thought that this was too high, but now think that this was a reasonable belief to have, and I regret not having deferred more to him.
## Estimating the value of leaving London or other major cities
[Here](https://www.squiggle-language.com/playground/#code=eNqVU9tu00AQ%2FZVRnqBqklZAhSz6ADSCiKitlIQKyS8be2yPupk1e2mwqv4740tayFV9sbyzZ86ZMzP72HOFWU3Dcqls1Yu8DXjahEYpeWPXEWLypPT0d6A81zj1ljjvRb3hEK5DolFZCPdWEWPMNjhSc4euu7lDVRp2Y563CLgESfvUh4y07hPHXLMYj%2BAL5eWDUKIlk4LJwNMSgRxozDwEdiUmlBGmcZfWiV9%2Fnt3EjC5RWnkyfBO8oxQ7xW%2F0gGv1OyV13WO6XcVL9szUfHNODEsXJKC0wI%2F6OokZ4DU17DAB3gB354nIGx4KupBueyPoBeXzH%2FPpV%2FJVw9CwjvZIblv8J33T3WH3J1B7e5V6Z268LFXigRi0cR4KE6yTHcGlYLTYmlCGoz8yVq84qcb8S5w7qef9Wd2Ki7POQlIozoUzVZVrQZfw7uJDzKU1aUi8VPK9ph7zlaok%2FaLOPv%2B4ka0WGmemrrjEL5gZu6NH2BSD6UTKbSgFsrdvzcjfnEMftqnfdl075rXhePHVZe3yVcebpt5asxDBCnIjz0ScKu2MjFBei5J3RCkZV3FiZSwJZNJ%2FIzouJAUo1zL0YSz3qJy8BGMhxfXhWVe81lNTkMjf4Rx6nvF%2F6J9KBxRFRzljU6YL9kGYOW%2BxK2N1CirzaOW0XvsVsQQ6jsFgUC9Sc7oiV%2Bq6C491I%2BDog4yOIk5booO7Hx2%2B7ij2bUi092atvblu0XaoRcb81Hv6C9nOCyo%3D) is a template for calculating risk, given one's probabilities (also saved [here](https://nunosempere.com/.secret/nuclear-2022-10.squiggle) and [here](https://gist.githubusercontent.com/NunoSempere/42e44c33e4be8c973b49b154e5c0b4d8/raw/1e0ae12d0b0eba7fa784f7747f5cd79d08f41c1c/nuclear-2022-10.squiggle)).
If we input **the full range** of our forecasters' probabilities together with some default values, we get [the following estimate](https://develop--squiggle-documentation.netlify.app/playground#code=eNqVVO9P2zAQ%2FVdO%2FURRW1Kg%2FKjGh22gUa0CpNKhadkkk1ySE66d2Q4sQvzvuzhpYe3aii9OfT6%2F9%2Fzurs8tm%2BmnSTGbCVO2hs4U2PGhi5icNvMIKXIk5OR3QWkqceIMqbQ1bO3twSXKHI0NlbU7wpg2nMFEzHJOQtdLjJ6NyTp%2FEqpQ8YWrIpIoDBQPRpDCUJnCkphatM3JHYpcKztS0zqDERn7R9DbP%2B5A0AsO%2Fbrv16DvP6d%2BPfHr8c82s3zoQkJSdqkh1Q7BZcLxgsCCScegE3A0QyALEhMHhbI5RpQQxktarz7eXocKbSSkcKTVdeEsxdgI%2FEKPOBd7J%2FgZDxgvRPcHbxQG%2FYMO9INfO91B2we8%2FAP%2Fc7Ci%2B5XvVlcKpirSiuvCASGZYKtxu6ECeI%2Fq%2FzwbnAbV7MdMr9UeZ2dcf6c5%2B57S6dfp5DO50iN41Is1lAtTfAlrZ05PT2qHBm%2FL6j%2BrjrxhWzZjs1m7UFnxLrGNF6NZLiIHpEBq6yDTRdXsBmecwyaMKcGLP9w2TqioHKnv7JNlOYdBZdxRAPULokyolCFjUdo66QwOjgahyo2Oi8ixkMsKeaTORcnXj6rb%2FZOl2%2BJe4q2uBOf4CRNtKkfZpyWXGDPXxr8ouUHNszhSdenuMn1O9aVj7i%2BvG%2BMxP8yz88Fah30v7fShC6sq2o2%2FW4nrtC3meaZXo16xV4yq4r5IN0bfs6wSUs1zzdYJaTW3BI%2B34MGnmLQtVWS4zBEkXE%2FNPLaIMhC2RujCiM9RWB5dbSDG%2BWbBy45UXSAg4l%2Bb79CiZ%2F7J%2FiZkgcxoKVXoZdrCPDKySuvcJ21kDCJxaHg3n7onUhxoMHq9XtWYfndONpeVC8%2BVEbD1%2F2C4NaNTA22cpeHm4wZiXR8N157MuZebcrgaqjND9dJ6%2BQs8VVWn) of how many hours one loses in expectation as a result of staying in London in the medium term, where, because of the way we prompted forecasters, the “medium term” can range from one to three months:
<img src='https://i.imgur.com/kDIjmEv.png' class='.img-medium-center'>
If we instead input the **forecasters' aggregate**, rather than the range, we arrive at:
<img src='https://i.imgur.com/1WEammz.png' class='.img-medium-center'>
A [mixture of both estimates](https://develop--squiggle-documentation.netlify.app/playground/#code=eNqVVWFP2zAQ%2FSunfmpRW1KgFKrxYRtoVKsAqXRoWjbJJJf0RGJntgNUiP%2B%2Bs5MW1q6t9sWJ7fN7z%2B%2FukpeGmamnSZnnQs8bQ6tLbPuli5is0osVkmRJZJPfJaVphhOrSaaNYWN%2FHy4xK1CbUBrTFFq34AwmIi84CG030Sofk7F%2BJ5Sh5ANXZZSh0FA%2BaEESQ6lLQ2Jq0NQ7dygKJc1ITqsIRmTsH0H3YNCGoBsc%2BfHAj0HPP079eOLHwc8Ws3zoQEJZ1qGaVFkEOxOWBwQWTCoGlYClHIEMZJhYKKUpMKKEMF7RevXx9jqUaCKRCUtKXpfWUIy1wC%2F0iAuxd4Kv8YDxUnSv%2F05h0DtsQy%2F41ez0W37Byz%2F0r%2F013W98t8opmMpISc4LL4iMCXYatxdKgP9R%2FY9rg1Ug6%2FmY6ZXc5%2BgZ598qjr6ndPp1OvlMdu4RPOrFBsqlKT6FlTOnpyeVQ%2F33afWPdUfesa2asd2sPXBW%2FJfY2otRXojIAknIlLEwU6Urdo05x7AJY0rw4pnLxgoZzUfyO%2FtkWM5R4Iw7DqC6QTQTMmXIWMxNFXQGh8f9UBZaxWVkWcilQx7JczHn48fudO9k5bS4z%2FBWOcEFfsJEaeco%2B7TiEmMWSvsbJTeouBdHskrd3UydU3VowPXldWM85ot5dt7Y6LCvpWYPOrCuolX7u5O4Ctthnmd6M%2BoNe80ot%2B6TdKPVPcuaQ6q4r9k6kRnFJcHtLbjxKSZl5jLSnOYIEs6nYh5TRjMQpkLowIj3URhuXaUhxsVkycuOuCoQEPHb9jO0rJm%2For%2BJrERmNJRK9DJNqR8ZWaZV7JPSWQwisah5tui6J5K8UGN0u11XmH52TqbInAsvzgjY%2BT0Y7oxoV0Bbe2m4fbuG2FRHw407C%2B7VohwC%2F4NkxHkc80e6mT8310LaMHBuHhy4j2qrwgnla%2BP1D%2BYEXqU%3D) gives a 90% confidence interval of ~2 to 300 hours lost. Personally, I would use this second estimate, but it's hard to say why: maybe because I think that taking the minimum and maximum out of each question does a good job of filtering the least accurate forecasts.
Compare with a [previous estimate](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#Nu_o_Sempere) back in March:
<img src='https://i.imgur.com/CP4x7Z4.png' class='.img-medium-center'>
So, the danger of staying in London has increased by ~1-10x since March. We'd guess that, for most people reading this post, moving out of the city for 1-3 months would still cost more in lost productivity than the updated estimate of expected lost life hours, but it might be a closer call than it was previously.
For personal purposes, we probably don't have a better decision rule than “leave major cities if any tactical nukes are dropped in Ukraine” (as this will ~10x risk).
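
To make the structure of such an estimate concrete, here is a deliberately simplified point-estimate sketch in Python. It is not the linked Squiggle model, which works with full distributions; the formula, the function name, and the two inputs marked as hypothetical are our own illustrative assumptions, and readers should substitute their own values.

```python
HOURS_PER_YEAR = 24 * 365


def expected_hours_lost(p_london_hit, p_death_given_hit, remaining_life_years):
    """Expected life-hours lost by staying, under a very simple point-estimate model."""
    return p_london_hit * p_death_given_hit * remaining_life_years * HOURS_PER_YEAR


# p_london_hit is the monthly chained probability from this post (~0.019%).
# The other two inputs are hypothetical placeholders, not Samotsvety estimates.
estimate = expected_hours_lost(
    p_london_hit=0.00019,
    p_death_given_hit=0.5,    # hypothetical
    remaining_life_years=50,  # hypothetical
)
print(f"{estimate:.0f} expected hours lost by staying")
```

A point estimate like this hides the wide spread between forecasters, which is why the post reports an interval rather than a single number.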
## Miscellanea
### A sanity check
We can compare the directly elicited probability of nuclear war reaching London in October with the conditional steps multiplied directly:
The conditional steps are:
1. What is the probability that Russia will use a nuclear weapon in Ukraine in the next MONTH?  0.053025 (5.303%)
2. Conditional on Russia using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next MONTH after the initial nuclear weapon use? 0.0254 (2.5%)
3. Conditional on the nuclear conflict expanding to NATO, what is the chance that London would get hit, one MONTH after the first non-Ukraine nuclear bomb is used? 0.1424 (14%)
And if we multiply these together, we get 0.053025 \* 0.0254 \* 0.1424 = 0.000191789304 (0.019% ~ 0.02%), versus 0.00066 (0.066%) when elicited directly.
I think that the conditionals multiplied directly should be higher, because the directly elicited probability assumes a scenario where escalation happens within one month, whereas the conditionals multiplied directly would include that scenario, but also scenarios where each escalation step is more staggered.
One way to think about this difference is that a ~3x difference when eliciting unlikely, <1% events is relatively normal. Personally, I (Nuño) would give more weight to the conditionals multiplied directly.
### Counterfactual baseline risk
Forecasters also predicted on these counterfactual questions. 
* Conditional on Russia NOT using a nuclear weapon in Ukraine, what is the probability of a nuclear conflict outside Ukraine in the next MONTH? (0.036%)
* Conditional on Russia NOT using a nuclear weapon in Ukraine what is the probability that nuclear conflict will happen beyond Ukraine in the next YEAR? (0.132%)
* Conditional on Russia NOT dropping a nuclear weapon in Ukraine in October, what is the probability that London will be hit with a nuclear weapon in October? (0.006%)
    * All probabilities: 0.1%, 0.002%, 0.125%, 0.000001%, 0.001%, 0.01%, 0.005%.
The first two probabilities are dwarfed by the probabilities in the Russian conflict. The third probability indicates a very low baseline risk, but is also very sensitive to the individual forecasts.
### A brief note on the aggregation method
We used the geometric mean of the samples with the minimum and maximum removed to better deal with extreme outliers, as described in [our previous post](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#fnt1dm5d62pkl). Note that the minimum (resp. maximum) do matter. For example, in \[0.1, 1, 10, 100, 1000\], the aggregate would be (1 \* 10 \* 100) ^ (1/3)  = 10. But if we remove 0.1, that aggregate would become (10 \* 100) ^ (1/2) = 31.6. 
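
The toy example above can be reproduced with a short Python sketch. The helper names are ours. The second function rests on an assumption not spelled out in this note: that the pooling is applied to odds rather than to raw probabilities. Under that assumption, the result for the monthly "Russia uses a nuclear weapon in Ukraine" forecasts comes out very close to the reported aggregate of 0.053.

```python
import math


def trimmed_geomean(values):
    """Geometric mean after dropping the single smallest and single largest value."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim both ends")
    kept = sorted(values)[1:-1]
    return math.exp(sum(math.log(v) for v in kept) / len(kept))


def aggregate_probabilities(probs):
    """Trimmed geometric mean of odds, converted back to a probability.

    Assumption: pooling happens on odds p / (1 - p); inputs must be strictly below 1.
    """
    odds = [p / (1 - p) for p in probs]
    pooled = trimmed_geomean(odds)
    return pooled / (1 + pooled)


toy = [0.1, 1, 10, 100, 1000]
print(trimmed_geomean(toy))      # keeps 1, 10, 100
print(trimmed_geomean(toy[1:]))  # 0.1 removed beforehand, so only 10 and 100 are kept

monthly_use = [0.27, 0.04, 0.02, 0.001, 0.09, 0.08, 0.07]
print(aggregate_probabilities(monthly_use))
```

Trimming exactly one value from each end keeps a single extreme forecast (such as 10^-8 or 0.9985) from dominating the aggregate, at the cost of making the result depend on which forecasts happen to be the extremes.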
## Acknowledgements
This is a project by [Samotsvety](https://samotsvety.org/). Thanks to Jared Leibowich, Jonathan Mann, Tolga Bilge, belikewater, Greg Justice (@slapthepancake), Misha Yagudin and Nuño Sempere for providing updates. Thanks as well to Eli Lifland for comments and suggestions, and to Daniel Kokotajlo and Bhuvan Singla for their [probability mass app](https://daniel-kokotajlo.vercel.app/). 
1. **[^](#fnref2tohbl1ecsm)**
Dropping into the first person, I (Nuño) felt that the degree to which we updated, or at least the degree to which I personally updated, is indicative that our/my probability wasn't a [martingale](https://en.wikipedia.org/wiki/Martingale_(probability_theory)), i.e., that it didn't accurately price the likelihood of future movements. See some discussion about this [here](https://arxiv.org/pdf/1703.06351.pdf), in the context of Nassim Taleb criticizing Nate Silver. Overall, that update to me suggests we should give probabilities closer to 50%, to better adjust for future unknowns, which we maybe aren't pricing in.
On the other hand, other proud Samotsvety forecasters point out that our previous forecast was only for March, even though we presented the risk in annualized units. It's also just straight-out possible that we are in the bottom 10-20% of scenarios. So overall we are not done with our post-mortem, which would also include personal updates in April &c.

2
blog/_werc/config Executable file

@ -0,0 +1,2 @@
# conf_enable_comments -n
conf_enable_blog


@ -10,5 +10,5 @@ Scott Alexander [describes us as](https://astralcodexten.substack.com/p/mantic-m
> Enter Samotsvety Forecasts. This is a team of some of the best superforecasters in the world. They won the CSET-Foretell forecasting competition by an absolutely obscene margin, “around twice as good as the next-best team in terms of the relative Brier score”. If the point of forecasting tournaments is to figure out who you can trust, the science has spoken, and the answer is “these guys”.
We are open to forecasting consulting requests, and can be reached at [info@samotsvety.org](mailto:info@samotsvety.org). Readers might also want to view our [track record](./track-record), browse our [projects](./projects) or read our [media mentions](./media-mentions).
We are open to forecasting consulting requests, and can be reached at [info@samotsvety.org](mailto:info@samotsvety.org). Readers might also want to view our [track record](./track-record), browse our [projects](./projects), read our [media mentions](./media-mentions) or [subscribe to our updates](https://samotsvety.org/.subscribe/).


@ -2,6 +2,19 @@
N.B.: We are open to media mentions or collaborations, and can be reached at [info@samotsvety.org](mailto:info@samotsvety.org).
Fantastic Anachronism, [Forecasting Forecasting](https://fantasticanachronism.com/2022/11/21/forecasting-forecasting/) ([a](https://web.archive.org/web/20221123112102/https://fantasticanachronism.com/2022/11/21/forecasting-forecasting/)):
> Why pay tens of thousands for a prediction market (which takes time and effort to organize) when you can just give a couple of grand to Nuño and get better answers, faster?
> ...
> With the right kind of marketing angle I could easily see Samotsvety becoming a kind of 21st century McKinsey for the hip SV crowd that wants to signal that it needs actual advice rather than political cover.
[WIRED: Worried About Nuclear War? Consider the Micromorts](https://www.wired.co.uk/article/micromorts-nuclear-war) ([a](https://web.archive.org/web/20221030121813/https://www.wired.co.uk/article/micromorts-nuclear-war)) covers our group at some length throughout the article.
> In October, the Samotsvety group updated their predictions. In a [blog post](https://forum.effectivealtruism.org/posts/2nDTrDPZJBEerZGrk/samotsvety-nuclear-risk-update-october-2022) published on October 3, they estimated that the chance of London being hit with a nuclear weapon in the next three months was now around 0.02 percent. Since their previous prediction only covered a single month, it's hard to directly compare these forecasts in terms of micromorts, but [Sempere estimates](https://www.squiggle-language.com/playground/#code=eNqVU8lu20AM%2FRXCpzaotwIGCiM9dAkKo01SwHZz0WUsURKREanOYsMI8u%2BlFieu06TIRRpuj%2B9xOHcDX8puGavKuP1gHlzEd63rIqMg7uAhpkDGLn9HKgqLy%2BCIi8F8MB7DJaVOKnEB0AeqTEAPOwolTKES1v8Gww6RNZwaawIJgw9Y%2B4QT1vqrmFo0DuKtM8SYsIuezNqj7yM3aGphv%2BB1lwEfYTKaTGYQpDnMOhQJCKE0QT8INTqSDCQHZYRAHizmASL7GlPKCbOT5lefVtcJPzK8jsFThn3Hb7TFQ%2Fcbo7xuMetYTDsS0xmAop0PISdrh8THWCtp0NecCutM1WGsFv9X5VnCAK9h9A9JDTvu7R%2FaXnis2aXeXRDN3lCx%2Fr5efqGwbxFa1ItnWnaC%2B6m%2Ff6L3COtU6sujOING6Kuo9EoXVW3SAMRgxQcoJTpdKrOxuJKmtMbPmIvrmPfEP5zyrg7r6zXtWRHtZbyZwhCewr8F1TC9bCn9dLLRjD0UovunHY318vAwgDISv%2BfUqagUcmUvzoOPaQnGQ4swhIXG0XhdMXGQ4cGonWQxDbRVco1mA6meXq6hhwn9lf3L2Ija0VPB2NL00W0VmYsudyfOZmDygE6twwbtiNXRY4xGo%2BYaWusr%2BdoaRb9rxgTwONP50bmLJXw%2FuP8DtRGUZA%3D%3D) that their projected risk for a Londoner over the next one-to-three months may now be around 40 micromorts.
[CNN: How to assess the risk of nuclear war without freaking out](https://edition.cnn.com/2022/06/28/opinions/nuclear-war-likelihood-probability-russia-us-scoblic-mandel/index.html) ([a](https://web.archive.org/web/20220628081603/https://edition.cnn.com/2022/06/28/opinions/nuclear-war-likelihood-probability-russia-us-scoblic-mandel/index.html)):
> One group of [highly regarded forecasters](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#_blank) put the probability of Russia using a nuclear weapon against London before February 2023 at 0.8%[^1]

Binary file not shown.


@ -1,3 +1,18 @@
https://samotsvety.org/blog/
https://samotsvety.org/blog/2021/
https://samotsvety.org/blog/2021/12/
https://samotsvety.org/blog/2021/12/31/
https://samotsvety.org/blog/2021/12/31/prediction-markets-in-the-corporate-setting/
https://samotsvety.org/blog/2022/
https://samotsvety.org/blog/2022/03/
https://samotsvety.org/blog/2022/03/10/
https://samotsvety.org/blog/2022/03/10/samotsvety-nuclear-risk-forecasts-march-2022/
https://samotsvety.org/blog/2022/09/
https://samotsvety.org/blog/2022/09/09/
https://samotsvety.org/blog/2022/09/09/samotsvety-s-ai-risk-forecasts/
https://samotsvety.org/blog/2022/10/
https://samotsvety.org/blog/2022/10/03/
https://samotsvety.org/blog/2022/10/03/samotsvety-nuclear-risk-update-october-2022/
https://samotsvety.org/media-mentions/
https://samotsvety.org/projects/
https://samotsvety.org/track-record/