Forecasting Newsletter for May draft

This commit is contained in:
Nuno Sempere 2020-05-23 20:09:01 +02:00
parent dd5559d099
commit 111b1157ad


Whatever happened to forecasting? May 2020
============================================
A forecasting digest with a focus on experimental forecasting. You can sign up [here](https://mailchi.mp/18fccca46f83/forecastingnewsletter). The newsletter itself is experimental, but there will be at least five more iterations.
- Metaculus.
- Good Judgement Open.
- In the News.
- Grab bag.
- Long Content.
## Prediction Markets & Forecasting platforms.
- [Will Michelle Obama run for president in 2020?](https://www.predictit.org/markets/detail/4632/Will-Michelle-Obama-run-for-president-in-2020)
- [Will Hillary Clinton run for president in 2020?](https://www.predictit.org/markets/detail/4614/Will-Hillary-Clinton-run-for-president-in-2020)
Market odds are: 80%, 15%, 69%, 79%, 8%, 2%, 7%, 11%.
Further, the following two markets are plain inconsistent:
- [Will the 2020 Democratic nominee for president be a woman?](https://www.predictit.org/markets/detail/2902/Will-the-2020-Democratic-nominee-for-president-be-a-woman): 11%
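As a sketch of why such inconsistencies matter: if event B implies event A (say, a particular woman being the nominee implies the nominee being a woman), coherent prices must satisfy p(A) ≥ p(B); otherwise buying YES on the general event and NO on the specific one locks in a profit. The prices below are hypothetical, chosen only to illustrate the check.

```python
def arbitrage_profit(p_general: float, p_specific: float) -> float:
    """Guaranteed profit per $1 contract pair when p(specific) > p(general),
    even though the specific event implies the general one: buy YES on the
    general event and NO on the specific one."""
    cost = p_general + (1.0 - p_specific)  # price of YES-general plus NO-specific
    payout = 1.0                           # minimum combined payout in every outcome
    return max(0.0, payout - cost)

print(arbitrage_profit(0.11, 0.15))  # ≈ 0.04 guaranteed per contract pair
```

If the implication holds, at least one of the two contracts pays out in every state of the world, so any price gap in the wrong direction is free money for whoever corrects it.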
- [Before 1 January 2021, will the People's Liberation Army (PLA) and/or Peoples Armed Police (PAP) be mobilized in Hong Kong?](https://www.gjopen.com/questions/1499-before-1-january-2021-will-the-people-s-liberation-army-pla-and-or-people-s-armed-police-pap-be-mobilized-in-hong-kong)
- [Will the winner of the popular vote in the 2020 United States presidential election also win the electoral college?](https://www.gjopen.com/questions/1495-will-the-winner-of-the-popular-vote-in-the-2020-united-states-presidential-election-also-win-the-electoral-college). This one is interesting because, while splits have historically been infrequent, two of the last five US elections were split.
- [Will Benjamin Netanyahu cease to be the prime minister of Israel before 1 January 2021?](https://www.gjopen.com/questions/1498-will-benjamin-netanyahu-cease-to-be-the-prime-minister-of-israel-before-1-january-2021). Just when I thought he was out, he pulls himself back in.
- [Before 28 July 2020, will Saudi Arabia announce the cancellation or suspension of the Hajj pilgrimage, scheduled for 28 July 2020 to 2 August 2020?](https://www.gjopen.com/questions/1621-before-28-july-2020-will-saudi-arabia-announce-the-cancellation-or-suspension-of-the-hajj-pilgrimage-scheduled-for-28-july-2020-to-2-august-2020)
- [Will formal negotiations between Russia and the United States on an extension, modification, or replacement for the New START treaty begin before 1 October 2020?](https://www.gjopen.com/questions/1551-will-formal-negotiations-between-russia-and-the-united-states-on-an-extension-modification-or-replacement-for-the-new-start-treaty-begin-before-1-october-2020)
Odds: 20%, 75%, 44%, 86%, 19%
On the Good Judgement Inc. side, [here](https://goodjudgment.com/covidrecovery/) is a dashboard presenting forecasts related to covid. The ones I found most noteworthy are:
- [When will the FDA approve a drug or biological product for the treatment of COVID-19?](https://goodjudgment.io/covid-recovery/#1384)
- [Will the US economy bounce back by Q2 2021?](https://goodjudgment.io/covid-recovery/#1373)
> Now one thing I think is interesting is that often people, they're not interested in my saying, “There's a 78% chance of something happening.” What they want to know is, how did I get there? What are my arguments? That's not unreasonable. I really like thinking in terms of probabilities, but I think it often helps people understand what the mechanism is because it tells them something about the world that might help them make a decision. So I think one thing that maybe can be done is not to treat it as a black box probability, but to have some kind of algorithmic transparency about our thinking because that actually helps people, might be more useful in terms of making decisions than just a number.
- [Forecasting s-curves is hard](https://constancecrozier.com/2020/04/16/forecasting-s-curves-is-hard/): Some sweet visualizations of exactly what the title says.
- [Fashion Trend Forecasting](https://arxiv.org/pdf/2005.03297.pdf) using Instagram and baking preexisting knowledge into NNs.
- [Space Weather Challenge and Forecasting Implications of Rossby Waves](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2018SW002109). Recent advances may help predict solar flares better. I don't know how bad the worst solar flare could be, and how much a two year warning could buy us, but I tend to view developments like this very positively.
- [The advantages and limitations of forecasting](https://rwer.wordpress.com/2020/05/12/the-advantages-and-limitations-of-forecasting/). A short and sweet blog post, with a couple of forecasting anecdotes and zingers.
- The [University of Washington Medicine](https://patch.com/washington/seattle/uw-medicine-forecasting-losses-500-million-summers-end) might be pretending they need more money to try to bait donors. Of course, America being America, they might actually not have enough money. During a pandemic. "UW Medicine has been at the forefront of the national response to COVID-19 in treating critically ill patients".
- [Forecasting drug utilization and expenditure: ten years of experience in Stockholm](https://bmchealthservres.biomedcentral.com/articles/10.1186/s12913-020-05170-0). A normally pretty good forecasting model had the bad luck of not foreseeing a Black Swan: the study was sent to a journal just before the pandemic, so it is only being published now. They write: "According to the forecasts, the total pharmaceutical expenditure was estimated to increase between 2 and 8% annually. Our analyses showed that the accuracy of these forecasts varied over the years with a mean absolute error of 1.9 percentage points." They further conclude: "Based on the analyses of all forecasting reports produced since the model was established in Stockholm in the late 2000s, we demonstrated that it is feasible to forecast pharmaceutical expenditure with a reasonable accuracy." Presumably, expenditure has since increased further because of covid, sending the mean absolute error through the roof.
- [How to evaluate 50% predictions](https://www.lesswrong.com/posts/DAc4iuy4D3EiNBt9B/how-to-evaluate-50-predictions). "I commonly hear (sometimes from very smart people) that 50% predictions are meaningless. I think that this is wrong."
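One toy calculation for why a 50% prediction still carries content: under the Brier score (a proper scoring rule), stating 50% when the true frequency is 70% loses in expectation to stating 70%. The numbers below are illustrative only.

```python
def expected_brier(stated: float, true_freq: float) -> float:
    """Expected squared error of a stated probability for an event that
    resolves true with probability true_freq (lower is better)."""
    return true_freq * (stated - 1.0) ** 2 + (1.0 - true_freq) * stated ** 2

print(expected_brier(0.5, 0.7))  # 0.25: a 50% forecast scores 0.25 in expectation
print(expected_brier(0.7, 0.7))  # ≈ 0.21: the honest 70% forecast does better
```

Because the rule is proper, the expected score is minimized exactly at the true frequency, so a 50% forecast is a real, falsifiable-in-aggregate claim rather than a shrug.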
- [Named Distributions as Artifacts](https://blog.cerebralab.com/Named%20Distributions%20as%20Artifacts). On how the named distributions we use (the normal distribution, etc.) were selected for being easy to use in pre-computer eras, rather than for being a good ur-prior on distributions for phenomena in this universe.
- [The fallacy of placing confidence in confidence intervals](https://link.springer.com/article/10.3758/s13423-015-0947-8). On how the folk interpretation of confidence intervals can be misguided, as it conflates: a. the long-run probability, before seeing some data, that a procedure will produce an interval which contains the true value, and b. the probability that a particular interval contains the true value, after seeing the data. This is in contrast to Bayesian theory, which can use the information in the data to determine what is reasonable to believe, in light of the model assumptions and prior information. I found their example where different confidence procedures produce 50% confidence intervals which are nested inside each other particularly funny. Some quotes:
> Using the theory of confidence intervals and the support of two examples, we have shown that CIs do not have the properties that are often claimed on their behalf. Confidence interval theory was developed to solve a very constrained problem: how can one construct a procedure that produces intervals containing the true parameter a fixed proportion of the time? Claims that confidence intervals yield an index of precision, that the values within them are plausible, and that the confidence coefficient can be read as a measure of certainty that the interval contains the true value, are all fallacies and unjustified by confidence interval theory.
> “I am not at all sure that the confidence is not a confidence trick. Does it really lead us towards what we need, the chance that in the universe which we are sampling the parameter is within these certain limits? I think it does not. I think we are in the position of knowing that either an improbable event has occurred or the parameter in the population is within the limits. To balance these things we must make an estimate and form a judgment as to the likelihood of the parameter in the universe, that is, a prior probability, the very thing that is supposed to be eliminated.”
> The existence of multiple, contradictory long-run probabilities brings back into focus the confusion between what we know before the experiment with what we know after the experiment. For any of these confidence procedures, we know before the experiment that 50 % of future CIs will contain the true value. After observing the results, conditioning on a known property of the data — such as, in this case, the variance of the bubbles — can radically alter our assessment of the probability.
> “You keep using that word. I do not think it means what you think it means.” Íñigo Montoya, The Princess Bride (1987)
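The pre-data vs. post-data distinction can be checked in a small simulation. The setup here is my own illustrative choice, in the style of the paper's examples: with two observations drawn uniformly within ±0.5 of an unknown θ, the interval [min(x₁,x₂), max(x₁,x₂)] contains θ exactly 50% of the time before seeing data, yet conditional on the interval being wide it contains θ with certainty.

```python
import random

random.seed(0)
theta = 0.0  # the "unknown" parameter, known only to the simulator
N = 100_000
covered = wide = wide_covered = 0
for _ in range(N):
    x1 = random.uniform(theta - 0.5, theta + 0.5)
    x2 = random.uniform(theta - 0.5, theta + 0.5)
    lo, hi = min(x1, x2), max(x1, x2)
    contains = lo <= theta <= hi
    covered += contains
    if hi - lo > 0.5:          # condition on an observed property of the data
        wide += 1
        wide_covered += contains

print(covered / N)             # ≈ 0.5: the procedure's pre-data coverage
print(wide_covered / wide)     # 1.0: wide intervals always contain theta
```

So "50% confidence" is a property of the procedure averaged over hypothetical repetitions, not of the particular interval in front of you: once you see that the interval is wide, treating 50% as your post-data certainty is plainly wrong.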
- [Psychology of Intelligence Analysis](https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/), courtesy of the American Central Intelligence Agency, seemed interesting, and I read chapters 4, 5 and 14. Sometimes forecasting looks like reinventing intelligence analysis; from that perspective, I've found this reference work useful. Thanks to EA Discord user @Willow for bringing this work to my attention.
- Chapter 4: Strategies for Analytical Judgement. Discusses and compares the strengths and weaknesses of four tactics: situational analysis (inside view), applying theory, comparison with historical situations, and immersing oneself in the data. It then brings up several suboptimal tactics for choosing among hypotheses.
- Chapter 5: When does one need more information, and in what forms does new information come?
> There is strong experimental evidence, however, that such self-insight is usually faulty. The expert perceives his or her own judgmental process, including the number of different kinds of information taken into account, as being considerably more complex than is in fact the case. Experts overestimate the importance of factors that have only a minor impact on their judgment and underestimate the extent to which their decisions are based on a few major variables. In short, people's mental models are simpler than they think, and the analyst is typically unaware not only of which variables should have the greatest influence, but also which variables actually are having the greatest influence.
- Chapter 14: A Checklist for Analysts. "Traditionally, analysts at all levels devote little attention to improving how they think. To penetrate the heart and soul of the problem of improving analysis, it is necessary to better understand, influence, and guide the mental processes of analysts themselves." The Chapter also contains an Intelligence Analysis reading list.
- [The Limits of Prediction: An Analyst's Reflections on Forecasting](https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol-63-no-4/Limits-of-Prediction.html), also courtesy of the American Central Intelligence Agency. On how intelligence analysts should inform their users of what they are and aren't capable of. It has some interesting tidbits and references on predicting discontinuities. It also suggests some guiding questions that the analyst may try to answer for the policymaker.
- What is the context and reality of the problem I am facing?
- How does including information on new developments affect my problem/issue?
- What are the ways this situation could play out?
- How do we get from here to there? and/or What should I be looking out for?
> "We do not claim our assessments are infallible. Instead, we assert that we offer our most deeply and objectively based and carefully considered estimates."
- [How to Measure Anything](https://www.lesswrong.com/posts/ybYBCK9D7MZCcdArB/how-to-measure-anything), a review.
- The World Meteorological Organization, on their mandate to guarantee that [no one is surprised by a flood](https://public.wmo.int/en/our-mandate/water/no-one-is-surprised-by-a-flood). Browsing the webpage it seems that the organization is either a Key Organization Safeguarding the Vital Interests of the World or Just Another of the Many Bureaucracies Already in Existence, but it's unclear how to differentiate between the two.
- [95%-ile isn't that good](https://danluu.com/p95-skill/): "Reaching 95%-ile isn't very impressive because it's not that hard to do."
- [The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic](https://arxiv.org/abs/cond-mat/0410063): Identifying thermodynamic entropy with the Bayesian uncertainty of an ideal observer leads to a contradiction, because as the observer observes more about the system, they update on this information, which reduces uncertainty, and thus entropy.
This might be interesting to students in the tradition of E.T. Jaynes: for example, the paper directly conflicts with this LessWrong post: [The Second Law of Thermodynamics, and Engines of Cognition](https://www.lesswrong.com/posts/QkX2bAkwG2EpGvNug/the-second-law-of-thermodynamics-and-engines-of-cognition), part of *Rationality, From AI to Zombies*. The way out might be to postulate that the Bayesian updating process itself increases entropy, in the form of e.g. the work needed to update bits on a computer. Any applications to Christian lore are left as an exercise for the reader. Otherwise, seeing two bright people being cogently convinced of different perspectives does something funny to my probabilities: it pushes them towards 50%, but also increases the expected time I'd have to spend on the topic to move them away from 50%.
- [Behavioral Problems of Adhering to a Decision Policy](https://pdfs.semanticscholar.org/7a79/28d5f133e4a274dcaec4d0a207daecde8068.pdf)
> Our judges in this study were eight individuals, carefully selected for their expertise as handicappers. Each judge was presented with a list of 88 variables culled from the past performance charts. He was asked to indicate which five variables out of the 88 he would wish to use when handicapping a race, if all he could have was five variables. He was then asked to indicate which 10, which 20, and which 40 he would use if 10, 20, or 40 were available to him.
> We see that accuracy was as good with five variables as it was with 10, 20, or 40. The flat curve is an average over eight subjects and is somewhat misleading. Three of the eight actually showed a decrease in accuracy with more information, two improved, and three stayed about the same. All of the handicappers became more confident in their judgments as information increased.
The study contains other nuggets, such as:
- An experiment on trying to predict the outcome of a given equation. When the feedback has a margin of error, this confuses respondents.
- "However, the results indicated that subjects often chose one gamble, yet stated a higher selling price for the other gamble"
- "We figured that a comparison between two students along the same dimension should be easier, cognitively, than a comparison between different dimensions, and this ease of use should lead to greater reliance on the common dimension. The data strongly confirmed this hypothesis. Dimensions were weighted more heavily when common than when they were unique attributes. Interrogation of the subjects after the experiment indicated that most did not wish to change their policies by giving more weight to common dimensions and they were unaware that they had done so."