Forecasting Newsletter for June - Draft

Nuno Sempere 2020-06-28 19:57:53 +02:00
parent 2fa8fe7a07
commit 8408e5be4c

@ -0,0 +1,116 @@
Whatever happened to forecasting? June 2020
===========================================
## Index
- Prediction Markets & Forecasting platforms.
- In the News.
- Grab bag.
- Negative examples.
- Long Content.
## Prediction Markets & Forecasting platforms.
In subjective order of importance:
- Foretell, a forecasting tournament by the Center for Security and Emerging Technology, is now [open](https://www.cset-foretell.com). I find it heartening that this might end up influencing actual politicians.
- Metaculus
- posted [A Preliminary Look at Metaculus and Expert Forecasts](https://www.metaculus.com/news/2020/06/02/LRT/): Metaculus forecasters do better, and the piece is a nice reference point.
- was featured in [Forbes](https://www.forbes.com/sites/erikbirkeneder/2020/06/01/do-crowdsourced-predictions-show-the-wisdom-of-humans/#743b7e106d9d).
- announced their [Metaculus Summer Academy](https://www.metaculus.com/questions/4566/announcing-a-metaculus-academy-summer-series-for-new-forecasters/): "an introduction to forecasting for those who are relatively new to the activity and are looking for a fresh intellectual pursuit this summer".
- [Replication Markets](https://predict.replicationmarkets.com/) might add a new round with social and behavioral science claims related to COVID-19, and a preprint market, which would ask participants to forecast items like publication or citation. Replication Markets is also asking for more participants, with the catchline "If they are knowledgeable and opinionated, Replication Markets is the place to be to make your opinions really count."
- Good Judgement family
- [Good Judgement Open](https://www.gjopen.com/): Superforecasters were able to detect that Russia and the USA would in fact undertake some (albeit limited) form of negotiation, and do so much earlier than the general public, even while posting their reasons in full view. One thread to follow is [this one](https://www.gjopen.com/comments/1039968).
- Good Judgement Analytics continues to provide their [covid dashboard](https://goodjudgment.com/covidrecovery/).
- [PredictIt](https://www.predictit.org/) & [Election Betting Odds](http://electionbettingodds.com/). I stumbled upon an old [538 piece](https://fivethirtyeight.com/features/fake-polls-are-a-real-problem/) on fake polls: some may have been conducted by PredictIt traders in order to mislead or troll other PredictIt traders.
- Augur:
- [An overview of the platform and of v2 modifications](https://bravenewcoin.com/insights/augur-price-analysis-v2-release-scheuled-for-june-12th).
- Augur also happens to have a [blog](https://augur.substack.com/archive) with some interesting tidbits, such as the extremely clickbaity [How One Trader Turned $400 into $400k with Political Futures](https://augur.substack.com/p/how-one-trader-turned-400-into-400k) ("I find high volume markets...like the Democratic Nominee market or the 2020 Presidential Winner market... and what I'm doing is I'm just getting in line at the buy price and waiting my turn until my orders get filled. Then when those orders get filled I just sell them for 1c more.")
- [Coronavirus Information Markets](https://coronainformationmarkets.com/) is down to ca. $12000 in trading volume; it seems like they didn't take off.
## In the News.
- Facebook releases a forecasting app ([link to the app](https://www.forecastapp.net/), [press release](https://npe.fb.com/2020/06/23/forecast-a-community-for-crowdsourced-predictions-and-collective-insights/), [TechCrunch take](https://techcrunch.com/2020/06/23/facebook-tests-forecast-an-app-for-making-predictions-about-world-events-like-covid-19/), [hot-takes](https://cointelegraph.com/news/crypto-prediction-markets-face-competition-from-facebook-forecasts)). The release comes before Augur v2 launches, and it is easy to speculate that it might end up being combined with Facebook's stablecoin, Libra.
- Survey of macroeconomic researchers predicts economic recovery will take years, reports [538](https://fivethirtyeight.com/features/dont-expect-a-quick-recovery-our-survey-of-economists-says-it-will-likely-take-years/).
- The Economist has a new electoral model out ([article](https://www.economist.com/united-states/2020/06/11/meet-our-us-2020-election-forecasting-model), [model](https://projects.economist.com/us-2020-forecast/president)) which gives Trump an 11% chance of winning reelection. Given that Andrew Gelman was involved, I'm hesitant to criticize it, but it seems a tad overconfident.
- [Google](https://www.forbes.com/sites/jeffmcmahon/2020/05/31/thanks-to-renewables-and-machine-learning-google-now-forecasts-the-wind/) produces wind schedules for windfarms. "The result has been a 20 percent increase in revenue for wind farms". See [here](https://www.pv-magazine-australia.com/2020/06/01/solar-forecasting-evolves/) for essentially the same thing on solar forecasting.
- ["Israeli Central Bank Forecasting Gets Real During Pandemic"](https://www.nytimes.com/reuters/2020/06/23/world/middleeast/23reuters-health-coronavirus-israel-cenbank.html). The Israeli central bank is using data it can access in real time, like credit-card spending, instead of lagging indicators.
## Grab bag.
- [An online prediction market with reputation points](https://www.lesswrong.com/posts/sLbS93Fe4MTewFme3/an-online-prediction-market-with-reputation-points), implementing an [idea](https://sideways-view.com/2019/10/27/prediction-markets-for-internet-points/) by Paul Christiano.
- [Box Office Pro](https://www.boxofficepro.com/the-art-and-science-of-box-office-forecasting/) looks at some factors around box-office forecasting.
- [How to improve space weather forecasting](https://eos.org/research-spotlights/how-to-improve-space-weather-forecasting) (see [here](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2018SW002108#) for the original paper):
> For instance, the National Oceanic and Atmospheric Administration's Deep Space Climate Observatory (DSCOVR) satellite sits at the location in space called L1, where the gravitational pulls of Earth and the Sun cancel out. At this point, which is roughly 1.5 million kilometers from Earth, or barely 1% of the way to the Sun, detectors can provide warnings with only short lead times: about 30 minutes before a storm hits Earth in most cases or as little as 17 minutes in advance of extremely fast solar storms.
- [A Personal (Interim) COVID-19 Postmortem](https://www.lesswrong.com/posts/B7sHnk8P8EXmpfyCZ/a-personal-interim-covid-19-postmortem), by FHI researcher [David Manheim](https://twitter.com/davidmanheim).
> I think it's important to clearly and publicly admit when we were wrong. It's even better to diagnose why, and take steps to prevent doing so again. COVID-19 is far from over, but given my early stance on a number of questions regarding COVID-19, this is my attempt at a public personal review to see where I was wrong.
- [FantasyScotus](https://fantasyscotus.net/user-predictions/case/altitude-express-inc-v-zarda/) beat [GoodJudgementOpen](https://www.gjopen.com/questions/1300-in-zarda-v-altitude-express-inc-will-the-supreme-court-rule-that-the-civil-rights-act-of-1964-prohibition-against-employment-discrimination-because-of-sex-encompasses-discrimination-based-on-an-individual-s-sexual-orientation) on legal decisions. I'm still waiting to see whether [Hollywood Stock Exchange](https://www.hsx.com/search/?action=submit_nav&keyword=Mulan&Submit.x=0&Submit.y=0) will also beat GJOpen on [film predictions](https://www.gjopen.com/questions/1608-what-will-be-the-total-domestic-box-office-gross-for-disney-s-mulan-as-of-8-september-2020-according-to-box-office-mojo).
## Negative examples.
- World powers to converge on strategies for presenting COVID-19 information to make forecasters' jobs more interesting:
- [Brazil stops releasing Covid-19 death toll and wipes data from official site](https://www.theguardian.com/world/2020/jun/07/brazil-stops-releasing-covid-19-death-toll-and-wipes-data-from-official-site).
- Meanwhile in Russia, [St Petersburg issues 1,552 more death certificates in May than last year, but Covid-19 toll was 171](https://www.theguardian.com/world/2020/jun/04/st-petersburg-death-tally-casts-doubt-on-russian-coronavirus-figures).
- In the US, [CDC wants states to count probable coronavirus cases and deaths, but most aren't doing it](https://www.washingtonpost.com/investigations/cdc-wants-states-to-count-probable-coronavirus-cases-and-deaths-but-most-arent-doing-it/2020/06/07/4aac9a58-9d0a-11ea-b60c-3be060a4f8e1_story.html).
- [India has the fourth-highest number of COVID-19 cases, but the Government denies community transmission](https://www.abc.net.au/news/2020-06-21/india-coronavirus-fourth-highest-covid19-community-transmission/12365738).
- One suspects that this denial is political, because India is otherwise [being](https://www.maritime-executive.com/editorials/advanced-cyclone-forecasting-is-saving-thousands-of-lives) [extremely](https://economictimes.indiatimes.com/news/politics-and-nation/world-meteorological-organization-appreciates-indias-highly-accurate-cyclone-forecasting-system/articleshow/76280763.cms) [competent](https://economictimes.indiatimes.com/news/politics-and-nation/mumbai-to-get-hyperlocal-rain-outlooks-flood-forecasting-launched/articleshow/76343558.cms) in weather forecasting.
- Youyang Gu's model, widely acclaimed as one of the best coronavirus models for the US, produces 95% confidence intervals which are [too narrow](https://twitter.com/LinchZhang/status/1270443040860106753) when extended to [Pakistan](https://covid19-projections.com/pakistan).
- [COVID-19 vaccine before US election](https://www.aljazeera.com/ajimpact/wall-street-banking-covid-19-vaccine-election-200619204859320.html). Analysts see White House pushing through vaccine approval to bolster Trump's chances of reelection before voters head to polls. "All the datapoints we've collected make me think we're going to get a vaccine prior to the election," Jared Holz, a health-care strategist with Jefferies, said in a phone interview. The current administration is "incredibly incentivized to approve at least one of these vaccines before Nov. 3."
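
One way to make the "too narrow" complaint about 95% intervals concrete is to count how often realized values actually fall inside the forecast intervals; for well-calibrated intervals this should happen roughly 95% of the time. The following is a minimal sketch under stated assumptions: the helper name `interval_coverage` is hypothetical, and the numbers are made up for illustration, not taken from covid19-projections.com.

```python
def interval_coverage(lowers, uppers, actuals):
    """Fraction of realized values that land inside their forecast interval."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lowers, uppers, actuals))
    return hits / len(actuals)


# Illustrative numbers only: five forecasts with nominal 95% intervals.
lowers = [100, 110, 120, 130, 140]
uppers = [120, 130, 140, 150, 160]
actuals = [115, 135, 150, 145, 170]

coverage = interval_coverage(lowers, uppers, actuals)
print(f"Empirical coverage: {coverage:.0%} (nominal: 95%)")
```

If the empirical coverage sits far below the nominal level over a reasonable number of forecasts, the intervals are overconfident; with only a handful of data points, as here, the comparison is suggestive at best.
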
## Long Content.
- [When the crowds aren't wise](https://hbr.org/2006/09/when-crowds-arent-wise); a sober overview, with judicious use of [Condorcet's jury theorem](https://en.wikipedia.org/wiki/Condorcet's_jury_theorem):
> Suppose that each individual in a group is more likely to be wrong than right because relatively few people in the group have access to accurate information. In that case, the likelihood that the group's majority will decide correctly falls toward zero as the size of the group increases.
> Some prediction markets fail for just this reason. They have done really badly in predicting President Bush's appointments to the Supreme Court, for example. Until roughly two hours before the official announcement, the markets were essentially ignorant of the existence of John Roberts, now the chief justice of the United States. At the close of a prominent market just one day before his nomination, “shares” in Judge Roberts were trading at $0.19—representing an estimate that Roberts had a 1.9% chance of being nominated.
> Why was the crowd so unwise? Because it had little accurate information to go on; these investors, even en masse, knew almost nothing about the internal deliberations in the Bush administration. For similar reasons, prediction markets were quite wrong in forecasting that weapons of mass destruction would be found in Iraq and that special prosecutor Patrick Fitzgerald would indict Deputy Chief of Staff Karl Rove in late 2005.
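
The jury-theorem claim quoted above can be checked numerically. This is a minimal sketch, assuming voters are independent and each is right with the same probability `p`, which is the textbook setting of the theorem rather than anything specific to prediction markets; the function name `majority_correct_probability` is my own.

```python
from math import comb


def majority_correct_probability(n, p):
    """Probability that a strict majority of n independent voters is right,
    when each voter is right with probability p (use odd n to avoid ties)."""
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Compare a slightly-worse-than-chance crowd with a slightly-better one.
for n in (1, 11, 101, 501):
    worse = majority_correct_probability(n, 0.45)
    better = majority_correct_probability(n, 0.55)
    print(f"n={n:4d}  p=0.45 -> {worse:.4f}   p=0.55 -> {better:.4f}")
```

With `p` below 0.5 the majority's accuracy shrinks as the group grows, and with `p` above 0.5 it climbs toward 1, which is the asymmetry the article leans on.
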
- [A review of Tetlock's Superforecasting (2015)](https://dominiccummings.com/2016/11/24/a-review-of-tetlocks-superforecasting-2015/), by Dominic Cummings. Cummings then went on to hire one such superforecaster, who then resigned over a [culture war](https://www.bbc.com/news/uk-politics-51545541) scandal, characterized by adversarial selection of quotes which indeed are outside the British Overton Window. Notably, Dominic Cummings then told reporters to "Read Philip Tetlock's *Superforecasters*, instead of political pundits who don't know what they're talking about."
- [Coup cast](https://oefresearch.org/activities/coup-cast): A site which estimates the yearly probability of a coup. The color coding is misleading; click on the countries instead.
- [A list of prediction markets](https://docs.google.com/spreadsheets/d/1XB1GHfizNtVYTOAD_uOyBLEyl_EV7hVtDYDXLQwgT7k/edit#gid=0), and their fates, maintained by Jacob Lagerros. Like most startups, most prediction markets fail.
- [Prediction = Compression](https://www.lesswrong.com/posts/hAvGi9YAPZAnnjZNY/prediction-compression-transcript-1). "Whenever you have a prediction algorithm, you can also get a correspondingly good compression algorithm for data you already have, and vice versa."
- [Assessing the Performance of Real-Time Epidemic Forecasts: A Case Study of *Ebola* in the Western Area Region of Sierra Leone, 2014-15](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386417/). The one caveat is that their data is much better than coronavirus data, because Ebola symptoms are more evident; otherwise, pretty interesting:
> Real-time forecasts based on mathematical models can inform critical decision-making during infectious disease outbreaks. Yet, epidemic forecasts are rarely evaluated during or after the event, and there is little guidance on the best metrics for assessment.
> ...good probabilistic calibration was achievable at short time horizons of one or two weeks ahead but model predictions were increasingly unreliable at longer forecasting horizons.
> This suggests that forecasts may have been of good enough quality to inform decision making based on predictions a few weeks ahead of time but not longer, reflecting the high level of uncertainty in the processes driving the trajectory of the epidemic.
> Comparing different versions of our model to simpler models, we further found that it would have been possible to determine the model that was most reliable at making forecasts from early on in the epidemic. This suggests that there is value in assessing forecasts, and that it should be possible to improve forecasts by checking how good they are during an ongoing epidemic.
> One forecast that gained particular attention during the epidemic was published in the summer of 2014, projecting that by early 2015 there might be 1.4 million cases. This number was based on unmitigated growth in the absence of further intervention and proved a gross overestimate, yet it was later highlighted as a “call to arms” that served to trigger the international response that helped avoid the worst-case scenario
> Methods to assess probabilistic forecasts are now being used in other fields, but are not commonly applied in infectious disease epidemiology
> The deterministic SEIR model we used as a null model performed poorly on all forecasting scores, and failed to capture the downturn of the epidemic in Western Area.
> On the other hand, a well-calibrated mechanistic model that accounts for all relevant dynamic factors and external influences could, in principle, have been used to predict the behaviour of the epidemic reliably and precisely. Yet, lack of detailed data on transmission routes and risk factors precluded the parameterisation of such a model and are likely to do so again in future epidemics in resource-poor settings.
- [Calibration Scoring Rules for Practical Prediction Training](https://arxiv.org/abs/1808.07501). The part I found most interesting was the discussion of how the Brier and log rules don't satisfy all the pedagogic desiderata.
- I also found the following derivation of the logarithmic scoring rule interesting. Consider: if you assign probabilities p1, p2, ..., pn to the outcomes of n events, and the events are independent, then the combined probability you assigned to what happened is p1 x p2 x ... x pn. Taking logarithms, this is log(p1 x p2 x ... x pn) = Σ log(pi), i.e., the sum of the logarithmic scores of the individual forecasts.
- [Binary Scoring Rules that Incentivize Precision](https://arxiv.org/abs/2002.10669). The results (the closed form of scoring rules which minimize a given forecasting error) are interesting, but the journey to get there is kind of a drag, and ultimately the logarithmic scoring rule ends up being pretty decent according to their measure of error.
- Opinion: I'm not sure whether their results are going to be useful for things I'm interested in (like human forecasting tournaments, rather than Kaggle data analysis competitions). In practice, what I might do if I wanted to incentivize precision is to ask myself if this is a question where the answer is going to be closer to 50%, or closer to either of 0% or 100%, and then use either the Brier or the logarithmic scoring rules. That is, I don't want to minimize an l-norm of the error over [0,1], I want to minimize an l-norm over the region I think the answer is going to be in, and the paper falls short of addressing that.
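
To make the two scoring rules discussed above concrete, here is a minimal sketch for binary questions. The forecasts and outcomes are made up, the function names are my own, and the independence of the questions is an assumption needed for the joint probability to be a simple product.

```python
import math


def brier_score(p, outcome):
    """Squared error between forecast p and outcome (1 or 0); lower is better."""
    return (p - outcome) ** 2


def log_score(p, outcome):
    """Log of the probability assigned to what happened; closer to 0 is better."""
    return math.log(p if outcome == 1 else 1 - p)


forecasts = [0.9, 0.7, 0.2]  # illustrative probabilities assigned to "yes"
outcomes = [1, 1, 0]         # what happened (1 = yes, 0 = no)

# The log of the joint probability of what happened equals the sum of log scores.
assigned = [p if o == 1 else 1 - p for p, o in zip(forecasts, outcomes)]
joint = math.prod(assigned)
total_log = sum(log_score(p, o) for p, o in zip(forecasts, outcomes))
assert math.isclose(math.log(joint), total_log)

# Score a confident "yes" forecast that turns out wrong, at increasing confidence.
for p in (0.5, 0.9, 0.99, 0.999):
    print(p, round(brier_score(p, 0), 4), round(log_score(p, 0), 4))
```

The Brier penalty for a wrong forecast is bounded by 1, while the log penalty grows without bound as the forecast approaches certainty. That is one way to read the intuition of picking the rule according to whether answers are expected near 50% or near the extremes.
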