Compare commits
19 Commits
444648619c ... afc175d2b9
Author | SHA1 | Date |
---|---|---|
Nuno Sempere | afc175d2b9 | 6 months ago |
Nuno Sempere | 8d9b82f1a3 | 6 months ago |
Nuno Sempere | aa23baa293 | 6 months ago |
Nuno Sempere | 65a33de789 | 6 months ago |
Nuno Sempere | 824cd87850 | 7 months ago |
Nuno Sempere | 026501bb14 | 7 months ago |
Nuno Sempere | d931e354e1 | 8 months ago |
Nuno Sempere | b8b84b505d | 8 months ago |
Nuno Sempere | 41f8e7d56a | 8 months ago |
Nuno Sempere | 711f3698dc | 8 months ago |
Nuno Sempere | 9d16722ebb | 9 months ago |
Nuno Sempere | 738b609a3f | 9 months ago |
Nuno Sempere | 7eb7a65824 | 9 months ago |
Nuno Sempere | dd9538f970 | 9 months ago |
Nuno Sempere | 7f44d84bb0 | 9 months ago |
Nuno Sempere | 17b0599fdc | 9 months ago |
Nuno Sempere | 0a967e74b4 | 9 months ago |
Nuno Sempere | 2e7f31c6be | 9 months ago |
Nuno Sempere | f2626320f9 | 9 months ago |
@ -0,0 +1,16 @@
Here are some links to those who, through no fault of their own, I consider friends of the blog:

https://gavinhoward.com
https://maia.crimew.gay
https://www.themotte.org/
https://www.gleech.org/
https://www.askell.blog/
https://sebastiano.tronto.net/
https://hindenburgresearch.com/
https://blog.tinfoil-hat.net
https://acesounderglass.com/
http://annas-blog.org/
https://cadence.moe/
https://suckless.org/atom.xml
https://niplav.site/services.html
https://philiptrammell.com/blog/
@ -0,0 +1,22 @@
if (document.domain == "twitter.com" ){
  styles = `
    /* hide promoted tweets */
    :has(meta[property="og:site_name"][content="Twitter"])
    [data-testid="cellInnerDiv"]:has(svg + [dir="auto"]) {
      display: none;
    }
    [data-testid^="placementTracking"] {
      display: none;
    }

    /* hide what's happening section */
    :has(meta[property="og:site_name"][content="Twitter"])
    [aria-label="Timeline: Trending now"] {
      display: none !important;
    }
    [data-testid^="sidebarColumn"] {
      display: none;
    }

  `
}
@ -1,20 +1,20 @@
|
||||
## In 2020...
|
||||
|
||||
- [A review of two free online MIT Global Poverty courses](https://nunosempere.com/2020/01/15/mit-edx-review)
|
||||
- [A review of two books on survey-making](https://nunosempere.com/2020/03/01/survey-making)
|
||||
- [Shapley Values II: Philanthropic Coordination Theory & other miscellanea.](https://nunosempere.com/2020/03/10/shapley-values-ii)
|
||||
- [New Cause Proposal: International Supply Chain Accountability](https://nunosempere.com/2020/04/01/international-supply-chain-accountability)
|
||||
- [Forecasting Newsletter: April 2020](https://nunosempere.com/2020/04/30/forecasting-newsletter-2020-04)
|
||||
- [Forecasting Newsletter: May 2020.](https://nunosempere.com/2020/05/31/forecasting-newsletter-2020-05)
|
||||
- [Forecasting Newsletter: June 2020.](https://nunosempere.com/2020/07/01/forecasting-newsletter-2020-06)
|
||||
- [Forecasting Newsletter: July 2020.](https://nunosempere.com/2020/08/01/forecasting-newsletter-2020-07)
|
||||
- [Forecasting Newsletter: August 2020. ](https://nunosempere.com/2020/09/01/forecasting-newsletter-august-2020)
|
||||
- [Forecasting Newsletter: September 2020. ](https://nunosempere.com/2020/10/01/forecasting-newsletter-september-2020)
|
||||
- [Forecasting Newsletter: October 2020.](https://nunosempere.com/2020/11/01/forecasting-newsletter-october-2020)
|
||||
- [Incentive Problems With Current Forecasting Competitions.](https://nunosempere.com/2020/11/10/incentive-problems-with-current-forecasting-competitions)
|
||||
- [Announcing the Forecasting Innovation Prize](https://nunosempere.com/2020/11/15/announcing-the-forecasting-innovation-prize)
|
||||
- [Predicting the Value of Small Altruistic Projects: A Proof of Concept Experiment.](https://nunosempere.com/2020/11/22/predicting-the-value-of-small-altruistic-projects-a-proof-of)
|
||||
- [An experiment to evaluate the value of one researcher's work](https://nunosempere.com/2020/12/01/an-experiment-to-evaluate-the-value-of-one-researcher-s-work)
|
||||
- [Forecasting Newsletter: November 2020.](https://nunosempere.com/2020/12/01/forecasting-newsletter-november-2020)
|
||||
- [What are good rubrics or rubric elements to evaluate and predict impact?](https://nunosempere.com/2020/12/03/what-are-good-rubrics-or-rubric-elements-to-evaluate-and)
|
||||
- [Big List of Cause Candidates](https://nunosempere.com/2020/12/25/big-list-of-cause-candidates)
|
||||
- [A review of two free online MIT Global Poverty courses](https://nunosempere.com/blog/2020/01/15/mit-edx-review)
|
||||
- [A review of two books on survey-making](https://nunosempere.com/blog/2020/03/01/survey-making)
|
||||
- [Shapley Values II: Philanthropic Coordination Theory & other miscellanea.](https://nunosempere.com/blog/2020/03/10/shapley-values-ii)
|
||||
- [New Cause Proposal: International Supply Chain Accountability](https://nunosempere.com/blog/2020/04/01/international-supply-chain-accountability)
|
||||
- [Forecasting Newsletter: April 2020](https://nunosempere.com/blog/2020/04/30/forecasting-newsletter-2020-04)
|
||||
- [Forecasting Newsletter: May 2020.](https://nunosempere.com/blog/2020/05/31/forecasting-newsletter-2020-05)
|
||||
- [Forecasting Newsletter: June 2020.](https://nunosempere.com/blog/2020/07/01/forecasting-newsletter-2020-06)
|
||||
- [Forecasting Newsletter: July 2020.](https://nunosempere.com/blog/2020/08/01/forecasting-newsletter-2020-07)
|
||||
- [Forecasting Newsletter: August 2020. ](https://nunosempere.com/blog/2020/09/01/forecasting-newsletter-august-2020)
|
||||
- [Forecasting Newsletter: September 2020. ](https://nunosempere.com/blog/2020/10/01/forecasting-newsletter-september-2020)
|
||||
- [Forecasting Newsletter: October 2020.](https://nunosempere.com/blog/2020/11/01/forecasting-newsletter-october-2020)
|
||||
- [Incentive Problems With Current Forecasting Competitions.](https://nunosempere.com/blog/2020/11/10/incentive-problems-with-current-forecasting-competitions)
|
||||
- [Announcing the Forecasting Innovation Prize](https://nunosempere.com/blog/2020/11/15/announcing-the-forecasting-innovation-prize)
|
||||
- [Predicting the Value of Small Altruistic Projects: A Proof of Concept Experiment.](https://nunosempere.com/blog/2020/11/22/predicting-the-value-of-small-altruistic-projects-a-proof-of)
|
||||
- [An experiment to evaluate the value of one researcher's work](https://nunosempere.com/blog/2020/12/01/an-experiment-to-evaluate-the-value-of-one-researcher-s-work)
|
||||
- [Forecasting Newsletter: November 2020.](https://nunosempere.com/blog/2020/12/01/forecasting-newsletter-november-2020)
|
||||
- [What are good rubrics or rubric elements to evaluate and predict impact?](https://nunosempere.com/blog/2020/12/03/what-are-good-rubrics-or-rubric-elements-to-evaluate-and)
|
||||
- [Big List of Cause Candidates](https://nunosempere.com/blog/2020/12/25/big-list-of-cause-candidates)
|
||||
|
@ -1,30 +1,30 @@
|
||||
## In 2021...
|
||||
|
||||
- [Forecasting Newsletter: December 2020](https://nunosempere.com/2021/01/01/forecasting-newsletter-december-2020)
|
||||
- [2020: Forecasting in Review](https://nunosempere.com/2021/01/10/2020-forecasting-in-review)
|
||||
- [A Funnel for Cause Candidates](https://nunosempere.com/2021/01/13/a-funnel-for-cause-candidates)
|
||||
- [Forecasting Newsletter: January 2021](https://nunosempere.com/2021/02/01/forecasting-newsletter-january-2021)
|
||||
- [Forecasting Prize Results](https://nunosempere.com/2021/02/19/forecasting-prize-results)
|
||||
- [Forecasting Newsletter: February 2021](https://nunosempere.com/2021/03/01/forecasting-newsletter-february-2021)
|
||||
- [Introducing Metaforecast: A Forecast Aggregator and Search Tool](https://nunosempere.com/2021/03/07/introducing-metaforecast-a-forecast-aggregator-and-search)
|
||||
- [Relative Impact of the First 10 EA Forum Prize Winners](https://nunosempere.com/2021/03/16/relative-impact-of-the-first-10-ea-forum-prize-winners)
|
||||
- [Forecasting Newsletter: March 2021](https://nunosempere.com/2021/04/01/forecasting-newsletter-march-2021)
|
||||
- [Forecasting Newsletter: April 2021](https://nunosempere.com/2021/05/01/forecasting-newsletter-april-2021)
|
||||
- [Forecasting Newsletter: May 2021](https://nunosempere.com/2021/06/01/forecasting-newsletter-may-2021)
|
||||
- [2018-2019 Long-Term Future Fund Grantees: How did they do?](https://nunosempere.com/2021/06/16/2018-2019-long-term-future-fund-grantees-how-did-they-do)
|
||||
- [What should the norms around privacy and evaluation in the EA community be?](https://nunosempere.com/2021/06/16/what-should-the-norms-around-privacy-and-evaluation-in-the)
|
||||
- [Shallow evaluations of longtermist organizations](https://nunosempere.com/2021/06/24/shallow-evaluations-of-longtermist-organizations)
|
||||
- [Forecasting Newsletter: June 2021](https://nunosempere.com/2021/07/01/forecasting-newsletter-june-2021)
|
||||
- [Forecasting Newsletter: July 2021](https://nunosempere.com/2021/08/01/forecasting-newsletter-july-2021)
|
||||
- [Forecasting Newsletter: August 2021](https://nunosempere.com/2021/09/01/forecasting-newsletter-august-2021)
|
||||
- [Frank Feedback Given To Very Junior Researchers](https://nunosempere.com/2021/09/01/frank-feedback-given-to-very-junior-researchers)
|
||||
- [Building Blocks of Utility Maximization](https://nunosempere.com/2021/09/20/building-blocks-of-utility-maximization)
|
||||
- [Forecasting Newsletter: September 2021.](https://nunosempere.com/2021/10/01/forecasting-newsletter-september-2021)
|
||||
- [An estimate of the value of Metaculus questions](https://nunosempere.com/2021/10/22/an-estimate-of-the-value-of-metaculus-questions)
|
||||
- [Forecasting Newsletter: October 2021.](https://nunosempere.com/2021/11/02/forecasting-newsletter-october-2021)
|
||||
- [A Model of Patient Spending and Movement Building](https://nunosempere.com/2021/11/08/a-model-of-patient-spending-and-movement-building)
|
||||
- [Simple comparison polling to create utility functions](https://nunosempere.com/2021/11/15/simple-comparison-polling-to-create-utility-functions)
|
||||
- [Pathways to impact for forecasting and evaluation](https://nunosempere.com/2021/11/25/pathways-to-impact-for-forecasting-and-evaluation)
|
||||
- [Forecasting Newsletter: November 2021](https://nunosempere.com/2021/12/02/forecasting-newsletter-november-2021)
|
||||
- [External Evaluation of the EA Wiki](https://nunosempere.com/2021/12/13/external-evaluation-of-the-ea-wiki)
|
||||
- [Prediction Markets in The Corporate Setting](https://nunosempere.com/2021/12/31/prediction-markets-in-the-corporate-setting)
|
||||
- [Forecasting Newsletter: December 2020](https://nunosempere.com/blog/2021/01/01/forecasting-newsletter-december-2020)
|
||||
- [2020: Forecasting in Review](https://nunosempere.com/blog/2021/01/10/2020-forecasting-in-review)
|
||||
- [A Funnel for Cause Candidates](https://nunosempere.com/blog/2021/01/13/a-funnel-for-cause-candidates)
|
||||
- [Forecasting Newsletter: January 2021](https://nunosempere.com/blog/2021/02/01/forecasting-newsletter-january-2021)
|
||||
- [Forecasting Prize Results](https://nunosempere.com/blog/2021/02/19/forecasting-prize-results)
|
||||
- [Forecasting Newsletter: February 2021](https://nunosempere.com/blog/2021/03/01/forecasting-newsletter-february-2021)
|
||||
- [Introducing Metaforecast: A Forecast Aggregator and Search Tool](https://nunosempere.com/blog/2021/03/07/introducing-metaforecast-a-forecast-aggregator-and-search)
|
||||
- [Relative Impact of the First 10 EA Forum Prize Winners](https://nunosempere.com/blog/2021/03/16/relative-impact-of-the-first-10-ea-forum-prize-winners)
|
||||
- [Forecasting Newsletter: March 2021](https://nunosempere.com/blog/2021/04/01/forecasting-newsletter-march-2021)
|
||||
- [Forecasting Newsletter: April 2021](https://nunosempere.com/blog/2021/05/01/forecasting-newsletter-april-2021)
|
||||
- [Forecasting Newsletter: May 2021](https://nunosempere.com/blog/2021/06/01/forecasting-newsletter-may-2021)
|
||||
- [2018-2019 Long-Term Future Fund Grantees: How did they do?](https://nunosempere.com/blog/2021/06/16/2018-2019-long-term-future-fund-grantees-how-did-they-do)
|
||||
- [What should the norms around privacy and evaluation in the EA community be?](https://nunosempere.com/blog/2021/06/16/what-should-the-norms-around-privacy-and-evaluation-in-the)
|
||||
- [Shallow evaluations of longtermist organizations](https://nunosempere.com/blog/2021/06/24/shallow-evaluations-of-longtermist-organizations)
|
||||
- [Forecasting Newsletter: June 2021](https://nunosempere.com/blog/2021/07/01/forecasting-newsletter-june-2021)
|
||||
- [Forecasting Newsletter: July 2021](https://nunosempere.com/blog/2021/08/01/forecasting-newsletter-july-2021)
|
||||
- [Forecasting Newsletter: August 2021](https://nunosempere.com/blog/2021/09/01/forecasting-newsletter-august-2021)
|
||||
- [Frank Feedback Given To Very Junior Researchers](https://nunosempere.com/blog/2021/09/01/frank-feedback-given-to-very-junior-researchers)
|
||||
- [Building Blocks of Utility Maximization](https://nunosempere.com/blog/2021/09/20/building-blocks-of-utility-maximization)
|
||||
- [Forecasting Newsletter: September 2021.](https://nunosempere.com/blog/2021/10/01/forecasting-newsletter-september-2021)
|
||||
- [An estimate of the value of Metaculus questions](https://nunosempere.com/blog/2021/10/22/an-estimate-of-the-value-of-metaculus-questions)
|
||||
- [Forecasting Newsletter: October 2021.](https://nunosempere.com/blog/2021/11/02/forecasting-newsletter-october-2021)
|
||||
- [A Model of Patient Spending and Movement Building](https://nunosempere.com/blog/2021/11/08/a-model-of-patient-spending-and-movement-building)
|
||||
- [Simple comparison polling to create utility functions](https://nunosempere.com/blog/2021/11/15/simple-comparison-polling-to-create-utility-functions)
|
||||
- [Pathways to impact for forecasting and evaluation](https://nunosempere.com/blog/2021/11/25/pathways-to-impact-for-forecasting-and-evaluation)
|
||||
- [Forecasting Newsletter: November 2021](https://nunosempere.com/blog/2021/12/02/forecasting-newsletter-november-2021)
|
||||
- [External Evaluation of the EA Wiki](https://nunosempere.com/blog/2021/12/13/external-evaluation-of-the-ea-wiki)
|
||||
- [Prediction Markets in The Corporate Setting](https://nunosempere.com/blog/2021/12/31/prediction-markets-in-the-corporate-setting)
|
||||
|
@ -0,0 +1,90 @@
|
||||
Webpages I am making available to my corner of the internet
|
||||
===========================================================
|
||||
|
||||
Here is a list of internet services that I make freely available to friends and allies, broadly defined—if you are reading this, you qualify. They are listed roughly in order of usefulness.
|
||||
|
||||
### search.nunosempere.com
|
||||
|
||||
[search.nunosempere.com](https://search.nunosempere.com/) is an instance of [Whoogle](https://github.com/benbusby/whoogle-search). It presents Google results as they were and as they should have been: without clutter and without advertisements.
|
||||
|
||||
Readers are welcome to make this their default search engine. The process to do this is a bit involved and depends on the browser, but it can be found with a Whoogle search. In past years, I've had technical difficulties around once every six months, but I tend to fix them quickly.
|
||||
|
||||
### forum.nunosempere.com
|
||||
|
||||
[forum.nunosempere.com](https://forum.nunosempere.com) is a frontend to the [Effective Altruism Forum](https://forum.effectivealtruism.org/) that I personally find soothing. It is *much* faster than the official frontend, more minimalistic, and offers an RSS endpoint for all posts [here](https://forum.nunosempere.com/feed).
|
||||
|
||||
```
|
||||
$ time curl https://forum.effectivealtruism.org > /dev/null
|
||||
% Total % Received % Xferd Average Speed Time Time Time Current
|
||||
Dload Upload Total Spent Left Speed
|
||||
100 847k 0 847k 0 0 439k 0 --:--:-- 0:00:01 --:--:-- 438k
|
||||
|
||||
real 0m1.945s
|
||||
user 0m0.030s
|
||||
sys 0m0.021s
|
||||
|
||||
$ time curl https://forum.nunosempere.com/frontpage > /dev/null
|
||||
% Total % Received % Xferd Average Speed Time Time Time Current
|
||||
Dload Upload Total Spent Left Speed
|
||||
100 35091 100 35091 0 0 190k 0 --:--:-- --:--:-- --:--:-- 190k
|
||||
|
||||
real 0m0.195s
|
||||
user 0m0.025s
|
||||
sys 0m0.004s
|
||||
```
|
||||
|
||||
If you use the EA Forum with some frequency, I'd recommend giving it and [ea.greaterwrong.com](https://ea.greaterwrong.com/) a spin.
|
||||
|
||||
### shapleyvalue.com
|
||||
|
||||
[shapleyvalue.com](http://shapleyvalue.com/) is an online calculator for [Shapley Values](https://wikiless.northboot.xyz/wiki/Shapley_value?lang=en). I wrote it for [this explainer](https://forum.effectivealtruism.org/s/XbCaYR3QfDaeuJ4By/p/XHZJ9i7QBtAJZ6byW) after realizing that no other quick calculators exist.
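For intuition, here is a minimal sketch in C of the computation such a calculator performs: average each player's marginal contribution over every ordering of the players. The characteristic function below is invented for illustration and is not taken from the site.

```
#include <stdio.h>

#define N 3

/* Hypothetical coalition values, indexed by a bitmask over the N players.
   These numbers are made up for illustration. */
double v(unsigned coalition)
{
	double values[8] = {0, 10, 10, 40, 0, 20, 20, 60};
	return values[coalition & 7];
}

/* Accumulate each player's marginal contribution over every ordering. */
void orderings(int *order, int k, double *shapley, int *count)
{
	if (k == N) {
		unsigned coalition = 0;
		for (int i = 0; i < N; i++) {
			double before = v(coalition);
			coalition |= 1u << order[i];
			shapley[order[i]] += v(coalition) - before;
		}
		(*count)++;
		return;
	}
	for (int i = k; i < N; i++) {
		int tmp = order[k]; order[k] = order[i]; order[i] = tmp;
		orderings(order, k + 1, shapley, count);
		tmp = order[k]; order[k] = order[i]; order[i] = tmp;
	}
}

int main(void)
{
	int order[N] = {0, 1, 2};
	double shapley[N] = {0};
	int count = 0;

	orderings(order, 0, shapley, &count);
	for (int i = 0; i < N; i++)
		printf("Player %d: %.2f\n", i, shapley[i] / count);
	return 0;
}
```

With these example numbers, players 0 and 1 each get 25 and player 2 gets 10, and the three values sum to the value of the grand coalition.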
|
||||
|
||||
### Find a beta distribution which fits a given confidence interval
|
||||
|
||||
[trastos.nunosempere.com/fit-beta](https://trastos.nunosempere.com/fit-beta) is a POST endpoint to find a beta distribution that fits a given confidence interval.
|
||||
|
||||
```
|
||||
curl -X POST -H "Content-Type: application/json" \
|
||||
-d '{"ci_lower": "0.2", "ci_upper":"0.8", "ci_length": "0.95"}' \
|
||||
https://trastos.nunosempere.com/fit-beta
|
||||
```
|
||||
|
||||
I also provide a widget [here](https://nunosempere.com/blog/2023/03/15/fit-beta/) and an npm package [here](https://www.npmjs.com/package/fit-beta), which is probably more convenient than the endpoint.
|
||||
|
||||
### nunosempere.com/misc/proportional-approval-voting-calculator/
|
||||
|
||||
Approval voting is a bit tricky to generalize to choosing candidates for more than one position, which is why little software for proportional approval voting exists. [This page](https://nunosempere.com/misc/proportional-approval-voting-calculator/) provides a sample implementation. It was previously hosted [here](https://nunosempere.com/misc/proportional-approval-voting-calculator/).
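As a sketch of the rule such an implementation has to apply: under proportional approval voting, each voter contributes 1 + 1/2 + ... + 1/k to a committee's score, where k is the number of their approved candidates in that committee, and the committee with the highest total wins. The ballots below are invented, and brute-force enumeration is only sensible for small inputs.

```
#include <stdio.h>

#define N_CANDIDATES 4
#define N_VOTERS 5
#define SEATS 2

/* Hypothetical ballots: approvals[v][c] = 1 if voter v approves candidate c. */
int approvals[N_VOTERS][N_CANDIDATES] = {
	{1, 1, 0, 0},
	{1, 1, 0, 0},
	{1, 0, 1, 0},
	{0, 0, 1, 1},
	{0, 0, 0, 1},
};

/* PAV score of a committee, encoded as a bitmask over candidates. */
double pav_score(unsigned committee)
{
	double score = 0;
	for (int v = 0; v < N_VOTERS; v++) {
		int k = 0;
		for (int c = 0; c < N_CANDIDATES; c++)
			if ((committee & (1u << c)) && approvals[v][c])
				k++;
		for (int j = 1; j <= k; j++)
			score += 1.0 / j; /* harmonic weights: 1, 1/2, 1/3, ... */
	}
	return score;
}

int main(void)
{
	unsigned best = 0;
	double best_score = -1;

	/* Enumerate every committee of the right size and keep the best one. */
	for (unsigned committee = 0; committee < (1u << N_CANDIDATES); committee++) {
		int size = 0;
		for (int c = 0; c < N_CANDIDATES; c++)
			if (committee & (1u << c))
				size++;
		if (size != SEATS)
			continue;
		double score = pav_score(committee);
		if (score > best_score) {
			best_score = score;
			best = committee;
		}
	}
	printf("Best committee (bitmask): 0x%x, score %.2f\n", best, best_score);
	return 0;
}
```

With these ballots the winning committee is {candidate 0, candidate 3}, with a score of 5.00.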
|
||||
|
||||
### git.nunosempere.com
|
||||
|
||||
[git.nunosempere.com](https://git.nunosempere.com/) is my personal git server. It hosts some of my personal projects, and occasional backups of some open source projects worth preserving.
|
||||
|
||||
### video.nunosempere.com
|
||||
|
||||
[video.nunosempere.com](https://video.nunosempere.com) is a [peertube](https://github.com/Chocobozzz/PeerTube/) instance with some videos worth preserving.
|
||||
|
||||
### royalroad.nunosempere.com
|
||||
|
||||
A frontend for [Royal Road](https://www.royalroad.com/), a site which hosts online fiction but which has grown pretty cluttered. It reuses a whole lot of the code from forum.nunosempere.com.
|
||||
|
||||
### wikiless.nunosempere.com (added 27/08/2023)
|
||||
|
||||
A [frontend](https://wikiless.nunosempere.com/) for Wikipedia.
|
||||
|
||||
### gatitos.nunosempere.com
|
||||
|
||||
Shows a photo of two cute cats:
|
||||
|
||||
<img src="https://gatitos.nunosempere.com/">
|
||||
|
||||
### Also on this topic
|
||||
|
||||
- [Soothing Software](https://nunosempere.com/blog/2023/03/27/soothing-software/)
|
||||
- [Hacking on rose](https://nunosempere.com/blog/2022/12/20/hacking-on-rose/)—in particular, readers might be interested in [this code](https://git.nunosempere.com/open.source/rosenrot/src/branch/master/plugins/style/style.js#L62) to block advertisements on Reddit and Twitter. It could be adapted for Firefox with an extension like [Stylus](https://addons.mozilla.org/en-US/firefox/addon/styl-us/).
|
||||
- [Metaforecast](https://metaforecast.org/), which I started, and which is now maintained by Slava Matyuhin of QURI and myself.
|
||||
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
@ -0,0 +1,111 @@
|
||||
<h1>Incorporate keeping track of accuracy into X (previously Twitter)</h1>
|
||||
|
||||
<p><strong>tl;dr</strong>: Incorporate keeping track of accuracy into X<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. This contributes to the goal of making X the chief source of information, and strengthens humanity by providing better epistemic incentives and better mechanisms to separate the wheat from the chaff in terms of getting at the truth together.</p>
|
||||
|
||||
<h2>Why do this?</h2>
|
||||
|
||||
<p><img src="https://images.nunosempere.com/blog/2023/08/19/keeping-track-of-accuracy-on-twitter/michael-dragon.jpg" alt="St Michael Killing the Dragon - public domain, via Wikimedia commons" style="width: 30% !important"/></p>
|
||||
|
||||
|
||||
<ul>
|
||||
<li>Because it can be done</li>
|
||||
<li>Because keeping track of accuracy allows people to separate the wheat from the chaff at scale, which would make humanity more powerful, more <a href="https://nunosempere.com/blog/2023/07/19/better-harder-faster-stronger/">formidable</a>.</li>
|
||||
<li>Because it is an asymmetric weapon, like community notes, that helps the good guys who are trying to get at what is true much more than the bad guys who are either not trying to do that or are bad at it.</li>
|
||||
<li>Because you can’t get better at learning true things if you aren’t trying, and current social media platforms are, for the most part, not incentivizing that trying.</li>
|
||||
<li>Because rival organizations—like the New York Times, Instagram Threads, or school textbooks—would be made more obsolete by this kind of functionality.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h2>Core functionality</h2>
|
||||
|
||||
<p>I think that you can distill the core of keeping track of accuracy to three elements<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>: predict, resolve, and tally. You can see a minimal implementation of this functionality in <60 lines of bash <a href="https://github.com/NunoSempere/PredictResolveTally/tree/master">here</a>.</p>
|
||||
|
||||
<h3>predict</h3>
|
||||
|
||||
<p>make a prediction. This prediction could take the form of</p>
|
||||
|
||||
<ol>
|
||||
<li>a yes/no sentence, like “By 2030, I say that Tesla will be worth $1T”</li>
|
||||
<li>a probability, like “I say that there is a 70% chance that by 2030, Tesla will be worth $1T”</li>
|
||||
<li>a range, like “I think that by 2030, Tesla’s market cap will be worth between $800B and $5T”</li>
|
||||
<li>a probability distribution, like “Here is my probability distribution over how likely each possible market cap of Tesla is by 2030”</li>
|
||||
<li>more complicated options, e.g., a forecasting function that gives an estimate of market cap at every point in time.</li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>I think that the sweet spot is on #2: asking for probabilities. #1 doesn’t capture that we normally have uncertainty about events—e.g., in the recent superconductor debacle, we were not completely sure one way or the other until the end—, and it is tricky to have a system which scores both #3-#5 and #2. Particularly at scale, I would lean towards recommending using probabilities rather than something more ambitious, at first.</p>
|
||||
|
||||
<p>Note that each example gave both a statement that was being predicted, and a date by which the prediction is resolved.</p>
|
||||
|
||||
<h3>resolve</h3>
|
||||
|
||||
<p>Once the date of resolution has been reached, a prediction can be marked as true/false/ambiguous. Ambiguous resolutions are bad, because the people who have put effort into making a prediction feel like their time has been wasted, so it is good to minimize them.</p>
|
||||
|
||||
<p>You can have a few distinct methods of resolution. Here are a few:</p>
|
||||
|
||||
<ul>
|
||||
<li>Every question has a question creator, who resolves it</li>
|
||||
<li>Each person creates and resolves their own predictions</li>
|
||||
<li>You have a community-notes style mechanism for resolving questions</li>
|
||||
<li>You have a jury of randomly chosen peers who resolves the prediction</li>
|
||||
<li>You have a jury of previously trusted members, who resolves the question</li>
|
||||
<li>You can use a <a href="https://en.wikipedia.org/wiki/Keynesian_beauty_contest">Keynesian Beauty Contest</a>, like Kleros or UMA, where judges are rewarded for agreeing with the majority opinion of other judges. This disincentivizes correct resolutions for unpopular-but-true questions, so I would hesitate before using it.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>Note that you can have resolution methods that can be challenged, like the lower court/court of appeals/supreme court system in the US. For example, you could have a system where initially a question is resolved by a small number of randomly chosen jurors, but if someone gives a strong signal that they object to the resolution—e.g., if they pay for it, or if they spend one of a few “appeals” tokens—then the question is resolved by a larger pool of jurors.</p>
|
||||
|
||||
<p>Note that the resolution method will shape the flavour of your prediction functionality, and constrain the types of questions that people can forecast on. You can have a more anarchic system, where everyone can instantly create a question and predict on it. Then, people will create many more questions, but perhaps they will have a bias towards resolving questions in their own favour, and you will have slightly duplicate questions. Then you will get something closer to <a href="https://manifold.markets/">Manifold Markets</a>. Or you could have a mechanism where people propose questions and these are made robust to corner cases in their resolution criteria by volunteers, and then later resolved by a jury of volunteers. Then you will get something like <a href="https://www.metaculus.com/">Metaculus</a>, where you have fewer questions but these are of higher quality and have more reliable resolutions.</p>
|
||||
|
||||
<p>Ultimately, I’m not saying that the resolution method is unimportant. But I think there is a temptation to nerd out too much about the specifics, and having some resolution method that is transparently outlined and shipping it quickly seems much better than getting stuck at this step.</p>
|
||||
|
||||
<h3>tally</h3>
|
||||
|
||||
<p>Lastly, present the information about what proportion of people’s predictions come true. E.g., of the times I have predicted a 60% likelihood of something, how often has it come true? Ditto for other percentages. These are normally binned to produce a calibration chart, like the following:</p>
|
||||
|
||||
<p><img src="https://images.nunosempere.com/blog/2023/08/19/keeping-track-of-accuracy-on-twitter/calibrationChart2.png" alt="my calibration chart from Good Judgment Open" /></p>
|
||||
|
||||
<p>On top of that starting point, you can also do more elaborate things:</p>
|
||||
|
||||
<ul>
|
||||
<li>You can have a summary statistic—a proper scoring rule, like the Brier score or a log score—that summarizes how good you are at prediction “in general”. Possibly this might involve comparing your performance to the performance of people who predicted in the same questions.</li>
|
||||
<li>Previously, you could have allowed people to bet against each other. Then, their profits would indicate how good they are. I think this might be too complicated at Twitter scale, at least at first.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p><a href="https://arxiv.org/abs/2106.11248">Here</a> is a review of some mistakes people have previously made when scoring these kinds of forecasts. For example, if you have some per-question accuracy reward, people will gravitate towards forecasting on easier rather than on more useful questions. These kinds of considerations are important, particularly since they will determine who will be at the top of some scoring leaderboard, if there is any such. Generally, <a href="https://arxiv.org/abs/1803.04585">Goodhart’s law</a> is going to be a problem here. But again, having <em>some</em> tallying mechanism seems way better than the current information environment.</p>
|
||||
|
||||
<p>Once you have some tallying—whether a calibration chart, a score from a proper scoring rule, or some profit in Musk-Bucks<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>—such a tally could:</p>
|
||||
|
||||
<ul>
|
||||
<li>be semi-prominently displayed so that people can look to it when deciding how much to trust an account,</li>
|
||||
<li>be used by X’s algorithm to show more accurate accounts a bit more at the margin,</li>
|
||||
<li>provide an incentive for people to be accurate,</li>
|
||||
<li>provide a way for people who want to become more accurate to track their performance</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>When dealing with catastrophes, wars, discoveries, and generally with events that challenge humanity’s ability to figure out what is going on, having these mechanisms in place would help humanity make better decisions about who to listen to: to listen not to who is loudest but to who is most right.</p>
|
||||
|
||||
<h2>Conclusion</h2>
|
||||
|
||||
<p>X can do this. It would help with its goal of outcompeting other sources of information, and it would do this fair and square by improving humanity’s collective ability to get at the truth. I don’t know what other challenges and plans Musk has in store for X, but I would strongly consider adding this functionality to it.</p>
|
||||
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
||||
|
||||
<div class="footnotes">
|
||||
<hr/>
|
||||
<ol>
|
||||
<li id="fn:1">
|
||||
previously Twitter<a href="#fnref:1" rev="footnote">↩</a></li>
|
||||
<li id="fn:2">
|
||||
Ok, four, if we count question creation and prediction as distinct. But I like <a href="https://bw.vern.cc/worm/wiki/Parahuman_Response_Team">PRT</a> as an acronym.<a href="#fnref:2" rev="footnote">↩</a></li>
|
||||
<li id="fn:3">
|
||||
Using real dollars is probably illegal/too regulated in America.<a href="#fnref:3" rev="footnote">↩</a></li>
|
||||
</ol>
|
||||
</div>
|
||||
|
@ -0,0 +1,28 @@
|
||||
Twitter Improvement Proposal: Incorporate Prediction Markets, Give a Death Blow to Punditry
|
||||
===========================================================================================
|
||||
|
||||
**tl;dr**: Incorporate prediction markets into Twitter, give a death blow to punditry.
|
||||
|
||||
## The core idea
|
||||
|
||||
A prediction market is...
|
||||
|
||||
## Why do this?
|
||||
|
||||
Because it will usher humanity into an era of epistemic greatness.
|
||||
|
||||
## Caveats and downsides
|
||||
|
||||
Play money, though maybe with goods and services
|
||||
|
||||
Give 1,000 doubloons to all semi-active accounts on 22/02/2022
|
||||
|
||||
## How to go about this?
|
||||
|
||||
One possibility might be to acquihire [Manifold Markets](https://manifold.markets/) for something like $20-$50M. They are a team of competent engineers with a fair share of ex-Googlers, who have been doing a good job at building a prediction platform from scratch, and iterating on it. So one possible step might be to have the Manifold guys come up with demo functionality, and then pair them with a team who understands how one would go about doing this at Twitter-like scale.
|
||||
|
||||
|
||||
|
||||
However, I am not really cognizant of the technical challenges here, and it's possible that this might not be the best approach ¯\_(ツ)_/¯
|
||||
|
||||
## In conclusion
|
@ -0,0 +1,199 @@
|
||||
<!DOCTYPE HTML>
|
||||
<html>
|
||||
<head>
|
||||
|
||||
<title>Incorporate keeping track of accuracy into X (previously Twitter)</title>
|
||||
|
||||
<link rel="stylesheet" href="/pub/style/style.css" type="text/css" media="screen, handheld" title="default">
|
||||
<link rel="shortcut icon" href="/favicon.ico" type="image/vnd.microsoft.icon">
|
||||
|
||||
<meta charset="UTF-8">
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||||
<meta property="og:image" content="https://cards.nunosempere.com/api/dynamic-image?endpoint=/blog/2023/08/19/keep-track-of-accuracy-on-twitter/">
|
||||
<meta name="twitter:card" content="summary_large_image" />
|
||||
<meta name="twitter:title" content="Measure is unceasing" />
|
||||
<meta name="twitter:description" content="Incorporate keeping track of accuracy into X (previously Twitter)" />
|
||||
<meta name="twitter:url" content="https://nunosempere.com/" />
|
||||
<meta name="twitter:image" content="https://cards.nunosempere.com/api/dynamic-image?endpoint=/blog/2023/08/19/keep-track-of-accuracy-on-twitter/" />
|
||||
<meta name="twitter:site" content="@NunoSempere" />
|
||||
|
||||
|
||||
|
||||
|
||||
<script data-isso="//comments.nunosempere.com/" data-isso-max-comments-top="inf" data-isso-max-comments-nested="inf" data-isso-postbox-text-text-en="On the Internet, nobody knows you are a dog" src="//comments.nunosempere.com/js/embed.min.js"></script>
|
||||
|
||||
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<header>
|
||||
<h1><a href="/">Measure is unceasing </a><span id="headerSubTitle"></span></h1>
|
||||
</header>
|
||||
|
||||
<nav id="side-bar" class="hidden-mobile">
|
||||
<div>
|
||||
<ul>
|
||||
<li><a href="/blog/" class="thisPage">»<i> blog/</i></a></li>
|
||||
<li><ul>
|
||||
<li><a href="/blog/2019/">› 2019/</a></li>
|
||||
<li><a href="/blog/2020/">› 2020/</a></li>
|
||||
<li><a href="/blog/2021/">› 2021/</a></li>
|
||||
<li><a href="/blog/2022/">› 2022/</a></li>
|
||||
<li><a href="/blog/2023/" class="thisPage">»<i> 2023/</i></a></li>
|
||||
<li><ul>
|
||||
<li><a href="/blog/2023/01/">› 01/</a></li>
|
||||
<li><a href="/blog/2023/02/">› 02/</a></li>
|
||||
<li><a href="/blog/2023/03/">› 03/</a></li>
|
||||
<li><a href="/blog/2023/04/">› 04/</a></li>
|
||||
<li><a href="/blog/2023/05/">› 05/</a></li>
|
||||
<li><a href="/blog/2023/06/">› 06/</a></li>
|
||||
<li><a href="/blog/2023/07/">› 07/</a></li>
|
||||
<li><a href="/blog/2023/08/" class="thisPage">»<i> 08/</i></a></li>
|
||||
<li><ul>
|
||||
<li><a href="/blog/2023/08/01/">› 01/</a></li>
|
||||
<li><a href="/blog/2023/08/14/">› 14/</a></li>
|
||||
<li><a href="/blog/2023/08/19/" class="thisPage">»<i> 19/</i></a></li>
|
||||
<li><ul>
|
||||
<li><a href="/blog/2023/08/19/keep-track-of-accuracy-on-twitter/" class="thisPage">»<i> keep track of accuracy on twitter/</i></a></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
<li><a href="/consulting/">› consulting/</a></li>
|
||||
<li><a href="/forecasting/">› forecasting/</a></li>
|
||||
<li><a href="/gossip/">› gossip/</a></li>
|
||||
<li><a href="/misc/">› misc/</a></li>
|
||||
<li><a href="/research/">› research/</a></li>
|
||||
<li><a href="/software/">› software/</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<article>
|
||||
<h1>Incorporate keeping track of accuracy into X (previously Twitter)</h1>
|
||||
|
||||
<p><strong>tl;dr</strong>: Incorporate keeping track of accuracy into X<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. This contributes to the goal of making X the chief source of information, and strengthens humanity by providing better epistemic incentives and better mechanisms to separate the wheat from the chaff in terms of getting at the truth together.</p>
|
||||
|
||||
<h2>Why do this?</h2>
|
||||
|
||||
<p><img src="https://images.nunosempere.com/blog/2023/08/19/keeping-track-of-accuracy-on-twitter/michael-dragon.png" alt="St Michael Killing the Dragon - public domain, via Wikimedia commons" style="width: 30% !important"/></p>
|
||||
|
||||
|
||||
<ul>
|
||||
<li>Because it can be done</li>
|
||||
<li>Because keeping track of accuracy allows people to separate the wheat from the chaff at scale, which would make humanity more powerful, more <a href="https://nunosempere.com/blog/2023/07/19/better-harder-faster-stronger/">formidable</a>.</li>
|
||||
<li>Because it is an asymmetric weapon, like community notes, that helps the good guys who are trying to get at what is true much more than the bad guys who are either not trying to do that or are bad at it.</li>
|
||||
<li>Because you can’t get better at learning true things if you aren’t trying, and current social media platforms are, for the most part, not incentivizing that trying.</li>
|
||||
<li>Because rival organizations—like the New York Times, Instagram Threads, or school textbooks—would be made more obsolete by this kind of functionality.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h2>Core functionality</h2>
|
||||
|
||||
<p>I think that you can distill the core of keeping track of accuracy to three elements<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>: predict, resolve, and tally. You can see a minimal implementation of this functionality in <60 lines of bash <a href="https://github.com/NunoSempere/PredictResolveTally/tree/master">here</a>.</p>
|
||||
|
||||
<h3>predict</h3>
|
||||
|
||||
<p>make a prediction. This prediction could take the form of</p>
|
||||
|
||||
<ol>
|
||||
<li>a yes/no sentence, like “By 2030, I say that Tesla will be worth $1T”</li>
|
||||
<li>a probability, like “I say that there is a 70% chance that by 2030, Tesla will be worth $1T”</li>
|
||||
<li>a range, like “I think that by 2030, Tesla’s market cap will be worth between $800B and $5T”</li>
|
||||
<li>a probability distribution, like “Here is my probability distribution over how likely each possible market cap of Tesla is by 2030”</li>
|
||||
<li>more complicated options, e.g., a forecasting function that gives an estimate of market cap at every point in time.</li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>I think that the sweet spot is on #2: asking for probabilities. #1 doesn’t capture that we normally have uncertainty about events—e.g., in the recent superconductor debacle, we were not completely sure one way or the other until the end—, and it is tricky to have a system which scores both #3-#5 and #2. Particularly at scale, I would lean towards recommending using probabilities rather than something more ambitious, at first.</p>
|
||||
|
||||
<p>Note that each example gave both a statement that was being predicted, and a date by which the prediction is resolved.</p>
|
||||
|
||||
<h3>resolve</h3>
|
||||
|
||||
<p>Once the date of resolution has been reached, a prediction can be marked as true/false/ambiguous. Ambiguous resolutions are bad, because the people who have put effort into making a prediction feel like their time has been wasted, so it is good to minimize them.</p>
|
||||
|
||||
<p>You can have a few distinct methods of resolution. Here are a few:</p>
|
||||
|
||||
<ul>
|
||||
<li>Every question has a question creator, who resolves it</li>
|
||||
<li>Each person creates and resolves their own predictions</li>
|
||||
<li>You have a community-notes style mechanism for resolving questions</li>
|
||||
<li>You have a jury of randomly chosen peers who resolves the prediction</li>
|
||||
<li>You have a jury of previously trusted members, who resolves the question</li>
|
||||
<li>You can use a <a href="https://en.wikipedia.org/wiki/Keynesian_beauty_contest">Keynesian Beauty Contest</a>, like Kleros or UMA, where judges are rewarded for agreeing with the majority opinion of other judges. This disincentivizes correct resolutions for unpopular-but-true questions, so I would hesitate before using it.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>Note that you can have resolution methods that can be challenged, like the lower court/court of appeals/supreme court system in the US. For example, you could have a system where initially a question is resolved by a small number of randomly chosen jurors, but if someone gives a strong signal that they object to the resolution—e.g., if they pay for it, or if they spend one of a few “appeals” tokens—then the question is resolved by a larger pool of jurors.</p>
|
||||
|
||||
<p>Note that the resolution method will shape the flavour of your prediction functionality, and constrain the types of questions that people can forecast on. You can have a more anarchic system, where everyone can instantly create a question and predict on it. Then, people will create many more questions, but perhaps they will have a bias towards resolving questions in their own favour, and you will have slightly duplicate questions. Then you will get something closer to <a href="https://manifold.markets/">Manifold Markets</a>. Or you could have a mechanism where people propose questions and these are made robust to corner cases in their resolution criteria by volunteers, and then later resolved by a jury of volunteers. Then you will get something like <a href="https://www.metaculus.com/">Metaculus</a>, where you have fewer questions but these are of higher quality and have more reliable resolutions.</p>
|
||||
|
||||
<p>Ultimately, I’m not saying that the resolution method is unimportant. But I think there is a temptation to nerd out too much about the specifics, and having some resolution method that is transparently outlined and shipping it quickly seems much better than getting stuck at this step.</p>
|
||||
|
||||
<h3>tally</h3>
|
||||
|
||||
<p>Lastly, present the information about what proportion of people’s predictions come true. E.g., of the times I have predicted a 60% likelihood of something, how often has it come true? Ditto for other percentages. These are normally binned to produce a calibration chart, like the following:</p>
|
||||
|
||||
<p><img src="https://images.nunosempere.com/blog/2023/08/19/keeping-track-of-accuracy-on-twitter/calibrationChart2.png" alt="my calibration chart from Good Judgment Open" /></p>
|
||||
|
||||
<p>On top of that starting point, you can also do more elaborate things:</p>
|
||||
|
||||
<ul>
|
||||
<li>You can have a summary statistic—a proper scoring rule, like the Brier score or a log score—that summarizes how good you are at prediction “in general”. Possibly this might involve comparing your performance to the performance of people who predicted in the same questions.</li>
|
||||
<li>Previously, you could have allowed people to bet against each other. Then, their profits would indicate how good they are. I think this might be too complicated at Twitter scale, at least at first.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p><a href="https://arxiv.org/abs/2106.11248">Here</a> is a review of some mistakes people have previously made when scoring these kinds of forecasts. For example, if you have some per-question accuracy reward, people will gravitate towards forecasting on easier rather than on more useful questions. These kinds of considerations are important, particularly since they will determine who will be at the top of some scoring leaderboard, if there is any such. Generally, <a href="https://arxiv.org/abs/1803.04585">Goodhart’s law</a> is going to be a problem here. But again, having <em>some</em> tallying mechanism seems way better than the current information environment.</p>
|
||||
|
||||
<p>Once you have some tallying—whether a calibration chart, a score from a proper scoring rule, or some profit in Musk-Bucks<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>—such a tally could:</p>
|
||||
|
||||
<ul>
|
||||
<li>be semi-prominently displayed so that people can look to it when deciding how much to trust an account,</li>
|
||||
<li>be used by X’s algorithm to show more accurate accounts a bit more at the margin,</li>
|
||||
<li>provide an incentive for people to be accurate,</li>
|
||||
<li>provide a way for people who want to become more accurate to track their performance</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>When dealing with catastrophes, wars, discoveries, and generally with events that challenge humanity’s ability to figure out what is going on, having these mechanisms in place would help humanity make better decisions about who to listen to: to listen not to who is loudest but to who is most right.</p>
|
||||
|
||||
<h2>Conclusion</h2>
|
||||
|
||||
<p>X can do this. It would help with its goal of outcompeting other sources of information, and it would do this fair and square by improving humanity’s collective ability to get at the truth. I don’t know what other challenges and plans Musk has in store for X, but I would strongly consider adding this functionality to it.</p>
|
||||
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
||||
|
||||
<div class="footnotes">
|
||||
<hr/>
|
||||
<ol>
|
||||
<li id="fn:1">
|
||||
previously Twitter<a href="#fnref:1" rev="footnote">↩</a></li>
|
||||
<li id="fn:2">
|
||||
Ok, four, if we count question creation and prediction as distinct. But I like <a href="https://bw.vern.cc/worm/wiki/Parahuman_Response_Team">PRT</a> as an acronym.<a href="#fnref:2" rev="footnote">↩</a></li>
|
||||
<li id="fn:3">
|
||||
Using real dollars is probably illegal/too regulated in America.<a href="#fnref:3" rev="footnote">↩</a></li>
|
||||
</ol>
|
||||
</div>
|
||||
|
||||
</article>
|
||||
|
||||
<footer class="hidden-mobile">
|
||||
<br class="doNotDisplay doNotPrint" />
|
||||
|
||||
<div style="margin-right: auto;">Powered by <a href="http://werc.cat-v.org/">werc</a>, <a href="https://alpinelinux.org/">alpine</a> and <a href="https://nginx.org/en/">nginx</a></div>
|
||||
|
||||
<!-- TODO: wait until duckduckgo indexes site
|
||||
<form action="https://duckduckgo.com/" method="get">
|
||||
<input type="hidden" name="sites" value="nunosempere.com">
|
||||
<input type="search" name="q">
|
||||
<input type="submit" value="Search">
|
||||
</form>
|
||||
-->
|
||||
</footer>
|
||||
</body></html>
|
@ -0,0 +1,95 @@
|
||||
Incorporate keeping track of accuracy into X (previously Twitter)
|
||||
====
|
||||
|
||||
**tl;dr**: Incorporate keeping track of accuracy into X<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. This contributes to the goal of making X the chief source of information, and strengthens humanity by providing better epistemic incentives and better mechanisms to separate the wheat from the chaff in terms of getting at the truth together.
|
||||
|
||||
## Why do this?
|
||||
|
||||
<p><img src="https://images.nunosempere.com/blog/2023/08/19/keeping-track-of-accuracy-on-twitter/michael-dragon.png" alt="St Michael Killing the Dragon - public domain, via Wikimedia commons" style="width: 30% !important"/></p>
|
||||
|
||||
- Because it can be done
|
||||
- Because keeping track of accuracy allows people to separate the wheat from the chaff at scale, which would make humanity more powerful, more [formidable](https://nunosempere.com/blog/2023/07/19/better-harder-faster-stronger/).
|
||||
- Because it is an asymmetric weapon, like community notes, that helps the good guys who are trying to get at what is true much more than the bad guys who are either not trying to do that or are bad at it.
|
||||
- Because you can't get better at learning true things if you aren't trying, and current social media platforms are, for the most part, not incentivizing that trying.
|
||||
- Because rival organizations---like the New York Times, Instagram Threads, or school textbooks---would be made more obsolete by this kind of functionality.
|
||||
|
||||
## Core functionality
|
||||
|
||||
I think that you can distill the core of keeping track of accuracy to three elements<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>: predict, resolve, and tally. You can see a minimal implementation of this functionality in <60 lines of bash [here](https://github.com/NunoSempere/PredictResolveTally/tree/master).
|
||||
|
||||
### predict
|
||||
|
||||
make a prediction. This prediction could take the form of
|
||||
|
||||
1. a yes/no sentence, like "By 2030, I say that Tesla will be worth $1T"
|
||||
2. a probability, like "I say that there is a 70% chance that by 2030, Tesla will be worth $1T"
|
||||
3. a range, like "I think that by 2030, Tesla's market cap will be worth between $800B and $5T"
|
||||
4. a probability distribution, like "Here is my probability distribution over how likely each possible market cap of Tesla is by 2030"
|
||||
5. more complicated options, e.g., a forecasting function that gives an estimate of market cap at every point in time.
|
||||
|
||||
I think that the sweet spot is on #2: asking for probabilities. #1 doesn't capture that we normally have uncertainty about events—e.g., in the recent superconductor debacle, we were not completely sure one way or the other until the end—, and it is tricky to have a system which scores both #3-#5 and #2. Particularly at scale, I would lean towards recommending using probabilities rather than something more ambitious, at first.
|
||||
|
||||
Note that each example gave both a statement that was being predicted, and a date by which the prediction is resolved.
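As a sketch of the data the predict step has to store under option #2, a record as small as the following would do; the field names below are hypothetical rather than taken from any existing system.

```
#include <time.h>

/* Hypothetical minimal record: a statement, a stated probability,
   a resolution date, and a resolution that is filled in later. */
struct prediction {
	const char *statement; /* "By 2030, Tesla will be worth $1T" */
	double probability;    /* 0.70 */
	time_t resolves_by;    /* date by which the prediction resolves */
	enum { UNRESOLVED, RESOLVED_TRUE, RESOLVED_FALSE, RESOLVED_AMBIGUOUS } resolution;
};
```

The resolve step then only has to fill in the resolution field, and the tally step only has to read it.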
|
||||
|
||||
### resolve
|
||||
|
||||
Once the date of resolution has been reached, a prediction can be marked as true/false/ambiguous. Ambiguous resolutions are bad, because the people who have put effort into making a prediction feel like their time has been wasted, so it is good to minimize them.
|
||||
|
||||
You can have a few distinct methods of resolution. Here are a few:
|
||||
|
||||
- Every question has a question creator, who resolves it
|
||||
- Each person creates and resolves their own predictions
|
||||
- You have a community-notes style mechanism for resolving questions
|
||||
- You have a jury of randomly chosen peers who resolves the prediction
|
||||
- You have a jury of previously trusted members, who resolves the question
|
||||
- You can use a [Keynesian Beauty Contest](https://en.wikipedia.org/wiki/Keynesian_beauty_contest), like Kleros or UMA, where judges are rewarded for agreeing with the majority opinion of other judges. This disincentivizes correct resolutions for unpopular-but-true questions, so I would hesitate before using it.
|
||||
|
||||
Note that you can have resolution methods that can be challenged, like the lower court/court of appeals/supreme court system in the US. For example, you could have a system where initially a question is resolved by a small number of randomly chosen jurors, but if someone gives a strong signal that they object to the resolution—e.g., if they pay for it, or if they spend one of a few "appeals" tokens—then the question is resolved by a larger pool of jurors.
|
||||
|
||||
Note that the resolution method will shape the flavour of your prediction functionality, and constrain the types of questions that people can forecast on. You can have a more anarchic system, where everyone can instantly create a question and predict on it. Then, people will create many more questions, but perhaps they will have a bias towards resolving questions in their own favour, and you will have slightly duplicate questions. Then you will get something closer to [Manifold Markets](https://manifold.markets/). Or you could have a mechanism where people propose questions and these are made robust to corner cases in their resolution criteria by volunteers, and then later resolved by a jury of volunteers. Then you will get something like [Metaculus](https://www.metaculus.com/), where you have fewer questions but these are of higher quality and have more reliable resolutions.
|
||||
|
||||
Ultimately, I'm not saying that the resolution method is unimportant. But I think there is a temptation to nerd out too much about the specifics, and having some resolution method that is transparently outlined and shipping it quickly seems much better than getting stuck at this step.
|
||||
|
||||
### tally
|
||||
|
||||
Lastly, present the information about what proportion of people's predictions come true. E.g., of the times I have predicted a 60% likelihood of something, how often has it come true? Ditto for other percentages. These are normally binned to produce a calibration chart, like the following:
|
||||
|
||||
![my calibration chart from Good Judgment Open](https://images.nunosempere.com/blog/2023/08/19/keeping-track-of-accuracy-on-twitter/calibrationChart2.png)
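As a sketch of how the binning works, here is one way to turn a track record into a calibration table and a Brier score; the track record below is invented, and in a real system it would come from the resolve step.

```
#include <stdio.h>

struct prediction {
	double p;    /* stated probability that the event happens */
	int outcome; /* 1 if it happened, 0 if it did not */
};

int main(void)
{
	/* Invented track record of resolved predictions. */
	struct prediction record[] = {
		{0.6, 1}, {0.6, 0}, {0.6, 1}, {0.9, 1}, {0.9, 1},
		{0.2, 0}, {0.2, 1}, {0.7, 1}, {0.7, 0}, {0.5, 1},
	};
	int n = sizeof(record) / sizeof(record[0]);
	double sum_p[10] = {0}, sum_outcome[10] = {0}, brier = 0;
	int counts[10] = {0};

	for (int i = 0; i < n; i++) {
		/* Bin by decile of stated probability. */
		int bin = (int)(record[i].p * 10 + 1e-9);
		if (bin > 9) bin = 9; /* p == 1.0 goes in the top bin */
		sum_p[bin] += record[i].p;
		sum_outcome[bin] += record[i].outcome;
		counts[bin]++;
		double err = record[i].p - record[i].outcome;
		brier += err * err; /* Brier score: mean squared error */
	}

	for (int bin = 0; bin < 10; bin++)
		if (counts[bin] > 0)
			printf("%d%%-%d%%: predicted %.0f%%, observed %.0f%% (n=%d)\n",
			    bin * 10, (bin + 1) * 10,
			    100 * sum_p[bin] / counts[bin],
			    100 * sum_outcome[bin] / counts[bin], counts[bin]);
	printf("Brier score: %.3f (lower is better)\n", brier / n);
	return 0;
}
```

Each row of the output compares the average stated probability in a bin with the observed frequency, which is exactly what a calibration chart plots.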
|
||||
|
||||
On top of that starting point, you can also do more elaborate things:
|
||||
|
||||
- You can have a summary statistic—a proper scoring rule, like the Brier score or a log score—that summarizes how good you are at prediction "in general". Possibly this might involve comparing your performance to the performance of people who predicted in the same questions.
|
||||
- Previously, you could have allowed people to bet against each other. Then, their profits would indicate how good they are. I think this might be too complicated at Twitter scale, at least at first.
|
||||
|
||||
[Here](https://arxiv.org/abs/2106.11248) is a review of some mistakes people have previously made when scoring these kinds of forecasts. For example, if you have some per-question accuracy reward, people will gravitate towards forecasting on easier rather than on more useful questions. These kinds of considerations are important, particularly since they will determine who will be at the top of some scoring leaderboard, if there is any such. Generally, [Goodhart's law](https://arxiv.org/abs/1803.04585) is going to be a problem here. But again, having *some* tallying mechanism seems way better than the current information environment.
|
||||
|
||||
Once you have some tallying—whether a calibration chart, a score from a proper scoring rule, or some profit in Musk-Bucks<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>—such a tally could:
|
||||
|
||||
- be semi-prominently displayed so that people can look to it when deciding how much to trust an account,
|
||||
- be used by X's algorithm to show more accurate accounts a bit more at the margin,
|
||||
- provide an incentive for people to be accurate,
|
||||
- provide a way for people who want to become more accurate to track their performance
|
||||
|
||||
When dealing with catastrophes, wars, discoveries, and generally with events that challenge humanity's ability to figure out what is going on, having these mechanisms in place would help humanity make better decisions about who to listen to: to listen not to who is loudest but to who is most right.
|
||||
|
||||
## Conclusion
|
||||
|
||||
X can do this. It would help with its goal of outcompeting other sources of information, and it would do this fair and square by improving humanity's collective ability to get at the truth. I don't know what other challenges and plans Musk has in store for X, but I would strongly consider adding this functionality to it.
|
||||
|
||||
<div class="footnotes">
|
||||
<hr/>
|
||||
<ol>
|
||||
<li id="fn:1">
|
||||
previously Twitter<a href="#fnref:1" rev="footnote">↩</a></li>
|
||||
<li id="fn:2">
|
||||
Ok, four, if we count question creation and prediction as distinct. But I like <a href="https://bw.vern.cc/worm/wiki/Parahuman_Response_Team">PRT</a> as an acronym.<a href="#fnref:2" rev="footnote">↩</a></li>
|
||||
<li id="fn:3">
|
||||
Using real dollars is probably illegal/too regulated in America.<a href="#fnref:3" rev="footnote">↩</a></li>
|
||||
</ol>
|
||||
</div>
|
||||
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
@ -0,0 +1,26 @@
|
||||
Quick thoughts on Manifund's application to Open Philanthropy
|
||||
=============================================================
|
||||
|
||||
[Manifund](https://manifund.org/) is a new effort to improve, speed up and decentralize funding mechanisms in the broader Effective Altruism community, by some of the same people previously responsible for [Manifold](https://manifold.markets/home). Due to Manifold's policy of making a bunch of their internal documents public, you can see their application to Open Philanthropy [here](https://manifoldmarkets.notion.site/OpenPhil-Grant-Application-3c226068c3ae45eaaf4e6afd7d1763bc) (also a markdown backup [here](https://nunosempere.com/blog/2023/09/05/manifund-open-philanthropy/.src/application)).
|
||||
|
||||
Here is my perspective on this:
|
||||
|
||||
- They have given me a $50k regranting budget. It seems plausible that this colors my thinking.
|
||||
- Manifold is highly technologically competent.
|
||||
- [Effective Altruism Funds](https://funds.effectivealtruism.org/), which could be the closest point of comparison to Manifund, is not highly technologically competent. In particular, they have been historically tied to Salesforce, a den of mediocrity that slows speed, makes interacting with their systems annoying, and isn't that great across any one dimension.
|
||||
- Previously, Manifold blew [Hypermind](https://predict.hypermind.com/hypermind/app.html), an earlier play-money prediction market, completely out of the water. Try browsing markets, searching markets, making a prediction on Hypermind, and then try the same thing in Manifold.
|
||||
- It seems very plausible to me that Manifund could do the same thing to CEA's Effective Altruism Funds: Create a product that is incomparably better by having a much higher technical and operational competence.
|
||||
- One way to think about the cost and value of Manifund would be Δ(value of grant recipients) - Δ(costs of counterfactual funding method).
|
||||
- The cost is pretty high, because Austin's counterfactual use of his capable engineering labour is pretty valuable.
|
||||
- Value is still to be determined. One way might be to compare the value of grants made in 2023 by Manifund, EA Funds, SFF, Open Philanthropy, etc., and see if there are any clear conclusions.
|
||||
- Framing this as "improving EA Funds" would slow everything down and make it more mediocre, and would make Manifund less motivated by reducing their sense of ownership, so it doesn't make sense as a framework.
|
||||
- Instead, it's worth keeping in mind that Manifund has the option to incorporate aspects of EA Funds if it so chooses—like some grantmakers, questions to prospective grantees, public reports, etc.
|
||||
- Manifund also has the option of identifying and then unblocking historical bottlenecks that EA Funds has had, like slow response speed, not using grantmakers who are already extremely busy, etc.
|
||||
|
||||
A funny thing is that Manifund itself can't say—and probably doesn't think of—its pathway to impact as: do things much better than EA Funds by being absurdly more competent than them. It would look arrogant if they said it. But I can say it!
|
||||
|
||||
|
@ -0,0 +1,124 @@
|
||||
Count words in <50 lines of C
|
||||
===
|
||||
|
||||
The Unix utility wc counts words. You can make a simple, non-POSIX-compatible version of it that solely counts words in [159 words and 42 lines of C](https://git.nunosempere.com/personal/wc/src/branch/master/src/wc.c). Or you can be like GNU and take 3615 words and 1034 lines to do something more complex.
|
||||
|
||||
## Desiderata
|
||||
|
||||
- Simple: Just count words as delimited by spaces, tabs, newlines.
|
||||
- Allow: reading files, piping to the utility, and reading from stdin—concluded by pressing Ctrl+D.
|
||||
- Separate counting different things, like lines and characters, into their own utilities.
|
||||
- Avoid off-by-one errors.
|
||||
- Linux only.
|
||||
- Small.
|
||||
|
||||
## Comparison with other versions of wc
|
||||
|
||||
The [version of wc.c](https://git.nunosempere.com/personal/wc/src/branch/master/src/wc.c) in this repository sits at 44 lines. It decides to read from stdin if the number of arguments fed to it is otherwise zero, and uses the standard C function getc to read character by character. It doesn't have flags; instead, there are further utilities in the src/extra/ folder for counting characters and lines, sitting at 32 and 35 lines of code, respectively. This version also has little error checking.
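For intuition, here is a minimal word-counting loop in the same spirit: read from the file named as the first argument if there is one, otherwise from stdin, and count transitions from delimiter to non-delimiter characters. This is a simplified sketch, not the repository's exact code.

```c
/* Minimal word-counting sketch: reads from argv[1] if given, else stdin,
   and counts maximal runs of non-delimiter characters as words. */
#include <stdio.h>

int main(int argc, char** argv)
{
    FILE* fp = (argc > 1) ? fopen(argv[1], "r") : stdin;
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }
    long words = 0;
    int c, in_word = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == ' ' || c == '\t' || c == '\n') {
            in_word = 0;          /* a delimiter ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first character of a new word */
            words++;
        }
    }
    printf("%ld\n", words);
    if (fp != stdin) fclose(fp);
    return 0;
}
```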
|
||||
|
||||
[Here](https://github.com/dspinellis/unix-history-repo/blob/Research-V7-Snapshot-Development/usr/src/cmd/wc.c) is a version of wc from UNIX V7, at 86 lines. It allows for counting characters, words and lines. I couldn't find a version in UNIX V6, so I'm guessing this is one of the earliest versions of this program. It decides to read from stdin if the number of arguments fed to it is zero, and reads character by character using the standard C getc function.
|
||||
|
||||
The busybox version ([git.busybox.net](https://git.busybox.net/busybox/tree/coreutils/wc.c)) of wc sits at 257 lines (162 with comments stripped), while striving to be [POSIX-compliant](https://pubs.opengroup.org/onlinepubs/9699919799/), meaning it has a fair number of flags and a bit of complexity. It reads character by character by using the standard getc function, and decides to read from stdin or not using its own fopen_or_warn_stdin function. It uses two GOTOs to get around, and has some incomplete Unicode support.
|
||||
|
||||
The [plan9](https://9p.io/sources/plan9/sys/src/cmd/wc.c) version implements some sort of table method in 331 lines. It uses plan9 rather than Unix libraries and methods, and seems to read from stdin if the number of args is 0.
|
||||
|
||||
The plan9port version of wc ([github](https://github.com/9fans/plan9port/blob/master/src/cmd/wc.c)) also implements some sort of table method, in 352 lines. It reads from stdin if the number of args is 0, and uses the Linux read function to read character by character.
|
||||
|
||||
The [OpenBSD](https://github.com/openbsd/src/blob/master/usr.bin/wc/wc.c) version is just *nice*. It reads from stdin by default, and uses a bit of buffering with read to speed things up. It defaults to using fstat when counting characters. It is generally understandable and nice to read; I'm actually surprised at how pleasant it is.
|
||||
|
||||
The [FreeBSD version](https://cgit.freebsd.org/src/tree/usr.bin/wc/wc.c) sits at 367 lines. It has enough new things that I can't parse all that it's doing (in lines 137-143: what is capabilities mode? what is casper?), but otherwise it decides whether to read from stdin by the number of arguments, in line 157. It uses a combination of fstat and read, depending on the type of file.
|
||||
|
||||
Finally, the GNU utils version ([github](https://github.com/coreutils/coreutils/tree/master/src/wc.c), [savannah](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/wc.c;hb=HEAD)) is a bit over 1K lines of C. It does many things and checks many possible failure modes. I think it detects whether it should be reading from stdin using some very wrapped fstat, and it reads character by character using its own custom function.
|
||||
|
||||
So this utility started out reasonably small, then started getting more and more complex. [The POSIX committee](https://pubs.opengroup.org/onlinepubs/9699919799/) ended up codifying that complexity, and now we are stuck with it because even implementations like busybox which strive to be quite small try to keep to POSIX.
|
||||
|
||||
## Installation
|
||||
|
||||
```
|
||||
git clone https://git.nunosempere.com/personal/wc
|
||||
make
|
||||
sudo make install
|
||||
## ^ installs to /bin/ww if there isn't a /bin/ww already
|
||||
```
|
||||
|
||||
## Usage examples
|
||||
|
||||
```
|
||||
echo "En un lugar de la Mancha" | ww
|
||||
cat README.md | ww
|
||||
ww README.md
|
||||
ww # write something, then exit with Ctrl+D
|
||||
```
|
||||
|
||||
## Relationship with cat-v
|
||||
|
||||
Does one really need to spend 1k lines of C code to count characters, words and lines? There are many versions of this rant one could give, but the best and probably best known is [this one](https://harmful.cat-v.org/cat-v/unix_prog_design.pdf) on cat -v. Busybox itself has given up here, and its [version of cat](https://git.busybox.net/busybox/tree/coreutils/cat.c) has the following comment:
|
||||
|
||||
> Rob had "cat -v" implemented as a separate applet, catv.
|
||||
> See "cat -v considered harmful" at
|
||||
> http://cm.bell-labs.com/cm/cs/doc/84/kp.ps.gz
|
||||
> From USENIX Summer Conference Proceedings, 1983
|
||||
>
|
||||
> """
|
||||
>
|
||||
> The talk reviews reasons for UNIX's popularity and shows, using UCB cat
|
||||
> as a primary example, how UNIX has grown fat. cat isn't for printing
|
||||
> files with line numbers, it isn't for compressing multiple blank lines,
|
||||
> it's not for looking at non-printing ASCII characters, it's for
|
||||
> concatenating files.
|
||||
> We are reminded that ls isn't the place for code to break a single column
|
||||
> into multiple ones, and that mailnews shouldn't have its own more
|
||||
> processing or joke encryption code.
|
||||
>
|
||||
> """
|
||||
>
|
||||
> I agree with the argument. Unfortunately, this ship has sailed (1983...).
|
||||
> There are dozens of Linux distros and each of them has "cat" which supports -v.
|
||||
> It's unrealistic for us to "reeducate" them to use our, incompatible way
|
||||
> to achieve "cat -v" effect. The actual effect would be "users pissed off
|
||||
> by gratuitous incompatibility".
|
||||
|
||||
I'm not sure that gratuitous incompatibility is so bad if it leads to utilities that are much simpler and easier to understand and inspect. That said, other projects aiming in this direction that I could find, like [tiny-core](https://github.com/keiranrowan/tiny-core/tree/master/src) or [zig-coreutils](https://github.com/leecannon/zig-coreutils) don't seem to be making much headway.
|
||||
|
||||
## To do
|
||||
|
||||
- [ ] Possible follow-up: Write simple versions for other coreutils. Would be a really nice project.
|
||||
- [ ] Get some simple version of this working on a DuskOS/CollapseOS machine?
|
||||
- [ ] Or, generally find a minimalistic kernel that could use some simple coreutils.
|
||||
- [ ] Add man pages?
|
||||
- [ ] Pitch to lwn.net as an article?
|
||||
- [ ] Come back to writing these in zig
|
||||
- [ ] ...
|
||||
|
||||
|
||||
## Done or discarded
|
||||
|
||||
- [x] Look into how C utilities both read from stdin and from files.
|
||||
- [x] Program first version of the utility
|
||||
- [x] Compare with other implementations, see how they do it, after I've created my own version
|
||||
- [x] Compare with gnu utils.
|
||||
- [x] Compare with busybox implementation
|
||||
- [x] Compare with other versions
|
||||
- [x] Compare with other projects: <https://github.com/leecannon/zig-coreutils>, <https://github.com/keiranrowan/tiny-core/tree/master>.
|
||||
- [x] Install to ww, but check that ww is empty (installing to wc2 or smth would mean that you don't save that many keypresses vs wc -w)
|
||||
- [x] Look specifically at how other versions do stuff.
|
||||
- [x] Distinguish between reading from stdin and reading from a file
|
||||
- [x] If it doesn't have arguments, read from stdin.
|
||||
- [x] Open files, read characters.
|
||||
- [x] Write version that counts lines (lc)
|
||||
- [x] Take into account what happens if file doesn't end in newline.
|
||||
- [ ] ~~Count EOF as word & line separator~~
|
||||
- [x] Document it
|
||||
- [x] Document reading from user-inputted stdin (end with Ctrl+D)
|
||||
- [x] add chc, or charcounter (cc is "c compiler")
|
||||
- [x] Add licenses to historical versions before redistributing.
|
||||
- [ ] ~~Could use zig? => Not for now~~
|
||||
- [ ] ~~Maybe make some pull requests, if I'm doing something better? => doesn't seem like it~~
|
||||
- [ ] ~~Write man files?~~
|
||||
|
||||
|
||||
|
@ -0,0 +1,86 @@
|
||||
*Epistemic status*: This post is blunt. Please see the extended disclaimer about negative feedback [here](https://forum.effectivealtruism.org/users/negativenuno). Consider not reading it if you work on the EA forum and don't have thick skin.
|
||||
|
||||
*tl;dr*: Once, the EA forum was a lean, mean machine. But it has become more bloated over time, and I don't like it. Separately, I don't think it's worth the roughly $2M/year[^twomillion] it costs, although I haven't modelled this in depth.
|
||||
[^twomillion]: This is not a great amount in the grand scheme of things. Still, I am interested in it for two reasons: a) I'm working on a different piece, and this is a small, concrete case study that I can later reference, and b) I used to cherish the EA forum, and wrote over 100k words in it, only to see it become hostile to the type of disagreeable person that I am.
|
||||
|
||||
### The EA forum frontpage through time
|
||||
|
||||
In [2018-2019](https://web.archive.org/web/20181115134712/https://forum.effectivealtruism.org/), the EA forum was a lean and mean machine:
|
||||
|
||||
![](https://images.nunosempere.com/blog/2023/10/02/ea-forum-2018-2019.png)
|
||||
|
||||
In 2020, there was a small redesign:
|
||||
|
||||
![](https://images.nunosempere.com/blog/2023/10/02/ea-forum-2020.png)
|
||||
|
||||
In 2021, the sidebar expands:
|
||||
|
||||
![](https://images.nunosempere.com/blog/2023/10/02/ea-forum-2021.png)
|
||||
|
||||
In 2022, the sidebar expands further, and pinned and curated posts take up more space:
|
||||
|
||||
![](https://images.nunosempere.com/blog/2023/10/02/ea-forum-2022.png)
|
||||
|
||||
In 2023, the sidebar splits in two. Pinned and curated posts acquire shiny symbols. More recently, you can also add [reactions](https://forum.effectivealtruism.org/posts/fyCnfiL49T5HvMjvL/forum-update-10-new-features-oct-2023).
|
||||
|
||||
![](https://images.nunosempere.com/blog/2023/10/02/ea-forum-2023-bis.png)
|
||||
|
||||
### EA forum costs
|
||||
|
||||
Per [this comment](https://forum.effectivealtruism.org/posts/auhi3JoiqGhi5PqnQ/ama-we-re-the-forum-team-and-we-re-hiring-ask-us-anything?commentId=tjTkjLpBD59ybtcuX), the EA forum was spending circa $2M/year and employing 8 people as of July 2023. Per the [website](https://www.centreforeffectivealtruism.org/team#online-team) of the Center for Effective Altruism, the online team now has 6 members, including ¿one designer?
|
||||
|
||||
### EA forum moderation
|
||||
|
||||
In the beginning, when the EA forum was smaller, there was one moderator, Aaron Gertler, and all was well. Now, as the EA forum has grown, there is a larger pool of moderators, who protect the forum from spam and ban malicious users.
|
||||
|
||||
At the same time, the moderation team has acted [against](https://forum.effectivealtruism.org/posts/myp9Y9qJnpEEWhJF9/linch-s-shortform?commentId=DvPcdhnN7wcXpcB7Z) [disagreeable](https://forum.effectivealtruism.org/posts/Pfayu5Bf2apKreueD/?commentId=7cHvfzMLw2Jua9JPh) [people](https://forum.effectivealtruism.org/posts/FZFzqPYpTpGGRhyrj/does-ea-get-the-best-people-hypotheses-call-for-discussion?commentId=o3mahDSh4wuHTvsXh) [that](https://forum.effectivealtruism.org/posts/CfEAggjzSDrado6ZC/forecasting-our-world-in-data-the-next-100-years?commentId=upkHDudfh8c9FpM8u) [I](https://forum.effectivealtruism.org/posts/DB9ggzc5u9RMBosoz/wrong-lessons-from-the-ftx-catastrophe?commentId=cp6ngfKrqyjsuAQoo) [liked](https://forum.effectivealtruism.org/posts/4zjnFxGWYkEF4nqMi/how-could-we-have-avoided-this?commentId=Q7BQJFyEwk96Q6g95).
|
||||
|
||||
**Counterpoint**: When I review the [moderation comments](https://forum.effectivealtruism.org/moderatorComments) log, moderation actions seem infrequent. I guess that disagreeable people whom I like being banned or warned was just memorable to me, though.
|
||||
|
||||
### EA forum culture evolution
|
||||
|
||||
My impression is that the EA forum has been catering more to the [marginal user](https://nothinghuman.substack.com/p/the-tyranny-of-the-marginal-user); creating more introductory content, signposts, accessibility features, decreasing barriers to entry, etc. As the audience has increased, the marginal user is mostly a newbie. To me, the forum has been becoming more something like Reddit over time, which I dislike.
|
||||
|
||||
In stark contrast, consider [Hackernews](https://news.ycombinator.com/). Hackernews is an influential tech forum with [5M monthly users and 10M views/day](https://news.ycombinator.com/item?id=33454140). It has been able to retain its slim design through the years. Its moderation team has three persons, and they [*correspond with users via email*](https://news.ycombinator.com/item?id=34920400).
|
||||
|
||||
### Brief thoughts on cost-effectiveness
|
||||
|
||||
The EA forum's existence is valuable. It is still a place for high-quality discussion, and it helps the EA community collaborate on research, coordinate, identify opportunities, make sense of incoming challenges. But on top of the EA forum's existence, are changes made in recent years positive at all, and worth $2M/year if so?
|
||||
|
||||
My individual perspective, my inside view, my personal guess is that a lean and mean version of the EA forum, in the style of Hackernews, would have done a better job for less money. From that perspective, the cost-effectiveness of the marginal $1.5M would be negative. Making a [marginal donation](https://forum.effectivealtruism.org/posts/PAco5oG579k2qzrh9/ltff-and-eaif-are-unusually-funding-constrained-right-now) to the EA Infrastructure or Long-term Future Fund would have been a better choice.
|
||||
|
||||
A different perspective one might take, that I don't know quite how to inhabit, might be to make the argument that actually, a small improvement in user experience leads to an increased chance that a person will become more committed to EA over its counterfactual, and that this is valuable. For example:
|
||||
|
||||
1. if the EA forum had 500k unique yearly visitors, and improvements to the forum in recent years mean that 1% of them continue interacting with the EA movement, that would lead to 5k counterfactual EAs. If we think that creating more EAs is valuable, and we value this at $10k per EA, this would be worth $50M.
|
||||
2. if the forum influenced five to a hundred decisions a day, each worth $1k to $100k, and improved them by 1% to 20%, this would be worth ~$20M a year.
|
||||
|
||||
The problem with those two hypothetical examples is that I don't buy the numbers. I think it's easy to greatly overestimate small percentages: when one is inclined to model something as having an influence of 1%, it's often 0.01% instead. Less importantly, I think one should use Shapley values instead of counterfactual values in order to avoid double-counting and over-spending[^shapley].
|
||||
|
||||
[^shapley]: E.g., I think that if four agents (80,000 hours; a local EA group; a personal friend; the EA forum) are needed to make someone significantly more altruistic, each organization should get 1/4th of the credit. Otherwise the credit would sum up to more than 100%, and this hinders comparisons between opportunities. For a longer treatment of this topic, see [this post](https://forum.effectivealtruism.org/posts/XHZJ9i7QBtAJZ6byW/shapley-values-better-than-counterfactuals).
|
||||
|
||||
### Suggestions
|
||||
|
||||
If you are a user of the forum...
|
||||
|
||||
- Consider that the EA forum is currently pushing content on you. Make use of it if you are a newbie, but maybe actively filter it out once you are not.
|
||||
- Consider using faster and more minimal frontends, like [ea.greaterwrong.com](https://ea.greaterwrong.com/) or my own opinionated [forum.nunosempere.com](https://forum.nunosempere.com).
|
||||
- Consider interacting with the EA forum frontpage through [RSS](https://forum.effectivealtruism.org/feed.xml?view=community-rss&karmaThreshold=30) or the [all posts](https://forum.effectivealtruism.org/allPosts) page, not the frontpage.
|
||||
- Host your own content on independent platforms, like Substack or your own blog, and build your own audience, rather than relying on a platform you don't control. You can always cross-post to the EA forum, but having an independent place to build your own audience costs you little and doubles as a hedge.
|
||||
|
||||
If you are a CEA director or middle manager, you might have thought about this more than I have. Still, you might want to:
|
||||
|
||||
- Consider going back to ~1 developer and ~1 content person; save >$1M/year of your and your donors' money. My sense is that you are probably going to have to do this anyways, since you will probably not get enough money from donors[^donors] to continue on your current course.[^course]
|
||||
- Consider characterizing the EA forum team's role as one of lightly shepherding discussion, not leading it or defining it.
|
||||
- Consider reflecting on which incentives led to the creation of a larger EA Forum team. For example, Google has well-known incentives around managers being rewarded for leading larger teams to develop new products, and doesn't value maintenance, leading to a continuous churn and sunsetting of Google products. Might something similar, though at a lower scale, have happened here?
|
||||
- As a distant fourth point, consider opening up authentication mechanisms so that users can make comments and posts using open-source frontends. This was previously doable through the greaterwrong frontend, but is no longer possible. This might not be feasible with your current software stack, or might be too difficult, though.
|
||||
|
||||
[^donors]: Realistically, this is going to be mainly Open Philanthropy, as other donors can't support $2M/year.
|
||||
|
||||
[^course]: you could check this by creating a market on Manifold!
|
||||
|
||||
If you are working on the EA forum...
|
||||
|
||||
- I am probably missing a bunch of factors in this analysis. If you think that spending $2M/year, or having 6 to 8 people full-time on the EA forum is meaningful, you might want to post a BOTEC outlining why.
|
||||
- I think that this post probably sounds very harsh, sorry. Note that these three things can be true at the same time: a) a more minimalistic forum would have been better, b) CEA leadership made a bad judgment call expanding the EA forum during the FTX days and will now have to downsize, c) given your work description, you did good work.
|
||||
- It is possible that your current position is precarious, e.g., that you might be fired, or transferred to a different project within CEA.
|
||||
|
@ -0,0 +1,5 @@
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
@ -0,0 +1,5 @@
|
||||
MARKDOWN=/usr/bin/markdown -f fencedcode -f ext -f footnote -f latex
|
||||
build:
|
||||
$(MARKDOWN) index.md > temp
|
||||
cat title.md temp isso-snippet.txt > ../index.md
|
||||
rm temp
|
@ -0,0 +1,3 @@
|
||||
Brief thoughts on CEA's stewardship of the EA Forum
|
||||
====================================
|
||||
|
@ -0,0 +1,2 @@
|
||||
Add much kinder stuff
|
||||
Address Misha's feedback
|
@ -0,0 +1,183 @@
|
||||
### Introduction
|
||||
|
||||
In recent years there have been various attempts at using forecasting to discern the shape of the future development of artificial intelligence, like the [AI progress Metaculus tournament](https://www.metaculus.com/tournament/ai-progress/), the Forecasting Research Institute's [existential risk forecasting tournament/experiment](https://forum.effectivealtruism.org/posts/un42vaZgyX7ch2kaj/announcing-forecasting-existential-risks-evidence-from-a), [Samotsvety forecasts](https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts) on the topic of AI progress and dangers, or various questions on [INFER](https://www.infer-pub.com) about short-term technological progress.
|
||||
|
||||
Here is a list of reasons, written with early input from Misha Yagudin, on why using forecasting to make sense of AI developments can be tricky, as well as some casual suggestions of ways forward.
|
||||
|
||||
### Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions
|
||||
|
||||
Here are some reasons why we might expect longer-term predictions to be more difficult:
|
||||
|
||||
1. No fast feedback loops for long-term questions. You can't get that many predict/check/improve cycles, because questions many years into the future, tautologically, take many years to resolve. There are shortcuts, like this [past-casting](https://www.quantifiedintuitions.org/pastcasting) app, but they are imperfect.
|
||||
2. It's possible that short-term forecasters might acquire habits and intuitions that are good for forecasting short-term events, but bad for forecasting longer-term outcomes. For example, "things will change more slowly than you think" is a good heuristic to acquire for short-term predictions, but might be a bad heuristic for longer-term predictions, in the same sense that "people overestimate what they can do in a week, but underestimate what they can do in ten years". This might be particularly insidious to the extent that forecasters acquire intuitions which they can see are useful, but can't tell where they come from. In general, it seems unclear to what extent short-term forecasting skills would generalize to skill at longer-term predictions.
|
||||
3. "Predict no change" in particular might do well, until it doesn't. Consider a world which has a 2% yearly probability of seeing a worldwide pandemic, or some other large catastrophe. Then on average it will take 50 years for one to occur. But for all the years until then, those predicting 2% will have a poorer track record than those predicting ~0% (see the worked arithmetic after this list).
|
||||
4. In general, we have been in a period of comparative technological stagnation, and forecasters might be adapted to that, in the same way that e.g., startups adapted to low interest rates.
|
||||
5. Sub-sampling artifacts within good short-term forecasters are tricky. For example, my forecasting group Samotsvety is relatively bullish on transformative technological change from AI, whereas the Forecasting Research Institute's pick of forecasters for their existential risk survey was more bearish.
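To spell out the arithmetic behind point 3 (a rough sketch, assuming the 2% chance is independent across years): the waiting time until the first catastrophe is geometrically distributed, so

$$\mathbb{E}[T] = \frac{1}{p} = \frac{1}{0.02} = 50 \text{ years},$$

and in each year in which nothing happens, a forecaster who says 2% takes a Brier penalty of $(0.02 - 0)^2 = 0.0004$, while one who says, say, 0.1% takes only $(0.001 - 0)^2 = 10^{-6}$. The more honest forecaster only catches up in expectation once the catastrophe actually occurs and the one-off penalties $(0.02 - 1)^2 \approx 0.96$ versus $(0.001 - 1)^2 \approx 1$ get added in.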
|
||||
|
||||
### Forecasting loses value when decontextualized, and current forecasting seems pretty decontextualized
|
||||
|
||||
Forecasting seems more valuable when it is commissioned to inform a specific decision. For instance, suppose that you were thinking of starting a new startup. Then it would be interesting to look at:
|
||||
|
||||
- The base rate of success for startups
|
||||
- The base rate of success for all new businesses
|
||||
- The base rate of success for startups that your friends and wider social circle have started
|
||||
- Your personal rate of success at things in life
|
||||
- The inside view: decomposing the space between now and potential success into steps and giving explicit probabilities to each step
|
||||
- etc.
|
||||
|
||||
With this in mind, you could estimate the distribution of monetary returns to starting a startup, vs e.g., remaining an employee somewhere, and make the decision about what to do next with that estimate as an important factor.
|
||||
|
||||
But our impression is that AI forecasting hasn't been tied to specific decisions like that. Instead, it has tended to ask questions that might contribute to a "holistic understanding" of the field. For example, look at [Metaculus' AI progress tournament](https://www.metaculus.com/tournament/ai-progress/). The first few questions are:
|
||||
|
||||
- [How many Natural Language Processing e-prints will be published on arXiv over the 2021-01-14 to 2030-01-14 period?](https://www.metaculus.com/questions/6299/nlo-e-prints-2021-01-14-to-2030-01-14/)
|
||||
- [What percent will software and information services contribute to US GDP in Q4 of 2030?](https://www.metaculus.com/questions/5958/it-as--of-gdp-in-q4-2030/)
|
||||
- [What will be the average top price performance (in G3D Mark /$) of the best available GPU on the following dates?](https://www.metaculus.com/questions/11241/top-price-performance-of-gpus/)
|
||||
|
||||
My impression is that these questions don't have the immediacy of the previous example about startups failing; they aren't incredibly connected to impending decisions. You could draft questions which are more connected to impending decisions, like asking about whether specific AI safety research agendas would succeed, whether AI safety organizations that were previously funded would be funded again, or about how Open Philanthropy would evaluate its own AI safety grant-making in the future. However, these might be worse qua forecasting questions, or at least less Metaculus-like.
|
||||
|
||||
Overall, my impression is that forecasting questions about AI haven't been tied to specific decisions in a way that would make them incredibly valuable. This is curious, because if we look at the recent intellectual history of forecasting, its original raison d'être was to make US intelligence reports more useful, and those reports were directly tied to decisions. But now forecasts are presented separately. In our experience, it has often been more meaningful for forecasters to look in depth at a topic, and then produce a report which contains predictions, rather than producing predictions alone. But this doesn't happen often.
|
||||
|
||||
### The phenomena of interest are really imprecise
|
||||
|
||||
Misha Yagudin recalls that he knows of at least five different operationalizations of "human-level AGI". "Existential risk" is also ambiguous: does it refer to human extinction? or to losing a large fraction of possible human potential? if so, how is "human potential" specified?
|
||||
|
||||
To deal with this problem, one can:
|
||||
|
||||
- Not spend much time on operationalization, and accept that different forecasters will be talking about slightly different concepts.
|
||||
- Try to specify concepts as precisely as possible, which involves a large amount of effort.
|
||||
|
||||
Neither of those options is great. Although some platforms like Manifold Markets and Polymarket are experimenting with under-specified questions, forecasting seems to work best when working with clear definitions. And the fact that this is expensive to do makes the topic of AI a bit of a bad fit for forecasting.
|
||||
|
||||
CSET had a great report trying to address this difficulty: [Future Indices](https://search.nunosempere.com/search?q=Future%20Indices). By having a few somewhat overlapping questions on a topic, e.g., a few distinct operationalizations of AGI, or a few proxies that capture different aspects of a domain of interest, we can have a summary index that better captures the fuzzy concept that we are trying to reason about than any one imperfect question.
|
||||
|
||||
That approach does make dealing with imprecise phenomena easier. But it increases costs, and a bundle of very similar questions can sometimes be dull to forecast on. It also doesn't solve this problem completely—some concepts, like "disempowering humanity", still remain very ambiguous.
|
||||
|
||||
Here are some high-level examples for which operationalization might still be a concern:
|
||||
|
||||
- You might want to ask about whether "AI will go well". The answer depends on whether you compare this against "humanity's maximum potential" or against human extinction.
|
||||
- You might want to ask whether any AI startup will "have powers akin to that of a world government".
|
||||
- You might want to ask about whether measures taken by AI labs are "competent".
|
||||
- You might want to ask about whether some AI system is "human-level", and find that there are wildly different operationalizations available for this.
|
||||
|
||||
Here are some lower-level but more specific examples:
|
||||
|
||||
- Asking about FLOPs/$ seems like a tempting abstraction at first, because then you can estimate the FLOPs if the largest experiment is willing to spend $100M, $1B, $10B, etc. However, the abstraction ends up breaking down a bit when you look at specifics.
|
||||
- Dollars are unspecified: For example, consider a group like [Inflection](https://www.reuters.com/technology/inflection-ai-raises-13-bln-funding-microsoft-others-2023-06-29/), which raises $1B from NVIDIA and Microsoft, and pays NVIDIA and Microsoft $1B to buy the chips and build the datacenters. Then the FLOPs/$ is very under-defined. OpenAI's deal with Microsoft also makes their FLOPS/$ ambiguous. If China becomes involved, their ability to restrict emigration and the pre-eminent role of their government in the economy also makes FLOPs/$ ambiguous.
|
||||
- FLOPs are under-specified. Do you mean 64-bit precision? 16-bit precision? 8-bit precision? Do you count a [multiply-accumulate](https://wikiless.nunosempere.com/wiki/Multiply%E2%80%93accumulate_operation?lang=en) operation as one FLOP or two FLOPs?
|
||||
- Asking about what percentage of labor is automated gets tricky when, instead of automating exactly past labor, you automate a complement. For example, instead of automating a restaurant as is, you design the menu and experience that is most amenable to being automated. Portable music devices don't automate concert halls, they provide a different experience. These differences matter when asking short-term resolvable questions about automation.
|
||||
- You might have some notion of a "leading lab". But operationalizing this is tricky, and simply enumerating current "leading labs" risks them being sidelined by an upstart, or that list not including important Chinese labs, etc. In our case, we have operationalized "leading lab" as "a lab that has performed a training run within 2 OOM of the largest ever at the time of the training run, within the last 2 years", which leans on the inclusive side, but requires keeping good data on what the largest training run is at each point in time, like [here](https://epochai.org/research/ml-trends), which might not be available in the future. A minimal sketch of this check appears after this list.
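As an illustration of how mechanical that operationalization becomes once the underlying data exists, here is a small sketch; the labs, years and FLOP figures are placeholders, not real data.

```c
/* Sketch of the "leading lab" operationalization above: a lab counts as
   leading if, within the last 2 years, it performed a training run within
   2 orders of magnitude of the largest run ever at the time of that run.
   All labs and FLOP figures below are placeholders, not real data.        */
#include <stdio.h>
#include <string.h>

struct run { const char *lab; double flop; int year; };

static int is_leading(const struct run *runs, int n, const char *lab, int current_year)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(runs[i].lab, lab) != 0) continue;
        if (runs[i].year < current_year - 2) continue;   /* run is too old    */
        double largest = 0;                              /* largest run ever  */
        for (int j = 0; j < n; j++)                      /* at the time       */
            if (runs[j].year <= runs[i].year && runs[j].flop > largest)
                largest = runs[j].flop;
        if (runs[i].flop >= largest / 100.0)             /* within 2 OOM      */
            return 1;
    }
    return 0;
}

int main(void)
{
    struct run runs[] = {
        {"LabA", 1e25, 2023}, {"LabB", 5e23, 2023}, {"LabC", 1e22, 2020},
    };
    int n = sizeof(runs) / sizeof(runs[0]);
    printf("LabB leading in 2023? %d\n", is_leading(runs, n, "LabB", 2023));
    printf("LabC leading in 2023? %d\n", is_leading(runs, n, "LabC", 2023));
    return 0;
}
```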
|
||||
|
||||
### Many questions don't resolve until it's already too late
|
||||
|
||||
Some of the questions we are most interested in, like "will AI permanently disempower humanity", "will there be a catastrophe caused by an AI system that kills >5%, or >95% of the human population", or "over the long-term, will humanity manage to harness AI to bring forth a flourishing future & achieve humanity's potential?" don't resolve until it's already too late.
|
||||
|
||||
This adds complications, because:
|
||||
|
||||
- Using short-term proxies rather than long-term outcomes brings its own problems
|
||||
- Question resolution after transformative AI poses incentive problems. E.g., the answer incentivized by "will we get unimaginable wealth?" is "no", because if we do get unimaginable wealth, the reward is worth less.
|
||||
- You may have ["prevention paradox"](https://en.wikipedia.org/wiki/Prevention_paradox) and fixed-point problems, where asking a probability reveals that some risk is high, after which you take measures to reduce that risk. You could have asked about the probability conditional on taking no measures, but then you can't resolve the forecasting question.
|
||||
- You can chain forecasts, e.g., ask "what will [another group] predict that the probability of [some future outcome] is, in one year". But this adds layers of indirection and increases operational burdens.
|
||||
|
||||
Another way to frame this is that some stances about how the future of AI will go are unfalsifiable until a hypothesized treacherous turn in which humanity dies, and their proponents don't hold strong enough views on short-term developments to be willing to bet on short-term events. That seems to be the takeaway from the [late 2021 MIRI conversations](https://www.lesswrong.com/s/n945eovrA3oDueqtq), which didn't result in a string of $100k bets. While this is a disappointing position to be in, I'm not sure that forecasting can do much here beyond pointing it out.
|
||||
|
||||
### More dataset gathering is needed
|
||||
|
||||
A pillar of Tetlock-style forecasting is looking at historical frequencies and extrapolating trends. For the topic of AI, it might be interesting to do some systematic data gathering, in the style of Our World In Data-type work, on measures like:
|
||||
|
||||
- Algorithmic improvement for [chess/image classification/weather prediction/...]: how much compute do you need for equivalent performance? what performance can you get for equivalent compute?
|
||||
- Price of FLOPs
|
||||
- Size of models
|
||||
- Valuation of AI companies, number of AI companies through time
|
||||
- Number of organizations which have trained a model within 1, 2 OOM of the largest model
|
||||
- Performance on various capability benchmarks
|
||||
- Very noisy proxies: Machine learning papers uploaded to arXiv, mentions in political speeches, mentions in American legislation, Google n-gram frequency, mentions in major newspaper headlines, patents, number of PhD students, number of Sino-American collaborations, etc.
|
||||
- Answers to AI Impacts' survey of ML researchers through time
|
||||
- Funding directed to AI safety through time
|
||||
|
||||
Note that datasets for some of these exist, but systematic data collection and presentation in the style of [Our World In Data](https://ourworldindata.org/) would greatly simplify creating forecasting pipelines about these questions, and also produce an additional tool for figuring out "what is going on" at a high level with AI. As an example, there is a difference between "Katja Grace polls ML researchers every few years", and "there are pipelines in place to make sure that that survey happens regularly, and forecasting questions are automatically created five years in advance and included in forecasting tournaments with well-known rewards". [Epoch](https://epochai.org/) is doing some good work in this domain.
|
||||
|
||||
### Forecasting AI hits the limits of Bayesianism in general
|
||||
|
||||
One could answer worries about Tetlock-style forecasting by saying: sure, that particular brand of forecasting isn't known to work on long-term predictions. But we have good theoretical reasons to think that Bayesianism is a good model of a perfect reasoner: see for example the review of [Cox's theorem](https://en.wikipedia.org/wiki/Cox%27s_theorem) in the first few chapters of [Probability Theory: The Logic of Science](https://annas-archive.org/md5/ddec0cf1982afa288d61db3e1f7d9323). So the thing that we should be doing is some version of subjective Bayesianism: keeping track of evidence and expressing and sharpening our beliefs with further evidence. See [here](https://nunosempere.com/blog/2022/08/31/on-cox-s-theorem-and-probabilistic-induction/) for a blog post making this argument at more length, though still informally.
|
||||
|
||||
But Bayesianism is a good model of a perfect reasoner with *infinite compute* and *infinite memory*, and in particular with access to a bag of hypotheses which contains the true hypothesis. However, humans don't have infinite compute, and sometimes don't have the correct hypothesis in mind. [Knightian uncertainty](https://en.wikipedia.org/wiki/Knightian_uncertainty), [Kuhnian revolutions](https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions)[^kuhn], [black swans](https://en.wikipedia.org/wiki/Black_swan_theory) or [ambiguity aversion](https://en.wikipedia.org/wiki/Ambiguity_aversion) can be understood as consequences of normally getting by with being approximately Bayesian, but sometimes getting bitten by that approximation being bounded and limited.
|
||||
|
||||
[^kuhn]: To spell this out more clearly, Kuhn was looking at the structure of scientific revolutions, and he notices that you have these "paradigm changes" every now and then. As a naïve Bayesian, those paradigm changes are kind of confusing, and shouldn't have any special status. You should just have hypotheses, and they should just rise and fall in likelihood according to Bayes' rule. But as a Bayesian who knows he has finite compute/memory, you can think of Kuhnian revolutions as encountering a true hypothesis which was outside your previous hypothesis space, and having to recalculate. On this topic, see [Just-in-time Bayesianism](https://nunosempere.com/blog/2023/02/04/just-in-time-bayesianism/) or [A computable version of Solomonoff induction](https://nunosempere.com/blog/2023/03/01/computable-solomonoff/).
|
||||
|
||||
So there are some situations where we can get along by being approximately Bayesian, like coin flips and blackjack tables; some domains where we pull our hair out and accept that we don't have infinite compute, like maybe some turbulent and chaotic physical systems, or trying to predict dreams; and some domains in which our ability to predict is meaningfully improving with time, like weather forecasts, where we can throw supercomputers and PhD students at the problem, because we care.
|
||||
|
||||
Now the question is where AI in particular falls within that spectrum. Personally, I suspect that it is a domain in which we are likely to not have the correct hypothesis in our prior set of hypotheses. For example, observers in general, but also the [Machine Intelligence Research Institute](https://intelligence.org/) in particular, failed to predict the rise of LLMs and to orient their efforts towards making such systems safer, or towards preventing such systems from coming into existence. I think this tweet, though maybe meant to be hurtful, is also informative about how tricky a domain predicting AI progress is:
|
||||
|
||||
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">eliezer has IMO done more to accelerate AGI than anyone else.<br><br>certainly he got many of us interested in AGI, helped deepmind get funded at a time when AGI was extremely outside the overton window, was critical in the decision to start openai, etc.</p>— Sam Altman (@sama) <a href="https://twitter.com/sama/status/1621621724507938816?ref_src=twsrc%5Etfw">February 3, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
|
||||
|
||||
However, consider the following caveat: imagine that instead of being interested in AI progress, we were interested in social science, and concerned that social scientists couldn't arrive at the correct conclusion in cases where it was Republican-flavored. Then, one could notice that moving from p-values to likelihood ratios and Bayesian calculations wouldn't particularly help, since Bayesianism doesn't work unless your prior assigns a sufficiently high probability to the correct hypothesis. In this case, I think one easy mistake to make might be to just shrug and keep using p-values.
|
||||
|
||||
Similarly, for AI progress, one could notice that there is this subtle critique of forecasting and Bayesianism, and move to using, I don't know, scenario planning, which arguendo could be even worse: it could assume even more strongly that you know the shape of events to come, or fail to provide mechanisms for noticing that none of your hypotheses are worth much. I think that would be a mistake.
|
||||
|
||||
### Forecasting also has a bunch of other limitations as a genre
|
||||
|
||||
You can see forecasting as a type of genre. In it, someone writes a forecasting question, that question is deemed sufficiently robust, and then forecasters produce probabilities on it. As a genre, it has some limitations. For instance, when curious about a topic, not all roads lead to forecasting questions, and working on a project such that you *have* to produce forecasting questions could be oddly limiting.
|
||||
|
||||
The conventions of the forecasting genre also dictate that forecasters will spend a fairly short amount of time researching before making a prediction. Partly this is a result of, for example, the scoring rule in Metaculus, which incentivizes forecasting on many questions. Partly this is because forecasting platforms don't generally pay their forecasters, and even those that are [well funded](https://www.openphilanthropy.org/grants/applied-research-laboratory-for-intelligence-and-security-forecasting-platforms/) pay their forecasters badly, which leads to forecasting being a hobby rather than a full-time occupation. If one thinks that some questions require one to dig deep, and that one will otherwise easily produce shitty forecasts, this might be a particularly worrying feature of the genre.
|
||||
|
||||
Perhaps also as a result of its unprofitability, the forecasting community has also tended to see a large amount of churn, as hobbyist forecasters rise up in their regular careers and it becomes more expensive for them in terms of income lost to forecast on online platforms. You also see this churn in terms of employees of these forecasting platforms, where maybe someone creates some new project—e.g., Replication Markets, Metaculus' AI Progress Tournament, Ought's Elicit, etc.—but then that project dies as its principal person moves on to other topics.
|
||||
|
||||
Forecasting also makes use of scoring rules, which aim to reward forecasters such that they are incentivized to input their true probabilities. Sadly, these often have the effect of incentivizing people not to collaborate and share information. This can be fixed by using more capital-intensive scoring rules that incentivize collaboration, like [these ones](https://github.com/SamotsvetyForecasting/optimal-scoring), or by grouping forecasters into teams such that they are incentivized to share information within a team.
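To make "incentivized to input their true probabilities" concrete, here is a small numerical sketch: under the Brier score (treated as a penalty), a forecaster who believes the true probability is 70% minimizes their expected penalty by reporting exactly 70%, not by shading up or down. This is a standard property of proper scoring rules, shown here numerically rather than proved.

```c
/* Expected Brier penalty when the forecaster believes P(event) = 0.7,
   as a function of the probability they actually report.
   A proper scoring rule is minimized (in expectation) at the true belief. */
#include <stdio.h>

int main(void)
{
    double belief = 0.7;  /* forecaster's true probability */
    for (double report = 0.5; report <= 0.9001; report += 0.1) {
        /* expected penalty = P(event)*(report-1)^2 + P(no event)*(report-0)^2 */
        double expected = belief * (report - 1) * (report - 1)
                        + (1 - belief) * report * report;
        printf("report %.1f -> expected Brier penalty %.3f\n", report, expected);
    }
    return 0;
}
```

Running this shows the expected penalty bottoming out at a report of 0.7, which is the sense in which the rule is "proper".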
|
||||
|
||||
### As an aside, here is a casual review of the track record of long-term predictions
|
||||
|
||||
If we review the track record of superforecasters on longer term questions, we find that... there isn't that much evidence here—remember that the [ACE program](https://wikiless.nunosempere.com/wiki/Aggregative_Contingent_Estimation_Program?lang=en) started in 2010. In *Superforecasting* (2015), Tetlock wrote:
|
||||
|
||||
> Taleb, Kahneman, and I agree there is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious—“there will be conflicts”—and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts. These limits on predictability are the predictable results of the butterfly dynamics of nonlinear systems. In my EPJ research, the accuracy of expert predictions declined toward chance five years out. And yet, this sort of forecasting is common, even within institutions that should know better.
|
||||
|
||||
However, on p. 33 of [Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4377599) (2023), we see that the experts predicting "slow-motion variables" 25 years into the future attain a Brier score of 0.07, which isn't terrible.
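
For reference, here is how a Brier score is computed, together with two baselines against which to read that 0.07 figure (this is just the definition, not the paper's exact methodology):

```
// Mean Brier score over a set of forecasts.
// Each forecast is { p: probability given, o: 1 if it happened, 0 if it didn't }.
function meanBrier(forecasts) {
  const total = forecasts.reduce((acc, f) => acc + (f.p - f.o) ** 2, 0);
  return total / forecasts.length;
}

// Baselines: always answering 50% scores 0.25; a perfect forecaster scores 0.
console.log(meanBrier([{ p: 0.5, o: 1 }, { p: 0.5, o: 0 }])); // 0.25
console.log(meanBrier([{ p: 1, o: 1 }, { p: 0, o: 0 }]));     // 0
```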
|
||||
|
||||
Karnofsky, the erstwhile head-honcho of Open Philanthropy, [spins](https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/) some research by Arb and others as saying that the track record of futurists is "fine". [Here](https://danluu.com/futurist-predictions/) is a more thorough post by Dan Luu which concludes that:
|
||||
|
||||
> ...people who were into "big ideas" who use a few big hammers on every prediction combined with a cocktail party idea level of understanding of the particular subject to explain why a prediction about the subject would fall to the big hammer generally fared poorly, whether or not their favored big ideas were correct. Some examples of "big ideas" would be "environmental doomsday is coming and hyperconservation will pervade everything", "economic growth will create near-infinite wealth (soon)", "Moore's law is supremely important", "quantum mechanics is supremely important", etc. Another common trait of poor predictors is lack of anything resembling serious evaluation of past predictive errors, making improving their intuition or methods impossible (unless they do so in secret). Instead, poor predictors often pick a few predictions that were accurate or at least vaguely sounded similar to an accurate prediction and use those to sell their next generation of predictions to others.
|
||||
>
|
||||
> By contrast, people who had (relatively) accurate predictions had a deep understanding of the problem and also tended to have a record of learning lessons from past predictive errors. Due to the differences in the data sets between this post and Tetlock's work, the details are quite different here. The predictors that I found to be relatively accurate had deep domain knowledge and, implicitly, had access to a huge amount of information that they filtered effectively in order to make good predictions. Tetlock was studying people who made predictions about a wide variety of areas that were, in general, outside of their areas of expertise, so what Tetlock found was that people really dug into the data and deeply understood the limitations of the data, which allowed them to make relatively accurate predictions. But, although the details of how people operated are different, at a high-level, the approach of really digging into specific knowledge was the same.
|
||||
|
||||
### In comparison with other mechanisms for making sense of future AI developments, forecasting does OK
|
||||
|
||||
Here are some mechanisms that the Effective Altruism community has historically used to try to make sense of possible dangers stemming from future AI developments:
|
||||
|
||||
- Books, like Bostrom's *Superintelligence*, which focused on the abstract properties of highly intelligent and capable agents in the limit.
|
||||
- [Reports](https://www.openphilanthropy.org/research/?q=&focus-area%5B%5D=potential-risks-advanced-ai&content-type%5B%5D=research-reports) by Open Philanthropy. They either try to model AI progress in some detail, like [example 1](https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines), or look at priors on technological development, like [example 2](https://www.openphilanthropy.org/research/semi-informative-priors-over-ai-timelines/).
|
||||
- Mini think tanks, like Rethink Priorities, Epoch or AI Impacts, which produce their own research and reports.
|
||||
- Larger think tanks, like CSET, which produce reports like [this one](https://cset.georgetown.edu/publication/future-indices/) on Future Indices.
|
||||
- Online discussion on lesswrong.com, which typically assumes things like: that intelligence gains would be fast and explosive, that we should aim to design a mathematical construction that guarantees safety, that iteration would not be advisable in the face of fast intelligence gains, etc.
|
||||
- Occasionally, theoretical or mathematical arguments or models of risk.
|
||||
- One-off projects, like Drexler's [Comprehensive AI Services](https://www.fhi.ox.ac.uk/reframing/)
|
||||
- Questions on forecasting platforms, like Metaculus, that try to solidly operationalize possible AI developments and dangers, and ask their forecasters to anticipate when and whether they will happen.
|
||||
- Writeups from forecasting groups, like [Samotsvety](https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts)
|
||||
- More recently, the Forecasting Research Institute's [existential risk tournament/experiment writeup](https://forecastingresearch.org/xpt), which has tried to translate geopolitical forecasting mechanisms to predicting AI progress, with mixed success.
|
||||
- Deferring to intellectuals, ideologues, and cheerleaders, like Toby Ord, Yudkowsky or MacAskill.
|
||||
|
||||
None of these options, as they currently exist, seem great. Forecasting has the hurdles discussed above, but maybe other mechanisms have even worse downsides, particularly the more pundit-like ones. Conversely, forecasting will be worse than deferring to a brilliant theoretical mind that is able to grasp the dynamics and subtleties of future AI development, like perhaps Drexler's on a good day.
|
||||
|
||||
Anyways, you might think that this forecasting thing shows potential. Were you a billionaire, money would not be a limitation for you, so...
|
||||
|
||||
### In this situation, here are some strategies of which you might avail yourself
|
||||
|
||||
#### A. Accept the Faustian bargain
|
||||
|
||||
1. Make a bunch of short-term and long-term forecasting questions on AI progress
|
||||
2. Wait for the short-term forecasting questions to resolve
|
||||
3. Weight the forecasts for the long-term questions according to accuracy in the short term questions
|
||||
|
||||
This is a Faustian bargain because of the reasons reviewed above, chiefly that short-term forecasting performance is not a guarantee of longer term forecasting performance. A cheap version of this would be to look at the best short-term forecasters on the AI categories on Metaculus, and report their probabilities on a few AI and existential risk questions, which would be more interpretable than the current opaque "Metaculus prediction".
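
As a sketch of step 3, with invented names and numbers, and with an arbitrary choice of weighting function:

```
// Sketch: weight each forecaster's long-term probability by their short-term track record.
// The inverse-Brier weighting below is one arbitrary choice among many.
const forecasters = [
  { name: "A", shortTermBrier: 0.10, longTermForecast: 0.30 },
  { name: "B", shortTermBrier: 0.20, longTermForecast: 0.10 },
  { name: "C", shortTermBrier: 0.25, longTermForecast: 0.60 },
];

const weights = forecasters.map(f => 1 / f.shortTermBrier);
const totalWeight = weights.reduce((a, b) => a + b, 0);
const aggregate = forecasters.reduce(
  (acc, f, i) => acc + (weights[i] / totalWeight) * f.longTermForecast,
  0
);
console.log(aggregate); // ≈ 0.31, a weighted aggregate of the long-term forecasts
```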
|
||||
|
||||
If you think that your other methods of making sense of what is going on are sufficiently bad, you could choose this and hope for the best? Or, conversely, you could anchor your beliefs on a weighted aggregate of the best short-term forecasters and the most convincing theoretical views. Maybe things will be fine?
|
||||
|
||||
#### B. Attempt to do a Bayesianism
|
||||
|
||||
Go to the effort of rigorously formulating hypotheses, then keep track of incoming evidence for each hypothesis. If a new hypothesis comes in, try to do some version of [just-in-time Bayesianism](https://nunosempere.com/blog/2023/02/04/just-in-time-bayesianism/), i.e., monkey-patch it after the fact. Once you are specifying your beliefs numerically, you can deploy some cute incentive mechanisms and [reward people who change your mind](https://github.com/SamotsvetyForecasting/optimal-scoring/blob/master/3-amplify-bayesian/amplify-bayesian.pdf).
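
As a sketch of what keeping numerical track of a few hypotheses could look like (the hypotheses and likelihoods below are invented for illustration):

```
// Sketch: maintain explicit probabilities over rigorously formulated hypotheses,
// and update them with likelihoods as evidence comes in.
let hypotheses = {
  "scaling continues smoothly": 0.5,
  "progress plateaus": 0.3,
  "something else entirely": 0.2, // catch-all, to be monkey-patched later
};

// likelihoods[h] is P(observed evidence | h)
function update(priors, likelihoods) {
  const unnormalized = Object.entries(priors).map(([h, p]) => [h, p * likelihoods[h]]);
  const z = unnormalized.reduce((acc, [, p]) => acc + p, 0);
  return Object.fromEntries(unnormalized.map(([h, p]) => [h, p / z]));
}

hypotheses = update(hypotheses, {
  "scaling continues smoothly": 0.8,
  "progress plateaus": 0.3,
  "something else entirely": 0.5,
});
console.log(hypotheses);
```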
|
||||
|
||||
Hope that keeping track of hypotheses about the development of AI at least gives you some discipline, and enables you to shed untrue hypotheses or frames a bit earlier than you otherwise would have. Have the discipline to translate the worldviews of various pundits into specific probabilities[^tetlock], and listen to them less when their predictions fail to come true. And hope that going to the trouble of doing things that way allows you to anticipate stuff 6 months to 2 years sooner than you would have otherwise, and that it is worth the cost.
|
||||
|
||||
[^tetlock]: Back in the day, Tetlock received a [grant](https://www.openphilanthropy.org/grants/university-of-pennsylvania-philip-tetlock-on-forecasting/#2-about-the-grant) to "systematically convert vague predictions made by prominent pundits into explicit numerical forecasts", but I haven't been able to track what happened to it, and I suspect it never happened.
|
||||
|
||||
#### C. Invest in better prediction pipelines as a whole
|
||||
|
||||
Try to build up some more speculative and [formidable](https://nunosempere.com/blog/2023/07/19/better-harder-faster-stronger/) type of forecasting that can deal with the hurdles above. Be more explicit about the types of decisions that you want better foresight for, realize that you don't have the tools you need, and build someone up to be that for you.
|
@ -0,0 +1,5 @@
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
@ -0,0 +1,5 @@
|
||||
MARKDOWN=/usr/bin/markdown -f fencedcode -f ext -f footnote -f latex
|
||||
build:
|
||||
$(MARKDOWN) index.md > temp
|
||||
cat title.md temp isso-snippet.txt > ../index.md
|
||||
rm temp
|
@ -0,0 +1,3 @@
|
||||
Hurdles of using forecasting as a tool for making sense of AI progress
|
||||
======================================================================
|
||||
|
@ -0,0 +1,234 @@
|
||||
Hurdles of using forecasting as a tool for making sense of AI progress
|
||||
======================================================================
|
||||
|
||||
<h3>Introduction</h3>
|
||||
|
||||
<p>In recent years there have been various attempts at using forecasting to discern the shape of the future development of artificial intelligence, like the <a href="https://www.metaculus.com/tournament/ai-progress/">AI progress Metaculus tournament</a>, the Forecasting Research Institute’s <a href="https://forum.effectivealtruism.org/posts/un42vaZgyX7ch2kaj/announcing-forecasting-existential-risks-evidence-from-a">existential risk forecasting tournament/experiment</a>, <a href="https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts">Samotsvety forecasts</a> on the topic of AI progress and dangers, or various questions on <a href="https://www.infer-pub.com">INFER</a> on short-term technological progress.</p>
|
||||
|
||||
<p>Here is a list of reasons, written with early input from Misha Yagudin, on why using forecasting to make sense of AI developments can be tricky, as well as some casual suggestions of ways forward.</p>
|
||||
|
||||
<h3>Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions</h3>
|
||||
|
||||
<p>Here are some reasons why we might expect longer-term predictions to be more difficult:</p>
|
||||
|
||||
<ol>
|
||||
<li>No fast feedback loops for long-term questions. You can’t get that many predict/check/improve cycles, because questions many years into the future, tautologically, take many years to resolve. There are shortcuts, like this <a href="https://www.quantifiedintuitions.org/pastcasting">past-casting</a> app, but they are imperfect.</li>
|
||||
<li>It’s possible that short-term forecasters might acquire habits and intuitions that are good for forecasting short-term events, but bad for forecasting longer-term outcomes. For example, “things will change more slowly than you think” is a good heuristic to acquire for short-term predictions, but might be a bad heuristic for longer-term predictions, in the same sense that “people overestimate what they can do in a week, but underestimate what they can do in ten years”. This might be particularly insidious to the extent that forecasters acquire intuitions which they can see are useful, but can’t tell where they come from. In general, it seems unclear to what extent short-term forecasting skills would generalize to skill at longer-term predictions.</li>
|
||||
<li>“Predict no change” in particular might do well, until it doesn’t. Consider a world which has a 2% yearly probability of seeing a worldwide pandemic, or some other large catastrophe. Then on average it will take 50 years for one to occur. But until that point, those predicting 2% will have a poorer track record compared to those who are predicting ~0% (see the sketch after this list).</li>
|
||||
<li>In general, we have been in a period of comparative technological stagnation, and forecasters might be adapted to that, in the same way that e.g., startups adapted to low interest rates.</li>
|
||||
<li>Sub-sampling artifacts within good short-term forecasters are tricky. For example, my forecasting group Samotsvety is relatively bullish on transformative technological change from AI, whereas the Forecasting Research Institute’s pick of forecasters for their existential risk survey was more bearish.</li>
|
||||
</ol>
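<p>As a minimal sketch of the third point above, with invented numbers, here is how the yearly Brier penalties compare for a forecaster who always gives a 2% yearly probability of catastrophe and one who always gives ~0%, for as long as nothing happens:</p>

<pre><code>// Sketch (invented numbers): per-year Brier penalties while no catastrophe occurs.
const brierOfTwoPercent  = (0.02 - 0) ** 2; // 0.0004: slightly penalized every single year
const brierOfZeroPercent = (0.00 - 0) ** 2; // 0: looks perfect...
const brierWhenItHappens = (0.00 - 1) ** 2; // 1: ...until the year it isn't
console.log(brierOfTwoPercent, brierOfZeroPercent, brierWhenItHappens);
</code></pre>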
|
||||
|
||||
|
||||
<h3>Forecasting loses value when decontextualized, and current forecasting seems pretty decontextualized</h3>
|
||||
|
||||
<p>Forecasting seems more valuable when it is commissioned to inform a specific decision. For instance, suppose that you were thinking of starting a new startup. Then it would be interesting to look at:</p>
|
||||
|
||||
<ul>
|
||||
<li>The base rate of success for startups</li>
|
||||
<li>The base rate of success for all new businesses</li>
|
||||
<li>The base rate of success for startups that your friends and wider social circle have started</li>
|
||||
<li>Your personal rate of success at things in life</li>
|
||||
<li>The inside view: decomposing the space between now and potential success into steps and giving explicit probabilities to each step</li>
|
||||
<li>etc.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>With this in mind, you could estimate the distribution of monetary returns to starting a startup, vs e.g., remaining an employee somewhere, and make the decision about what to do next with that estimate as an important factor.</p>
|
||||
|
||||
<p>But our impression is that AI forecasting hasn’t been tied to specific decisions like that. Instead, it has tended to ask questions that might contribute to a “holistic understanding” of the field. For example, look at <a href="https://www.metaculus.com/tournament/ai-progress/">Metaculus' AI progress tournament</a>. The first few questions are:</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="https://www.metaculus.com/questions/6299/nlo-e-prints-2021-01-14-to-2030-01-14/">How many Natural Language Processing e-prints will be published on arXiv over the 2021-01-14 to 2030-01-14 period?</a></li>
|
||||
<li><a href="https://www.metaculus.com/questions/5958/it-as--of-gdp-in-q4-2030/">What percent will software and information services contribute to US GDP in Q4 of 2030?</a></li>
|
||||
<li><a href="https://www.metaculus.com/questions/11241/top-price-performance-of-gpus/">What will be the average top price performance (in G3D Mark /$) of the best available GPU on the following dates?</a></li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>My impression is that these questions don’t have the immediacy of the previous example about startups failing; they aren’t incredibly connected to impending decisions. You could draft questions which are more connected to impending decisions, like asking about whether specific AI safety research agendas would succeed, whether AI safety organizations that were previously funded would be funded again, or about how Open Philanthropy would evaluate its own AI safety grant-making in the future. However, these might be worse qua forecasting questions, or at least less Metaculus-like.</p>
|
||||
|
||||
<p>Overall, my impression is that forecasting questions about AI haven’t been tied to specific decisions in a way that would make them incredibly valuable. This is curious, because if we look at the recent intellectual history of forecasting, its original raison d'être was to make US intelligence reports more useful, and those reports were directly tied to decisions. But now forecasts are presented separately. In our experience, it has often been more meaningful for forecasters to look in depth at a topic, and then produce a report which contains predictions, rather than producing predictions alone. But this doesn’t happen often.</p>
|
||||
|
||||
<h3>The phenomena of interest are really imprecise</h3>
|
||||
|
||||
<p>Misha Yagudin recalls that he knows of at least five different operationalizations of “human-level AGI”. “Existential risk” is also ambiguous: does it refer to human extinction? or to losing a large fraction of possible human potential? if so, how is “human potential” specified?</p>
|
||||
|
||||
<p>To deal with this problem, one can:</p>
|
||||
|
||||
<ul>
|
||||
<li>Not spend much time on operationalization, and accept that different forecasters will be talking about slightly different concepts.</li>
|
||||
<li>Try to specify concepts as precisely as possible, which involves a large amount of effort.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>Neither of those options is great. Although some platforms like Manifold Markets and Polymarket are experimenting with under-specified questions, forecasting seems to work best when working with clear definitions. And the fact that this is expensive to do makes the topic of AI a bit of a bad fit for forecasting.</p>
|
||||
|
||||
<p>CSET had a great report trying to address this difficulty: <a href="https://search.nunosempere.com/search?q=Future%20Indices">Future Indices</a>. By having a few somewhat overlapping questions on a topic, e.g., a few distinct operationalizations of AGI, or a few proxies that capture different aspects of a domain of interest, we can have a summary index that better captures the fuzzy concept that we are trying to reason about than any one imperfect question.</p>
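<p>Here is a minimal sketch of that general idea, not CSET’s actual methodology: take forecasts on several overlapping operationalizations of a fuzzy concept and combine them into one index.</p>

<pre><code>// Sketch: average forecasts on overlapping operationalizations into a single index.
// Question names and probabilities are invented.
const forecasts = {
  "operationalization A of AGI by 2040": 0.35,
  "operationalization B of AGI by 2040": 0.50,
  "operationalization C of AGI by 2040": 0.20,
};
let sum = 0;
let n = 0;
for (const p of Object.values(forecasts)) {
  sum += p;
  n += 1;
}
console.log(sum / n); // a crude index; per-question weights would be a natural refinement
</code></pre>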
|
||||
|
||||
<p>That approach does make dealing with imprecise phenomena easier. But it increases costs, and a bundle of very similar questions can sometimes be dull to forecast on. It also doesn’t solve this problem completely—some concepts, like “disempowering humanity”, still remain very ambiguous.</p>
|
||||
|
||||
<p>Here are some high-level examples for which operationalization might still be a concern:</p>
|
||||
|
||||
<ul>
|
||||
<li>You might want to ask about whether “AI will go well”. The answer depends on whether you compare this against “humanity’s maximum potential” or against human extinction.</li>
|
||||
<li>You might want to ask whether any AI startup will “have powers akin to that of a world government”.</li>
|
||||
<li>You might want to ask about whether measures taken by AI labs are “competent”.</li>
|
||||
<li>You might want to ask about whether some AI system is “human-level”, and find that there are wildly different operationalizations available for this.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>Here are some lower-level but more specific examples:</p>
|
||||
|
||||
<ul>
|
||||
<li>Asking about FLOPs/$ seems like a tempting abstraction at first, because then you can estimate the FLOPs if the largest experiment is willing to spend $100M, $1B, $10B, etc. However, the abstraction ends up breaking down a bit when you look at specifics.
|
||||
|
||||
<ul>
|
||||
<li>Dollars are unspecified: For example, consider a group like <a href="https://www.reuters.com/technology/inflection-ai-raises-13-bln-funding-microsoft-others-2023-06-29/">Inflection</a>, which raises $1B from NVIDIA and Microsoft, and pays NVIDIA and Microsoft $1B to buy the chips and build the datacenters. Then the FLOPs/$ is very under-defined. OpenAI’s deal with Microsoft also makes their FLOPs/$ ambiguous. If China becomes involved, their ability to restrict emigration and the pre-eminent role of their government in the economy also makes FLOPs/$ ambiguous.</li>
|
||||
<li>FLOPs are under-specified. Do you mean 64-bit precision? 16-bit precision? 8-bit precision? Do you count a <a href="https://wikiless.nunosempere.com/wiki/Multiply%E2%80%93accumulate_operation?lang=en">multiply-accumulate</a> operation as one FLOP or two FLOPs? (See the sketch after this list.)</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Asking about what percentage of labor is automated gets tricky when, instead of automating exactly past labor, you automate a complement. For example, instead of automating a restaurant as is, you design the menu and experience that is most amenable to being automated. Portable music devices don’t automate concert halls, they provide a different experience. These differences matter when asking short-term resolvable questions about automation.</li>
|
||||
<li>You might have some notion of a “leading lab”. But operationalizing this is tricky, and simply enumerating current “leading labs” risks them being sidelined by an upstart, or that list not including important Chinese labs, etc. In our case, we have operationalized “leading lab” as “a lab that has performed a training run within 2 OOM of the largest ever at the time of the training run, within the last 2 years”, which leans on the inclusive side, but requires keeping good data of what the largest training run is at each point in time, like <a href="https://epochai.org/research/ml-trends">here</a>, which might not be available in the future.</li>
|
||||
</ul>
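<p>As a small sketch of how much the accounting conventions alone can matter (the numbers are made up), consider the same training run counted under two FLOP conventions:</p>

<pre><code>// Sketch (made-up numbers): the same training run under different accounting conventions.
const multiplyAccumulateOps = 1e24; // MAC operations performed during the run
const dollarsSpent = 1e8;           // and "dollars" is itself ambiguous, as discussed above

const flopIfMacCountsAsOne = multiplyAccumulateOps * 1;
const flopIfMacCountsAsTwo = multiplyAccumulateOps * 2;

console.log(flopIfMacCountsAsOne / dollarsSpent); // 1e16 FLOP per dollar
console.log(flopIfMacCountsAsTwo / dollarsSpent); // 2e16 FLOP per dollar: a 2x gap from accounting alone
</code></pre>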
|
||||
|
||||
|
||||
<h3>Many questions don’t resolve until it’s already too late</h3>
|
||||
|
||||
<p>Some of the questions we are most interested in, like “will AI permanently disempower humanity”, “will there be a catastrophe caused by an AI system that kills >5%, or >95% of the human population”, or “over the long-term, will humanity manage to harness AI to bring forth a flourishing future & achieve humanity’s potential?” don’t resolve until it’s already too late.</p>
|
||||
|
||||
<p>This adds complications, because:</p>
|
||||
|
||||
<ul>
|
||||
<li>Using short-term proxies rather than long-term outcomes brings its own problems</li>
|
||||
<li>Question resolution after transformative AI poses incentive problems. E.g., the answer incentivized by “will we get unimaginable wealth?” is “no”, because if we do get unimaginable wealth, the reward is worth less.</li>
|
||||
<li>You may have <a href="https://en.wikipedia.org/wiki/Prevention_paradox">“prevention paradox”</a> and fixed-point problems, where asking a probability reveals that some risk is high, after which you take measures to reduce that risk. You could have asked about the probability conditional on taking no measures, but then you can’t resolve the forecasting question.</li>
|
||||
<li>You can chain forecasts, e.g., ask “what will [another group] predict that the probability of [some future outcome] is, in one year”. But this adds layers of indirection and increases operational burdens.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>Another way to frame this is that some stances about how the future of AI will go are unfalsifiable until a hypothesized treacherous turn in which humanity dies, but otherwise don’t imply strong enough views on short-term developments for their holders to be willing to bet on short-term events. That seems to be the takeaway from the <a href="https://www.lesswrong.com/s/n945eovrA3oDueqtq">late 2021 MIRI conversations</a>, which didn’t result in a string of $100k bets. While this is a disappointing position to be in, I’m not sure that forecasting can do much here beyond pointing it out.</p>
|
||||
|
||||
<h3>More dataset gathering is needed</h3>
|
||||
|
||||
<p>A pillar of Tetlock-style forecasting is looking at historical frequencies and extrapolating trends. For the topic of AI, it might be interesting to do some systematic data gathering, in the style of Our World In Data-type work, on measures like:</p>
|
||||
|
||||
<ul>
|
||||
<li>Algorithmic improvement for [chess/image classification/weather prediction/…]: how much compute do you need for equivalent performance? what performance can you get for equivalent compute?</li>
|
||||
<li>Price of FLOPs</li>
|
||||
<li>Size of models</li>
|
||||
<li>Valuation of AI companies, number of AI companies through time</li>
|
||||
<li>Number of organizations which have trained a model within 1, 2 OOM of the largest model</li>
|
||||
<li>Performance on various capability benchmarks</li>
|
||||
<li>Very noisy proxies: Machine learning papers uploaded to arXiv, mentions in political speeches, mentions in American legislation, Google n-gram frequency, mentions in major newspaper headlines, patents, number of PhD students, number of Sino-American collaborations, etc.</li>
|
||||
<li>Answers to AI Impacts' survey of ML researchers through time</li>
|
||||
<li>Funding directed to AI safety through time</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>Note that datasets for some of these exist, but systematic data collection and presentation in the style of <a href="https://ourworldindata.org/">Our World In Data</a> would greatly simplify creating forecasting pipelines about these questions, and also produce an additional tool for figuring out “what is going on” at a high level with AI. As an example, there is a difference between “Katja Grace polls ML researchers every few years”, and “there are pipelines in place to make sure that that survey happens regularly, and forecasting questions are automatically created five years in advance and included in forecasting tournaments with well-known rewards”. <a href="https://epochai.org/">Epoch</a> is doing some good work in this domain.</p>
|
||||
|
||||
<h3>Forecasting AI hits the limits of Bayesianism in general</h3>
|
||||
|
||||
<p>One could answer worries about Tetlock-style forecasting by saying: sure, that particular brand of forecasting isn’t known to work on long-term predictions. But we have good theoretical reasons to think that Bayesianism is a good model of a perfect reasoner: see for example the review of <a href="https://en.wikipedia.org/wiki/Cox%27s_theorem">Cox’s theorem</a> in the first few chapters of <a href="https://annas-archive.org/md5/ddec0cf1982afa288d61db3e1f7d9323">Probability Theory: The Logic of Science</a>. So the thing that we should be doing is some version of subjective Bayesianism: keeping track of evidence and expressing and sharpening our beliefs with further evidence. See <a href="https://nunosempere.com/blog/2022/08/31/on-cox-s-theorem-and-probabilistic-induction/">here</a> for a blog post making this argument at more length, though still informally.</p>
|
||||
|
||||
<p>But Bayesianism is a good model of a perfect reasoner with <em>infinite compute</em> and <em>infinite memory</em>, and in particular with access to a bag of hypotheses which contains the true hypothesis. However, humans don’t have infinite compute, and sometimes don’t have the correct hypothesis in mind. <a href="https://en.wikipedia.org/wiki/Knightian_uncertainty">Knightian uncertainty</a> and <a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions">Kuhnian revolutions</a><sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>, <a href="https://en.wikipedia.org/wiki/Black_swan_theory">black swans</a> or <a href="https://en.wikipedia.org/wiki/Ambiguity_aversion">ambiguity aversion</a> can be understood as consequences of normally being able to get by with being approximately Bayesian, but sometimes getting bitten by the fact that that approximation is bounded and limited.</p>
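<p>To make the point about having the correct hypothesis in the bag concrete, here is a small sketch with invented numbers: Bayes’ rule can never resurrect a hypothesis to which the prior assigns zero probability, no matter how strongly the evidence favors it.</p>

<pre><code>// Sketch: posterior of a hypothesis H after seeing evidence E, by Bayes' rule.
function posterior(priorOfH, likelihoodGivenH, likelihoodGivenNotH) {
  const numerator = priorOfH * likelihoodGivenH;
  return numerator / (numerator + (1 - priorOfH) * likelihoodGivenNotH);
}
console.log(posterior(0.10, 0.9, 0.1)); // 0.5: a live hypothesis gets boosted by strong evidence
console.log(posterior(0.00, 0.9, 0.1)); // 0: a hypothesis outside your prior set stays at zero
</code></pre>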
|
||||
|
||||
<p>So there are some situations where we can get along by being approximately Bayesian, like coin flips and blackjack tables; there are domains where we pull our hair out and accept that we don’t have infinite compute, like maybe some turbulent and chaotic physical systems, or trying to predict dreams. And then we have some domains in which our ability to predict is meaningfully improving with time, like weather forecasts, where we can throw supercomputers and PhD students at the problem, because we care.</p>
|
||||
|
||||
<p>Now the question is where AI in particular falls within that spectrum. Personally, I suspect that it is a domain in which we are likely to not have the correct hypothesis in our prior set of hypotheses. For example, observers in general, but also the <a href="https://intelligence.org/">Machine Intelligence Research Institute</a> in particular, failed to predict the rise of LLMs and to orient their efforts toward making such systems safer, or toward preventing such systems from coming into existence. I think this tweet, though maybe meant to be hurtful, is also informative about how tricky a domain predicting AI progress is:</p>
|
||||
|
||||
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">eliezer has IMO done more to accelerate AGI than anyone else.<br><br>certainly he got many of us interested in AGI, helped deepmind get funded at a time when AGI was extremely outside the overton window, was critical in the decision to start openai, etc.</p>— Sam Altman (@sama) <a href="https://twitter.com/sama/status/1621621724507938816?ref_src=twsrc%5Etfw">February 3, 2023</a></blockquote>
|
||||
|
||||
|
||||
<p> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
|
||||
|
||||
<p>However, consider the following caveat: imagine that instead of being interested in AI progress, we were interested in social science, and concerned that social scientists couldn’t arrive at the correct conclusion in cases where that conclusion was Republican-flavored. Then, one could notice that moving from p-values to likelihood ratios and Bayesian calculations wouldn’t particularly help, since Bayesianism doesn’t work unless your prior assigns a sufficiently high probability to the correct hypothesis. In this case, I think one easy mistake to make might be to just shrug and keep using p-values.</p>
|
||||
|
||||
<p>Similarly, for AI progress, one could notice that there is this subtle critique of forecasting and Bayesianism, and move to using, I don’t know, scenario planning, which arguendo could be even worse: it could assume even more strongly that you know the shape of events to come, or fail to provide mechanisms for noticing that none of your hypotheses are worth much. I think that would be a mistake.</p>
|
||||
|
||||
<h3>Forecasting also has a bunch of other limitations as a genre</h3>
|
||||
|
||||
<p>You can see forecasting as a genre. In it, someone writes a forecasting question, that question is deemed sufficiently robust, and then forecasters produce probabilities on it. As a genre, it has some limitations. For instance, when curious about a topic, not all roads lead to forecasting questions, and working in a project such that you <em>have</em> to produce forecasting questions could be oddly limiting.</p>
|
||||
|
||||
<p>The conventions of the forecasting genre also dictate that forecasters will spend a fairly short amount of time researching before making a prediction. Partly this is a result of, for example, the scoring rule in Metaculus, which incentivizes forecasting on many questions. Partly this is because forecasting platforms don’t generally pay their forecasters, and even those that are <a href="https://www.openphilanthropy.org/grants/applied-research-laboratory-for-intelligence-and-security-forecasting-platforms/">well funded</a> pay their forecasters badly, which leads to forecasting being a hobby rather than a full-time occupation. If one thinks that some questions require one to dig deep, and that one will otherwise easily produce shitty forecasts, this might be a particularly worrying feature of the genre.</p>
|
||||
|
||||
<p>Perhaps also as a result of its unprofitability, the forecasting community has tended to see a large amount of churn, as hobbyist forecasters rise in their regular careers and forecasting on online platforms becomes more expensive for them in terms of foregone income. You also see this churn among employees of forecasting platforms, where maybe someone creates some new project—e.g., Replication Markets, Metaculus' AI Progress Tournament, Ought’s Elicit, etc.—but then that project dies as its principal person moves on to other topics.</p>
|
||||
|
||||
<p>Forecasting also makes use of scoring rules, which aim to reward forecasters such that they will be incentivized to input their true probabilities. Sadly, these often have the effect of incentivizing people to not collaborate and share information. This can be fixed by using more capital-intensive scoring rules that incentivize collaboration, like <a href="https://github.com/SamotsvetyForecasting/optimal-scoring">these ones</a> or by grouping forecasters into teams such that they will be incentivized to share information within a team.</p>
|
||||
|
||||
<h3>As an aside, here is a casual review of the track record of long-term predictions</h3>
|
||||
|
||||
<p>If we review the track record of superforecasters on longer term questions, we find that… there isn’t that much evidence here—remember that the <a href="https://wikiless.nunosempere.com/wiki/Aggregative_Contingent_Estimation_Program?lang=en">ACE program</a> started in 2010. In <em>Superforecasting</em> (2015), Tetlock wrote:</p>
|
||||
|
||||
<blockquote><p>Taleb, Kahneman, and I agree there is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious—“there will be conflicts”—and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts. These limits on predictability are the predictable results of the butterfly dynamics of nonlinear systems. In my EPJ research, the accuracy of expert predictions declined toward chance five years out. And yet, this sort of forecasting is common, even within institutions that should know better.</p></blockquote>
|
||||
|
||||
<p>However, on p. 33 of <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4377599">Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment</a> (2023), we see that the experts predicting “slow-motion variables” 25 years into the future attain a Brier score of 0.07, which isn’t terrible.</p>
|
||||
|
||||
<p>Karnofsky, the erstwhile head-honcho of Open Philanthropy, <a href="https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/">spins</a> some research by Arb and others as saying that the track record of futurists is “fine”. <a href="https://danluu.com/futurist-predictions/">Here</a> is a more thorough post by Dan Luu which concludes that:</p>
|
||||
|
||||
<blockquote><p>…people who were into “big ideas” who use a few big hammers on every prediction combined with a cocktail party idea level of understanding of the particular subject to explain why a prediction about the subject would fall to the big hammer generally fared poorly, whether or not their favored big ideas were correct. Some examples of “big ideas” would be “environmental doomsday is coming and hyperconservation will pervade everything”, “economic growth will create near-infinite wealth (soon)”, “Moore’s law is supremely important”, “quantum mechanics is supremely important”, etc. Another common trait of poor predictors is lack of anything resembling serious evaluation of past predictive errors, making improving their intuition or methods impossible (unless they do so in secret). Instead, poor predictors often pick a few predictions that were accurate or at least vaguely sounded similar to an accurate prediction and use those to sell their next generation of predictions to others.</p>
|
||||
|
||||
<p>By contrast, people who had (relatively) accurate predictions had a deep understanding of the problem and also tended to have a record of learning lessons from past predictive errors. Due to the differences in the data sets between this post and Tetlock’s work, the details are quite different here. The predictors that I found to be relatively accurate had deep domain knowledge and, implicitly, had access to a huge amount of information that they filtered effectively in order to make good predictions. Tetlock was studying people who made predictions about a wide variety of areas that were, in general, outside of their areas of expertise, so what Tetlock found was that people really dug into the data and deeply understood the limitations of the data, which allowed them to make relatively accurate predictions. But, although the details of how people operated are different, at a high-level, the approach of really digging into specific knowledge was the same.</p></blockquote>
|
||||
|
||||
<h3>In comparison with other mechanisms for making sense of future AI developments, forecasting does OK</h3>
|
||||
|
||||
<p>Here are some mechanisms that the Effective Altruism community has historically used to try to make sense of possible dangers stemming from future AI developments:</p>
|
||||
|
||||
<ul>
|
||||
<li>Books, like Bostrom’s <em>Superintelligence</em>, which focused on the abstract properties of highly intelligent and capable agents in the limit.</li>
|
||||
<li><a href="https://www.openphilanthropy.org/research/?q=&focus-area%5B%5D=potential-risks-advanced-ai&content-type%5B%5D=research-reports">Reports</a> by Open Philanthropy. They either try to model AI progress in some detail, like <a href="https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines">example 1</a>, or look at priors on technological development, like <a href="https://www.openphilanthropy.org/research/semi-informative-priors-over-ai-timelines/">example 2</a>.</li>
|
||||
<li>Mini think tanks, like Rethink Priorities, Epoch or AI Impacts, which produce their own research and reports.</li>
|
||||
<li>Larger think tanks, like CSET, which produce reports like <a href="https://cset.georgetown.edu/publication/future-indices/">this one</a> on Future Indices.</li>
|
||||
<li>Online discussion on lesswrong.com, which typically assumes things like: that intelligence gains would be fast and explosive, that we should aim to design a mathematical construction that guarantees safety, that iteration would not be advisable in the face of fast intelligence gains, etc.</li>
|
||||
<li>Occasionally, theoretical or mathematical arguments or models of risk.</li>
|
||||
<li>One-off projects, like Drexler’s <a href="https://www.fhi.ox.ac.uk/reframing/">Comprehensive AI Services</a></li>
|
||||
<li>Questions on forecasting platforms, like Metaculus, that try to solidly operationalize possible AI developments and dangers, and ask their forecasters to anticipate when and whether they will happen.</li>
|
||||
<li>Writeups from forecasting groups, like <a href="https://forum.effectivealtruism.org/posts/EG9xDM8YRz4JN4wMN/samotsvety-s-ai-risk-forecasts">Samotsvety</a></li>
|
||||
<li>More recently, the Forecasting Research Institute’s <a href="https://forecastingresearch.org/xpt">existential risk tournament/experiment writeup</a>, which has tried to translate geopolitical forecasting mechanisms to predicting AI progress, with mixed success.</li>
|
||||
<li>Deferring to intellectuals, ideologues, and cheerleaders, like Toby Ord, Yudkowsky or MacAskill.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>None of these options, as they currently exist, seem great. Forecasting has the hurdles discussed above, but maybe other mechanisms have even worse downsides, particularly the more pundit-like ones. Conversely, forecasting will be worse than deferring to a brilliant theoretical mind that is able to grasp the dynamics and subtleties of future AI development, like perhaps Drexler’s on a good day.</p>
|
||||
|
||||
<p>Anyways, you might think that this forecasting thing shows potential. Were you a billionaire, money would not be a limitation for you, so…</p>
|
||||
|
||||
<h3>In this situation, here are some strategies of which you might avail yourself</h3>
|
||||
|
||||
<h4>A. Accept the Faustian bargain</h4>
|
||||
|
||||
<ol>
|
||||
<li>Make a bunch of short-term and long-term forecasting questions on AI progress</li>
|
||||
<li>Wait for the short-term forecasting questions to resolve</li>
|
||||
<li>Weight the forecasts for the long-term questions according to accuracy in the short term questions</li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>This is a Faustian bargain because of the reasons reviewed above, chiefly that short-term forecasting performance is not a guarantee of longer term forecasting performance. A cheap version of this would be to look at the best short-term forecasters on the AI categories on Metaculus, and report their probabilities on a few AI and existential risk questions, which would be more interpretable than the current opaque “Metaculus prediction”.</p>
|
||||
|
||||
<p>If you think that your other methods of making sense of what is going on are sufficiently bad, you could choose this and hope for the best? Or, conversely, you could anchor your beliefs on a weighted aggregate of the best short-term forecasters and the most convincing theoretical views. Maybe things will be fine?</p>
|
||||
|
||||
<h4>B. Attempt to do a Bayesianism</h4>
|
||||
|
||||
<p>Go to the effort of rigorously formulating hypotheses, then keep track of incoming evidence for each hypothesis. If a new hypothesis comes in, try to do some version of <a href="https://nunosempere.com/blog/2023/02/04/just-in-time-bayesianism/">just-in-time Bayesianism</a>, i.e., monkey-patch it after the fact. Once you are specifying your beliefs numerically, you can deploy some cute incentive mechanisms and <a href="https://github.com/SamotsvetyForecasting/optimal-scoring/blob/master/3-amplify-bayesian/amplify-bayesian.pdf">reward people who change your mind</a>.</p>
|
||||
|
||||
<p>Hope that keeping track of hypotheses about the development of AI at least gives you some discipline, and enables you to shed untrue hypotheses or frames a bit earlier than you otherwise would have. Have the discipline to translate the worldviews of various pundits into specific probabilities<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>, and listen to them less when their predictions fail to come true. And hope that going to the trouble of doing things that way allows you to anticipate stuff 6 months to 2 years sooner than you would have otherwise, and that it is worth the cost.</p>
|
||||
|
||||
<h4>C. Invest in better prediction pipelines as a whole</h4>
|
||||
|
||||
<p>Try to build up some more speculative and <a href="https://nunosempere.com/blog/2023/07/19/better-harder-faster-stronger/">formidable</a> type of forecasting that can deal with the hurdles above. Be more explicit about the types of decisions that you want better foresight for, realize that you don’t have the tools you need, and build someone up to be that for you.</p>
|
||||
<div class="footnotes">
|
||||
<hr/>
|
||||
<ol>
|
||||
<li id="fn:1">
|
||||
To spell this out more clearly, Kuhn was looking at the structure of scientific revolutions, and he notices that you have these “paradigm changes” every once in a while. As a naïve Bayesian, those paradigm changes are kinda confusing, and shouldn’t have any special status. You should just have hypotheses, and they should just rise and fall in likelihood according to Bayes’ rule. But as a Bayesian who knows he has finite compute/memory, you can think of Kuhnian revolutions as encountering a true hypothesis which was outside your previous hypothesis space, and having to recalculate. On this topic, see <a href="https://nunosempere.com/blog/2023/02/04/just-in-time-bayesianism/">Just-in-time Bayesianism</a> or <a href="https://nunosempere.com/blog/2023/03/01/computable-solomonoff/">A computable version of Solomonoff induction</a>.<a href="#fnref:1" rev="footnote">↩</a></li>
|
||||
<li id="fn:2">
|
||||
Back in the day, Tetlock received a <a href="https://www.openphilanthropy.org/grants/university-of-pennsylvania-philip-tetlock-on-forecasting/#2-about-the-grant">grant</a> to “systematically convert vague predictions made by prominent pundits into explicit numerical forecasts”, but I haven’t been able to track what happened to it, and I suspect it never happened.<a href="#fnref:2" rev="footnote">↩</a></li>
|
||||
</ol>
|
||||
</div>
|
||||
|
||||
<p>
|
||||
<section id='isso-thread'>
|
||||
<noscript>javascript needs to be activated to view comments.</noscript>
|
||||
</section>
|
||||
</p>
|
@ -0,0 +1,8 @@
|
||||
## About
|
||||
A repository for tests & diagnostics.
|
||||
|
||||
The symbols below are probably arbitrary:
|
||||
|
||||
---
|
||||
|
||||
un caballero de los de lanza en astillero
|
@ -0,0 +1,212 @@
|
||||
|
||||
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0-alpha1/jquery.min.js"></script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-csv/0.71/jquery.csv-0.71.min.js"></script>
|
||||
<!--
|
||||
Sources:
|
||||
+ https://gist.github.com/cmatskas/8725a6ee4f5f1a8e1cea
|
||||
+ cdnjs
|
||||
-->
|
||||
<script type="text/javascript">
|
||||
$(document).ready(function() {
|
||||
|
||||
// The event listener for the file upload
|
||||
document.getElementById('txtFileUpload').addEventListener('change', upload, false);
|
||||
|
||||
document.getElementById('txtFileUpload').addEventListener('click', reset, false);
|
||||
|
||||
function reset(){
|
||||
document.getElementById("txtFileUpload").value = null;
|
||||
// This way, the event change fires even if you upload the same file twice
|
||||
}
|
||||
|
||||
// Method that checks that the browser supports the HTML5 File API
|
||||
function browserSupportFileUpload() {
|
||||
var isCompatible = false;
|
||||
if (window.File && window.FileReader && window.FileList && window.Blob) {
|
||||
isCompatible = true;
|
||||
}
|
||||
return isCompatible;
|
||||
}
|
||||
|
||||
// Method that reads and processes the selected file
|
||||
function upload(evt) {
|
||||
uploadedSameFileTwice = false;
|
||||
if (!browserSupportFileUpload()) {
|
||||
alert('The File APIs are not fully supported in this browser!');
|
||||
} else {
|
||||
// alert("Checkpoint Charlie");
|
||||
// var data = null;
|
||||
data = null;
|
||||
var file = evt.target.files[0];
|
||||
var reader = new FileReader();
|
||||
reader.readAsText(file);
|
||||
reader.onload = function(event) {
|
||||
var csvData = event.target.result;
|
||||
data = $.csv.toArrays(csvData);
|
||||
if (data && data.length > 0) {
|
||||
// alert('Imported -' + data.length + '- rows successfully!');
|
||||
wrapperProportionalApprovalVoting(data);
|
||||
} else {
|
||||
alert('No data to import!');
|
||||
}
|
||||
};
|
||||
reader.onerror = function() {
|
||||
alert('Unable to read ' + file.fileName);
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
function wrapperProportionalApprovalVoting(data){
|
||||
let dataColumn1 = data.map(x => x[1]);
|
||||
// this gets us column 1, i.e., the second column (column indices start at 0), which contains the approvals.
|
||||
// data[][1] breaks the thing without throwing an error in the browser.
|
||||
|
||||
let dataColumn1Split = dataColumn1.map( element => element.split(", "));
|
||||
// One row of the first column might be "Candidate1, Candidate2".
|
||||
// This transforms it to ["Candidate1", "Candidate2"]
|
||||
|
||||
|
||||
let uniqueCandidates = findUnique(dataColumn1Split);
|
||||
// Finds all the candidates
|
||||
|
||||
// In this voting method, all voters start with a weight of 1, which changes as candidates are elected
|
||||
// So that voters who have had one of their candidates elected have less influence for the next candidates.
|
||||
|
||||
let weights = Array(dataColumn1Split.length).fill(1);
|
||||
|
||||
// Find the most popular one, given the weights. Update the weights
|
||||
|
||||
//alert("\n"+dataColumn1Split[0]);
|
||||
|
||||
let n = document.getElementById("numWinners").value;
|
||||
let winners = [];
|
||||
|
||||
for(let i=0; i<n; i++){
|
||||
let newWinner = findTheNextMostPopularOneGivenTheWeights(dataColumn1Split, weights, uniqueCandidates, winners);
|
||||
winners.push(newWinner);
|
||||
weights = updateWeightsGivenTheNewWinner(dataColumn1Split, weights, newWinner);
|
||||
}
|
||||
//alert(winners);
|
||||
|
||||
// Display the winners.
|
||||
displayWinners(winners);
|
||||
}
|
||||
|
||||
function displayWinners(winners){
|
||||
|
||||
// Header
|
||||
|
||||
|
||||
// Ordered list with the winners
|
||||
///alert(document.getElementsByTagName("OL")[0]);
|
||||
|
||||
if(document.getElementsByTagName("OL")[0]==undefined){
|
||||
headerH3 = document.createElement("h3");
|
||||
headerH3.innerHTML = "Winners under Proportional Approval Voting:";
|
||||
|
||||
document.getElementById("results").appendChild(headerH3);
|
||||
|
||||
orderedList = document.createElement("OL"); // Creates an ordered list
|
||||
for(let i =0; i<winners.length; i++){
|
||||
HTMLWinner = document.createElement("li");
|
||||
HTMLWinner.appendChild(document.createTextNode(winners[i]));
|
||||
orderedList.appendChild(HTMLWinner);
|
||||
}
|
||||
document.getElementById("results").appendChild(orderedList);
|
||||
|
||||
}else{
|
||||
|
||||
oldOL = document.getElementsByTagName("OL")[0];
|
||||
oldOL.remove();
|
||||
|
||||
orderedList = document.createElement("OL"); // Creates an ordered list
|
||||
for(let i =0; i<winners.length; i++){
|
||||
HTMLWinner = document.createElement("li");
|
||||
HTMLWinner.appendChild(document.createTextNode(winners[i]));
|
||||
orderedList.appendChild(HTMLWinner);
|
||||
}
|
||||
|
||||
document.getElementById("results").appendChild(orderedList);
|
||||
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
function findTheNextMostPopularOneGivenTheWeights(arrayOfArrays, weights, uniqueCandidates, winners){
|
||||
let popularity = Array(uniqueCandidates.length).fill(0);
|
||||
for(let i = 0; i<uniqueCandidates.length; i++){
|
||||
for(let j=1; j<arrayOfArrays.length; j++){
|
||||
// j = 1 because we don't want to include the title
|
||||
//alert("array = "+arrayOfArrays[j]);
|
||||
if(arrayOfArrays[j].includes(uniqueCandidates[i])){
|
||||
popularity[i]+= 1/weights[j];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for(let i = 0; i<popularity.length; i++){
|
||||
//alert("popularity["+uniqueCandidates[i]+"] =" +popularity[i]);
|
||||
}
|
||||
|
||||
let maxPopularity = 0;
|
||||
let winner = undefined;
|
||||
//alert(popularity + "\n"+uniqueCandidates);
|
||||
for(let i=0; i<uniqueCandidates.length; i++){
|
||||
if(popularity[i]>=maxPopularity && !winners.includes(uniqueCandidates[i])){
|
||||
// Note, this breaks a tie pretty arbitrarily
|
||||
// Tie breaking mechanism: so obscure as to be random.
|
||||
winner = uniqueCandidates[i];
|
||||
//alert("new better:" +uniqueCandidates[i]);
|
||||
maxPopularity = popularity[i];
|
||||
}
|
||||
}
|
||||
//alert(winner);
|
||||
return winner;
|
||||
}
|
||||
|
||||
function updateWeightsGivenTheNewWinner(arrayOfArrays, weights, newWinner){
|
||||
for(let i=0; i<arrayOfArrays.length; i++){
|
||||
|
||||
if(arrayOfArrays[i].includes(newWinner)){
|
||||
weights[i] = weights[i]+1;
|
||||
}
|
||||
|
||||
}
|
||||
return weights;
|
||||
}
|
||||
|
||||
function findUnique(arrayOfArrays){
|
||||
let uniqueElements = [];
|
||||
|
||||
for(let i = 1; i<arrayOfArrays.length; i++){ // We start with the second row (i=1, instead of i=0, because we take the first row to be a header)
|
||||
for(let j=0; j<arrayOfArrays[i].length; j++){
|
||||
if(!uniqueElements.includes(arrayOfArrays[i][j])){
|
||||
uniqueElements.push(arrayOfArrays[i][j]);
|
||||
}
|
||||
}
|
||||
}
|
||||
return uniqueElements;
|
||||
|
||||
}
|
||||
});
|
||||
|
||||
</script>
|
||||
|
||||
<h1>Proportional Approval Voting MVP</h1>
|
||||
<h3>What is this? How does this work?</h3>
|
||||
<p>This is the simplest version of a program which computes the result of an election, under the <a href="https://www.electionscience.org/learn/electoral-system-glossary/#proportional_approval_voting" target="_blank">Proportional Approval Voting</a> method, for elections which have one or more winners (e.g., presidential elections, but also board member elections).</p>
|
||||
<p>It takes a csv (comma separated value) file, with the same format as <a href="https://docs.google.com/spreadsheets/d/11pBOP6UJ8SSaHIY-s4dYwgBr4PHodh6cIXf-D4yl7HU/edit?usp=sharing" target="_blank">this one</a>, which might be produced by a Google Forms like <a href="https://docs.google.com/forms/d/1_-B5p8ePHnE1jXTGVT_kfRrMRqJuxmm8DPKn-MR1Pok/edit" target="_blank">this one.</a></p>
|
||||
<p>It computes the result using client-side JavaScript, which means that all operations are run in your browser, as opposed to in a server which is not under your control. In effect, all this webpage does is provide you with a bunch of functions. In fact, you could just load this page, disconnect from the internet, upload your files, and you could still use the webpage to get the results you need.</p>
|
||||
<div id="dvImportSegments" class="fileupload ">
|
||||
<fieldset>
|
||||
<legend>Upload your CSV File to compute the result</legend>
|
||||
<label>Number of winners: </label><input type="number" id="numWinners" value="2">
|
||||
<!-- This is not really aesthetic; change. -->
|
||||
<br>
|
||||
<input type="file" name="File Upload" id="txtFileUpload" accept=".csv" />
|
||||
</fieldset>
|
||||
</div>
|
||||
<div id="results"></div>
|
||||
|
@ -1 +0,0 @@
|
||||
Subproject commit 4f5fa42a8214057289b30ff92ef5fe082700d59e
|