fix: reorg
|
@ -1,5 +1,7 @@
|
|||
## Changelog
|
||||
|
||||
Page was getting too annoying to update.
|
||||
|
||||
### Most recent pieces
|
||||
|
||||
- 2022/04/17: [Simple Squiggle](https://nunosempere.com/blog/2022/04/17/simple-squiggle/)
|
6
.secret/werc-bounties/index.md
Normal file
|
@ -0,0 +1,6 @@
|
|||
Werc bounties
|
||||
=============
|
||||
|
||||
[werc](https://werc.cat-v.org/) is the web technology that I am using to run this blog. It is relatively solid, but it has some annoyances that I don't always have the time to figure out. I intend to use this post as a reference for myself, and eventually send it to the werc people with monetary bounties attached once I have a few elements.
|
||||
|
||||
|
|
@ -1,6 +1,6 @@
|
|||
<br class="doNotDisplay doNotPrint" />
|
||||
|
||||
<div style="margin-right: auto;">Powered by <a href="http://werc.cat-v.org/">werc</a>, <a href="https://alpinelinux.org/">Alpine Linux</a> and <a href="https://nginx.org/en/">nginx</a></div>
|
||||
<div style="margin-right: auto;">Powered by <a href="http://werc.cat-v.org/">werc</a>, <a href="https://alpinelinux.org/">alpine</a> and <a href="https://nginx.org/en/">nginx</a></div>
|
||||
|
||||
<!-- TODO: wait until duckduckgo indexes site
|
||||
<form action="https://duckduckgo.com/" method="get">
|
||||
|
|
29
_werc/lib/headers.tpl
Executable file
|
@ -0,0 +1,29 @@
|
|||
<!DOCTYPE HTML>
|
||||
<html>
|
||||
<head>
|
||||
|
||||
<title>%($pageTitle%)</title>
|
||||
|
||||
<link rel="stylesheet" href="/pub/style/style.css" type="text/css" media="screen, handheld" title="default">
|
||||
<link rel="shortcut icon" href="/favicon.ico" type="image/vnd.microsoft.icon">
|
||||
% if(test -f $sitedir/_werc/pub/style.css)
|
||||
% echo ' <link rel="stylesheet" href="/_werc/pub/style.css" type="text/css" media="screen" title="default">'
|
||||
|
||||
<meta charset="UTF-8">
|
||||
% # Legacy charset declaration for backwards compatibility with non-HTML5 browsers.
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||||
|
||||
% if(! ~ $#meta_description 0)
|
||||
% echo ' <meta name="description" content="'$"meta_description'">'
|
||||
% if(! ~ $#meta_keywords 0)
|
||||
% echo ' <meta name="keywords" content="'$"meta_keywords'">'
|
||||
|
||||
% h = `{get_lib_file headers.inc}
|
||||
% if(! ~ $#h 0)
|
||||
% cat $h
|
||||
|
||||
%($"extraHeaders%)
|
||||
|
||||
</head>
|
||||
<body>
|
||||
|
|
@ -1,10 +0,0 @@
|
|||
<div class="hidden-mobile">
|
||||
<div>
|
||||
<a href="https://forum.effectivealtruism.org/users/nunosempere">EA forum</a> |
|
||||
<a href="https://forecasting.substack.com/">forecasting newsletter</a> |
|
||||
<a href="https://github.com/NunoSempere">github</a> |
|
||||
<a href="https://metaforecast.org/">metaforecast</a> |
|
||||
<a href="https://quantifieduncertainty.org/">quantified uncertainty</a> |
|
||||
<a href="https://twitter.com/NunoSempere">twitter</a>
|
||||
</div>
|
||||
</div>
|
|
@ -1,15 +0,0 @@
|
|||
<div class="hidden-mobile">
|
||||
<div>
|
||||
<a href="http://gsoc.cat-v.org">gsoc</a> |
|
||||
<a href="http://doc.cat-v.org">doc archive</a> |
|
||||
<a href="http://repo.cat-v.org">software repo</a> |
|
||||
<a href="http://ninetimes.cat-v.org">ninetimes</a> |
|
||||
<a href="http://harmful.cat-v.org">harmful</a> |
|
||||
<a href="http://9p.cat-v.org/">9P</a> |
|
||||
<a href="http://cat-v.org">cat-v.org</a>
|
||||
</div>
|
||||
<div>
|
||||
<a href="http://cat-v.org/update_log">site updates</a> |
|
||||
<a href="/sitemap">site map</a>
|
||||
</div>
|
||||
</div>
|
|
@ -115,32 +115,32 @@ Three most important actionables:
|
|||
|
||||
## B. Exploratory plots.
|
||||
|
||||
![](images/8a73163df20e38856be2ab2212d484b1a58b7fe9.png)
|
||||
![](images/b8025a0103dc70a976ecf604a12de6ddf2cd0ddb.png)
|
||||
![](images/02891164325de4d34fcd3ca026b4c8c74b26fdcc.png)
|
||||
![](images/df0dd280b2cd33327cfc81eef8de696e015c7562.png)
|
||||
![](images/18ba7e3dce520c9137538bab707b1b585b2f671c.png)
|
||||
![](images/aa717ffb66b96e2edc8a032dd179f8f17f33181a.png)
|
||||
![](images/8f9995e951ce75147421f5233794555542db867a.png)
|
||||
![](images/918999de00da42973f752ecb7d47a68f3c599a99.png)
|
||||
![](images/b44c2ad3d57250ceec6a5f3eede0d375c649ab16.png)
|
||||
![](images/d4989159c8011be94310198f52367c3d61189fcc.png)
|
||||
![](images/05f019dc1b238486c93e78b31e2262ba34ac80b7.png)
|
||||
![](images/86a9fc7e89b8a6e69c47c8e5fef0b79a2ad045d8.png)
|
||||
![](images/bd015fe73fb51b04114e4c87931792a2100e7674.png)
|
||||
![](images/06c70e8eeb9d81d0ecf08a9ab21f8f821f4a5980.png)
|
||||
![](images/42ca65e3812fb1007f2070e11ca5e4bab0ed9c9e.png)
|
||||
![](images/3540e439e51010f97f9423445b47440c8643ea1a.png)
|
||||
![](images/f52e2963f147e74f31654bf13e3ab6b3098d2996.png)
|
||||
![](images/1364361431dfd66aec4fb7fb9fba0473bd359e67.png)
|
||||
![](images/b65235999c982b9414bcaf59e48ce39827bfe201.png)
|
||||
![](images/38b656906584ea6736e321d7dc14c7c4be7b0b4d.png)
|
||||
![](images/13dd0da9f9a4007915097af5919e45d72a278c66.png)
|
||||
![](images/3e772f2cd88e85a0658b2b7498b9ff1311d50c58.png)
|
||||
![](images/b9fd7bdd1c95cf0d8429312b77cbceae85db049f.png)
|
||||
![](images/502b5cc72b9cee7b6330d5e01cf2894c5283a67a.png)
|
||||
![](images/8ff3e88439796d0868b305fa0c5eaef294df7c5f.png)
|
||||
![](images/ec1ab2e026861dc21df76876c5254bc94f13f10e.png)
|
||||
![](.images/8a73163df20e38856be2ab2212d484b1a58b7fe9.png)
|
||||
![](.images/b8025a0103dc70a976ecf604a12de6ddf2cd0ddb.png)
|
||||
![](.images/02891164325de4d34fcd3ca026b4c8c74b26fdcc.png)
|
||||
![](.images/df0dd280b2cd33327cfc81eef8de696e015c7562.png)
|
||||
![](.images/18ba7e3dce520c9137538bab707b1b585b2f671c.png)
|
||||
![](.images/aa717ffb66b96e2edc8a032dd179f8f17f33181a.png)
|
||||
![](.images/8f9995e951ce75147421f5233794555542db867a.png)
|
||||
![](.images/918999de00da42973f752ecb7d47a68f3c599a99.png)
|
||||
![](.images/b44c2ad3d57250ceec6a5f3eede0d375c649ab16.png)
|
||||
![](.images/d4989159c8011be94310198f52367c3d61189fcc.png)
|
||||
![](.images/05f019dc1b238486c93e78b31e2262ba34ac80b7.png)
|
||||
![](.images/86a9fc7e89b8a6e69c47c8e5fef0b79a2ad045d8.png)
|
||||
![](.images/bd015fe73fb51b04114e4c87931792a2100e7674.png)
|
||||
![](.images/06c70e8eeb9d81d0ecf08a9ab21f8f821f4a5980.png)
|
||||
![](.images/42ca65e3812fb1007f2070e11ca5e4bab0ed9c9e.png)
|
||||
![](.images/3540e439e51010f97f9423445b47440c8643ea1a.png)
|
||||
![](.images/f52e2963f147e74f31654bf13e3ab6b3098d2996.png)
|
||||
![](.images/1364361431dfd66aec4fb7fb9fba0473bd359e67.png)
|
||||
![](.images/b65235999c982b9414bcaf59e48ce39827bfe201.png)
|
||||
![](.images/38b656906584ea6736e321d7dc14c7c4be7b0b4d.png)
|
||||
![](.images/13dd0da9f9a4007915097af5919e45d72a278c66.png)
|
||||
![](.images/3e772f2cd88e85a0658b2b7498b9ff1311d50c58.png)
|
||||
![](.images/b9fd7bdd1c95cf0d8429312b77cbceae85db049f.png)
|
||||
![](.images/502b5cc72b9cee7b6330d5e01cf2894c5283a67a.png)
|
||||
![](.images/8ff3e88439796d0868b305fa0c5eaef294df7c5f.png)
|
||||
![](.images/ec1ab2e026861dc21df76876c5254bc94f13f10e.png)
|
||||
|
||||
The code to produce them can be found [here](https://nunosempere.github.io/rat/eamentalhealth/)
|
||||
|
||||
|
@ -150,11 +150,11 @@ The code to produce them can be found [here](https://nunosempere.github.io/rat/e
|
|||
|
||||
#### 1.1. According to age
|
||||
|
||||
![](images/e8466ca8f7555d37b25e7046bf22f45c575e7c95.png)
|
||||
![](.images/e8466ca8f7555d37b25e7046bf22f45c575e7c95.png)
|
||||
|
||||
#### 1.2. According to country.
|
||||
|
||||
![](images/ff22d1678c0868146825ca03f26c1352d11bb850.png)
|
||||
![](.images/ff22d1678c0868146825ca03f26c1352d11bb850.png)
|
||||
|
||||
#### 1.3. According to gender.
|
||||
|
||||
|
@ -172,7 +172,7 @@ Editorial bottom line: With respect to age, country, and gender, there seem to b
|
|||
|
||||
Initially, I was intending to find out the different mental disorder rates in the different countries, and combine that with the distribution in the data. The webpage [Our World in Data](https://ourworldindata.org/mental-health) provides the necessary data:
|
||||
|
||||
![](images/241ccf3251fdd6e74c882b31aa9985cb5898c44e.png)
|
||||
![](.images/241ccf3251fdd6e74c882b31aa9985cb5898c44e.png)
|
||||
|
||||
We see that the distribution is broadly similar across the countries among which EA has a presence. Most importantly, it doesn't surpass ~25% in any country, whereas among survey respondents:
|
||||
|
||||
|
@ -290,7 +290,7 @@ The effect is very robust to different modelizations: regressing instead on empt
|
|||
|
||||
Here is the above, presented visually
|
||||
|
||||
![](images/67f6b5d117bdbcb6c7b56e5afd798d859ee7966c.png)
|
||||
![](.images/67f6b5d117bdbcb6c7b56e5afd798d859ee7966c.png)
|
||||
|
||||
The following table is the cold, raw, hard data for the graphic.
|
||||
|
||||
|
@ -1357,7 +1357,7 @@ With regards to pathway 2, we have a rough upwards cross-sectional estimate of 2
|
|||
* People who recover from a mental illness because of help from the EA community might not donate 10% of the counterfactual gain, but more.
|
||||
* The gain is not only in the form of hours missed but in productivity gained while working.
|
||||
|
||||
![](images/0d7aea62609944b3674d36602b1dd50f2d91c8fe.png)
|
||||
![](.images/0d7aea62609944b3674d36602b1dd50f2d91c8fe.png)
|
||||
|
||||
With regards to pathway 3, it is my impression that people in top charities and EA organizations already get good mental healthcare, though about rogue effective altruists I cannot say much.
|
||||
|
||||
|
@ -1397,6 +1397,6 @@ SlateStarCodex's list of [mental health professionals](https://slatestarcodex.co
|
|||
|
||||
The point being that there are a lot of mental health resources and information online, if only one knew where to find them, and >10% of survey respondents answered that finding information on mental health resources was hard or very hard.
|
||||
|
||||
![](images/3540e439e51010f97f9423445b47440c8643ea1a.png)
|
||||
![](.images/3540e439e51010f97f9423445b47440c8643ea1a.png)
|
||||
|
||||
Thus, it is a high tractability, high neglectedness, medium impact task to create such a list of mental health resources for the EA community, even if collated and scavenged from other sources. If such a resource already exists, plenty of respondents do not seem to know about it.
|
|
@ -106,7 +106,7 @@ And an extra property:
|
|||
|
||||
are enough to force the Shapley value function to take the form it takes:
|
||||
|
||||
![](images/aeef390de90cb7c6fe07fcad852578fbebe162b1.svg)
|
||||
![](.images/aeef390de90cb7c6fe07fcad852578fbebe162b1.svg)
|
||||
|
||||
At this point, the reader may want to consult [Wikipedia](https://en.wikipedia.org/wiki/Shapley_value) to familiarize themselves with the mathematical formalism, or, for a book-length treatment, [_The Shapley value: Essays in honor of Lloyd S. Shapley_](http://www.library.fa.ru/files/Roth2.pdf). Ultimately, a quick way to understand it is as "the function uniquely determined by the properties above".
|
||||
|
|
@ -39,11 +39,11 @@ A two-sentence version would be:
|
|||
|
||||
> Forecasters predicted the conclusions that would be reached by Elizabeth Van Nostrand, a generalist researcher, before she conducted a study on the accuracy of various historical claims. We randomly sampled a subset of research claims for her to evaluate, and since we can set the sampling probability arbitrarily low, this method is not bottlenecked by her time.
|
||||
|
||||
![](images/4a235d14d0177ec92050af5b2551cdbc337f2d1e.png)
|
||||
![](.images/4a235d14d0177ec92050af5b2551cdbc337f2d1e.png)
|
||||
|
||||
The graph below shows the evolution of the accuracy of the crowd prediction over time, starting from Elizabeth Van Nostrand’s prior. Predictions were submitted separately by two groups of forecasters: one drawn from a mailing list of people interested in participating in forecasting experiments (recruited from effective altruism-adjacent events and other forecasting platforms), and one recruited from Positly, an online platform for crowdworkers.
|
||||
|
||||
![](images/c7e041d8fab837233a9cc4d03c6166c54da04020.png)
|
||||
![](.images/c7e041d8fab837233a9cc4d03c6166c54da04020.png)
|
||||
|
||||
The y-axis shows the accuracy score on a logarithmic scale, and the x-axis shows how far along the experiment is. For example, 14 out of 28 days would correspond to 50%. The thick lines show the average score of the aggregate prediction, across all questions, at each time-point. The shaded areas show the standard error of the scores, so that the graph might be interpreted as a guess of how the two communities would predict a random new question.
|
||||
|
||||
|
@ -61,7 +61,7 @@ We measured “value provided” as the reduction in uncertainty weighted by the
|
|||
|
||||
Results were as follows.
|
||||
|
||||
![](images/50453b84385fa25f5a934570cfa2bc6702869748.png)
|
||||
![](.images/50453b84385fa25f5a934570cfa2bc6702869748.png)
|
||||
|
||||
In other words, each unit of resource invested in the network-adjacent forecasters provided 72% of the returns of investing it in Elizabeth directly, and each unit invested in the crowdworkers provided negative returns, as they tended to be less accurate than Elizabeth’s prior.
|
||||
|
||||
|
@ -229,7 +229,7 @@ For future experiments, we’re considering obtaining an objective data-set with
|
|||
|
||||
In order to participate in the experiment, a forecaster has to turn their mental models (represented in whichever way the human brain represents models) into quantitative distributions (which is a format quite unlike that native to our brains), as shown in the following diagram:
|
||||
|
||||
![](images/fd632951009d3c978277000a6ba9f3834cb4922a.png)
|
||||
![](.images/fd632951009d3c978277000a6ba9f3834cb4922a.png)
|
||||
|
||||
Each step in this chain is quite challenging, requires much practice to master, and can result in a loss of information.
|
||||
|
|
@ -37,7 +37,7 @@ A two-sentence version would be:
|
|||
|
||||
> Forecasters predicted the conclusions that would be reached by Elizabeth Van Nostrand, a generalist researcher, before she conducted a study on the accuracy of various historical claims. We randomly sampled a subset of research claims for her to evaluate, and since we can set the sampling probability arbitrarily low, this method is not bottlenecked by her time.
|
||||
|
||||
![](images/4a235d14d0177ec92050af5b2551cdbc337f2d1e.png)
|
||||
![](.images/4a235d14d0177ec92050af5b2551cdbc337f2d1e.png)
|
||||
|
||||
**1\. Evaluator extracts claims from the book and submits priors**
|
||||
|
||||
|
@ -49,7 +49,7 @@ All claims were assigned an importance rating from 1-10 based on their relevance
|
|||
|
||||
Elizabeth also spent 3 minutes per claim submitting an initial estimate (referred to as a “prior”).
|
||||
|
||||
![](images/533cbee1908697a3c0338e0f5c83b7f960d73551.png)
|
||||
![](.images/533cbee1908697a3c0338e0f5c83b7f960d73551.png)
|
||||
|
||||
Beliefs were typically encoded as distributions over the range 0% to 100%, representing where Elizabeth expected the mean of her posterior credence in the claim to be after 10 more hours of research. For more explanation, see this footnote \[4\].
|
||||
|
||||
|
@ -63,7 +63,7 @@ A key part of the design was that that forecasters _did_ _not know_ which questi
|
|||
|
||||
Two groups of forecasters participated in the experiment: one based on a mailing list with participants interested in participating in forecasting experiments (recruited from effective altruism-adjacent events and other forecasting platforms) \[6\], and one recruited from Positly, an online platform for crowdworkers. The former group is here called “Network-adjacent forecasters” and the latter “Online crowdworkers”.
|
||||
|
||||
![](images/5e456cfc58967fc5c074c0287c806653d978b84b.png)
|
||||
![](.images/5e456cfc58967fc5c074c0287c806653d978b84b.png)
|
||||
|
||||
**3\. The evaluator judges the claims**
|
||||
|
||||
|
@ -125,17 +125,17 @@ The aggregate prediction was computed as the average of all forecasters' final p
|
|||
|
||||
The following graph shows how the aggregate performed on each question:
|
||||
|
||||
![](images/f6ebc350474cbb51b21c0fa5184716c6f2e3eceb.png)
|
||||
![](.images/f6ebc350474cbb51b21c0fa5184716c6f2e3eceb.png)
|
||||
|
||||
The opaque bars represent the scores from the crowdworkers, and the translucent bars, which have higher scores throughout, represent the scores from the network-adjacent forecasters. It's interesting that the order is preserved, that is, that the relative difficulty of the questions was the same for both groups. Finally, we don’t see any correlation between question difficulty and the importance weights Elizabeth assigned to the questions.
|
||||
|
||||
However, the comparison is confounded by the fact that more effort was spent by the network-adjacent forecasters. The above graph also doesn’t compare performance to Elizabeth’s priors. Hence we also plot the evolution of the aggregate score over prediction number and time (the first data-point in the graphs below represents Elizabeth’s priors):
|
||||
|
||||
![](images/53f456d57fe63c3f65b05b21cffa42b69d45ed87.png)
|
||||
![](.images/53f456d57fe63c3f65b05b21cffa42b69d45ed87.png)
|
||||
|
||||
![](images/a547c3816ddef37ee3d560ac2d05ec50071df615.png)
|
||||
![](.images/a547c3816ddef37ee3d560ac2d05ec50071df615.png)
|
||||
|
||||
![](images/c7e041d8fab837233a9cc4d03c6166c54da04020.png)
|
||||
![](.images/c7e041d8fab837233a9cc4d03c6166c54da04020.png)
|
||||
|
||||
For the last graph, the y-axis shows the score on a logarithmic scale, and the x-axis shows how far along the experiment is. For example, 14 out of 28 days would correspond to 50%. The thick lines show the average score of the aggregate prediction, across all questions, at each time-point. The shaded areas show the standard error of the scores, so that the graph might be interpreted as a guess of how the two communities would predict a random new question \[10\].
|
||||
|
||||
|
@ -147,9 +147,9 @@ One way to see this qualitatively is by observing the graphs below, where we dis
|
|||
|
||||
The x-axis \[12\] refers to Elizabeth’s best estimate of the accuracy of a claim, from 0% to 100% (see section “Mechanism design, 1. Evaluator extracts claims” for more detail).
|
||||
|
||||
![](images/b10eb78f4b874e299a1a14ef331821ebb47b042a.png)
|
||||
![](.images/b10eb78f4b874e299a1a14ef331821ebb47b042a.png)
|
||||
|
||||
![](images/449cbaaa18d85ac7e1fbf3e7d70defc290367b94.png)
|
||||
![](.images/449cbaaa18d85ac7e1fbf3e7d70defc290367b94.png)
|
||||
|
||||
Another way to understand the performance of the aggregate is to note that the aggregate of network-adjacent forecasters had an average log score of -0.5. To get a rough sense of what that means, it's the score you'd get by being 70% confident in a binary event, and being correct (though note that this binary comparison merely serves to provide intuition; there are technical details making the comparison to a distributional setting a bit tricky).
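As a rough check on that intuition, here is a minimal sketch of a binary log score. The text does not say which logarithm base was used, so the base is left as a parameter; `log_score` is a hypothetical helper, not a function from the experiment's code. Under these assumptions, 70% confidence in an event that occurs scores about -0.51 in base 2 and about -0.36 with the natural logarithm, so the base-2 reading is the one that matches the -0.5 figure.

```python
import math


def log_score(p_assigned, base=2.0):
    """Log score for the probability assigned to the outcome that occurred.

    The base is an assumption: the source text does not state it.
    """
    return math.log(p_assigned) / math.log(base)


# 70% confidence in a binary event that turns out to be true.
print(round(log_score(0.7, base=2.0), 3))
print(round(log_score(0.7, base=math.e), 3))
```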
|
||||
|
||||
|
@ -189,7 +189,7 @@ The value is computed using the following model (interactive calculation linked
|
|||
|
||||
Results were as follows.
|
||||
|
||||
![](images/50453b84385fa25f5a934570cfa2bc6702869748.png)
|
||||
![](.images/50453b84385fa25f5a934570cfa2bc6702869748.png)
|
||||
|
||||
_(Links to models: network-adjacent_ _[cost ratio](https://www.getguesstimate.com/models/14521)_ _and_ _[value ratio](https://observablehq.com/@jjj/amplification-effectiveness), online crowdworker_ _[cost ratio](https://www.getguesstimate.com/models/14614)_ _and_ _[value ratio](https://observablehq.com/@jjj/amplification-effectiveness-positly).)_
|
||||
|
||||
|
@ -199,7 +199,7 @@ This observation is in tension with the some of the above graphs, which show a t
|
|||
|
||||
Another question to consider when thinking about cost-effectiveness is diminishing returns. The following graph shows how the information gain from additional predictions diminished over time.
|
||||
|
||||
![](images/6181a43348526199a4746e6da4bbdc96afdd823b.png)
|
||||
![](.images/6181a43348526199a4746e6da4bbdc96afdd823b.png)
|
||||
|
||||
The x-axis shows the number of predictions after Elizabeth’s prior (which would be prediction number 0). The y-axis shows how much closer to a perfect score each prediction moved the aggregate, as a percentage of the distance between the previous aggregate and the perfect log score of 0 \[15\].
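The y-axis quantity described above can be sketched as follows. This is an illustration of the stated definition only, not the code used to produce the graph; the function name and the example scores are made up.

```python
def fraction_of_gap_closed(previous_score, new_score, perfect_score=0.0):
    """Share of the remaining distance to a perfect log score that one
    additional prediction closed. Negative if the aggregate got worse."""
    gap = perfect_score - previous_score
    if gap == 0:
        raise ValueError("previous aggregate is already perfect")
    return (new_score - previous_score) / gap


# Hypothetical aggregate log scores after successive predictions,
# starting from the prior (prediction number 0).
scores = [-1.0, -0.8, -0.7, -0.68]
for prev, new in zip(scores, scores[1:]):
    print(f"{fraction_of_gap_closed(prev, new):.1%}")
```

Multiply by 100 to read the result as the percentage plotted on the y-axis.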
|
||||
|
|
@ -25,9 +25,9 @@ When the professors deviate from that general structure, they become less convin
|
|||
|
||||
The question presented is whether there is a nutrition-based poverty trap. A mathematical formalism for a poverty trap is presented, in which wealth at time t+1 depends on wealth at time t. A poverty trap appears if falling below a wealth threshold leads to a further sliding down, that is, if the relationship between wealth at time t and t+1 looks like:
|
||||
|
||||
![](images/be7f99df5b8aa33ef7aadd37a7560aa24505e5d9.png) as opposed to like
|
||||
![](.images/be7f99df5b8aa33ef7aadd37a7560aa24505e5d9.png) as opposed to like
|
||||
|
||||
![](images/8b0e4c8fbb9b0400998fbbc58158fa7c79aaeebb.png)
|
||||
![](.images/8b0e4c8fbb9b0400998fbbc58158fa7c79aaeebb.png)
|
||||
|
||||
(In both cases, you start at y0 at time 0, move to y1 at time 1, to y2 at time 2, etc.)
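The dynamics in the two pictures can be sketched numerically. The two transition functions below are assumptions chosen only for their shape (an S-curve with a threshold at 0.5, and a concave curve without one); they are not taken from the course material.

```python
def s_shaped(wealth):
    """Hypothetical S-shaped transition with fixed points at 0, 0.5 and 1.
    Below 0.5 next-period wealth is lower; above 0.5 it is higher."""
    return 3 * wealth**2 - 2 * wealth**3


def concave(wealth):
    """Hypothetical concave transition: any positive start climbs towards 1."""
    return wealth ** 0.5


def simulate(transition, start, steps=50):
    """Iterate y_{t+1} = transition(y_t) and return the final wealth."""
    wealth = start
    for _ in range(steps):
        wealth = transition(wealth)
    return wealth


for start in (0.4, 0.6):
    print(start, simulate(s_shaped, start), simulate(concave, start))
```

With the S-shaped curve, a start just below the threshold slides towards zero and a start just above it rises; with the concave curve, both starts end up at the same high level, which is the "no trap" case.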
|
||||
|
|
@ -17,7 +17,7 @@ In [Shapley values: Better than counterfactuals](https://forum.effectivealtruism
|
|||
|
||||
Because this post is long, images will be interspersed throughout to clearly separate sections and provide rest for tired eyes. This is a habit I have from my blogging days, though one I have not seen used in this forum.
|
||||
|
||||
![](images/bc4a6add2d82c0297031b883af215a4dff297d94.png)
|
||||
![](.images/bc4a6add2d82c0297031b883af215a4dff297d94.png)
|
||||
|
||||
## Philanthropic Coordination Theory:
|
||||
|
||||
|
@ -42,7 +42,7 @@ If we try to calculate the Shapley value in this case, we notice that it depends
|
|||
|
||||
In any case, their Shapley values are:
|
||||
|
||||
![](images/0b51c9065e85902d93abc4de5c676a162431fd9e.png) ![](images/48294ee8a66ec565a22576254823d2292d1d6f7b.png)
|
||||
![](.images/0b51c9065e85902d93abc4de5c676a162431fd9e.png) ![](.images/48294ee8a66ec565a22576254823d2292d1d6f7b.png)
|
||||
|
||||
One can understand this as follows: Player A has Value({A}) already in their hand, and Player B has Value({B}), and they're considering whether to cooperate. If they do, then the surplus from cooperating is Value({A,B}) - (Value({A}) + Value({B})), and it gets shared equally:
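The two-player split described above can be sketched in a few lines. The function name and the example numbers are illustrative assumptions; the arithmetic is just the standalone value plus half the cooperation surplus.

```python
def two_player_shapley(value_a, value_b, value_ab):
    """Shapley values for two players: each keeps what they could get alone
    and receives half of the surplus created by cooperating."""
    surplus = value_ab - (value_a + value_b)
    return value_a + surplus / 2, value_b + surplus / 2


# Hypothetical values: A alone produces 1, B alone 2, together 5.
shapley_a, shapley_b = two_player_shapley(1, 2, 5)
print(shapley_a, shapley_b)
```

The two values always sum to Value({A,B}), so the whole value of the coalition is attributed and nothing is counted twice.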
|
||||
|
||||
|
@ -136,7 +136,7 @@ Note that, because GiveWell's alternative is known, GiveWell doesn't have to see
|
|||
* When buying a certificate of impact, the donor would in fact be willing to pay _more_ than $1, because $1 can't get him that much value anymore, due to diminishing returns. Similarly, GiveWell would be willing to sell it for _less_ than $1, for the same reason; once diminishing returns start setting in, they would have to donate less than $1 to their best alternative to get the equivalent of $1 of donations to SCI. I've thus pretended that in this market with one seller and one buyer, the price is agreed to be $1. Another solution would be to have an efficient market in certificates of impact.
|
||||
* The value certificate equilibrium is very similar regardless of whether one is thinking in terms of Shapley values or counterfactuals. I feel, but can't prove, that Shapley values add a kind of clarity and crispness to the reasoning, if only because they force you to consider all the moving parts.
|
||||
|
||||
![](images/14578566a70fe4029fdf4fc0b37253dce1b735d2.jpg)
|
||||
![](.images/14578566a70fe4029fdf4fc0b37253dce1b735d2.jpg)
|
||||
|
||||
## Shapley values of forecasters.
|
||||
|
||||
|
@ -202,7 +202,7 @@ while the new forecaster gets rewarded in proportion to
|
|||
|
||||
This still preserves some of the same incentives as above, though in this case, attribution becomes more tricky. Further, anecdotally, seeing someone's distribution before updating gives more information than seeing someone's distribution after they've updated, so just seeing the contrast between g(0) and g(1) might be useful to future forecasters.
|
||||
|
||||
![](images/4bf82fbcce46ac1e5520ce4070c8883525c80bde.jpg)
|
||||
![](.images/4bf82fbcce46ac1e5520ce4070c8883525c80bde.jpg)
|
||||
|
||||
## A value attribution impossibility theorem.
|
||||
|
||||
|
@ -279,7 +279,7 @@ With that in mind, one of the most interesting facts about Arrow's impossibility
|
|||
|
||||
As such, I'm hedging my bets: impossibility theorems must be taken with a grain of salt; they can be stepped over if their assumptions do not hold.
|
||||
|
||||
![](images/3edb1377c2726d6123865c43360d67c08adb9aca.jpg)
|
||||
![](.images/3edb1377c2726d6123865c43360d67c08adb9aca.jpg)
|
||||
|
||||
## Parfit's [_Five Mistakes in Moral Mathematics_](http://www.stafforini.com/docs/Parfit%20-%20Five%20mistakes%20in%20moral%20mathematics.pdf).
|
||||
|
||||
|
@ -289,7 +289,7 @@ The Indian Mathematician Brahmagupta describes the solution to the quadratic equ
|
|||
|
||||
to describe the solution to the quadratic equation ax^2 + bx = c.
|
||||
|
||||
![](images/5c7efddde4aa1a7119a15d93783f6c682d7212cf.svg)
|
||||
![](.images/5c7efddde4aa1a7119a15d93783f6c682d7212cf.svg)
|
||||
|
||||
I read Parfit's piece with the same admiration, sadness and sorrow with which I read the above paragraph. On the one hand, he is often clearly right. On the other hand, he's just working with very rudimentary tools: mere words.
|
||||
|
||||
|
@ -356,7 +356,7 @@ Overall, I think that Shapley values do pretty well on the problems posed by Par
|
|||
|
||||
> (C7) Even if an act harms no one, this act may be wrong because it is one of a set of acts that together harm other people. Similarly, even if some act benefits no one, it can be what someone ought to do, because it is one of a set of acts that together benefit other people.
|
||||
|
||||
![](images/c1c6470e1f4712d38d21ce31a6a2a4baa3671ab9.jpg)
|
||||
![](.images/c1c6470e1f4712d38d21ce31a6a2a4baa3671ab9.jpg)
|
||||
|
||||
## Shapley value puzzles
|
||||
|
||||
|
@ -433,7 +433,7 @@ Calculate the Shapley values for:
|
|||
* You (Emma)
|
||||
* Lucy
|
||||
|
||||
![](images/16fb6f6948ae9600cee6ae878a46ea21172e3821.jpg)
|
||||
![](.images/16fb6f6948ae9600cee6ae878a46ea21172e3821.jpg)
|
||||
|
||||
## Speculative Shapley extensions.
|
||||
|
||||
|
@ -478,7 +478,7 @@ If you the agent's information for those expected values, this allows you to pun
|
|||
|
||||
However, Shapley values are probably too unsophisticated to be used in situations which are primarily about social incentives.
|
||||
|
||||
![](images/8260e1c392a2a710e9421817175e84156922de3d.jpeg)
|
||||
![](.images/8260e1c392a2a710e9421817175e84156922de3d.jpeg)
|
||||
|
||||
## Some complimentary (and yet obligatory) ramblings about Shapley Values, Goodhart's law, and Stanovich's dysrationalia.
|
||||
|
||||
|
@ -537,7 +537,7 @@ Suppose that you have two players, player a and Player B, and three charities: G
|
|||
|
||||
Or, in graph form:
|
||||
|
||||
![](images/b640f8bfd4ce1e54a632d7bb4a3c4bd9e9695418.jpg)
|
||||
![](.images/b640f8bfd4ce1e54a632d7bb4a3c4bd9e9695418.jpg)
|
||||
|
||||
Caveat:
|
||||
|
|
@ -105,7 +105,7 @@ This month they've organized a flurry of activities, most notably:
|
|||
|
||||
PredictIt is a prediction platform restricted to US citizens, but also accessible with a VPN. This month, they present a map about the electoral college result in the USA. States are colored according to the market prices:
|
||||
|
||||
![](images/654d6212cd170b9287738a89bd6b4535248ed6e1.png)
|
||||
![](..images/654d6212cd170b9287738a89bd6b4535248ed6e1.png)
|
||||
|
||||
Some of the predictions I found most interesting follow. The market probabilities can be found below; the engaged reader might want to write down their own probabilities and then compare.
|
||||
|
||||
|
@ -259,7 +259,7 @@ This section contains items which have recently come to my attention, but which
|
|||
> Our judges in this study were eight individuals, carefully selected for their expertise as handicappers. Each judge was presented with a list of 88 variables culled from the past performance charts. He was asked to indicate which five variables out of the 88 he would wish to use when handicapping a race, if all he could have was five variables. He was then asked to indicate which 10, which 20, and which 40 he would use if 10, 20, or 40 were available to him.
|
||||
|
||||
> We see that accuracy was as good with five variables as it was with 10, 20, or 40. The flat curve is an average over eight subjects and is somewhat misleading. Three of the eight actually showed a decrease in accuracy with more information, two improved, and three stayed about the same. All of the handicappers became more confident in their judgments as information increased.
|
||||
> ![](images/e8ac191e43364ff35bdc19361dd92c9a74e7109a.png)
|
||||
> ![](..images/e8ac191e43364ff35bdc19361dd92c9a74e7109a.png)
|
||||
|
||||
* The study contains other nuggets, such as:
|
||||
* An experiment on trying to predict the outcome of a given equation. When the feedback has a margin of error, this confuses respondents.
|
||||
|
|
|
@ -115,7 +115,7 @@ Ordered in subjective order of importance:
|
|||
|
||||
* The International Energy Agency had terrible forecasts on solar photo-voltaic energy production, until [recently](https://pv-magazine-usa.com/2020/07/12/has-the-international-energy-agency-finally-improved-at-forecasting-solar-growth/):
|
||||
|
||||
> ![](images/7244132c6380f86a5fc5327b5c6abb70e741097a.jpg)
|
||||
> ![](.images/7244132c6380f86a5fc5327b5c6abb70e741097a.jpg)
|
||||
|
||||
> ...It’s a scenario assuming current policies are kept and no new policies are added.
|
||||
|
||||
|
@ -241,7 +241,7 @@ Ordered in subjective order of importance:
|
|||
|
||||
* [Taleb](https://forecasters.org/blog/2020/06/14/on-single-point-forecasts-for-fat-tailed-variables/): _On single point forecasts for fat tailed variables_. Leitmotiv: Pandemics are fat-tailed.
|
||||
|
||||
> ![](images/d263195904a7942604599ff703fcb71f28d0a156.png) ![](images/860ccc6875dd7044a884708cd8c34c6bb3d70506.png)
|
||||
> ![](.images/d263195904a7942604599ff703fcb71f28d0a156.png) ![](.images/860ccc6875dd7044a884708cd8c34c6bb3d70506.png)
|
||||
|
||||
> We do not need more evidence under fat tailed distributions — it is there in the properties themselves (properties for which we have ample evidence) and these clearly represent risk that must be killed in the egg (when it is still cheap to do so). Secondly, unreliable data — or any source of uncertainty — should make us follow the most paranoid route. \[...\] more uncertainty in a system makes precautionary decisions very easy to make (if I am uncertain about the skills of the pilot, I get off the plane).
|
||||
|
||||
|
|
|
@ -179,7 +179,7 @@ The Foresight Institute organizes weekly talks; here is one with Samo Burja on [l
|
|||
|
||||
Last, but not least, Ozzie Gooen on [Multivariate estimation & the Squiggly language](https://www.lesswrong.com/posts/kTzADPE26xh3dyTEu/multivariate-estimation-and-the-squiggly-language):
|
||||
|
||||
![](images/fef39d9a14a8ca8986c984ba2f8227d1581d9421.jpg)
|
||||
![](.images/fef39d9a14a8ca8986c984ba2f8227d1581d9421.jpg)
|
||||
|
||||
---
|
||||
|
||||
|
|
|
@ -70,7 +70,7 @@ Boeing [releases](https://www.fool.com/investing/2020/10/15/boeings-commercial-m
|
|||
|
||||
The [World Agricultural Supply and Demand Estimates](https://www.usda.gov/oce/commodity/wasde) is a monthly report by the US Department of Agriculture. It provides monthly estimates and past figures for crops worldwide, and for livestock production in the US specifically (meat, poultry, dairy), which might be of interest to the animal suffering movement. It also provides estimates of the past reliability of those forecasts. The October report can be found [here](https://www.usda.gov/oce/commodity/wasde/wasde1020.pdf), along with a summary [here](https://www.feedstuffs.com/markets/usda-raises-meat-poultry-production-forecast). The image below presents the 2020 and 2021 predictions, as well as the 2019 numbers:
|
||||
|
||||
![](images/c1fa5c2d58e4b0b76d0c7afab896ee89a7289559.png)
|
||||
![](.images/c1fa5c2d58e4b0b76d0c7afab896ee89a7289559.png)
|
||||
|
||||
The Atlantic considers scenarios under which [Trump refuses to concede](https://www.theatlantic.com/magazine/archive/2020/11/what-if-trump-refuses-concede/616424/). Warning: very long, very chilling.
|
||||
|
||||
|
|
|
@ -13,7 +13,7 @@ For example if only the top forecaster wins a prize, you might want to predict a
|
|||
|
||||
Consider for example [this prediction contest](https://predictingpolitics.com/2020/08/02/the-predictions-are-in/), which only had a prize for #1. The following question asks about the margin Biden would win or lose Georgia by:
|
||||
|
||||
![](images/f736a63590b98d329a456b5ff1cc055da86d416c.png)
|
||||
![](.images/f736a63590b98d329a456b5ff1cc055da86d416c.png)
|
||||
|
||||
Then the most likely scenario might be a close race, but the prediction which would have maximized your odds of coming in #1 might be much more extremized, because other predictors are more tightly packed in the middle.
|
||||
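The winner-takes-all effect described above can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration rather than data from the contest: the outcome (Biden's margin) is drawn from a wide normal distribution, the rival forecasters submit point predictions tightly packed around the middle, and the function name `win_probability` and all its parameters are hypothetical.

```python
import random


def win_probability(my_prediction, n_rivals=20, outcome_sd=5.0,
                    rival_sd=1.0, trials=20_000, seed=0):
    """Estimate the chance that `my_prediction` is strictly closest to the outcome.

    Assumptions (hypothetical, for illustration only):
    - the true outcome is drawn from Normal(0, outcome_sd)
    - each rival submits a point prediction drawn from Normal(0, rival_sd)
    - only the single closest prediction wins the prize
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        outcome = rng.gauss(0.0, outcome_sd)
        rivals = [rng.gauss(0.0, rival_sd) for _ in range(n_rivals)]
        my_error = abs(my_prediction - outcome)
        best_rival_error = min(abs(r - outcome) for r in rivals)
        if my_error < best_rival_error:
            wins += 1
    return wins / trials


# Compare the most likely outcome (0) with an extremized prediction (6).
central = win_probability(0.0)
extreme = win_probability(6.0)
print(f"P(win | predict 0): {central:.3f}")
print(f"P(win | predict 6): {extreme:.3f}")
```

Under these assumed parameters the extremized prediction wins more often, because it has the tail of the outcome distribution mostly to itself while the central prediction has to share the middle with every rival. The size of the gap depends entirely on the assumed spreads.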
|
||||
|
|
|
@ -73,7 +73,7 @@ I also had a payout for insightful comments, such that the nth most upvoted comm
|
|||
|
||||
Note that the projects were _not_ chosen so as to maximize impact, but rather so as to maximize information about whether their value could be predicted.
|
||||
|
||||
![](images/0e2bfcbc0fa993098421b08b67af411ec30e8ffb.png)
|
||||
![](.images/0e2bfcbc0fa993098421b08b67af411ec30e8ffb.png)
|
||||
|
||||
We observe that:
|
||||
|
||||
|
@ -96,7 +96,7 @@ Predictions:
|
|||
* Centered 50% confidence interval: 33 to 74
|
||||
* Centered 95% confidence interval: 11 to 111
|
||||
|
||||
![](images/4a4419c6711561edc7b61a8c0633da2717c0913f.png)
|
||||
![](.images/4a4419c6711561edc7b61a8c0633da2717c0913f.png)
|
||||
|
||||
Actual upvotes after one month: 31
|
||||
|
||||
|
@ -114,7 +114,7 @@ Predictions:
|
|||
* Centered 50% confidence interval: 28 to 68
|
||||
* Centered 95% confidence interval: 9 to 106
|
||||
|
||||
![](images/a71da8f5dd6340d4f81828cb1cb5b20b292c3188.png)
|
||||
![](.images/a71da8f5dd6340d4f81828cb1cb5b20b292c3188.png)
|
||||
|
||||
Actual upvotes after one month: 20 for the [first post](https://www.lesswrong.com/posts/yWmmYLCJft7u7XL5o/some-examples-of-technology-timelines), 49 for the [second one](https://www.lesswrong.com/posts/FaCqw2x59ZFhMXJr9/a-prior-for-technological-discontinuities). I'm taking 49 as the resolution.
|
||||
|
||||
|
@ -132,7 +132,7 @@ Predictions:
|
|||
* Centered 50% confidence interval: 15 to 57
|
||||
* Centered 95% confidence interval: 4 to 91
|
||||
|
||||
![](images/e1d6f9b8bfd27b8f9102143b112b9d9c7fc75fb7.png)
|
||||
![](.images/e1d6f9b8bfd27b8f9102143b112b9d9c7fc75fb7.png)
|
||||
|
||||
Actual upvotes after one month: 22
|
||||
|
||||
|
@ -150,7 +150,7 @@ Predictions:
|
|||
* Centered 50% confidence interval: 17 to 60
|
||||
* Centered 95% confidence interval: 6 to 102
|
||||
|
||||
![](images/7da9d104143c060272f96cbd12bcc27e2c71dbf9.png)
|
||||
![](.images/7da9d104143c060272f96cbd12bcc27e2c71dbf9.png)
|
||||
|
||||
Actual upvotes after one month: 20
|
||||
|
||||
|
@ -167,7 +167,7 @@ Predictions:
|
|||
* Centered 50% confidence interval: 19 to 59
|
||||
* Centered 95% confidence interval: 7 to 92
|
||||
|
||||
![](images/89ff6294dfe357bfebc3640584d42060934103c4.png)
|
||||
![](.images/89ff6294dfe357bfebc3640584d42060934103c4.png)
|
||||
|
||||
Actual upvotes after one month: 27
|
||||
|
||||
|
@ -182,7 +182,7 @@ A further idea was my proposal for my Summer Research Fellowship at FHI, though
|
|||
* Centered 50% confidence interval: 24 to 70
|
||||
* Centered 95% confidence interval: 10 to 103
|
||||
|
||||
![](images/9e6a4434c603e9fd94e9a6819fff247011a5ab93.png)
|
||||
![](.images/9e6a4434c603e9fd94e9a6819fff247011a5ab93.png)
|
||||
|
||||
I take these data-points as further evidence that this setup is interesting or worth it; arguably a major take-away for this project is “a fairly simple forecasting system is able to produce a project which gets accepted to the FHI summer fellowship.” Because the program got ~300 [applications](https://forum.effectivealtruism.org/posts/EPGdwe6vsCY7A9HPa/review-of-fhi-s-summer-research-fellowship-2020), but only 27 participants were accepted, this puts this forecasting setup in the top 9% of applicants in terms of some fuzzy “optimization power” (though this is a simplification, because the project proposal was probably one of many factors).
|
||||
|
||||
|
|
|
@ -11,7 +11,7 @@ As my primary conclusion, I found that assessing the relative value of projects
|
|||
|
||||
The reader can see my results [here](https://docs.google.com/spreadsheets/d/1BGmkDrQzDCYdMXEvIv7DhMusG-7EwBTkz8MiaEBq9VE/edit#gid=0), though I’d like to note that they’re preliminary and experimental, as is the rest of this post. Towards the end of this post, there is a section which mentions more caveats.
|
||||
|
||||
![](images/16bb08699adfac73a6545a3e056f19480a486da7.png)
|
||||
![](.images/16bb08699adfac73a6545a3e056f19480a486da7.png)
|
||||
|
||||
Related work:
|
||||
|
||||
|
|