tweak: tweaks, add content

This commit is contained in:
Nuno Sempere 2023-02-11 21:44:30 +00:00
parent 6ffef9d8cf
commit 7ce3ae9819
7 changed files with 170 additions and 7 deletions

View File

@@ -14,8 +14,8 @@ There previously was a form here, but I think someone was inputting random email
<p><input type="text" name="name" placeholder="Name (helps me filter out malicious entries)" class="subscribe-input"/></p>
<p>
<input id="82ff8" type="checkbox" name="l" checked value="82ff889c-f9d9-4a45-bf9a-7e2696813021" />
<label for="82ff8" style="font-size: 18px">nunosempere.com</label>
<input id="c469b" type="checkbox" name="l" checked value="c469bee2-2754-4360-b97b-4f8d2bf62363" />
<label for="c469b" style="font-size: 18px">nunosempere.com</label>
</p>
<p><input type="submit" value="Subscribe" class="subscribe-button"/></p>
@@ -23,9 +23,5 @@ There previously was a form here, but I think someone was inputting random email
</form>
<p>
...or send me an email to list@nunosempere.com with subject "Subscribe to blog" and your name.
</p>
<p>
The point about malicious entries is curious, so I thought I'd explain it: People wanting to overflow someone's inbox can subscribe them to a lot of newsletters. Sending a confirmation email doesn't fix this, because then the victim is just overflowed with confirmation emails. Apparently substack has also been experiencing problems with this. Anyways, that's why I'll only accept subscriptions for which the person gives a real-sounding name.
The reason why I am asking for subscribers' names is explained <a href="https://nunosempere.com/.subscribe/why-name">here</a>.
</p>

.subscribe/why-name.md Normal file
View File

@@ -0,0 +1,11 @@
## Why I need subscribers' names
This took me a while to figure out:
- People wanting to overflow someone's inbox can subscribe them to a lot of newsletters.
- Because I'm using relatively standard software, I've been getting large numbers of spurious signups.
- Sending a confirmation email doesn't fix this, because then the victim is just overflowed with confirmation emails.
So that's why I'll only accept subscriptions for which the person gives a real-sounding name. Apparently Substack has also been experiencing problems with this.
Anyways, if you don't want to give a real name, you can just input "Testy McTestFace" or similar.

View File

@@ -0,0 +1,40 @@
Impact markets as a mechanism for not losing your edge
========================================================
Here is a story I like about how to use impact markets to produce value:
- You are Open Philanthropy and you think that something is not worth funding because it doesn't meet your bar
- You agree that if you later change your mind and, *in hindsight*, after the project is completed, come to think you should have funded it, you'll buy its impact shares in *n* years. That is, if the project needs $X to be completed, you promise you'll spend $X plus some buffer buying its impact shares.
- The market decides whether you are wrong. If the market is confident that you are wrong, it can invest in the project, make it happen, and then get paid once you realize you were wrong.
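To make the mechanics a bit more concrete, here is a toy payoff calculation for a would-be investor in such a market; the project cost, buffer, time horizon and probability are all made-up numbers, not part of the proposal above:
```
// Toy payoff calculation for an impact-share investor, with hypothetical numbers.
// Assumes the funder honors its commitment to buy the shares after n years.
const projectCost = 100_000;    // $X needed to complete the project
const buffer = 0.2;             // funder promises to pay $X plus a 20% buffer
const years = 3;                // n years until the retrospective evaluation
const pFunderChangesMind = 0.5; // investor's credence that the funder will, in hindsight, want to fund it

const payoffIfBought = projectCost * (1 + buffer);          // what the funder pays for the impact shares
const expectedPayoff = pFunderChangesMind * payoffIfBought; // ignoring partial buybacks
const expectedAnnualReturn = (expectedPayoff / projectCost) ** (1 / years) - 1;

console.log(`Expected payoff: $${expectedPayoff.toFixed(0)}`);
console.log(`Expected annualized return: ${(expectedAnnualReturn * 100).toFixed(1)}%`);
// With these numbers the expected return is negative, so the investor would only
// fund the project if they were substantially more confident than 50% that the
// funder is wrong, or if the buffer were larger.
```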
The reverse variant is a normal prediction market:
- You are Open Philanthropy, and you decide that something is worth funding
- Someone disagrees, and creates a prediction market on whether you will change your mind in *n* years
- You've previously committed to betting some fraction of a grant on such markets
- When the future arrives, if you were right, you get more money; if you were wrong, you give your money to people who were better at predicting your future evaluations than you, and they become more able to shift such prediction markets in the future.
So in this story, you create these impact markets and prediction markets because you appreciate having early indicators that something is not a good idea, and you don't want to throw good money after bad. You also anticipate being more right if you give the market an incentive to prove you wrong. You also don't want to lose money, so to keep your edge, you punish yourself for being wrong. And you don't particularly mind giving your money to people who have better models of the future than you do, because, for instance, you could choose to only bet against people you think are altruistic.
A variant of this that I also like is:
- You are the Survival and Flourishing Fund. You think that your methodology is much better, and that Open Philanthropy's longtermist branch is being too risk-averse.
- You agree on some evaluation criteria, and you bet $50M that your next $50M will have a higher impact than their next $50M.
- At the end, the philanthropic institution which has done better gets $50M from the other.
In that story, you make this bet because you think that replacing yourself with a better alternative would be a positive.
Contrast this with the idea of impact markets which I've seen in the water supply, which is something like "impact certificates are like NFTs, and people will want to have them". I don't like that story, because it's missing a lot of steps, and purchasers of impact certificates are taking on the very fuzzy bet that people will later want to buy the impact-NFTs.
Some notes:
- Although in the end this kind of setup could move large amounts of money, I'd probably recommend starting very small, to train the markets and test and refine the evaluation systems.
- Note that for some bets, Open Philanthropy doesn't need to believe that they are more than 50% likely to succeed; it just has to believe that the bet is overall worth it. E.g., a project could have a 20% chance of succeeding but a large payoff. That's fine; you could offer a market which takes those odds into account.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>

View File

@@ -0,0 +1,116 @@
Straightforwardly eliciting probabilities from GPT-3
==============
I explain two straightforward strategies for eliciting probabilities from language models, and in particular for GPT-3, provide code, and give my thoughts on what I would do if I were being more hardcore about this.
### Straightforward strategies
#### Look at the probability of yes/no completion
Given a binary question, like “At the end of 2023, will Vladimir Putin be President of Russia?” you can create something like the following text for the model to complete:
```
At the end of 2023, will Vladimir Putin be President of Russia? [Yes/No]
```
Then we can compare the relative probabilities of completion to the “Yes,” “yes,” “No” and “no” tokens. This requires a bit of care. Note that we are not making the same query 100 times and looking at the frequencies, but rather asking for the probabilities directly:
<img src="https://i.imgur.com/oNcbTGR.png" class='img-medium-center'>
You can see a version of this strategy implemented [here](https://github.com/quantified-uncertainty/gpt-predictions/blob/master/src/prediction-methods/predict-logprobs.js).
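For illustration, here is a minimal sketch of this strategy using the `openai` node package (the v3-era API); the model name and the token handling are simplifying assumptions, and the linked implementation differs in its details:
```
// Sketch: ask for a 1-token completion and read the log-probabilities of the
// top candidate tokens, then normalize "Yes"-like vs "No"-like mass.
const { Configuration, OpenAIApi } = require("openai");

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

async function yesNoProbability(question) {
  const prompt = `${question} [Yes/No]\n`;
  const response = await openai.createCompletion({
    model: "text-davinci-003", // assumed model; any completion model exposing logprobs works
    prompt,
    max_tokens: 1,
    temperature: 0,
    logprobs: 5, // return the 5 most likely tokens and their logprobs
  });
  // top_logprobs[0] maps candidate tokens (often with a leading space) to logprobs
  const top = response.data.choices[0].logprobs.top_logprobs[0];
  let yes = 0;
  let no = 0;
  for (const [token, logprob] of Object.entries(top)) {
    const t = token.trim().toLowerCase();
    if (t === "yes") yes += Math.exp(logprob);
    if (t === "no") no += Math.exp(logprob);
  }
  return yes / (yes + no); // probability mass on "yes", renormalized
}

yesNoProbability(
  "At the end of 2023, will Vladimir Putin be President of Russia?"
).then(console.log);
```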
A related strategy might be to look at what probabilities the model assigns to a pair of sentences with opposite meanings:
* “Putin will be the president of Russia in 2023.”
* “Putin will not be the president of Russia in 2023.”
For example, GPT-3 could assign a probability of 9 \* 10^-N to the first sentence and 10^-N to the second sentence. We could then interpret that as a 90% probability that Putin will be president of Russia by the end of 2023.
But that method has two problems:
* The negatively worded sentence has one more word, and so it might systematically have a lower probability.
* [GPT-3's API](https://platform.openai.com/docs/api-reference/introduction) doesn't appear to provide a way of calculating the likelihood of a whole sentence.
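For completeness, if whole-sentence log-likelihoods were available by some other means, turning them into a probability is just the arithmetic from the Putin example above:
```
// Turn two sentence log-likelihoods (a statement and its negation) into a probability.
// With log(9e-10) and log(1e-10) this returns 0.9, as in the example above.
function probabilityFromPair(logLik1, logLik2) {
  const p1 = Math.exp(logLik1);
  const p2 = Math.exp(logLik2);
  return p1 / (p1 + p2);
}
console.log(probabilityFromPair(Math.log(9e-10), Math.log(1e-10))); // ≈ 0.9
```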
#### Have the model output the probability verbally
You can directly ask the model for a probability, as follows:
```
Will Putin be the president of Russia in 2023? Probability in %:
```
Now, the problem with this approach is that, untweaked, it does poorly.
Instead, I've tried to use templates. For example, here is a template for producing reasoning about base rates:
> Many good forecasts are made in two steps.
>
> 1. Look at the base rate or historical frequency to arrive at a baseline probability.
> 2. Take into account other considerations and update the baseline slightly.
>
> For example, we can answer the question “will there be a schism in the Catholic Church in 2023?” as follows:
>
> 1. There have been around 40 schisms in the 2000 years since the Catholic Church was founded. This is a base rate of 40 schisms / 2000 years = 2% chance of a schism / year. If we only look at the last 100 years, there have been 4 schisms, which is a base rate of 4 schisms / 100 years = 4% chance of a schism / year. In between is 3%, so we will take that as our baseline.
> 2. The Catholic Church in Germany is currently in tension and arguing with Rome. This increases the probability a bit, to 5%.
>
> Therefore, our final probability for “will there be a schism in the Catholic Church in 2023?” is: 5%
>
> For another example, we can answer the question “${question}” as follows:
That approach does somewhat better. The problem is that sometimes the base rate approach isn't quite relevant, because sometimes there is no historical record, e.g., for global nuclear war. And sometimes we can't straightforwardly rely on the lack of a historical track record: VR headsets haven't really been adopted in the mainstream, but their price has been falling and their quality rising, so making a forecast solely by looking at the historical lack of adoption might lead one astray.
You can see some code which implements this strategy [here](https://github.com/quantified-uncertainty/gpt-predictions/blob/master/src/prediction-methods/predict-verbally.js).
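For illustration, here is a rough sketch of how such a template might be filled in and queried; the template string is abbreviated, the model name is an assumption, and the parsing of the answer is deliberately naive:
```
// Sketch: fill the base-rate template with a question, ask for a completion,
// and pull out the last "N%" that appears in the answer.
const { Configuration, OpenAIApi } = require("openai");
const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

// The full prompt is the template above with "${question}" filled in;
// it is abbreviated here to keep the sketch short.
const fillTemplate = (question) =>
  `Many good forecasts are made in two steps.\n` +
  `1. Look at the base rate or historical frequency to arrive at a baseline probability.\n` +
  `2. Take into account other considerations and update the baseline slightly.\n` +
  `[... worked schism example elided ...]\n` +
  `For another example, we can answer the question "${question}" as follows:\n`;

async function verbalProbability(question) {
  const response = await openai.createCompletion({
    model: "text-davinci-003", // assumed model
    prompt: fillTemplate(question),
    max_tokens: 256,
    temperature: 0,
  });
  const text = response.data.choices[0].text;
  // Naively take the last percentage that appears in the model's answer.
  const matches = text.match(/\d+(\.\d+)?\s*%/g) || [];
  const last = matches[matches.length - 1];
  return last ? parseFloat(last) / 100 : null;
}
```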
### More elaborate strategies
#### Various templates, and choosing the template depending on the type of question
The base rate template is only one of many possible options. We could also look at:
* Laplace rule of succession template: Since X was first possible, how often has it happened? (See the sketch below.)
* “Mainstream plausibility” template: We could prompt a model to simulate how plausible a well-informed member of the public thinks that an event is, and then convert that degree of plausibility into a probability.
* Step-by-step model: What steps need to happen for something to happen, and how likely is each step?
* etc.
The point is that there are different strategies that a forecaster might employ, and we could try to write templates for them. We could also briefly describe them to GPT and ask it to choose on the fly which one would be more relevant to the question at hand.
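To make the Laplace template concrete, here is one common formulation of the rule of succession; the helper function is hypothetical, not something from the linked code:
```
// Laplace's rule of succession: if an event has happened s times in n trials
// since it first became possible, estimate the probability of it happening
// in the next trial as (s + 1) / (n + 2).
function laplace(successes, trials) {
  return (successes + 1) / (trials + 2);
}
console.log(laplace(0, 10)); // ≈ 0.083: never happened in 10 years → ~8% per year
```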
#### GPT consulting GPT
More elaborate versions of using “templates” are possible. GPT could decompose a problem into subtasks, delegate these to further instances of GPT, and then synthesize and continue working with the task results. Some of this work has been done by Paul Christiano and others under the headline of “HCH” (Humans consulting HCH) or “amplification.”
However, it appears to me that GPT isn't quite ready for this kind of thing, because the quality of its reasoning isn't really high enough to play a game of telephone with itself. Though it's possible that a more skilled prompter could get better results. Building tooling for GPT-consulting-GPT seems like it could get messy, although the research lab [Ought](https://ought.org/) has been doing some work in this area.
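Here is a minimal sketch of what one level of this could look like; the decomposition prompt, the parsing, and the overall structure are my own guesses at a simple setup, not how Ought or anyone else actually does it:
```
// Sketch: one level of "GPT consulting GPT": ask the model to decompose the
// question, answer each sub-question separately, then synthesize.
const { Configuration, OpenAIApi } = require("openai");
const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

async function complete(prompt) {
  const response = await openai.createCompletion({
    model: "text-davinci-003", // assumed model
    prompt,
    max_tokens: 300,
    temperature: 0,
  });
  return response.data.choices[0].text.trim();
}

async function forecastByDecomposition(question) {
  // Ask for sub-questions as a numbered list, then parse the numbering away.
  const subQuestionsText = await complete(
    `List three sub-questions whose answers would help forecast: "${question}"\n1.`
  );
  const subQuestions = ("1." + subQuestionsText)
    .split("\n")
    .map((line) => line.replace(/^\d+\.\s*/, "").trim())
    .filter(Boolean);

  const subAnswers = [];
  for (const subQuestion of subQuestions) {
    subAnswers.push(await complete(`${subQuestion}\nAnswer briefly:`));
  }

  // Feed the sub-answers back in and ask for an overall probability.
  const synthesisPrompt =
    `Question: ${question}\n` +
    subQuestions.map((q, i) => `- ${q}: ${subAnswers[i]}`).join("\n") +
    `\nGiven the above, the probability in % is:`;
  return complete(synthesisPrompt);
}
```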
#### Query and interact with the internet
Querying the internet seems like an easy win for increasing a model's knowledge. In particular, it might not be that difficult to query and summarize up-to-date Wikipedia pages, or Google News articles.
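As a sketch of the easy version, the snippet below pulls a page summary from Wikipedia's public REST API and prepends it to the forecasting prompt; the endpoint and the `extract` field are my understanding of that API, and Google News would need a different integration:
```
// Sketch: fetch a short, reasonably up-to-date summary for a topic from
// Wikipedia and use it as context before asking the forecasting question.
// Uses the built-in fetch available in Node 18+.
async function wikipediaSummary(title) {
  const url = `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title)}`;
  const response = await fetch(url);
  if (!response.ok) return "";
  const data = await response.json();
  return data.extract || ""; // the "extract" field holds a plain-text summary
}

async function promptWithContext(topic, question) {
  const context = await wikipediaSummary(topic);
  return `Background: ${context}\n\nQuestion: ${question}\nProbability in %:`;
}
```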
#### Fine-tune the model on good worked examples of forecasting reasoning
1. Collect 100 to 1k examples of worked forecasting questions from good forecasters.
2. Fine-tune a model on those worked forecasting rationales.
3. Elicit similar reasoning from the model.
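Here is a sketch of the data-preparation part of step 2, assuming the prompt/completion JSONL format that OpenAI's fine-tuning endpoint expected at the time; the example rationale and field names are made up:
```
// Sketch: turn worked forecasting rationales into a JSONL fine-tuning file.
const fs = require("fs");

const examples = [
  {
    question: "Will there be a schism in the Catholic Church in 2023?",
    rationale:
      "Base rate: ~40 schisms in ~2000 years, ~2%/year; recent tensions in Germany nudge this up.",
    probability: "5%",
  },
  // ...100 to 1k such examples from good forecasters
];

const jsonl = examples
  .map((e) =>
    JSON.stringify({
      prompt: `${e.question}\nReasoning:`,
      completion: ` ${e.rationale}\nFinal probability: ${e.probability}`,
    })
  )
  .join("\n");

fs.writeFileSync("forecasting-finetune.jsonl", jsonl);
// The resulting file can then be passed to the fine-tuning API or CLI,
// e.g. `openai api fine_tunes.create -t forecasting-finetune.jsonl -m davinci`.
```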
### Parting thoughts
You can see the first two strategies applied to SlateStarCodex in [this Google document](https://docs.google.com/spreadsheets/d/1Idy9Bfs6VX_ucykhCPWvDs9HiubKY_rothQnFfseR_c/edit?usp=sharing).
Overall, the probabilities outputted by GPT appear to be quite mediocre as of 2023-02-06, and so I abandoned further tweaks.
<img src="https://i.imgur.com/jNrnGdU.png" class='img-medium-center'>
In the above image, I think that we are in the first orange region, where the returns to fine-tuning and tweaking just aren't that exciting. Though it is also possible that having tweaks and tricks ready might help us notice a bit earlier that the curve is turning steeper.
### Acknowledgements
<img src="https://i.imgur.com/3uQgbcw.png" style="width: 20%;">
<br>
This is a project of the [Quantified Uncertainty Research Institute](https://quantifieduncertainty.org/). Thanks to Ozzie Gooen and Adam Papineau for comments and suggestions.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>