feat: add consulting page, a few blogposts

Nuno Sempere 2023-05-06 15:11:15 +00:00
parent 3b9b00acd6
commit 290d51b92b
7 changed files with 254 additions and 1 deletions


@ -0,0 +1,53 @@
// Fish (salmon)
value_happy_salmon_year_in_human_qalys = 0.001 to 0.01
value_tortured_salmon_year_in_human_qalys = -(0.002 to 0.02)
value_farmed_salmon_year_in_human_qalys = -0.02 to 0.01
// ^ purely subjective estimates
lifetime_salmon = 2 to 3
// https://thehumaneleague.org.uk/article/how-long-do-salmon-live
weight_salmon = 3.5 to 5 // kilograms
// ^ https://www.wildcoastsalmon.com/salmonlifecycle
calories_per_kilogram_of_salmon_meat = 1700 to 2100
calories_salmon = weight_salmon * calories_per_kilogram_of_salmon_meat
// ^ see <https://www.quora.com/How-many-calories-are-in-1kg-of-salmon>
salmon_estimates = {
  name: "salmon",
  value_year_in_human_qalys: value_farmed_salmon_year_in_human_qalys,
  weight: weight_salmon,
  calories: calories_salmon,
  lifetime: lifetime_salmon
}
/* Fish Welfare Initiative
Sources:
- https://www.fishwelfareinitiative.org/
- https://forum.effectivealtruism.org/posts/T5fSphiK6sQ6hyptX/opinion-estimating-invertebrate-sentience#Peter_Hurford
- https://forum.effectivealtruism.org/posts/Qk3hd6PrFManj8K6o/rethink-priorities-welfare-range-estimates
- https://nunosempere.com/blog/2023/02/19/bayesian-adjustment-to-rethink-priorities-welfare-range-estimates/
Key simplification: assume that all fish are salmon. This is inaccurate, because salmon is a very particular & expensive species of fish. But I think it's ok to start with. Later I could easily add different species.
*/
fish_potentially_helped = 1M to 2M
shrimp_potentially_helped = 1M to 2M
improvement_as_proportion_of_lifetime = 0.05 to 0.5
value_fwi_fish =
  fish_potentially_helped *
  improvement_as_proportion_of_lifetime *
  (value_happy_salmon_year_in_human_qalys / salmon_estimates.lifetime)
value_of_shrimp_in_fish = (0.3 to 1)
// ^ very uncertain, subjective
value_fwi_shrimp =
  shrimp_potentially_helped *
  improvement_as_proportion_of_lifetime *
  (value_happy_salmon_year_in_human_qalys / salmon_estimates.lifetime) *
  value_of_shrimp_in_fish
value_fwi_so_far = value_fwi_fish + value_fwi_shrimp


@ -3,6 +3,11 @@ Things you should buy, quantified
I've written a notebook using reusable Squiggle components to estimate the value of a few consumer products. You can find it [here](https://squiggle.nunosempere.com/consumer-surplus).
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>


@ -0,0 +1,27 @@
General discussion thread
=========================
Do you want to bring up something to me or to the kinds of people who are likely to read this post? Or do you want to just say hi? This is the post to do it.
### Why am I doing this?
Well, the EA Forum was my preferred forum for discussion for a long time. But in recent times it has become more censorious. Specifically, it has a moderation policy that I don't like: moderators have banned people I like, such as [sapphire](https://forum.effectivealtruism.org/users/deluks917) or [Sabs](https://forum.effectivealtruism.org/users/sabs), who sometimes say interesting things. Recently, they banned someone for making a [post they found distasteful](https://forum.effectivealtruism.org/posts/AAZqD2pvydH7Jmaek/?commentId=CDPS2JQKziWsWg73D) during April Fools on the EA Forum, whereas I would have made the call that poking fun at sacred cows during April Fools is fair game.
So overall it feels like the EA Forum has become bigger and like it cares less about my values. Specifically, moderators are much more willing than I am to trade off the pursuit of truth in exchange for having fewer rough edges. Shame, though perhaps necessary to turtle down against actors seeking to harm one.
On the other end of the spectrum, [The Motte](https://www.themotte.org/post/430/quality-contributions-report-for-march-2023) is more spiky (waaay more spiky, warning if you click that link: irreverence towards any and all sacred cows), but it is also more unfocused and less forecasting-adjacent, and I don't know the community there.
So I thought I would try hosting a discussion here. Note that registration isn't needed to post a comment.
### What I expect will happen
I don't really know, that's why I'm trying this!
One salient possibility is that I will go down the same censorship slope as everyone else, where I start out being very in favour of uncensored speech and then decide that that is not optimal, after all. But I'll do my best to not censor stuff in this first thread.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>


@ -0,0 +1,31 @@
A Soothing Frontend for the Effective Altruism Forum
====================================================
### About
<a href="https://forum.nunosempere.com">forum.nunosempere.com</a> is a frontend for the [Effective Altruism Forum](https://forum.effectivealtruism.org). It aims to present EA Forum posts in a way which I personally find soothing. It achieves that goal at the cost of pretty restricted functionality, like not having a frontpage, or not being able to make or upvote comments and posts.
### Usage
Instead of having a frontpage, this frontend merely has an endpoint:
```
https://forum.nunosempere.com/posts/[post id]/[optional post slug]
```
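A minimal Python sketch of how that endpoint can be used: it builds a frontend URL from a post id and an optional slug, and rewrites an EA Forum post link so that it points at the frontend. The helper names `frontend_url` and `to_frontend` are hypothetical and are not part of the frontend itself; the only things taken from this page are the two hostnames and the `/posts/[post id]/[optional post slug]` pattern.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

FRONTEND_HOST = "forum.nunosempere.com"
EA_FORUM_HOST = "forum.effectivealtruism.org"


def frontend_url(post_id: str, slug: Optional[str] = None) -> str:
    """Build https://forum.nunosempere.com/posts/[post id]/[optional post slug]."""
    path = f"/posts/{post_id}"
    if slug:
        path += f"/{slug}"
    return urlunsplit(("https", FRONTEND_HOST, path, "", ""))


def to_frontend(ea_forum_url: str) -> str:
    """Rewrite an EA Forum post link so that it points at the frontend.

    Assumes links of the form https://forum.effectivealtruism.org/posts/<id>/<slug>.
    Any other URL is returned unchanged. Query strings and fragments are dropped,
    on the (unverified) assumption that the frontend does not use them.
    """
    parts = urlsplit(ea_forum_url)
    if parts.netloc != EA_FORUM_HOST or not parts.path.startswith("/posts/"):
        return ea_forum_url
    return urlunsplit((parts.scheme, FRONTEND_HOST, parts.path, "", ""))
```

Swapping only the hostname works because both sites share the same `/posts/<id>/<slug>` path layout.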
An [example post](https://forum.nunosempere.com/posts/fz6NdW4i25FZoRQyz/use-of-i-d-bet-on-the-ea-forum-is-mostly-metaphorical)---chosen to be a link post so that you can see both the article and the comments---looks like this:
![](https://i.imgur.com/qlLCR2i.png)
### Notes
- Clicking on the timestamp of an article or comment takes you to its location on the original EA Forum.
- I am choosing not to have a frontpage because it would just add additional development time to a hobby project, and because it doesn't support my use case. My use case is just being subscribed via RSS, and displaying forum posts which interest me in this new frontend.
- On the topic of RSS, this frontend has an RSS endpoint at [forum.nunosempere.com/feed](https://forum.nunosempere.com/feed): it piggybacks off the [ea.greaterwrong.com RSS feed](https://ea.greaterwrong.com/?format=rss).
- Other alternatives to the EA Forum known to me are the aforementioned [ea.greaterwrong.com](https://ea.greaterwrong.com/) as well as [eaforum.issarice.com](https://eaforum.issarice.com/).
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>


@ -7,7 +7,7 @@ I consider a simple version of "worldview diversification": allocating a set amo
More elaborate versions of worldview diversification are probably able to fix this flaw, for example by instituting trading between the different worldviews&mdash;though that trading does ultimately have to happen. However, I view those solutions as hacks, and I suspect that the problem I outline in this post is indicative of deeper problems with the overall approach of worldview diversification.
This post could have been part of a larger review of EA (Effective Altruism) in general and Open Philanthropy in particular, but I sent a grant request to the EA Infrastructure Fund about it and it doesn't look to be materializing, so that's probably not happening.
This post could have been part of a larger review of EA (Effective Altruism) in general and Open Philanthropy in particular. I sent a grant request to the EA Infrastructure Fund on that topic, but alas it doesn't seem to be materializing, so that's probably not happening.
### The main flaw: inconsistent relative values


@ -0,0 +1,57 @@
Review of Epoch's *Scaling transformative autoregressive models*
================================================================
We want to forecast the arrival of human-level AI systems. This is a complicated task, and previous attempts have been kind of mediocre. So [this paper](https://epochai.org/files/direct-approach.pdf) proposes a new approach.
The approach has some key assumptions. And then it needs some auxiliary hypotheses and concrete estimates to flesh out those key assumptions. Its key assumptions are:
- That a sufficient condition for reaching human-level performance might be indistinguishability: if you can't determine whether a git repository was produced by an expert human programmer or by an AI, this should be a sufficient (though not necessary) demonstration for the AI to have acquired the capability of programming.
- That models' performance will continue growing as predicted by current scaling laws.
Then, note that today's language models are in fact trained to mimic human text. And note that the error that their training method aims to minimize can be decomposed into some irreducible entropy part, and some part which is due to the model not yet being a good enough mimic. So then, we can ask, when will a language model's loss be low enough that it is approximating human text enough that we can be very sure that it has acquired enough human capabilities that the model will have transformative effects?
For example, if a model is able to produce scientific papers such that it takes multiple papers to distinguish that model from a talented human scientist, then this would be enough to conclude that a model has acquired the capability of doing scientific research as good as that of the talented human scientist.
My high level thought is that this is an elegant approach. It warms my blackened heart a bit. I'm curious about why previous attempts, e.g., various reports commissioned by Open Philanthropy, didn't think of it in some shape.
One thought though, is that the scaling laws that they are referring to do have a large number of degrees of freedom. Their shape is:
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
$$ L(N,D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$
and we notice that there are four degrees of freedom (A, B, alpha and beta), as well as the choice of overall shape of the formula. The E parameter represents irreducible uncertainty, but it also seems to be only empirically estimable by estimating the other parameters, and thus also seems to take the role of a dangling parameter. I am reminded of the quote "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk". I don't really have strong opinions about whether scaling laws are overfitting on past data. But I do think that most of the uncertainty is going to come from uncertainty about whether scaling laws will continue as they have. Still, having a conditional forecast based on scaling laws continuing is fine.
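To make the functional form concrete, here is a small Python sketch that evaluates $L(N, D)$ and separates the irreducible term from the reducible one. The parameter values are illustrative placeholders picked for this example, not the fits used in the paper, and the function names are hypothetical.

```python
def reducible_loss(n_params: float, n_tokens: float,
                   A: float, B: float, alpha: float, beta: float) -> float:
    """The part of the loss that shrinks with scale: A / N^alpha + B / D^beta."""
    return A / n_params**alpha + B / n_tokens**beta


def scaling_loss(n_params: float, n_tokens: float,
                 E: float, A: float, B: float, alpha: float, beta: float) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta, with E the irreducible term."""
    return E + reducible_loss(n_params, n_tokens, A, B, alpha, beta)


# Placeholder parameters (assumptions for illustration only).
PLACEHOLDER = dict(E=1.7, A=400.0, B=400.0, alpha=0.3, beta=0.3)

small = scaling_loss(1e9, 2e10, **PLACEHOLDER)    # a smaller model and dataset
large = scaling_loss(1e11, 2e12, **PLACEHOLDER)   # 100x more parameters and tokens
```

Whatever values are plugged in, the loss only approaches `E` from above as `N` and `D` grow, which is why a forecast built on this form inherits all of the uncertainty about the fitted parameters.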
Anyways, then to flesh out their approach, the authors need some auxiliary assumptions. Some that I noticed are:
1. That at the moment when a rater exceeds some threshold of certainty when trying to catch a model vs a human, their certainty will be very close to their threshold of certainty.
2. That human language is ergodic.
3. That the updates of a human trying to distinguish between a mimic model's output and that of a human would follow a particular mathematical shape, where the human moves in the same direction as an ideal Bayesian reasoner at each step, just more slowly.
Honestly, these seem like reasonable analytical approximations. But I wish there had been some discussion of when language cannot be ergodic. What I'm thinking is that it does sound like a reasonable approximation, but that I haven't thought much about it, and that maybe there is a catch, like texts containing novel scientific discoveries maybe not being ergodic? Also, ergodic over what domain? The domain of texts which could be written, or will ever be written? But then language models don't have future or counterfactual texts in their training data? I don't know, and I haven't thought much about ergodicity either, so I'm uncertain here.
I also found assumption number 3 to be irritating, because the model assumes that human judges have a "slowdown" factor in their updates, but then humans could notice that, estimate their slowdown factor, and then update faster. My guess is that instead of, or in addition to, a slowdown factor, humans have an error rate, where they process evidence as pointing to one side when an ideal reasoner would process that same evidence as pointing to the other side.
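A rough Python sketch of the distinction, under simplifying assumptions that are mine rather than the paper's: evidence arrives as a constant log-likelihood ratio per step, the judge starts at 50/50, and a "slowed" judge adds only a fraction of each ideal update to their log-odds. All function names are hypothetical, and the error-rate function is just one possible way to formalize the alternative I have in mind.

```python
import math


def log_odds(p: float) -> float:
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def steps_to_threshold(log_lr_per_step: float, threshold: float,
                       slowdown: float = 1.0) -> int:
    """Count evidence steps until a judge starting at 50/50 reaches `threshold`.

    An ideal Bayesian adds `log_lr_per_step` to their log-odds each step;
    a slowed judge adds only `slowdown * log_lr_per_step` (0 < slowdown <= 1).
    """
    if log_lr_per_step <= 0 or not 0 < slowdown <= 1:
        raise ValueError("need positive evidence and 0 < slowdown <= 1")
    target = log_odds(threshold)
    current, steps = 0.0, 0
    while current < target:
        current += slowdown * log_lr_per_step
        steps += 1
    return steps


def corrected_update(perceived_shift: float, slowdown: float) -> float:
    """A judge who has estimated their own slowdown can undo it by rescaling."""
    return perceived_shift / slowdown


def expected_shift_with_errors(log_lr_per_step: float, error_rate: float) -> float:
    """Expected log-odds shift if the judge reads the evidence with the wrong
    sign with probability `error_rate`: (1 - 2 * error_rate) * log_lr_per_step."""
    return (1.0 - 2.0 * error_rate) * log_lr_per_step
```

With a slowdown of 0.5 the judge needs twice as many steps to reach the same certainty, but `corrected_update` recovers the ideal update exactly. An error rate cannot be undone that way, because the judge does not know which individual readings were the wrong ones.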
I also didn't really like assumption number 1, about the endline certainty when waiting for enough evidence to exceed a threshold of certainty being close to that threshold. That assumption wouldn't be the case if, for example, most of the outputs of a model were nearly indistinguishable from a human, but it infrequently had some clear tells or artifacts, like inexact human hands in some art models or unnatural moves in chess or in go.
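A toy Python illustration of that worry, with made-up numbers: the judge accumulates log-likelihood ratios until their certainty first reaches a threshold, and we look at how far past the threshold they end up. The two evidence streams and the function name are assumptions for this example only.

```python
import math


def certainty_at_stop(log_lrs, threshold):
    """Return the judge's certainty at the first step where it reaches `threshold`.

    The judge starts at 50/50 and adds each log-likelihood ratio in turn.
    Returns None if the threshold is never reached.
    """
    target = math.log(threshold / (1.0 - threshold))
    total = 0.0
    for llr in log_lrs:
        total += llr
        if total >= target:
            return 1.0 / (1.0 + math.exp(-total))
    return None


# Evidence trickles in evenly: the judge stops just past the threshold.
steady = [0.1] * 100
# Mostly indistinguishable output, then one clear tell: the judge overshoots.
with_tell = [0.01] * 20 + [6.0]

steady_stop = certainty_at_stop(steady, threshold=0.9)
tell_stop = certainty_at_stop(with_tell, threshold=0.9)
```

In the steady stream the stopping certainty sits right at the threshold, as assumption 1 requires. In the stream with a tell it jumps well beyond it.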
But overall, despite being irritated by these auxiliary assumptions, I think they are generally a reasonable move. I don't really expect assumptions 1 and 3 to introduce much error. I'm a bit more wary about assumption number 2, because as I said I haven't thought much about ergodicity, but it seems respectable to add more analytical clarity at the cost of wrapping the whole result in another conditional (... if language is such that it can be reasonably modelled as ergodic, then...). Also, the alternative might be to explore each tendril down the forking paths road, but this would be too effortful.
To conclude the analysis, the authors need some forecasts about how indistinguishable AI-produced texts have to be before they are transformative, operationalized as how many tokens one would have to see until one can be confident that they are in fact produced by an AI. Their given forecasts don't seem particularly objectionable, though I haven't had enough time to sit down and examine them carefully. More generally though, once the authors have outlined an analytical approach, it doesn't seem that difficult to commission forecasts to feed into their model.
As for suggestions, my main one would be to add more analytical clarity, and to be a bit obsessive about what analytical assumptions, key assumptions or otherwise, are going into the model. Maybe have a chart. If one does this, then the shape of what the paper produces is, I think:
- A conditional forecast
- of an upper bound of compute needed
- to reach transformative AI
The forecast is conditional on scaling laws continuing as they have, and on the various analytical assumptions not introducing too much error. And it's not a forecast of when models will be transformative, but of an upper bound, because as we mentioned at the beginning, indistinguishability is a sufficient but not a necessary condition for transformative AI. The authors point this out at the beginning, but I think this could be pointed out more obviously.
The text also generally needs an editor (e.g., use the first person plural, as there are two authors). As I was reading it, I felt the compulsion to rewrite it in better prose. But I didn't think that it was worth it for me to do that, or to point out style mistakes (besides my wish for greater clarity), because you can just hire an editor for this. And also, an alert reader should be able to extract the core of what you are saying even though the prose could be improved. I did write down some impressions as I was reading in a different document, though.
Overall I liked it, and would recommend that it be published. It's the kind of thing that, even if one thinks that it is not enough for a forecast on its own, seems like it would be a valuable input into other forecasts.
<p>
<section id='isso-thread'>
<noscript>Javascript needs to be activated to view comments.</noscript>
</section>
</p>

80
consulting/index.md Normal file

@ -0,0 +1,80 @@
Consulting
==========
This page presents my core competencies, my consulting rates, my description of my ideal client, two testimonials, and a few further thoughts. I can be reached at nuno.semperelh@protonmail.com.
### Core competencies
#### Researching:
Some past research outputs that I am proud of are:
- [Incentive Problems With Current Forecasting Competitions.](https://forum.effectivealtruism.org/posts/ztmBA8v6KvGChxw92/incentive-problems-with-current-forecasting-competitions)
- [Real-Life Examples of Prediction Systems Interfering with the Real World (Predict-O-Matic Problems)](https://www.lesswrong.com/posts/6bSjRezJDxR2omHKE/real-life-examples-of-prediction-systems-interfering-with)
- [A prior for technological discontinuities](https://www.lesswrong.com/posts/FaCqw2x59ZFhMXJr9/a-prior-for-technological-discontinuities)
- [Better scoring rules](https://github.com/SamotsvetyForecasting/optimal-scoring)
- [Prediction Markets in The Corporate Setting](https://forum.effectivealtruism.org/posts/dQhjwHA7LhfE8YpYF/prediction-markets-in-the-corporate-setting)
- [A concern about the “evolutionary anchor” of Ajeya Cotra's report on AI timelines.](https://forum.effectivealtruism.org/posts/FHTyixYNnGaQfEexH/a-concern-about-the-evolutionary-anchor-of-ajeya-cotra-s)
See [here](https://forum.effectivealtruism.org/users/nunosempere?sortedBy=top) for more past projects.
#### Red-teaming
Do you want my most disagreeable self to poke holes in your proposal or idea? I'm happy to do this.
#### Programming
Do you want me to build a website or tool? This is something I'd be excited to do more of. For a larger project, see [metaforecast](https://metaforecast.org/); for smaller projects, see [this EA Forum frontend](https://forum.nunosempere.com), [shapleyvalue.com](https://shapleyvalue.com/), this [tool to fit a beta distribution to a confidence interval](https://nunosempere.com/blog/2023/03/15/fit-beta/), or [this tool to compute proportional approval voting results for cases with more than one candidate](https://nunosempere.github.io/ea/ProportionalApprovalVoting.html). You might also want to check out [my GitHub](https://github.com/NunoSempere/).
#### Evaluation and estimation
Do you want me to evaluate a project, estimate its value, and suggest ways you could do better? I am very happy to do this. Some past examples are:
- [A Critical Review of Open Philanthropy's Bet On Criminal Justice Reform](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal)
- [Shallow evaluations of longtermist organizations](https://forum.effectivealtruism.org/posts/xmmqDdGqNZq5RELer/shallow-evaluations-of-longtermist-organizations)
- [2018-2019 Long-Term Future Fund Grantees: How did they do?](https://forum.effectivealtruism.org/posts/Ps8ecFPBzSrkLC6ip/2018-2019-long-term-future-fund-grantees-how-did-they-do)
- [External Evaluation of the EA Wiki](https://forum.effectivealtruism.org/posts/kTLR23dFRB5pJryvZ/external-evaluation-of-the-ea-wiki)
- [An experiment eliciting relative estimates for Open Philanthropy's 2018 AI safety grants](https://forum.effectivealtruism.org/posts/EPhDMkovGquHtFq3h/an-experiment-eliciting-relative-estimates-for-open)
For a smaller example, in the past I've really enjoyed doing subjective estimates of the value of different career pathways ([1](https://docs.google.com/spreadsheets/d/1QHBaCjf17C1VF_-su-7xHqz1UCwFrlPyCojiN1xzCi0/edit#gid=0), [2](https://docs.google.com/spreadsheets/d/1qvNGkpt9ztOfIYEPXAXJT62wiPQFG9CKm8yGlSHO1Yo/edit#gid=0)).
#### Forecasting
I am happy to host workshops, or advise on tournament or forecasting platform design. If you are looking for specific forecasts, you probably want to hire [Samotsvety](https://samotsvety.org/) instead, which can be reached at info@samotsvety.org.
### Rates
I value getting hired for more hours, because each engagement has some overhead cost. Therefore, I offer steep discounts when you buy a larger number of hours.
| # of hours | Cost | Example |
|------------|-------|------------------------------------------------------------------------------------------------|
| 1 hour | ~$200 | Talk to me for an hour about a project you want my input on, organize a forecasting workshop |
| 10 hours | ~$1.5k | Research that draws on my topics of expertise, where I have already thought about the topic, and just have to write it down. For example, [this Bayesian adjustment to Rethink Priorities](https://nunosempere.com/blog/2023/02/19/bayesian-adjustment-to-rethink-priorities-welfare-range-estimates/) |
| 100 hours | ~$10k | An [evaluation of an organization](https://forum.effectivealtruism.org/posts/kTLR23dFRB5pJryvZ/external-evaluation-of-the-ea-wiki), an early version of [metaforecast](https://metaforecast.org), two editions of the [forecasting newsletter](https://forecasting.substack.com/) |
| 1000 hours | reach out | Large research project, an ambitious report on a novel topic, the current iteration of [metaforecast](https://metaforecast.org) |
### Description of client
My ideal client would be a person or organization that is producing value in the world, and that wants me to identify ways they could do even better in an open-ended way. Because this context would be assumed to be highly collaborative, they would have a high tolerance for disagreeableness. A close second would be someone who is making an important decision commissioning me to estimate the value of the different options they are considering.
My anti-client on the other hand would be someone who has already made up their mind, and who wants me to rubber-stamp their decisions, like a grifty crypto project[^1] asking for help writing up a grant to an EA grantmaker, or someone commissioning an evaluation that they will ignore, or a forecast that doesn't feed into any decisions.
### Testimonials
> I reached out to Nuño in mid-2021 because I was impressed by his "Shallow evaluation of longtermist organizations", and wanted him to conduct an evaluation of the EA Wiki, of which I was at the time the editor. I was very pleased with the rigour and thoroughness of the analysis he produced, and would recommend his services as a project evaluator and forecaster unreservedly. Indeed, I can think of very few other people in this area whom I would endorse as enthusiastically as I do Nuño.
>
> &mdash;Pablo Stafforini
<br>
> Nuño has depth of knowledge and a track record to prove it. His work spans AI, nuclear risk and a myriad other topics, which I have consistently found insightful. A generalist of the highest calibre.
> We hired Nuño to review a complex article about AI and forecasting. He delivered promptly, and we found his work insightful.
>
> &mdash;Jaime Sevilla
### Further details
- I am very amenable to taking on projects that would require more than one person, because I am able to bring in collaborators. I would discuss this with the potential client beforehand.
- Operationally, payouts may go either to the [Quantified Uncertainty Research Institute](https://quantifieduncertainty.org/) (a 501(c)(3) charity in the US), or to myself directly, to be decided.
[^1]: Note that I don't think all crypto projects are grifty, and I in fact view crypto as a pretty promising area of innovation. It's just that for the last couple of years, if you wanted to grift, crypto was a good place to do so. And in fact a crypto project that wanted to figure out how to produce more value in the world and capture some fraction of it could be a great client.