Merge pull request #912 from quantified-uncertainty/doc-refactor-july2022-2

Next documentation update, july 2022
This commit is contained in:
Ozzie Gooen 2022-07-29 10:38:03 -07:00 committed by GitHub
commit f96f8de92a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
9 changed files with 104 additions and 44 deletions

View File

@ -73,6 +73,14 @@ map: (list<'a>, a => b) => list<'b>
See [Rescript implementation](https://rescript-lang.org/docs/manual/latest/api/belt/array#map). See [Rescript implementation](https://rescript-lang.org/docs/manual/latest/api/belt/array#map).
### filter
```
filter: (list<'a>, 'a => bool) => list<'a>
```
See [Rescript implementation of keep](https://rescript-lang.org/docs/manual/latest/api/belt/array#keep), which is functionally equivalent.
### reduce ### reduce
``` ```

View File

@ -41,14 +41,14 @@ This interface should also be able to handle changing Squiggle values. This is b
**Importance & quality scores** **Importance & quality scores**
Workflows/functionality to declare the importance and coveredness of each part of the paramater space. For example, some subsets of the paramater space of a function might be much more important to get right than others. Similarly, the analyst might be much more certain about some parts than others. Ideally. they could decline sections. Workflows/functionality to declare the importance and coveredness of each part of the paramater space. For example, some subsets of the paramater space of a function might be much more important to get right than others. Similarly, the analyst might be much more certain about some parts than others. Ideally. they could decline sections.
**Static / Sensitivity Analysis** **Static / sensitivity analysis**
Guesstimate has Sensitivity analysis that's pretty useful. This could be quite feasible to add, though it will likely require some thinking. Guesstimate has Sensitivity analysis that's pretty useful. This could be quite feasible to add, though it will likely require some thinking.
**Annotation** **Annotation**
It might be useful to allow people to annotate functions and variables with longer descriptions, maybe Markdown. This could very much help interpretation/analysis of these items. It might be useful to allow people to annotate functions and variables with longer descriptions, maybe Markdown. This could very much help interpretation/analysis of these items.
**Randomness Seeds** **Randomness seeds**
Right now, Monte Carlo simulations are totally random. It would be nicer to be able to enter a seed somehow in order to control the randomness. Or, with the same seed, the function should always return the same values. This would make debugging and similar easier. Right now, Monte Carlo simulations are totally random. It would be nicer to be able to enter a seed somehow in order to control the randomness. Or, with the same seed, the function should always return the same values. This would make debugging and similar easier.
**Caching/Memoization** **Caching/memoization**
There are many performance improvements that Squiggle could have. We'll get to some of them eventually. There are many performance improvements that Squiggle could have. We'll get to some of them eventually.

View File

@ -3,7 +3,8 @@ sidebar_position: 2
title: Gallery title: Gallery
--- ---
- [Adjusting probabilities for the passage of time](https://www.lesswrong.com/s/rDe8QE5NvXcZYzgZ3/p/j8o6sgRerE3tqNWdj) by Nuño Sempere
- [GiveWell's GiveDirectly cost effectiveness analysis](https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis) by Sam Nolan - [GiveWell's GiveDirectly cost effectiveness analysis](https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis) by Sam Nolan
- [A Critical Review of Open Philanthropys Bet On Criminal Justice Reform](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal) by Nuño Sempere
- [Samotsvety Nuclear Risk Forecasts — March 2022](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022) by Nuño Sempere, Misha Yagudin, Eli Lifland
- [Adjusting probabilities for the passage of time](https://www.lesswrong.com/s/rDe8QE5NvXcZYzgZ3/p/j8o6sgRerE3tqNWdj) by Nuño Sempere
- [List of QURI Squiggle Models](https://github.com/quantified-uncertainty/squiggle-models) by Nuño Sempere, Sam Nolan, and Ozzie Gooen - [List of QURI Squiggle Models](https://github.com/quantified-uncertainty/squiggle-models) by Nuño Sempere, Sam Nolan, and Ozzie Gooen
- [Astronomical Waste](https://observablehq.com/@quinn-dougherty/waste)

View File

@ -5,17 +5,17 @@ author: Ozzie Gooen
date: 02-19-2022 date: 02-19-2022
--- ---
Probability distributions have several subtle possible formats. Three important ones that we deal with in Squiggle are symbolic, sample set, and graph formats. Probability distributions have several subtle possible formats. Three important ones that we deal with in Squiggle are symbolic, sample set, and point set formats.
_Symbolic_ formats are just the math equations. `normal(5,3)` is the symbolic representation of a normal distribution. _Symbolic_ formats are just the math equations. `normal(5,3)` is the symbolic representation of a normal distribution.
When you sample distributions (usually starting with symbolic formats), you get lists of samples. Monte Carlo techniques return lists of samples. Lets call this the “_Sample Set_” format. When you sample distributions (usually starting with symbolic formats), you get lists of samples. Monte Carlo techniques return lists of samples. Lets call this the “_Sample Set_” format.
Lastly is what Ill refer to as the _Graph_ format. It describes the coordinates, or the shape, of the distribution. You can save these formats in JSON, for instance, like, `{xs: [1, 2, 3, 4, …], ys: [.0001, .0003, .002, …]}`. Lastly is what Ill refer to as the _Point Set_ format. It describes the coordinates, or the shape, of the distribution. You can save these formats in JSON, for instance, like, `{xs: [1, 2, 3, 4, …], ys: [.0001, .0003, .002, …]}`.
Symbolic, Sample Set, and Graph formats all have very different advantages and disadvantages. Symbolic, Sample Set, and Point Set formats all have very different advantages and disadvantages.
Note that the name "Symbolic" is fairly standard, but I haven't found common names for what I'm referring to as "Sample Set" and "Graph" formats. The formats aren't often specifically referred to for these purposes, from what I can tell. Note that the name "Symbolic" is fairly standard, but I haven't found common names for what I'm referring to as "Sample Set" and "Point Set" formats. The formats aren't often specifically referred to for these purposes, from what I can tell.
## Symbolic Formats ## Symbolic Formats
@ -40,7 +40,7 @@ To perform calculations of symbolic systems, you need to find analytical solutio
- Its often either impossible or computationally infeasible to find analytical solutions to most symbolic equations. - Its often either impossible or computationally infeasible to find analytical solutions to most symbolic equations.
- Solving symbolic equations requires very specialized tooling thats very rare. There are a few small symbolic solver libraries out there, but not many. Wolfram Research is the main group that seems very strong here, and their work is mostly closed source + expensive. - Solving symbolic equations requires very specialized tooling thats very rare. There are a few small symbolic solver libraries out there, but not many. Wolfram Research is the main group that seems very strong here, and their work is mostly closed source + expensive.
**Converting to Graph Formats** **Converting to Point Set Formats**
- Very easy. Choose X points such that you capture most of the distribution (you can set a threshold, like 99.9%). For each X point, calculate the pdf, and save as the Y points. - Very easy. Choose X points such that you capture most of the distribution (you can set a threshold, like 99.9%). For each X point, calculate the pdf, and save as the Y points.
@ -49,23 +49,23 @@ To perform calculations of symbolic systems, you need to find analytical solutio
- Very easy. Just sample a bunch of times. The regular way is to randomly sample (This is trivial to do for all distributions with inverse-cdf functions.) If you want to get more fancy, you could provide extra samples from the tails, that would be weighted lower. Or, you could take samples in equal distances (of probability mass) along the entire distribution, then optionally shuffle it. (In the latter case, these would not be random samples, but sometimes thats fine.) - Very easy. Just sample a bunch of times. The regular way is to randomly sample (This is trivial to do for all distributions with inverse-cdf functions.) If you want to get more fancy, you could provide extra samples from the tails, that would be weighted lower. Or, you could take samples in equal distances (of probability mass) along the entire distribution, then optionally shuffle it. (In the latter case, these would not be random samples, but sometimes thats fine.)
**How to Visualize** **How to Visualize**
Convert to graph, then display that. (Optionally, you can also convert to samples, then display those using a histogram, but this is often worse you have both options.) Convert to point set, then display that. (Optionally, you can also convert to samples, then display those using a histogram, but this is often worse you have both options.)
**Bonus: The Metalog Distribution** **Bonus: The Metalog Distribution**
The Metalog distribution seems like it can represent almost any reasonable distribution. Its symbolic. This is great for storage, but its not clear if it helps with calculation. My impression is that we dont have symbolic ways of doing most functions (addition, multiplication, etc) on metalog distributions. Also, note that it can take a fair bit of computation to fit a shape to the Metalog distribution. The Metalog distribution seems like it can represent almost any reasonable distribution. Its symbolic. This is great for storage, but its not clear if it helps with calculation. My impression is that we dont have symbolic ways of doing most functions (addition, multiplication, etc) on metalog distributions. Also, note that it can take a fair bit of computation to fit a shape to the Metalog distribution.
## Graph Formats ## Point Set Formats
**TL;DR** **TL;DR**
Lists of the x-y coordinates of the shape of a distribution. (Usually the pdf, which is more compressed than the cdf). Some key functions (like pdf, cdf) and manipulations can work on almost any graphically-described distribution. Lists of the x-y coordinates of the shape of a distribution. (Usually the pdf, which is more compressed than the cdf). Some key functions (like pdf, cdf) and manipulations can work on almost any point set distribution.
**Alternative Names:** **Alternative Names:**
Grid, Mesh, Graph, Vector, Pdf, PdfCoords/PdfPoints, Discretised, Bezier, Curve Grid, Mesh, Graph, Vector, Pdf, PdfCoords/PdfPoints, Discretised, Bezier, Curve
See [this facebook thread](https://www.facebook.com/ozzie.gooen/posts/10165936265785363?notif_id=1644937423623638&notif_t=feedback_reaction_generic&ref=notif). See [this facebook thread](https://www.facebook.com/ozzie.gooen/posts/10165936265785363?notif_id=1644937423623638&notif_t=feedback_reaction_generic&ref=notif).
**How to Do Computation** **How to Do Computation**
Use graph techniques. These can be fairly computationally-intensive (particularly finding integrals, which take a whole lot of adding). In the case that you want to multiply independent distributions, you can try convolution, but its pretty expensive. Use point set techniques. These can be fairly computationally-intensive (particularly finding integrals, which take a whole lot of adding). In the case that you want to multiply independent distributions, you can try convolution, but its pretty expensive.
**Examples** **Examples**
`{xs: [1, 2, 3, 4…], ys: [.0001, .0003, .002, .04, ...]} ` `{xs: [1, 2, 3, 4…], ys: [.0001, .0003, .002, .04, ...]} `
@ -74,18 +74,18 @@ Use graph techniques. These can be fairly computationally-intensive (particularl
**Advantages** **Advantages**
- Much more compressed than Sample List formats, but much less compressed than Symbolic formats. - Much more compressed than Sample List formats, but much less compressed than Symbolic formats.
- Many functions (pdf, cdf, percentiles, mean, integration, etc) and manipulations (truncation, scaling horizontally or vertically), are possible on essentially all graph distributions. - Many functions (pdf, cdf, percentiles, mean, integration, etc) and manipulations (truncation, scaling horizontally or vertically), are possible on essentially all point set distributions.
**Disadvantages** **Disadvantages**
- Most calculations are infeasible/impossible to perform graphically. In these cases, you need to use sampling. - Most calculations are infeasible/impossible to perform using point sets formats. In these cases, you need to use sampling.
- Not as accurate or fast as symbolic methods, where the symbolic methods are applicable. - Not as accurate or fast as symbolic methods, where the symbolic methods are applicable.
- The tails get cut off, which is subideal. Its assumed that the value of the pdf outside of the bounded range is exactly 0, which is not correct. (Note: If you have ideas on how to store graph formats that dont cut off tails, let me know) - The tails get cut off, which is subideal. Its assumed that the value of the pdf outside of the bounded range is exactly 0, which is not correct. (Note: If you have ideas on how to store point set formats that dont cut off tails, let me know)
**Converting to Symbolic Formats** **Converting to Symbolic Formats**
- Okay, if you are okay with a Metalog approximation or similar. Metaculus uses an additive combination of up to [Logistic distributions](https://www.metaculus.com/help/faq/); you could also fit this. Fitting takes a little time (it requires several attempts and some optimization), can be arbitrarily accurate. - Okay, if you are okay with a Metalog approximation or similar. Metaculus uses an additive combination of up to [Logistic distributions](https://www.metaculus.com/help/faq/); you could also fit this. Fitting takes a little time (it requires several attempts and some optimization), can be arbitrarily accurate.
- If you want to be very fancy, you could try to fit graph distributions into normal / lognormal / etc. but this seems like a lot of work for little gain. - If you want to be very fancy, you could try to fit point set distributions into normal / lognormal / etc. but this seems like a lot of work for little gain.
**Converting to Sample List Formats** **Converting to Sample List Formats**

View File

@ -0,0 +1,41 @@
---
title: Gotchas
sidebar_position: 8
---
import { SquiggleEditor } from "../../src/components/SquiggleEditor";
import Admonition from "@theme/Admonition";
## Point Set Distributions Conversions
Point Set conversions are done with [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation), which is lossy. This might be particularly noticeable in cases where distributions should be entirely above zero.
In this example, we see that the median of this (highly skewed) distribution is positive when it's in a Sample Set format, but negative when it's converted to a Point Set format.
<SquiggleEditor
defaultCode={`dist = SampleSet.fromDist(5 to 100000000)
{
sampleSetMedian: quantile(dist, .5),
pointSetMedian: quantile(PointSet.fromDist(dist), .5),
dist: dist
}`}
/>
---
This can be particularly confusing for visualizations. Visualizations automatically convert distributions into Point Set formats. Therefore, they might often show negative values, even if the underlying distribution is fully positive.
We plan to later support more configuration of kernel density estimation, and for visualiations of Sample Set distributions to instead use histograms.
## Sample Set Correlations
Correlations with Sample Set distributions are a bit complicated. Monte Carlo generations with Squiggle are ordered. The first sample in one Sample Set distribution will correspond to the first sample in a distribution that comes from a resulting Monte Carlo generation. Therefore, Sample Set distributions in a chain of Monte Carlo generations are likely to all be correlated with each other. This connection breaks if any node changes to the Point Set or Symbolic format.
In this example, we subtract all three types of distributions by themselves. Notice that the Sample Set distribution returns 1. The other two return the result of subtracting one normal distribution from a separate uncorrelated distribution. These results are clearly very different to each other.
<SquiggleEditor
defaultCode={`sampleSetDist = normal(5,2) |> SampleSet.fromDist
sampleSetDistToPointSet = sampleSetDist |> PointSet.fromDist
symbolicDist = normal(5,2)
[sampleSetDist-sampleSetDist, sampleSetDistToPointSet-sampleSetDistToPointSet, symbolicDist-symbolicDist]`}
/>

View File

@ -0,0 +1,34 @@
---
sidebar_position: 4
title: "Integrations"
---
## Node Packages
There are two JavaScript packages currently available for Squiggle:
- [`@quri/squiggle-lang`](https://www.npmjs.com/package/@quri/squiggle-lang)
- [`@quri/squiggle-components`](https://www.npmjs.com/package/@quri/squiggle-components)
Types are available for both packages.
## [Squiggle Language](https://www.npmjs.com/package/@quri/squiggle-lang) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-lang.svg)
[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/squiggle-lang#use-the-npm-package)
## [Squiggle Components](https://www.npmjs.com/package/@quri/squiggle-components) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-components.svg)
[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/components#usage-in-a-react-project)
This documentation uses `@quri/squiggle-components` frequently.
We host [a storybook](https://squiggle-components.netlify.app/) with details
and usage of each of the components made available.
## [Visual Studio Code Extension](https://marketplace.visualstudio.com/items?itemName=QURI.vscode-squiggle) ![npm version](https://vsmarketplacebadge.apphb.com/version/QURI.vscode-squiggle.svg)
This extention allows you to run and visualize Squiggle code.
## [Observable Library](https://observablehq.com/@hazelfire/squiggle)
An exportable [Observable Notebook](https://observablehq.com/@hazelfire/squiggle) of the key components that you can directly import and use in Observable notebooks.

View File

@ -1,24 +0,0 @@
---
sidebar_position: 4
title: Node Packages
---
There are two JavaScript packages currently available for Squiggle:
- [`@quri/squiggle-lang`](https://www.npmjs.com/package/@quri/squiggle-lang) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-lang.svg)
- [`@quri/squiggle-components`](https://www.npmjs.com/package/@quri/squiggle-components) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-components.svg)
Types are available for both packages.
## Squiggle Language
[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/squiggle-lang#use-the-npm-package)
## Squiggle Components
[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/components#usage-in-a-react-project)
This documentation uses `@quri/squiggle-components` frequently.
We host [a storybook](https://squiggle-components.netlify.app/) with details
and usage of each of the components made available.

View File

@ -28,8 +28,8 @@ const sidebars = {
}, },
{ {
type: "doc", type: "doc",
id: "Node-Packages", id: "Integrations",
label: "Node Packages", label: "Integrations",
}, },
{ {
type: "category", type: "category",