More small modifications

2022-07-29 08:07:11 -07:00 · 2022-07-29 08:07:11 -07:00 · 9096ee7051
commit 9096ee7051
parent e13fe6277d
9 changed files with 97 additions and 47 deletions
--- a/packages/website/docs/Api/DistSampleSet.mdx
+++ b/packages/website/docs/Api/DistSampleSet.mdx
@@ -3,6 +3,9 @@ sidebar_position: 5
 title: Sample Set Distribution
 ---

+import { SquiggleEditor } from "../../src/components/SquiggleEditor";
+import Admonition from "@theme/Admonition";
+
 Sample set distributions are one of the three distribution formats. Internally, they are stored as a list of numbers. It's useful to distinguish sample set distributions from arbitrary lists of numbers to make it clear which functions are applicable.

 Monte Carlo calculations typically result in sample set distributions.
--- a/packages/website/docs/Api/List.md
+++ b/packages/website/docs/Api/List.md
@@ -73,6 +73,13 @@ map: (list<'a>, a => b) => list<'b>

 See [Rescript implementation](https://rescript-lang.org/docs/manual/latest/api/belt/array#map).

+### filter
+
+```
+filter: (list<'a>, 'a => bool) => list<'a>
+```
+See [Rescript implementation of keep](https://rescript-lang.org/docs/manual/latest/api/belt/array#keep), which is functionally equivalent.
+
 ### reduce

 ```
@@ -97,4 +104,4 @@ reduceReverse: (list<'b>, 'a, ('a, 'b) => 'a) => 'a

 Works like `reduce`, but the function is applied to each item from the last back to the first.

-See [Rescript implementation](https://rescript-lang.org/docs/manual/latest/api/belt/array#reducereverse).
+See [Rescript implementation](https://rescript-lang.org/docs/manual/latest/api/belt/array#reducereverse).
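
Note on the `filter` entry added to `List.md` above: as a rough illustration of the signature `(list<'a>, 'a => bool) => list<'a>`, the following Python sketch (not Squiggle or ReScript code) keeps the elements for which a predicate returns true and preserves their order. The name `filter_list` is hypothetical and exists only for this example.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")


def filter_list(items: List[A], predicate: Callable[[A], bool]) -> List[A]:
    """Return the elements of `items` for which `predicate` is true, in order."""
    return [item for item in items if predicate(item)]


# Keep only the even numbers.
evens = filter_list([1, 2, 3, 4, 5, 6], lambda x: x % 2 == 0)
print(evens)
```

The element type is unchanged by the operation, which is why the signature maps `list<'a>` to `list<'a>`, unlike `map`, which may change the element type.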
--- a/packages/website/docs/Discussions/Future-Features.md
+++ b/packages/website/docs/Discussions/Future-Features.md
@@ -41,14 +41,14 @@ This interface should also be able to handle changing Squiggle values. This is b
 **Importance & quality scores**  
 Workflows/functionality to declare the importance and coveredness of each part of the parameter space. For example, some subsets of the parameter space of a function might be much more important to get right than others. Similarly, the analyst might be much more certain about some parts than others. Ideally, they could decline sections.

-**Static / Sensitivity Analysis**  
+**Static / sensitivity analysis**  
 Guesstimate has Sensitivity analysis that's pretty useful. This could be quite feasible to add, though it will likely require some thinking.

 **Annotation**  
 It might be useful to allow people to annotate functions and variables with longer descriptions, maybe Markdown. This could very much help interpretation/analysis of these items.

-**Randomness Seeds**  
+**Randomness seeds**  
 Right now, Monte Carlo simulations are totally random. It would be nicer to be able to enter a seed somehow in order to control the randomness. Or, with the same seed, the function should always return the same values. This would make debugging and similar easier.

-**Caching/Memoization**  
+**Caching/memoization**  
 There are many performance improvements that Squiggle could have. We'll get to some of them eventually.
--- a/packages/website/docs/Discussions/Gallery.md
+++ b/packages/website/docs/Discussions/Gallery.md
@@ -3,7 +3,8 @@ sidebar_position: 2
 title: Gallery
 ---

-- [Adjusting probabilities for the passage of time](https://www.lesswrong.com/s/rDe8QE5NvXcZYzgZ3/p/j8o6sgRerE3tqNWdj) by Nuño Sempere
-- [GiveWell's GiveDirectly cost effectiveness analysis](https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis) by Sam Nolan
-- [List of QURI Squiggle Models](https://github.com/quantified-uncertainty/squiggle-models) by Nuño Sempere, Sam Nolan, and Ozzie Gooen
-- [Astronomical Waste](https://observablehq.com/@quinn-dougherty/waste)
+* [GiveWell's GiveDirectly cost effectiveness analysis](https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis) by Sam Nolan
+* [A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform](https://forum.effectivealtruism.org/posts/h2N9qEbvQ6RHABcae/a-critical-review-of-open-philanthropy-s-bet-on-criminal) by Nuño Sempere
+* [Samotsvety Nuclear Risk Forecasts — March 2022](https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022) by Nuño Sempere, Misha Yagudin, Eli Lifland
+* [Adjusting probabilities for the passage of time](https://www.lesswrong.com/s/rDe8QE5NvXcZYzgZ3/p/j8o6sgRerE3tqNWdj) by Nuño Sempere
+* [List of QURI Squiggle Models](https://github.com/quantified-uncertainty/squiggle-models) by Nuño Sempere, Sam Nolan, and Ozzie Gooen
--- a/packages/website/docs/Discussions/Three-Formats-Of-Distributions.md
+++ b/packages/website/docs/Discussions/Three-Formats-Of-Distributions.md
@@ -5,17 +5,17 @@ author: Ozzie Gooen
 date: 02-19-2022
 ---

-Probability distributions have several subtle possible formats. Three important ones that we deal with in Squiggle are symbolic, sample set, and graph formats.
+Probability distributions have several subtle possible formats. Three important ones that we deal with in Squiggle are symbolic, sample set, and point set formats.

 _Symbolic_ formats are just the math equations. `normal(5,3)` is the symbolic representation of a normal distribution.

 When you sample distributions (usually starting with symbolic formats), you get lists of samples. Monte Carlo techniques return lists of samples. Let’s call this the “_Sample Set_” format.

-Lastly is what I’ll refer to as the _Graph_ format. It describes the coordinates, or the shape, of the distribution. You can save these formats in JSON, for instance, like, `{xs: [1, 2, 3, 4, …], ys: [.0001, .0003, .002, …]}`.
+Lastly is what I’ll refer to as the _Point Set_ format. It describes the coordinates, or the shape, of the distribution. You can save these formats in JSON, for instance, like, `{xs: [1, 2, 3, 4, …], ys: [.0001, .0003, .002, …]}`.

-Symbolic, Sample Set, and Graph formats all have very different advantages and disadvantages.
+Symbolic, Sample Set, and Point Set formats all have very different advantages and disadvantages.

-Note that the name "Symbolic" is fairly standard, but I haven't found common names for what I'm referring to as "Sample Set" and "Graph" formats. The formats aren't often specifically referred to for these purposes, from what I can tell.
+Note that the name "Symbolic" is fairly standard, but I haven't found common names for what I'm referring to as "Sample Set" and "Point Set" formats. The formats aren't often specifically referred to for these purposes, from what I can tell.

 ## Symbolic Formats

@@ -40,7 +40,7 @@ To perform calculations of symbolic systems, you need to find analytical solutio
 - It’s often either impossible or computationally infeasible to find analytical solutions to most symbolic equations.
 - Solving symbolic equations requires very specialized tooling that’s very rare. There are a few small symbolic solver libraries out there, but not many. Wolfram Research is the main group that seems very strong here, and their work is mostly closed source + expensive.

-**Converting to Graph Formats**
+**Converting to Point Set Formats**

 - Very easy. Choose X points such that you capture most of the distribution (you can set a threshold, like 99.9%). For each X point, calculate the pdf, and save as the Y points.
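
Note on the conversion described above: a minimal Python sketch (not Squiggle's implementation) of turning a symbolic distribution into x-y coordinates. It assumes a normal distribution via the standard library's `statistics.NormalDist`, evenly spaced x points, and the 99.9% coverage threshold mentioned in the text. The names `to_point_set` and `trapezoid_area` are hypothetical.

```python
from statistics import NormalDist


def to_point_set(dist: NormalDist, n_points: int = 1000, coverage: float = 0.999) -> dict:
    """Discretize `dist` into evenly spaced xs and the pdf values at those xs."""
    tail = (1.0 - coverage) / 2.0
    lo = dist.inv_cdf(tail)          # left edge: cuts off tail/2 of the mass
    hi = dist.inv_cdf(1.0 - tail)    # right edge: cuts off the same on the right
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    ys = [dist.pdf(x) for x in xs]
    return {"xs": xs, "ys": ys}


def trapezoid_area(point_set: dict) -> float:
    """Approximate the integral of the stored pdf with the trapezoid rule."""
    xs, ys = point_set["xs"], point_set["ys"]
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


point_set = to_point_set(NormalDist(mu=5, sigma=3))
print(len(point_set["xs"]), trapezoid_area(point_set))
```

The area under the stored points comes out close to the chosen coverage rather than 1, which is the tail cut-off that the Point Set disadvantages list refers to.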

@@ -49,23 +49,23 @@ To perform calculations of symbolic systems, you need to find analytical solutio
 - Very easy. Just sample a bunch of times. The regular way is to randomly sample (This is trivial to do for all distributions with inverse-cdf functions.) If you want to get more fancy, you could provide extra samples from the tails, that would be weighted lower. Or, you could take samples in equal distances (of probability mass) along the entire distribution, then optionally shuffle it. (In the latter case, these would not be random samples, but sometimes that’s fine.)

 **How to Visualize**  
-Convert to graph, then display that. (Optionally, you can also convert to samples, then display those using a histogram, but this is often worse you have both options.)
+Convert to point set, then display that. (Optionally, you can also convert to samples, then display those using a histogram, but this is often worse when you have both options.)

 **Bonus: The Metalog Distribution**

 The Metalog distribution seems like it can represent almost any reasonable distribution. It’s symbolic. This is great for storage, but it’s not clear if it helps with calculation. My impression is that we don’t have symbolic ways of doing most functions (addition, multiplication, etc) on metalog distributions. Also, note that it can take a fair bit of computation to fit a shape to the Metalog distribution.

-## Graph Formats
+## Point Set Formats

 **TL;DR**  
-Lists of the x-y coordinates of the shape of a distribution. (Usually the pdf, which is more compressed than the cdf). Some key functions (like pdf, cdf) and manipulations can work on almost any graphically-described distribution.
+Lists of the x-y coordinates of the shape of a distribution. (Usually the pdf, which is more compressed than the cdf). Some key functions (like pdf, cdf) and manipulations can work on almost any point set distribution.

 **Alternative Names:**  
 Grid, Mesh, Graph, Vector, Pdf, PdfCoords/PdfPoints, Discretised, Bezier, Curve  
 See [this facebook thread](https://www.facebook.com/ozzie.gooen/posts/10165936265785363?notif_id=1644937423623638&notif_t=feedback_reaction_generic&ref=notif).

 **How to Do Computation**  
-Use graph techniques. These can be fairly computationally-intensive (particularly finding integrals, which take a whole lot of adding). In the case that you want to multiply independent distributions, you can try convolution, but it’s pretty expensive.
+Use point set techniques. These can be fairly computationally-intensive (particularly finding integrals, which take a whole lot of adding). In the case that you want to multiply independent distributions, you can try convolution, but it’s pretty expensive.

 **Examples**  
 `{xs: [1, 2, 3, 4…], ys: [.0001, .0003, .002, .04, ...]} `  
@@ -74,18 +74,18 @@ Use graph techniques. These can be fairly computationally-intensive (particularl
 **Advantages**

 - Much more compressed than Sample List formats, but much less compressed than Symbolic formats.
-- Many functions (pdf, cdf, percentiles, mean, integration, etc) and manipulations (truncation, scaling horizontally or vertically), are possible on essentially all graph distributions.
+- Many functions (pdf, cdf, percentiles, mean, integration, etc) and manipulations (truncation, scaling horizontally or vertically), are possible on essentially all point set distributions.

 **Disadvantages**

-- Most calculations are infeasible/impossible to perform graphically. In these cases, you need to use sampling.
+- Most calculations are infeasible/impossible to perform using point set formats. In these cases, you need to use sampling.
 - Not as accurate or fast as symbolic methods, where the symbolic methods are applicable.
-- The tails get cut off, which is subideal. It’s assumed that the value of the pdf outside of the bounded range is exactly 0, which is not correct. (Note: If you have ideas on how to store graph formats that don’t cut off tails, let me know)
+- The tails get cut off, which is subideal. It’s assumed that the value of the pdf outside of the bounded range is exactly 0, which is not correct. (Note: If you have ideas on how to store point set formats that don’t cut off tails, let me know)

 **Converting to Symbolic Formats**

 - Okay, if you are okay with a Metalog approximation or similar. Metaculus uses an additive combination of up to [Logistic distributions](https://www.metaculus.com/help/faq/); you could also fit this. Fitting takes a little time (it requires several attempts and some optimization), can be arbitrarily accurate.
-- If you want to be very fancy, you could try to fit graph distributions into normal / lognormal / etc. but this seems like a lot of work for little gain.
+- If you want to be very fancy, you could try to fit point set distributions into normal / lognormal / etc. but this seems like a lot of work for little gain.

 **Converting to Sample List Formats**

--- a/packages/website/docs/Guides/Gotchas.mdx
+++ b/packages/website/docs/Guides/Gotchas.mdx
@@ -0,0 +1,34 @@
+---
+title: Gotchas
+sidebar_position: 8
+---
+
+import { SquiggleEditor } from "../../src/components/SquiggleEditor";
+import Admonition from "@theme/Admonition";
+
+## Point Set Distribution Conversions
+Point Set conversions are done with [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation), which is lossy. This might be particularly noticeable in cases where distributions should be entirely above zero.
+
+In this example, we see that the median of this (highly skewed) distribution is positive when it's in a Sample Set format, but negative when it's converted to a Point Set format.
+
+<SquiggleEditor defaultCode={`dist = SampleSet.fromDist(5 to 100000000)
+{
+    sampleSetMedian: quantile(dist, .5),
+    pointSetMedian: quantile(PointSet.fromDist(dist), .5),
+    dist: dist
+}`} />
+
+---
+This can be particularly confusing for visualizations. Visualizations automatically convert distributions into Point Set formats. Therefore, they might often show negative values, even if the underlying distribution is fully positive.
+
+We plan to later support more configuration of kernel density estimation, and for visualizations of Sample Set distributions to instead use histograms.
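
Note on the kernel density estimation gotcha above: a rough Python sketch (not Squiggle's implementation) of why a Gaussian KDE over strictly positive samples still places probability mass below zero. It assumes that `5 to 100000000` denotes a lognormal whose 5th and 95th percentiles are the two bounds, and it uses an IQR-based rule-of-thumb bandwidth; both are assumptions made for illustration, and Squiggle's actual bandwidth choice may differ. The function names are hypothetical.

```python
import math
from statistics import NormalDist, quantiles


def lognormal_samples_from_90ci(low: float, high: float, n: int) -> list:
    """Deterministic samples, evenly spaced in probability, from a lognormal
    whose 5th and 95th percentiles are `low` and `high` (assumed semantics)."""
    z95 = NormalDist().inv_cdf(0.95)
    mu = (math.log(low) + math.log(high)) / 2.0
    sigma = (math.log(high) - math.log(low)) / (2.0 * z95)
    underlying = NormalDist(mu, sigma)
    return [math.exp(underlying.inv_cdf((i + 0.5) / n)) for i in range(n)]


def kde_mass_below_zero(samples: list) -> float:
    """Probability mass that a Gaussian KDE of `samples` assigns to x < 0."""
    q1, _, q3 = quantiles(samples, n=4)
    # Rule-of-thumb bandwidth based on the interquartile range (an assumption).
    bandwidth = 0.9 * (q3 - q1) / 1.34 * len(samples) ** -0.2
    std_normal = NormalDist()
    # Each sample contributes a Gaussian bump; sum the part of each bump below 0.
    return sum(std_normal.cdf(-x / bandwidth) for x in samples) / len(samples)


samples = lognormal_samples_from_90ci(5, 100_000_000, 1000)
print(min(samples) > 0)               # every sample is positive
print(kde_mass_below_zero(samples))   # yet the smoothed density leaks below zero
```

Because the distribution is so skewed, the bandwidth is much larger than most of the samples, so the bumps centred on small positive values spill well into the negatives.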
+
+## Sample Set Correlations
+Correlations with Sample Set distributions are a bit complicated. Monte Carlo generations with Squiggle are ordered. The first sample in one Sample Set distribution will correspond to the first sample in a distribution that comes from a resulting Monte Carlo generation. Therefore, Sample Set distributions in a chain of Monte Carlo generations are likely to all be correlated with each other. This connection breaks if any node changes to the Point Set or Symbolic format.
+
+In this example, we subtract each of the three types of distributions from itself. Notice that the Sample Set distribution returns 0. The other two return the result of subtracting one normal distribution from a separate uncorrelated distribution. These results are clearly very different from each other.
+
+<SquiggleEditor defaultCode={`sampleSetDist = normal(5,2) |> SampleSet.fromDist
+sampleSetDistToPointSet = sampleSetDist |> PointSet.fromDist
+symbolicDist = normal(5,2)
+[sampleSetDist-sampleSetDist, sampleSetDistToPointSet-sampleSetDistToPointSet, symbolicDist-symbolicDist]`} />
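
Note on the Sample Set correlation gotcha above: a small Python sketch (not Squiggle code) of index-aligned sample arithmetic. It assumes that operations on sample sets pair the i-th sample of one operand with the i-th sample of the other, as the text describes. Subtracting a sample list from itself then gives zero everywhere, while subtracting an independently drawn list gives a spread-out result.

```python
import random
from statistics import pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10_000

# One sample set drawn from normal(5, 2).
sample_set = [rng.gauss(5, 2) for _ in range(n)]

# Same sample set on both sides: samples line up by index, so each difference is 0.
self_difference = [a - b for a, b in zip(sample_set, sample_set)]

# A separately drawn, uncorrelated sample set from the same distribution.
independent_set = [rng.gauss(5, 2) for _ in range(n)]
independent_difference = [a - b for a, b in zip(sample_set, independent_set)]

print(pstdev(self_difference))         # no spread at all
print(pstdev(independent_difference))  # roughly 2 * sqrt(2) for two independent normal(5, 2)
```

The second case corresponds to what the Point Set and Symbolic formats do in the example: once the index alignment is lost, the two operands are treated as unrelated distributions.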
--- a/packages/website/docs/Integrations.md
+++ b/packages/website/docs/Integrations.md
@@ -0,0 +1,29 @@
+---
+sidebar_position: 4
+title: "Integrations"
+---
+
+## Node Packages
+There are two JavaScript packages currently available for Squiggle:
+
+- [`@quri/squiggle-lang`](https://www.npmjs.com/package/@quri/squiggle-lang)
+- [`@quri/squiggle-components`](https://www.npmjs.com/package/@quri/squiggle-components)
+
+Types are available for both packages.
+
+## [Squiggle Language](https://www.npmjs.com/package/@quri/squiggle-lang) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-lang.svg)   
+[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/squiggle-lang#use-the-npm-package)
+
+## [Squiggle Components](https://www.npmjs.com/package/@quri/squiggle-components) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-components.svg)  
+[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/components#usage-in-a-react-project)
+
+This documentation uses `@quri/squiggle-components` frequently.
+
+We host [a storybook](https://squiggle-components.netlify.app/) with details
+and usage of each of the components made available.
+
+## [Visual Studio Code Extension](https://marketplace.visualstudio.com/items?itemName=QURI.vscode-squiggle) ![npm version](https://vsmarketplacebadge.apphb.com/version/QURI.vscode-squiggle.svg)  
+This extension allows you to run and visualize Squiggle code.  
+
+## [Observable Library](https://observablehq.com/@hazelfire/squiggle)
+An exportable [Observable Notebook](https://observablehq.com/@hazelfire/squiggle) of the key components that you can directly import and use in Observable notebooks.
--- a/packages/website/docs/Node-Packages.md
+++ b/packages/website/docs/Node-Packages.md
@@ -1,24 +0,0 @@
----
-sidebar_position: 4
-title: Node Packages
----
-
-There are two JavaScript packages currently available for Squiggle:
-
-- [`@quri/squiggle-lang`](https://www.npmjs.com/package/@quri/squiggle-lang) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-lang.svg)
-- [`@quri/squiggle-components`](https://www.npmjs.com/package/@quri/squiggle-components) ![npm version](https://badge.fury.io/js/@quri%2Fsquiggle-components.svg)
-
-Types are available for both packages.
-
-## Squiggle Language
-
-[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/squiggle-lang#use-the-npm-package)
-
-## Squiggle Components
-
-[_See `README.md` in Github_](https://github.com/quantified-uncertainty/squiggle/tree/develop/packages/components#usage-in-a-react-project)
-
-This documentation uses `@quri/squiggle-components` frequently.
-
-We host [a storybook](https://squiggle-components.netlify.app/) with details
-and usage of each of the components made available.
--- a/packages/website/sidebars.js
+++ b/packages/website/sidebars.js
@@ -28,8 +28,8 @@ const sidebars = {
    },
    {
      type: "doc",
-      id: "Node-Packages",
-      label: "Node Packages",
+      id: "Integrations",
+      label: "Integrations",
    },
    {
      type: "category",