---
title: "Distribution Creation"
sidebar_position: 2
---
import { SquiggleEditor } from "../../src/components/SquiggleEditor";
import Admonition from "@theme/Admonition";
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
## To
`(5thPercentile: number) to (95thPercentile: number)`
`to(5thPercentile: number, 95thPercentile: number)`
The `to` function is an easy way to generate simple distributions using predicted _5th_ and _95th_ percentiles.
If both values are above zero, a `lognormal` distribution is used. If not, a `normal` distribution is used.
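To see what `5 to 10` means numerically, here is a minimal Python sketch of the standard percentile-matching math. It assumes that `to` places the 5th and 95th percentiles exactly on its two inputs, as described above. `to_params` and `Z95` are hypothetical names, and this is not a copy of Squiggle's own implementation.

```python
import math
from statistics import NormalDist
from typing import Tuple

# z-score of the 95th percentile of a standard normal (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)


def to_params(low: float, high: float) -> Tuple[str, float, float]:
    """Hypothetical helper: (family, location, scale) whose 5th/95th percentiles are low/high."""
    if not high > low:
        raise ValueError("95thPercentile must be greater than 5thPercentile")
    if low > 0:
        # Both bounds are positive (high > low > 0): fit a lognormal by
        # working in log space, where it is a normal distribution.
        log_low, log_high = math.log(low), math.log(high)
        mu = (log_low + log_high) / 2
        sigma = (log_high - log_low) / (2 * Z95)
        return ("lognormal", mu, sigma)
    # Otherwise fit a normal centered between the bounds; the 5th-to-95th
    # span covers 2 * Z95 standard deviations.
    mean = (low + high) / 2
    std = (high - low) / (2 * Z95)
    return ("normal", mean, std)
```

For example, `to_params(5, 10)` returns a lognormal whose `mu` is the average of `ln 5` and `ln 10`, while `to_params(-5, 5)` returns a normal with mean 0.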
<Tabs>
<TabItem value="ex1" label="5 to 10" default>
When <code>5 to 10</code> is entered, both numbers are positive, so it
generates a lognormal distribution with 5th and 95th percentiles at 5 and
10.
<SquiggleEditor defaultCode="5 to 10" />
</TabItem>
<TabItem value="ex3" label="to(5,10)">
<code>5 to 10</code> does the same thing as <code>to(5,10)</code>.
<SquiggleEditor defaultCode="to(5,10)" />
</TabItem>
<TabItem value="ex2" label="-5 to 5">
When <code>-5 to 5</code> is entered, one of the values is negative, so it
generates a normal distribution. This has 5th and 95th percentiles at -5 and
5.
<SquiggleEditor defaultCode="-5 to 5" />
</TabItem>
<TabItem value="ex4" label="1 to 10000">
It's very easy to generate distributions with very long tails. If this
happens, you can click the "log x scale" box to view this using a log scale.
<SquiggleEditor defaultCode="1 to 10000" />
</TabItem>
</Tabs>
### Arguments
- `5thPercentile`: number
- `95thPercentile`: number, greater than `5thPercentile`
<Admonition type="tip" title="Tip">
<p>
"<strong>To</strong>" is a great way to generate probability distributions very
quickly from your intuitions. It's easy to write and easy to read. It's
often a good place to begin an estimate.
</p>
</Admonition>
<Admonition type="caution" title="Caution">
<p>
If you haven't tried{" "}
<a href="https://www.lesswrong.com/posts/LdFbx9oqtKAAwtKF3/list-of-probability-calibration-exercises">
calibration training
</a>
, you're likely to be overconfident. We recommend doing calibration training
to get a feel for what a 90 percent confidence interval feels like.
</p>
</Admonition>
## Mixture
`mixture(...distributions: Distribution[], weights?: number[])`
`mx(...distributions: Distribution[], weights?: number[])`
`mixture(distributions: Distribution[], weights?: number[])`
`mx(distributions: Distribution[], weights?: number[])`
The `mixture` function combines multiple distributions to create a mixture. You can optionally pass in a list of proportional weights.
<Tabs>
<TabItem value="ex1" label="Simple" default>
<SquiggleEditor defaultCode="mixture(1 to 2, 5 to 8, 9 to 10)" />
</TabItem>
<TabItem value="ex2" label="With Weights">
<SquiggleEditor defaultCode="mixture(1 to 2, 5 to 8, 9 to 10, [0.1, 0.1, 0.8])" />
</TabItem>
<TabItem value="ex3" label="With Continuous and Discrete Inputs">
<SquiggleEditor defaultCode="mixture(1 to 5, 8 to 10, 1, 3, 20)" />
</TabItem>
<TabItem value="ex4" label="Array of Distributions Input">
<SquiggleEditor defaultCode="mx([1 to 2, exponential(1)], [1,1])" />
</TabItem>
</Tabs>
### Arguments
- `distributions`: A set of distributions or numbers, each passed as a separate parameter (or together as a single array). Numbers will be converted into point mass distributions.
- `weights`: An optional array of numbers, each representing the weight of its corresponding distribution. The weights will be rescaled to sum to `1.0`. If a weights array is provided, it must have the same length as the number of distributions.
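One way to picture a weighted mixture is by sampling: rescale the weights to sum to 1, pick a component with those probabilities, then draw from it. The Python sketch below does exactly that. `normalize_weights` and `mixture_sampler` are hypothetical helpers, and we do not claim this is how Squiggle computes mixtures internally.

```python
import random
from typing import Callable, List, Optional, Sequence, Union

# Here a "distribution" is just a function that draws one sample from an RNG.
Sampler = Callable[[random.Random], float]


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Rescale weights so they sum to 1.0."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [w / total for w in weights]


def mixture_sampler(
    components: Sequence[Union[float, Sampler]],
    weights: Optional[Sequence[float]] = None,
) -> Sampler:
    """Build a sampler for a weighted mixture (hypothetical helper)."""
    # Plain numbers act like point masses: they always return the same value.
    samplers: List[Sampler] = [
        c if callable(c) else (lambda rng, v=float(c): v) for c in components
    ]
    if weights is None:
        weights = [1.0] * len(samplers)  # equal weights by default
    if len(weights) != len(samplers):
        raise ValueError("need exactly one weight per distribution")
    probs = normalize_weights(weights)

    def sample(rng: random.Random) -> float:
        # Pick a component in proportion to its weight, then sample from it.
        component = rng.choices(samplers, weights=probs, k=1)[0]
        return component(rng)

    return sample
```

With weights `[1, 3]`, roughly 75% of draws come from the second component, which is why only the ratios between weights matter.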
### Aliases
- `mx`
### Special Use Cases of Mixtures
<details>
<summary>🕐 Zero or Continuous</summary>
<p>
One common reason to mix continuous and discrete distributions is to handle the special case of 0.
Say I want to model the time I will spend on some upcoming project. I think I have an 80% chance of doing it.
</p>
<p>
In this case, I have a 20% chance of spending no time on it. I might estimate my hours with:
</p>
<SquiggleEditor
defaultCode={`hours_the_project_will_take = 5 to 20
chance_of_doing_anything = 0.8
mx(hours_the_project_will_take, 0, [chance_of_doing_anything, 1 - chance_of_doing_anything])`}
/>
</details>
<details>
<summary>🔒 Model Uncertainty Safeguarding</summary>
<p>
One technique several <a href="https://www.foretold.io/">Foretold.io</a> users used is to combine their main guess with a
"just-in-case distribution". This latter distribution has a very low weight but is very wide,
in case the main guess is dramatically off for some unexpected reason.
</p>
<SquiggleEditor
defaultCode={`forecast = 3 to 30
chance_completely_wrong = 0.05
forecast_if_completely_wrong = -100 to 200
mx(forecast, forecast_if_completely_wrong, [1-chance_completely_wrong, chance_completely_wrong])`}
/>
</details>
## Normal
`normal(mean:number, standardDeviation:number)`
Creates a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with the given mean and standard deviation.
<Tabs>
<TabItem value="ex1" label="normal(5,1)" default>
<SquiggleEditor defaultCode="normal(5, 1)" />
</TabItem>
<TabItem value="ex2" label="normal(100000000000, 100000000000)">
<SquiggleEditor defaultCode="normal(100000000000, 100000000000)" />
</TabItem>
</Tabs>
### Arguments
- `mean`: Number
- `standardDeviation`: Number greater than zero
[Wikipedia](https://en.wikipedia.org/wiki/Normal_distribution)
## Log-normal
`lognormal(mu: number, sigma: number)`
Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) with the given mu and sigma.
`mu` and `sigma` are the mean and standard deviation of the normal distribution that results when
you take the log of the lognormal distribution. They can be difficult to reason about directly,
so we typically recommend using the <a href="#to">to</a> syntax instead of estimating `mu` and `sigma` directly.
<SquiggleEditor defaultCode="lognormal(0, 0.7)" />
### Arguments
- `mu`: Number
- `sigma`: Number greater than zero
[Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution)
<details>
<summary>
❓ Understanding <strong>mu</strong> and <strong>sigma</strong>
</summary>
<p>
The log of <code>lognormal(mu, sigma)</code> is a normal distribution with
mean <code>mu</code>
and standard deviation <code>sigma</code>. For example, these two distributions
are identical:
</p>
<SquiggleEditor
defaultCode={`normalMean = 10
normalStdDev = 2
logOfLognormal = log(lognormal(normalMean, normalStdDev))
[logOfLognormal, normal(normalMean, normalStdDev)]`}
/>
</details>
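The same relationship can be checked outside Squiggle. This Python sketch uses natural logs (`math.log` and the standard library's `random.lognormvariate`, whose log is normal with mean `mu` and standard deviation `sigma`). It also lists two closed-form facts that help when reading `mu` and `sigma`: the median is `exp(mu)` and the mean is `exp(mu + sigma^2 / 2)`. The helper names are hypothetical.

```python
import math
import random
import statistics


def lognormal_summary(mu: float, sigma: float) -> dict:
    """Closed-form median and mean of lognormal(mu, sigma)."""
    return {
        "median": math.exp(mu),  # exp of the underlying normal's median
        "mean": math.exp(mu + sigma ** 2 / 2),  # pulled up by the long right tail
    }


def log_of_lognormal_samples(mu: float, sigma: float, n: int, seed: int = 0) -> list:
    """Draw n lognormal samples and take the natural log of each one."""
    rng = random.Random(seed)
    return [math.log(rng.lognormvariate(mu, sigma)) for _ in range(n)]
```

For `lognormal(0, 0.7)`, the median is exactly 1 while the mean is a bit higher, about `exp(0.245)`.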
## Uniform
`uniform(low:number, high:number)`
Creates a [uniform distribution](<https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)>) with the given low and high values.
<SquiggleEditor defaultCode="uniform(3,7)" />
### Arguments
- `low`: Number
- `high`: Number greater than `low`
<Admonition type="caution" title="Caution">
<p>
While uniform distributions are very simple to understand, we rarely find
uncertainties that actually look like this. Before using a uniform
distribution, think hard about whether you are really 100% confident that the
parameter will not wind up just outside the stated boundaries.
</p>
<p>
A good use case for a uniform distribution is a quantity with clear physical
limits. You might have complete uncertainty about what time of day an event
will occur, but can say with 100% confidence that it will happen between the
hours of 0:00 and 24:00.
</p>
</Admonition>
## Point Mass

`pointMass(value:number)`

Creates a discrete distribution with all of its probability mass at point `value`.

Few Squiggle users call the function `pointMass()` directly. Numbers are converted into point mass distributions automatically when appropriate.

For example, in the function `mixture(1,2,normal(5,2))`, the first two arguments will get converted into point mass distributions
with values at 1 and 2. Therefore, this is the same as `mixture(pointMass(1),pointMass(2),normal(5,2))`.

`pointMass()` distributions are currently the only discrete distributions accessible in Squiggle.
<Tabs>
<TabItem value="ex1" label="pointMass(3)" default>
<SquiggleEditor defaultCode="pointMass(3)" />
</TabItem>
<TabItem value="ex3" label="mixture(1,3,5)">
<SquiggleEditor defaultCode="mixture(1,3,5)" />
</TabItem>
<TabItem value="ex2" label="normal(5,2) * 6">
<SquiggleEditor defaultCode="normal(5,2) * 6" />
</TabItem>
<TabItem value="ex4" label="dotAdd(normal(5,2), 6)">
<SquiggleEditor defaultCode="dotAdd(normal(5,2), 6)" />
</TabItem>
<TabItem value="ex5" label="dotMultiply(normal(5,2), 6)">
<SquiggleEditor defaultCode="dotMultiply(normal(5,2), 6)" />
</TabItem>
</Tabs>
### Arguments
- `value`: Number
## Beta
`beta(alpha:number, beta:number)`
Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) with the given `alpha` and `beta` values. For a good summary of the beta distribution, see [this explanation](https://stats.stackexchange.com/a/47782) on Cross Validated.
<Tabs>
<TabItem value="ex1" label="beta(10, 20)" default>
<SquiggleEditor defaultCode="beta(10,20)" />
</TabItem>
<TabItem value="ex2" label="beta(1000, 2000)">
<SquiggleEditor defaultCode="beta(1000, 2000)" />
</TabItem>
<TabItem value="ex3" label="beta(1, 10)">
<SquiggleEditor defaultCode="beta(1, 10)" />
</TabItem>
<TabItem value="ex4" label="beta(10, 1)">
<SquiggleEditor defaultCode="beta(10, 1)" />
</TabItem>
<TabItem value="ex5" label="beta(0.8, 0.8)">
<SquiggleEditor defaultCode="beta(0.8, 0.8)" />
</TabItem>
</Tabs>
### Arguments
- `alpha`: Number greater than zero
- `beta`: Number greater than zero
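A quick way to read `alpha` and `beta` is through the standard closed-form mean `alpha / (alpha + beta)` and variance `alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))`. This Python sketch computes both; the function names are hypothetical, and sampling uses the standard library's `random.betavariate`.

```python
import random
import statistics


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


def beta_samples(alpha: float, beta: float, n: int, seed: int = 0) -> list:
    """Draw n samples with Python's built-in beta sampler."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n)]
```

Both `beta(10, 20)` and `beta(1000, 2000)` have a mean of 1/3, but the second has a much smaller variance: scaling `alpha` and `beta` up together keeps the center and narrows the spread.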
<Admonition type="caution" title="Caution with small numbers">
<p>
Squiggle struggles to show beta distributions when either alpha or beta is
below 1.0. This is because the density grows without bound near 0.0 (when
alpha is below 1.0) or near 1.0 (when beta is below 1.0). Using a log scale
for the y-axis helps here.
</p>
<details>
<summary>Examples</summary>
<Tabs>
<TabItem value="ex1" label="beta(0.3, 0.3)" default>
<SquiggleEditor defaultCode="beta(0.3, 0.3)" />
</TabItem>
<TabItem value="ex2" label="beta(0.5, 0.5)">
<SquiggleEditor defaultCode="beta(0.5, 0.5)" />
</TabItem>
<TabItem value="ex3" label="beta(0.8, 0.8)">
<SquiggleEditor defaultCode="beta(0.8, 0.8)" />
</TabItem>
<TabItem value="ex4" label="beta(0.9, 0.9)">
<SquiggleEditor defaultCode="beta(0.9, 0.9)" />
</TabItem>
</Tabs>
</details>
</Admonition>
## Exponential
`exponential(rate:number)`
Creates an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution) with the given rate.
<SquiggleEditor defaultCode="exponential(4)" />
### Arguments
- `rate`: Number greater than zero
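The `rate` is the reciprocal of the mean, so `exponential(4)` has a mean of 0.25. The Python sketch below shows this; Python's `random.expovariate` also takes the rate (not the mean), which makes it a convenient check. The helper names are hypothetical.

```python
import math
import random
import statistics


def exponential_mean(rate: float) -> float:
    """Mean of an exponential distribution with the given rate."""
    if rate <= 0:
        raise ValueError("rate must be greater than zero")
    return 1 / rate


def exponential_median(rate: float) -> float:
    """Median of an exponential distribution: ln(2) / rate."""
    return math.log(2) / rate


def exponential_samples(rate: float, n: int, seed: int = 0) -> list:
    """Draw n samples; expovariate is parameterized by rate, like exponential(rate)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]
```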
## Triangular distribution
`triangular(low:number, mode:number, high:number)`
Creates a [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution) with the given low, mode, and high values.
### Arguments
- `low`: Number
- `mode`: Number greater than `low`
- `high`: Number greater than `mode`
<SquiggleEditor defaultCode="triangular(1, 2, 4)" />
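A triangular distribution can be sampled with inverse-CDF sampling: map a uniform draw `u` through the two halves of the CDF, which meet at the mode. This Python sketch is a hypothetical illustration of that technique under the argument constraints listed above, not Squiggle's implementation.

```python
import math
import random


def triangular_inverse_cdf(u: float, low: float, mode: float, high: float) -> float:
    """Map a uniform draw u in [0, 1) to a triangular(low, mode, high) value."""
    if not low < mode < high:
        raise ValueError("need low < mode < high")
    split = (mode - low) / (high - low)  # CDF value at the mode
    if u < split:
        # Rising side: invert F(x) = (x - low)^2 / ((high - low) * (mode - low)).
        return low + math.sqrt(u * (high - low) * (mode - low))
    # Falling side: invert F(x) = 1 - (high - x)^2 / ((high - low) * (high - mode)).
    return high - math.sqrt((1 - u) * (high - low) * (high - mode))


def triangular_samples(low: float, mode: float, high: float, n: int, seed: int = 0) -> list:
    """Draw n samples by pushing uniform draws through the inverse CDF."""
    rng = random.Random(seed)
    return [triangular_inverse_cdf(rng.random(), low, mode, high) for _ in range(n)]
```

The mean of `triangular(1, 2, 4)` is `(1 + 2 + 4) / 3`. Python's standard library also offers `random.triangular`, but note that its argument order is `(low, high, mode)`.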
## FromSamples
`fromSamples(samples:number[])`
Creates a sample set distribution using an array of samples.
<SquiggleEditor defaultCode="fromSamples([1,2,3,4,6,5,5,5])" />
### Arguments
- `samples`: An array of at least 5 numbers.
<Admonition type="caution" title="Caution!">
<p>
Samples are converted into{" "}
<a href="https://en.wikipedia.org/wiki/Probability_density_function">PDF</a>{" "}
shapes automatically using{" "}
<a href="https://en.wikipedia.org/wiki/Kernel_density_estimation">
kernel density estimation
</a>{" "}
and an approximated bandwidth. Eventually, Squiggle will allow more control
over how this conversion is done.
</p>
</Admonition>
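For intuition about that conversion, here is a minimal Gaussian kernel density estimate in Python. The exact kernel and bandwidth approximation Squiggle uses are not stated on this page, so this sketch swaps in Silverman's rule of thumb (`0.9 * min(sd, IQR / 1.34) * n^(-1/5)`) as a common stand-in. `silverman_bandwidth` and `gaussian_kde` are hypothetical helpers.

```python
import math
import statistics
from typing import Callable, Optional, Sequence


def silverman_bandwidth(samples: Sequence[float]) -> float:
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    # Use the more robust of the two spread estimates when the IQR is usable.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    if spread <= 0:
        raise ValueError("samples must not all be identical")
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(samples: Sequence[float], bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a PDF that averages one Gaussian bump per sample."""
    h = silverman_bandwidth(samples) if bandwidth is None else bandwidth
    norm = 1 / (len(samples) * h * math.sqrt(2 * math.pi))

    def pdf(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return pdf
```

A smaller bandwidth gives a bumpier PDF that follows individual samples; a larger one gives a smoother PDF that can blur real structure.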