From c0ec3b02b7943371473d8bd561654081acfe68f3 Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Sat, 30 Apr 2022 14:34:00 -0400 Subject: [PATCH 01/10] Minor documentation improvements --- packages/website/docs/Features/Functions.mdx | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/packages/website/docs/Features/Functions.mdx b/packages/website/docs/Features/Functions.mdx index 1de7514b..936d8b93 100644 --- a/packages/website/docs/Features/Functions.mdx +++ b/packages/website/docs/Features/Functions.mdx @@ -5,8 +5,6 @@ sidebar_position: 7 import { SquiggleEditor } from "../../src/components/SquiggleEditor"; -_The source of truth for this document is [this file of code](https://github.com/quantified-uncertainty/squiggle/blob/develop/packages/squiggle-lang/src/rescript/ReducerInterface/ReducerInterface_GenericDistribution.res)_ - ## Inventory distributions We provide starter distributions, computed symbolically. @@ -255,8 +253,8 @@ dist1 .* dist2`} ### Pointwise division From 37047ac9ffcaf224ea9c217df90cdfd1a44a82f9 Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Sat, 30 Apr 2022 21:47:54 -0400 Subject: [PATCH 02/10] Starting to pull out distributions functionality --- .../website/docs/Features/Distributions.mdx | 258 ++++++++++++++++++ packages/website/docs/Features/Functions.mdx | 25 -- 2 files changed, 258 insertions(+), 25 deletions(-) create mode 100644 packages/website/docs/Features/Distributions.mdx diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx new file mode 100644 index 00000000..81d8737e --- /dev/null +++ b/packages/website/docs/Features/Distributions.mdx @@ -0,0 +1,258 @@ +--- +title: "Creating Distributions" +sidebar_position: 8 +--- + +import TOCInline from "@theme/TOCInline"; +import { SquiggleEditor } from "../../src/components/SquiggleEditor"; +import Admonition from "@theme/Admonition"; +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; + + + 
+## To + +`(5thPercentile: float) to (95thPercentile: float)` +`to(5thPercentile: float, 95thPercentile: float)` + +The `to` function is an easy way to generate simple distributions using predicted _5th_ and _95th_ percentiles. + +If both values are above zero, a `lognormal` distribution is used. If not, a `normal` distribution is used. + + + + When `5 to 10` is entered, both numbers are positive, so it generates a + lognormal distribution with 5th and 95th percentiles at 5 and 10. + + + + `5 to 10` does the same thing as `to(5,10)`. + + + + When `-5 to 5` is entered, there are negative values, so it generates a normal + distribution. This has 5th and 95th percentiles at -5 and 5. + + + + It's very easy to generate distributions with very long tails. If this + happens, you can click the "log x scale" box to view this using a log scale. + + + + +### Arguments + +- `5thPercentile`: Float +- `95thPercentile`: Float + + 

+ "To" is a great way to generate probability distributions very + quickly from your intuitions. It's easy to write and easy to read. It's + often a good place to begin an estimate. +

+
+ + +

+ If you haven't tried{" "} + + calibration training + + , you're likely to be overconfident. We recommend doing calibration training + to get a feel for what a 90 percent confidence interval feels like.

+
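The rule described for `to` can be sketched outside Squiggle. The Python below is a minimal illustration, not Squiggle's implementation: it assumes the two inputs are exactly the 5th and 95th percentiles of the resulting distribution, and the helper name `to_params` is hypothetical.

```python
import math
from statistics import NormalDist

# z-score of the 95th percentile of a standard normal (about 1.645)
Z95 = NormalDist().inv_cdf(0.95)


def to_params(p5, p95):
    """Hypothetical helper: map a 5th/95th percentile pair to parameters.

    Returns ("lognormal", mu, sigma) when both bounds are above zero,
    otherwise ("normal", mean, stdev), mirroring the rule described above.
    """
    if not p5 < p95:
        raise ValueError("5th percentile must be below the 95th percentile")
    if p5 > 0:
        # Work in log space: the log of a lognormal is a normal distribution.
        lo, hi = math.log(p5), math.log(p95)
        return ("lognormal", (lo + hi) / 2, (hi - lo) / (2 * Z95))
    return ("normal", (p5 + p95) / 2, (p95 - p5) / (2 * Z95))


print(to_params(5, 10))   # lognormal case
print(to_params(-5, 5))   # normal case
```

The 90% interval is symmetric around the mean (in log space for the lognormal case), which is why the midpoint and half-width divided by the z-score are enough to recover both parameters.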
+ +## Mixture + +`mixture(...distributions: Distribution[], weights?: float[])` +`mx(...distributions: Distribution[], weights?: float[])` + +The `mixture` function combines multiple distributions to create a mixture. You can optionally pass in a list of proportional weights. + + + + + + + + + + + + + +### Arguments + +- `distributions`: A set of distributions or floats, each passed as a parameter. Floats will be converted into Delta distributions. +- `weights`: An optional array of floats, each representing the weight of its corresponding distribution. The weights will be re-scaled to add to `1.0`. If a weights array is provided, it must be the same length as the distribution parameters. + +### Aliases + +- `mx` + +### Special Use Cases of Mixtures + + 
+ 🕐 Zero or Continuous +

+ One common reason to have mixtures of continuous and discrete distributions is to handle the special case of 0. + Say I want to model the time I will spend on some upcoming assignment. I think I have an 80% chance of doing it.

+ +

+ In this case, I have a 20% chance of spending 0 time with it. I might estimate my hours with, +

+ +
+ +
+ 🔒 Model Uncertainty Safeguarding +

+ One technique several Foretold.io users used is to combine their main guess with a + "just-in-case distribution". This latter distribution would have very low weight, but would be + very wide, just in case they were dramatically off for some weird reason.

+

+ One common reason to have mixtures of continuous and discrete distributions is to handle the special case of 0. + Say I want to model the time I will spend on some upcoming assignment. I think I have an 80% chance of doing it.

+ + +
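The weight handling described for `mixture` (proportional weights re-scaled to add to `1.0`, one weight per component) can be sketched in Python. This is an illustrative sampling-based sketch, not how Squiggle computes mixtures; the names `normalize_weights` and `sample_mixture` are hypothetical, and the hours distribution is an arbitrary stand-in.

```python
import random


def normalize_weights(weights):
    """Rescale proportional weights so they add to 1.0."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def sample_mixture(samplers, weights, rng):
    """Draw one sample: pick a component by weight, then sample from it."""
    if len(samplers) != len(weights):
        raise ValueError("need exactly one weight per component")
    (chosen,) = rng.choices(samplers, weights=normalize_weights(weights), k=1)
    return chosen(rng)


# "Zero or continuous": an 80% chance of doing the project at all,
# otherwise exactly 0 hours are spent on it.
components = [lambda rng: rng.lognormvariate(2.0, 0.5), lambda rng: 0.0]
rng = random.Random(0)
draws = [sample_mixture(components, [0.8, 0.2], rng) for _ in range(10_000)]
share_zero = sum(d == 0.0 for d in draws) / len(draws)
print(f"share of draws that are exactly 0: {share_zero:.3f}")
```

Passing `[8, 2]` instead of `[0.8, 0.2]` gives the same mixture, since only the proportions matter after re-scaling.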
+ +## Normal + +`normal(mean:float, standardDeviation:float)` + + + + + + + + + + +### Arguments + +- `mean`: Float +- `standard deviation`: Float greater than zero + +[Wikipedia entry](https://en.wikipedia.org/wiki/Normal_distribution) + +## Log-normal + +The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` and standard deviation `sigma`. + +`lognormal(mu: float, sigma: float)` + + + +### Arguments + +- `mu`: Float +- `sigma`: Float greater than zero + +[Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution) + +An alternative format is also available. The `to` notation creates a lognormal +distribution with a 90% confidence interval between the two numbers. We add +this convenience as lognormal distributions are commonly used in practice. + + + +#### Future feature: + +Furthermore, it's also possible to create a lognormal from it's actual mean +and standard deviation, using `lognormalFromMeanAndStdDev`. + +TODO: interpreter/parser doesn't provide this in current `develop` branch + + + +#### Validity + +- `sigma > 0` +- In `x to y` notation, `x < y` + +## Uniform + +`normal(low:float, high:float)` + + + + + + + + + + +### Arguments + +- `low`: Float +- `high`: Float greater than `low` + +## Beta + +The `beta(a, b)` function creates a beta distribution with parameters `a` and `b`: + + + +#### Validity + +- `a > 0` +- `b > 0` +- Empirically, we have noticed that numerical instability arises when `a < 1` or `b < 1` + +## Exponential + +The `exponential(rate)` function creates an exponential distribution with the given +rate. + + + +#### Validity + +- `rate > 0` + +## Triangular distribution + +The `triangular(a,b,c)` function creates a triangular distribution with lower +bound `a`, mode `b` and upper bound `c`. + +#### Validity + +- `a < b < c` + + + +### Scalar (constant dist) + +Squiggle, when the context is right, automatically casts a float to a constant distribution. 
+ +## `fromSamples` + +The last distribution constructor takes an array of samples and constructs a sample set distribution. + + + +#### Validity + +For `fromSamples(xs)`, + +- `xs.length > 5` +- Strictly every element of `xs` must be a number. diff --git a/packages/website/docs/Features/Functions.mdx b/packages/website/docs/Features/Functions.mdx index 936d8b93..ae07e189 100644 --- a/packages/website/docs/Features/Functions.mdx +++ b/packages/website/docs/Features/Functions.mdx @@ -113,31 +113,6 @@ For `fromSamples(xs)`, Here are the ways we combine distributions. -### Mixture of distributions - -The `mixture` function combines 2 or more other distributions to create a weighted -combination of the two. The first positional arguments represent the distributions -to be combined, and the last argument is how much to weigh every distribution in the -combination. - - - -It's possible to create discrete distributions using this method. - - - -As well as mixed distributions: - - - -An alias of `mixture` is `mx` - -#### Validity - -Using javascript's variable arguments notation, consider `mx(...dists, weights)`: - -- `dists.length == weights.length` - ### Addition A horizontal right shift From 92f606b09b6b7f89602be3c28e92e0887cc670df Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Sat, 30 Apr 2022 22:48:57 -0400 Subject: [PATCH 03/10] Starting to pull out distributions for more specialized documentation --- .../website/docs/Features/Distributions.mdx | 150 +++++++++++------- packages/website/docs/Features/Functions.mdx | 104 ------------ 2 files changed, 95 insertions(+), 159 deletions(-) diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx index 81d8737e..0331ece1 100644 --- a/packages/website/docs/Features/Distributions.mdx +++ b/packages/website/docs/Features/Distributions.mdx @@ -26,7 +26,7 @@ If both values are above zero, a `lognormal` distribution is used. 
If not, a `no lognormal distribution with 5th and 95th percentiles at 5 and 10. - + `5 to 10` does the same thing as `to(5,10)`. @@ -45,7 +45,7 @@ If both values are above zero, a `lognormal` distribution is used. If not, a `no ### Arguments - `5thPercentile`: Float -- `95thPercentile`: Float +- `95thPercentile`: Float, greater than `5thPercentile`

@@ -77,10 +77,10 @@ The `mixture` mixes combines multiple distributions to create a mixture. You can - + - + @@ -137,11 +137,12 @@ mx(forecast, forecast_if_completely_wrong, [1-chance_completely_wrong, chance_co `normal(mean:float, standardDeviation:float)` +Creates a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with the given mean and standard deviation. - + @@ -151,13 +152,13 @@ mx(forecast, forecast_if_completely_wrong, [1-chance_completely_wrong, chance_co - `mean`: Float - `standard deviation`: Float greater than zero -[Wikipedia entry](https://en.wikipedia.org/wiki/Normal_distribution) +[Wikipedia](https://en.wikipedia.org/wiki/Normal_distribution) ## Log-normal -The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` and standard deviation `sigma`. +`lognormal(mu: float, sigma: float)` -`lognormal(mu: float, sigma: float)` +Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) with the given mu and sigma. @@ -168,85 +169,124 @@ The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` and st [Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution) -An alternative format is also available. The `to` notation creates a lognormal -distribution with a 90% confidence interval between the two numbers. We add -this convenience as lognormal distributions are commonly used in practice. +### Argument Alternatives +`Mu` and `sigma` can be difficult to directly reason about. Because of this complexity, we recommend typically using the to syntax. - - -#### Future feature: - -Furthermore, it's also possible to create a lognormal from it's actual mean -and standard deviation, using `lognormalFromMeanAndStdDev`. - -TODO: interpreter/parser doesn't provide this in current `develop` branch - - - -#### Validity - -- `sigma > 0` -- In `x to y` notation, `x < y` +

+ ❓ Understanding mu and sigma +

+ The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` and standard deviation `sigma`. For example, these two distributions are identical: +

+ +
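The relationship between `mu`, `sigma`, and the underlying normal distribution can be checked numerically. The Python sketch below uses only the standard library and arbitrary example values for `mu` and `sigma`; it samples a lognormal, takes logs, and compares the result with the parameters.

```python
import math
import random
import statistics

mu, sigma = 1.0, 0.5  # arbitrary example parameters
rng = random.Random(42)

# Sample lognormal(mu, sigma), then take the log of every sample.
logs = [math.log(rng.lognormvariate(mu, sigma)) for _ in range(20_000)]

log_mean = statistics.fmean(logs)   # should be close to mu
log_stdev = statistics.stdev(logs)  # should be close to sigma

# The mean of the lognormal itself is exp(mu + sigma^2 / 2), not exp(mu),
# which is one reason mu and sigma are hard to reason about directly.
lognormal_mean = math.exp(mu + sigma**2 / 2)

print(log_mean, log_stdev, lognormal_mean)
```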
## Uniform -`normal(low:float, high:float)` +`uniform(low:float, high:float)` - - - - - - - - +Creates a [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)) with the given low and high values. + ### Arguments - `low`: Float - `high`: Float greater than `low` + +

+ While uniform distributions are very simple to understand, we find it rare to find uncertainties that actually look like this. Before using a uniform distribution, think hard about whether you are really 100% confident that the parameter will not wind up being just outside the stated boundaries.

+ +

+ One good example of a uniform distribution uncertainty would be clear physical limitations. You might have complete uncertainty about what time of day an event will occur, but can say with 100% confidence it will happen between the hours of 0:00 and 24:00.

+
+ ## Beta +``beta(alpha:float, beta:float)`` -The `beta(a, b)` function creates a beta distribution with parameters `a` and `b`: +Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) with the given `alpha` and `beta` values. For a good summary of the beta distribution, see [this explanation](https://stats.stackexchange.com/a/47782) on Stack Overflow. - + + + + + + + + + + + + + + + + + -#### Validity +### Arguments -- `a > 0` -- `b > 0` -- Empirically, we have noticed that numerical instability arises when `a < 1` or `b < 1` +- `alpha`: Float greater than zero +- `beta`: Float greater than zero + + +

+ Squiggle struggles to show beta distributions when either alpha or beta is below 1.0. This is because the tails at ~0.0 and ~1.0 are very high. Using a log scale for the y-axis helps here.

+
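To see why the tails become a display problem, the beta density can be evaluated near the edges. This Python sketch uses the standard closed-form beta density; `beta_pdf` is a hypothetical helper written for illustration and is unrelated to Squiggle's internals.

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of a beta(alpha, beta) distribution at x, for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must be strictly between 0 and 1")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be greater than zero")
    # log of 1 / B(alpha, beta), computed with log-gamma for stability
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x)
    )


# With alpha and beta below 1 the density climbs steeply toward both edges.
for x in (0.5, 0.1, 0.01, 0.001):
    print(x, beta_pdf(x, 0.5, 0.5))
```

The density grows without bound as `x` approaches 0 or 1 when the corresponding parameter is below 1, so a linear y-axis is dominated by the edges.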
+ Examples + + + + + + + + + + + + + + +
+
## Exponential -The `exponential(rate)` function creates an exponential distribution with the given -rate. +``exponential(rate:float)`` - +Creates an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution) with the given rate. -#### Validity + -- `rate > 0` +### Arguments +- `rate`: Float greater than zero ## Triangular distribution -The `triangular(a,b,c)` function creates a triangular distribution with lower -bound `a`, mode `b` and upper bound `c`. +``triangular(low:float, mode:float, high:float)`` + +Creates a [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution) with the given low, mode, and high values. #### Validity -- `a < b < c` +### Arguments +- `low`: Float +- `mode`: Float greater than `low` +- `high`: Float greater than `mode` -### Scalar (constant dist) +## FromSamples -Squiggle, when the context is right, automatically casts a float to a constant distribution. - -## `fromSamples` - -The last distribution constructor takes an array of samples and constructs a sample set distribution. +Creates a sample set distribution using an array of samples. diff --git a/packages/website/docs/Features/Functions.mdx b/packages/website/docs/Features/Functions.mdx index ae07e189..38872db3 100644 --- a/packages/website/docs/Features/Functions.mdx +++ b/packages/website/docs/Features/Functions.mdx @@ -5,110 +5,6 @@ sidebar_position: 7 import { SquiggleEditor } from "../../src/components/SquiggleEditor"; -## Inventory distributions - -We provide starter distributions, computed symbolically. - -### Normal distribution - -The `normal(mean, sd)` function creates a normal distribution with the given mean -and standard deviation. - - - -#### Validity - -- `sd > 0` - -### Uniform distribution - -The `uniform(low, high)` function creates a uniform distribution between the -two given numbers. 
- - - -#### Validity - -- `low < high` - -### Lognormal distribution - -The `lognormal(mu, sigma)` returns the log of a normal distribution with parameters -`mu` and `sigma`. The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` and standard deviation `sigma`. - - - -An alternative format is also available. The `to` notation creates a lognormal -distribution with a 90% confidence interval between the two numbers. We add -this convenience as lognormal distributions are commonly used in practice. - - - -#### Future feature: - -Furthermore, it's also possible to create a lognormal from it's actual mean -and standard deviation, using `lognormalFromMeanAndStdDev`. - -TODO: interpreter/parser doesn't provide this in current `develop` branch - - - -#### Validity - -- `sigma > 0` -- In `x to y` notation, `x < y` - -### Beta distribution - -The `beta(a, b)` function creates a beta distribution with parameters `a` and `b`: - - - -#### Validity - -- `a > 0` -- `b > 0` -- Empirically, we have noticed that numerical instability arises when `a < 1` or `b < 1` - -### Exponential distribution - -The `exponential(rate)` function creates an exponential distribution with the given -rate. - - - -#### Validity - -- `rate > 0` - -### Triangular distribution - -The `triangular(a,b,c)` function creates a triangular distribution with lower -bound `a`, mode `b` and upper bound `c`. - -#### Validity - -- `a < b < c` - - - -### Scalar (constant dist) - -Squiggle, when the context is right, automatically casts a float to a constant distribution. - -## `fromSamples` - -The last distribution constructor takes an array of samples and constructs a sample set distribution. - - - -#### Validity - -For `fromSamples(xs)`, - -- `xs.length > 5` -- Strictly every element of `xs` must be a number. - ## Operating on distributions Here are the ways we combine distributions. 
From ed5b7e63f281f3c8d1b054aa55f26da00ce45f8d Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Sun, 1 May 2022 08:09:34 -0400 Subject: [PATCH 04/10] Minor cleanup --- .../website/docs/Features/Distributions.mdx | 75 +++++++++---------- 1 file changed, 34 insertions(+), 41 deletions(-) diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx index 0331ece1..5f6a85a5 100644 --- a/packages/website/docs/Features/Distributions.mdx +++ b/packages/website/docs/Features/Distributions.mdx @@ -1,5 +1,5 @@ --- -title: "Creating Distributions" +title: "Distribution Creation" sidebar_position: 8 --- @@ -13,8 +13,8 @@ import TabItem from "@theme/TabItem"; ## To -`(5thPercentile: float) to (95thPercentile: float)` -`to(5thPercentile: float, 95thPercentile: float)` +`(5thPercentile: number) to (95thPercentile: number)` +`to(5thPercentile: number, 95thPercentile: number)` The `to` function is an easy way to generate simple distributions using predicted _5th_ and _95th_ percentiles. @@ -44,8 +44,8 @@ If both values are above zero, a `lognormal` distribution is used. If not, a `no ### Arguments -- `5thPercentile`: Float -- `95thPercentile`: Float, greater than `5thPercentile` +- `5thPercentile`: number +- `95thPercentile`: number, greater than `5thPercentile`

@@ -68,8 +68,8 @@ If both values are above zero, a `lognormal` distribution is used. If not, a `no ## Mixture -`mixture(...distributions: Distribution[], weights?: float[])` -`mx(...distributions: Distribution[], weights?: float[])` +`mixture(...distributions: Distribution[], weights?: number[])` +`mx(...distributions: Distribution[], weights?: number[])` The `mixture` function combines multiple distributions to create a mixture. You can optionally pass in a list of proportional weights. @@ -87,8 +87,8 @@ The `mixture` function combines multiple distributions to create a mixture. You can ### Arguments -- `distributions`: A set of distributions or floats, each passed as a parameter. Floats will be converted into Delta distributions. -- `weights`: An optional array of floats, each representing the weight of its corresponding distribution. The weights will be re-scaled to add to `1.0`. If a weights array is provided, it must be the same length as the distribution parameters. +- `distributions`: A set of distributions or numbers, each passed as a parameter. Numbers will be converted into Delta distributions. +- `weights`: An optional array of numbers, each representing the weight of its corresponding distribution. The weights will be re-scaled to add to `1.0`. If a weights array is provided, it must be the same length as the distribution parameters. ### Aliases @@ -100,7 +100,7 @@ The `mixture` function combines multiple distributions to create a mixture. You can

🕐 Zero or Continuous

 One common reason to have mixtures of continuous and discrete distributions is to handle the special case of 0. - Say I want to model the time I will spend on some upcoming assignment. I think I have an 80% chance of doing it. + Say I want to model the time I will spend on some upcoming project. I think I have an 80% chance of doing it.

@@ -120,10 +120,6 @@ mx(hours_the_project_will_take, 0, [chance_of_doing_anything, 1 - chance_of_doin "just-in-case distribution". This latter distribution would have very low weight, but would be very wide, just in case they were dramatically off for some weird reason.

-

- One common reason to have mixtures of continuous and discrete distributions is to handle the special case of 0. - Say I want to model the time I will spend on some upcoming assignment. I think I have an 80% chance of doing it.

@@ -149,29 +145,28 @@ Creates a [normal distribution](https://en.wikipedia.org/wiki/Normal_distributio ### Arguments -- `mean`: Float -- `standard deviation`: Float greater than zero +- `mean`: Number +- `standard deviation`: Number greater than zero [Wikipedia](https://en.wikipedia.org/wiki/Normal_distribution) ## Log-normal -`lognormal(mu: float, sigma: float)` +`lognormal(mu: number, sigma: number)` Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) with the given mu and sigma. +`Mu` and `sigma` can be difficult to directly reason about. Because of this complexity, we recommend typically using the to syntax instead of estimating `mu` and `sigma` directly. + ### Arguments -- `mu`: Float -- `sigma`: Float greater than zero +- `mu`: Number +- `sigma`: Number greater than zero [Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution) -### Argument Alternatives -`Mu` and `sigma` can be difficult to directly reason about. Because of this complexity, we recommend typically using the to syntax. -
❓ Understanding mu and sigma

@@ -187,15 +182,15 @@ logOfLognormal = log(lognormal(normalMean, normalStdDev)) ## Uniform -`uniform(low:float, high:float)` +`uniform(low:number, high:number)` Creates a [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)) with the given low and high values. ### Arguments -- `low`: Float -- `high`: Float greater than `low` +- `low`: Number +- `high`: Number greater than `low`

@@ -208,7 +203,7 @@ Creates a [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribut ## Beta -``beta(alpha:float, beta:float)`` +``beta(alpha:number, beta:number)`` Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) with the given `alpha` and `beta` values. For a good summary of the beta distribution, see [this explanation](https://stats.stackexchange.com/a/47782) on Stack Overflow. @@ -232,8 +227,8 @@ Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) w ### Arguments -- `alpha`: Float greater than zero -- `beta`: Float greater than zero +- `alpha`: Number greater than zero +- `beta`: Number greater than zero

@@ -260,39 +255,37 @@ Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) w ## Exponential -``exponential(rate:float)`` +``exponential(rate:number)`` Creates an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution) with the given rate. ### Arguments -- `rate`: Float greater than zero +- `rate`: Number greater than zero ## Triangular distribution -``triangular(low:float, mode:float, high:float)`` +``triangular(low:number, mode:number, high:number)`` Creates a [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution) with the given low, mode, and high values. #### Validity ### Arguments -- `low`: Float -- `mode`: Float greater than `low` -- `high`: Float greater than `mode` +- `low`: Number +- `mode`: Number greater than `low` +- `high`: Number greater than `mode` ## FromSamples +``fromSamples(samples:number[])`` + Creates a sample set distribution using an array of samples. -#### Validity - -For `fromSamples(xs)`, - -- `xs.length > 5` -- Strictly every element of `xs` must be a number. +### Arguments +- `samples`: An array of at least 5 numbers. 
\ No newline at end of file From 18af09ab0482ef1efbdd9083417524a5cd537c55 Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Sun, 1 May 2022 09:00:56 -0400 Subject: [PATCH 05/10] Added delta function to produce delta distributions --- .../rescript/Distributions/SymbolicDist/SymbolicDist.res | 6 ++++++ .../ReducerInterface_GenericDistribution.res | 6 ++++-- packages/squiggle-lang/src/rescript/Utility/E.res | 1 + 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res b/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res index 997506d9..94fd42ca 100644 --- a/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res +++ b/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res @@ -219,6 +219,12 @@ module Uniform = { module Float = { type t = float let make = t => #Float(t) + let makeSafe = t => + if E.Float.isFinite(t) { + Ok(#Float(t)) + } else { + Error("Float must be finite") + } let pdf = (x, t: t) => x == t ? 1.0 : 0.0 let cdf = (x, t: t) => x >= t ? 1.0 : 0.0 let inv = (p, t: t) => p < t ? 
0.0 : 1.0 diff --git a/packages/squiggle-lang/src/rescript/ReducerInterface/ReducerInterface_GenericDistribution.res b/packages/squiggle-lang/src/rescript/ReducerInterface/ReducerInterface_GenericDistribution.res index 7f5ad1eb..8092786f 100644 --- a/packages/squiggle-lang/src/rescript/ReducerInterface/ReducerInterface_GenericDistribution.res +++ b/packages/squiggle-lang/src/rescript/ReducerInterface/ReducerInterface_GenericDistribution.res @@ -179,10 +179,12 @@ let dispatchToGenericOutput = (call: ExpressionValue.functionCall): option< > => { let (fnName, args) = call switch (fnName, args) { - | ("exponential" as fnName, [EvNumber(f1)]) => + | ("exponential" as fnName, [EvNumber(f)]) => SymbolicConstructors.oneFloat(fnName) - ->E.R.bind(r => r(f1)) + ->E.R.bind(r => r(f)) ->SymbolicConstructors.symbolicResultToOutput + | ("delta", [EvNumber(f)]) => + SymbolicDist.Float.makeSafe(f)->SymbolicConstructors.symbolicResultToOutput | ( ("normal" | "uniform" | "beta" | "lognormal" | "cauchy" | "to") as fnName, [EvNumber(f1), EvNumber(f2)], diff --git a/packages/squiggle-lang/src/rescript/Utility/E.res b/packages/squiggle-lang/src/rescript/Utility/E.res index 472c32f7..1445a80c 100644 --- a/packages/squiggle-lang/src/rescript/Utility/E.res +++ b/packages/squiggle-lang/src/rescript/Utility/E.res @@ -198,6 +198,7 @@ module Float = { let with3DigitsPrecision = Js.Float.toPrecisionWithPrecision(_, ~digits=3) let toFixed = Js.Float.toFixed let toString = Js.Float.toString + let isFinite = Js.Float.isFinite } module I = { From 8147c5ad60fd63e82f537eb146f951fdec0faf49 Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Sun, 1 May 2022 15:04:00 -0400 Subject: [PATCH 06/10] Minor additions of delta distribution --- .../website/docs/Features/Distributions.mdx | 121 ++++++++++++------ 1 file changed, 84 insertions(+), 37 deletions(-) diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx index 5f6a85a5..5d840ee0 100644 --- 
a/packages/website/docs/Features/Distributions.mdx +++ b/packages/website/docs/Features/Distributions.mdx @@ -134,6 +134,7 @@ mx(forecast, forecast_if_completely_wrong, [1-chance_completely_wrong, chance_co `normal(mean:number, standardDeviation:number)` Creates a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with the given mean and standard deviation. + @@ -152,7 +153,7 @@ Creates a [normal distribution](https://en.wikipedia.org/wiki/Normal_distributio ## Log-normal -`lognormal(mu: number, sigma: number)` +`lognormal(mu: number, sigma: number)` Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) with the given mu and sigma. @@ -168,23 +169,28 @@ Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_dis [Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution)

- ❓ Understanding mu and sigma + + ❓ Understanding mu and sigma +

- The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` and standard deviation `sigma`. For example, these two distributions are identical: + The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` + and standard deviation `sigma`. For example, these two distributions are + identical:

- + />
## Uniform `uniform(low:number, high:number)` -Creates a [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)) with the given low and high values. +Creates a [uniform distribution]() with the given low and high values. + ### Arguments @@ -194,16 +200,52 @@ Creates a [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribut

- While uniform distributions are very simple to understand, we find it rare to find uncertainties that actually look like this. Before using a uniform distribution, think hard about whether you are really 100% confident that the parameter will not wind up being just outside the stated boundaries. + While uniform distributions are very simple to understand, we find it rare + to find uncertainties that actually look like this. Before using a uniform + distribution, think hard about whether you are really 100% confident that the + parameter will not wind up being just outside the stated boundaries.

- +

- One good example of a uniform distribution uncertainty would be clear physical limitations. You might have complete uncertainty about what time of day an event will occur, but can say with 100% confidence it will happen between the hours of 0:00 and 24:00. + One good example of a uniform distribution uncertainty would be clear + physical limitations. You might have complete uncertainty about what + time of day an event will occur, but can say with 100% confidence it will + happen between the hours of 0:00 and 24:00.

+## Delta + +`delta(value:number)` + +Creates a discrete distribution with all of its probability mass at point `value`. + +Numbers are often cast into delta distributions automatically. For example, in the function, +`mixture(1,2,normal(5,2))`, the first two arguments will get converted into delta distributions +with values at 1 and 2. Therefore, `mixture(1,2,normal(5,2))` is the same as `mixture(delta(1), delta(2),normal(5,2))` + + + + + + + + + + + + + + + + +### Arguments + +- `value`: Number + ## Beta -``beta(alpha:number, beta:number)`` + +`beta(alpha:number, beta:number)` Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) with the given `alpha` and `beta` values. For a good summary of the beta distribution, see [this explanation](https://stats.stackexchange.com/a/47782) on Stack Overflow. @@ -211,16 +253,16 @@ Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) w - + - + - + - + @@ -232,47 +274,51 @@ Creates a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) w

- Squiggle struggles to show beta distributions when either alpha or beta is below 1.0. This is because the tails at ~0.0 and ~1.0 are very high. Using a log scale for the y-axis helps here. + Squiggle struggles to show beta distributions when either alpha or beta is + below 1.0. This is because the tails at ~0.0 and ~1.0 are very high. Using a + log scale for the y-axis helps here.

-
- Examples - - - - - - - - - - - - - - -
+
+ Examples + + + + + + + + + + + + + + +
## Exponential -``exponential(rate:number)`` +`exponential(rate:number)` Creates an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution) with the given rate. ### Arguments + - `rate`: Number greater than zero ## Triangular distribution -``triangular(low:number, mode:number, high:number)`` +`triangular(low:number, mode:number, high:number)` Creates a [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution) with the given low, mode, and high values. #### Validity ### Arguments + - `low`: Number - `mode`: Number greater than `low` - `high`: Number greater than `mode` @@ -281,11 +327,12 @@ Creates a [triangular distribution](https://en.wikipedia.org/wiki/Triangular_dis ## FromSamples -``fromSamples(samples:number[])`` +`fromSamples(samples:number[])` Creates a sample set distribution using an array of samples. ### Arguments -- `samples`: An array of at least 5 numbers. \ No newline at end of file + +- `samples`: An array of at least 5 numbers. From b28df258e1a682cd6663ff19fd839536d6b59404 Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Tue, 3 May 2022 11:06:53 -0400 Subject: [PATCH 07/10] Ran formatter --- .../SymbolicDist/SymbolicDist.res | 2 +- .../website/docs/Features/Distributions.mdx | 31 ++++++++++++++----- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res b/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res index 94fd42ca..58129d0b 100644 --- a/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res +++ b/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res @@ -230,7 +230,7 @@ module Float = { let inv = (p, t: t) => p < t ? 
0.0 : 1.0 let mean = (t: t) => Ok(t) let sample = (t: t) => t - let toString = Js.Float.toString + let toString = (t:t) => j`Delta($t)` } module From90thPercentile = { diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx index 5d840ee0..c57ed075 100644 --- a/packages/website/docs/Features/Distributions.mdx +++ b/packages/website/docs/Features/Distributions.mdx @@ -220,9 +220,12 @@ Creates a [uniform distribution]( @@ -234,8 +237,11 @@ with values at 1 and 2. Therefore, `mixture(1,2,normal(5,2))` is the same as `mi - - + + + + + @@ -315,8 +321,6 @@ Creates an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_ Creates a [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution) with the given low, mode, and high values. -#### Validity - ### Arguments - `low`: Number @@ -336,3 +340,16 @@ Creates a sample set distribution using an array of samples. ### Arguments - `samples`: An array of at least 5 numbers. + + +

+ Samples are converted into{" "} + PDF{" "} + shapes automatically using{" "} + + kernel density estimation + {" "} + and an approximated bandwidth. Eventually Squiggle will allow for more + specificity. +

+
From 02a2d96f8f28f8ab67cac5b5b89e7052e8a11277 Mon Sep 17 00:00:00 2001 From: Ozzie Gooen Date: Tue, 3 May 2022 11:12:44 -0400 Subject: [PATCH 08/10] Added appropriate code blocsk --- .../website/docs/Features/Distributions.mdx | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx index c57ed075..2ade1a3c 100644 --- a/packages/website/docs/Features/Distributions.mdx +++ b/packages/website/docs/Features/Distributions.mdx @@ -22,17 +22,19 @@ If both values are above zero, a `lognormal` distribution is used. If not, a `no - When `5 to 10` is entered, both numbers are positive, so it generates a - lognormal distribution with 5th and 95th percentiles at 5 and 10. + When 5 to 10 is entered, both numbers are positive, so it + generates a lognormal distribution with 5th and 95th percentiles at 5 and + 10. - `5 to 10` does the same thing as `to(5,10)`. + 5 to 10 does the same thing as to(5,10). - When `-5 to 5` is entered, there's negative values, so it generates a normal - distribution. This has 5th and 95th percentiles at 5 and 10. + When -5 to 5 is entered, there are negative values, so it + generates a normal distribution. This has 5th and 95th percentiles at -5 and + 5. @@ -173,9 +175,10 @@ Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_dis ❓ Understanding mu and sigma

- The log of `lognormal(mu, sigma)` is a normal distribution with mean `mu` - and standard deviation `sigma`. For example, these two distributions are - identical: + The log of lognormal(mu, sigma) is a normal distribution with + mean mu + and standard deviation sigma. For example, these two distributions + are identical:

Date: Tue, 3 May 2022 11:30:00 -0400 Subject: [PATCH 09/10] Formatted rescript --- .../src/rescript/Distributions/SymbolicDist/SymbolicDist.res | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res b/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res index 58129d0b..a4704a34 100644 --- a/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res +++ b/packages/squiggle-lang/src/rescript/Distributions/SymbolicDist/SymbolicDist.res @@ -230,7 +230,7 @@ module Float = { let inv = (p, t: t) => p < t ? 0.0 : 1.0 let mean = (t: t) => Ok(t) let sample = (t: t) => t - let toString = (t:t) => j`Delta($t)` + let toString = (t: t) => j`Delta($t)` } module From90thPercentile = { From 526ee921b5d9abab381fbc687440277fe4c563da Mon Sep 17 00:00:00 2001 From: NunoSempere Date: Tue, 3 May 2022 17:22:08 -0400 Subject: [PATCH 10/10] tweak: some tweaks to documentation, part 1/2 --- .../Three-Formats-Of-Distributions.md | 19 ++++++----- .../website/docs/Features/Distributions.mdx | 4 ++- packages/website/docs/Features/Functions.mdx | 33 ++++++++++++++----- packages/website/docs/Features/Language.mdx | 16 ++++----- .../website/docs/Features/Node-Packages.md | 2 +- 5 files changed, 47 insertions(+), 27 deletions(-) diff --git a/packages/website/docs/Discussions/Three-Formats-Of-Distributions.md b/packages/website/docs/Discussions/Three-Formats-Of-Distributions.md index 8ec5b88d..405bc97c 100644 --- a/packages/website/docs/Discussions/Three-Formats-Of-Distributions.md +++ b/packages/website/docs/Discussions/Three-Formats-Of-Distributions.md @@ -11,7 +11,7 @@ _Symbolic_ formats are just the math equations. `normal(5,3)` is the symbolic re When you sample distributions (usually starting with symbolic formats), you get lists of samples. Monte Carlo techniques return lists of samples. Let’s call this the “_Sample Set_” format. 
-Lastly is what I’ll refer to as the _Graph_ format. It describes the coordinates, or the shape, of the distribution. You can save these formats in JSON, for instance, like, `{xs: [1, 2, 3, 4…], ys: [.0001, .0003, .002, …]}`. +Lastly is what I’ll refer to as the _Graph_ format. It describes the coordinates, or the shape, of the distribution. You can save these formats in JSON, for instance, like, `{xs: [1, 2, 3, 4, …], ys: [.0001, .0003, .002, …]}`. Symbolic, Sample Set, and Graph formats all have very different advantages and disadvantages. @@ -19,7 +19,7 @@ Note that the name "Symbolic" is fairly standard, but I haven't found common nam ## Symbolic Formats -**TLDR** +**TL;DR** Mathematical representations. Require analytic solutions. These are often ideal where they can be applied, but apply to very few actual functions. Typically used sparsely, except for the starting distributions (before any computation is performed). **Examples** @@ -29,9 +29,6 @@ Mathematical representations. Require analytic solutions. These are often ideal **How to Do Computation** To perform calculations of symbolic systems, you need to find analytical solutions. For example, there are equations to find the pdf or cdf of most distribution shapes at any point. There are also lots of simplifications that could be done in particular situations. For example, there’s an analytical solution for combining normal distributions. -**Special: The Metalog Distribution** -The Metalog distribution seems like it can represent almost any reasonable distribution. It’s symbolic. This is great for storage, but it’s not clear if it helps with calculation. My impression is that we don’t have symbolic ways of doing most functions (addition, multiplication, etc) on metalog distributions. Also, note that it can take a fair bit of computation to fit a shape to the Metalog distribution. - **Advantages** - Maximally compressed; i.e. very easy to store. 
@@ -54,10 +51,14 @@ The Metalog distribution seems like it can represent almost any reasonable distr **How to Visualize** Convert to graph, then display that. (Optionally, you can also convert to samples, then display those using a histogram, but this is often worse you have both options.) +**Bonus: The Metalog Distribution** + +The Metalog distribution seems like it can represent almost any reasonable distribution. It’s symbolic. This is great for storage, but it’s not clear if it helps with calculation. My impression is that we don’t have symbolic ways of doing most functions (addition, multiplication, etc) on metalog distributions. Also, note that it can take a fair bit of computation to fit a shape to the Metalog distribution. + ## Graph Formats -**TLDR** -Lists of the x-y coordinates of the shape of a distribution. (Usually the pdf, which is more compressed than the cdf). Some key functions (like pdf, cdf) and manipulations can work on almost any graphally-described distribution. +**TL;DR** +Lists of the x-y coordinates of the shape of a distribution. (Usually the pdf, which is more compressed than the cdf). Some key functions (like pdf, cdf) and manipulations can work on almost any graphically-described distribution. **Alternative Names:** Grid, Mesh, Graph, Vector, Pdf, PdfCoords/PdfPoints, Discretised, Bezier, Curve @@ -77,7 +78,7 @@ Use graph techniques. These can be fairly computationally-intensive (particularl **Disadvantages** -- Most calculations are infeasible/impossible to perform graphally. In these cases, you need to use sampling. +- Most calculations are infeasible/impossible to perform graphically. In these cases, you need to use sampling. - Not as accurate or fast as symbolic methods, where the symbolic methods are applicable. - The tails get cut off, which is subideal. It’s assumed that the value of the pdf outside of the bounded range is exactly 0, which is not correct. 
(Note: If you have ideas on how to store graph formats that don’t cut off tails, let me know) @@ -108,7 +109,7 @@ Use graph techniques. These can be fairly computationally-intensive (particularl ## Sample Set Formats -**TLDR** +**TL;DR** Random samples. Use Monte Carlo simulation to perform calculations. This is the predominant technique using Monte Carlo methods; in these cases, most nodes are essentially represented as sample sets. [Guesstimate](https://www.getguesstimate.com/) works this way. **How to Do Computation** diff --git a/packages/website/docs/Features/Distributions.mdx b/packages/website/docs/Features/Distributions.mdx index 2ade1a3c..28e1db01 100644 --- a/packages/website/docs/Features/Distributions.mdx +++ b/packages/website/docs/Features/Distributions.mdx @@ -159,7 +159,9 @@ Creates a [normal distribution](https://en.wikipedia.org/wiki/Normal_distributio Creates a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) with the given mu and sigma. -`Mu` and `sigma` can be difficult to directly reason about. Because of this complexity, we recommend typically using the to syntax instead of estimating `mu` and `sigma` directly. +`Mu` and `sigma` represent the mean and standard deviation of the normal which results when +you take the log of our lognormal distribution. They can be difficult to directly reason about. +Because of this complexity, we recommend typically using the to syntax instead of estimating `mu` and `sigma` directly. diff --git a/packages/website/docs/Features/Functions.mdx b/packages/website/docs/Features/Functions.mdx index 38872db3..46bc4e39 100644 --- a/packages/website/docs/Features/Functions.mdx +++ b/packages/website/docs/Features/Functions.mdx @@ -11,7 +11,9 @@ Here are the ways we combine distributions. ### Addition -A horizontal right shift +A horizontal right shift. 
The addition operation represents the distribution of the sum of +the value of one random sample chosen from the first distribution and the value of one random sample +chosen from the second distribution. @@ -68,6 +80,8 @@ exp(dist)`} ### Taking logarithms +A projection over a stretched x-axis. + @@ -201,7 +218,7 @@ Or `PointSet` format Above, we saw the unary `toSampleSet`, which uses an internal hardcoded number of samples. If you'd like to provide the number of samples, it has a binary signature as well (floored) - + #### Validity @@ -241,7 +258,7 @@ You can cut off from the left You can cut off from the right - + You can cut off from both sides diff --git a/packages/website/docs/Features/Language.mdx b/packages/website/docs/Features/Language.mdx index 5b66d2e2..74c703ae 100644 --- a/packages/website/docs/Features/Language.mdx +++ b/packages/website/docs/Features/Language.mdx @@ -7,21 +7,21 @@ import { SquiggleEditor } from "../../src/components/SquiggleEditor"; ## Expressions -A distribution +### Distributions -A number +### Numbers - + -Arrays +### Arrays -Records +### Records = We can define functions ## See more diff --git a/packages/website/docs/Features/Node-Packages.md b/packages/website/docs/Features/Node-Packages.md index ab590c32..381cef1f 100644 --- a/packages/website/docs/Features/Node-Packages.md +++ b/packages/website/docs/Features/Node-Packages.md @@ -30,7 +30,7 @@ this library to help navigate the return type. The `@quri/squiggle-components` package offers several components and utilities for people who want to embed Squiggle components into websites. This documentation -relies on `@quri/squiggle-components` frequently. +uses `@quri/squiggle-components` frequently. We host [a storybook](https://squiggle-components.netlify.app/) with details and usage of each of the components made available.