squiggle/packages/website/docs/Internal/Invariants.md

---
title: Invariants
urlcolor: blue
author:
  - Nuño Sempere
  - Quinn Dougherty
abstract: This document outlines some properties about algebraic combinations of distributions. It is meant to facilitate property tests for [Squiggle](https://squiggle-language.com/), an estimation language for forecasters. So far, we are focusing on the means, the standard deviation and the shape of the pdfs.
---

Invariants to check with property tests.

_This document right now is normative and aspirational, not a description of the testing that's currently done_.

# Algebraic combinations

The academic keyword to search for in relation to this document is "[algebra of random variables](https://wikiless.org/wiki/Algebra_of_random_variables?lang=en)". Squiggle doesn't yet support getting the standard deviation, denoted by $\sigma$, but such support could yet be added.

## Means and standard deviations

### Sums

$$
mean(f+g) = mean(f) + mean(g)
$$

$$
\sigma(f+g) = \sqrt{\sigma(f)^2 + \sigma(g)^2}
$$

In the case of normal distributions,

$$
mean(normal(a,b) + normal(c,d)) = mean(normal(a+c, \sqrt{b^2 + d^2}))
$$

### Subtractions

$$
mean(f-g) = mean(f) - mean(g)
$$

$$
\sigma(f-g) = \sqrt{\sigma(f)^2 + \sigma(g)^2}
$$

### Multiplications

$$
mean(f \cdot g) = mean(f) \cdot mean(g)
$$

$$
\sigma(f \cdot g) = \sqrt{ (\sigma(f)^2 + mean(f)) \cdot (\sigma(g)^2 + mean(g)) - (mean(f) \cdot mean(g))^2}
$$

### Divisions

Divisions are tricky, and in general we don't have good expressions to characterize properties of ratios. In particular, the ratio of two normals is a Cauchy distribution, which doesn't have to have a mean.

## Probability density functions (pdfs)

Specifying the pdf of the sum/multiplication/... of distributions as a function of the pdfs of the individual arguments can still be done. But it requires integration. My sense is that this is still doable, and I (Nuño) provide some _pseudocode_ to do this.

### Sums

Let $f, g$ be two independently distributed functions. Then, the pdf of their sum, evaluated at a point $z$, expressed as $(f + g)(z)$, is given by:

$$
(f + g)(z)= \int_{-\infty}^{\infty} f(x)\cdot g(z-x) \,dx
$$

See a proof sketch [here](https://www.milefoot.com/math/stat/rv-sums.htm)

Here is some pseudocode to approximate this:

```js
// pdf1 and pdf2 are pdfs,
// and cdf1 and cdf2 are their corresponding cdfs

let epsilonForBounds = 2 ** -16;
let getBounds = (cdf) => {
  let cdf_min = -1;
  let cdf_max = 1;
  let n = 0;
  while (
    (cdf(cdf_min) > epsilonForBounds || 1 - cdf(cdf_max) > epsilonForBounds) &&
    n < 10
  ) {
    if (cdf(cdf_min) > epsilonForBounds) {
      cdf_min = cdf_min * 2;
    }
    if (1 - cdf(cdf_max) > epsilonForBounds) {
      cdf_max = cdf_max * 2;
    }
  }
  return [cdf_min, cdf_max];
};

let epsilonForIntegrals = 2 ** -16;
let pdfOfSum = (pdf1, pdf2, cdf1, cdf2, z) => {
  let bounds1 = getBounds(cdf1);
  let bounds2 = getBounds(cdf2);
  let bounds = [
    Math.min(bounds1[0], bounds2[0]),
    Math.max(bounds1[1], bounds2[1]),
  ];

  let result = 0;
  for (let x = bounds[0]; (x = x + epsilonForIntegrals); x < bounds[1]) {
    let delta = pdf1(x) * pdf2(z - x);
    result = result + delta * epsilonForIntegrals;
  }
  return result;
};
```

## Cumulative density functions

TODO

## Inverse cumulative density functions

TODO

# `pdf`, `cdf`, and `inv`

With $\forall dist, pdf := x \mapsto \texttt{pdf}(dist, x) \land cdf := x \mapsto \texttt{cdf}(dist, x) \land inv := p \mapsto \texttt{inv}(dist, p)$,

## `cdf` and `inv` are inverses

$$
\forall x \in (0,1), cdf(inv(x)) = x \land \forall x \in \texttt{dom}(cdf), x = inv(cdf(x))
$$

## The codomain of `cdf` equals the open interval `(0,1)` equals the codomain of `pdf`

$$
\texttt{cod}(cdf) = (0,1) = \texttt{cod}(pdf)
$$

# To do:

- Provide sources or derivations, useful as this document becomes more complicated
- Provide definitions for the probability density function, exponential, inverse, log, etc.
- Provide at least some tests for division
- See if playing around with characteristic functions turns out anything useful
migrated `invariants.md` to `docusaurus` 2022-04-13 18:46:01 +00:00			`---`
removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00			`title: Invariants`
migrated `invariants.md` to `docusaurus` 2022-04-13 18:46:01 +00:00			`urlcolor: blue`
			`author:`
			`- Nuño Sempere`
			`- Quinn Dougherty`
			`abstract: This document outlines some properties about algebraic combinations of distributions. It is meant to facilitate property tests for [Squiggle](https://squiggle-language.com/), an estimation language for forecasters. So far, we are focusing on the means, the standard deviation and the shape of the pdfs.`
			`---`

functions reference is almost done 2022-04-20 16:09:57 +00:00			`Invariants to check with property tests.`
removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00
cleaned up README and ran `yarn format` 2022-04-13 18:57:01 +00:00			`_This document right now is normative and aspirational, not a description of the testing that's currently done_.`

removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00			`# Algebraic combinations`

migrated `invariants.md` to `docusaurus` 2022-04-13 18:46:01 +00:00			`The academic keyword to search for in relation to this document is "[algebra of random variables](https://wikiless.org/wiki/Algebra_of_random_variables?lang=en)". Squiggle doesn't yet support getting the standard deviation, denoted by $\sigma$, but such support could yet be added.`

			`## Means and standard deviations`

			`### Sums`

			`$$`
			`mean(f+g) = mean(f) + mean(g)`
			`$$`

cleaned up README and ran `yarn format` 2022-04-13 18:57:01 +00:00			`$$`
			`\sigma(f+g) = \sqrt{\sigma(f)^2 + \sigma(g)^2}`
migrated `invariants.md` to `docusaurus` 2022-04-13 18:46:01 +00:00			`$$`

			`In the case of normal distributions,`

			`$$`
			`mean(normal(a,b) + normal(c,d)) = mean(normal(a+c, \sqrt{b^2 + d^2}))`
			`$$`

			`### Subtractions`

cleaned up README and ran `yarn format` 2022-04-13 18:57:01 +00:00			`$$`
			`mean(f-g) = mean(f) - mean(g)`
migrated `invariants.md` to `docusaurus` 2022-04-13 18:46:01 +00:00			`$$`

			`$$`
			`\sigma(f-g) = \sqrt{\sigma(f)^2 + \sigma(g)^2}`
			`$$`

			`### Multiplications`

			`$$`
			`mean(f \cdot g) = mean(f) \cdot mean(g)`
			`$$`

			`$$`
			`\sigma(f \cdot g) = \sqrt{ (\sigma(f)^2 + mean(f)) \cdot (\sigma(g)^2 + mean(g)) - (mean(f) \cdot mean(g))^2}`
			`$$`

			`### Divisions`

			`Divisions are tricky, and in general we don't have good expressions to characterize properties of ratios. In particular, the ratio of two normals is a Cauchy distribution, which doesn't have to have a mean.`

			`## Probability density functions (pdfs)`

			`Specifying the pdf of the sum/multiplication/... of distributions as a function of the pdfs of the individual arguments can still be done. But it requires integration. My sense is that this is still doable, and I (Nuño) provide some _pseudocode_ to do this.`

			`### Sums`

			`Let $f, g$ be two independently distributed functions. Then, the pdf of their sum, evaluated at a point $z$, expressed as $(f + g)(z)$, is given by:`

			`$$`
			`(f + g)(z)= \int_{-\infty}^{\infty} f(x)\cdot g(z-x) \,dx`
			`$$`

			`See a proof sketch [here](https://www.milefoot.com/math/stat/rv-sums.htm)`

			`Here is some pseudocode to approximate this:`

			```js
			`// pdf1 and pdf2 are pdfs,`
			`// and cdf1 and cdf2 are their corresponding cdfs`

			`let epsilonForBounds = 2 ** -16;`
			`let getBounds = (cdf) => {`
			`let cdf_min = -1;`
			`let cdf_max = 1;`
			`let n = 0;`
			`while (`
			`(cdf(cdf_min) > epsilonForBounds \|\| 1 - cdf(cdf_max) > epsilonForBounds) &&`
			`n < 10`
			`) {`
			`if (cdf(cdf_min) > epsilonForBounds) {`
			`cdf_min = cdf_min * 2;`
			`}`
			`if (1 - cdf(cdf_max) > epsilonForBounds) {`
			`cdf_max = cdf_max * 2;`
			`}`
			`}`
			`return [cdf_min, cdf_max];`
			`};`

			`let epsilonForIntegrals = 2 ** -16;`
			`let pdfOfSum = (pdf1, pdf2, cdf1, cdf2, z) => {`
			`let bounds1 = getBounds(cdf1);`
			`let bounds2 = getBounds(cdf2);`
			`let bounds = [`
			`Math.min(bounds1[0], bounds2[0]),`
			`Math.max(bounds1[1], bounds2[1]),`
			`];`

			`let result = 0;`
			`for (let x = bounds[0]; (x = x + epsilonForIntegrals); x < bounds[1]) {`
			`let delta = pdf1(x) * pdf2(z - x);`
			`result = result + delta * epsilonForIntegrals;`
			`}`
			`return result;`
			`};`
			```

			`## Cumulative density functions`

			`TODO`

			`## Inverse cumulative density functions`

			`TODO`

functions reference is almost done 2022-04-20 16:09:57 +00:00			# `pdf`, `cdf`, and `inv`
removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00
functions reference is almost done 2022-04-20 16:09:57 +00:00			`With $\forall dist, pdf := x \mapsto \texttt{pdf}(dist, x) \land cdf := x \mapsto \texttt{cdf}(dist, x) \land inv := p \mapsto \texttt{inv}(dist, p)$,`
removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00
			## `cdf` and `inv` are inverses
functions reference is almost done 2022-04-20 16:09:57 +00:00
removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00			`$$`
			`\forall x \in (0,1), cdf(inv(x)) = x \land \forall x \in \texttt{dom}(cdf), x = inv(cdf(x))`
			`$$`

			## The codomain of `cdf` equals the open interval `(0,1)` equals the codomain of `pdf`
functions reference is almost done 2022-04-20 16:09:57 +00:00
removed pull request trigger from codeql analysis 2022-04-20 15:55:56 +00:00			`$$`
			`\texttt{cod}(cdf) = (0,1) = \texttt{cod}(pdf)`
			`$$`

migrated `invariants.md` to `docusaurus` 2022-04-13 18:46:01 +00:00			`# To do:`

			`- Provide sources or derivations, useful as this document becomes more complicated`
			`- Provide definitions for the probability density function, exponential, inverse, log, etc.`
			`- Provide at least some tests for division`
			`- See if playing around with characteristic functions turns out anything useful`