feat: Added notes over pdfs

as well as cleaned up the doc a bit

note: I don't really like the term "invariants"
NunoSempere 2022-04-12 21:24:52 -04:00
parent f3a73a9147
commit 6a83554086
2 changed files with 81 additions and 17 deletions


---
title: Statistical properties of algebraic combinations of distributions for property testing.
urlcolor: blue
author:
- Nuño Sempere
- Quinn Dougherty
abstract: This document outlines some properties of algebraic combinations of distributions. It is meant to facilitate property tests for [Squiggle](https://squiggle-language.com/), an estimation language for forecasters. So far, we are focusing on the means, the standard deviations and the shapes of the pdfs.
---

The academic keyword to search for in relation to this document is "[algebra of random variables](https://wikiless.org/wiki/Algebra_of_random_variables?lang=en)". Squiggle doesn't yet support getting the standard deviation, denoted by $\sigma$, but such support could yet be added.
## Means and standard deviations

### Sums

$$ mean(f+g) = mean(f) + mean(g) $$
$$ \sigma(f+g) = \sqrt{\sigma(f)^2 + \sigma(g)^2} $$

The first identity always holds; the second assumes that $f$ and $g$ are independent. In the case of normal distributions,
$$ mean(normal(a,b) + normal(c,d)) = mean(normal(a+c, \sqrt{b^2 + d^2})) $$
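As a sanity check of the identities above, here is a minimal Monte Carlo sketch in plain JavaScript (not Squiggle code; `sampleNormal` is an ad-hoc Box–Muller helper introduced just for this example):

```js
// Empirically check the mean and σ of normal(1, 2) + normal(3, 4)
let sampleNormal = (mu, sigma) => {
  // Box–Muller transform; 1 - Math.random() avoids log(0)
  let u1 = 1 - Math.random()
  let u2 = Math.random()
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
}
let n = 1e6
let samples = Array.from({ length: n }, () => sampleNormal(1, 2) + sampleNormal(3, 4))
let mean = samples.reduce((acc, x) => acc + x, 0) / n
let variance = samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / n
console.log(mean)                // ≈ 1 + 3 = 4
console.log(Math.sqrt(variance)) // ≈ sqrt(2^2 + 4^2) ≈ 4.47
```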
### Subtractions

$$ mean(f-g) = mean(f) - mean(g) $$
$$ \sigma(f-g) = \sqrt{\sigma(f)^2 + \sigma(g)^2} $$
### Multiplications

$$ mean(f \cdot g) = mean(f) \cdot mean(g) $$
$$ \sigma(f \cdot g) = \sqrt{ (\sigma(f)^2 + mean(f)^2) \cdot (\sigma(g)^2 + mean(g)^2) - (mean(f) \cdot mean(g))^2} $$

Both identities again assume that $f$ and $g$ are independent.
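As a quick illustrative check (again not Squiggle code): two independent uniform(0,1) variables each have mean $1/2$ and variance $1/12$, so by the identities above their product should have mean $1/4$ and standard deviation $\sqrt{1/9 - 1/16} = \sqrt{7/144} \approx 0.22$:

```js
// Empirically check the mean and σ of the product of two uniform(0,1) variables
let n = 1e6
let products = Array.from({ length: n }, () => Math.random() * Math.random())
let mean = products.reduce((acc, x) => acc + x, 0) / n
let variance = products.reduce((acc, x) => acc + (x - mean) ** 2, 0) / n
console.log(mean)                // ≈ 0.5 * 0.5 = 0.25
console.log(Math.sqrt(variance)) // ≈ sqrt((1/12 + 1/4)^2 - 1/16) = sqrt(7/144) ≈ 0.22
```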
### Divisions

Divisions are tricky, and in general we don't have good expressions to characterize properties of ratios. In particular, the ratio of two centered normals follows a Cauchy distribution, which doesn't have a mean.
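One can see the problem empirically. In this illustrative sketch (mine, not from the Squiggle codebase), the running mean of normal(0,1)/normal(0,1) samples, i.e. of a standard Cauchy, keeps jumping around instead of converging:

```js
// Running mean of the ratio of two independent standard normals (a standard Cauchy)
let sampleStdNormal = () =>
  Math.sqrt(-2 * Math.log(1 - Math.random())) * Math.cos(2 * Math.PI * Math.random())
let sum = 0
for (let i = 1; i <= 1e6; i++) {
  sum += sampleStdNormal() / sampleStdNormal()
  if (i % 1e5 === 0) console.log(i, sum / i) // occasional huge ratios keep shifting the mean
}
```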
## Probability density functions (pdfs)

Specifying the pdf of the sum/multiplication/... of distributions as a function of the pdfs of the individual arguments can still be done, but it requires integration. My sense is that this is doable, and I (Nuño) provide some *pseudocode* for it below.

### Sums

Let $f, g$ be the pdfs of two independent random variables. Then the pdf of their sum, evaluated at a point $z$ and written $(f + g)(z)$ by a slight abuse of notation, is given by the convolution:

$$ (f + g)(z)= \int_{-\infty}^{\infty} f(x)\cdot g(z-x) \,dx $$

See a proof sketch [here](https://www.milefoot.com/math/stat/rv-sums.htm).

Here is some pseudocode to approximate this:
```js
// pdf1 and pdf2 are pdfs,
// and cdf1 and cdf2 are their corresponding cdfs

// Tolerance: stop widening the bounds once each tail
// holds less than this much probability mass
let epsilonForBounds = 2 ** (-16)

// Find an interval that contains all but ~epsilonForBounds
// of the probability mass, by repeatedly doubling the endpoints
let getBounds = cdf => {
  let cdf_min = -1
  let cdf_max = 1
  let n = 0
  while (
    (
      cdf(cdf_min) > epsilonForBounds ||
      (1 - cdf(cdf_max)) > epsilonForBounds
    ) &&
    n < 10
  ) {
    if (cdf(cdf_min) > epsilonForBounds) {
      cdf_min = cdf_min * 2
    }
    if ((1 - cdf(cdf_max)) > epsilonForBounds) {
      cdf_max = cdf_max * 2
    }
    n = n + 1 // bound the number of doublings
  }
  return [cdf_min, cdf_max]
}

// Step size of the Riemann sum below
let epsilonForIntegrals = 2 ** (-16)

// Approximate (f+g)(z) = ∫ f(x)·g(z-x) dx with a Riemann sum
// over the region that holds almost all of the probability mass
let pdfOfSum = (pdf1, pdf2, cdf1, cdf2, z) => {
  let bounds1 = getBounds(cdf1)
  let bounds2 = getBounds(cdf2)
  let bounds = [
    Math.min(bounds1[0], bounds2[0]),
    Math.max(bounds1[1], bounds2[1])
  ]
  let result = 0
  for (let x = bounds[0]; x < bounds[1]; x = x + epsilonForIntegrals) {
    let delta = pdf1(x) * pdf2(z - x)
    result = result + delta * epsilonForIntegrals
  }
  return result
}
```
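As a usage sketch of `pdfOfSum` (my example, under the assumption that exact pdfs and cdfs are available): the sum of two uniform(0,1) variables follows a triangular distribution on $[0, 2]$, whose pdf at $z = 1$ is exactly $1$, and the approximation should roughly recover this:

```js
// pdf of uniform(0,1) + uniform(0,1), evaluated at z = 1
let uniformPdf = x => (x >= 0 && x <= 1 ? 1 : 0)
let uniformCdf = x => (x < 0 ? 0 : x > 1 ? 1 : x)
console.log(pdfOfSum(uniformPdf, uniformPdf, uniformCdf, uniformCdf, 1)) // ≈ 1
```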
## Cumulative distribution functions (cdfs)

TODO

## Inverse cumulative distribution functions

TODO
# To do:
- Provide sources or derivations, useful as this document becomes more complicated
- Provide definitions for the probability density function, exponential, inverse, log, etc.
- Provide at least some tests for division
- See if playing around with characteristic functions turns out anything useful
