feat: Added notes over pdfs

as well as cleaned up the doc a bit note: I don't really like the term "invariants"
2022-04-12 21:24:52 -04:00 · 2022-04-12 21:24:52 -04:00 · 6a83554086
commit 6a83554086
parent f3a73a9147
2 changed files with 81 additions and 17 deletions
--- a/packages/squiggle-lang/tests/docs/invariants.md
+++ b/packages/squiggle-lang/tests/docs/invariants.md
@ -1,15 +1,21 @@
-# Squiggle invariants
+---
+title: Statistical properties of algebraic combinations of distributions for property testing.
+urlcolor: blue
+author: 
+- Nuño Sempere
+- Quinn Dougherty
+abstract: This document outlines some properties of algebraic combinations of distributions. It is meant to facilitate property tests for [Squiggle](https://squiggle-language.com/), an estimation language for forecasters. So far, we focus on the means, the standard deviations and the shapes of the pdfs.

-Here are some property tests for squiggle. I am testing mostly for the mean and the standard deviation. I know that squiggle doesn't yet have functions for the standard deviation, but they could be added.
+---

-The keywords to search for are "[algebra of random variables](https://wikiless.org/wiki/Algebra_of_random_variables?lang=en)". 
+The academic keyword to search for in relation to this document is "[algebra of random variables](https://wikiless.org/wiki/Algebra_of_random_variables?lang=en)". Squiggle doesn't yet support getting the standard deviation, denoted by $\sigma$, but such support could be added.

 ## Means and standard deviations
 ### Sums

 $$ mean(f+g) = mean(f) + mean(g) $$

-$$ std(f+g) = \sqrt{std(f)^2 + std(g)^2} $$
+$$ \sigma(f+g) = \sqrt{\sigma(f)^2 + \sigma(g)^2} $$
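+
+This rule holds when $f$ and $g$ are independent (more generally, uncorrelated). As a sketch of why, writing $cov$ for the covariance (notation not used elsewhere in this document):
+
+$$ \sigma(f+g)^2 = \sigma(f)^2 + \sigma(g)^2 + 2 \cdot cov(f,g) = \sigma(f)^2 + \sigma(g)^2 \quad \text{when } cov(f,g) = 0 $$
+
+For instance, $\sigma(f) = 3$ and $\sigma(g) = 4$ give $\sigma(f+g) = \sqrt{9 + 16} = 5$.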

 In the case of normal distributions,

@ -19,28 +25,77 @@ $$ mean(normal(a,b) + normal(c,d)) = mean(normal(a+c, \sqrt{b^2 + d^2})) $$

 $$ mean(f-g) = mean(f) - mean(g) $$

-$$ std(f-g) = \sqrt{std(f)^2 + std(g)^2} $$
+$$ \sigma(f-g) = \sqrt{\sigma(f)^2 + \sigma(g)^2} $$
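+
+The sign under the square root stays positive because scaling a distribution by $-1$ leaves its spread unchanged. As a sketch, assuming $f$ and $g$ are independent:
+
+$$ \sigma(f-g)^2 = \sigma(f)^2 + \sigma(-g)^2 = \sigma(f)^2 + (-1)^2 \cdot \sigma(g)^2 = \sigma(f)^2 + \sigma(g)^2 $$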

 ### Multiplications

 $$ mean(f \cdot g) =  mean(f) \cdot mean(g) $$

-$$ std(f \cdot g) = \sqrt{ (std(f)^2 + mean(f)) \cdot (std(g)^2 + mean(g)) - (mean(f) \cdot mean(g))^2} $$
+$$ \sigma(f \cdot g) = \sqrt{ (\sigma(f)^2 + mean(f)^2) \cdot (\sigma(g)^2 + mean(g)^2) - (mean(f) \cdot mean(g))^2} $$
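+
+As a sketch of where the standard deviation of a product comes from, assume $f$ and $g$ are independent and write $E$ for the expectation, so that $E[f] = mean(f)$ and $E[f^2] = \sigma(f)^2 + mean(f)^2$. The $E$ notation is introduced here only for this derivation:
+
+$$ \sigma(f \cdot g)^2 = E[f^2] \cdot E[g^2] - (E[f] \cdot E[g])^2 = (\sigma(f)^2 + mean(f)^2) \cdot (\sigma(g)^2 + mean(g)^2) - (mean(f) \cdot mean(g))^2 $$
+
+For instance, with $mean(f) = 2$, $\sigma(f) = 1$, $mean(g) = 3$ and $\sigma(g) = 2$, this gives $\sigma(f \cdot g) = \sqrt{5 \cdot 13 - 36} = \sqrt{29} \approx 5.39$.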

 ### Divisions

 Divisions are tricky, and in general we don't have good expressions to characterize properties of ratios. In particular, the ratio of two independent normals with mean zero follows a Cauchy distribution, whose mean is undefined.
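+
+To illustrate the missing mean, take the standard Cauchy distribution, which arises as the ratio of two independent normals with mean zero and equal standard deviation. Its pdf and the integral that a mean would require are:
+
+$$ p(x) = \frac{1}{\pi (1 + x^2)}, \qquad \int_{-\infty}^{\infty} \frac{|x|}{\pi (1 + x^2)} \,dx = \infty $$
+
+Because this integral diverges, the mean is undefined, so a property test on $mean(f/g)$ has nothing to compare against in this case.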

-# To do:
+## Probability density functions (pdfs)

- Provide sources or derivations, useful as this document becomes more complicated
- Provide definitions for the probability density function, exponential, inverse, log, etc.
- Provide at least some tests for division
- See if playing around with characteristic functions turns out anything useful
+The pdf of a sum, product, and so on of distributions can still be expressed in terms of the pdfs of the individual arguments, but doing so requires integration. My sense is that this is still doable, and I (Nuño) provide some *pseudocode* for it.

-## Probability density functions
+### Sums

-TODO
+Let $f, g$ be the pdfs of two independent random variables. Then the pdf of their sum, evaluated at a point $z$ and written $(f + g)(z)$, is given by:
+
+$$ (f + g)(z)= \int_{-\infty}^{\infty} f(x)\cdot g(z-x) \,dx  $$
+
+See a proof sketch [here](https://www.milefoot.com/math/stat/rv-sums.htm).
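+
+As a small worked example (chosen here for illustration), let $f$ and $g$ both be the pdf of a uniform distribution on $[0, 1]$. The integrand $f(x) \cdot g(z-x)$ equals $1$ when $0 \le x \le 1$ and $z - 1 \le x \le z$, and $0$ otherwise, so the integral is the length of that overlap:
+
+$$ (f + g)(z) = \begin{cases} z & 0 \le z \le 1 \\ 2 - z & 1 < z \le 2 \\ 0 & \text{otherwise} \end{cases} $$
+
+This triangular shape, peaking at $z = 1$, is a convenient closed form to compare a numerical approximation against.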
+
+Here is some pseudocode to approximate this:
+
+```js
+
+// pdf1 and pdf2 are pdfs,
+// and cdf1 and cdf2 are their corresponding cdfs
+
+let epsilonForBounds = 2**(-16)
+// Widen [cdf_min, cdf_max] by doubling until each tail holds less than
+// epsilonForBounds of probability mass, or until 10 doublings have been tried.
+let getBounds = cdf => {
+  let cdf_min = -1
+  let cdf_max = 1
+  let n = 0
+  while(
+    (
+      cdf(cdf_min) > epsilonForBounds ||
+      ( 1 - cdf(cdf_max) ) > epsilonForBounds
+    ) &&
+    n < 10
+  ){
+    if(cdf(cdf_min) > epsilonForBounds){
+      cdf_min = cdf_min * 2
+    }
+    if((1-cdf(cdf_max)) > epsilonForBounds){
+      cdf_max = cdf_max * 2
+    }
+    n = n + 1 // without this increment the cap on iterations never triggers
+  }
+  return [cdf_min, cdf_max]
+}
+
+// Step size of the Riemann sum: smaller is more accurate but slower.
+let epsilonForIntegrals = 2**(-16)
+let pdfOfSum = (pdf1, pdf2, cdf1, cdf2, z) => {
+  let bounds1 = getBounds(cdf1)
+  let bounds2 = getBounds(cdf2)
+  // Integrate over the union of both ranges, which covers the range of pdf1.
+  let bounds = [
+    Math.min(bounds1[0], bounds2[0]),
+    Math.max(bounds1[1], bounds2[1])
+  ]
+
+  // Left Riemann sum approximating the convolution integral at z.
+  let result = 0
+  for(let x = bounds[0]; x < bounds[1]; x = x + epsilonForIntegrals){
+      let delta = pdf1(x) * pdf2(z-x)
+      result = result + delta * epsilonForIntegrals
+  }
+  return result
+}
+
+```

 ## Cumulative distribution functions

@ -49,3 +104,12 @@ TODO
 ## Inverse cumulative distribution functions

 TODO
+
+
+# To do:
+
+- Provide sources or derivations, useful as this document becomes more complicated
+- Provide definitions for the probability density function, exponential, inverse, log, etc.
+- Provide at least some tests for division
+- See if playing around with characteristic functions turns up anything useful
+
--- a/packages/squiggle-lang/tests/docs/invariants.pdf
+++ b/packages/squiggle-lang/tests/docs/invariants.pdf