Compare commits

...

7 Commits

Author SHA1 Message Date
6a83554086 feat: Added notes over pdfs
as well as cleaned up the doc a bit

note: I don't really like the term "invariants"
2022-04-12 21:24:52 -04:00
f3a73a9147 merge: invariants docs 2022-04-12 20:32:24 -04:00
8828977db7 fix: typo in algebraic properties doc 2022-04-12 20:30:40 -04:00
Quinn Dougherty
ceea4d1a5d fixed up default.nix with correct pandoc command 2022-04-12 18:27:03 -04:00
Quinn Dougherty
7e61d8890e removed commented out pkgs 2022-04-12 17:58:51 -04:00
bfae8b0e6d feat: initial properties to test squiggle validity 2022-04-12 17:50:53 -04:00
Quinn Dougherty
ecfd8deece scaffolding in place for properties.pdf 2022-04-12 17:22:11 -04:00
7 changed files with 159 additions and 0 deletions

View File

@ -19,3 +19,4 @@ yarn-error.log
dist
*.coverage
_coverage
result

View File

@ -0,0 +1,16 @@
# Properties
Using `nix`. Where `o` is `open` on OSX and `xdg-open` on linux,
```sh
nix-build
o result/property-tests.pdf
```
Without `nix`, you can install `pandoc` and `pdflatex` yourself and see `make.sh` for the rendering command.
## _Details_
The `invariants.pdf` document is _normative and aspirational_. It does not document tests as they exist in the codebase, but represents how we think squiggle ought to be tested.
We are partially bottlenecked by the rescript ecosystem's maturity with respect to property-based testing.

View File

@ -0,0 +1,26 @@
{ chan ? "a7ecde854aee5c4c7cd6177f54a99d2c1ff28a31" # 21.11
, pkgs ? import (builtins.fetchTarball { url = "https://github.com/NixOS/nixpkgs/archive/${chan}.tar.gz"; }) {}
}:
# Style sheets https://github.com/citation-style-language/styles/
with pkgs;
let deps = [
pandoc
(texlive.combine
{ inherit (texlive) scheme-small datetime; }
)
]; in
stdenv.mkDerivation {
name = "render_squiggle_properties";
src = ./.;
buildInputs = deps;
buildPhase = ''
echo rendering...
pandoc -s invariants.md -o invariants.pdf
echo rendered.
'';
installPhase = ''
mkdir -p $out
cp invariants.pdf $out/invariants.pdf
'';
}

View File

@ -0,0 +1,115 @@
---
title: Statistical properties of algebraic combinations of distributions for property testing.
urlcolor: blue
author:
- Nuño Sempere
- Quinn Dougherty
abstract: This document outlines some properties about algebraic combinations of distributions. It is meant to facilitate property tests for [Squiggle](https://squiggle-language.com/), an estimation language for forecasters. So far, we are focusing on the means, the standard deviation and the shape of the pdfs.
---
The academic keyword to search for in relation to this document is "[algebra of random variables](https://wikiless.org/wiki/Algebra_of_random_variables?lang=en)". Squiggle doesn't yet support getting the standard deviation, denoted by $\sigma$, but such support could yet be added.
## Means and standard deviations
### Sums
$$ mean(f+g) = mean(f) + mean(g) $$
$$ \sigma(f+g) = \sqrt{\sigma(f)^2 + \sigma(g)^2} $$
In the case of normal distributions,
$$ mean(normal(a,b) + normal(c,d)) = mean(normal(a+c, \sqrt{b^2 + d^2})) $$
### Subtractions
$$ mean(f-g) = mean(f) - mean(g) $$
$$ \sigma(f-g) = \sqrt{\sigma(f)^2 + \sigma(g)^2} $$
### Multiplications
$$ mean(f \cdot g) = mean(f) \cdot mean(g) $$
$$ \sigma(f \cdot g) = \sqrt{ (\sigma(f)^2 + mean(f)) \cdot (\sigma(g)^2 + mean(g)) - (mean(f) \cdot mean(g))^2} $$
### Divisions
Divisions are tricky, and in general we don't have good expressions to characterize properties of ratios. In particular, the ratio of two normals is a Cauchy distribution, which doesn't have to have a mean.
## Probability density functions (pdfs)
Specifying the pdf of the sum/multiplication/... of distributions as a function of the pdfs of the individual arguments can still be done. But it requires integration. My sense is that this is still doable, and I (Nuño) provide some *pseudocode* to do this.
### Sums
Let $f, g$ be two independently distributed functions. Then, the pdf of their sum, evaluated at a point $z$, expressed as $(f + g)(z)$, is given by:
$$ (f + g)(z)= \int_{-\infty}^{\infty} f(x)\cdot g(z-x) \,dx $$
See a proof sketch [here](https://www.milefoot.com/math/stat/rv-sums.htm)
Here is some pseudocode to approximate this:
```js
// pdf1 and pdf2 are pdfs,
// and cdf1 and cdf2 are their corresponding cdfs
let epsilonForBounds = 2**(-16)
let getBounds = cdf => {
let cdf_min = -1
let cdf_max = 1
let n=0
while(
(
cdf(cdf_min) > epsilonForBounds ||
( 1 - cdf(cdf_max) ) > epsilonForBounds
) &&
n < 10
){
if(cdf(cdf_min) > epsilonForBounds){
cdf_min = cdf_min * 2
}
if((1-cdf(cdf_max)) > epsilonForBounds){
cdf_max = cdf_max * 2
}
}
return [cdf_min, cdf_max]
}
let epsilonForIntegrals = 2**(-16)
let pdfOfSum = (pdf1, pdf2, cdf1, cdf2, z) => {
let bounds1 = getBounds(cdf1)
let bounds2 = getBounds(cdf2)
let bounds = [
Math.min(bounds1[0], bounds2[0]),
Math.max(bounds1[1], bounds2[1])
]
let result = 0
for(let x = bounds[0]; x=x+epsilonForIntegrals; x<bounds[1]){
let delta = pdf1(x) * pdf2(z-x)
result = result + delta * epsilonForIntegrals
}
return result
}
```
## Cumulative density functions
TODO
## Inverse cumulative density functions
TODO
# To do:
- Provide sources or derivations, useful as this document becomes more complicated
- Provide definitions for the probability density function, exponential, inverse, log, etc.
- Provide at least some tests for division
- See if playing around with characteristic functions turns out anything useful

Binary file not shown.

View File

@ -0,0 +1 @@
pandoc -s invariants.md -o invariants.pdf