pontificate about tests with wide lognormals

add example of getting confidence interval & misc changes
formatting pass.
2023-07-23 21:10:56 +02:00 · 2023-07-23 19:12:02 +02:00 · 2023-07-23 16:30:42 +02:00 · 2023-07-23 16:28:44 +02:00 · 2023-07-23 15:44:22 +02:00 · 2023-07-23 15:43:35 +02:00
22 changed files with 668 additions and 247 deletions
--- a/README.md
+++ b/README.md
@ -11,7 +11,8 @@ A self-contained C99 library that provides a subset of [Squiggle](https://www.sq
 - Because it can fit in my head
 - Because if you can implement something in C, you can implement it anywhere else
 - Because it can be made faster if need be
-  - e.g., with a multi-threading library like OpenMP, or by adding more algorithmic complexity
+  - e.g., with a multi-threading library like OpenMP, 
+  - or by implementing faster but more complex algorithms
  - or more simply, by inlining the sampling functions (adding an `inline` directive before their function declaration)
 - **Because there are few abstractions between it and machine code** (C => assembly => machine code with gcc, or C => machine code, with tcc), leading to fewer errors beyond the programmer's control.

@ -68,7 +69,7 @@ This library provides two approaches:
 ```C
 struct box {
    int empty;
-    float content;
+    double content;
    char* error_msg;
 };
 ```
@ -131,9 +132,9 @@ int main(){
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with a seed of 0

-    float a = sample_to(1, 10, seed);
-    float b = 2 * a;
-    float c = b / a;
+    double a = sample_to(1, 10, seed);
+    double b = 2 * a;
+    double c = b / a;

    printf("a: %f, b: %f, c: %f\n", a, b, c);
    // a: 0.607162, b: 1.214325, c: 0.500000
@ -153,7 +154,7 @@ vs
 #include <stdlib.h>
 #include <stdio.h>

-float draw_xyz(uint64_t* seed){
+double draw_xyz(uint64_t* seed){
    // function could also be placed inside main with gcc nested functions extension.
    return sample_to(1, 20, seed);
 }
@ -164,9 +165,9 @@ int main(){
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with a seed of 0

-    float a = draw_xyz(seed);
-    float b = 2 * draw_xyz(seed);
-    float c = b / a;
+    double a = draw_xyz(seed);
+    double b = 2 * draw_xyz(seed);
+    double c = b / a;

    printf("a: %f, b: %f, c: %f\n", a, b, c);
    // a: 0.522484, b: 10.283501, c: 19.681936
@ -175,6 +176,66 @@ int main(){
 }
 ```

+### Tests and the long tail of the lognormal
+
+Distribution functions can be tested with:
+
+```sh
+cd tests
+make && make run
+```
+
+`make verify` is an alias that runs all the tests and just displays the ones that are failing. 
+
+These tests are somewhat rudimentary: they get between 1M and 10M samples from a given sampling function, and check that their mean and standard deviation correspond to what they should theoretically be.
+
+If you run `make run` (or `make verify`), you will see errors such as these:
+
+```
+[-] Mean test for normal(47211.047473, 682197.019012) NOT passed.
+Mean of normal(47211.047473, 682197.019012): 46933.673278, vs expected mean: 47211.047473
+delta: -277.374195, relative delta: -0.005910
+
+[-] Std test for lognormal(4.584666, 2.180816) NOT passed.
+Std of lognormal(4.584666, 2.180816): 11443.588861, vs expected std: 11342.434900
+delta: 101.153961, relative delta: 0.008839
+
+[-] Std test for to(13839.861856, 897828.354318) NOT passed.
+Std of to(13839.861856, 897828.354318): 495123.630575, vs expected std: 498075.002499
+delta: -2951.371925, relative delta: -0.005961
+```
+
+These tests I wouldn't worry about. Due to luck of the draw, their relative error is a bit over 0.005, or 0.5%, and so the test fails. But it would surprise me if that had some meaningful practical implication.
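
The pass/fail rule implied by the output above can be sketched roughly as follows. The names `relative_delta` and `passes_relative_test` are hypothetical and not taken from the test suite. Dividing the delta by the observed value (rather than the expected one) is an assumption inferred from the printed numbers, and the sketch assumes the observed value is non-zero.

```c
#include <math.h>

// Hypothetical helpers for illustration; these names are not from the test suite.

// Relative delta as the output above appears to compute it:
// (observed - expected) / observed.
// Assumption: inferred from the printed numbers, not from the test source.
// Assumes observed != 0.
double relative_delta(double observed, double expected)
{
    return (observed - expected) / observed;
}

// Returns 1 if the observed statistic is within max_relative_delta
// of the expected one, and 0 otherwise.
int passes_relative_test(double observed, double expected, double max_relative_delta)
{
    return fabs(relative_delta(observed, expected)) <= max_relative_delta;
}
```

For the first failure above, `relative_delta(46933.673278, 47211.047473)` comes out at about -0.0059, just past a 0.005 threshold, so `passes_relative_test(46933.673278, 47211.047473, 0.005)` returns 0.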
+
+The errors that should raise some worry are:
+
+```
+[-] Mean test for lognormal(1.210013, 4.766882) NOT passed.
+Mean of lognormal(1.210013, 4.766882): 342337.257677, vs expected mean: 288253.061628
+delta: 54084.196049, relative delta: 0.157985
+[-] Std test for lognormal(1.210013, 4.766882) NOT passed.
+Std of lognormal(1.210013, 4.766882): 208107782.972184, vs expected std: 24776840217.604111
+delta: -24568732434.631927, relative delta: -118.057730
+
+[-] Mean test for lognormal(-0.195240, 4.883106) NOT passed.
+Mean of lognormal(-0.195240, 4.883106): 87151.733198, vs expected mean: 123886.818303
+delta: -36735.085104, relative delta: -0.421507
+[-] Std test for lognormal(-0.195240, 4.883106) NOT passed.
+Std of lognormal(-0.195240, 4.883106): 33837426.331671, vs expected std: 18657000192.914921
+delta: -18623162766.583248, relative delta: -550.371727
+
+[-] Mean test for lognormal(0.644931, 4.795860) NOT passed.
+Mean of lognormal(0.644931, 4.795860): 125053.904456, vs expected mean: 188163.894101
+delta: -63109.989645, relative delta: -0.504662
+[-] Std test for lognormal(0.644931, 4.795860) NOT passed.
+Std of lognormal(0.644931, 4.795860): 39976300.711166, vs expected std: 18577298706.170452
+delta: -18537322405.459286, relative delta: -463.707799
+```
+
+What is happening in this case is that you are taking a normal, like `normal(-0.195240, 4.883106)`, and exponentiating it to arrive at a lognormal. But `normal(-0.195240, 4.883106)` is going to have some non-negligible weight on, say, 18. And `exp(18) ≈ 65659969`, so points like it end up contributing a nontrivial amount to the analytical mean and standard deviation, even though they have little probability mass.
+
+Fortunately, the reader can also check that for more plausible real-world values, like the 
+
 ## Related projects

 - [Squiggle](https://www.squiggle-language.com/)
@ -184,16 +245,10 @@ int main(){

 ## To do list

- [ ] Test summary statistics for each of the distributions.
 - [ ] Have some more complicated & realistic example
 - [ ] Add summarization functions: 90% ci (or all c.i.?) 
 - [ ] Systematize references
 - [ ] Publish online
- [ ] Add efficient sampling from a beta distribution
-  - https://dl.acm.org/doi/10.1145/358407.358414
-  - https://link.springer.com/article/10.1007/bf02293108
-  - https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution
-  - https://github.com/numpy/numpy/blob/5cae51e794d69dd553104099305e9f92db237c53/numpy/random/src/distributions/distributions.c
 - [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>
 - [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>, and do so efficiently

@ -224,3 +279,16 @@ int main(){
  - https://dl.acm.org/doi/pdf/10.1145/358407.358414
 - [x] Explain correlated samples
 - [-] ~~Add tests in Stan?~~
+- [x] Test summary statistics for each of the distributions.
+  - [x] For uniform
+  - [x] For normal
+  - [x] For lognormal
+  - [x] For lognormal (to syntax)
+  - [x] For beta distribution
+- [x] Clarify gamma/standard gamma
+- [x] Add efficient sampling from a beta distribution
+  - https://dl.acm.org/doi/10.1145/358407.358414
+  - https://link.springer.com/article/10.1007/bf02293108
+  - https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution
+  - https://github.com/numpy/numpy/blob/5cae51e794d69dd553104099305e9f92db237c53/numpy/random/src/distributions/distributions.c
+- [x] Pontificate about lognormal tests
--- a/examples/01_one_sample/example
+++ b/examples/01_one_sample/example
--- a/examples/01_one_sample/example.c
+++ b/examples/01_one_sample/example.c
@ -4,22 +4,22 @@
 #include <stdio.h>

 // Estimate functions
-float sample_0(uint64_t* seed)
+double sample_0(uint64_t* seed)
 {
    return 0;
 }

-float sample_1(uint64_t* seed)
+double sample_1(uint64_t* seed)
 {
    return 1;
 }

-float sample_few(uint64_t* seed)
+double sample_few(uint64_t* seed)
 {
    return sample_to(1, 3, seed);
 }

-float sample_many(uint64_t* seed)
+double sample_many(uint64_t* seed)
 {
    return sample_to(2, 10, seed);
 }
@ -29,15 +29,15 @@ int main(){
 		uint64_t* seed = malloc(sizeof(uint64_t));
 		*seed = 1000; // xorshift can't start with 0

-    float p_a = 0.8;
-    float p_b = 0.5;
-    float p_c = p_a * p_b;
+    double p_a = 0.8;
+    double p_b = 0.5;
+    double p_c = p_a * p_b;

    int n_dists = 4;
-    float weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
-    float (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
+    double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
+    double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };

-    float result_one = sample_mixture(samplers, weights, n_dists, seed);
+    double result_one = sample_mixture(samplers, weights, n_dists, seed);
 		printf("result_one: %f\n", result_one);
 		free(seed);
 }
--- a/examples/02_many_samples/example
+++ b/examples/02_many_samples/example
--- a/examples/02_many_samples/example.c
+++ b/examples/02_many_samples/example.c
@ -4,22 +4,22 @@
 #include "../../squiggle.h"

 // Estimate functions
-float sample_0(uint64_t* seed)
+double sample_0(uint64_t* seed)
 {
    return 0;
 }

-float sample_1(uint64_t* seed)
+double sample_1(uint64_t* seed)
 {
    return 1;
 }

-float sample_few(uint64_t* seed)
+double sample_few(uint64_t* seed)
 {
    return sample_to(1, 3, seed);
 }

-float sample_many(uint64_t* seed)
+double sample_many(uint64_t* seed)
 {
    return sample_to(2, 10, seed);
 }
@ -29,16 +29,16 @@ int main(){
 		uint64_t* seed = malloc(sizeof(uint64_t));
 		*seed = 1000; // xorshift can't start with 0

-    float p_a = 0.8;
-    float p_b = 0.5;
-    float p_c = p_a * p_b;
+    double p_a = 0.8;
+    double p_b = 0.5;
+    double p_c = p_a * p_b;

    int n_dists = 4;
-    float weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
-    float (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
+    double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
+    double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };

 		int n_samples = 1000000;
-		float* result_many = (float *) malloc(n_samples * sizeof(float));
+		double* result_many = (double *) malloc(n_samples * sizeof(double));
 		for(int i=0; i<n_samples; i++){
      result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
 		}
--- a/examples/03_gcc_nested_function/example
+++ b/examples/03_gcc_nested_function/example
--- a/examples/03_gcc_nested_function/example.c
+++ b/examples/03_gcc_nested_function/example.c
@ -8,22 +8,22 @@ int main(){
 		uint64_t* seed = malloc(sizeof(uint64_t));
 		*seed = 1000; // xorshift can't start with 0

-    float p_a = 0.8;
-    float p_b = 0.5;
-    float p_c = p_a * p_b;
+    double p_a = 0.8;
+    double p_b = 0.5;
+    double p_c = p_a * p_b;

    int n_dists = 4;
 		
-		float sample_0(uint64_t* seed){ return 0; }
-		float sample_1(uint64_t* seed) { return 1; }
-		float sample_few(uint64_t* seed){ return sample_to(1, 3, seed); }
-		float sample_many(uint64_t* seed){ return sample_to(2, 10, seed); } 
+		double sample_0(uint64_t* seed){ return 0; }
+		double sample_1(uint64_t* seed) { return 1; }
+		double sample_few(uint64_t* seed){ return sample_to(1, 3, seed); }
+		double sample_many(uint64_t* seed){ return sample_to(2, 10, seed); } 
 		
-    float (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
-		float weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
+    double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
+		double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };

 		int n_samples = 1000000;
-		float* result_many = (float *) malloc(n_samples * sizeof(float));
+		double* result_many = (double *) malloc(n_samples * sizeof(double));
 		for(int i=0; i<n_samples; i++){
      result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
 		}
--- a/examples/04_sample_from_cdf_simple/example
+++ b/examples/04_sample_from_cdf_simple/example
--- a/examples/04_sample_from_cdf_simple/example.c
+++ b/examples/04_sample_from_cdf_simple/example.c
@ -8,7 +8,7 @@
 #define NUM_SAMPLES 1000000

 // Example cdf
-float cdf_uniform_0_1(float x)
+double cdf_uniform_0_1(double x)
 {
    if (x < 0) {
        return 0;
@ -19,7 +19,7 @@ float cdf_uniform_0_1(float x)
    }
 }

-float cdf_squared_0_1(float x)
+double cdf_squared_0_1(double x)
 {
    if (x < 0) {
        return 0;
@ -30,17 +30,17 @@ float cdf_squared_0_1(float x)
    }
 }

-float cdf_normal_0_1(float x)
+double cdf_normal_0_1(double x)
 {
-    float mean = 0;
-    float std = 1;
+    double mean = 0;
+    double std = 1;
    return 0.5 * (1 + erf((x - mean) / (std * sqrt(2)))); // erf from math.h
 }

 // Some testers
-void test_inverse_cdf_float(char* cdf_name, float cdf_float(float))
+void test_inverse_cdf_double(char* cdf_name, double cdf_double(double))
 {
-    struct box result = inverse_cdf_float(cdf_float, 0.5);
+    struct box result = inverse_cdf_double(cdf_double, 0.5);
    if (result.empty) {
        printf("Inverse for %s not calculated\n", cdf_name);
        exit(1);
@ -49,12 +49,12 @@ void test_inverse_cdf_float(char* cdf_name, float cdf_float(float))
    }
 }

-void test_and_time_sampler_float(char* cdf_name, float cdf_float(float), uint64_t* seed)
+void test_and_time_sampler_double(char* cdf_name, double cdf_double(double), uint64_t* seed)
 {
    printf("\nGetting some samples from %s:\n", cdf_name);
    clock_t begin = clock();
    for (int i = 0; i < NUM_SAMPLES; i++) {
-        struct box sample = sampler_cdf_float(cdf_float, seed);
+        struct box sample = sampler_cdf_double(cdf_double, seed);
        if (sample.empty) {
            printf("Error in sampler function for %s", cdf_name);
        } else {
@ -62,39 +62,39 @@ void test_and_time_sampler_float(char* cdf_name, float cdf_float(float), uint64_
        }
    }
    clock_t end = clock();
-    float time_spent = (float)(end - begin) / CLOCKS_PER_SEC;
+    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Time spent: %f\n", time_spent);
 }

 int main()
 {
-    // Test inverse cdf float
-    test_inverse_cdf_float("cdf_uniform_0_1", cdf_uniform_0_1);
-    test_inverse_cdf_float("cdf_squared_0_1", cdf_squared_0_1);
-    test_inverse_cdf_float("cdf_normal_0_1", cdf_normal_0_1);
+    // Test inverse cdf double
+    test_inverse_cdf_double("cdf_uniform_0_1", cdf_uniform_0_1);
+    test_inverse_cdf_double("cdf_squared_0_1", cdf_squared_0_1);
+    test_inverse_cdf_double("cdf_normal_0_1", cdf_normal_0_1);

    // Testing samplers
    // set randomness seed
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with 0

-    // Test float sampler
-    test_and_time_sampler_float("cdf_uniform_0_1", cdf_uniform_0_1, seed);
-    test_and_time_sampler_float("cdf_squared_0_1", cdf_squared_0_1, seed);
-    test_and_time_sampler_float("cdf_normal_0_1", cdf_normal_0_1, seed);
+    // Test double sampler
+    test_and_time_sampler_double("cdf_uniform_0_1", cdf_uniform_0_1, seed);
+    test_and_time_sampler_double("cdf_squared_0_1", cdf_squared_0_1, seed);
+    test_and_time_sampler_double("cdf_normal_0_1", cdf_normal_0_1, seed);

    // Get some normal samples using a previous approach
    printf("\nGetting some samples from sample_unit_normal\n");

    clock_t begin_2 = clock();
-
+		double* normal_samples = malloc(NUM_SAMPLES * sizeof(double));
    for (int i = 0; i < NUM_SAMPLES; i++) {
-        float normal_sample = sample_unit_normal(seed);
+        normal_samples[i] = sample_unit_normal(seed);
        // printf("%f\n", normal_sample);
    }

    clock_t end_2 = clock();
-    float time_spent_2 = (float)(end_2 - begin_2) / CLOCKS_PER_SEC;
+    double time_spent_2 = (double)(end_2 - begin_2) / CLOCKS_PER_SEC;
    printf("Time spent: %f\n", time_spent_2);

    free(seed);
--- a/examples/05_sample_from_cdf_beta/example
+++ b/examples/05_sample_from_cdf_beta/example
--- a/examples/05_sample_from_cdf_beta/example.c
+++ b/examples/05_sample_from_cdf_beta/example.c
@ -10,11 +10,11 @@
 #define TINY_BETA 1.0e-30

 // Incomplete beta function
-struct box incbeta(float a, float b, float x)
+struct box incbeta(double a, double b, double x)
 {
    // Descended from <https://github.com/codeplea/incbeta/blob/master/incbeta.c>,
    // <https://codeplea.com/incomplete-beta-function-c>
-    // but modified to return a box struct and floats instead of doubles.
+    // but modified to return a box struct.
    // [ ] to do: add attribution in README
    // Original code under this license:
    /*
@ -60,17 +60,17 @@ struct box incbeta(float a, float b, float x)
    }

    /*Find the first part before the continued fraction.*/
-    const float lbeta_ab = lgamma(a) + lgamma(b) - lgamma(a + b);
-    const float front = exp(log(x) * a + log(1.0 - x) * b - lbeta_ab) / a;
+    const double lbeta_ab = lgamma(a) + lgamma(b) - lgamma(a + b);
+    const double front = exp(log(x) * a + log(1.0 - x) * b - lbeta_ab) / a;

    /*Use Lentz's algorithm to evaluate the continued fraction.*/
-    float f = 1.0, c = 1.0, d = 0.0;
+    double f = 1.0, c = 1.0, d = 0.0;

    int i, m;
    for (i = 0; i <= 200; ++i) {
        m = i / 2;

-        float numerator;
+        double numerator;
        if (i == 0) {
            numerator = 1.0; /*First numerator is 1.0.*/
        } else if (i % 2 == 0) {
@ -89,7 +89,7 @@ struct box incbeta(float a, float b, float x)
        if (fabs(c) < TINY_BETA)
            c = TINY_BETA;

-        const float cd = c * d;
+        const double cd = c * d;
        f *= cd;

        /*Check for stop.*/
@ -105,7 +105,7 @@ struct box incbeta(float a, float b, float x)
    return PROCESS_ERROR("More loops needed, did not converge, in function incbeta");
 }

-struct box cdf_beta(float x)
+struct box cdf_beta(double x)
 {
    if (x < 0) {
        struct box result = { .empty = 0, .content = 0 };
@ -114,13 +114,13 @@ struct box cdf_beta(float x)
        struct box result = { .empty = 0, .content = 1 };
        return result;
    } else {
-        float successes = 1, failures = (2023 - 1945);
+        double successes = 1, failures = (2023 - 1945);
        return incbeta(successes, failures, x);
    }
 }

 // Some testers
-void test_inverse_cdf_box(char* cdf_name, struct box cdf_box(float))
+void test_inverse_cdf_box(char* cdf_name, struct box cdf_box(double))
 {
    struct box result = inverse_cdf_box(cdf_box, 0.5);
    if (result.empty) {
@ -131,7 +131,7 @@ void test_inverse_cdf_box(char* cdf_name, struct box cdf_box(float))
    }
 }

-void test_and_time_sampler_box(char* cdf_name, struct box cdf_box(float), uint64_t* seed)
+void test_and_time_sampler_box(char* cdf_name, struct box cdf_box(double), uint64_t* seed)
 {
    printf("\nGetting some samples from %s:\n", cdf_name);
    clock_t begin = clock();
@ -144,7 +144,7 @@ void test_and_time_sampler_box(char* cdf_name, struct box cdf_box(float), uint64
        }
    }
    clock_t end = clock();
-    float time_spent = (float)(end - begin) / CLOCKS_PER_SEC;
+    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Time spent: %f\n", time_spent);
 }

--- a/examples/06_gamma_beta/example
+++ b/examples/06_gamma_beta/example
--- a/examples/06_gamma_beta/example.c
+++ b/examples/06_gamma_beta/example.c
@ -14,15 +14,15 @@ int main()
    int n =  1000 * 1000;
 		/*
 		for (int i = 0; i < n; i++) {
-        float gamma_0 = sample_gamma(0.0, seed);
+        double gamma_0 = sample_gamma(0.0, seed);
        // printf("sample_gamma(0.0): %f\n", gamma_0);
    }
 		printf("\n");
 		*/

-		float* gamma_1_array = malloc(sizeof(float) * n);
+		double* gamma_1_array = malloc(sizeof(double) * n);
    for (int i = 0; i < n; i++) {
-        float gamma_1 = sample_gamma(1.0, seed);
+        double gamma_1 = sample_gamma(1.0, seed);
        // printf("sample_gamma(1.0): %f\n", gamma_1);
 				gamma_1_array[i] = gamma_1;
    }
@ -30,9 +30,9 @@ int main()
 		free(gamma_1_array);
 		printf("\n");
 		
-		float* beta_1_2_array = malloc(sizeof(float) * n);
+		double* beta_1_2_array = malloc(sizeof(double) * n);
 		for (int i = 0; i < n; i++) {
-        float beta_1_2 = sample_beta(1, 2.0, seed);
+        double beta_1_2 = sample_beta(1, 2.0, seed);
        // printf("sample_beta(1.0, 2.0): %f\n", beta_1_2);
 				beta_1_2_array[i] = beta_1_2;
    }
@ -43,10 +43,3 @@ int main()
 		free(seed);
 }

-/* 
-Aggregation mechanisms:
- Quantiles (requires a sort)
- Sum 
- Average
- Std
-*/
--- a/examples/07_ci_beta/example
+++ b/examples/07_ci_beta/example
--- a/examples/07_ci_beta/example.c
+++ b/examples/07_ci_beta/example.c
@ -0,0 +1,21 @@
+#include "../../squiggle.h"
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+// Estimate functions
+double beta_1_2_sampler(uint64_t* seed){
+	return sample_beta(1, 2.0, seed);
+}
+
+int main()
+{
+    // set randomness seed
+    uint64_t* seed = malloc(sizeof(uint64_t));
+    *seed = 1000; // xorshift can't start with 0
+
+		struct c_i beta_1_2_ci_90 = get_90_confidence_interval(beta_1_2_sampler, seed);
+		printf("90%% confidence interval of beta(1,2) is [%f, %f]\n", beta_1_2_ci_90.low, beta_1_2_ci_90.high);
+		
+		free(seed);
+}
--- a/examples/07_ci_beta/makefile
+++ b/examples/07_ci_beta/makefile
@ -0,0 +1,53 @@
+# Interface: 
+#   make
+#   make build
+#   make format
+#   make run
+
+# Compiler
+CC=gcc
+# CC=tcc # <= faster compilation
+
+# Main file
+SRC=example.c ../../squiggle.c
+OUTPUT=example
+
+## Dependencies
+MATH=-lm
+
+## Flags
+DEBUG= #'-g'
+STANDARD=-std=c99
+WARNINGS=-Wall
+OPTIMIZED=-O3  #-Ofast
+# OPENMP=-fopenmp
+
+## Formatter
+STYLE_BLUEPRINT=webkit
+FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
+
+## make build
+build: $(SRC)
+	$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
+
+format: $(SRC)
+	$(FORMATTER) $(SRC)
+
+run: $(SRC) $(OUTPUT)
+	OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
+
+time-linux: 
+	@echo "Requires /bin/time, found on GNU/Linux systems" && echo
+	
+	@echo "Running 100x and taking avg time $(OUTPUT)"
+	@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
+
+## Profiling
+
+profile-linux: 
+	echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
+	echo "Must be run as sudo"
+	$(CC) $(SRC) $(MATH) -o $(OUTPUT)
+	sudo perf record ./$(OUTPUT)
+	sudo perf report
+	rm perf.data
--- a/1
+++ b/1
@ -11,6 +11,7 @@ all:
 	cd examples/04_sample_from_cdf_simple && make && echo
 	cd examples/05_sample_from_cdf_beta && make && echo
 	cd examples/06_gamma_beta && make && echo
+	cd examples/07_ci_beta && make && echo

 format: squiggle.c squiggle.h
 	$(FORMATTER) squiggle.c
--- a/squiggle.c
+++ b/squiggle.c
@ -11,7 +11,7 @@
 #define EXIT_ON_ERROR 0
 #define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)

-const float PI = 3.14159265358979323846; // M_PI in gcc gnu99
+const double PI = 3.14159265358979323846; // M_PI in gcc gnu99

 // Pseudo Random number generator
 uint64_t xorshift32(uint32_t* seed)
@ -35,67 +35,73 @@ uint64_t xorshift64(uint64_t* seed)
    // https://en.wikipedia.org/wiki/Xorshift
    // Also some drama: <https://www.pcg-random.org/posts/on-vignas-pcg-critique.html>, <https://prng.di.unimi.it/>

-		uint64_t x = *seed;
-		x ^= x << 13;
-		x ^= x >> 7;
-		x ^= x << 17;
-		return *seed = x;
+    uint64_t x = *seed;
+    x ^= x << 13;
+    x ^= x >> 7;
+    x ^= x << 17;
+    return *seed = x;
 }

 // Distribution & sampling functions
 // Unit distributions
-float sample_unit_uniform(uint64_t* seed)
+double sample_unit_uniform(uint64_t* seed)
 {
    // samples uniform from [0,1] interval.
-    return ((float)xorshift64(seed)) / ((float)UINT64_MAX);
+    return ((double)xorshift64(seed)) / ((double)UINT64_MAX);
 }

-float sample_unit_normal(uint64_t* seed)
+double sample_unit_normal(uint64_t* seed)
 {
    // See: <https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform>
-    float u1 = sample_unit_uniform(seed);
-    float u2 = sample_unit_uniform(seed);
-    float z = sqrtf(-2.0 * log(u1)) * sin(2 * PI * u2);
+    double u1 = sample_unit_uniform(seed);
+    double u2 = sample_unit_uniform(seed);
+    double z = sqrt(-2.0 * log(u1)) * sin(2 * PI * u2);
    return z;
 }

 // Composite distributions
-float sample_uniform(float start, float end, uint64_t* seed)
+double sample_uniform(double start, double end, uint64_t* seed)
 {
    return sample_unit_uniform(seed) * (end - start) + start;
 }

-float sample_normal(float mean, float sigma, uint64_t* seed)
+double sample_normal(double mean, double sigma, uint64_t* seed)
 {
    return (mean + sigma * sample_unit_normal(seed));
 }

-float sample_lognormal(float logmean, float logsigma, uint64_t* seed)
+double sample_lognormal(double logmean, double logstd, uint64_t* seed)
 {
-    return expf(sample_normal(logmean, logsigma, seed));
+    return exp(sample_normal(logmean, logstd, seed));
 }

-float sample_to(float low, float high, uint64_t* seed)
+double sample_to(double low, double high, uint64_t* seed)
 {
    // Given a (positive) 90% confidence interval,
    // returns a sample from a lognormal
    // with a matching 90% c.i.
-    const float NORMAL95CONFIDENCE = 1.6448536269514722;
-    float loglow = logf(low);
-    float loghigh = logf(high);
-    float logmean = (loglow + loghigh) / 2;
-    float logsigma = (loghigh - loglow) / (2.0 * NORMAL95CONFIDENCE);
-    return sample_lognormal(logmean, logsigma, seed);
+    const double NORMAL95CONFIDENCE = 1.6448536269514722;
+    double loglow = log(low);
+    double loghigh = log(high);
+    double logmean = (loglow + loghigh) / 2;
+    double logstd = (loghigh - loglow) / (2.0 * NORMAL95CONFIDENCE);
+    return sample_lognormal(logmean, logstd, seed);
 }

-float sample_gamma(float alpha, uint64_t* seed)
+double sample_gamma(double alpha, uint64_t* seed)
 {

    // A Simple Method for Generating Gamma Variables, Marsaglia and Wan Tsang, 2001
    // https://dl.acm.org/doi/pdf/10.1145/358407.358414
    // see also the references/ folder
+		// Note that the Wikipedia page for the gamma distribution includes a scaling parameter
+		// k or beta
+		// https://en.wikipedia.org/wiki/Gamma_distribution
+		// such that gamma_k(alpha, k) = k * gamma(alpha)
+		// or gamma_beta(alpha, beta) = gamma(alpha) / beta
+		// So far I have not needed to use this, and thus the second parameter is by default 1.
    if (alpha >= 1) {
-        float d, c, x, v, u;
+        double d, c, x, v, u;
        d = alpha - 1.0 / 3.0;
        c = 1.0 / sqrt(9.0 * d);
        while (1) {
@ -105,7 +111,7 @@ float sample_gamma(float alpha, uint64_t* seed)
                v = 1.0 + c * x;
            } while (v <= 0.0);

-            v = v * v * v; 
+            v = v * v * v;
            u = sample_unit_uniform(seed);
            if (u < 1.0 - 0.0331 * (x * x * x * x)) { // Condition 1
                // the 0.0331 doesn't inspire much confidence
@ -125,24 +131,24 @@ float sample_gamma(float alpha, uint64_t* seed)
    }
 }

-float sample_beta(float a, float b, uint64_t* seed)
+double sample_beta(double a, double b, uint64_t* seed)
 {
-    float gamma_a = sample_gamma(a, seed);
-    float gamma_b = sample_gamma(b, seed);
+    double gamma_a = sample_gamma(a, seed);
+    double gamma_b = sample_gamma(b, seed);
    return gamma_a / (gamma_a + gamma_b);
 }

 // Array helpers
-float array_sum(float* array, int length)
+double array_sum(double* array, int length)
 {
-    float sum = 0.0;
+    double sum = 0.0;
    for (int i = 0; i < length; i++) {
        sum += array[i];
    }
    return sum;
 }

-void array_cumsum(float* array_to_sum, float* array_cumsummed, int length)
+void array_cumsum(double* array_to_sum, double* array_cumsummed, int length)
 {
    array_cumsummed[0] = array_to_sum[0];
    for (int i = 1; i < length; i++) {
@ -150,39 +156,38 @@ void array_cumsum(float* array_to_sum, float* array_cumsummed, int length)
    }
 }

-float array_mean(float* array, int length)
+double array_mean(double* array, int length)
 {
-    float sum = array_sum(array, length);
+    double sum = array_sum(array, length);
    return sum / length;
 }

-float array_std(float* array, int length)
+double array_std(double* array, int length)
 {
-    float mean = array_mean(array, length);
-    float std = 0.0;
+    double mean = array_mean(array, length);
+    double std = 0.0;
    for (int i = 0; i < length; i++) {
-        std += (array[i] - mean);
-				std *= std;
+        std += (array[i] - mean) * (array[i] - mean);
    }
    std = sqrt(std / length);
    return std;
 }

 // Mixture function
-float sample_mixture(float (*samplers[])(uint64_t*), float* weights, int n_dists, uint64_t* seed)
+double sample_mixture(double (*samplers[])(uint64_t*), double* weights, int n_dists, uint64_t* seed)
 {
    // You can see a simpler version of this function in the git history
    // or in C-02-better-algorithm-one-thread/
-    float sum_weights = array_sum(weights, n_dists);
-    float* cumsummed_normalized_weights = (float*)malloc(n_dists * sizeof(float));
+    double sum_weights = array_sum(weights, n_dists);
+    double* cumsummed_normalized_weights = (double*)malloc(n_dists * sizeof(double));
    cumsummed_normalized_weights[0] = weights[0] / sum_weights;
    for (int i = 1; i < n_dists; i++) {
        cumsummed_normalized_weights[i] = cumsummed_normalized_weights[i - 1] + weights[i] / sum_weights;
    }

-    float result;
+    double result;
    int result_set_flag = 0;
-    float p = sample_uniform(0, 1, seed);
+    double p = sample_uniform(0, 1, seed);
    for (int k = 0; k < n_dists; k++) {
        if (p < cumsummed_normalized_weights[k]) {
            result = samplers[k](seed);
@ -200,7 +205,7 @@ float sample_mixture(float (*samplers[])(uint64_t*), float* weights, int n_dists
 // Sample from an arbitrary cdf
 struct box {
    int empty;
-    float content;
+    double content;
    char* error_msg;
 };

@ -219,13 +224,13 @@ struct box process_error(const char* error_msg, int should_exit, char* file, int

 // Inverse cdf at point
 // Two versions of this function:
-//   - raw, dealing with cdfs that return floats
-//     - input: cdf: float => float, p
+//   - raw, dealing with cdfs that return doubles
+//     - input: cdf: double => double, p
 //     - output: Box(number|error)
 //   - box, dealing with cdfs that return a box.
-//     - input: cdf: float => Box(number|error), p
+//     - input: cdf: double => Box(number|error), p
 //     - output: Box(number|error)
-struct box inverse_cdf_float(float cdf(float), float p)
+struct box inverse_cdf_double(double cdf(double), double p)
 {
    // given a cdf: [-Inf, Inf] => [0,1]
    // returns a box with either
@ -233,8 +238,8 @@ struct box inverse_cdf_float(float cdf(float), float p)
    // or an error
    // if EXIT_ON_ERROR is set to 1, it exits instead of providing an error

-    float low = -1.0;
-    float high = 1.0;
+    double low = -1.0;
+    double high = 1.0;

    // 1. Make sure that cdf(low) < p < cdf(high)
    int interval_found = 0;
@ -260,14 +265,14 @@ struct box inverse_cdf_float(float cdf(float), float p)
        int convergence_condition = 0;
        int count = 0;
        while (!convergence_condition && (count < (INT_MAX / 2))) {
-            float mid = (high + low) / 2;
+            double mid = (high + low) / 2;
            int mid_not_new = (mid == low) || (mid == high);
-            // float width = high - low;
+            // double width = high - low;
            // if ((width < 1e-8) || mid_not_new){
            if (mid_not_new) {
                convergence_condition = 1;
            } else {
-                float mid_sign = cdf(mid) - p;
+                double mid_sign = cdf(mid) - p;
                if (mid_sign < 0) {
                    low = mid;
                } else if (mid_sign > 0) {
@ -288,7 +293,7 @@ struct box inverse_cdf_float(float cdf(float), float p)
    }
 }

-struct box inverse_cdf_box(struct box cdf_box(float), float p)
+struct box inverse_cdf_box(struct box cdf_box(double), double p)
 {
    // given a cdf: [-Inf, Inf] => Box([0,1])
    // returns a box with either
@ -296,8 +301,8 @@ struct box inverse_cdf_box(struct box cdf_box(float), float p)
    // or an error
    // if EXIT_ON_ERROR is set to 1, it exits instead of providing an error

-    float low = -1.0;
-    float high = 1.0;
+    double low = -1.0;
+    double high = 1.0;

    // 1. Make sure that cdf(low) < p < cdf(high)
    int interval_found = 0;
@ -332,9 +337,9 @@ struct box inverse_cdf_box(struct box cdf_box(float), float p)
        int convergence_condition = 0;
        int count = 0;
        while (!convergence_condition && (count < (INT_MAX / 2))) {
-            float mid = (high + low) / 2;
+            double mid = (high + low) / 2;
            int mid_not_new = (mid == low) || (mid == high);
-            // float width = high - low;
+            // double width = high - low;
            if (mid_not_new) {
                // if ((width < 1e-8) || mid_not_new){
                convergence_condition = 1;
@ -343,7 +348,7 @@ struct box inverse_cdf_box(struct box cdf_box(float), float p)
                if (cdf_mid.empty) {
                    return PROCESS_ERROR(cdf_mid.error_msg);
                }
-                float mid_sign = cdf_mid.content - p;
+                double mid_sign = cdf_mid.content - p;
                if (mid_sign < 0) {
                    low = mid;
                } else if (mid_sign > 0) {
@ -365,23 +370,60 @@ struct box inverse_cdf_box(struct box cdf_box(float), float p)
 }

 // Sampler based on inverse cdf and randomness function
-struct box sampler_cdf_box(struct box cdf(float), uint64_t* seed)
+struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed)
 {
-    float p = sample_unit_uniform(seed);
+    double p = sample_unit_uniform(seed);
    struct box result = inverse_cdf_box(cdf, p);
    return result;
 }
-struct box sampler_cdf_float(float cdf(float), uint64_t* seed)
+struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
 {
-    float p = sample_unit_uniform(seed);
-    struct box result = inverse_cdf_float(cdf, p);
+    double p = sample_unit_uniform(seed);
+    struct box result = inverse_cdf_double(cdf, p);
    return result;
 }

+// Get confidence intervals, given a sampler
+
+struct c_i {
+	float low;
+	float high;
+};
+int compare_doubles(const void *p, const void *q) {
+    // https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
+    double x = *(const double *)p;
+    double y = *(const double *)q;
+
+    /* Avoid returning x - y: converting a double difference to int truncates
+       small differences to 0 and is undefined for out-of-range values. */
+    if (x < y)
+        return -1;  // -1 here gives ascending order; use 1 for descending.
+    else if (x > y)
+        return 1;   // 1 here gives ascending order; use -1 for descending.
+
+    return 0;
+}
+struct c_i get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed) {
+    int n = 100 * 1000;
+    double* samples_array = malloc(n * sizeof(double));
+    for (int i = 0; i < n; i++) {
+        samples_array[i] = sampler(seed);
+    }
+    qsort(samples_array, n, sizeof(double), compare_doubles);
+
+    struct c_i result = {
+        .low = samples_array[5000],
+        .high = samples_array[94999],
+    };
+    free(samples_array);
+
+    return result;
+}
+
 /* Could also define other variations, e.g.,
-float sampler_danger(struct box cdf(float), uint64_t* seed)
+double sampler_danger(struct box cdf(double), uint64_t* seed)
 {
-    float p = sample_unit_uniform(seed);
+    double p = sample_unit_uniform(seed);
    struct box result = inverse_cdf_box(cdf, p);
 		if(result.empty){
 			exit(1);
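
Review note on `get_90_confidence_interval`: the version in this diff hard-codes the indices 5000 and 94999 for n = 100000, stores the bounds in `float` fields, and does not check the result of `malloc`. A minimal sketch of a variant that derives the indices from `n` could look like the block below. The names `c_i_double`, `compare_doubles_asc`, `get_90_ci_checked` and `sample_unit_uniform_demo` are hypothetical and not part of the library; the demo sampler is only a stand-in so the block compiles on its own.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical variant of struct c_i that keeps full double precision.
struct c_i_double {
    double low;
    double high;
};

// Ascending comparator for qsort; avoids subtracting the doubles directly.
static int compare_doubles_asc(const void* p, const void* q)
{
    double x = *(const double*)p;
    double y = *(const double*)q;
    return (x > y) - (x < y);
}

// Stand-in sampler (assumption): uniform on [0, 1) built on a xorshift64 step.
static uint64_t xorshift64_step(uint64_t* seed)
{
    uint64_t x = *seed;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *seed = x;
}

double sample_unit_uniform_demo(uint64_t* seed)
{
    // Keep the top 53 bits and scale by 2^53.
    return (double)(xorshift64_step(seed) >> 11) / 9007199254740992.0;
}

// Writes the empirical 5th and 95th percentiles of n samples into *out.
// Returns 0 on success, 1 if n is too small or the allocation fails.
int get_90_ci_checked(double (*sampler)(uint64_t*), uint64_t* seed, int n, struct c_i_double* out)
{
    if (n < 20) {
        return 1;
    }
    double* samples = malloc((size_t)n * sizeof(double));
    if (samples == NULL) {
        return 1;
    }
    for (int i = 0; i < n; i++) {
        samples[i] = sampler(seed);
    }
    qsort(samples, (size_t)n, sizeof(double), compare_doubles_asc);

    out->low = samples[n / 20];          // index 5000 when n = 100000
    out->high = samples[n - n / 20 - 1]; // index 94999 when n = 100000
    free(samples);
    return 0;
}
```

Calling `get_90_ci_checked(sample_unit_uniform_demo, &seed, 100000, &ci)` with a nonzero `seed` should give bounds near 0.05 and 0.95. Returning a status code instead of the struct is one design choice among several; the library's `struct box` pattern would work as well.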
--- a/squiggle.h
+++ b/squiggle.h
@ -8,31 +8,31 @@
 uint64_t xorshift64(uint64_t* seed);

 // Basic distribution sampling functions
-float sample_unit_uniform(uint64_t* seed);
-float sample_unit_normal(uint64_t* seed);
+double sample_unit_uniform(uint64_t* seed);
+double sample_unit_normal(uint64_t* seed);

 // Composite distribution sampling functions
-float sample_uniform(float start, float end, uint64_t* seed);
-float sample_normal(float mean, float sigma, uint64_t* seed);
-float sample_lognormal(float logmean, float logsigma, uint64_t* seed);
-float sample_to(float low, float high, uint64_t* seed);
+double sample_uniform(double start, double end, uint64_t* seed);
+double sample_normal(double mean, double sigma, uint64_t* seed);
+double sample_lognormal(double logmean, double logsigma, uint64_t* seed);
+double sample_to(double low, double high, uint64_t* seed);

-float sample_gamma(float alpha, uint64_t* seed);
-float sample_beta(float a, float b, uint64_t* seed);
+double sample_gamma(double alpha, uint64_t* seed);
+double sample_beta(double a, double b, uint64_t* seed);

 // Array helpers
-float array_sum(float* array, int length);
-void array_cumsum(float* array_to_sum, float* array_cumsummed, int length);
-float array_mean(float* array, int length);
-float array_std(float* array, int length);
+double array_sum(double* array, int length);
+void array_cumsum(double* array_to_sum, double* array_cumsummed, int length);
+double array_mean(double* array, int length);
+double array_std(double* array, int length);

 // Mixture function
-float sample_mixture(float (*samplers[])(uint64_t*), float* weights, int n_dists, uint64_t* seed);
+double sample_mixture(double (*samplers[])(uint64_t*), double* weights, int n_dists, uint64_t* seed);

 // Box
 struct box {
    int empty;
-    float content;
+    double content;
    char* error_msg;
 };

@ -43,11 +43,18 @@ struct box {
 struct box process_error(const char* error_msg, int should_exit, char* file, int line);

 // Inverse cdf
-struct box inverse_cdf_float(float cdf(float), float p);
-struct box inverse_cdf_box(struct box cdf_box(float), float p);
+struct box inverse_cdf_double(double cdf(double), double p);
+struct box inverse_cdf_box(struct box cdf_box(double), double p);

 // Samplers from cdf
-struct box sampler_cdf_float(float cdf(float), uint64_t* seed);
-struct box sampler_cdf_box(struct box cdf(float), uint64_t* seed);
+struct box sampler_cdf_double(double cdf(double), uint64_t* seed);
+struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed);
+
+// Get 90% confidence interval
+struct c_i {
+	float low;
+	float high;
+};
+struct c_i get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed);

 #endif
--- a/test/makefile
+++ b/test/makefile
@ -36,6 +36,9 @@ format: $(SRC)
 run: $(SRC) $(OUTPUT)
 	./$(OUTPUT)

+verify: $(SRC) $(OUTPUT)
+	./$(OUTPUT) | grep "NOT passed" -A 2 --group-separator='' || true
+
 time-linux: 
 	@echo "Requires /bin/time, found on GNU/Linux systems" && echo
 	
--- a/test/test
+++ b/test/test
--- a/test/test.c
+++ b/test/test.c
@ -1,93 +1,326 @@
 #include "../squiggle.h"
-#include <stdint.h>
 #include <math.h>
-#include <stdlib.h>
+#include <stdint.h>
 #include <stdio.h>
+#include <stdlib.h>

-#define N  1000 * 1000
+#define TOLERANCE (5.0 / 1000.0)
+#define MAX_NAME_LENGTH 500

-void test_unit_uniform(uint64_t* seed){
-	float* unit_uniform_array = malloc(sizeof(float) * N);
-	
-	for(int i=0; i<N; i++){
-		unit_uniform_array[i] = sample_unit_uniform(seed);
-	}
-	
-	float mean = array_mean(unit_uniform_array, N);
-	float expected_mean = 0.5;
-	float delta_mean = mean - expected_mean;
+// Structs

-	float std = array_std(unit_uniform_array, N);
-	float expected_std = sqrt(1.0/12.0);
-	float delta_std = std - expected_std;
-	
-	printf("Mean of unit uniform: %f, vs expected mean: %f, delta: %f\n", mean, expected_mean, delta_mean);
-	printf("Std of unit uniform: %f, vs expected std: %f, delta: %f\n", std, expected_std, delta_std);
+struct array_expectations {
+    double* array;
+    int n;
+    char* name;
+    double expected_mean;
+    double expected_std;
+    double tolerance;
+};

-	if(fabs(delta_mean) > 1.0/1000.0){
-		printf("[-] Mean test for unit uniform NOT passed.\n");
-	}else {
-		printf("[x] Mean test for unit uniform PASSED\n");
-	}
+void test_array_expectations(struct array_expectations e)
+{
+    double mean = array_mean(e.array, e.n);
+    double delta_mean = mean - e.expected_mean;

-	if(fabs(delta_std) > 1.0/1000.0){
-		printf("[-] Std test for unit uniform NOT passed.\n");
-	}else {
-		printf("[x] Std test for unit uniform PASSED\n");
-	}
-	
-	printf("\n");
+    double std = array_std(e.array, e.n);
+    double delta_std = std - e.expected_std;

+    if ((fabs(delta_mean) / fabs(mean) > e.tolerance) && (fabs(delta_mean) > e.tolerance)) {
+        printf("[-] Mean test for %s NOT passed.\n", e.name);
+        printf("Mean of %s: %f, vs expected mean: %f\n", e.name, mean, e.expected_mean);
+        printf("delta: %f, relative delta: %f\n", delta_mean, delta_mean / fabs(mean));
+    } else {
+        printf("[x] Mean test for %s PASSED\n", e.name);
+    }
+
+    if ((fabs(delta_std) / fabs(std) > e.tolerance) && (fabs(delta_std) > e.tolerance)) {
+        printf("[-] Std test for %s NOT passed.\n", e.name);
+        printf("Std of %s: %f, vs expected std: %f\n", e.name, std, e.expected_std);
+        printf("delta: %f, relative delta: %f\n", delta_std, delta_std / fabs(std));
+    } else {
+        printf("[x] Std test for %s PASSED\n", e.name);
+    }
+
+    printf("\n");
 }

-void test_uniform(float start, float end, uint64_t* seed){
-	float* uniform_array = malloc(sizeof(float) * N);
-	
-	for(int i=0; i<N; i++){
-		uniform_array[i] = sample_uniform(start, end, seed);
-	}
-	
-	float mean = array_mean(uniform_array, N);
-	float expected_mean = (start + end) / 2; 
-	float delta_mean = mean - expected_mean;
-	
-	float std = array_std(uniform_array, N);
-	float expected_std = sqrt(1.0/12.0) * fabs(end-start);
-	float delta_std = std - expected_std;
-	
+// Test unit uniform
+void test_unit_uniform(uint64_t* seed)
+{
+    int n = 1000 * 1000;
+    double* unit_uniform_array = malloc(sizeof(double) * n);

-	float width = fabs(end - start);
-	if(fabs(delta_mean) > width * 1.0/1000.0){
-		printf("[-] Mean test for [%.1f, %.1f] uniform NOT passed.\n", start, end);
-		printf("Mean of [%.1f, %.1f] uniform: %f, vs expected mean: %f, delta: %f\n", start, end, mean, expected_mean, mean - expected_mean);
-	}else {
-		printf("[x] Mean test for unit uniform PASSED\n");
-	}
+    for (int i = 0; i < n; i++) {
+        unit_uniform_array[i] = sample_unit_uniform(seed);
+    }

-	if(fabs(delta_std) > width * 1.0/1000.0){
-		printf("[-] Std test for [%.1f, %.1f] uniform NOT passed.\n", start, end);
-		printf("Std of [%.1f, %.1f] uniform: %f, vs expected std: %f, delta: %f\n", start, end, std, expected_std, std - expected_std);
-	}else {
-		printf("[x] Std test for unit uniform PASSED\n");
-	}
-	printf("\n");
+    struct array_expectations expectations = {
+        .array = unit_uniform_array,
+        .n = n,
+        .name = "unit uniform",
+        .expected_mean = 0.5,
+        .expected_std = sqrt(1.0 / 12.0),
+        .tolerance = TOLERANCE,
+    };

+    test_array_expectations(expectations);
+    free(unit_uniform_array);
 }

-int main(){
+// Test uniforms
+void test_uniform(double start, double end, uint64_t* seed)
+{
+    int n = 1000 * 1000;
+    double* uniform_array = malloc(sizeof(double) * n);
+
+    for (int i = 0; i < n; i++) {
+        uniform_array[i] = sample_uniform(start, end, seed);
+    }
+
+    char* name = malloc(MAX_NAME_LENGTH * sizeof(char));
+    snprintf(name, MAX_NAME_LENGTH, "[%f, %f] uniform", start, end);
+    struct array_expectations expectations = {
+        .array = uniform_array,
+        .n = n,
+        .name = name,
+        .expected_mean = (start + end) / 2,
+        .expected_std = sqrt(1.0 / 12.0) * fabs(end - start),
+        .tolerance = fabs(end - start) * TOLERANCE,
+    };
+
+    test_array_expectations(expectations);
+    free(name);
+    free(uniform_array);
+}
+
+// Test unit normal
+void test_unit_normal(uint64_t* seed)
+{
+    int n = 1000 * 1000;
+    double* unit_normal_array = malloc(sizeof(double) * n);
+
+    for (int i = 0; i < n; i++) {
+        unit_normal_array[i] = sample_unit_normal(seed);
+    }
+
+    struct array_expectations expectations = {
+        .array = unit_normal_array,
+        .n = n,
+        .name = "unit normal",
+        .expected_mean = 0,
+        .expected_std = 1,
+        .tolerance = TOLERANCE,
+    };
+
+    test_array_expectations(expectations);
+    free(unit_normal_array);
+}
+
+// Test normal
+void test_normal(double mean, double std, uint64_t* seed)
+{
+    int n = 10 * 1000 * 1000;
+    double* normal_array = malloc(sizeof(double) * n);
+
+    for (int i = 0; i < n; i++) {
+        normal_array[i] = sample_normal(mean, std, seed);
+    }
+
+    char* name = malloc(MAX_NAME_LENGTH * sizeof(char));
+    snprintf(name, MAX_NAME_LENGTH, "normal(%f, %f)", mean, std);
+    struct array_expectations expectations = {
+        .array = normal_array,
+        .n = n,
+        .name = name,
+        .expected_mean = mean,
+        .expected_std = std,
+        .tolerance = TOLERANCE,
+    };
+
+    test_array_expectations(expectations);
+    free(name);
+    free(normal_array);
+}
+
+// Test lognormal
+void test_lognormal(double logmean, double logstd, uint64_t* seed)
+{
+    int n = 10 * 1000 * 1000;
+    double* lognormal_array = malloc(sizeof(double) * n);
+
+    for (int i = 0; i < n; i++) {
+        lognormal_array[i] = sample_lognormal(logmean, logstd, seed);
+    }
+
+    char* name = malloc(MAX_NAME_LENGTH * sizeof(char));
+    snprintf(name, MAX_NAME_LENGTH, "lognormal(%f, %f)", logmean, logstd);
+    struct array_expectations expectations = {
+        .array = lognormal_array,
+        .n = n,
+        .name = name,
+        .expected_mean = exp(logmean + pow(logstd, 2) / 2),
+        .expected_std = sqrt((exp(pow(logstd, 2)) - 1) * exp(2 * logmean + pow(logstd, 2))),
+        .tolerance = TOLERANCE,
+    };
+
+    test_array_expectations(expectations);
+    free(name);
+    free(lognormal_array);
+}
+
+// Test lognormal to
+void test_to(double low, double high, uint64_t* seed)
+{
+    int n = 10 * 1000 * 1000;
+    double* lognormal_array = malloc(sizeof(double) * n);
+
+    for (int i = 0; i < n; i++) {
+        lognormal_array[i] = sample_to(low, high, seed);
+    }
+
+
+    char* name = malloc(MAX_NAME_LENGTH * sizeof(char));
+    snprintf(name, MAX_NAME_LENGTH, "to(%f, %f)", low, high);
+    
+    const double NORMAL95CONFIDENCE = 1.6448536269514722;
+    double loglow = log(low);
+    double loghigh = log(high);
+    double logmean = (loglow + loghigh) / 2;
+    double logstd = (loghigh - loglow) / (2.0 * NORMAL95CONFIDENCE);
+
+    struct array_expectations expectations = {
+        .array = lognormal_array,
+        .n = n,
+        .name = name,
+        .expected_mean = exp(logmean + pow(logstd, 2) / 2),
+        .expected_std = sqrt((exp(pow(logstd, 2)) - 1) * exp(2 * logmean + pow(logstd, 2))),
+        .tolerance = TOLERANCE,
+    };
+
+    test_array_expectations(expectations);
+    free(name);
+    free(lognormal_array);
+}
+
+// Test beta
+
+void test_beta(double a, double b, uint64_t* seed)
+{
+    int n = 10 * 1000 * 1000;
+    double* beta_array = malloc(sizeof(double) * n);
+
+    for (int i = 0; i < n; i++) {
+        beta_array[i] = sample_beta(a, b, seed);
+    }
+
+    char* name = malloc(MAX_NAME_LENGTH * sizeof(char));
+    snprintf(name, MAX_NAME_LENGTH, "beta(%f, %f)", a, b);
+    struct array_expectations expectations = {
+        .array = beta_array,
+        .n = n,
+        .name = name,
+        .expected_mean = a / (a + b),
+        .expected_std = sqrt((a * b) / (pow(a + b, 2) * (a + b + 1))),
+        .tolerance = TOLERANCE,
+    };
+
+    test_array_expectations(expectations);
+    free(name);
+    free(beta_array);
+}
+
+int main()
+{
    // set randomness seed
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with a seed of 0
-   
-		test_unit_uniform(seed);

-		for(int i=0; i<100; i++){
-			float start = sample_uniform(-10, 10, seed);
-			float end = sample_uniform(-10, 10, seed);
-			if ( end > start){
-				test_uniform(start, end, seed);
-			}
-		}
-		free(seed);
+    printf("Testing unit uniform\n");
+    test_unit_uniform(seed);
+
+    printf("Testing small uniforms\n");
+    for (int i = 0; i < 100; i++) {
+        double start = sample_uniform(-10, 10, seed);
+        double end = sample_uniform(-10, 10, seed);
+        if (end > start) {
+            test_uniform(start, end, seed);
+        }
+    }
+
+    printf("Testing wide uniforms\n");
+    for (int i = 0; i < 100; i++) {
+        double start = sample_uniform(-1000 * 1000, 1000 * 1000, seed);
+        double end = sample_uniform(-1000 * 1000, 1000 * 1000, seed);
+        if (end > start) {
+            test_uniform(start, end, seed);
+        }
+    }
+
+    printf("Testing unit normal\n");
+    test_unit_normal(seed);
+
+    printf("Testing small normals\n");
+    for (int i = 0; i < 100; i++) {
+        double mean = sample_uniform(-10, 10, seed);
+        double std = sample_uniform(0, 10, seed);
+        if (std > 0) {
+            test_normal(mean, std, seed);
+        }
+    }
+
+    printf("Testing larger normals\n");
+    for (int i = 0; i < 100; i++) {
+        double mean = sample_uniform(-1000 * 1000, 1000 * 1000, seed);
+        double std = sample_uniform(0, 1000 * 1000, seed);
+        if (std > 0) {
+            test_normal(mean, std, seed);
+        }
+    }
+
+    printf("Testing smaller lognormals\n");
+    for (int i = 0; i < 100; i++) {
+        double mean = sample_uniform(-1, 1, seed);
+        double std = sample_uniform(0, 1, seed);
+        if (std > 0) {
+            test_lognormal(mean, std, seed);
+        }
+    }
+
+    printf("Testing larger lognormals\n");
+    for (int i = 0; i < 100; i++) {
+        double mean = sample_uniform(-1, 5, seed);
+        double std = sample_uniform(0, 5, seed);
+        if (std > 0) {
+            test_lognormal(mean, std, seed);
+        }
+    }
+
+    printf("Testing lognormals — sample_to(low, high) syntax\n");
+    for (int i = 0; i < 100; i++) {
+        double low = sample_uniform(0, 1000 * 1000, seed);
+        double high = sample_uniform(0, 1000 * 1000, seed);
+        if (low < high) {
+            test_to(low, high, seed);
+        }
+    }
+
+    printf("Testing beta distribution\n");
+    for (int i = 0; i < 100; i++) {
+        double a = sample_uniform(0, 1000, seed);
+        double b = sample_uniform(0, 1000, seed);
+        if ((a > 0) && (b > 0)) {
+            test_beta(a, b, seed);
+        }
+    }
+
+    printf("Testing larger beta distributions\n");
+    for (int i = 0; i < 100; i++) {
+        double a = sample_uniform(0, 1000 * 1000, seed);
+        double b = sample_uniform(0, 1000 * 1000, seed);
+        if ((a > 0) && (b > 0)) {
+            test_beta(a, b, seed);
+        }
+    }
+
+    free(seed);
 }
-
Author	SHA1	Message	Date
NunoSempere	7694124fec	pontificate about tests with wide lognormals	2023-07-23 21:10:56 +02:00
NunoSempere	e053a726ee	add example of getting confidence interval & misc changes	2023-07-23 19:12:02 +02:00
NunoSempere	d531d5571f	formatting pass.	2023-07-23 16:30:42 +02:00
NunoSempere	c8fd237bbf	savepoint, rework tolerance values.	2023-07-23 16:28:44 +02:00
NunoSempere	88e998edce	formatting pass.	2023-07-23 15:44:22 +02:00
NunoSempere	6b2349132b	add tests lognormal, and have them use special tolerances.	2023-07-23 15:43:35 +02:00
NunoSempere	b80b05ca30	tests for larger beta distributions	2023-07-23 14:01:17 +02:00
NunoSempere	95afb7ea1a	add tests for normal & beta.	2023-07-23 14:00:14 +02:00
NunoSempere	f65699a688	fix floats.h bug, fix std bug, add tests for std.	2023-07-23 13:17:40 +02:00
NunoSempere	6e228dcc6b	replace all floats (32 bits) with doubles (64 bits) to fix bug after switching xorshift32 => xorshift64	2023-07-23 13:02:56 +02:00