try adding link in readme

mark to-dos as done.
add serious model, add template.
2023-07-24 00:39:17 +02:00 · 2023-07-24 00:38:28 +02:00 · 2023-07-24 00:37:45 +02:00 · 2023-07-24 00:15:05 +02:00 · 2023-07-24 00:14:44 +02:00 · 2023-07-23 23:54:15 +02:00
10 changed files with 309 additions and 38 deletions
--- a/README.md
+++ b/README.md
@ -20,12 +20,14 @@ A self-contained C99 library that provides a subset of [Squiggle](https://www.sq
 You can follow some example usage in the examples/ folder
-1. In the first example, we define a small model, and draw one sample from it
+1. In the [1st example](src/branch/master/examples/01_one_sample/example.c), we define a small model, and draw one sample from it
-2. In the second example, we define a small model, and return many samples
+2. In the 2nd example, we define a small model, and return many samples
-3. In the third example, we use a gcc extension—nested functions—to rewrite the code from point 2. in a more linear way.
+3. In the 3rd example, we use a gcc extension—nested functions—to rewrite the code from point 2. in a more linear way.
-4. In the fourth example, we define some simple cdfs, and we draw samples from those cdfs. We see that this approach is slower than using the built-in samplers, e.g., the normal sampler.
+4. In the 4th example, we define some simple cdfs, and we draw samples from those cdfs. We see that this approach is slower than using the built-in samplers, e.g., the normal sampler.
-5. In the fifth example, we define the cdf for the beta distribution, and we draw samples from it. 
+5. In the 5th example, we define the cdf for the beta distribution, and we draw samples from it. 
-6. In the sixth example, we take samples from simple gamma and beta distributions, using the samplers provided by this library.
+6. In the 6th example, we take samples from simple gamma and beta distributions, using the samplers provided by this library.
 7. In the 7th example, we get the 90% confidence interval of a beta distribution
 8. The 8th example translates the models from Eli and Nuño from [Samotsvety Nuclear Risk Forecasts — March 2022](https://forum.nunosempere.com/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#Nu_o_Sempere) into squiggle.c, then creates a mixture from both, and returns the mean probability of death per month and the 90% confidence interval.
 ## Commentary
@ -78,17 +80,18 @@ The first approach produces terser programs but might not scale. The second appr
 Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
-## Design choices
+### Design choices
 This code should be correct, then simple, then fast.
 - It should be correct. The user should be able to rely on it and not think about whether errors come from the library.
  - Nonetheless, the user should understand the limitations of sampling-based methods. See the section on [Tests and the long tail of the lognormal](https://git.nunosempere.com/personal/squiggle.c#tests-and-the-long-tail-of-the-lognormal) for a discussion of how sampling is bad at capturing some aspects of distributions with long tails.
 - It should be clear, conceptually simple. Simple for me to implement, simple for others to understand
 - It should be fast. But when speed conflicts with simplicity, choose simplicity. For example, there might be several possible algorithms to sample a distribution, each of which is faster over part of the domain. In that case, it's conceptually simpler to just pick one algorithm, and pay the—normally small—performance penalty. In any case, though, the code should still be *way faster* than Python.
 Note that being terse, or avoiding verbosity, is a non-goal. This is in part because of the constraints that C imposes. But it also aids with clarity and conceptual simplicity, as the issue of correlated samples illustrates in the next section.
-## Correlated samples
+### Correlated samples
 In the original [squiggle](https://www.squiggle-language.com/) language, there is some ambiguity about what this code means:
@ -243,17 +246,19 @@ Std of to(10.000000, 10000.000000): 23578.091775, vs expected std: 25836.381819
 delta: -2258.290043, relative delta: -0.095779
 ```
 Overall, I would caution that if you really care about the very far tails of distributions, you might want to instead use tools which can do some of the analytical manipulations for you, like the original Squiggle, Simple Squiggle (both linked below), or even doing lognormal multiplication by hand, relying on the fact that two lognormals multiplied together result in another lognormal with known shape.
 ## Related projects
 - [Squiggle](https://www.squiggle-language.com/)
 - [SquigglePy](https://github.com/rethinkpriorities/squigglepy)
 - [Simple Squiggle](https://nunosempere.com/blog/2022/04/17/simple-squiggle/)
 - [time to botec](https://github.com/NunoSempere/time-to-botec)
 - [beta]() 
 ## To do list
- [ ] Have some more complicated & realistic example
+- [ ] Link to the examples in the examples section.
 - [ ] Add summarization functions: 90% ci (or all c.i.?) 
 - [ ] Systematize references
 - [ ] Publish online
 - [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>
@ -265,7 +270,7 @@ delta: -2258.290043, relative delta: -0.095779
 - [x] Add example for many samples
 - [ ] ~~Add a custom preprocessor to allow simple nested functions that don't rely on local scope?~~
 - [x] Use gcc extension to define functions nested inside main.
- [x] Chain various sample_mixture functions
+- [x] Chain various `sample_mixture` functions
 - [x] Add beta distribution
  - See <https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution> for a faster method.
 - [ ] ~~Use OpenMP for acceleration~~
@ -299,3 +304,6 @@ delta: -2258.290043, relative delta: -0.095779
  - https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution
  - https://github.com/numpy/numpy/blob/5cae51e794d69dd553104099305e9f92db237c53/numpy/random/src/distributions/distributions.c
 - [x] Pontificate about lognormal tests
 - [x] Give warning about sampling-based methods.
 - [x] Have some more complicated & realistic example
 - [x] Add summarization functions: 90% ci (or all c.i.?) 
--- a/examples/00_example_template/example.c
+++ b/examples/00_example_template/example.c
@ -0,0 +1,14 @@
 #include "../../squiggle.h"
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 int main()
 {
    // set randomness seed
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with 0
    free(seed);
 }
--- a/examples/00_example_template/makefile
+++ b/examples/00_example_template/makefile
@ -0,0 +1,53 @@
 # Interface: 
 #   make
 #   make build
 #   make format
 #   make run
 # Compiler
 CC=gcc
 # CC=tcc # <= faster compilation
 # Main file
 SRC=example.c ../../squiggle.c
 OUTPUT=example
 ## Dependencies
 MATH=-lm
 ## Flags
 DEBUG= #'-g'
 STANDARD=-std=c99
 WARNINGS=-Wall
 OPTIMIZED=-O3  #-Ofast
 # OPENMP=-fopenmp
 ## Formatter
 STYLE_BLUEPRINT=webkit
 FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
 ## make build
 build: $(SRC)
 	$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
 format: $(SRC)
 	$(FORMATTER) $(SRC)
 run: $(SRC) $(OUTPUT)
 	OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
 time-linux: 
 	@echo "Requires /bin/time, found on GNU/Linux systems" && echo
 	@echo "Running 100x and taking avg time $(OUTPUT)"
 	@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
 ## Profiling
 profile-linux: 
 	echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
 	echo "Must be run as sudo"
 	$(CC) $(SRC) $(MATH) -o $(OUTPUT)
 	sudo perf record ./$(OUTPUT)
 	sudo perf report
 	rm perf.data
--- a/examples/08_nuclear_war/example
+++ b/examples/08_nuclear_war/example
--- a/examples/08_nuclear_war/example.c
+++ b/examples/08_nuclear_war/example.c
@ -0,0 +1,68 @@
 #include "../../squiggle.h"
 #include <math.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 double probability_of_dying_nuno(uint64_t* seed)
 {
    double first_year_russian_nuclear_weapons = 1953;
    double current_year = 2022;
    double laplace_probability_nuclear_exchange_year = sample_beta(1, current_year - first_year_russian_nuclear_weapons + 1, seed);
    double laplace_probability_nuclear_exchange_month = 1 - pow(1 - laplace_probability_nuclear_exchange_year, (1.0 / 12.0));
    double london_hit_conditional_on_russia_nuclear_weapon_usage = sample_beta(7.67, 69.65, seed);
    // I.e., a beta distribution with a range of 0.05 to 0.16 into: https://nunosempere.com/blog/2023/03/15/fit-beta/
    // 0.05 were my estimate and Samotsvety's estimate in March 2022, respectively:
    // https://forum.effectivealtruism.org/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#Nu_o_Sempere
    double informed_actor_not_able_to_escape = sample_beta(3.26212166586967, 3.26228162008564, seed);
    // 0.2 to 0.8, i.e., 20% to 80%, again using the previous tool
    double proportion_which_die_if_bomb_drops_in_london = sample_beta(10.00, 2.45, seed); // 60% to 95%
    double probability_of_dying = laplace_probability_nuclear_exchange_month * london_hit_conditional_on_russia_nuclear_weapon_usage * informed_actor_not_able_to_escape * proportion_which_die_if_bomb_drops_in_london;
    return probability_of_dying;
 }
 double probability_of_dying_eli(uint64_t* seed)
 {
    double russia_nato_nuclear_exchange_in_next_month = sample_beta(1.30, 1182.99, seed); // .0001 to .003
    double london_hit_conditional = sample_beta(3.47, 8.97, seed); // 0.1 to 0.5
    double informed_actors_not_able_to_escape = sample_beta(2.73, 5.67, seed); // .1 to .6
    double proportion_which_die_if_bomb_drops_in_london = sample_beta(3.00, 1.46, seed); // 0.3 to 0.95;
    double probability_of_dying = russia_nato_nuclear_exchange_in_next_month * london_hit_conditional * informed_actors_not_able_to_escape * proportion_which_die_if_bomb_drops_in_london;
    return probability_of_dying;
 }
 double mixture(uint64_t* seed)
 {
    double (*samplers[])(uint64_t*) = { probability_of_dying_nuno, probability_of_dying_eli };
    double weights[] = { 0.5, 0.5 };
    return sample_mixture(samplers, weights, 2, seed);
 }
 int main()
 {
    // set randomness seed
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with 0
    int n = 1000 * 1000;
    double* mixture_result = malloc(sizeof(double) * n);
    for (int i = 0; i < n; i++) {
        mixture_result[i] = mixture(seed);
    }
    printf("mixture_result: [ ");
    for (int i = 0; i < 9; i++) {
        printf("%.6f, ", mixture_result[i]);
    }
    printf("... ]\n");
    struct c_i c_i_90 = get_90_confidence_interval(mixture, seed);
    printf("mean: %f\n", array_mean(mixture_result, n));
    printf("90%% confidence interval: [%f, %f]\n", c_i_90.low, c_i_90.high);
    free(seed);
 }
--- a/examples/08_nuclear_war/makefile
+++ b/examples/08_nuclear_war/makefile
@ -0,0 +1,53 @@
 # Interface: 
 #   make
 #   make build
 #   make format
 #   make run
 # Compiler
 CC=gcc
 # CC=tcc # <= faster compilation
 # Main file
 SRC=example.c ../../squiggle.c
 OUTPUT=example
 ## Dependencies
 MATH=-lm
 ## Flags
 DEBUG= #'-g'
 STANDARD=-std=c99
 WARNINGS=-Wall
 OPTIMIZED=-O3  #-Ofast
 # OPENMP=-fopenmp
 ## Formatter
 STYLE_BLUEPRINT=webkit
 FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
 ## make build
 build: $(SRC)
 	$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
 format: $(SRC)
 	$(FORMATTER) $(SRC)
 run: $(SRC) $(OUTPUT)
 	OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
 time-linux: 
 	@echo "Requires /bin/time, found on GNU/Linux systems" && echo
 	@echo "Running 100x and taking avg time $(OUTPUT)"
 	@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
 ## Profiling
 profile-linux: 
 	echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
 	echo "Must be run as sudo"
 	$(CC) $(SRC) $(MATH) -o $(OUTPUT)
 	sudo perf record ./$(OUTPUT)
 	sudo perf report
 	rm perf.data
--- a/examples/08_nuclear_war/scratchpad/example
+++ b/examples/08_nuclear_war/scratchpad/example
--- a/examples/08_nuclear_war/scratchpad/example.c
+++ b/examples/08_nuclear_war/scratchpad/example.c
@ -0,0 +1,20 @@
 #include "../../../squiggle.h"
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 int main()
 {
    // set randomness seed
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with 0
 		double firstYearRussianNuclearWeapons = 1953;
 		double	currentYear = 2023;
 		for(int i=0; i<10; i++){
 			double laplace_beta = sample_beta(currentYear-firstYearRussianNuclearWeapons + 1, 1, seed);
 			printf("%f\n", laplace_beta);
 		}
    free(seed);
 }
--- a/examples/08_nuclear_war/scratchpad/makefile
+++ b/examples/08_nuclear_war/scratchpad/makefile
@ -0,0 +1,53 @@
 # Interface: 
 #   make
 #   make build
 #   make format
 #   make run
 # Compiler
 CC=gcc
 # CC=tcc # <= faster compilation
 # Main file
 SRC=example.c ../../../squiggle.c
 OUTPUT=example
 ## Dependencies
 MATH=-lm
 ## Flags
 DEBUG= #'-g'
 STANDARD=-std=c99
 WARNINGS=-Wall
 OPTIMIZED=-O3  #-Ofast
 # OPENMP=-fopenmp
 ## Formatter
 STYLE_BLUEPRINT=webkit
 FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
 ## make build
 build: $(SRC)
 	$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
 format: $(SRC)
 	$(FORMATTER) $(SRC)
 run: $(SRC) $(OUTPUT)
 	OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
 time-linux: 
 	@echo "Requires /bin/time, found on GNU/Linux systems" && echo
 	@echo "Running 100x and taking avg time $(OUTPUT)"
 	@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
 ## Profiling
 profile-linux: 
 	echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
 	echo "Must be run as sudo"
 	$(CC) $(SRC) $(MATH) -o $(OUTPUT)
 	sudo perf record ./$(OUTPUT)
 	sudo perf report
 	rm perf.data
--- a/squiggle.c
+++ b/squiggle.c
@ -94,12 +94,12 @@ double sample_gamma(double alpha, uint64_t* seed)
    // A Simple Method for Generating Gamma Variables, Marsaglia and Wan Tsang, 2001
    // https://dl.acm.org/doi/pdf/10.1145/358407.358414
    // see also the references/ folder
-		// Note that the Wikipedia page for the gamma distribution includes a scaling parameter
+    // Note that the Wikipedia page for the gamma distribution includes a scaling parameter
-		// k or beta
+    // k or beta
-		// https://en.wikipedia.org/wiki/Gamma_distribution
+    // https://en.wikipedia.org/wiki/Gamma_distribution
-		// such that gamma_k(alpha, k) = k * gamma(alpha)
+    // such that gamma_k(alpha, k) = k * gamma(alpha)
-		// or gamma_beta(alpha, beta) = gamma(alpha) / beta
+    // or gamma_beta(alpha, beta) = gamma(alpha) / beta
-		// So far I have not needed to use this, and thus the second parameter is by default 1.
+    // So far I have not needed to use this, and thus the second parameter is by default 1.
    if (alpha >= 1) {
        double d, c, x, v, u;
        d = alpha - 1.0 / 3.0;
@ -386,38 +386,40 @@ struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
 // Get confidence intervals, given a sampler
 struct c_i {
-	float low;
+    float low;
-	float high;
+    float high;
 };
-int compare_doubles(const void *p, const void *q) {
+int compare_doubles(const void* p, const void* q)
-		// https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
+{
-    double x = *(const double *)p;
+    // https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
-    double y = *(const double *)q;
+    double x = *(const double*)p;
    double y = *(const double*)q;
    /* Avoid return x - y, which can cause undefined behaviour
       because of signed integer overflow. */
    if (x < y)
-        return -1;  // Return -1 if you want ascending, 1 if you want descending order. 
+        return -1; // Return -1 if you want ascending, 1 if you want descending order.
    else if (x > y)
-        return 1;   // Return 1 if you want ascending, -1 if you want descending order.
+        return 1; // Return 1 if you want ascending, -1 if you want descending order.
    return 0;
 }
-struct c_i get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed){
+struct c_i get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed)
-	int n = 100 * 1000;
+{
-	double* samples_array = malloc(n * sizeof(double));
+    int n = 100 * 1000;
-	for(int i=0; i<n; i++){
+    double* samples_array = malloc(n * sizeof(double));
-		samples_array[i] = sampler(seed);
+    for (int i = 0; i < n; i++) {
-	}
+        samples_array[i] = sampler(seed);
-	qsort(samples_array, n, sizeof(double), compare_doubles);
+    }
    qsort(samples_array, n, sizeof(double), compare_doubles);
-	struct c_i result = {
+    struct c_i result = {
-		.low = samples_array[5000], 
+        .low = samples_array[5000],
-		.high =samples_array[94999],
+        .high = samples_array[94999],
-	};
+    };
-	free(samples_array);
+    free(samples_array);
-	return result;
+    return result;
 }
 /* Could also define other variations, e.g.,
Author	SHA1	Message	Date
NunoSempere	c29b3b4559	try adding link in readme	2023-07-24 00:39:17 +02:00
NunoSempere	f6562e9f65	mark to-dos as done.	2023-07-24 00:38:28 +02:00
NunoSempere	9c19095955	add serious model, add template.	2023-07-24 00:37:45 +02:00
NunoSempere	84399e60a2	formatting pass	2023-07-24 00:15:05 +02:00
NunoSempere	ef40ef5ae7	change the nuclear probability to monthly	2023-07-24 00:14:44 +02:00
NunoSempere	b76a630d58	formatting pass nuclear probabilities	2023-07-23 23:54:15 +02:00
NunoSempere	8b32d2d9b6	start adding nuclear probabilities	2023-07-23 23:53:57 +02:00
NunoSempere	f8f8bc8de0	fix readme headings	2023-07-23 23:29:57 +02:00