Compare commits

...

10 Commits

61 changed files with 1000889 additions and 550 deletions

View File

@ -46,7 +46,6 @@ This library provides some basic building blocks. The recommended strategy is to
### Cdf auxiliary functions
To help with the above core strategy, this library provides convenience functions, which take a cdf, and return a sample from the distribution produced by that cdf. This might make it easier to program models, at the cost of a 20x to 60x slowdown in the parts of the code that use it.
### Nested functions and compilation with tcc.
@ -60,29 +59,6 @@ GCC has an extension which allows a program to define a function inside another
My recommendation would be to use tcc while drawing a small number of samples for fast iteration, and then using gcc for the final version with lots of samples, and possibly with nested functions for ease of reading by others.
### Error propagation vs exiting on error
The process of taking a cdf and returning a sample might fail, e.g., it's a Newton method which might fail to converge because of cdf artifacts. The cdf itself might also fail, e.g., if a distribution only accepts a range of parameters, but is fed parameters outside that range.
This library provides two approaches:
1. Print the line and function in which the error occured, then exit on error
2. In situations where there might be an error, return a struct containing either the correct value or an error message:
```C
struct box {
int empty;
double content;
char* error_msg;
};
```
The first approach produces terser programs but might not scale. The second approach seems like it could lead to more robust programmes, but is more verbose.
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.
### Guarantees and licensing
- I offer no guarantees about stability, correctness, performance, etc. I might, for instance, abandon the version in C and rewrite it in Zig, Nim or Rust.
@ -272,6 +248,49 @@ make tidy
It emits one warning about something I already took care of, so by default I've suppressed it. I think this is good news in terms of making me more confident that this simple library is correct :).
### Division between core functions and extraneous expansions
This library differentiates between core functions, which are pretty tightly scoped, and expansions and convenience functions, which are more meandering. Expansions are in `extra.c` and `extra.h`. To use them, take care to link them:
```
// In your C source file
#include "extra.h"
```
```
# When compiling:
gcc -std=c99 -Wall -O3 example.c squiggle.c extra.c -lm -o ./example
```
#### Extra: Cdf auxiliary functions
I provide some Take a cdf, and return a sample from the distribution produced by that cdf. This might make it easier to program models, at the cost of a 20x to 60x slowdown in the parts of the code that use it.
#### Extra: Error propagation vs exiting on error
The process of taking a cdf and returning a sample might fail, e.g., it's a Newton method which might fail to converge because of cdf artifacts. The cdf itself might also fail, e.g., if a distribution only accepts a range of parameters, but is fed parameters outside that range.
This library provides two approaches:
1. Print the line and function in which the error occured, then exit on error
2. In situations where there might be an error, return a struct containing either the correct value or an error message:
```C
struct box {
int empty;
double content;
char* error_msg;
};
```
The first approach produces terser programs but might not scale. The second approach seems like it could lead to more robust programmes, but is more verbose.
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.
## Related projects
- [Squiggle](https://www.squiggle-language.com/)
@ -282,7 +301,8 @@ It emits one warning about something I already took care of, so by default I've
## To do list
- [ ] Think about whether to write a simple version of this for [uxn](https://100r.co/site/uxn.html), a minimalistic portable programming stack which, sadly, doesn't have doubles (64 bit floats)
- [ ] Document paralellism
- [ ] Document confidence intervals
- [ ] Point out that, even though the C standard is ambiguous about this, this code assumes that doubles are 64 bit precision (otherwise the xorshift should be different).
- [ ] Document rudimentary algebra manipulations for normal/lognormal
- [ ] Think through whether to delete cdf => samples function
@ -291,11 +311,11 @@ It emits one warning about something I already took care of, so by default I've
- complexify and use boxes for everything
- leave as is
- [ ] Systematize references
- [ ] Publish online
- [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>
- [ ] do so efficiently
- [ ] Add more functions to do algebra and get the 90% c.i. of normals, lognormals, betas, etc.
- Think through which of these make sense.
- [ ] Disambiguate sample_laplace--successes vs failures || successes vs total trials as two distinct and differently named functions
## Done
@ -354,5 +374,5 @@ It emits one warning about something I already took care of, so by default I've
- [ ] Consider desirability of defining shortcuts for those functions. Adds a level of magic, though.
- [ ] Test results
- [x] Move to own file? Or signpost in file? => signposted in file.
- [ ] Disambiguate sample_laplace--successes vs failures || successes vs total trials as two distinct and differently named functions
- [ ] Write twitter thread.
- [x] Write twitter thread: now [here](https://twitter.com/NunoSempere/status/1707041153210564959); retweets appreciated.
- [ ] ~~Think about whether to write a simple version of this for [uxn](https://100r.co/site/uxn.html), a minimalistic portable programming stack which, sadly, doesn't have doubles (64 bit floats)~~

Binary file not shown.

View File

@ -9,6 +9,7 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
// ...
free(seed);
}

Binary file not shown.

View File

@ -1,7 +1,7 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
double sample_0(uint64_t* seed)
@ -24,10 +24,11 @@ double sample_many(uint64_t* seed)
return sample_to(2, 10, seed);
}
int main(){
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
double p_a = 0.8;
double p_b = 0.5;
@ -38,6 +39,6 @@ int main(){
double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
double result_one = sample_mixture(samplers, weights, n_dists, seed);
printf("result_one: %f\n", result_one);
free(seed);
printf("result_one: %f\n", result_one);
free(seed);
}

View File

@ -1,38 +1,38 @@
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
double p_a = 0.8;
double p_b = 0.5;
double p_c = p_a * p_b;
int n_dists = 4;
double sample_0(uint64_t* seed){ return 0; }
double sample_1(uint64_t* seed) { return 1; }
double sample_few(uint64_t* seed){ return sample_to(1, 3, seed); }
double sample_many(uint64_t* seed){ return sample_to(2, 10, seed); }
double sample_0(uint64_t * seed) { return 0; }
double sample_1(uint64_t * seed) { return 1; }
double sample_few(uint64_t * seed) { return sample_to(1, 3, seed); }
double sample_many(uint64_t * seed) { return sample_to(2, 10, seed); }
double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
int n_samples = 1000000;
double* result_many = (double *) malloc(n_samples * sizeof(double));
for(int i=0; i<n_samples; i++){
result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
}
printf("result_many: [");
for(int i=0; i<100; i++){
printf("%.2f, ", result_many[i]);
}
printf("]\n");
free(seed);
int n_samples = 1000000;
double* result_many = (double*)malloc(n_samples * sizeof(double));
for (int i = 0; i < n_samples; i++) {
result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
}
printf("result_many: [");
for (int i = 0; i < 100; i++) {
printf("%.2f, ", result_many[i]);
}
printf("]\n");
free(seed);
}

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -87,7 +88,7 @@ int main()
printf("\nGetting some samples from sample_unit_normal\n");
clock_t begin_2 = clock();
double* normal_samples = malloc(NUM_SAMPLES * sizeof(double));
double* normal_samples = malloc(NUM_SAMPLES * sizeof(double));
for (int i = 0; i < NUM_SAMPLES; i++) {
normal_samples[i] = sample_unit_normal(seed);
// printf("%f\n", normal_sample);

View File

@ -9,12 +9,13 @@ CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
OPENMP=-fopenmp
MATH=-lm
DEPENDENCIES=$(MATH)
DEPENDENCIES=$(MATH) $(OPENMP)
# OPENMP=-fopenmp
## Flags

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -9,7 +9,7 @@
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies

Binary file not shown.

View File

@ -11,8 +11,8 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
int n = 1000 * 1000;
/*
int n = 1000 * 1000;
/*
for (int i = 0; i < n; i++) {
double gamma_0 = sample_gamma(0.0, seed);
// printf("sample_gamma(0.0): %f\n", gamma_0);
@ -20,26 +20,25 @@ int main()
printf("\n");
*/
double* gamma_1_array = malloc(sizeof(double) * n);
double* gamma_1_array = malloc(sizeof(double) * n);
for (int i = 0; i < n; i++) {
double gamma_1 = sample_gamma(1.0, seed);
// printf("sample_gamma(1.0): %f\n", gamma_1);
gamma_1_array[i] = gamma_1;
gamma_1_array[i] = gamma_1;
}
printf("gamma(1) summary statistics = mean: %f, std: %f\n", array_mean(gamma_1_array, n), array_std(gamma_1_array, n));
free(gamma_1_array);
printf("\n");
double* beta_1_2_array = malloc(sizeof(double) * n);
for (int i = 0; i < n; i++) {
printf("gamma(1) summary statistics = mean: %f, std: %f\n", array_mean(gamma_1_array, n), array_std(gamma_1_array, n));
free(gamma_1_array);
printf("\n");
double* beta_1_2_array = malloc(sizeof(double) * n);
for (int i = 0; i < n; i++) {
double beta_1_2 = sample_beta(1, 2.0, seed);
// printf("sample_beta(1.0, 2.0): %f\n", beta_1_2);
beta_1_2_array[i] = beta_1_2;
beta_1_2_array[i] = beta_1_2;
}
printf("beta(1,2) summary statistics: mean: %f, std: %f\n", array_mean(beta_1_2_array, n), array_std(beta_1_2_array, n));
free(beta_1_2_array);
printf("\n");
free(seed);
}
printf("beta(1,2) summary statistics: mean: %f, std: %f\n", array_mean(beta_1_2_array, n), array_std(beta_1_2_array, n));
free(beta_1_2_array);
printf("\n");
free(seed);
}

Binary file not shown.

View File

@ -1,11 +1,13 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
double beta_1_2_sampler(uint64_t* seed){
return sample_beta(1, 2.0, seed);
double beta_1_2_sampler(uint64_t* seed)
{
return sample_beta(1, 2.0, seed);
}
int main()
@ -14,8 +16,8 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
struct c_i beta_1_2_ci_90 = get_90_confidence_interval(beta_1_2_sampler, seed);
printf("90%% confidence interval of beta(1,2) is [%f, %f]\n", beta_1_2_ci_90.low, beta_1_2_ci_90.high);
free(seed);
ci beta_1_2_ci_90 = get_90_confidence_interval(beta_1_2_sampler, seed);
printf("90%% confidence interval of beta(1,2) is [%f, %f]\n", beta_1_2_ci_90.low, beta_1_2_ci_90.high);
free(seed);
}

View File

@ -9,7 +9,7 @@ CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=example
## Dependencies

Binary file not shown.

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -9,7 +9,7 @@ CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=example
## Dependencies

Binary file not shown.

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -33,7 +34,7 @@ int main()
result[i] = sample_minutes_per_day_jumping_rope_needed_to_burn_10kg(seed);
}
printf("How many minutes per day do I have to jump rope to lose 10kg of fat by the end of the year?\n");
printf("## How many minutes per day do I have to jump rope to lose 10kg of fat by the end of the year?\n");
printf("Mean: %f\n", array_mean(result, n));
printf("A few samples: [ ");
for (int i = 0; i < 9; i++) {
@ -41,8 +42,8 @@ int main()
}
printf("... ]\n");
struct c_i c_i_90 = get_90_confidence_interval(sample_minutes_per_day_jumping_rope_needed_to_burn_10kg, seed);
printf("90%% confidence interval: [%f, %f]\n", c_i_90.low, c_i_90.high);
ci ci_90 = get_90_confidence_interval(sample_minutes_per_day_jumping_rope_needed_to_burn_10kg, seed);
printf("90%% confidence interval: [%f, %f]\n", ci_90.low, ci_90.high);
free(seed);
}

View File

@ -5,22 +5,25 @@
# make run
# Compiler
CC=gcc
CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
## Formatter
STYLE_BLUEPRINT=webkit
@ -28,13 +31,14 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

Binary file not shown.

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -6,34 +7,38 @@
double yearly_probability_nuclear_collapse(double year, uint64_t* seed)
{
double successes = 0;
double failures = (year - 1960);
return sample_laplace(successes, failures, seed);
// ^ can change to (successes + 1)/(trials + 2)
// to get a probability,
// rather than sampling from a distribution over probabilities.
double successes = 0;
double failures = (year - 1960);
return sample_laplace(successes, failures, seed);
// ^ can change to (successes + 1)/(trials + 2)
// to get a probability,
// rather than sampling from a distribution over probabilities.
}
double yearly_probability_nuclear_collapse_2023(uint64_t* seed){
return yearly_probability_nuclear_collapse(2023, seed);
double yearly_probability_nuclear_collapse_2023(uint64_t* seed)
{
return yearly_probability_nuclear_collapse(2023, seed);
}
double yearly_probability_nuclear_collapse_after_recovery(double year, double rebuilding_period_length_years, uint64_t* seed){
// assumption: nuclear
double successes = 1.0;
double failures = (year - rebuilding_period_length_years - 1960 - 1);
return sample_laplace(successes, failures, seed);
double yearly_probability_nuclear_collapse_after_recovery(double year, double rebuilding_period_length_years, uint64_t* seed)
{
// assumption: nuclear
double successes = 1.0;
double failures = (year - rebuilding_period_length_years - 1960 - 1);
return sample_laplace(successes, failures, seed);
}
double yearly_probability_nuclear_collapse_after_recovery_example(uint64_t* seed){
double year = 2070;
double rebuilding_period_length_years =30;
// So, there was a nuclear collapse in 2040,
// then a recovery period of 30 years
// and it's now 2070
return yearly_probability_nuclear_collapse_after_recovery(year, rebuilding_period_length_years, seed);
double yearly_probability_nuclear_collapse_after_recovery_example(uint64_t* seed)
{
double year = 2070;
double rebuilding_period_length_years = 30;
// So, there was a nuclear collapse in 2040,
// then a recovery period of 30 years
// and it's now 2070
return yearly_probability_nuclear_collapse_after_recovery(year, rebuilding_period_length_years, seed);
}
double yearly_probability_nuclear_collapse_after_recovery_antiinductive(uint64_t* seed){
return yearly_probability_nuclear_collapse(2023, seed)/2;
double yearly_probability_nuclear_collapse_after_recovery_antiinductive(uint64_t* seed)
{
return yearly_probability_nuclear_collapse(2023, seed) / 2;
}
int main()
@ -44,8 +49,8 @@ int main()
int num_samples = 1000000;
// Before a first nuclear collapse
printf("## Before the first nuclear collapse\n");
// Before a first nuclear collapse
printf("## Before the first nuclear collapse\n");
ci ci_90_2023 = get_90_confidence_interval(yearly_probability_nuclear_collapse_2023, seed);
printf("90%% confidence interval: [%f, %f]\n", ci_90_2023.low, ci_90_2023.high);
@ -53,10 +58,10 @@ int main()
for (int i = 0; i < num_samples; i++) {
yearly_probability_nuclear_collapse_2023_samples[i] = yearly_probability_nuclear_collapse_2023(seed);
}
printf("mean: %f\n", array_mean(yearly_probability_nuclear_collapse_2023_samples, num_samples));
printf("mean: %f\n", array_mean(yearly_probability_nuclear_collapse_2023_samples, num_samples));
// After the first nuclear collapse
printf("\n## After the first nuclear collapse\n");
// After the first nuclear collapse
printf("\n## After the first nuclear collapse\n");
ci ci_90_2070 = get_90_confidence_interval(yearly_probability_nuclear_collapse_after_recovery_example, seed);
printf("90%% confidence interval: [%f, %f]\n", ci_90_2070.low, ci_90_2070.high);
@ -64,10 +69,10 @@ int main()
for (int i = 0; i < num_samples; i++) {
yearly_probability_nuclear_collapse_after_recovery_samples[i] = yearly_probability_nuclear_collapse_after_recovery_example(seed);
}
printf("mean: %f\n", array_mean(yearly_probability_nuclear_collapse_after_recovery_samples, num_samples));
printf("mean: %f\n", array_mean(yearly_probability_nuclear_collapse_after_recovery_samples, num_samples));
// After the first nuclear collapse (antiinductive)
printf("\n## After the first nuclear collapse (antiinductive)\n");
// After the first nuclear collapse (antiinductive)
printf("\n## After the first nuclear collapse (antiinductive)\n");
ci ci_90_antiinductive = get_90_confidence_interval(yearly_probability_nuclear_collapse_after_recovery_antiinductive, seed);
printf("90%% confidence interval: [%f, %f]\n", ci_90_antiinductive.low, ci_90_antiinductive.high);
@ -75,7 +80,7 @@ int main()
for (int i = 0; i < num_samples; i++) {
yearly_probability_nuclear_collapse_after_recovery_antiinductive_samples[i] = yearly_probability_nuclear_collapse_after_recovery_antiinductive(seed);
}
printf("mean: %f\n", array_mean(yearly_probability_nuclear_collapse_after_recovery_antiinductive_samples, num_samples));
printf("mean: %f\n", array_mean(yearly_probability_nuclear_collapse_after_recovery_antiinductive_samples, num_samples));
free(seed);
}

View File

@ -5,22 +5,25 @@
# make run
# Compiler
CC=gcc
CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
## Formatter
STYLE_BLUEPRINT=webkit
@ -28,13 +31,14 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

Binary file not shown.

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -5,22 +5,25 @@
# make run
# Compiler
CC=gcc
CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
## Formatter
STYLE_BLUEPRINT=webkit
@ -28,13 +31,14 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

View File

@ -1,4 +1,5 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -9,20 +10,20 @@ int main()
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
// Convert to 90% confidence interval form and back
lognormal_params ln1 = { .logmean = 1.0, .logstd = 3.0 };
ci ln1_ci = convert_lognormal_params_to_ci(ln1);
printf("The 90%% confidence interval of Lognormal(%f, %f) is [%f, %f]\n",
ln1.logmean, ln1.logstd,
ln1_ci.low, ln1_ci.high);
lognormal_params ln1_ci_paramas = convert_ci_to_lognormal_params(ln1_ci);
ln1.logmean, ln1.logstd,
ln1_ci.low, ln1_ci.high);
lognormal_params ln1_params2 = convert_ci_to_lognormal_params(ln1_ci);
printf("The lognormal which has 90%% confidence interval [%f, %f] is Lognormal(%f, %f)\n",
ln1_ci.low, ln1_ci.high,
ln1.logmean, ln1.logstd);
ln1_ci.low, ln1_ci.high,
ln1_params2.logmean, ln1_params2.logstd);
lognormal_params ln2 = convert_ci_to_lognormal_params((ci){.low = 1, .high = 10});
lognormal_params ln3 = convert_ci_to_lognormal_params((ci){.low = 5, .high = 50});
lognormal_params ln2 = convert_ci_to_lognormal_params((ci) { .low = 1, .high = 10 });
lognormal_params ln3 = convert_ci_to_lognormal_params((ci) { .low = 5, .high = 50 });
lognormal_params sln = algebra_product_lognormals(ln2, ln3);
ci sln_ci = convert_lognormal_params_to_ci(sln);

View File

@ -5,22 +5,25 @@
# make run
# Compiler
CC=gcc
CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
## Formatter
STYLE_BLUEPRINT=webkit
@ -28,13 +31,14 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

View File

@ -1,13 +1,14 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#define ln lognormal_params
#define to(...) convert_ci_to_lognormal_params((ci) __VA_ARGS__)
#define from(...) convert_lognormal_params_to_ci((ln) __VA_ARGS__)
#define times(a,b) algebra_product_lognormals(a,b)
#define to(...) convert_ci_to_lognormal_params((ci)__VA_ARGS__)
#define from(...) convert_lognormal_params_to_ci((ln)__VA_ARGS__)
#define times(a, b) algebra_product_lognormals(a, b)
int main()
{
@ -15,8 +16,8 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
ln a = to({.low = 1, .high = 10});
ln b = to({.low = 5, .high = 500});
ln a = to({ .low = 1, .high = 10 });
ln b = to({ .low = 5, .high = 500 });
ln c = times(a, b);
printf("Result: to(%f, %f)\n", from(c).low, from(c).high);

View File

@ -5,22 +5,25 @@
# make run
# Compiler
CC=gcc
CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
## Formatter
STYLE_BLUEPRINT=webkit
@ -28,13 +31,14 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

View File

@ -1,21 +1,26 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
double sample_0(uint64_t* seed){
double sample_0(uint64_t* seed)
{
return 0;
}
double sample_1(uint64_t* seed){
double sample_1(uint64_t* seed)
{
return 1;
}
double sample_normal_mean_1_std_2(uint64_t* seed){
double sample_normal_mean_1_std_2(uint64_t* seed)
{
return sample_normal(1, 2, seed);
}
double sample_1_to_3(uint64_t* seed){
double sample_1_to_3(uint64_t* seed)
{
return sample_to(1, 3, seed);
}
@ -27,11 +32,11 @@ int main()
int n_dists = 4;
double weights[] = { 1, 2, 3, 4 };
double (*samplers[])(uint64_t*) = {
sample_0,
sample_1,
sample_normal_mean_1_std_2,
sample_1_to_3
double (*samplers[])(uint64_t*) = {
sample_0,
sample_1,
sample_normal_mean_1_std_2,
sample_1_to_3
};
int n_samples = 10;

View File

@ -5,22 +5,25 @@
# make run
# Compiler
CC=gcc
CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
## Formatter
STYLE_BLUEPRINT=webkit
@ -28,7 +31,8 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
@ -40,7 +44,7 @@ time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do ./$(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling

View File

@ -0,0 +1,7 @@
library(ggplot2)
data <- read.csv("samples.txt", header = FALSE)
data <- as.data.frame(data)
ggplot(data = data, aes(x = V1)) +
geom_bar()
ggplot(data = data, aes(x = V1)) +
geom_freqpoly()

View File

@ -0,0 +1,3 @@
build:
format:

View File

@ -0,0 +1,12 @@
options:
- use gnuplot
- requires aggregation
- do aggregation in its own gnuplot language
- use something like awk
- use some other language, like python or R, specifically to do the aggregation
- do the aggregation in C
- use ggplot
- requires setting up bins correctly
- also can't pipe to an r script directly, sadly enough.
- use python for plotting?
- just present the confidence intervals in C?

View File

@ -0,0 +1,7 @@
library(ggplot2)
data <- read.csv("samples.txt", header = FALSE)
data <- as.data.frame(data)
ggplot(data = data, aes(x = V1)) +
geom_freqpoly()

File diff suppressed because it is too large Load Diff

Binary file not shown.

View File

@ -0,0 +1,50 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
double sample_0(uint64_t* seed)
{
return 0;
}
double sample_1(uint64_t* seed)
{
return 1;
}
double sample_few(uint64_t* seed)
{
return sample_to(1, 3, seed);
}
double sample_many(uint64_t* seed)
{
return sample_to(2, 10, seed);
}
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
double p_a = 0.8;
double p_b = 0.5;
double p_c = p_a * p_b;
int n_dists = 4;
double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
int n_samples = 1000000;
double* result_many = (double*)malloc(n_samples * sizeof(double));
for (int i = 0; i < n_samples; i++) {
result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
printf("%f\n", result_many[i]);
}
// printf("Mean: %f\n", array_mean(result_many, n_samples));
free(seed);
}

View File

@ -0,0 +1,53 @@
# Interface:
# make
# make build
# make format
# make run
# Compiler
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do ./$(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling
profile-linux:
echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
echo "Must be run as sudo"
$(CC) $(SRC) $(MATH) -o $(OUTPUT)
sudo perf record ./$(OUTPUT)
sudo perf report
rm perf.data

Binary file not shown.

View File

@ -0,0 +1,18 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
for (int i = 0; i < 100; i++) {
double sample = sample_lognormal(0, 10, seed);
printf("%f\n", sample);
}
free(seed);
}

View File

@ -0,0 +1,53 @@
# Interface:
# make
# make build
# make format
# make run
# Compiler
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling
profile-linux:
echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
echo "Must be run as sudo"
$(CC) $(SRC) $(MATH) -o $(OUTPUT)
sudo perf record ./$(OUTPUT)
sudo perf report
rm perf.data

View File

@ -0,0 +1 @@
./example | sort -h

View File

@ -4,7 +4,7 @@ MAKEFLAGS += --no-print-directory
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
all:
build-examples:
cd examples/01_one_sample && make && echo
cd examples/02_many_samples_time_to_botec && make && echo
cd examples/03_gcc_nested_function && make && echo
@ -12,6 +12,33 @@ all:
cd examples/05_sample_from_cdf_beta && make && echo
cd examples/06_gamma_beta && make && echo
cd examples/07_ci_beta && make && echo
cd examples/08_nuclear_war && make && echo
cd examples/09_burn_10kg_fat && make && echo
cd examples/10_nuclear_recovery && make && echo
cd examples/11_algebra && make && echo
cd examples/12_algebra_and_conversion && make && echo
cd examples/13_ergonomic_algebra && make && echo
cd examples/14_twitter_thread_example && make && echo
cd examples/15_plotting-scratchpad && make && echo
cd examples/16_100_lognormal_samples && make && echo
format-examples:
cd examples/01_one_sample && make format && echo
cd examples/02_many_samples_time_to_botec && make format && echo
cd examples/03_gcc_nested_function && make format && echo
cd examples/04_sample_from_cdf_simple && make format && echo
cd examples/05_sample_from_cdf_beta && make format && echo
cd examples/06_gamma_beta && make format && echo
cd examples/07_ci_beta && make format && echo
cd examples/08_nuclear_war && make format && echo
cd examples/09_burn_10kg_fat && make format && echo
cd examples/10_nuclear_recovery && make format && echo
cd examples/11_algebra && make format && echo
cd examples/12_algebra_and_conversion && make format && echo
cd examples/13_ergonomic_algebra && make format && echo
cd examples/14_twitter_thread_example && make format && echo
cd examples/15_plotting-scratchpad && make format && echo
cd examples/16_100_lognormal_samples && make format && echo
format: squiggle.c squiggle.h
$(FORMATTER) squiggle.c
@ -19,3 +46,4 @@ format: squiggle.c squiggle.h
lint:
clang-tidy squiggle.c -- -lm
clang-tidy extra.c -- -lm

Binary file not shown.

View File

@ -9,12 +9,14 @@ int main()
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with a seed of 0
/*
for (int i = 0; i < 100; i++) {
double draw = sample_unit_uniform(seed);
printf("%f\n", draw);
}
}*/
// Test division
printf("\n%d\n", 10 % 3);
free(seed);
}

View File

@ -1,23 +1,11 @@
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
// Some error niceties; these won't be used until later
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
// math constants
#define PI 3.14159265358979323846 // M_PI in gcc gnu99
#define NORMAL90CONFIDENCE 1.6448536269514722
// # Key functionality
// Define the minimum number of functions needed to do simple estimation
// Starts here, ends until the end of the mixture function
#define NORMAL90CONFIDENCE 1.6448536269514727
// Pseudo Random number generator
uint64_t xorshift32(uint32_t* seed)
@ -85,6 +73,7 @@ inline double sample_normal_from_90_confidence_interval(double low, double high,
// 1. We know that the 90% confidence interval of the unit normal is
// [-1.6448536269514722, 1.6448536269514722]
// see e.g.: https://stackoverflow.com/questions/20626994/how-to-calculate-the-inverse-of-the-normal-cumulative-distribution-function-in-p
// or https://www.wolframalpha.com/input?i=N%5BInverseCDF%28normal%280%2C1%29%2C+0.05%29%2C%7B%E2%88%9E%2C100%7D%5D
// 2. So if we take a unit normal and multiply it by
// L / 1.6448536269514722, its new 90% confidence interval will be
// [-L, L], i.e., length 2 * L
@ -234,292 +223,3 @@ double sample_mixture(double (*samplers[])(uint64_t*), double* weights, int n_di
free(cumsummed_normalized_weights);
return result;
}
// # More cool stuff
// This is no longer necessary to do basic estimation,
// but is still cool
// ## Sample from an arbitrary cdf
struct box {
int empty;
double content;
char* error_msg;
};
struct box process_error(const char* error_msg, int should_exit, char* file, int line)
{
if (should_exit) {
printf("@, in %s (%d)", file, line);
exit(1);
} else {
char error_msg[MAX_ERROR_LENGTH];
snprintf(error_msg, MAX_ERROR_LENGTH, "@, in %s (%d)", file, line); // NOLINT: We are being carefull here by considering MAX_ERROR_LENGTH explicitly.
struct box error = { .empty = 1, .error_msg = error_msg };
return error;
}
}
// Inverse cdf at point
// Two versions of this function:
// - raw, dealing with cdfs that return doubles
// - input: cdf: double => double, p
// - output: Box(number|error)
// - box, dealing with cdfs that return a box.
// - input: cdf: double => Box(number|error), p
// - output: Box(number|error)
struct box inverse_cdf_double(double cdf(double), double p)
{
// given a cdf: [-Inf, Inf] => [0,1]
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
int low_condition = (cdf(low) < p);
int high_condition = (p < cdf(high));
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
// if ((width < 1e-8) || mid_not_new){
if (mid_not_new) {
convergence_condition = 1;
} else {
double mid_sign = cdf(mid) - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
struct box inverse_cdf_box(struct box cdf_box(double), double p)
{
// given a cdf: [-Inf, Inf] => Box([0,1])
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
struct box cdf_low = cdf_box(low);
if (cdf_low.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
struct box cdf_high = cdf_box(high);
if (cdf_high.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
int low_condition = (cdf_low.content < p);
int high_condition = (p < cdf_high.content);
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
if (mid_not_new) {
// if ((width < 1e-8) || mid_not_new){
convergence_condition = 1;
} else {
struct box cdf_mid = cdf_box(mid);
if (cdf_mid.empty) {
return PROCESS_ERROR(cdf_mid.error_msg);
}
double mid_sign = cdf_mid.content - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
// Sampler based on inverse cdf and randomness function
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
return result;
}
struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_double(cdf, p);
return result;
}
/* Could also define other variations, e.g.,
double sampler_danger(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
if(result.empty){
exit(1);
}else{
return result.content;
}
}
*/
// Get confidence intervals, given a sampler
typedef struct ci_t {
float low;
float high;
} ci;
int compare_doubles(const void* p, const void* q)
{
// https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
double x = *(const double*)p;
double y = *(const double*)q;
/* Avoid return x - y, which can cause undefined behaviour
because of signed integer overflow. */
if (x < y)
return -1; // Return -1 if you want ascending, 1 if you want descending order.
else if (x > y)
return 1; // Return 1 if you want ascending, -1 if you want descending order.
return 0;
}
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed)
{
int n = 100 * 1000;
double* samples_array = malloc(n * sizeof(double));
for (int i = 0; i < n; i++) {
samples_array[i] = sampler(seed);
}
qsort(samples_array, n, sizeof(double), compare_doubles);
ci result = {
.low = samples_array[5000],
.high = samples_array[94999],
};
free(samples_array);
return result;
}
// # Small algebra manipulations
// here I discover named structs,
// which mean that I don't have to be typing
// struct blah all the time.
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b)
{
normal_params result = {
.mean = a.mean + b.mean,
.std = sqrt((a.std * a.std) + (b.std * b.std)),
};
return result;
}
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b)
{
lognormal_params result = {
.logmean = a.logmean + b.logmean,
.logstd = sqrt((a.logstd * a.logstd) + (b.logstd * b.logstd)),
};
return result;
}
lognormal_params convert_ci_to_lognormal_params(ci x)
{
double loghigh = logf(x.high);
double loglow = logf(x.low);
double logmean = (loghigh + loglow) / 2.0;
double logstd = (loghigh - loglow) / (2.0 * NORMAL90CONFIDENCE);
lognormal_params result = { .logmean = logmean, .logstd = logstd };
return result;
}
ci convert_lognormal_params_to_ci(lognormal_params y)
{
double h = y.logstd * NORMAL90CONFIDENCE;
double loghigh = y.logmean + h;
double loglow = y.logmean - h;
ci result = { .low = exp(loglow), .high = exp(loghigh) };
return result;
}

View File

@ -1,5 +1,5 @@
#ifndef SQUIGGLEC
#define SQUIGGLEC
#ifndef SQUIGGLE_C_CORE
#define SQUIGGLE_C_CORE
// uint64_t header
#include <stdint.h>
@ -30,52 +30,4 @@ double array_std(double* array, int length);
// Mixture function
double sample_mixture(double (*samplers[])(uint64_t*), double* weights, int n_dists, uint64_t* seed);
// Box
struct box {
int empty;
double content;
char* error_msg;
};
// Macros to handle errors
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
struct box process_error(const char* error_msg, int should_exit, char* file, int line);
// Inverse cdf
struct box inverse_cdf_double(double cdf(double), double p);
struct box inverse_cdf_box(struct box cdf_box(double), double p);
// Samplers from cdf
struct box sampler_cdf_double(double cdf(double), uint64_t* seed);
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed);
// Get 90% confidence interval
typedef struct ci_t {
float low;
float high;
} ci;
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed);
// small algebra manipulations
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b);
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b);
lognormal_params convert_ci_to_lognormal_params(ci x);
ci convert_lognormal_params_to_ci(lognormal_params y);
#endif

339
squiggle_more.c Normal file
View File

@ -0,0 +1,339 @@
#include <float.h>
#include <math.h>
#include <limits.h>
#include <omp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "squiggle.h"
// math constants
#define PI 3.14159265358979323846 // M_PI in gcc gnu99
#define NORMAL90CONFIDENCE 1.6448536269514727
// Some error niceties; these won't be used until later
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
// Get confidence intervals, given a sampler
// Not in core yet because I'm not sure how much I like the struct
// and the built-in 100k samples
// to do: add n to function parameters and document
typedef struct ci_t {
float low;
float high;
} ci;
int compare_doubles(const void* p, const void* q)
{
// https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
double x = *(const double*)p;
double y = *(const double*)q;
/* Avoid returning x - y, which can cause undefined behaviour
because of signed integer overflow. */
if (x < y)
return -1; // Return -1 if you want ascending, 1 if you want descending order.
else if (x > y)
return 1; // Return 1 if you want ascending, -1 if you want descending order.
return 0;
}
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed)
{
int n = 100 * 1000;
double* samples_array = malloc(n * sizeof(double));
for (int i = 0; i < n; i++) {
samples_array[i] = sampler(seed);
}
qsort(samples_array, n, sizeof(double), compare_doubles);
ci result = {
.low = samples_array[5000],
.high = samples_array[94999],
};
free(samples_array);
return result;
}
// ## Sample from an arbitrary cdf
struct box {
int empty;
double content;
char* error_msg;
};
struct box process_error(const char* error_msg, int should_exit, char* file, int line)
{
if (should_exit) {
printf("@, in %s (%d)", file, line);
exit(1);
} else {
char error_msg[MAX_ERROR_LENGTH];
snprintf(error_msg, MAX_ERROR_LENGTH, "@, in %s (%d)", file, line); // NOLINT: We are being carefull here by considering MAX_ERROR_LENGTH explicitly.
struct box error = { .empty = 1, .error_msg = error_msg };
return error;
}
}
// Inverse cdf at point
// Two versions of this function:
// - raw, dealing with cdfs that return doubles
// - input: cdf: double => double, p
// - output: Box(number|error)
// - box, dealing with cdfs that return a box.
// - input: cdf: double => Box(number|error), p
// - output: Box(number|error)
struct box inverse_cdf_double(double cdf(double), double p)
{
// given a cdf: [-Inf, Inf] => [0,1]
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
int low_condition = (cdf(low) < p);
int high_condition = (p < cdf(high));
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
// if ((width < 1e-8) || mid_not_new){
if (mid_not_new) {
convergence_condition = 1;
} else {
double mid_sign = cdf(mid) - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
struct box inverse_cdf_box(struct box cdf_box(double), double p)
{
// given a cdf: [-Inf, Inf] => Box([0,1])
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
struct box cdf_low = cdf_box(low);
if (cdf_low.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
struct box cdf_high = cdf_box(high);
if (cdf_high.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
int low_condition = (cdf_low.content < p);
int high_condition = (p < cdf_high.content);
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
if (mid_not_new) {
// if ((width < 1e-8) || mid_not_new){
convergence_condition = 1;
} else {
struct box cdf_mid = cdf_box(mid);
if (cdf_mid.empty) {
return PROCESS_ERROR(cdf_mid.error_msg);
}
double mid_sign = cdf_mid.content - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
// Sampler based on inverse cdf and randomness function
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
return result;
}
struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_double(cdf, p);
return result;
}
/* Could also define other variations, e.g.,
double sampler_danger(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
if(result.empty){
exit(1);
}else{
return result.content;
}
}
*/
// # Small algebra manipulations
// here I discover named structs,
// which mean that I don't have to be typing
// struct blah all the time.
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b)
{
normal_params result = {
.mean = a.mean + b.mean,
.std = sqrt((a.std * a.std) + (b.std * b.std)),
};
return result;
}
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b)
{
lognormal_params result = {
.logmean = a.logmean + b.logmean,
.logstd = sqrt((a.logstd * a.logstd) + (b.logstd * b.logstd)),
};
return result;
}
lognormal_params convert_ci_to_lognormal_params(ci x)
{
double loghigh = logf(x.high);
double loglow = logf(x.low);
double logmean = (loghigh + loglow) / 2.0;
double logstd = (loghigh - loglow) / (2.0 * NORMAL90CONFIDENCE);
lognormal_params result = { .logmean = logmean, .logstd = logstd };
return result;
}
ci convert_lognormal_params_to_ci(lognormal_params y)
{
double h = y.logstd * NORMAL90CONFIDENCE;
double loghigh = y.logmean + h;
double loglow = y.logmean - h;
ci result = { .low = exp(loglow), .high = exp(loghigh) };
return result;
}
// Paralellism
/*
void paralellize(float (*sampler)(uint64_t* seed), float* results, int n_threads, int n_samples){
if((n_samples % n_threads) != 0){
fprintf(stderr, "Number of samples isn't divisible by number of threads, aborting\n");
exit(1);
}
uint64_t** seeds = malloc(n_threads * sizeof(uint64_t*));
for (uint64_t i = 0; i < n_threads; i++) {
seeds[i] = malloc(sizeof(uint64_t));
*seeds[i] = i + 1; // xorshift can't start with 0
}
int i;
#pragma omp parallel private(i)
{
#pragma omp for
for (i = 0; i < n_threads; i++) {
int lower_bound = i * (n_samples / n_threads);
int upper_bound = ((i+1) * (n_samples / n_threads)) - 1;
// printf("Lower bound: %d, upper bound: %d\n", lower_bound, upper_bound);
for (int j = lower_bound; j < upper_bound; j++) {
results[j] = sampler(seeds[i]);
}
}
}
for (uint64_t i = 0; i < n_threads; i++) {
free(seeds[i]);
}
free(seeds);
}
*/

49
squiggle_more.h Normal file
View File

@ -0,0 +1,49 @@
#ifndef SQUIGGLE_C_EXTRA
#define SQUIGGLE_C_EXTRA
// Box
struct box {
int empty;
double content;
char* error_msg;
};
// Macros to handle errors
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
struct box process_error(const char* error_msg, int should_exit, char* file, int line);
// Inverse cdf
struct box inverse_cdf_double(double cdf(double), double p);
struct box inverse_cdf_box(struct box cdf_box(double), double p);
// Samplers from cdf
struct box sampler_cdf_double(double cdf(double), uint64_t* seed);
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed);
// Get 90% confidence interval
typedef struct ci_t {
float low;
float high;
} ci;
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed);
// small algebra manipulations
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b);
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b);
lognormal_params convert_ci_to_lognormal_params(ci x);
ci convert_lognormal_params_to_ci(lognormal_params y);
#endif

BIN
test/test

Binary file not shown.