Compare commits

..

No commits in common. "d3af874403141eda80e8d6da18f807a127ec4313" and "32b8a1fe1ab977b6f33761f42cf53c3d7c2fcea2" have entirely different histories.

61 changed files with 549 additions and 1000888 deletions

View File

@ -46,6 +46,7 @@ This library provides some basic building blocks. The recommended strategy is to
### Cdf auxiliary functions
To help with the above core strategy, this library provides convenience functions, which take a cdf, and return a sample from the distribution produced by that cdf. This might make it easier to program models, at the cost of a 20x to 60x slowdown in the parts of the code that use it.
### Nested functions and compilation with tcc.
@ -59,6 +60,29 @@ GCC has an extension which allows a program to define a function inside another
My recommendation would be to use tcc while drawing a small number of samples for fast iteration, and then using gcc for the final version with lots of samples, and possibly with nested functions for ease of reading by others.
### Error propagation vs exiting on error
The process of taking a cdf and returning a sample might fail, e.g., it's a Newton method which might fail to converge because of cdf artifacts. The cdf itself might also fail, e.g., if a distribution only accepts a range of parameters, but is fed parameters outside that range.
This library provides two approaches:
1. Print the line and function in which the error occured, then exit on error
2. In situations where there might be an error, return a struct containing either the correct value or an error message:
```C
struct box {
int empty;
double content;
char* error_msg;
};
```
The first approach produces terser programs but might not scale. The second approach seems like it could lead to more robust programmes, but is more verbose.
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.
### Guarantees and licensing
- I offer no guarantees about stability, correctness, performance, etc. I might, for instance, abandon the version in C and rewrite it in Zig, Nim or Rust.
@ -248,49 +272,6 @@ make tidy
It emits one warning about something I already took care of, so by default I've suppressed it. I think this is good news in terms of making me more confident that this simple library is correct :).
### Division between core functions and extraneous expansions
This library differentiates between core functions, which are pretty tightly scoped, and expansions and convenience functions, which are more meandering. Expansions are in `extra.c` and `extra.h`. To use them, take care to link them:
```
// In your C source file
#include "extra.h"
```
```
# When compiling:
gcc -std=c99 -Wall -O3 example.c squiggle.c extra.c -lm -o ./example
```
#### Extra: Cdf auxiliary functions
I provide some Take a cdf, and return a sample from the distribution produced by that cdf. This might make it easier to program models, at the cost of a 20x to 60x slowdown in the parts of the code that use it.
#### Extra: Error propagation vs exiting on error
The process of taking a cdf and returning a sample might fail, e.g., it's a Newton method which might fail to converge because of cdf artifacts. The cdf itself might also fail, e.g., if a distribution only accepts a range of parameters, but is fed parameters outside that range.
This library provides two approaches:
1. Print the line and function in which the error occured, then exit on error
2. In situations where there might be an error, return a struct containing either the correct value or an error message:
```C
struct box {
int empty;
double content;
char* error_msg;
};
```
The first approach produces terser programs but might not scale. The second approach seems like it could lead to more robust programmes, but is more verbose.
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.
## Related projects
- [Squiggle](https://www.squiggle-language.com/)
@ -301,8 +282,7 @@ Overall, I'd describe the error handling capabilities of this library as pretty
## To do list
- [ ] Document paralellism
- [ ] Document confidence intervals
- [ ] Think about whether to write a simple version of this for [uxn](https://100r.co/site/uxn.html), a minimalistic portable programming stack which, sadly, doesn't have doubles (64 bit floats)
- [ ] Point out that, even though the C standard is ambiguous about this, this code assumes that doubles are 64 bit precision (otherwise the xorshift should be different).
- [ ] Document rudimentary algebra manipulations for normal/lognormal
- [ ] Think through whether to delete cdf => samples function
@ -311,11 +291,11 @@ Overall, I'd describe the error handling capabilities of this library as pretty
- complexify and use boxes for everything
- leave as is
- [ ] Systematize references
- [ ] Publish online
- [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>
- [ ] do so efficiently
- [ ] Add more functions to do algebra and get the 90% c.i. of normals, lognormals, betas, etc.
- Think through which of these make sense.
- [ ] Disambiguate sample_laplace--successes vs failures || successes vs total trials as two distinct and differently named functions
## Done
@ -374,5 +354,5 @@ Overall, I'd describe the error handling capabilities of this library as pretty
- [ ] Consider desirability of defining shortcuts for those functions. Adds a level of magic, though.
- [ ] Test results
- [x] Move to own file? Or signpost in file? => signposted in file.
- [x] Write twitter thread: now [here](https://twitter.com/NunoSempere/status/1707041153210564959); retweets appreciated.
- [ ] ~~Think about whether to write a simple version of this for [uxn](https://100r.co/site/uxn.html), a minimalistic portable programming stack which, sadly, doesn't have doubles (64 bit floats)~~
- [ ] Disambiguate sample_laplace--successes vs failures || successes vs total trials as two distinct and differently named functions
- [ ] Write twitter thread.

Binary file not shown.

View File

@ -9,7 +9,6 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
// ...
free(seed);
}

Binary file not shown.

View File

@ -1,7 +1,7 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
// Estimate functions
double sample_0(uint64_t* seed)
@ -24,8 +24,7 @@ double sample_many(uint64_t* seed)
return sample_to(2, 10, seed);
}
int main()
{
int main(){
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0

View File

@ -1,10 +1,9 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include "../../squiggle.h"
int main()
{
int main(){
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
@ -15,24 +14,25 @@ int main()
int n_dists = 4;
double sample_0(uint64_t * seed) { return 0; }
double sample_1(uint64_t * seed) { return 1; }
double sample_few(uint64_t * seed) { return sample_to(1, 3, seed); }
double sample_many(uint64_t * seed) { return sample_to(2, 10, seed); }
double sample_0(uint64_t* seed){ return 0; }
double sample_1(uint64_t* seed) { return 1; }
double sample_few(uint64_t* seed){ return sample_to(1, 3, seed); }
double sample_many(uint64_t* seed){ return sample_to(2, 10, seed); }
double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
int n_samples = 1000000;
double* result_many = (double*)malloc(n_samples * sizeof(double));
for (int i = 0; i < n_samples; i++) {
double* result_many = (double *) malloc(n_samples * sizeof(double));
for(int i=0; i<n_samples; i++){
result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
}
printf("result_many: [");
for (int i = 0; i < 100; i++) {
for(int i=0; i<100; i++){
printf("%.2f, ", result_many[i]);
}
printf("]\n");
free(seed);
}

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -9,13 +9,12 @@ CC=gcc # required for nested functions
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
SRC=example.c ../../squiggle.c
OUTPUT=./example
## Dependencies
OPENMP=-fopenmp
MATH=-lm
DEPENDENCIES=$(MATH) $(OPENMP)
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -9,7 +9,7 @@
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
SRC=example.c ../../squiggle.c
OUTPUT=./example
## Dependencies

Binary file not shown.

View File

@ -42,3 +42,4 @@ int main()
free(seed);
}

Binary file not shown.

View File

@ -1,12 +1,10 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
double beta_1_2_sampler(uint64_t* seed)
{
double beta_1_2_sampler(uint64_t* seed){
return sample_beta(1, 2.0, seed);
}
@ -16,7 +14,7 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
ci beta_1_2_ci_90 = get_90_confidence_interval(beta_1_2_sampler, seed);
struct c_i beta_1_2_ci_90 = get_90_confidence_interval(beta_1_2_sampler, seed);
printf("90%% confidence interval of beta(1,2) is [%f, %f]\n", beta_1_2_ci_90.low, beta_1_2_ci_90.high);
free(seed);

View File

@ -9,7 +9,7 @@ CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies

Binary file not shown.

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -9,7 +9,7 @@ CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies

Binary file not shown.

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -34,7 +33,7 @@ int main()
result[i] = sample_minutes_per_day_jumping_rope_needed_to_burn_10kg(seed);
}
printf("## How many minutes per day do I have to jump rope to lose 10kg of fat by the end of the year?\n");
printf("How many minutes per day do I have to jump rope to lose 10kg of fat by the end of the year?\n");
printf("Mean: %f\n", array_mean(result, n));
printf("A few samples: [ ");
for (int i = 0; i < 9; i++) {
@ -42,8 +41,8 @@ int main()
}
printf("... ]\n");
ci ci_90 = get_90_confidence_interval(sample_minutes_per_day_jumping_rope_needed_to_burn_10kg, seed);
printf("90%% confidence interval: [%f, %f]\n", ci_90.low, ci_90.high);
struct c_i c_i_90 = get_90_confidence_interval(sample_minutes_per_day_jumping_rope_needed_to_burn_10kg, seed);
printf("90%% confidence interval: [%f, %f]\n", c_i_90.low, c_i_90.high);
free(seed);
}

View File

@ -5,25 +5,22 @@
# make run
# Compiler
CC=gcc # required for nested functions
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
@ -31,14 +28,13 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

Binary file not shown.

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -14,31 +13,27 @@ double yearly_probability_nuclear_collapse(double year, uint64_t* seed)
// to get a probability,
// rather than sampling from a distribution over probabilities.
}
double yearly_probability_nuclear_collapse_2023(uint64_t* seed)
{
double yearly_probability_nuclear_collapse_2023(uint64_t* seed){
return yearly_probability_nuclear_collapse(2023, seed);
}
double yearly_probability_nuclear_collapse_after_recovery(double year, double rebuilding_period_length_years, uint64_t* seed)
{
double yearly_probability_nuclear_collapse_after_recovery(double year, double rebuilding_period_length_years, uint64_t* seed){
// assumption: nuclear
double successes = 1.0;
double failures = (year - rebuilding_period_length_years - 1960 - 1);
return sample_laplace(successes, failures, seed);
}
double yearly_probability_nuclear_collapse_after_recovery_example(uint64_t* seed)
{
double yearly_probability_nuclear_collapse_after_recovery_example(uint64_t* seed){
double year = 2070;
double rebuilding_period_length_years = 30;
double rebuilding_period_length_years =30;
// So, there was a nuclear collapse in 2040,
// then a recovery period of 30 years
// and it's now 2070
return yearly_probability_nuclear_collapse_after_recovery(year, rebuilding_period_length_years, seed);
}
double yearly_probability_nuclear_collapse_after_recovery_antiinductive(uint64_t* seed)
{
return yearly_probability_nuclear_collapse(2023, seed) / 2;
double yearly_probability_nuclear_collapse_after_recovery_antiinductive(uint64_t* seed){
return yearly_probability_nuclear_collapse(2023, seed)/2;
}
int main()

View File

@ -5,25 +5,22 @@
# make run
# Compiler
CC=gcc # required for nested functions
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
@ -31,14 +28,13 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

Binary file not shown.

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>

View File

@ -5,25 +5,22 @@
# make run
# Compiler
CC=gcc # required for nested functions
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
@ -31,14 +28,13 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

View File

@ -1,5 +1,4 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
@ -17,13 +16,13 @@ int main()
printf("The 90%% confidence interval of Lognormal(%f, %f) is [%f, %f]\n",
ln1.logmean, ln1.logstd,
ln1_ci.low, ln1_ci.high);
lognormal_params ln1_params2 = convert_ci_to_lognormal_params(ln1_ci);
lognormal_params ln1_ci_paramas = convert_ci_to_lognormal_params(ln1_ci);
printf("The lognormal which has 90%% confidence interval [%f, %f] is Lognormal(%f, %f)\n",
ln1_ci.low, ln1_ci.high,
ln1_params2.logmean, ln1_params2.logstd);
ln1.logmean, ln1.logstd);
lognormal_params ln2 = convert_ci_to_lognormal_params((ci) { .low = 1, .high = 10 });
lognormal_params ln3 = convert_ci_to_lognormal_params((ci) { .low = 5, .high = 50 });
lognormal_params ln2 = convert_ci_to_lognormal_params((ci){.low = 1, .high = 10});
lognormal_params ln3 = convert_ci_to_lognormal_params((ci){.low = 5, .high = 50});
lognormal_params sln = algebra_product_lognormals(ln2, ln3);
ci sln_ci = convert_lognormal_params_to_ci(sln);

View File

@ -5,25 +5,22 @@
# make run
# Compiler
CC=gcc # required for nested functions
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
@ -31,14 +28,13 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

View File

@ -1,14 +1,13 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#define ln lognormal_params
#define to(...) convert_ci_to_lognormal_params((ci)__VA_ARGS__)
#define from(...) convert_lognormal_params_to_ci((ln)__VA_ARGS__)
#define times(a, b) algebra_product_lognormals(a, b)
#define to(...) convert_ci_to_lognormal_params((ci) __VA_ARGS__)
#define from(...) convert_lognormal_params_to_ci((ln) __VA_ARGS__)
#define times(a,b) algebra_product_lognormals(a,b)
int main()
{
@ -16,8 +15,8 @@ int main()
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
ln a = to({ .low = 1, .high = 10 });
ln b = to({ .low = 5, .high = 500 });
ln a = to({.low = 1, .high = 10});
ln b = to({.low = 5, .high = 500});
ln c = times(a, b);
printf("Result: to(%f, %f)\n", from(c).low, from(c).high);

View File

@ -5,25 +5,22 @@
# make run
# Compiler
CC=gcc # required for nested functions
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
@ -31,14 +28,13 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo

View File

@ -1,26 +1,21 @@
#include "../../squiggle.h"
#include "../../squiggle_more.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
double sample_0(uint64_t* seed)
{
double sample_0(uint64_t* seed){
return 0;
}
double sample_1(uint64_t* seed)
{
double sample_1(uint64_t* seed){
return 1;
}
double sample_normal_mean_1_std_2(uint64_t* seed)
{
double sample_normal_mean_1_std_2(uint64_t* seed){
return sample_normal(1, 2, seed);
}
double sample_1_to_3(uint64_t* seed)
{
double sample_1_to_3(uint64_t* seed){
return sample_to(1, 3, seed);
}

View File

@ -5,25 +5,22 @@
# make run
# Compiler
CC=gcc # required for nested functions
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c ../../squiggle_more.c
OUTPUT=./example
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
DEPENDENCIES=$(MATH)
# OPENMP=-fopenmp
## Flags
DEBUG= #'-g'
STANDARD=-std=c99 ## gnu99 allows for nested functions.
EXTENSIONS= #-fnested-functions
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3#-Ofast
CFLAGS=$(DEBUG) $(STANDARD) $(EXTENSIONS) $(WARNINGS) $(OPTIMIZED)
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
@ -31,8 +28,7 @@ FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
# gcc -std=gnu99 example.c -lm -o example
$(CC) $(CFLAGS) $(SRC) $(DEPENDENCIES) -o $(OUTPUT)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
@ -44,7 +40,7 @@ time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do ./$(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling

View File

@ -1,7 +0,0 @@
library(ggplot2)
data <- read.csv("samples.txt", header = FALSE)
data <- as.data.frame(data)
ggplot(data = data, aes(x = V1)) +
geom_bar()
ggplot(data = data, aes(x = V1)) +
geom_freqpoly()

View File

@ -1,3 +0,0 @@
build:
format:

View File

@ -1,12 +0,0 @@
options:
- use gnuplot
- requires aggregation
- do aggregation in its own gnuplot language
- use something like awk
- use some other language, like python or R, specifically to do the aggregation
- do the aggregation in C
- use ggplot
- requires setting up bins correctly
- also can't pipe to an r script directly, sadly enough.
- use python for plotting?
- just present the confidence intervals in C?

View File

@ -1,7 +0,0 @@
library(ggplot2)
data <- read.csv("samples.txt", header = FALSE)
data <- as.data.frame(data)
ggplot(data = data, aes(x = V1)) +
geom_freqpoly()

File diff suppressed because it is too large Load Diff

View File

@ -1,50 +0,0 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
double sample_0(uint64_t* seed)
{
return 0;
}
double sample_1(uint64_t* seed)
{
return 1;
}
double sample_few(uint64_t* seed)
{
return sample_to(1, 3, seed);
}
double sample_many(uint64_t* seed)
{
return sample_to(2, 10, seed);
}
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
double p_a = 0.8;
double p_b = 0.5;
double p_c = p_a * p_b;
int n_dists = 4;
double weights[] = { 1 - p_c, p_c / 2, p_c / 4, p_c / 4 };
double (*samplers[])(uint64_t*) = { sample_0, sample_1, sample_few, sample_many };
int n_samples = 1000000;
double* result_many = (double*)malloc(n_samples * sizeof(double));
for (int i = 0; i < n_samples; i++) {
result_many[i] = sample_mixture(samplers, weights, n_dists, seed);
printf("%f\n", result_many[i]);
}
// printf("Mean: %f\n", array_mean(result_many, n_samples));
free(seed);
}

View File

@ -1,53 +0,0 @@
# Interface:
# make
# make build
# make format
# make run
# Compiler
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do ./$(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling
profile-linux:
echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
echo "Must be run as sudo"
$(CC) $(SRC) $(MATH) -o $(OUTPUT)
sudo perf record ./$(OUTPUT)
sudo perf report
rm perf.data

View File

@ -1,18 +0,0 @@
#include "../../squiggle.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Estimate functions
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
for (int i = 0; i < 100; i++) {
double sample = sample_lognormal(0, 10, seed);
printf("%f\n", sample);
}
free(seed);
}

View File

@ -1,53 +0,0 @@
# Interface:
# make
# make build
# make format
# make run
# Compiler
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling
profile-linux:
echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
echo "Must be run as sudo"
$(CC) $(SRC) $(MATH) -o $(OUTPUT)
sudo perf record ./$(OUTPUT)
sudo perf report
rm perf.data

View File

@ -1 +0,0 @@
./example | sort -h

View File

@ -4,7 +4,7 @@ MAKEFLAGS += --no-print-directory
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
build-examples:
all:
cd examples/01_one_sample && make && echo
cd examples/02_many_samples_time_to_botec && make && echo
cd examples/03_gcc_nested_function && make && echo
@ -12,33 +12,6 @@ build-examples:
cd examples/05_sample_from_cdf_beta && make && echo
cd examples/06_gamma_beta && make && echo
cd examples/07_ci_beta && make && echo
cd examples/08_nuclear_war && make && echo
cd examples/09_burn_10kg_fat && make && echo
cd examples/10_nuclear_recovery && make && echo
cd examples/11_algebra && make && echo
cd examples/12_algebra_and_conversion && make && echo
cd examples/13_ergonomic_algebra && make && echo
cd examples/14_twitter_thread_example && make && echo
cd examples/15_plotting-scratchpad && make && echo
cd examples/16_100_lognormal_samples && make && echo
format-examples:
cd examples/01_one_sample && make format && echo
cd examples/02_many_samples_time_to_botec && make format && echo
cd examples/03_gcc_nested_function && make format && echo
cd examples/04_sample_from_cdf_simple && make format && echo
cd examples/05_sample_from_cdf_beta && make format && echo
cd examples/06_gamma_beta && make format && echo
cd examples/07_ci_beta && make format && echo
cd examples/08_nuclear_war && make format && echo
cd examples/09_burn_10kg_fat && make format && echo
cd examples/10_nuclear_recovery && make format && echo
cd examples/11_algebra && make format && echo
cd examples/12_algebra_and_conversion && make format && echo
cd examples/13_ergonomic_algebra && make format && echo
cd examples/14_twitter_thread_example && make format && echo
cd examples/15_plotting-scratchpad && make format && echo
cd examples/16_100_lognormal_samples && make format && echo
format: squiggle.c squiggle.h
$(FORMATTER) squiggle.c
@ -46,4 +19,3 @@ format: squiggle.c squiggle.h
lint:
clang-tidy squiggle.c -- -lm
clang-tidy extra.c -- -lm

Binary file not shown.

View File

@ -9,14 +9,12 @@ int main()
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with a seed of 0
/*
for (int i = 0; i < 100; i++) {
double draw = sample_unit_uniform(seed);
printf("%f\n", draw);
}*/
// Test division
printf("\n%d\n", 10 % 3);
}
free(seed);
}

View File

@ -1,11 +1,23 @@
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
// Some error niceties; these won't be used until later
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
// math constants
#define PI 3.14159265358979323846 // M_PI in gcc gnu99
#define NORMAL90CONFIDENCE 1.6448536269514727
#define NORMAL90CONFIDENCE 1.6448536269514722
// # Key functionality
// Define the minimum number of functions needed to do simple estimation
// Starts here, ends until the end of the mixture function
// Pseudo Random number generator
uint64_t xorshift32(uint32_t* seed)
@ -73,7 +85,6 @@ inline double sample_normal_from_90_confidence_interval(double low, double high,
// 1. We know that the 90% confidence interval of the unit normal is
// [-1.6448536269514722, 1.6448536269514722]
// see e.g.: https://stackoverflow.com/questions/20626994/how-to-calculate-the-inverse-of-the-normal-cumulative-distribution-function-in-p
// or https://www.wolframalpha.com/input?i=N%5BInverseCDF%28normal%280%2C1%29%2C+0.05%29%2C%7B%E2%88%9E%2C100%7D%5D
// 2. So if we take a unit normal and multiply it by
// L / 1.6448536269514722, its new 90% confidence interval will be
// [-L, L], i.e., length 2 * L
@ -223,3 +234,292 @@ double sample_mixture(double (*samplers[])(uint64_t*), double* weights, int n_di
free(cumsummed_normalized_weights);
return result;
}
// # More cool stuff
// This is no longer necessary to do basic estimation,
// but is still cool
// ## Sample from an arbitrary cdf
struct box {
int empty;
double content;
char* error_msg;
};
struct box process_error(const char* error_msg, int should_exit, char* file, int line)
{
if (should_exit) {
printf("@, in %s (%d)", file, line);
exit(1);
} else {
char error_msg[MAX_ERROR_LENGTH];
snprintf(error_msg, MAX_ERROR_LENGTH, "@, in %s (%d)", file, line); // NOLINT: We are being carefull here by considering MAX_ERROR_LENGTH explicitly.
struct box error = { .empty = 1, .error_msg = error_msg };
return error;
}
}
// Inverse cdf at point
// Two versions of this function:
// - raw, dealing with cdfs that return doubles
// - input: cdf: double => double, p
// - output: Box(number|error)
// - box, dealing with cdfs that return a box.
// - input: cdf: double => Box(number|error), p
// - output: Box(number|error)
struct box inverse_cdf_double(double cdf(double), double p)
{
// given a cdf: [-Inf, Inf] => [0,1]
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
int low_condition = (cdf(low) < p);
int high_condition = (p < cdf(high));
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
// if ((width < 1e-8) || mid_not_new){
if (mid_not_new) {
convergence_condition = 1;
} else {
double mid_sign = cdf(mid) - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
struct box inverse_cdf_box(struct box cdf_box(double), double p)
{
// given a cdf: [-Inf, Inf] => Box([0,1])
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
struct box cdf_low = cdf_box(low);
if (cdf_low.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
struct box cdf_high = cdf_box(high);
if (cdf_high.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
int low_condition = (cdf_low.content < p);
int high_condition = (p < cdf_high.content);
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
if (mid_not_new) {
// if ((width < 1e-8) || mid_not_new){
convergence_condition = 1;
} else {
struct box cdf_mid = cdf_box(mid);
if (cdf_mid.empty) {
return PROCESS_ERROR(cdf_mid.error_msg);
}
double mid_sign = cdf_mid.content - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
// Sampler based on inverse cdf and randomness function
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
return result;
}
struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_double(cdf, p);
return result;
}
/* Could also define other variations, e.g.,
double sampler_danger(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
if(result.empty){
exit(1);
}else{
return result.content;
}
}
*/
// Get confidence intervals, given a sampler
typedef struct ci_t {
float low;
float high;
} ci;
int compare_doubles(const void* p, const void* q)
{
// https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
double x = *(const double*)p;
double y = *(const double*)q;
/* Avoid return x - y, which can cause undefined behaviour
because of signed integer overflow. */
if (x < y)
return -1; // Return -1 if you want ascending, 1 if you want descending order.
else if (x > y)
return 1; // Return 1 if you want ascending, -1 if you want descending order.
return 0;
}
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed)
{
int n = 100 * 1000;
double* samples_array = malloc(n * sizeof(double));
for (int i = 0; i < n; i++) {
samples_array[i] = sampler(seed);
}
qsort(samples_array, n, sizeof(double), compare_doubles);
ci result = {
.low = samples_array[5000],
.high = samples_array[94999],
};
free(samples_array);
return result;
}
// # Small algebra manipulations
// here I discover named structs,
// which mean that I don't have to be typing
// struct blah all the time.
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b)
{
normal_params result = {
.mean = a.mean + b.mean,
.std = sqrt((a.std * a.std) + (b.std * b.std)),
};
return result;
}
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b)
{
lognormal_params result = {
.logmean = a.logmean + b.logmean,
.logstd = sqrt((a.logstd * a.logstd) + (b.logstd * b.logstd)),
};
return result;
}
lognormal_params convert_ci_to_lognormal_params(ci x)
{
double loghigh = logf(x.high);
double loglow = logf(x.low);
double logmean = (loghigh + loglow) / 2.0;
double logstd = (loghigh - loglow) / (2.0 * NORMAL90CONFIDENCE);
lognormal_params result = { .logmean = logmean, .logstd = logstd };
return result;
}
ci convert_lognormal_params_to_ci(lognormal_params y)
{
double h = y.logstd * NORMAL90CONFIDENCE;
double loghigh = y.logmean + h;
double loglow = y.logmean - h;
ci result = { .low = exp(loglow), .high = exp(loghigh) };
return result;
}

View File

@ -1,5 +1,5 @@
#ifndef SQUIGGLE_C_CORE
#define SQUIGGLE_C_CORE
#ifndef SQUIGGLEC
#define SQUIGGLEC
// uint64_t header
#include <stdint.h>
@ -30,4 +30,52 @@ double array_std(double* array, int length);
// Mixture function
double sample_mixture(double (*samplers[])(uint64_t*), double* weights, int n_dists, uint64_t* seed);
// Box
struct box {
int empty;
double content;
char* error_msg;
};
// Macros to handle errors
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
struct box process_error(const char* error_msg, int should_exit, char* file, int line);
// Inverse cdf
struct box inverse_cdf_double(double cdf(double), double p);
struct box inverse_cdf_box(struct box cdf_box(double), double p);
// Samplers from cdf
struct box sampler_cdf_double(double cdf(double), uint64_t* seed);
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed);
// Get 90% confidence interval
typedef struct ci_t {
float low;
float high;
} ci;
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed);
// small algebra manipulations
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b);
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b);
lognormal_params convert_ci_to_lognormal_params(ci x);
ci convert_lognormal_params_to_ci(lognormal_params y);
#endif

View File

@ -1,339 +0,0 @@
#include <float.h>
#include <math.h>
#include <limits.h>
#include <omp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "squiggle.h"
// math constants
#define PI 3.14159265358979323846 // M_PI in gcc gnu99
#define NORMAL90CONFIDENCE 1.6448536269514727
// Some error niceties; these won't be used until later
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
// Get confidence intervals, given a sampler
// Not in core yet because I'm not sure how much I like the struct
// and the built-in 100k samples
// to do: add n to function parameters and document
typedef struct ci_t {
float low;
float high;
} ci;
int compare_doubles(const void* p, const void* q)
{
// https://wikiless.esmailelbob.xyz/wiki/Qsort?lang=en
double x = *(const double*)p;
double y = *(const double*)q;
/* Avoid returning x - y, which can cause undefined behaviour
because of signed integer overflow. */
if (x < y)
return -1; // Return -1 if you want ascending, 1 if you want descending order.
else if (x > y)
return 1; // Return 1 if you want ascending, -1 if you want descending order.
return 0;
}
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed)
{
int n = 100 * 1000;
double* samples_array = malloc(n * sizeof(double));
for (int i = 0; i < n; i++) {
samples_array[i] = sampler(seed);
}
qsort(samples_array, n, sizeof(double), compare_doubles);
ci result = {
.low = samples_array[5000],
.high = samples_array[94999],
};
free(samples_array);
return result;
}
// ## Sample from an arbitrary cdf
struct box {
int empty;
double content;
char* error_msg;
};
struct box process_error(const char* error_msg, int should_exit, char* file, int line)
{
if (should_exit) {
printf("@, in %s (%d)", file, line);
exit(1);
} else {
char error_msg[MAX_ERROR_LENGTH];
snprintf(error_msg, MAX_ERROR_LENGTH, "@, in %s (%d)", file, line); // NOLINT: We are being carefull here by considering MAX_ERROR_LENGTH explicitly.
struct box error = { .empty = 1, .error_msg = error_msg };
return error;
}
}
// Inverse cdf at point
// Two versions of this function:
// - raw, dealing with cdfs that return doubles
// - input: cdf: double => double, p
// - output: Box(number|error)
// - box, dealing with cdfs that return a box.
// - input: cdf: double => Box(number|error), p
// - output: Box(number|error)
struct box inverse_cdf_double(double cdf(double), double p)
{
// given a cdf: [-Inf, Inf] => [0,1]
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
int low_condition = (cdf(low) < p);
int high_condition = (p < cdf(high));
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
// if ((width < 1e-8) || mid_not_new){
if (mid_not_new) {
convergence_condition = 1;
} else {
double mid_sign = cdf(mid) - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
struct box inverse_cdf_box(struct box cdf_box(double), double p)
{
// given a cdf: [-Inf, Inf] => Box([0,1])
// returns a box with either
// x such that cdf(x) = p
// or an error
// if EXIT_ON_ERROR is set to 1, it exits instead of providing an error
double low = -1.0;
double high = 1.0;
// 1. Make sure that cdf(low) < p < cdf(high)
int interval_found = 0;
while ((!interval_found) && (low > -FLT_MAX / 4) && (high < FLT_MAX / 4)) {
// ^ Using FLT_MIN and FLT_MAX is overkill
// but it's also the *correct* thing to do.
struct box cdf_low = cdf_box(low);
if (cdf_low.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
struct box cdf_high = cdf_box(high);
if (cdf_high.empty) {
return PROCESS_ERROR(cdf_low.error_msg);
}
int low_condition = (cdf_low.content < p);
int high_condition = (p < cdf_high.content);
if (low_condition && high_condition) {
interval_found = 1;
} else if (!low_condition) {
low = low * 2;
} else if (!high_condition) {
high = high * 2;
}
}
if (!interval_found) {
return PROCESS_ERROR("Interval containing the target value not found, in function inverse_cdf");
} else {
int convergence_condition = 0;
int count = 0;
while (!convergence_condition && (count < (INT_MAX / 2))) {
double mid = (high + low) / 2;
int mid_not_new = (mid == low) || (mid == high);
// double width = high - low;
if (mid_not_new) {
// if ((width < 1e-8) || mid_not_new){
convergence_condition = 1;
} else {
struct box cdf_mid = cdf_box(mid);
if (cdf_mid.empty) {
return PROCESS_ERROR(cdf_mid.error_msg);
}
double mid_sign = cdf_mid.content - p;
if (mid_sign < 0) {
low = mid;
} else if (mid_sign > 0) {
high = mid;
} else if (mid_sign == 0) {
low = mid;
high = mid;
}
}
}
if (convergence_condition) {
struct box result = { .empty = 0, .content = low };
return result;
} else {
return PROCESS_ERROR("Search process did not converge, in function inverse_cdf");
}
}
}
// Sampler based on inverse cdf and randomness function
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
return result;
}
struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_double(cdf, p);
return result;
}
/* Could also define other variations, e.g.,
double sampler_danger(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
if(result.empty){
exit(1);
}else{
return result.content;
}
}
*/
// # Small algebra manipulations
// here I discover named structs,
// which mean that I don't have to be typing
// struct blah all the time.
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b)
{
normal_params result = {
.mean = a.mean + b.mean,
.std = sqrt((a.std * a.std) + (b.std * b.std)),
};
return result;
}
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b)
{
lognormal_params result = {
.logmean = a.logmean + b.logmean,
.logstd = sqrt((a.logstd * a.logstd) + (b.logstd * b.logstd)),
};
return result;
}
lognormal_params convert_ci_to_lognormal_params(ci x)
{
double loghigh = logf(x.high);
double loglow = logf(x.low);
double logmean = (loghigh + loglow) / 2.0;
double logstd = (loghigh - loglow) / (2.0 * NORMAL90CONFIDENCE);
lognormal_params result = { .logmean = logmean, .logstd = logstd };
return result;
}
ci convert_lognormal_params_to_ci(lognormal_params y)
{
double h = y.logstd * NORMAL90CONFIDENCE;
double loghigh = y.logmean + h;
double loglow = y.logmean - h;
ci result = { .low = exp(loglow), .high = exp(loghigh) };
return result;
}
// Paralellism
/*
void paralellize(float (*sampler)(uint64_t* seed), float* results, int n_threads, int n_samples){
if((n_samples % n_threads) != 0){
fprintf(stderr, "Number of samples isn't divisible by number of threads, aborting\n");
exit(1);
}
uint64_t** seeds = malloc(n_threads * sizeof(uint64_t*));
for (uint64_t i = 0; i < n_threads; i++) {
seeds[i] = malloc(sizeof(uint64_t));
*seeds[i] = i + 1; // xorshift can't start with 0
}
int i;
#pragma omp parallel private(i)
{
#pragma omp for
for (i = 0; i < n_threads; i++) {
int lower_bound = i * (n_samples / n_threads);
int upper_bound = ((i+1) * (n_samples / n_threads)) - 1;
// printf("Lower bound: %d, upper bound: %d\n", lower_bound, upper_bound);
for (int j = lower_bound; j < upper_bound; j++) {
results[j] = sampler(seeds[i]);
}
}
}
for (uint64_t i = 0; i < n_threads; i++) {
free(seeds[i]);
}
free(seeds);
}
*/

View File

@ -1,49 +0,0 @@
#ifndef SQUIGGLE_C_EXTRA
#define SQUIGGLE_C_EXTRA
// Box
struct box {
int empty;
double content;
char* error_msg;
};
// Macros to handle errors
#define MAX_ERROR_LENGTH 500
#define EXIT_ON_ERROR 0
#define PROCESS_ERROR(error_msg) process_error(error_msg, EXIT_ON_ERROR, __FILE__, __LINE__)
struct box process_error(const char* error_msg, int should_exit, char* file, int line);
// Inverse cdf
struct box inverse_cdf_double(double cdf(double), double p);
struct box inverse_cdf_box(struct box cdf_box(double), double p);
// Samplers from cdf
struct box sampler_cdf_double(double cdf(double), uint64_t* seed);
struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed);
// Get 90% confidence interval
typedef struct ci_t {
float low;
float high;
} ci;
ci get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed);
// small algebra manipulations
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b);
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b);
lognormal_params convert_ci_to_lognormal_params(ci x);
ci convert_lognormal_params_to_ci(lognormal_params y);
#endif

BIN
test/test

Binary file not shown.