Compare commits

06a0569a02429cf36055cb0d94560960daf38544..3c49d3d2fd78d955d2557a1311250ad0e965da22

No commits in common. "06a0569a02429cf36055cb0d94560960daf38544" and "3c49d3d2fd78d955d2557a1311250ad0e965da22" have entirely different histories.

6 changed files with 22 additions and 176 deletions

README.md

@@ -1,6 +1,6 @@
# squiggle.c
# Squiggle.c
squiggle.c is a self-contained C99 library that provides functions for simple Monte Carlo estimation, based on [Squiggle](https://www.squiggle-language.com/).
A self-contained C99 library that provides functions for simple Monte Carlo estimation, based on [Squiggle](https://www.squiggle-language.com/).
## Why C?
@@ -28,7 +28,6 @@ You can follow some example usage in the examples/ folder
6. In the [6th example](examples/06_gamma_beta/example.c), we take samples from simple gamma and beta distributions, using the samplers provided by this library.
7. In the [7th example](examples/07_ci_beta/example.c), we get the 90% confidence interval of a beta distribution
8. The [8th example](examples/08_nuclear_war/example.c) translates the models from Eli and Nuño from [Samotsvety Nuclear Risk Forecasts — March 2022](https://forum.nunosempere.com/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#Nu_o_Sempere) into squiggle.c, then creates a mixture from both, and returns the mean probability of death per month and the 90% confidence interval.
9. The [9th example](examples/09_burn_10kg_fat/example.c) estimates how many minutes per day I would have to jump rope in order to lose 10kg of fat in half a year.
## Commentary
@@ -50,7 +49,7 @@ To help with the above core strategy, this library provides convenience function
### Nested functions and compilation with tcc.
GCC has an extension which allows a program to define a function inside another function. This makes squiggle.c code more linear and nicer to read, at the cost of becoming dependent on GCC and hence sacrificing portability and increasing compilation times. Conversely, compiling with tcc (tiny c compiler) is almost instantaneous, but leads to longer execution times and doesn't allow for nested functions.
GCC has an extension which allows a program to define a function inside another function. This makes squiggle.c code more linear and nicer to read, at the cost of becoming dependent on GCC and hence sacrificing portability and compilation times. Conversely, compiling with tcc (tiny c compiler) is almost instantaneous, but leads to longer execution times and doesn't allow for nested functions.
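As a rough sketch of what the nested-function style looks like (not taken from the repository; the only library call assumed is `sample_to`, which appears in the example further down this diff):

```c
#include "../../squiggle.h" // include path as used in the examples
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with 0

    // GCC extension: a function defined inside another function.
    // It keeps the model next to the code that uses it, but tcc rejects it.
    double sample_model(uint64_t* s)
    {
        return 2 * sample_to(1, 10, s);
    }

    printf("One sample from the model: %f\n", sample_model(seed));
    free(seed);
}
```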
| GCC | tcc |
| --- | --- |
@@ -81,8 +80,6 @@ The first approach produces terser programs but might not scale. The second appr
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.
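As a rough sketch of the two styles (not the library's own code; the only fields assumed are `empty` and `content`, which appear in the commented-out `sampler_danger` variation further down this diff):

```c
#include "../../squiggle.h" // as in the examples; provides struct box
#include <stdlib.h>

// Unwrap a struct box, choosing between the two strategies described above.
double unwrap_or_exit(struct box result)
{
    if (result.empty)
        exit(1); // "simplify and just abort on error"
    return result.content; // "use boxes": hand the value back and keep sampling
}
```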
### Guarantees and licensing
- I offer no guarantees about stability, correctness, performance, etc. I might, for instance, abandon the version in C and rewrite it in Zig, Nim or Rust.
@@ -98,7 +95,7 @@ This code should aim to be correct, then simple, then fast.
- It should be correct. The user should be able to rely on it and not think about whether errors come from the library.
- Nonetheless, the user should understand the limitations of sampling-based methods. See the section on [Tests and the long tail of the lognormal](https://git.nunosempere.com/personal/squiggle.c#tests-and-the-long-tail-of-the-lognormal) for a discussion of how sampling is bad at capturing some aspects of distributions with long tails.
- It should be clear, conceptually simple. Simple for me to implement, simple for others to understand.
- It should be clear, conceptually simple. Simple for me to implement, simple for others to understand
- It should be fast. But when speed conflicts with simplicity, choose simplicity. For example, there might be several possible algorithms to sample a distribution, each of which is faster over part of the domain. In that case, it's conceptually simpler to just pick one algorithm, and pay the—normally small—performance penalty. In any case, though, the code should still be *way faster* than Python.
Note that being terse, or avoiding verbosity, is a non-goal. This is in part because of the constraints that C imposes. But it also aids with clarity and conceptual simplicity, as the issue of correlated samples illustrates in the next section.
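As a preview of that issue, here is a two-line sketch (not the README's own example; it assumes only the `sample_to` sampler used in the example further down this diff, with `seed` set up as in the examples):

```c
// Reusing one draw makes the two terms move together; drawing twice makes them
// independent. The verbose version leaves no ambiguity about which one you get.
double x = sample_to(1, 10, seed);
double correlated_sum = x + x;
double uncorrelated_sum = sample_to(1, 10, seed) + sample_to(1, 10, seed);
```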
@@ -270,18 +267,10 @@ Overall, I would caution that if you really care about the very far tails of dis
## To do list
- [ ] Document rudimentary algebra manipulations
- [ ] Think through whether to delete cdf => samples function
- [ ] Think through whether to:
- simplify and just abort on error
- complexify and use boxes for everything
- leave as is
- [ ] Systematize references
- [ ] Publish online
- [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>
- [ ] do so efficiently
- [ ] Add more functions to do algebra and get the 90% c.i. of normals, lognormals, betas, etc.
- Think through which of these make sense.
## Done
@@ -327,4 +316,3 @@ Overall, I would caution that if you really care about the very far tails of dis
- [x] Have some more complicated & realistic example
- [x] Add summarization functions: 90% ci (or all c.i.?)
- [x] Link to the examples in the examples section.
- [x] Add a few functions for doing simple algebra on normals, and lognormals?

Binary file not shown.

examples/09_burn_10kg_fat/example.c

@@ -1,48 +0,0 @@
#include "../../squiggle.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
double sample_minutes_per_day_jumping_rope_needed_to_burn_10kg(uint64_t* seed)
{
double kcal_jumping_rope_minute = sample_to(15, 20, seed);
double kcal_jumping_rope_hour = kcal_jumping_rope_minute * 60;
double kcal_in_kg_of_fat = 7700;
double num_kg_of_fat_to_lose = 10;
double hours_jumping_rope_needed = kcal_in_kg_of_fat * num_kg_of_fat_to_lose / kcal_jumping_rope_hour;
double days_until_end_of_year = 152; // as of 2023-08-01
double hours_per_day = hours_jumping_rope_needed / days_until_end_of_year;
double minutes_per_day = hours_per_day * 60;
return minutes_per_day;
}
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
int n = 1000 * 1000;
double* result = malloc(sizeof(double) * n);
for (int i = 0; i < n; i++) {
result[i] = sample_minutes_per_day_jumping_rope_needed_to_burn_10kg(seed);
}
printf("How many minutes per day do I have to jump rope to lose 10kg of fat by the end of the year?\n");
printf("Mean: %f\n", array_mean(result, n));
printf("A few samples: [ ");
for (int i = 0; i < 9; i++) {
printf("%.6f, ", result[i]);
}
printf("... ]\n");
struct c_i c_i_90 = get_90_confidence_interval(sample_minutes_per_day_jumping_rope_needed_to_burn_10kg, seed);
printf("90%% confidence interval: [%f, %f]\n", c_i_90.low, c_i_90.high);
free(result);
free(seed);
}

examples/09_burn_10kg_fat/makefile

@@ -1,53 +0,0 @@
# Interface:
# make
# make build
# make format
# make run
# Compiler
CC=gcc
# CC=tcc # <= faster compilation
# Main file
SRC=example.c ../../squiggle.c
OUTPUT=example
## Dependencies
MATH=-lm
## Flags
DEBUG= #'-g'
STANDARD=-std=c99
WARNINGS=-Wall
OPTIMIZED=-O3 #-Ofast
# OPENMP=-fopenmp
## Formatter
STYLE_BLUEPRINT=webkit
FORMATTER=clang-format -i -style=$(STYLE_BLUEPRINT)
## make build
build: $(SRC)
$(CC) $(OPTIMIZED) $(DEBUG) $(SRC) $(MATH) -o $(OUTPUT)
format: $(SRC)
$(FORMATTER) $(SRC)
run: $(SRC) $(OUTPUT)
OMP_NUM_THREADS=1 ./$(OUTPUT) && echo
time-linux:
@echo "Requires /bin/time, found on GNU/Linux systems" && echo
@echo "Running 100x and taking avg time $(OUTPUT)"
@t=$$(/usr/bin/time -f "%e" -p bash -c 'for i in {1..100}; do $(OUTPUT); done' 2>&1 >/dev/null | grep real | awk '{print $$2}' ); echo "scale=2; 1000 * $$t / 100" | bc | sed "s|^|Time using 1 thread: |" | sed 's|$$|ms|' && echo
## Profiling
profile-linux:
echo "Requires perf, which depends on the kernel version, and might be in linux-tools package or similar"
echo "Must be run as sudo"
$(CC) $(SRC) $(MATH) -o $(OUTPUT)
sudo perf record ./$(OUTPUT)
sudo perf report
rm perf.data

squiggle.c

@@ -20,7 +20,7 @@ uint64_t xorshift32(uint32_t* seed)
// See <https://stackoverflow.com/questions/53886131/how-does-xorshift64-works>
// https://en.wikipedia.org/wiki/Xorshift
// Also some drama: <https://www.pcg-random.org/posts/on-vignas-pcg-critique.html>, <https://prng.di.unimi.it/>
// for floats
uint64_t x = *seed;
x ^= x << 13;
x ^= x >> 17;
@@ -30,7 +30,11 @@ uint64_t xorshift32(uint32_t* seed)
uint64_t xorshift64(uint64_t* seed)
{
// same as above, but for generating doubles
// Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs"
// See <https://stackoverflow.com/questions/53886131/how-does-xorshift64-works>
// https://en.wikipedia.org/wiki/Xorshift
// Also some drama: <https://www.pcg-random.org/posts/on-vignas-pcg-critique.html>, <https://prng.di.unimi.it/>
uint64_t x = *seed;
x ^= x << 13;
x ^= x >> 7;
@@ -379,19 +383,6 @@ struct box sampler_cdf_double(double cdf(double), uint64_t* seed)
return result;
}
/* Could also define other variations, e.g.,
double sampler_danger(struct box cdf(double), uint64_t* seed)
{
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
if(result.empty){
exit(1);
}else{
return result.content;
}
}
*/
// Get confidence intervals, given a sampler
struct c_i {
@@ -431,47 +422,15 @@ struct c_i get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* se
return result;
}
// Do algebra over lognormals and normals
struct normal_parameters {
double mean;
double std;
};
struct lognormal_parameters {
double logmean;
double logstd;
};
struct normal_parameters algebra_sum_normals(struct normal_parameters a, struct normal_parameters b)
/* Could also define other variations, e.g.,
double sampler_danger(struct box cdf(double), uint64_t* seed)
{
struct normal_parameters result = {
.mean = a.mean + b.mean,
.std = sqrt((a.std * a.std) + (b.std * b.std)),
};
return result;
}
struct normal_parameters algebra_shift_normal(struct normal_parameters a, double shift)
{
struct normal_parameters result = {
.mean = a.mean + shift,
.std = a.std,
};
return result;
}
struct lognormal_parameters algebra_product_lognormals(struct lognormal_parameters a, struct lognormal_parameters b)
{
struct lognormal_parameters result = {
.logmean = a.logmean + b.logmean,
.logstd = sqrt((a.logstd * a.logstd) + (b.logstd * b.logstd)),
};
return result;
}
struct lognormal_parameters algebra_scale_lognormal(struct lognormal_parameters a, double k)
{
struct lognormal_parameters result = {
.logmean = a.logmean + k,
.logstd = a.logstd,
};
return result;
double p = sample_unit_uniform(seed);
struct box result = inverse_cdf_box(cdf, p);
if(result.empty){
exit(1);
}else{
return result.content;
}
}
*/
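/* A minimal usage sketch for the algebra helpers on the removed side of this
   hunk; nothing beyond the struct definitions and functions shown above is
   assumed. Sum of independent normals: means add, variances add. */
struct normal_parameters a = { .mean = 1, .std = 3 };
struct normal_parameters b = { .mean = 2, .std = 4 };
struct normal_parameters sum = algebra_sum_normals(a, b);
// sum.mean == 3, sum.std == sqrt(3*3 + 4*4) == 5
/* Product of independent lognormals: the parameters of the underlying normal
   add in the same way. */
struct lognormal_parameters x = { .logmean = 0.5, .logstd = 0.3 };
struct lognormal_parameters y = { .logmean = 0.2, .logstd = 0.4 };
struct lognormal_parameters prod = algebra_product_lognormals(x, y);
// prod.logmean ≈ 0.7, prod.logstd ≈ sqrt(0.3*0.3 + 0.4*0.4) = 0.5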

squiggle.h

@@ -52,8 +52,8 @@ struct box sampler_cdf_box(struct box cdf(double), uint64_t* seed);
// Get 90% confidence interval
struct c_i {
float low;
float high;
float low;
float high;
};
struct c_i get_90_confidence_interval(double (*sampler)(uint64_t*), uint64_t* seed);
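/* A compact sketch of the calling convention for the declaration above, mirroring
   example 9 earlier in this diff; my_sampler is a made-up name, and sample_to is
   the library sampler that example uses. */
#include "squiggle.h" // assuming the snippet sits next to squiggle.h
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

double my_sampler(uint64_t* seed) { return sample_to(1, 10, seed); }

int main()
{
    uint64_t* seed = malloc(sizeof(uint64_t));
    *seed = 1000; // xorshift can't start with 0
    struct c_i ci = get_90_confidence_interval(my_sampler, seed);
    printf("90%% confidence interval: [%f, %f]\n", ci.low, ci.high);
    free(seed);
    return 0;
}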