Update readme, small tweaks

NunoSempere 2023-07-16 23:33:46 +02:00
parent bebb7c6d46
commit 17ba9488a4
5 changed files with 69 additions and 12 deletions


@ -12,10 +12,6 @@ A self-contained C99 library that provides a subset of [Squiggle](https://www.sq
- Because if you can implement something in C, you can implement it anywhere else
- Because it can be made faster if need be, e.g., with a multi-threading library like OpenMP, or by adding more algorithmic complexity
## The core strategy
Have some basic building blocks, like , and return samplers. Use previous samplers to . Then use the final sampler to produce an array of samples.
## Getting started
You can follow some example usage in the examples/ folder
@ -26,23 +22,62 @@ You can follow some example usage in the examples/ folder
4. In the fourth example, we define some simple cdfs, and we draw samples from those cdfs. We see that this approach is slower than using the built-in samplers, e.g., the normal sampler.
5. In the fifth example, we define the cdf for the beta distribution, and we draw samples from it.
## Commentary
### squiggle.c is short
`squiggle.c` is around 300 lines of C. The reader could just read it and grasp its contents.
### Core strategy
This library provides some basic building blocks. The recommended strategy is to:
1. Define sampler functions, which take a seed and return one sample
2. Compose those sampler functions to define your estimation model
3. At the end, call the last sampler function many times to generate many samples from your model
### Cdf auxiliary functions
To help with the above core strategy, this library provides convenience functions which take a cdf and return a sample from the distribution described by that cdf.
This might make it easier to program models, at the cost of a 20x to 60x slowdown in the parts of the code that use it.
### Nested functions and compilation with tcc
GCC has an extension which allows a program to define a function inside another function. This makes squiggle.c code more linear and nicer to read, at the cost of depending on GCC, and hence sacrificing portability and increasing compilation times. By contrast, tcc (the Tiny C Compiler) compiles almost instantaneously but doesn't support nested functions.
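To illustrate the trade-off, here is one hypothetical sampler written both ways: as a GCC nested function, and as a file-scope function that tcc and clang also accept. The preprocessor check picks the nested version only under GCC; `model_mean`, `sample_cost` and the xorshift32 PRNG are made up for this sketch.

```c
#include <stdint.h>

// Hypothetical stand-in PRNG (xorshift32), not squiggle.c's own.
// The seed must be nonzero.
static uint32_t xorshift32(uint32_t* seed)
{
    uint32_t x = *seed;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *seed = x;
    return x;
}

// Uniform draw in (0, 1].
static float rand_0_to_1(uint32_t* seed)
{
    return (float)xorshift32(seed) / (float)UINT32_MAX;
}

#if defined(__GNUC__) && !defined(__clang__) && !defined(__TINYC__)
// GCC only: the sampler is defined inside the function that uses it,
// so the model reads top to bottom. tcc and clang reject this.
float model_mean(int n, uint32_t* seed)
{
    float sample_cost(uint32_t* s)
    {
        return 10.0f + 5.0f * rand_0_to_1(s); // uniform on (10, 15]
    }
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += sample_cost(seed);
    }
    return sum / n;
}
#else
// Portable: the sampler lives at file scope, which any C99 compiler accepts.
static float sample_cost(uint32_t* s)
{
    return 10.0f + 5.0f * rand_0_to_1(s); // uniform on (10, 15]
}

float model_mean(int n, uint32_t* seed)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += sample_cost(seed);
    }
    return sum / n;
}
#endif
```

Because this nested function does not refer to any of the enclosing function's local variables, GCC can call it directly; capturing locals and taking the nested function's address is where GCC's implementation gets more involved.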
### Error propagation vs exiting on error
The process of taking a cdf and returning a sample might fail, e.g., because it relies on Newton's method, which might fail to converge because of cdf artifacts. The cdf itself might also fail, e.g., if a distribution only accepts a range of parameters but is fed parameters outside that range.
This library provides two approaches:
1. Print the line and function in which the error occurred, then exit
2. In situations where there might be an error, return a struct containing either the correct value or an error message:
```C
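struct box {
    int empty;       // nonzero when no valid value could be produced
    float content;   // the value, meaningful only when empty == 0
    char* error_msg; // description of the error when empty != 0
};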
```
The first approach produces terser programs but might not scale. The second approach seems like it could lead to more robust programs, but is more verbose.
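A sketch of how a function might use `struct box` to report a parameter error instead of exiting. The struct is copied from above; `cdf_uniform_box` and `cdf_or_default` are hypothetical, and the convention that a nonzero `empty` means "error, ignore `content`" is assumed.

```c
#include <stdio.h>

struct box {
    int empty;
    float content;
    char* error_msg;
};

// Hypothetical: cdf of the uniform distribution on [a, b], only defined when a < b.
static struct box cdf_uniform_box(float x, float a, float b)
{
    if (!(a < b)) {
        return (struct box){ .empty = 1, .content = 0.0f,
                             .error_msg = "cdf_uniform_box: requires a < b" };
    }
    float p = x <= a ? 0.0f : (x >= b ? 1.0f : (x - a) / (b - a));
    return (struct box){ .empty = 0, .content = p, .error_msg = NULL };
}

// The caller must inspect .empty before trusting .content.
static float cdf_or_default(float x, float a, float b, float fallback)
{
    struct box r = cdf_uniform_box(x, a, b);
    if (r.empty) {
        fprintf(stderr, "%s\n", r.error_msg);
        return fallback;
    }
    return r.content;
}
```

The extra verbosity is visible in `cdf_or_default`: every call site has to unpack the box and decide what to do on error.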
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser; see the usage in example 4 in the examples/ folder.
## Related projects
- [Squiggle](https://www.squiggle-language.com/)
- [SquigglePy](https://github.com/rethinkpriorities/squigglepy)
- [Simple Squiggle](https://nunosempere.com/blog/2022/04/17/simple-squiggle/)
- [time to botec](https://github.com/NunoSempere/time-to-botec)
## To do list
- [ ] Have a more complicated & realistic example
- [ ] Add summarization functions, like mean, std, 90% ci (or all c.i.?)
- [ ] Add README
- Schema: a function which takes a sample and manipulates it,
- and at the end, an array of samples.
- Explain boxes
- [x] Explain individual examples
- Explain nested functions
- [ ] Publish online
- [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>
- [ ] Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>, and do so efficiently
@ -60,3 +95,11 @@ You can follow some example usage in the examples/ folder
- [x] Add function to get sample when given a cdf
- [x] Don't have a single header file.
- [x] Structure project a bit better
- [x] Simplify `PROCESS_ERROR` macro
- [x] Add README
- [x] Schema: a function which takes a sample and manipulates it,
- [x] and at the end, an array of samples.
- [x] Explain boxes
- [x] Explain nested functions
- [x] Explain exit on error
- [x] Explain individual examples


@ -5,7 +5,7 @@
#include <stdlib.h>
#include <time.h>
#define NUM_SAMPLES 10000
#define NUM_SAMPLES 1000000
// Example cdf
float cdf_uniform_0_1(float x)


@ -40,6 +40,7 @@ float rand_float(float max, uint32_t* seed)
float unit_normal(uint32_t* seed)
{
    // See: <https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform>
    float u1 = rand_0_to_1(seed);
    float u2 = rand_0_to_1(seed);
    float z = sqrtf(-2.0 * log(u1)) * sin(2 * PI * u2);
@ -298,3 +299,16 @@ struct box sampler_cdf_float(float cdf(float), uint32_t* seed)
    struct box result = inverse_cdf_float(cdf, p);
    return result;
}
/* Could also define other variations, e.g.,
float sampler_danger(struct box cdf(float), uint32_t* seed)
{
    float p = rand_0_to_1(seed);
    struct box result = inverse_cdf_box(cdf, p);
    if(result.empty){
        exit(1);
    }else{
        return result.content;
    }
}
*/
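The commented-out `sampler_danger` above depends on an `inverse_cdf_box` function that is not part of this diff. A self-contained sketch, assuming a hypothetical bisection-based `inverse_cdf_box` restricted to the interval [0, 1] (the real one may differ), a made-up `cdf_square`, and a stand-in xorshift32 PRNG, shows how the variation would unwrap a box and exit on error.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct box {
    int empty;
    float content;
    char* error_msg;
};

// Hypothetical stand-in PRNG (xorshift32); the seed must be nonzero.
static uint32_t xorshift32(uint32_t* seed)
{
    uint32_t x = *seed;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *seed = x;
    return x;
}

// Uniform draw in (0, 1].
static float rand_0_to_1(uint32_t* seed)
{
    return (float)xorshift32(seed) / (float)UINT32_MAX;
}

// Hypothetical inverse_cdf_box: bisection on [0, 1] over a cdf that
// itself returns a box. Propagates the first error the cdf reports.
static struct box inverse_cdf_box(struct box cdf(float), float p)
{
    float lo = 0.0f, hi = 1.0f;
    for (int i = 0; i < 40; i++) {
        float mid = 0.5f * (lo + hi);
        struct box c = cdf(mid);
        if (c.empty) {
            return c;
        }
        if (c.content < p) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return (struct box){ .empty = 0, .content = 0.5f * (lo + hi), .error_msg = NULL };
}

// The "dangerous" variation: unwrap the box, exiting on error.
static float sampler_danger(struct box cdf(float), uint32_t* seed)
{
    float p = rand_0_to_1(seed);
    struct box result = inverse_cdf_box(cdf, p);
    if (result.empty) {
        fprintf(stderr, "sampler_danger: %s\n", result.error_msg);
        exit(1);
    }
    return result.content;
}

// Example cdf on [0, 1]: F(x) = x^2, i.e. the density 2x.
static struct box cdf_square(float x)
{
    float c = x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x * x);
    return (struct box){ .empty = 0, .content = c, .error_msg = NULL };
}
```

This keeps the box-based cdf interface while giving callers a plain `float`, at the price of ending the program on the first failed inversion.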