squiggle.c is a self-contained C99 library that provides functions for simple Monte Carlo estimation, inspired by [Squiggle](https://www.squiggle-language.com/).
- I am trying to build a reliable alternative to the original squiggle, that works for me and addresses my frustrations with it.
- Some adjectives: [grug brain](https://grugbrain.dev/), [lindy](https://en.wikipedia.org/wiki/Lindy_effect), [suckless](https://suckless.org/)
- I am trying to make something that is simple enough that I and others can fully understand. squiggle.c is less than 700 lines of C, with a core of <230lines.You,asomewhattechnicallysophisticatedreader,couldjustreaditandgraspitscontents,andisencouragedtodoso.
- Because it can be made faster if need be, e.g., with a multi-threading library like OpenMP, by implementing faster but more complex algorithms, or more simply, by inlining the sampling functions (adding an `inline` directive before their function declaration)
- **Because there are few abstractions between it and machine code** (C => assembly => machine code with gcc, or C => machine code, with tcc), leading to fewer errors beyond the programmer's control.
- Nonetheless, the user should understand the limitations of sampling-based methods. See the section on [Tests and the long tail of the lognormal](https://git.nunosempere.com/personal/squiggle.c/FOLK_WISDOM.md#tests-and-the-long-tail-of-the-lognormal) for a discussion of how sampling is bad at capturing some aspects of distributions with long tails.
- It should be fast. But when speed conflicts with simplicity, choose simplicity. For example, there might be several possible algorithms to sample a distribution, each of which is faster over part of the domain. In that case, it's conceptually simpler to just pick one algorithm, and pay the—normally small—performance penalty.
- In any case, though, the code should still be *way faster* than, say, Python.
Note that being terse, or avoiding verbosity, is a non-goal. This is in part because of the constraints that C imposes. But it also aids with clarity and conceptual simplicity, as the issue of correlated samples illustrates in the next section.
You can follow some example usage in the [examples/](examples]) folder. In [examples/core](examples/core/), we build up some functionality, starting from drawing one sample and finishing with a replication of [Dissolving the Fermi paradox](https://arxiv.org/abs/1806.02404). In [examples/more](examples/more), we present a few more complicated examples, like finding confidence intervals, a model of nuclear war, an estimate of how much exercise to do to lose 10kg, or an example using parallelism.
- I offer no guarantees about stability, correctness, performance, etc. I might, for instance, abandon the version in C and rewrite it in Zig, Nim, Rust, Go.
- This project mostly exists for my own usage & for my own amusement.
- Caution! Think carefully before using this project for anything important.
- If you wanted to pay me to provide some stability or correctness, guarantees, or to tweak this library for your own usage, or to teach you how to use it, you could do so [here](https://nunosempere.com/consulting).
- I am conflicted about parallelism. It *does* add more complexity, complexity that you can be bitten by if you are not careful and don't understand it. And this conflicts with the initial grug-brain motivation. At the same time, it is clever, and it is nice, and I like it a lot.
- You can vendor the code, i.e., save it as a dependency together with your other files. This way, this renders you immune to any changes I may make.
- I've been hacking at this project for a while now, and I think I have a good grasp of its correctness and limitations. I've tried Nim and Zig, and I prefer C so far.
- I think the core interface is not likely to change much, though I've recently changed the interface for parallelism and for getting confidence intervals.
- I am using this code for a few important consulting projects, and I trust myself to operate it correctly.
This library differentiates between core functions, which are pretty tightly scoped, and expansions and convenience functions, which are more meandering. Expansions are in `squiggle_more.c` and `squiggle_more.h`. To use them, take care to link them:
`squiggle_more.c` has some helper functions to get confidence intervals. They are in `squiggle_more.c` because I'm still mulling over what their shape should be, and because until recently they were pretty limited and sub-optimal. But recently, I've put a bunch of effort into being able to get the confidence interval of an array of samples in O(number of samples), and into making confidence interval functions nicer and more general. So I might promote them to the main `squiggle.c` file.
I provide some functions to draw samples in parallel. For "normal" squiggle.c models, where you define one model and then draw samples from it once at the end, they should be fine.
But for more complicated use cases, my recommendation would be to not use parallelism unless you know what you are doing, because of intricacies around setting seeds. Some gotchas and exercises for the reader:
- If you run the `sampler_parallel` function twice, you will get the same result. Why?
- If you run the `sampler_parallel` function on two different inputs, their outputs will be correlated. E.g., if you run two lognormals, indices which have higher samples in one will tend to have higher samples in the other one. Why?
- For a small amount of samples, if you run the `sampler_parallel` function, you will get better spread out random numbers than if you run things serially. Why?
That said, I found adding parallelism to be an interesting an engaging task. Most recently, I even optimized the code to ensure that two threads weren't accessing the same cache line at the same time, and it was very satisfying to see a 30% improvement as a result.
`squiggle_more.c` has some functions to do some simple algebra manipulations: sums of normals and products of lognormals. You can see some example usage [here](examples/more/07_algebra/example.c) and [here](examples/more/08_algebra_and_conversion/example.c).
I provide some auxiliary functions that take a cdf, and return a sample from the distribution produced by that cdf. This might make it easier to program models, at the cost of a 20x to 60x slowdown in the parts of the code that use it.
The process of taking a cdf and returning a sample might fail, e.g., it's a Newton method which might fail to converge because of cdf artifacts. The cdf itself might also fail, e.g., if a distribution only accepts a range of parameters, but is fed parameters outside that range.
The first approach produces terser programs but might not scale. The second approach seems like it could lead to more robust programs, but is more verbose.
Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.