From 06a0569a02429cf36055cb0d94560960daf38544 Mon Sep 17 00:00:00 2001
From: NunoSempere
Date: Tue, 1 Aug 2023 14:17:57 +0200
Subject: [PATCH] cleanup README.md before publishing as blogpost

---
 README.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 09ab3bf..029a8c6 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Squiggle.c
+# squiggle.c
 
-A self-contained C99 library that provides functions for simple Monte Carlo estimation, based on [Squiggle](https://www.squiggle-language.com/).
+squiggle.c is a self-contained C99 library that provides functions for simple Monte Carlo estimation, based on [Squiggle](https://www.squiggle-language.com/).
 
 ## Why C?
 
@@ -28,6 +28,7 @@ You can follow some example usage in the examples/ folder
 6. In the [6th example](examples/06_gamma_beta/example.c), we take samples from simple gamma and beta distributions, using the samplers provided by this library.
 7. In the [7th example](examples/07_ci_beta/example.c), we get the 90% confidence interval of a beta distribution
 8. The [8th example](examples/08_nuclear_war/example.c) translates the models from Eli and Nuño from [Samotsvety Nuclear Risk Forecasts — March 2022](https://forum.nunosempere.com/posts/KRFXjCqqfGQAYirm5/samotsvety-nuclear-risk-forecasts-march-2022#Nu_o_Sempere) into squiggle.c, then creates a mixture from both, and returns the mean probability of death per month and the 90% confidence interval.
+8. The [9th example](examples/09_burn_10kg_fat/example.c) estimates how many minutes per day I would have to jump rope in order to lose 10kg of fat in half a year.
 
 ## Commentary
 
@@ -49,7 +50,7 @@ To help with the above core strategy, this library provides convenience function
 
 ### Nested functions and compilation with tcc.
 
-GCC has an extension which allows a program to define a function inside another function. This makes squiggle.c code more linear and nicer to read, at the cost of becoming dependent on GCC and hence sacrificing portability and compilation times. Conversely, compiling with tcc (tiny c compiler) is almost instantaneous, but leads to longer execution times and doesn't allow for nested functions.
+GCC has an extension which allows a program to define a function inside another function. This makes squiggle.c code more linear and nicer to read, at the cost of becoming dependent on GCC and hence sacrificing portability and increasing compilation times. Conversely, compiling with tcc (tiny c compiler) is almost instantaneous, but leads to longer execution times and doesn't allow for nested functions.
 
 | GCC | tcc |
 | --- | --- |
@@ -80,6 +81,8 @@ The first approach produces terser programs but might not scale. The second appr
 
 Behaviour on error can be toggled by the `EXIT_ON_ERROR` variable. This library also provides a convenient macro, `PROCESS_ERROR`, to make error handling in either case much terser—see the usage in example 4 in the examples/ folder.
 
+Overall, I'd describe the error handling capabilities of this library as pretty rudimentary. For example, this program might fail in surprising ways if you ask for a lognormal with negative standard deviation, because I haven't added error checking for that case yet.
+
 ### Guarantees and licensing
 
 - I offer no guarantees about stability, correctness, performance, etc. I might, for instance, abandon the version in C and rewrite it in Zig, Nim or Rust.
@@ -95,7 +98,7 @@ This code should aim to be correct, then simple, then fast.
 
 - It should be correct. The user should be able to rely on it and not think about whether errors come from the library.
   - Nonetheless, the user should understand the limitations of sampling-based methods. See the section on [Tests and the long tail of the lognormal](https://git.nunosempere.com/personal/squiggle.c#tests-and-the-long-tail-of-the-lognormal) for a discussion of how sampling is bad at capturing some aspects of distributions with long tails.
-- It should be clear, conceptually simple. Simple for me to implement, simple for others to understand
+- It should be clear, conceptually simple. Simple for me to implement, simple for others to understand.
 - It should be fast. But when speed conflicts with simplicity, choose simplicity. For example, there might be several possible algorithms to sample a distribution, each of which is faster over part of the domain. In that case, it's conceptually simpler to just pick one algorithm, and pay the—normally small—performance penalty. In any case, though, the code should still be *way faster* than Python.
 
 Note that being terse, or avoiding verbosity, is a non-goal. This is in part because of the constraints that C imposes. But it also aids with clarity and conceptual simplicity, as the issue of correlated samples illustrates in the next section.
@@ -267,16 +270,18 @@ Overall, I would caution that if you really care about the very far tails of dis
 
 ## To do list
 
+- [ ] Document rudimentary algebra manipulations
 - [ ] Think through whether to delete cdf => samples function
 - [ ] Think through whether to:
   - simplify and just abort on error
   - complexify and use boxes for everything
   - leave as is
-- [ ] Add a few functions for doing simple algebra on normals, betas and lognormals?
 - [ ] Systematize references
 - [ ] Publish online
 - [ ] Support all distribution functions in
   - [ ] do so efficiently
+- [ ] Add more functions to do algebra and get the 90% c.i. of normals, lognormals, betas, etc.
+  - Think through which of these make sense.
 
 ## Done
 
@@ -322,3 +327,4 @@ Overall, I would caution that if you really care about the very far tails of dis
 - [x] Have some more complicated & realistic example
 - [x] Add summarization functions: 90% ci (or all c.i.?)
 - [x] Link to the examples in the examples section.
+- [x] Add a few functions for doing simple algebra on normals, and lognormals?