# Roadmap

## To do

- [ ] Big refactor
  - [ ] Come up with a better headline example; fermi paradox paper is too complicated
  - [ ] Make README.md less messy
  - [ ] Give examples of new functions
- [ ] Post on suckless subreddit
- [ ] Add a few more real-life applications
  - [ ] US election modelling?
- [ ] Look into using `size_t` instead of `int` for sample numbers
- [ ] Reorganize code a little bit to reduce usage of gcc's nested functions
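
The last two items above can be sketched together. This is a minimal illustration, not squiggle.c's actual code: every `example_`-prefixed name, the `sampler_fn` typedef, and the xorshift generator are assumptions made up for the example. It shows a model defined at file scope and passed as a function pointer (so no gcc nested function is needed), with `size_t` used for the sample count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler type: takes RNG state, returns one draw. */
typedef double (*sampler_fn)(uint64_t* seed);

/* Stand-in RNG (xorshift64), uniform on [0, 1). Not squiggle.c's generator.
 * The seed must be nonzero. */
static double example_sample_unit_uniform(uint64_t* seed)
{
    uint64_t x = *seed;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *seed = x;
    return (double)(x >> 11) / 9007199254740992.0; /* divide by 2^53 */
}

/* A model defined at file scope instead of nested inside main. */
static double example_model(uint64_t* seed)
{
    return 2.0 + 3.0 * example_sample_unit_uniform(seed); /* uniform on [2, 5) */
}

/* Fills xs[0..n-1] with draws; the count is a size_t rather than an int. */
static void example_fill_samples(sampler_fn sampler, double* xs, size_t n, uint64_t* seed)
{
    assert(sampler != NULL && xs != NULL && seed != NULL);
    for (size_t i = 0; i < n; i++) {
        xs[i] = sampler(seed);
    }
}

/* Arithmetic mean of n samples. */
static double example_mean(const double* xs, size_t n)
{
    assert(xs != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += xs[i];
    }
    return sum / (double)n;
}
```

A caller would declare a nonzero `uint64_t seed`, call `example_fill_samples(example_model, xs, n, &seed)`, and then summarize with `example_mean(xs, n)`. The trade-off is that a file-scope model cannot capture local variables the way a nested function can, so any parameters have to be passed explicitly or kept at file scope.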

## Done

- [x] Document print stats
- [x] Document rudimentary algebra manipulations for normal/lognormal
- [x] Think through whether to delete cdf => samples function => not for now
- [x] Think through whether to:
  - simplify and just abort on error
  - complexify and use boxes for everything
  - leave as is
  - [x] Offer both options
- [x] Add more functions to do algebra and get the 90% c.i. of normals, lognormals, betas, etc.
  - Think through which of these make sense.
- [x] Systematize references
- [x] Think through seed initialization
- [x] Document parallelism
- [x] Document confidence intervals
- [x] Add example for only one sample
- [x] Add example for many samples
- [x] Use gcc extension to define functions nested inside main.
- [x] Chain various `sample_mixture` functions
- [x] Add beta distribution
  - See <https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution> for a faster method.
- [x] Use OpenMP for acceleration
- [x] Add function to get sample when given a cdf
- [x] Don't have a single header file.
- [x] Structure project a bit better
- [x] Simplify `PROCESS_ERROR` macro
- [x] Add README
  - [x] Schema: a function which takes a sample and manipulates it,
  - [x] and at the end, an array of samples.
  - [x] Explain boxes
  - [x] Explain nested functions
  - [x] Explain exit on error
  - [x] Explain individual examples
- [x] Rename functions to something more self-explanatory, e.g., `sample_unit_normal`.
- [x] Add summarization functions: mean, std
- [x] Add sampling from a gamma distribution
  - https://dl.acm.org/doi/pdf/10.1145/358407.358414
- [x] Explain correlated samples
- [x] Test summary statistics for each of the distributions.
  - [x] For uniform
  - [x] For normal
  - [x] For lognormal
  - [x] For lognormal (to syntax)
  - [x] For beta distribution
- [x] Clarify gamma/standard gamma
- [x] Add efficient sampling from a beta distribution
  - https://dl.acm.org/doi/10.1145/358407.358414
  - https://link.springer.com/article/10.1007/bf02293108
  - https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution
  - https://github.com/numpy/numpy/blob/5cae51e794d69dd553104099305e9f92db237c53/numpy/random/src/distributions/distributions.c
- [x] Pontificate about lognormal tests
- [x] Give warning about sampling-based methods.
- [x] Have some more complicated & realistic example
- [x] Add summarization functions: 90% c.i. (or all c.i.?)
- [x] Link to the examples in the examples section.
- [x] Add a few functions for doing simple algebra on normals and lognormals
  - [x] Add prototypes
  - [x] Use named structs
  - [x] Add to header file
  - [x] Provide example algebra
  - [x] Add conversion between 90% ci and parameters.
  - [x] Use that conversion in conjunction with small algebra.
  - [x] Consider ergonomics of using ci instead of c_i
    - [x] Use named struct instead
    - [x] Demonstrate and document feeding a struct directly to a function: `my_function((struct c_i){.low = 1, .high = 2});`
  - [x] Move to own file? Or signpost in file? => signposted in file.
- [x] Write twitter thread: now [here](https://twitter.com/NunoSempere/status/1707041153210564959); retweets appreciated.
- [x] Write better confidence interval code that:
  - Gets number of samples as an input
  - Gets either a sampler function or a list of samples
  - Is O(n), not O(n log(n))
  - Parallelizes stuff
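
The checked items above on named structs and on converting between a 90% c.i. and parameters can be illustrated with a short sketch. Only `struct c_i` and its `.low`/`.high` fields come from the list above; the `example_` names and the parameter struct are hypothetical and may not match the library's header. The conversion relies on the fact that a normal's 90% interval is the mean plus or minus about 1.6449 standard deviations.

```c
#include <assert.h>

/* Named struct for a confidence interval, as in the roadmap item above. */
struct c_i {
    double low;
    double high;
};

/* Hypothetical parameter struct for a normal distribution. */
struct example_normal_params {
    double mean;
    double std;
};

/* z-score of the 95th percentile of a standard normal: a 90% c.i. is mean +/- z * std. */
#define EXAMPLE_Z_90 1.6448536269514722

/* Converts a 90% c.i. into the parameters of the matching normal. */
static struct example_normal_params example_ci_to_normal(struct c_i x)
{
    assert(x.low <= x.high);
    struct example_normal_params p = {
        .mean = (x.low + x.high) / 2.0,
        .std = (x.high - x.low) / (2.0 * EXAMPLE_Z_90),
    };
    return p;
}

/* Converts normal parameters back into a 90% c.i. */
static struct c_i example_normal_to_ci(struct example_normal_params p)
{
    assert(p.std >= 0.0);
    return (struct c_i){
        .low = p.mean - EXAMPLE_Z_90 * p.std,
        .high = p.mean + EXAMPLE_Z_90 * p.std,
    };
}
```

Because the struct is named, a caller can pass a compound literal directly, for instance `example_ci_to_normal((struct c_i){.low = 1, .high = 2})`, without declaring a temporary variable first.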

## Discarded

- [ ] ~~Disambiguate `sample_laplace` (successes vs failures || successes vs total trials) as two distinct and differently named functions~~
- [ ] ~~Support all distribution functions in <https://www.squiggle-language.com/docs/Api/Dist>~~
- [ ] ~~Add a custom preprocessor to allow simple nested functions that don't rely on local scope?~~
- [ ] ~~Add tests in Stan?~~
- [ ] ~~Test results for lognormal manipulations~~
- [ ] ~~Consider desirability of defining shortcuts for algebra functions. Adds a level of magic, though.~~
- [ ] ~~Think about whether to write a simple version of this for [uxn](https://100r.co/site/uxn.html), a minimalist portable programming stack which, sadly, doesn't have doubles (64-bit floats)~~