A calculator for distributions, for Fermi estimation
This project is a minimalist, calculator-style DSL for Fermi estimation. It can multiply, divide, add, and subtract scalars, lognormals, and beta distributions, and it supports variables.
Motivation
Sometimes Squiggle, simple squiggle, or squiggle.c are still too complicated and un-Unix-like. In particular, they do not start up instantly.
Installation
make build
sudo make install
fermi
Usage
$ fermi
5000000 12000000
=> 5.0M 12.0M
* beta 1 200
=> 1.9K 123.1K
* 30 180
=> 122.9K 11.7M
/ 48 52
=> 2.5K 234.6K
/ 5 6
=> 448.8 43.0K
/ 6 8
=> 64.5 6.2K
/ 60
=> 1.1 103.7
Perhaps this example is more understandable with comments and better units:
$ fermi
5M 12M # number of people living in Chicago
beta 1 200 # fraction of people that have a piano
30 180 # minutes it takes to tune a piano, including travel time
/ 48 52 # weeks a year that piano tuners work for
/ 5 6 # days a week in which piano tuners work
/ 6 8 # hours a day in which piano tuners work
/ 60 # minutes to an hour
=: piano_tuners
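The last line, =: piano_tuners, stores the current result under a name so that later lines can use it as an operand, as in + x in the help examples below. According to the grammar, =. does the same but also clears the stack. A minimal Go sketch of that bookkeeping follows. Dist, Calculator, Assign, AssignAndClear, and Lookup are hypothetical names, and treating a cleared stack as the scalar 1 (so the next default multiplication starts fresh) is an assumption, not something this README states.

```go
package main

import "fmt"

// Dist is a hypothetical stand-in for whatever sits on top of the stack
// (a scalar, a lognormal, or a bag of samples). Here it is just two bounds.
type Dist struct {
	Low, High float64
}

// one is assumed to be the value of an empty stack, so that a default
// multiplication right after clearing simply yields the new operand.
var one = Dist{Low: 1, High: 1}

// Calculator holds the current value plus named variables.
type Calculator struct {
	current Dist
	vars    map[string]Dist
}

func NewCalculator() *Calculator {
	return &Calculator{current: one, vars: make(map[string]Dist)}
}

// Assign mirrors "=: name": store the current value and keep working with it.
func (c *Calculator) Assign(name string) {
	c.vars[name] = c.current
}

// AssignAndClear mirrors "=. name": store the current value, then clear.
func (c *Calculator) AssignAndClear(name string) {
	c.vars[name] = c.current
	c.current = one
}

// Lookup resolves a variable used as an operand, e.g. "+ x".
func (c *Calculator) Lookup(name string) (Dist, error) {
	d, ok := c.vars[name]
	if !ok {
		return Dist{}, fmt.Errorf("unknown variable %q", name)
	}
	return d, nil
}

func main() {
	c := NewCalculator()
	c.current = Dist{Low: 1.1, High: 103.7} // the piano-tuner result from the first example
	c.AssignAndClear("piano_tuners")
	d, err := c.Lookup("piano_tuners")
	fmt.Println(d, err, c.current)
}
```

A map keyed by name is the simplest choice here; looking up an unknown name returns an error rather than silently using a default.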
If you type "help" (or run fermi -h), you can see a small grammar and some optional command flags:
$ fermi
1. Grammar:
Operation | Variable assignment | Special
Operation: operator operand
operator: (empty) | * | / | + | -
operand: scalar | lognormal | beta | variable
lognormal: low high
beta: beta alpha beta
Variable assignment: =: variable_name
Variable assignment and clear stack: =. variable_name
Suffixes: %, K, M, B, T
Special commands:
Comment: # this is a comment
Summary stats: stats
Clear stack: clear | c | .
Print debug info: debug | d
Print help message: help | h
Start additional stack: operator (
Return from additional stack )
Exit: exit | e
Examples:
+ 2
/ 2.5
* 1 10 (interpreted as lognormal)
+ 1 10
* beta 1 10
1 10 (multiplication taken as default operation)
=: x
.
1 100
+ x
# this is a comment
* 1 12 # this is an operation followed by a comment
* (
1 10
+ beta 1 100
)
exit
Command flags:
-echo
Specifies whether inputs should be echoed back. Useful if reading from a file.
-f string
Specifies a file with a model to run. Sets the echo command to true by default.
-n int
Specifies the number of samples to draw when using samples (default 100000)
-h Shows help message
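The grammar above is small enough to parse with a few string operations. The following Go sketch parses a single operation line into an operator and an operand, applies the K/M/B/T/% suffixes from the help text, and defaults to * when the operator is omitted. Operation, ParseLine, and parseNumber are hypothetical names rather than fermi's own, and special commands, =: / =., and parentheses are deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Operation is a hypothetical parsed form of one input line.
type Operation struct {
	Op       string    // one of "*", "/", "+", "-"
	Kind     string    // "scalar", "lognormal", "beta", or "variable"
	Args     []float64 // numeric arguments, after applying suffixes
	Variable string    // set when Kind == "variable"
}

// multipliers maps the suffixes listed in the help text to their values.
var multipliers = map[byte]float64{'%': 0.01, 'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}

// parseNumber accepts numbers such as "5M", "12%", or "2.5".
func parseNumber(s string) (float64, error) {
	mult := 1.0
	if m, ok := multipliers[s[len(s)-1]]; ok && len(s) > 1 {
		mult, s = m, s[:len(s)-1]
	}
	x, err := strconv.ParseFloat(s, 64)
	return x * mult, err
}

// ParseLine parses "operator operand", e.g. "* beta 1 200" or "5M 12M".
func ParseLine(line string) (Operation, error) {
	// Drop a trailing comment, as in "* 1 12 # a comment".
	if i := strings.Index(line, "#"); i >= 0 {
		line = line[:i]
	}
	fields := strings.Fields(line)
	op := Operation{Op: "*"} // multiplication is the default operation
	if len(fields) > 0 {
		switch fields[0] {
		case "*", "/", "+", "-":
			op.Op, fields = fields[0], fields[1:]
		}
	}
	isBeta := len(fields) > 0 && fields[0] == "beta"
	if isBeta {
		fields = fields[1:]
	}
	// Comment-only lines also end up here; a REPL would simply skip them.
	if len(fields) == 0 || len(fields) > 2 || (isBeta && len(fields) != 2) {
		return Operation{}, fmt.Errorf("cannot parse %q", line)
	}
	args := make([]float64, len(fields))
	for i, f := range fields {
		x, err := parseNumber(f)
		if err != nil {
			// A lone non-numeric operand is read as a variable name, as in "+ x".
			if len(fields) == 1 {
				op.Kind, op.Variable = "variable", f
				return op, nil
			}
			return Operation{}, err
		}
		args[i] = x
	}
	op.Args = args
	switch {
	case isBeta:
		op.Kind = "beta"
	case len(args) == 2:
		op.Kind = "lognormal"
	default:
		op.Kind = "scalar"
	}
	return op, nil
}

func main() {
	for _, line := range []string{"5M 12M", "* beta 1 200", "/ 60", "+ x", "* 1 12 # comment"} {
		op, err := ParseLine(line)
		fmt.Printf("%-20q %+v %v\n", line, op, err)
	}
}
```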
You can see real-life examples here, here, here, here, here, and here.
Tips & tricks
- It's conceptually clearer to have all the multiplications first and then all the divisions
- For things between 0 and 1, consider using a beta distribution
Command line options
You can specify the number of samples to draw when algebraic manipulations are not sufficient:
$ fermi -n 1000000
$ fermi -n 1_000_000
You can also run a file with the -f option:
$ fermi -f more/piano-tuners.fermi
Integrations with Linux utilities
Because fermi reads from standard input, you can pipe a model to it:
$ cat more/piano-tuners.fermi | fermi
In that case, you will probably want to use the echo flag as well:
$ cat more/piano-tuners-commented.fermi | fermi -echo
You can make a model executable by running $ chmod +x model.fermi
and then adding the following line at the top:
#!/usr/bin/fermi -f
You can save a session to a logfile with tee:
$ fermi | tee -a fermi.log
Different levels of complexity
The top-level f.go file (420 lines) has the most features: variables, parentheses, samples, beta distributions, a configurable number of samples, etc. The simple/ folder contains two smaller versions:
- f_simple.go (370 lines) strips variables and parentheses, but keeps beta distributions, samples, and addition and subtraction
- f_minimal.go (140 lines) keeps only multiplication and addition of lognormals and scalars, plus a few debug options.
Roadmap
Done:
- Write README
- Add division?
- Read from file?
- Save to file?
- Allow comments?
- Use a sed filter?
- Add proper comment processing
- Add show more info version
- Scalar multiplication and division
- Think how to integrate with squiggle.c to draw samples
- Copy the time-to-botec Go code
- Define samplers
- Call those samplers when operating on distributions that can't be operated on algebraically
- Display output more nicely, with K/M/B/T
- Consider the following: make this into a stack-based DSL, with:
- Variables that can be saved to and then displayed
- Other types of distributions, particularly beta distributions? => But then this requires moving to bags of samples. It could still be ~instantaneous though.
- Added bags of samples to support addition and multiplication of betas and lognormals
- Figure out Go syntax for:
- Maps
- Joint types
- Enums
- Fix correlation problem by spinning up a new source of randomness every time some serial computation is done.
- Clean up error code. Right now only needed for division
- Maintain both a more complex thing that's more featureful and the more simple multiplication of lognormals thing.
- Allow input with K/M/T
- Document parenthesis syntax
- Specify number of samples as a command line option
- Figure out how to make models executable, by adding a #!/bin/bash-style command at the top?
- Make -n flag work
- Add flag to repeat input lines (useful when reading from files)
- Add percentages
- Consider adding an understanding of percentages
To (possibly) do:
- Consider implications of sampling strategy for operating on variables in this case.
- Document mixture distributions
- Fix lognormal multiplication and division by 0 or < 0
- With the -f command line option, the program doesn't read from stdin after finishing reading the file
- Add functions. Now easier to do with an explicit representation of the stack
- Think about how to draw a histogram from samples
- Dump samples to file
- Represent samples/statistics in some other way
- Perhaps use qsort rather than full sorting
- Program into a small device, like a calculator?
- Units?
Discarded:
Think of some way of calling bc