Simple estimation scripts which do the same thing in different programming languages.

Time to BOTEC

About

This repository contains examples of very simple code to manipulate samples in various programming languages. It implements this platonic estimate:

p_a = 0.8
p_b = 0.5
p_c = p_a * p_b

dists = [0, 1, 1 to 3, 2 to 10]
weights = [(1 - p_c), p_c/2, p_c/4, p_c/4 ]

result = mixture(dists, weights) # should be 1M samples
mean(result)

As of now, it may be useful for checking the validity of simple estimations. The title of this repository is a pun on two meanings of "time to": "how much time does it take to do x", and "let's do x".
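For readers who want something runnable, here is a minimal Python sketch of the estimate above. It is not the code in the python folder. It assumes that `a to b` means a lognormal distribution whose 5th and 95th percentiles are `a` and `b` (the Squiggle reading), and the helper names `to`, `constant`, `mixture` and `estimate` are made up for illustration.

```python
import math
import random

# z-score of the 95th percentile of a standard normal
Z_95 = 1.6448536269514722


def to(low, high):
    """Sampler for a lognormal with 5th/95th percentiles low/high.

    Assumption: this is what `low to high` means in the estimate.
    """
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return lambda rng: rng.lognormvariate(mu, sigma)


def constant(value):
    """Sampler that always returns the same value."""
    return lambda rng: value


def mixture(samplers, weights, n, rng):
    """Draw n samples, picking a component for each one by weight."""
    chosen = rng.choices(samplers, weights=weights, k=n)
    return [sample(rng) for sample in chosen]


def estimate(n=1_000_000, seed=0):
    """Return the mean of n samples from the mixture."""
    rng = random.Random(seed)
    p_a = 0.8
    p_b = 0.5
    p_c = p_a * p_b
    dists = [constant(0), constant(1), to(1, 3), to(2, 10)]
    weights = [1 - p_c, p_c / 2, p_c / 4, p_c / 4]
    result = mixture(dists, weights, n, rng)
    return sum(result) / n


if __name__ == "__main__":
    print(estimate())
```

Under the lognormal assumption, the analytic mean of this mixture is roughly 0.89, so the sampled mean should land close to that.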

Current languages

  • C
  • Javascript (NodeJS)
  • Squiggle
  • R
  • Python
  • Nim

Comparison table

Language Time Lines of code
C (optimized, 16 threads) 6ms 222
Nim 68ms 84
C (naïve implementation) 292ms 149
Javascript (NodeJS) 732ms 69
Squiggle 1,536s 14
R 7,000s 49
Python (CPython) 16,641s 56

Time measurements were taken with the time tool, using 1M samples.

Notes

Nim

I was really happy trying Nim, and as a result the Nim code is a bit more optimized and engineered:

  1. It is using the fastest "danger" compilation mode.
  2. It has some optimizations: I don't compute 1M samples for each dist, but instead pass functions around and compute the 1M samples at the end.
  3. I define the normal function from scratch, using the Box-Muller transform.
  4. I also have a version in which I define the logarithm and sine functions themselves in Nim to feed into the Box-Muller method. But it is much slower.
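As an illustration of point 3, the Box-Muller transform turns two uniform samples into a standard normal sample. The following is a minimal sketch in Python rather than Nim, and the function names `box_muller` and `normal` are hypothetical, not taken from the Nim code.

```python
import math
import random


def box_muller(rng):
    """Return one standard normal sample built from two uniform samples."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    return radius * math.sin(2.0 * math.pi * u2)


def normal(mean, std, rng):
    """Shift and scale a standard normal sample."""
    return mean + std * box_muller(rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(normal(0.0, 1.0, rng))
```

The transform only needs a uniform generator, a logarithm, a square root and a sine (or cosine), which is why it is a natural candidate for writing everything from scratch.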

Without 1. and 2., the Nim code takes 0m0.183s instead. But I don't think that these are unfair advantages: I liked trying out Nim and therefore put more love into the code, and this seems like it could be a recurring factor.

Ultimately, these optimizations were also incorporated into the C code.

C

For the C code, I enabled the -Ofast compilation flag. Without it, it instead takes ~0.4 seconds. Initially, before I enabled the -Ofast flag, I was surprised that the Node and Squiggle code were comparable to the C code.

The three optimizations which make the optimized code significantly faster than the naïve implementation are:

  • Passing around pointers to functions instead of large arrays. This is the same as in the Nim implementation, but imho leads to more complex code.
  • Using multithreading.
  • Using the Box-Muller transform instead of libraries, as in the Nim code.
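To make the first optimization concrete, here is a small sketch of the difference between materializing an array of samples per distribution and passing sampler functions around. It is written in Python for brevity, not C, and `mixture_naive`, `mixture_lazy` and `counting` are hypothetical names chosen for this example.

```python
import random


def mixture_naive(samplers, weights, n, rng):
    """Materialize n samples per component, then pick one per position.

    This makes len(samplers) * n draws and holds them all in memory.
    """
    arrays = [[sample(rng) for _ in range(n)] for sample in samplers]
    picks = rng.choices(range(len(samplers)), weights=weights, k=n)
    return [arrays[k][i] for i, k in enumerate(picks)]


def mixture_lazy(samplers, weights, n, rng):
    """Pick the component first, then call only that sampler: n draws."""
    picks = rng.choices(samplers, weights=weights, k=n)
    return [sample(rng) for sample in picks]


def counting(sampler, counter):
    """Wrap a sampler so that each call increments counter[0]."""
    def wrapped(rng):
        counter[0] += 1
        return sampler(rng)
    return wrapped


if __name__ == "__main__":
    rng = random.Random(0)
    counter = [0]
    samplers = [
        counting(lambda r: 0.0, counter),
        counting(lambda r: r.random(), counter),
    ]
    mixture_naive(samplers, [0.5, 0.5], 1000, rng)
    naive_calls = counter[0]
    counter[0] = 0
    mixture_lazy(samplers, [0.5, 0.5], 1000, rng)
    print(naive_calls, counter[0])
```

With k components, the naïve version draws k times as many samples as it keeps, so the saving grows with the number of distributions in the mixture.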

For the optimized C code, see that folder's README.

NodeJS and Squiggle

Using bun instead of node is actually a bit slower. Also, both the NodeJS and the Squiggle code use stdlib in their innards, which has a bunch of interleaved functions that make the code slower. It's possible that not using that external library could make the code faster, but at the same time, the js approach does seem to be to use external libraries whenever possible.

Python

For the Python code, it's possible that the lack of speed is more a function of me not being as familiar with Python. It's also very possible that the code would run faster with PyPy.

R

R has a warm place in my heart from back in the day, and it has predefined functions to do everything. It was particularly fast to write for me, though not particularly fast to run :) However, I do recall that R has some multithreading support, which I didn't use.

Overall thoughts

Overall I don't think that this is a fair comparison of the languages intrinsically, because I'm differentially good at them, and because I've chosen to put more effort into some than into others. But it is still useful to me personally, and perhaps mildly informative to others.

Languages I may add later

  • Julia (TuringML)
  • Rust
  • Lisp
  • Stan
  • Go
  • Zig
  • Forth
  • OCaml
  • Haskell
  • CUDA
  • ... and suggestions welcome

Roadmap

The future of this project is uncertain. In most worlds, I simply forget about this repository.

To do:

  • Check whether the Squiggle code is producing 1M samples. Still not too sure.
  • [-] Differentiate between initial startup time (e.g., compiling, loading environment) and runtime. This matters because startup time could be ~constant, so for larger projects only the runtime matters. Particularly for Julia. <= nah, too difficult.
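One possible way to approach the second item, at least for the interpreted languages, is to time the sampling from inside the process and compare it with the wall-clock time reported by the time tool: the difference is roughly the startup cost. A minimal sketch in Python, where `timed` is a hypothetical helper and not something in this repository:

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds), excluding process startup."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(f"runtime only: {seconds:.3f}s")
```

This does not cover compile time for C or Nim, which would have to be measured separately around the compiler invocation.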

Other similar projects