update README, time.txt tally

2023-06-02 16:24:08 -06:00 · 2023-06-02 16:24:08 -06:00 · 3378d1b9e7
commit 3378d1b9e7
parent 76fc0c817d
2 changed files with 24 additions and 13 deletions
--- a/README.md
+++ b/README.md
@@ -58,14 +58,19 @@ Ultimately, these optimizations were also incorporated into the C code as well.

 ### C

-For the C code, I enabled the `-Ofast` compilation flag. Without it, it instead takes ~0.4 seconds. Initially, before I enabled the `-Ofast` flag, I was surprised that the Node and Squiggle code were comparable to the C code. 
+The optimizations which make the final C code significantly faster than the naïve implementation are:

-The two optimizations which make more optimized code significantly faster than the naïve implementation are:
 - To pass around pointers to functions, instead of large arrays. This is the same as in the nim implementation, but imho leads to more complex code
-- To use multithreading support
 - To use the Box-Muller transform instead of using libraries, like in nim.
+- To use multithreading support

-For the optimized C code, see [that folder's README](./C-optimized/README.md).
+The C code uses the `-Ofast` or `-O3` compilation flags. Initially, without using those flags and without the algorithmic improvements, the code took ~0.4 seconds to run. So I was surprised that the Node and Squiggle code were comparable to the C code. It ended up being the case that the C code could be pushed to be ~100x faster, though :)
+
+In fact, the C code ended up being so fast that I had to measure its time by running the code 100 times in a row and dividing that amount by 100, rather than by just running it once, because running it once was too fast for /bin/time. More sophisticated profiling tools exist that could e.g., account for how idle a machine is when running the code, but I didn't think that was worth it at this point.
+
+And still, there are some missing optimizations, like tweaking the code to take into account cache misses. I'm not exactly sure how that would go, though.
+
+Although the above paragraphs were written in the first person, the C code was written together with Jorge Sierra, who translated the algorithmic improvements to it and added the initial multithreading support.

 ### NodeJS and Squiggle

--- a/time.txt
+++ b/time.txt
@@ -1,16 +1,22 @@
 # Optimized C

-OMP_NUM_THREADS=1 /bin/time -f "Time: %es" ./out/samples && echo
-Sum(dist_mixture, N)/N = 0.885837
-Time: 0.02s
+$ make time-linux
+Requires /bin/time, found on GNU/Linux systems

-OMP_NUM_THREADS=2 /bin/time -f "Time: %es" ./out/samples && echo
-Sum(dist_mixture, N)/N = 0.885123
-Time: 0.14s
+Running 100x and taking avg time: OMP_NUM_THREADS=1 out/samples
+Time using 1 thread: 24.00ms

-OMP_NUM_THREADS=4 /bin/time -f "Time: %es" ./out/samples && echo
-Sum(dist_mixture, N)/N = 0.886255
-Time: 0.11s
+Running 100x and taking avg time: OMP_NUM_THREADS=2 out/samples
+Time using 2 threads: 21.80ms
+
+Running 100x and taking avg time: OMP_NUM_THREADS=4 out/samples
+Time for 4 threads: 24.40ms
+
+Running 100x and taking avg time: OMP_NUM_THREADS=8 out/samples
+Time using 8 threads: 10.40ms
+
+Running 100x and taking avg time: OMP_NUM_THREADS=16 out/samples
+Time using 16 threads: 6.60ms

 # C