From c73476e5aa2db8ab7a5312d2379af6da73b0046a Mon Sep 17 00:00:00 2001
From: NunoSempere
Date: Sat, 3 Jun 2023 04:21:59 -0600
Subject: [PATCH] finish xorshift updating.

---
 C/README.md   |   2 ++
 C/out/samples | Bin 22488 -> 18384 bytes
 C/perf.txt    |  47 ++++++++++++++++++++++++-----------------------
 C/samples.c   |  11 ++++++++++-
 README.md     |   4 +++-
 time.txt      |  14 ++++++++------
 6 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/C/README.md b/C/README.md
index 116cd731..261ef8a7 100644
--- a/C/README.md
+++ b/C/README.md
@@ -11,6 +11,8 @@ This repository contains a few implementations of a simple botec (back-of-the-en
 - [ ] Add Windows/Powershell time-measuring commands
 - [ ] Add CUDA?
 - [x] Added results of perf. `rand_r` seems like a big chunk of it, but I'm hesitant to use lower-quality random numbers
+  - [x] used xorshift instead
+  - [ ] Use xorshift with a struct instead of a pointer? idk, could be faster for some reason?
 - [x] Update repository with correct timing
 - [x] Use better profiling approach to capture timing with 1M samples.
 - [x] See if program can be reworded so as to use multithreading effectively, e.g., so that you see speed gains proportional to the number of threads used
diff --git a/C/out/samples b/C/out/samples
index 2542aa2f8417183b31f636e99b7bd68bec5a6b18..ade23da3629ce188bfa4f5dc080f205bae16fcf1 100755
Binary files a/C/out/samples and b/C/out/samples differ
diff --git a/C/perf.txt b/C/perf.txt
index a3c9d659..5e7c95bf 100644
--- a/C/perf.txt
+++ b/C/perf.txt
@@ -1,25 +1,26 @@
+Samples: 884 of event 'cycles', Event count (approx.): 551167949
 Overhead  Command  Shared Object      Symbol
-  23.94%  samples  libc-2.31.so       [.] rand_r
-  18.14%  samples  libgomp.so.1.0.0   [.] 0x000000000001d132
-  15.43%  samples  libgomp.so.1.0.0   [.] 0x000000000001d2ea
-  12.16%  samples  samples            [.] mixture._omp_fn.0
-   4.36%  samples  libm-2.31.so       [.] __sin_fma
-   3.49%  samples  libm-2.31.so       [.] __ieee754_log_fma
-   3.34%  samples  samples            [.] random_to
-   3.13%  samples  samples            [.] random_uniform
-   2.77%  samples  samples            [.] split_array_sum._omp_fn.0
-   2.01%  samples  samples            [.] rand_float
-   1.65%  samples  libm-2.31.so       [.] __logf_fma
-   0.88%  samples  libgomp.so.1.0.0   [.] 0x000000000001d2f5
-   0.86%  samples  samples            [.] ur_normal
-   0.75%  samples  libm-2.31.so       [.] __expf_fma
-   0.70%  samples  libgomp.so.1.0.0   [.] 0x000000000001d13d
-   0.69%  samples  libgomp.so.1.0.0   [.] 0x000000000001d139
-   0.57%  samples  libgomp.so.1.0.0   [.] 0x000000000001d2f1
-   0.57%  samples  samples            [.] sample_1
-   0.55%  samples  samples            [.] random_lognormal
-   0.50%  samples  [kernel.kallsyms]  [k] asm_exc_page_fault
-   0.49%  samples  [kernel.kallsyms]  [k] clear_page_rep
-   0.47%  samples  samples            [.] random_normal
-   0.38%  samples  [kernel.kallsyms]  [k] default_send_IPI_single_phys
+  35.32%  samples  samples            [.] xorshift32
+  14.09%  samples  libgomp.so.1.0.0   [.] 0x000000000001d2ea
+  12.04%  samples  libgomp.so.1.0.0   [.] 0x000000000001d132
+  11.53%  samples  samples            [.] mixture._omp_fn.0
+   4.55%  samples  libm-2.31.so       [.] __sin_fma
+   4.24%  samples  samples            [.] rand_0_to_1
+   3.77%  samples  samples            [.] random_to
+   3.03%  samples  libm-2.31.so       [.] __logf_fma
+   1.61%  samples  libm-2.31.so       [.] __expf_fma
+   1.54%  samples  samples            [.] split_array_sum._omp_fn.0
+   1.38%  samples  samples            [.] random_uniform
+   0.94%  samples  samples            [.] ur_normal
+   0.91%  samples  libm-2.31.so       [.] __ieee754_log_fma
+   0.74%  samples  libgomp.so.1.0.0   [.] 0x000000000001d13d
+   0.52%  samples  samples            [.] sample_0
+   0.41%  samples  libm-2.31.so       [.] __sqrtf_finite@GLIBC_2.15
+   0.38%  samples  samples            [.] sample_1
+   0.36%  samples  libgomp.so.1.0.0   [.] 0x000000000001d139
+   0.36%  samples  libgomp.so.1.0.0   [.] 0x000000000001d2f5
+   0.22%  samples  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
+   0.18%  samples  [kernel.kallsyms]  [k] _raw_spin_lock_irq
+   0.18%  samples  samples            [.] random_lognormal
+   0.17%  samples  libgomp.so.1.0.0   [.] 0x000000000001d2f1
diff --git a/C/samples.c b/C/samples.c
index 15957a94..eb292501 100644
--- a/C/samples.c
+++ b/C/samples.c
@@ -81,7 +81,9 @@ float split_array_sum(float** meta_array, int length, int divided_into)
 uint32_t xorshift32(uint32_t* seed)
 {
-    /* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */
+    // Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs"
+    // See
+    // https://en.wikipedia.org/wiki/Xorshift
     uint32_t x = *seed;
     x ^= x << 13;
     x ^= x >> 17;
@@ -93,6 +95,13 @@ uint32_t xorshift32(uint32_t* seed)
 float rand_0_to_1(uint32_t* seed){
     return ((float) xorshift32(seed)) / ((float) UINT32_MAX);
+    /*
+    uint32_t x = *seed;
+    x ^= x << 13;
+    x ^= x >> 17;
+    x ^= x << 5;
+    return ((float)(*seed = x))/((float) UINT32_MAX);
+    */
     // previously:
     // ((float)rand_r(seed) / (float)RAND_MAX)
     // and before that: rand, but it wasn't thread-safe.
diff --git a/README.md b/README.md
index 62ee1736..f67267b2 100644
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@ The title of this repository is a pun on two meanings of "time to": "how much ti
 | Language                    | Time      | Lines of code |
 |-----------------------------|-----------|---------------|
-| C (optimized, 16 threads)   | 6ms       | 222           |
+| C (optimized, 16 threads)   | 5ms       | 249           |
 | Nim                         | 68ms      | 84            |
 | C (naïve implementation)    | 292ms     | 149           |
 | Javascript (NodeJS)         | 732ms     | 69            |
@@ -74,6 +74,8 @@ In fact, the C code ended up being so fast that I had to measure its time by run
 And still, there are some missing optimizations, like tweaking the code to take into account cache misses. I'm not exactly sure how that would go, though.
+Once the code was at 6.6ms, there was a 0.6ms gain possible by using OMP better, and a 1ms gain by using a xorshift algorithm instead of rand_r from stdlib.
+
 Although the above paragraphs were written in the first person, the C code was written together with Jorge Sierra, who translated the algorithmic improvements from nim to it and added the initial multithreading support.
 ### NodeJS and Squiggle
diff --git a/time.txt b/time.txt
index 1975a4e6..1cd53214 100644
--- a/time.txt
+++ b/time.txt
@@ -1,22 +1,24 @@
 # Optimized C
-$ make time-linux
+
+$ make && make time-linux
+gcc -O3 samples.c -fopenmp -lm -o out/samples
 Requires /bin/time, found on GNU/Linux systems
 Running 100x and taking avg time: OMP_NUM_THREADS=1 out/samples
-Time using 1 thread: 24.00ms
+Time using 1 thread: 20.20ms
 Running 100x and taking avg time: OMP_NUM_THREADS=2 out/samples
-Time using 2 threads: 21.80ms
+Time using 2 threads: 17.50ms
 Running 100x and taking avg time: OMP_NUM_THREADS=4 out/samples
-Time for 4 threads: 24.40ms
+Time for 4 threads: 17.00ms
 Running 100x and taking avg time: OMP_NUM_THREADS=8 out/samples
-Time using 8 threads: 10.40ms
+Time using 8 threads: 8.40ms
 Running 100x and taking avg time: OMP_NUM_THREADS=16 out/samples
-Time using 16 threads: 6.60ms
+Time using 16 threads: 5.00ms
 # C