
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head> 
    <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,400i,700,700i" rel="stylesheet">
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> 
    <meta name="keywords" content="rng, prng, xoshiro, xoroshiro, xorshift, pseudorandom number generator, random number generator">
    <style type="text/css">
      @import "css/content.php";
      @import "css/layout.php";
		@import "css/tablesorter.css";

    </style>
    <title>xoshiro/xoroshiro generators and the PRNG shootout</title> 
		<script type="text/javascript" src="js/jquery.js"></script> 
		<script type="text/javascript" src="js/tablesorter.js"></script>
		<script type="text/javascript" src="js/metadata.js"></script> 
		<script type="text/javascript">
		$.tablesorter.defaults.widgets = ['zebra']; 	
		$(document).ready( function() { 
			$("#prng").tablesorter({
				headers: { 
	            3: { 
      	          sorter: false 
         	   },
	            4: { 
      	          sorter: false 
         	   }
				}
			});
			$("#vec").tablesorter({
			});
			$("#prngf").tablesorter({
				headers: { 
	            3: { 
      	          sorter: false 
         	   },
	            4: { 
      	          sorter: false 
         	   }
				}
			});
		} );
		</script>

		<script type="text/javascript" src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
		<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

  </head> 
  <body>
    <div id=header><code>xoshiro</code> / <code>xoroshiro</code> generators and the PRNG shootout</div>

    <div id=left>
      <ul id="left-nav">
	<li><a href="http://vigna.di.unimi.it/">Home</a></li>
	<li><a href="http://vigna.di.unimi.it/papers.php">Papers<span class=arrow>&#10132;</span></a></li>
	<li><a href="http://vigna.di.unimi.it/software.php">Software<span class=arrow>&#10132;</span></a>
	<li><strong><a href="http://prng.di.unimi.it/">PRNG shootout</a></strong>

      <ul>
	<li><a href="#intro">Introduction</A></li>
	<li><a href="#shootout">A PRNG Shootout</A></li>
	<li><a href="#speed">Speed</A></li>
	<li><a href="#quality">Quality</A></li>
	<li><a href="#remarks">Remarks</A></li>
      </ul>

	
   <li><a href="http://pcg.di.unimi.it/pcg.php">The wrap-up on PCG generators<span class=arrow>&#10132;</span></a>
   <li><a href="http://fastutil.di.unimi.it/"><code>fastutil</code><span class=arrow>&#10132;</span></a>
	<li><a href="http://dsiutils.di.unimi.it/">DSI utilities<span class=arrow>&#10132;</span></a>
	<li><a href="http://webgraph.di.unimi.it/">WebGraph<span class=arrow>&#10132;</span></a>
	<li><a href="http://sux.di.unimi.it/">Sux<span class=arrow>&#10132;</span></a>
	<li><a href="http://vigna.di.unimi.it/music.php">Music<span class=arrow>&#10132;</span></a>
	<li><a href="https://shrinkai.di.unimi.it/">Shrink AI<span class=arrow>&#10132;</span></a>
    </ul>
    </div>

    <div id="main">
      
      
      <div id="content">

	<h1 class=first>Introduction</h1>

	<p>This page describes some new pseudorandom number generators (PRNGs) we (David Blackman and I) have been working on recently, and
   a shootout comparing them with other generators. Details about the generators can
   be found in our <a
   href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>. Information about my previous <code>xorshift</code>-based
   generators can be found <a href="xorshift.php">here</a>, but they have been entirely superseded by the new ones, which
   are faster <em>and</em> better. As part of our study, we developed a very strong <a href="hwd.php">test for Hamming-weight dependencies</a> 
   that gave a number of surprising results.

	<h1>64-bit Generators</h1>

   <P><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a>/<a href="xoshiro256starstar.c"><code>xoshiro256**</code></a>
   (XOR/shift/rotate) are our <strong>all-purpose</strong>
	generators (not <em>cryptographically secure</em> generators, though,
	like all PRNGs in these pages). They have excellent (sub-ns) speed, a state
	space (256 bits) that is large enough for any parallel application, and
	they pass all tests we are aware of. See the <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>
   for a discussion of their differences.

	<p>If, however, one has to generate only 64-bit <strong>floating-point</strong> numbers
   (by extracting the upper 53 bits) <a
   href="xoshiro256plus.c"><code>xoshiro256+</code></a> is a slightly (&asymp;15%)
   faster generator with analogous statistical properties. For general
   usage, one has to consider that its lowest bits have low linear
   complexity and will <a href="lowcomp.php">fail linearity tests</a>; however, low linear
   complexity of the lowest bits can have hardly any impact in practice, and certainly has no
   impact at all if you generate floating-point numbers using the upper bits (we computed a <a
   href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">precise
   estimate</a> of the linear complexity of the lowest bits).

	<p>If you are <strong>tight on space</strong>, <a
   href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>/<a
   href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a>
   (XOR/rotate/shift/rotate) and  <a
   href="xoroshiro128plus.c"><code>xoroshiro128+</code></a> have the same
   speed and use half of the space; the same comments apply. They are suitable only for
   low-scale parallel applications; moreover, <code>xoroshiro128+</code>
   exhibits a mild dependency in Hamming weights that generates a failure
   after 5&thinsp;TB of output in <a href="hwd.php">our test</a>. We believe
   this slight bias cannot affect any application.

	<p>Finally, if for any reason (which reason?) you need <strong>more
	state</strong>, we provide in the same
	vein <a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a> / <a href="xoshiro512starstar.c"><code>xoshiro512**</code></a> / <a href="xoshiro512plus.c"><code>xoshiro512+</code></a> and
	<a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a> / <a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a> / <a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a> (see the <a
   href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>).

	<p>All generators, being based on linear recurrences, provide <em>jump
   functions</em> that make it possible to simulate any number of calls to
   the next-state function in constant time, once a suitable <em>jump
   polynomial</em> has been computed. We provide ready-made jump functions for
   a number of calls equal to the square root of the period, to make it easy
   to generate non-overlapping sequences for parallel computations, and equal
	to the cube of the fourth root of the period, to make it possible to
	generate independent sequences on different parallel processors.

	<p>We suggest using <a href="splitmix64.c"><span
	style="font-variant: small-caps">SplitMix64</span></a> to initialize
	the state of our generators starting from a 64-bit seed, as <a href="https://dl.acm.org/citation.cfm?doid=1276927.1276928">research
	has shown</a> that initialization must be performed with a generator
	radically different in nature from the one initialized to avoid
	correlation on similar seeds.
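	<p>The seeding suggested above can be sketched as follows. This is a minimal illustration, not the distributed code: the linked <code>splitmix64.c</code> remains the reference, and <code>seed_state256</code> is a hypothetical helper name introduced here for a 256-bit state.

```c
#include <stdint.h>

/* One step of SplitMix64: advances *state and returns the next 64-bit output. */
static uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += UINT64_C(0x9e3779b97f4a7c15));
    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}

/* Fills a 256-bit state (four 64-bit words) from a single 64-bit seed.
   Hypothetical helper: the name and the four-word layout are assumptions. */
static void seed_state256(uint64_t seed, uint64_t s[4]) {
    for (int i = 0; i < 4; i++) s[i] = splitmix64_next(&seed);
}
```

	<p>Since the four words are consecutive outputs of a generator with a bijective output function, they are pairwise distinct, so the resulting state is never all zeros.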


	<h1>32-bit Generators</h1>

   <P><a href="xoshiro128plusplus.c"><code>xoshiro128++</code></a>/<a href="xoshiro128starstar.c"><code>xoshiro128**</code></a> are our
	<strong>32-bit</strong> all-purpose generators, whereas <a
	href="xoshiro128plus.c"><code>xoshiro128+</code></a> is 
	for floating-point generation. They are the 32-bit counterparts of
	<code>xoshiro256++</code>, <code>xoshiro256**</code> and <code>xoshiro256+</code>, so similar comments apply.
	Their state is too small for
	large-scale parallelism: their intended usage is inside embedded
	hardware or GPUs. For an even smaller scale, you can use <a
	href="xoroshiro64starstar.c"><code>xoroshiro64**</code></a> and <a
	href="xoroshiro64star.c"><code>xoroshiro64*</code></a>. We do not believe
	that, at this point in time, a 32-bit generator with a larger state can be of
	any use (but there are 32-bit <code>xoroshiro</code> generators of much larger size).

	<p>All 32-bit generators pass all tests we are aware of, with the
   exception of linearity tests (binary rank and linear complexity) for
   <code>xoshiro128+</code> and <code>xoroshiro64*</code>: in this case,
   due to the smaller number of output bits, the low linear complexity of the
   lowest bits is sufficient to trigger failures in BigCrush tests when the output is bit-reversed. Analogously to
   the 64-bit case, generating 32-bit floating-point numbers using the
   upper bits will not use any of the bits with <a href="lowcomp.php">low linear complexity</a>.
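	<p>A minimal sketch of such a conversion, assuming IEEE single precision (24 significant bits, so the upper 24 bits of the output are kept and the lowest 8 are discarded). The function name <code>to_unit_float</code> is hypothetical.

```c
#include <stdint.h>

/* Converts a 32-bit output to a float in [0, 1) using only its upper 24 bits.
   Assumes float is IEEE single precision, so every result is exactly representable. */
static float to_unit_float(uint32_t x) {
    return (x >> 8) * 0x1.0p-24f;
}
```

	<p>The lowest 8 bits never reach the result, so two outputs differing only in those bits map to the same float.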

	<h1>16-bit Generators</h1>

	<p>We do not suggest any particular 16-bit generator, but it is possible
	to design relatively good ones using our techniques. For example,
	Parallax has embedded in their <a href="https://www.parallax.com/propeller-2/">Propeller 2 microcontroller</a> multiple 16-bit
	<code>xoroshiro32++</code> generators.

	<h1>Congruential Generators</h1>

	<p>In case you are interested in 64-bit PRNGs based on congruential arithmetic, I provide
	three instances of
	<a href="https://groups.google.com/forum/#!searchin/sci.stat.math/Yet$20another$20rng%7Csort:date/sci.stat.math/p7aLW3TsJys/QGb1kti6kN0J">Marsaglia's Multiply-With-Carry generators</a>,
	<a href="MWC128.c"><code>MWC128</code></a>, <a href="MWC192.c"><code>MWC192</code></a>, and <a href="MWC256.c"><code>MWC256</code></a>, for which I computed good constants. They are some
	of the fastest generators available, but they need 128-bit operations.
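	<p>To illustrate why 128-bit operations are needed, here is a minimal sketch of a single multiply-with-carry step with one 64-bit word of state and a 64-bit carry. The multiplier is left as a parameter, because the quality of the generator depends entirely on choosing it well (the linked files contain the actual constants); the type and function names are hypothetical, returning the pre-update value is an illustrative choice, and <code>unsigned __int128</code> is a <code>gcc</code>/<code>clang</code> extension.

```c
#include <stdint.h>

/* State of a multiply-with-carry generator: current value x and carry c. */
typedef struct { uint64_t x, c; } mwc_state;

/* One MWC step with multiplier a: t = a * x + c, computed in 128 bits.
   The low 64 bits of t become the new x, the high 64 bits the new carry.
   Returns the value of x before the update. */
static uint64_t mwc_next(mwc_state *s, uint64_t a) {
    const uint64_t result = s->x;
    const unsigned __int128 t = (unsigned __int128)a * s->x + s->c;
    s->x = (uint64_t)t;
    s->c = (uint64_t)(t >> 64);
    return result;
}
```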

	<p>Stronger theoretical guarantees are provided by the
	<a href="https://www.math.ias.edu/~goresky/pdf/p1-goresky.pdf">generalized multiply-with-carry generators defined by Goresky and Klapper</a>:
	also in this case I provide two instances, <a href="GMWC128.c"><code>GMWC128</code></a> and <a href="GMWC256.c"><code>GMWC256</code></a>, for which I computed good constants.
	These generators, however, are about half as fast as MWC generators.

	<h1>JavaScript</h1>

	<p><code>xorshift128+</code> is presently used in the JavaScript engines of
	<a href="http://v8project.blogspot.com/2015/12/theres-mathrandom-and-then-theres.html">Chrome</a>,
	<a href="https://nodejs.org/">Node.js</a>,
	<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=322529#c99">Firefox</a>,
	<a href="https://bugs.webkit.org/show_bug.cgi?id=151641">Safari</a> and
	<a href="https://github.com/Microsoft/ChakraCore/commit/dbda0182dc0a983dfb37d90c05000e79b6fc75b0">Microsoft Edge</a>.

	<h1>Rust</h1>
	<p>The <a href="https://docs.rs/rand/latest/rand/rngs/struct.SmallRng.html">SmallRng</a> from the <a href="https://docs.rs/rand/latest/rand/">rand</a>
	crate is <a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> or <a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>, depending
	on the platform.

	<h1><code>java.util.random</code></h1>

	<p>I worked with Guy Steele on the <a
	href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html">new
	family of PRNGs available in Java 17</a>. The family, called <a
	href="http://vigna.di.unimi.it/papers.php#StVLXM">LXM</a>, uses <a
	href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">new, better
	tables of multipliers for LCGs with power-of-two moduli</a>. Moreover,
	<code>java.util.random</code> contains ready-to-use implementations of
	<a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a> and <a
	HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>.

	<h1>.NET</h1>

	<p>In version 6, Microsoft's .NET framework <a href="https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/">has adopted</a>
	<a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> and <a
   HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a> as default PRNGs.

	<h1>Erlang</h1>

	<p>The parallel functional language <a href="https://www.erlang.org/">Erlang</a> implements <a href="https://www.erlang.org/doc/man/rand.html">several
	variants of <code>xorshift</code>/<code>xoroshiro</code>-based generators</a> adapted in collaboration with Raimo Niskanen for Erlang's
	58/59-bit arithmetic.
	
	<h1>GNU FORTRAN</h1>
	<p>GNU's <a href="https://gcc.gnu.org/fortran/">implementation of the FORTRAN language</a> <a href="https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fNUMBER.html">uses</a>
	<a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG. 

	<h1>Julia</h1>
	<p>The <a href="https://julialang.org/">Julia programming language</a> <a href="https://docs.julialang.org/en/v1/stdlib/Random/">uses</a>
	<a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> as default PRNG. 

	<h1>Lua</h1>
	<p>The scripting language <a href="http://www.lua.org/">Lua</a> <a href="https://www.lua.org/manual/5.4/manual.html#pdf-math.random">uses</a> <a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG.

	<h1>IoT</h1>

	<p>The IoT operating systems <a href="https://os.mbed.com/">Mbed</a> and <a href="https://www.zephyrproject.org/">Zephyr</a> use
	<a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a> as default PRNG.

	<h1><a name=shootout>&#xfeff;</a>A PRNG Shootout</h1>
	
	<p>I provide here a shootout of a few recent 64-bit PRNGs that are quite widely used.
	The purpose is to provide a consistent, reproducible assessment of two properties of the generators: speed and quality.
	The code used to perform the tests and all the output from statistical test suites is available for download.

	<h2><a name=speed>&#xfeff;</a>Speed</h2>

	<p>The speed reported on this page is the time required to emit 64
	random bits, and the number of clock cycles required to generate a byte (thanks to the <a href="http://icl.utk.edu/papi/">PAPI</a> library). If a generator is 32-bit in nature, I glue two
	consecutive outputs. Note that
	I do not report results using GPUs or SSE instructions, with an exception for the very common SFMT: for that to be
	meaningful, I would need implementations for all generators.
	Otherwise, with suitable hardware support I could just use AES in
	counter mode and get 64 secure bits in 0.56&thinsp;ns (or just use <a href="https://github.com/google/randen">Randen</a>). The tests were performed on a
	12th Gen Intel&reg; Core&trade; i7-12700KF @3.60GHz using <code>gcc</code> 12.2.1.
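	<p>Gluing two consecutive outputs of a 32-bit generator into one 64-bit value can be sketched as follows. The function name <code>glue32</code> is hypothetical, and placing the first output in the upper half is an assumption made for the example.

```c
#include <stdint.h>

/* Glues two consecutive 32-bit outputs into one 64-bit value.
   Assumption: the first output goes into the upper 32 bits. */
static uint64_t glue32(uint32_t first, uint32_t second) {
    return ((uint64_t)first << 32) | second;
}
```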

	<p>A few <i>caveats</i>:
	<ul>
	<li>There is some looping overhead, but subtracting it from the timings is not going to
	be particularly meaningful due to instruction rescheduling, etc.
	<li>Relative speed might be different on different CPUs and in different scenarios.
	<li>I do not use <code>-march=native</code>, which can improve the timing of some generators
   by vectorization or special instructions, because those improvements might not be possible
   when the generator is embedded in user code.
	<li>Code has been compiled using <code>gcc</code>'s <code>-fno-unroll-loops</code>
	option. This option is essential to get a sensible result: without it, the compiler
	may perform different loop unrolling depending on the generator. Previously I was also using
	<code>-fno-move-loop-invariants</code>, which was essential in not giving generators using several
   large constants an advantage by preventing the compiler from loading them into registers. However,
   as of <code>gcc</code> 12.2.1 the compiler loads the constants into registers anyway, so the 
   option is no longer used. Timings
	with <a href="http://clang.llvm.org/"><code>clang</code></a> at the time of this writing
	are very close to those obtained with <code>gcc</code>.
	If you find timings that are significantly better than those shown here on 
	comparable hardware, they are likely to be due to compiler artifacts (e.g., vectorization).
	<li>Timings are taken running a generator billions of times in a loop; but this is not the way you use generators. Register
   allocation might be very different when the generator is embedded in an application, leading to constants being reloaded
   or part of the state space being written to main memory at each iteration. These costs do not appear in the benchmarks below.
	</ul>

	<p>To ease replicability, I distribute a <a href="harness.c"><em>harness</em></a> performing the measurement. You just
	have to define a <a href="xoroshiro128plus-speed.c"><code>next()</code></a> function and include the harness. But the only realistic
	suggestion is to try different generators in your application and see what happens.
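	<p>The idea of timing a <code>next()</code> function in a tight loop can be sketched as follows. This is not the distributed harness: it is a simplified illustration that uses the standard <code>clock()</code> function (CPU time, with coarse resolution) instead of PAPI, the name <code>ns_per_call</code> is hypothetical, and the <code>xoshiro256+</code> step is written out only to keep the example self-contained (the linked source file is the reference).

```c
#include <stdint.h>
#include <time.h>

/* Arbitrary nonzero state, fixed only to keep the sketch short. */
static uint64_t state[4] = { 1, 2, 3, 4 };

static inline uint64_t rotl(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* xoshiro256+ step, used here as the generator under test. */
static uint64_t next(void) {
    const uint64_t result = state[0] + state[3];
    const uint64_t t = state[1] << 17;
    state[2] ^= state[0];
    state[3] ^= state[1];
    state[1] ^= state[2];
    state[0] ^= state[3];
    state[2] ^= t;
    state[3] = rotl(state[3], 45);
    return result;
}

/* Average CPU time in nanoseconds per call of next() over n calls. */
static double ns_per_call(uint64_t n) {
    uint64_t sink = 0;
    const clock_t start = clock();
    for (uint64_t i = 0; i < n; i++) sink ^= next();
    const clock_t stop = clock();
    /* Store the XOR of all outputs so the compiler cannot drop the loop. */
    volatile uint64_t keep = sink;
    (void)keep;
    return 1e9 * (double)(stop - start) / CLOCKS_PER_SEC / (double)n;
}
```

	<p>The figure includes the looping overhead, and it is subject to all the caveats listed above.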

	<h2><a name=quality>&#xfeff;</a>Quality</h2>

	<p>This is probably the more <a
	href="http://dilbert.com/strips/comic/2001-10-25/">elusive</a> property
	of a PRNG. Here quality is measured using the powerful
	BigCrush suite of tests. BigCrush is part of <a
	href="http://simul.iro.umontreal.ca/testu01/tu01.html">TestU01</a>,
	a monumental framework for testing PRNGs developed by Pierre L'Ecuyer
	and Richard Simard (&ldquo;TestU01: A C library for empirical testing
	of random number generators&rdquo;, <i>ACM Trans. Math. Softw.</i>
	33(4), Article 22, 2007).

	<p>I run BigCrush starting from 100 equispaced points of the state space
	of the generator and collect <em>failures</em>&mdash;tests in which the
	<i>p</i>-value statistics is outside the interval [0.001..0.999]. A failure
	is <em>systematic</em> if it happens at all points.

	<p>Note that TestU01 is a 32-bit test suite. Thus, two 32-bit integer values 
	are passed to the test suite for each generated 64-bit value. Floating point numbers
	are generated instead by dividing the unsigned output of the generator by 2<sup>64</sup>.
	Since this implies a bias towards the high bits (which is anyway a known characteristic
	of TestU01), I run the test suite also on the <em>reverse</em>
	generator. More detail about the whole process can be found in this <a
	href="http://vigna.di.unimi.it/papers.php#VigEEMXGS">paper</a>.
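	<p>Assuming that the reverse generator simply returns each 64-bit output with its bits in reverse order, the reversal can be sketched with the usual swap-by-halves technique. The function name <code>reverse64</code> is hypothetical.

```c
#include <stdint.h>

/* Reverses the order of the 64 bits of x by swapping adjacent bits,
   then adjacent pairs, nibbles, bytes, 16-bit words and 32-bit halves. */
static uint64_t reverse64(uint64_t x) {
    x = ((x >> 1) & UINT64_C(0x5555555555555555)) | ((x & UINT64_C(0x5555555555555555)) << 1);
    x = ((x >> 2) & UINT64_C(0x3333333333333333)) | ((x & UINT64_C(0x3333333333333333)) << 2);
    x = ((x >> 4) & UINT64_C(0x0f0f0f0f0f0f0f0f)) | ((x & UINT64_C(0x0f0f0f0f0f0f0f0f)) << 4);
    x = ((x >> 8) & UINT64_C(0x00ff00ff00ff00ff)) | ((x & UINT64_C(0x00ff00ff00ff00ff)) << 8);
    x = ((x >> 16) & UINT64_C(0x0000ffff0000ffff)) | ((x & UINT64_C(0x0000ffff0000ffff)) << 16);
    return (x >> 32) | (x << 32);
}
```

	<p>After the reversal, the bits that were lowest in the original output become the highest ones, which are those the test suite weighs most.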

	<p>Besides BigCrush, I analyzed generators using a test for <a href="hwd.php">Hamming-weight dependencies</a>
	described in our <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a>. As I already remarked, our only
	generator failing the test (but only after 5&thinsp;TB of output) is <code>xoroshiro128+</code>.

	<p>I report the period of each generator and its footprint in bits: a generator gives &ldquo;bang-for-the-buck&rdquo;
	if the base-2 logarithm of the period is close to the footprint. Note
   that the footprint has always been padded to a multiple of 64, and it can
   be significantly larger than expected because of padding and
   cyclic access indices.

<div style="align: center"><table id='prng' style='margin: 2em 0' class='tablesorter'>
<thead><tr>
<th>PRNG
<th>Footprint (bits)
<th class="{ sorter: 'metadata' }">Period
<th><a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures
<th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a>
<th>ns/64 bits
<th>cycles/B
<tbody>
<tr><td><a href="xoroshiro128plus.c"><code>xoroshiro128+</code></a><td align=right>128  <td align=right class='{sortValue: 128}'>2<sup>128</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>5&thinsp;TB<td align=right>0.80<td align=right>0.36
<tr><td><a href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a><td align=right>128  <td align=right class='{sortValue: 128}'>2<sup>128</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.90<td align=right>0.40
<tr><td><a href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a><td align=right>128  <td align=right class='{sortValue: 128}'>2<sup>128</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.78<td align=right>0.36
<tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a><td align=right>256  <td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.61<td align=right>0.27
<tr><td><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a><td align=right>256  <td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.75<td align=right>0.34
<tr><td><a href="xoshiro256starstar.c"><code>xoshiro256**</code></a><td align=right>256  <td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.75<td align=right>0.34
<tr><td><a href="xoshiro512plus.c"><code>xoshiro512+</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.68<td align=right>0.30
<tr><td><a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.79<td align=right>0.36
<tr><td><a href="xoshiro512starstar.c"><code>xoshiro512**</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.81<td align=right>0.37
<tr><td><a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.82<td align=right>0.37
<tr><td><a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>1.01<td align=right>0.46
<tr><td><a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.98<td align=right>0.44
<tr><td><a href="MWC128.c"><span style="font-variant: small-caps">MWC128</span></a><td align=right>128  			<td align=right class='{sortValue: 127}'>&asymp;2<sup>127</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.83<td align=right>0.37
<tr><td><a href="MWC192.c"><span style="font-variant: small-caps">MWC192</span></a><td align=right>192  			<td align=right class='{sortValue: 191}'>&asymp;2<sup>191</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.42<td align=right>0.19
<tr><td><a href="MWC256.c"><span style="font-variant: small-caps">MWC256</span></a><td align=right>256 			<td align=right class='{sortValue: 255}'>&asymp;2<sup>255</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.45<td align=right>0.20
<tr><td><a href="GMWC128.c"><span style="font-variant: small-caps">GMWC128</span></a><td align=right>128  			<td align=right class='{sortValue: 127}'>&asymp;2<sup>127</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.84<td align=right>0.83
<tr><td><a href="GMWC256.c"><span style="font-variant: small-caps">GMWC256</span></a><td align=right>256 			<td align=right class='{sortValue: 255}'>&asymp;2<sup>255</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.85<td align=right>0.83
<tr><td><a href="http://pracrand.sourceforge.net/"><span style="font-variant: small-caps">SFC64</span></a><td align=right>256 			<td align=right class='{sortValue: 64}'>&ge;2<sup>64</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.66<td align=right>0.30
<tr><td><a href="splitmix64.c"><span style="font-variant: small-caps">SplitMix64</span></a><td align=right>64  			<td align=right class='{sortValue: 64}'>2<sup>64</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.63<td align=right>0.29
<tr><td><a href="http://pcg-random.org/">PCG 128 XSH RS 64 (LCG)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.70<td align=right>0.77
<tr><td><a href="https://github.com/numpy/numpy">PCG64-DXSM (NumPy)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.11<td align=right>0.50
<tr><td><a href="http://numerical.recipes/"><code>Ran</code></a><td align=right>192  <td align=right class='{sortValue: 191}'>&#8776;2<sup>191</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.37<td align=right>0.62
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html"><code>MT19937-64</code> (Mersenne Twister)</a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup>&nbsp;&minus;&nbsp;1<td align=right>LinearComp<td align=right>&mdash;<td align=right>1.36<td align=right>0.62
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT19937 (uses SSE2 instructions)</code></a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup>&nbsp;&minus;&nbsp;1<td align=right>LinearComp<td align=right>&mdash;<td align=right>0.93<td align=right>0.42
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT607 (uses SSE2 instructions)</code></a><td align=right>672 <td align=right class='{sortValue: 607}'>2<sup>607</sup>&nbsp;&minus;&nbsp;1<td align=right>MatrixRank, LinearComp<td align=right>400 MB<td align=right>0.78<td align=right>0.34
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (64 bits)<td align=right>256<td align=right class='{sortValue: 127}'>2<sup>127</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>90&thinsp;TB→<td align=right>2.76<td align=right>1.25
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (32 bits)<td align=right>224<td align=right class='{sortValue: 127}'>2<sup>127</sup>&nbsp;&minus;&nbsp;1<td align=right>CollisionOver, Run, SimPoker, AppearanceSpacings, MatrixRank, LinearComp, LongestHeadRun, Run of Bits (reversed)<td align=right>40&thinsp;TB→<td align=right>4.27<td align=right>1.92
<tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL512a</code></a><td align=right>544  <td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1  <td align=right>MatrixRank, LinearComp<td align=right>3.5 PB<td align=right>5.42<td align=right>2.44
<tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL1024a</code></a><td align=right>1056  <td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1  <td align=right>MatrixRank, LinearComp<td align=right>&mdash;<td align=right>5.30<td align=right>2.38
</table></div>

	<p>The following table compares instead two ways of generating floating-point numbers, namely the 521-bit <a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/">dSFMT</a>, which
   generates directly  floating-point numbers with 52 significant bits, and
   <a href="xoshiro256plus.c"><code>xoshiro256+</code></a> followed by a standard conversion of its upper bits to a floating-point number with 53 significant bits (see below).

<div style="align: center"><table id='prngf' style='margin: 2em 0' class='tablesorter'>
<thead><tr>
<th>PRNG
<th>Footprint (bits)
<th class="{ sorter: 'metadata' }">Period
<th> <a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures
<th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a>
<th>ns/double
<th>cycles/B
<tbody>
<tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a> (returns 53 significant bits) <td align=right>256<td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.92<td align=right>3.40
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>dSFMT</code></a>  (uses SSE2 instructions, returns only 52 significant bits)<td align=right>704<td align=right class='{sortValue: 521}'>2<sup>521</sup>&nbsp;&minus;&nbsp;1<td align=right>MatrixRank, LinearComp<td align=right>6&thinsp;TB<td align=right>0.85<td align=right>3.07
</table></div>

	<p><code>xoshiro256+</code> is &asymp;8% slower than the dSFMT, but it has a doubled range of output values, does not need any extra SSE instruction (can be programmed in Java, etc.),
   has a much smaller footprint, and its upper bits do not fail any test.

	<h1><a name=remarks>&#xfeff;</a>Remarks</h1>

	<h2>Vectorization</h2>

	<p>Some of the generators can be very easily vectorized, so that multiple instances can be run in parallel to provide
	fast bulk generation. Thanks to an interesting <a href="https://github.com/JuliaLang/julia/issues/27614">discussion with the Julia developers</a>,
	I've become aware that AVX2 vectorizations of multiple instances of generators using the <code>+</code>/<code>++</code> scrambler are impressively fast (links
	below point at a speed test to be used with the <a href="harness.c">harness</a>, and the result will be multiplied by 1000):

<div style="align: center"><table id='vec' style='margin: 2em 0' class='tablesorter'>
<thead><tr>
<th>PRNG
<th>ns/64 bits
<th>cycles/B
<tbody>
<tr><td><a href="xoroshiro128+-vect-speed.c"><code>xoroshiro128+</code></a> (4 parallel instances)<td align=right>0.36<td align=right>0.14
<tr><td><a href="xoroshiro128++-vect-speed.c"><code>xoroshiro128++</code></a> (4 parallel instances)<td align=right>0.45<td align=right>0.18
<tr><td><a href="xoshiro256+-vect-speed.c"><code>xoshiro256+</code></a> (8 parallel instances)<td align=right>0.19<td align=right>0.08
<tr><td><a href="xoshiro256++-vect-speed.c"><code>xoshiro256++</code></a> (8 parallel instances)<td align=right>0.26<td align=right>0.09
</table></div>

	<p>Note that sometimes convincing the compiler to vectorize is a
	slightly quirky process: for example, on <code>gcc</code> 12.2.1 I have to use <code>-O3 -fdisable-tree-cunrolli -march=native</code>
	to vectorize <code>xoshiro256</code>-based generators
	(<code>-O3</code> alone will not vectorize; thanks to Chris Elrod for pointing me at <code>-fdisable-tree-cunrolli</code>).

	<h2>A long period does not imply high quality</h2>

	<p>This is a common misconception. The generator <code>x++</code> has
	period \(2^k\), for any \(k\geq0\), provided that <code>x</code> is
	represented using \(k\) bits: nonetheless, it is a horrible generator.
	The generator returning \(k-1\) zeroes followed by a one has period
   \(k\).

	<p>It is however important that the period is long enough. A first heuristic rule of thumb
	is that if you need to use \(t\) values, you need a generator with period at least \(t^2\).
	Moreover, if you run \(n\) independent computations starting at random seeds,
	the sequences used by each computation should not overlap.

	<p>Now, given a generator with period \(P\), the probability that \(n\) subsequences of length \(L\) starting at random points in the state space
	overlap <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">is bounded by \(n^2L/P\)</a>. If your generator has period \(2^{256}\) and you run
	on \(2^{64}\) cores (you will never have them) a computation using \(2^{64}\) pseudorandom numbers (you will never have the time)
	the probability of overlap would be less than \(2^{-64}\).
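	<p>Spelling out the arithmetic for this example, with \(P = 2^{256}\), \(n = 2^{64}\) and \(L = 2^{64}\), the bound evaluates to
	\[ \frac{n^2 L}{P} = \frac{\bigl(2^{64}\bigr)^2 \cdot 2^{64}}{2^{256}} = \frac{2^{192}}{2^{256}} = 2^{-64}. \]
	With the same \(n\) and \(L\) but a period of \(2^{128}\), the same expression gives \(2^{192}/2^{128} = 2^{64}\), which is larger than one and thus says nothing about the probability of overlap from random starting points.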

	<p>In other words: any generator with a period beyond
	\(2^{256}\) has a period that is
	sufficient for every imaginable application. Unless there are other motivations (e.g., provably
	increased quality), a generator with a larger period is only a waste of
	memory (as it needs a larger state), of cache lines, and of
	precious high-entropy random bits for seeding (unless you're using
	small seeds, but then it's not clear why you would want a very long
	period in the first place&mdash;the computation above is valid only if you seed all bits of the state
	with independent, uniformly distributed random bits).

	<p>In case the generator provides a <em>jump function</em> that lets you skip through chunks of the output in constant
	time, even a period of \(2^{128}\) can be sufficient, as it provides \(2^{64}\) non-overlapping sequences of length \(2^{64}\).

	<h2>Equidistribution</h2>

	<p>Every 64-bit generator of ours with <var>n</var> bits of state scrambled
	with <code>*</code> or <code>**</code> is <var>n</var>/64-dimensionally
	equidistributed: every <var>n</var>/64-tuple of consecutive 64-bit
	values appears exactly once in the output, except for the zero tuple
	(and this is the largest possible dimension). Generators based on the
	<code>+</code> or <code>++</code> scramblers are however only (<var>n</var>/64 &minus;
	1)-dimensionally equidistributed: every (<var>n</var>/64 &minus;
	1)-tuple of consecutive 64-bit values appears exactly 2<sup>64</sup>
	times in the output, except for a missing zero tuple. The same considerations
   apply to 32-bit generators.

	<h2>Generating uniform doubles in the unit interval</h2>

	<p>A standard double (64-bit) floating-point number in
	<a href="https://en.wikipedia.org/wiki/IEEE_floating_point">IEEE floating point format</a> has 52 bits of
	significand, plus an implicit bit at the left of the significand. Thus,
	the representation can actually store numbers with <em>53</em> significant binary digits.

	<p>Because of this fact, in C99 a 64-bit unsigned integer <code>x</code> should be converted to a 64-bit double 
	using the expression
<pre>
    #include &lt;stdint.h>

    (x >> 11) * 0x1.0p-53
</pre>
	<p>In Java you can use almost the same expression for a (signed) 64-bit integer:
<pre>
    (x >>> 11) * 0x1.0p-53
</pre>


	<p>This conversion guarantees that all dyadic rationals of the form <var>k</var> / 2<sup>53</sup>, with 0 &le; <var>k</var> &lt; 2<sup>53</sup>,
	will be equally likely. Note that this conversion uses the high bits of <code>x</code> (usually a good idea), but you can alternatively
	use the lowest bits.
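
	<p>Wrapped into complete functions, the multiplicative conversion might look as follows. This is a minimal sketch:
	the function names <code>to_double_high</code> and <code>to_double_low</code> are chosen here for illustration,
	and the hexadecimal floating-point constant requires C99 or later.

```c
#include <stdint.h>

/* Uses the upper 53 bits of x: returns k / 2^53 with k = x >> 11. */
static inline double to_double_high(uint64_t x) {
    return (x >> 11) * 0x1.0p-53;
}

/* Variant using the lower 53 bits of x instead. */
static inline double to_double_low(uint64_t x) {
    return (x & ((UINT64_C(1) << 53) - 1)) * 0x1.0p-53;
}
```

	<p>Both functions return values in [0..1): the largest possible result is 1 &minus; 2<sup>&minus;53</sup>, and the
	integer-to-double conversion is exact because the integer has at most 53 significant bits.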

	<p>An alternative, multiplication-free conversion is
<pre>
    #include &lt;stdint.h>

    static inline double to_double(uint64_t x) {
       const union { uint64_t i; double d; } u = { .i = UINT64_C(0x3FF) &lt;&lt; 52 | x >> 12 };
       return u.d - 1.0;
    }
</pre>
	<p>The code above builds, by bit manipulation,
	a real number in the interval [1..2), and then subtracts
	one to obtain a real number in the interval [0..1). If <code>x</code> is chosen uniformly among 64-bit integers,
	the returned value is chosen uniformly among dyadic rationals of the form <var>k</var> / 2<sup>52</sup>. This
	is the same technique used by generators that directly provide doubles, such as the
	<a href="http://dx.doi.org/10.1007/978-3-540-85912-3_26">dSFMT</a>.

	<p>This technique is supposed to be fast, but on recent hardware it does not seem to give a significant advantage.
   More importantly, <em>you will be generating half the values you could actually generate</em>.
	The same problem plagues the dSFMT. All doubles generated will have the lowest significand bit set to zero (I must
	thank Raimo Niskanen from the Erlang team for making me notice this&mdash;a previous version of this site
	did not mention this issue).

	<p>In Java you can obtain an analogous result using suitable static methods:
<pre>
    Double.longBitsToDouble(0x3FFL &lt;&lt; 52 | x >>> 12) - 1.0
</pre>
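
	<p>The loss of half the values can be checked directly. The sketch below puts the two C conversions side by side
	and recovers the numerator <var>k</var> of the result written as <var>k</var> / 2<sup>53</sup>; the names
	<code>to_double_mul</code>, <code>to_double_bits</code> and <code>numerator53</code> are introduced here only for this check.

```c
#include <stdint.h>

/* Multiplicative conversion: k / 2^53 with k = x >> 11. */
static inline double to_double_mul(uint64_t x) {
    return (x >> 11) * 0x1.0p-53;
}

/* Bit-manipulation conversion: builds a double in [1..2) and subtracts one. */
static inline double to_double_bits(uint64_t x) {
    const union { uint64_t i; double d; } u = { .i = UINT64_C(0x3FF) << 52 | x >> 12 };
    return u.d - 1.0;
}

/* For d in [0..1) of the form k / 2^53, returns k (the scaling is exact). */
static inline uint64_t numerator53(double d) {
    return (uint64_t)(d * 0x1.0p53);
}
```

	<p>With the bit-manipulation version the numerator is always <code>(x >> 12) * 2</code>, hence even, whereas the
	multiplicative version yields <code>x >> 11</code>, which can be odd. In particular, the smallest positive value
	2<sup>&minus;53</sup> is returned by <code>to_double_mul</code> but never by <code>to_double_bits</code>.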

	<p>To adhere to the principle of least surprise, my implementations now use the multiplicative version, everywhere.

	<p>Interestingly, these are not the only notions of &ldquo;uniformity&rdquo; you can come up with. Another possibility
	is generating 1074-bit integers, normalizing them, and returning the nearest value representable as a
	64-bit double (this is the theory: in practice, you will almost never
	use more than two integers per double, as the remaining bits would not be representable). This approach guarantees that all
	representable doubles could in principle be generated, although not every
	returned double will appear with the same probability. A reference
	implementation can be found <a href="random_real.c">here</a>. Note that unless your generator has
	at least 1074 bits of state and suitable equidistribution properties, that code will not do what you expect
	(e.g., it might <em>never</em> return zero).


	  </div>
      
      
    </div>
    
    <div id="right">

<!--      <h1>Download</h1>
      <p><ul>
		<li><a HREF="prng-1.2.tgz">source tarball</A>
		<li><a HREF="prng-data.tar.bz2">data tarball (large!)</a>
      </ul>
-->
      <h1>C code (64 bits)</h1>
      <p><ul>
		<li><a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>
		<li><a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a>
		<li><a HREF="xoshiro256plus.c"><code>xoshiro256+</code></a>
		<li><a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>
		<li><a HREF="xoroshiro128starstar.c"><code>xoroshiro128**</code></a>
		<li><a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a>
		<li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a>
		<li><a HREF="MWC128.c"><code>MWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
		<li><a HREF="MWC192.c"><code>MWC192</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
		<li><a HREF="MWC256.c"><code>MWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
		<li><a HREF="GMWC128.c"><code>GMWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
		<li><a HREF="GMWC256.c"><code>GMWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
<!--		<li><a HREF="xoshiro512starstar.c"><code>xoshiro512**</code></a>
		<li><a HREF="xoshiro512plus.c"><code>xoshiro512+</code></a>
		<li><a HREF="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a>
		<li><a HREF="xoroshiro1024plus.c"><code>xoroshiro1024*</code></a>-->
      </ul>

      <h1>C code (32 bits)</h1>
      <p><ul>
		<li><a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>
		<li><a HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a>
		<li><a HREF="xoshiro128plus.c"><code>xoshiro128+</code></a>
		<li><a HREF="xoroshiro64starstar.c"><code>xoroshiro64**</code></a>
		<li><a HREF="xoroshiro64star.c"><code>xoroshiro64*</code></a>
      </ul>

   	<h1>Java code (<a HREF="https://github.com/openjdk/jdk17/tree/master/src/jdk.random/share/classes/jdk/random"><code>java.util.random</code></a>)</h1>

      <h1>Java code (<a href="http://dsiutils.di.unimi.it">DSI utilities</a>)</h1>
      <p><ul>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/package-summary.html">Overview</a>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusPlusRandom.html"><code>xoshiro256++</code></a>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256StarStarRandom.html"><code>xoshiro256**</code></a>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusRandom.html"><code>xoshiro256+</code></a>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusPlusRandom.html"><code>xoroshiro128++</code></a>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128StarStarRandom.html"><code>xoroshiro128**</code></a>
		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusRandom.html"><code>xoroshiro128+</code></a>
		<li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a>
      </ul>

   	<h1>Java code (<a HREF="https://gitbox.apache.org/repos/asf?p=commons-rng.git">Apache Commons RNG implementations</a>)</h1>

      <h1>Documentation</h1>
      <p><ul>
		<li>The <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a> introducing <code>xoshiro</code>/<code>xoroshiro</code>.
		<li>The <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a> describing our <a href="hwd.php">test for Hamming-weight dependencies</a>.
		<li>A <a href="http://vigna.di.unimi.it/papers.php#VigHTLGMT">paper</a> discussing the defects of the Mersenne Twister family of PRNGs.
		<li>A <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">paper</a> discussing the probability of overlap of random subsequences.
		<li>A <a href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">paper</a> with new tables of multipliers for LCGs with power-of-two moduli.
		<li>A <a href="http://vigna.di.unimi.it/papers.php#StVLXM">paper</a> presenting the family LXM of PRNGs.
      </ul>

		<h1>Discussion</h1>

		<p>There is a <a href="http://groups.google.com/group/prng">discussion group</a>
		about this page. You can join or <a href="mailto:prng@googlegroups.com">send a message</a>.
		<h1><a href="https://validator.w3.org/check/referer">This is valid HTML 4.01</a></h1>

    </div>
  </body>
</html>