<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
|
<html>
|
|
<head>
|
|
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,400i,700,700i" rel="stylesheet">
|
|
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
|
|
<meta name="keywords" content="rng, prng, xoshiro, xoroshiro, xorshift, pseudorandom number generator, random number generator">
|
|
<style type="text/css">
|
|
@import "css/content.php";
|
|
@import "css/layout.php";
|
|
@import "css/tablesorter.css";
|
|
|
|
</style>
|
|
<title>xoshiro/xoroshiro generators and the PRNG shootout</title>
|
|
<script type="text/javascript" src="js/jquery.js"></script>
|
|
<script type="text/javascript" src="js/tablesorter.js"></script>
|
|
<script type="text/javascript" src="js/metadata.js"></script>
|
|
<script type="text/javascript">
|
|
$.tablesorter.defaults.widgets = ['zebra'];
|
|
$(document).ready( function() {
|
|
$("#prng").tablesorter({
|
|
headers: {
|
|
3: {
|
|
sorter: false
|
|
},
|
|
4: {
|
|
sorter: false
|
|
}
|
|
}
|
|
});
|
|
$("#vec").tablesorter({
|
|
});
|
|
$("#prngf").tablesorter({
|
|
headers: {
|
|
3: {
|
|
sorter: false
|
|
},
|
|
4: {
|
|
sorter: false
|
|
}
|
|
}
|
|
});
|
|
} );
|
|
</script>
|
|
|
|
<script type="text/javascript" src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
|
|
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
|
|
|
</head>
|
|
<body>
|
|
<div id=header><code>xoshiro</code> / <code>xoroshiro</code> generators and the PRNG shootout</div>
|
|
|
|
<div id=left>
|
|
<ul id="left-nav">
|
|
<li><a href="http://vigna.di.unimi.it/">Home</a></li>
|
|
<li><a href="http://vigna.di.unimi.it/papers.php">Papers<span class=arrow>➔</span></a></li>
|
|
<li><a href="http://vigna.di.unimi.it/software.php">Software<span class=arrow>➔</span></a>
|
|
<li><strong><a href="http://prng.di.unimi.it/">PRNG shootout</a></strong>
|
|
|
|
<ul>
|
|
<li><a href="#intro">Introduction</A></li>
|
|
<li><a href="#shootout">A PRNG Shootout</A></li>
|
|
<li><a href="#speed">Speed</A></li>
|
|
<li><a href="#quality">Quality</A></li>
|
|
<li><a href="#remarks">Remarks</A></li>
|
|
</ul>
|
|
|
|
|
|
<li><a href="http://pcg.di.unimi.it/pcg.php">The wrap-up on PCG generators<span class=arrow>➔</span></a>
|
|
<li><a href="http://fastutil.di.unimi.it/"><code>fastutil</code><span class=arrow>➔</span></a>
|
|
<li><a href="http://dsiutils.di.unimi.it/">DSI utilities<span class=arrow>➔</span></a>
|
|
<li><a href="http://webgraph.di.unimi.it/">WebGraph<span class=arrow>➔</span></a>
|
|
<li><a href="http://sux.di.unimi.it/">Sux<span class=arrow>➔</span></a>
|
|
<li><a href="http://vigna.di.unimi.it/music.php">Music<span class=arrow>➔</span></a>
|
|
<li><a href="https://shrinkai.di.unimi.it/">Shrink AI<span class=arrow>➔</span></a>
|
|
</ul>
|
|
</div>
|
|
|
|
<div id="main">
|
|
|
|
|
|
<div id="content">
|
|
|
|
<h1 class=first>Introduction</h1>
|
|
|
|
<p>This page describes some new pseudorandom number generators (PRNGs) we (David Blackman and I) have been working on recently, and
|
|
a shootout comparing them with other generators. Details about the generators can
|
|
be found in our <a
|
|
href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>. Information about my previous <code>xorshift</code>-based
|
|
generators can be found <a href="xorshift.php">here</a>, but they have been entirely superseded by the new ones, which
|
|
are faster <em>and</em> better. As part of our study, we developed a very strong <a href="hwd.php">test for Hamming-weight dependencies</a>
|
|
that gave a number of surprising results.
|
|
|
|
<h1>64-bit Generators</h1>
|
|
|
|
<P><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a>/<a href="xoshiro256starstar.c"><code>xoshiro256**</code></a>
|
|
(XOR/shift/rotate) are our <strong>all-purpose</strong>
|
|
generators (not <em>cryptographically secure</em> generators, though,
|
|
like all PRNGs in these pages). They have excellent (sub-ns) speed, a state
|
|
space (256 bits) that is large enough for any parallel application, and
|
|
they pass all tests we are aware of. See the <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>
|
|
for a discussion of their differences.
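<p>For orientation, the core of <code>xoshiro256++</code> fits in a few lines of C. The sketch below follows the XOR/shift/rotate structure of the linked <code>xoshiro256plusplus.c</code>, which remains the reference; the <code>struct</code> wrapper and the names <code>xoshiro256pp_state</code>, <code>rotl64</code> and <code>xoshiro256pp_next</code> are assumptions made here for illustration.

```c
#include <stdint.h>

/* Hypothetical wrapper type, introduced only for this sketch.
   The state must never be all zeros. */
typedef struct { uint64_t s[4]; } xoshiro256pp_state;

/* Rotates x left by k bits (0 < k < 64). */
static inline uint64_t rotl64(const uint64_t x, const int k) {
    return (x << k) | (x >> (64 - k));
}

/* Returns the next 64-bit output and advances the state. */
static uint64_t xoshiro256pp_next(xoshiro256pp_state *st) {
    uint64_t *s = st->s;
    /* The ++ scrambler: add, rotate, add. */
    const uint64_t result = rotl64(s[0] + s[3], 23) + s[0];
    const uint64_t t = s[1] << 17;

    /* The linear engine: XORs, one shift, one rotation. */
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);

    return result;
}
```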
|
|
|
|
<p>If, however, one has to generate only 64-bit <strong>floating-point</strong> numbers
|
|
(by extracting the upper 53 bits), <a
|
|
href="xoshiro256plus.c"><code>xoshiro256+</code></a> is a slightly (≈15%)
|
|
faster generator with analogous statistical properties. For general
|
|
usage, one has to consider that its lowest bits have low linear
|
|
complexity and will <a href="lowcomp.php">fail linearity tests</a>; however, low linear
|
|
complexity of the lowest bits can have hardly any impact in practice, and certainly has no
|
|
impact at all if you generate floating-point numbers using the upper bits (we computed a <a
|
|
href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">precise
|
|
estimate</a> of the linear complexity of the lowest bits).
|
|
|
|
<p>If you are <strong>tight on space</strong>, <a
|
|
href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>/<a
|
|
href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a>
|
|
(XOR/rotate/shift/rotate) and <a
|
|
href="xoroshiro128plus.c"><code>xoroshiro128+</code></a> have the same
|
|
speed and use half of the space; the same comments apply. They are suitable only for
|
|
low-scale parallel applications; moreover, <code>xoroshiro128+</code>
|
|
exhibits a mild dependency in Hamming weights that generates a failure
|
|
after 5 TB of output in <a href="hwd.php">our test</a>. We believe
|
|
this slight bias cannot affect any application.
|
|
|
|
<p>Finally, if for any reason (which reason?) you need <strong>more
|
|
state</strong>, we provide in the same
|
|
vein <a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a> / <a href="xoshiro512starstar.c"><code>xoshiro512**</code></a> / <a href="xoshiro512plus.c"><code>xoshiro512+</code></a> and
|
|
<a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a> / <a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a> / <a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a> (see the <a
|
|
href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>).
|
|
|
|
<p>All generators, being based on linear recurrences, provide <em>jump
|
|
functions</em> that make it possible to simulate any number of calls to
|
|
the next-state function in constant time, once a suitable <em>jump
|
|
polynomial</em> has been computed. We provide ready-made jump functions for
|
|
a number of calls equal to the square root of the period, to make it easy
to generate non-overlapping sequences for parallel computations, and equal
|
|
to the cube of the fourth root of the period, to make it possible to
|
|
generate independent sequences on different parallel processors.
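<p>The mechanics of a jump function can be sketched as follows: each set bit <var>b</var> of the jump polynomial XORs in the state reached after <var>b</var> steps, so the cost is 256 steps whatever the jump distance. The names used here (<code>xs256_state</code>, <code>xs256_step</code>, <code>xs256_jump</code>) are hypothetical, and the polynomial is passed in as a parameter; the ready-made jump polynomials are in the distributed C sources and are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper type, introduced only for this sketch. */
typedef struct { uint64_t s[4]; } xs256_state;

static inline uint64_t xs256_rotl(const uint64_t x, const int k) {
    return (x << k) | (x >> (64 - k));
}

/* Linear next-state function of the xoshiro256 engine (no output). */
static void xs256_step(xs256_state *st) {
    uint64_t *s = st->s;
    const uint64_t t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = xs256_rotl(s[3], 45);
}

/* Replaces the state with the XOR of the states reached after b steps,
   for every bit b set in the 256-bit polynomial poly. */
static void xs256_jump(xs256_state *st, const uint64_t poly[4]) {
    uint64_t acc[4] = { 0, 0, 0, 0 };
    for (int i = 0; i < 4; i++)
        for (int b = 0; b < 64; b++) {
            if (poly[i] & (UINT64_C(1) << b))
                for (int j = 0; j < 4; j++) acc[j] ^= st->s[j];
            xs256_step(st);
        }
    memcpy(st->s, acc, sizeof acc);
}
```

<p>As a sanity check, the toy polynomial <var>x</var><sup>3</sup> (only bit 3 set) makes <code>xs256_jump</code> land on the same state as three calls to <code>xs256_step</code>.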
|
|
|
|
<p>We suggest using <a href="splitmix64.c"><span
|
|
style="font-variant: small-caps">SplitMix64</span></a> to initialize
|
|
the state of our generators starting from a 64-bit seed, as <a href="https://dl.acm.org/citation.cfm?doid=1276927.1276928">research
|
|
has shown</a> that initialization must be performed with a generator
|
|
radically different in nature from the one initialized to avoid
|
|
correlation on similar seeds.
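<p>A minimal sketch of this seeding procedure, assuming a 256-bit state: the helper <code>seed_state256</code> is a hypothetical name, while the mixing step follows the linked <code>splitmix64.c</code>, which remains the reference.

```c
#include <stdint.h>

/* One step of SplitMix64: advances *x and returns the next output. */
static uint64_t splitmix64_next(uint64_t *x) {
    uint64_t z = (*x += UINT64_C(0x9e3779b97f4a7c15));
    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}

/* Hypothetical helper: fills four 64-bit state words from a 64-bit seed
   by taking four consecutive SplitMix64 outputs. */
static void seed_state256(uint64_t s[4], uint64_t seed) {
    for (int i = 0; i < 4; i++) s[i] = splitmix64_next(&seed);
}
```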
|
|
|
|
|
|
<h1>32-bit Generators</h1>
|
|
|
|
<P><a href="xoshiro128plusplus.c"><code>xoshiro128++</code></a>/<a href="xoshiro128starstar.c"><code>xoshiro128**</code></a> are our
|
|
<strong>32-bit</strong> all-purpose generators, whereas <a
|
|
href="xoshiro128plus.c"><code>xoshiro128+</code></a> is
|
|
for floating-point generation. They are the 32-bit counterpart of
|
|
<code>xoshiro256++</code>, <code>xoshiro256**</code> and <code>xoshiro256+</code>, so similar comments apply.
|
|
Their state is too small for
|
|
large-scale parallelism: their intended usage is inside embedded
|
|
hardware or GPUs. For an even smaller scale, you can use <a
|
|
href="xoroshiro64starstar.c"><code>xoroshiro64**</code></a> and <a
|
|
href="xoroshiro64star.c"><code>xoroshiro64*</code></a>. We do not believe
that, at this point in time, a 32-bit generator with a larger state can be of
any use (but there are 32-bit <code>xoroshiro</code> generators of much larger size).
|
|
|
|
<p>All 32-bit generators pass all tests we are aware of, with the
|
|
exception of linearity tests (binary rank and linear complexity) for
|
|
<code>xoshiro128+</code> and <code>xoroshiro64*</code>: in this case,
|
|
due to the smaller number of output bits the low linear complexity of the
|
|
lowest bits is sufficient to trigger BigCrush tests when the output is bit-reversed. Analogously to
|
|
the 64-bit case, generating 32-bit floating-point numbers using the
|
|
upper bits will not use any of the bits with <a href="lowcomp.php">low linear complexity</a>.
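<p>By analogy with the 64-bit conversion, a 32-bit output can be turned into a single-precision number from its upper 24 bits, which is the number of significant bits of an IEEE <code>float</code>. The function name <code>u32_to_float</code> is an assumption of this sketch.

```c
#include <stdint.h>

/* Maps a 32-bit output to a float in [0, 1) using only its upper 24 bits;
   the 8 lowest bits, where linear complexity is lowest, are discarded. */
static inline float u32_to_float(const uint32_t x) {
    return (x >> 8) * 0x1.0p-24f;
}
```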
|
|
|
|
<h1>16-bit Generators</h1>
|
|
|
|
<p>We do not suggest any particular 16-bit generator, but it is possible
|
|
to design relatively good ones using our techniques. For example,
|
|
Parallax has embedded in their <a href="https://www.parallax.com/propeller-2/">Propeller 2 microcontroller</a> multiple 16-bit
|
|
<code>xoroshiro32++</code> generators.
|
|
|
|
<h1>Congruential Generators</h1>
|
|
|
|
<p>In case you are interested in 64-bit PRNGs based on congruential arithmetic, I provide
|
|
three instances of
|
|
<a href="https://groups.google.com/forum/#!searchin/sci.stat.math/Yet$20another$20rng%7Csort:date/sci.stat.math/p7aLW3TsJys/QGb1kti6kN0J">Marsaglia's Multiply-With-Carry generators</a>,
|
|
<a href="MWC128.c"><code>MWC128</code></a>, <a href="MWC192.c"><code>MWC192</code></a>, and <a href="MWC256.c"><code>MWC256</code></a>, for which I computed good constants. They are some
|
|
of the fastest generators available, but they need 128-bit operations.
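<p>To show where the 128-bit operation comes from, here is a sketch of a single multiply-with-carry step. It assumes a compiler providing <code>unsigned __int128</code> (a <code>gcc</code>/<code>clang</code> extension); the type <code>mwc128_state</code> is hypothetical, and the multiplier <code>a</code> is left as a parameter because the constants I computed are in the linked sources and are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical wrapper type: x is the current value, c the carry. */
typedef struct { uint64_t x, c; } mwc128_state;

/* One multiply-with-carry step: t = a * x + c is computed on 128 bits;
   its low 64 bits become the new x, its high 64 bits the new carry.
   Returning the previous x is a choice made for this sketch. */
static uint64_t mwc128_next(mwc128_state *st, const uint64_t a) {
    const uint64_t result = st->x;
    const unsigned __int128 t = (unsigned __int128)a * st->x + st->c;
    st->x = (uint64_t)t;
    st->c = (uint64_t)(t >> 64);
    return result;
}
```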
|
|
|
|
<p>Stronger theoretical guarantees are provided by the
|
|
<a href="https://www.math.ias.edu/~goresky/pdf/p1-goresky.pdf">generalized multiply-with-carry generators defined by Goresky and Klapper</a>:
|
|
also in this case I provide two instances, <a href="GMWC128.c"><code>GMWC128</code></a> and <a href="GMWC256.c"><code>GMWC256</code></a>, for which I computed good constants.
|
|
These generators, however, are about twice as slow as MWC generators.
|
|
|
|
<h1>JavaScript</h1>
|
|
|
|
<p><code>xorshift128+</code> is presently used in the JavaScript engines of
|
|
<a href="http://v8project.blogspot.com/2015/12/theres-mathrandom-and-then-theres.html">Chrome</a>,
|
|
<a href="https://nodejs.org/">Node.js</a>,
|
|
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=322529#c99">Firefox</a>,
|
|
<a href="https://bugs.webkit.org/show_bug.cgi?id=151641">Safari</a> and
|
|
<a href="https://github.com/Microsoft/ChakraCore/commit/dbda0182dc0a983dfb37d90c05000e79b6fc75b0">Microsoft Edge</a>.
|
|
|
|
<h1>Rust</h1>
|
|
<p>The <a href="https://docs.rs/rand/latest/rand/rngs/struct.SmallRng.html">SmallRng</a> from the <a href="https://docs.rs/rand/latest/rand/">rand</a>
|
|
crate is <a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> or <a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>, depending
|
|
on the platform.
|
|
|
|
<h1><code>java.util.random</code></h1>
|
|
|
|
<p>I worked with Guy Steele on the <a
|
|
href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html">new
|
|
family of PRNGs available in Java 17</a>. The family, called <a
|
|
href="http://vigna.di.unimi.it/papers.php#StVLXM">LXM</a>, uses <a
|
|
href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">new, better
|
|
tables of multipliers for LCGs with power-of-two moduli</a>. Moreover,
|
|
<code>java.util.random</code> contains ready-to-use implementations of
|
|
<a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a> and <a
|
|
HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>.
|
|
|
|
<h1>.NET</h1>
|
|
|
|
<p>In version 6, Microsoft's .NET framework <a href="https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/">has adopted</a>
|
|
<a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> and <a
|
|
HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a> as default PRNGs.
|
|
|
|
<h1>Erlang</h1>
|
|
|
|
<p>The parallel functional language <a href="https://www.erlang.org/">Erlang</a> implements <a href="https://www.erlang.org/doc/man/rand.html">several
|
|
variants of <code>xorshift</code>/<code>xoroshiro</code>-based generators</a> adapted in collaboration with Raimo Niskanen for Erlang's
|
|
58/59-bit arithmetic.
|
|
|
|
<h1>GNU FORTRAN</h1>
|
|
<p>GNU's <a href="https://gcc.gnu.org/fortran/">implementation of the FORTRAN language</a> <a href="https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fNUMBER.html">uses</a>
|
|
<a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG.
|
|
|
|
<h1>Julia</h1>
|
|
<p>The <a href="https://julialang.org/">Julia programming language</a> <a href="https://docs.julialang.org/en/v1/stdlib/Random/">uses</a>
|
|
<a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> as default PRNG.
|
|
|
|
<h1>Lua</h1>
|
|
<p>The scripting language <a href="http://www.lua.org/">Lua</a> <a href="https://www.lua.org/manual/5.4/manual.html#pdf-math.random">uses</a> <a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG.
|
|
|
|
<h1>IoT</h1>
|
|
|
|
<p>The IoT operating systems <a href="https://os.mbed.com/">Mbed</a> and <a href="https://www.zephyrproject.org/">Zephyr</a> use
|
|
<a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a> as default PRNG.
|
|
|
|
<h1><a name=shootout></a>A PRNG Shootout</h1>
|
|
|
|
<p>I provide here a shootout of a few recent 64-bit PRNGs that are quite widely used.
|
|
The purpose is to provide a consistent, reproducible assessment of two properties of the generators: speed and quality.
|
|
The code used to perform the tests and all the output from statistical test suites is available for download.
|
|
|
|
<h2><a name=speed></a>Speed</h2>
|
|
|
|
<p>The speed reported in this page is the time required to emit 64
|
|
random bits, and the number of clock cycles required to generate a byte (thanks to the <a href="http://icl.utk.edu/papi/">PAPI</a> library). If a generator is 32-bit in nature, I glue two
|
|
consecutive outputs. Note that
|
|
I do not report results using GPUs or SSE instructions, with an exception for the very common SFMT: for that to be
|
|
meaningful, I should have implementations for all generators.
|
|
Otherwise, with suitable hardware support I could just use AES in
|
|
counter mode and get 64 secure bits in 0.56 ns (or just use <a href="https://github.com/google/randen">Randen</a>). The tests were performed on a
|
|
12th Gen Intel® Core™ i7-12700KF @3.60GHz using <code>gcc</code> 12.2.1.
|
|
|
|
<p>A few <i>caveats</i>:
|
|
<ul>
|
|
<li>There is some looping overhead, but subtracting it from the timings is not going to
|
|
be particularly meaningful due to instruction rescheduling, etc.
|
|
<li>Relative speed might be different on different CPUs and on different scenarios.
|
|
<li>I do not use <code>-march=native</code>, which can improve the timing of some generators
|
|
by vectorization or special instructions, because those improvements might not be possible
|
|
when the generator is embedded in user code.
|
|
<li>Code has been compiled using <code>gcc</code>'s <code>-fno-unroll-loops</code>
|
|
option. This option is essential to get a sensible result: without it, the compiler
may perform different loop unrolling depending on the generator. Previously I was also using
|
|
<code>-fno-move-loop-invariants</code>, which was essential in not giving generators using several
|
|
large constants an advantage by preventing the compiler from loading them into registers. However,
|
|
as of <code>gcc</code> 12.2.1 the compiler loads the constants into registers anyway, so the
|
|
option is no longer used. Timings
|
|
with <a href="http://clang.llvm.org/"><code>clang</code></a> at the time of this writing
|
|
are very close to those obtained with <code>gcc</code>.
|
|
If you find timings that are significantly better than those shown here on
|
|
comparable hardware, they are likely to be due to compiler artifacts (e.g., vectorization).
|
|
<li>Timings are taken running a generator billions of times in a loop; but this is not the way you use generators. Register
|
|
allocation might be very different when the generator is embedded in an application, leading to constants being reloaded
|
|
or part of the state space being written to main memory at each iteration. These costs do not appear in the benchmarks below.
|
|
</ul>
|
|
|
|
<p>To ease replicability, I distribute a <a href="harness.c"><em>harness</em></a> performing the measurement. You just
|
|
have to define a <a href="xoroshiro128plus-speed.c"><code>next()</code></a> function and include the harness. But the only realistic
|
|
suggestion is to try different generators in your application and see what happens.
|
|
|
|
<h2><a name=quality></a>Quality</h2>
|
|
|
|
<p>This is probably the more <a
|
|
href="http://dilbert.com/strips/comic/2001-10-25/">elusive</a> property
|
|
of a PRNG. Here quality is measured using the powerful
|
|
BigCrush suite of tests. BigCrush is part of <a
|
|
href="http://simul.iro.umontreal.ca/testu01/tu01.html">TestU01</a>,
|
|
a monumental framework for testing PRNGs developed by Pierre L'Ecuyer
|
|
and Richard Simard (“TestU01: A C library for empirical testing
|
|
of random number generators”, <i>ACM Trans. Math. Softw.</i>
|
|
33(4), Article 22, 2007).
|
|
|
|
<p>I run BigCrush starting from 100 equispaced points of the state space
|
|
of the generator and collect <em>failures</em>—tests in which the
|
|
<i>p</i>-value statistics is outside the interval [0.001..0.999]. A failure
|
|
is <em>systematic</em> if it happens at all points.
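<p>The two definitions above can be written down directly. In this sketch the function names are hypothetical, and the <i>p</i>-values of a single test are assumed to have been collected already, one per starting point.

```c
#include <stdbool.h>
#include <stddef.h>

/* A p-value counts as a failure when it lies outside [0.001..0.999]. */
static bool is_failure(const double p) {
    return p < 0.001 || p > 0.999;
}

/* A failure is systematic if the test fails at every starting point;
   p holds one p-value per starting point, n is their number. */
static bool is_systematic(const double *p, const size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!is_failure(p[i])) return false;
    return n > 0;
}
```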
|
|
|
|
<p>Note that TestU01 is a 32-bit test suite. Thus, two 32-bit integer values
|
|
are passed to the test suite for each generated 64-bit value. Floating point numbers
|
|
are generated instead by dividing the unsigned output of the generator by 2<sup>64</sup>.
|
|
Since this implies a bias towards the high bits (which is anyway a known characteristic
|
|
of TestU01), I run the test suite also on the <em>reverse</em>
|
|
generator. More detail about the whole process can be found in this <a
|
|
href="http://vigna.di.unimi.it/papers.php#VigEEMXGS">paper</a>.
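<p>A sketch of the two transformations involved, under stated assumptions: the reverse generator is taken to return each 64-bit output with its bits in reverse order, and the upper half is taken to be passed to the test suite before the lower half (the actual order is not specified on this page). Both function names are hypothetical.

```c
#include <stdint.h>

/* Reverses the order of the 64 bits of x by swapping progressively
   larger groups: single bits, pairs, nibbles, bytes, 16-bit and
   32-bit halves. */
static uint64_t reverse64(uint64_t x) {
    x = ((x >> 1) & UINT64_C(0x5555555555555555)) | ((x & UINT64_C(0x5555555555555555)) << 1);
    x = ((x >> 2) & UINT64_C(0x3333333333333333)) | ((x & UINT64_C(0x3333333333333333)) << 2);
    x = ((x >> 4) & UINT64_C(0x0f0f0f0f0f0f0f0f)) | ((x & UINT64_C(0x0f0f0f0f0f0f0f0f)) << 4);
    x = ((x >> 8) & UINT64_C(0x00ff00ff00ff00ff)) | ((x & UINT64_C(0x00ff00ff00ff00ff)) << 8);
    x = ((x >> 16) & UINT64_C(0x0000ffff0000ffff)) | ((x & UINT64_C(0x0000ffff0000ffff)) << 16);
    return (x >> 32) | (x << 32);
}

/* Splits a 64-bit output into two 32-bit values for a 32-bit test suite.
   Assumption of this sketch: upper half first, lower half second. */
static void split64(const uint64_t x, uint32_t out[2]) {
    out[0] = (uint32_t)(x >> 32);
    out[1] = (uint32_t)x;
}
```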
|
|
|
|
<p>Besides BigCrush, I analyzed generators using a test for <a href="hwd.php">Hamming-weight dependencies</a>
|
|
described in our <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a>. As I already remarked, our only
|
|
generator failing the test (but only after 5 TB of output) is <code>xoroshiro128+</code>.
|
|
|
|
<p>I report the period of each generator and its footprint in bits: a generator gives “bang-for-the-buck”
|
|
if the base-2 logarithm of the period is close to the footprint. Note
|
|
that the footprint has always been padded to a multiple of 64, and it can
|
|
be significantly larger than expected because of padding and
|
|
cyclic access indices.
|
|
|
|
<div style="align: center"><table id='prng' style='margin: 2em 0' class='tablesorter'>
|
|
<thead><tr>
|
|
<th>PRNG
|
|
<th>Footprint (bits)
|
|
<th class="{ sorter: 'metadata' }">Period
|
|
<th><a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures
|
|
<th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a>
|
|
<th>ns/64 bits
|
|
<th>cycles/B
|
|
<tbody>
|
|
<tr><td><a href="xoroshiro128plus.c"><code>xoroshiro128+</code></a><td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup> − 1<td align=right>—<td align=right>5 TB<td align=right>0.80<td align=right>0.36
|
|
<tr><td><a href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a><td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup> − 1<td align=right>—<td align=right>—<td align=right>0.90<td align=right>0.40
|
|
<tr><td><a href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a><td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup> − 1<td align=right>—<td align=right>—<td align=right>0.78<td align=right>0.36
|
|
<tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a><td align=right>256 <td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.61<td align=right>0.27
|
|
<tr><td><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a><td align=right>256 <td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.75<td align=right>0.34
|
|
<tr><td><a href="xoshiro256starstar.c"><code>xoshiro256**</code></a><td align=right>256 <td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.75<td align=right>0.34
|
|
<tr><td><a href="xoshiro512plus.c"><code>xoshiro512+</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1<td align=right>—<td align=right>—<td align=right>0.68<td align=right>0.30
|
|
<tr><td><a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1<td align=right>—<td align=right>—<td align=right>0.79<td align=right>0.36
|
|
<tr><td><a href="xoshiro512starstar.c"><code>xoshiro512**</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1<td align=right>—<td align=right>—<td align=right>0.81<td align=right>0.37
|
|
<tr><td><a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1<td align=right>—<td align=right>—<td align=right>0.82<td align=right>0.37
|
|
<tr><td><a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1<td align=right>—<td align=right>—<td align=right>1.01<td align=right>0.46
|
|
<tr><td><a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1<td align=right>—<td align=right>—<td align=right>0.98<td align=right>0.44
|
|
<tr><td><a href="MWC128.c"><span style="font-variant: small-caps">MWC128</span></a><td align=right>128 <td align=right class='{sortValue: 127}'>≈2<sup>127</sup><td align=right>—<td align=right>—<td align=right>0.83<td align=right>0.37
|
|
<tr><td><a href="MWC192.c"><span style="font-variant: small-caps">MWC192</span></a><td align=right>192 <td align=right class='{sortValue: 191}'>≈2<sup>191</sup><td align=right>—<td align=right>—<td align=right>0.42<td align=right>0.19
|
|
<tr><td><a href="MWC256.c"><span style="font-variant: small-caps">MWC256</span></a><td align=right>256 <td align=right class='{sortValue: 255}'>≈2<sup>255</sup><td align=right>—<td align=right>—<td align=right>0.45<td align=right>0.20
|
|
<tr><td><a href="GMWC128.c"><span style="font-variant: small-caps">GMWC128</span></a><td align=right>128 <td align=right class='{sortValue: 127}'>≈2<sup>127</sup><td align=right>—<td align=right>—<td align=right>1.84<td align=right>0.83
|
|
<tr><td><a href="GMWC256.c"><span style="font-variant: small-caps">GMWC256</span></a><td align=right>256 <td align=right class='{sortValue: 255}'>≈2<sup>255</sup><td align=right>—<td align=right>—<td align=right>1.85<td align=right>0.83
|
|
<tr><td><a href="http://pracrand.sourceforge.net/"><span style="font-variant: small-caps">SFC64</span></a><td align=right>256 <td align=right class='{sortValue: 64}'>≥2<sup>64</sup><td align=right>—<td align=right>—<td align=right>0.66<td align=right>0.30
|
|
<tr><td><a href="splitmix64.c"><span style="font-variant: small-caps">SplitMix64</span></a><td align=right>64 <td align=right class='{sortValue: 64}'>2<sup>64</sup><td align=right>—<td align=right>—<td align=right>0.63<td align=right>0.29
|
|
<tr><td><a href="http://pcg-random.org/">PCG 128 XSH RS 64 (LCG)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>—<td align=right>—<td align=right>1.70<td align=right>0.77
|
|
<tr><td><a href="https://github.com/numpy/numpy">PCG64-DXSM (NumPy)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>—<td align=right>—<td align=right>1.11<td align=right>0.50
|
|
<tr><td><a href="http://numerical.recipes/"><code>Ran</code></a><td align=right>192 <td align=right class='{sortValue: 191}'>≈2<sup>191</sup><td align=right>—<td align=right>—<td align=right>1.37<td align=right>0.62
|
|
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html"><code>MT19937-64</code> (Mersenne Twister)</a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup> − 1<td align=right>LinearComp<td align=right>—<td align=right>1.36<td align=right>0.62
|
|
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT19937 (uses SSE2 instructions)</code></a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup> − 1<td align=right>LinearComp<td align=right>—<td align=right>0.93<td align=right>0.42
|
|
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT607 (uses SSE2 instructions)</code></a><td align=right>672 <td align=right class='{sortValue: 607}'>2<sup>607</sup> − 1<td align=right>MatrixRank, LinearComp<td align=right>400 MB<td align=right>0.78<td align=right>0.34
|
|
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (64 bits)<td align=right>256<td align=right class='{sortValue: 127}'>2<sup>127</sup> − 1<td align=right>—<td align=right>90 TB→<td align=right>2.76<td align=right>1.25
|
|
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (32 bits)<td align=right>224<td align=right class='{sortValue: 127}'>2<sup>127</sup> − 1<td align=right>CollisionOver, Run, SimPoker, AppearanceSpacings, MatrixRank, LinearComp, LongestHeadRun, Run of Bits (reversed)<td align=right>40 TB→<td align=right>4.27<td align=right>1.92
|
|
<tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL512a</code></a><td align=right>544 <td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1 <td align=right>MatrixRank, LinearComp<td align=right>3.5 PB<td align=right>5.42<td align=right>2.44
|
|
<tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL1024a</code></a><td align=right>1056 <td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1 <td align=right>MatrixRank, LinearComp<td align=right>—<td align=right>5.30<td align=right>2.38
|
|
</table></div>
|
|
|
|
<p>The following table compares instead two ways of generating floating-point numbers, namely the 521-bit <a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/">dSFMT</a>, which
|
|
generates directly floating-point numbers with 52 significant bits, and
|
|
<a href="xoshiro256plus.c"><code>xoshiro256+</code></a> followed by a standard conversion of its upper bits to a floating-point number with 53 significant bits (see below).
|
|
|
|
<div style="align: center"><table id='prngf' style='margin: 2em 0' class='tablesorter'>
|
|
<thead><tr>
|
|
<th>PRNG
|
|
<th>Footprint (bits)
|
|
<th class="{ sorter: 'metadata' }">Period
|
|
<th> <a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures
|
|
<th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a>
|
|
<th>ns/double
|
|
<th>cycles/B
|
|
<tbody>
|
|
<tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a> (returns 53 significant bits) <td align=right>256<td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.92<td align=right>3.40
|
|
<tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>dSFMT</code></a> (uses SSE2 instructions, returns only 52 significant bits)<td align=right>704<td align=right class='{sortValue: 521}'>2<sup>521</sup> − 1<td align=right>MatrixRank, LinearComp<td align=right>6 TB<td align=right>0.85<td align=right>3.07
|
|
</table></div>
|
|
|
|
<p><code>xoshiro256+</code> is ≈8% slower than the dSFMT, but it has a doubled range of output values, does not need any extra SSE instruction (can be programmed in Java, etc.),
|
|
has a much smaller footprint, and its upper bits do not fail any test.
|
|
|
|
<h1><a name=remarks></a>Remarks</h1>
|
|
|
|
<h2>Vectorization</h2>
|
|
|
|
<p>Some of the generators can be very easily vectorized, so that multiple instances can be run in parallel to provide
|
|
fast bulk generation. Thanks to an interesting <a href="https://github.com/JuliaLang/julia/issues/27614">discussion with the Julia developers</a>,
|
|
I've become aware that AVX2 vectorizations of multiple instances of generators using the <code>+</code>/<code>++</code> scrambler are impressively fast (links
|
|
below point at a speed test to be used with the <a href="harness.c">harness</a>, and the result will be multiplied by 1000):
|
|
|
|
<div style="align: center"><table id='vec' style='margin: 2em 0' class='tablesorter'>
|
|
<thead><tr>
|
|
<th>PRNG
|
|
<th>ns/64 bits
|
|
<th>cycles/B
|
|
<tbody>
|
|
<tr><td><a href="xoroshiro128+-vect-speed.c"><code>xoroshiro128+</code></a> (4 parallel instances)<td align=right>0.36<td align=right>0.14
|
|
<tr><td><a href="xoroshiro128++-vect-speed.c"><code>xoroshiro128++</code></a> (4 parallel instances)<td align=right>0.45<td align=right>0.18
|
|
<tr><td><a href="xoshiro256+-vect-speed.c"><code>xoshiro256+</code></a> (8 parallel instances)<td align=right>0.19<td align=right>0.08
|
|
<tr><td><a href="xoshiro256++-vect-speed.c"><code>xoshiro256++</code></a> (8 parallel instances)<td align=right>0.26<td align=right>0.09
|
|
</table></div>
|
|
|
|
<p>Note that sometimes convincing the compiler to vectorize is a
|
|
slightly quirky process: for example, on <code>gcc</code> 12.2.1 I have to use <code>-O3 -fdisable-tree-cunrolli -march=native</code>
|
|
to vectorize <code>xoshiro256</code>-based generators
|
|
(<code>-O3</code> alone will not vectorize; thanks to Chris Elrod for pointing me at <code>-fdisable-tree-cunrolli</code>).
|
|
|
|
<h2>A long period does not imply high quality</h2>
|
|
|
|
<p>This is a common misconception. The generator <code>x++</code> has
|
|
period \(2^k\), for any \(k\geq0\), provided that <code>x</code> is
|
|
represented using \(k\) bits: nonetheless, it is a horrible generator.
|
|
The generator returning \(k-1\) zeroes followed by a one has period
|
|
\(k\).
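<p>The first example is easy to check by brute force for small \(k\). The function below (a hypothetical name, written only for this illustration) counts how many steps the \(k\)-bit counter takes to come back to its starting value.

```c
#include <stdint.h>

/* Counts the steps the k-bit counter "x++" takes to return to zero.
   Assumes 0 <= k < 64; intended for small k, as it loops 2^k times. */
static uint64_t counter_period(const unsigned k) {
    const uint64_t mask = (UINT64_C(1) << k) - 1;
    uint64_t x = 0, steps = 0;
    do {
        x = (x + 1) & mask;   /* the whole "generator" */
        steps++;
    } while (x != 0);
    return steps;
}
```

<p>The period is the largest possible for \(k\) bits of state, yet consecutive outputs always differ by one.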
|
|
|
|
<p>It is however important that the period is long enough. A first heuristic rule of thumb
|
|
is that if you need to use \(t\) values, you need a generator with period at least \(t^2\).
|
|
Moreover, if you run \(n\) independent computations starting at random seeds,
|
|
the sequences used by each computation should not overlap.
|
|
|
|
<p>Now, given a generator with period \(P\), the probability that \(n\) subsequences of length \(L\) starting at random points in the state space
|
|
overlap <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">is bounded by \(n^2L/P\)</a>. If your generator has period \(2^{256}\) and you run
|
|
on \(2^{64}\) cores (you will never have them) a computation using \(2^{64}\) pseudorandom numbers (you will never have the time)
|
|
the probability of overlap would be less than \(2^{-64}\).
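<p>Since all quantities in the example are powers of two, the bound \(n^2L/P\) can be evaluated on base-2 logarithms. The helper below is a hypothetical one-liner that makes the arithmetic explicit: \(2\cdot 64 + 64 - 256 = -64\).

```c
/* Returns the base-2 logarithm of the bound n^2 L / P, where n, L and P
   are given by their base-2 logarithms. */
static int overlap_bound_log2(const int log2_n, const int log2_len, const int log2_period) {
    return 2 * log2_n + log2_len - log2_period;
}
```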
|
|
|
|
<p>In other words: any generator with a period beyond
|
|
\(2^{256}\) has a period that is
|
|
sufficient for every imaginable application. Unless there are other motivations (e.g., provably
|
|
increased quality), a generator with a larger period is only a waste of
|
|
memory (as it needs a larger state), of cache lines, and of
|
|
precious high-entropy random bits for seeding (unless you're using
|
|
small seeds, but then it's not clear why you would want a very long
|
|
period in the first place—the computation above is valid only if you seed all bits of the state
|
|
with independent, uniformly distributed random bits).
|
|
|
|
<p>In case the generator provides a <em>jump function</em> that lets you skip through chunks of the output in constant
|
|
time, even a period of \(2^{128}\) can be sufficient, as it provides \(2^{64}\) non-overlapping sequences of length \(2^{64}\).
|
|
|
|
<h2>Equidistribution</h2>
|
|
|
|
<p>Every 64-bit generator of ours with <var>n</var> bits of state scrambled
|
|
with <code>*</code> or <code>**</code> is <var>n</var>/64-dimensionally
|
|
equidistributed: every <var>n</var>/64-tuple of consecutive 64-bit
|
|
values appears exactly once in the output, except for the zero tuple
|
|
(and this is the largest possible dimension). Generators based on the
|
|
<code>+</code> or <code>++</code> scramblers are however only (<var>n</var>/64 −
|
|
1)-dimensionally equidistributed: every (<var>n</var>/64 −
|
|
1)-tuple of consecutive 64-bit values appears exactly 2<sup>64</sup>
|
|
times in the output, except for a missing zero tuple. The same considerations
|
|
apply to 32-bit generators.
|
|
|
|
<h2>Generating uniform doubles in the unit interval</h2>
|
|
|
|
<p>A standard double (64-bit) floating-point number in
|
|
<a href="https://en.wikipedia.org/wiki/IEEE_floating_point">IEEE floating point format</a> has 52 bits of
|
|
significand, plus an implicit bit at the left of the significand. Thus,
|
|
the representation can actually store numbers with <em>53</em> significant binary digits.
|
|
|
|
<p>Because of this fact, in C99 a 64-bit unsigned integer <code>x</code> should be converted to a 64-bit double
|
|
using the expression
|
|
<pre>
#include &lt;stdint.h&gt;

(x &gt;&gt; 11) * 0x1.0p-53
</pre>
|
|
<p>In Java you can use almost the same expression for a (signed) 64-bit integer:
|
|
<pre>
(x &gt;&gt;&gt; 11) * 0x1.0p-53
</pre>
|
|
|
|
|
|
<p>This conversion guarantees that all dyadic rationals of the form <var>k</var> / 2<sup>53</sup>
(with 0 ≤ <var>k</var> &lt; 2<sup>53</sup>) will be equally likely. Note that this conversion prefers the high bits of <code>x</code> (usually, a good idea), but you can alternatively
use the lowest bits.
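<p>A minimal sketch of both choices, wrapped in functions, might look as follows. The names <code>to_double_high</code> and
<code>to_double_low</code> are hypothetical and chosen only for this example; the second function is one possible way of using the lowest 53 bits.

```c
#include <stdint.h>

/* Uses the upper 53 bits of x: the expression given in the text. */
static inline double to_double_high(uint64_t x) {
    return (x >> 11) * 0x1.0p-53;
}

/* Hypothetical variant: keeps the lowest 53 bits of x instead. */
static inline double to_double_low(uint64_t x) {
    return (x & ((UINT64_C(1) << 53) - 1)) * 0x1.0p-53;
}
```

<p>In both cases the integer being scaled is smaller than 2<sup>53</sup>, so its conversion to a double and the multiplication by a power of two are exact:
the smallest result is 0 and the largest is 1 − 2<sup>−53</sup>, so 1 is never returned.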
|
|
|
|
<p>An alternative, multiplication-free conversion is
|
|
<pre>
#include &lt;stdint.h&gt;

static inline double to_double(uint64_t x) {
	/* Exponent field 0x3FF selects [1..2); the upper 52 bits of x fill the significand. */
	const union { uint64_t i; double d; } u = { .i = UINT64_C(0x3FF) &lt;&lt; 52 | x &gt;&gt; 12 };
	return u.d - 1.0;
}
</pre>
|
|
<p>The code above cooks up by bit manipulation
a real number in the interval [1..2), and then subtracts
one to obtain a real number in the interval [0..1). If <code>x</code> is chosen uniformly among 64-bit integers,
the result is chosen uniformly among dyadic rationals of the form <var>k</var> / 2<sup>52</sup>. This
is the same technique used by generators that provide doubles directly, such as the
<a href="http://dx.doi.org/10.1007/978-3-540-85912-3_26">dSFMT</a>.
|
|
|
|
<p>This technique is supposed to be fast, but on recent hardware it does not seem to give a significant advantage.
More importantly, <em>you will be generating half the values you could actually generate</em>:
all doubles generated will have the lowest significand bit set to zero. The same problem plagues the dSFMT (I must
thank Raimo Niskanen from the Erlang team for making me notice this; a previous version of this site
did not mention this issue).
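<p>The missing bit can be observed directly by inspecting the bit pattern of the results. The following is only a sketch: the helper
<code>lowest_significand_bit</code> and the names <code>to_double_union</code> and <code>to_double_mul</code> are hypothetical, introduced here for the comparison.

```c
#include <stdint.h>
#include <string.h>

/* The union-based, multiplication-free conversion discussed above. */
static inline double to_double_union(uint64_t x) {
    const union { uint64_t i; double d; } u = { .i = UINT64_C(0x3FF) << 52 | x >> 12 };
    return u.d - 1.0;
}

/* The multiplicative conversion, for comparison. */
static inline double to_double_mul(uint64_t x) {
    return (x >> 11) * 0x1.0p-53;
}

/* Hypothetical helper: returns the lowest bit of the stored significand
   of d, assuming IEEE 754 binary64 doubles. */
static inline int lowest_significand_bit(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (int)(bits & 1);
}
```

<p>For an all-ones input, <code>to_double_union</code> returns 1 − 2<sup>−52</sup>, whose lowest significand bit is zero, whereas
<code>to_double_mul</code> returns 1 − 2<sup>−53</sup>, whose lowest significand bit is one.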
|
|
|
|
<p>In Java you can obtain an analogous result using suitable static methods:
|
|
<pre>
Double.longBitsToDouble(0x3FFL &lt;&lt; 52 | x &gt;&gt;&gt; 12) - 1.0
</pre>
|
|
|
|
<p>To adhere to the principle of least surprise, my implementations now use the multiplicative version, everywhere.
|
|
|
|
<p>Interestingly, these are not the only notions of “uniformity” you can come up with. Another possibility
is that of generating 1074-bit integers, normalizing them, and returning the nearest value representable as a
64-bit double (this is the theory; in practice, you will almost never
use more than two integers per double, as the remaining bits would not be representable). This approach guarantees that all
representable doubles could in principle be generated, although not every
returned double will appear with the same probability. A reference
implementation can be found <a href="random_real.c">here</a>. Note that unless your generator has
at least 1074 bits of state and suitable equidistribution properties, that code will not do what you expect
(e.g., it might <em>never</em> return zero).
|
|
|
|
|
|
</div>
|
|
|
|
|
|
</div>
|
|
|
|
<div id="right">
|
|
|
|
<!-- <h1>Download</h1>
|
|
<p><ul>
|
|
<li><a HREF="prng-1.2.tgz">source tarball</A>
|
|
<li><a HREF="prng-data.tar.bz2">data tarball (large!)</a>
|
|
</ul>
|
|
-->
|
|
<h1>C code (64 bits)</h1>
|
|
<p><ul>
|
|
<li><a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>
|
|
<li><a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a>
|
|
<li><a HREF="xoshiro256plus.c"><code>xoshiro256+</code></a>
|
|
<li><a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>
|
|
<li><a HREF="xoroshiro128starstar.c"><code>xoroshiro128**</code></a>
|
|
<li><a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a>
|
|
<li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a>
|
|
<li><a HREF="MWC128.c"><code>MWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
|
|
<li><a HREF="MWC192.c"><code>MWC192</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
|
|
<li><a HREF="MWC256.c"><code>MWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
|
|
<li><a HREF="GMWC128.c"><code>GMWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
|
|
<li><a HREF="GMWC256.c"><code>GMWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
|
|
<!-- <li><a HREF="xoshiro512starstar.c"><code>xoshiro512**</code></a>
|
|
<li><a HREF="xoshiro512plus.c"><code>xoshiro512+</code></a>
|
|
<li><a HREF="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a>
|
|
<li><a HREF="xoroshiro1024plus.c"><code>xoroshiro1024*</code></a>-->
|
|
</ul>
|
|
|
|
<h1>C code (32 bits)</h1>
|
|
<p><ul>
|
|
<li><a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>
|
|
<li><a HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a>
|
|
<li><a HREF="xoshiro128plus.c"><code>xoshiro128+</code></a>
|
|
<li><a HREF="xoroshiro64starstar.c"><code>xoroshiro64**</code></a>
|
|
<li><a HREF="xoroshiro64star.c"><code>xoroshiro64*</code></a>
|
|
</ul>
|
|
|
|
<h1>Java code (<a HREF="https://github.com/openjdk/jdk17/tree/master/src/jdk.random/share/classes/jdk/random"><code>java.util.random</code></a>)</h1>
|
|
|
|
<h1>Java code (<a href="http://dsiutils.di.unimi.it">DSI utilities</a>)</h1>
|
|
<p><ul>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/package-summary.html">Overview</a>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusPlusRandom.html"><code>xoshiro256++</code></a>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256StarStarRandom.html"><code>xoshiro256**</code></a>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusRandom.html"><code>xoshiro256+</code></a>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusPlusRandom.html"><code>xoroshiro128++</code></a>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128StarStarRandom.html"><code>xoroshiro128**</code></a>
|
|
<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusRandom.html"><code>xoroshiro128+</code></a>
|
|
<li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a>
|
|
</ul>
|
|
|
|
<h1>Java code (<a HREF="https://gitbox.apache.org/repos/asf?p=commons-rng.git">Apache Commons RNG implementations</a>)</h1>
|
|
|
|
<h1>Documentation</h1>
|
|
<p><ul>
|
|
<li>The <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a> introducing <code>xoshiro</code>/<code>xoroshiro</code>.
|
|
<li>The <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a> describing our <a href="hwd.php">test for Hamming-weight dependencies</a>.
|
|
<li>A <a href="http://vigna.di.unimi.it/papers.php#VigHTLGMT">paper</a> discussing the defects of the Mersenne Twister family of PRNGs.
|
|
<li>A <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">paper</a> discussing the probability of overlap of random subsequences.
|
|
<li>A <a href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">paper</a> with new tables of multipliers for LCGs with power-of-two moduli.
|
|
<li>A <a href="http://vigna.di.unimi.it/papers.php#StVLXM">paper</a> presenting the family LXM of PRNGs.
|
|
</ul>
|
|
|
|
<h1>Discussion</h1>
|
|
|
|
<p>There is a <a href="http://groups.google.com/group/prng">discussion group</a>
|
|
about this page. You can join or <a href="mailto:prng@googlegroups.com">send a message</a>.
|
|
<h1><a href="https://validator.w3.org/check/referer">This is valid HTML 4.01</a></h1>
|
|
|
|
</div>
|
|
</body>
|
|
</html>
|