A computable version of Solomonoff induction

Thinking about Just-in-time Bayesianism a bit more, here is a computable approximation to Solomonoff Induction, which converges to the Turing machine generating your trail of bits in finite time.

The key idea: arrive at the correct hypothesis in finite time

  1. Start with a finite set of Turing machines, \( \{T_0, \ldots, T_n\} \)
  2. If none of the \( T_i \) predict your trail of bits, \( (B_0, \ldots, B_m) \), compute the first \( m \) steps of Turing machine \( T_{n+1} \). If \( T_{n+1} \) doesn’t predict them either, go to \( T_{n+2} \), and so on^1
  3. Observe the next bit, purge the machines from your set which don’t predict it. If none predict it, GOTO 2.

Then in finite time, you will arrive at a set which only contains the simplest TM which describes the process generating your trail of bits. Proof sketch: let \( T_j \) be the simplest Turing machine which generates the trail. Any machine added before \( T_j \) must eventually mispredict some bit (otherwise it would itself generate the trail, contradicting that \( T_j \) is the simplest such machine), so it is purged after finitely many observations. \( T_j \) itself is reached by step 2 in finite time, and once added it predicts every bit and is never purged; since the scheme only adds machines while none of the current ones fit, no machines after \( T_j \) are ever added. So after finitely many steps the set contains exactly \( T_j \).

QED.
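To make the loop concrete, here is a minimal sketch in C. It does not enumerate actual Turing machines: to stay short and runnable it substitutes a small hypothetical family of bit predictors, predict(i, m), and a hypothetical generating process, true_bit(m), both invented for illustration; the enumerate-and-purge structure of steps 1 to 3 is what it is meant to show.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins, invented for illustration: hypothesis i predicts
   bit m as (m % (i+1) == 0), and the "true" process has period 3. */
#define NUM_HYPOTHESES 8
static int predict(int i, int m) { return (m % (i + 1)) == 0; }
static int true_bit(int m)       { return (m % 3) == 0; }

int main(void) {
    bool live[NUM_HYPOTHESES] = {false};
    int n = 0;        /* highest hypothesis index considered so far (step 1) */
    live[0] = true;

    for (int m = 0; m < 20; m++) {   /* observe 20 bits of the trail */
        int b = true_bit(m);

        /* step 3: purge live hypotheses that mispredict bit m */
        int alive = 0;
        for (int i = 0; i <= n; i++) {
            if (live[i] && predict(i, m) != b) live[i] = false;
            if (live[i]) alive++;
        }

        /* step 2: if nothing predicts the trail so far, bring in the next
           hypothesis and check it against all past bits */
        while (alive == 0 && n + 1 < NUM_HYPOTHESES) {
            n++;
            bool consistent = true;
            for (int k = 0; k <= m; k++)
                if (predict(n, k) != true_bit(k)) { consistent = false; break; }
            if (consistent) { live[n] = true; alive = 1; }
        }
    }

    for (int i = 0; i <= n; i++)
        if (live[i]) printf("surviving hypothesis: T_%d\n", i);
    return 0;
}
```

With these stand-ins, the program ends up with T_2 (the period-3 predictor) as the only survivor, mirroring the convergence claim above.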

Using the above scheme to arrive at a probability

Now, the problem with the above scheme is that if we use our set of Turing machines to output a probability for the next bit

\[ P\Big(b_{m + 1} = 1 \mid (B_0, \ldots, B_m) \Big) := \frac{1}{n} \cdot \sum_{i=0}^{n} \Big(T_i(m+1) = 1\Big) \]

then our probabilities are going to be very janky. For example, consider a step at which every machine in our current set fails to predict the observed bit and is purged: just before observing that bit, the scheme above would have output a probability of 0% for its correct value.

To fix this, in step 2, we can require that there be not just one Turing machine that predicts all past bits, but several of them. How many? Well, however many your compute and memory budgets allow. This would make your probabilities less janky.
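As a sketch of that probability rule together with this tweak, again in C and reusing the hypothetical predict(i, m) from the sketch above: the estimate is the fraction of the machines still consistent with the trail that predict a 1 for the next bit. The function name and the 0.5 fallback for an empty set are illustrative assumptions, not part of the scheme.

```c
/* Assumes the hypothetical predict(i, m) from the earlier sketch. */
int predict(int i, int m);

/* Estimate P(b_{m+1} = 1) as the fraction of still-consistent machines
   (whose indices are in `consistent`) that predict a 1 at position m_next.
   With the tweak above, step 2 keeps adding machines until `count` is as
   large as the compute and memory budgets allow, which smooths this out. */
double prob_next_bit_is_one(const int *consistent, int count, int m_next) {
    if (count == 0)
        return 0.5;   /* illustrative fallback: no machine left to ask */
    int ones = 0;
    for (int j = 0; j < count; j++)
        if (predict(consistent[j], m_next) == 1)
            ones++;
    return (double) ones / count;
}
```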

Interestingly, that scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible—in which case we would just implement the first scheme at full speed—and producing accurate probabilities—in which case it seems like we would have to use a tweak like the one just described.

A distracting epicycle: dealing with Turing machines that take too long or do not halt.

When thinking about Turing machines, one might consider one particular model, e.g., valid C programs. But in that case, it is easy to write programs that do not halt, like:^2

int main(void) { while (1) { ; } }

But then in step 2 of our scheme, we never get to see what bits this program outputs, because it never outputs any bits.

This can easily be fixed as follows:

  1. Start with a finite set of live Turing machines, \( \{T_0, \ldots, T_n\} \), a compute budget \( s \), and an empty cache of programs which take too long, \( C = \{\} \)
  2. Run each \( T_i \) for \( s \) seconds.
  3. If none of them predict your trail of bits, \( (B_0, \ldots, B_m) \), within the given compute budget: move the machines that ran out of time into the cache \( C \), add the next Turing machine \( T_{n+1} \) to the live set, and periodically increase \( s \) so that the cached machines eventually get re-run with a larger budget.
  4. Observe the next bit, purge the machines from your set which don’t predict it. If none predict it, GOTO 2.
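
Here is a hedged sketch of this budgeted variant, with two simplifications that the steps above do not specify: the budget \( s \) is counted in abstract simulation steps rather than seconds, and the machines are again the hypothetical predict(i, m) family, with a made-up cost(i) standing in for their running time. Machines that exceed the budget move to the cache \( C \) instead of being purged, and are revived once the budget has grown enough.

```c
#include <stdio.h>

#define NUM_MACHINES 8

/* Hypothetical stand-ins: predict(i, m) is machine i's m-th bit, cost(i) is
   how many "steps" machine i needs per bit, and true_bit(m) is the process
   actually generating the trail. All three are invented for illustration. */
static int  predict(int i, int m) { return (m % (i + 1)) == 0; }
static long cost(int i)           { return (long) i * i; }
static int  true_bit(int m)       { return (m % 3) == 0; }

typedef enum { LIVE, CACHED, PURGED } status_t;

int main(void) {
    status_t st[NUM_MACHINES];
    for (int i = 0; i < NUM_MACHINES; i++) st[i] = PURGED;
    st[0] = LIVE;
    int  n = 0;   /* highest machine index brought in so far (step 1) */
    long s = 1;   /* compute budget, here in abstract steps (step 1)  */

    for (int m = 0; m < 20; m++) {          /* observe 20 bits */
        int b = true_bit(m);

        /* steps 2 and 4: run each live machine within budget s; cache the
           slow ones instead of purging them, purge the ones that mispredict */
        int alive = 0;
        for (int i = 0; i <= n; i++) {
            if (st[i] != LIVE) continue;
            if (cost(i) > s)             st[i] = CACHED;
            else if (predict(i, m) != b) st[i] = PURGED;
            else                         alive++;
        }

        /* step 3: nothing consistent survived, so bring in the next
           machine, increase the budget, and revive cached machines */
        while (alive == 0 && n + 1 < NUM_MACHINES) {
            n++;
            st[n] = LIVE;
            s *= 2;
            for (int i = 0; i < n; i++)
                if (st[i] == CACHED && cost(i) <= s) st[i] = LIVE;

            /* re-check every live machine against all past bits,
               still respecting the (now larger) budget */
            for (int i = 0; i <= n; i++) {
                if (st[i] != LIVE) continue;
                if (cost(i) > s) { st[i] = CACHED; continue; }
                for (int k = 0; k <= m; k++)
                    if (predict(i, k) != true_bit(k)) { st[i] = PURGED; break; }
                if (st[i] == LIVE) alive++;
            }
        }
    }

    for (int i = 0; i <= n; i++)
        if (st[i] == LIVE) printf("surviving machine: T_%d\n", i);
    return 0;
}
```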