\item If none of the $T_i$ predict your trail bits, $(B_0, ..., B_m)$, compute the first $m$ steps of Turing machine $T_{n+1}$. If $T_{n+1}$ doesn't predict them either, go to $T_{n+2}$, and so on\footnote{Here we assume that we have an ordering of Turing machines, i.e., that $T_i$ is simpler than $T_{(i+1)}$}
Then in finite time, you will arrive at a set which only contains the simplest TM which describes the process generating your train of bits. Proof:
\begin{itemize}
\item Let $T_j$ be the simplest machine which describes the process generating your trail of bits. Then by virtue of it being such, all TM machines $\{T_1,...,T_{j-1}\}$ are not the simplest machine generating your trail of bits. Meaning that at some point, they predict something distinct from your trail of bits, meaning that at some point step 2. of the process above discards them. In particular, the step after discarding $T_{j-1}$ is arriving at $T_j$, and staying there forever.
\item Therefore, this process arrives at the simplest TM which describes the process generating your trail of bits in finite time.
\end{itemize}
QED.
\section{Using the above scheme to arrive at a probability }
then our probabilities are going to be very janky. For example, at step $ j -1$, that scheme would output a 0\% probability to the correct value of the next bit.
To fix this, in step 2, we can require that there be not only one Turing machine that predicts all past bits, but multiple of them. How many? Well, however many your compute and memory budgets allow. This would make your probabilities less janky.
Interestingly, that scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible—in which case we would just implement the first scheme at full speed—and producing accurate probabilities—in which case it seems like we would use the modification just outlined.
When thinking about Turing machines, one might consider one particular model, e.g., valid C programs. But in that case, it is easy to consider programs that do not halt, like:\footnote{Readers might find it amusing to run gcc loop.c and check it}:
\item Start with a finite set of live Turing machines, $\{T_0, ..., T_n\}$, a compute budget $s$ and an empty cache of programs which take too long $C =\{\}$
\item If none of them predict your trail bits, $(B_0, ..., B_m)$, within the given compute budget:
\begin{enumerate}
\item eliminate the ones who make incorrect predictions.
\item keep track of programs which didn't output anything in the cache.
\item attempt to compute the first $m$ steps of Turing machine $T_{n+1}$ with $s$ seconds of compute. If it makes correct predictions, keep it in the set of live machines, otherwise move it to the cache.
\item increase the compute budget to $s +1$ and run each machine in the cache for one additional second
\item Repeat steps 2-3 until you have one program which has predicted past bits within your compute budget. Eventually this program must exist, since the Turing machine which is producing your trail of bits is by construction computable and non-halting.\footnote{Or at least, it hasn't halted before producing the number of bits that you have seen so far.}