tweak: more typos
This commit is contained in:
parent
df8a63ef88
commit
4086ec8671
|
@ -12,7 +12,7 @@
|
||||||
\begin{frontmatter}
|
\begin{frontmatter}
|
||||||
|
|
||||||
|
|
||||||
\title{A computable version of Solomonoff induction}
|
\title{A computable variant of Solomonoff induction}
|
||||||
|
|
||||||
%% use optional labels to link authors explicitly to addresses:
|
%% use optional labels to link authors explicitly to addresses:
|
||||||
%% \author[label1,label2]{}
|
%% \author[label1,label2]{}
|
||||||
|
@ -24,7 +24,7 @@
|
||||||
\address[1]{Quantified Uncertainty Research Institute, Mexico}
|
\address[1]{Quantified Uncertainty Research Institute, Mexico}
|
||||||
|
|
||||||
\begin{abstract}
|
\begin{abstract}
|
||||||
I present a computable version of Solomonoff induction.
|
I present a computable variant of Solomonoff induction.
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
|
|
||||||
\begin{keyword}
|
\begin{keyword}
|
||||||
|
@ -37,9 +37,9 @@ Solomonoff \sep induction \sep computable
|
||||||
\section{The key idea: arrive at the correct hypothesis in finite time}
|
\section{The key idea: arrive at the correct hypothesis in finite time}
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item Start with a finite set of turing machines, $\{T_0, ..., T_n\}$
|
\item Start with a finite set of Turing machines, $\{T_0, ..., T_n\}$
|
||||||
\item If none of the $T_i$ predict your trail bits, $(B_0, ..., B_m)$, compute the first $m$ steps of Turing machine $T_{n+1}$. If $T_{n+1}$ doesn't predict them either, go to $T_{n+2}$, and so on\footnote{Here we assume that we have an ordering of Turing machines, i.e., that $T_i$ is simpler than $T_{(i+1)}$}
|
\item If none of the $T_i$ predict your trail of bits, $(B_0, ..., B_m)$, compute the first $m$ steps of Turing machine $T_{n+1}$. If $T_{n+1}$ doesn't predict them either, go to $T_{n+2}$, and so on\footnote{Here we assume that we have an ordering of Turing machines, i.e., that $T_i$ is simpler than $T_{i+1}$.}
|
||||||
\item Observe the next bit, purge the machines from your set which don't predict it. If none predict it, GOTO 2.
|
\item Observe the next bit, purge the machines from your set which don't predict it. If none predict it, go to step 2.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
|
|
||||||
|
@ -53,18 +53,17 @@ QED.
|
||||||
|
|
||||||
\section{Using the above scheme to arrive at a probability }
|
\section{Using the above scheme to arrive at a probability }
|
||||||
|
|
||||||
Now, the problem with the above scheme is that if we use our set of turing machines to output a probability for the next bit
|
Now, the problem with the above scheme is that if we use our set of Turing machines to output a probability for the next bit
|
||||||
|
|
||||||
$$ P\Big(b_{m + 1} = 1 | (B_0, ..., B_m) \Big) := \frac{1}{n} \cdot \sum_0^n \Big(T_i(m+1) = 1\Big) $$
|
$$ P\Big(b_{m + 1} = 1 \mid (B_0, ..., B_m) \Big) := \frac{1}{n+1} \sum_{i=0}^{n} \Big(T_i(m+1) = 1\Big) $$
|
||||||
|
|
||||||
then our probabilities are going to be very janky. For example, at step $ j - 1 $, that scheme would output a 0\% probability to the correct value of the next bit.
|
then our probabilities are going to be very janky. For example, at step $ j - 1 $, that scheme would assign a 0\% probability to the correct value of the next bit.
|
||||||
|
|
||||||
To fix this, in step 2, we can require that there be not only one turing machine that predicts all past bits, but multiple of them. What multiple? Well, however many your compute and memory budgets allow. This would make your probabilities less janky.
|
To fix this, in step 2, we can require that there be not only one Turing machine that predicts all past bits, but multiple of them. How many? Well, however many your compute and memory budgets allow. This would make your probabilities less janky.
|
||||||
|
|
||||||
Interestingly, that scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible—in which case we would just implement the first scheme at full speed—and producing accurate probabilities—in which case it seems like we would use the modification just outlined.
|
Interestingly, that scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible—in which case we would just implement the first scheme at full speed—and producing accurate probabilities—in which case it seems like we would use the modification just outlined.
|
||||||
|
|
||||||
\section{A downside}
|
\section{A downside}
|
||||||
|
|
||||||
Note that a downside of the procedures outlined above is that at the point we arrive at the correct hypothesis, we don't know that this is the case.
|
Note that a downside of the procedures outlined above is that, at the point when we arrive at the correct hypothesis, we have no way of knowing that we have done so.
|
||||||
|
|
||||||
\section{An distracting epicycle: dealing with Turing machines that take too long or do not halt.}
|
\section{A distracting epicycle: dealing with Turing machines that take too long or do not halt}
|
||||||
|
@ -84,7 +83,7 @@ But then in step 2 of our scheme, we never get to see what bits this program out
|
||||||
This can easily be fixed as follows:
|
This can easily be fixed as follows:
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item Start with a finite set of live turing machines, $\{T_0, ..., T_n\}$, a compute budget $s$ and an empty cache of programs which take too long $C =\{\}$
|
\item Start with a finite set of live Turing machines, $\{T_0, ..., T_n\}$, a compute budget $s$ and an empty cache of programs which take too long $C =\{\}$
|
||||||
\item Run each $T_i$ for $s$ seconds.
|
\item Run each $T_i$ for $s$ seconds.
|
||||||
\item If none of them predict your trail bits, $(B_0, ..., B_m)$, within the given compute budget:
|
\item If none of them predict your trail of bits, $(B_0, ..., B_m)$, within the given compute budget:
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
|
@ -95,7 +94,7 @@ This can easily be fixed as follows:
|
||||||
\item Repeat steps 2-3 until you have one program which has predicted past bits within your compute budget. Eventually this program must exist, since the Turing machine which is producing your trail of bits is by construction computable and non-halting.\footnote{Or at least, it hasn't halted before producing the number of bits that you have seen so far.}
|
\item Repeat steps 2--3 until you have at least one program that has predicted all past bits within your compute budget. Such a program must eventually exist, since the Turing machine which is producing your trail of bits is by construction computable and non-halting.\footnote{Or at least, it hasn't halted before producing the number of bits that you have seen so far.}
|
||||||
\item
|
\item
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
\item Observe the next bit, purge the machines from your set which don't predict it. If none predict it, GOTO 2.
|
\item Observe the next bit, purge the machines from your set which don't predict it. If none predict it, go to step 2.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
\end{document}
|
\end{document}
|
||||||
|
|
|
@ -5,9 +5,9 @@ Thinking about [Just-in-time Bayesianism](https://nunosempere.com/blog/2023/02/0
|
||||||
|
|
||||||
### The key idea: arrive at the correct hypothesis in finite time
|
### The key idea: arrive at the correct hypothesis in finite time
|
||||||
|
|
||||||
1. Start with a finite set of turing machines, \(\{T_0, ..., T_n\}\)
|
1. Start with a finite set of Turing machines, \(\{T_0, ..., T_n\}\)
|
||||||
2. If none of the \(T_i\) predict your trail bits, \((B_0, ..., B_m)\), compute the first \(m\) steps of Turing machine \(T_{n+1}\). If \(T_{n+1}\) doesn't predict them either, go to \(T_{n+2}\), and so on[^1]
|
2. If none of the \(T_i\) predict your trail of bits, \((B_0, ..., B_m)\), compute the first \(m\) steps of Turing machine \(T_{n+1}\). If \(T_{n+1}\) doesn't predict them either, go to \(T_{n+2}\), and so on[^1]
|
||||||
3. Observe the next bit, purge the machines from your set which don't predict it. If none predict it, GOTO 2.
|
3. Observe the next bit, purge the machines from your set which don't predict it. If none predict it, go to step 2.
|
||||||
|
|
||||||
Then in finite time, you will arrive at a set which only contains the simplest TM which describes the process generating your train of bits. Proof:
|
Then in finite time, you will arrive at a set which only contains the simplest Turing machine that describes the process generating your trail of bits. Proof:
|
||||||
|
|
||||||
|
@ -18,13 +18,13 @@ QED.
|
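A minimal Python sketch of the three steps, under simplifying assumptions that are not from the post: each "Turing machine" is modeled as a total Python function from a bit index to the bit it predicts (so nothing ever fails to halt), and the infinite simplicity-ordered enumeration is replaced by a short hypothetical list. The names `consistent`, `update` and `enumeration` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a Turing machine: a total function from a bit
# index k to the bit it predicts at position k. Real machines may not halt;
# the compute-budget variant later in the post deals with that.
Machine = Callable[[int], int]


def consistent(machine: Machine, bits: Sequence[int]) -> bool:
    """True if the machine reproduces every bit observed so far."""
    return all(machine(k) == b for k, b in enumerate(bits))


def update(live: List[Machine], enumeration: List[Machine],
           next_index: int, bits: Sequence[int]) -> Tuple[List[Machine], int]:
    """Purge machines that mispredict (step 3); if none survive, walk down
    the simplicity-ordered enumeration until one fits all bits (step 2)."""
    live = [m for m in live if consistent(m, bits)]
    while not live:
        candidate = enumeration[next_index]  # IndexError if the toy list runs out
        next_index += 1
        if consistent(candidate, bits):
            live.append(candidate)
    return live, next_index


# Toy enumeration, assumed to be ordered from simplest to most complex.
enumeration: List[Machine] = [
    lambda k: 0,                 # 000000...
    lambda k: 1,                 # 111111...
    lambda k: k % 2,             # 010101...
    lambda k: int(k % 3 == 0),   # 100100...
]

source = enumeration[3]                  # the process generating the bits
live, next_index = [enumeration[0]], 1   # step 1: start with {T_0}
bits: List[int] = []
for k in range(6):
    bits.append(source(k))               # observe the next bit
    live, next_index = update(live, enumeration, next_index, bits)

print([enumeration.index(m) for m in live])  # [3]
```

In this toy run the purge-and-extend loop settles on the fourth hypothesis after three observed bits and never leaves it, which is the finite-time behaviour the proof is about.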
||||||
|
|
||||||
### Using the above scheme to arrive at a probability
|
### Using the above scheme to arrive at a probability
|
||||||
|
|
||||||
Now, the problem with the above scheme is that if we use our set of turing machines to output a probability for the next bit
|
Now, the problem with the above scheme is that if we use our set of Turing machines to output a probability for the next bit
|
||||||
|
|
||||||
\[ P\Big(b_{m + 1} = 1 | (B_0, ..., B_m) \Big) := \frac{1}{n} \cdot \sum_0^n \Big(T_i(m+1) = 1\Big) \]
|
\[ P\Big(b_{m + 1} = 1 \mid (B_0, ..., B_m) \Big) := \frac{1}{n+1} \sum_{i=0}^{n} \Big(T_i(m+1) = 1\Big) \]
|
||||||
|
|
||||||
then our probabilities are going to be very janky. For example, at step \( j - 1 \), that scheme would output a 0% probability to the correct value of the next bit.
|
then our probabilities are going to be very janky. For example, at step \( j - 1 \), that scheme would assign a 0% probability to the correct value of the next bit.
|
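To make the averaging concrete, here is a small sketch that models each machine as a hypothetical Python function from bit index to predicted bit and normalizes by the number of live machines, so the output always lies in [0, 1]. The function name `prob_next_is_one` is illustrative. With a single live machine the estimate can only be 0 or 1, which is the jankiness described above.

```python
from fractions import Fraction
from typing import Callable, List

# Hypothetical stand-in for a Turing machine: bit index -> predicted bit.
Machine = Callable[[int], int]


def prob_next_is_one(live: List[Machine], next_index: int) -> Fraction:
    """Share of live machines that predict a 1 at position `next_index`.

    Dividing by len(live), the number of machines actually summed over,
    keeps the result in [0, 1].
    """
    if not live:
        raise ValueError("no live machines to average over")
    ones = sum(1 for m in live if m(next_index) == 1)
    return Fraction(ones, len(live))


# Two hypothetical machines, both consistent with the observed bits 1, 0, 0.
live: List[Machine] = [
    lambda k: int(k % 3 == 0),  # 100100...
    lambda k: int(k == 0),      # 100000...
]
print(prob_next_is_one(live, 3))      # 1/2
print(prob_next_is_one(live[:1], 3))  # 1 (a lone machine gives 0 or 1)
```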
||||||
|
|
||||||
To fix this, in step 2, we can require that there be not only one turing machine that predicts all past bits, but multiple of them. What multiple? Well, however many your compute and memory budgets allow. This would make your probabilities less janky.
|
To fix this, in step 2, we can require that there be not only one Turing machine that predicts all past bits, but multiple of them. How many? Well, however many your compute and memory budgets allow. This would make your probabilities less janky.
|
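One way to implement this fix, again as a hypothetical sketch with machines modeled as total Python functions: keep walking the enumeration until at least `min_live` machines fit every observed bit, then average their predictions. Here `min_live` is an assumed stand-in for "however many your compute and memory budgets allow". With `min_live=1` the procedure reduces to the original scheme.

```python
from typing import Callable, List, Sequence, Tuple

Machine = Callable[[int], int]  # hypothetical: bit index -> predicted bit


def consistent(machine: Machine, bits: Sequence[int]) -> bool:
    """True if the machine reproduces every bit observed so far."""
    return all(machine(k) == b for k, b in enumerate(bits))


def update_at_least(live: List[Machine], enumeration: List[Machine],
                    next_index: int, bits: Sequence[int],
                    min_live: int) -> Tuple[List[Machine], int]:
    """Step 2 with a quota: keep walking the enumeration until at least
    `min_live` machines fit all observed bits (or the toy list runs out)."""
    live = [m for m in live if consistent(m, bits)]
    while len(live) < min_live and next_index < len(enumeration):
        candidate = enumeration[next_index]
        next_index += 1
        if consistent(candidate, bits):
            live.append(candidate)
    return live, next_index


# Toy enumeration, assumed to be ordered from simplest to most complex.
enumeration: List[Machine] = [
    lambda k: 0,                 # 000000...
    lambda k: 1,                 # 111111...
    lambda k: k % 2,             # 010101...
    lambda k: int(k % 3 == 0),   # 100100...
    lambda k: int(k == 0),       # 100000...
]
bits = [1, 0, 0]
live, next_index = update_at_least([enumeration[0]], enumeration, 1, bits,
                                   min_live=2)
p_one = sum(m(len(bits)) for m in live) / len(live)  # average vote for a 1
print(len(live), p_one)  # 2 0.5
```

In the real scheme the enumeration is infinite and infinitely many machines agree with any finite prefix, so the quota can always be met; the `next_index < len(enumeration)` guard only exists because the toy list is finite.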
||||||
|
|
||||||
Interestingly, that scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible—in which case we would just implement the first scheme at full speed—and producing accurate probabilities—in which case it seems like we would have to use a tweak
|
Interestingly, this scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible (in which case we would just implement the first scheme at full speed) and producing accurate probabilities (in which case it seems like we would have to use the modification just outlined).
|
||||||
|
|
||||||
|
@ -44,7 +44,7 @@ But then in step 2 of our scheme, we never get to see what bits this program out
|
||||||
|
|
||||||
This can easily be fixed as follows:
|
This can easily be fixed as follows:
|
||||||
|
|
||||||
1. Start with a finite set of live turing machines, \(\{T_0, ..., T_n\}\), a compute budget \( s \) and an empty cache of programs which take too long \( C =\{\} \)
|
1. Start with a finite set of live Turing machines, \(\{T_0, ..., T_n\}\), a compute budget \( s \) and an empty cache of programs which take too long \( C =\{\} \)
|
||||||
2. Run each \( T_i \) for \( s \) seconds.
|
2. Run each \( T_i \) for \( s \) seconds.
|
||||||
3. If none of them predict your trail bits, \( (B_0, ..., B_m) \), within the given compute budget:
|
3. If none of them predict your trail of bits, \( (B_0, ..., B_m) \), within the given compute budget:
|
||||||
- eliminate the ones who make incorrect predictions.
|
- eliminate the ones that make incorrect predictions.
|
||||||
|
@ -52,7 +52,7 @@ This can easily be fixed as follows:
|
||||||
- attempt to compute the first \( m \) steps of Turing machine \(T_{n+1} \) with \( s \) seconds of compute. If it makes correct predictions, keep it in the set of live machines, otherwise move it to the cache.
|
- attempt to compute the first \( m \) steps of Turing machine \(T_{n+1} \) with \( s \) seconds of compute. If it makes correct predictions within that budget, keep it in the set of live machines; if it runs out of compute before making them, move it to the cache.
|
||||||
- increase the compute budget to \( s + 1 \) and run each machine in the cache for one additional second
|
- increase the compute budget to \( s + 1 \) and run each machine in the cache for one additional second.
|
||||||
- Repeat step 2 until you have one program which has predicted past bits within your compute budget. Eventually this program must exist, since the Turing machine which is producing your trail of bits is by construction computable and non-halting[^3].
|
- Repeat steps 2 and 3 until you have at least one program that has predicted all past bits within your compute budget. Such a program must eventually exist, since the Turing machine which is producing your trail of bits is by construction computable and non-halting[^3].
|
||||||
4. Observe the next bit, purge the machines from your set which don't predict it. If none predict it, GOTO 2.
|
4. Observe the next bit, purge the machines from your set which don't predict it. If none predict it, go to step 2.
|
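Below is a rough Python sketch of the compute-budget variant. Its assumptions do not come from the post:

- Compute is counted in discrete steps rather than seconds.
- A machine is a hypothetical generator function. Each `next()` call is one step and yields either `None` (no output yet) or an output bit.
- Cached machines are re-run from scratch with the larger budget instead of being resumed for one extra step. This gives the same outcome for deterministic machines, at a higher cost.
- Live machines that exceed the budget are also parked in the cache.

The names `run`, `refresh` and the example machines are illustrative.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical machine model: a generator function. Each next() call is one
# step of compute and yields None (no output yet) or the next output bit.
Machine = Callable[[], Iterator[Optional[int]]]


def run(machine: Machine, bits: Sequence[int], budget: int) -> str:
    """'ok' if the machine reproduces `bits` within `budget` steps,
    'wrong' if it outputs a mismatching bit or halts too early,
    'timeout' if the budget runs out first."""
    it = machine()
    matched = 0
    for _ in range(budget):
        if matched == len(bits):
            return "ok"
        try:
            out = next(it)
        except StopIteration:
            return "wrong"  # halted before producing enough bits
        if out is None:
            continue
        if out != bits[matched]:
            return "wrong"
        matched += 1
    return "ok" if matched == len(bits) else "timeout"


def refresh(live: List[Machine], cache: List[Machine],
            enumeration: List[Machine], next_index: int,
            bits: Sequence[int], budget: int
            ) -> Tuple[List[Machine], List[Machine], int, int]:
    """Steps 2-3: re-check live machines, then alternate between trying the
    next enumerated machine and raising the budget for cached ones."""
    still_live: List[Machine] = []
    cache = list(cache)
    for m in live:
        status = run(m, bits, budget)
        if status == "ok":
            still_live.append(m)
        elif status == "timeout":
            cache.append(m)  # assumption: slow live machines are cached too
    live = still_live
    # Never terminates if the source is missing from this finite toy list.
    while not live:
        if next_index < len(enumeration):
            m = enumeration[next_index]
            next_index += 1
            status = run(m, bits, budget)
            if status == "ok":
                live.append(m)
            elif status == "timeout":
                cache.append(m)
        budget += 1
        retry: List[Machine] = []
        for m in cache:  # re-run from scratch with the larger budget
            status = run(m, bits, budget)
            if status == "ok":
                live.append(m)
            elif status == "timeout":
                retry.append(m)
        cache = retry
    return live, cache, next_index, budget


def zeros():
    while True:
        yield 0

def halts_immediately():
    yield from ()

def loops_forever():
    while True:
        yield None

def slow_alternating():
    k = 0
    while True:
        yield from (None, None, None)  # three steps without output
        yield k % 2
        k += 1


enumeration: List[Machine] = [zeros, halts_immediately, loops_forever,
                              slow_alternating]
observed = [0, 1, 0, 1]  # the first four bits of slow_alternating
live, cache, next_index, budget = [zeros], [], 1, 2
for n_seen in range(1, len(observed) + 1):
    live, cache, next_index, budget = refresh(
        live, cache, enumeration, next_index, observed[:n_seen], budget)

print([f.__name__ for f in live], budget)  # ['slow_alternating'] 16
```

In this toy run the machines that mispredict or halt early are discarded. The machine that never outputs anything stays in the cache. The slow but correct machine is recovered once the budget reaches four steps per observed bit.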
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<section id='isso-thread'>
|
<section id='isso-thread'>
|
||||||
|
|