From 697ea680899c944f0d3f86162648d525afa58033 Mon Sep 17 00:00:00 2001 From: NunoSempere Date: Wed, 1 Mar 2023 23:41:49 -0600 Subject: [PATCH] feat: add more solomonoff changes --- blog/2023/03/01/computable-solomoff/index.md | 27 + blog/2023/03/01/computable-solomoff/src/a.out | Bin 0 -> 16464 bytes .../2023/03/01/computable-solomoff/src/loop.c | 5 + .../03/01/computable-solomoff/src/main.tex | 28 +- .../03/01/computable-solomoff/src/old.text | 1132 ----------------- .../03/01/computable-solomoff/src/test.md | 29 + 6 files changed, 84 insertions(+), 1137 deletions(-) create mode 100755 blog/2023/03/01/computable-solomoff/src/a.out create mode 100644 blog/2023/03/01/computable-solomoff/src/loop.c delete mode 100644 blog/2023/03/01/computable-solomoff/src/old.text create mode 100644 blog/2023/03/01/computable-solomoff/src/test.md diff --git a/blog/2023/03/01/computable-solomoff/index.md b/blog/2023/03/01/computable-solomoff/index.md index 4bdf648..090f0f3 100644 --- a/blog/2023/03/01/computable-solomoff/index.md +++ b/blog/2023/03/01/computable-solomoff/index.md @@ -28,6 +28,33 @@ To fix this, in step 2, we can require that there be not only one turing machine Interestingly, that scheme also suggests that there is a tradeoff between arriving at the correct hypothesis as fast as possible—in which case we would just implement the first scheme at full speed—and producing accurate probabilities—in which case it seems like we would have to use a tweak +### A distracting epicycle: dealing with Turing machines that take too long or do not halt. + +When thinking about Turing machines, one might consider one particular model, e.g., valid C programs. But in that case, it is easy to consider programs that do not halt, like:[^2] +[^2]: Readers might find it amusing to run gcc loop.c and check it + +``` +void main(){ + while(1){ + ; + } +} +``` + +But then in step 2 of our scheme, we never get to see what bits this program outputs, because it never outputs any bits. 
+
+This can easily be fixed as follows:
+
+1. Start with a finite set of live Turing machines, \(\{T_0, ..., T_n\}\), a compute budget \( s \), and an empty cache of programs which take too long, \( C = \{\} \).
+2. Run each \( T_i \) for \( s \) seconds.
+3. If none of them predict your trail of bits, \( (B_0, ..., B_m) \), within the given compute budget:
+    - eliminate the machines that have made incorrect predictions.
+    - move the programs which haven't output anything yet to the cache.
+    - attempt to compute the first \( m \) output bits of Turing machine \( T_{n+1} \) with \( s \) seconds of compute. If it makes correct predictions, keep it in the set of live machines; otherwise, move it to the cache.
+    - increase the compute budget to \( s + 1 \) and run each machine in the cache for one additional second.
+    - Repeat step 2 until at least one program has predicted your past bits within the compute budget. Eventually such a program must exist, since the Turing machine producing your trail of bits is by construction computable and non-halting.[^3]
+4. Observe the next bit, and purge the machines from your set which don't predict it. If none predict it, GOTO 2.
+
+[^3]: Or at least, it hasn't halted before producing the number of bits that you have seen so far.
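The loop above can be sketched in runnable form. In the toy below, "seconds of compute" are idealized as an abstract step budget, and each candidate machine is modeled as a function from a budget to the bits it manages to emit within it; all names are illustrative rather than taken from this post's code:

```python
def looper(budget):
    # models the `while(1){;}` program: consumes any budget, emits no bits
    return []

def ones(budget):
    # emits one `1` bit per step
    return [1] * budget

def slow_alternator(budget):
    # needs three steps per output bit: 0, 1, 0, ...
    return [i % 2 for i in range(budget // 3)]

def verdict(machine, observed, budget):
    """True if the machine reproduces `observed` within `budget` steps,
    False if it contradicts the observed bits, None if it is merely too
    slow so far (no contradiction yet, but not enough bits either)."""
    bits = machine(budget)
    if bits != observed[:len(bits)]:
        return False
    return True if len(bits) >= len(observed) else None

def surviving_machines(machines, observed, budget=1):
    """Steps 2-3 of the scheme: grow the budget until at least one live
    machine has predicted every observed bit. Assumes the true (non-halting)
    machine is in `machines`, as the enumeration eventually guarantees."""
    live, cache = [], list(machines)
    while not live:
        still_slow = []
        for machine in cache:
            v = verdict(machine, observed, budget)
            if v:
                live.append(machine)        # predicted all past bits
            elif v is None:
                still_slow.append(machine)  # cache it; it may catch up
            # v is False: eliminated for an incorrect prediction
        cache = still_slow
        if not live:
            budget += 1                     # "increase the budget to s + 1"
    return live, budget

live, s = surviving_machines([looper, ones, slow_alternator], [1, 1, 1])
# `ones` survives once the budget reaches 3; `slow_alternator` is eliminated
# when its first output bit contradicts the trail; `looper` just stays cached.
```

Observing the next bit (step 4) then amounts to filtering `live` with `verdict` again against the longer trail.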

diff --git a/blog/2023/03/01/computable-solomoff/src/a.out b/blog/2023/03/01/computable-solomoff/src/a.out new file mode 100755 index 0000000000000000000000000000000000000000..888f10dc638b9d88c30278f4ae689cffa7ba0411 GIT binary patch literal 16464 zcmeHOeQX>@6`%9jiPOgC%NH?8Xbwn3LgS4c662JZ-1#eO!iNdju8A{MaFaum2PdJ0jBIzS^jw`&0#wouw$nu- zg9rXd900*r{Kdqh4_m|z=(gk7%r!f9a~)ocNnExG!j46@Q--5Y!{H>sS@8!%JUJ$O zwM#tx7;|b-K)>B_vGZfM*a6uGr=u_1skh?qUa_l<>1vc8_c^oz!rzCyf@RLL57u$|{|U;; zTqeAl{^^(|Zt>Gje5^{@2I5v%;RlF|RpEzk!QV;zyQ=6bir+AYt4&$f@I=x&mB}Tu znTOIOa*rz2|+$jeS z`PzPm=dogcNY`CF+QelyTs-Cu^K&j9_W<+rE}rK_kb(#V5eOm>L?DPj5P={9K?H&b z{3j9kbK_lqH>dtoZ$47@i=9fDpPIF6%2&*(AJ$*i)s*)1VY5o*{t=B4?t8G6RNx?r!R3bslG z_2uiQ8aGqBV9)J6*TKcAboZb+)ltJ)bNXd_rTIw5G7@fVFPFF}f*4NABr&%WaG2zpMYw0bYzj1O3J#<8c zCZeH@E0@<_5T3u?Uw^G!<~MHDyO4BqG~7BeBj%Q)eXF8L8Lb311}h3sg_= zos9z-q#y!81cC?z5eOm>L?DPj5P={9K?MHaB7l93*vCk(YB~W|qC>LJu|ec-%YMkW zi;Vq}8%4%GNZwb<3Hv7beT@^!-&`o?S$;w`JL28ycrCNox7nK_ zd-jXxD)>ptgw)b_ewO%Z?Y~vOKCkOpt#)vOeW_YKU(q^imbCNx#E1B4=^L+SMc+T} zFKB-0{CP#!vvhnPqYl*Y)7cm2F!8G?K3|i+A9&ucAwH&kcz=#*e(5~fNqkhTuZ#<< z8i-$Av-G}bV;)K_9F%(e^Kgjxm`e}Y)5Nc;S+DSX1b>!!?yMNe<6g}W&;5-{f74>{ z9l|KhtERXR0w+$@%Qno#B)3S`rjo!=5?v1pAoPB(dsW7 zF4#r8L_3a#)#5P>%O10a`5;D-4q>G7)<`x#l+0QwJ6|YT$xJC?qGXbj~hJs#AsJSlUXJ#>OVe#3fmzX?s+&`TRIF$+CKnBo6jiJ%_q2 zIsxLMY zifQW3VoNw6Bo7sfQjI=2V_9@S2EpKyGgXhncn{!so2HE7t@R}E|OIX+8{HtFd zGVnKK+vu{#dUw8=NJ`Liq(vG0YgE8<5B@9bO*zjzE91xWpA+mbj$a^-RbY?z_*q#P z#i`7lU=N(9Lbtt&i*}2H@FQ%+Kja00.5$, consider that the question has a probability $q'$ of resolving negatively and swap $p$ and $(1-p)$. - -\end{proof} - -In fact, if a forecaster has so far performed better than random guessing, there exists a range of probabilities which, if used, are guaranteed to hurt their score not only in expectation, but regardless of the outcome of the event. This is stated formally as follows. 
-
-\begin{theorem}
-  A forecaster with Brier score $b^2$ who forecasts that an event has probability $p$ is guaranteed to end up with a worse Brier score, regardless of the outcome of the event, if $b < p < 1-b$.
-\end{theorem}
-\begin{proof}
-  If $p \leq 0.5$, the best score the player can achieve is $p^2$, attained when the event does not occur. But $p > b$ implies $p^2 > b^2$, so the player's score will be worse.
-
-  If $p > 0.5$, the best score the player can achieve is $(1-p)^2$, when the event occurs.
-\begin{equation}
-\begin{split}
-p < 1 - b\\
-\implies (1-p) > b\\
-\implies (1-p)^2 > b^2\\
-\end{split}
-\end{equation}
-  Hence, in this case the player's score will also be worse.
-
-\end{proof}
-
-Some competitions reward not the Brier score but the relative Brier score (or some combination of both). The relative Brier score is defined as the difference between the forecaster's Brier scores and the aggregate's Brier scores. As before, a lower score is better.
-
-As in the previous case, forecasters should sometimes not predict on some questions, even if they know the probability exactly. This is not necessarily a problem, as it might lead to a better allocation of the forecaster's attention, but it can be.
-
-\begin{theorem}
-  A forecaster seeking to obtain a low average relative Brier score, and who has\footnote{Again, either currently or in expectation} a relative Brier score of $r$, should only make predictions on questions where:
-\begin{equation}
-  E[\textnormal{the forecaster's Brier score}] - E[\textnormal{Brier score of the aggregate}] < r
-\end{equation}
-\end{theorem}
-\begin{proof}
-\begin{equation}
-\begin{split}
-&E[\textnormal{the forecaster's Brier score}] - E[\textnormal{Brier score of the aggregate}] \\
-&= E[\textnormal{the forecaster's Brier score} - \textnormal{Brier score of the aggregate}] \\
-&= E[\textnormal{relative Brier score}]
-\end{split}
-\end{equation}
-and the forecaster should only predict if $E[\textnormal{relative Brier score}] < r$, since only questions whose expected relative Brier score is below the forecaster's current average lower that average.
-\end{proof}
-
-%% else use the following coding to input the bibitems directly in the
-%% TeX file.
-\bibliographystyle{plain} -\begin{thebibliography}{00} - -%% \bibitem[Author(year)]{label} -%% Text of bibliographic item -\bibitem[CSET-Foretell (2020)]{SouthChina} - CSET-Foretell (2020) - ``Will the Chinese military or other maritime security forces fire upon another country's civil or military vessel in the South China Sea between January 1 and June 30, 2021, inclusive?'' - URL: https://web.archive.org/web/20201031221709/https://goodjudgment.io/ superforecasts/\#1338 - -\bibitem[Enten (2017)]{Fake polls real problem} - Enten, H. (2017) - ``Fake Polls Are A Real Problem" - URL: https://fivethirtyeight.com/features/fake-polls-are-a-real-problem/ - -\bibitem[Friedman (2001)]{friedman1} - Friedman, D. (2001) - \textit{Law's Order: What Economics Has to Do with Law and Why It Matters} - -\bibitem[Friedman (1990)]{friedman2} - Friedman, D. (1990) - \textit{Price Theory: An Intermediate Text}. - -\bibitem[Good Judgement (2018)]{gjscience} - Good Judgment (2018) - ``The Science of Superforecasting'' - Archived URL: https://web.archive.org/web/20180408044422/http://goodjudgment.com/science.html - -\bibitem[Good Judgment Scoring Rule (2019)]{GJSR} - Good Judgment (2019) - ``4. 
How are my forecasts scored for accuracy?'' - URL: https://www.gjopen.com/faq\#faq4 - -\bibitem[Good Judgement (2020a)]{covidrecovery} - Good Judgment (2020a) - ``Public Dashboard'' - URL: https://goodjudgment.com/covidrecovery/ - archived URL: https://web.archive.org/web/20201120231552/ https://goodjudgment.com/covidrecovery/ - -\bibitem[Good Judgement (2020b)]{USelections} - Good Judgment (2020b) - ``Who will win the 2020 United States presidential election?'' - URL: https://web.archive.org/web/20201031221709/ https://goodjudgment.io/superforecasts/\#1338 - -\bibitem[Good Judgement (2020c)]{gjfirst} - Good Judgment (2020c) - ``The First Championship Season'' - Archived URL: - https://web.archive.org/web/20201127110425/https://goodjudgment - .com/resources/the-superforecasters-track-record/the-first-championship-season - -\bibitem[Hubinger et al. (2019)]{Hubinger} - Hubinger, E. et al. (2019) - ``Risks from Learned Optimization in Advanced Machine Learning Systems" - URL: https://arxiv.org/abs/1906.01820 - -\bibitem[Krakovna et al. (2020)]{Krakovna} - Krakovna V. et al. (2020) - ``Specification gaming: the flip side of AI ingenuity" - URL: https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity/ - Specification gaming examples in AI - master list: https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml - -\bibitem[Karvetski et al. (s.a)]{Karvetski} - Karvetski C., Minto T., Twardy, C.R. (s.a.) - ``Proper scoring of contingent stopping questions". - Unpublished. - -\bibitem[Lagerros (2019)]{lagerros} - Lagerros, J. (2019) - ``Unconscious Economics'' - URL: https://www.lesswrong.com/posts/PrCmeuBPC4XLDQz8C/unconscious-economics - -\bibitem[Lichtendahl et al. (2007)]{Lichtendahl} - Lichtendahl et al. - ``Probability Elicitation, Scoring Rules, and Competition Among Forecasters'' - Management Science, Vol. 53. N. 11. 
- URL: https://pubsonline.informs.org/doi/abs/10.1287/mnsc.1070.0729?journalCode=mnsc - -\bibitem[Metaculus (2018)]{ragnarok} - Metaculus (2018) - ``Ragnarök Question Series'' - URL: https://www.metaculus.com/questions/1506/ragnar\%25C3\%25B6k-question-series-overview-and-upcoming-questions/ - -\bibitem[Metaculus (2020)]{aiprogress} - Metaculus (2020) - ``Forecasting AI Progress'' - URL: https://www.metaculus.com/questions/1506/ragnar\%25C3\%25B6k-question-series-overview-and-upcoming-questions/ - -\bibitem[Metaculus (2021)]{FAQ} - Metaculus (2021) - ``FAQ'' - URL: https://www.metaculus.com/help/faq/\#fewpoints - -\bibitem[Rodriguez (2019)]{Rodriguez} - Rodriguez, L. (2019) - ``How many people would be killed as a direct result of a US-Russia nuclear exchange?'' - URL: https://forum.effectivealtruism.org/posts/FfxrwBdBDCg9YTh69/how-many-people-would-be-killed-as-a-direct-result-of-a-us - -\bibitem[Tetlock et al. (2015)]{Tetlock} %% . https://fs.blog/2015/12/ten-commandments-for-superforecasters/} - Tetlock, P. \& Gardner, D. (2015) \textit{Superforecasting: The Art and Science of Prediction.} - -\bibitem[Witkowski et al. (2021)]{Witkowski} - Witkowski J. et al. (2021) ``Incentive-Compatible Forecasting Competitions" - URL: https://arxiv.org/abs/2101.01816v1 - -\bibitem[Yeargain (2017)]{Yeargain} - Yeargain, T. (2020) - Missouri Law Review, Vol. 85, Issue 1. - ``Fake Polls, Real Consequences: The Rise of Fake Polls and the Case for Criminal Liability Case for Criminal Liability" (pp. 140-150). - URL: https://scholarship.law.missouri.edu/cgi/viewcontent.cgi?article=4418 - \&context=mlr -\end{thebibliography} - -\newpage -\begin{appendices} - -\section{Numerical Simulations}\label{Simulations} - -\subsection{Method used in the main body of the paper} -To quantify the optimal amount of distortion, we simulate a tournament many times, and observe the results. A tournament is made out of questions and users. 
-
-We model questions as logistic distributions, with a mean of 0 and a standard deviation itself drawn from a logistic distribution of mean 20 and standard deviation 2. For instance, a question might be a logistic distribution of mean 0 and standard deviation 15. At question resolution time, a point is randomly drawn from the logistic distribution. The code to represent this looks roughly as follows:
-
-\begin{verbatim}
-generateQuestion = function(meanOfTheStandardDeviation,
-    standardDeviationOfTheMean){
-  mean <- 0
-  sd <- randomDrawFromLogistic(meanOfTheStandardDeviation,
-    standardDeviationOfTheMean)
-  questionResult <- randomDrawFromLogistic(mean, sd)
-  question <- c(mean, sd, questionResult)
-  return(question)
-}
-\end{verbatim}
-
-Users attempt to guess the mean and standard deviation of each question, and each guess has some error. The code to represent this looks roughly as follows:
-
-\begin{verbatim}
-generateUser = function(meanOfTheMean, standardErrorOfTheMean,
-    meanOfTheStandardDeviation, standardErrorOfTheStandardDeviation){
-  user <- function(question){
-    questionMean <- question[1]
-    questionSd <- question[2]
-    questionResolution <- question[3]
-    questionMeanGuessedByUser <- questionMean +
-      randomDrawFromLogistic(meanOfTheMean, standardErrorOfTheMean)
-    questionSdGuessedByUser <- questionSd +
-      randomDrawFromLogistic(meanOfTheStandardDeviation,
-        standardErrorOfTheStandardDeviation)
-    probabilityDensityOfResolutionGuessedByUser <-
-      getLogisticDensityAtPoint(questionResolution,
-        questionMeanGuessedByUser, questionSdGuessedByUser)
-    return(probabilityDensityOfResolutionGuessedByUser)
-  }
-  return(user)
-}
-\end{verbatim}
-
-We model the average user as having
-\begin{itemize}
-  \item \verb|meanOfTheMean=5|
-  \item \verb|standardErrorOfTheMean=5|
-  \item \verb|meanOfTheStandardDeviation=5|
-  \item \verb|standardErrorOfTheStandardDeviation=5|.
-\end{itemize}
-
-We then consider a ``perfect predictor''---a user who knows what the mean and the standard deviation of a question are---and consider how much that perfect predictor would want to distort her own guess to maximize her chances of placing in the top 3 of users. More details can be found in the \href{https://github.com/NunoSempere/Online-Appendix-to-Incentive-Problems-In-Forecasting-Tournaments}{Online Appendix} accompanying this paper.
-
-\subsection{Simulations with more complex distributions of players}
-\subsubsection{For binary questions}
-A binary question elicits a probability from 0 to 100\%, and is resolved as either true or false. Binary questions exist on all three platforms we consider (Metaculus, Good Judgment Open and CSET).
-
-For the binary case, we first consider a simulated tournament with 10 questions, each with a ``true'' binary probability between 0 and 100\%. We also consider the following types of forecasters:
-
-\begin{enumerate}
-  \item Highly skilled predictors: 10 predictors predict a single probability on each question. Their predictions are off from the ``true'' binary probability by anywhere from $-0.5$ to $0.5$ bits.
-  \item Unsophisticated extremizers: 10 highly skilled predictors (whose predictions are off from the ``true'' binary probability by anywhere from $-0.5$ to $0.5$ bits) who extremize their probabilities by 0.3 bits.
-  \item Sophisticated extremizers: 5 highly skilled predictors (whose predictions are off from the ``true'' binary probability by anywhere from $-0.5$ to $0.5$ bits) who take the question closest to 50\% and randomly move it to either 0\% or 100\%.
-  \item Unskilled predictors: 10 predictors predict a single probability on each question. Their predictions are off from the ``true'' binary probability by anywhere from $-2$ to $2$ bits.
-\end{enumerate}
-
-We ran the tournament 10,000 times.
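For concreteness, the core of this binary setup can be compressed into a short sketch. This is a stand-in for the simulation code in the Online Appendix, not the code itself; the function names are ours, and a ``bit'' here means a shift in base-2 log-odds:

```python
import math
import random

def shift_in_bits(p, bits):
    """Shift probability p by `bits` in base-2 log-odds space."""
    logodds = math.log2(p / (1 - p)) + bits
    return 1 / (1 + 2 ** -logodds)

def extremize(p, bits):
    """Push p away from 50%, as the unsophisticated extremizers do."""
    return shift_in_bits(p, bits if p > 0.5 else -bits)

def brier(p, outcome):
    return (outcome - p) ** 2

def tournament_scores(n_questions=10, rng=None):
    """One simulated tournament: returns the mean Brier scores of an honest
    forecaster and of an extremizing forecaster with the same skill."""
    rng = rng or random.Random()
    truths = [rng.uniform(0.05, 0.95) for _ in range(n_questions)]
    outcomes = [1 if rng.random() < t else 0 for t in truths]
    honest = [shift_in_bits(t, rng.uniform(-0.5, 0.5)) for t in truths]
    extremized = [extremize(p, 0.3) for p in honest]
    mean_brier = lambda ps: sum(brier(p, o)
                                for p, o in zip(ps, outcomes)) / n_questions
    return mean_brier(honest), mean_brier(extremized)
```

Repeating `tournament_scores` over many simulated tournaments and counting first-place finishes across a pool of such forecasters reproduces the qualitative pattern in the results that follow: extremizing worsens the average score but widens its spread, which is what a top-$k$ prize structure rewards.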
We find that sophisticated extremizers do best, followed by lucky unskilled predictors, followed by unsophisticated extremizers, followed by honest highly skilled predictors. - -\newpage - -\begin{figure}[h!] - \includegraphics[width=10cm]{ProbWin5} - \centering - \caption{\% of the time different predictors reach the top 5} -\end{figure} - -\begin{figure}[h!] - \includegraphics[width=10cm]{BrierScores5} - \centering - \caption{Mean Brier score for each group (lower is better)} -\end{figure} -\newpage - -We can also explore how this changes with the number of questions. In this case, we ran 1000 simulations for each possible number of questions, in increments of five. Results were as follows: -\\ -\begin{figure}[h!] - \includegraphics[width=\textwidth]{ProbDiscreteTopWithNumQuestionsTwoExtremizedQuestions} - \centering - \caption{Probability that a ``highly skilled predictor'' will obtain the lowest Brier score, for a tournament with $n$ binary questions} -\end{figure} - -The results would further vary in terms of how many questions the forecasters who extremize choose to extremize. Dynamic selection of number of questions to extremize based on the total number of questions could provide further opportunities for improved exploitation, but this was not explored. -%% I can actually do this, it just takes ~half an hour to an hour per simulation -%% Forecasters who extremize extremize two questions -%% Forecasters who extremize extremize three questions -%% Forecasters who extremize extremize four questions -%% If you can write the code I'll run it. -\newpage -\subsubsection{For continuous questions} - -As before, a continuous question elicits a probability distribution, and is resolved as a resolution value, where the different participants are rewarded in proportion to the probability density they assigned to that resolution value. 
They exist only in Metaculus (and on other experimental platforms, like foretold.io) - -We first ran 20,000 simulations of a tournament with 100 participants and 30 questions. Results for all 30 questions were drawn from a logistic distribution of mean 50 and standard deviation 10. - -The participants were divided into: - -\begin{enumerate} - \item One perfect predictor, which predicts a logistic with mean 50 and sd 10, i.e., the true underlying distribution. Represented in green. - \item 10 highly skilled predictors which are somewhat wrong, at various levels of systematic over or underconfidence. They predict a single logistic on each question with mean chosen randomly from 45-55 and standard deviation ranging from \{4,6 ...20,22\}. Represented in orange. - \item 10 highly skilled predictors, trying to stand out by extremizing some of their forecast distributions. They predict a single logistic on each question with mean chosen randomly from 45-55 and standard deviation 10 for 25 of the questions and standard deviation 5 for a (randomly selected) other 5. Represented in light greenish brown. - \item 20 unskilled predictors. They predict a single logistic with means for each question chosen randomly from 35-65 and standard deviation 10. Represented in light blue. - \item 20 unskilled, overconfident predictors. They predict a single logistic with means for each question chosen randomly from 35-65 and SD 5. Represented in dark blue. - \item 39 unskilled, underconfident predictors. They predict a single logistic with means for each question chosen randomly from 35-65 and SD 20. Represented in pink. -\end{enumerate} - -Scoring was according to the Metaculus score formula, though we used the mean of the scores instead of the score of the mean to simplify the process. %% [TO DO: Change] -%% Nuño: Maybe change this? -%% Agreed, it seems possible that it could make a significant difference, which would be a notable result if true. 
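The mechanism behind these simulations can be isolated in a few lines. In the sketch below (our own illustration; we treat the quoted ``standard deviations'' as logistic scale parameters, as the appendix pseudocode appears to), a forecaster who reports a sharper distribution than the truth scores better when the resolution lands near the center and much worse in the tails:

```python
import math
import random

def logistic_pdf(x, mu, s):
    """Density of a logistic distribution with location mu and scale s."""
    z = math.exp(-abs(x - mu) / s)  # symmetric form; avoids overflow in exp
    return z / (s * (1 + z) ** 2)

def logistic_sample(rng, mu, s):
    u = rng.random()
    return mu + s * math.log(u / (1 - u))

def sharp_vs_true(n_trials=20_000, seed=0):
    """Draw resolutions from the true logistic(50, 10) and compare a truthful
    forecast against a sharpened one (scale 5). Returns the mean log score of
    each and how often the sharpened forecast scores higher."""
    rng = random.Random(seed)
    log_true = log_sharp = wins = 0.0
    for _ in range(n_trials):
        x = logistic_sample(rng, 50, 10)
        t = logistic_pdf(x, 50, 10)
        sh = logistic_pdf(x, 50, 5)
        log_true += math.log(t)
        log_sharp += math.log(sh)
        wins += sh > t
    return log_true / n_trials, log_sharp / n_trials, wins / n_trials
```

On a typical run the sharpened forecast has a worse mean log score, yet beats the truthful one on roughly half of individual resolutions, and its scores are far more dispersed; in a field of a hundred forecasters, that dispersion is what converts into top-5 finishes.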
- -On such a tournament, it would be plausible for only the top 5 forecasters to be rewarded, so we present the probabilities of being in the top 5 for each group. - -We find that those who change the position of the mean slightly from the true distribution (i.e., groups 3, 4 and, to a lesser extent, 2), gain a 5\% absolute advantage in terms of getting into the top 5, and a 50\% relative advantage. Each forecaster who does this is $\approx 15\%$ likely to be successful in reaching the top 5. In contrast, the perfect predictor only has a $\approx 10\%$ chance of doing so. - -This can be explained by understanding that a forecaster who wants to reach a discrete ceiling faces a trade-off between the mean of the expected score, and the variance of that score. If the goal is to reach a threshold, increasing the variance at the expense of the mean turns out to in this case be worth it. Graphs which showcase this effect follow. -\\ -\begin{figure}[h!] - \includegraphics[width=\textwidth]{ProbContinuousHighlySkilledInTopWithNumQuestions} - \centering - \caption{Approximate probability (understood as ``frequency during simulations'') that a ``highly skilled forecaster'' will obtain the lowest Brier score, for a tournament with $n$ continuous questions} -\end{figure} - -%% TODO: Discussion of "bubbles" - -%\includegraphics{MeanScoreVsFreqTop5} - -%See \ref{Simul discrete} for numerical simulations which attempt to quantify this effect. We find that for $n=30$ questions, as in Metaculus tournaments, the effect is notable. In particular, consider the following simulations: - -% Alex: Referring to specific past tournaments is probably better here. Actually, other than the Academy Series and the Insight tournament, Metaculus has not run any 10-question discrete-only tournaments. The Insight tournament obviously changed to a probabilistic reward structure in response to these concerns, which might be worth discussing explicitly. 
-% Nuño: Do you want to add discussion to those past tournaments? I'm not as familiar. - -% Nuño: Removed graphs below as somewhat low quality. -%\newpage -% -%\begin{figure}[h!] -% \includegraphics[width=12cm]{ContinuousSimulationMeanScores} -% \centering -% \caption{Mean scores for continuous distributions} -%\end{figure} -% -%\begin{figure}[h!] -% \includegraphics[width=12cm]{ContinuousSimulationMeanPosition} -% \centering -% \caption{Mean position for continuous distributions} -%\end{figure} -% -%\newpage -% -%\begin{figure}[h!] -% \includegraphics[width=12cm]{ContinuousSimulationFrequencyInTop5} -% \centering -% \caption{Frequency in the top 5} -%\end{figure} -% -%\begin{figure}[h!] -% \includegraphics[width=12cm]{MeanScoreVsFreqTop5} -% \centering -% \caption{Mean score vs probability of being in the top 5} -%\end{figure} -% -%\newpage -%% Nuño: Maybe some graphs here seeing how this effect changes with more and more questions. - -\section{Extremization calculations} -\label{Extremization calculations} - -The expected ``participation-rate weighted Brier score'' for a question which closes on the nth day if the event happens, and or in the 5th day if the event hasn't happened by then, will be: - -\begin{equation} -\begin{split} - E[PWBS] &= 0.25 \cdot PWBS(\textnormal{The event happens in the first day}) \\ - &+ 0.75\cdot 0.25 \cdot PWBS(\textnormal{The event happens in the second day}) \\ - &+ 0.75^2\cdot 0.25 \cdot PWBS(\textnormal{The event happens in the third day}) \\ - &+ 0.75^3\cdot 0.25 \cdot PWBS(\textnormal{The event happens in the fourth day}) \\ - &+ 0.75^4\cdot PWBS(\textnormal{The event doesn't happen}) \\ -\end{split} -\end{equation} - -Now, the integral in (\ref{PWBS definition}) transforms into a simple sum, because probabilities stay constant throughout the day, so - -\begin{equation} -\begin{split} - &PWBS(\textnormal{The event happens in the nth day}) \\ - &= \frac{Brier(p(S_1)) + ... 
+ Brier(p(S_n))}{n}
-\end{split}
-\end{equation}
-
-Likewise for the event not happening, except that $Brier(p)$ will be equal to $(0-p)^2$ instead of $(1-p)^2$:
-
-\begin{equation}
-\begin{split}
- &PWBS(\textnormal{The event doesn't happen}) \\
- &= \frac{Brier(p(S_1)) + ... + Brier(p(S_4))}{4}
-\end{split}
-\end{equation}
-And so, the expected score is:
-\begin{equation}
-\begin{split}
- &E[PWBS] = 0.25 \cdot (1-p(S_1))^2\\
- &+ 0.75\cdot 0.25 \cdot \frac{ (1-p(S_1))^2 + (1-p(S_2))^2}{2} \\
- &+ 0.75^2\cdot 0.25 \cdot \frac{ (1-p(S_1))^2 + (1-p(S_2))^2 + (1-p(S_3))^2 }{3} \\
- &+ 0.75^3\cdot 0.25 \cdot \frac{ (1-p(S_1))^2 + (1-p(S_2))^2 + (1-p(S_3))^2 +(1-p(S_4))^2}{4} \\
- &+ 0.75^4 \cdot \frac{ (0-p(S_1))^2 + (0-p(S_2))^2 + (0-p(S_3))^2 +(0-p(S_4))^2}{4} \\
-\end{split}
-\end{equation}
-
-Given this, we can calculate the expected ``participation-rate weighted Brier score'' for the true probabilities, and for the extremized probabilities.
-
-\begin{equation}
- \begin{split}
- E[PWBS(\textnormal{honest probabilities})] = 0.193 \\
- E[PWBS(\textnormal{extremized probabilities})] = 0.182
- \end{split}
-\end{equation}
-
-\end{appendices}
-\end{document}
-
-\endinput
-%%
-%% End of file `elsarticle-template-harv.tex'.
diff --git a/blog/2023/03/01/computable-solomoff/src/test.md b/blog/2023/03/01/computable-solomoff/src/test.md
new file mode 100644
index 0000000..b33a943
--- /dev/null
+++ b/blog/2023/03/01/computable-solomoff/src/test.md
@@ -0,0 +1,29 @@
+\section{A distracting epicycle: dealing with Turing machines that take too long or do not halt.}
+
+When thinking about Turing machines, one might consider one particular model, e.g., valid C programs. But in that case, it is easy to consider programs that do not halt, like:\footnote{Readers might find it amusing to run \texttt{gcc loop.c} and check this.}
+
+\begin{verbatim}
+void main(){
+  while(1){
+    ;
+  }
+}
+\end{verbatim}
+
+But then in step 2 of our scheme, we never get to see what bits this program outputs, because it never outputs any bits.
+
+This can easily be fixed as follows:
+
+\begin{enumerate}
+  \item Start with a finite set of live Turing machines, $\{T_0, ..., T_n\}$, a compute budget $s$, and an empty cache of programs which take too long, $C = \{\}$.
+  \item Run each $T_i$ for $s$ seconds.
+  \item If none of them predict your trail of bits, $(B_0, ..., B_m)$, within the given compute budget:
+  \begin{enumerate}
+    \item eliminate the machines that have made incorrect predictions;
+    \item move the programs which haven't output anything yet to the cache;
+    \item attempt to compute the first $m$ output bits of Turing machine $T_{n+1}$ with $s$ seconds of compute. If it makes correct predictions, keep it in the set of live machines; otherwise, move it to the cache;
+    \item increase the compute budget to $s + 1$ and run each machine in the cache for one additional second;
+    \item repeat step 2 until at least one program has predicted your past bits within the compute budget. Eventually such a program must exist, since the Turing machine producing your trail of bits is by construction computable and non-halting.\footnote{Or at least, it hasn't halted before producing the number of bits that you have seen so far.}
+  \end{enumerate}
+  \item Observe the next bit, and purge the machines from your set which don't predict it. If none predict it, GOTO 2.
+\end{enumerate}