%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%          An Open Problem Regarding the Convergence        %%
%%             of Universal A Priori Probability             %%
%%              Marcus Hutter: Start: 01.05.03               %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\documentclass[12pt]{article}
\topmargin=0mm  \oddsidemargin=0mm \evensidemargin=0mm
\textwidth=16cm \textheight=23cm
%\def\baselinestretch{0.97}

\sloppy
\thinmuskip=0mu
\medmuskip=1mu plus 1mu minus 1mu
\thickmuskip=2mu plus 3mu minus 1mu
\def\paradot#1{\vspace{0ex}\noindent{\bf{#1.}}}
\def\paranodot#1{\vspace{0ex}\noindent{\bf{#1}}}

\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                      T i t l e - P a g e                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{\vskip -10mm\bf\Large\hrule height5pt \vskip 3mm
\sc An Open Problem Regarding the Convergence \\ of Universal A Priori Probability
\vskip 4mm \hrule height2pt \vskip 3mm}
\author{{\bf Marcus Hutter}\footnote{A prize
of 128 Euro for a solution of this problem is offered.}\\[3mm]
\normalsize IDSIA, Galleria 2, CH-6928\ Manno-Lugano, Switzerland\\
\normalsize marcus@idsia.ch \hspace{8.5ex} http://www.idsia.ch/$^{_{_\sim}}\!$marcus}
\date{10 June 2003}
\maketitle

\vspace{-4ex}
\begin{abstract}
\noindent Is the textbook result that Solomonoff's universal
posterior converges to the true posterior for all Martin-L{\"o}f
random sequences true?
\end{abstract}
\vspace{-1ex}

%------------------------------%
\paradot{Universal induction}
%------------------------------%
Induction problems can be phrased as sequence prediction
tasks. This is, for instance, obvious for time series prediction,
but also includes classification tasks. Having observed data $x_t$
at times $t<n$, the task is to predict the $t$-th symbol $x_t$
from sequence $x=x_1...x_{t-1}$.
%
The key concept to attack general induction problems is
Occam's razor and to a less extent Epicurus' principle of
multiple explanations. The former/latter may be interpreted as to
keep the simplest/all theories consistent with the observations
$x_1...x_{t-1}$ and to use these theories to predict $x_t$.
%
Solomonoff \cite{Solomonoff:64,Solomonoff:78} formalized and
combined both principles in his universal prior $M(x)$ which
assigns high/low probability to simple/complex environments, hence
implementing Occam and Epicurus.
$M(x)$ is defined as the probability that a universal Turing
machine $U$ outputs a string starting with $x$, when provided
with fair coin flips on the input tape.

%------------------------------%
\paradot{Posterior convergence}
%------------------------------%
Solomonoff's \cite{Solomonoff:78} central result is that if the
probability $\mu(x_t|x_1...x_{t-1})$ of observing $x_t$ at time
$t$, given past observations $x_1...x_{t-1}$ is a computable
function, then the universal posterior
$M_t:=M(x_t|x_1...x_{t-1})$ converges (rapidly!) {\em with
$\mu$-probability 1} (w.p.1) for $t\to\infty$ to the true
posterior $\mu_t:=\mu(x_t|x_1...x_{t-1})$, hence $M$ represents
a universal predictor in case of unknown $\mu$.
%
Convergence of $M_t$ to $\mu_t$ w.p.1 tells us that $M_t$ is close
to $\mu_t$ for sufficiently large $t$ for ``almost all'' sequences
$x_{1:\infty}$ (we abbreviate $x_{1:n}:=x_1...x_n$). It says
nothing about whether convergence is true for any {\em particular}
sequence (of measure 0).

%------------------------------%
\paranodot{Martin-L{\"o}f randomness}
%------------------------------%
is the standard notion to capture convergence for
individual sequences and is closely related to Solomonoff's
universal prior. Levin gave a characterization equivalent to
Martin-L\"{o}f's (M.L.) original definition \cite{Levin:73random}:

{\em A sequence $x_{1:\infty}$ is $\mu$-random (in the sense of M.L.)
{\em iff} there is a constant $c$ such that
$M(x_{1:n})\leq c\cdot \mu(x_{1:n})$ for all $n$.}

One can show that a $\mu$-random sequence $x_{1:\infty}$
passes {\em all} thinkable effective randomness tests, e.g.\ the
law of large numbers, the law of the iterated logarithm, etc.
In particular, the set of all $\mu$-random sequences has
$\mu$-measure 1.

%------------------------------%
\paradot{Open problem}
%------------------------------%
An interesting open question is whether $M_t$ converges to
$\mu_t$ (in difference or ratio) individually for all
Martin-L\"{o}f random sequences. Clearly, Solomonoff's result
shows that convergence may at most fail for a set of sequences
with $\mu$-measure zero. A convergence result for $\mu$-random
sequences is particularly interesting and natural in this context,
since $\mu$-randomness can be defined in terms of $M$ itself (see
above).

%------------------------------%
\paradot{Proof attempts}
%------------------------------%
Attempts to
convert the convergence results w.p.1 to effective
$\mu$-randomness tests fail, since $M_t$ is not
lower semi-computable.
%
In \cite[Th.5.2.2]{Li:97} and \cite[Th.10]{Vitanyi:00} the following
Theorem is stated:

``{\it Let $\mu$ be a positive recursive
measure. If the length of $y$ is fixed and the length of $x$ grows
to infinity, then $M(y|x)/\mu(y|x)\to 1$ with $\mu$-probability
one. The infinite sequences $\omega$ with prefixes $x$ satisfying
the displayed asymptotics are precisely [`$\Rightarrow$' {\em and}
`$\Leftarrow$'] the $\mu$-random sequences.}''

While convergence w.p.1 is correct if appropriately
interpreted,$\!$\footnote{The formulation of the Theorem is
quite misleading in general: First, for off-sequence $y$
convergence w.p.1 does not hold ($xy$ must be demanded to be a
prefix of $x_{1:\infty}$). Second, the proof of `$\Leftarrow$' has
loopholes (see main text). Last, `$\Rightarrow$' is given without
proof and is probably wrong. Also the assertion in
\cite[Th.5.2.1]{Li:97} that $S_t$ converges to zero faster than
$1/t$ cannot be made, since $S_t$ may not decrease monotonically.}
the proof that convergence holds for $\mu$-random sequences
is incomplete: ``{\it $M(x_{1:n})\leq c\cdot\mu(x_{1:n})\forall
n\Rightarrow \lim_{n\to\infty}M(x_{1:n})/\mu(x_{1:n})$ exists}''
has been used, but not proven, and may indeed be wrong.
%
Vovk \cite{Vovk:87} shows that for two finitely computable
semi-measures $\mu$ and $\rho$, and $x_{1:\infty}$ being $\mu$-
{\em and} $\rho$-random that $\rho_t/\mu_t\to 1$. If $M$ were
recursive, then this would imply $M_t/\mu_t\to 1$ for every
$\mu$-random sequence $x_{1:\infty}$, since {\em every} sequence
is $M$-random. Since $M$ is {\em not} recursive Vovk's theorem
cannot be applied and it is not obvious how to generalize it. So
the question of individual convergence remains open.

%------------------------------%
\paradot{Conclusions}
%------------------------------%
Contrary to what was believed before, the question of posterior
convergence $M_t/\mu_t\to 1$ (also $M_t\to\mu_t$) for all
$\mu$-random sequences is still open. In \cite{Hutter:03unipriors}
we introduce a new flexible notion of randomness which contains
Martin-L{\"of} randomness as a special case. This notion is
used to show that standard proof attempts of
$M_t/\mu_t\stackrel{M.L.}\longrightarrow 1$ based on so called
dominance only must fail, indicating that this problem may be a
hard one.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         Bibliography        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vspace{-3ex}
\def\refname{\normalsize\bf References.\vspace{-2ex}}
{\small
\begin{thebibliography}{Vov87}
\parskip=0ex plus 0.5ex\itemsep=0ex

\bibitem[Hut03]{Hutter:03unipriors}
M.~Hutter.
\newblock On the existence and convergence of computable universal priors.
\newblock Technical Report IDSIA-05-03, 2003.
\newblock http://arxiv.org/abs/cs.LG/0305052.

\bibitem[Lev73]{Levin:73random}
L.~A. Levin.
\newblock On the notion of a random sequence.
\newblock {\em Soviet Math. Dokl.}, 14(5):1413--1416, 1973.

\bibitem[LV97]{Li:97}
M.~Li and P.~M.~B. Vit\'anyi.
\newblock {\em An introduction to {Kolmogorov} complexity and its
  applications}.
\newblock Springer, 2nd edition, 1997.

\bibitem[Sol64]{Solomonoff:64}
R.~J. Solomonoff.
\newblock A formal theory of inductive inference: Part 1 and 2.
\newblock {\em Inform. Control}, 7:1--22, 224--254, 1964.

\bibitem[Sol78]{Solomonoff:78}
R.~J. Solomonoff.
\newblock Complexity-based induction systems: comparisons and convergence
  theorems.
\newblock {\em IEEE Trans. Inform. Theory}, IT-24:422--432, 1978.

\bibitem[VL00]{Vitanyi:00}
P.~M.~B. Vit\'anyi and M.~Li.
\newblock Minimum description length induction, {B}ayesianism, and {K}olmogorov
  complexity.
\newblock {\em IEEE Trans. on Inform. Theory}, 46(2):446--464, 2000.

\bibitem[Vov87]{Vovk:87}
V.~G. Vovk.
\newblock On a randomness criterion.
\newblock {\em Soviet Mathematics Doklady}, 35(3):656--660, 1987.

\end{thebibliography}
}

\end{document}
%---------------------End-of-UniPriors.tex--------------------%