%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Towards a Universal Theory of Artificial Intelligence %%
%% based on %%
%% Algorithmic Probability and Sequential Decision Theory %%
%% %%
%% Marcus Hutter: Start: 09.12.00 LastEdit: 16.12.00 %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newif\ifijcai\ijcaifalse % TechReport version
%-------------------------------%
% My Document-Style %
%-------------------------------%
\documentclass[10pt,twocolumn]{article}
\setlength\headheight{0pt} \setlength\headsep{0pt}
\topmargin=0cm \oddsidemargin=-1cm \evensidemargin=-1cm
\textwidth=18cm \textheight=23cm %\unitlength=1mm \sloppy
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\renewenvironment{abstract}{\centerline{\bf
Abstract}\vspace{0.5ex}\begin{quote}\small}{\par\end{quote}\vskip 1ex}
\newenvironment{keywords}{\centerline{\bf
Key Words}\vspace{0.5ex}\begin{quote}\small}{\par\end{quote}\vskip 1ex}
\def\eqd{\stackrel{\bullet}{=}}
\def\ff{\Longrightarrow}
\def\gdw{\Longleftrightarrow}
\def\toinfty#1{\stackrel{#1\to\infty}{\longrightarrow}}
\def\gtapprox{\buildrel{\lower.7ex\hbox{$>$}}\over
{\lower.7ex\hbox{$\sim$}}}
\def\nq{\hspace{-1em}}
\def\look{\(\uparrow\)}
\def\ignore#1{}
\def\deltabar{{\delta\!\!\!^{-}}}
\def\qed{\sqcap\!\!\!\!\sqcup}
\def\odt{{\textstyle{1\over 2}}}
\def\odf{{\textstyle{1\over 4}}}
\def\odA{{\textstyle{1\over A}}}
\def\hbar{h\!\!\!\!^{-}\,}
\def\dbar{d\!\!^{-}\!}
\def\eps{\varepsilon}
\def\beq{\begin{equation}}
\def\eeq{\end{equation}}
\def\beqn{\begin{displaymath}}
\def\eeqn{\end{displaymath}}
\def\bqa{\begin{equation}\begin{array}{c}}
\def\eqa{\end{array}\end{equation}}
\def\bqan{\begin{displaymath}\begin{array}{c}}
\def\eqan{\end{array}\end{displaymath}}
\def\pb#1{\underline{#1}} % probability notation
\def\blank{{\,_\sqcup\,}} % blank position
\def\maxarg{\mathop{\rm maxarg}} % maxarg
\def\minarg{\mathop{\rm minarg}} % minarg
\def\hh#1{{\dot{#1}}} % historic I/O
\def\best{*} % or {best}
\def\vec#1{{\bf #1}}
\def\length{{l}}
\ifijcai\def\paragraph#1{{\bf #1}}\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% T i t l e - P a g e %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{\bf \Large Towards a Universal Theory of Artificial Intelligence
based on \\ Algorithmic Probability and Sequential Decision Theory}
\author{Marcus Hutter \\[2mm]
{\small IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland} \\
{\small marcus@idsia.ch \qquad http://www.idsia.ch} \\
{\small Technical Report IDSIA-14-00, 16 December 2000}
}
\date{}
\begin{document}
\maketitle
\begin{abstract}
Decision theory formally solves the problem of rational agents in
uncertain worlds if the true environmental probability
distribution is known. Solomonoff's theory of universal induction
formally solves the problem of sequence prediction for unknown
distributions. We unify both theories and give strong arguments
that the resulting universal AI$\xi$ model behaves optimally in any
computable environment. The major drawback of the AI$\xi$ model is
that it is uncomputable. To overcome this problem, we construct a
modified algorithm AI$\xi^{tl}$, which is still superior to any
other time $t$ and space $l$ bounded agent. The computation time
of AI$\xi^{tl}$ is of the order $t\!\cdot\!2^l$.
\end{abstract}
\ifijcai\else
\begin{keywords}
Rational agents,
sequential decision theory, universal Solomonoff induction,
algorithmic probability, reinforcement learning, computational
complexity, theorem proving, probabilistic reasoning, Kolmogorov
complexity, Levin search.
\end{keywords}
\fi
% ACM Classification
%I.2; I.2.3; I.2.6; I.2.8; F.1.3; F.2
%I.2. Artificial Intelligence,
%I.2.3. Deduction and Theorem Proving
%I.2.6. Learning
%I.2.8. Problem Solving, Control Methods and Search
%F.1.3. Complexity Classes
%F.2. Analysis of Algorithms and Problem Complexity
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{int}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The most general framework for Artificial Intelligence is the
picture of an {\em agent} interacting with an environment
\cite{Russell:95}. If the goal is not pre-specified, the agent has
to learn by occasional reinforcement feedback \cite{Sutton:98}. If
the agent shall be universal, no assumption about the environment
may be made, besides that there {\it exists} some exploitable
structure at all. We may ask for the most intelligent way an
agent could behave or, equivalently, for the optimal way of
learning in terms of real-world interaction cycles. {\em Decision theory}
formally\footnote{By a formal solution we mean a rigorous
mathematical definition uniquely specifying the solution. For the
problems considered here this always implies the existence of an
algorithm which asymptotically converges to the correct solution.}
solves this problem only if the true environmental probability
distribution is known (e.g.\ backgammon)
\cite{Bellman:57,Bertsekas:96}. Solomonoff's theory of universal
induction \cite{Solomonoff:64,Solomonoff:78} formally solves the
problem of {\em induction} if the true distribution is unknown,
but only if the agent cannot influence the environment (e.g.\
weather forecasting) \cite{Li:97}. We combine
both ideas and obtain {\em a parameterless model AI$\xi$ of an acting
agent which we claim behaves optimally in any computable
environment} (e.g.\ prisoner or auction problems, poker, car
driving). To obtain an effective solution, a modification
AI$\xi^{tl}$, superior to any other time $t$ and space $l$ bounded
agent, is constructed. The computation time of AI$\xi^{tl}$ is of
the order $t\!\cdot\!2^l$. The main goal of this work is to derive
and discuss the AI$\xi$ and AI$\xi^{tl}$ models, and to clarify
the meaning of {\it universal}, {\it optimal}, {\it superior},
etc. Details can be found in \cite{Hutter:00f}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Rational Agents \& Sequential Decisions}\label{secAImurec}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%------------------------------%
\paragraph{Agents in probabilistic environments:}
%------------------------------%
A very general framework for intelligent systems is that of
rational agents \cite{Russell:95}. In cycle $k$, an agent performs
{\em action} $y_k\!\in\!Y$ (output word) which results in a {\em
perception} $x_k\!\in\!X$ (input word), followed by cycle
$k\!+\!1$ and so on. If agent and environment are deterministic
and computable, the entanglement of both can be modeled by two
Turing machines with two common tapes (and some private tapes)
containing the action stream $y_1y_2y_3\ldots$ and the perception
stream $x_1x_2x_3\ldots$ (the meaning of $x_k\!\equiv\!x'_kr_k$ is
explained in the next paragraph):
\begin{center}\label{cyberpic}
\small\unitlength=0.8mm
\special{em:linewidth 0.4pt}
\linethickness{0.4pt}
\begin{picture}(106,47)
\thinlines
\put(1,41){\framebox(10,6)[cc]{$x'_1$}}
\put(11,41){\framebox(6,6)[cc]{$r_1$}}
\put(17,41){\framebox(10,6)[cc]{$x'_2$}}
\put(27,41){\framebox(6,6)[cc]{$r_2$}}
\put(33,41){\framebox(10,6)[cc]{$x'_3$}}
\put(43,41){\framebox(6,6)[cc]{$r_3$}}
\put(49,41){\framebox(10,6)[cc]{$x'_4$}}
\put(59,41){\framebox(6,6)[cc]{$r_4$}}
\put(65,41){\framebox(10,6)[cc]{$x'_5$}}
\put(75,41){\framebox(6,6)[cc]{$r_5$}}
\put(81,41){\framebox(10,6)[cc]{$x'_6$}}
\put(91,41){\framebox(6,6)[cc]{$r_6$}}
\put(102,44){\makebox(0,0)[cc]{...}}
\put(1,1){\framebox(16,6)[cc]{$y_1$}}
\put(17,1){\framebox(16,6)[cc]{$y_2$}}
\put(33,1){\framebox(16,6)[cc]{$y_3$}}
\put(49,1){\framebox(16,6)[cc]{$y_4$}}
\put(65,1){\framebox(16,6)[cc]{$y_5$}}
\put(81,1){\framebox(16,6)[cc]{$y_6$}}
\put(102,4){\makebox(0,0)[cc]{...}}
\put(97,47){\line(1,0){9}}
\put(97,41){\line(1,0){9}}
\put(97,7){\line(1,0){9}}
\put(97,1){\line(0,0){0}}
\put(97,1){\line(1,0){9}}
\put(1,21){\framebox(16,6)[cc]{working}}
\thicklines
\put(17,17){\framebox(20,14)[cc]{$\displaystyle{Agent\atop\bf p}$}}
\thinlines
\put(37,27){\line(1,0){14}}
\put(37,21){\line(1,0){14}}
\put(39,24){\makebox(0,0)[lc]{tape ...}}
\put(56,21){\framebox(16,6)[cc]{working}}
\thicklines
\put(72,17){\framebox(20,14)[cc]{$\displaystyle{Environ-\atop ment\quad\bf q}$}}
\thinlines
\put(92,27){\line(1,0){14}}
\put(92,21){\line(1,0){14}}
\put(94,24){\makebox(0,0)[lc]{tape ...}}
\thicklines
\put(54,41){\vector(-3,-1){29}}
\put(84,31){\vector(-3,1){30}}
\put(54,7){\vector(3,1){30}}
\put(25,17){\vector(3,-1){29}}
\end{picture}
\end{center}
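The interaction protocol depicted above can be sketched in a few lines of code. The concrete agent and environment below (a toy world in which the reward is $1$ iff the last action was $1$) are illustrative assumptions, not part of the formal model:

```python
# Toy sketch of the interaction protocol: stand-ins for the chronological
# Turing machines p (agent) and q (environment); the concrete agent and
# environment below are illustrative assumptions, not part of the model.

def interact(agent, environment, cycles):
    """In cycle k the agent outputs action y_k, then perceives x_k = (x'_k, r_k)."""
    ys, xs = [], []
    for k in range(cycles):
        y = agent(xs)              # action may depend on all past perceptions
        ys.append(y)
        x = environment(ys)        # perception may depend on all past actions
        xs.append(x)
    return ys, xs

# Hypothetical toy world: reward r_k = 1 iff the last action was 1.
agent = lambda xs: 1                              # constant policy: always act 1
environment = lambda ys: ("obs", 1 if ys[-1] == 1 else 0)

ys, xs = interact(agent, environment, 3)
```

The strict alternation of `agent` and `environment` calls mirrors the two machines writing turn by turn onto the common tapes.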
$p$ is the {\em policy} of the agent interacting with environment
$q$.
\begin{enumerate}
\item Eliminate all $p$ of length $l(p)\!>\!\tilde l$.
\item Modify all $p$ in the following way: all output $w_k^py_k^p$
is temporarily written on an auxiliary tape. If $p$ stops within $\tilde t$
steps, the internal `output' is copied to the output tape. If $p$
does not stop after $\tilde t$ steps, a stop is forced, $w_k^p\!=\!0$,
and some arbitrary $y_k^p$ is written on the output tape. Let ${\cal P}$ be
the set of all those modified programs.
\item Start first cycle: $k\!:=\!1$.
\item\label{pbestloop} Run every $p\!\in\!{\cal P}$ on extended input
$\hh y\hh x_{<k}$, where all outputs are redirected to some
auxiliary tape: $p(\hh y\hh x_{<k})=w_1^py_1^p...w_k^py_k^p$. This
step is performed incrementally by adding $\hh y\hh x_{k-1}$ for
$k\!>\!1$ to the input tape and continuing the computation of the
previous cycle.
\item Select the program $p$ with highest rating $w_k^p$:
$p_k^\best\!:=\!\maxarg_pw_k^p$.
\item Write $\hh y_k\!:=\!y_k^{p_k^\best}$ to the output tape.
\item Receive input $\hh x_k$ from the environment.
\item Begin next cycle: $k\!:=\!k\!+\!1$, goto step
\ref{pbestloop}.
\end{enumerate}
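The selection loop in the final steps above can be rendered schematically as follows. The two toy policies, their self-ratings, and the history encoding are hypothetical stand-ins for the proof-verified, time- and length-bounded programs of the real algorithm:

```python
# Schematic sketch of the p^best selection loop: in each cycle every
# candidate policy p emits a self-rating w_k^p and an action y_k^p;
# the action of the highest-rated policy is executed.
# The policies below are toy stand-ins for the verified programs in P.

def pbest_cycle(policies, history):
    """Run all p in P, select p* := maxarg_p w_k^p, return its action y_k."""
    proposals = [p(history) for p in policies]       # each p returns (w_k^p, y_k^p)
    w_best, y_best = max(proposals, key=lambda wy: wy[0])
    return y_best

p1 = lambda h: (0.3, 0)   # cautious policy: low self-rating, proposes action 0
p2 = lambda h: (0.8, 1)   # confident policy: high self-rating, proposes action 1

history, actions = [], []
for k in range(3):
    y = pbest_cycle([p1, p2], history)
    actions.append(y)
    history.append((y, "x"))   # append the cycle's I/O to the history
```

Because every policy must be run in every cycle, the per-cycle cost is the number of candidates ($\leq 2^{\tilde l}$) times the per-program budget $\tilde t$, matching the stated $O(2^{\tilde l}\!\cdot\!\tilde t)$ bound.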
%------------------------------%
\paragraph{Properties of the $p^\best$ algorithm:}
%------------------------------%
Let $p$ be any extended chronological (incremental) policy of
length $l(p)\!\leq\!\tilde l$ and computation time per cycle
$t(p)\!\leq\!\tilde t$, for which there exists a proof of VA($p$)
of length $\leq\!l_P$. The algorithm $p^\best$, depending on
$\tilde l$, $\tilde t$ and $l_P$ but not on $p$, always has a higher
rating than any such $p$. The setup time of $p^\best$ is
$t_{setup}(p^\best)\!=\!O(l_P^2\!\cdot\!2^{l_P})$ and the
computation time per cycle is $t_{cycle}(p^\best)\!=\!O(2^{\tilde
l}\!\cdot\!\tilde t)$. Furthermore, for $\tilde t,\tilde
l\!\to\!\infty$, $p^\best$ converges to the behavior of the AI$\xi$
model.
Roughly speaking, this means that if there exists a computable
solution to some AI problem at all, then the explicitly
constructed algorithm $p^\best$ is such a solution. Although this
claim is quite general, there are some limitations and open
questions: regarding the setup time, regarding the necessity that
the policies must rate their own output, regarding true but not
efficiently provable VA($p$), and regarding ``inconsistent''
policies \cite{Hutter:00f}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Outlook \& Discussion}\label{secOutlook}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This section contains some discussion and remarks on otherwise
unmentioned topics.
%------------------------------%
\paragraph{Value bounds:}
%------------------------------%
Rigorous proofs of value bounds for the AI$\xi$ theory are the
major theoretical challenge -- general ones as well as tighter
bounds for special environments $\mu$. Of special importance are
suitable (and acceptable) conditions on $\mu$ under which $\hh
y_k$ and finite value bounds exist for infinite $Y$, $X$ and $m$.
%------------------------------%
\paragraph{Scaling AI$\xi$ down:}
%------------------------------%
\cite{Hutter:00f} shows for several examples how to integrate
problem classes into the AI$\xi$ model. Conversely, one can
downscale the AI$\xi$ model by using more restricted forms of
$\xi$. This could be done in a way similar to how the theory of
universal induction has been downscaled, with many insights, to the
Minimum Description Length principle \cite{Li:92b,Rissanen:89} or
to the domain of finite automata \cite{Feder:92}. The AI$\xi$
model might similarly serve as a super-model, or as the very
definition of (universal unbiased) intelligence, from which
specialized models could be derived.
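As a toy illustration of such downscaling, the universal mixture $\xi$ can be replaced by a Bayesian mixture over a small finite class of environments. The class of three Bernoulli predictors and the uniform prior below are illustrative assumptions, far removed from the universal semimeasure itself:

```python
# Minimal downscaled analogue of xi: a Bayesian mixture over a small finite
# class of Bernoulli environments (the class and the uniform prior are
# illustrative assumptions, not the universal semimeasure).

def mixture_update(weights, models, bit):
    """Bayes-update the posterior weights after observing one bit."""
    posterior = [w * (m if bit == 1 else 1.0 - m) for w, m in zip(weights, models)]
    total = sum(posterior)
    return [p / total for p in posterior]

models = [0.1, 0.5, 0.9]        # P(next bit = 1) under each candidate environment
weights = [1.0 / 3.0] * 3       # uniform prior over the class

for bit in [1, 1, 1, 1]:        # observe a run of ones
    weights = mixture_update(weights, models, bit)
# The posterior concentrates on the environment most compatible with the data.
```

The mechanism (posterior reweighting of a model class) is the same as in the full theory; only the class shrinks from all enumerable semimeasures to a finite, tractable one.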
%------------------------------%
\paragraph{Applications:}
%------------------------------%
\cite{Hutter:00f} shows how a number of AI problem classes,
including {\em sequence prediction}, {\em strategic games}, {\em
function minimization}, and {\em supervised learning}, fit into
the general AI$\xi$ model. All problems are claimed to be formally
solved by the AI$\xi$ model. The solution is, however, only
formal, because the AI$\xi$ model is uncomputable or, at best,
approximable. First, each problem class is formulated in its
natural way (when $\mu^{\mbox{\tiny problem}}$ is known) and then
a formulation within the AI$\mu$ model is constructed and their
equivalence is proven. Then, the consequences of replacing $\mu$
by $\xi$ are considered. The main goal is to understand
how the problems are solved by AI$\xi$. For more details see
\cite{Hutter:00f}.
%------------------------------%
\paragraph{Implementation and approximation:}
%------------------------------%
The AI$\xi^{\tilde t\tilde l}$ model suffers from the same large
factor $2^{\tilde l}$ in computation time as Levin search for
inversion problems
\ifijcai\cite{Levin:73}.
\else\cite{Levin:73,Levin:84}.
\fi
Nevertheless, Levin
search has been implemented and successfully applied to a variety
of problems \cite{Schmidhuber:97nn,Schmidhuber:97bias}. Hence, a direct
implementation of the AI$\xi^{\tilde t\tilde l}$ model may also be
successful, at least in toy environments, e.g.\ prisoner problems.
The AI$\xi^{\tilde t\tilde l}$ algorithm should be regarded only
as the first step toward a {\em computable universal AI model}.
Elimination of the factor $2^{\tilde l}$ without giving up
universality will probably be a very difficult task. One could try
to select programs $p$ and prove VA($p$) in a more clever way than
by mere enumeration. All kinds of ideas, like heuristic search,
genetic algorithms, advanced theorem provers, and many more, could
be incorporated. But now we have a problem.
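The $2^{\tilde l}$ factor stems from plain enumeration: every program of length up to $\tilde l$ is tried, each for at most $\tilde t$ steps. A toy version of this budgeted enumeration, with bitstrings standing in for programs and a made-up interpreter and success test:

```python
from itertools import product

# Toy version of the budgeted enumeration behind the 2^l factor: try every
# bitstring of length <= l_max as a "program", running each for at most
# t_max "steps". The interpreter and the success test are made-up stand-ins.

def run_program(bits, t_max):
    """Pretend interpreter: executes one step per bit, up to t_max steps."""
    return bits[:min(len(bits), t_max)]     # its 'output' is the executed prefix

def enumerate_search(l_max, t_max, target):
    tried = 0
    for l in range(1, l_max + 1):           # lengths 1..l_max: sum_l 2^l programs
        for bits in product("01", repeat=l):
            tried += 1
            if run_program("".join(bits), t_max) == target:
                return "".join(bits), tried # total work is O(t_max * 2^l_max)
    return None, tried

prog, tried = enumerate_search(l_max=4, t_max=10, target="101")
```

Replacing the inner loop (`product`) by anything smarter than exhaustive enumeration is exactly the "more clever way" alluded to above, and exactly where universality is at risk.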
%------------------------------%
\paragraph{Computability:}
%------------------------------%
We seem to have transferred the AI problem just to a different
level. This shift has some advantages (and also some
disadvantages) but in no way presents a solution. Nevertheless,
we want to stress that we have reduced the AI problem to (mere)
computational questions. Even the most general other systems the
author is aware of either depend on assumptions about the
environment beyond mere complexity, or it is far from clear whether
they are, indeed, universally optimal. Although computational
questions are themselves highly complicated, this reduction is a
non-trivial result. A formal theory of something, even if not
computable, is often a great step toward solving a problem and has
also merits of its own (see previous paragraphs).
%------------------------------%
\paragraph{Elegance:}
%------------------------------%
Many researchers in AI believe that intelligence is something
complicated and cannot be condensed into a few formulas; they
believe it is rather a matter of combining enough {\em methods} and much
explicit {\em knowledge} in the right way. From a theoretical
point of view we disagree, as the AI$\xi$ model is simple and
seems to serve all needs. From a practical point of view we agree
to the following extent: to reduce the computational burden one
should provide special purpose algorithms ({\em methods}) from the
very beginning, probably many of them aimed at reducing the
complexity of the input and output spaces $X$ and $Y$ by
appropriate pre- and post-processing methods.
%------------------------------%
\paragraph{Extra knowledge:}
%------------------------------%
There is no need to incorporate extra {\em knowledge} from the
very beginning. It can be presented in the first few cycles in
{\it any} format. As long as the algorithm that interprets the
data is of size $O(1)$, the AI$\xi$ system will `understand' the
data after a few cycles (see \cite{Hutter:00f}). If the
environment $\mu$ is complicated but extra knowledge $z$ makes
$K(\mu|z)$ small, one can show that the bound (\ref{eukdistxi})
reduces to $\ln 2\!\cdot\!K(\mu|z)$ when $x_1\!\equiv\!z$, i.e.\
when $z$ is presented in the first cycle. Special purpose
algorithms could also be presented in $x_1$, but it would be
cheating to say that no special purpose algorithms have been
implemented in AI$\xi$. The boundary between implementation and
training is blurred in the AI$\xi$ model.
%------------------------------%
\paragraph{Training:}
%------------------------------%
We have not said much about the training process itself, as it is
not specific to the AI$\xi$ model and has been discussed in the
literature in various forms and disciplines. A serious discussion
would be out of place here. To repeat a truism, it is, of course,
important to present enough knowledge $x'_k$ and evaluate the
system output $y_k$ with $r_k$ in a reasonable way. To maximize
the information content in the reward, one should start with
simple tasks and give positive reward to approximately
the better half of the outputs $y_k$, for instance.
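The "reward the better half" heuristic can be made concrete with a running median. The numeric scores below are a hypothetical stand-in for some task-specific evaluation of the outputs $y_k$:

```python
import statistics

# Toy version of the training heuristic: give reward r_k = 1 to outputs
# scoring above the median of the scores seen so far, else r_k = 0.
# The scores stand in for a task-specific evaluation of y_k.

def half_split_rewards(scores):
    rewards = []
    for k, s in enumerate(scores):
        median = statistics.median(scores[:k + 1])   # median over cycles 1..k
        rewards.append(1 if s > median else 0)
    return rewards

rewards = half_split_rewards([3, 5, 1, 4, 9])
```

A roughly balanced reward signal carries close to one bit of information per cycle, which is the sense in which the half-split maximizes the information content of the reward.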
%------------------------------%
\paragraph{The big questions:}
%------------------------------%
\cite{Hutter:00f} contains a discussion of the ``big'' questions
concerning the mere existence of any computable, fast, and elegant
universal theory of intelligence, related to non-computable $\mu$
\cite{Penrose:94} and the `number of wisdom' $\Omega$
\cite{Chaitin:75,Chaitin:91}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Bibliography %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{thebibliography}{FMG92}
\bibitem[Bel57]{Bellman:57}
R.~Bellman.
\newblock {\em Dynamic Programming}.
\newblock Princeton University Press, New Jersey, 1957.
\bibitem[Ber95]{Bertsekas:95b}
D.~P. Bertsekas.
\newblock {\em Dynamic Programming and Optimal Control, Vol. (II)}.
\newblock Athena Scientific, Belmont, Massachusetts, 1995.
\bibitem[BT96]{Bertsekas:96}
D.~P. Bertsekas and J.~N. Tsitsiklis.
\newblock {\em Neuro-Dynamic Programming}.
\newblock Athena Scientific, Belmont, MA, 1996.
\bibitem[Cha75]{Chaitin:75}
G.~J. Chaitin.
\newblock A theory of program size formally identical to information theory.
\newblock {\em Journal of the ACM}, 22(3):329--340, 1975.
\bibitem[Cha91]{Chaitin:91}
G.~J. Chaitin.
\newblock Algorithmic information and evolution.
\newblock {\em in O.T. Solbrig and G. Nicolis, Perspectives on Biological
Complexity, IUBS Press}, pages 51--60, 1991.
\bibitem[FMG92]{Feder:92}
M.~Feder, N.~Merhav, and M.~Gutman.
\newblock Universal prediction of individual sequences.
\newblock {\em {IEEE} Transactions on Information Theory}, 38:1258--1270, 1992.
\bibitem[G\'ac74]{Gacs:74}
P.~G\'acs.
\newblock On the symmetry of algorithmic information.
\newblock {\em Russian Academy of Sciences Doklady. Mathematics (formerly
Soviet Mathematics--Doklady)}, 15:1477--1480, 1974.
\bibitem[Hut99]{Hutter:99}
M.~Hutter.
\newblock New error bounds for {Solomonoff} prediction.
\newblock {\em Journal of Computer and System Sciences, in press},
(IDSIA-11-00):1--13, 1999.
\newblock ftp://ftp.idsia.ch/pub/techrep/IDSIA-11-00.ps.gz.
\bibitem[Hut00a]{Hutter:00e}
M.~Hutter.
\newblock Optimality of universal prediction for general loss and alphabet.
\newblock Technical Report IDSIA-15-00, Istituto Dalle Molle di Studi
sull'Intelligenza Artificiale, Manno(Lugano), Switzerland, 2000.
\newblock In progress.
\bibitem[Hut00b]{Hutter:00f}
M.~Hutter.
\newblock A theory of universal artificial intelligence based on algorithmic
complexity.
\newblock Technical report, 2000.
\newblock 62 pages, http://xxx.lanl.gov/abs/cs.AI/0004001.
\bibitem[Kol65]{Kolmogorov:65}
A.~N. Kolmogorov.
\newblock Three approaches to the quantitative definition of information.
\newblock {\em Problems of Information and Transmission}, 1(1):1--7, 1965.
\bibitem[Lev73]{Levin:73}
L.~A. Levin.
\newblock Universal sequential search problems.
\newblock {\em Problems of Information Transmission}, 9:265--266, 1973.
\bibitem[Lev74]{Levin:74}
L.~A. Levin.
\newblock Laws of information conservation (non-growth) and aspects of the
foundation of probability theory.
\newblock {\em Problems of Information Transmission}, 10:206--210, 1974.
\bibitem[Lev84]{Levin:84}
L.~A. Levin.
\newblock Randomness conservation inequalities: Information and independence in
mathematical theories.
\newblock {\em Information and Control}, 61:15--37, 1984.
\bibitem[KLM96]{Kaelbling:96}
L.~P. Kaelbling, M.~L. Littman, and A.~W. Moore.
\newblock Reinforcement learning: a survey.
\newblock {\em Journal of AI research}, 4:237--285, 1996.
\bibitem[LV91]{Li:91}
M.~Li and P.~M.~B. Vit\'anyi.
\newblock Learning simple concepts under simple distributions.
\newblock {\em SIAM Journal on Computing}, 20(5):911--935, 1991.
\bibitem[LV92]{Li:92b}
M.~Li and P.~M.~B. Vit\'anyi.
\newblock Inductive reasoning and {Kolmogorov} complexity.
\newblock {\em Journal of Computer and System Sciences}, 44:343--384, 1992.
\bibitem[LV97]{Li:97}
M.~Li and P.~M.~B. Vit\'anyi.
\newblock {\em An introduction to {Kolmogorov} complexity and its
applications}.
\newblock Springer, 2nd edition, 1997.
\bibitem[Pen94]{Penrose:94}
R.~Penrose.
\newblock {\em Shadows of the mind, {A} search for the missing science of
consciousness}.
\newblock Oxford Univ. Press, 1994.
\bibitem[Ris89]{Rissanen:89}
J.~Rissanen.
\newblock {\em Stochastic Complexity in Statistical Inquiry}.
\newblock World Scientific Publ. Co., 1989.
\bibitem[RN95]{Russell:95}
S.~J. Russell and P.~Norvig.
\newblock {\em Artificial Intelligence. {A} Modern Approach}.
\newblock Prentice-Hall, Englewood Cliffs, 1995.
\bibitem[SB98]{Sutton:98}
R.~Sutton and A.~Barto.
\newblock {\em Reinforcement learning: An introduction}.
\newblock Cambridge, MA, MIT Press, 1998.
\bibitem[Sch97]{Schmidhuber:97nn}
J.~Schmidhuber.
\newblock Discovering neural nets with low {Kolmogorov} complexity and high
generalization capability.
\newblock {\em Neural Networks}, 10(5):857--873, 1997.
\bibitem[Sol64]{Solomonoff:64}
R.~J. Solomonoff.
\newblock A formal theory of inductive inference: Part 1 and 2.
\newblock {\em Inform. Control}, 7:1--22, 224--254, 1964.
\bibitem[Sol78]{Solomonoff:78}
R.~J. Solomonoff.
\newblock Complexity-based induction systems: comparisons and convergence
theorems.
\newblock {\em IEEE Trans. Inform. Theory}, IT-24:422--432, 1978.
\bibitem[SZW97]{Schmidhuber:97bias}
J.~Schmidhuber, J.~Zhao, and M.~Wiering.
\newblock Shifting inductive bias with success-story algorithm, adaptive
{Levin} search, and incremental self-improvement.
\newblock {\em Machine Learning}, 28:105--130, 1997.
\end{thebibliography}
\end{document}
%---------------------------------------------------------------