%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Universal Algorithmic Intelligence: %%
%% A mathematical top->down approach %%
%% %%
%% A Gentle Introduction to the Universal Algorithmic Agent AIXI %%
%% %%
%% Marcus Hutter: Start: 20.09.02 %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
% Document-Style %
%-------------------------------%
\newif\ifagibook\agibookfalse
\newif\ifpublished\publishedtrue
\documentclass[12pt,twoside]{article}
\usepackage{latexsym}
\usepackage{makeidx}
\pagestyle{myheadings}
\markboth{\sc Marcus Hutter, Technical Report, IDSIA-01-03
}{\sc Universal Algorithmic Intelligence}
\setcounter{tocdepth}{4}
\setcounter{secnumdepth}{3}
\topmargin=0cm \oddsidemargin=5mm \evensidemargin=5mm
\textwidth=15cm \textheight=22cm
\sloppy
\makeindex
%-------------------------------%
% My Math-Spacings %
%-------------------------------%
\def\,{\mskip 3mu} \def\>{\mskip 4mu plus 2mu minus 4mu} \def\;{\mskip 5mu plus 5mu} \def\!{\mskip-3mu}
\def\dispmuskip{\thinmuskip= 3mu plus 0mu minus 2mu \medmuskip= 4mu plus 2mu minus 2mu \thickmuskip=5mu plus 5mu minus 2mu}
\def\textmuskip{\thinmuskip= 0mu \medmuskip= 1mu plus 1mu minus 1mu \thickmuskip=2mu plus 3mu minus 1mu}
\textmuskip
\def\beq{\dispmuskip\begin{equation}} \def\eeq{\end{equation}\textmuskip}
\def\beqn{\dispmuskip\begin{displaymath}}\def\eeqn{\end{displaymath}\textmuskip}
\def\bqa{\dispmuskip\begin{eqnarray}} \def\eqa{\end{eqnarray}\textmuskip}
\def\bqan{\dispmuskip\begin{eqnarray*}} \def\eqan{\end{eqnarray*}\textmuskip}
%Default Math-Spacings
%\def\beq{\begin{equation}} \def\eeq{\end{equation}}
%\def\beqn{\begin{displaymath}}\def\eeqn{\end{displaymath}}
%\def\bqa{\begin{eqnarray}} \def\eqa{\end{eqnarray}}
%\def\bqan{\begin{eqnarray*}} \def\eqan{\end{eqnarray*}}
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\newenvironment{keywords}{\centerline{\bf
Keywords}\vspace{0.5ex}\begin{quote}\small}{\par\end{quote}\vskip
1ex}
\renewenvironment{abstract}{\centerline{\bf
Abstract}\vspace{0.5ex}\begin{quote}\small}{\par\end{quote}\vskip
1ex}
\newtheorem{theorem}{Theorem}
\newtheorem{tablex}[theorem]{Table}
\newtheorem{figurex}[theorem]{Figure}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{assumption}[theorem]{Assumption}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition}
\def\ftablex#1#2#3{\begin{tablex}[#2]\label{#1} #3 \end{tablex} }
\def\ffigurex#1#2#3#4{{#4} \begin{figurex}[#2]\label{#1} #3 \end{figurex} }
\def\fclaim#1#2#3{\begin{claim}[#2]\label{#1} #3 \end{claim} }
\def\faxiom#1#2#3{\begin{axiom}[#2]\label{#1} #3 \end{axiom} }
\def\fassumption#1#2#3{\begin{assumption}[#2]\label{#1} #3 \end{assumption} }
\def\ftheorem#1#2#3{\begin{theorem}[#2]\label{#1} #3 \end{theorem} }
\def\fcorollary#1#2#3{\begin{corollary}[#2]\label{#1} #3 \end{corollary} }
\def\flemma#1#2#3{\begin{lemma}[#2]\label{#1} #3 \end{lemma} }
\def\fdefinition#1#2#3{\begin{definition}[#2]\label{#1} #3 \end{definition} }
\def\aidx#1{\protect\index{#1}} %Authors & editors to index
\def\idx#1{\index{#1}#1} %\idx{name} for also in text
\def\indxs#1#2{\index{#1!#2}\index{#2!#1}} %\idx{name} for also in text
\def\paragraph#1{\vspace{1ex}\noindent{\bf #1.}}
\def\toinfty#1{\stackrel{#1\to\infty}{\longrightarrow}}
\def\nq{\hspace{-1em}}
\def\qed{\hspace*{\fill}$\Box$}
\def\odt{{\textstyle{1\over 2}}}
\def\odf{{\textstyle{1\over 4}}}
\def\odA{{\textstyle{1\over A}}}
\def\odm{{\textstyle{1\over m}}}
\def\odn{{\textstyle{1\over n}}}
\def\eps{\varepsilon}
\def\epstr{\epsilon} % for empty string
\def\pb#1{\def\pb{\underline}\underline{#1}} % probability notation
\def\hh#1{{\dot{#1}}} % historic I/O
\def\best{*} % or {best}
\def\l{{\ell}} % length of string or program
\def\approxleq{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle<}\sim$}}} %% make nicer
\def\approxgeq{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle>}\sim$}}} %% make nicer
\def\equa{\stackrel+=} %\eqa unfortunately already used up
\def\leqa{\stackrel+\leq} % don't put around another braces,
\def\geqa{\stackrel+\geq} % since this messes up spacings
\def\eqm{\stackrel\times=} % for some reason
\def\leqm{\stackrel\times\leq}
\def\geqm{\stackrel\times\geq}
%\def\EC{{\text{EC}}}
\def\AI{{\text{AI}}}
\def\SP{{\text{SP}}}
\def\CF{{\text{CF}}}
\def\SG{{\text{SG}}}
\def\FM{{\text{FM}}}
%\def\EX{{\text{EX}}}
\def\rhoai{{\rho}}
\def\M{{\cal M}}
\def\X{{\cal X}}
\def\Y{{\cal Y}}
\def\O{{\cal O}}
\def\F{{\cal F}}
\def\R{{\cal R}}
\def\E{{\bf E}}
\def\P{{\bf P}}
\def\B{{I\!\!B}} % Binary set (or \SetB)
\def\SetN{I\!\!N} \def\SetQ{I\!\!\!Q} \def\SetR{I\!\!R} \def\SetZ{Z\!\!\!Z}
\def\qmbox#1{{\quad\mbox{#1}\quad}}
\ifagibook\def\Chapter{Section }\else\def\Chapter{Chapter }\fi
\def\mrcp{the chain rule}
\def\text#1{\mbox{\scriptsize{#1}}} % if not using amstex
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% T i t l e - P a g e %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{titlepage}
\title{\vspace{-1cm}
{\small Technical Report IDSIA-01-03 \hfill
\ifpublished In {\it Artificial General Intelligence}, 2007
\else 17 January 2003 (minor rev.18Oct'04)
\fi \\[4mm]}
\Large\bf\hrule height1pt \vskip 3mm
\mbox{{\LARGE U}NIVERSAL {\LARGE A}LGORITHMIC {\LARGE I}NTELLIGENCE} \\[1mm]
\bf A mathematical top$\to$down approach
\vskip 3mm \hrule height1pt}
\author{Marcus Hutter\\[3ex]
\small IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland%
\\
\small marcus@idsia.ch \hspace{9ex} http://www.idsia.ch/$^{_{_\sim}}\!$marcus
}
\ifpublished\date{17 January 2003}\else\date{}\fi
\maketitle % typeset the title of the contribution
\begin{keywords}
Artificial intelligence;
algorithmic probability;
sequential decision theory;
rational agents;
value function;
Solomonoff induction;
Kolmogorov complexity;
reinforcement learning;
universal sequence prediction;
strategic games;
function minimization;
supervised learning.
\end{keywords}
\begin{abstract}
Sequential decision theory formally solves the problem of rational
agents in uncertain worlds if the true environmental prior
probability distribution is known. Solomonoff's theory of
universal induction formally solves the problem of sequence
prediction for unknown prior distribution. We combine both ideas
and get a parameter-free theory of universal Artificial
Intelligence. We give strong arguments that the resulting AIXI
model is the most intelligent unbiased agent possible. We outline
how the AIXI model can formally solve a number of problem classes,
including sequence prediction, strategic games, function
minimization, reinforcement and supervised learning. The major
drawback of the AIXI model is that it is uncomputable. To overcome
this problem, we construct a modified algorithm AIXI$tl$ that is
still effectively more intelligent than any other time $t$ and
length $l$ bounded agent. The computation time of AIXI$tl$ is of
the order $t\cdot 2^l$. The discussion includes formal definitions
of intelligence order relations, the horizon problem and relations
of the AIXI theory to other AI approaches.
\end{abstract}
\end{titlepage}
%------------------------------%
% Table of Contents %
%------------------------------%
{\parskip=0ex\tableofcontents}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{secInt}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This \ifagibook chapter \else article \fi gives an introduction to
a mathematical theory for intelligence. We present the AIXI model,
a parameter-free optimal reinforcement learning agent embedded in
an arbitrary unknown environment.

The science of Artificial Intelligence (AI) may be defined as the
construction of intelligent systems and their analysis. A natural
definition of a {\it system} is anything that has an input and an
output stream. Intelligence is more complicated. It can have many
faces like creativity, solving problems, pattern recognition,
classification, learning, induction, deduction, building
analogies, optimization, surviving in an environment, language
processing, knowledge and many more. A formal definition
incorporating every aspect of intelligence, however, seems
difficult. Most, if not all known facets of intelligence can be
formulated as goal-driven or, more precisely, as maximizing some
utility function. It is, therefore, sufficient to study
goal-driven AI; e.g.\ the (biological) goal of animals and humans
is to survive and spread. The goal of AI systems should be to be
useful to humans. The problem is that, except for special cases,
we know neither the utility function nor the environment in which
the agent will operate in advance. The mathematical theory,
coined AIXI, is supposed to solve these problems.

Assume the availability of unlimited computational resources. The
first important observation is that this does not make the AI
problem trivial. Playing chess optimally or solving NP-complete
problems become trivial, but driving a car or surviving in nature
don't. This is because it is a challenge itself to
well-define the latter problems, not to mention presenting an
algorithm. In other words: The AI problem has not yet been
well defined. One may view AIXI as a suggestion for such a
mathematical definition of AI.

AIXI is a universal theory of sequential decision making akin to
Solomonoff's celebrated universal theory of induction. Solomonoff
derived an optimal way of predicting future data, given previous
perceptions, provided the data is sampled from a computable
probability distribution. AIXI extends this approach to an optimal
decision making agent embedded in an unknown environment. The {\em
main idea} is to replace the unknown environmental distribution
$\mu$ in the Bellman equations by a suitably generalized
universal Solomonoff distribution $\xi$. The state space is the
space of complete histories. AIXI is a universal theory without
adjustable parameters, making no assumptions about the environment
except that it is sampled from a computable distribution. From an
algorithmic complexity perspective, the AIXI model generalizes
optimal passive universal induction to the case of active agents.
From a decision-theoretic perspective, AIXI is a suggestion of a
new (implicit) ``learning'' algorithm, which may overcome all
(except computational) problems of previous reinforcement learning
algorithms.

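As a toy illustration of the mixture idea (a finite stand-in for the universal mixture, not Hutter's actual construction over all enumerable semimeasures; the model class, weights, and Bernoulli models below are invented for the example), a weighted sum of environment models multiplicatively dominates every member of its class:

```python
# Toy Bayes mixture xi over a finite model class (illustrative stand-in
# for the universal Solomonoff mixture; the weights 2^-l mimic 2^-K(nu)).

def make_bernoulli(theta):
    """Probability of a binary string x under an i.i.d. Bernoulli(theta) model."""
    def nu(x):
        p = 1.0
        for bit in x:
            p *= theta if bit == 1 else 1.0 - theta
        return p
    return nu

# Hypothetical finite class: pairs (description length l, model nu).
models = [(1, make_bernoulli(0.5)), (2, make_bernoulli(0.9)), (3, make_bernoulli(0.1))]

def xi(x):
    """Mixture: xi(x) = sum over nu of 2^-l(nu) * nu(x)."""
    return sum(2.0 ** -l * nu(x) for l, nu in models)

x = [1, 1, 0, 1]
# Dominance: xi(x) >= 2^-l(nu) * nu(x) for every model nu in the class,
# the property that drives the convergence results mentioned below.
assert all(xi(x) >= 2.0 ** -l * nu(x) for l, nu in models)
```

The same dominance argument, applied to the class of all enumerable semimeasures, is what makes the genuine $\xi$ universal.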
There are strong arguments that AIXI is the most intelligent
unbiased agent possible. We outline for a number of problem
classes, including sequence prediction, strategic games, function
minimization, reinforcement and supervised learning, how the AIXI
model can formally solve them. The major drawback of the AIXI
model is that it is incomputable. To overcome this problem, we
construct a modified algorithm AIXI$tl$ that is still
effectively more intelligent than any other time $t$ and length $l$
bounded agent. The computation time of AIXI$tl$ is of the order $t
\cdot 2^l$. Other discussed topics are a formal definition of
an intelligence order relation, the horizon problem and relations of
the AIXI theory to other AI approaches.

The article is meant to be a gentle introduction to and discussion
of the AIXI model. For a mathematically rigorous treatment, many
subtleties, and proofs see the references to the author's works in
the annotated bibliography section at the end of this \ifagibook
chapter\else article\fi, and in particular the book
\cite{Hutter:04uaibook}. This section also provides references to
introductory textbooks and original publications on algorithmic
information theory and sequential decision theory.

%------------------------------%
%\paragraph{Contents} %technical
%------------------------------%
{\em \Chapter \ref{chAImu}} presents the theory of sequential
decisions in a very general form (called AI$\mu$ model) in which
actions and perceptions may depend on arbitrary past events. We
clarify the connection to the Bellman equations and discuss minor
parameters, including (the size of) the I/O spaces and the lifetime
of the agent, and their universal choice, which we have in mind.
Optimality of AI$\mu$ is obvious by construction.

{\em \Chapter \ref{chSU}:} How and in which sense induction is
possible at all has been subject to long philosophical
controversies. Highlights are Epicurus' principle of multiple
explanations, Occam's razor, and probability theory. Solomonoff
elegantly unified all these aspects into one formal theory of
inductive inference based on a universal probability distribution
$\xi$, which is closely related to Kolmogorov complexity $K(x)$,
the length of the shortest program computing $x$. Rapid
convergence of $\xi$ to the unknown true environmental
distribution $\mu$ and tight loss bounds for arbitrary bounded
loss functions and finite alphabet can be shown. Pareto optimality
of $\xi$ in the sense that there is no other predictor that
performs better or equal in all environments and strictly better
in at least one can also be shown. In view of these results it is
fair to say that the problem of sequence prediction possesses a
universally optimal solution.

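A minimal finite-class analogue of this prediction scheme (the periodic models and their prior weights are invented for the example; the universal class is of course far richer) shows the mixture's conditional predictions locking onto the true generating model after a few observations:

```python
# Finite-class sketch of universal sequence prediction: predict the next
# bit with a Bayes mixture; the posterior concentrates on the (here:
# deterministic periodic) model that actually generates the data.

def periodic_model(pattern):
    """Deterministic model: probability 1 on the periodic continuation."""
    def nu(x):
        return 1.0 if all(b == pattern[i % len(pattern)] for i, b in enumerate(x)) else 0.0
    return nu

# Hypothetical class with weights 2^-l (shorter pattern = simpler model).
models = [(1, periodic_model([1])), (2, periodic_model([1, 0])), (3, periodic_model([1, 1, 0]))]

def xi(x):
    return sum(2.0 ** -l * nu(x) for l, nu in models)

def predict(history):
    """xi-probability that the next bit is 1, given the history."""
    return xi(history + [1]) / xi(history)

data = [1, 0, 1, 0, 1, 0]           # generated by the pattern 10
probs = [predict(data[:k]) for k in range(len(data))]
```

Once the data rules out the competing models, the remaining mixture mass sits entirely on the true pattern and the predictions become deterministic.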
{\em \Chapter \ref{chAIxi}:} In the active case, reinforcement
learning algorithms are usually used if $\mu$ is unknown. They can
succeed if the state space is either small or has effectively been
made small by generalization techniques. The algorithms work only
in restricted (e.g.\ Markovian) domains, have problems with
optimally trading off exploration versus exploitation, have
nonoptimal learning rate, are prone to diverge, or are otherwise
ad hoc. The formal solution proposed here is to generalize
Solomonoff's universal prior $\xi$ to include action conditions
and replace $\mu$ by $\xi$ in the AI$\mu$ model, resulting in the
AI$\xi\equiv$AIXI model, which we claim to be universally optimal.
We investigate what we can expect from a universally optimal agent
and clarify the meanings of {\em universal}, {\em optimal}, etc.
Other discussed topics are formal definitions of an intelligence
order relation, the horizon problem, and Pareto optimality of
AIXI.

{\em \Chapter \ref{chApply}:} We show how a number of AI problem
classes fit into the general AIXI model. They include sequence
prediction, strategic games, function minimization, and supervised
learning. We first formulate each problem class in its natural way
(for known $\mu$) and then construct a formulation within the
AI$\mu$ model and show their equivalence. We then consider the
consequences of replacing $\mu$ by $\xi$. The main goal is to
understand in which sense the problems are solved by AIXI.

{\em \Chapter \ref{chTime}:} The major drawback of AIXI is that it
is incomputable, or more precisely, only asymptotically
computable, which makes an implementation impossible. To overcome
this problem, we construct a modified model AIXI$tl$, which is
still superior to any other time $t$ and length $l$ bounded
algorithm. The computation time of AIXI$tl$ is of the order
$t\cdot 2^l$. The solution requires an implementation of
first-order logic, the definition of a universal Turing machine
within it, and a proof theory system.

{\em \Chapter \ref{chDisc}:} Finally we discuss and remark on some
otherwise unmentioned topics of general interest, including
concurrent actions and
perceptions, the choice of the I/O spaces, treatment of encrypted
information, and peculiarities of mortal embodied agents. We
continue with an outlook on further research, including
optimality, down-scaling, implementation, approximation, elegance,
extra knowledge, and training of/for AIXI($tl$). We also include
some (personal) remarks on non-computable physics, the number of
wisdom $\Omega$, and consciousness.

\ifagibook An annotated bibliography concludes this chapter.
\else An annotated bibliography and other references conclude this work.
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Agents in Known Probabilistic Environments}\label{chAImu}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The general framework for AI might be viewed as the design and
study of intelligent agents \cite{Russell:03}. An agent is a
cybernetic system with some internal state, which acts with output
$y_k$ on some environment in cycle $k$, perceives some input $x_k$
from the environment and updates its internal state. Then the next
cycle follows. We split the input $x_k$ into a regular part $o_k$
and a reward $r_k$, often called reinforcement feedback. From time
to time the environment provides nonzero reward to the agent. The
task of the agent is to maximize its utility, defined as the sum
of future rewards. A probabilistic environment can be described by
the conditional probability $\mu$ for the inputs $x_1...x_n$ to
the agent under the condition that the agent outputs $y_1...y_n$.
Most, if not all environments are of this type. We give formal
expressions for the outputs of the agent, which maximize the total
$\mu$-expected reward sum, called value. This model is called the
AI$\mu$ model. As every AI problem can be brought into this form,
the problem of maximizing utility is hence being formally solved,
if $\mu$ is known.

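By the chain rule, the conditional environment distribution factorizes into single-cycle conditionals. Schematically, using the underline notation $\pb{\cdot}$ from the preamble to mark the conditioned arguments (a notational sketch restating the definition above, not an additional assumption):

```latex
\beqn
  \mu(y\!\pb{x}_{1:n}) \;=\; \prod_{k=1}^{n} \mu(y\!x_{<k}\,y\!\pb{x}_k)
\eeqn
```

Each factor is the probability of input $x_k$ given the complete interaction history up to and including action $y_k$.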
%
Furthermore, we study some special aspects of the AI$\mu$ model.
We introduce factorizable probability distributions describing
environments with independent episodes. They occur in several
problem classes studied in Section~\ref{chApply} and are a special
case of more general separable probability distributions defined
in Section~\ref{secAIsep}. We also clarify the connection to
the Bellman equations of sequential decision theory and discuss
similarities and differences. We discuss minor parameters of our
model, including (the size of) the input and output spaces $\X$
and $\Y$ and the lifetime of the agent, and their universal
choice, which we have in mind.

%
There is nothing remarkable in this section; it is the essence of
sequential decision theory
\cite{VonNeumann:44,Bellman:57,Bertsekas:96,Sutton:98}, presented
in a new form. Notation and formulas needed in later sections are
simply developed. There are two major remaining problems: the
problem of the unknown true probability distribution $\mu$, which
is solved in Section~\ref{chAIxi}, and computational aspects,
which are addressed in Section~\ref{chTime}.
%-------------------------------------------------------------%
\subsection{The Cybernetic Agent Model}
%-------------------------------------------------------------%
A good way to start thinking about intelligent systems is to
consider more generally cybernetic systems,\index{cybernetic
systems} in AI usually called agents.\index{agents} This
avoids having to struggle with the meaning of \idx{intelligence}
from the very beginning. A cybernetic system is a control circuit
with input\index{input} $x$ and output\index{output} $y$
and an internal state.\index{state!internal} From an external
input and the internal state the agent calculates
deterministically or stochastically an output. This output
(\idx{action}) modifies the environment and leads to a new input
(\idx{perception}). This continues ad infinitum or for a finite
number of cycles.
%------------------------------%
\fdefinition{defAgent}{The Agent Model}{
%------------------------------%
An agent is a system that interacts with an environment in cycles
$k=1,2,3,...$. In cycle $k$ the action (output) $y_k \in \Y$ of
the agent is determined by a policy $p$ that depends on the
I/O-history $y_1x_1...y_{k-1}x_{k-1}$. The environment reacts to
this action and leads to a new perception (input) $x_k \in \X$
determined by a deterministic function $q$ or probability
distribution $\mu$, which depends on the history
$y_1x_1...y_{k-1}x_{k-1}y_k$. Then the next cycle $k+1$ starts.
}%------------------------------%
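The cycle structure of this definition can be sketched as a plain interaction loop; the concrete policy and environment below are toy stand-ins invented for the example:

```python
# Sketch of the agent-environment cycle of the definition above:
# in cycle k the policy maps the history y_1 x_1 ... y_{k-1} x_{k-1}
# to an action y_k, and the environment maps the history extended by
# y_k to a new perception x_k. (Toy deterministic instances.)

def policy(history):
    """Agent p: imitate the last perception (a toy rule)."""
    if history and history[-1][0] == 'x':
        return history[-1][1]
    return 0

def environment(history):
    """Environment q: reward 1 iff the agent repeated its previous action."""
    actions = [v for (tag, v) in history if tag == 'y']
    return 1 if len(actions) >= 2 and actions[-1] == actions[-2] else 0

history = []
for k in range(1, 6):                  # cycles k = 1, 2, ...
    y_k = policy(history)              # action depends on history so far
    history.append(('y', y_k))
    x_k = environment(history)         # percept depends on history + y_k
    history.append(('x', x_k))
```

The alternating appends reproduce the chronological order $y_1x_1y_2x_2...$ of the I/O history.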
\noindent As explained in the last section, we need some
\idx{reward} assignment to the cybernetic system. The input $x$ is
divided into two parts, the standard input $o$ and some reward
input $r$. If input and output are represented by strings, a
\idx{deterministic} cybernetic system can be modeled by a
\idx{Turing machine} $p$, where $p$ is called the \idx{policy} of the agent,
which determines the (re)action to a perception. If the environment is
also computable it might be modeled by a Turing machine $q$ as
well. The interaction of the agent with the environment can be
illustrated as follows:
\begin{center}\label{cyberpic}
%\input KCUnAI.pic
\small\unitlength=0.95mm\thicklines
\begin{picture}(106,47)
\put( 1,41){\framebox(16,6)[cc]{$r_1 \;|\; o_1$}}
\put(17,41){\framebox(16,6)[cc]{$r_2 \;|\; o_2$}}
\put(33,41){\framebox(16,6)[cc]{$r_3 \;|\; o_3$}}
\put(49,41){\framebox(16,6)[cc]{$r_4 \;|\; o_4$}}
\put(65,41){\framebox(16,6)[cc]{$r_5 \;|\; o_5$}}
\put(81,41){\framebox(16,6)[cc]{$r_6 \;|\; o_6$}}
\put(97,47){\line(1,0){9}}\put(97,41){\line(1,0){9}}\put(102,44){\makebox(0,0)[cc]{...}}
\put( 1,1){\framebox(16,6)[cc]{$y_1$}}
\put(17,1){\framebox(16,6)[cc]{$y_2$}}
\put(33,1){\framebox(16,6)[cc]{$y_3$}}
\put(49,1){\framebox(16,6)[cc]{$y_4$}}
\put(65,1){\framebox(16,6)[cc]{$y_5$}}
\put(81,1){\framebox(16,6)[cc]{$y_6$}}
\put(97,7){\line(1,0){9}}\put(97,1){\line(1,0){9}}\put(102,4){\makebox(0,0)[cc]{...}}
%
%
\put(1,21){\line(1,0){16}}
\put(1,27){\line(1,0){16}}
\put(1,21){\line(0,6){6}}
\put(9,24){\makebox(0,0)[cc]{work}}
\put(17,17){\framebox(20,14)[cc]{$\displaystyle{Agent\atop\bf p}$}}
\put(37,27){\line(1,0){14}}
\put(37,21){\line(1,0){14}}
\put(39,24){\makebox(0,0)[lc]{tape ...}}
%
%
\put(56,21){\line(1,0){16}}
\put(56,27){\line(1,0){16}}
\put(56,21){\line(0,6){6}}
\put(64,24){\makebox(0,0)[cc]{work}}
\put(72,17){\framebox(20,14)[cc]{$\displaystyle{Environ\mbox{-}\atop ment\quad\bf q}$}}
\put(92,27){\line(1,0){14}}
\put(92,21){\line(1,0){14}}
\put(94,24){\makebox(0,0)[lc]{tape ...}}
%
%\normalcolor
\put(46,41){\vector(-2,-1){20}}
\put(81,31){\vector(-2,1){20}}
\put(54,7){\vector(3,1){30}}
\put(24,17){\vector(3,-1){30}}
\end{picture}
\end{center}
%
Both $p$ as well as $q$ have unidirectional input and output tapes
and bidirectional work
tapes.\index{tape!unidirectional}\index{tape!bidirectional} What
entangles the agent with the environment is the fact that the
upper tape serves as input tape for $p$, as well as output tape
for $q$, and that the lower tape serves as output tape for $p$ as
well as input tape for $q$. Further, the reading head must always
be left of the writing head, i.e.\ the symbols must first be
written before they are read. Both $p$ and $q$ have their own mutually
inaccessible work tapes containing their own ``secrets''. The
heads\index{Turing machine!head}\label{defxyr} move in the
following way. In the $k^{th}$ cycle $p$ writes $y_k$, $q$ reads
$y_k$, $q$ writes $x_k \equiv r_k o_k$, $p$ reads $x_k \equiv r_k
o_k$, followed by the $(k + 1)^{th}$ \idx{cycle} and so on. The
whole process starts with the first cycle, with all heads at the
start of their tapes and all work tapes empty. We call Turing machines
behaving in this way {\it chronological Turing machines}. Before
continuing, some notations on strings are appropriate.
%-------------------------------------------------------------%
\subsection{Strings}\label{secStrings}\index{strings}\index{sequence}
%-------------------------------------------------------------%
We denote strings over the \idx{alphabet} $\X$ by $s =
x_1x_2...x_n$, with $x_k \in \X$, where $\X$ is alternatively
interpreted as a nonempty subset of $\SetN$ or itself as a
prefix-free set\index{set!prefix-free} of binary strings. The
length\index{string!length} of $s$ is $\l(s)=\l(x_1) +...+
\l(x_n)$. Analogous definitions hold for $y_k \in \Y$. We call
$x_k$ the $k^{th}$ input word\index{input!word} and $y_k$ the
$k^{th}$ output word\index{output!word} (rather than letter). The
string $s=y_1x_1...y_nx_n$ represents the input/output in
\idx{chronological} order. Due to the \idx{prefix property} of the
$x_k$ and $y_k$, $s$ can be uniquely separated into its words. The
words appearing in strings are always in chronological order. We
further introduce the following abbreviations:
$\epsilon$\index{string!empty} is the empty string,
$x_{n:m}:=x_nx_{n+1}...x_{m-1}x_m$ for $n\leq m$ and $\epsilon$
for $n>m$, and $x_{<n}:=x_{1:n-1}$. For $k>1$ the condition states that
$p$ shall be consistent with $p^*$ in the sense that they have the
the same history. If $\X$, $\Y$ and $m$ are finite, the number of
different behaviors of the agent, i.e.\ the search space, is
finite. Therefore, because we have assumed that $q$ is known,
$p^\best$ can effectively be determined by pre-analyzing all
behaviors. The main reason for restricting to finite $m$ was not
to ensure computability of $p^\best$ but that the limit
$m\to\infty$ might not exist. The ease with which we defined and
computed the optimal policy $p^\best$ is not remarkable. Just the
(unrealistic) assumption of a completely known deterministic
environment $q$ has trivialized everything.
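Such a pre-analysis of all behaviors can be sketched as a brute-force search; the environment $q$, the horizon, and the restriction to history-independent policies below are simplifications invented for the example:

```python
# Brute force over all deterministic policies for a known deterministic
# environment q and finite lifetime m, as described above: since the
# search space is finite, the optimal p* is found by trying everything.
from itertools import product

m = 3                                   # lifetime (number of cycles)

def q(ys):
    """Deterministic environment: reward 1 iff the last action alternated."""
    k = len(ys)
    return 1 if k >= 2 and ys[-1] != ys[-2] else 0

def run(table):
    """Total reward of a policy given as a tuple of per-cycle actions
    (a toy policy class; in general actions depend on the history)."""
    ys, total = [], 0
    for k in range(m):
        ys.append(table[k])
        total += q(ys)
    return total

# Pre-analyze all behaviors and keep the best one.
best = max(product([0, 1], repeat=m), key=run)
```

The search is exponential in the lifetime, which is exactly why the ease of this construction is unremarkable: the known environment $q$ did all the work.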
%-------------------------------------------------------------%
\subsection{AI Model for Known Prior Probability}\label{secAIfunc}
%-------------------------------------------------------------%
\indxs{environment}{probabilistic}%
Let us now weaken our assumptions by replacing the deterministic environment $q$
with a probability distribution $\mu(q)$ over chronological
functions. Here $\mu$ might be interpreted in two ways. Either the
environment itself behaves stochastically defined by $\mu$ or the
true environment is deterministic, but we only have subjective
(probabilistic) information about which environment is the true
environment. Combinations of both cases are also possible. We
assume here that $\mu$ is known and describes the true stochastic
behavior of the environment. The case of unknown $\mu$ with the
agent having some beliefs about the environment lies at the heart
of the AI$\xi$ model described in Section~\ref{chAIxi}.

The {\em best} or {\em most intelligent} agent is now the one
that maximizes the {\em expected} utility (called value function)
$
V_\mu^p\equiv V_{1m}^{p\mu}:=\sum_q\mu(q)V_{1m}^{pq}
$.
%$p^\best=\arg\max_p V_\mu^p$
This defines the AI$\mu$ model.
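The defining sum $V_\mu^p=\sum_q\mu(q)V_{1m}^{pq}$ can be instantiated directly for a toy prior over two invented deterministic environments:

```python
# V_mu^p = sum_q mu(q) * V^{pq}: expected total reward of policy p under
# a known probability distribution mu over deterministic environments q.

def value(policy, env, m=2):
    """Total reward of a (history-independent, toy) policy over m cycles."""
    return sum(env(policy[k]) for k in range(m))

# Two toy environments: one rewards action 1, the other action 0.
q_a = lambda y: 1 if y == 1 else 0
q_b = lambda y: 1 if y == 0 else 0
mu = {q_a: 0.75, q_b: 0.25}            # known prior over environments

def V(policy):
    """mu-expected value of the policy: the defining mixture sum."""
    return sum(w * value(policy, q) for q, w in mu.items())

policies = [(0, 0), (0, 1), (1, 0), (1, 1)]
p_best = max(policies, key=V)          # the AImu policy for this toy mu
```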
%------------------------------%
\fdefinition{defAImu}{The AI$\mu$ model}{
%------------------------------%
The AI$\mu$ model is the agent with policy $p^\mu$ that maximizes
the $\mu$-expected total reward $r_1 +...+ r_m$, i.e.\
$p^\best\equiv p^\mu:=\arg\max_p V_\mu^p$. Its value is
$V_\mu^\best:=V_\mu^{p^\mu}$.
}%------------------------------%
\noindent We need the concept of a {\em value function} in a
slightly more general form.
%------------------------------%
\fdefinition{defValue}{The $\mu$/true/generating value function}{
%------------------------------%
The agent's perception $x$ consists of a regular observation $o \in \O$
and a reward $r \in \R \subset \SetR$. In cycle $k$ the {\em
value} $V_{km}^{p\mu}(y\!x_{<k})$ of policy $p$ is defined as the
$\mu$-expected sum of rewards $r_k +...+ r_m$, given the history
$y\!x_{<k}$.
}%------------------------------%

\noindent For horizons $>1$, the expectimax series
in (\ref{pbestrec}) and the process of selecting maximal
values may be interpreted as abstract {\it planning}. The expectimax
series is a form of {\it informed search}, in the case of AI$\mu$, and {\it
heuristic search}, for AI$\xi$, where $\xi$ could be interpreted as
a heuristic for $\mu$. The minimax strategy of {\it game playing}
in case of AI$\mu$ is also subsumed. The AI$\xi$ model converges
to the minimax strategy if the environment is a minimax player, but
it can also take advantage of environmental players with limited
rationality. {\it Problem solving} occurs (only) in the form of
how to maximize the expected future reward.

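The expectimax recursion behind this planning interpretation, alternating maximization over actions with expectation over perceptions, can be sketched as follows (the conditional probabilities and rewards are invented toy numbers, and the model depends on the last action only for brevity):

```python
# Expectimax sketch: alternate max over actions y and mu-expectation over
# percepts x, down to the horizon (depth-limited toy model).

def mu_cond(y):
    """P(x | action y) for x in {0, 1} (hypothetical numbers)."""
    return {0: 0.2, 1: 0.8} if y == 1 else {0: 0.6, 1: 0.4}

def reward(x):
    return x                            # percept doubles as reward here

def expectimax(depth):
    """Maximal expected reward sum over the remaining `depth` cycles."""
    if depth == 0:
        return 0.0
    best = float('-inf')
    for y in (0, 1):                    # max over actions
        exp = sum(p * (reward(x) + expectimax(depth - 1))
                  for x, p in mu_cond(y).items())   # expectation over percepts
        best = max(best, exp)
    return best
```

In the full model the conditional distribution depends on the entire history, so the recursion branches over complete histories rather than single actions.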
{\it Knowledge} is accumulated by AI$\xi$ and is stored in some
form not specified further on the work tape. Any kind of
information in any representation on the inputs $x$ is
exploited. The problem of {\it knowledge engineering} and
{\it representation} appears in the form of how to train the AI$\xi$
model. More practical aspects, like {\it language or image
processing}, have to be learned by AI$\xi$ from scratch.

Other theories, like {\it fuzzy logic}, {\it possibility theory},
{\it Dempster-Shafer theory}, ... are partly outdated and partly
reducible to Bayesian probability theory
\cite{Cheeseman:85,Cheeseman:88}. The interpretation and
consequences of the evidence gap $g := 1 -
\sum_{x_k}\xi(y\!x_{<k}y\!\pb{x_k}) > 0$ in $\xi$ may be similar to
those in Dempster-Shafer theory. Boolean logical reasoning about
the external world plays, at best, an emergent role in the AI$\xi$
model.

Other methods that do not seem to be contained in the AI$\xi$ model
might also be emergent phenomena. The AI$\xi$ model has to
construct short codes of the environmental behavior, and
AIXI$tl$ (see next section) has to construct
short action programs. If we were to analyze and interpret these
programs for realistic environments, we might find some of the
unmentioned or unused or new AI methods at work in these
programs. This is, however, pure speculation at this point. More
important: when trying to make AI$\xi$ practically usable,
some other AI methods, like genetic algorithms or neural nets,
especially for I/O pre/postprocessing, may be useful.

The main thing we wanted to point out is that the AI$\xi$ model
does not lack any important known property of intelligence or
known AI methodology. What {\it is} missing, however, are
computational aspects, which are addressed in the next section.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Time-Bounded AIXI Model}\label{secAIXItl}\label{chTime}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Until now, we have not bothered with the non-computability of the
universal probability distribution $\xi$. As all universal models
in this paper are based on $\xi$, they are not effective in this
form. In this section, we outline how the previous models and
results can be modified/generalized to the time-bounded case.
Indeed, the situation is not as bad as it could be. $\xi$
is enumerable and $\hh y_k$ is still approximable, i.e.\
there exists an algorithm that will produce a
sequence of outputs eventually converging to the exact output $\hh
y_k$, but we can never be sure whether we have already reached it.
Besides this, the convergence is extremely slow, so this type of
asymptotic computability is of no direct (practical) use, but will
nevertheless be important later.

Let $\tilde p$ be a program that calculates within a reasonable
time $\tilde t$ per cycle a reasonably intelligent output, i.e.\
$\tilde p(\hh x_{<k})=\hh y_{1:k}$. The algorithm $p^\best$ is
constructed as follows:
\begin{enumerate}\parskip=0ex\parsep=0ex
\item Create all binary strings of length $l_P$ and interpret each
as a coding of a mathematical proof. Keep those programs $p$ for
which the string is a proof of VA($p$).
\item Eliminate all $p$ of length $\l(p) > \tilde l$.
\item Modify the behavior of all retained $p$ in each cycle $k$ as follows:
Nothing is changed if $p$ outputs some $w_k^py_k^p$ within $\tilde
t$ time steps. Otherwise stop $p$ and write $w_k=0$ and some
arbitrary $y_k$ to the output tape of $p$. Let $P$ be the set of
all those modified programs.
\item Start first cycle: $k := 1$.
\item\label{pbestloop} Run every $p \in P$ on extended input
$\hh y\!\hh x_{<k}$, where all outputs are redirected to some
auxiliary tape: $p(\hh y\!\hh x_{<k}) = w_1^p y_1^p...w_k^p y_k^p$.
This step is performed incrementally by adding $\hh y_{k-1}\hh
x_{k-1}$ for $k > 1$ to the input tape and continuing the
computation of the previous cycle.
\item Select the program $p$ with highest claimed reward $w_k^p$:
$p_k^\best := \arg\max_pw_k^p$.
\item Write $\hh y_k := y_k^{p_k^\best}$ to the output tape.
\item Receive input $\hh x_k$ from the environment.
\item Begin next cycle: $k := k + 1$, goto step
\ref{pbestloop}.
\end{enumerate}
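The per-cycle selection loop above can be sketched as follows; the proof search, program modification, and time bound $\tilde t$ are omitted, and the candidate programs are invented stubs that each claim a reward estimate alongside their action:

```python
# Sketch of the per-cycle selection of AIXItl: every retained program
# claims a reward estimate w_k and an action y_k; the action of the
# program with the highest claim is executed each cycle.

def prog_optimist(history):
    return 0.9, 1                      # (claimed reward w_k, action y_k)

def prog_cautious(history):
    return 0.5, 0

P = [prog_optimist, prog_cautious]     # surviving, modified programs

def environment(history, y_k):
    return 1 if y_k == 1 else 0        # toy environment input x_k

history = []
for k in range(1, 4):                  # cycles
    claims = [p(history) for p in P]
    w_k, y_k = max(claims)             # highest claimed reward wins the vote
    x_k = environment(history, y_k)
    history.append((y_k, x_k))
```

In the real construction the claims are trustworthy because only programs with a proof of VA($p$) survive the setup phase, which is what justifies executing the boldest claim.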
\noindent It is easy to see that the following theorem holds.
\indxs{optimality}{AIXI$tl $}
%------------------------------%
\ftheorem{thAIXItl}{Optimality of AIXI${tl}$}{
%------------------------------%
Let $p$ be any extended chronological (incremental) program like
(\ref{extprog}) of length $\l(p) \leq \tilde l$ and computation
time per cycle $t(p) \leq \tilde t$, for which there exists a
proof of VA($p$) defined in (\ref{vadef}) of length $\leq l_P$.
The algorithm $p^\best$ constructed in the last paragraph,
which depends on $\tilde l$, $\tilde t$ and $l_P$ but not on $p$, is
effectively more or equally intelligent, according to $\succeq^c$
(see Definition~\ref{effaiord}), than any such $p$. The size of
$p^\best$ is $\l(p^\best) = O(\log(\tilde l \cdot \tilde
t \cdot l_P))$, the setup-time is
$t_{setup}(p^\best) = O(l_P^2 \cdot 2^{l_P})$ and the computation
time per cycle is $t_{cycle}(p^\best) = O(2^{\tilde
l} \cdot \tilde t)$.
}%------------------------------%
\noindent Roughly speaking, the theorem says that if there exists
a computable solution to some or all AI problems at all, the
explicitly constructed algorithm $p^\best$ is such a solution.
Although this theorem is quite general, there are some limitations
and open questions that we discuss in the next subsection.
The construction of the algorithm $p^*$ requires the specification
of a formal logic system
$(\forall,\lambda,y_i,c_i,f_i,R_i,\rightarrow,\wedge,=,...)$,
together with axioms and inference rules. A proof is a sequence of
formulas, where each formula is either an axiom or inferred from
previous formulas in the sequence by applying the inference rules.
Details of a related construction can be found in
\cite{Hutter:01fast}, %Gallier:86
or in any textbook on logic or proof theory, e.g.\
\cite{Fitting:96,Shoenfield:67}. We only need to know that {\em
provability} and {\em Turing Machines} can be formalized. The
setup time in the theorem is just the time needed to check all
$2^{l_P}$ candidate proofs, each check needing time $O(l_P^2)$.
%------------------------------%
\subsection{Limitations and Open Questions}
%------------------------------%
\begin{itemize}\parskip=0ex\parsep=0ex%\itemsep=0ex
\item Formally, the total computation time of $p^\best$ for cycles
$1...k$ increases linearly with $k$, i.e.\ is of order $O(k)$ with
a coefficient $2^{\tilde l} \cdot \tilde t$. The unreasonably
large factor $2^{\tilde l}$ is a well-known drawback of
best/democratic vote models and is accepted here without further
comment, whereas the factor ${\tilde t}$ can be assumed to be of
reasonable size. If we do not take the limit $k \to \infty$ but
consider reasonable $k$, the practical significance of the time bound
on $p^\best$ is somewhat limited due to the additional additive
constant $O(l_P^2 \cdot 2^{l_P})$. It is much larger than $k \cdot
2^{\tilde l} \cdot \tilde t$ as typically $l_P \gg \l($VA$(p))
\geq \l(p) \equiv \tilde l$.
\item\index{value!justification} $p^\best$ is superior only
to those $p$ that justify their
outputs (by large $w_k^p$). It might be possible that there are
$p$ that produce good outputs $y_k^p$ within reasonable time, but
it takes an unreasonably long time to justify their outputs by
sufficiently high $w_k^p$. We do not think that (from a certain
complexity level onwards) there are policies where the process of
constructing a good output is completely separated from some sort
of justification process. But this justification might not be
translatable (at least within reasonable time) into a reasonable
estimate of $V_{km_k}^{p\xi}$.
\item The (inconsistent) programs $p$ must be able to continue
strategies started by other policies. It might happen that a
policy $p$ steers the environment in a direction for which $p$ is
specialized. A ``foreign'' policy might be able to displace $p$
only between loosely connected episodes. There is probably no
problem for factorizable $\mu$. Think of a chess game, where it is
usually very difficult to continue the game or strategy of a
different player. When the game is over, it is usually advantageous
to replace a player by a better one for the next game. There might
also be no problem for sufficiently separable $\mu$.
\item There might be (efficient) valid approximations $p$ for which
VA($p$) is true but not provable, or for which only a very long
($> l_P$) proof exists.
\end{itemize}
%------------------------------%
\subsection{Remarks}
%------------------------------%
\begin{itemize}\parskip=0ex\parsep=0ex%\itemsep=0ex
\item The idea of suggesting outputs and justifying them by proving
reward bounds implements one aspect of human thinking. There are
several possible reactions to an input. Each reaction possibly has
far-reaching consequences. Within a limited time one tries to estimate the
consequences as well as possible. Finally,
each reaction is evaluated, and the best one is selected. What
is inferior to human thinking is that the estimates $w_k^p$ must
be rigorously proved, that the proofs are constructed by blind
exhaustive search, and that {\it all} behaviors $p$ of length
$\leq \tilde l$ are checked. It is inferior ``only'' in the sense of
necessary computation time but not in the sense of the quality of
the outputs.
\item In practical applications there are often short but slow
programs $p_s$ performing some task $T$, e.g.\
the computation of the digits of $\pi$, for which there also
exist long but fast programs $p_l$. If it is not too difficult to
prove that this long program is equivalent to the short one, then it is
possible to prove $K^{t(p_l)}(T) \leqa \l(p_s)$
with $K^t$ being the time-bounded Kolmogorov complexity.
Similarly, the method of proving bounds $w_k$ for $V_{km_k}$ can
give high lower bounds without explicitly executing these short
and slow programs, which mainly contribute to $V_{km_k}$.
\item Dovetailing all length- and time-limited programs is a
well-known elementary idea (e.g.\ typing monkeys). The crucial
part developed here is the selection criterion for the most
intelligent agent.
\item The construction of AIXI$\tilde t\tilde l$ and the
enumerability of $V_{km_k}$ ensure arbitrarily close
approximations of $V_{km_k}$, hence we expect that the behavior of
AIXI$\tilde t\tilde l$ converges to the behavior of AI$\xi$
in the limit $\tilde t,\tilde l,l_P\to\infty$, in some sense.
\item Depending on what you know or assume a program $p$ of size
$\tilde l$ and computation time per cycle $\tilde t$ is able to
achieve, the computable AIXI$\tilde t\tilde l$ model will have
the same capabilities. Under the strongest assumption, the
existence of a Turing machine that outperforms human intelligence,
AIXI$\tilde t\tilde l$ will do so too, within the same time frame
up to an (unfortunately very large) constant factor.
\end{itemize}
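The dovetailing idea mentioned in the remarks above can be made concrete with a minimal sketch, in which programs are modeled as Python generators (a toy stand-in for an enumeration of Turing machines); the name `dovetail` and the generator interface are assumptions of this sketch.

```python
def dovetail(programs):
    """Dovetail countably many programs: in round n, start program n
    and advance every already-started program by one step.
    `programs` is an iterable of argument-less generator functions;
    yields (i, value) whenever program i completes a step."""
    started = []
    for n, make_prog in enumerate(programs):
        started.append((n, make_prog()))
        for i, gen in started:
            try:
                yield i, next(gen)
            except StopIteration:
                pass  # program i has halted; keep the others running
```

With an unbounded supply of programs, every program is eventually started and then advanced once per round, so each one executes arbitrarily many steps; what this elementary schedule lacks, and what the construction above supplies, is a criterion for selecting among the outputs.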
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}\label{chDisc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This section reviews what has been achieved in the article and
discusses some otherwise unmentioned topics of general interest.
We remark on various topics, including
concurrent actions and perceptions, the choice of the I/O spaces,
treatment of encrypted information, and peculiarities of mortal
embodied agents. We continue with an outlook on further research.
Since many ideas have already been presented in the various
sections, we concentrate on nontechnical open questions of
general importance, including optimality, down-scaling,
implementation, approximation, elegance, extra knowledge, and
training of/for AIXI($tl$). We also include some (personal)
remarks on non-computable physics, the number of wisdom $\Omega$,
and consciousness. As it should be, the article concludes with
conclusions.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{General Remarks}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%------------------------------%
\paragraph{Game theory}
%------------------------------%
\index{game theory}%
\index{theory|see{particular theory}}%
\index{concurrent!actions and perceptions}%
\index{actions!concurrent}%
\index{perceptions!concurrent}%
In game theory \cite{Osborne:94} one often wants to model the
situation of simultaneous actions, whereas the AI$\xi$ models have
serial I/O. Simultaneity can be simulated by withholding the
agent's current output $y_k$ from the environment until $x_k$ has
been received by the agent. Formally, this means that
$\mu(y\!x_{