%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Open Problems in Universal Induction & Intelligence %%
%% Marcus Hutter, Start: February 2009 %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[12pt,twoside]{article}
\usepackage{latexsym}
\newdimen\paravsp \paravsp=1.3ex
\topmargin=-10mm \oddsidemargin=5mm \evensidemargin=5mm
\textwidth=15cm \textheight=22cm
\sloppy\lineskip=0pt
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\def\,{\mskip 3mu} \def\>{\mskip 4mu plus 2mu minus 4mu} \def\;{\mskip 5mu plus 5mu} \def\!{\mskip-3mu}
\def\dispmuskip{\thinmuskip= 3mu plus 0mu minus 2mu \medmuskip= 4mu plus 2mu minus 2mu \thickmuskip=5mu plus 5mu minus 2mu}
\def\textmuskip{\thinmuskip= 0mu \medmuskip= 1mu plus 1mu minus 1mu \thickmuskip=2mu plus 3mu minus 1mu}
\textmuskip
\def\beqn{\dispmuskip\begin{displaymath}}\def\eeqn{\end{displaymath}\textmuskip}
\def\abstract#1{\centerline{\bf\small
Abstract}\begin{quote}\vspace{-1ex}\small #1 \par\end{quote}\vskip 1ex}
\def\keywords#1{\centerline{\bf\small
Keywords}\begin{quote}\vspace{-1ex}\small #1 \par\end{quote}\vskip 1ex}
\def\paradot#1{\vspace{\paravsp plus 0.5\paravsp minus 0.5\paravsp}\noindent{\bf\boldmath{#1.}}}
\def\paranodot#1{\vspace{\paravsp plus 0.5\paravsp minus 0.5\paravsp}\noindent{\bf\boldmath{#1}}}
\def\req#1{(\ref{#1})}
\def\SetR{I\!\!R}
\def\SetN{I\!\!N}
\def\itb{\bf} % bold italics item headers
\def\eps{\varepsilon} % for small positive number
\def\nq{\hspace{-1em}}
\def\l{{\ell}} % length of string or program
\def\M{{\cal M}} % Set of prob. distributions
\def\X{{\cal X}} % sequence input set/alphabet
\def\A{{\cal A}} % input/perception set/alphabet
\def\O{{\cal O}} % output/action set/alphabet
% Open Problem Labels
\def\IZP{\ref{secInduct}\ref{IZP}}
\def\IBR{\ref{secInduct}\ref{IBR}}
\def\IGRUE{\ref{secInduct}\ref{IGRUE}}
\def\IRI{\ref{secInduct}\ref{IRI}}
\def\IOEUP{\ref{secInduct}\ref{IOEUP}}
\def\IOP{\ref{secInduct}\ref{IOP}}
\def\PSB{\ref{secPredict}\ref{PSB}}
\def\PNTM{\ref{secPredict}\ref{PNTM}}
\def\PMLC{\ref{secPredict}\ref{PMLC}}
\def\PGMCC{\ref{secPredict}\ref{PGMCC}}
\def\PLCBDM{\ref{secPredict}\ref{PLCBDM}}
\def\PAIXI{\ref{secPredict}\ref{PAIXI}}
\def\OMUO{\ref{secOptim}\ref{OMUO}}
\def\OLEC{\ref{secOptim}\ref{OLEC}}
\def\OGBM{\ref{secOptim}\ref{OGBM}}
\def\OIA{\ref{secOptim}\ref{OIA}}
\def\SAIXI{\ref{secStruct}\ref{SAIXI}}
\def\SPD{\ref{secStruct}\ref{SPD}}
\def\SAEA{\ref{secStruct}\ref{SAEA}}
\def\SARV{\ref{secStruct}\ref{SARV}}
\def\CPCUH{\ref{secConv}\ref{CPCUH}}
\def\CRL{\ref{secConv}\ref{CRL}}
\def\CRNCE{\ref{secConv}\ref{CRNCE}}
\def\CGT{\ref{secConv}\ref{CGT}}
\def\CPOST{\ref{secConv}\ref{CPOST}}
\def\DGSPM{\ref{secDefine}\ref{DGSPM}}
\def\DPPM{\ref{secDefine}\ref{DPPM}}
\def\DEE{\ref{secDefine}\ref{DEE}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% T i t l e - P a g e %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\title{\vspace{-4ex}
\vskip 2mm\bf\Large\hrule height5pt \vskip 4mm
Open Problems in Universal \\ Induction \& \iffalse Artificial \fi Intelligence
\vskip 4mm \hrule height2pt}
\author{{\bf Marcus Hutter}\\[3mm]
\normalsize RSISE$\,$@$\,$ANU and SML$\,$@$\,$NICTA \\
\normalsize Canberra, ACT, 0200, Australia \\
\normalsize \texttt{marcus@hutter1.net \ \ www.hutter1.net}
}
\date{21 June 2009}
\maketitle
\vspace*{-4ex}
\abstract{
Specialized intelligent systems can be found everywhere: fingerprint,
handwriting, speech, and face recognition, spam filtering,
chess and other game programs, robots, etc. In this decade the first
presumably complete {\em mathematical} theory of artificial
intelligence based on universal induction-prediction-decision-action
has been proposed. This information-theoretic approach solidifies
the foundations of inductive inference and artificial intelligence.
Getting the foundations right usually marks significant progress
and the maturing of a field. The theory provides a gold standard and
guidance for researchers working on intelligent algorithms. The
roots of universal induction were laid exactly half a century
ago, and the roots of universal intelligence exactly one decade ago.
So it is timely to take stock of what has been achieved and what
remains to be done. Since there are already good recent surveys, I
describe the state-of-the-art only in passing and refer the reader
to the literature. This article concentrates on the open problems in
universal induction and its extension to universal intelligence.
%
\def\contentsname{\centering\normalsize Contents}
{\parskip=-2.7ex\tableofcontents}\vspace*{-4ex}
}
\keywords{
Kolmogorov complexity;
information theory;
sequential decision theory;
reinforcement learning;
artificial intelligence;
universal Solomonoff induction;
rational agents.
}
\begin{quote}\it
``The mathematician is by now accustomed to intractable equations,
and even to unsolved problems, in many parts of his discipline.
However, it is still a matter of some fascination to realize
that there are parts of mathematics where the very construction
of a precise mathematical statement of a verbal problem is
itself a problem of major difficulty.'' \par
\hfill --- {\sl Richard Bellman, Adaptive Control Processes (1961) p.194}
\end{quote}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{secIntro}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%------------------------------%
%\paradot{Models}
%------------------------------%
What is a good model of the weather changes? %
Are there useful models of the world economy? %
What is the true regularity behind the number sequence 1,4,9,16,...? %
What is the correct relationship between mass, force,
and acceleration of a physical object? %
Is there a causal relation between interest rates and inflation? %
Are models of the stock market purely descriptive or do they have any predictive power? %
%------------------------------%
\paradot{Induction}
%------------------------------%
The questions above look like a set of unrelated inquiries. What
they have in common is that they seem to be amenable to scientific
investigation. They all ask about a model for or relation between
observations. The purpose seems to be to explain or understand the
data. Generalizing from data to general rules is called {\em inductive
inference}, a core problem in philosophy
\cite{Hume:1739,Popper:34,Howson:03} and a key task of science
\cite{Levi:74,Earman:93,Wallace:05}.
But why do or should we care about modeling the world? Because this
is what science is about \cite{Salmon:06}? As indicated above,
models should be good, useful, true, correct, causal, predictive, or
descriptive \cite{Frigg:06}. Digging deeper, we see that models are
mostly used for prediction in related but new situations, especially
for predicting future events \cite{Wikipedia:08pm}.
%------------------------------%
\paradot{Predictions}
%------------------------------%
Consider the following, apparently only slight, variations of the questions above: %
What is the correct answer in an IQ test asking to
continue the sequence 1,4,9,16,...? %
Given historic stock-charts, can one predict the quotes of tomorrow? %
Or questions like: Assuming the sun rose every day for 5000 years,
how likely is doomsday (that the sun will not rise) tomorrow? %
What is my risk of dying from cancer next year? %
These questions are instances of the important problem of
time-series forecasting, also called sequence prediction
\cite{Brockwell:02,Cesa:06}.
While inductive inference is about finding models or hypotheses that
{\em explain} the data (whatever {\em explain} actually means),
{\em prediction} is concerned with forecasting the future.
Finding models is interesting and
useful, since they usually help us to (partially) answer such {\em
predictive} questions \cite{Geisser:93,Chatfield:03}.
While the usefulness of predictions is clearer to the layman than
the purpose of scientific inquiry into models, one may again ask
why we do or should care about making predictions.
%------------------------------%
\paradot{Decisions}
%------------------------------%
Consider the following questions: Shall I take my umbrella or wear
sunglasses today? Shall I invest my assets in stocks or bonds? Shall
I skip work today because it might be my last day on earth? Shall I
irradiate or remove the tumor of my patient? These questions ask for
decisions that have some (minor to drastic) consequences. We usually
want to make ``good'' decisions, where the quality is measured in
terms of some reward (money, life expectancy) or loss
\cite{Ferguson:67,DeGroot:70,Jeffrey:83}. In order to compute this
reward as a function of our decision, we need to predict the
environment: whether there will be rain or sunshine today, whether
the market will go up or down, whether doomsday is tomorrow, or
which type of cancer the patient has. Often forecasts are uncertain
\cite{Paris:95}, but this is still better than no prediction.
Once we have arrived at a (hopefully good) decision, what do we do next?
%------------------------------%
\paradot{Actions}
%------------------------------%
The obvious thing is to execute the decision, i.e.\ to perform some
action consistent with the decision arrived at. The action may not
influence the environment: taking an umbrella versus sunglasses
does not influence the future weather (ignoring the butterfly
effect), nor do small stock trades move the market. These settings are called passive
\cite{Hutter:03optisp}, and the action part is of marginal
importance and usually not discussed. On the other hand, a patient
might die from a wrong treatment, or a chess player may lose a piece
and possibly the whole game by making one mistake. These settings
are called (re)active \cite{Hutter:07aixigentle}, and their analysis is
immensely more involved than the passive case \cite{Bertsekas:06}.
%------------------------------%
\paranodot{And now?}
%------------------------------%
There are many theories and algorithms and whole research fields and
communities dealing with some aspects of induction, prediction,
decision, or action. Some of them will be detailed below.
%
Finding solutions for every particular (new) problem is possible and
useful for many specific applications. The trouble is that this approach
is cumbersome and prone to disagreement or contradiction
\cite{Kemp:03}. Some researchers feel that this is the nature of
their discipline and one can do little about it \cite{Kellert:06}.
But in science (in particular math, physics, and computer science)
previously separate approaches are constantly being unified towards
more and more powerful theories and
algorithms \cite{Green:87,Greene:00}.
%
There is at least one field, where we {\em must} put everything
(induction+prediction+decision+action) together in a completely
formal (preferably elegant) way, namely Artificial Intelligence
\cite{Russell:03}. Such a general and formal theory of AI was
invented about a decade ago \cite{Hutter:00kcunai}.
%------------------------------%
\paradot{Contents}
%------------------------------%
% Section UAI
In {\em Section \ref{secUAI}} I will give a brief introduction to this
universal theory of AI. It is based on an unexpected
unification of algorithmic information theory
and sequential decision theory. The corresponding AIXI agent is the
first sound, complete, general, rational agent in any relevant but
unknown environment with reinforcement feedback
\cite{Hutter:04uaibook,Oates:06}. It is likely the best possible
such agent in a sense to be explained below.
% Section Hist & State-of-the-Art
{\em Section \ref{secHist}} describes the historic origin of the
AIXI model. One root is Solomonoff's theory \cite{Solomonoff:60} of
{\em universal induction}, which is closely connected to algorithmic
complexity. The other root is Bellman's {\em adaptive control
theory} \cite{Bellman:57} for optimal {\em sequential decision}
making. Both theories are now half-a-century old. From an
algorithmic information theory perspective, AIXI generalizes optimal
passive universal induction to the case of active agents. From a
decision-theoretic perspective, AIXI is a universal Bayes-optimal
learning algorithm.
% Open Problems ...
{\em Sections \ref{secInduct}--\ref{secDefine}} constitute the core
of this article describing the open problems around universal
induction \& intelligence. Most of them are taken from the book
\cite{Hutter:04uaibook} and paper \cite{Hutter:07uspx}. I focus on
questions whose solution has a realistic chance of advancing the
field. I avoid technical open problems whose global significance is
questionable.
% ... in Universal Induction. Section \ref{secInduct}
Solomonoff's half-a-century-old theory of universal induction is
already well developed. Naturally, most remaining open
problems are either philosophically or technically deep.
% ... reg. Optimality of AIXI. Section \ref{secOptim}
Its generalization to Universal Artificial Intelligence seems to
be quite intricate. While the AIXI model itself is very elegant,
its analysis is much more cumbersome. Although AIXI has been
shown to be optimal in some senses, a convincing notion
of optimality is still lacking. Convergence results also exist, but
are much weaker than in the passive case.
% ... reg. Uniqueness of AIXI. Section \ref{secUnique}
Its construction makes it plausible that AIXI is the optimal
rational general learning agent, but unlike the induction case,
victory cannot be claimed yet. It would hence be natural to
compare AIXI to alternatives, if there were any. Since there are
no competitors yet, one could try to create some. Finally, AIXI is only
``essentially'' unique, which gives rise to some more open
questions.
% ... in Defining Intelligence. Section \ref{secDefine}
Given that AI is about designing intelligent systems, a serious
attempt should be made to {\em formally} define intelligence in the
first place. Astonishingly there have been not too many
attempts. There is one definition that is closely related to AIXI,
but its properties have yet to be explored.
The final {\em Section \ref{secConc}} briefly discusses the flavor,
feasibility, difficulty, and interestingness of the raised
questions, and takes a step back and briefly compares the
information-theoretic approach to AI discussed in this article to
others.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Universal Artificial Intelligence}\label{secUAI}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%------------------------------%
\paradot{Artificial Intelligence}
%------------------------------%
The science of artificial intelligence (AI) may be defined as the
construction of intelligent systems ({\em artificial
agents}) and their analysis \cite{Russell:03}. A natural definition
of a {\em system} is anything that has an input and an output
stream, or equivalently an agent that acts and observes.
%
{\em Intelligence} is more complicated. It can have many faces like
{\em creativity}, {\em solving problems}, {\em pattern recognition},
{\em classification}, {\em learning}, {\em induction}, {\em
deduction}, {\em building analogies}, {\em optimization}, {\em
surviving in an environment}, {\em language processing}, {\em
planning}, and {\em knowledge acquisition and processing}.
%
Informally, {\em AI is concerned with developing agents that
perform well in a large range of environments} \cite{Hutter:07iorx}.
%
A formal definition incorporating every aspect of intelligence,
however, seems difficult.
%
In order to solve this problem we need to solve the induction,
prediction, decision, and action problem, which seems like
a daunting (some even claim impossible) task:
%
Intelligent {\em actions} are based on informed {\em decisions}.
Attaining good decisions requires {\em predictions} which are
typically based on models of the environments. Models are
constructed or learned from past observations via {\em
induction}.
%
Fortunately, based on the {\em deep philosophical insights} and
{\em powerful mathematical developments} listed in Section
\ref{secHist}, these problems have been overcome, at least in
theory.
%------------------------------%
\paradot{Universal Artificial Intelligence (UAI)}
%------------------------------%
Most, if not all, known facets of intelligence can be formulated as
goal driven or, more precisely, as maximizing some reward or utility
function. It is, therefore, sufficient to study goal-driven AI;
e.g.\ the (biological) goal of animals and humans is to survive and
spread. The goal of AI systems should be to be useful to humans. The
problem is that, except for special cases, we know neither the
utility function nor the environment in which the agent will operate
in advance.
%
What do we need (from a mathematical point of view) to construct a
universal optimal learning agent interacting with an arbitrary
unknown environment? The theory, coined {\em AIXI}, developed in
this decade and explained in \cite{Hutter:04uaibook} says: {\em All
you need is} {\em Occam} \cite{Franklin:02}, {\em Epicurus}
\cite{Asmis:84}, {\em Turing} \cite{Turing:36}, {\em Bayes}
\cite{Bayes:1763}, {\em Solomonoff} \cite{Solomonoff:64}, {\em
Kolmogorov} \cite{Kolmogorov:65}, and {\em Bellman}
\cite{Bellman:57}:
%
Sequential decision theory \cite{Bertsekas:06} ({\em Bellman}'s
equation) formally solves the problem of rational agents in
uncertain worlds if the true environmental probability distribution
is known. If the environment is unknown, {\em Bayes}ians
\cite{Berger:93} replace the true distribution by a weighted mixture
of distributions from some (hypothesis) class. Using the large class
of all (semi)measures that are (semi)computable on a {\em Turing}
machine bears in mind {\em Epicurus}, who teaches not to discard
any (consistent) hypothesis. In order not to ignore {\em Occam},
who would select the simplest hypothesis, {\em Solomonoff} defined
a universal prior that assigns high/low prior weight to
simple/complex environments, where {\em Kolmogorov} quantifies
complexity \cite{Hutter:07ait,Li:08}.
%
All other concepts and phenomena attributed to intelligence are emergent.
Altogether, this solves all conceptual problems
\cite{Hutter:04uaibook}, and ``only'' computational problems remain.
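To make the {\em Bellman} ingredient concrete, the following Python sketch runs value iteration for a known environment. It is purely illustrative and not from the original text: the two-state, two-action MDP, its transition probabilities, and its rewards are all invented for this example.

```python
# Value iteration solving Bellman's equation
#   V(s) = max_a [ r(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
# for a tiny *hypothetical* 2-state, 2-action MDP with known dynamics.
# All transition probabilities and rewards are illustrative.

P = {  # P[(s, a)] = list of (next_state, probability) pairs
    (0, 0): [(0, 0.9), (1, 0.1)],
    (0, 1): [(1, 1.0)],
    (1, 0): [(0, 1.0)],
    (1, 1): [(1, 0.5), (0, 0.5)],
}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}  # r(s,a)
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0}
for _ in range(1000):  # iterate the Bellman backup to (near) convergence
    V = {s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                for a in (0, 1))
         for s in (0, 1)}
```

This works only because the true distribution $P$ is known; when it is not, the Bayesian mixture described in the text replaces it.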
%------------------------------%
\paradot{Kolmogorov complexity}
%------------------------------%
Kolmogorov \cite{Kolmogorov:65} defined the complexity of a string
$x\in\X^*$ over some finite alphabet $\X$ as the length $\ell$ of a
shortest description $p\in\{0,1\}^*$ on a universal Turing machine $U$:
\beqn
\mbox{Kolmogorov complexity:}\qquad K(x):=\min_p\{\ell(p):U(p)=x\}
\eeqn
A string is simple if it can be described by a short program, like
``the string of one million ones'', and is complex if there is no
such short description, like for a random string whose shortest
description is specifying it bit-by-bit.
%
For non-string objects $o$ one defines
$K(o):=K(\langle o\rangle)$, where $\langle o\rangle\in\X^*$ is
some standard code for $o$.
%
Kolmogorov complexity \cite{Kolmogorov:65,Hutter:08kolmo} is a key
concept in (algorithmic) information theory \cite{Li:08}. An
important property of $K$ is that it is nearly independent of the
choice of $U$, i.e.\ different choices of $U$ change $K$ ``only'' by
an additive constant (see Section \PNTM). Furthermore, it leads to
shorter codes than any other effective code. $K$ shares many
properties with Shannon's entropy (information measure) $S$
\cite{MacKay:03,Cover:06}, but $K$ is superior to $S$ in many
respects. Foremost, $K$ measures the information of individual
outcomes, while $S$ can only measure expected information of random
variables. To be brief, $K$ is an excellent universal complexity
measure, suitable for quantifying Occam's razor. The major drawback
of $K$ as a complexity measure is its incomputability. So in practical
applications it always has to be approximated, e.g.\ by Lempel-Ziv
compression \cite{Lempel:76,Cilibrasi:05}, or by CTW
\cite{Willems:97} compression, or by using two-part codes like in
MDL and MML, or by others.
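The compression-based approximation just mentioned can be illustrated in a few lines of Python. This is a sketch of my own (the helper name \texttt{K\_approx} is hypothetical): zlib's LZ77-based compressor yields a crude upper bound on $K(x)$, enough to separate the ``one million ones'' string from a random one.

```python
import os
import zlib

def K_approx(x: bytes) -> int:
    """Crude upper bound on K(x): length in bits of the
    zlib-compressed (LZ77-based) encoding of x."""
    return 8 * len(zlib.compress(x, 9))

simple = b"1" * 1_000_000        # "the string of one million ones"
random_ = os.urandom(1_000_000)  # incompressible with high probability

# The simple string admits a very short description, so K_approx(simple)
# is tiny; the random string essentially has to be specified
# bit-by-bit, so K_approx(random_) stays near 8 * 10^6 bits.
```

Like all practical surrogates for $K$, this is only an upper bound: zlib finds repetition, not arbitrary computable regularity.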
%------------------------------%
\paradot{Solomonoff induction}
%------------------------------%
Solomonoff \cite{Solomonoff:64} defined (earlier) the closely
related universal a priori probability $M(x)$ as the probability
that the output of a universal (monotone) Turing machine $U$ starts
with $x$ when provided with fair coin flips on the input tape
\cite{Hutter:07algprob}. Formally,
\beqn
\mbox{Solomonoff prior:}\qquad\qquad M(x):=\sum_{p:U(p)=x*}2^{-\ell(p)},
\eeqn
where the sum is over all (possibly non-halting) so-called minimal
programs $p$ which output a string starting with $x$. Since the sum
is dominated by short programs, we have $M(x)\approx 2^{-K(x)}$
(formally $-\log M(x)=K(x)+O(\log\l(x))$), i.e.\ simple/complex
strings are assigned a high/low a-priori probability.
A different representation is as follows \cite{Zvonkin:70}:
%
Let $\M=\{\nu\}$ be a countable class of probability measures $\nu$
(environments) on infinite sequences $\X^\infty$, $\mu\in\M$ be the
true sampling distribution, i.e.\ $\mu(x)$ is the true probability
that an infinite sequence starts with $x$, and
$\xi_\M(x):=\sum_{\nu\in\M} w_\nu\nu(x)$ be the $w$-weighted average
called Bayesian mixture distribution. One can show that
$M(x)=\xi_{\M_U}(x)$, where $\M_U$ includes all computable
probability measures and $w_\nu=2^{-K(\nu)}$. More precisely,
$\M_U:=\{\nu_1,\nu_2,...\}$ consists of an effective enumeration of
all so-called lower semi-computable semi-measures $\nu_i$, and
$K(\nu_i):=K(i):=K(\langle i\rangle)$ \cite{Li:08}.
$M$ can be used as a universal sequence predictor, which, in a
strong sense, outperforms all other predictors.
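The mixture $\xi_\M(x)=\sum_\nu w_\nu\nu(x)$ can be made concrete with a toy computable stand-in for $\M_U$. In the Python sketch below (my illustration, not from the text), the class contains just three Bernoulli environments with uniform weights, whereas the real theory uses all lower semi-computable semi-measures with weights $2^{-K(\nu)}$.

```python
# Toy Bayesian mixture xi(x) = sum_nu w_nu * nu(x) over a finite,
# computable class -- a stand-in for Solomonoff's incomputable M.
# The three Bernoulli environments and uniform weights are illustrative.
from fractions import Fraction

def bernoulli(theta):
    """Measure nu_theta(x): probability that a sequence starts with x."""
    def nu(x):
        p = Fraction(1)
        for bit in x:
            p *= theta if bit == 1 else 1 - theta
        return p
    return nu

thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
envs = [bernoulli(t) for t in thetas]
w = [Fraction(1, 3)] * 3  # prior weights w_nu

def xi(x):
    return sum(wi * nu(x) for wi, nu in zip(w, envs))

def predict_one(x):
    """Predictive probability xi(1 | x) = xi(x1) / xi(x)."""
    return xi(x + [1]) / xi(x)

# Before any data the mixture predicts 1 with probability 1/2; after
# observing twenty 1s it has concentrated near the theta = 3/4
# environment and predicts 1 with probability close to (below) 3/4.
```

The same posterior-reweighting mechanism, applied to the universal class $\M_U$, is what gives $M$ its strong prediction guarantees.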
%
Consider the classical online sequence prediction task: Given
$x_{