0\ \forall k$ then $x$ is not rejectable.
\end{enumerate}
\end{definition}
\begin{theorem}[Linear separation]\label{thm:LF}
Suppose that we have a space of contracts $X$ that is a Banach
sequence space. Given a rational decision maker there is a positive
continuous linear functional $f:X\to\mathbb{R}$ such that
\begin{equation}
\{x\ |\ f(x)> 0\}\subset Z\subseteq\{x\ |\ f(x)\geq 0\}.
\end{equation}
\end{theorem}
\begin{proof} The third property tells us that $Z$ and $-Z$ are convex
cones. The second and fourth properties tell us that $Z\neq X$.
Suppose that there is a point $x$ that lies in the interior of both
$Z$ and $-Z$. Then the same is true for $-x$ according to the second
property and, by convexity, for the origin. If a ball around the
origin lies in $Z$ then, since $Z$ is a cone, $Z=X$, which is not
true. Thus the interiors of $Z$ and $-Z$ are disjoint open convex
sets and can, therefore, be separated by a hyperplane (according to
the Hahn--Banach theorem) which goes through the origin (since
according to the second and fourth properties the origin is both
acceptable and rejectable). The first two properties tell us that
$Z\cup -Z=X$. Given a separating hyperplane (between the interiors
of $Z$ and $-Z$), $Z$ must contain everything on one side. This
means that $Z$ is a half space whose boundary is a hyperplane that
goes through the origin, and the closure $\bar{Z}$ of $Z$ is a
closed half space that can be written as $\{x\ |\ f(x)\geq 0\}$ for
some $f\in X'$. The fourth property tells us that $f$ is positive.
\end{proof}
\begin{corollary}[Additivity]\label{cor:add}
\begin{enumerate}
\item If $X=c_0$ then a rational decision maker is described by a
countably additive (probability) measure.
\item If $X=\ell^\infty$ then a rational decision maker is described by a
finitely additive (probability) measure.
\end{enumerate}
\end{corollary}
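The following is a brief sketch, not a full proof, of why the case
$X=c_0$ yields countable additivity. It assumes the standard
identification $(c_0)'=\ell^1$ and that $f\neq 0$; the symbols $e_i$
(the $i$-th unit sequence) and $q_i$ are introduced here only for
illustration. Theorem \ref{thm:LF} gives a positive $f\in(c_0)'$, so
that
$$
f(x)=\sum_{i=1}^\infty p_ix_i,\qquad p_i=f(e_i)\geq 0,\qquad
\sum_{i=1}^\infty p_i=\|f\|<\infty.
$$
The normalized weights $q_i=p_i/\sum_j p_j$ sum to one and define the
same half spaces, since $f(x)>0$ if and only if $\sum_i q_ix_i>0$.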
Corollary \ref{cor:add} suggests that we pay for expanding the space
of contracts from $c_0$ to $\ell^\infty$ by losing countable
additivity. We can, however, expand the space even further by
looking at $c_0(w)$ with $w_k\to 0$, which contains $\ell^\infty$
and whose dual $X'$ is $\ell^1((1/w_k))$. We then get countable
additivity back, but at the price of a restriction on how fast the
probabilities $p_k$ must tend to $0$. Note that a bounded linear
functional on $c_0$ can always be extended to a bounded linear
functional on $\ell^\infty$ by the formula $f(x)=\sum p_ix_i$, but
this is not the unique extension. Note also that every bounded
linear functional on $\ell^\infty$ can be restricted to $c_0$ and
there be represented as $f(x)=\sum p_ix_i$. Therefore, a rational
decision maker on $\ell^\infty$ contracts has probabilistic beliefs
(unless $p_i=0\ \forall i$), though it might also take the asymptotic
behavior of a contract into account. An example with $p_i=0\
\forall i$ is the decision maker that bases its decisions on
asymptotic averages $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n x_i$
when they exist; that strategy can be extended to all of
$\ell^\infty$ (a Banach limit). The following proposition will help
us decide which decision makers on $\ell^\infty$ are described by
countably additive probabilities.
\begin{proposition}\label{p:approx}
Suppose that $f\in(\ell^\infty)'$. For any $x\in\ell^\infty$,
let $x^j_i=x_i$ if $i\leq j$ and $x^j_i=0$ otherwise. If for
any $x$,
$$
\lim_{j\to\infty} f(x^j)=f(x),
$$
then $f$ can be written as $f(x)=\sum p_ix_i$ where $p_i\geq 0$
and $\sum_{i=1}^\infty p_i<\infty$.
\end{proposition}
\begin{proof}
The restriction of $f$ to $c_0$ gives us numbers $p_i\geq 0$ such
that $\sum_{i=1}^\infty p_i<\infty$ and $f(x)=\sum p_ix_i$ for $x\in
c_0$. This means that $f(x^j)=\sum_{i=1}^j p_ix_i$ for any
$x\in\ell^\infty$ and $j<\infty$. Thus $\lim_{j\to\infty}
f(x^j)=\sum_{i=1}^\infty p_ix_i$.
\end{proof}
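As an illustration of the hypothesis in Proposition \ref{p:approx}
(a sketch with weights chosen here purely as an example), take
$f(x)=\sum_{i} 2^{-i}x_i$ and the constant contract
$x=(1,1,1,\ldots)$. Then
$$
f(x^j)=\sum_{i=1}^j 2^{-i}=1-2^{-j}\ \longrightarrow\
1=\sum_{i=1}^\infty 2^{-i}=f(x)\quad (j\to\infty),
$$
so the truncated contracts recover the value of the full contract,
with $p_i=2^{-i}\geq 0$ and $\sum_i p_i=1<\infty$.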
\begin{definition}[Monotone decisions]
We define the concept of a {\em monotone} decision maker in the
following way. Suppose that for every $x\in\ell^\infty$ there is
$N<\infty$ such that the decision for $x^j$ (notation as in
Proposition \ref{p:approx}) is the same as for $x$ whenever $j\geq
N$. Then we say that the decision maker is monotone.
\end{definition}
\begin{example}
Let $f\in(\ell^\infty)'$ be such that if $\alpha_k\to L$ then
$f(\alpha)=L$ (i.e.\ $f$ is a Banach limit). Furthermore, define a
rational decision maker by letting the set of acceptable contracts
be $Z=\{x\ |\ f(x)\geq 0\}$. Then $f(x^j)=0$ (where we use notation
from Proposition \ref{p:approx}) for all $j<\infty$, regardless
of which $x$ we define $x^j$ from. Therefore, all sequences that are
eventually zero are acceptable contracts. This decision maker is
therefore not monotone, since there are contracts that are not
acceptable, for instance $x=(-1,-1,\ldots)$ with $f(x)=-1$.
\end{example}
\begin{theorem}[Monotone rationality]\label{thm:mono}
Given a monotone rational decision maker for $\ell^\infty$
contracts, there are $p_i\geq 0$ such that $\sum p_i<\infty$ and
\begin{equation}
\{x\ |\ \sum x_ip_i>0\}\subset Z\subseteq \{x\ |\ \sum x_ip_i\geq
0\}.
\end{equation}
\end{theorem}
\begin{proof}
According to Theorem \ref{thm:LF} there is $f\in(\ell^\infty)'$ such
that the closure of $Z$ is $\bar{Z}=\{x\ |\ f(x)\geq 0\}$. Let
$p_i\geq 0$ be such that $\sum p_i<\infty$ and $f(x)=\sum
x_ip_i$ for $x\in c_0$. Remember that $x^j$ (notation as in
Proposition \ref{p:approx}) is always in $c_0$. Suppose that there
is $x$ such that $x$ is accepted but $\sum x_ip_i<0$. This violates
monotonicity, since there exists $N<\infty$ such that $\sum_{i=1}^n
x_ip_i<0$ for all $n\geq N$ and, therefore, $x^j$ is not accepted
for $j\geq N$ although $x$ is accepted. We conclude that if $x$ is
accepted then $\sum p_ix_i \geq 0$ and if $\sum p_ix_i>0$ then $x$
is accepted.
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Rational Agents for Classes of Environments}\label{four}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We will here study agents that are designed to deal with a large
range of situations. Given a class of environments we want to define
agents that can learn to act well when placed in any of them,
assuming it is at all possible.
\begin{definition}[Universality for a class]
We say that a decision maker is universal for a class of
environments $\mathcal{M}$ if for any outcome sequence
$a_1j_1a_2j_2\ldots$ that, given the actions, would be produced by some
environment in the class, there is $c>0$ (depending on the sequence)
such that the decision maker has probabilities that satisfy
$$
\Pr(j_1,\ldots,j_t|a_1,\ldots,a_t)\geq c\ \forall t.
$$
This is obviously true if the decision maker's probabilistic
beliefs are a convex combination $\sum_{\nu\in \mathcal{M}}
w_\nu \nu$, $w_\nu>0$ and $\sum_\nu w_\nu=1$.
\end{definition}
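To make the last remark of the definition concrete, here is a short
sketch under the assumption (made here for simplicity) that the
sequence is produced by a deterministic environment
$\mu\in\mathcal{M}$, so that $\mu(j_1,\ldots,j_t|a_1,\ldots,a_t)=1$
for all $t$. Dropping all other non-negative terms of the mixture
gives
$$
\sum_{\nu\in\mathcal{M}} w_\nu\,\nu(j_1,\ldots,j_t|a_1,\ldots,a_t)
\geq w_\mu\,\mu(j_1,\ldots,j_t|a_1,\ldots,a_t)=w_\mu\quad \forall t,
$$
so the definition is satisfied with $c=w_\mu>0$.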
We will next discuss how to define some large classes of
environments and agents that can succeed for them. We assume that
the total accumulated reward from the environment will be finite
regardless of our actions since we want any policy to have finite
utility. Furthermore, we assume that rewards are positive and that
it is possible to achieve strictly positive rewards in any
environment. We would like the agent to perform well regardless of
which environment from the chosen class it is placed in.
For any possible policy (action tree) $\pi$ and environment $\nu$,
there is a total reward $V_\nu^\pi$ that following $\pi$ in $\nu$
would result in. This means that for any $\pi$ there is a contract
sequence $(V_\nu^\pi)_\nu$, assuming we have enumerated our set of
environments. Let
$$
V^*_\nu=\max_\pi V_\nu^\pi.
$$
We know that $V^*_\nu>0$ for all $\nu$. Every contract sequence
$(V_{\nu}^\pi)_\nu$ lies in $X=\ell^\infty((1/V^*_\nu))$ and
$\|(V_\nu^\pi)\|_X\leq 1$. The rational decision makers are the
positive, continuous linear functionals on $X$. $X'$ contains the
space $\ell^1(V^*_\nu)$. In other words if $w_\nu \geq 0$ and $\sum
w_\nu V_\nu^*<\infty$ then the sequence $(w_\nu)$ defines a rational
decision maker for the contract space $X$. These are exactly the
monotone rational decision makers. Choosing a policy
\begin{equation}\label{eq:AIXI}
\pi^*\in\arg\max_\pi \sum_\nu w_\nu V_\nu^\pi
\end{equation}
(which is the AIXI agent from \cite{Hutter04}), we obtain a choice
with the following property: for any other $\pi$ with
$$
\sum_\nu w_\nu V_\nu^\pi< \sum_\nu w_\nu V_\nu^{\pi^*},
$$
the contract $(V_\nu^{\pi^*}-V_\nu^\pi)_\nu$ is not rejectable. In
other words, $\pi^*$ is strictly preferable to $\pi$. By letting
$p_\nu=w_\nu V^*_\nu$, we can rewrite \eqref{eq:AIXI} as
\begin{equation}\label{eq:AIXI2}
\pi^*\in\arg\max_\pi \sum_\nu p_\nu \frac{V_\nu^\pi}{V^*_\nu}.
\end{equation}
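The two computations behind the preceding paragraph can be spelled
out as follows. This is a sketch that assumes the weighted norm
$\|x\|_X=\sup_\nu |x_\nu|/V^*_\nu$, which is consistent with the
bound $\|(V_\nu^\pi)\|_X\leq 1$ stated above. For $w_\nu\geq 0$ with
$\sum_\nu w_\nu V^*_\nu<\infty$ and any $x\in X$,
$$
\Big|\sum_\nu w_\nu x_\nu\Big|\leq \sum_\nu w_\nu
V^*_\nu\,\frac{|x_\nu|}{V^*_\nu}\leq \|x\|_X\sum_\nu w_\nu V^*_\nu,
$$
so $(w_\nu)$ defines a bounded linear functional on $X$.
Substituting $p_\nu=w_\nu V^*_\nu$ term by term gives
$$
\sum_\nu w_\nu V_\nu^\pi=\sum_\nu w_\nu
V^*_\nu\,\frac{V_\nu^\pi}{V^*_\nu}=\sum_\nu
p_\nu\,\frac{V_\nu^\pi}{V^*_\nu},
$$
which is why \eqref{eq:AIXI} and \eqref{eq:AIXI2} select the same
policies.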
If one further restricts the class of environments by assuming
$V^*_\nu\leq 1$ for all $\nu$ then for every $\pi$,
$(V^\pi_\nu)\in\ell^\infty$. Therefore, by Theorem \ref{thm:mono}
the monotone rational agents for this setting can be formulated as
in \eqref{eq:AIXI} with $(w_\nu)\in\ell^1$, i.e.\ $\sum_\nu
w_\nu<\infty$. However, since also $(p_\nu)\in\ell^1$, a formulation
of the form \eqref{eq:AIXI2} is also possible. Normalizing $p$ and
$w$ individually to probabilities makes \eqref{eq:AIXI} into a
maximum expected utility criterion and \eqref{eq:AIXI2} into a
maximum relative utility criterion. As long as $w$ and $p$ are
related by $p_\nu=w_\nu V^*_\nu$, the two criteria lead to the same
decisions. If we instead based both expectations on the same
probabilistic beliefs, the two criteria would differ.
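
A small numerical illustration (the numbers are hypothetical and
chosen only for this sketch) shows how expected and relative utility
can disagree when both use the same beliefs. Take two environments
and two policies $A$ and $B$ with $(V^A_1,V^A_2)=(0.5,0.1)$ and
$(V^B_1,V^B_2)=(0.6,0.02)$, so that $(V^*_1,V^*_2)=(0.6,0.1)$. With
beliefs $(1/2,1/2)$ in both criteria,
$$
\begin{array}{lll}
\mbox{expected utility:} &
A:\ \frac{1}{2}(0.5)+\frac{1}{2}(0.1)=0.30, &
B:\ \frac{1}{2}(0.6)+\frac{1}{2}(0.02)=0.31,\\[1ex]
\mbox{relative utility:} &
A:\ \frac{1}{2}\cdot\frac{0.5}{0.6}+\frac{1}{2}\cdot\frac{0.1}{0.1}=\frac{11}{12}, &
B:\ \frac{1}{2}\cdot\frac{0.6}{0.6}+\frac{1}{2}\cdot\frac{0.02}{0.1}=\frac{3}{5},
\end{array}
$$
so expected utility prefers $B$ while relative utility prefers $A$.
If instead $w=(1/2,1/2)$ and $p_\nu=w_\nu V^*_\nu$, i.e.\
$p=(0.3,0.05)$, the relative form gives $0.3\cdot\frac{5}{6}+0.05\cdot
1=0.30$ for $A$ and $0.3\cdot 1+0.05\cdot 0.2=0.31$ for $B$, which
reproduces the expected-utility ranking.
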
When we have an upper bound $V^*_\nu**0$ with this property (as for
the similar Theorem 5.34 in \cite{Hutter04} which dealt with
discounted values). The convergence in $W$-values is the relevant
sense of optimality for our setting, since the $V$-values converge
to zero for any policy.
\begin{theorem}[Asymptotic optimality]\label{thmOpt}
Suppose that we have a decision maker that is universal (i.e.\
$p_\nu>0\ \forall\nu$) with respect to the countable class
$\mathcal{M}$ of environments (which can be stochastic) and that
there exist policies $\pi_k$ such that for all $\nu$,
$W_k^{\pi_k,\nu}\to 1$ if $\nu$ is the actual environment (or the
sequence is consistent with $\nu$). Then
$W_k^{\pi^{p},\mu}\to 1$, where $\mu$ is the actual environment.
\end{theorem}
The proof technique is similar to that of Theorem 5.34 in
\cite{Hutter04}.
\begin{proof}
Let
\begin{equation}
0\leq 1-W^{\pi_k,\nu}_k=:{\Delta}^k_{\nu},\ {\Delta}^k=\sum_{\nu}
p_{\nu,k}{\Delta}^k_{\nu}.
\end{equation}
The assumptions tell us that $\Delta_\nu^k=1-W_k^{\pi_k,\nu}\to
0$ for all $\nu$ that are consistent with the sequence
($p_{\nu,k}=0$ if $\nu$ is inconsistent with the history at time
$k$) and, since $\Delta^k_\nu\leq 1$, it follows that
$$
\Delta^k=\sum_{\nu} p_{\nu,k}\Delta^k_{\nu}\to 0.
$$
Note that $p_{\mu,k} (1-W_k^{\pi^{p},\mu})\leq \sum_{\nu} p_{\nu,k}
(1-W_k^{\pi^{p},\nu})\leq \sum_{\nu} p_{\nu,k}
(1-W_k^{\pi_k,\nu})=\sum_\nu p_{\nu,k} \Delta^k_\nu=\Delta^k$. Since we
also know that $p_{\mu,k}\geq p_{\mu,0}>0$, it follows that
$0\leq 1-W_k^{\pi^{p},\mu}\leq \Delta^k/p_{\mu,0}\to 0$.
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}\label{five}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We studied complete rational decision makers including the cases of
actions that may affect the environment and sequential decision
making. We set up simple common sense rationality axioms that imply
that a complete rational decision maker has preferences that can be
characterized as maximizing expected utility. Of particular interest
is the countable case where our results follow from identifying the
Banach space dual of the space of contracts.
%-------------------------------%
\paradot{Acknowledgement}
%-------------------------------%
This work was supported by ARC grant DP0988049.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% B i b l i o g r a p h y %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\begin{small}
\begin{thebibliography}{ABCD}\parskip=0ex
\bibitem[All53]{Allais}
M~Allais.
\newblock Le comportement de l'homme rationnel devant le risque: Critique des
postulats et axiomes de l'ecole americaine.
\newblock {\em Econometrica}, 21(4):503--546, 1953.
\bibitem[Arr70]{Arrow70}
K~Arrow.
\newblock {\em Essays in the Theory of Risk-Bearing}.
\newblock North-Holland, 1970.
\bibitem[Cox46]{Cox46}
R.~T. Cox.
\newblock Probability, frequency and reasonable expectation.
\newblock {\em Am. Jour. Phys}, 14:1--13, 1946.
\bibitem[{deF}37]{deF37}
B.~{deFinetti}.
\newblock La prévision: Ses lois logiques, ses sources subjectives.
\newblock In {\em Annales de l'Institut Henri Poincaré 7}, pages 1--68. Paris,
1937.
\bibitem[Die84]{Diestel84}
Joseph Diestel.
\newblock {\em Sequences and series in {Banach} spaces}.
\newblock Springer-Verlag, 1984.
\bibitem[Ell61]{Ellsberg}
Daniel Ellsberg.
\newblock {Risk, Ambiguity, and the Savage Axioms}.
\newblock {\em The Quarterly Journal of Economics}, 75(4):643--669, 1961.
\bibitem[Hal99]{Hal99}
Joseph~Y. Halpern.
\newblock A counterexample to theorems of {Cox and Fine}.
\newblock {\em Journal of AI research}, 10:67--85, 1999.
\bibitem[Hut05]{Hutter04}
Marcus Hutter.
\newblock {\em Universal Artificial Intelligence: Sequential Decisions based on
Algorithmic Probability}.
\newblock Springer, Berlin, 2005.
\bibitem[Jay03]{Jay03}
E.~T. Jaynes.
\newblock {\em Probability theory: the logic of science}.
\newblock Cambridge University Press, 2003.
\bibitem[Kre89]{Krey89}
Erwin Kreyszig.
\newblock {\em Introductory Functional Analysis With Applications}.
\newblock Wiley, 1989.
\bibitem[NB97]{Nar97}
Lawrence Narici and Edward Beckenstein.
\newblock The {Hahn--Banach} theorem: the life and times.
\newblock {\em Topology and its Applications}, 77(2):193--211, 1997.
\bibitem[NM44]{NeuMor44}
J.~von Neumann and O.~Morgenstern.
\newblock {\em Theory of Games and Economic Behavior}.
\newblock Princeton University Press, 1944.
\bibitem[Par94]{Paris94}
J.~B. Paris.
\newblock {\em The uncertain reasoner's companion: a mathematical perspective}.
\newblock Cambridge University Press, New York, NY, USA, 1994.
\bibitem[Ram31]{Ram31}
Frank~P. Ramsey.
\newblock Truth and probability.
\newblock In R.~B. Braithwaite, editor, {\em The Foundations of Mathematics and
other Logical Essays}, chapter~7, pages 156--198. Brace \& Co., 1931.
\bibitem[Sav54]{Sav54}
L.~Savage.
\newblock {\em The Foundations of Statistics}.
\newblock Wiley, New York, 1954.
\bibitem[Sug91]{Sugden}
Robert Sugden.
\newblock Rational choice: A survey of contributions from economics and
philosophy.
\newblock {\em Economic Journal}, 101(407):751--85, July 1991.
\end{thebibliography}
\end{small}
\end{document}
%---------------------End-of-AixiAxiom.tex--------------------%