%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Death and Suicide in Universal Artificial Intelligence %%
%% Jarryd Martin and Tom Everitt and Marcus Hutter (2016) %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
% Document-Style %
%-------------------------------%
\documentclass[a4paper]{article}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AIXI-related formatting rules
%
% written by Jan Leike 2014
%
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage[bookmarks]{hyperref}
\usepackage[capitalize]{cleveref}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{enumerate}
\usepackage{amssymb}
\usepackage{tikz}
\usetikzlibrary{positioning}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{claim}[theorem]{Claim}
\crefname{claim}{Claim}{Claims}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{assumption}[theorem]{Assumption}
\crefname{assumption}{Assumption}{Assumptions}
% Assumption environment with manual counter
\newtheorem{innercustomassumption}{Assumption}
\newenvironment{customassumption}[1]
{\renewcommand\theinnercustomassumption{#1}\innercustomassumption}
{\endinnercustomassumption}
\crefname{customassumption}{Assumption}{Assumptions}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}
%\newtheorem{example}[theorem]{Example}
\newtheorem{examplex}[theorem]{Example}
\newenvironment{example}
{\pushQED{\qed}\renewcommand{\qedsymbol}{$\diamondsuit$}\examplex}
{\popQED\endexamplex}
\newenvironment{keywords}%
{\renewcommand{\abstractname}{Keywords}\begin{abstract}\noindent}%
{\end{abstract}}
\usepackage{mathtools}
%---------------------------%
% Argmax and Argmin
\DeclareMathOperator*{\argmax}{arg\,max} % argmax operator
\DeclareMathOperator*{\argmin}{arg\,min} % argmin operator
%---------------------------%
% AIXI
\def\AIXI{\mathrm{AIXI}}
\def\AIMU{\mathrm{AIMU}}
\def\M{\mathcal{M}} % The set of all environments
%---------------------------%
% Actions, percepts, observations, and histories
\def\A{\mathcal{A}} % set of actions
\def\E{\mathcal{E}} % set of percepts
\def\Q{\mathcal{Q}} % set of all Q functions
\def\H{(\A \times \E)^*} % history
\let\aechar\ae % change the name of \ae so we can redefine it
\renewcommand{\ae}{
\ifmmode\mathchoice{
\mbox{\textsl{\aechar}}
}{
\mbox{\textsl{\aechar}}
}{
\mbox{\scriptsize\textsl{\aechar}}
}{
\mbox{\scriptsize\textsl{\aechar}}
}\else\aechar\fi%
} % action-percept tuple
%---------------------------%
% Measures
\def\dmid{\parallel} % double mid
\def\leadsto{\rightarrow} % conditional probability
\newcommand{\interleaved}[2]{%
{\mathchoice{
{\scriptscriptstyle\overleftrightarrow{\mbox{\vphantom{\footnotesize I}\normalsize${#1}\hspace{-1pt}{#2}$}}}%
}{
{\scriptscriptstyle\overleftrightarrow{\mbox{\vphantom{\footnotesize I}\normalsize${#1}\hspace{-1pt}{#2}$}}}%
}{
{\scriptscriptstyle\overleftrightarrow{\mbox{\vphantom{\tiny I}$\scriptstyle {#1}\hspace{-1pt}{#2}$}}}%
}{
{\scriptscriptstyle\overleftrightarrow{\mbox{$\scriptscriptstyle {#1}\hspace{-1pt}{#2}$}}}%
}}} % agent and environment interleaved
%---------------------------%
% Expectimax operator
\usepackage{relsize}
\newcommand{\expectimax}[1]{%
\mathchoice{
\ooalign{\hss$\max$\hss\cr$\mathlarger{\sum}\limits_{#1}$}
}{
\ooalign{\hss$\sum$\hss\cr$\max$}_{#1}
}{}{}
}
%---------------------------%
% Deprecated macros (used in old formalism)
\def\O{\mathcal{O}} % set of observations
\def\ao{{a\hspace{-1.4pt}o}} % action-observation tuple
\def\interleave{\rightleftarrows}
\def\paradot#1{\paragraph{#1.}}
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\def\eps{\varepsilon} % small positive number
\def\epstr{\epsilon} % empty string
\def\SetR{\mathbb{R}} % Set of Real numbers
\newcommand*{\AImu}{AI$\mu$}
\newcommand*{\As}{\A^{{\rm suicide}}}
\newcommand*{\mun}{\mu_{{\rm norm}}}
\newcommand*{\nun}{\nu_{{\rm norm}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% T i t l e - P a g e %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{Death and Suicide in Universal Artificial Intelligence%
\footnote{A shorter version of this paper will be presented at
AGI-16 \cite{Martin2016agi}.}}
\date{\today}
\author{Jarryd Martin \and Tom Everitt \and Marcus Hutter
\\[0.5em]Australian National University, Canberra, Australia}
\sloppy
\begin{document}
\maketitle
\begin{abstract}
Reinforcement learning (RL) is a general paradigm for studying
intelligent behaviour, with applications ranging from artificial
intelligence to psychology and economics. AIXI is a universal
solution to the RL problem; it can learn any computable environment.
A technical subtlety of AIXI is that it is defined using a mixture over
{\em semimeasures} that need not sum to 1, rather than
over proper probability measures. In this work we argue that
the shortfall of a semimeasure can naturally be interpreted as
the agent's estimate of the probability of its death. We formally define
death for generally intelligent agents like AIXI, and prove a number
of related theorems about their behaviour. Notable discoveries
include that agent behaviour can change radically under positive linear
transformations of the reward signal (from suicidal to
dogmatically self-preserving), and that the agent's posterior belief
that it will survive increases over time.
\end{abstract}
\begin{keywords}
AIXI,
universal intelligence,
algorithmic information theory,
semimeasure,
Solomonoff Induction,
AI safety,
death,
suicide,
suicidal agent
\end{keywords}
\tableofcontents
\pagebreak
\begin{quote}\it
``That Suicide may often be consistent with interest and with our duty to ourselves, no one can question, who allows, that age, sickness, or misfortune may render life a burthen, and make it worse even than annihilation.'' \par
\hfill --- {\sl Hume, \textit{Of Suicide} (1777)}
\end{quote}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Reinforcement Learning (RL) has proven to be a fruitful theoretical framework for reasoning about the properties of generally intelligent agents \cite{Hutter:04uaibook}. A good theoretical understanding of these agents is valuable for several reasons. Firstly, it can guide principled attempts to construct such agents \cite{Veness:09}. Secondly, once such agents are constructed, it may serve to make their reasoning and behaviour more transparent and intelligible to humans. Thirdly, it may assist in the development of strategies for controlling these agents. The latter challenge has recently received considerable attention in the context of the potential risks posed by these agents to human safety \cite{Bostrom2014}. It has even been argued that control strategies should be devised \textit{before} generally intelligent agents are first built \cite{Soares2015}. In this context, where we must reason about the behaviour of agents in the absence of a full specification of their implementation, a theoretical understanding of their general properties seems indispensable.
The universally intelligent agent AIXI constitutes a formal mathematical theory of artificial general intelligence \cite{Hutter:04uaibook}. AIXI models its environment using a \emph{universal mixture $\xi$} over the class of all lower semi-computable semimeasures, and thus is able to learn any computable environment. Semimeasures are defective probability measures which may sum to less than 1. Originally devised for Solomonoff induction, they are necessary for universal artificial intelligence because the halting problem prevents the existence of a (lower semi-)computable universal measure for the class of (computable) measures \cite{Li:08}. Recent work has shown that their use in RL has technical consequences that do not arise with proper measures.%
\footnote{For example, Leike and Hutter \cite{Leike2015a} proved that since $\xi$ is a mixture over semimeasures, the iterative and recursive formulations of the value function are non-equivalent.}
However, their use has heretofore lacked an interpretation proper to the RL context. In this paper, we argue that the measure loss suffered by semimeasures admits a deep and fruitful interpretation in terms of the agent's \emph{death}. We intend this usage to be intuitive: death means that one sees no more percepts, and takes no more actions. Assigning positive probability to death at time $t$ thus means assigning probability less than 1 to seeing a percept at time $t$. This motivates us to interpret the semimeasure loss in AIXI's environment model as its estimate of the probability of its own death.
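On this interpretation, the death-probability takes a simple closed form. As an informal sketch, ahead of the formal development: if the environment is modelled by a semimeasure $\nu$, then after actions $a_{1:t}$ and percepts $e_{<t}$, the conditional shortfall
\[
1-\sum_{e_t\in\E}\frac{\nu(e_{1:t}\dmid a_{1:t})}{\nu(e_{<t}\dmid a_{<t})}
\]
is the probability that $\nu$ fails to assign to receiving \emph{any} percept $e_t$, which we read as the $\nu$-probability of death at time $t$. For a proper measure this quantity is always $0$; only a semimeasure can predict death.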
%-------------------------------%
\paradot{Contributions}
%-------------------------------%
We first compare the interpretation of semimeasure loss as death-probability with an alternative characterisation of death as a \lq death-state' with 0 reward, and prove that the two definitions are equivalent for value-maximising agents (\cref{th:deathsame}).
Using this formalism we proceed to reason about the behaviour of several generally intelligent agents in relation to death: AI$\mu$, which knows the true environment distribution; AI$\xi$, which models the environment using a universal mixture; and AIXI, a special case of AI$\xi$ that uses the Solomonoff prior \cite{Hutter:04uaibook}.
Under various conditions, we show that:
\begin{itemize}
\item[•] Standard AI$\mu$ will try to avoid death (\cref{th:self-preserve}).
\item[•] AI$\mu$ with reward range shifted to $[-1,0]$ will seek death (\cref{th:suicide});
which we may interpret as AI$\mu$ attempting suicide.
This change is very unusual, given that agent behaviour is normally
invariant under positive linear transformations of the reward. We briefly consider the relevance of these results to AI safety risks and control strategies.
\item[•] AIXI increasingly believes it is in a safe environment (\cref{th:ratio}), and asymptotically its posterior estimate of the death-probability on sequence goes to 0 (\cref{th:immortal}). This occurs regardless of the true death-probability.
\item[•] However, we show by example that AIXI may maintain high
probability of death \emph{off-sequence} in certain situations.
Put simply, AIXI learns that it will live forever, but
not necessarily that it is immortal.
\end{itemize}
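The sensitivity to the reward range in the second item can be seen directly from the value function. Death (no further percepts, hence no further reward) contributes value $0$ to any continuation. With rewards in $[0,1]$, every live continuation has value ${\geq}\,0$, so death is weakly the worst outcome; after the shift $r\mapsto r-1$ to the range $[-1,0]$, every live continuation has value ${\leq}\,0$, and death's value of $0$ becomes weakly the best outcome. A positive linear transformation of the reward thus preserves the ordering among live trajectories, but not the ordering between living and dying, because the implicit reward of death remains fixed at $0$.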
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Preliminaries}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
\paradot{Strings}
%-------------------------------%
Let the \textit{alphabet} $\mathcal{X}$ be a finite set of symbols,
$\mathcal{X}^* := \bigcup^{\infty}_{n=0}\mathcal{X}^n$ be the set
of all finite strings over alphabet $\mathcal{X}$, and
$\mathcal{X}^\infty$ be the set of all infinite strings over
alphabet $\mathcal{X}$. Their union is the set $\mathcal{X}^{\#}:=
\mathcal{X}^*\cup\mathcal{X}^\infty$. We denote the empty string by
$\epstr$. For a string $x\in\mathcal{X}^*$, $x_{1:k}$ denotes the
first $k$ characters of $x$, and $x_{<k}$ denotes the first $k-1$
characters of $x$.
%-------------------------------%
\paradot{Bayesian mixture}
%-------------------------------%
The agent AI$\xi$ models the environment with a Bayesian mixture
over a countable class $\M$ of semimeasures,
\[
\xi(e_{1:t}\dmid a_{1:t}) := \sum_{\nu\in\M} w_\nu\,\nu(e_{1:t}\dmid a_{1:t}),
\]
with prior weights $w_\nu > 0, \ \forall \nu\in\M$ (hence $\xi$ is
universal for $\M$). We write $w_\nu(\ae_{<t})$ for the posterior
weight of $\nu$, which is well-defined given some history
$\ae_{<t}$ with $\xi(\ae_{<t}) > 0$.
In the true environment $\mu$ of \cref{fig:no-measure-loss},
assuming percept $e$ carries positive reward, staying alive under
action $a$ yields positive expected reward while death under $a'$
yields none.
From this follows that $V_\mu^\pi(\ae_{<t}a) > V_\mu^\pi(\ae_{<t}a')$
for every policy $\pi$, so a value-maximising agent never takes $a'$.
\begin{figure}
\centering
\begin{tikzpicture}
\node[draw, circle] (S) {alive};
\node[draw, circle, right=4cm of S] (D) {dead};
\path[->] (S) edge[loop above] node[above] {$a$} (S);
\draw[->] (S) -- node[above] {$a'$} (D);
\end{tikzpicture}
\caption{
In the semimeasure $\mu$,
action $a$ means you stay alive with certainty
and receive percept $e$ (no measure loss).
Action $a'$ means that you `jump off a cliff' and die with certainty
without receiving a percept (full measure loss).
}
\label{fig:no-measure-loss}
\end{figure}
\paradot{AIXI and immortality}
AIXI therefore becomes asymptotically certain that it will not die, given the particular sequence of actions it takes. However, this does not entail that AIXI necessarily concludes that it is immortal, because it may still maintain a counterfactual belief that it \emph{could die were it to act differently}. This is because the convergence of $\xi$ to $\mun$
only holds on the actual action sequence $a_{1:\infty}$ \cite[Sec.~5.1.3]{Hutter:04uaibook}. Consider \cref{fig:no-measure-loss}, which describes an environment in which taking action $a$ is always safe, and the action $a'$ %corresponds to jumping off a cliff and
leads to certain death. AIXI will never take $a'$, and on the sequence $\ae_{1:\infty}=aeaeae\ldots$ that it does experience, the true environment $\mu$ does not suffer any measure loss. This means that it will never increase its posterior belief in $\mun$ relative to $\mu$ (because on the safe sequence, the two environments are indistinguishable). Again we arrive at a counterintuitive result. In this particular environment, AIXI continues to believe that it might be in a risky environment $\mu$, but only because on sequence it avoids exposure to death risk. It is only by taking risky actions and surviving that AIXI becomes sure it is immortal.
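A two-environment sketch makes this contrast concrete; it is an illustration consistent with \cref{th:ratio}, not a proof.
\begin{example}
Let $\M=\{\mu,\mun\}$ with prior weights $w_\mu, w_{\mun}>0$, where on the actual history $\mu$ loses measure $p\in(0,1)$ at every step and $\mun$ is its measure-normalisation, so that on-sequence $\mu(e_{1:t}\dmid a_{1:t})=(1-p)^t\,\mun(e_{1:t}\dmid a_{1:t})$. The posterior ratio then satisfies
\[
\frac{w_{\mun}(\ae_{1:t})}{w_{\mu}(\ae_{1:t})}
=\frac{w_{\mun}}{w_{\mu}}\,(1-p)^{-t}
\;\longrightarrow\;\infty ,
\]
so conditional on survival the agent grows ever more confident that it inhabits the safe environment. By contrast, in the environment of \cref{fig:no-measure-loss} the actions actually taken incur $p=0$: the posterior ratio stays constant, and the belief that $a'$ would be fatal persists indefinitely.
\end{example}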
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In this paper we have given a formal definition of death for intelligent
agents in terms of semimeasure loss.
The definition is applicable to any universal agent that uses
an environment class $\M$ containing semimeasures.
Additionally we have shown this definition equivalent to an alternative formalism in which the environment is modelled as a proper measure and death is a death-state with zero reward. We have shown that agents seek or avoid death depending
on whether rewards are represented by positive or negative real numbers,
and that survival in spite of positive probability of death actually increases a Bayesian agent's confidence that it is in a \emph{safe} environment.
We contend that these results have implications for problems in AI safety; in particular, for the so-called ``shutdown problem'' \cite{Soares2015}.
The shutdown problem arises if an intelligent agent's self-preservation drive incentivises it to resist termination \cite{Bostrom2014,Omohundro2008,Soares2015}. A full analysis of the problem is beyond the scope of this paper, but our results show that the self-preservation drive of universal agents depends on the reward range. This suggests a potentially robust ``tripwire mechanism'' \cite{Bostrom2014} that could decrease the risk of intelligence explosion.

The difficulty with existing tripwire proposals is that they require the explicit specification of a tripwire condition that the agent must not violate. It seems doubtful that such a condition could ever be made robust against subversion by a sufficiently intelligent agent \cite{Bostrom2014}. Our tentative proposal does not require the specification, evaluation or enforcement of an explicit condition. If an agent is designed to be suicidal, it will be intrinsically incentivised to destroy itself upon reaching a sufficient level of competence, instead of recursively self-improving toward superintelligence. Of course, a suicidal agent would itself pose a safety risk, and providing an agent with a relatively safe mode of self-destruction is a significant design challenge. We hope that the preceding formal treatment of death for generally intelligent agents will enable more rigorous investigation of this and other problems related to agent termination.
\section*{Acknowledgements}
We thank John Aslanides and Jan Leike for reading drafts
and providing valuable feedback.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{splncs03}
\bibliography{references}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}