%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Indefinitely Oscillating Martingales %%
%% Jan Leike and Marcus Hutter (2014) %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
% Compiler-Switches %
%-------------------------------%
% Short version? (Or tech report version?)
\newif\ifshort
%\shortfalse
\shorttrue
%-------------------------------%
% Document-Style %
%-------------------------------%
\documentclass[oribibl,envcountsame]{llncs}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage[bookmarks]{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{enumerate}
\usepackage{tikz}
\usepackage{wrapfig}
\sloppy\lineskip=0pt
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\argmax}{arg\,max}
\def\MDL{\mathrm{MDL}} % the MDL operator
\def\eps{\varepsilon} % a small real > 0
\def\epstr{\epsilon} % the empty string
\def\cinfty{\omega} % countable infinity (\infty or \omega)
\def\SetN{\mathbb{N}} % set of natural numbers
\def\F{\mathcal{F}} % sigma-algebra
\def\Foo{\mathcal{F}_\cinfty} % the sigma-algebra on \Sigma^\omega
\def\E{\mathbb{E}} % expectation
% Our own claim enumeration environment with blackjack and hookers!
% (because enumerate is just not awesome enough to do all the things we want it to do)
\newenvironment{claims}
{\newcounter{ClaimCounter}}
{}
\newcommand{\newclaim}[1]{
\refstepcounter{ClaimCounter}
\emph{Claim \theClaimCounter: {#1}}
}
\newcommand{\ClaimCounterautorefname}{Claim}
%-------------------------------%
% Fixes to Springer's Template %
%-------------------------------%
% autoref Section and Subsection in uppercase
\AtBeginDocument{
\renewcommand{\sectionautorefname}{Section}
\renewcommand{\subsectionautorefname}{Subsection}
}
% Lists with bullets
\renewcommand{\labelitemi}{$\bullet$}
% qedhere
\def\qedhere{\eqno\qed}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Title Page & Meta Data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{Indefinitely Oscillating Martingales}
\titlerunning{Indefinitely Oscillating Martingales}
\author{Jan Leike \and Marcus Hutter}
\authorrunning{J. Leike and M. Hutter}
\institute{
The Australian National University \\
\texttt{\{jan.leike|marcus.hutter\}@anu.edu.au}
}
\date{\today}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\begin{abstract}
We construct a class of nonnegative martingale processes that oscillate indefinitely with high probability.
For these processes,
we state a uniform rate of the number of oscillations for a given magnitude and
show that this rate is asymptotically close to the theoretical upper bound.
These bounds on probability and expectation of the number of upcrossings
are compared to classical bounds from the martingale literature.
We discuss two applications.
First, our results imply that the limit of the minimum description length operator may not exist.
Second, we give bounds on how often one can change one's belief in a given hypothesis when observing a stream of data.%
\ifshort%
\footnote{
In Theorem~\ref{thm:measure-martingale},
$Q$ needs to be absolutely continuous with respect to $P$ on cylinder sets.
In Theorem~\ref{thm:lower-bound}, Corollary~\ref{cor:lower-bound}, Corollary~\ref{cor:lower-bound-concrete},
and Corollary~\ref{cor:mdl-inductively-inconsistent}, $P$ needs to have
perpetual entropy. See technical report~\cite{LH:14martoscx}, which also contains omitted proofs.}
\fi
\keywords{martingales, infinite oscillations, bounds, convergence rates,
minimum description length, mind changes.}
\end{abstract}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{sec:introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
%\paragraph{Martingales and probability measures.}
%-------------------------------%
Martingale processes model fair gambles,
in which neither knowledge of the past nor the choice of betting strategy has any impact on future winnings.
But their application is not restricted to gambles and stock markets.
Here we exploit the connection between nonnegative martingales and
probabilistic data streams, i.e., probability measures on infinite strings.
For two probability measures $P$ and $Q$ on infinite strings,
the quotient $Q/P$ is a nonnegative $P$-martingale.
Conversely, every nonnegative $P$-martingale is a multiple of $Q/P$
$P$-almost everywhere
for some probability measure $Q$.
%-------------------------------%
%\paragraph{Martingale upcrossings.}
%-------------------------------%
One of the famous results of martingale theory is \hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}~\cite{Doob:53}.
The inequality states that in expectation,
every nonnegative martingale has only finitely many oscillations
(called \emph{upcrossings} in the martingale literature).
Moreover, the bound on the expected number of oscillations is inversely proportional to their magnitude.
Closely related is \hyperref[thm:dubins-inequality]{Dubins' Inequality}~\cite{Durrett:10}
which asserts that the probability of having many oscillations decreases exponentially with their number.
These bounds are given with respect to oscillations of fixed magnitude.
%-------------------------------%
%\paragraph{Indefinitely oscillating martingales.}
%-------------------------------%
In \autoref{sec:indefinitely-oscillating-martingales}
we construct a class of nonnegative martingale processes
that have infinitely many oscillations whose magnitude (necessarily, by Doob's Upcrossing Inequality) decreases.
These martingales satisfy uniform lower bounds
on the probability and the expectation of the number of upcrossings.
We prove corresponding upper bounds in \autoref{sec:upper-bounds}
showing that these lower bounds are asymptotically tight.
Moreover, the construction of the martingales is agnostic regarding the underlying probability measure,
assuming only mild restrictions on it.
We compare these results to the statements of \hyperref[thm:dubins-inequality]{Dubins' Inequality} and
\hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality} and
demonstrate that our process makes those inequalities (in Doob's case asymptotically) tight.
If we drop the uniformity requirement,
asymptotics arbitrarily close to Doob's and Dubins' bounds are achievable.
We discuss two direct applications of these bounds.
%-------------------------------%
%\paragraph{Minimum description length.}
%-------------------------------%
The Minimum Description Length (MDL) principle~\cite{Rissanen:78} and
the closely related Minimal Message Length (MML) principle~\cite{Wallace:68}
recommend selecting from a class of models the one that
minimizes the code length for the data plus the code length for the model.
There are many variations, so the following statements are generic:
for a variety of problem classes
MDL's predictions have been shown to converge asymptotically (predictive convergence).
For continuous, independent and identically distributed (i.i.d.) data,
the MDL estimator usually converges to the true distribution~\cite{Gruenwald:07book,Wallace:05}
(inductive consistency).
For arbitrary (non-i.i.d.) countable classes,
the MDL estimator's predictions converge to those of the true distribution
for single-step predictions~\cite{Hutter:05mdl2px}
and $\infty$-step predictions~\cite{Hutter:09mdltvp}. % (predictive convergence).
Inductive consistency implies predictive convergence,
but not the other way around.
In \autoref{sec:mdl-application} we show that indeed,
the MDL estimator for countable classes is \emph{inductively inconsistent}.
This can be a major obstacle to using MDL for prediction,
since the model used for prediction has to be changed over and over again,
incurring the corresponding computational cost.
%-------------------------------%
%\paragraph{Mind changes.}
%-------------------------------%
Another application of martingales is in the theory of mind changes~\cite{Luo:05}.
How likely is it that your belief in some hypothesis changes by at least $\alpha > 0$
several times while observing some evidence?
Davis recently showed~\cite{Davis:13} using elementary mathematics
that this probability decreases exponentially.
In \autoref{sec:bounds-on-mind-changes} we rephrase this problem in our setting:
the stochastic process
\[
P( \,\text{hypothesis} \mid \text{evidence up to time $t$}\, )
\]
is a martingale bounded between $0$ and $1$.
The upper bound on the probability of many changes can thus be
derived from \hyperref[thm:dubins-inequality]{Dubins' Inequality}.
This yields a simpler alternative proof for Davis' result.
However, because we consider nonnegative but unbounded martingales,
we get a weaker bound than Davis does.
\ifshort
Omitted proofs can be found in the extended technical report~\cite{LH:14martoscx}.
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Strings, Measures, and Martingales}
\label{sec:preliminaries}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
%\paragraph{Measure theory on infinite strings}
%-------------------------------%
We presuppose basic measure and probability theory~\cite[Chp.1]{Durrett:10}.
%
Let $\Sigma$ be a finite set, called the \emph{alphabet}.
We assume $\Sigma$ contains at least two distinct elements.
%
For every $u \in \Sigma^*$,
the \emph{cylinder set}
\[
\Gamma_u := \{ uv \mid v \in \Sigma^\cinfty \}
\]
is the set of all infinite strings of which $u$ is a prefix.
%
Furthermore, fix the $\sigma$-algebras
\[
\F_t := \sigma \left(\{ \Gamma_u \mid u \in \Sigma^t \} \right)
\qquad\text{and}\qquad
\Foo := \sigma \Big( \bigcup_{t=1}^\infty \F_t \Big).
\]
$(\F_t)_{t \in \SetN}$ is a \emph{filtration}:
since $\Gamma_u = \bigcup_{a \in \Sigma} \Gamma_{ua}$,
it follows that $\F_t \subseteq \F_{t+1}$ for every $t \in \SetN$,
and all $\F_t \subseteq \Foo$ by the definition of $\Foo$.
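For concreteness, consider the binary alphabet $\Sigma = \{ 0, 1 \}$.
Then
\[
\F_1 = \sigma(\{ \Gamma_0, \Gamma_1 \}) = \{ \emptyset, \Gamma_0, \Gamma_1, \Sigma^\cinfty \},
\]
and in general $\F_t$ is generated by the $2^t$ cylinder sets of length-$t$ strings;
for example, $\Gamma_0 = \Gamma_{00} \cup \Gamma_{01} \in \F_2$, illustrating $\F_1 \subseteq \F_2$.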
%
An \emph{event} is a measurable set $E \subseteq \Sigma^\cinfty$.
The event $E^c := \Sigma^\cinfty \setminus E$ denotes the complement of $E$.
\ifshort\else
See also the list of notation
in Appendix \ref{app:notation}.
\fi
%-------------------------------%
\begin{definition}[Stochastic Process]
\label{def:stochastic-process}
%-------------------------------%
$(X_t)_{t \in \SetN}$ is called an \emph{($\mathbb{R}$-valued) stochastic process} iff
each $X_t$ is an $\mathbb{R}$-valued random variable.
\end{definition}
%-------------------------------%
\begin{definition}[Martingale]
\label{def:martingale}
%-------------------------------%
Let $P$ be a probability measure over $(\Sigma^\cinfty, \Foo)$.
An $\mathbb{R}$-valued stochastic process $(X_t)_{t \in \SetN}$ is called a
\emph{$P$-supermartingale ($P$-sub\-martin\-gale)}
iff
\begin{enumerate}[(a)]
\item each $X_t$ is $\F_t$-measurable, and
\item $\E[X_t \mid \F_s] \leq X_s$ ($\E[X_t \mid \F_s] \geq X_s$) almost surely
for all $s, t \in \SetN$ with $s < t$.
\end{enumerate}
A process that is both a $P$-supermartingale and a $P$-submartingale is called a \emph{$P$-martingale}.
\end{definition}
%
We call a supermartingale (submartingale) process $(X_t)_{t \in \SetN}$ \emph{nonnegative} iff
$X_t \geq 0$ for all $t \in \SetN$.
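A simple illustrative example (not needed in what follows): let $\Sigma = \{ 0, 1 \}$,
let $P$ be the uniform measure, and define $X_0 := 1$ and
\[
X_{t+1} :=
\begin{cases}
2 X_t &\text{if } v_{t+1} = 1, \\
0 &\text{if } v_{t+1} = 0.
\end{cases}
\]
Then $\E[X_{t+1} \mid \F_t] = \tfrac{1}{2} \cdot 2 X_t + \tfrac{1}{2} \cdot 0 = X_t$,
so $(X_t)_{t \in \SetN}$ is a nonnegative $P$-martingale with $\E[X_t] = 1$
that converges to $0$ $P$-almost surely.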
%-------------------------------%
%\paragraph{Stopped processes.}
%-------------------------------%
A \emph{stopping time} is an $(\SetN \cup \{ \cinfty \})$-valued random variable $T$
such that $\{ v \in \Sigma^\cinfty \mid T(v) = t \} \in \F_t$ for all $t \in \SetN$.
Given a supermartingale $(X_t)_{t \in \SetN}$,
the \emph{stopped process} $(X_{\min\{t, T\}})_{t \in \SetN}$
is a supermartingale~\cite[Thm.\ 5.2.6]{Durrett:10}.
If $(X_t)_{t \in \SetN}$ is bounded,
the limit of the stopped process, $X_T$, exists almost surely
even if $T = \cinfty$ (Martingale Convergence Theorem~\cite[Thm.\ 5.2.8]{Durrett:10}).
We use the following variant of Doob's Optional Stopping Theorem for supermartingales.
%-------------------------------%
\begin{theorem}[{Optional Stopping Theorem~\cite[Thm.\ 5.7.6]{Durrett:10}}]
\label{thm:optional-stopping}
%-------------------------------%
Let $(X_t)_{t \in \SetN}$ be a nonnegative supermartingale
and let $T$ be a stopping time.
The random variable $X_T$ is almost surely well defined and
$\E[X_T] \leq \E[X_0]$.
\end{theorem}
We exploit the following two theorems that
state the connection between
probability measures on infinite strings and martingales.
For any two probability measures $P$ and $Q$ on $(\Sigma^\cinfty, \Foo)$,
the quotient $Q/P$ is a nonnegative $P$-martingale.
%
Conversely, for every nonnegative $P$-martingale
there is a probability measure $Q$ on $(\Sigma^\cinfty, \Foo)$ such that
the martingale is $P$-almost surely a multiple of $Q/P$.
%-------------------------------%
\begin{theorem}[{Measures $\rightarrow$ Martingales~\cite[II§7 Ex.\ 3]{Doob:53}}]
\label{thm:measure-martingale}
%-------------------------------%
Let $Q$ and $P$ be two probability measures on $(\Sigma^\cinfty, \Foo)$.
The stochastic process $(X_t)_{t \in \SetN}$,
\ifshort $X_t(v) := Q(\Gamma_{v_{1:t}}) / P(\Gamma_{v_{1:t}})$
\else \[ X_t(v) := \frac{Q(\Gamma_{v_{1:t}})}{P(\Gamma_{v_{1:t}})} \]
\fi
is a nonnegative $P$-martingale
with $\E[X_t] = 1$.
\end{theorem}
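To illustrate \autoref{thm:measure-martingale},
let $\Sigma = \{ 0, 1 \}$,
let $P$ be the uniform measure ($P(\Gamma_u) = 2^{-|u|}$), and
let $Q$ be the Bernoulli measure that assigns probability $\tfrac{2}{3}$ to the symbol $1$ in each position.
If $v_{1:t}$ contains $k$ ones, then
\[
X_t(v)
= \frac{Q(\Gamma_{v_{1:t}})}{P(\Gamma_{v_{1:t}})}
= \frac{(2/3)^k (1/3)^{t-k}}{2^{-t}}
= \left( \tfrac{4}{3} \right)^k \left( \tfrac{2}{3} \right)^{t-k},
\]
a nonnegative $P$-martingale with $\E[X_t] = 1$.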
%-------------------------------%
\begin{theorem}[{Martingales $\rightarrow$ Measures\ifshort~\cite{LH:14martoscx}\fi}]
\label{thm:martingale-measure}
%-------------------------------%
Let $P$ be a probability measure on $(\Sigma^\cinfty, \Foo)$ and
let $(X_t)_{t \in \SetN}$ be a nonnegative $P$-martingale
with $\E[X_t] = 1$.
There is a probability measure $Q$ on $(\Sigma^\cinfty, \Foo)$ such that
for all $v \in \Sigma^\cinfty$ and
all $t \in \SetN$ with $P(\Gamma_{v_{1:t}}) > 0$,
\ifshort $X_t(v) = Q(\Gamma_{v_{1:t}}) / P(\Gamma_{v_{1:t}})$.
\else \[ X_t(v) = \frac{Q(\Gamma_{v_{1:t}})}{P(\Gamma_{v_{1:t}})}. \]
\fi
\end{theorem}
\ifshort\else
For completeness,
we provide proofs for
\autoref{thm:measure-martingale} and \autoref{thm:martingale-measure} in
Appendix \ref{app:measures-martingales}.
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Martingale Upcrossings}
\label{sec:martingale-upcrossings}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
%\paragraph{Upcrossings.}
%-------------------------------%
Fix $c \in \mathbb{R}$, and
let $(X_t)_{t \in \SetN}$ be a martingale
over the probability space $(\Sigma^\cinfty, \Foo, P)$.
Let $t_1 < t_2$.
We say the process $(X_t)_{t \in \SetN}$ does an \emph{$\eps$-upcrossing} between $t_1$ and $t_2$ iff
$X_{t_1} \leq c - \eps$ and $X_{t_2} \geq c + \eps$.
Similarly, we say $(X_t)_{t \in \SetN}$ does an \emph{$\eps$-downcrossing} between $t_1$ and $t_2$ iff
$X_{t_1} \geq c + \eps$ and $X_{t_2} \leq c - \eps$.
%
Except for the first upcrossing, consecutive upcrossings always involve intermediate downcrossings.
Formally, we define the stopping times
\begin{align*}
T_0(v) &:= 0, \\
T_{2k+1}(v) &:= \inf \{ t > T_{2k}(v) \mid X_t(v) \leq c - \eps \}, \text{ and} \\
T_{2k+2}(v) &:= \inf \{ t > T_{2k+1}(v) \mid X_t(v) \geq c + \eps \}.
\end{align*}
For $k \geq 1$, $T_{2k}(v)$ is the time at which the $k$-th upcrossing is completed.
We count the number of upcrossings by the random variable $U_t^X(c - \eps, c + \eps)$,
where
\[
U_t^X(c - \eps, c + \eps)(v) := \sup \{ k \geq 0 \mid T_{2k}(v) \leq t \}
\]
and $U^X(c - \eps, c + \eps) := \sup_{t \in \SetN} U_t^X(c - \eps, c + \eps)$
denotes the total number of upcrossings.
We omit the superscript $X$
if the martingale $(X_t)_{t \in \SetN}$ is clear from context.
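To illustrate the definitions, take $c = 1$, $\eps = \tfrac{1}{2}$, and
a trajectory beginning with
$X_0 = 1$, $X_1 = 0.4$, $X_2 = 1.6$, $X_3 = 0.3$, $X_4 = 1.7$
(values chosen purely for illustration).
Then $T_1 = 1$, $T_2 = 2$, $T_3 = 3$, and $T_4 = 4$,
so $U_4(\tfrac{1}{2}, \tfrac{3}{2})(v) = 2$:
two $\tfrac{1}{2}$-upcrossings are completed by time $4$.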
%-------------------------------%
%\paragraph{Definition of the events $E_{m,k}^{X,f}$.}
%-------------------------------%
The following notation is used in the proofs.
Given a monotone decreasing function $f: \SetN \to [0, 1)$ and $m,k \in \SetN$,
we define the events $E_{m,k}^{X,f}$
denoting that at least $k$ $f(m)$-upcrossings occur:
\[
E_{m,k}^{X,f} := \left\{ v \in \Sigma^\cinfty \mid U^X(1 - f(m), 1 + f(m))(v) \geq k \right\}.
\]
For all $m,k \in \SetN$ we have
$E_{m,k}^{X,f} \supseteq E_{m,k+1}^{X,f}$ and
$E_{m,k}^{X,f} \subseteq E_{m+1,k}^{X,f}$.
Again, we omit $X$ and $f$ in the superscript if they are clear from context.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Indefinitely Oscillating Martingales}
\label{sec:indefinitely-oscillating-martingales}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
\ifshort
\begin{wrapfigure}{r}{0.5\textwidth}
\vspace{-8mm}
\else
\begin{figure}[ht]
\fi
%-------------------------------%
\centering
\begin{tikzpicture}[scale=\ifshort 0.55\else 1.0\fi]
% Axes
\draw[->] (-0.1,0) -- (8.3,0) node[right] {$t$};
\draw[->] (0,-0.1) -- (0,4) node[above] {$X_t$};
\draw[color=gray,dashed] (8.2,2) -- (-0.1,2) node[left,color=black] {$1$};
% F(M_t)
\draw[color=orange] (0,3.2) -- (1,3.2) -- (1,2.6) -- (2,2.6) -- (2,2.4)
-- (3.5, 2.4) -- (3.5, 2.3) -- (5, 2.3) -- (5, 2.24) -- (8.2, 2.24)
node[above] {$1 + f(M_t)$};
\draw[color=orange] (0,0.8) -- (1,0.8) -- (1,1.4) -- (2,1.4) -- (2,1.6)
-- (3.5, 1.6) -- (3.5, 1.7) -- (5, 1.7) -- (5, 1.76) -- (8.2, 1.76)
node[below] {$1 - f(M_t)$};
% f = 2 + 0.6/x
% X_t
\draw (0,2) -- (0.5,0.8) -- (1,3.2) -- (1.5, 1.4) -- (2, 2.6) -- (2.5, 3.6)
-- (3, 1.6) -- (3.5, 2.4) -- (4, 1.7) -- (4.5, 1.1) -- (5, 2.3)
-- (5.5, 1.76) -- (6, 1.28) -- (6.5, 0.32) -- (7, 0.64) -- (7.5, 0.9)
-- (8, 0.8) -- (8.2, 0.48);
\draw[fill]
(0,2) circle (0.05)
(0.5,0.8) circle (0.05)
(1,3.2) circle (0.05)
(1.5,1.4) circle (0.05)
(2, 2.6) circle (0.05)
(2.5, 3.6) circle (0.05)
(3, 1.6) circle (0.05)
(3.5, 2.4) circle (0.05)
(4, 1.7) circle (0.05)
(4.5, 1.1) circle (0.05)
(5, 2.3) circle (0.05)
(5.5, 1.76) circle (0.05)
(6, 1.28) circle (0.05)
(6.5, 0.32) circle (0.05)
(7, 0.64) circle (0.05)
(7.5, 0.9) circle (0.05)
(8, 0.8) circle (0.05);
\end{tikzpicture}
\caption{
An example evaluation of the martingale defined in the proof of
\autoref{thm:lower-bound}.
}
\label{fig:oscillating-martingale}
\ifshort
\vspace{-6mm}
\end{wrapfigure}
\else
\end{figure}
\fi
%-------------------------------%
%\paragraph{Intuition behind our class of indefinitely oscillating martingale.}
%-------------------------------%
In this section we construct a class of martingales that, with high probability,
do an infinite number of upcrossings.
The magnitude of the upcrossings decreases at a rate of a given summable function $f$
(a function $f$ is called \emph{summable} iff
it has finite $L_1$-norm, i.e., $\sum_{i=1}^\infty f(i) < \infty$),
and the value of the martingale $X_t$ oscillates back and forth between $1 - f(M_t)$ and $1 + f(M_t)$,
where $M_t$ denotes the number of upcrossings so far.
The process has a monotone decreasing chance of escaping the oscillation.
%-------------------------------%
\begin{theorem}[An indefinitely oscillating martingale]
\label{thm:lower-bound}
%-------------------------------%
Let $0 < \delta < \frac{2}{3}$ and
let $f:\SetN \to [0, 1)$ be any monotone decreasing function such that
$\sum_{i=1}^\infty f(i) \leq \frac{\delta}{2}$.
For every probability measure $P$ with
$P(\Gamma_u) > 0$ for all $u \in \Sigma^*$
there is a nonnegative martingale $(X_t)_{t \in \SetN}$
with $\E[X_t] = 1$ and
\[
P [ \forall m.\; U(1 - f(m), 1 + f(m)) \geq m ]
\geq 1 - \delta.
\]
\end{theorem}
\begin{proof}
We may assume $\Sigma = \{ 0, 1 \}$ without loss of generality, by grouping the symbols of $\Sigma$ into two nonempty groups.
Since $P(\Gamma_{u0} \mid \Gamma_u) + P(\Gamma_{u1} \mid \Gamma_u) =1$,
we can define a function $a:\Sigma^* \to \Sigma$ that assigns
to every string $u \in \Sigma^*$ a symbol $a_u := a(u)$ such that
$p_u := P(\Gamma_{ua_u} \mid \Gamma_u) \leq \frac{1}{2}$.
By assumption, we have $p_u > 0$.
%-------------------------------%
%\paragraph{Definition of $(X_t)_{t \in \SetN}$.}
%-------------------------------%
We define the following stochastic process $(X_t)_{t \in \SetN}$.
Let $v \in \Sigma^\omega$ and $t \in \SetN$ be given
and define $u := v_{1:t}$.
Let
\[
M_t(v)
:= 1 + \max
\left\{ m \in \SetN \mid \forall k \leq m.\; U_t^X(1 - f(k), 1 + f(k)) \geq k \right\},
\]
i.e.,
$M_t$ is $1$ plus
the number of upcrossings completed up to time $t$.
Define
\[
\gamma_t(v) := \tfrac{p_u}{1 - p_u} (1 + f(M_t(v)) - X_t(v)).
\]
For $t = 0$, we set $X_0(v) := 1$,
otherwise we distinguish the following three cases.
\begin{enumerate}[(i)]
\item For $X_t(v) \geq 1$:
\[
X_{t+1}(v) :=
\begin{cases}
1 - f(M_t(v))
&\text{if } v_{t+1} \neq a_u, \\
X_t(v) + \frac{1 - p_u}{p_u} (X_t(v) - (1 - f(M_t(v))))
&\text{if } v_{t+1} = a_u.
\end{cases}
\]
\item For $1 > X_t(v) \geq \gamma_t(v)$:
\[
X_{t+1}(v) :=
\begin{cases}
X_t(v) - \gamma_t(v)
&\text{if } v_{t+1} \neq a_u, \\
1 + f(M_t(v))
&\text{if } v_{t+1} = a_u.
\end{cases}
\]
\item For $X_t(v) < \gamma_t(v)$ and $X_t(v) < 1$: \\
let $d_t(v) := \max \{ 0,
\min \{ \frac{p_u}{1 - p_u} X_t(v),
\tfrac{1 - p_u}{p_u} \gamma_t(v) - 2f(M_t(v))
\} \}$;
\[
X_{t+1}(v) :=
\begin{cases}
X_t(v) + d_t(v)
&\text{if } v_{t+1} \neq a_u, \\
X_t(v) - \tfrac{1 - p_u}{p_u} d_t(v)
&\text{if } v_{t+1} = a_u.
\end{cases}
\]
\end{enumerate}
%-------------------------------%
%\paragraph{Description of $(X_t)_{t \in \SetN}$.}
%-------------------------------%
We give an intuition for the behavior of the process $(X_t)_{t \in \SetN}$.
For each $m$, the following pattern repeats.
First $X_t$ increases while reading $a_u$'s
until it reads one symbol that is not $a_u$ and then jumps down to $1 - f(m)$.
Subsequently, $X_t$ decreases while not reading $a_u$'s
until it falls below $\gamma_t$ or reads an $a_u$ and then jumps up to $1 + f(m)$.
If it falls below $1$ and $\gamma_t$, then at every step,
it can either jump up to $1 - f(m)$ or jump down to $0$,
whichever one is closest
(the distance to the closest of the two is given by $d_t$).
See \autoref{fig:oscillating-martingale} for a visualization.
For notational convenience,
in the following we omit writing the argument $v$ to the random variables
$X_t$, $\gamma_t$, $M_t$, and $d_t$.
\begin{claims}
\newclaim{$(X_t)_{t \in \SetN}$ is a martingale.}
\label{claim:is-martingale}
Each $X_{t+1}$ is $\F_{t+1}$-measurable,
since it uses only the first $t + 1$ symbols of $v$.
%
Writing out cases (i), (ii), and (iii), we get
\begin{align*}
\E[X_{t+1} \mid \F_t]
&\stackrel{(i)}{=} (1 - f(M_t)) (1 - p_u)
+ \big(X_t + \tfrac{1 - p_u}{p_u}(X_t - (1 - f(M_t))) \big) p_u
= X_t, \\
\E[X_{t+1} \mid \F_t]
&\stackrel{(ii)}{=} \big(X_t - \tfrac{p_u}{1 - p_u}((1 + f(M_t)) - X_t) \big) (1 - p_u)
+ (1 + f(M_t)) p_u
= X_t, \\
\E[X_{t+1} \mid \F_t]
&\stackrel{(iii)}{=} (X_t + d_t) (1 - p_u)
+ (X_t - \tfrac{1 - p_u}{p_u} d_t) p_u
= X_t.
\end{align*}
\newclaim{$X_t \geq 0$ and $\E[X_t] = 1$.}
\label{claim:nonnegative-and-expectation}
The latter follows from
\ifshort $X_0 = 1$.
\else
\[
\E[X_t]
~=~ \E[\E[X_t \mid \F_{t-1}]]
~=~ \E[X_{t-1}]
~=~ \ldots
~=~ \E[X_0]
%~=~ X_0
~=~ 1.
\]
\fi
Regarding the former,
we use $0 \leq f(M_t) < 1$ to conclude
\begin{itemize}
\setlength{\itemindent}{1.5em}
\item[(i$\neq$)] $1 - f(M_t) \geq 0$,
\item[(i$=$)] $\frac{1 - p_u}{p_u} (X_t - (1 - f(M_t))) \geq 0$ for $X_t \geq 1$,
\item[(ii$\neq$)] $X_t - \gamma_t \geq 0$ for $X_t \geq \gamma_t$,
\item[(ii$=$)] $1 + f(M_t) \geq 0$,
\item[(iii$\neq$)] $X_t + d_t \geq 0$ since $d_t \geq 0$, and
\item[(iii$=$)] $X_t - \tfrac{1 - p_u}{p_u} d_t \geq 0$ since $d_t \leq \tfrac{p_u}{1 - p_u} X_t$.
\end{itemize}
\newclaim{$X_t \leq 1 - f(M_t)$ or $X_t \geq 1 + f(M_t)$ for all $t \geq T_1$.}
\label{claim:X_t-jumps}
We use induction on $t$.
The induction start holds since $X_{T_1} \leq 1 - f(M_{T_1})$ by definition of $T_1$.
The induction step is clear for
(i) $X_t \geq 1$ and (ii) $1 > X_t \geq \gamma_t$ since $\gamma_t \geq 0$.
In case (iii) we have either $d_t = 0$ or
$d_t \leq \tfrac{1 - p_u}{p_u} \gamma_t - 2f(M_t)$
and since $X_t < \gamma_t$,
\[
X_{t+1} \leq X_t + d_t \leq X_t + (1 + f(M_t) - X_t) - 2f(M_t) = 1 - f(M_t).
\]
\newclaim{If $X_t \geq 1 - f(M_t)$ then $X_t > \gamma_t$.}
\label{claim:gamma_t-is-small}
In this case
\[
\gamma_t
= \tfrac{p_u}{1 - p_u} (1 + f(M_t) - X_t)
\leq 2 \tfrac{p_u}{1 - p_u} f(M_t),
\]
and thus with $p_u \leq \frac{1}{2}$ and
$f(M_t) \leq \sum_{k=1}^\infty f(k) \leq \frac{\delta}{2} < \frac{1}{3}$,
\[
X_t - \gamma_t
\geq 1 - f(M_t) - 2 \tfrac{p_u}{1 - p_u} f(M_t)
= 1 - \tfrac{1 + p_u}{1 - p_u} f(M_t)
\geq 1 - 3 f(M_t)
> 0.
\]
\newclaim{If $X_t > 0$ and ($f(M_t) > 0$ or $X_t < 1$) then $X_{t+1} \neq X_t$.}
\label{claim:X-moves}
\begin{enumerate}[(i)]
\item Assume $X_t \geq 1$.
Then either $X_{t+1} = 1 - f(M_t) < 1$, or
$\frac{1 - p_u}{p_u}(X_t - (1 - f(M_t))) > 0$ since $X_t > 1 - f(M_t)$.
\item Assume $1 > X_t \geq \gamma_t$.
Then either $X_{t+1} = 1 + f(M_t) \geq 1 > X_t$, or
$1 + f(M_t) - X_t \geq 1 - X_t > 0$, hence $\gamma_t > 0$ and thus
$X_{t+1} = X_t - \gamma_t < X_t$.
\item Assume $0 < X_t < \gamma_t$ and $X_t < 1$.
From \autoref{claim:gamma_t-is-small} it follows that $X_t < 1 - f(M_t)$,
thus $\frac{1 - p_u}{p_u} \gamma_t - 2 f(M_t) = 1 - f(M_t) - X_t > 0$.
By assumption, $\frac{p_u}{1 - p_u} X_t > 0$ and therefore $d_t > 0$.
Hence $X_t + d_t > X_t$ and $X_t - \frac{1 - p_u}{p_u} d_t < X_t$.
\end{enumerate}
\newclaim{For all $m \in \SetN$,
if $E_{m,m-1} \neq \emptyset$ then
$P(E_{m,m} \mid E_{m,m-1}) \geq 1 - 2f(m)$.}
\label{claim:single-upcrossing-bound}
Let $v \in E_{m,m-1}$
and let $t_0 \in \SetN$ be a time step such that
exactly $m - 1$ upcrossings have been completed up to time $t_0$, i.e.,
$M_{t_0}(v) = m$.
The subsequent downcrossing is eventually completed with probability $1$:
we are in case (i) and
in every step there is a chance of $1 - p_u \geq \frac{1}{2}$ of completing the downcrossing.
Therefore we assume without loss of generality that the downcrossing has been completed,
i.e., that $t_0$ is such that $X_{t_0}(v) = 1 - f(m)$.
%
We will bound the probability $p := P(E_{m,m} \mid E_{m,m-1})$
that $X_t$ rises above $1 + f(m)$ after $t_0$ to complete the $m$-th upcrossing.
Define the stopping time $T: \Sigma^\cinfty \to \SetN \cup \{ \cinfty \}$,
\[
T(v) := \inf \{ t \geq t_0 \mid X_t(v) \geq 1 + f(m) \;\lor\; X_t(v) = 0 \},
\]
and define the stochastic process
$Y_t := 1 + f(m) - X_{\min\{t_0 + t, T\}}$.
Because $(X_{\min\{t_0 + t, T\}})_{t \in \SetN}$ is a martingale,
$(Y_t)_{t \in \SetN}$ is a martingale.
By definition, $X_t$ always stops at $1 + f(m)$ before exceeding it,
thus $X_T \leq 1 + f(m)$,
and hence $(Y_t)_{t \in \SetN}$ is nonnegative.
%
The \hyperref[thm:optional-stopping]{Optional Stopping Theorem} yields
$\E[Y_{T - t_0} \mid \F_{t_0}] \leq \E[Y_0 \mid \F_{t_0}]$ and thus
$\E[X_T \mid \F_{t_0}] \geq \E[X_{t_0} \mid \F_{t_0}] = 1 - f(m)$.
By \autoref{claim:X-moves},
$X_t$ does not converge unless
it reaches either $0$ or $1 + f(m)$, and thus
\[
1 - f(m) \leq \E[X_T \mid \F_{t_0}] = (1 + f(m)) \cdot p + 0 \cdot (1 - p),
\]
hence
$
P(E_{m,m} \mid E_{m,m-1})
= p
\geq 1 - f(m) (1 + p)
\geq 1 - 2f(m)
$.
\newclaim{$E_{m+1,m} = E_{m,m}$ and $E_{m+1,m+1} \subseteq E_{m,m}$.}
\label{claim:E-increasing}
By definition of $M_t$,
the $i$-th upcrossing of the process $(X_t)_{t \in \SetN}$
is between $1 - f(i)$ and $1 + f(i)$.
The function $f$ is monotone decreasing,
and by \autoref{claim:X_t-jumps} the process $(X_t)_{t \in \SetN}$
does not assume values strictly between $1 - f(i)$ and $1 + f(i)$.
Therefore the first $m$ $f(m+1)$-upcrossings are also $f(m)$-upcrossings,
i.e., $E_{m+1,m} \subseteq E_{m,m}$.
By definition of $E_{m,k}$ we have
$E_{m+1,m} \supseteq E_{m,m}$ and
$E_{m+1,m+1} \subseteq E_{m+1,m}$.
\newclaim{$P(E_{m,m}) \geq 1 - \sum_{i=1}^m 2f(i)$.}
\label{claim:bound-on-E}
For $m = 0$ this holds trivially, since $P(E_{0,0}) = 1$.
Using \autoref{claim:single-upcrossing-bound} and \autoref{claim:E-increasing}
we conclude inductively
\begin{align*}
P(E_{m,m})
&= P(E_{m,m} \cap E_{m,m-1})
= P(E_{m,m} \mid E_{m,m-1}) P(E_{m,m-1}) \\
&= P(E_{m,m} \mid E_{m,m-1}) P(E_{m-1,m-1}) \\
&\geq (1 - 2f(m)) \left( 1 - \sum_{i=1}^{m-1} 2f(i) \right)
\geq 1 - \sum_{i=1}^m 2f(i).
\end{align*}
\end{claims}
% Finishing the proof
From \autoref{claim:E-increasing} it follows that
$\bigcap_{i=1}^m E_{i,i} = E_{m,m}$ and therefore
$P(\bigcap_{i=1}^\infty E_{i,i})
= \lim_{m \to \infty} P(E_{m,m})
\geq 1 - \sum_{i=1}^\infty 2f(i)
\geq 1 - \delta$.
\qed
\end{proof}
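As a concrete instance of \autoref{thm:lower-bound}
(an illustration of ours, not part of the construction above),
choose the geometric rate $f(i) := \delta \, 2^{-i-1}$,
so that $\sum_{i=1}^\infty f(i) = \frac{\delta}{2}$.
The theorem then yields a nonnegative martingale $(X_t)_{t \in \SetN}$
with $\E[X_t] = 1$ and
\[
P \left[ \forall m.\; U \left( 1 - \delta 2^{-m-1}, 1 + \delta 2^{-m-1} \right) \geq m \right]
\geq 1 - \delta.
\]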
%-------------------------------%
%\paragraph{Uniform lower bound.}
%-------------------------------%
\autoref{thm:lower-bound} gives a \emph{uniform} lower bound on the probability
for many upcrossings:
it states the probability of the event that \emph{for all $m \in \SetN$},
$U(1 - f(m), 1 + f(m)) \geq m$ holds.
This is considerably stronger than the nonuniform bound
$P[U(1 - f(m), 1 + f(m)) \geq m] \geq 1 - \delta$ for all $m \in \SetN$:
the quantifier is inside the probability statement.
As an immediate consequence of \autoref{thm:lower-bound},
we get the following uniform lower bound on the \emph{expected} number of upcrossings.
%-------------------------------%
\begin{corollary}[Expected Upcrossings]
\label{cor:lower-bound}
%-------------------------------%
\iftrue
Under the same conditions as in Theorem~\ref{thm:lower-bound},
\else
Let $0 < \delta < 1$ and
let $f:\SetN \to [0, 1)$ be a monotone decreasing function such that
$\sum_{i=1}^\infty f(i) \leq \frac{\delta}{2}$.
For every probability measure $P$ with
$P(\Gamma_u) > 0$ for all $u \in \Sigma^*$,
there is a nonnegative martingale $(X_t)_{t \in \SetN}$
with $\E[X_t] = 1$ and
\fi
for all $m \in \SetN$,
\[
\E[U(1 - f(m), 1 + f(m))] \geq m (1 - \delta).
\]
\end{corollary}
\begin{proof}
By \autoref{thm:lower-bound} and Markov's inequality,
\[
\E[U(1 - f(m), 1 + f(m))]
\geq m \cdot P[U(1 - f(m), 1 + f(m)) \geq m]
\geq m (1 - \delta).
\]
\qed
\end{proof}
\ifshort
By choosing the slowly decreasing but summable function $f$ with inverse
$f^{-1}(\eps) := 2\delta(\frac{1}{\eps (\ln \eps)^2} - \frac{e^2}{4})$,
\else
By choosing a specific slowly decreasing but summable function $f$,
\fi
we get the following concrete results.
%-------------------------------%
\begin{corollary}[Concrete lower bound]
\label{cor:lower-bound-concrete}
%-------------------------------%
Let $0 < \delta < 1$.
For every probability measure $P$ with
$P(\Gamma_u) > 0$ for all $u \in \Sigma^*$,
there is a nonnegative martingale $(X_t)_{t \in \SetN}$
with $\E[X_t] = 1$ such that
\begin{gather*}
P \left[ \forall \eps > 0.\; U(1 - \eps, 1 + \eps)
\in \Omega \left( \tfrac{\delta}{\eps \left( \ln \frac{1}{\eps} \right)^2} \right) \right]
\geq 1 - \delta
\text{ and } \\
\E[U(1 - \eps, 1 + \eps)]
\in \Omega \Big( \tfrac{1}{\eps \left( \ln \frac{1}{\eps} \right)^2} \Big).
\end{gather*}
Moreover, for all $\eps < 0.015$ we get
$\E[U(1 - \eps, 1 + \eps)] > \tfrac{\delta (1 - \delta)}{\eps \left( \ln \frac{1}{\eps} \right)^2}$ and
\[
P \left[ \forall \eps < 0.015.\; U(1 - \eps, 1 + \eps)
> \tfrac{\delta}{\eps \left( \ln \frac{1}{\eps} \right)^2} \right]
\geq 1 - \delta.
\]
\end{corollary}
\ifshort\else
\begin{proof}
Define
\[
g:(0,e^{-2}] \to [0, \infty),
\quad\quad
\eps \mapsto 2\delta \left( \frac{1}{\eps (\ln \eps)^2} - \frac{e^2}{4} \right).
\]
We have $g(e^{-2}) = 0$, $\lim_{\eps \to 0} g(\eps) = \infty$, and
\[
\frac{dg}{d\eps}(\eps)
~=~ 2\delta \left(
\frac{-1}{\eps^2 (\ln \eps)^2}
+ \frac{-2}{\eps^2 (\ln \eps)^3}
\right) ~=~
-\frac{2\delta(2+\ln\eps)}{\eps^2(\ln \eps)^3}
~<~ 0 \text{ on } (0, e^{-2}).
\]
Therefore the function $g$ is strictly monotone decreasing and hence invertible.
Choose $f := g^{-1}$.
Using the substitution $t = g(\eps)$, $dt = \frac{dg}{d\eps}(\eps)d\eps$,
\begin{align*}
\sum_{t=1}^\infty f(t)
&\leq \int_0^\infty f(t) dt
= \int_{g^{-1}(0)}^{g^{-1}(\infty)} f(g(\eps)) \frac{dg}{d\eps}(\eps) d\eps \\
&= 2\delta \left(
\int_{e^{-2}}^0 \frac{-1}{\eps (\ln \eps)^2} d\eps
+ \int_{e^{-2}}^0 \frac{-2}{\eps (\ln \eps)^3} d\eps
\right) \\
&= 2\delta \left(
\left[ \tfrac{1}{\ln \eps} \right]_{e^{-2}}^0
+ \left[ \tfrac{1}{(\ln \eps)^2} \right]_{e^{-2}}^0
\right)
= 2\delta \left( \tfrac{1}{2} - \tfrac{1}{4} \right)
= \tfrac{\delta}{2}.
\end{align*}
Now we apply \autoref{thm:lower-bound} and \autoref{cor:lower-bound} to $m := g(\eps)$ and get
\begin{align*}
P \left[ U(1 - \eps, 1 + \eps)
\geq 2\delta \left( \tfrac{1}{\eps (\ln \eps)^2} - \tfrac{e^2}{4} \right) \right]
&\geq 1 - \delta, \text{ and} \\
\E[U(1 - \eps, 1 + \eps)]
&\geq 2 \delta (1 - \delta)
\left( \tfrac{1}{\eps (\ln \eps)^2} - \tfrac{e^2}{4} \right).
\end{align*}
For $\eps < 0.015$, we have $\frac{1}{\eps \left( \ln \eps \right)^2} > \frac{e^2}{2}$, hence
$g(\eps) > \frac{\delta}{\eps (\ln \eps)^2}$.
\qed
\end{proof}\fi
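As a numerical sanity check (our own illustration, not part of the proof), the following Python sketch inverts $g$ by bisection for the illustrative choice $\delta = 0.5$ and confirms the monotonicity of $g$ and that the partial sums of $f = g^{-1}$ stay below $\delta/2$:

```python
import math

def g(eps, delta):
    # g from the proof: strictly decreasing on (0, e^-2] with g(e^-2) = 0
    return 2 * delta * (1 / (eps * math.log(eps) ** 2) - math.e ** 2 / 4)

def f(t, delta):
    # f = g^{-1}, computed by bisection (valid since g is strictly decreasing)
    lo, hi = 1e-12, math.e ** -2
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid, delta) > t:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

delta = 0.5
assert abs(g(math.e ** -2, delta)) < 1e-9               # g(e^-2) = 0
grid = [math.e ** -2 * 0.99 ** i for i in range(500)]   # eps decreasing
vals = [g(e, delta) for e in grid]
assert all(v1 < v2 for v1, v2 in zip(vals, vals[1:]))   # so g increases
partial = sum(f(t, delta) for t in range(1, 1001))
assert partial < delta / 2   # the proof guarantees the full sum <= delta/2
```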
%-------------------------------%
%\paragraph{The concrete bounds are not optimal.}
%-------------------------------%
The concrete bounds given in \autoref{cor:lower-bound-concrete}
are \emph{not} the asymptotically optimal ones:
there are summable functions that decrease even more slowly.
For example,
we could multiply \ifshort $f^{-1}$ \else the function $g$ \fi by the factor
$\sqrt{\ln(1/\eps)}$ (which is still not optimal).
% dg/d\eps(\eps) = 2\delta ( \frac{-1}{\eps^2 (\ln\frac{1}{\eps})^{1.5}} + \frac{1.5}{\eps^2 \ln(\frac{1}{\eps})^{2.5}}).
%http://www.wolframalpha.com/input/?i=integrate+1%2Fx%2F%28ln%281%2Fx%29%5E1.5%29+from+0+to+1%2Fe%2Fe
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Martingale Upper Bounds}
\label{sec:upper-bounds}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
%\paragraph{Section contents.}
%-------------------------------%
In this section we state upper bounds on the probability and expectation of many upcrossings
(\hyperref[thm:dubins-inequality]{Dubins' Inequality} and
\hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}).
We use the construction from the previous section to show that these bounds are tight.
Moreover, with the following theorem we show that
the uniform lower bound on the probability of many upcrossings guaranteed in \autoref{thm:lower-bound}
is asymptotically tight.
%-------------------------------%
%\paragraph{Upper and lower bounds.}
%-------------------------------%
Every function $f$ is either summable or not.
If $f$ is summable, then we can scale it with a constant factor such that
its sum is smaller than $\frac{\delta}{2}$,
and then apply the construction of \autoref{thm:lower-bound}.
If $f$ is not summable,
the following theorem implies that
there is no \emph{uniform} lower bound on
the probability of having at least $m$-many $f(m)$-upcrossings.
%-------------------------------%
\begin{theorem}[Upper bound on upcrossing rate]
\label{thm:upper-bound}
%-------------------------------%
Let $f: \SetN \to [0, 1)$ be a monotone decreasing function such that
$\sum_{t=1}^\infty f(t) = \infty$.
For every probability measure $P$ and
for every nonnegative $P$-martingale $(X_t)_{t \in \SetN}$
with $\E[X_t] = 1$,
\[
P[ \forall m.\; U(1 - f(m), 1 + f(m)) \geq m] = 0.
\]
\end{theorem}
\begin{proof}
Define the events $D_m := \bigcup_{i=1}^m E_{i,i}^c =
\{ \exists i \leq m.\; U(1 - f(i), 1 + f(i)) < i \}$.
Then $D_m \subseteq D_{m+1}$.
%
Assume there is a constant $c > 0$ such that
$c \leq P(D_m^c) = P(\bigcap_{i=1}^m E_{i,i})$ for all $m$.
%
Let $m \in \SetN$, $v \in D_m^c$, and pick $t_0 \in \SetN$
such that the process $X_0(v), \ldots, X_{t_0}(v)$
has completed $i$-many $f(i)$-upcrossings for all $i \leq m$ and
$X_{t_0}(v) \leq 1 - f(m+1)$.
%
If $X_t(v) \geq 1 + f(m+1)$ for some $t \geq t_0$,
the $(m + 1)$-st upcrossing for $f(m + 1)$ is completed
and thus $v \in E_{m+1, m+1}$.
Define the stopping time $T: \Sigma^\cinfty \to (\SetN \cup \{ \cinfty \})$,
\[
T(v) := \inf \{ t \geq t_0 \mid X_t(v) \geq 1 + f(m+1) \}.
\]
According to the \hyperref[thm:optional-stopping]{Optional Stopping Theorem}
applied to the process $(X_t)_{t \geq t_0}$,
the random variable $X_T$ is almost surely well-defined and
$\E[X_T \mid \F_{t_0}] \leq \E[X_{t_0} \mid \F_{t_0}] = X_{t_0}$.
This yields
$1 - f(m+1) \geq X_{t_0} \geq \E[X_T \mid \F_{t_0} ]$ and
by taking the expectation $\E[ \;\cdot \mid X_{t_0} \leq 1 - f(m+1)]$ on both sides,
\begin{align*}
1 - f(m+1)
&\geq \E[ X_T \mid X_{t_0} \leq 1 - f(m+1) ] \\
&\geq (1 + f(m+1)) P[X_T \geq 1 + f(m+1) \mid X_{t_0} \leq 1 - f(m+1) ]
\end{align*}
by Markov's inequality.
Therefore
\begin{align*}
P( E_{m+1,m+1} \mid D_m^c )
= \;&P[ X_T \geq 1 + f(m+1) \mid X_{t_0} \leq 1 - f(m+1)] \\
\;&\cdot P[ X_{t_0} \leq 1 - f(m+1) \mid D_m^c] \\
\leq \;&P[X_T \geq 1 + f(m+1) \mid X_{t_0} \leq 1 - f(m+1)] \\
\leq \;&\tfrac{1 - f(m+1)}{1 + f(m+1)} \leq 1 - f(m+1).
\end{align*}
Together with $c \leq P(D_m^c)$ we get
\begin{align*}
P \left( D_{m+1} \setminus D_m \right)
&= P \left( E_{m+1,m+1}^c \cap D_m^c \right) \\
&= P \left( E_{m+1,m+1}^c \mid D_m^c \right) P \left( D_m^c \right)
\geq f(m+1) c.
\end{align*}
This is a contradiction because $\sum_{i=1}^\infty f(i) = \infty$:
\[
1
\geq P(D_{m+1})
\geq P \left( \biguplus_{i=1}^m (D_{i+1} \setminus D_i) \right)
= \sum_{i=1}^m P(D_{i+1} \setminus D_i)
\geq \sum_{i=1}^m f(i+1) c
\xrightarrow{m \to \infty} \infty.
\]
Therefore the assumption $P(D_m^c) \geq c$ for all $m$ is false,
and hence we get
$
P[ \forall m.\; U(1 - f(m), 1 + f(m)) \geq m]
= P(\bigcap_{i=1}^\infty E_{i,i})
= \lim_{m \to \infty} P(D_m^c)
= 0
$.
\qed
\end{proof}
\ifshort
By choosing the decreasing non-summable function $f$ given by
$f^{-1}(\eps) := \frac{a}{\eps \ln \frac{1}{\eps}} - b$
\else
By choosing a specific decreasing non-summable function $f$
\fi
in \autoref{thm:upper-bound},
we get that
$U(1-\eps, 1 + \eps) \notin \Omega(\frac{1}{\eps \log(1/\eps)})$
$P$-almost surely.
%-------------------------------%
\begin{corollary}[Concrete upper bound]
\label{cor:upper-bound-concrete}
%-------------------------------%
Let $P$ be a probability measure and
let $(X_t)_{t \in \SetN}$ be a nonnegative martingale
with $\E[X_t] = 1$.
Then for all $a, b > 0$,
\[
P \left[ \forall \eps > 0.\; U(1 - \eps, 1 + \eps) \geq \tfrac{a}{\eps \log(1/\eps)} - b \right] = 0.
\]
\end{corollary}
\ifshort\else
\begin{proof}
We proceed analogously to the proof of \autoref{cor:lower-bound}.
Define
\[
g: (0, c] \to [g(c), \infty), ~~~~~~ \eps \mapsto \frac{a}{\eps \ln \frac{1}{\eps}} - b
\]
with $c < e^{-1}$ chosen such that $g(c) \geq 1$.
We have $\lim_{\eps \to 0} g(\eps) = \infty$ and
\[
\frac{dg}{d\eps}(\eps)
= \frac{-a}{\eps^2 \ln \frac{1}{\eps}}
+ \frac{a}{\eps^2 (\ln \frac{1}{\eps})^2}
= -\frac{a \left( \ln \frac{1}{\eps} - 1 \right)}{\eps^2 (\ln \frac{1}{\eps})^2}
< 0 \text{ on } (0, c], \text{ since } \ln \tfrac{1}{\eps} > 1 \text{ for } \eps < e^{-1}.
\]
Therefore the function $g$ is strictly monotone decreasing and hence invertible.
Choose $f := g^{-1}$.
Using the substitution $t = g(\eps)$,
$dt = \frac{dg}{d\eps}(\eps) d\eps$,
\begin{align*}
\sum_{t=1}^\infty f(t)
&\geq \int_{g(c)}^\infty f(t) dt
= \int_c^{g^{-1}(\infty)} f(g(\eps)) \frac{dg}{d\eps}(\eps) d\eps \\
&= \int_c^0 \frac{-a}{\eps \ln \frac{1}{\eps}} d\eps
+ \int_c^0 \frac{a}{\eps (\ln \frac{1}{\eps})^2} d\eps
= \int_{-\ln c}^{\infty} \frac{a}{u} du
+ \int_c^0 \frac{a}{\eps (\ln \frac{1}{\eps})^2} d\eps \\
&= \left[ a \ln u \right]_{-\ln c}^{+\infty}
+ \left[ \tfrac{a}{\ln \frac{1}{\eps}} \right]_c^0
= \infty - a \ln (-\ln c) + 0 - \tfrac{a}{\ln \frac{1}{c}}
= \infty.
\end{align*}
Now we apply \autoref{thm:upper-bound}
to $m := g(\eps)$.
\qed
\end{proof}\fi
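A quick numerical illustration (our own sketch; the parameters $a = 1$, $b = 0$ and the log-space bisection are illustrative choices) shows the partial sums of $f = g^{-1}$ growing without any sign of convergence, as expected from the divergent integral:

```python
import math

a, b = 1.0, 0.0   # illustrative parameters

def g(eps):
    # g from the proof above, decreasing on (0, 1/e)
    return a / (eps * math.log(1 / eps)) - b

def f(t):
    # f = g^{-1} by bisection in log-space (the preimage can be tiny)
    lo, hi = 1e-280, 0.25
    for _ in range(300):
        mid = math.sqrt(lo * hi)   # geometric mean handles tiny lo
        if g(mid) > t:
            lo = mid
        else:
            hi = mid
    return hi

# partial sums of f keep growing roughly like log log n: the series diverges
s = {n: sum(f(t) for t in range(3, n + 1)) for n in (10, 100, 1000)}
assert s[10] < s[100] < s[1000]
assert s[1000] - s[100] > 0.1   # no sign of convergence
```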
%-------------------------------%
\begin{theorem}[{Dubins' Inequality~\cite[Ex.5.2.14]{Durrett:10}}]
\label{thm:dubins-inequality}
%-------------------------------%
For every nonnegative $P$-martingale $(X_t)_{t \in \SetN}$
and for every $c > 0$ and every $\eps > 0$,
\[
P[ U(c - \eps, c + \eps) \geq k ]
\leq \left( \tfrac{c - \eps}{c + \eps} \right)^k
\E \left[ \min \left\{\tfrac{X_0}{c - \eps}, 1 \right\} \right].
\]
\end{theorem}
%-------------------------------%
%\paragraph{Dubin's inequality and \autoref{thm:lower-bound}.}
%-------------------------------%
Dubins' Inequality immediately yields
the following bound on the probability of at least $k$ upcrossings
(the expectation factor is at most $1$).
\[
% P(E_{m,k})
P[U(1 - f(m), 1 + f(m)) \geq k]
\leq \left( \tfrac{1 - f(m)}{1 + f(m)} \right)^k.
\]
The construction from \autoref{thm:lower-bound} shows that
this bound is asymptotically tight for $m = k \to \infty$ and $\delta \to 0$:
define the monotone decreasing function $f: \SetN \to [0, 1)$,
\[
f(t) :=
\begin{cases}
\frac{\delta}{2k}, &\text{if } t \leq k, \text{ and} \\
0, &\text{otherwise}.
\end{cases}
\]
Then the martingale from \autoref{thm:lower-bound} yields the lower bound
\[
P[U(1 - \tfrac{\delta}{2k}, 1 + \tfrac{\delta}{2k}) \geq k]
\geq 1 - \delta,
\]
while \hyperref[thm:dubins-inequality]{Dubins' Inequality} gives the upper bound
\[
P[U(1 - \tfrac{\delta}{2k}, 1 + \tfrac{\delta}{2k}) \geq k]
\leq \left( \frac{1 - \frac{\delta}{2k}}{1 + \frac{\delta}{2k}} \right)^k
= \left( 1 - \frac{2\delta}{2k + \delta} \right)^k
\xrightarrow{k \to \infty} \exp(-\delta).
\]
% The tangent to $\exp(-\delta)$ at $\delta = 0$ is $1 - \delta$
% http://www.wolframalpha.com/input/?i=taylor+exp(-x)
As $\delta$ approaches $0$,
the value of $\exp(-\delta)$ approaches $1 - \delta$
(but exceeds it since $\exp$ is convex).
%
For $\delta = 0.2$ and $m = k = 3$,
the difference between the two bounds is already less than $0.021$.
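Both claims are easy to verify numerically (our own check, not part of the paper's argument):

```python
import math

delta, k = 0.2, 3
x = delta / (2 * k)
upper = ((1 - x) / (1 + x)) ** k   # Dubins' upper bound
lower = 1 - delta                  # lower bound from the construction
assert lower < upper               # the bounds are consistent ...
assert upper - lower < 0.021       # ... and differ by less than 0.021

# exp(-delta) exceeds its tangent 1 - delta, and the upper bound
# tends to exp(-delta) as k grows
assert math.exp(-delta) > 1 - delta
k_big = 10 ** 6
x_big = delta / (2 * k_big)
assert abs(((1 - x_big) / (1 + x_big)) ** k_big - math.exp(-delta)) < 1e-6
```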
The following theorem
places an upper bound on the rate of \emph{expected} upcrossings.
\ifshort\else
In Appendix \ref{app:tightness}
we discuss different versions of this inequality
and prove this inequality tight.
\fi
%-------------------------------%
\begin{theorem}[{Doob's Upcrossing Inequality~\cite{Xu12}}]
\label{thm:upcrossing-inequality}
%-------------------------------%
Let $(X_t)_{t \in \SetN}$ be a supermartingale.
For every $c \in \mathbb{R}$ and $\eps > 0$,
\[
\E[U_t(c - \eps, c + \eps)]
\leq \tfrac{1}{2\eps} \E[\max\{ c - \eps - X_t, 0 \}].
\]
\end{theorem}
%
%-------------------------------%
%\paragraph{Doob's inequality and \autoref{thm:lower-bound}.}
%-------------------------------%
Asymptotically, Doob's Upcrossing Inequality states that, as $\eps \to 0$,
\[
\E[U(1 - \eps, 1 + \eps)]
\in O \left(\tfrac{1}{\eps} \right).
\]
Again, we can use the construction of \autoref{thm:lower-bound}
to show that these asymptotics are tight:
define the monotone decreasing function $f: \SetN \to [0, 1)$,
\[
f(t) :=
\begin{cases}
\frac{\delta}{2m}, &\text{if } t \leq m, \text{ and} \\
0, &\text{otherwise}.
\end{cases}
\]
Then for $\delta = \frac{1}{2}$,
\autoref{cor:lower-bound} yields a martingale fulfilling the lower bound
\[
\E[U(1 - \tfrac{1}{4m}, 1 + \tfrac{1}{4m})]
\geq \frac{m}{2}
\]
and \hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality} gives the upper bound
\[
\E[U(1 - \tfrac{1}{4m}, 1 + \tfrac{1}{4m})]
\leq 2m,
\]
which differs by a factor of $4$.
%
\ifshort\else
In \autoref{thm:tightness-Doob} we show that
\hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}
can also be made exactly tight.
\fi
%-------------------------------%
%\paragraph{Gap between \autoref{cor:lower-bound-concrete} and Doob's inequality.}
%-------------------------------%
The lower bound for the expected number of upcrossings given in \autoref{cor:lower-bound-concrete}
is a little looser than
the upper bound given in \hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}.
Closing this gap remains an open problem.
We know by \autoref{thm:upper-bound} that
given a non-summable function $f$,
the uniform probability for many $f(m)$-upcrossings goes to $0$.
However, this does not necessarily imply that the expectation also tends to $0$;
low probability might be compensated for by high value.
So for expectation there might be a lower bound larger than \autoref{cor:lower-bound},
an upper bound smaller than \hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality},
or both.
%-------------------------------%
%\paragraph{Nonuniform, Doob's bound is best.}
%-------------------------------%
If we drop the requirement that the rate of upcrossings be uniform,
\hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}
is the best upper bound we can give\ifshort~\cite{LH:14martoscx}.
\else:
%
using the little-$o$ notation,
assume there is a smaller upper bound $g(m) \in o(m)$
such that for every martingale process $(X_t)_{t \in \SetN}$,
\begin{equation}\label{eq:nonuniform-bound}
\E \left[U(1 - \tfrac{1}{m}, 1 + \tfrac{1}{m}) \right] \in o(g(m)).
\end{equation}
%-------------------------------%
%\paragraph{Proof sketch.}
%-------------------------------%
In the following we sketch how to construct a martingale that violates this bound.
Define $f(m) := g(m) / m$,
then $f(m) \to 0$ as $m \to \infty$,
so there is an infinite sequence $(m_i)_{i \in \SetN}$ such that
$\sum_{i=0}^\infty f(m_i) \leq 1$.
We define the martingale process $(X_t)_{t \in \SetN}$
such that it picks an $i \in \SetN$ with probability $f(m_i)$,
and then becomes a martingale
that makes \hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality} tight
for upcrossings between $1 - 1/m_i$ and $1 + 1/m_i$%
\ifshort.
\else:
for every $i$,
we apply the construction of \autoref{thm:tightness-Doob}.
\fi
%
This would give the following lower bound on the expected number of upcrossings
for each $i$:
\[
\forall i.\;\;
\E \left[ U(1 - \tfrac{1}{m_i}, 1 + \tfrac{1}{m_i}) \right]
\geq m_i f(m_i)
= g(m_i).
\]
Since there are infinitely many $m_i$,
we get a contradiction to \eqref{eq:nonuniform-bound}.
Using a similar argument, we can show that nonuniformly,
Dubins' bound is also the best we can get.
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Application to the {MDL} Principle}
\label{sec:mdl-application}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
%\paragraph{Definition of $\MDL$.}
%-------------------------------%
Let $\mathcal{M}$ be a countable set of probability measures on $(\Sigma^\cinfty, \Foo)$,
called \emph{environment class}.
Let $K: \mathcal{M} \to [0, \infty)$ be a function
such that $\sum_{Q \in \mathcal{M}} 2^{-K(Q)} \leq 1$,
called \emph{complexity function on $\mathcal{M}$}.
Following notation in \cite{Hutter:09mdltvp}, we define for $u \in \Sigma^*$ the
\emph{minimal description length} model as
\[
\MDL^u := \argmin_{Q \in \mathcal{M}} \big\{ \!-\log Q(\Gamma_u) + K(Q) \big\}.
\]
Here, $-\log Q(\Gamma_u)$ is the (arithmetic) code length of $u$ given model $Q$,
and $K(Q)$ is a complexity penalty for $Q$, also called \emph{regularizer}.
Given data $u \in \Sigma^*$,
$\MDL^u$ is the measure $Q \in \mathcal{M}$
that minimizes the total code length of data and model.
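As a toy illustration of the $\MDL$ operator (our own sketch: an i.i.d.\ Bernoulli model class with complexities $K \equiv 1$, which satisfies the Kraft inequality, and code lengths in bits):

```python
import math

def mdl(u, models):
    """Return the model minimizing -log2 Q(u) + K(Q) for a bit string u.

    models maps a name to (probability of bit 1, complexity K);
    each measure is i.i.d. Bernoulli, so -log2 Q(u) sums per-bit code lengths.
    """
    def total_code_length(name):
        p, K = models[name]
        return K - sum(math.log2(p if bit else 1 - p) for bit in u)
    return min(models, key=total_code_length)

# two measures with K(fair) = K(biased) = 1: 2^-1 + 2^-1 <= 1 (Kraft)
models = {'fair': (0.5, 1.0), 'biased': (0.9, 1.0)}
assert mdl([1, 1, 1, 1, 1, 1], models) == 'biased'
assert mdl([0, 1, 0, 1, 0, 0], models) == 'fair'
```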
The following corollary of \autoref{thm:lower-bound} states that in some cases
the limit $\lim_{t \to \infty} \MDL^{v_{1:t}}$ does not exist with high probability.
%-------------------------------%
\begin{corollary}[MDL may not converge]
\label{cor:mdl-inductively-inconsistent}
%-------------------------------%
Let $P$ be a probability measure on the measurable space $(\Sigma^\cinfty, \Foo)$.
For any $\delta > 0$,
there is a set of probability measures $\mathcal{M}$ containing $P$,
a complexity function $K: \mathcal{M} \to [0,1]$, and
a measurable set $Z \in \Foo$ with $P(Z) \geq 1 - \delta$
such that for all $v \in Z$,
the limit $\lim_{t \to \infty} \MDL^{v_{1:t}}$ does not exist.
\end{corollary}
\begin{proof}
Fix some positive monotone decreasing summable function $f$
(e.g., the one given in \autoref{cor:lower-bound-concrete}).
Let $(X_t)_{t \in \SetN}$ be the $P$-martingale process
from \autoref{thm:lower-bound}.
By \autoref{thm:martingale-measure} there is a
probability measure $Q$ on $(\Sigma^\cinfty, \Foo)$ such that
\[
X_t(v) = \frac{Q(\Gamma_{v_{1:t}})}{P(\Gamma_{v_{1:t}})}.
\]
Choose $\mathcal{M} := \{ P, Q \}$ with $K(P) := K(Q) := 1$.
From the definition of $\MDL$ and $Q$ it follows that
\begin{align*}
X_t(u) &< 1
\;\Longleftrightarrow\;
Q(\Gamma_u) < P(\Gamma_u)
\;\Longrightarrow\;
\MDL^u = P, \text{ and} \\
X_t(u) &> 1
\;\Longleftrightarrow\;
Q(\Gamma_u) > P(\Gamma_u)
\;\Longrightarrow\;
\MDL^u = Q.
\end{align*}
For $Z := \bigcap_{m=1}^\infty E_{m,m}$, \autoref{thm:lower-bound} yields
\[
P(Z)
= P[\forall m.\; U(1 - f(m), 1 + f(m)) \geq m]
\geq 1 - \delta.
\]
For each $v \in Z$, the measure $\MDL^{v_{1:t}}$ alternates between $P$ and $Q$
indefinitely, and thus its limit does not exist.
\qed
\end{proof}
%-------------------------------%
%\paragraph{$Q/P$ oscillates around $1$.}
%-------------------------------%
Crucial to the proof of \autoref{cor:mdl-inductively-inconsistent} is that
not only does the process $Q/P$ oscillate indefinitely,
it oscillates around the constant $\exp(K(Q) - K(P)) = 1$.
This implies that the MDL estimator may keep changing indefinitely,
and thus it is inductively inconsistent.
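The mechanism behind the proof can be imitated numerically. The following toy sketch (our own construction: a binary alphabet, uniform $P$, and a $Q$ whose conditionals bet against the current likelihood ratio; it is not the martingale from \autoref{thm:lower-bound}) exhibits an MDL choice that keeps alternating:

```python
def mdl_choices(T=200):
    # x tracks X_t = Q(u)/P(u) along the all-ones data stream;
    # P is uniform on bits, Q bets up when x < 1 and down otherwise,
    # so x crosses 1 over and over (toy construction, not the paper's)
    x, choices = 1.0, []
    for _ in range(T):
        q1 = 0.6 if x < 1 else 0.4   # Q's conditional probability of a 1
        x *= q1 / 0.5                # observe bit 1; P gives it probability 1/2
        # with K(P) = K(Q), MDL picks P iff Q(u) < P(u), i.e. iff x < 1
        choices.append('P' if x < 1 else 'Q')
    return choices

c = mdl_choices()
switches = sum(a != b for a, b in zip(c, c[1:]))
assert switches >= 20   # the MDL choice alternates again and again
```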
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bounds on Mind Changes}
\label{sec:bounds-on-mind-changes}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\secondfigure}{
\centering
\begin{tikzpicture}[scale=\ifshort 0.55\else 1.0 \fi]
% Axes
\draw[->] (-0.1,0) -- (8.3,0) node[right] {$t$};
\ifshort
\draw[->] (0,-0.1) -- (0,4) node[right] {$X_t$};
\else
\draw[->] (0,-0.1) -- (0,4) node[above] {$X_t$};
\fi
\draw[color=gray,dashed] (8.2,2) -- (-0.1,2)
node[left,color=black] {$c$};
\draw[color=orange] (4,4) -- (4,-0.1);
\draw[color=orange] (6.5,4) -- (6.5,-0.1);
% bounds
\draw[color=gray] (8.2,3) -- (-0.1,3)
node[left,color=black] {$c + \tfrac{\alpha}{2}$};
\draw[color=gray] (8.2,1) -- (-0.1,1)
node[left,color=black] {$c - \tfrac{\alpha}{2}$};
\draw[color=blue] (0,0.8) -- (2.5,0.8);
\draw[color=blue] (2.5,2.5) -- (3.5,2.5);
\draw[color=blue] (3.5,0.8) -- (5,0.8);
\draw[color=blue] (5,2.2) -- (5.5,2.2);
\draw[color=blue] (5.5,0.2) -- (8.1,0.2);
% X_t
\draw (0, 2.8) -- (0.5, 3.1) -- (1, 2.4) -- (1.5, 2.8) -- (2, 1.5) -- (2.5, 0.5)
-- (3, 0.7) -- (3.5, 2.8) -- (4, 3.4) -- (4.5, 1.5) -- (5, 0.2)
-- (5.5, 2.2) -- (6, 1.8) -- (6.5, 3.1) -- (7, 3.4) -- (7.5, 3.2) -- (8, 3.5);
\draw[fill]
(0, 2.8) circle (0.05)
(0.5,3.1) circle (0.05)
(1, 2.4) circle (0.05)
(1.5,2.8) circle (0.05)
(2, 1.5) circle (0.05)
(2.5,0.5) circle (0.05)
(3, 0.7) circle (0.05)
(3.5,2.8) circle (0.05)
(4, 3.4) circle (0.05)
(4.5,1.5) circle (0.05)
(5, 0.2) circle (0.05)
(5.5,2.2) circle (0.05)
(6, 1.8) circle (0.05)
(6.5,3.1) circle (0.05)
(7, 3.4) circle (0.05)
(7.5,3.2) circle (0.05)
(8, 3.5) circle (0.05);
\end{tikzpicture}
\caption{
This example process has two upcrossings
between $c - \alpha/2$ and $c + \alpha/2$
(completed at the time steps of the vertical orange bars) and
four $\alpha$-alternations (completed when crossing the horizontal blue bars).
}
\label{fig:alternations-and-upcrossings}
}
%-------------------------------%
%\paragraph{Mind changes as martingales.}
%-------------------------------%
Suppose we are testing a hypothesis $H \subseteq \Sigma^\cinfty$
on a stream of data $v \in \Sigma^\cinfty$.
Let $P(H \mid \Gamma_{v_{1:t}})$ denote our belief in $H$ at time $t \in \SetN$
after seeing the evidence $v_{1:t}$.
By Bayes' rule,
\[
P(H \mid \Gamma_{v_{1:t}})
= P(H) \frac{P(\Gamma_{v_{1:t}} \mid H)}{P(\Gamma_{v_{1:t}})}
=: X_t(v).
\]
Since $X_t$ is a constant multiple of $P(\;\cdot \mid H)/P$ and
$P(\;\cdot \mid H)$ is a probability measure on $(\Sigma^\cinfty, \Foo)$,
the process $(X_t)_{t \in \SetN}$ is a $P$-martingale
with respect to the filtration $(\F_t)_{t \in \SetN}$
by \autoref{thm:measure-martingale}.
By definition, $(X_t)_{t \in \SetN}$ is bounded between $0$ and $1$.
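The martingale property of the belief process can be verified directly in a toy two-hypothesis example (the biases $0.8$ and $0.4$ and the uniform prior are our illustrative assumptions):

```python
def posterior(p_h, lik_h, lik_alt):
    # Bayes' rule for a binary hypothesis
    return p_h * lik_h / (p_h * lik_h + (1 - p_h) * lik_alt)

p_h = 0.5                      # prior belief in H
bias_h, bias_alt = 0.8, 0.4    # P(bit = 1 | H) and P(bit = 1 | not H)

pred1 = p_h * bias_h + (1 - p_h) * bias_alt        # prior predictive of a 1
post1 = posterior(p_h, bias_h, bias_alt)           # belief after observing 1
post0 = posterior(p_h, 1 - bias_h, 1 - bias_alt)   # belief after observing 0

# martingale property: the expected next belief equals the current belief
assert abs(pred1 * post1 + (1 - pred1) * post0 - p_h) < 1e-12
```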
%-------------------------------%
%\paragraph{Definition of $\alpha$-alternations.}
%-------------------------------%
Let $\alpha > 0$.
We are interested in how likely it is
to often change one's mind about $H$ by at least $\alpha$,
i.e., in the probability that $X_t = P(H \mid \Gamma_{v_{1:t}})$
decreases and subsequently increases $m$ times by at least $\alpha$.
Formally, we define the stopping times $T_{0,\nu}'(v) := 0$,
\begin{align*}
T_{2k+1,\nu}'(v)
&:= \inf \{ t > T_{2k,\nu}'(v)
\mid \nu X_t(v) \leq \nu X_{T_{2k,\nu}'(v)}(v) - \alpha \}, \\
T_{2k+2,\nu}'(v)
&:= \inf \{ t > T_{2k+1,\nu}'(v)
\mid \nu X_t(v) \geq \nu X_{T_{2k+1,\nu}'(v)}(v) + \alpha \},
\end{align*}
\ifshort
% For the short version, this figure needs to be here
\begin{wrapfigure}{r}{0.5\textwidth}
\vspace{-9mm}
\secondfigure
\vspace{-9mm}
\end{wrapfigure}
\fi
and $T_k' := \min\{ T_{k,\nu}' \mid \nu \in \{ -1, +1 \} \}$.
(In Davis' notation, $X_{T_{0,\nu}'}, X_{T_{1,\nu}'}, \ldots$
is an $\alpha$-alternating W-sequence for $\nu = 1$ and
an $\alpha$-alternating M-sequence for $\nu = -1$~\cite[Def.\ 4]{Davis:13}.)
For any $t \in \SetN$, the random variable
\[
A_t^X(\alpha)(v) := \sup \{ k \geq 0 \mid T_k'(v) \leq t \}
\]
denotes the number of \emph{$\alpha$-alter\-nations} up to time $t$.
Let $A^X(\alpha) := \sup_{t \in \SetN} A_t^X(\alpha)$
denote the total number of $\alpha$-alternations.
%-------------------------------%
%\paragraph{Alternations and Upcrossings.}
%-------------------------------%
Setting $\alpha = 2\eps$,
the $\alpha$-alter\-nations differ from $\eps$-upcrossings in three ways:
first, for upcrossings, the process decreases below $c - \eps$,
then increases above $c + \eps$, and then repeats.
For alternations, the process may overshoot $c - \eps$ or $c + \eps$ and
thus change the bar for the subsequent alternations,
causing a `drift' in the target bars over time.
Second, for $\alpha$-alternations the initial value of the martingale is relevant.
Third, one upcrossing corresponds to two alternations,
since one upcrossing always involves a preceding downcrossing.
See \autoref{fig:alternations-and-upcrossings}.
\ifshort\else
\begin{figure}[t]
\secondfigure
\end{figure}
\fi
%-------------------------------%
%\paragraph{Davis' Lemma.}
%-------------------------------%
To apply our upcrossing bounds to $\alpha$-alternations,
we use the following lemma by Davis.
We reinterpret it as stating that
every bounded martingale process $(X_t)_{t \in \SetN}$
can be modified into a martingale $(Y_t)_{t \in \SetN}$
such that the probability for many $\alpha$-alternations is not decreased
and the number of alternations equals
the number of upcrossings plus the number of downcrossings%
\ifshort~\cite{LH:14martoscx}\fi.
\ifshort\else
A sketch of the proof can be found in Appendix \ref{app:davis-lemma}.
\fi
%-------------------------------%
\begin{lemma}[{Upcrossings and alternations~\cite[Lem.\ 9]{Davis:13}}]
\label{lem:alternations-upcrossings}
%-------------------------------%
Let $(X_t)_{t \in \SetN}$ be a martingale
with $0 \leq X_t \leq 1$.
There exists a martingale $(Y_t)_{t \in \SetN}$
with $0 \leq Y_t \leq 1$
and a constant $c \in (\alpha/2, 1 - \alpha/2)$
such that for all $t \in \SetN$ and for all $k \in \SetN$,
\[
P[A_t^X(\alpha) \geq 2k]
\leq P[A_t^Y(\alpha) \geq 2k]
= P[U_t^Y(c - \alpha/2, c + \alpha/2) \geq k].
\]
\end{lemma}
%-------------------------------%
\begin{theorem}[Upper bound on alternations]
\label{thm:davis-dubins}
%-------------------------------%
For every martingale process $(X_t)_{t \in \SetN}$
with $0 \leq X_t \leq 1$,
\[
P[A(\alpha) \geq 2k]
\leq \left( \frac{1 - \alpha}{1 + \alpha} \right)^k.
\]
\end{theorem}
\begin{proof}
We apply \autoref{lem:alternations-upcrossings}
to $(X_t)_{t \in \SetN}$ and $(1 - X_t)_{t \in \SetN}$ to get
the processes $(Y_t)_{t \in \SetN}$ and $(Z_t)_{t \in \SetN}$.
\hyperref[thm:dubins-inequality]{Dubins' Inequality} yields
\begin{align*}
P[A_t^X(\alpha) \geq 2k]
&\leq P[U_t^Y(c_+ - \tfrac{\alpha}{2}, c_+ + \tfrac{\alpha}{2}) \geq k]
\leq \left( \frac{c_+ - \frac{\alpha}{2}}{c_+ + \frac{\alpha}{2}} \right)^k
=: g(c_+) \text{ and} \\
P[A_t^{1-X}(\alpha) \geq 2k]
&\leq P[U_t^Z(c_- - \tfrac{\alpha}{2}, c_- + \tfrac{\alpha}{2}) \geq k]
\leq \left( \frac{c_- - \frac{\alpha}{2}}{c_- + \frac{\alpha}{2}} \right)^k
= g(c_-)
\end{align*}
for some $c_+, c_- \in (\alpha/2, 1 - \alpha/2)$.
Because \autoref{lem:alternations-upcrossings} is symmetric for
$(X_t)_{t \in \SetN}$ and $(1 - X_t)_{t \in \SetN}$,
we have $c_+ = 1 - c_-$.
%We want to eliminate $c_+$ and $c_-$ by finding a maximum of the function $g$.
%% http://www.wolframalpha.com/input/?i=d%2Fdc+(c-a%2F2)%5Em%2F(c%2Ba%2F2)%5Em
%\[
% \frac{dg}{dc}(c)
%= \left(\frac{2c - \alpha}{2c + \alpha} \right)^k \frac{4\alpha k}{4c^2 - \alpha^2},
%\]
%which is positive for all $c > \alpha/2$.
%Therefore $g$ is strictly monotone increasing.
Since $P[A_t^X(\alpha) \geq 2k] = P[A_t^{1 - X}(\alpha) \geq 2k]$
by the definition of $A_t^X(\alpha)$,
we conclude that both are at most $\min\{g(c_+), g(c_-)\} = \min\{g(c_+), g(1 - c_+)\}$.
This is maximized for $c_+ = c_- = 1/2$
because $g$ is strictly monotone increasing for $c > \alpha / 2$.
%
Therefore
\[
P[A_t^X(\alpha) \geq 2k]
\leq \left( \frac{\frac{1}{2} - \frac{\alpha}{2}}{\frac{1}{2} + \frac{\alpha}{2}} \right)^k
= \left( \frac{1 - \alpha}{1 + \alpha} \right)^k.
\]
Since this bound is independent of $t$,
it also holds for $P[A^X(\alpha) \geq 2k]$.
\qed
\end{proof}
%-------------------------------%
%\paragraph{Davis' bound.}
%-------------------------------%
The bound of \autoref{thm:davis-dubins} is the square root of the bound derived by Davis~\cite[Thm.\ 10 \& Thm.\ 11]{Davis:13}:
\begin{equation}\label{eq:Davis-bound}
P[A(\alpha) \geq 2k]
\leq \left( \frac{1 - \alpha}{1 + \alpha} \right)^{2k}.
\end{equation}
This bound is tight~\cite[Cor.\ 13]{Davis:13}.
%-------------------------------%
%\paragraph{Why is our bound worse?}
%-------------------------------%
Because $0 \leq X_t \leq 1$,
the process $(1 - X_t)_{t \in \SetN}$ is also a nonnegative martingale,
hence the same upper bounds apply to it.
This explains why the result in \autoref{thm:davis-dubins} is worse than Davis' bound~\eqref{eq:Davis-bound}:
Dubins' bound applies to all nonnegative martingales,
while Davis' bound uses the fact that the process is bounded from below \emph{and} above.
For unbounded nonnegative martingales,
downcrossings are `free' in the sense that
one can make a downcrossing almost surely successful
(as done in the proof of \autoref{thm:lower-bound}).
If we apply Dubins' bound to the process $(1 - X_t)_{t \in \SetN}$,
we get the same probability bound for the downcrossings of $(X_t)_{t \in \SetN}$
(which are upcrossings of $(1 - X_t)_{t \in \SetN}$).
Multiplying both bounds yields Davis' bound~\eqref{eq:Davis-bound};
however, we still require a formal argument
why the upcrossing and downcrossing bounds are independent.
The following theorem, a consequence of \autoref{thm:davis-dubins},
gives an upper bound on the \emph{expected} number of $\alpha$-alternations.
%-------------------------------%
\begin{theorem}[Upper bound on expected alternations]
\label{thm:expected-alternations}
%-------------------------------%
For every martingale $(X_t)_{t \in \SetN}$
with $0 \leq X_t \leq 1$,
the expectation
$
\E[A(\alpha)]
\leq \tfrac{1}{\alpha}.
$
\end{theorem}
\begin{proof}
By \autoref{thm:davis-dubins} we have
$P[A(\alpha) \geq 2k] \leq \left( \frac{1 - \alpha}{1 + \alpha} \right)^k$,
and thus
\begin{align*}
\E[A(\alpha)]
&= \sum_{k=1}^\infty P[A(\alpha) \geq k] \\
&= P[A(\alpha) \geq 1] + \sum_{k=1}^\infty \big(
P[A(\alpha) \geq 2k] + P[A(\alpha) \geq 2k + 1]
\big) \\
&\leq 1 + \sum_{k = 1}^\infty 2 P[A(\alpha) \geq 2k]
\leq 1 + 2\sum_{k = 1}^\infty \left( \frac{1 - \alpha}{1 + \alpha} \right)^k
%= 1 + 2 \big( \tfrac{1 - \alpha}{1 + \alpha} \big) \tfrac{1}{1 - \frac{1 - \alpha}{1 + \alpha}}
= \frac{1}{\alpha}.
\quad\quad\qed
\end{align*}
\end{proof}
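Both the geometric series evaluation and the resulting bound can be probed numerically (our own sketch: a toy martingale bounded in $[0,1]$ and a direct implementation of the alternation counter; a simulation, not a proof):

```python
import random

def alternation_count(path, alpha):
    # A^X(alpha): for each starting direction nu, count alternating moves of
    # size >= alpha relative to the value at the previous stopping time,
    # then take the better of the two directions
    best = 0
    for nu in (+1, -1):
        ref, k = path[0], 0
        for x in path[1:]:
            if k % 2 == 0 and nu * x <= nu * ref - alpha:
                k, ref = k + 1, x
            elif k % 2 == 1 and nu * x >= nu * ref + alpha:
                k, ref = k + 1, x
        best = max(best, k)
    return best

def bounded_martingale(T, seed):
    # symmetric +/- step of half the distance to the nearer boundary:
    # the increment has mean zero, so this is a martingale staying in (0, 1)
    rng, x, path = random.Random(seed), 0.5, [0.5]
    for _ in range(T):
        step = 0.5 * min(x, 1 - x)
        x += step if rng.random() < 0.5 else -step
        path.append(x)
    return path

alpha = 0.2
r = (1 - alpha) / (1 + alpha)
assert abs(1 + 2 * r / (1 - r) - 1 / alpha) < 1e-12  # series from the proof

mean = sum(alternation_count(bounded_martingale(2000, s), alpha)
           for s in range(300)) / 300
assert mean <= 1 / alpha   # empirical average respects E[A(alpha)] <= 1/alpha
```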
%-------------------------------%
%\paragraph{Applying the bounds to mind changes.}
%-------------------------------%
We now apply the technical results of this section
to the martingale process $X_t = P(\;\cdot \mid H) / P$,
our belief in the hypothesis $H$ as we observe data.
The probability of changing our mind $k$ times by at least $\alpha$
decreases exponentially with $k$ (\autoref{thm:davis-dubins}).
Furthermore, the expected number of times we change our mind by at least $\alpha$
is bounded by $1/\alpha$ (\autoref{thm:expected-alternations}).
In other words, changing one's mind very much very often is unlikely.
%-------------------------------%
%\paragraph{Lower bounds for mind changes.}
%-------------------------------%
Because in this section we consider martingales that are bounded between $0$ and $1$,
the lower bounds from \autoref{sec:indefinitely-oscillating-martingales} do not apply here.
While for the martingales constructed in \autoref{thm:lower-bound},
the number of $2\alpha$-alternations and the number of $\alpha$-up- and downcrossings coincide,
these processes are not bounded.
However, we can give a similar construction
that is bounded between $0$ and $1$ and makes Davis' bound asymptotically tight.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusion}
\label{sec:conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
%\paragraph{Summary.}
%-------------------------------%
We constructed an indefinitely oscillating martingale process
from a summable function $f$.
\autoref{thm:lower-bound} and \autoref{cor:lower-bound}
give uniform lower bounds on the probability and expectation of
the number of upcrossings of decreasing magnitude.
%
In \autoref{thm:upper-bound} we proved the corresponding upper bound
if the function $f$ is not summable.
%
In comparison,
\hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}
and \hyperref[thm:dubins-inequality]{Dubins' Inequality}
give upper bounds that are not uniform.
In \autoref{sec:upper-bounds} we showed that
for a certain summable function $f$,
our martingale makes these bounds asymptotically tight as well.
%-------------------------------%
%\paragraph{Recap of the MDL motivation.}
%-------------------------------%
Our investigation of indefinitely oscillating martingales
was motivated by two applications.
%
First, in \autoref{cor:mdl-inductively-inconsistent} we showed that
the minimum description length estimator may fail to converge:
for any probability measure $P$
we can construct a probability measure $Q$ such that $Q/P$ oscillates forever
around the decision threshold of the MDL estimator, so that $\lim_{t \to \infty} \MDL^{v_{1:t}}$ does not exist.
%-------------------------------%
%\paragraph{Recap of mind changes motivation.}
%-------------------------------%
Second, we derived bounds for the probability of changing one's mind about a hypothesis $H$
when observing a stream of data $v \in \Sigma^\cinfty$.
The probability $P(H \mid \Gamma_{v_{1:t}})$ is a martingale and
in \autoref{thm:davis-dubins} we proved that
the probability of changing the belief in $H$ often by at least $\alpha$ decreases exponentially.
%-------------------------------%
%\paragraph{Open questions.}
%-------------------------------%
A question that remains open is
whether there is a \emph{uniform} upper bound on the \emph{expected} number of upcrossings
tighter than \hyperref[thm:upcrossing-inequality]{Doob's Upcrossing Inequality}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Bibliography
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{thebibliography}{Wal05}
\bibitem[Dav13]{Davis:13}
Ernest Davis.
\newblock Bounding changes in probability over time: It is unlikely that you
will change your mind very much very often.
\newblock Technical report, 2013.
\newblock \url{https://cs.nyu.edu/davise/papers/dither.pdf}.
\bibitem[Doo53]{Doob:53}
Joseph~L. Doob.
\newblock {\em Stochastic Processes}.
\newblock Wiley, New York, 1953.
\bibitem[Dur10]{Durrett:10}
Rick Durrett.
\newblock {\em Probability: Theory and Examples}.
\newblock Cambridge University Press, 4th edition, 2010.
\bibitem[Grü07]{Gruenwald:07book}
Peter~D. Grünwald.
\newblock {\em The Minimum Description Length Principle}.
\newblock The MIT Press, Cambridge, 2007.
\bibitem[Hut09]{Hutter:09mdltvp}
Marcus Hutter.
\newblock Discrete {MDL} predicts in total variation.
\newblock In {\em Advances in Neural Information Processing Systems 22
({NIPS'09})}, pages 817--825, Cambridge, MA, USA, 2009. Curran Associates.
\bibitem[LH14]{LH:14martoscx}
Jan Leike and Marcus Hutter.
\newblock Indefinitely oscillating martingales.
\newblock Technical report, The Australian National University, 2014.
\newblock \url{http://www.hutter1.net/publ/martoscx.pdf}.
\bibitem[LS05]{Luo:05}
Wei Luo and Oliver Schulte.
\newblock Mind change efficient learning.
\newblock In {\em Proc. 18th Annual Conference on Learning Theory ({COLT'05})},
volume 3559 of {\em LNAI}, pages 398--412, Bertinoro, Italy, 2005. Springer.
\bibitem[PH05]{Hutter:05mdl2px}
Jan Poland and Marcus Hutter.
\newblock Asymptotics of discrete {MDL} for online prediction.
\newblock {\em IEEE Transactions on Information Theory}, 51(11):3780--3795,
November 2005.
\bibitem[Ris78]{Rissanen:78}
Jorma Rissanen.
\newblock Modeling by shortest data description.
\newblock {\em Automatica}, 14(5):465--471, 1978.
\bibitem[Wal05]{Wallace:05}
Christopher~S. Wallace.
\newblock {\em Statistical and Inductive Inference by Minimum Message Length}.
\newblock Springer, Berlin, 2005.
\bibitem[WB68]{Wallace:68}
Christopher~S. Wallace and David~M. Boulton.
\newblock An information measure for classification.
\newblock {\em Computer Journal}, 11(2):185--194, August 1968.
\bibitem[Xu12]{Xu12}
Weijun Xu.
\newblock Martingale convergence theorems.
\newblock Technical report, 2012.
\newblock \url{http://people.maths.ox.ac.uk/xu/Martingale_convergence.pdf}.
\end{thebibliography}
\end{document}