%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Exact Bayesian Regression of Piecewise Constant Functions %%
%% Marcus Hutter, Start: June 2005 %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
% Document-Style %
%-------------------------------%
\documentclass[12pt,twoside]{article}
\usepackage{latexsym,amsmath,hyperref}
\usepackage{graphicx}
\topmargin=-10mm \oddsidemargin=5mm \evensidemargin=5mm
\textwidth=15cm \textheight=22cm
\sloppy\lineskip=0pt
%-------------------------------%
% My Math-Environments %
%-------------------------------%
\def\,{\mskip 3mu} \def\>{\mskip 4mu plus 2mu minus 4mu} \def\;{\mskip 5mu plus 5mu} \def\!{\mskip-3mu}
\def\dispmuskip{\thinmuskip= 3mu plus 0mu minus 2mu \medmuskip= 4mu plus 2mu minus 2mu \thickmuskip=5mu plus 5mu minus 2mu}
\def\textmuskip{\thinmuskip= 0mu \medmuskip= 1mu plus 1mu minus 1mu \thickmuskip=2mu plus 3mu minus 1mu}
%\def\dispmuskip{}\def\textmuskip{}
\textmuskip
\def\beq{\dispmuskip\begin{equation}} \def\eeq{\end{equation}\textmuskip}
\def\beqn{\dispmuskip\begin{displaymath}}\def\eeqn{\end{displaymath}\textmuskip}
\def\bqa{\dispmuskip\begin{eqnarray}} \def\eqa{\end{eqnarray}\textmuskip}
\def\bqan{\dispmuskip\begin{eqnarray*}} \def\eqan{\end{eqnarray*}\textmuskip}
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\newenvironment{keywords}{\centerline{\bf\small
Keywords}\begin{quote}\small}{\par\end{quote}\vskip 1ex}
\def\paradot#1{\vspace{1.3ex plus 0.5ex minus 0.5ex}\noindent{\bf{#1.}}}
\def\paranodot#1{\vspace{1.3ex plus 0.5ex minus 0.5ex}\noindent{\bf{#1}}}
\def\aidx#1{}
\def\req#1{(\ref{#1})}
\def\toinfty#1{\stackrel{#1\to\infty}{\longrightarrow}}
\def\eps{\varepsilon}
\def\epstr{\epsilon} % for empty string
\def\nq{\hspace{-1em}}
\def\qed{\hspace*{\fill}$\Box\quad$\\}
\def\fr#1#2{{\textstyle{#1\over#2}}}
\def\frs#1#2{{^{#1}\!/\!_{#2}}}
\def\SetR{I\!\!R}
\def\SetN{I\!\!N}
\def\SetB{I\!\!B}
\def\SetZ{Z\!\!\!Z}
\def\qmbox#1{{\quad\mbox{#1}\quad}}
\def\e{{\rm e}} % natural e
\def\B{\{0,1\}}
\def\E{{\bf E}} % Expectation
\def\P{{\rm P}} % Probability
\def\v{\vec}
\def\es{\mbox{\o}} % empty set
\def\l{\ell}
\def\s{\sigma}
\def\v{\boldsymbol} %\boldsymbol\frak\Bbb\pmb\text
\def\text#1{\mbox{\scriptsize #1}}
\def\EE{I\!\!E}
\def\lesssim{\mbox{\raisebox{-0.8ex}{$\stackrel{\displaystyle<}\sim$}}} %% if not ams
\def\Var{\mbox{Var}}
\def\aai{{\scriptscriptstyle[]}} % algorithm array index
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% T i t l e - P a g e %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{\vspace{-4ex}
\vskip 2mm\bf\Large\hrule height5pt \vskip 3mm
Exact Bayesian Regression of \\ Piecewise Constant Functions
\vskip 3mm \hrule height2pt}
\author{{\bf Marcus Hutter}\\[3mm]
\normalsize RSISE$\,$@$\,$ANU and SML$\,$@$\,$NICTA \\
\normalsize Canberra, ACT, 0200, Australia%
\thanks{A short version appeared in the proceedings of the Valencia 2006 meeting \cite{Hutter:07pcreg}.}\\
\normalsize \texttt{marcus@hutter1.net \ \ www.hutter1.net} }
\date{4 May 2007}
\maketitle
\vspace*{-5ex}
\begin{abstract}\noindent
We derive an exact and efficient Bayesian regression algorithm for
piecewise constant functions of unknown segment number, boundary
locations, and levels. The derivation works for any noise and segment
level prior, e.g.\ the Cauchy, which can handle outliers. We derive
simple but good estimates for the in-segment variance. We also
propose a Bayesian regression curve as a better way of smoothing
data without blurring boundaries. The Bayesian approach also allows
straightforward determination of the evidence, break probabilities
and error estimates, useful for model selection and significance and
robustness studies. We discuss the performance on synthetic and
real-world examples. Many possible extensions are discussed.
\def\contentsname{\centering\normalsize Contents}
{\parskip=-2.5ex\tableofcontents}
\end{abstract}
\begin{keywords}
Bayesian regression, exact polynomial algorithm, non-parametric
inference, piecewise constant function, dynamic programming, change point problem.
\vspace{-3ex}
\end{keywords}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{secInt}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{quote}\it
``In science the primary duty of ideas is to be useful and interesting,
even more than to be true.'' \par
\vspace{-2ex}\hfill --- {\sl Wilfred Trotter (1941)}
\end{quote}
\noindent We consider the problem of fitting a piecewise constant function
through noisy one-dimensional data, as e.g.\ in Figure
\ref{figGLPCR}, where the segment number, boundaries and levels are
unknown. Regression with piecewise constant (PC) functions, a
special case of change point detection, has many applications, e.g.\
in seismology, tomography, biology, and econometric modeling.
Determining DNA copy numbers in cancer cells from micro-array data
is a very recent application.
%-------------------------------%
\paradot{Bayesian piecewise constant regression (BPCR)}
%-------------------------------%
We provide a full Bayesian analysis of PC-regression. For a fixed
number of segments we choose some prior over all possible
segment boundary locations. Some prior on the segment levels and
data noise within each segment is assumed. Finally a prior over the
number of segments is chosen. From this we obtain the posterior
segmentation probability distribution (Section \ref{secGM}).
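The hierarchical construction just described (a prior over the number of segments, then over boundary locations, then over levels and noise) can be sketched generatively. The following minimal sketch uses uniform and Gaussian choices purely as illustrative placeholders, not as the paper's specific priors:

```python
import random

def sample_pc_data(n, k_max, noise=0.1):
    """Generative sketch of the hierarchical model: draw the number of
    segments k, then boundary locations, then segment levels, then noisy
    observations.  The uniform and Gaussian choices here are illustrative
    placeholders, not the paper's specific priors."""
    k = random.randint(1, k_max)                      # prior over #segments
    cuts = sorted(random.sample(range(1, n), k - 1))  # boundary locations
    t = [0] + cuts + [n]                              # 0 = t_0 < ... < t_k = n
    y, f = [], []
    for m in range(k):
        mu = random.gauss(0.0, 1.0)                   # segment level
        for _ in range(t[m], t[m + 1]):
            f.append(mu)                              # true PC function
            y.append(random.gauss(mu, noise))         # noisy measurement
    return y, f, t
```

Bayesian inference then inverts this process: given $y$ alone, it assigns posterior probability to each $(k,t,\mu)$.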
%
In practice we need summaries of this complicated distribution. A
simple maximum (MAP) approximation or mean does not work
here. The right way is to proceed in stages from determining
the most critical segment number, to the boundary location, and
finally to the then trivial segment levels. We also extract the
evidence, the boundary probability distribution, and an
interesting non-PC mean regression curve including error estimate
(Section \ref{secQI}).
%
We derive an exact polynomial-time dynamic-programming-type
algorithm for all quantities of interest (Sections \ref{secCR},
\ref{secESQI} and \ref{secAlg}).
%
Our algorithm is applicable for any noise and level prior. We
consider more closely the Gaussian ``standard'' prior and
heavy-tailed robust-to-outliers distributions like the Cauchy, and
briefly discuss the non-parametric case (Sections \ref{secSM} and
\ref{secSSD}).
%
Finally, some hyper-parameters like the global data average and
variability and local within-level noise have to be determined. We
introduce and discuss efficient semi-principled estimators,
thereby avoiding problematic or expensive numerical EM or
Monte-Carlo estimates (Section \ref{secHP}).
%
We test our method on some synthetic examples (Section
\ref{secSE}) and some real-world data sets (Section \ref{secRWE}).
The simulations show that our method handles difficult data with
high noise and outliers well.
%
Our basic algorithm can (easily) be modified in a variety of ways:
for discrete segment levels, segment-dependent variance, piecewise
linear and non-linear regression, non-parametric noise prior, etc.
(Section \ref{secMisc}).
%-------------------------------%
\paradot{Comparison to other work}
%-------------------------------%
Sen and Srivastava \cite{Sen:75} developed a frequentist solution
to the problem of detecting a single (the most prominent) segment
boundary, called change or break point. Olshen et al.\
\cite{Olshen:04} generalize this method to detect pairs of break
points, which improves recognition of short segments. Both methods
are then (heuristically) used to recursively determine further
change points.
%
Another approach is penalized Maximum Likelihood (ML). For a fixed
number of segments, ML chooses the boundary locations that maximize
the data likelihood, which is equivalent to minimizing the mean
square data deviation in case of Gaussian noise. Jong et al.\
\cite{Jong:03} use a population based algorithm as minimizer, while
Picard et al.\ \cite{Picard:05} use dynamic programming, which is
structurally very close to our core recursion, to find the exact
solution in polynomial time. An additional penalty term has to be
added to the likelihood in order to determine the correct number of
segments. The most principled penalty is the Bayesian Information
Criterion \cite{Schwarz:78,Kass:95}. Since it can be biased towards
too simple \cite{Weakliem:99} or too complex \cite{Picard:05}
models, in practice often a heuristic penalty is used. An
interesting heuristic, based on the curvature of the log-likelihood
as a function of the number of segments, has been used in
\cite{Picard:05}.
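For concreteness, the fixed-$k$ dynamic program underlying penalized ML can be sketched generically. The following is a standard least-squares segmentation for fixed $k$ (equivalent to ML under Gaussian noise), offered as an illustration only; it is neither the exact algorithm of \cite{Picard:05} nor our Bayesian recursion:

```python
def best_segmentation(y, k):
    """Fixed-k least-squares segmentation by dynamic programming.
    Returns (minimal SSE, boundaries t with 0 = t_0 < ... < t_k = n).
    A generic sketch of the penalized-ML inner step, O(k n^2) time."""
    n = len(y)
    prefix, prefix2 = [0.0], [0.0]
    for v in y:                          # prefix sums for O(1) segment costs
        prefix.append(prefix[-1] + v)
        prefix2.append(prefix2[-1] + v * v)

    def cost(i, j):                      # SSE of y[i:j] around its mean
        s = prefix[j] - prefix[i]
        s2 = prefix2[j] - prefix2[i]
        return s2 - s * s / (j - i)

    INF = float("inf")
    # dp[m][j]: minimal cost of fitting y[0:j] with m segments
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    arg = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = dp[m - 1][i] + cost(i, j)
                if c < dp[m][j]:
                    dp[m][j], arg[m][j] = c, i
    t, j = [n], n                        # back-track the boundaries
    for m in range(k, 0, -1):
        j = arg[m][j]
        t.append(j)
    return dp[k][n], t[::-1]
```

The triple loop makes one fixed-$k$ pass $O(kn^2)$; the Bayesian recursion of this paper replaces the minimization over the last boundary by a (weighted) sum, which is what turns segmentation into exact marginalization.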
%
Our Bayesian regressor is a natural response to penalized ML.
%
Another work related to ours is the Bayesian bin model of Endres and
F\"oldi\'ak \cite{Endres:05}, who likewise average over all boundary
locations, but in the context of density estimation.
%
Another related series of papers, developing exact Bayesian
algorithms for PC regression, is \cite{Yao:84,Barry:92,Fearnhead:06},
which we discuss in more detail in Section \ref{secCompare}. All
three papers consider renewal processes (which have a highly
informed prior for the number of segments $k$) and derive recursions
over the number of data points $n$, while we consider a
noninformative uniform prior over $k$ (which is not a renewal
process) and derive a recursion over $k$.
%
See also \cite{Fearnhead:05} for a similar Bayesian/MAP approach.
%
Many other approximate, heuristic, sampling, or non-Bayesian
approaches to PC regression exist; too many to list them all.
%-------------------------------%
\paradot{Advantages of Bayesian regression}
%-------------------------------%
A full Bayesian approach, when computationally feasible, has various
advantages over others: A generic advantage is that it is more
principled and hence involves fewer heuristic design choices. This
is particularly important for estimating the number of segments.
Another generic advantage is that it can be easily embedded in a
larger framework. For instance, one can decide among competing
models solely based on the Bayesian evidence. Finally, Bayes often
works well in practice, and provably so if the model assumptions are
valid.\footnote{Note that we are not claiming here that BPCR works
better than the other mentioned approaches. In a certain sense Bayes
is optimal if the prior is `true'. Practical superiority likely
depends on the type of application. A comparison for micro-array
data is in progress. The major aim of this paper is to derive an
efficient algorithm, and demonstrate the gains of BPCR beyond bare
PC-regression, e.g.\ the (predictive) regression {\em curve} (which
is better than local smoothing, which wiggles more and blurs jumps).}
%
We can also easily extract other information, like probability
estimates and variances for the various quantities of interest.
%
Particularly interesting is the expected level (and variance) of
each data point. This leads to a regression curve, which is very
flat, i.e.\ smoothes the data, in long and clear segments, wiggles
in less clear segments, follows trends, and jumps at the segment
boundaries. It thus behaves somewhat between local smoothing
and rigid PC-segmentation.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The General Model}\label{secGM}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
\paradot{Setup}
%-------------------------------%
We are given a sequence $\v y=(y_1,...,y_n)$, e.g.\ time-series
data or measurements of some function at locations $1...n$, where
each $y_i\in\SetR$ resulted from a noisy ``measurement'', i.e.\
we assume that the $y_i$ are independently (e.g.\ Gaussian)
distributed with means $\mu_i'$ and\footnote{More generally,
$\mu'_i$ and $\s'_i$ are location and scale parameters of a
symmetric distribution.} variances $\s'_i\!\,^2$. The data
likelihood is therefore\footnote{For notational and verbal
simplicity we will not distinguish between probabilities of
discrete variables and densities of continuous variables.}
\beq\label{lh}
\mbox{likelihood:} \qquad P(\v y|\v\mu',\v\s') \;:=\;
\prod_{i=1}^n P(y_i|\mu'_i,\s'_i)
\eeq
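To make \req{lh} concrete, the following sketch (an illustration, not the paper's code) evaluates the log-likelihood for the two noise models discussed in this paper, Gaussian and heavy-tailed Cauchy:

```python
import math

def log_likelihood(y, mu, sigma, noise="gauss"):
    """log P(y|mu',sigma') = sum_i log P(y_i|mu'_i,sigma'_i), each y_i
    independent with location mu[i] and scale sigma[i].  "gauss" and the
    heavy-tailed, outlier-robust "cauchy" are the two noise models
    considered in the text."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        z = (yi - mi) / si
        if noise == "gauss":
            total += -0.5 * z * z - math.log(si * math.sqrt(2.0 * math.pi))
        else:  # cauchy
            total += -math.log(math.pi * si * (1.0 + z * z))
    return total
```

A quick check of the robustness claim: a $3\sigma$ outlier costs far less log-likelihood under Cauchy noise than under Gaussian noise, so a single outlier does not force a spurious segment.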
The estimation of the true underlying function $f=(f_1,...,f_n)$ is
called regression. We assume or model $f$ as piecewise constant.
Consider $k$ segments with segment boundaries
$0=t_0<t_1<...<t_k=n$.

For large $n$ ($n\geq 1000$ or so), the $O(k_{max}n^2)$
algorithm is too slow. Fortunately, there is nearly no interaction
between distant segments; boundary $t_k$ is often practically
independent of where $t_{k\pm 2}$, $t_{k\pm 3}$, etc.\ are placed.
This suggests breaking the whole data set into smaller overlapping
pieces, where each piece should be long enough to contain at least
four segments. Then boundaries $t_2^{piece},...,t_{k-2}^{piece}$ of
each piece are used, and appropriately merged. For the Bayesian
regression curve one should use some blending on the overlap. If
single segments are very long, one could coarsen (locally lump
together) the data and later refine around the boundaries.
Alternatively one can, of course, resort to simulation
based (Monte-Carlo) methods \cite{Barry:93,Fearnhead:06}.
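The splitting step might look as follows; \texttt{piece\_len} and \texttt{overlap} are hypothetical tuning parameters, and the merging of per-piece boundaries (and the blending of regression curves on overlaps) is left abstract:

```python
def make_pieces(n, piece_len, overlap):
    """Cover indices 0..n with overlapping pieces (start, end), end exclusive.
    piece_len and overlap are hypothetical tuning parameters; each piece
    should be long enough to contain at least four segments, and only
    boundaries found well inside a piece are kept when merging (not shown)."""
    if piece_len <= overlap:
        raise ValueError("piece_len must exceed overlap")
    pieces, start = [], 0
    while True:
        end = min(start + piece_len, n)
        pieces.append((start, end))
        if end == n:
            return pieces
        start += piece_len - overlap
```

Running the exact algorithm per piece reduces the cost from $O(k_{max}n^2)$ to roughly $O(k_{max}^{piece}\cdot n\cdot piece\_len)$ at the price of ignoring interactions between distant segments.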
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Comparison to other work}\label{secCompare}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{table}
\rotatebox{90}%
{\begin{minipage}{0.95\textheight}
\caption[Quantities of Interest for various models]{\label{tabQIkuq}
{\bf (Quantities of Interest for various models)} Rows (1)--(9)
contain various quantities of interest (first column) for our model
($P_u$, fourth column), our model with fixed $k$ ($P_k$, third
column), and Yao's model with geometrically distributed segment
lengths ($P_q$, third column). The second column contains the
definition or a relation or way to compute the quantity. Section
\ref{secCR} explains notation.
%
Note that conditional on $k$ all three models are the same, hence
$P_k(\cdot|k)=P_u(\cdot|k)=P_q(\cdot|k)=P_k(\cdot)$ can be read off
from the third column.
}
$$\textmuskip
\begin{array}{|cc|c|c|c|c|} \hline
& \hfill \mbox{Model}
&
& \mbox{Fixed}
&
& \mbox{Markov or}
\\ \cline{3-3}
& \mbox{Function} \hfill
& \mbox{definition or relation}
& \mbox{\#segments $k$}
& \mbox{Uniform $k$}
& \mbox{geometric}
\\ \cline{1-2}\cline{4-6}
& P(\cdots)
& \mbox{or way to compute}
& P_k(\cdots)
& P_u(\cdots)
& P_q(\cdots)
\\ \hline\hline
\end{array}$$
\end{minipage}}
\end{table}
%
All of \cite{Yao:84,Barry:92,Fearnhead:06} consider renewal
processes, i.e.\ the probability of a change point at $j>i$ is independent
of where and how many change points occurred before $i$, that is
$P(t_{1l})=p_{0t_1}p_{t_1 t_2}...p_{t_{l-1}t_l}$. They develop
recursions in $n$ for the quantities of interest, while our
recursion is over $k$. Since the number of segments in their model
is variable without a recursion over $k$, their algorithm runs in
time $O(n^2)$.
% Yao's geometric model
Yao \cite{Yao:84} considers identically geometrically distributed
breaks with $p_{ij}=q(1-q)^{j-i-1}$, i.e.\ for all $i$, the
probability of a break at $i$ is $q$, independent of the breaks at
other $j\neq i$. In the following we compare Yao's model ($P_q$) to
our model with uniform $k$-prior ($P_u$) and our model with fixed
segment number ($P_k$). Table \ref{tabQIkuq} summarizes the most
important quantities, derived from the segment prior $P(\v t|k)P(k)$
by exploiting various binomial identities.
% row 2
The first observation is that, given $k$, the breaks in Yao's model
are uniformly distributed as in ours (row 2).
%
% row 1
Hence his model can only differ from ours in the prior of $k$.
Indeed, his prior is strongly concentrated around $k=qn$ for typical
(large) $n$ ($P_q(k)$ in row 1). It more resembles a model with
fixed $k$. Note that in all three models, probabilities involving
$t_{lm}$ only depend on $t_l$ and $t_m$ and are independent of
$t_{l+1},...,t_{m-1}$.
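The concentration of Yao's $k$-prior can be checked numerically. Since his model places an independent break with probability $q$ at each of the $n-1$ interior points, the number of segments $k$ is one plus a binomial count (this binomial form is derived from the independence assumption, not quoted from the table):

```python
from math import comb

def yao_prior(k, n, q):
    """P_q(k) implied by Yao's model: an independent break with probability
    q at each of the n-1 interior points gives k-1 breaks, hence a binomial
    distribution over the number of segments k."""
    return comb(n - 1, k - 1) * q ** (k - 1) * (1 - q) ** (n - k)

n, q = 200, 0.1
probs = {k: yao_prior(k, n, q) for k in range(1, n + 1)}
mode = max(probs, key=probs.get)  # strongly concentrated near k = q*n = 20
```

For $n=200$ and $q=0.1$ the prior mass piles up around $k\approx 20$, in sharp contrast to a uniform prior over $k$.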
%
% row 4 and 5
Yao's model suppresses long segments exponentially ($P_q($row
$4)=q(1-q)^{\text{length-1}}$) with fixed expected length
$\E_q[$row $5]={1\over q}-1$, while our model has a wide polynomial
tail ($P_u($row $4)\sim l\cdot t_{l-1}^l/t_l^{l+1}$) with expected
length $\E_u[$row $5]\sim {(t_{l-1}+l)/(l-1)}$, adapting to
the past average length, and with no prior bias ($\E_u[t_1]$ is undefined).
%
% row 8 and D
Similarly, the probability that there is no break in the range from
$i$ to $j$ (row 8) and the probability of a break at $j$ given the
previous one is at $i$ (row D) decay exponentially in Yao's model
but only harmonically and inverse cubically, respectively, in our model.
% Barry and renewal
Barry and Hartigan \cite{Barry:92} consider renewal processes with
general transition probabilities $p_{ij}$. We can also use our
algorithm for renewal processes by setting $T_{ij}=p_{ij}$ in \req{Tdef}.
Interestingly, Barry and Hartigan's favorite model (stated without
derivation or motivation) essentially coincides with the long-tail
$p_{ij}$ (row A-D) of our model. Nevertheless, their model is
completely different from ours, since our model with uniform prior
is {\em not} a renewal process. In our model
$p^l_{ij}:=P_u(t_l=j|t_{l-1}=i)$ (row 4) depends on $l$, i.e.\
$P_u(t_{1l}) = p^1_{0t_1}p^2_{t_1 t_2}...p^l_{t_{l-1}t_l} \neq
p_{0t_1}p_{t_1 t_2}...p_{t_{l-1}t_l}$. This $l$-dependence also
prevents us from a recursion algorithm in $n$.
%
% row 9 and D
This also explains why the high probability of a break at $i$
($P($row $9)=\frs12$) and of segments of length one ($p_{i-1,i}={4\over
1\cdot 2\cdot 3}$), although present in both models, implies a high
bias towards short segments in Barry's model but not in ours, and
hence why their method overestimates the number of segments.
%the last sentence in their paper,
% general pij
In general, any renewal process $p_{ij}$ has some expected or median
segment length, which leads to a strongly peaked $k$-prior,
in contrast to our uniform prior.
% Bayes-average over q
If we performed a Bayesian average over $q\in[0,1]$ with uniform
$O(n^2)$ with a factor $k_{max}$ in our case and the grid size for
integrating over $q$ in Yao's case. Actually Yao \cite{Yao:84} and
Fearnhead \cite{Fearnhead:06} use a maximum likelihood estimate for
$q$.
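The reduction of Yao's model to the uniform-$k$ prior under a uniform average over $q$ can be verified numerically; the binomial form of $P_q(k)$ used below follows from the independent-break assumption (a sketch with a simple midpoint grid, not anyone's production code):

```python
from math import comb

def yao_prior(k, n, q):
    # binomial segment-number prior implied by independent breaks of prob. q
    return comb(n - 1, k - 1) * q ** (k - 1) * (1 - q) ** (n - k)

def q_averaged_prior(k, n, grid=1000):
    """Uniformly average P_q(k) over q in [0,1] (midpoint rule on a grid)."""
    return sum(yao_prior(k, n, (j + 0.5) / grid) for j in range(grid)) / grid

n = 30
vals = [q_averaged_prior(k, n) for k in range(1, n + 1)]
# each value is ~ 1/n: averaging Yao's prior over q yields a uniform k-prior
```

Indeed $\int_0^1 \binom{n-1}{k-1}q^{k-1}(1-q)^{n-k}\,dq = 1/n$ for every $k$, by the Beta integral.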
% what's different/new in our work
So the major difference of our work from
\cite{Yao:84,Barry:92,Fearnhead:06} is that we start with a
non-informative uniform prior over $k$, while they start with a
renewal process necessarily corresponding to a highly informed prior
for $k$. In contrast to their recursion over $n$, our recursion over
$k$ allows for arbitrary (in particular uniform) prior over $k$.
This also naturally allows us to find the MAP $\hat k$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Summary}\label{secDisc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We considered Bayesian regression of piecewise constant functions
with unknown segment number, location and level.
We derived an efficient algorithm that works for any noise and
segment level prior, e.g.\ the Cauchy, which can handle outliers. We
derived simple but good estimates for the in-segment variance. We
also proposed a Bayesian regression curve as a better way of
smoothing data without blurring boundaries. The Bayesian approach
also allowed us to straightforwardly determine the global
evidence, break probabilities and error estimates, useful for
model selection and significance and robustness studies. We
discussed the performance on synthetic and real-world examples.
Many possible extensions have been discussed.
%-------------------------------%
\paradot{Acknowledgements}
%-------------------------------%
Thanks to IOSI for providing the gene copy \# data and to Ivo
Kwee for discussions.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Bibliography %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{small}
\begin{thebibliography}{ABCD}\parskip=0ex
\bibitem[BH92]{Barry:92}
D.~Barry and J.~A. Hartigan.
\newblock Product partition models for change point problems.
\newblock {\em Annals of Statistics}, 20:260--279, 1992.
\bibitem[BH93]{Barry:93}
D.~Barry and J.~A. Hartigan.
\newblock A {B}ayesian analysis for change point problems.
\newblock {\em Journal of the American Statistical Association}, 88:309--319,
1993.
\bibitem[Bol04]{Bolstad:04}
W.~M. Bolstad.
\newblock {\em Introduction to {B}ayesian Statistics}.
\newblock Wiley Interscience, New Jersey, 2004.
\bibitem[EF05]{Endres:05}
D.~Endres and P.~F{\"o}ldi{\'a}k.
\newblock Bayesian bin distribution inference and mutual information.
\newblock {\em IEEE Transactions on Information Theory}, 51(11):3766--3779,
2005.
\bibitem[Fea05]{Fearnhead:05}
P.~Fearnhead.
\newblock Exact {Bayesian} curve fitting and signal segmentation.
\newblock {\em IEEE Transactions on Signal Processing}, 53:2160--2166, 2005.
\bibitem[Fea06]{Fearnhead:06}
P.~Fearnhead.
\newblock Exact and efficient {B}ayesian inference for multiple changepoint
problems.
\newblock {\em Statistics and Computing}, 16:203--213, 2006.
\bibitem[Goo83]{Good:83}
I.~J. Good.
\newblock Explicativity, corroboration, and the relative odds of hypotheses.
\newblock In {\em Good thinking: The Foundations of Probability and its
applications}. University of Minnesota Press, Minneapolis, MN, 1983.
\bibitem[Hut05a]{Hutter:05pcrcode}
M.~Hutter.
\newblock Additional material to article.
\newblock {\em \\
http://www.idsia.ch/\linebreak[1]\raisebox{-1ex}{\~{}}marcus\linebreak[1]/ai%
/pcreg.htm}, 2005.
\bibitem[Hut05b]{Hutter:05bayestree}
M.~Hutter.
\newblock Fast non-parametric {B}ayesian inference on infinite trees.
\newblock In {\em Proc. 10th International Conf. on Artificial Intelligence and
Statistics ({AISTATS-2005})}, pages 144--151. Society for Artificial
Intelligence and Statistics, 2005.
\bibitem[Hut07]{Hutter:07pcreg}
M.~Hutter.
\newblock {B}ayesian regression of piecewise constant functions.
\newblock In J.M. Bernardo et al., editors, {\em Proc.
Bayesian Statistics}, volume~8, Benidorm, pages 607--612. Oxford University Press, 2007.
\bibitem[Jay03]{Jaynes:03}
E.~T. Jaynes.
\newblock {\em Probability Theory: The Logic of Science}.
\newblock Cambridge University Press, Cambridge, 2003.
\bibitem[Jon03]{Jong:03}
K.~Jong{ et al.}
\newblock Chromosomal breakpoint detection in human cancer.
\newblock In {\em Applications of Evolutionary Computing: EvoWorkshops'03},
volume 2611 of {\em LNCS}, pages 54--65. Springer, 2003.
\bibitem[KW95]{Kass:95}
R.~E. Kass and L.~Wasserman.
\newblock A reference {Bayesian} test for nested hypotheses with large samples.
\newblock {\em Journal of the American Statistical Association}, 90:928--934, 1995.
\bibitem[Mac03]{MacKay:03}
D.~J.~C. MacKay.
\newblock {\em Information theory, inference and learning algorithms}.
\newblock Cambridge University Press, Cambridge, 2003.
\bibitem[OVLW04]{Olshen:04}
A.~B. Olshen, E.~S. Venkatraman, R.~Lucito, and M.~Wigler.
\newblock Circular binary segmentation for the analysis of array-based {DNA}
copy number data.
\newblock {\em Biostatistics}, 5:557--572, 2004.
\bibitem[Pic05]{Picard:05}
F.~Picard{ et al.}
\newblock A statistical approach for array {CGH} data analysis.
\newblock {\em BMC Bioinformatics}, 6(27):1--14, 2005.
\bibitem[Pin98]{Pinkel:98}
D.~Pinkel{ et al.}
\newblock High resolution analysis of {DNA} copy number variation using
comparative genomic hybridization to microarrays.
\newblock {\em Nature Genetics}, 20:207--211, 1998.
\bibitem[PRLD05]{Picard:05segcl}
F.~Picard, S.~Robin, E.~Lebarbier, and J.~J. Daudin.
\newblock A segmentation-clustering problem for the analysis of array CGH data.
\newblock In {\em Proc. 11th International Symposium on Applied Stochastic
Models and Data Analysis ({ASMDA'05})}, pages 145--152, Brest, France, 2005.
\bibitem[Rin06]{Rinaldi:06}
A.~Rinaldi{ et al.}
\newblock Genomic and expression profiling identifies the B-cell associated
tyrosine kinase Syk as a possible therapeutic target in mantle cell lymphoma.
\newblock {\em British Journal of Haematology}, 132(3):303--316, 2006.
\bibitem[Sch78]{Schwarz:78}
G.~Schwarz.
\newblock Estimating the dimension of a model.
\newblock {\em Annals of Statistics}, 6(2):461--464, 1978.
\bibitem[SS75]{Sen:75}
A.~Sen and M.~S. Srivastava.
\newblock On tests for detecting a change in mean.
\newblock {\em Annals of Statistics}, 3:98--108, 1975.
\bibitem[Wea99]{Weakliem:99}
D.~L. Weakliem.
\newblock A critique of the {Bayesian} information criterion for model
selection.
\newblock {\em Sociological Methods and Research}, 27:359--397, 1999.
\bibitem[Yao84]{Yao:84}
Y.-C. Yao.
\newblock Estimation of a noisy discrete-time step function: {B}ayes and
empirical {B}ayes approaches.
\newblock {\em Annals of Statistics}, 12:1434--1447, 1984.
\end{thebibliography}
\end{small}
\end{document}
%--------------------End-of-PCRegx.tex------------------------%