
## Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences

Author: Marcus Hutter (2001)
Comments: 12 pages
Subj-class: Learning; Artificial Intelligence; Computational Complexity
ACM-class: F.2.3
Reference: Proceedings of the 12th European Conference on Machine Learning (ECML-2001) 239-250
Report-no: IDSIA-07-01 and cs.LG/0106036
Paper: LaTeX (44kb) - PostScript (226kb) - PDF (179kb) - Html/Gif
Slides: PostScript - PDF

Keywords: Bayesian sequence prediction; Solomonoff induction; Kolmogorov complexity; learning; universal probability; finite non-binary alphabet; convergence; error bounds; games of chance; partial and delayed prediction; classification.

Abstract: Solomonoff's uncomputable universal prediction scheme ξ allows one to predict the next symbol x_k of a sequence x_1...x_{k-1} for any Turing computable, but otherwise unknown, probabilistic environment µ. This scheme is generalized to arbitrary environmental classes, which, among other things, allows the construction of computable universal prediction schemes ξ. Convergence of ξ to µ in a conditional mean squared sense and with µ probability 1 is proven. It is shown that the average number of prediction errors made by the universal ξ scheme rapidly converges to that made by the best possible informed µ scheme. The schemes, theorems and proofs are given for a general finite alphabet, which results in additional complications as compared to the binary case. Several extensions of the presented theory and results are outlined. They include general loss functions and bounds, games of chance, infinite alphabet, partial and delayed prediction, classification, and more active systems.
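The mixture scheme the abstract describes can be illustrated with a small sketch. This is not the paper's code: it builds a Bayes-mixture predictor ξ over a hypothetical finite class M of i.i.d. environments on a 3-symbol alphabet (the paper's classes are far more general, e.g. all enumerable semimeasures). The true environment µ is a member of M, and ξ's next-symbol probabilities converge to µ's, as Theorem 1 asserts. All names (`CLASS_M`, `xi_predict`, the particular distributions) are illustrative assumptions.

```python
# Illustrative sketch only (not from the paper): Bayes mixture xi over a
# finite class M of i.i.d. environments on alphabet {0, 1, 2}.
import random

ALPHABET = [0, 1, 2]

# A hypothetical finite class M: each member is an i.i.d. distribution.
CLASS_M = [
    (0.6, 0.3, 0.1),
    (0.2, 0.5, 0.3),   # <- used below as the true environment mu
    (0.1, 0.1, 0.8),
]
PRIOR = [1.0 / len(CLASS_M)] * len(CLASS_M)   # uniform prior weights w_nu

def env_prob(theta, seq):
    """nu(x_1...x_n): probability an i.i.d. environment assigns to seq."""
    p = 1.0
    for x in seq:
        p *= theta[x]
    return p

def xi_predict(history):
    """xi(x | history) = sum_nu w_nu(history) * nu(x), where the posterior
    weight w_nu(history) is proportional to w_nu * nu(history)."""
    post = [w * env_prob(theta, history) for w, theta in zip(PRIOR, CLASS_M)]
    z = sum(post)
    post = [p / z for p in post]
    return [sum(post[i] * CLASS_M[i][x] for i in range(len(CLASS_M)))
            for x in ALPHABET]

random.seed(0)
mu = CLASS_M[1]
history = random.choices(ALPHABET, weights=mu, k=300)
pred = xi_predict(history)
# After enough observations the posterior concentrates on mu, so xi's
# next-symbol probabilities approach mu's.
print([round(p, 3) for p in pred])
```

The sketch multiplies raw probabilities, which underflows for long sequences; a serious implementation would accumulate log-probabilities instead. The convergence it demonstrates is exactly the ξ → µ statement of Theorem 1, restricted to a toy class.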

Reviewer: Resolves a 25-year-old open problem in a central prediction method for time series in the affirmative. Textbook quality -- definite accept.


1. Introduction
2. Setup
• Strings and Probability Distributions
• Universal Prior Probability Distribution
• Probability Classes
3. Convergence
• Upper Bound for the Relative Entropy
• Lower Bound for the Relative Entropy
• Convergence of ξ to µ
• Theorem 1 (Convergence)
4. Error Bounds
• Total Expected Numbers of Errors
• Error Bound
• Theorem 2 (Error bound)
• Proof of Theorem 2
5. Generalizations
• General Loss Function
• Games of Chance
• Infinite Alphabet
• Partial Prediction, Delayed Prediction, Classification
• More Active Systems
• Miscellaneous
6. Summary

### BibTeX Entry

@InProceedings{Hutter:01alpha,
author =       "Marcus Hutter",
number =       "IDSIA-07-01",
institution =  "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale",
title =        "Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences",
booktitle =    "Proceedings of the 12th European Conference on
Machine Learning (ECML-2001)",
editor =       "Luc De Raedt and Peter Flach",
publisher =    "Springer",
series =       "Lecture Notes in Artificial Intelligence (LNAI 2167)",
month =         sep,
year =         "2001",
pages =        "239--250",
ftp =          "ftp://ftp.idsia.ch/pub/techrep/IDSIA-07-01.ps.gz",
url =          "http://www.hutter1.net/ai/palpha.htm",
url2 =         "http://arxiv.org/abs/cs.LG/0106036",
keywords =     "Induction; Solomonoff, Bayesian, deterministic
prediction; Kolmogorov complexity; learning; loss function;
algorithmic information theory; universal probability",
abstract =     "Solomonoff's uncomputable universal prediction scheme $\xi$ allows
to predict the next symbol $x_k$ of a sequence $x_1...x_{k-1}$ for
any Turing computable, but otherwise unknown, probabilistic
environment $\mu$. This scheme will be generalized to arbitrary
environmental classes, which, among others, allows the
construction of computable universal prediction schemes $\xi$.
Convergence of $\xi$ to $\mu$ in a conditional mean squared sense
and with $\mu$ probability $1$ is proven. It is shown that the
average number of prediction errors made by the universal $\xi$
scheme rapidly converges to those made by the best possible
informed $\mu$ scheme. The schemes, theorems and proofs are given
for general finite alphabet, which results in additional
complications as compared to the binary case.
Several extensions of the presented theory and
results are outlined. They include general loss functions and
bounds, games of chance, infinite alphabet, partial and delayed
prediction, classification, and more active
systems.",
}
