(widely known as the Hutter Prize)
50'000€ Prize for Compressing Human Knowledge
Compress the 100MB file enwik8 to less than the current record of about 16MB
Being able to compress well is closely
related to intelligence as explained below.
While intelligence is a slippery concept,
file sizes are hard numbers.
Wikipedia is an extensive snapshot of Human Knowledge.
If you can compress the first 100MB of Wikipedia better
than your predecessors, your (de)compressor likely has to be smart(er).
The intention of this prize is to encourage development
of intelligent compressors/programs as a path to AGI.
Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than about 16MB. More precisely:
Remark: You can download the zipped version
enwik8.zip of enwik8
Please find more details including constraints and relaxations at
- Create a Linux or Windows executable of size S < L := 15'949'688 = previous record.
- If run, it produces (without input from other sources) a 10^8 byte file
that is identical to enwik8.
- If we can verify your claim,
you are eligible for a prize of 50'000€×(1-S/L). Minimum claim is 1500€.
- Restrictions: Must run in <10 hours on a 2GHz P4 with 1GB RAM and 10GB free HD.
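The prize formula above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the function name `prize_eur` and its parameters are hypothetical, and treating awards below 1500€ as not claimable is an assumed reading of "Minimum claim is 1500€".

```python
# Hypothetical helper illustrating the formula 50'000 EUR x (1 - S/L).
FUND_EUR = 50_000        # total prize fund stated in the rules
RECORD_L = 15_949_688    # previous record L in bytes, as stated in the rules
MIN_CLAIM_EUR = 1_500    # minimum claimable amount stated in the rules


def prize_eur(s_bytes: int, l_bytes: int = RECORD_L) -> float:
    """Return the award in euros for an executable of s_bytes bytes.

    Returns 0.0 if s_bytes does not beat the record, or if the award would
    fall below the minimum claim (an assumed interpretation of the rules).
    """
    if s_bytes >= l_bytes:
        return 0.0
    award = FUND_EUR * (1 - s_bytes / l_bytes)
    return award if award >= MIN_CLAIM_EUR else 0.0


if __name__ == "__main__":
    # An archive roughly 10% smaller than the record.
    print(f"{prize_eur(14_354_719):.2f} EUR")
```

Under this reading, the 1500€ minimum corresponds to an improvement of at least 3% over L, since 1500/50'000 = 0.03.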
This compression contest is motivated by the fact that
being able to compress well is closely
related to acting intelligently, thus reducing
the slippery concept of intelligence to hard file size numbers.
In order to compress data, one has to find regularities in them,
which is intrinsically difficult (many researchers make a living from
analyzing data and finding compact models).
So compressors beating the current "dumb" compressors need to be
smart(er). Since the prize
wants to stimulate developing "universally" smart compressors,
we need a "universal" corpus of data.
Arguably the online encyclopedia Wikipedia
is a good snapshot of the Human World Knowledge.
So the ultimate compressor of it should "understand"
all human knowledge, i.e. be really smart.
enwik8 is a hopefully representative 100MB extract from Wikipedia.
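The link between regularities and compressibility can be seen in a small experiment. The sketch below is not part of the contest; it uses Python's standard zlib module as a stand-in for a "dumb" general-purpose compressor, and the helper name `ratio` and the sample data are made up for illustration.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly regular input: one short phrase repeated many times.
regular = b"the quick brown fox " * 5000
# Random bytes of the same length: no regularities to exploit.
noise = os.urandom(len(regular))

print(f"regular text: {ratio(regular):.4f}")
print(f"random bytes: {ratio(noise):.4f}")
```

The repeated phrase shrinks to a small fraction of its size, while the random bytes do not shrink at all. Real text such as enwik8 lies in between, and its regularities (grammar, semantics, world knowledge) are far harder to capture than simple repetition.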
Detailed Rules for Participation
||6.27 | 936MB | ~9h
||6.07 | 936MB | 9h
||5.86 | 900MB | 5h
||5.46 | 854MB | 5h
William of Ockham's razor: Entities should not be multiplied beyond necessity.
Ray Solomonoff introduced algorithmic probability for universal prediction.
Leonid Broukhis introduced the first compression competition with a prize.
Marcus Hutter introduced a compression based universal intelligent agent.
Jim Bowery proposed a larger scale compression contest based on the Wikipedia corpus.
Matt Mahoney compressed Wikipedia with many state of the art compressors.
Marcus Hutter launched the 50'000€ prize.
- Jim Bowery: verification of claims, public relations, finding sponsors, newsgroups, etc.
- Matt Mahoney: running the compression competition.
- Marcus Hutter: arbiter, current sponsor, and manager of prize fund.
We would like to increase the prize with the help of donations.
Currently we can only accept pledges of over 1000€, i.e.
the donor obliges himself to pay up to the pledged amount to one or
more winners in the future.
In return, the donor will be acknowledged by placing his name
beside the winner in the table of records, unless he wants to
If you consider becoming a sponsor for
(or have questions or suggestions regarding) the prize,
please contact one of the committee members above for
more information or
fill out and return the pledge form
(PDF / ASCII).
Please regard this as a suggestion only. We are open to
other forms, and in particular establishing a real fund.
Frequently Asked Questions (FAQ)
So far we have received the submissions below. Each is/was
open for public comment and verification for 30 days before an award decision
will be/was made. Comments should be made to the
Hutter Prize Newsgroup
or by email to members of the
||Meets all prize criteria. Third winner!
||3.0% improvement over new baseline paq8hp12
||Meets all prize criteria. Second winner!
||1% improvement over new baseline paq8hp5
||Meets all prize criteria. First winner!
||Superseded by paq8hp5
||Superseded by paq8hp4
||Superseded by paq8hp3
||Superseded by paq8hp2
||Superseded by paq8hp1
||-m1650 -o21 -t2
||Fails to meet the reasonable memory limitations
||Fails to meet the 1% hurdle, and others
The time for decompression/compression is estimated for a 2GHz P4.
The percent (%) improvement is over the baseline (previous record) L=17'073'018 and L=18'324'887, respectively.
More details on the (de)compressors can be found
- Sep'07-...: Alexander Rhatushnyak submits another series of ever improving compressors.
Is there nobody else who can keep up with him?
submits another improved series of (de)compressors paq8hp6-12
(option -7). On 14.May 2007 he submits paq8hp12
It achieved an improvement of 3.5% over the new baseline paq8hp5 and was finally confirmed as the second winner on 30.June 2007.
Congratulations! A detailed description of paq8hp12 can be found here.
Most of the time in developing paq8hp6-12 went into planning and
performing experiments, and studying and understanding the results
of these experiments.
Alexander Rhatushnyak's current occupation is in software
engineering. For him, data compression is science, art, and sport
all together. This was his motivation for participating in the
Dr. Rhatushnyak was born in the Siberian Scientific Center
(www.nsc.ru), has studied data compression and related algorithms since 1991,
and graduated from the Moscow State University (www.msu.ru) in 1996.
After his PhD in 2002 he lived and worked in various places in the world.
of the Moscow State University Compression Project
submits an improving series of (de)compressors paq8hp?
(option -7), modifications of paq8h with a custom dictionary built from enwik8
and other improvements. Przemyslaw Skibinski contributed to earlier versions.
On 25.Sep.2006 Alexander Rhatushnyak submits paq8hp5.
It achieved an improvement of 6.8% over the baseline paq8f
and was finally confirmed as the first winner on 25.Oct.2006. Congratulations!
A detailed description of paq8hp5 can
be found here.
submits a modification of (de)compressor durilca
(option -m1650 -o21 -t2),
a modification of ppmd/ppmonstr with filters for text, exe, and data with fixed length records.
submits (de)compressor raq8g.cpp
(option -7), a modification of paq8f with additional text modeling.
Warning: The average quality of the posts
in the discussion groups and mailing lists is very low.
Most participants don't know the underlying scientific concepts
and some have not even read the
behind the contest. For a cleaned summary consult the
frequently asked questions.
The competition was also announced or discussed in many blogs.
Disclaimer: Copying and distribution of this page
is permitted, provided the source is cited.
The prize will be paid if the solution reflects the spirit of the contest.
In particular, decompressors (secretly) receiving any kind of "outside" information are
forbidden. Also in order to verify your claim we need to be able to run your executable on
our machines within reasonable space and time constraints.
Payment of the prize cannot be legally enforced. The smallest claimable prize is 1500€.
After an award, the prize formula (L) will be adapted.
Rules may change at any time to meet the goals of fairness, accuracy,
maximizing public participation, and recognizing existing practice. July 2006.