Publish a compression program comp9.exe that outputs archive9.exe given input enwik9.
If archive9.exe is run with no input, it reproduces a 10⁹-byte file data9 that
is identical to enwik9.
Total size is measured as S := length(comp9.exe/zip)+length(archive9.exe).
Programs must be Windows or Linux (x86, 32-bit or 64-bit) executables.
Programs must run without input from other sources (files, network,
dictionaries, etc.) under Windows or Linux without additional
installations. Use of standard libraries, e.g. for file I/O, is allowed.
Each program must run in less than 70'000/T hours on a machine using at most 10GB RAM and 100GB HDD for temporary files,
where T is the machine's Geekbench5 score. No GPU usage.
In particular, they must run on our current test machines, which as of 2021 (subject to change without notice) are
a Lenovo 82HT Intel Core i7-1165G7 2.79GHz (Windows),
with T≈1427 (1 core) and T≈4667 (4 cores),
and an AMD Ryzen 7 3.6GHz (Linux),
with T=1310 (1 core) and T=8228 (8 cores).
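To make the 70'000/T rule concrete, the sketch below turns the Geekbench5 scores quoted above into hour limits. It is only arithmetic on the stated formula; which score (single-core or multi-core) applies to a given program is not specified in this passage, so both are listed. The names `time_limit_hours` and `TEST_MACHINES` are illustrative.

```python
# Illustrative sketch: time limit = 70'000 / T hours, with T the Geekbench5 score.
TIME_BUDGET = 70_000  # hours x Geekbench5 score

def time_limit_hours(geekbench5_score: float) -> float:
    """Maximum allowed runtime in hours on a machine with score T."""
    return TIME_BUDGET / geekbench5_score

# Scores quoted for the 2021 test machines (single- and multi-core).
TEST_MACHINES = {
    "Intel i7-1165G7, 1 core": 1427,
    "Intel i7-1165G7, 4 cores": 4667,
    "AMD Ryzen 7, 1 core": 1310,
    "AMD Ryzen 7, 8 cores": 8228,
}

if __name__ == "__main__":
    for name, score in TEST_MACHINES.items():
        print(f"{name}: T={score}, limit = {time_limit_hours(score):.1f} h")
```

For example, T=1310 gives a limit of roughly 53.4 hours, while T=8228 gives roughly 8.5 hours.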
In lieu of comp9.exe,
a compressor comp9a.exe producing archive9.bhm from enwik9,
and a decompressor decomp9.exe producing data9 from archive9.bhm may be submitted.
In this case, total size is measured as
S := length(comp9a.exe/zip)+2×length(decomp9.exe/zip)+length(archive9.bhm).
Resource restrictions for all executables are the same.
In lieu of executables X.exe (X=comp9/comp9a/decomp9),
zip files X.zip containing the source code and a makefile
that create X.exe may be submitted.
C++, Python, and Assembler are accepted. We may also consider other languages if
we can easily compile, run, and verify the code for compliance with the other rules.
If command-line options for execution or compilation are necessary, their length is added to S.
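The two ways of measuring the total size S can be written as small functions. This is a sketch under one assumption: option strings are counted by their character length (the rules only say "their length"). The function names and the sizes in the usage line are hypothetical.

```python
from typing import Sequence

def total_size_single(comp9_zip_bytes: int, archive9_exe_bytes: int,
                      options: Sequence[str] = ()) -> int:
    """S := length(comp9.exe/zip) + length(archive9.exe) + length of options."""
    return comp9_zip_bytes + archive9_exe_bytes + sum(len(o) for o in options)

def total_size_split(comp9a_zip_bytes: int, decomp9_zip_bytes: int,
                     archive9_bhm_bytes: int, options: Sequence[str] = ()) -> int:
    """S := length(comp9a) + 2*length(decomp9) + length(archive9.bhm) + options."""
    # In the split variant the decompressor is counted twice.
    return (comp9a_zip_bytes + 2 * decomp9_zip_bytes + archive9_bhm_bytes
            + sum(len(o) for o in options))

if __name__ == "__main__":
    # Made-up sizes, for illustration only.
    print(total_size_split(100_000, 50_000, 110_000_000, options=["-O2"]))  # prints 110200003
```

Counting the decompressor twice means every byte saved in decomp9 is worth two bytes saved in the archive.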
If your contribution violates some of the rules, you may still be
eligible for the Large Text Compression Benchmark (see below).
Each submission must include:
- A direct link to a self-extracting archive archive9.exe (or decomp9.exe+archive9.bhm)
- A single line of instructions on how to execute it (long or complicated instructions are inadmissible)
- Names of all involved programs, their versions, and the options used, if any
- Sizes of (de)comp9(a) and, most importantly, of archive9.exe/bhm
- Approximate (de)compression time and maximum RAM and HDD usage
- A description of the test machine (processor, memory, operating system, Geekbench5 score)
- Links where all files can be downloaded: executables (or zipped source) and
  well-documented source code of the (de)compressor under some OSI license, and all other relevant files
- A (link to a) document explaining the (algorithmic) ideas that led to or are incorporated into the algorithms, and their inner workings
Award = Z×(L-S)/L, where S = new record (size of comp9.exe+archive9.exe or alternatives above), L = previous record for S, Z = amount in prize fund (500'000€)
Update: L := S, while Z itself does not get reduced.
Minimum award is 1% of Z.
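The award formula, the 1% minimum, and the record update can be sketched as below. The threshold is read as "no award and no change to L" when the payout would fall below 1% of Z, which matches the rule that a missed 1% criterion leaves L unchanged. The function names and the record sizes in the usage lines are hypothetical.

```python
PRIZE_FUND_EUR = 500_000  # Z, as stated in the rules

def award_eur(L: int, S: int, Z: float = PRIZE_FUND_EUR) -> float:
    """Award = Z*(L-S)/L; returns 0.0 if the 1%-of-Z minimum is missed."""
    improvement = L - S
    # Award >= 0.01*Z  <=>  (L-S)/L >= 1/100, checked in exact integer arithmetic.
    if improvement <= 0 or 100 * improvement < L:
        return 0.0
    return Z * improvement / L

def next_record(L: int, S: int) -> int:
    """After an award, L := S; a missed 1% criterion leaves L unchanged."""
    return S if award_eur(L, S) > 0 else L

if __name__ == "__main__":
    L_prev = 115_000_000  # hypothetical previous record in bytes
    print(award_eur(L_prev, 112_700_000))  # 2% smaller -> 10000.0
    print(award_eur(L_prev, 114_000_000))  # under 1% smaller -> 0.0
```

Because Z is not reduced and L shrinks with each record, the same absolute byte saving earns slightly more with every new record.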
Contributions are dealt with in the order of their submission.
The contribution is subject to public comments for a period of at
least 30 days before the prize is awarded.
Compressors/decompressors do not have to be general purpose. They may
be tuned specifically to this benchmark and are allowed to reject or
fail on any input other than enwik9/archive9.bhm.
Only the version and combination of options submitted is eligible for
the prize.
If an author breaks his own record within 30 days,
the older submission is regarded as withdrawn.
If a submission fails to meet the criteria for the prize, the entrant
will be informed, and the submission will henceforth be ignored. In
particular, a submission that misses the 1% criterion does not diminish the prize (L
remains unchanged).
If your compressor beats the current record but violates some of the
constraints regarding operating system, DLLs used, programming
language, etc., Matt Mahoney may be willing to assist you in satisfying
them. For instance, if you send portable C code, he can compile it under
Windows for you.
You can run some of the previous records on your system, and by
comparing your runtime with the displayed runtime, you can estimate
whether your algorithm will meet the time constraint on our machine.
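A minimal sketch of that comparison, assuming the displayed runtime of the previous record was measured on the test machine and that both programs slow down or speed up by the same factor between machines (memory and disk effects can break this). The function names and the hour figures in the usage lines are hypothetical.

```python
TIME_BUDGET = 70_000  # hours x Geekbench5 score, from the 70'000/T rule

def estimate_test_hours(my_new_hours: float, my_ref_hours: float,
                        published_ref_hours: float) -> float:
    """Scale your program's runtime by the speed ratio seen on a reference record."""
    # Assumes the same machine-to-machine speed ratio applies to both programs.
    return my_new_hours * published_ref_hours / my_ref_hours

def meets_time_limit(estimated_hours: float, test_machine_T: float) -> bool:
    """True if the estimate is below 70'000/T hours."""
    return estimated_hours < TIME_BUDGET / test_machine_T

if __name__ == "__main__":
    # Previous record: 20 h on your machine, 25 h displayed; your program: 30 h.
    est = estimate_test_hours(30, 20, 25)
    print(est, meets_time_limit(est, 1310))  # 37.5 True
```

Leaving a comfortable margin below the limit is prudent, since the estimate is only as good as the linear-scaling assumption.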
Members of the prize committee are not eligible for prize money.
Committee members can publish (de)compressors but only when no
(other) submissions are pending verification.
L is updated as for regular submissions, but no money is paid.
The above formula currently amounts to 1€ for every ~230 bytes of
improvement, with a minimum improvement of ~1MB.
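The two figures fit together if the ~230 bytes per euro is read as the marginal rate Z/L of the award formula (an inference; the current value of L is not stated in this section). Under that reading, the implied record and the 1% threshold work out as:

```latex
\frac{Z}{L} \approx \frac{1\ \text{EUR}}{230\ \text{bytes}}
\;\Longrightarrow\;
L \approx 230 \times 500\,000 = 1.15 \times 10^{8}\ \text{bytes},
\qquad
Z\,\frac{L-S}{L} \ge 0.01\,Z \iff L - S \ge 0.01\,L \approx 1.15 \times 10^{6}\ \text{bytes}.
```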
If a decompressor has multiple authors, then a submission must
include instructions for dividing the prize money. All authors must
agree on this distribution before any money can be awarded.
There will be a waiting period of at least 30 days after submission
to allow for public comment and verification. Comments should be made to
the Hutter Prize Discussion Forum or by email to members of the
prize committee.
The programs and/or data files must be available on the Internet for
free download and testing.
Documented source code must be made publicly available under some
OSI-approved license before the prize money will be paid out.
The (virtual) prize fund (Z) is constant. It is not decreased
after awarding a prize. It may increase if additional sponsors
contribute to it. (Please contact Marcus Hutter if you wish to contribute.)
The prize will be paid if the solution reflects the spirit of the
contest. In particular, decompressors (secretly) receiving any kind of "outside"
information are forbidden. Also, in order to verify your claim, we need to
be able to run your executable on our machines. Payment of the prize
cannot be legally enforced. Marcus Hutter will make the final decision
whether to recognize a record, award a prize, and the amount.
Rules may change at any time without notice to meet the goals of
fairness, accuracy, fostering progress and public participation, and recognizing
existing practice.