For the impatient.
If you are the sort of impatient student who wants to build super
intelligent machines right away without "wasting" time reading or
learning too much, well, others have tried in the last 50 years and
failed, and so will you. If you can't hold back, at least read Legg (2008) [Leg08].
This is an excellently written non-technical thesis on
the necessary ingredients for super intelligent machines.
It will not help you much in building one, since in order to
properly understand the general theory and
to bridge the gap to "narrow" but practical existing AI
algorithms, you need a lot more background. Nevertheless,
[Leg08] might motivate you to consider reading the books
I'll recommend now.
Russell and Norvig (2003) [RN10]
is the textbook to learn about
Artificial Intelligence. The book gives a broad introduction,
survey, and solid background of all aspects of AI. There is no real
alternative. Whatever subarea of AI you specialize in later, you should
understand all the concepts it introduces, and have implemented and solved
at least some of the exercises.
The textbooks below are relevant for understanding and modeling
general intelligent behavior. If you are already attracted to some
specific AI applications, they may not be relevant for you.
One axis of categorizing AI is into
(1) logical, (2) planning, and (3) learning aspects.
CSL@ANU has experts in all 3 areas.
Historically, AI research started with (1) in the 1950s, which is
still relevant for many concrete practical applications.
Since, at least in humans, high-level logical reasoning seems to
emerge from the more basic learning and planning aspects,
it is conceivable that (1) will play no fundamental role in a general AI system.
So I will concentrate on (2) and (3).
If put together, learning+planning under uncertainty is mainly the domain
of reinforcement learning (RL), also called adaptive control
or sequential decision theory in other fields.
Sutton and Barto (1998) [SB98]
is the excellent default RL textbook. It
requires no background knowledge, describes the key ideas, open
problems, and great applications of this field. Don't be surprised
by the ease of the book: it teaches understanding, not proofs. It
gets really tough to make things work in practice or to prove
If you want to bring order into the bunch of methods and ideas
you've learned so far, and want to understand more deeply their
connection either for curiosity or to extend the existing systems to
more general and powerful ones, you need to learn about some
concepts that at first seem quite disconnected and theoretical.
Intelligence has a lot to do with information processing. Algorithmic
information theory (AIT) is a branch of information theory that is
powerful enough to serve as a foundation for intelligent information
processing. It can deal with key aspects of intelligence, like
similarity, creativity, analogical reasoning, and generalization,
which are fundamentally connected to the induction problem and
Ockham's razor principle.
Li and Vitanyi's (1997) AIT book [LV97]
provides an excellent introduction: Kolmogorov complexity, Minimum Description Length,
universal Solomonoff induction, universal Levin search, and all
that. It requires a background in theoretical computer science in
general and computability theory in particular, which can be
obtained from the classic textbook
Now you are in a position to read [Hut05].
The book develops a sound and complete mathematical theory of an optimal
"intelligent" general-purpose learning agent. The theory is complete in
the sense that it gives a complete description of this agent, not
just an incomplete framework with gaps to be filled. But be warned,
it is only a theory. Just as it is a long way from e.g. the minimax theory
of optimally playing games like chess to real chess programs, it is
a long way from this theory to a practical general-purpose intelligent agent.
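To make the gap between minimax theory and practice tangible, here is a minimal sketch in C. The game is my own toy choice for illustration, not one taken from any of the books: players alternately remove 1, 2, or 3 stones from a pile, and whoever takes the last stone wins. The function names minimax_value and best_move are hypothetical. Exhaustive search like this is exactly the "theory"; it works for a pile of a dozen stones but is hopeless for chess, where real programs need pruning, heuristics, and much more.

```c
#include <assert.h>

/* Toy game (an assumption for illustration): a pile of stones, players
   alternately remove 1, 2, or 3 stones; whoever takes the last stone wins. */

/* Returns +1 if the player to move can force a win, -1 otherwise.
   This is the negamax form of minimax: my value is the best of the
   negated values of the positions I can hand to my opponent. */
int minimax_value(int stones)
{
    assert(stones >= 0);
    if (stones == 0)
        return -1;  /* the opponent just took the last stone: we lost */

    int best = -1;
    for (int take = 1; take <= 3 && take <= stones; take++) {
        int value = -minimax_value(stones - take);
        if (value > best)
            best = value;
    }
    return best;
}

/* Returns the smallest move (1..3) that achieves the minimax value.
   Requires stones >= 1. */
int best_move(int stones)
{
    assert(stones >= 1);
    int best = -2;
    int move = 1;
    for (int take = 1; take <= 3 && take <= stones; take++) {
        int value = -minimax_value(stones - take);
        if (value > best) {
            best = value;
            move = take;
        }
    }
    return move;
}
```

The search has no memoization, so its running time grows exponentially with the pile size. That is deliberate: it shows in miniature why the pure theory does not scale.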
The other recommended books below can be regarded as further readings
that provide more background and deepen your understanding of various
important aspects in AI research.
Bishop (2006) [Bis06]
is the excellent default textbook in statistical
machine learning, and should be put on your reading list.
Some Bayesian probability book will be useful too
How multiple rational agents interact
is the domain of game theory
natural language understanding
interfaces abstract agents with the real world.
gently and broadly introduces you to philosophy of science in general and
Earman (1992) [Ear92]
to the induction problem in particular.
This is particularly true for the information-theoretic
reinforcement learning approach to universally intelligent
agents (Universal AI).
This means that you have to acquire a lot of background knowledge first.
Textbooks on an advanced undergraduate or graduate level
to most of these subjects have been recommended above.
A background in the following subjects should be sufficient to understand them.
The following 11 items list the most relevant undergraduate background subjects:
Logic and philosophy are cornerstones of reasoning.
Analysis and linear algebra handle real-valued information.
Probability and statistics
are required for dealing with uncertainty and learning aspects.
Information theory deals with data and knowledge.
Programming, numerics, algorithms, and computability
are needed for the algorithmic, implementation, and applied aspects of AI.
Philosophy (of mind, knowledge, science, reasoning, induction, deduction)
Most engineering grows out of science, and all science grew out of philosophy.
The mind, knowledge, intelligence, rationality, reasoning, induction, etc.
pose a plethora of philosophical questions.
Also, the emergence of human-level AIs will have deep social, ethical,
and economic consequences, and raises deep philosophical questions.
Finally, some exposure to philosophy sharpens your analytical thinking,
and trains you to think out of the box by
questioning common assumptions which may be wrong.
Logic (predicate logic, reasoning, deduction, proof, completeness, soundness)
Critical thinking and proper argumentation
is fundamental to science and a key trait of intelligence.
Since logic formalizes rational arguments and deductive reasoning,
it plays a vital role in the field of (good old fashioned) artificial intelligence.
Even when you pursue an approach to AI where formal
logic plays no direct role (like machine learning and Universal AI),
some training in formal logic sharpens your analytical thinking,
and the AI problem is difficult and abstract enough that you will definitely
profit from it.
Analysis (real numbers and functions, inequalities, limits, differentiation, integration)
Real numbers and functions are fundamental to describing our (physical) world.
Since intelligent agents usually interact with this world, some
of their observations, knowledge, and states are naturally represented by real numbers or functions,
e.g. battery level, degrees of belief, or trajectory of objects.
Limits and derivatives are e.g. needed for dealing with
temporal or spatial change and probability densities.
Linear Algebra (linear equations, vectors, matrices, determinants, eigenvectors, quadratic forms)
Most problems in linear algebra have efficient algorithmic solutions.
Therefore one tries to formalize many real-world problems in machine learning (ML) and other fields
in terms of vectors and matrices; often they are linear problems,
or can be reduced to or (locally) approximated by them.
Also, data is often naturally represented as (e.g. feature) vectors or (e.g. dependency) matrices.
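As one instance of an "efficient algorithmic solution", the following sketch solves a small system of linear equations by Gaussian elimination with partial pivoting, which takes on the order of N^3 operations. The fixed size N = 3, the function name solve_linear, and the singularity threshold 1e-12 are assumptions made for illustration; for serious work use a tested numerical library.

```c
#include <assert.h>

#define N 3  /* system size, fixed here only to keep the sketch short */

static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Solves a * x = b by Gaussian elimination with partial pivoting.
   Overwrites a and b. Returns 0 on success, -1 if the matrix is
   (numerically) singular. */
int solve_linear(double a[N][N], double b[N], double x[N])
{
    assert(a && b && x);

    for (int col = 0; col < N; col++) {
        /* Pick the row with the largest entry in this column as pivot. */
        int pivot = col;
        for (int row = col + 1; row < N; row++)
            if (abs_val(a[row][col]) > abs_val(a[pivot][col]))
                pivot = row;
        if (abs_val(a[pivot][col]) < 1e-12)
            return -1;

        /* Swap the pivot row into place. */
        if (pivot != col) {
            for (int k = 0; k < N; k++) {
                double t = a[col][k]; a[col][k] = a[pivot][k]; a[pivot][k] = t;
            }
            double t = b[col]; b[col] = b[pivot]; b[pivot] = t;
        }

        /* Eliminate this column from all rows below the pivot. */
        for (int row = col + 1; row < N; row++) {
            double f = a[row][col] / a[col][col];
            for (int k = col; k < N; k++)
                a[row][k] -= f * a[col][k];
            b[row] -= f * b[col];
        }
    }

    /* Back substitution on the resulting upper-triangular system. */
    for (int row = N - 1; row >= 0; row--) {
        double s = b[row];
        for (int k = row + 1; k < N; k++)
            s -= a[row][k] * x[k];
        x[row] = s / a[row][row];
    }
    return 0;
}
```

For example, the system 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3 has the solution x = 2, y = 3, z = -1, which the routine recovers up to rounding error.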
Probability (random variable, conditional probability, expectation, measure theory, densities)
An agent's subjective uncertainty about our world (e.g. do ETs exist?) and objective random processes (e.g. dice or nuclear decay)
can both be modeled by probabilities. Probability theory allows one to predict
the likelihood of (future) events, so it is crucial for inductive reasoning.
Statistics (estimation, likelihood, prior, Bayes rule, hypotheses, central limit theorem)
Statistics estimates probabilities or models or hypotheses or related
quantities from an agent's past observations (e.g. number of heads versus tails),
so it is also crucial for inductive reasoning.
Bayes' rule allows one to update an agent's belief about our world given new evidence.
Note that the Bayesian approach largely blurs the distinction between probability and statistics.
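The heads-versus-tails example can be sketched in a few lines of C. The setup is an assumption chosen for illustration: the agent entertains a finite set of hypotheses about a coin's bias and holds a probability (belief) for each. The names bayes_update and predict_heads are hypothetical. Each observation multiplies every belief by the likelihood of that observation under the hypothesis and then renormalizes, which is Bayes' rule for a discrete hypothesis class.

```c
#include <assert.h>
#include <stddef.h>

/* Updates a discrete belief over coin biases in place, given one toss.
   theta[i]  : probability of heads under hypothesis i
   belief[i] : current probability of hypothesis i (should sum to 1)
   heads     : 1 if the toss came up heads, 0 if tails
   If the observation is impossible under every hypothesis, all beliefs
   become 0 and are left unnormalized. */
void bayes_update(const double *theta, double *belief, size_t n, int heads)
{
    assert(n > 0);
    double evidence = 0.0;
    for (size_t i = 0; i < n; i++) {
        double likelihood = heads ? theta[i] : 1.0 - theta[i];
        belief[i] *= likelihood;   /* prior times likelihood */
        evidence += belief[i];     /* accumulates P(observation) */
    }
    if (evidence > 0.0)
        for (size_t i = 0; i < n; i++)
            belief[i] /= evidence; /* normalize to get the posterior */
}

/* Predictive probability that the next toss is heads:
   the belief-weighted average of the hypothesised biases. */
double predict_heads(const double *theta, const double *belief, size_t n)
{
    assert(n > 0);
    double p = 0.0;
    for (size_t i = 0; i < n; i++)
        p += theta[i] * belief[i];
    return p;
}
```

With hypotheses 0.25, 0.5, and 0.75 and a uniform prior, one observed head shifts the belief to 1/6, 1/3, and 1/2, and the predicted probability of heads on the next toss rises from 1/2 to 7/12. Note how estimation (statistics) and prediction (probability) are the same computation here.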
Programming / C (data types/structure, operators, expressions, control flow, functions, I/O)
Any theory or model or algorithm of an intelligent system
has to be implemented before it can be applied in practice
(analytical or by-hand solutions are out of the question).
C has among the programming languages a status similar to English among
the natural languages. Both are communication defaults.
Real academics/programmers are able to read and write English/C.
Note that each (GOF)AI paradigm has spawned its own special-purpose programming language
(Prolog, Lisp, Scheme, Smalltalk, Haskell, ...).
Numerics (interpolation, integration, function evaluation, root finding, optimization, linear problems)
Numerical algorithms approximate the solution of problems that
involve (functions of) real numbers. Most
problems in machine learning involve maximizing or minimizing some
functions, which requires optimization algorithms or, after differentiation, solving
Information Theory (entropy, information, data compression, channel capacity)
Intelligent agents are information processing systems.
Agents perceive, process, and store information
(e.g. bits from a video camera) from the environment
and transmit information (e.g. angles for a robotic arm)
to their environment. Therefore information theory plays a key role in
Algorithms (sorting, data structures, graph algorithms)
Algorithms are (finite) sequences of elementary (mathematical, logical, branching) operations.
Effective solutions of virtually all AI problems
involve many different and complex algorithms and data structures.
Computability (languages, automata, Turing machines, complexity classes, randomization)
Computability and complexity theory classify problems
according to their inherent difficulty.
They can be used to determine the computational complexity
of a particular (AI) problem, i.e. whether there exists
an (efficient) algorithmic solution to it.
Universal Turing machines play a prominent role in Universal AI.
Below I recommend some books and courses which take you from high school level
to a college bachelor level, and should enable you to
read the textbooks above.
Note that just reading websites or short (Wikipedia) articles
about these topics is not a substitute for working through a textbook.
You need to do a fair amount of exercises, e.g. those in the books suggested below.
The list below is quite comprehensive.
Of course the depth, breadth, and focus
can depend on what you want to do: Just implement and apply some AI or ML
algorithms, understand the algorithms, understand the
mathematics behind them, or even further develop the theory.
The Australian National University (ANU) offers a
of Advanced Computing.
Computational Foundations Major
and the Intelligent Systems Major
both have a good overlap with my book recommendations.
In case you are doing one of these majors and are seriously interested
in Universal AI, add as many mathematics and statistics courses/books as possible
to this curriculum.
For most books I indicate level, prerequisites,
and related courses at the ANU:
The indicated year gives you a rough indication about the level
of the subject and amount of presumed background knowledge and year
in which this course could be taken, presuming a degree that follows
the book selection below.
Whenever available, I indicate for a book the closest matching ANU course.
The course might cover only certain aspects or not be that close after all.
The bold-face title books are recommended; the others are optional.
Nicolas Alchin (2006) Theory of Knowledge
(0th year Uni.)
This book is a fantastic introduction to (Western) philosophy. If
you care at all about philosophy, then this is the book to
start with. It is immensely broad without being superficial,
avoids obscure philosophical jargon, seems very balanced,
and contains a wealth of contemporary popular material from other
resources. It covers philosophies of science, arts, math,
reason, history, empiricism, paradigms, culture, language,
ethics, politics, religion, emotions, truth, and
more. The collection of quotes at the beginning of each
chapter alone is worth the price of the book. The book makes you question
everything you know. This book needs to be read very slowly.
It's important to reflect on the material and discuss with
others. Maybe form a reading club.
This book is actually used for the International Baccalaureate.
Peter Godfrey-Smith (2003)
Theory and Reality
is a clear and thorough undergraduate textbook introduction
to the philosophy of science.
Kernighan & Ritchie (1988) C Programming Language
(1st year Uni.
Pre: access to a computer with a C compiler.)
This is a classic, unexcelled introduction to (ANSI) C, written by its developers:
slim, complete, and to the point.
It includes an introduction to programming and to C, a reference manual,
an explanation of all standard libraries, and many tips and tricks.
There's one programming language you should master (completely).
And that is C. All others are optional and can be learned on demand
to any desired degree. C is not very forgiving when you make errors,
but this is *good*. It educates you to become a careful programmer.
Press et al. (1992) Numerical Recipes in C
(2nd year Uni.
Pre: programming experience in C; higher math for some chapters.)
This is a classic book covering a broad range of numerical
algorithms, i.e. those involving real numbers, for
interpolation, integration, function evaluation, root finding,
optimization, Fourier transform, differential equations, and
many linear algebra problems. The complete C-code is in the
book. Emphasis is on comprehensibility rather than optimized
black-box libraries. Indeed, the book contains and hence can
serve as a compact hands-on introduction to the various
mathematical fields. Chapters 1,2,3,7,8,9,10,11,14,15,20 are an
absolute must, and Chapters 4,5,6 recommended.
If you can't afford whole courses or books on linear algebra,
analysis, statistics, etc, this single book may even serve as a
poor-man's substitute, although some chapters presuppose the
corresponding mathematical knowledge.