who
interprets them, and that person needs to know something about how the
models are produced to appreciate the strengths, and limitations, of the tech-
nology. However, it is not necessary for all data model users to have a deep
understanding of the finer details of the algorithms.
We address this situation by describing machine learning methods at succes-
sive levels of detail. You will learn the basic ideas, the topmost level, by reading
the first three chapters. Chapter 1 describes, through examples, what machine
learning is and where it can be used; it also provides actual practical applica-
tions. Chapters 2 and 3 cover the kinds of input and output—or knowledge
representation—involved. Different kinds of output dictate different styles
of algorithm, and at the next level Chapter 4 describes the basic methods of
machine learning, simplified to make them easy to comprehend. Here the prin-
ciples involved are conveyed in a variety of algorithms without getting into
intricate details or tricky implementation issues. To make progress in the appli-
cation of machine learning techniques to particular data mining problems, it is
essential to be able to measure how well you are doing. Chapter 5, which can be
read out of sequence, equips you to evaluate the results obtained from machine
learning, addressing the sometimes complex issues involved in performance
evaluation.
At the lowest and most detailed level, Chapter 6 exposes in naked detail the
nitty-gritty issues of implementing a spectrum of machine learning algorithms,
including the complexities necessary for them to work well in practice. Although
many readers may want to ignore this detailed information, it is at this level that
the full, working, tested implementations of machine learning schemes in Weka
are written. Chapter 7 describes practical topics involved with engineering the
input to machine learning—for example, selecting and discretizing attributes—
and covers several more advanced techniques for refining and combining the
output from different learning techniques. The final chapter of Part I looks to
the future.
The book describes most methods used in practical machine learning.
However, it does not cover reinforcement learning, because it is rarely applied
in practical data mining; genetic algorithm approaches, because these are
just optimization techniques; or relational learning and inductive logic program-
ming, because they are rarely used in mainstream data mining applications.
The data mining system that illustrates the ideas in the book is described in
Part II to clearly separate conceptual material from the practical aspects of how
to use it. You can skip to Part II directly from Chapter 4 if you are in a hurry to
analyze your data and don’t want to be bothered with the technical details.
Java has been chosen for the implementations of machine learning tech-
niques that accompany this book because, as an object-oriented programming
language, it allows a uniform interface to learning schemes and methods for pre-
and postprocessing. We have chosen Java instead of C++, Smalltalk, or other
object-oriented languages because programs written in Java can be run on
almost any computer without having to be recompiled, having to undergo com-
plicated installation procedures, or—worst of all—having to change the code.
A Java program is compiled into byte-code that can be executed on any com-
puter equipped with an appropriate interpreter. This interpreter is called the
Java virtual machine. Java virtual machines—and, for that matter, Java compil-
ers—are freely available for all important platforms.
Like all widely used programming languages, Java has received its share of
criticism. Although this is not the place to elaborate on such issues, in several
cases the critics are clearly right. However, of all currently available program-
ming languages that are widely supported, standardized, and extensively docu-
mented, Java seems to be the best choice for the purpose of this book. Its main
disadvantage is speed of execution—or lack of it. Executing a Java program is
several times slower than running a corresponding program written in C
because the virtual machine has to translate the byte-code into machine
code before it can be executed. In our experience the difference is a factor of
three to five if the virtual machine uses a just-in-time compiler. Instead of
translating each byte-code instruction individually, a just-in-time compiler translates whole
chunks of byte-code into machine code, thereby achieving significant speedup.
However, if this is still too slow for your application, there are compilers that
translate Java programs directly into machine code, bypassing the byte-code
step. The resulting code cannot be executed on other platforms, which sacrifices
one of Java’s most important advantages.
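The portability described above can be illustrated with a minimal sketch: the same compiled byte-code reports which virtual machine and platform happen to be executing it. The system property keys used here are standard Java, but the program itself is purely illustrative and is not part of the Weka code.

```java
// A minimal sketch: this class compiles once to byte-code, and the same
// .class file runs on any platform's Java virtual machine. The standard
// system properties below identify whichever VM and operating system
// happen to be executing it.
public class JvmInfo {
    public static void main(String[] args) {
        System.out.println("VM name:      " + System.getProperty("java.vm.name"));
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("OS:           " + System.getProperty("os.name"));
    }
}
```

Compiling this once with `javac JvmInfo.java` produces `JvmInfo.class`, which can then be run unchanged with `java JvmInfo` on any machine that has a virtual machine installed.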
Updated and revised content
We finished writing the first edition of this book in 1999 and now, in April 2005,
are just polishing this second edition. The areas of data mining and machine
learning have matured in the intervening years. Although the core of material
in this edition remains the same, we have made the most of our opportunity to
update it to reflect the changes that have taken place over those five years. There have
been errors to fix, errors that had accumulated in our publicly available errata
file. Surprisingly few were found, and we hope there are even fewer in this
second edition. (The errata for the second edition may be found through the
book’s home page at http://www.cs.waikato.ac.nz/ml/weka/book.html.) We have
thoroughly edited the material and brought it up to date, and we practically
doubled the number of references. The most enjoyable part has been adding
new material. Here are the highlights.
Bowing to popular demand, we have added comprehensive information on
neural networks: the perceptron and closely related Winnow algorithm in
Section 4.6 and the multilayer perceptron and backpropagation algorithm