in Section 6.3. We have included more recent
material on implementing
nonlinear decision boundaries using both the kernel perceptron and radial basis
function networks. There is a new section on Bayesian networks, again in
response to readers’ requests, with a description of how to learn classifiers based
on these networks and how to implement them efficiently using all-dimensions (AD)
trees.
The Weka machine learning workbench that accompanies the book, a widely used and popular feature of the first edition, has acquired a radical new look in the form of an interactive interface (or rather, three separate interactive interfaces) that makes it far easier to use. The primary one is the Explorer, which gives access to all of Weka's facilities using menu selection and form filling. The others are the Knowledge Flow interface, which allows you to design configurations for streamed data processing, and the Experimenter, with which you set up automated experiments that run selected machine learning algorithms with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests on the results. These interfaces lower the bar for becoming a practicing data miner, and we include a full description of how to use them. However, the book continues to stand alone, independent of Weka, and to underline this we have moved all material on the workbench into a separate Part II at the end of the book.
In addition to becoming far easier to use, Weka has grown over the last 5 years and matured enormously in its data mining capabilities. It now includes an unparalleled range of machine learning algorithms and related techniques. The growth has been partly stimulated by recent developments in the field and partly led by Weka users and driven by demand. As a result, we know a great deal about what actual users of data mining want, and we have capitalized on this experience when deciding what to include in this new edition.
The earlier chapters, containing more general and foundational material, have changed relatively little. We have added more examples of fielded applications to Chapter 1; a new subsection on sparse data, along with brief coverage of string attributes and date attributes, to Chapter 2; and, to Chapter 3, a description of interactive decision tree construction, a useful and revealing technique that helps you grapple with your data by building decision trees manually.
In addition to introducing linear decision boundaries for classification (the infrastructure for neural networks), Chapter 4 includes new material on multinomial Bayes models for document classification and on logistic regression. The last 5 years have seen great interest in data mining for text, and this is reflected in our introduction to string attributes in Chapter 2, multinomial Bayes for document classification in Chapter 4, and text transformations in Chapter 7.
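To give a concrete flavor of the multinomial Bayes model mentioned above, here is a minimal illustrative sketch in Python (not Weka's Java implementation; the function names and toy data are invented for illustration). Each class gets a prior and a Laplace-smoothed distribution over words, and a document is assigned to the class with the highest log-probability under its bag of words:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model on bag-of-words documents.
    docs: list of token lists; labels: parallel list of class names."""
    classes = set(labels)
    vocab = {w for d in docs for w in d}
    prior, cond = {}, {}
    for c in classes:
        c_docs = [d for d, l in zip(docs, labels) if l == c]
        prior[c] = math.log(len(c_docs) / len(docs))
        counts = Counter(w for d in c_docs for w in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        # Laplace smoothing keeps unseen-in-class words from zeroing the score.
        cond[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return prior, cond, vocab

def classify(doc, prior, cond, vocab):
    """Pick the class maximizing log P(c) + sum of log P(w|c); unseen words skipped."""
    def score(c):
        return prior[c] + sum(cond[c][w] for w in doc if w in vocab)
    return max(prior, key=score)
```

On a toy corpus of "spam" and "ham" token lists, the classifier picks whichever class makes the document's words most probable; word frequencies, not just presence, drive the multinomial score.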
Chapter 4 includes a great deal of new material on efficient data structures for
searching the instance space: kD-trees and the recently invented ball trees. These
are used to find nearest neighbors efficiently and to accelerate distance-based
clustering.
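As a rough illustration of why such structures speed up nearest-neighbor search, here is a minimal pure-Python kD-tree sketch (invented for this preface, not Weka's implementation): the search descends toward the query point first, then prunes any subtree whose splitting plane lies farther away than the best match found so far.

```python
import math

def build_kd_tree(points, depth=0):
    """Recursively split the points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kd_tree(points[:mid], depth + 1),
            "right": build_kd_tree(points[mid + 1:], depth + 1)}

def nearest(node, target, best=None):
    """Depth-first search that prunes branches farther than the best so far."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], target) < math.dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best
```

On well-spread data the pruning test skips most of the tree, which is what turns brute-force linear scans into the fast lookups that distance-based clustering relies on.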
Chapter 5 describes the principles of statistical evaluation of machine learning, which have not changed. The main addition, apart from a note on the Kappa statistic for measuring the success of a predictor, is a more detailed treatment of cost-sensitive learning. We describe how to use a classifier, built without taking costs into consideration, to make predictions that are sensitive to cost; alternatively, we explain how to take costs into account during the training process to build a cost-sensitive model. We also cover the popular new technique of cost curves.
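The first of these ideas, making cost-sensitive predictions from an ordinary probability-emitting classifier, boils down to choosing the class with minimum expected cost rather than maximum probability. A tiny sketch (the cost figures and names here are invented for illustration):

```python
def min_expected_cost_class(probs, cost):
    """probs: class -> predicted probability from an ordinary classifier.
    cost: (actual_class, predicted_class) -> misclassification cost.
    Pick the prediction minimizing expected cost under the model's probabilities."""
    def expected(pred):
        return sum(p * cost[(actual, pred)] for actual, p in probs.items())
    return min(probs, key=expected)

# Hypothetical cost matrix: calling legitimate mail spam is ten times worse
# than letting a spam message through; correct decisions cost nothing.
cost = {("ham", "spam"): 10.0, ("spam", "ham"): 1.0,
        ("ham", "ham"): 0.0, ("spam", "spam"): 0.0}
```

With these costs the decision flips from plain maximum probability: even when the model assigns "spam" the higher probability, the expected cost of a false alarm can tip the prediction to "ham".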
There are several additions to Chapter 6, apart from the previously mentioned material on neural networks and Bayesian network classifiers. More details (gory details) are given of the heuristics used in the successful RIPPER rule learner. We describe how to use model trees to generate rules for numeric prediction. We show how to apply locally weighted regression to classification problems. Finally, we describe the X-means clustering algorithm, which is a big improvement on traditional k-means.
Chapter 7, on engineering the input and output, has changed most, because this is where recent developments in practical machine learning have been concentrated. We describe new attribute selection schemes, such as race search and the use of support vector machines, and new methods for combining models, such as additive regression, additive logistic regression, logistic model trees, and option trees. We give a full account of LogitBoost, which was mentioned in the first edition but not described. There is a new section on useful transformations, including principal components analysis and transformations for text mining and time series. We also cover recent developments in using unlabeled data to improve classification, including the co-training and co-EM methods.
The final chapter of Part I, on new directions and different perspectives, has been reworked to keep up with the times and now includes contemporary challenges such as adversarial learning and ubiquitous data mining.
Acknowledgments
Writing the acknowledgments is always the nicest part! A lot of people have
helped us, and we relish this opportunity to thank them. This book has arisen
out of the machine learning research project in the Computer Science Depart-
ment at the University of Waikato, New Zealand. We have received generous
encouragement and assistance from the academic staff members on that project:
John Cleary, Sally Jo Cunningham, Matt Humphrey, Lyn Hunt, Bob McQueen,
Lloyd Smith, and Tony Smith. Special thanks go to Mark Hall, Bernhard
Pfahringer, and above all Geoff Holmes, the project leader and source of inspi-