who
interprets them, and that person needs to know something about how the
models are produced to appreciate the strengths, and limitations, of the tech-
nology. However, it is not necessary for all data model users to have a deep
understanding of the finer details of the algorithms.
We address this situation by describing machine learning methods at succes-
sive levels of detail. You will learn the basic ideas, the topmost level, by reading
the first three chapters. Chapter 1 describes, through examples, what machine
learning is and where it can be used; it also provides actual practical applica-
tions. Chapters 2 and 3 cover the kinds of input and output—or knowledge
representation—involved. Different kinds of output dictate different styles
of algorithm, and at the next level Chapter 4 describes the basic methods of
machine learning, simplified to make them easy to comprehend. Here the prin-
ciples involved are conveyed in a variety of algorithms without getting into
intricate details or tricky implementation issues. To make progress in the appli-
cation of machine learning techniques to particular data mining problems, it is
essential to be able to measure how well you are doing. Chapter 5, which can be
read out of sequence, equips you to evaluate the results obtained from machine
learning, addressing the sometimes complex issues involved in performance
evaluation.
At the lowest and most detailed level, Chapter 6 exposes in naked detail the
nitty-gritty issues of implementing a spectrum of machine learning algorithms,
including the complexities necessary for them to work well in practice. Although
many readers may want to ignore this detailed information, it is at this level that
the full, working, tested implementations of machine learning schemes in Weka
are written. Chapter 7 describes practical topics involved with engineering the
input to machine learning—for example, selecting and discretizing attributes—
and covers several more advanced techniques for refining and combining the
output from different learning techniques. The final chapter of Part I looks to
the future.
The book describes most methods used in practical machine learning.
However, it does not cover reinforcement learning, because it is rarely applied
in practical data mining; genetic algorithm approaches, because these are
just optimization techniques; or relational learning and inductive logic program-
ming, because they are rarely used in mainstream data mining applications.
The data mining system that illustrates the ideas in the book is described in
Part II to clearly separate conceptual material from the practical aspects of how
to use it. You can skip to Part II directly from Chapter 4 if you are in a hurry to
analyze your data and don’t want to be bothered with the technical details.
Java has been chosen for the implementations of machine learning tech-
niques that accompany this book because, as an object-oriented programming
language, it allows a uniform interface to learning schemes and methods for pre-
and postprocessing. We have chosen Java instead of C++, Smalltalk, or other
object-oriented languages because programs written in Java can be run on
almost any computer without having to be recompiled, having to undergo com-
plicated installation procedures, or—worst of all—having to change the code.
A Java program is compiled into byte-code that can be executed on any com-
puter equipped with an appropriate interpreter. This interpreter is called the
Java virtual machine. Java virtual machines—and, for that matter, Java compil-
ers—are freely available for all important platforms.
Like all widely used programming languages, Java has received its share of
criticism. Although this is not the place to elaborate on such issues, in several
cases the critics are clearly right. However, of all currently available program-
ming languages that are widely supported, standardized, and extensively docu-
mented, Java seems to be the best choice for the purpose of this book. Its main
disadvantage is speed of execution—or lack of it. Executing a Java program is
several times slower than running a corresponding program written in C
because the virtual machine has to translate the byte-code into machine
code before it can be executed. In our experience the difference is a factor of
three to five if the virtual machine uses a just-in-time compiler. Instead of
translating each byte-code instruction individually, a just-in-time compiler translates whole
chunks of byte-code into machine code, thereby achieving significant speedup.
However, if this is still too slow for your application, there are compilers that
translate Java programs directly into machine code, bypassing the byte-code
step. The resulting code cannot be executed on other platforms, which sacrifices
one of Java’s most important advantages.
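The portability described above can be illustrated with a minimal sketch: the same compiled byte-code reports which virtual machine and platform happen to be executing it. The system property keys used here are standard Java, but the program itself is purely illustrative and is not part of the Weka code.

```java
// A minimal sketch: this class compiles once to byte-code, and the same
// .class file runs on any platform's Java virtual machine. The standard
// system properties below identify whichever VM and operating system
// happen to be executing it.
public class JvmInfo {
    public static void main(String[] args) {
        System.out.println("VM name:      " + System.getProperty("java.vm.name"));
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("OS:           " + System.getProperty("os.name"));
    }
}
```

Compiling this once with `javac JvmInfo.java` produces `JvmInfo.class`, which can then be run unchanged with `java JvmInfo` on any machine that has a virtual machine installed.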
Updated and revised content
We finished writing the first edition of this book in 1999 and now, in April 2005,
are just polishing this second edition. The areas of data mining and machine
learning have matured in the intervening years. Although the core of material
in this edition remains the same, we have made the most of our opportunity to
update it to reflect the changes that have taken place over those five years. There have
been errors to fix, errors that had accumulated in our publicly available errata
file. Surprisingly few were found, and we hope there are even fewer in this
second edition. (The errata for the second edition may be found through the
book’s home page at http://www.cs.waikato.ac.nz/ml/weka/book.html.) We have
thoroughly edited the material and brought it up to date, and we practically
doubled the number of references. The most enjoyable part has been adding
new material. Here are the highlights.
Bowing to popular demand, we have added comprehensive information on
neural networks: the perceptron and closely related Winnow algorithm in
Section 4.6 and the multilayer perceptron and backpropagation algorithm