Data Mining: Practical Machine Learning Tools and Techniques, Second Edition



Data Mining

Practical Machine Learning Tools and Techniques,

Second Edition

Ian H. Witten

Department of Computer Science

University of Waikato

Eibe Frank

Department of Computer Science

University of Waikato

AMSTERDAM • BOSTON • HEIDELBERG • LONDON

NEW YORK • OXFORD • PARIS • SAN DIEGO

SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER





Publisher: Diane Cerra

Publishing Services Manager: Simon Crump

Project Manager: Brandy Lilly

Editorial Assistant: Asma Stephan

Cover Design: Yvo Riezebos Design

Cover Image: Getty Images

Composition: SNP Best-set Typesetter Ltd., Hong Kong

Technical Illustration: Dartmouth Publishing, Inc.

Copyeditor: Graphic World Inc.

Proofreader: Graphic World Inc.

Indexer: Graphic World Inc.

Interior printer: The Maple-Vail Book Manufacturing Group

Cover printer: Phoenix Color Corp

Morgan Kaufmann Publishers is an imprint of Elsevier.

500 Sansome Street, Suite 400, San Francisco, CA 94111

This book is printed on acid-free paper.

© 2005 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com.uk. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining Permissions.”



Library of Congress Cataloging-in-Publication Data

Witten, I. H. (Ian H.)

Data mining : practical machine learning tools and techniques / Ian H. Witten, Eibe Frank. – 2nd ed.

p. cm. – (Morgan Kaufmann series in data management systems)

Includes bibliographical references and index.

ISBN: 0-12-088407-0

1. Data mining.

I. Frank, Eibe.

II. Title.

III. Series.

QA76.9.D343W58 2005

006.3–dc22

2005043385

For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.books.elsevier.com

Printed in the United States of America

05 06 07 08 09

5 4 3 2 1

Working together to grow libraries in developing countries

www.elsevier.com  |  www.bookaid.org  |  www.sabre.org




Foreword

Jim Gray, Series Editor

Microsoft Research

Technology now allows us to capture and store vast quantities of data. Finding patterns, trends, and anomalies in these datasets, and summarizing them with simple quantitative models, is one of the grand challenges of the information age—turning data into information and turning information into knowledge.

There has been stunning progress in data mining and machine learning. The synthesis of statistics, machine learning, information theory, and computing has created a solid science, with a firm mathematical base, and with very powerful tools. Witten and Frank present much of this progress in this book and in the companion implementation of the key algorithms. As such, this is a milestone in the synthesis of data mining, data analysis, information theory, and machine learning. If you have not been following this field for the last decade, this is a great way to catch up on this exciting progress. If you have, then Witten and Frank’s presentation and the companion open-source workbench, called Weka, will be a useful addition to your toolkit.

They present the basic theory of automatically extracting models from data, and then validating those models. The book does an excellent job of explaining the various models (decision trees, association rules, linear models, clustering, Bayes nets, neural nets) and how to apply them in practice. With this basis, they then walk through the steps and pitfalls of various approaches. They describe how to safely scrub datasets, how to build models, and how to evaluate a model’s predictive quality. Most of the book is tutorial, but Part II broadly describes how commercial systems work and gives a tour of the publicly available data mining workbench that the authors provide through a website. This Weka workbench has a graphical user interface that leads you through data mining tasks and has excellent data visualization tools that help understand the models. It is a great companion to the text and a useful and popular tool in its own right.





This book presents this new discipline in a very accessible form: as a text both to train the next generation of practitioners and researchers and to inform lifelong learners like myself. Witten and Frank have a passion for simple and elegant solutions. They approach each topic with this mindset, grounding all concepts in concrete examples, and urging the reader to consider the simple techniques first, and then progress to the more sophisticated ones if the simple ones prove inadequate.

If you are interested in databases, and have not been following the machine learning field, this book is a great way to catch up on this exciting progress. If you have data that you want to analyze and understand, this book and the associated Weka toolkit are an excellent way to start.




