HAN
01-fm-i-vi-9780123814791
2011/6/1
3:29
Page v
#5
Data Mining
Concepts and Techniques
Third Edition
Jiawei Han
University of Illinois at Urbana–Champaign
Micheline Kamber
Jian Pei
Simon Fraser University
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier
Morgan Kaufmann Publishers is an imprint of Elsevier.
225 Wyman Street, Waltham, MA 02451, USA
© 2012 by Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright
Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by
the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods or professional practices
may become necessary. Practitioners and researchers must always rely on their own experience
and knowledge in evaluating and using any information or methods described herein. In using
such information or methods they should be mindful of their own safety and the safety of others,
including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors
assume any liability for any injury and/or damage to persons or property as a matter of products
liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Han, Jiawei.
Data mining : concepts and techniques / Jiawei Han, Micheline Kamber, Jian Pei. – 3rd ed.
p. cm.
ISBN 978-0-12-381479-1
1. Data mining. I. Kamber, Micheline. II. Pei, Jian. III. Title.
QA76.9.D343H36 2011
006.3'12–dc22
2011010635
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
For information on all Morgan Kaufmann publications, visit our
Web site at www.mkp.com or www.elsevierdirect.com
Printed in the United States of America
11 12 13 14 15
10 9 8 7 6 5 4 3 2 1
Contents
Foreword
Foreword to Second Edition
Preface
Acknowledgments
About the Authors
Chapter 1 Introduction
1.1 Why Data Mining?
1.1.1 Moving toward the Information Age
1.1.2 Data Mining as the Evolution of Information Technology
1.2 What Is Data Mining?
1.3 What Kinds of Data Can Be Mined?
1.3.1 Database Data
1.3.2 Data Warehouses
1.3.3 Transactional Data
1.3.4 Other Kinds of Data
1.4 What Kinds of Patterns Can Be Mined?
1.4.1 Class/Concept Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Regression for Predictive Analysis
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Are All Patterns Interesting?
1.5 Which Technologies Are Used?
1.5.1 Statistics
1.5.2 Machine Learning
1.5.3 Database Systems and Data Warehouses
1.5.4 Information Retrieval