A Global Text Project Book
This book is available on Amazon.com.
© 2012 Dr. Matthew A. North
This book is licensed under a Creative Commons Attribution 3.0 License
All rights reserved.
ISBN: 0615684378
ISBN-13: 978-0615684376
DEDICATION
This book is gratefully dedicated to Dr. Charles Hannon, who gave me the chance to become a college professor and then challenged me to learn how to teach data mining to the masses.
Data Mining for the Masses
Table of Contents
Dedication ....................................................................................................................................................... iii
Table of Contents ............................................................................................................................................ v
Acknowledgements ........................................................................................................................................ xi
SECTION ONE: Data Mining Basics ......................................................................................................... 1
Chapter One: Introduction to Data Mining and CRISP-DM .................................................................. 3
Introduction ................................................................................................................................................. 3
A Note About Tools .................................................................................................................................. 4
The Data Mining Process .......................................................................................................................... 5
Data Mining and You ...............................................................................................................................11
Chapter Two: Organizational Understanding and Data Understanding ..............................................13
Context and Perspective ..........................................................................................................................13
Learning Objectives ..................................................................................................................................14
Purposes, Intents and Limitations of Data Mining ..............................................................................15
Database, Data Warehouse, Data Mart, Data Set…? ..........................................................................15
Types of Data ............................................................................................................................................19
A Note about Privacy and Security ........................................................................................................20
Chapter Summary......................................................................................................................................21
Review Questions......................................................................................................................................22
Exercises .....................................................................................................................................................22
Chapter Three: Data Preparation ................................................................................................................25
Context and Perspective ..........................................................................................................................25
Learning Objectives ..................................................................................................................................25
Collation .....................................................................................................................................................27
Data Scrubbing ......................................................................................................................................... 28
Hands on Exercise .................................................................................................................................... 29
Preparing RapidMiner,
Importing Data, and ........................................................................................ 30
Handling Missing Data ............................................................................................................................ 30
Data Reduction ......................................................................................................................................... 46
Handling Inconsistent Data .................................................................................................................... 50
Attribute Reduction .................................................................................................................................. 52
Chapter Summary ..................................................................................................................................... 54
Review Questions ..................................................................................................................................... 55
Exercise ...................................................................................................................................................... 55
SECTION TWO: Data Mining Models and Methods ........................................................................... 57
Chapter Four: Correlation ........................................................................................................................... 59
Context and Perspective .......................................................................................................................... 59
Learning Objectives.................................................................................................................................. 59
Organizational Understanding ................................................................................................................ 59
Data Understanding ................................................................................................................................. 60
Data Preparation ....................................................................................................................................... 60
Modeling .................................................................................................................................................... 62
Evaluation .................................................................................................................................................. 63
Deployment ............................................................................................................................................... 65
Chapter Summary ..................................................................................................................................... 67
Review Questions ..................................................................................................................................... 68
Exercise ...................................................................................................................................................... 68
Chapter Five: Association Rules ................................................................................................................. 73
Context and Perspective .......................................................................................................................... 73
Learning Objectives.................................................................................................................................. 73
Organizational Understanding ................................................................................................................ 73