Data Mining: Concepts and Techniques, 3rd Edition

HAN 05-pref-xxiii-xxx-9780123814791

Preface

Chapter 3 introduces techniques for data preprocessing. It first introduces the concept of data quality and then discusses methods for data cleaning, data integration, data reduction, data transformation, and data discretization.

Chapters 4 and 5 provide a solid introduction to data warehouses, OLAP (online analytical processing), and data cube technology. Chapter 4 introduces the basic concepts, modeling, design architectures, and general implementations of data warehouses and OLAP, as well as the relationship between data warehousing and other data generalization methods. Chapter 5 takes an in-depth look at data cube technology, presenting a detailed study of methods of data cube computation, including Star-Cubing and high-dimensional OLAP methods. Further explorations of data cube and OLAP technologies are discussed, such as sampling cubes, ranking cubes, prediction cubes, multifeature cubes for complex analysis queries, and discovery-driven cube exploration.

Chapters 6 and 7 present methods for mining frequent patterns, associations, and correlations in large data sets. Chapter 6 introduces fundamental concepts, such as market basket analysis, with many techniques for frequent itemset mining presented in an organized way. These range from the basic Apriori algorithm and its variations to more advanced methods that improve efficiency, including the frequent pattern growth approach, frequent pattern mining with vertical data format, and mining closed and max frequent itemsets. The chapter also discusses pattern evaluation methods and introduces measures for mining correlated patterns. Chapter 7 is on advanced pattern mining methods. It discusses methods for pattern mining in multilevel and multidimensional space, mining rare and negative patterns, mining colossal patterns and high-dimensional data, constraint-based pattern mining, and mining compressed or approximate patterns. It also introduces methods for pattern exploration and application, including semantic annotation of frequent patterns.

Chapters 8 and 9 describe methods for data classification. Due to the importance and diversity of classification methods, the contents are partitioned into two chapters. Chapter 8 introduces basic concepts and methods for classification, including decision tree induction, Bayes classification, and rule-based classification. It also discusses model evaluation and selection methods and methods for improving classification accuracy, including ensemble methods and how to handle imbalanced data. Chapter 9 discusses advanced methods for classification, including Bayesian belief networks, the neural network technique of backpropagation, support vector machines, classification using frequent patterns, k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms, rough set theory, and fuzzy set approaches. Additional topics include multiclass classification, semi-supervised classification, active learning, and transfer learning.

Cluster analysis forms the topic of Chapters 10 and 11. Chapter 10 introduces the basic concepts and methods for data clustering, including an overview of basic cluster analysis methods, partitioning methods, hierarchical methods, density-based methods, and grid-based methods. It also introduces methods for the evaluation of clustering. Chapter 11 discusses advanced methods for clustering, including probabilistic model-based clustering, clustering high-dimensional data, clustering graph and network data, and clustering with constraints.

Chapter 12 is dedicated to outlier detection. It introduces the basic concepts of outliers and outlier analysis and discusses various outlier detection methods from the view of degree of supervision (i.e., supervised, semi-supervised, and unsupervised methods), as well as from the view of approaches (i.e., statistical methods, proximity-based methods, clustering-based methods, and classification-based methods). It also discusses methods for mining contextual and collective outliers, and for outlier detection in high-dimensional data.

Finally, in Chapter 13, we discuss trends, applications, and research frontiers in data mining. We briefly cover mining complex data types, including mining sequence data (e.g., time series, symbolic sequences, and biological sequences), mining graphs and networks, and mining spatial, multimedia, text, and Web data. In-depth treatment of data mining methods for such data is left to a book on advanced topics in data mining, the writing of which is in progress. The chapter then moves ahead to cover other data mining methodologies, including statistical data mining, foundations of data mining, visual and audio data mining, as well as data mining applications. It discusses data mining for financial data analysis, for industries like retail and telecommunication, for use in science and engineering, and for intrusion detection and prevention. It also discusses the relationship between data mining and recommender systems. Because data mining is present in many aspects of daily life, we discuss issues regarding data mining and society, including ubiquitous and invisible data mining, as well as privacy, security, and the social impacts of data mining. We conclude our study by looking at data mining trends.

Throughout the text, italic font is used to emphasize terms that are defined, while bold font is used to highlight or summarize main ideas. Sans serif font is used for reserved words. Bold italic font is used to represent multidimensional quantities.

This book has several strong features that set it apart from other texts on data mining. It presents broad yet in-depth coverage of the principles of data mining. The chapters are written to be as self-contained as possible, so they may be read in order of interest by the reader. Advanced chapters offer a larger-scale view and may be considered optional for interested readers. All of the major methods of data mining are presented. The book presents important topics in data mining regarding multidimensional OLAP analysis, which is often overlooked or minimally treated in other data mining books. The book also maintains web sites with a number of online resources to aid instructors, students, and professionals in the field. These are described further in the following.

To the Instructor

This book is designed to give a broad, yet detailed overview of the data mining field. It can be used to teach an introductory course on data mining at an advanced undergraduate level or at the first-year graduate level. Sample course syllabi are provided on the book’s web sites (www.cs.uiuc.edu/~hanj/bk3 and www.booksite.mkp.com/datamining3e) in addition to extensive teaching resources such as lecture slides, instructors’ manuals, and reading lists (see p. xxix).
