Data Mining. Concepts and Techniques, 3rd Edition

HAN 08-ch01-001-038-9780123814791


HAN 08-ch01-001-038-9780123814791 2011/6/1 3:12 Page 29 #29

1.7 Major Issues in Data Mining

Life is short but art is long. – Hippocrates

Data mining is a dynamic and fast-expanding field with great strengths. In this section, we briefly outline the major issues in data mining research, partitioning them into five groups: mining methodology, user interaction, efficiency and scalability, diversity of data types, and data mining and society. Many of these issues have been addressed to a certain extent in recent data mining research and development and are now considered data mining requirements; others are still at the research stage. The issues continue to stimulate further investigation and improvement in data mining.

1.7.1 Mining Methodology

Researchers have been vigorously developing new data mining methodologies. This involves the investigation of new kinds of knowledge, mining in multidimensional space, integrating methods from other disciplines, and the consideration of semantic ties among data objects. In addition, mining methodologies should consider issues such as data uncertainty, noise, and incompleteness. Some mining methods explore how user-specified measures can be used to assess the interestingness of discovered patterns as well as guide the discovery process. Let’s have a look at these various aspects of mining methodology.

Mining various and new kinds of knowledge: Data mining covers a wide spectrum of data analysis and knowledge discovery tasks, from data characterization and discrimination to association and correlation analysis, classification, regression, clustering, outlier analysis, sequence analysis, and trend and evolution analysis. These tasks may use the same database in different ways and require the development of numerous data mining techniques. Due to the diversity of applications, new mining tasks continue to emerge, making data mining a dynamic and fast-growing field. For example, for effective knowledge discovery in information networks, integrated clustering and ranking may lead to the discovery of high-quality clusters and object ranks in large networks.

Mining knowledge in multidimensional space: When searching for knowledge in large data sets, we can explore the data in multidimensional space. That is, we can search for interesting patterns among combinations of dimensions (attributes) at varying levels of abstraction. Such mining is known as (exploratory) multidimensional data mining. In many cases, data can be aggregated or viewed as a multidimensional data cube. Mining knowledge in cube space can substantially enhance the power and flexibility of data mining.
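The idea of aggregating data over combinations of dimensions can be illustrated with a minimal Python sketch. It is not taken from the text: the sales records, the dimension names, and the helper build_cube are all hypothetical, and the sketch covers only combinations of dimensions, not the concept hierarchies that give varying levels of abstraction.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales records: three dimensions plus one measure (units).
SALES = [
    {"city": "Vancouver", "item": "TV", "quarter": "Q1", "units": 5},
    {"city": "Vancouver", "item": "phone", "quarter": "Q1", "units": 8},
    {"city": "Toronto", "item": "TV", "quarter": "Q1", "units": 3},
    {"city": "Toronto", "item": "TV", "quarter": "Q2", "units": 4},
]
DIMENSIONS = ("city", "item", "quarter")


def build_cube(records, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (one cuboid per subset)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            cells = defaultdict(int)
            for rec in records:
                # A cell is identified by the record's values on the chosen dimensions.
                key = tuple(rec[d] for d in dims)
                cells[key] += rec[measure]
            cube[dims] = dict(cells)
    return cube


cube = build_cube(SALES, DIMENSIONS, "units")
print(cube[()])                # grand total over all records
print(cube[("item",)])         # totals per item
print(cube[("city", "item")])  # totals per (city, item) pair
```

Materializing every cuboid, as done here, grows exponentially with the number of dimensions, so this only makes sense for a toy example; it is meant to show what "cube space" contains, not how to compute it efficiently.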

Data mining as an interdisciplinary effort: The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines. For example, to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing. As another example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process.

Boosting the power of discovery in a networked environment: Most data objects reside in a linked or interconnected environment, whether it be the Web, database relations, files, or documents. Semantic links across multiple data objects can be used to advantage in data mining. Knowledge derived in one set of objects can be used to boost the discovery of knowledge in a “related” or semantically linked set of objects.

Handling uncertainty, noise, or incompleteness of data: Data often contain noise, errors, exceptions, or uncertainty, or are incomplete. Errors and noise may confuse the data mining process, leading to the derivation of erroneous patterns. Data cleaning, data preprocessing, outlier detection and removal, and uncertainty reasoning are examples of techniques that need to be integrated with the data mining process.
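As a rough illustration of a cleaning step that could precede mining, the Python sketch below drops missing values and then removes outliers using a median absolute deviation (MAD) rule. The MAD rule, the threshold of 3.5, the sample readings, and the function name clean are assumptions chosen for the example; the text does not prescribe a particular method.

```python
from statistics import median

# Hypothetical sensor readings: None marks a missing value,
# and 250.0 is a suspected recording error.
readings = [21.0, 20.5, None, 22.1, 250.0, 21.7, None, 20.9]


def clean(values, threshold=3.5):
    """Drop missing values, then drop points far from the median.

    A point is treated as an outlier when its modified z-score,
    0.6745 * |x - median| / MAD, exceeds `threshold`.
    """
    present = [v for v in values if v is not None]
    if not present:
        return []
    center = median(present)
    mad = median([abs(v - center) for v in present])
    if mad == 0:
        # No spread to judge outliers by; keep everything that is present.
        return present
    return [v for v in present if 0.6745 * abs(v - center) / mad <= threshold]


print(clean(readings))
```

Whether a removed point is noise or a genuinely interesting exception depends on the application, so in practice a step like this would be reviewed rather than applied blindly.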

Pattern evaluation and pattern- or constraint-guided mining: Not all the patterns generated by data mining processes are interesting. What makes a pattern interesting may vary from user to user. Therefore, techniques are needed to assess the interestingness of discovered patterns based on subjective measures. These estimate the value of patterns with respect to a given user class, based on user beliefs or expectations. Moreover, by using interestingness measures or user-specified constraints to guide the discovery process, we may generate more interesting patterns and reduce the search space.
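The way a user-specified constraint can shrink the search space can be sketched with a small level-wise itemset search in Python. This is an illustrative sketch under stated assumptions, not a method given in the text: the transactions, the PRICE table, the budget constraint, and the function names are hypothetical. Both checks used here (minimum support count and maximum total price, with non-negative prices) have the property that once an itemset fails, every superset fails too, so failing itemsets are never extended.

```python
from itertools import combinations

# Hypothetical transactions and item prices.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "laptop"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"laptop", "milk"},
]
PRICE = {"bread": 2, "milk": 1, "butter": 3, "laptop": 900}


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine(transactions, min_count, max_total_price):
    """Level-wise search that keeps only itemsets passing both checks."""

    def passes(itemset):
        within_budget = sum(PRICE[i] for i in itemset) <= max_total_price
        return within_budget and support_count(itemset, transactions) >= min_count

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if passes(frozenset([i]))]
    found = []
    while level:
        found.extend(level)
        # Join step: merge surviving itemsets that differ by exactly one item.
        candidates = {
            a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1
        }
        level = sorted((c for c in candidates if passes(c)), key=sorted)
    return found


patterns = mine(TRANSACTIONS, min_count=2, max_total_price=10)
for p in patterns:
    print(sorted(p), support_count(p, TRANSACTIONS))
```

Because "laptop" exceeds the budget on its own, no itemset containing it is ever generated, which is the search-space reduction the paragraph above refers to. Subjective interestingness measures based on user beliefs are a separate matter and are not modeled here.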

1.7.2 User Interaction

The user plays an important role in the data mining process. Interesting areas of research include how to interact with a data mining system, how to incorporate a user’s background knowledge in mining, and how to visualize and comprehend data mining results. We introduce each of these here.

Interactive mining: The data mining process should be highly interactive. Thus, it is important to build flexible user interfaces and an exploratory mining environment, facilitating the user’s interaction with the system. A user may like to first sample a set of data, explore general characteristics of the data, and estimate potential mining results. Interactive mining should allow users to dynamically change the focus of a search, to refine mining requests based on returned results, and to drill, dice, and pivot through the data and knowledge space interactively, dynamically exploring “cube space” while mining.

Incorporation of background knowledge: Background knowledge, constraints, rules, and other information regarding the domain under study should be incorporated
