Top 10 algorithms in data mining
item in the order of increasing frequency and extracting frequent itemsets that contain the
chosen item by recursively calling itself on the conditional FP-tree. FP-growth is an order of
magnitude faster than the original Apriori algorithm.
There are several other dimensions along which frequent pattern mining has been extended. The major ones include the following. (1) Incorporating a taxonomy over items [72]: using a taxonomy makes it possible to extract frequent itemsets expressed in terms of higher-level concepts even when the base-level concepts produce only infrequent itemsets. (2) Incremental mining: in this setting the database is not assumed to be stationary, and new transactions keep arriving; the algorithm in [12] updates the frequent itemsets without restarting from scratch. (3) Using numeric variables as items: when an item corresponds to a continuous numeric value, current frequent itemset mining algorithms are not applicable unless the values are discretized. A subspace clustering method can be used to obtain an optimal value interval for each item in each itemset [85]. (4) Using measures other than frequency, such as information gain or the χ² value: these measures are useful for finding discriminative patterns but unfortunately do not satisfy the anti-monotonicity property. However, they have the nice property of being convex with respect to their arguments, so their upper bound over all supersets of a pattern can be estimated and unpromising patterns pruned efficiently. AprioriSMP uses this principle [59]. (5) Using richer expressions than itemsets: many algorithms have been proposed for sequences, trees, and graphs to enable mining from more complex data structures [90,42]. (6) Closed itemsets: a frequent itemset is closed if no proper superset has the same support. Thus, once the closed itemsets and their supports are found, all the frequent itemsets can be derived from them. LCM is the most efficient algorithm for finding the closed itemsets [82].
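To make the closed-itemset idea concrete, here is a minimal brute-force sketch in Python (a didactic illustration, not the LCM algorithm; the transactions and support threshold are invented for the example). An itemset is retained as closed only when no proper superset has the same support.

```python
from itertools import combinations

def closed_frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of frequent itemsets, then filtering to
    the closed ones (no proper superset with equal support).
    Exponential in the number of items; for illustration only."""
    items = sorted({i for t in transactions for i in t})
    support = {}
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= min_support:
                support[cand] = s
    closed = {
        iset: s
        for iset, s in support.items()
        if not any(set(iset) < set(sup) and support[sup] == s
                   for sup in support)
    }
    return support, closed

# Hypothetical toy database: 'b' never occurs without 'a',
# so {'b'} is frequent but not closed ({a, b} has the same support).
transactions = [{"a", "b"}, {"a", "b"}, {"a"}]
frequent, closed = closed_frequent_itemsets(transactions, min_support=2)
```

Here `frequent` contains three itemsets but `closed` only two; the support of the non-closed itemset `('b',)` is recoverable from its closed superset `('a', 'b')`.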
5 The EM algorithm
Finite mixture distributions provide a flexible, mathematically grounded approach to the modeling and clustering of data observed on random phenomena. We focus here on normal mixture models, which can be used to cluster continuous data and to estimate the underlying density function. These mixture models can be fitted by maximum likelihood via the EM (Expectation–Maximization) algorithm.
5.1 Introduction
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster data sets [57]. Here we consider their application in the context of cluster analysis.
We let the $p$-dimensional vector $y = (y_1, \ldots, y_p)^T$ contain the values of $p$ variables measured on each of $n$ (independent) entities to be clustered, and we let $y_j$ denote the value of $y$ corresponding to the $j$th entity ($j = 1, \ldots, n$). With the mixture approach to clustering, $y_1, \ldots, y_n$ are assumed to be an observed random sample from a mixture of a finite number, say $g$, of groups in some unknown proportions $\pi_1, \ldots, \pi_g$.
The mixture density of $y_j$ is expressed as
$$f(y_j; \Psi) = \sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i) \qquad (j = 1, \ldots, n), \tag{3}$$
where the mixing proportions $\pi_1, \ldots, \pi_g$ sum to one and the group-conditional density $f_i(y_j; \theta_i)$ is specified up to a vector $\theta_i$ of unknown parameters ($i = 1, \ldots, g$). The vector of all the unknown parameters is given by
$$\Psi = (\pi_1, \ldots, \pi_{g-1}, \theta_1^T, \ldots, \theta_g^T)^T,$$
where the superscript “T” denotes vector transpose. Using an estimate of $\Psi$, this approach gives a probabilistic clustering of the data into $g$ clusters in terms of estimates of the posterior probabilities of component membership,
$$\tau_i(y_j; \Psi) = \frac{\pi_i f_i(y_j; \theta_i)}{f(y_j; \Psi)}, \tag{4}$$
where $\tau_i(y_j; \Psi)$ is the posterior probability that $y_j$ (really the entity with observation $y_j$) belongs to the $i$th component of the mixture ($i = 1, \ldots, g$; $j = 1, \ldots, n$).
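For concreteness, the mixture density (3) and the posterior probabilities (4) can be evaluated directly. The sketch below does this for a univariate ($p = 1$) normal mixture; the two-component parameter values are hypothetical, chosen only to illustrate the computation.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Univariate normal density (the p = 1 component density)."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_density(y, pi, mu, sigma2):
    """f(y; Psi) = sum_i pi_i * f_i(y; theta_i), as in Eq. (3)."""
    return sum(p * normal_pdf(y, m, s2) for p, m, s2 in zip(pi, mu, sigma2))

def posteriors(y, pi, mu, sigma2):
    """tau_i(y; Psi) = pi_i f_i(y; theta_i) / f(y; Psi), as in Eq. (4)."""
    f = mixture_density(y, pi, mu, sigma2)
    return [p * normal_pdf(y, m, s2) / f for p, m, s2 in zip(pi, mu, sigma2)]

# Hypothetical two-component model: proportions 0.4/0.6, means 0 and 3, unit variances.
pi, mu, sigma2 = [0.4, 0.6], [0.0, 3.0], [1.0, 1.0]
tau = posteriors(1.0, pi, mu, sigma2)  # responsibilities for an observation y = 1.0
```

The posteriors sum to one by construction, and an observation at $y = 1.0$ is assigned mostly to the component centered at 0, as expected.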
The parameter vector $\Psi$ can be estimated by maximum likelihood. The maximum likelihood estimate (MLE) of $\Psi$, $\hat{\Psi}$, is given by an appropriate root of the likelihood equation,
$$\partial \log L(\Psi)/\partial \Psi = 0, \tag{5}$$
where
$$\log L(\Psi) = \sum_{j=1}^{n} \log f(y_j; \Psi) \tag{6}$$
is the log likelihood function for $\Psi$. Solutions of (5) corresponding to local maximizers can be obtained via the expectation–maximization (EM) algorithm [17].
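A compact EM loop for a univariate two-component normal mixture shows how iterating the E- and M-steps climbs the log likelihood (6). This is a didactic sketch under invented data and starting values: a real implementation would add a convergence test and safeguards against degenerate (near-zero-variance) components.

```python
import math

def log_likelihood(y, pi, mu, sigma2):
    """log L(Psi) = sum_j log f(y_j; Psi), Eq. (6), with normal components."""
    total = 0.0
    for yj in y:
        f = sum(pi[i] * math.exp(-(yj - mu[i]) ** 2 / (2 * sigma2[i]))
                / math.sqrt(2 * math.pi * sigma2[i]) for i in range(len(pi)))
        total += math.log(f)
    return total

def em_step(y, pi, mu, sigma2):
    """One EM iteration: E-step computes the posterior probabilities tau_ij;
    M-step re-estimates pi_i, mu_i, sigma2_i as tau-weighted averages."""
    n, g = len(y), len(pi)
    tau = []
    for yj in y:
        w = [pi[i] * math.exp(-(yj - mu[i]) ** 2 / (2 * sigma2[i]))
             / math.sqrt(2 * math.pi * sigma2[i]) for i in range(g)]
        s = sum(w)
        tau.append([wi / s for wi in w])
    pi_new, mu_new, s2_new = [], [], []
    for i in range(g):
        ti = sum(tau[j][i] for j in range(n))
        pi_new.append(ti / n)
        mu_new.append(sum(tau[j][i] * y[j] for j in range(n)) / ti)
        s2_new.append(sum(tau[j][i] * (y[j] - mu_new[i]) ** 2 for j in range(n)) / ti)
    return pi_new, mu_new, s2_new

# Two well-separated hypothetical clusters around -1 and 4.
y = [-1.2, -0.8, -1.0, 3.9, 4.1, 4.0]
pi, mu, sigma2 = [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]
for _ in range(30):
    pi, mu, sigma2 = em_step(y, pi, mu, sigma2)
```

With this separation the component means converge to roughly $-1$ and $4$, and the mixing proportions to $0.5$ each; each iteration is guaranteed not to decrease (6), which is the key property of EM discussed below.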
For the modeling of continuous data, the component-conditional densities are usually taken to belong to the same parametric family, for example, the normal. In this case,
$$f_i(y_j; \theta_i) = \phi(y_j; \mu_i, \Sigma_i), \tag{7}$$
where $\phi(y_j; \mu, \Sigma)$ denotes the $p$-dimensional multivariate normal density with mean vector $\mu$ and covariance matrix $\Sigma$.
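As a sketch of the component density $\phi$ in (7), the following writes it out for $p = 2$ with the $2 \times 2$ determinant and inverse computed by hand; a linear-algebra library would handle general $p$, and the example inputs are arbitrary.

```python
import math

def mvn_density_2d(y, mu, cov):
    """phi(y; mu, Sigma) for p = 2: exp(-(y-mu)^T Sigma^{-1} (y-mu)/2)
    divided by (2*pi)^{p/2} * sqrt(det Sigma), with the 2x2 inverse
    and determinant written out explicitly for illustration."""
    a, b = cov[0]
    c, d = cov[1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    u = [y[0] - mu[0], y[1] - mu[1]]
    quad = (u[0] * (inv[0][0] * u[0] + inv[0][1] * u[1])
            + u[1] * (inv[1][0] * u[0] + inv[1][1] * u[1]))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# At the mean, the density reduces to 1 / (2*pi*sqrt(det Sigma)).
val = mvn_density_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```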
One attractive feature of adopting mixture models with elliptically symmetric components, such as the normal or $t$ densities, is that the implied clustering is invariant under affine transformations of the data (that is, under operations relating to changes in location, scale, and rotation of the data). Thus the clustering process does not depend on irrelevant factors such as the units of measurement or the orientation of the clusters in space.
5.2 Maximum likelihood estimation of normal mixtures
McLachlan and Peel [57, Chap. 3] described the E- and M-steps of the EM algorithm for the maximum likelihood (ML) estimation of multivariate normal components; see also [56]. In the EM framework for this problem, the unobservable component labels $z_{ij}$ are treated as the “missing” data, where $z_{ij}$ is defined to be one or zero according as $y_j$ belongs or does not belong to the $i$th component of the mixture ($i = 1, \ldots, g$; $j = 1, \ldots, n$).
On the $(k+1)$th iteration of the EM algorithm, the E-step requires taking the expectation of the complete-data log likelihood $\log L_c(\Psi)$, given the current estimate $\Psi^{(k)}$ for $\Psi$. As $\log L_c(\Psi)$ is linear in the unobservable $z_{ij}$, this E-step is effected by replacing the $z_{ij}$ by their conditional expectation given the observed data $y_j$, using $\Psi^{(k)}$. That is, $z_{ij}$ is replaced by $\tau_{ij}^{(k)}$, which is the posterior probability that $y_j$ belongs to the $i$th component of the mixture, using the