List of Figures
Figure 1.1  Rules for the contact lens data.  13
Figure 1.2  Decision tree for the contact lens data.  14
Figure 1.3  Decision trees for the labor negotiations data.  19
Figure 2.1  A family tree and two ways of expressing the sister-of relation.  46
Figure 2.2  ARFF file for the weather data.  54
Figure 3.1  Constructing a decision tree interactively: (a) creating a rectangular test involving petallength and petalwidth and (b) the resulting (unfinished) decision tree.  64
Figure 3.2  Decision tree for a simple disjunction.  66
Figure 3.3  The exclusive-or problem.  67
Figure 3.4  Decision tree with a replicated subtree.  68
Figure 3.5  Rules for the iris data.  72
Figure 3.6  The shapes problem.  73
Figure 3.7  Models for the CPU performance data: (a) linear regression, (b) regression tree, and (c) model tree.  77
Figure 3.8  Different ways of partitioning the instance space.  79
Figure 3.9  Different ways of representing clusters.  81
Figure 4.1  Pseudocode for 1R.  85
Figure 4.2  Tree stumps for the weather data.  98
Figure 4.3  Expanded tree stumps for the weather data.  100
Figure 4.4  Decision tree for the weather data.  101
Figure 4.5  Tree stump for the ID code attribute.  103
Figure 4.6  Covering algorithm: (a) covering the instances and (b) the decision tree for the same problem.  106
Figure 4.7  The instance space during operation of a covering algorithm.  108
Figure 4.8  Pseudocode for a basic rule learner.  111
Figure 4.9  Logistic regression: (a) the logit transform and (b) an example logistic regression function.  122
Figure 4.10  The perceptron: (a) learning rule and (b) representation as a neural network.  125
Figure 4.11  The Winnow algorithm: (a) the unbalanced version and (b) the balanced version.  127
Figure 4.12  A kD-tree for four training instances: (a) the tree and (b) instances and splits.  130
Figure 4.13  Using a kD-tree to find the nearest neighbor of the star.  131
Figure 4.14  Ball tree for 16 training instances: (a) instances and balls and (b) the tree.  134
Figure 4.15  Ruling out an entire ball (gray) based on a target point (star) and its current nearest neighbor.  135
Figure 4.16  A ball tree: (a) two cluster centers and their dividing line and (b) the corresponding tree.  140
Figure 5.1  A hypothetical lift chart.  168
Figure 5.2  A sample ROC curve.  169
Figure 5.3  ROC curves for two learning methods.  170
Figure 5.4  Effects of varying the probability threshold: (a) the error curve and (b) the cost curve.  174
Figure 6.1  Example of subtree raising, where node C is "raised" to subsume node B.  194
Figure 6.2  Pruning the labor negotiations decision tree.  196
Figure 6.3  Algorithm for forming rules by incremental reduced-error pruning.  205
Figure 6.4  RIPPER: (a) algorithm for rule learning and (b) meaning of symbols.  206
Figure 6.5  Algorithm for expanding examples into a partial tree.  208
Figure 6.6  Example of building a partial tree.  209
Figure 6.7  Rules with exceptions for the iris data.  211
Figure 6.8  A maximum margin hyperplane.  216
Figure 6.9  Support vector regression: (a) ε = 1, (b) ε = 2, and (c) ε = 0.5.  221
Figure 6.10  Example datasets and corresponding perceptrons.  225
Figure 6.11  Step versus sigmoid: (a) step function and (b) sigmoid function.  228
Figure 6.12  Gradient descent using the error function x² + 1.  229
Figure 6.13  Multilayer perceptron with a hidden layer.  231
Figure 6.14  A boundary between two rectangular classes.  240
Figure 6.15  Pseudocode for model tree induction.  248
Figure 6.16  Model tree for a dataset with nominal attributes.  250
Figure 6.17  Clustering the weather data.  256
Figure 6.18  Hierarchical clusterings of the iris data.  259
Figure 6.19  A two-class mixture model.  264
Figure 6.20  A simple Bayesian network for the weather data.  273
Figure 6.21  Another Bayesian network for the weather data.  274
Figure 6.22  The weather data: (a) reduced version and (b) corresponding AD tree.  281
Figure 7.1  Attribute space for the weather dataset.  293
Figure 7.2  Discretizing the temperature attribute using the entropy method.  299
Figure 7.3  The result of discretizing the temperature attribute.  300
Figure 7.4  Class distribution for a two-class, two-attribute problem.  303
Figure 7.5  Principal components transform of a dataset: (a) variance of each component and (b) variance plot.  308
Figure 7.6  Number of international phone calls from Belgium, 1950–1973.  314
Figure 7.7  Algorithm for bagging.  319
Figure 7.8  Algorithm for boosting.  322
Figure 7.9  Algorithm for additive logistic regression.  327
Figure 7.10  Simple option tree for the weather data.  329
Figure 7.11  Alternating decision tree for the weather data.  330
Figure 10.1  The Explorer interface.  370
Figure 10.2  Weather data: (a) spreadsheet, (b) CSV format, and (c) ARFF.  371
Figure 10.3  The Weka Explorer: (a) choosing the Explorer interface and (b) reading in the weather data.  372
Figure 10.4  Using J4.8: (a) finding it in the classifiers list and (b) the Classify tab.  374
Figure 10.5  Output from the J4.8 decision tree learner.  375
Figure 10.6  Visualizing the result of J4.8 on the iris dataset: (a) the tree and (b) the classifier errors.  379
Figure 10.7  Generic object editor: (a) the editor, (b) more information (click More), and (c) choosing a converter (click Choose).  381
Figure 10.8  Choosing a filter: (a) the filters menu, (b) an object editor, and (c) more information (click More).  383
Figure 10.9  The weather data with two attributes removed.  384
Figure 10.10  Processing the CPU performance data with M5′.  385
Figure 10.11  Output from the M5′ program for numeric prediction.  386
Figure 10.12  Visualizing the errors: (a) from M5′ and (b) from linear regression.  388