List of Figures
Figure 1.1  Rules for the contact lens data.  13
Figure 1.2  Decision tree for the contact lens data.  14
Figure 1.3  Decision trees for the labor negotiations data.  19
Figure 2.1  A family tree and two ways of expressing the sister-of relation.  46
Figure 2.2  ARFF file for the weather data.  54
Figure 3.1  Constructing a decision tree interactively: (a) creating a rectangular test involving petallength and petalwidth and (b) the resulting (unfinished) decision tree.  64
Figure 3.2  Decision tree for a simple disjunction.  66
Figure 3.3  The exclusive-or problem.  67
Figure 3.4  Decision tree with a replicated subtree.  68
Figure 3.5  Rules for the iris data.  72
Figure 3.6  The shapes problem.  73
Figure 3.7  Models for the CPU performance data: (a) linear regression, (b) regression tree, and (c) model tree.  77
Figure 3.8  Different ways of partitioning the instance space.  79
Figure 3.9  Different ways of representing clusters.  81
Figure 4.1  Pseudocode for 1R.  85
Figure 4.2  Tree stumps for the weather data.  98
Figure 4.3  Expanded tree stumps for the weather data.  100
Figure 4.4  Decision tree for the weather data.  101
Figure 4.5  Tree stump for the ID code attribute.  103
Figure 4.6  Covering algorithm: (a) covering the instances and (b) the decision tree for the same problem.  106
Figure 4.7  The instance space during operation of a covering algorithm.  108
Figure 4.8  Pseudocode for a basic rule learner.  111
Figure 4.9  Logistic regression: (a) the logit transform and (b) an example logistic regression function.  122
Figure 4.10  The perceptron: (a) learning rule and (b) representation as a neural network.  125
Figure 4.11  The Winnow algorithm: (a) the unbalanced version and (b) the balanced version.  127
Figure 4.12  A kD-tree for four training instances: (a) the tree and (b) instances and splits.  130
Figure 4.13  Using a kD-tree to find the nearest neighbor of the star.  131
Figure 4.14  Ball tree for 16 training instances: (a) instances and balls and (b) the tree.  134
Figure 4.15  Ruling out an entire ball (gray) based on a target point (star) and its current nearest neighbor.  135
Figure 4.16  A ball tree: (a) two cluster centers and their dividing line and (b) the corresponding tree.  140
Figure 5.1  A hypothetical lift chart.  168
Figure 5.2  A sample ROC curve.  169
Figure 5.3  ROC curves for two learning methods.  170
Figure 5.4  Effects of varying the probability threshold: (a) the error curve and (b) the cost curve.  174
Figure 6.1  Example of subtree raising, where node C is "raised" to subsume node B.  194
Figure 6.2  Pruning the labor negotiations decision tree.  196
Figure 6.3  Algorithm for forming rules by incremental reduced-error pruning.  205
Figure 6.4  RIPPER: (a) algorithm for rule learning and (b) meaning of symbols.  206
Figure 6.5  Algorithm for expanding examples into a partial tree.  208
Figure 6.6  Example of building a partial tree.  209
Figure 6.7  Rules with exceptions for the iris data.  211
Figure 6.8  A maximum margin hyperplane.  216
Figure 6.9  Support vector regression: (a) ε = 1, (b) ε = 2, and (c) ε = 0.5.  221
Figure 6.10  Example datasets and corresponding perceptrons.  225
Figure 6.11  Step versus sigmoid: (a) step function and (b) sigmoid function.  228
Figure 6.12  Gradient descent using the error function x² + 1.  229
Figure 6.13  Multilayer perceptron with a hidden layer.  231
Figure 6.14  A boundary between two rectangular classes.  240
Figure 6.15  Pseudocode for model tree induction.  248
Figure 6.16  Model tree for a dataset with nominal attributes.  250
Figure 6.17  Clustering the weather data.  256
Figure 6.18  Hierarchical clusterings of the iris data.  259
Figure 6.19  A two-class mixture model.  264
Figure 6.20  A simple Bayesian network for the weather data.  273
Figure 6.21  Another Bayesian network for the weather data.  274
Figure 6.22  The weather data: (a) reduced version and (b) corresponding AD tree.  281
Figure 7.1  Attribute space for the weather dataset.  293
Figure 7.2  Discretizing the temperature attribute using the entropy method.  299
Figure 7.3  The result of discretizing the temperature attribute.  300
Figure 7.4  Class distribution for a two-class, two-attribute problem.  303
Figure 7.5  Principal components transform of a dataset: (a) variance of each component and (b) variance plot.  308
Figure 7.6  Number of international phone calls from Belgium, 1950–1973.  314
Figure 7.7  Algorithm for bagging.  319
Figure 7.8  Algorithm for boosting.  322
Figure 7.9  Algorithm for additive logistic regression.  327
Figure 7.10  Simple option tree for the weather data.  329
Figure 7.11  Alternating decision tree for the weather data.  330
Figure 10.1  The Explorer interface.  370
Figure 10.2  Weather data: (a) spreadsheet, (b) CSV format, and (c) ARFF.  371
Figure 10.3  The Weka Explorer: (a) choosing the Explorer interface and (b) reading in the weather data.  372
Figure 10.4  Using J4.8: (a) finding it in the classifiers list and (b) the Classify tab.  374
Figure 10.5  Output from the J4.8 decision tree learner.  375
Figure 10.6  Visualizing the result of J4.8 on the iris dataset: (a) the tree and (b) the classifier errors.  379
Figure 10.7  Generic object editor: (a) the editor, (b) more information (click More), and (c) choosing a converter (click Choose).  381
Figure 10.8  Choosing a filter: (a) the filters menu, (b) an object editor, and (c) more information (click More).  383
Figure 10.9  The weather data with two attributes removed.  384
Figure 10.10  Processing the CPU performance data with M5′.  385
Figure 10.11  Output from the M5′ program for numeric prediction.  386
Figure 10.12  Visualizing the errors: (a) from M5′ and (b) from linear regression.  388