I N D E X
implementation—real-world schemes (continued)
  numeric prediction, 243–254
  See also individual subject headings
inaccurate values, 59–60. See also cost of errors; data cleaning; error rate
incremental algorithms, 346
incrementalClassifier, 434
IncrementalClassifierEvaluator, 431
incremental clustering, 255–260
incremental learning in Weka, 433–435
incremental reduced-error pruning, 203, 205
independent attributes, 267
index(), 472
induction, 29
inductive logic programming, 48, 60, 75, 351
Induct system, 214
industrial usage. See implementation—real-world schemes
inferring rudimentary rules, 84–88
InfoGainAttributeEval, 422–423
informational loss function, 159–160, 161
information-based heuristic, 201
information extraction, 354
information gain, 99
information retrieval, 171
information value, 102
infrequent words, 353
inner cross-validation, 286
input, 41–60
  ARFF format, 53–55
  assembling the data, 52–53
  attribute, 49–52
  attribute types, 56–57
  concept, 42–45
  data engineering, 286–287, 288–315. See also engineering input and output
  data preparation, 52–60
  getting to know your data, 60
  inaccurate values, 59–60
  instances, 45
  missing values, 58
  sparse data, 55–56
input layer, 224
instance in Weka, 450
Instance, 451
instance-based learning, 78, 128–136, 235–243
  ball tree, 133–135
  distance functions, 128–129, 239–242
  finding nearest neighbors, 129–135
  generalized distance functions, 241–242
  generalized exemplars, 236
  kD-trees, 130–132
  missing values, 129
  pruning noisy exemplars, 236–237
  redundant exemplars, 236
  simple method, 128–136, 235–236
  weighting attributes, 237–238
  Weka, 413–414
instance-based learning methods, 291
instance-based methods, 34
instance-based representation, 76–80
instance filters in Weka, 394, 400–401, 403
instances, 45
Instances, 451
instance space, 79
instance weights, 166, 321–322
integer-valued attributes, 49
intensive care patients, 29
interval, 88
interval quantities, 50–51
intrusion detection systems, 357
invertSelection, 382
in vitro fertilization, 3
iris dataset, 15–16
iris setosa, 15
iris versicolor, 15
iris virginica, 15
ISO-8601 combined date and time format, 55
item, 113
item sets, 113, 114–115
iterative distance-based clustering, 137–138
J
J4.8, 373–377
J48, 404, 450
Javadoc indices, 456
JDBC database, 445
JRip, 409
junk email filtering, 356–357
P088407-INDEX.qxd 4/30/05 11:25 AM Page 514
K
K2, 278
Kappa statistic, 163–164
kD-trees, 130–132, 136
Kepler’s three laws of planetary motion, 180
kernel
  defined, 235
  perceptron, 223
  polynomial, 218
  RBF, 219
  sigmoid, 219
kernel density estimation, 97
kernel logistic regression, 223
kernel perceptron, 222–223
k-means, 137–138
k-nearest-neighbor method, 78
Knowledge Flow interface, 427–435
  configuring/connecting components, 431–433
  evaluation components, 430, 431
  incremental learning, 433–435
  starting up, 427
  visualization components, 430–431
knowledge representation, 61–82
  association rules, 69–70. See also association rules
  classification rules, 65–69. See also classification rules
  clusters, 81–82. See also clustering
  decision table, 62
  decision tree, 62–65. See also decision tree
  instance-based representation, 76–80
  rules involving relations, 73–75
  rules with exceptions, 70–73, 210–213
  trees for numeric prediction, 76
KStar, 413
L
labor negotiations data, 17–18, 19
language bias, 32–33
language identification, 353
Laplace, Pierre, 91
Laplace estimator, 91, 267, 269
large datasets, 346–349
law of diminishing returns, 347
lazy classifiers in Weka, 405, 413–414
LBR, 414
learning, 7–9
learning algorithms in Weka, 403–404
  algorithm, listed, 404–405
  Bayesian classifier, 403–406
  functions, 404–405, 409–410
  lazy classifiers, 405, 413–414
  miscellaneous classifiers, 405, 414
  neural network, 411–413
  rules, 404, 408–409
  trees, 404, 406–408
learning rate, 229, 230
least-absolute-error regression, 220
LeastMedSq, 409–410
leave-one-out cross-validation, 151–152
levels of measurement, 50
level-0 model, 332
level-1 model, 332
Leverage, 420
Lift, 420
lift chart, 166–168, 172
lift factor, 166
linear classification, 121–128
linearly separable, 124
linear machine, 142
linear models, 119–128, 214–235
  backpropagation, 227–233
  computational complexity, 218
  kernel perceptron, 222–223
  linear classification, 121–128
  linear regression, 119–121
  logistic regression, 121–125
  maximum margin hyperplane, 215–217
  multilayer perceptrons, 223–226, 231, 233
  nonlinear class boundaries, 217–219
  numeric prediction, 119–120
  overfitting, 217–218
  perceptron, 124–126
  RBF network, 234
  support vector regression, 219–222
  Winnow, 126–128
linear regression, 77, 119–121
LinearRegression, 387, 409
linear threshold unit, 142