5 0 6
I N D E X
anomaly detection systems, 357
antecedent, of rule, 65
AODE, 405
Apriori, 419
Apriori method, 141
area under the curve (AUC), 173
ARFF format, 53–55
  converting files to, 380–382
  Weka, 370, 371
ARFFLoader, 381, 427
arithmetic underflow, 276
assembling the data, 52–53
assessing performance of learning scheme, 286
assignment of key phrases, 353
Associate panel, 392
association learning, 43
association rules, 69–70, 112–119
  binary attributes, 119
  generating rules efficiently, 117–118
  item sets, 113, 114–115
  Weka, 419–420
association-rule learners in Weka, 419–420
attackers, 357
Attribute, 451
attribute(), 480
attribute discretization. See discretizing numeric attributes
attribute-efficient, 128
attribute evaluation methods in Weka, 421, 422–423
attribute filters in Weka, 394, 395–400, 402–403
attributeIndices, 382
attribute noise, 313
attribute-relation file format. See ARFF format
attributes, 49–52
  adding irrelevant, 288
  Boolean, 51
  class, 53
  as columns in tables, 49
  combinations of, 65
  continuous, 49
  discrete, 51
  enumerated, 51
  highly branching, 86
  identification code, 86
  independent, 267
  integer-valued, 49
  nominal, 49
  non-numeric, 17
  numeric, 49
  ordinal, 51
  relevant, 289
  rogue, 59
  selecting, 288
  subsets of values in, 80
  types in ARFF format, 56
  weighting, 237
  See also orderings
AttributeSelectedClassifier, 417
attribute selection, 286–287, 288–296
  attribute evaluation methods in Weka, 421, 422–423
  backward elimination, 292, 294
  beam search, 293
  best-first search, 293
  forward selection, 292, 294
  race search, 295
  schemata search, 295
  scheme-independent selection, 290–292
  scheme-specific selection, 294–296
  searching the attribute space, 292–294
  search methods in Weka, 421, 423–425
  Weka, 392–393, 420–425
AttributeSelection, 403
attribute subset evaluators in Weka, 421, 422
AttributeSummarizer, 431
attribute transformations, 287, 305–311
  principal components analysis, 306–309
  random projections, 309
  text to attribute vectors, 309–311
  time series, 311
attribute weighting method, 237–238
AUC (area under the curve), 173
audit logs, 357
authorship ascription, 353
AutoClass, 269–270, 271
automatic data cleansing, 287, 312–315
  anomalies, 314–315
  improving decision trees, 312–313
  robust regression, 313–314
P088407-INDEX.qxd 4/30/05 11:25 AM Page 506
automatic filtering, 315
averaging over subnetworks, 283
axis-parallel class boundaries, 242
B
background knowledge, 348
backpropagation, 227–233
backtracking, 209
backward elimination, 292, 294
backward pruning, 34, 192
bagging, 316–319
Bagging, 414–415
bagging with costs, 319–320
bag of words, 95
balanced Winnow, 128
ball tree, 133–135
basic methods. See algorithms-basic methods
batch learning, 232
Bayes, Thomas, 141
Bayesian classifier. See Naïve Bayes
Bayesian clustering, 268–270
Bayesian multinet, 279–280
Bayesian network, 141, 271–283
  AD tree, 280–283
  Bayesian multinet, 279–280
  caveats, 276, 277
  counting, 280
  K2, 278
  learning, 276–283
  making predictions, 272–276
  Markov blanket, 278–279
  multiplication, 275
  Naïve Bayes classifier, 278
  network scoring, 277
  simplifying assumptions, 272
  structure learning by conditional independence tests, 280
  TAN (Tree Augmented Naïve Bayes), 279
  Weka, 403–406
Bayesian network learning algorithms, 277–283
Bayesian option trees, 328–331, 343
Bayesians, 141
Bayesian scoring metrics, 277–280, 283
Bayes information, 271
BayesNet, 405
Bayes’s rule, 90, 181
beam search, 34, 293
beam width, 34
beer purchases, 27
Ben Ish Chai, 358
Bernoulli process, 147
BestFirst, 423
best-first search, 293
best-matching node, 257
bias, 32
  defined, 318
  language, 32–33
  multilayer perceptrons, 225, 226
  overfitting-avoidance, 34–35
  perceptron learning rule, 124
  search, 33–34
  what is it, 317
bias-variance decomposition, 317, 318
big data (massive datasets), 346–349
binning
  equal-frequency, 298
  equal-interval, 298
  equal-width, 342
binomial coefficient, 218
bits, 102
boolean, 51, 68
boosting, 321–325, 347
boosting in Weka, 416
bootstrap aggregating, 318
bootstrap estimation, 152–153
British Petroleum, 28
buildClassifier(), 453, 472, 482
C
C4.5, 105, 198–199
C5.0, 199
calm computing, 359, 362
capitalization conventions, 310
CAPPS (Computer Assisted Passenger Pre-Screening System), 357
CART (Classification And Regression Tree), 29, 38, 199, 253
categorical attributes, 49. See also nominal attributes
category utility, 260–262