INDEX
causal relations, 350
CfsSubsetEval, 422
chain rule (probability theory), 275
character sets, 310
ChiSquaredAttributeEval, 423
chi-squared test, 302
circular ordering, 51
city-block metric, 129
ClassAssigner, 431
class attribute, 43
class distribution, 304
class hierarchy in Weka, 471–483
classification, 121
Classification And Regression Tree (CART), 29, 38, 199, 253
classification learning, 43
classification problems, 42
classification rules, 65–69, 200–214
    converting decision trees to, 198
    criteria for choosing tests, 200–201
    decision list, 11
    different from association rules, 42
    global optimization, 205–207
    good (worthwhile) rules, 202–205
    missing values, 201–202
    numeric attributes, 202
    partial decision tree, 207–210
    pruning, 203, 205
    replicated subtree problem, 66–68
    RIPPER rule learner, 205
    rules with exceptions, 210–213
    Weka, 408–409
Classifier, 453
classifier in Weka, 366, 471–483
classifier algorithms. See learning algorithms
classifier errors in Weka, 379
ClassifierPerformanceEvaluator, 431
classifiers package, 453–455
ClassifierSubsetEval, 422
Classifier superclass, 480
classifyInstance(), 453, 480–481
Classify panel, 373, 384
classify text files into two categories, 461–469
Classit, 271
class noise, 313
ClassOrder, 403
class summary, 451
ClassValuePicker, 431
cleaning data, 52, 60
    automatic, 312
closed world assumption, 45
ClustererPerformanceEvaluator, 431
clustering, 43, 136–139, 254–271
    anomalies, 258
    basic assumption, 270
    Bayesian, 268–270
    category utility, 260–262
    Cluster panel (Weka), 391–392
    document, 353
    EM algorithm, 265–266
    faster distance calculations, 138–139
    hierarchical, 139
    how many clusters?, 254–255
    incremental, 255–260
    k-means, 137–138
    MDL principle, 183–184
    merging, 257
    mixture model, 262–264, 266–268
    output, 81–82
    probability-based, 262–265
    RBF network, 234
    splitting, 254–255, 257
    unlabeled data, 337–339
clustering algorithms in Weka, 418–419
clustering for classification, 337
ClusterMembership, 396, 397
Cluster panel, 391–392
Cobweb, 271
Cobweb in Weka, 419
co-EM, 340
column separation, 336
combining multiple models, 287, 315–336
    additive logistic regression, 327–328
    additive regression, 325–327
    bagging, 316–319
    bagging with costs, 319–320
    boosting, 321–325
    error-correcting output codes, 334–336
    logistic model trees, 331
    option trees, 328–331
    randomization, 320–321
    stacking, 332–334
command-line interface, 449–459
    class, 450, 452
    classifiers package, 453–455
    core package, 451, 452
    generic options, 456–458
    instance, 450
    Javadoc indices, 456
    package, 451, 452
    scheme-specific options, 458–459
    starting up, 449–450
    weka.associations, 455
    weka.attributeSelection, 455
    weka.clusterers, 455
    weka.estimators, 455
    weka.filters, 455
command-line options, 456–459
comma-separated value (CSV) format, 370, 371
comparing data mining methods, 153–157
ComplementNaiveBayes, 405
compression techniques, 362
computational learning theory, 324
computeEntropy(), 480
Computer Assisted Passenger Pre-Screening System (CAPPS), 357
computer network security, 357
computer software. See Weka workbench
concept, 42
concept description, 42
concept description language, 32
concept representation, 82. See also knowledge representation
conditional independence, 275
conditional likelihood for scoring networks, 280, 283
confidence, 69, 113, 324
Confidence, 420
confidence tests, 154–157, 184
conflict resolution strategies, 82
confusion matrix, 163
conjunction, 65
ConjunctiveRule, 408–409
consensus filter, 342
consequent, of rule, 65
ConsistencySubsetEval, 422
constrained quadratic optimization, 217
consumer music, 359
contact lens data, 6, 13–15
continuous attributes, 49. See also numeric attributes
continuous monitoring, 28–29
converting discrete to numeric attributes, 304–305
convex hull, 171, 216
Conviction, 420
Copy, 395
core package, 451, 452
corrected resampled t-test, 157
correlation coefficient, 177–179
cost curves, 173–176
cost matrix, 164–165
cost of errors, 161–176
    bagging, 319–320
    cost curves, 173–176
    cost-sensitive classification, 164–165
    cost-sensitive learning, 165–166
    Kappa statistic, 163–164
    lift charts, 166–168
    recall-precision curves, 171–172
    ROC curves, 168–171
cost-sensitive classification, 164–165
CostSensitiveClassifier, 417
cost-sensitive learning, 165–166
cost-sensitive learning in Weka, 417
co-training, 339–340
covariance matrix, 267, 307
coverage, of association rules, 69
covering algorithm, 106–111
cow culling, 3–4, 37, 161–162
CPU performance data, 16–17
credit approval, 22–23
cross-validated ROC curves, 170
cross-validation, 149–152, 326
    inner, 286
    outer, 286
    repeated, 144
CrossValidationFoldMaker, 428, 431
CSV format, 370, 371