Data Mining: Practical Machine Learning Tools and Techniques, Second Edition



INDEX

causal relations, 350

CfsSubsetEval, 422

chain rule (probability theory), 275

character sets, 310

ChiSquaredAttributeEval, 423

chi-squared test, 302

circular ordering, 51

city-block metric, 129

ClassAssigner, 431

class attribute, 43

class distribution, 304

class hierarchy in Weka, 471–483

classification, 121

Classification And Regression Tree (CART), 29, 38, 199, 253

classification learning, 43

classification problems, 42

classification rules, 65–69, 200–214

  converting decision trees to, 198

  criteria for choosing tests, 200–201

  decision list, 11

  different from association rules, 42

  global optimization, 205–207

  good (worthwhile) rules, 202–205

  missing values, 201–202

  numeric attributes, 202

  partial decision tree, 207–210

  pruning, 203, 205

  replicated subtree problem, 66–68

  RIPPER rule learner, 205

  rules with exceptions, 210–213

  Weka, 408–409

Classifier, 453

classifier in Weka, 366, 471–483

classifier algorithms. See learning algorithms

classifier errors in Weka, 379

ClassifierPerformanceEvaluator, 431

classifiers package, 453–455

ClassifierSubsetEval, 422

Classifier superclass, 480

classifyInstance(), 453, 480–481

Classify panel, 373, 384

classify text files into two categories, 461–469

Classit, 271

class noise, 313

ClassOrder, 403

class summary, 451

ClassValuePicker, 431

cleaning data, 52, 60

  automatic, 312

closed world assumption, 45

ClustererPerformanceEvaluator, 431

clustering, 43, 136–139, 254–271

  anomalies, 258

  basic assumption, 270

  Bayesian, 268–270

  category utility, 260–262

  Cluster panel (Weka), 391–392

  document, 353

  EM algorithm, 265–266

  faster distance calculations, 138–139

  hierarchical, 139

  how many clusters?, 254–255

  incremental, 255–260

  k-means, 137–138

  MDL principle, 183–184

  merging, 257

  mixture model, 262–264, 266–268

  output, 81–82

  probability-based, 262–265

  RBF network, 234

  splitting, 254–255, 257

  unlabeled data, 337–339

clustering algorithms in Weka, 418–419

clustering for classification, 337

ClusterMembership, 396, 397

Cluster panel, 391–392

Cobweb, 271

Cobweb in Weka, 419

co-EM, 340

column separation, 336

combining multiple models, 287, 315–336

  additive logistic regression, 327–328

  additive regression, 325–327

  bagging, 316–319

  bagging with costs, 319–320

  boosting, 321–325

  error-correcting output codes, 334–336

  logistic model trees, 331

  option trees, 328–331

P088407-INDEX.qxd 4/30/05 11:25 AM Page 508


  randomization, 320–321

  stacking, 332–334

command-line interface, 449–459

  class, 450, 452

  classifiers package, 453–455

  core package, 451, 452

  generic options, 456–458

  instance, 450

  Javadoc indices, 456

  package, 451, 452

  scheme-specific options, 458–459

  starting up, 449–450

  weka.associations, 455

  weka.attributeSelection, 455

  weka.clusterers, 455

  weka.estimators, 455

  weka.filters, 455

command-line options, 456–459

comma-separated value (CSV) format, 370, 371

comparing data mining methods, 153–157

ComplementNaiveBayes, 405

compression techniques, 362

computational learning theory, 324

computeEntropy(), 480

Computer Assisted Passenger Pre-Screening System (CAPPS), 357

computer network security, 357

computer software. See Weka workbench

concept, 42

concept description, 42

concept description language, 32

concept representation, 82. See also knowledge representation

conditional independence, 275

conditional likelihood for scoring networks, 280, 283

confidence, 69, 113, 324

Confidence, 420

confidence tests, 154–157, 184

conflict resolution strategies, 82

confusion matrix, 163

conjunction, 65

ConjunctiveRule, 408–409

consensus filter, 342

consequent, of rule, 65

ConsistencySubsetEval, 422

constrained quadratic optimization, 217

consumer music, 359

contact lens data, 6, 13–15

continuous attributes, 49. See also numeric attributes

continuous monitoring, 28–29

converting discrete to numeric attributes, 304–305

convex hull, 171, 216

Conviction, 420

Copy, 395

core package, 451, 452

corrected resampled t-test, 157

correlation coefficient, 177–179

cost curves, 173–176

cost matrix, 164–165

cost of errors, 161–176

  bagging, 319–320

  cost curves, 173–176

  cost-sensitive classification, 164–165

  cost-sensitive learning, 165–166

  Kappa statistic, 163–164

  lift charts, 166–168

  recall-precision curves, 171–172

  ROC curves, 168–171

cost-sensitive classification, 164–165

CostSensitiveClassifier, 417

cost-sensitive learning, 165–166

cost-sensitive learning in Weka, 417

co-training, 339–340

covariance matrix, 267, 307

coverage, of association rules, 69

covering algorithm, 106–111

cow culling, 3–4, 37, 161–162

CPU performance data, 16–17

credit approval, 22–23

cross-validated ROC curves, 170

cross-validation, 149–152, 326

  inner, 286

  outer, 286

  repeated, 144

CrossValidationFoldMaker, 428, 431

CSV format, 370, 371
