Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




INDEX

causal relations, 350

CfsSubsetEval, 422

chain rule (probability theory), 275

character sets, 310

ChiSquaredAttributeEval, 423

chi-squared test, 302

circular ordering, 51

city-block metric, 129



ClassAssigner, 431

class attribute, 43

class distribution, 304

class hierarchy in Weka, 471–483

classification, 121

Classification And Regression Tree (CART), 29, 38, 199, 253

classification learning, 43

classification problems, 42

classification rules, 65–69, 200–214

  converting decision trees to, 198

  criteria for choosing tests, 200–201

  decision list, 11

  different from association rules, 42

  global optimization, 205–207

  good (worthwhile) rules, 202–205

  missing values, 201–202

  numeric attributes, 202

  partial decision tree, 207–210

  pruning, 203, 205

  replicated subtree problem, 66–68

  RIPPER rule learner, 205

  rules with exceptions, 210–213

  Weka, 408–409



Classifier, 453

classifier in Weka, 366, 471–483

classifier algorithms. See learning algorithms

classifier errors in Weka, 379



ClassifierPerformanceEvaluator, 431

classifiers package, 453–455

ClassifierSubsetEval, 422

Classifier superclass, 480

classifyInstance(), 453, 480–481

Classify panel, 373, 384

classify text files into two categories, 461–469

Classit, 271

class noise, 313



ClassOrder, 403

class summary, 451

ClassValuePicker, 431

cleaning data, 52, 60

  automatic, 312

closed world assumption, 45



ClustererPerformanceEvaluator, 431

clustering, 43, 136–139, 254–271

  anomalies, 258

  basic assumption, 270

  Bayesian, 268–270

  category utility, 260–262

  Cluster panel (Weka), 391–392

  document, 353

  EM algorithm, 265–266

  faster distance calculations, 138–139

  hierarchical, 139

  how many clusters?, 254–255

  incremental, 255–260

  k-means, 137–138

  MDL principle, 183–184

  merging, 257

  mixture model, 262–264, 266–268

  output, 81–82

  probability-based, 262–265

  RBF network, 234

  splitting, 254–255, 257

  unlabeled data, 337–339

clustering algorithms in Weka, 418–419

clustering for classification, 337

ClusterMembership, 396, 397

Cluster panel, 391–392

Cobweb, 271

Cobweb in Weka, 419

co-EM, 340

column separation, 336

combining multiple models, 287, 315–336

  additive logistic regression, 327–328

  additive regression, 325–327

  bagging, 316–319

  bagging with costs, 319–320

  boosting, 321–325

  error-correcting output codes, 334–336

  logistic model trees, 331

  option trees, 328–331

  randomization, 320–321

  stacking, 332–334

command-line interface, 449–459

  class, 450, 452

  classifiers package, 453–455

  core package, 451, 452

  generic options, 456–458

  instance, 450

  Javadoc indices, 456

  package, 451, 452

  scheme-specific options, 458–459

  starting up, 449–450

  weka.associations, 455

  weka.attributeSelection, 455

  weka.clusterers, 455

  weka.estimators, 455

  weka.filters, 455

command-line options, 456–459

comma-separated value (CSV) format, 370, 371


comparing data mining methods, 153–157

ComplementNaiveBayes, 405

compression techniques, 362

computational learning theory, 324

computeEntropy(), 480

Computer Assisted Passenger Pre-Screening System (CAPPS), 357

computer network security, 357

computer software. See Weka workbench

concept, 42

concept description, 42

concept description language, 32

concept representation, 82. See also knowledge representation

conditional independence, 275

conditional likelihood for scoring networks, 280, 283

confidence, 69, 113, 324



Confidence, 420

confidence tests, 154–157, 184

conflict resolution strategies, 82

confusion matrix, 163

conjunction, 65

ConjunctiveRule, 408–409

consensus filter, 342

consequent, of rule, 65

ConsistencySubsetEval, 422

constrained quadratic optimization, 217

consumer music, 359

contact lens data, 6, 13–15

continuous attributes, 49. See also numeric attributes

continuous monitoring, 28–29

converting discrete to numeric attributes, 304–305

convex hull, 171, 216



Conviction, 420

Copy, 395

core package, 451, 452

corrected resampled t-test, 157

correlation coefficient, 177–179

cost curves, 173–176

cost matrix, 164–165

cost of errors, 161–176

  bagging, 319–320

  cost curves, 173–176

  cost-sensitive classification, 164–165

  cost-sensitive learning, 165–166

  Kappa statistic, 163–164

  lift charts, 166–168

  recall-precision curves, 171–172

  ROC curves, 168–171

cost-sensitive classification, 164–165

CostSensitiveClassifier, 417

cost-sensitive learning, 165–166

cost-sensitive learning in Weka, 417

co-training, 339–340

covariance matrix, 267, 307

coverage, of association rules, 69

covering algorithm, 106–111

cow culling, 3–4, 37, 161–162

CPU performance data, 16–17

credit approval, 22–23

cross-validated ROC curves, 170

cross-validation, 149–152, 326

  inner, 286

  outer, 286

  repeated, 144

CrossValidationFoldMaker, 428, 431

CSV format, 370, 371
