Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

INDEX

anomaly detection systems, 357

antecedent, of rule, 65

AODE, 405

Apriori, 419

Apriori method, 141

area under the curve (AUC), 173

ARFF format, 53–55

    converting files to, 380–382

    Weka, 370, 371

ARFFLoader, 381, 427

arithmetic underflow, 276

assembling the data, 52–53

assessing performance of learning scheme, 286

assignment of key phrases, 353

Associate panel, 392

association learning, 43

association rules, 69–70, 112–119

    binary attributes, 119

    generating rules efficiently, 117–118

    item sets, 113, 114–115

    Weka, 419–420

association-rule learners in Weka, 419–420

attackers, 357

Attribute, 451

attribute(), 480

attribute discretization. See discretizing numeric attributes

attribute-efficient, 128

attribute evaluation methods in Weka, 421, 422–423

attribute filters in Weka, 394, 395–400, 402–403

attributeIndices, 382

attribute noise, 313

attribute-relation file format. See ARFF format

attributes, 49–52

    adding irrelevant, 288

    Boolean, 51

    class, 53

    as columns in tables, 49

    combinations of, 65

    continuous, 49

    discrete, 51

    enumerated, 51

    highly branching, 86

    identification code, 86

    independent, 267

    integer-valued, 49

    nominal, 49

    non-numeric, 17

    numeric, 49

    ordinal, 51

    relevant, 289

    rogue, 59

    selecting, 288

    subsets of values in, 80

    types in ARFF format, 56

    weighting, 237

    See also orderings

AttributeSelectedClassifier, 417

attribute selection, 286–287, 288–296

    attribute evaluation methods in Weka, 421, 422–423

    backward elimination, 292, 294

    beam search, 293

    best-first search, 293

    forward selection, 292, 294

    race search, 295

    schemata search, 295

    scheme-independent selection, 290–292

    scheme-specific selection, 294–296

    searching the attribute space, 292–294

    search methods in Weka, 421, 423–425

    Weka, 392–393, 420–425

AttributeSelection, 403

attribute subset evaluators in Weka, 421, 422

AttributeSummarizer, 431

attribute transformations, 287, 305–311

    principal components analysis, 306–309

    random projections, 309

    text to attribute vectors, 309–311

    time series, 311

attribute weighting method, 237–238

AUC (area under the curve), 173

audit logs, 357

authorship ascription, 353

AutoClass, 269–270, 271

automatic data cleansing, 287, 312–315

    anomalies, 314–315

    improving decision trees, 312–313

    robust regression, 313–314

P088407-INDEX.qxd 4/30/05 11:25 AM Page 506

automatic filtering, 315

averaging over subnetworks, 283

axis-parallel class boundaries, 242

B

background knowledge, 348

backpropagation, 227–233

backtracking, 209

backward elimination, 292, 294

backward pruning, 34, 192

bagging, 316–319

Bagging, 414–415

bagging with costs, 319–320

bag of words, 95

balanced Winnow, 128

ball tree, 133–135

basic methods. See algorithms-basic methods

batch learning, 232

Bayes, Thomas, 141

Bayesian classifier. See Naïve Bayes

Bayesian clustering, 268–270

Bayesian multinet, 279–280

Bayesian network, 141, 271–283

    AD tree, 280–283

    Bayesian multinet, 279–280

    caveats, 276, 277

    counting, 280

    K2, 278

    learning, 276–283

    making predictions, 272–276

    Markov blanket, 278–279

    multiplication, 275

    Naïve Bayes classifier, 278

    network scoring, 277

    simplifying assumptions, 272

    structure learning by conditional independence tests, 280

    TAN (Tree Augmented Naïve Bayes), 279

    Weka, 403–406

Bayesian network learning algorithms, 277–283

Bayesian option trees, 328–331, 343

Bayesians, 141

Bayesian scoring metrics, 277–280, 283

Bayes information, 271

BayesNet, 405

Bayes’s rule, 90, 181

beam search, 34, 293

beam width, 34

beer purchases, 27

Ben Ish Chai, 358

Bernoulli process, 147

BestFirst, 423

best-first search, 293

best-matching node, 257

bias, 32

    defined, 318

    language, 32–33

    multilayer perceptrons, 225, 226

    overfitting-avoidance, 34–35

    perceptron learning rule, 124

    search, 33–34

    what is it, 317

bias-variance decomposition, 317, 318

big data (massive datasets), 346–349

binning

    equal-frequency, 298

    equal-interval, 298

    equal-width, 342

binomial coefficient, 218

bits, 102

boolean, 51, 68

boosting, 321–325, 347

boosting in Weka, 416

bootstrap aggregating, 318

bootstrap estimation, 152–153

British Petroleum, 28

buildClassifier(), 453, 472, 482

C

C4.5, 105, 198–199

C5.0, 199

calm computing, 359, 362

capitalization conventions, 310

CAPPS (Computer Assisted Passenger Pre-Screening System), 357

CART (Classification And Regression Tree), 29, 38, 199, 253

categorical attributes, 49. See also nominal attributes

category utility, 260–262
