Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

INDEX

CSVLoader, 381

cumulative margin distribution in Weka, 458

curves

  cost, 173

  lift, 166

  recall-precision, 171

  ROC, 168

customer support and service, 28

cutoff parameter, 260

CVParameterSelection, 417

cybersecurity, 29

dairy farmers (New Zealand), 3–4, 37, 161–162

data assembly, 52–53

data cleaning, 52–60. See also automatic data cleansing

data engineering. See engineering input and output

data integration, 52

data mining, 4–5, 9

data ownership rights, 35

data preparation, 52–60

data transformation. See attribute transformations

DataVisualizer, 389, 390, 430

data warehouse, 52–53

date attributes, 55

decision list, 11, 67

decision nodes, 328

decision stump, 325

DecisionStump, 407, 453, 454

decision table, 62, 295

DecisionTable, 408

decision tree, 14, 62–65, 97–105

  complexity of induction, 196

  converting to rules, 198

  data cleaning, 312–313

  error rates, 192–196

  highly branching attributes, 102–105

  missing values, 63, 191–192

  multiclass case, 107

  multivariate, 199

  nominal attribute, 62

  numeric attribute, 62, 189–191

  partial, 207–210

  pruning, 192–193, 312

  replicated subtree, 66

  rules, 198

  subtree raising, 193, 197

  subtree replacement, 192–193, 197

  three-way split, 63

  top-down induction, 97–105, 196–198

  two-way split, 62

  univariate, 199

  Weka, 406–408

  Weka’s User Classifier facility, 63–65

Decorate, 416

deduction, 350

default rule, 110

degrees of freedom, 93, 155

delta, 311

dendrograms, 82

denormalization, 47

density function, 93

diagnosis, 25–26

dichotomy, 51

directed acyclic graph, 272

direct marketing, 27

discrete attributes, 50. See also nominal attributes

Discretize, 396, 398, 402

discretizing numeric attributes, 287, 296–305

  chi-squared test, 302

  converting discrete to numeric attributes, 304–305

  entropy-based discretization, 298–302

  error-based discretization, 302–304

  global discretization, 297

  local discretization, 297

  supervised discretization, 297, 298

  unsupervised discretization, 297–298

  Weka, 398

disjunction, 32, 65

disjunctive normal form, 69

distance functions, 128–129, 239–242

distributed experiments in Weka, 445

distribution, 304

distributionForInstance(), 453, 481

divide-and-conquer. See decision tree

document classification, 94–96, 352–353

document clustering, 353

domain knowledge, 20, 33, 349–351

double-consequent rules, 118

duplicate data, 59

dynamic programming, 302

early stopping, 233

easy instances, 322

ecological applications, 23, 28

eigenvalue, 307

eigenvector, 307

Einstein, Albert, 180

electricity supply, 24–25

electromechanical diagnosis application, 144

11-point average recall, 172

EM, 418

EM algorithm, 265–266

EM and co-training, 340–341

EM procedure, 337–338

embedded machine learning, 461–469

engineering input and output, 285–343

  attribute selection, 288–296

  combining multiple models, 315–336

  data cleansing, 312–315

  discretizing numeric attributes, 296–305

  unlabeled data, 337–341

  See also individual subject headings

entity extraction, 353

entropy, 102

entropy-based discretization, 298–302

enumerated attributes, 50. See also nominal attributes

enumerating the concept space, 31–32

Epicurus, 183

epoch, 412

equal-frequency binning, 298

equal-interval binning, 298

equal-width binning, 342

erroneous values, 59

error-based discretization, 302–304

error-correcting output codes, 334–336

error log, 378

error rate

  bias, 317

  cost of errors. See cost of errors

  decision tree, 192–196

  defined, 144

  training data, 145

“Essay towards solving a problem in the doctrine of chances, An” (Bayes), 141

ethics, 35–37

Euclidean distance, 78, 128, 129, 237

evaluation, 143–185

  bootstrap procedure, 152–153

  comparing data mining methods, 153–157

  cost of errors, 161–176. See also cost of errors

  cross-validation, 149–152

  leave-one-out cross-validation, 151–152

  MDL principle, 179–184

  numeric prediction, 176–179

  predicting performance, 146–149

  predicting probabilities, 157–161

  training and testing, 144–146

evaluation(), 482

evaluation components in Weka, 430, 431

Evaluation panel, 431

example problems

  contact lens data, 6, 13–15

  CPU performance data, 16–17

  iris dataset, 15–16

  labor negotiations data, 17–18, 19

  soybean data, 18–22

  weather problem, 10–12

exceptions, 70–73, 210–213

exclusive-or problem, 67

exemplar

  defined, 236

  generalized, 238–239

  noisy, 236–237

  redundant, 236

exemplar generalization, 238–239, 243

ExhaustiveSearch, 424

Expand all paths, 408

expectation, 265, 267

expected error, 174

expected success rate, 147
