Data Mining: Practical Machine Learning Tools and Techniques, Second Edition





INDEX



CSVLoader, 381

cumulative margin distribution in Weka, 458

curves

  cost, 173

  lift, 166

  recall-precision, 171

  ROC, 168

customer support and service, 28

cutoff parameter, 260

CVParameterSelection, 417

cybersecurity, 29



D

dairy farmers (New Zealand), 3–4, 37, 161–162

data assembly, 52–53

data cleaning, 52–60. See also automatic data cleansing

data engineering. See engineering input and output

data integration, 52



data mining, 4–5, 9

data ownership rights, 35

data preparation, 52–60

data transformation. See attribute transformations

DataVisualizer, 389, 390, 430

data warehouse, 52–53

date attributes, 55

decision list, 11, 67

decision nodes, 328

decision stump, 325



DecisionStump, 407, 453, 454

decision table, 62, 295



DecisionTable, 408

decision tree, 14, 62–65, 97–105

  complexity of induction, 196

  converting to rules, 198

  data cleaning, 312–313

  error rates, 192–196

  highly branching attributes, 102–105

  missing values, 63, 191–192

  multiclass case, 107

  multivariate, 199

  nominal attribute, 62

  numeric attribute, 62, 189–191

  partial, 207–210

  pruning, 192–193, 312

  replicated subtree, 66

  rules, 198

  subtree raising, 193, 197

  subtree replacement, 192–193, 197

  three-way split, 63

  top-down induction, 97–105, 196–198

  two-way split, 62

  univariate, 199

  Weka, 406–408

  Weka’s User Classifier facility, 63–65



Decorate, 416

deduction, 350

default rule, 110

degrees of freedom, 93, 155

delta, 311

dendrograms, 82

denormalization, 47

density function, 93

diagnosis, 25–26

dichotomy, 51

directed acyclic graph, 272

direct marketing, 27

discrete attributes, 50. See also nominal attributes



Discretize, 396, 398, 402

discretizing numeric attributes, 287, 296–305

  chi-squared test, 302

  converting discrete to numeric attributes, 304–305

  entropy-based discretization, 298–302

  error-based discretization, 302–304

  global discretization, 297

  local discretization, 297

  supervised discretization, 297, 298

  unsupervised discretization, 297–298

  Weka, 398

disjunction, 32, 65

disjunctive normal form, 69

distance functions, 128–129, 239–242

distributed experiments in Weka, 445

distribution, 304

distributionForInstance(), 453, 481

divide-and-conquer. See decision tree

P088407-INDEX.qxd  4/30/05  11:25 AM  Page 510





document classification, 94–96, 352–353

document clustering, 353

domain knowledge, 20, 33, 349–351

double-consequent rules, 118

duplicate data, 59

dynamic programming, 302



E

early stopping, 233

easy instances, 322

ecological applications, 23, 28

eigenvalue, 307

eigenvector, 307

Einstein, Albert, 180

electricity supply, 24–25

electromechanical diagnosis application, 144


11-point average recall, 172

EM, 418

EM algorithm, 265–266

EM and co-training, 340–341

EM procedure, 337–338

embedded machine learning, 461–469

engineering input and output, 285–343

  attribute selection, 288–296

  combining multiple models, 315–336

  data cleansing, 312–315

  discretizing numeric attributes, 296–305

  unlabeled data, 337–341

  See also individual subject headings

entity extraction, 353

entropy, 102

entropy-based discretization, 298–302

enumerated attributes, 50. See also nominal attributes

enumerating the concept space, 31–32

Epicurus, 183

epoch, 412

equal-frequency binning, 298

equal-interval binning, 298

equal-width binning, 342

erroneous values, 59

error-based discretization, 302–304

error-correcting output codes, 334–336

error log, 378

error rate

  bias, 317

  cost of errors. See cost of errors

  decision tree, 192–196

  defined, 144

  training data, 145

“Essay towards solving a problem in the doctrine of chances, An” (Bayes), 141

ethics, 35–37

Euclidean distance, 78, 128, 129, 237

evaluation, 143–185

  bootstrap procedure, 152–153

  comparing data mining methods, 153–157

  cost of errors, 161–176. See also cost of errors

  cross-validation, 149–152

  leave-one-out cross-validation, 151–152

  MDL principle, 179–184

  numeric prediction, 176–179

  predicting performance, 146–149

  predicting probabilities, 157–161

  training and testing, 144–146



evaluation(), 482

evaluation components in Weka, 430, 431



Evaluation panel, 431

example problems

  contact lens data, 6, 13–15

  CPU performance data, 16–17

  iris dataset, 15–16

  labor negotiations data, 17–18, 19

  soybean data, 18–22

  weather problem, 10–12

exceptions, 70–73, 210–213

exclusive-or problem, 67

exemplar

  defined, 236

  generalized, 238–239

  noisy, 236–237

  redundant, 236

exemplar generalization, 238–239, 243



ExhaustiveSearch, 424

Expand all paths, 408

expectation, 265, 267

expected error, 174

expected success rate, 147



