Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

INDEX

implementation—real-world schemes (continued)

    numeric prediction, 243–254

    See also individual subject headings

inaccurate values, 59–60. See also cost of errors; data cleaning; error rate

incremental algorithms, 346

incrementalClassifier, 434

IncrementalClassifierEvaluator, 431

incremental clustering, 255–260

incremental learning in Weka, 433–435

incremental reduced-error pruning, 203, 205

independent attributes, 267

index(), 472

induction, 29

inductive logic programming, 48, 60, 75, 351

Induct system, 214

industrial usage. See implementation—real-world schemes

inferring rudimentary rules, 84–88

InfoGainAttributeEval, 422–423

informational loss function, 159–160, 161

information-based heuristic, 201

information extraction, 354

information gain, 99

information retrieval, 171

information value, 102

infrequent words, 353

inner cross-validation, 286

input, 41–60

    ARFF format, 53–55

    assembling the data, 52–53

    attribute, 49–52

    attribute types, 56–57

    concept, 42–45

    data engineering, 286–287, 288–315. See also engineering input and output

    data preparation, 52–60

    getting to know your data, 60

    inaccurate values, 59–60

    instances, 45

    missing values, 58

    sparse data, 55–56

input layer, 224

instance in Weka, 450

Instance, 451

instance-based learning, 78, 128–136, 235–243

    ball tree, 133–135

    distance functions, 128–129, 239–242

    finding nearest neighbors, 129–135

    generalized distance functions, 241–242

    generalized exemplars, 236

    kD-trees, 130–132

    missing values, 129

    pruning noisy exemplars, 236–237

    redundant exemplars, 236

    simple method, 128–136, 235–236

    weighting attributes, 237–238

    Weka, 413–414

instance-based learning methods, 291

instance-based methods, 34

instance-based representation, 76–80

instance filters in Weka, 394, 400–401, 403

instances, 45

Instances, 451

instance space, 79

instance weights, 166, 321–322

integer-valued attributes, 49

intensive care patients, 29

interval, 88

interval quantities, 50–51

intrusion detection systems, 357

invertSelection, 382

in vitro fertilization, 3

iris dataset, 15–16

iris setosa, 15

iris versicolor, 15

iris virginica, 15

ISO-8601 combined date and time format, 55

item, 113

item sets, 113, 114–115

iterative distance-based clustering, 137–138

J

J4.8, 373–377

J48, 404, 450

Javadoc indices, 456

JDBC database, 445

JRip, 409

junk email filtering, 356–357

K

K2, 278

Kappa statistic, 163–164

kD-trees, 130–132, 136

Kepler’s three laws of planetary motion, 180

kernel

    defined, 235

    perceptron, 223

    polynomial, 218

    RBF, 219

    sigmoid, 219

kernel density estimation, 97

kernel logistic regression, 223

kernel perceptron, 222–223

k-means, 137–138

k-nearest-neighbor method, 78

Knowledge Flow interface, 427–435

    configuring/connecting components, 431–433

    evaluation components, 430, 431

    incremental learning, 433–435

    starting up, 427

    visualization components, 430–431

knowledge representation, 61–82

    association rules, 69–70. See also association rules

    classification rules, 65–69. See also classification rules

    clusters, 81–82. See also clustering

    decision table, 62

    decision tree, 62–65. See also decision tree

    instance-based representation, 76–80

    rules involving relations, 73–75

    rules with exceptions, 70–73, 210–213

    trees for numeric prediction, 76

KStar, 413

L

labor negotiations data, 17–18, 19

language bias, 32–33

language identification, 353

Laplace, Pierre, 91

Laplace estimator, 91, 267, 269

large datasets, 346–349

law of diminishing returns, 347

lazy classifiers in Weka, 405, 413–414

LBR, 414

learning, 7–9

learning algorithms in Weka, 403–404

    algorithm, listed, 404–405

    Bayesian classifier, 403–406

    functions, 404–405, 409–410

    lazy classifiers, 405, 413–414

    miscellaneous classifiers, 405, 414

    neural network, 411–413

    rules, 404, 408–409

    trees, 404, 406–408

learning rate, 229, 230

least-absolute-error regression, 220

LeastMedSq, 409–410

leave-one-out cross-validation, 151–152

levels of measurement, 50

level-0 model, 332

level-1 model, 332

Leverage, 420

Lift, 420

lift chart, 166–168, 172

lift factor, 166

linear classification, 121–128

linearly separable, 124

linear machine, 142

linear models, 119–128, 214–235

    backpropagation, 227–233

    computational complexity, 218

    kernel perceptron, 222–223

    linear classification, 121–128

    linear regression, 119–121

    logistic regression, 121–125

    maximum margin hyperplane, 215–217

    multilayer perceptrons, 223–226, 231, 233

    nonlinear class boundaries, 217–219

    numeric prediction, 119–120

    overfitting, 217–218

    perceptron, 124–126

    RBF network, 234

    support vector regression, 219–222

    Winnow, 126–128

linear regression, 77, 119–121

LinearRegression, 387, 409

linear threshold unit, 142
