Data Mining: Practical Machine Learning Tools and Techniques, Second Edition





INDEX

implementation—real-world schemes (continued)

    numeric prediction, 243–254

    See also individual subject headings

inaccurate values, 59–60. See also cost of errors; data cleaning; error rate

incremental algorithms, 346



incrementalClassifier, 434

IncrementalClassifierEvaluator, 431

incremental clustering, 255–260

incremental learning in Weka, 433–435

incremental reduced-error pruning, 203, 205

independent attributes, 267

index(), 472

induction, 29

inductive logic programming, 48, 60, 75, 351

Induct system, 214

industrial usage. See implementation—real-world schemes

inferring rudimentary rules, 84–88

InfoGainAttributeEval, 422–423

informational loss function, 159–160, 161

information-based heuristic, 201

information extraction, 354

information gain, 99

information retrieval, 171

information value, 102

infrequent words, 353

inner cross-validation, 286

input, 41–60

    ARFF format, 53–55

    assembling the data, 52–53

    attribute, 49–52

    attribute types, 56–57

    concept, 42–45

    data engineering, 286–287, 288–315. See also engineering input and output

    data preparation, 52–60

    getting to know your data, 60

    inaccurate values, 59–60

    instances, 45

    missing values, 58

    sparse data, 55–56

input layer, 224

instance in Weka, 450

Instance, 451

instance-based learning, 78, 128–136, 235–243

    ball tree, 133–135

    distance functions, 128–129, 239–242

    finding nearest neighbors, 129–135

    generalized distance functions, 241–242

    generalized exemplars, 236

    kD-trees, 130–132

    missing values, 129

    pruning noisy exemplars, 236–237

    redundant exemplars, 236

    simple method, 128–136, 235–236

    weighting attributes, 237–238

    Weka, 413–414

instance-based learning methods, 291

instance-based methods, 34

instance-based representation, 76–80

instance filters in Weka, 394, 400–401, 403

instances, 45



Instances, 451

instance space, 79

instance weights, 166, 321–322

integer-valued attributes, 49

intensive care patients, 29

interval, 88

interval quantities, 50–51

intrusion detection systems, 357



invertSelection, 382

in vitro fertilization, 3

iris dataset, 15–16



iris setosa, 15

iris versicolor, 15

iris virginica, 15

ISO-8601 combined date and time format, 55

item, 113

item sets, 113, 114–115

iterative distance-based clustering, 137–138

J

J4.8, 373–377



J48, 404, 450

Javadoc indices, 456



JDBC database, 445

JRip, 409

junk email filtering, 356–357



K

K2, 278


Kappa statistic, 163–164

kD-trees, 130–132, 136

Kepler’s three laws of planetary motion, 180

kernel

    defined, 235

    perceptron, 223

    polynomial, 218

    RBF, 219

    sigmoid, 219

kernel density estimation, 97

kernel logistic regression, 223

kernel perceptron, 222–223

k-means, 137–138

k-nearest-neighbor method, 78

Knowledge Flow interface, 427–435

    configuring/connecting components, 431–433

    evaluation components, 430, 431

    incremental learning, 433–435

    starting up, 427

    visualization components, 430–431

knowledge representation, 61–82

    association rules, 69–70. See also association rules

    classification rules, 65–69. See also classification rules

    clusters, 81–82. See also clustering

    decision table, 62

    decision tree, 62–65. See also decision tree

    instance-based representation, 76–80

    rules involving relations, 73–75

    rules with exceptions, 70–73, 210–213

    trees for numeric prediction, 76



KStar, 413

L

labor negotiations data, 17–18, 19

language bias, 32–33

language identification, 353

Laplace, Pierre, 91

Laplace estimator, 91, 267, 269

large datasets, 346–349

law of diminishing returns, 347

lazy classifiers in Weka, 405, 413–414

LBR, 414

learning, 7–9

learning algorithms in Weka, 403–404

    algorithm, listed, 404–405

    Bayesian classifier, 403–406

    functions, 404–405, 409–410

    lazy classifiers, 405, 413–414

    miscellaneous classifiers, 405, 414

    neural network, 411–413

    rules, 404, 408–409

    trees, 404, 406–408

learning rate, 229, 230

least-absolute-error regression, 220

LeastMedSq, 409–410

leave-one-out cross-validation, 151–152

levels of measurement, 50

level-0 model, 332

level-1 model, 332

Leverage, 420

Lift, 420

lift chart, 166–168, 172

lift factor, 166

linear classification, 121–128

linearly separable, 124

linear machine, 142

linear models, 119–128, 214–235

    backpropagation, 227–233

    computational complexity, 218

    kernel perceptron, 222–223

    linear classification, 121–128

    linear regression, 119–121

    logistic regression, 121–125

    maximum margin hyperplane, 215–217

    multilayer perceptrons, 223–226, 231, 233

    nonlinear class boundaries, 217–219

    numeric prediction, 119–120

    overfitting, 217–218

    perceptron, 124–126

    RBF network, 234

    support vector regression, 219–222

    Winnow, 126–128

linear regression, 77, 119–121

LinearRegression, 387, 409

linear threshold unit, 142



