5 Credibility: Evaluating what's been learned 143
5.1 Training and testing 144
5.2 Predicting performance 146
5.3 Cross-validation 149
5.4 Other estimates 151
    Leave-one-out 151
    The bootstrap 152
5.5 Comparing data mining methods 153
5.6 Predicting probabilities 157
    Quadratic loss function 158
    Informational loss function 159
    Discussion 160
5.7 Counting the cost 161
    Cost-sensitive classification 164
    Cost-sensitive learning 165
    Lift charts 166
    ROC curves 168
    Recall–precision curves 171
    Discussion 172
    Cost curves 173
5.8 Evaluating numeric prediction 176
5.9 The minimum description length principle 179
5.10 Applying the MDL principle to clustering 183
5.11 Further reading 184

6 Implementations: Real machine learning schemes 187
6.1 Decision trees 189
    Numeric attributes 189
    Missing values 191
    Pruning 192
    Estimating error rates 193
    Complexity of decision tree induction 196
    From trees to rules 198
    C4.5: Choices and options 198
    Discussion 199
6.2 Classification rules 200
    Criteria for choosing tests 200
    Missing values, numeric attributes 201
    Generating good rules 202
    Using global optimization 205
    Obtaining rules from partial decision trees 207
    Rules with exceptions 210
    Discussion 213
6.3 Extending linear models 214
    The maximum margin hyperplane 215
    Nonlinear class boundaries 217
    Support vector regression 219
    The kernel perceptron 222
    Multilayer perceptrons 223
    Discussion 235
6.4 Instance-based learning 235
    Reducing the number of exemplars 236
    Pruning noisy exemplars 236
    Weighting attributes 237
    Generalizing exemplars 238
    Distance functions for generalized exemplars 239
    Generalized distance functions 241
    Discussion 242
6.5 Numeric prediction 243
    Model trees 244
    Building the tree 245
    Pruning the tree 245
    Nominal attributes 246
    Missing values 246
    Pseudocode for model tree induction 247
    Rules from model trees 250
    Locally weighted linear regression 251
    Discussion 253
6.6 Clustering 254
    Choosing the number of clusters 254
    Incremental clustering 255
    Category utility 260
    Probability-based clustering 262
    The EM algorithm 265
    Extending the mixture model 266
    Bayesian clustering 268
    Discussion 270
6.7 Bayesian networks 271
    Making predictions 272
    Learning Bayesian networks 276
    Specific algorithms 278
    Data structures for fast learning 280
    Discussion 283

7 Transformations: Engineering the input and output 285
7.1 Attribute selection 288
    Scheme-independent selection 290
    Searching the attribute space 292
    Scheme-specific selection 294
7.2 Discretizing numeric attributes 296
    Unsupervised discretization 297
    Entropy-based discretization 298
    Other discretization methods 302
    Entropy-based versus error-based discretization 302
    Converting discrete to numeric attributes 304
7.3 Some useful transformations 305
    Principal components analysis 306
    Random projections 309
    Text to attribute vectors 309
    Time series 311
7.4 Automatic data cleansing 312
    Improving decision trees 312
    Robust regression 313
    Detecting anomalies 314
7.5 Combining multiple models 315
    Bagging 316
    Bagging with costs 319
    Randomization 320
    Boosting 321
    Additive regression 325
    Additive logistic regression 327
    Option trees 328
    Logistic model trees 331
    Stacking 332
    Error-correcting output codes 334
7.6 Using unlabeled data 337
    Clustering for classification 337
    Co-training 339
    EM and co-training 340
7.7 Further reading 341