Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




Now, finally, we can say what this book is about. It is about techniques for

finding and describing structural patterns in data. Most of the techniques that

we cover have developed within a field known as machine learning. But first let

us look at what structural patterns are.



Describing structural patterns

What is meant by structural patterns? How do you describe them? And what

form does the input take? We will answer these questions by way of illustration

rather than by attempting formal, and ultimately sterile, definitions. There will

be plenty of examples later in this chapter, but let’s examine one right now to

get a feeling for what we’re talking about.

Look at the contact lens data in Table 1.1. This gives the conditions under

which an optician might want to prescribe soft contact lenses, hard contact

lenses, or no contact lenses at all; we will say more about what the individual

CHAPTER 1 | WHAT'S IT ALL ABOUT?



Table 1.1  The contact lens data.

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
young           myope                   no           reduced               none
young           myope                   no           normal                soft
young           myope                   yes          reduced               none
young           myope                   yes          normal                hard
young           hypermetrope            no           reduced               none
young           hypermetrope            no           normal                soft
young           hypermetrope            yes          reduced               none
young           hypermetrope            yes          normal                hard
pre-presbyopic  myope                   no           reduced               none
pre-presbyopic  myope                   no           normal                soft
pre-presbyopic  myope                   yes          reduced               none
pre-presbyopic  myope                   yes          normal                hard
pre-presbyopic  hypermetrope            no           reduced               none
pre-presbyopic  hypermetrope            no           normal                soft
pre-presbyopic  hypermetrope            yes          reduced               none
pre-presbyopic  hypermetrope            yes          normal                none
presbyopic      myope                   no           reduced               none
presbyopic      myope                   no           normal                none
presbyopic      myope                   yes          reduced               none
presbyopic      myope                   yes          normal                hard
presbyopic      hypermetrope            no           reduced               none
presbyopic      hypermetrope            no           normal                soft
presbyopic      hypermetrope            yes          reduced               none
presbyopic      hypermetrope            yes          normal                none

features mean later. Each line of the table is one of the examples. Part of a
structural description of this information might be as follows:

If tear production rate = reduced then recommendation = none
Otherwise, if age = young and astigmatic = no then recommendation = soft


Structural descriptions need not necessarily be couched as rules such as these.

Decision trees, which specify the sequences of decisions that need to be made

and the resulting recommendation, are another popular means of expression.
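
As a minimal sketch, the two rules above can be written as an ordered sequence of tests in Python. The function name recommend and its parameter names are assumptions made for illustration; they do not come from the book. Because the rules are only part of a structural description, the sketch returns Python's None (as distinct from the string "none", which means no lenses) for any example that neither rule covers.

```python
from typing import Optional


def recommend(age: str, astigmatic: str, tear_rate: str) -> Optional[str]:
    """Apply the two partial rules in order (hypothetical helper).

    Returns the recommended lens type, or None when neither rule fires.
    Spectacle prescription is not a parameter because neither rule tests it.
    """
    # Rule 1: reduced tear production always means no lenses.
    if tear_rate == "reduced":
        return "none"
    # Rule 2: otherwise, young patients without astigmatism get soft lenses.
    if age == "young" and astigmatic == "no":
        return "soft"
    # Neither rule applies; this partial description is silent here.
    return None


print(recommend("young", "no", "normal"))         # soft
print(recommend("presbyopic", "yes", "reduced"))  # none
print(recommend("presbyopic", "yes", "normal"))   # None
```

The order of the tests matters: the second rule is reached only when the first has not fired, which is what "Otherwise" expresses in the rule text.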

This example is a very simplistic one. First, all combinations of possible
values are represented in the table. There are 24 rows, representing three
possible values of age and two values each for spectacle prescription,
astigmatism, and tear production rate (3 × 2 × 2 × 2 = 24). The rules do not
really generalize from the data; they merely summarize it. In most learning
situations, the set of examples given as input is far from complete, and part
of the job is to generalize to other, new examples. You can imagine omitting
some of the rows in the table for which tear production rate is reduced and
still coming up with the rule

If tear production rate = reduced then recommendation = none

which would generalize to the missing rows and fill them in correctly. Second,

values are specified for all the features in all the examples. Real-life datasets

invariably contain examples in which the values of some features, for some

reason or other, are unknown—for example, measurements were not taken or

were lost. Third, the preceding rules classify the examples correctly, whereas

often, because of errors or noise in the data, misclassifications occur even on the

data that is used to train the classifier.
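
The first point, that the table is complete and that a rule can fill in omitted rows, can be checked with a short Python sketch. The variable names and the choice of which rows to hold out are assumptions made for illustration only. The sketch enumerates every combination of feature values, confirms that there are 24, and then applies the tear production rule to a few rows that we pretend were left out of the table.

```python
from itertools import product

# Feature values as listed in Table 1.1.
AGES = ["young", "pre-presbyopic", "presbyopic"]
PRESCRIPTIONS = ["myope", "hypermetrope"]
ASTIGMATISM = ["no", "yes"]
TEAR_RATES = ["reduced", "normal"]

# Every combination of feature values: 3 * 2 * 2 * 2 of them.
rows = list(product(AGES, PRESCRIPTIONS, ASTIGMATISM, TEAR_RATES))
print(len(rows))  # 24


def tear_rule(row):
    """The single rule from the text: reduced tear production -> 'none'."""
    return "none" if row[3] == "reduced" else None


# Pretend a few of the reduced-tear rows were never observed (arbitrary choice).
reduced_rows = [r for r in rows if r[3] == "reduced"]
held_out = reduced_rows[:4]

# The rule still assigns each omitted row a recommendation of "none",
# which agrees with every reduced-tear row in Table 1.1.
predictions = {r: tear_rule(r) for r in held_out}
for row, lenses in predictions.items():
    print(row, "->", lenses)
```

This only illustrates generalization for one rule on noise-free data; it says nothing about missing feature values or misclassified training examples, the second and third points above.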



Machine learning

Now that we have some idea about the inputs and outputs, let’s turn to machine

learning. What is learning, anyway? What is machine learning? These are
philosophic questions, and we will not be much concerned with philosophy in this

book; our emphasis is firmly on the practical. However, it is worth spending a

few moments at the outset on fundamental issues, just to see how tricky they

are, before rolling up our sleeves and looking at machine learning in practice.

Our dictionary defines “to learn” as follows:

To get knowledge of by study, experience, or being taught;

To become aware by information or from observation;

To commit to memory;

To be informed of, ascertain;

To receive instruction.

1.1 DATA MINING AND MACHINE LEARNING
