Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




1.2 SIMPLE EXAMPLES: THE WEATHER PROBLEM AND OTHERS

Figure 1.3  Decision trees for the labor negotiations data, panels (a) and (b).
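Each tree in Figure 1.3 is a nested series of attribute tests, so it can be transcribed directly into code. The following is a minimal Python sketch of both panels. It assumes that the left branch of every numeric test means ≤ and the right branch means >.

- Panel (a) tests wage increase first year against 2.5, then statutory holidays against 10, then wage increase first year again against 4.
- Panel (b) replaces the "bad" leaf for low first-year wage increases with two further tests: working hours per week (36) and health plan contribution (none, half, full).

The function and parameter names are hypothetical and chosen only for illustration.

```python
"""Figure 1.3 decision trees, transcribed as nested tests (illustrative sketch)."""


def tree_a(wage_increase_first_year: float, statutory_holidays: float) -> str:
    """Panel (a): classify a labor contract as 'good' or 'bad'."""
    if wage_increase_first_year <= 2.5:
        return "bad"
    if statutory_holidays > 10:
        return "good"
    # Few statutory holidays: the first-year wage increase is tested again,
    # this time against a higher threshold.
    return "bad" if wage_increase_first_year <= 4 else "good"


def tree_b(
    wage_increase_first_year: float,
    working_hours_per_week: float,
    health_plan_contribution: str,  # assumed to be one of "none", "half", "full"
    statutory_holidays: float,
) -> str:
    """Panel (b): identical to (a) except in the low-wage-increase branch."""
    if wage_increase_first_year <= 2.5:
        if working_hours_per_week <= 36:
            return "bad"
        # Nominal attribute: one branch per value, expressed as a lookup table.
        return {"none": "bad", "half": "good", "full": "bad"}[health_plan_contribution]
    if statutory_holidays > 10:
        return "good"
    return "bad" if wage_increase_first_year <= 4 else "good"


if __name__ == "__main__":
    # A low first-year raise with long hours and a half health plan contribution:
    # tree (a) says bad, tree (b) says good.
    print(tree_a(2.0, 11), tree_b(2.0, 38, "half", 11))  # prints: bad good
```

The nominal test on health plan contribution maps naturally to a dictionary lookup with one entry per branch. Comparing the two functions makes it plain that the trees disagree only when the first-year wage increase is at most 2.5.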





680 examples, each representing a diseased plant. Plants were measured on 35 attributes, each one having a small set of possible values. Examples are labeled with the diagnosis of an expert in plant biology: there are 19 disease categories altogether, including horrible-sounding diseases such as diaporthe stem canker, rhizoctonia root rot, and bacterial blight, to mention just a few.

Table 1.7 gives the attributes, the number of different values that each can have, and a sample record for one particular plant. The attributes are placed into different categories just to make them easier to read.

Here are two example rules, learned from this data:

If   [leaf condition is normal and
      stem condition is abnormal and
      stem cankers is below soil line and
      canker lesion color is brown]
then
      diagnosis is rhizoctonia root rot

If   [leaf malformation is absent and
      stem condition is abnormal and
      stem cankers is below soil line and
      canker lesion color is brown]
then
      diagnosis is rhizoctonia root rot

These rules nicely illustrate the potential role of prior knowledge (often called domain knowledge) in machine learning. The only difference between the two descriptions is “leaf condition is normal” versus “leaf malformation is absent.” In this domain, if the leaf condition is normal then leaf malformation is necessarily absent, so one of these conditions happens to be a special case of the other. Thus whenever the first rule applies, the second necessarily applies as well. The only time the second rule comes into play on its own is when leaf malformation is absent but leaf condition is not normal, that is, when something other than malformation is wrong with the leaf. This is certainly not apparent from a casual reading of the rules.
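The subsumption argument above can be checked mechanically. The Python sketch below does three things:

- It represents each rule as a conjunction of attribute = value tests.
- It encodes the domain constraint that a normal leaf cannot be malformed.
- It enumerates the admissible combinations of the two leaf attributes, holding the three shared stem tests fixed.

It assumes the second value of leaf malformation is named "present". That value name and all helper names are hypothetical.

```python
from itertools import product

# The three tests shared by both rules.
SHARED = {
    "stem condition": "abnormal",
    "stem cankers": "below soil line",
    "canker lesion color": "brown",
}
# Each rule is a conjunction of attribute = value tests; both predict
# "rhizoctonia root rot".
RULE_1 = {"leaf condition": "normal", **SHARED}
RULE_2 = {"leaf malformation": "absent", **SHARED}


def fires(rule: dict[str, str], record: dict[str, str]) -> bool:
    """True if every attribute test in the rule matches the record."""
    return all(record.get(attr) == value for attr, value in rule.items())


def consistent(record: dict[str, str]) -> bool:
    """Domain knowledge: if the leaf condition is normal, malformation is absent."""
    return not (
        record["leaf condition"] == "normal"
        and record["leaf malformation"] == "present"  # "present" is an assumed value name
    )


def leaf_cases() -> list[dict[str, str]]:
    """All admissible leaf-attribute combinations, with the shared tests satisfied."""
    cases = []
    for condition, malformation in product(["normal", "abnormal"], ["absent", "present"]):
        record = {"leaf condition": condition, "leaf malformation": malformation, **SHARED}
        if consistent(record):
            cases.append(record)
    return cases


if __name__ == "__main__":
    cases = leaf_cases()
    # Wherever rule 1 fires, rule 2 fires too (prints True).
    print(all(fires(RULE_2, r) for r in cases if fires(RULE_1, r)))
    # Rule 2 adds coverage only for a leaf that is abnormal but not malformed.
    for r in cases:
        if fires(RULE_2, r) and not fires(RULE_1, r):
            print(r["leaf condition"], r["leaf malformation"])  # prints: abnormal absent
```

Removing the `consistent` filter admits a normal but malformed leaf. For that record rule 1 fires and rule 2 does not, so the subsumption holds only because of the domain knowledge.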

Research on this problem in the late 1970s found that these diagnostic rules could be generated by a machine learning algorithm, along with rules for every other disease category, from about 300 training examples. These training examples were carefully selected from the corpus of cases as being quite different from one another: “far apart” in the example space. At the same time, the plant pathologist who had produced the diagnoses was interviewed, and his expertise was translated into diagnostic rules. Surprisingly, the computer-generated rules outperformed the expert-derived rules on the remaining test examples. They gave the correct disease top ranking 97.5% of the time compared with only 72% for the expert-derived rules. Furthermore, not only did

CHAPTER 1 | WHAT’S IT ALL ABOUT?



Table 1.7  The soybean data.

             Attribute                 Number of values  Sample value
Environment  time of occurrence        7                 July
             precipitation             3                 above normal
             temperature               3                 normal
             cropping history          4                 same as last year
             hail damage               2                 yes
             damaged area              4                 scattered
             severity                  3                 severe
             plant height              2                 normal
             plant growth              2                 abnormal
             seed treatment            3                 fungicide
             germination               3                 less than 80%
Seed         condition                 2                 normal
             mold growth               2                 absent
             discoloration             2                 absent
             size                      2                 normal
             shriveling                2                 absent
Fruit        condition of fruit pods   3                 normal
             fruit spots               5
Leaf         condition                 2                 abnormal
             leaf spot size            3
             yellow leaf spot halo     3                 absent
             leaf spot margins         3
             shredding                 2                 absent
             leaf malformation         2                 absent
             leaf mildew growth        3                 absent
Stem         condition                 2                 abnormal
             stem lodging              2                 yes
             stem cankers              4                 above soil line
             canker lesion color       3
             fruiting bodies on stems  2                 present
             external decay of stem    3                 firm and dry
             mycelium on stem          2                 absent
             internal discoloration    3                 none
             sclerotia                 2                 absent
Root         condition                 3                 normal
Diagnosis                              19                diaporthe stem canker
