1.2 SIMPLE EXAMPLES: THE WEATHER PROBLEM AND OTHERS 19
(a)
wage increase first year
  ≤ 2.5: bad
  > 2.5: statutory holidays
    > 10: good
    ≤ 10: wage increase first year
      ≤ 4: bad
      > 4: good

(b)
wage increase first year
  ≤ 2.5: working hours per week
    ≤ 36: bad
    > 36: health plan contribution
      none: bad
      half: good
      full: bad
  > 2.5: statutory holidays
    > 10: good
    ≤ 10: wage increase first year
      ≤ 4: bad
      > 4: good

Figure 1.3 Decision trees for the labor negotiations data.
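A decision tree of this kind can be read as a sequence of nested tests. The following is a minimal sketch of tree (a) only, assuming its branches are: first-year wage increase ≤ 2.5 gives bad; otherwise more than 10 statutory holidays gives good; otherwise a first-year wage increase above 4 gives good and anything else bad. The function name, argument names, and the sample inputs are hypothetical and are not taken from the labor negotiations data.

```python
def classify_contract(wage_increase_first_year, statutory_holidays):
    """Hypothetical encoding of tree (a) in Figure 1.3 as nested tests."""
    # Root test: a small first-year wage increase is classified as bad.
    if wage_increase_first_year <= 2.5:
        return "bad"
    # Larger increase: many statutory holidays make the contract good.
    if statutory_holidays > 10:
        return "good"
    # Ten or fewer holidays: test the first-year wage increase again.
    return "good" if wage_increase_first_year > 4 else "bad"


# Made-up inputs, chosen only to exercise each leaf of the tree.
examples = [(2.0, 12), (3.0, 11), (3.0, 10), (4.5, 10)]
labels = [classify_contract(w, h) for w, h in examples]
```

Each path from the root to a leaf corresponds to one branch of the conditional, which is why the same attribute can be tested twice along a single path.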
P088407-Ch001.qxd 4/30/05 11:11 AM Page 19
680 examples, each representing a diseased plant. Plants were measured on 35
attributes, each one having a small set of possible values. Examples are labeled
with the diagnosis of an expert in plant biology: there are 19 disease categories
altogether, with horrible-sounding diseases such as diaporthe stem canker,
rhizoctonia root rot, and bacterial blight, to mention just a few.
Table 1.7 gives the attributes, the number of different values that each can
have, and a sample record for one particular plant. The attributes are placed into
different categories just to make them easier to read.
Here are two example rules, learned from this data:
If   [leaf condition is normal and
      stem condition is abnormal and
      stem cankers is below soil line and
      canker lesion color is brown]
then
      diagnosis is rhizoctonia root rot

If   [leaf malformation is absent and
      stem condition is abnormal and
      stem cankers is below soil line and
      canker lesion color is brown]
then
      diagnosis is rhizoctonia root rot
These rules nicely illustrate the potential role of prior knowledge—often called
domain knowledge—in machine learning, because the only difference between
the two descriptions is leaf condition is normal versus leaf malformation is
absent. Now, in this domain, if the leaf condition is normal then leaf malformation
is necessarily absent, so one of these conditions happens to be a special
case of the other. Thus whenever the first rule applies, the second necessarily
applies as well. The only time the second rule comes into play is when leaf malformation
is absent but leaf condition is not normal, that is, when something other than
malformation is wrong with the leaf. This is certainly not apparent from a casual
reading of the rules.
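The relationship between the two rules can be checked mechanically once the domain knowledge is written down. The sketch below encodes both rules as predicates and enumerates the possible leaf states; it assumes that leaf condition takes the values normal and abnormal and that leaf malformation takes the values absent and present (the value name "present" is an assumption, as are all function and variable names).

```python
from itertools import product


def rule1(p):
    # First rule: tests "leaf condition is normal".
    return (p["leaf condition"] == "normal"
            and p["stem condition"] == "abnormal"
            and p["stem cankers"] == "below soil line"
            and p["canker lesion color"] == "brown")


def rule2(p):
    # Second rule: tests "leaf malformation is absent" instead.
    return (p["leaf malformation"] == "absent"
            and p["stem condition"] == "abnormal"
            and p["stem cankers"] == "below soil line"
            and p["canker lesion color"] == "brown")


def consistent(p):
    # Domain knowledge: a normal leaf cannot be malformed.
    return not (p["leaf condition"] == "normal"
                and p["leaf malformation"] == "present")


# Fix the stem attributes so that only the leaf attributes vary.
base = {"stem condition": "abnormal",
        "stem cankers": "below soil line",
        "canker lesion color": "brown"}
plants = [dict(base, **{"leaf condition": c, "leaf malformation": m})
          for c, m in product(["normal", "abnormal"], ["absent", "present"])]
valid = [p for p in plants if consistent(p)]

# Under the domain constraint, every plant matched by rule 1 is matched by rule 2.
subsumed = all(rule2(p) for p in valid if rule1(p))
# Plants for which only the second rule fires.
only_rule2 = [p for p in valid if rule2(p) and not rule1(p)]
```

Under these assumptions the only case in which the second rule fires alone is an abnormal leaf with no malformation, which matches the reading given in the text.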
Research on this problem in the late 1970s found that these diagnostic rules
could be generated by a machine learning algorithm, along with rules for every
other disease category, from about 300 training examples. These training
examples were carefully selected from the corpus of cases as being quite different
from one another, that is, “far apart” in the example space. At the same time, the
plant pathologist who had produced the diagnoses was interviewed, and his
expertise was translated into diagnostic rules. Surprisingly, the computer-
generated rules outperformed the expert-derived rules on the remaining test
examples. They gave the correct disease top ranking 97.5% of the time, compared
with only 72% for the expert-derived rules. Furthermore, not only did
20 CHAPTER 1 | WHAT’S IT ALL ABOUT?
Table 1.7 The soybean data.

              Attribute                  Number of values   Sample value

Environment   time of occurrence         7                  July
              precipitation              3                  above normal
              temperature                3                  normal
              cropping history           4                  same as last year
              hail damage                2                  yes
              damaged area               4                  scattered
              severity                   3                  severe
              plant height               2                  normal
              plant growth               2                  abnormal
              seed treatment             3                  fungicide
              germination                3                  less than 80%

Seed          condition                  2                  normal
              mold growth                2                  absent
              discoloration              2                  absent
              size                       2                  normal
              shriveling                 2                  absent

Fruit         condition of fruit pods    3                  normal
              fruit spots                5                  —

Leaf          condition                  2                  abnormal
              leaf spot size             3                  —
              yellow leaf spot halo      3                  absent
              leaf spot margins          3                  —
              shredding                  2                  absent
              leaf malformation          2                  absent
              leaf mildew growth         3                  absent

Stem          condition                  2                  abnormal
              stem lodging               2                  yes
              stem cankers               4                  above soil line
              canker lesion color        3                  —
              fruiting bodies on stems   2                  present
              external decay of stem     3                  firm and dry
              mycelium on stem           2                  absent
              internal discoloration     3                  none
              sclerotia                  2                  absent

Root          condition                  3                  normal

Diagnosis                                19                 diaporthe stem canker