These meanings have some shortcomings when it comes to talking about
computers. For the first two, it is virtually impossible to test whether learning has
been achieved or not. How do you know whether a machine has got knowledge
of something? You probably can’t just ask it questions; even if you could, you
wouldn’t be testing its ability to learn but would be testing its ability to answer
questions. How do you know whether it has become aware of something? The
whole question of whether computers can be aware, or conscious, is a burning
philosophic issue. As for the last three meanings, although we can see what they
denote in human terms, merely “committing to memory” and “receiving
instruction” seem to fall far short of what we might mean by machine learning.
They are too passive, and we know that computers find these tasks trivial.
Instead, we are interested in improvements in performance, or at least in the
potential for performance, in new situations. You can “commit something to
memory” or “be informed of something” by rote learning without being able to
apply the new knowledge to new situations. You can receive instruction without
benefiting from it at all.
Earlier we defined data mining operationally as the process of discovering
patterns, automatically or semiautomatically, in large quantities of data—and
the patterns must be useful. An operational definition can be formulated in the
same way for learning:
Things learn when they change their behavior in a way that makes them
perform better in the future.
This ties learning to performance rather than knowledge. You can test learning
by observing the behavior and comparing it with past behavior. This is a much
more objective kind of definition and appears to be far more satisfactory.
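The operational definition suggests a concrete procedure: measure how well something performs on new cases, let it gain some experience, and measure again on the same new cases. The following is a minimal Python sketch of that test, assuming a hypothetical toy learner (`MajorityLearner`) that simply predicts the most frequent class it has seen, along with invented yes/no labels. By this test alone the toy learner counts as having learned, because its behavior changed and its performance improved.

```python
from collections import Counter


class MajorityLearner:
    """Hypothetical toy learner: predicts the most frequent class seen so far."""

    def __init__(self, default):
        self.default = default      # behavior before any experience
        self.counts = Counter()     # class frequencies observed so far

    def observe(self, label):
        self.counts[label] += 1     # experience changes the internal state

    def predict(self):
        if not self.counts:
            return self.default
        return self.counts.most_common(1)[0][0]


def accuracy(learner, labels):
    """Fraction of new cases whose class the learner predicts correctly."""
    guess = learner.predict()
    return sum(1 for y in labels if y == guess) / len(labels)


# Invented class labels, for illustration only
training = ["yes", "yes", "no", "yes"]
new_cases = ["yes", "no", "yes", "yes"]

learner = MajorityLearner(default="no")
before = accuracy(learner, new_cases)   # 1 of 4 correct
for label in training:
    learner.observe(label)
after = accuracy(learner, new_cases)    # 3 of 4 correct
print(f"before={before:.2f} after={after:.2f} improved={after > before}")
```

The comparison is purely behavioral: nothing in the test asks whether the learner "knows" anything, only whether it now does better than it did before.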
But there’s still a problem. Learning is a rather slippery concept. Lots of things
change their behavior in ways that make them perform better in the future, yet
we wouldn’t want to say that they have actually learned. A good example is a
comfortable slipper. Has it learned the shape of your foot? It has certainly
changed its behavior to make it perform better as a slipper! Yet we would hardly
want to call this learning. In everyday language, we often use the word “training”
to denote a mindless kind of learning. We train animals and even plants,
although it would be stretching the word a bit to talk of training objects such
as slippers that are not in any sense alive. But learning is different. Learning
implies thinking. Learning implies purpose. Something that learns has to do so
intentionally. That is why we wouldn’t say that a vine has learned to grow round
a trellis in a vineyard—we’d say it has been trained. Learning without purpose
is merely training. Or, more to the point, in learning the purpose is the learner’s,
whereas in training it is the teacher’s.
Thus on closer examination the second definition of learning, in operational,
performance-oriented terms, has its own problems when it comes to talking about
8
CHAPTER 1 | WHAT’S IT ALL ABOUT?
P088407-Ch001.qxd 4/30/05 11:11 AM Page 8
computers. To decide whether something has actually learned, you need to see
whether it intended to or whether there was any purpose involved. That makes
the concept moot when applied to machines because whether artifacts can behave
purposefully is unclear. Philosophic discussions of what is really meant by “learning,”
like discussions of what is really meant by “intention” or “purpose,” are
fraught with difficulty. Even courts of law find intention hard to grapple with.
Data mining
Fortunately, the kind of learning techniques explained in this book do not
present these conceptual problems—they are called machine learning without
really presupposing any particular philosophic stance about what learning actually
is. Data mining is a practical topic and involves learning in a practical, not
a theoretical, sense. We are interested in techniques for finding and describing
structural patterns in data as a tool for helping to explain that data and make
predictions from it. The data will take the form of a set of examples—examples
of customers who have switched loyalties, for instance, or situations in which
certain kinds of contact lenses can be prescribed. The output takes the form of
predictions about new examples—a prediction of whether a particular customer
will switch or a prediction of what kind of lens will be prescribed under given
circumstances. But because this book is about finding and describing patterns
in data, the output may also include an actual description of a structure that
can be used to classify unknown examples and to explain the decision. As well as
performance, it is helpful to supply an explicit representation of the knowledge
that is acquired. In essence, this reflects both definitions of learning considered
previously: the acquisition of knowledge and the ability to use it.
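To make the two kinds of output concrete, here is a minimal Python sketch under stated assumptions: a handful of invented customer records, hypothetical attributes `contract` and `complaints`, and a deliberately simple learner, chosen only for illustration, that picks the single attribute whose value-by-value majority classes make the fewest errors on the training examples. The function names `learn_one_attribute_rule`, `describe`, and `predict` are likewise hypothetical. The learner yields both a readable rule set (the structural description) and a way to predict the class of a new example.

```python
from collections import Counter, defaultdict


def learn_one_attribute_rule(examples, attributes, target):
    """Pick the single attribute whose per-value majority classes make the
    fewest errors on the training examples (a simple illustrative learner)."""
    best = None
    for attr in attributes:
        # Count how often each class occurs with each value of this attribute
        by_value = defaultdict(Counter)
        for ex in examples:
            by_value[ex[attr]][ex[target]] += 1
        # Assign each value its most frequent class
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in by_value.items()}
        # Training errors: examples not in the majority class for their value
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


def describe(attr, rule, target):
    """Render the learned structure as human-readable if-then rules."""
    return [f"if {attr} = {value} then {target} = {cls}"
            for value, cls in rule.items()]


def predict(attr, rule, example):
    """Classify a new example; returns None for a value never seen in training."""
    return rule.get(example[attr])


# Invented customer records, for illustration only
examples = [
    {"contract": "monthly", "complaints": "many", "switched": "yes"},
    {"contract": "monthly", "complaints": "few", "switched": "yes"},
    {"contract": "annual", "complaints": "many", "switched": "yes"},
    {"contract": "annual", "complaints": "few", "switched": "no"},
    {"contract": "annual", "complaints": "few", "switched": "no"},
    {"contract": "monthly", "complaints": "few", "switched": "no"},
]

attr, rule, errors = learn_one_attribute_rule(
    examples, ["contract", "complaints"], "switched")
for line in describe(attr, rule, "switched"):
    print(line)  # the structural description
print(predict(attr, rule, {"contract": "annual", "complaints": "many"}))  # a prediction
```

On these records the sketch prints the rules `if complaints = many then switched = yes` and `if complaints = few then switched = no`, followed by the prediction `yes` for a customer who is not in the training data. The rules are the explicit knowledge; the last line is the performance.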
Many learning techniques look for structural descriptions of what is learned,
descriptions that can become fairly complex and are typically expressed as sets
of rules such as the ones described previously or the decision trees described
later in this chapter. Because they can be understood by people, these descriptions
serve to explain what has been learned and the basis for new predictions.
Experience shows that in many applications of machine learning to
data mining, the explicit knowledge structures that are acquired, the structural
descriptions, are at least as important, and often very much more important,
than the ability to perform well on new examples. People frequently use data
mining to gain knowledge, not just predictions. Gaining knowledge from data
certainly sounds like a good idea if you can do it. To find out how, read on!
1.2 Simple examples: The weather problem and others
We use a lot of examples in this book, which seems particularly appropriate considering
that the book is all about learning from examples! There are several