Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




whether or not a fault existed, but to diagnose the kind of fault, given that one

was there. Thus there was no need to include fault-free cases in the training set.

The measured attributes were rather low level and had to be augmented by

intermediate concepts, that is, functions of basic attributes, which were defined in

consultation with the expert and embodied some causal domain knowledge.

The derived attributes were run through an induction algorithm to produce a

set of diagnostic rules. Initially, the expert was not satisfied with the rules

because he could not relate them to his own knowledge and experience. For

him, mere statistical evidence was not, by itself, an adequate explanation.

Further background knowledge had to be used before satisfactory rules were

generated. Although the resulting rules were quite complex, the expert liked

them because he could justify them in light of his mechanical knowledge. He

was pleased that a third of the rules coincided with ones he used himself and

was delighted to gain new insight from some of the others.

Performance tests indicated that the learned rules were slightly superior to

the handcrafted ones that had previously been elicited from the expert, and this

result was confirmed by subsequent use in the chemical factory. It is interesting

to note, however, that the system was put into use not because of its good

performance but because the domain expert approved of the rules that had been

learned.


Marketing and sales

Some of the most active applications of data mining have been in the area of

marketing and sales. These are domains in which companies possess massive

volumes of precisely recorded data that, as has only recently been realized,

is potentially extremely valuable. In these applications, predictions

themselves are the chief interest: the structure of how decisions are made is often

completely irrelevant.

We have already mentioned the problem of fickle customer loyalty and the

challenge of detecting customers who are likely to defect so that they can be

wooed back into the fold by giving them special treatment. Banks were early

adopters of data mining technology because of their successes in the use of

machine learning for credit assessment. Data mining is now being used to

reduce customer attrition by detecting changes in individual banking patterns

that may herald a change of bank or even life changes—such as a move to

another city—that could result in a different bank being chosen. It may reveal,

for example, a group of customers with above-average attrition rate who do

most of their banking by phone after hours when telephone response is slow.

Data mining may determine groups for whom new services are appropriate,

such as a cluster of profitable, reliable customers who rarely get cash advances

from their credit card except in November and December, when they are

prepared to pay exorbitant interest rates to see them through the holiday season. In

another domain, cellular phone companies fight churn by detecting patterns of

behavior that could benefit from new services, and then advertise such services

to retain their customer base. Incentives provided specifically to retain existing

customers can be expensive, and successful data mining allows them to be

precisely targeted to those customers where they are likely to yield maximum benefit.



Market basket analysis is the use of association techniques to find groups of

items that tend to occur together in transactions, typically supermarket

checkout data. For many retailers this is the only source of sales information that is

available for data mining. For example, automated analysis of checkout data

may uncover the fact that customers who buy beer also buy chips, a discovery

that could be significant from the supermarket operator’s point of view

(although rather an obvious one that probably does not need a data mining

exercise to discover). Or it may come up with the fact that on Thursdays,

customers often purchase diapers and beer together, an initially surprising result

that, on reflection, makes some sense as young parents stock up for a weekend

at home. Such information could be used for many purposes: planning store

layouts, limiting special discounts to just one of a set of items that tend to be

purchased together, offering coupons for a matching product when one of them

is sold alone, and so on. There is enormous added value in being able to

identify individual customers' sales histories. In fact, this value is leading to a

proliferation of discount cards or “loyalty” cards that allow retailers to identify

individual customers whenever they make a purchase; the personal data that

results will be far more valuable than the cash value of the discount.

Identification of individual customers not only allows historical analysis of purchasing

patterns but also permits precisely targeted special offers to be mailed out to

prospective customers.
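
The association idea behind market basket analysis can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the method used by any particular retailer or tool: the transactions are invented, the function name pair_rules and the min_support threshold are hypothetical, and only pairs of items are considered. For each pair that appears together often enough, it reports the support (the fraction of baskets containing both items) and the confidence (the fraction of baskets containing the first item that also contain the second).

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout data: each transaction is the set of items in one basket.
transactions = [
    {"beer", "chips", "diapers"},
    {"beer", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"chips", "milk"},
]


def pair_rules(transactions, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # sorted() gives each pair a canonical order so (a, b) and (b, a) share one key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data the strongest rule is that baskets containing diapers also contain beer. Real checkout data would involve far more items and larger itemsets, which is where dedicated association-rule algorithms become necessary.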

This brings us to direct marketing, another popular domain for data mining.

Promotional offers are expensive and have an extremely low—but highly 

profitable—response rate. Any technique that allows a promotional mailout to

be more tightly focused, achieving the same or nearly the same response from

a much smaller sample, is valuable. Commercially available databases

containing demographic information based on ZIP codes that characterize the

associated neighborhood can be correlated with information on existing customers

to find a socioeconomic model that predicts what kind of people will turn out

to be actual customers. This model can then be used on information gained in

response to an initial mailout, where people send back a response card or call

an 800 number for more information, to predict likely future customers. Direct

mail companies have the advantage over shopping-mall retailers of having

complete purchasing histories for each individual customer and can use data mining

to determine those likely to respond to special offers. Targeted campaigns are

cheaper than mass-marketed campaigns because companies save money by




