Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




sharpness and jaggedness of the boundaries, proximity to other regions, and information about the background in the vicinity of the region. Finally, standard learning techniques are applied to the resulting attribute vectors.

Several interesting problems were encountered. One is the scarcity of training data. Oil slicks are (fortunately) very rare, and manual classification is extremely costly. Another is the unbalanced nature of the problem: of the many dark regions in the training data, only a very small fraction are actual oil slicks. A third is that the examples group naturally into batches, with regions drawn from each image forming a single batch, and background characteristics vary from one batch to another. Finally, the performance task is to serve as a filter, and the user must be provided with a convenient means of varying the false-alarm rate.
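
One simple way to let a user vary the false-alarm rate is to put a threshold on a classifier's score. The sketch below is not taken from the oil-slick system; it assumes a hypothetical classifier that assigns each dark region a score (higher means more slick-like) and a set of scores for regions known not to be slicks. The function names and the sample scores are invented for illustration.

```python
import math


def threshold_for_false_alarm_rate(negative_scores, max_rate):
    """Return a threshold t such that flagging regions with score > t
    raises false alarms on at most `max_rate` of the known negatives."""
    if not negative_scores:
        raise ValueError("need at least one negative example")
    if not 0.0 <= max_rate <= 1.0:
        raise ValueError("max_rate must lie between 0 and 1")
    ordered = sorted(negative_scores, reverse=True)
    # Number of negatives we may flag; the epsilon absorbs float rounding.
    allowed = math.floor(max_rate * len(ordered) + 1e-9)
    if allowed >= len(ordered):
        return float("-inf")  # every region may be flagged
    # At most `allowed` negatives score strictly above this value.
    return ordered[allowed]


def false_alarm_rate(negative_scores, threshold):
    """Fraction of known negatives that would be flagged at `threshold`."""
    flagged = sum(1 for s in negative_scores if s > threshold)
    return flagged / len(negative_scores)


# Hypothetical scores for ten regions known not to be oil slicks.
negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
t = threshold_for_false_alarm_rate(negatives, max_rate=0.2)
print(t, false_alarm_rate(negatives, t))
```

Lowering `max_rate` raises the threshold, so fewer regions are passed to the user; raising it has the opposite effect. With so few true slicks in the data, the rate would in practice be estimated on held-out images rather than on the training batches.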



Load forecasting

In the electricity supply industry, it is important to determine future demand for power as far in advance as possible. If accurate estimates can be made for the maximum and minimum load for each hour, day, month, season, and year, utility companies can make significant economies in areas such as setting the operating reserve, maintenance scheduling, and fuel inventory management.

An automated load forecasting assistant has been operating at a major utility supplier over the past decade to generate hourly forecasts 2 days in advance. The first step was to use data collected over the previous 15 years to create a sophisticated load model manually. This model had three components: base load for the year, load periodicity over the year, and the effect of holidays. To normalize for the base load, the data for each previous year was standardized by subtracting the average load for that year from each hourly reading and dividing by the standard deviation over the year. Electric load shows periodicity at three fundamental frequencies: diurnal, where usage has an early morning minimum and midday and afternoon maxima; weekly, where demand is lower at weekends; and seasonal, where increased demand during winter and summer for heating and cooling, respectively, creates a yearly cycle. Major holidays such as Thanksgiving, Christmas, and New Year’s Day show significant variation from the normal load and are each modeled separately by averaging hourly loads for that day over the past 15 years. Minor official holidays, such as Columbus Day, are lumped together as school holidays and treated as an offset to the normal diurnal pattern. All of these effects are incorporated by reconstructing a year’s load as a sequence of typical days, fitting the holidays in their correct position, and denormalizing the load to account for overall growth.
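
The normalization and holiday-averaging steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the utility's actual code: the text does not say whether the population or sample standard deviation was used (the population form is assumed here), and the function names and sample figures are hypothetical.

```python
from statistics import mean, pstdev


def standardize_year(hourly_loads):
    """Subtract the year's average load and divide by its standard deviation.

    Returns the standardized readings together with the mean and standard
    deviation, so that the transformation can be reversed later.
    """
    mu = mean(hourly_loads)
    sigma = pstdev(hourly_loads)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("load is constant; cannot standardize")
    return [(x - mu) / sigma for x in hourly_loads], mu, sigma


def denormalize(standardized, mu, sigma):
    """Map standardized readings back to load units for a given year."""
    return [z * sigma + mu for z in standardized]


def average_hourly_profile(days):
    """Average several days hour by hour, e.g. one holiday across years."""
    if not days:
        raise ValueError("need at least one day")
    hours = len(days[0])
    if any(len(day) != hours for day in days):
        raise ValueError("all days must have the same number of readings")
    return [mean(day[h] for day in days) for h in range(hours)]


# Tiny hypothetical "year" of four readings, purely to show the round trip.
loads = [100.0, 120.0, 140.0, 160.0]
z, mu, sigma = standardize_year(loads)
restored = denormalize(z, mu, sigma)
```

Passing a later year's expected mean and standard deviation to `denormalize` is one plausible way to account for overall growth; the source does not spell out how the growth adjustment was made.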

Thus far, the load model is a static one, constructed manually from historical data, and implicitly assumes “normal” climatic conditions over the year. The final step was to take weather conditions into account using a technique that locates the previous day most similar to the current circumstances and uses the historical information from that day as a predictor. In this case the prediction is treated as an additive correction to the static load model. To guard against outliers, the eight most similar days are located and their additive corrections averaged. A database was constructed of temperature, humidity, wind speed, and cloud cover at three local weather centers for each hour of the 15-year historical record, along with the difference between the actual load and that predicted by the static model. A linear regression analysis was performed to determine the relative effects of these parameters on load, and the coefficients were used to weight the distance function used to locate the most similar days.
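
The weather adjustment amounts to a nearest-neighbor lookup with a weighted distance. The sketch below is a minimal illustration, not the fielded system: the source does not give the exact form of the distance function, so a weighted Euclidean distance is assumed, with weights taken to be the absolute values of the regression coefficients. The record layout, function names, and sample numbers are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two weather feature vectors.

    Assumption: each weight is a non-negative number, for instance the
    absolute value of the regression coefficient for that feature.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weather_correction(history, today, weights, k=8):
    """Average the additive corrections of the k most similar past days.

    `history` is a list of (features, correction) pairs, where the
    correction is actual load minus the static model's prediction.
    """
    if not history:
        raise ValueError("history is empty")
    nearest = sorted(
        history, key=lambda rec: weighted_distance(rec[0], today, weights)
    )[:k]
    return sum(correction for _, correction in nearest) / len(nearest)


def forecast(static_load, correction):
    """Final forecast: static model output plus the weather correction."""
    return static_load + correction


# Hypothetical one-feature history: (temperature,) -> correction.
history = [((0.0,), 1.0), ((1.0,), 3.0), ((10.0,), 100.0)]
adjustment = weather_correction(history, today=(0.5,), weights=(1.0,), k=2)
print(forecast(500.0, adjustment))
```

Averaging over several neighbors, eight in the system described, keeps a single unusual day such as the third record above from dominating the correction.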

The resulting system yielded the same performance as trained human forecasters but was far quicker, taking seconds rather than hours to generate a daily forecast. Human operators can analyze the forecast’s sensitivity to simulated changes in weather and bring up for examination the “most similar” days that the system used for weather adjustment.

Diagnosis

Diagnosis is one of the principal application areas of expert systems. Although the handcrafted rules used in expert systems often perform well, machine learning can be useful in situations in which producing rules manually is too labor intensive.

Preventative maintenance of electromechanical devices such as motors and generators can forestall failures that disrupt industrial processes. Technicians regularly inspect each device, measuring vibrations at various points to determine whether the device needs servicing. Typical faults include shaft misalignment, mechanical loosening, faulty bearings, and unbalanced pumps. A particular chemical plant uses more than 1000 different devices, ranging from small pumps to very large turbo-alternators, which until recently were diagnosed by a human expert with 20 years of experience. Faults are identified by measuring vibrations at different places on the device’s mounting and using Fourier analysis to check the energy present in three different directions at each harmonic of the basic rotation speed. This information, which is very noisy because of limitations in the measurement and recording procedure, is studied by the expert to arrive at a diagnosis. Although handcrafted expert system rules had been developed for some situations, the elicitation process would have to be repeated several times for different types of machinery; so a learning approach was investigated.
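
To make the Fourier step concrete, the sketch below estimates the energy of a vibration signal at each harmonic of the rotation speed by correlating the samples with a complex exponential at that frequency. It is an illustration only: the source does not describe the plant's signal processing, and the normalization (squared amplitude of each harmonic), the function name, and the synthetic signal are assumptions. It handles one measurement direction; the three directions mentioned above would each be processed the same way.

```python
import cmath
import math


def harmonic_energies(samples, sample_rate, rotation_hz, n_harmonics):
    """Estimate the squared amplitude at harmonics 1..n of the rotation speed.

    Each harmonic is evaluated with a single-frequency discrete Fourier sum.
    The estimate is exact only when the record spans a whole number of
    periods of each harmonic; otherwise some energy leaks between harmonics.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    energies = []
    for h in range(1, n_harmonics + 1):
        freq = h * rotation_hz
        coeff = sum(
            x * cmath.exp(-2j * math.pi * freq * i / sample_rate)
            for i, x in enumerate(samples)
        )
        amplitude = 2.0 * abs(coeff) / n  # amplitude of a real sinusoid
        energies.append(amplitude ** 2)
    return energies


# Synthetic one-second signal: a 50 Hz rotation plus a weaker 2nd harmonic.
fs = 1000
signal = [
    math.sin(2 * math.pi * 50 * i / fs) + 0.5 * math.sin(2 * math.pi * 100 * i / fs)
    for i in range(fs)
]
energies = harmonic_energies(signal, fs, rotation_hz=50.0, n_harmonics=3)
```

The resulting list, one value per harmonic and direction, is the kind of attribute vector that could be paired with the expert's diagnosis to form a training example.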

Six hundred faults, each comprising a set of measurements along with the expert’s diagnosis, were available, representing 20 years of experience in the field. About half were unsatisfactory for various reasons and had to be discarded; the remainder were used as training examples. The goal was not to determine


