Automation of Data Mining Using Integration Services


Create Data Flow tasks to Archive the Results




Remember that you created separate predictions for the data values, Amount and Quantity, to avoid dealing with a large number of nulls in the results. Next, the two result sets are merged back together for reporting, producing a table like the following.

Job ID | Time executed | Series      | Time slice    | Prediction type | Predicted value | StDev | Variance
-------|---------------|-------------|---------------|-----------------|-----------------|-------|---------
012    | 2010-09-20    | M200 Europe | January 2012  | Sales Amount    | 283880          | nnn   | nnn
012    | 2010-09-20    | M200 Europe | February 2012 | Sales Amount    | 251225          | nnn   | nnn
012    | 2010-09-20    | M200 Europe | January 2012  | Sales Quantity  | 507             | nnn   | nnn
012    | 2010-09-20    | M200 Europe | February 2012 | Sales Quantity  | 758             | nnn   | nnn
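A table with this shape could be created with DDL along the following lines. This is only a sketch: the table name ArchivedPredictions matches the one used later in this section, but the column names and types are assumptions, since the whitepaper does not prescribe them.

```sql
-- Hypothetical DDL for the reporting archive table.
-- Column names and types are assumptions, not taken from the whitepaper.
CREATE TABLE ArchivedPredictions (
    JobID          int           NULL,      -- optional job metadata
    TimeExecuted   datetime      NULL,      -- when the predictions were generated
    Series         nvarchar(100) NULL,      -- e.g. 'M200 Europe'
    TimeSlice      datetime      NOT NULL,  -- the predicted time period
    PredictionType nvarchar(50)  NOT NULL,  -- 'Sales Amount' or 'Sales Quantity'
    PredictedValue float         NULL,
    StDev          float         NULL,
    Variance       float         NULL
);
```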

You can also add any extra metadata that might be useful later, such as the date the predictions were generated, a job ID, and so forth.

Let’s take another look at the DMX query statements used to generate the predictions.

SELECT
    $TIME AS NewTime,
    Amount AS NewValue,
    PredictStDev([Amount]) AS ValueStDev,
    PredictVariance([Amount]) AS ValueVariance
FROM PredictTimeSeries

By default, the column names generated in a prediction query are based on the predictable column name, so they would be something like PredictAmount and PredictQuantity. However, you can use a column alias in the output (here, it is NewValue) so that both result sets share the same column names, which makes the predicted values easier to combine.
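For example, a companion query for the Quantity series might alias its columns the same way, so that the two result sets line up column for column. The query below follows the pattern of the Amount query above, but the specific model and column names are assumptions:

```sql
-- Hypothetical companion DMX query for the Quantity series.
-- Aliasing Quantity as NewValue makes this result set match the
-- Amount query, so the two can be merged on identical column names.
SELECT
    $TIME AS NewTime,
    [Quantity] AS NewValue,
    PredictStDev([Quantity]) AS ValueStDev,
    PredictVariance([Quantity]) AS ValueVariance
FROM PredictTimeSeries
```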



Again, because Integration Services is so flexible, there are lots of ways you might accomplish this task:

  • Store results in memory and merge them before writing to the archive table.

  • Store the results in different columns, one for each prediction type.

  • Write the results to temporary tables and merge them later.

  • Use the Integration Services raw file format to quickly write out and then read the interim results.

However, in this scenario, you want to verify the prediction data that is generated by each query. So you use the following approach:

  • Write predictions to a temporary table.

  • Use an OLE DB Source component to get the predictions that were written to the temporary table.

  • Use a Derived Column transformation to clean up the data and add some simple metadata.

  • Save the results to the archive table that is used for reporting on all models.
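Because the point of staging through tmpPredictionResults is to verify each query's output, a quick sanity check can be run against the temporary table between the prediction and archive steps. The query below is only a sketch; the column names follow the aliases used in the DMX query earlier, and what you actually check is up to you:

```sql
-- Illustrative sanity check on the staged predictions before archiving.
SELECT
    COUNT(*)     AS PredictionRows,
    MIN(NewTime) AS FirstTimeSlice,
    MAX(NewTime) AS LastTimeSlice
FROM tmpPredictionResults;
```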

The graphic illustrates the overall task flow within each Data Flow task.


Create Data Flow tasks (Archive Results Qty, Archive Results Amt)


  1. Within the loop Predict Foreach Model, create two Data Flow tasks and name them Archive Results Qty and Archive Results Amt.

  2. Connect each Data Flow task to its related Data Mining Query task, in the order shown in the earlier Control Flow diagram for Package 3.

Note: You must have these tasks in a sequence, because they use the same temporary table and archive table. If Integration Services executes the tasks in parallel, the processes could create conflicts when attempting to access the same table.

  3. In each Data Flow task, add the following three components:

  • An OLE DB source that reads from tmpPredictionResults

  • A Derived Column transformation as defined in the following table

  • An OLE DB destination that writes to the table ArchivedPredictions

  4. Create expressions in each Derived Column transformation to generate the data for the new columns, as follows.

Task name           | Derived column name | Data type | Value
--------------------|---------------------|-----------|---------
Archive Results Qty | PredictionDate      | datetime  | GETDATE()
Archive Results Qty | PredictedValue      | string    | Quantity
Archive Results Amt | PredictionDate      | datetime  | GETDATE()
Archive Results Amt | PredictedValue      | string    | Amount
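Taken together, each data flow behaves roughly like a single INSERT … SELECT: read the staged rows, append the derived metadata, and write to the archive. A hypothetical T-SQL equivalent for the Amount flow is sketched below; the destination column names are illustrative, and JobID is omitted for brevity:

```sql
-- Hypothetical T-SQL equivalent of the Archive Results Amt data flow.
-- The SSIS Source / Derived Column / Destination chain achieves the
-- same effect row by row.
INSERT INTO ArchivedPredictions
    (TimeSlice, PredictedValue, StDev, Variance, TimeExecuted, PredictionType)
SELECT
    NewTime,
    NewValue,
    ValueStDev,
    ValueVariance,
    GETDATE(),     -- corresponds to the PredictionDate derived column
    'Amount'       -- tags which series these rows belong to
FROM tmpPredictionResults;
```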

Tip: Isolating the data flows for each prediction type has another advantage: it makes the package much easier to modify later. For example, you might decide that there is no good reason to create a separate prediction for quantity. Instead of editing your query or the output, you can simply disable that part of the package, and it will still run without modification; you just won't have predictions for Quantity.
