Automation of Data Mining Using Integration Services


Create Data Flow tasks to Archive the Results




Remember that you created separate predictions for the data values, Amount and Quantity, to avoid dealing with a large number of nulls in the results. Next, the two result sets are merged back together for reporting, producing a table like the following.

Job ID | Time executed | Series      | Time slice    | Prediction type | Predicted value | StDev | Variance
-------|---------------|-------------|---------------|-----------------|-----------------|-------|---------
012    | 2010-09-20    | M200 Europe | January 2012  | Sales Amount    | 283880          | nnn   | nnn
012    | 2010-09-20    | M200 Europe | February 2012 | Sales Amount    | 251225          | nnn   | nnn
012    | 2010-09-20    | M200 Europe | January 2012  | Sales Quantity  | 507             | nnn   | nnn
012    | 2010-09-20    | M200 Europe | February 2012 | Sales Quantity  | 758             | nnn   | nnn
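A table with this shape could be created with DDL along the following lines. This is only a sketch: the table name ArchivedPredictions matches the one used later in this section, but the column names and types are assumptions, since the whitepaper does not prescribe them.

```sql
-- Hypothetical DDL for the reporting archive table.
-- Column names and types are assumptions, not taken from the whitepaper.
CREATE TABLE ArchivedPredictions (
    JobID          int           NULL,      -- optional job metadata
    TimeExecuted   datetime      NULL,      -- when the predictions were generated
    Series         nvarchar(100) NULL,      -- e.g. 'M200 Europe'
    TimeSlice      datetime      NOT NULL,  -- the predicted time period
    PredictionType nvarchar(50)  NOT NULL,  -- 'Sales Amount' or 'Sales Quantity'
    PredictedValue float         NULL,
    StDev          float         NULL,
    Variance       float         NULL
);
```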

You can also add any extra metadata that might be useful later, such as the date the predictions were generated, a job ID, and so forth.

Let’s take another look at the DMX query statements used to generate the predictions.

SELECT
    $TIME AS NewTime,
    Amount AS NewValue,
    PredictStDev([Amount]) AS ValueStDev,
    PredictVariance([Amount]) AS ValueVariance
FROM PredictTimeSeries

By default, the column names generated in a prediction query are based on the predictable column name, so they would be something like PredictAmount and PredictQuantity. However, you can use a column alias in the output (here, it is NewValue) so that both result sets share the same column names, which makes the predicted values easier to combine.
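For example, a companion query for the Quantity series might alias its columns the same way, so that the two result sets line up column for column. The query below follows the pattern of the Amount query above, but the specific model and column names are assumptions:

```sql
-- Hypothetical companion DMX query for the Quantity series.
-- Aliasing Quantity as NewValue makes this result set match the
-- Amount query, so the two can be merged on identical column names.
SELECT
    $TIME AS NewTime,
    [Quantity] AS NewValue,
    PredictStDev([Quantity]) AS ValueStDev,
    PredictVariance([Quantity]) AS ValueVariance
FROM PredictTimeSeries
```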



Again, because Integration Services is so flexible, there are lots of ways you might accomplish this task:

  • Store results in memory and merge them before writing to the archive table.

  • Store the results in different columns, one for each prediction type.

  • Write the results to temporary tables and merge them later.

  • Use the Integration Services raw file format to quickly write out and then read the interim results.

However, in this scenario, you want to verify the prediction data that is generated by each query. So you use the following approach:

  • Write predictions to a temporary table.

  • Use an OLE DB Source component to get the predictions that were written to the temporary table.

  • Use a Derived Column transformation to clean up the data and add some simple metadata.

  • Save the results to the archive table that is used for reporting on all models.
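Because the point of staging through tmpPredictionResults is to verify each query's output, a quick sanity check can be run against the temporary table between the prediction and archive steps. The query below is only a sketch; the column names follow the aliases used in the DMX query earlier, and what you actually check is up to you:

```sql
-- Illustrative sanity check on the staged predictions before archiving.
SELECT
    COUNT(*)     AS PredictionRows,
    MIN(NewTime) AS FirstTimeSlice,
    MAX(NewTime) AS LastTimeSlice
FROM tmpPredictionResults;
```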

The graphic illustrates the overall task flow within each Data Flow task.


Create Data Flow tasks (Archive Results Qty, Archive Results Amt)


  1. Within the loop Predict Foreach Model, create two Data Flow tasks and name them Archive Results Qty and Archive Results Amt.

  2. Connect each Data Flow task to its related Data Mining Query task, in the order shown in the earlier Control Flow diagram for Package 3.

Note: You must have these tasks in a sequence, because they use the same temporary table and archive table. If Integration Services executes the tasks in parallel, the processes could create conflicts when attempting to access the same table.

  3. In each Data Flow task, add the following three components:

  • An OLE DB source that reads from tmpPredictionResults

  • A Derived Column transformation as defined in the following table

  • An OLE DB destination that writes to the table ArchivedPredictions

  4. Create expressions in each Derived Column transformation to generate the data for the new columns, as follows.

Task name           | Derived column name | Data type | Value
--------------------|---------------------|-----------|---------
Archive Results Qty | PredictionDate      | datetime  | GETDATE()
Archive Results Qty | PredictedValue      | string    | Quantity
Archive Results Amt | PredictionDate      | datetime  | GETDATE()
Archive Results Amt | PredictedValue      | string    | Amount
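Taken together, each data flow behaves roughly like a single INSERT … SELECT: read the staged rows, append the derived metadata, and write to the archive. A hypothetical T-SQL equivalent for the Amount flow is sketched below; the destination column names are illustrative, and JobID is omitted for brevity:

```sql
-- Hypothetical T-SQL equivalent of the Archive Results Amt data flow.
-- The SSIS Source / Derived Column / Destination chain achieves the
-- same effect row by row.
INSERT INTO ArchivedPredictions
    (TimeSlice, PredictedValue, StDev, Variance, TimeExecuted, PredictionType)
SELECT
    NewTime,
    NewValue,
    ValueStDev,
    ValueVariance,
    GETDATE(),     -- corresponds to the PredictionDate derived column
    'Amount'       -- tags which series these rows belong to
FROM tmpPredictionResults;
```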

Tip: Isolating the data flows for each prediction type has another advantage: it makes the package much easier to modify later. For example, you might decide that there is no good reason to create a separate prediction for quantity. Instead of editing your query or the output, you can simply disable that part of the package, and it will still run without modification; you just won't have predictions for Quantity.
