Automation of Data Mining Using Integration Services



Add a Data Mining Query task after the Foreach Loop (Update Processing Status)


This step uses a Data Mining Query task to get the updated processing status of the mining models and write it to a relational table. (A sketch of the kind of query the task runs appears after the steps below.)

  1. Right-click the Data Mining Query task that you created earlier, and then click Copy. Copying the existing task preserves its connections and query text, so you do not have to re-enter them.

  2. Paste the task after the Foreach Loop container and connect it to the loop.

  3. Rename the task Update Processing Status.

  4. Open the Data Mining Query Task Editor, click the Output tab, and verify that the option Drop and re-create the output table is selected.
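For reference, the query text carried over from the copied task is a schema rowset query of this general shape (a sketch only; the actual text is whatever you entered in the earlier task, and the rowset name here is an assumption based on the columns that task writes out):

// Sketch: list each mining model and when it was last processed.
// LAST_PROCESSED is empty for models that have never been processed.
SELECT MODEL_NAME, LAST_PROCESSED
FROM $system.DMSCHEMA_MINING_MODELS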

This completes the package. You can now execute it as before.

When you execute this package, the actual processing of each model can take a fairly long time, depending on how many models are available. You might want to add logging to the package to track the time used for processing each model.
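For example, if you configure the SSIS log provider for SQL Server and enable the OnPreExecute and OnPostExecute events, you can compute per-task durations from the dbo.sysssislog table with a query along these lines (a sketch; the execution ID is a placeholder you would copy from your own log):

-- Sketch: elapsed seconds per executable for one package run,
-- assuming the SQL Server log provider with OnPreExecute and
-- OnPostExecute events. For tasks inside a loop, this spans
-- all iterations of the task.
SELECT source,
       DATEDIFF(second, MIN(starttime), MAX(endtime)) AS duration_seconds
FROM dbo.sysssislog
WHERE executionid = 'YOUR-EXECUTION-GUID'   -- placeholder
  AND event IN ('OnPreExecute', 'OnPostExecute')
GROUP BY source;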


Phase 4 - Create Predictions for All Models


In this package, you use the Data Mining Query task to create prediction queries, and you run the queries against each of the models that you just created and processed:

  • You first use an Execute SQL task to query the list of models in the database, and save that list in a variable.

  • The Foreach Loop then uses the variable to customize the prediction queries, inserting each model name in turn into the two Data Mining Query tasks.

  • You write the prediction results to a table in the relational database, using a pair of Data Flow tasks, which also write out some useful metadata.



Create package and variables (PredictionsAllModels.dtsx)


  1. Create a new Integration Services package and name it PredictionsAllModels.dtsx.

  2. Create a variable with package scope as follows:

objProcessedModels (data type: Object)

Create Execute SQL task (Get Processed Models)


  1. Create a new Execute SQL task and name it Get Processed Models.

  2. In the Execute SQL Task Editor, for Connection, use an OLE DB connection, and for the server, type /DM_Reporting.

  3. For Result set, select Full result set.

  4. For SQLSourceType, select Direct input.

  5. For SQL Statement, type the following query text. (The WHERE condition ensures that you do not create a prediction against a model that has not been processed, which would generate an error. A sketch of the table's definition appears after these steps.)

SELECT MODEL_NAME FROM dbo.tmpProcessingStatus
WHERE LAST_PROCESSED IS NOT NULL



  6. On the Result Set tab, assign the columns in the result set to variables. There is only one column in the result set, so assign the variable User::objProcessedModels to the result name 0 (zero).
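Note that dbo.tmpProcessingStatus is the table that the Update Processing Status task drops and re-creates on each run of the processing package. If you want to create or inspect it manually, a minimal shape consistent with the query above would be the following (the column types are assumptions):

-- Sketch only: the Data Mining Query task normally creates this table itself.
CREATE TABLE dbo.tmpProcessingStatus (
    MODEL_NAME     nvarchar(255) NOT NULL,
    LAST_PROCESSED datetime      NULL    -- NULL until the model is processed
);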

Tip: When you are working with data mining models, and especially when you are building complex queries, we recommend that you build DMX queries beforehand, either by opening the model directly in Business Intelligence Development Studio and using Prediction Query Builder, or by launching Prediction Query Builder from SQL Server Management Studio. The reason is that when you build queries by using the data mining designers in SQL Server Management Studio or Business Intelligence Development Studio, Analysis Services performs some validation, which enables you to browse and select valid objects. The Query Builder provided in the Data Mining Query task does not have this context and cannot validate or help with your selections.

Create Execute SQL task (Get Series Names)


This task is not strictly necessary for prediction, but it creates data that is useful later for reporting.

Recall that the time series data mining model is based on sales data for different product lines in different regions, with each combination of product line and region forming a single series. For example, you can predict sales for the M200 product in Europe or the M200 product in North America. Here, the series name is extracted and stored in a table, which makes it easier to group and filter the predictions later in Reporting Services:



  1. Add a new Execute SQL task and name it Get Series Names.

  2. Connect it to the previous task.

  3. In the Execute SQL Task Editor, for Connection, use an OLE DB connection, and for the server, type /DM_Reporting.

  4. For Result set, select None.

  5. For SQLSourceType, select Direct input.

  6. For SQL Statement, type the following query text.

-- Clear the results of any previous run, and then reload the distinct series names.
IF EXISTS
(SELECT [ModelRegion] FROM DMReporting.dbo.tmpModelRegions)
BEGIN
TRUNCATE TABLE DMReporting.dbo.tmpModelRegions
END

INSERT DMReporting.dbo.tmpModelRegions
SELECT DISTINCT [ModelRegion]
FROM AdventureWorksDW2008R2.dbo.vTimeSeries
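This script assumes that DMReporting.dbo.tmpModelRegions already exists. If you have not created it yet, a minimal definition consistent with the script would be the following (the column width is an assumption; size it to match vTimeSeries.[ModelRegion]):

-- Sketch: one-time setup, in case the table does not exist yet.
CREATE TABLE DMReporting.dbo.tmpModelRegions (
    [ModelRegion] nvarchar(100) NOT NULL   -- width is an assumption
);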

Create Foreach Loop container (Predict Foreach Model)


In this Foreach Loop, you create two pairs of tasks: a prediction query plus a data flow to handle the results, one pair for Amount and one for Quantity.

You might ask, why generate the results for Amount and Quantity separately, when the Prediction Query Builder allows you to predict both at once?

The reason is that the data mining query returns a nested rowset for each series you predict, but the providers in Integration Services can work only with flattened rowsets. If you predict both Amount and Quantity in one query, the rowset contains many nulls when flattened. Rather than try to remove the nulls and sort out the results, it is easier to generate a separate set of results and then combine them later in the Integration Services data flow.
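For example, a single-measure time series prediction in DMX has this general shape (a sketch; the model and column names follow the AdventureWorks Forecasting example and are assumptions about your models, and the companion task would predict [Quantity] instead of [Amount]):

// Sketch: one prediction per series, six periods ahead.
// [Model Region] is the series key; FLATTENED turns the nested
// PredictTimeSeries result into a flat rowset.
SELECT FLATTENED
    [Model Region],
    PredictTimeSeries([Amount], 6) AS PredictedAmount
FROM [Forecasting]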


  1. Create a new Foreach Loop container and name it Predict Foreach Model. Connect it to the Get Processed Models task.

  2. With the Foreach Loop container selected, open the Variables window and create a new variable scoped to the container, as follows:

strModelName (data type: String)

  3. Return to the Foreach Loop Editor, and set the enumerator type to Foreach ADO Enumerator.

  4. In the Enumerator configuration pane, set ADO object source variable to User::objProcessedModels.

  5. Set Enumeration mode to Rows in first table only.

  6. In the Variable Mappings pane, assign the variable User::strModelName to Index 0. Each row of the single-column table returned by the query is fed into the variable strModelName, which is in turn used to update the prediction query in the next set of tasks (see the sketch after this list).
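Exactly how strModelName reaches the query text is covered in the next set of tasks. As a rough illustration only, a property expression on a downstream task could splice the variable into a DMX string like this (everything here, including the column names and the use of a property expression, is an assumption rather than the tutorial's exact configuration):

"SELECT FLATTENED [Model Region], PredictTimeSeries([Amount], 6)
 AS PredictedAmount FROM [" + @[User::strModelName] + "]"

In the SSIS expression language, string literals are concatenated with +, so the expression evaluates to a complete DMX query with the current model name in the FROM clause.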

