Using Data Profiling Task in SSIS


The Data Profiling task is a new task in the SQL Server 2008 Integration Services toolbox. It provides profiling functionality for your data, which helps you analyze the data more efficiently by generating statistical information for the table(s) in a database, such as:

· Number of rows in the table

· Number of nulls in each column of a table

· Number of distinct values in each column of a table, etc.
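To make these three statistics concrete, here is a minimal Python sketch that computes them with plain SQL against an in-memory SQLite table. This is not how the Data Profiling Task works internally; the `profile_table` helper, the `customer` table, and its rows are all hypothetical and only illustrate what each number means.

```python
import sqlite3


def profile_table(conn, table, columns):
    """Return the row count plus per-column null and distinct counts."""
    cur = conn.cursor()
    row_count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    stats = {"row_count": row_count, "columns": {}}
    for col in columns:
        # COUNT(DISTINCT ...) ignores NULLs, so nulls are counted separately.
        nulls, distinct = cur.execute(
            f'SELECT SUM(CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END), '
            f'COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        stats["columns"][col] = {"nulls": nulls or 0, "distinct": distinct}
    return stats


# Hypothetical sample data, not taken from AdventureWorksLT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, title TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "Mr.", "Seattle"),
        (2, None, "Seattle"),
        (3, "Ms.", None),
        (4, None, "Bothell"),
    ],
)

customer_stats = profile_table(conn, "customer", ["id", "title", "city"])
print(customer_stats)
```

A column with many nulls or with fewer distinct values than expected is often the first hint of a data quality problem.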

Why do we need Data Profiling? The statistical information generated by Data Profiling can be used to efficiently minimize the data quality issues that might arise from the source data.

I have created a sample SSIS package to help you understand how to work with the Data Profiling Task. For this example I have used the AdventureWorksLT database.

1. Open SQL Server Business Intelligence Development Studio (BIDS)

2. Create a new SSIS project by selecting the “Integration Services Project” template. You are now ready to develop a completely new SSIS project.

3. Selecting Data Profiling Task – Drag the “Data Profiling Task” from the toolbox onto the Control Flow surface, so that your screen looks like the one below.

SSIS-DataProfileTask-1

4. Setting properties – The Data Profiling Task needs some inputs before it can run: the Source (the data for which statistics should be collected), the Destination (where the results should be stored), and the entities for which statistics should be collected. You provide this information in the Data Profiling Task Editor.

  • Open the Data Profiling Task Editor (double-click the Data Profiling Task to open the editor) as shown below.

SSIS-DataProfileTask-2 

  • Here you can set the Source by clicking the Quick Profile option. Connect to the database using the ADO.NET Connection option; in this case I have connected to my local server and the AdventureWorksLT database. With the Table or View option you can select just one table or all tables. Then select which statistics you want to calculate and view, such as the number of rows in the table, the number of nulls in each column, the number of distinct values in each column, etc.

SSIS-DataProfileTask-3

1. Source connection string – When you create a connection in the package, its connection string is always preserved in the connection manager.

2. Statistical information calculation options.

  • You can set the destination file by clicking the General page in the Data Profiling Task Editor, as shown in the screenshot below.

SSIS-DataProfileTask-4 

5. Data Collection – Now we are ready to collect the statistics for the selected tables. Execute the package; it stores the statistical information at the destination you selected in step 4.

6. Viewing the Data Profiling output – To view the Data Profiling output you need the Data Profile Viewer, which can be found under Programs -> Microsoft SQL Server 2008 -> Integration Services, or run directly from C:\Program Files\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe. Refer to the screenshot below.

SSIS-DataProfileTask-5 
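The destination file written by the task is an XML document, so besides the viewer you can also take a quick look at it from a script. The following is only a rough sketch: the `summarize_profiles` helper is hypothetical, and the sample XML (element names and the `urn:example:profile` namespace) is a simplified stand-in that I made up for illustration, not the exact schema the task produces. The code ignores namespaces and simply counts elements whose name ends in "Profile".

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_profiles(xml_text):
    """Count elements (other than the root) whose name ends in 'Profile'."""
    root = ET.fromstring(xml_text)
    return Counter(
        local_name(el.tag)
        for el in root.iter()
        if el is not root and local_name(el.tag).endswith("Profile")
    )


# Simplified, hypothetical stand-in for a profile output file.
SAMPLE = """<DataProfile xmlns="urn:example:profile">
  <DataProfileOutput>
    <Profiles>
      <ColumnNullRatioProfile><NullCount>2</NullCount></ColumnNullRatioProfile>
      <ColumnNullRatioProfile><NullCount>0</NullCount></ColumnNullRatioProfile>
      <ColumnStatisticsProfile />
    </Profiles>
  </DataProfileOutput>
</DataProfile>"""

profile_counts = summarize_profiles(SAMPLE)
print(dict(profile_counts))
```

For a real file you would read its contents and pass the text to `summarize_profiles`; check the actual element names in your own output before relying on them.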

This is just a “How to use the Data Profiling Task in SSIS” article; for further information, click here.
