The Merge Join Transformation and an SSIS Sorting Tip
My previous blog was about the SSIS Lookup task and how it really works. Now that I have shown that the Lookup task shouldn’t be used for one-to-many or many-to-many joins; let’s take a look at the Merge Join transformation task. If you follow along with this blog, you will learn a little tip that will eliminate the requirement for you to add a SORT transformation task within your data flow task.
Previously, we isolated our results sets down to one employee in the AdventureWorks database and joined two tables together. I’m going to do the same thing here. This time I am going to introduce a third table into the join logic. The three tables are listed below:
- Employee Department History
- Person
- Department
Here is what it would look like via SSMS with T-SQL.
Let’s see how we can mimic this result set in SSIS without using T-SQL to join the three tables. First, I want to say it is not always going to be the best option not to use T-SQL instead of individual tasks in SSIS. I have learned over time that sometimes it is just easier to write the join logic directly in you data flow source task. Maybe a future post will be on when to use T-SQL verse SSIS tasks. However, this is for demonstration purposes.
Let’s say you received a request to extract a result set, order the results set, and load it to another location. Here is what your package would look like in SSIS using the Merge Join Transformation task.
Here are our results:
Notice, I used the SORT transformation task in the example above. I used this to depict what has to occur in a step by step approach.
- Extracted data from the Person and Employee Department History tables
- Sorted each result set
- Merged the two results into one using inner join logic
- Extracted Data from The Department table
- Sorted the first joined result set and the Department result set
- Merge the Joined result set from Persons and Employee History with the Department Table
Let’s talk about best practice for this example. This is where the Sort tip is introduced, since we need an ordered result set per the request we are using the merge transformation instead of the union all task. Additionally, we used the Sort task. The Sort task can heavily impact the performance of an SSIS package. Particularly, when you have larger result sets than what we are going to extract from the AdventureWorks database.
Best practice is to bring in an ordered result set at the source and then merge your record sets. Well, how do you do that? Let’s walk through ordering your result set at the source and configuring your source to define the sorted column for merging your record sets.
First, we open the task and add our ORDER BY clause to our source.
Now, close the source task and right click on the same source task and chose the show advance editor option.
Inside the advanced editor there are two specs that need to be defined to make this work.
- Click on the Input and Output Properties tab
- Click on the OLE DB Source Outputs
- Changed the IsSorted parameter to be “True”
- Drill down into the OLE DB Source Output to Output Columns
- Click on your column that you used in your ORDER BY clause
- Change your SortKeyPosition parameter from “0” to “1”
The desired results should look similar to below:
Now, you can remove each sort task that directly follows your OLE DB Source task by repeating the steps above to reconfigure each source editor. Now, my data flow task looks like this:
And we do get back the same results:
In case you are wondering why did he get rid of all of the Sort tasks except for the one that follows the first merge join? There are two reasons for this. My second join is on DepartmentID and most important the merge transformation task is not considered a data flow source task and does not come with the functionality to define the sorted order.
To conclude my second blog post of this series, the Merge Join transformation task can be used to merge columns from two different tables using join logic similar to the joins that can be used in T-SQL. We have looked at a step by step break down of what has to occur to implement a Merge Join transformation task as well as discussing some tips and best practice in regards to using the Sort task in SSIS. I hope this blog post has been informational and that you look forward to part 3.