Ramblings of a SQL Hillbilly

Or, the mad ravings of a data warehouse forklift operator

T-SQL Tuesday: Why So Serious?

For this month’s T-SQL Tuesday, Matt Velic has chosen the topic of Dirty Little Tricks you can play on your coworkers/developers/those $@%! consultants using T-SQL. I’m a bit impish, so naturally it was time to dust off the ol’ blog for this topic!

Anybody who follows me on Twitter knows that I’m often faced with problems of assumed ordering. (Remember kids – if you didn’t order it, it ain’t ordered! It just looks like it is!) So, my Dirty Little T-SQL Trick involves sneakily breaking SSIS.

Let’s say that I have a Data Flow Task that does some ETL for me. It pulls my source data, does a key lookup against the target, and then routes any no matches through a surrogate key generator. After any necessary key generation, everything is pulled back together in a Merge Join, then inserted or updated into the target.

Now, it’s important to note that you can’t just feed any old thing into a Merge Join. No, the inputs have to be sorted – or at least, they have to TELL you that they’re sorted. You can do this with a Sort Transformation, though that’s generally slower than sorting in your source. Now, you CAN use the Advanced Editor for your source, to set the IsSorted property of the Output to True. You also have to set one or more of the output columns as a sort key, but once you’ve done that, your Merge Join will believe that the input is sorted, and sorted just like you tell it.

So where’s the trick? Well, I mentioned before that you can’t just feed any old thing into a Merge Join. See, what it TELLS you is that it’s doing a join. But it’s not a SQL join, because in SQL order is not guaranteed. In SQL Server, Table A inner join Table B will give you an inner join, regardless of physical or logical ordering of the data. In SSIS, a Merge Join is essentially an array comparison that starts at the beginning of both arrays and reads through to the end. If your data is out of order, tough luck! And if the arrays are ordered correctly but in opposite order (ascending versus descending) then a very interesting thing happens – the Merge Join will match exactly one row!

So here’s my trick: Change the ORDER BY clause on the source query to order opposite of what the Merge Join expects. It won’t complain, and the solution is non-obvious (at least, non-obvious if you’ve never encountered it before). And it will leave your coworkers scratching their heads when they get only one row!