Avoiding Cursor in MSSQL server: The magic of set based operations

Hello World,

Right, so it has been a happy new year so far. That helped push up my optimism to a level that every problem seemed to come with a solution : -)
Now the problem I am going to describe is pretty commonplace – you have a source table with n columns of interest and you want to sync a target table with the data from the source table where:
1. If the target table already has data, update it
2. Otherwise insert data into the target table

Most of us application developers would think of this as a source result-set that needs to be traversed row-by-row (pain!) against the target result-set; updating or inserting data as described above. That would inevitably lead one to code a solution using cursors. Now all of us know that cursors are resource intensive and messy. Every “best-practices” guru would tell you to avoid them as long as you can. This led me to search for an alternative way of achieving the desired result.
For starters, database queries are meant to be declarative – meaning: you declare your intent and the database engine will find a way to do it in the most optimized way known to it. Application programming on the other hand is mostly instructive: we tell the system to do something and how to do it.
My search then landed me on this excellent resource: http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them

E.g. You have an Employees_sync table that is the source for the data to be synchronized and Employees table where the update/insert (jargon alert: UPSERT) should happen, in a SQL 2008 environment you would write a query as below:

MERGE dbo.Employees AS Target
USING (SELECT EmployeeNumber, FirstName, LastName, EmailID FROM Employees_sync EI)
AS Source
ON (Target.EmployeeNumber = Source.EmployeeNumber)
SET Target.FirstName = Source.FirstName,
Target.LastName = Source.LastName,
Target.EmailID =Source.EmailID
INSERT(EmployeeNumber, FirstName, LastName, EmailID)
VALUES( Source.EmployeeNumber, Source.FirstName, Source.LastName, Source.EmailID);

(Column names should be self-explanatory)

If you have SQL 2005 or below, you would have to split this up into two batches:

-- Update
UPDATE Employees
FirstName = tm.FirstName,
LastName = tm.LastName,
EmailID = tm.EmailID
FROM Employees_sync tm
LEFT JOIN Employees m ON tm.EmployeeNumber = m.EmployeeNumber

INSERT INTO Employees (EmployeeNumber, FirstName, LastName, EmailID)
SELECT EmployeeNumber, FirstName, LastName, EmailID FROM Employees_sync
WHERE EmployeeNumber not in (SELECT EmployeeNumber FROM Employees)

Happy Coding!

Posted in MSSQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

  • Comic for July 25, 2017
    Dilbert readers - Please visit Dilbert.com to read this feature. Due to changes with our feeds, we are now making this RSS feed a link to Dilbert.com.
%d bloggers like this: