Right, so it has been a happy new year so far. That helped push up my optimism to a level that every problem seemed to come with a solution : -)
Now the problem I am going to describe is pretty commonplace – you have a source table with n columns of interest and you want to sync a target table with the data from the source table where:
1. If the target table already has data, update it
2. Otherwise insert data into the target table
Most of us application developers would think of this as a source result-set that needs to be traversed row-by-row (pain!) against the target result-set; updating or inserting data as described above. That would inevitably lead one to code a solution using cursors. Now all of us know that cursors are resource intensive and messy. Every “best-practices” guru would tell you to avoid them as long as you can. This led me to search for an alternative way of achieving the desired result.
For starters, database queries are meant to be declarative – meaning: you declare your intent and the database engine will find a way to do it in the most optimized way known to it. Application programming on the other hand is mostly instructive: we tell the system to do something and how to do it.
My search then landed me on this excellent resource: http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
E.g. You have an Employees_sync table that is the source for the data to be synchronized and Employees table where the update/insert (jargon alert: UPSERT) should happen, in a SQL 2008 environment you would write a query as below:
MERGE dbo.Employees AS Target USING (SELECT EmployeeNumber, FirstName, LastName, EmailID FROM Employees_sync EI) AS Source ON (Target.EmployeeNumber = Source.EmployeeNumber) WHEN MATCHED THEN UPDATE SET Target.FirstName = Source.FirstName, Target.LastName = Source.LastName, Target.EmailID =Source.EmailID WHEN NOT MATCHED BY TARGET THEN INSERT(EmployeeNumber, FirstName, LastName, EmailID) VALUES( Source.EmployeeNumber, Source.FirstName, Source.LastName, Source.EmailID);
(Column names should be self-explanatory)
If you have SQL 2005 or below, you would have to split this up into two batches:
-- Update UPDATE Employees SET FirstName = tm.FirstName, LastName = tm.LastName, EmailID = tm.EmailID FROM Employees_sync tm LEFT JOIN Employees m ON tm.EmployeeNumber = m.EmployeeNumber --INSERT INSERT INTO Employees (EmployeeNumber, FirstName, LastName, EmailID) SELECT EmployeeNumber, FirstName, LastName, EmailID FROM Employees_sync WHERE EmployeeNumber not in (SELECT EmployeeNumber FROM Employees)