Monday, October 23, 2006

SPSiteDataQuery vs. GetListItems

Mart Muller over at TamTam posted his thoughts about using the new SPSiteDataQuery object in MOSS 2007 to retrieve items in a document library without tripping over the annoying folder limitations of the much-maligned SPQuery class. This is handy method for getting list data via the object model but it also has some limitations and 'gotchas' to be aware of.

First, when retrieving list data, anecdotal tests have shown (and our experience has proven this to be absolutely true) that the object model is best suited for working with smaller data sets within web parts or controls in small to medium traffic environments (~10,000 users). Why is this? Nobody seems to know for sure but I suspect that it has to do with the overhead related to repeatedly calling the core DLL's that are also performing a number of other concurrent tasks (but I could very well be wrong; the cause may be up for debate but the results speak for themselves).

Second, as Maurice Prather and Bil Simser have pointed out here, here and here, all those SPSite and SPWeb objects you're throwing around in your code don't dispose of themselves - you have to handle that yourself (you are disposing of your garbage, aren't you?). Those little guys riding round on the back of the Garbage Collector truck never see those objects so they don't throw 'em on the rubbish pile - leave a few thousand of those things open and your application pool will come to a screeching halt.

Third, you have to keep a careful eye on that ViewFields element (and don't forget to write your query correctly). A blank value will return all the fields that the records have in common but not any fields that have blank or null values - these are omitted from the results. This is important when you're looking for a particular field only to discover it's not there at all - crash goes your code. Pass in a null value and you get the default view (such as AllItems.aspx) which may not contain any of the fields you're looking for. You have to specify ALL of the fields that you want to appear in the results. And, just to keep you on your toes, make sure to use the DisplayName value of the column in each < FieldRef > node and not the Name value (they are often quite different, especially in custom lists).

Finally, remember that object model methods of this type are always subject to the annoying but ever-present impersonation issue. Call SPSiteDataQuery in the wrong user context and you won't get very far. RevertToSelf, anyone?

Personally, I prefer to use the web services GetListItems() method to get items from a list because a) it's faster (I know, you don't believe me, but it's true - someone somewhere did a performance comparison once but I can't seem to find it again. UPDATE: Found it here but there aren't any hard metrics in the post. UPDATE II: Scroll to the bottom of this post for some very basic metrics), b) I have more granular control over user credentials (I can use an admin account to get the items then check the user perms to verify the access level - no impersonation required), c) the data is returned as an XMLNode object, and d) there's no garbage to worry about - once the TCP connection is closed the memory is released. Most importantly, the web services scale well, especially in shared services environments where you may have one portal context accessing list data from another portal context. That being said, I will probably use the SPSiteDataQuery object in one-off web parts where scalability and performance are not an issue because the code is short and sweet - which is an advantage unto itself, now isn't it?