There are times when using the XML data type in Microsoft SQL Server can be of great help. For example the XML data type is very useful when you want to return data in a hierarchical structure rather than a simple tabular structure.
There are several options for going about and returning XML from your queries rather than tabular data, but I will take a short look into the pros and cons of the two most popular options. These are: FOR XML AUTO and FOR XML PATH.
A point to note is that the XML returned in any of these queries will not output nodes if NULL database data is involved. That can potentially cause problems if you are expecting a specific XML structure at your front-end (ie: in your XSLT file). So if you want to return blank nodes as part of your XML you need to use the command: ELEMENTS XSINIL. Note that this forces your XML to return as nodes rather than the default syntax which returns the tables you are selecting from as nodes, and the data returned as attributes of your node.
For example the ELEMENTS XSINIL directive compares with a basic FOR XML AUTO XML result set as follows:
Basic FOR XML AUTO XML result format:
<tblStore Name="Sears" Description="buy some clothes"> <tblSalesPerson First_Name="Sam" Last_Name="Walley"> <tblAddress AddressDesc="432 Street" City="Calgary"> <tblPhone /> </tblAddress> </tblSalesPerson> </tblStore>
The same example result set as it will appear when the ELEMENTS XSINIL directive is added (as: FOR XML AUTO,ELEMENTS XSINIL ). Here is what the XML will look like:
<tblStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Name>Sears</Name> <Description>buy some clothes</Description> <tblSalesPerson> <First_Name>Sam</First_Name> <Last_Name>Walley</Last_Name> <tblAddress> <AddressDesc>432 Street</AddressDesc> <City>Calgary</City> <tblPhone> <HomeOrBusiness xsi:nil="true" /> <Number xsi:nil="true" /> </tblPhone> </tblAddress> </tblSalesPerson> </tblStore>
As you can see, the first query did not return the HomeOrBusiness and Number fields belonging to phone because they contain null data. Only the second query that includes the ELEMENTS XSINIL directive returns fields containing null data.
Although using ELEMENTS XSINIL is situationally useful, this can be a handy trick to know.
Reviewing FOR XML AUTO
The simplest method of returning XML is to append the line FOR XML AUTO to the end of your SQL statement. This is also the most popular method of returning XML data from a database.
However, this is also the solution that gives you the least amount of control over the structure of your data. In many cases when the table structure is simple and relatively flat, then leaving things up to SQL Server to structure can be good enough. But if you are expecting to display data more than a few levels deep with a custom structure, then you’ll need to move onto something more sophisticated. That is where the FOR XML PATH statement comes into play.
As a good description of the FOR XML AUTO mode see the MSDN article: Use AUTO Mode with FOR XML.
A standard FOR XML AUTO query will return the tables you join in your SQL as the nodes of the XML document, with data from each table as elements contained within the node. If you are using nested tables in your FROM clause then things can get tricky fast. Rather than the nice heirarchical structure you were expecting, you will see nodes embedded in other nodes in an awkward manner. When this happens it is best to resolve to using correlated subqueries and FOR XML PATH to shape your XML at a more detailed level.
Reviewing FOR XML PATH
The FOR XML PATH statement was added with SQL Server 2005 after developers realized that they 1) need more control than FOR XML AUTO provides, and 2) do not want to work with as complicated a syntax like the FOR XML EXPLICIT syntax.
The FOR XML PATH syntax is quite straightforward to use. Just structure your queries as you normally would but add the FOR XML PATH directive to the end of your query. A slight difference is the introduction of allowing correlated subqueries to shape your XML. If you want detailed control of the hierarchical structure of the XML to be output, then you are able to use correlated subqueries to do so. The difference here between regular SQL statements is that in regular SQL you can only return one return value from a correlated subquery, whereas in the FOR XML PATH syntax the correlated subquery can return as many sub-nodes as you like, and you can nest the sub-nodes as deeply as you like.
If you do use correlated subqueries, then you’ll need to include the FOR XML PATH directive with each. As well, include the term TYPE to make sure that the XML output is properly formatted, or else it will be sent back escaped to your program which can be confusing. Here is an example of the directive syntax you can use with a correlated subquery:
FOR XML PATH(‘MyNodeData’),TYPE, ELEMENTS XSINIL
As you can see, the subquery assigns a node called MyNodeData to the results returned from the query. The Type directive ensures that proper XML is returned, and the ELEMENTS XSINIL makes sure that the structure of the XML is consistent even with NULL database values.
A real problem with using correlated subqueries is managing the WHERE conditions. If you join several subqueries to your main query and apply where conditions to your deepest subquery, then you cannot force a join-like behavior from the queries leading to your deepest query as you might find in an inner join. As an example, if your deepest query is for a home phone number where you have restricted the number to just those starting with (123), then the less deep query that your home phone number query is nested in cannot be restricted to return only phone numbers where the home phone number starts with (123). Instead you have to write an ever more complex set of where conditions based on the depth of your query.
In other words, your correlated subqueries will always prefer to return the full results of the query that is blind to the where clauses of sub-queries, so if you have nested correlated subqueries you will need to add a where condition to each subquery in order to manage the data returned.
This primarily becomes a problem if you are writing correlated subqueries several levels deep and want the results to look like an inner join. In my opinion this limitation is very awkward and really should have been addressed by the developers who designed the FOR XML PATH behavior.
To jump to another point, an interesting thing to note when working to shape your XML is that you can control the hierarchy yourself by adjusting the alias for the return parameter in your Select statement. So for example if you want your XML to have the nodes User/Name/FullName then you can set this return structure using the following SELECT statement syntax:
Select MyName [User/Name/FullName] FROM MyTable
This is similar to the EXPLICIT mode of using exclamation marks to determine the return field node hierarchy.
1 thought on “An Overview of Returning XML from an MS SQL Server T-SQL Query”