在 SQL Server 中访问 XML 列中的数据的首选方法
背景
最近我开始更多地使用 XML 作为 SQL Server 2005 中的列。在昨天的一段停机时间内,我注意到我使用的两个链接表确实很碍事,这让我感到无聊得流泪了为几个连接编写更多支持结构代码。
为了实际生成这两个链接表的数据,我将两个 XML 字段传递给存储过程,该存储过程写入主记录,将两个 XML 变量分解为 @tables 并将它们插入到具有新 的实际表中来自主记录的 SCOPE_IDENTITY()
。
经过一番考虑后,我决定完全删除这些表,而将 XML 存储在 XML 字段中。现在我明白这里存在一些陷阱,例如一般查询性能,GROUP BY
不适用于 XML 数据。查询通常有点混乱,但总的来说,我喜欢现在在取回数据时可以使用 XElement
。
而且,这些东西不会改变。这是一次性的事情,所以我不必担心修改。
我想知道实际获取这些数据的最佳方法。我的很多查询都涉及根据子记录甚至子子记录的标准获取主记录。数据库中的大多数存储过程都会执行此操作,但规模要复杂得多,通常需要 UDF 和子查询才能有效工作,但我已经敲出了一个简单的示例来测试查询某些数据......
INSERT INTO Customers VALUES ('Tom', '', '<PhoneNumbers><PhoneNumber Type="1" Value="01234 456789" /><PhoneNumber Type="2" Value="01746 482954" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Andy', '', '<PhoneNumbers><PhoneNumber Type="2" Value="07948 598348" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Mike', '', '<PhoneNumbers><PhoneNumber Type="3" Value="02875 482945" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Steve', '', '<PhoneNumbers></PhoneNumbers>')
现在我可以看到两种获取它的方法。
方法1
DECLARE @PhoneType INT
SET @PhoneType = 2
SELECT ct.*
FROM Customers ct
WHERE ct.PhoneNumbers.exist('/PhoneNumbers/PhoneNumber[@Type=sql:variable("@PhoneType")]') = 1
真的吗? sql:variable 感觉有点不健康。然而,它确实有效。然而,以更有意义的方式访问数据显然更加困难。
方法2
SELECT ct.*, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') = @PhoneType
这个比较像。我已经可以轻松扩展它来执行连接和所有其他好东西。我之前在表值函数上使用过CROSS APPLY
,效果非常好。与之前的查询相比,此查询的执行计划更加先进。诚然,我没有对这些表进行任何索引等操作,但它占整个批次成本的 97%。
方法 2(扩展)
SELECT ct.ID, ct.CustomerName, ct.Notes, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') IN (SELECT ID FROM PhoneTypes)
这里很好的 IN
子句。我还可以做类似 pt.PhoneType = 'Work'
最后
所以我基本上获得了我想要的结果,但是在使用这种机制来询问少量内容时我应该注意什么XML 数据?在复杂的搜索过程中它的性能会下降吗?存储这种标记样式的数据是否开销太大?
旁注:
我过去使用过诸如 sp_xml_preparedocument
和 OPENXML
之类的东西,只是将列表传递到存储过程中,但相比之下,这就像呼吸新鲜空气一样!
Background
Recently I've started to use XML a lot more as a column in SQL Server 2005. During a bit of downtime yesterday, I noticed that two of the link tables I used a really just in the way and it bores me to tears having to write yet more supporting structure code for a couple of joins.
To actually generate the data for these two link tables, I pass in two XML fields to my stored procedure, which writes the main record, breaks the two XML variables down into @tables and inserts them into the actual tables with the new SCOPE_IDENTITY()
from the master record.
After some though, I decided to just do away with those tables altogether and just store the XML in XML fields. Now I understand there are some pitfalls here, like general querying performance, GROUP BY
doesn't work on XML data. And the query is generally a bit of a mess, but overall I like that I can now work with XElement
when I get the data back.
Also, this stuff isn't going to get changed. It's a one shot affair, so I don't have to worry about modification.
I am wondering about the best way to actually get at this data. A lot of my queries involve getting a master record based upon the criteria of a child or even a subchild record. Most of the sprocs in the database do this but on a far more elaborate scale, usually requiring UDFs and Subqueries to work effectively but I have knocked up a trivial example to test querying some data...
INSERT INTO Customers VALUES ('Tom', '', '<PhoneNumbers><PhoneNumber Type="1" Value="01234 456789" /><PhoneNumber Type="2" Value="01746 482954" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Andy', '', '<PhoneNumbers><PhoneNumber Type="2" Value="07948 598348" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Mike', '', '<PhoneNumbers><PhoneNumber Type="3" Value="02875 482945" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Steve', '', '<PhoneNumbers></PhoneNumbers>')
Now I can see two ways of grabbing it.
Method 1
DECLARE @PhoneType INT
SET @PhoneType = 2
SELECT ct.*
FROM Customers ct
WHERE ct.PhoneNumbers.exist('/PhoneNumbers/PhoneNumber[@Type=sql:variable("@PhoneType")]') = 1
Really? sql:variable feels a bit unwholesome. However, it does work. However it's distinctively more difficult to access data in a more meaningful way.
Method 2
SELECT ct.*, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') = @PhoneType
This is more like it. Already I can easily expand it to do joins and all other good stuff. I've used CROSS APPLY
before on a table valued function, and it was very good. The execution plan for this as opposed to the previous query is seriously more advanced. Admittedly I haven't done any indexing and whatnot on these tables, but it's 97% of the entire batch cost.
Method 2 (expanded)
SELECT ct.ID, ct.CustomerName, ct.Notes, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') IN (SELECT ID FROM PhoneTypes)
Nice IN
clause here. I can also do something like pt.PhoneType = 'Work'
Finally
So I'm essentially obtaining the results that I want, but is there anything I should be aware of when using this mechanism to interrogate small amounts of XML data? Will it fall down on performance during elaborate searches? And is the storage of such markup style data too much of an overhead?
Side note
I've used things like sp_xml_preparedocument
and OPENXML
in the past just to pass lists into sprocs, but this is like a breath of fresh air in comparison!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
对于存储在 XML 列中的一些关键信息项,我们采取的一种方法是将它们“显示”为“父”表上的计算的、持久的属性。这是使用一个小的存储函数来完成的。
它工作得很好,因为每次 XML 更改时只计算一次该值 - 只要它没有更改,就不需要重新计算,该值像任何其他列一样存储在表中。
它也很棒,因为它可以被索引!因此,如果您正在寻找和/或加入这样的领域 - 这就像一个魅力!
所以你基本上需要一个类似这样的存储函数:
如果你没有类型 1 的电话号码,你只会返回 NULL。
然后,您需要使用计算的持久列来扩展父表:
如您所见 - 它对于单个条目效果很好,但不幸的是,您无法显示完整的属性列表。但是,如果您有一些关键项目,例如 ID 等,您希望大多数行都具有这些项目,那么这可能是一种非常好的、灵活的方式,可以更轻松、更有效地获取该信息。
One approach we've taken for some of our key items of information stored inside an XML column is to "surface" them as computed, persisted properties on the "parent" table. This is done using a little stored function.
It works great, because the value is computed only once every time the XML changes - as long as it's not changing, there's no recomputation, the value is stored on the table like any other column.
It's also great since it can be indexed! So if you're searching and/or joining on such a field - that works like a charm!
So you basically need a stored function along the lines of this:
If you don't have a phone number of type 1, you'll just get back a NULL.
Then, you need to extend your parent table with a computed, persisted column:
As you can see - it works just fine for single entries, but unfortunately, you cannot surface a whole list of properties. But if you have some key items, like ID's or something, that you expect most of your rows to have, this can be a very nice and slick way to get at that information more easily and more efficiently.