慢速 SQL Server CLR 聚合
我总是发现缺少一个内置的聚合函数,该函数只是使用一些用户定义的分隔符将值串在一起,这令人沮丧。
我所知道的最接近的事情是 XML hack:
select s.string as [text()]
from strings s
for xml path('')
但在我看来,这是一种非常奇怪的方法,而且您仍然必须处理带有尾随或前导分隔符垃圾的情况。所以,我想我应该编写一个 CLR 聚合:
select dbo.Fold(s.string, ', ')
from strings s
这很好。但当行数达到 5000 时,这个东西的性能就不太好。我真的不希望针对那么多数据运行它,但令我惊讶的是,随着数据集的增长,聚合的性能会迅速下降。一开始我不明白为什么,后来我意识到SQL Server中处理UDF的方式有问题。
我正在使用带有 StringBuilder
类的 UDF,因此我必须实现 IBinarySerialize
。但出于我不明白的原因,SQL Server 在 Accumulate 的每次迭代之间调用 Write/Read 对。显然,当字符串变大时,这会产生性能问题。有没有办法避免这种昂贵的序列化?
仅供参考,XML hack 的速度要快几个数量级。
I've always found the lack of a built-in aggregate function that just strings together values using some user defined separator, frustrating.
The closest thing I know of is the XML hack:
select s.string as [text()]
from strings s
for xml path('')
But in my opinion that's a very strange way of doing it and you still have to handle the case with trailing or leading separator junk. So, I thought I'd write a CLR aggregate:
select dbo.Fold(s.string, ', ')
from strings s
That's kind of nice. Except this thing doesn't perform very well when the number of rows reaches 5000. I don't really expect to run it against that much data but to my surprise the performance of the aggregate degrades rather rapidly as the data set grows. I didn't understand why at first, then I realized that the way the UDF is handled in SQL Server is the problem.
I'm using an UDF with a StringBuilder
class, because of this I have to implement IBinarySerialize
. But for reasons I do not understand SQL Server is calling the Write/Read pair between each iteration of Accumulate
. Obviously this creates a performance problem when the string gets big. Is there anyway to avoid this costly serialization?
FYI, the XML hack are orders of magnitudes faster.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您正在做的是实现 MySQL GROUP_CONCAT
请参阅这些以获得灵感...
就个人而言,我使用 XML 技巧。
What you are doing is implementing the MySQL GROUP_CONCAT
See these for inspiration...
Personally, I use the XML trick.