慢速 SQL Server CLR 聚合

发布于 2024-11-14 22:32:25 字数 660 浏览 5 评论 0原文

我总是发现缺少一个内置的聚合函数，该函数只是使用一些用户定义的分隔符将值串在一起，这令人沮丧。

我所知道的最接近的事情是 XML hack：

select s.string as [text()] 
from strings s 
for xml path('')

但在我看来，这是一种非常奇怪的方法，而且您仍然必须处理带有尾随或前导分隔符垃圾的情况。所以，我想我应该编写一个 CLR 聚合：

select dbo.Fold(s.string, ', ') 
from strings s

这很好。但当行数达到 5000 时，这个东西的性能就不太好。我真的不希望针对那么多数据运行它，但令我惊讶的是，随着数据集的增长，聚合的性能会迅速下降。一开始我不明白为什么，后来我意识到SQL Server中处理UDF的方式有问题。

我正在使用带有 StringBuilder 类的 UDF，因此我必须实现 IBinarySerialize。但出于我不明白的原因，SQL Server 在 Accumulate 的每次迭代之间调用 Write/Read 对。显然，当字符串变大时，这会产生性能问题。有没有办法避免这种昂贵的序列化？

仅供参考，XML hack 的速度要快几个数量级。

原文

I've always found the lack of a built-in aggregate function that just strings together values using some user defined separator, frustrating.

The closest thing I know of is the XML hack:

select s.string as [text()] 
from strings s 
for xml path('')

But in my opinion that's a very strange way of doing it and you still have to handle the case with trailing or leading separator junk. So, I thought I'd write a CLR aggregate:

select dbo.Fold(s.string, ', ') 
from strings s

That's kind of nice. Except this thing doesn't perform very well when the number of rows reaches 5000. I don't really expect to run it against that much data but to my surprise the performance of the aggregate degrades rather rapidly as the data set grows. I didn't understand why at first, then I realized that the way the UDF is handled in SQL Server is the problem.

I'm using an UDF with a StringBuilder class, because of this I have to implement IBinarySerialize. But for reasons I do not understand SQL Server is calling the Write/Read pair between each iteration of Accumulate. Obviously this creates a performance problem when the string gets big. Is there anyway to avoid this costly serialization?

FYI, the XML hack are orders of magnitudes faster.

分享到QQ

分享到微博