How does SQL Server estimate the cost of execution plans that involve user-defined functions?

Posted on 2024-08-06 08:55:34

I have a stored procedure that filters on the result of the DATEADD function. My understanding is that this is similar to using a user-defined function: because SQL Server cannot store statistics on the output of the function, it has trouble evaluating the cost of an execution plan.

The query looks a little like this:

SELECT /* Columns */ FROM
TableA JOIN TableB
ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now

(So it's not possible to pre-calculate the outcome of the DATEADD.)

What I'm seeing is a terrible execution plan, which I believe is due to SQL Server incorrectly estimating the number of rows returned from one part of the tree as 1, when in fact it's ~65,000. I have, however, seen the same stored procedure execute in a fraction of the time when different (not necessarily less) data is present in the database.

My question is - in cases like these how does the query optimiser estimate the outcome of the function?

UPDATE: FYI, I'm more interested in understanding why some of the time I get a good execution plan and why the rest of the time I don't - I already have a pretty good idea of how I'm going to fix this in the long term.

Comments (3)

我不吻晚风 2024-08-13 08:55:34

It's not the costing of the plan that's the problem here. The function on the columns prevents SQL Server from doing index seeks. You're going to get an index scan or a table scan.

What I'd suggest is to see if you can get one of the columns out of the function; basically, see if you can move the function to the other side of the comparison. It's not perfect, but it means that at least one column can be used for an index seek.

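Something like this (rough idea, not tested) with an index on TableB.HoursDifferent, then an index on the join column in TableA. Note that DATEDIFF takes the start date before the end date and counts hour boundaries crossed rather than elapsed hours, so this is only an approximation of the original DATEADD predicate:

DATEDIFF(hour, TableA.StartDate, @Now) >= TableB.HoursDifferent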

On the costing side, I suspect that the optimiser will use the 30% of the table 'thumb-suck' because it can't use statistics to get an accurate estimate and because it's an inequality. Meaning it's going to guess that 30% of the table will be returned by that predicate.

It's really hard to say anything for sure without seeing the execution plans. You mention an estimate of 1 row and an actual of 65000. In some cases, that's not a problem at all.
http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/
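
One way to see what the optimiser guessed is to run the query with SET STATISTICS PROFILE ON, which returns the actual (Rows) and estimated (EstimateRows) counts for each operator. The following is a minimal sketch under assumptions: the temp tables #TableA and #TableB and their rows are hypothetical stand-ins for the tables in the question, and the DATEDIFF form shown is only one possible rewrite.

```sql
-- Hypothetical stand-ins for the tables in the question (made-up rows).
CREATE TABLE #TableA (id int PRIMARY KEY, StartDate datetime NOT NULL);
CREATE TABLE #TableB (join_id int NOT NULL, HoursDifferent int NOT NULL);

INSERT INTO #TableA (id, StartDate)
VALUES (1, '2024-01-01T10:30:00'), (2, '2024-01-01T11:45:00');
INSERT INTO #TableB (join_id, HoursDifferent)
VALUES (1, 2), (2, 1);

DECLARE @Now datetime = '2024-01-01T12:15:00';

-- Adds a per-operator result set with Rows (actual) and EstimateRows.
SET STATISTICS PROFILE ON;

-- Original form: both columns sit inside the function.
-- Row 1: 10:30 + 2h = 12:30, which is later than @Now, so it is excluded.
SELECT a.id, b.HoursDifferent
FROM #TableA AS a
JOIN #TableB AS b ON a.id = b.join_id
WHERE DATEADD(hour, b.HoursDifferent, a.StartDate) <= @Now;

-- DATEDIFF form: HoursDifferent is left bare on one side.
-- DATEDIFF counts hour boundaries crossed (11:00 and 12:00 for row 1),
-- so it can accept rows that the DATEADD form rejects.
SELECT a.id, b.HoursDifferent
FROM #TableA AS a
JOIN #TableB AS b ON a.id = b.join_id
WHERE DATEDIFF(hour, a.StartDate, @Now) >= b.HoursDifferent;

SET STATISTICS PROFILE OFF;

DROP TABLE #TableA;
DROP TABLE #TableB;
```

Comparing the Rows and EstimateRows columns of the two profiles shows where the estimate and the actual count diverge. The two result sets can also differ, which is why any DATEDIFF rewrite needs checking against real data before it replaces the original predicate.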

伪心 2024-08-13 08:55:34

It would help to see the function, but one thing I have seen is that burying functions like that in queries can result in poor performance. If you can evaluate some of it beforehand, you might be in better shape. For example, instead of

WHERE MyDate < GETDATE()

Try

DECLARE @Today DATETIME
SET @Today = GETDATE()
...
WHERE MyDate < @Today

This seems to perform better.
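
A caveat worth checking in the plan: the value of a local variable is not known when the statement is compiled, so the optimiser may fall back on a fixed guess for the inequality. A minimal sketch for comparing the two cases follows, assuming a hypothetical temp table #Events with made-up rows; OPTION (RECOMPILE) asks SQL Server to compile the statement using the variable's current value.

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TABLE #Events (id int IDENTITY PRIMARY KEY, MyDate datetime NOT NULL);
INSERT INTO #Events (MyDate)
VALUES ('2020-01-01'), ('2021-06-15'), ('2999-12-31');

DECLARE @Today datetime;
SET @Today = GETDATE();

-- Compiled without knowing the value of @Today.
SELECT id, MyDate
FROM #Events
WHERE MyDate < @Today;

-- Compiled at execution time, so the current value of @Today is visible.
SELECT id, MyDate
FROM #Events
WHERE MyDate < @Today
OPTION (RECOMPILE);

DROP TABLE #Events;
```

Both statements return the same rows; any difference shows up only in the estimated row counts of their execution plans.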

誰ツ都不明白 2024-08-13 08:55:34

@Kragen,

Short answer: If you are doing queries with ten tables, get used to it. You need to learn all about query hints, and a lot more tricks besides.

Long answer:

SQL Server generally generates excellent query plans for up to about three to five tables only. Once you go beyond that, in my experience, you are basically going to have to write the query plan yourself, using all the index and join hints. (In addition, scalar functions seem to get estimated at Cost=Zero, which is just mad.)

The reason is it is just too damn complicated after that. The query optimiser has to decide what to do algorithmically, and there are too many possible combinations for even the brightest geniuses on the SQL Server team to create an algorithm which works truly universally.

They say the optimiser is smarter than you. That may be true. But you have one advantage: if it doesn't work, you can throw it out and try again! By about the sixth attempt you should have something acceptable, even for a ten-table join, if you know the data. The query optimiser cannot do that; it has to come up with some sort of plan instantly, and it gets no second chances.

My favourite trick is to force the order of the WHERE clause by converting it to a CASE expression. Instead of:

WHERE
predicate1
AND predicate2
AND....

Use this:

WHERE
case 
when not predicate1 then 0
when not predicate2 then 0
when not .... then 0
else 1 end = 1

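Order your predicates from cheapest to most expensive, and you get an outcome that is logically the same (provided no predicate can evaluate to UNKNOWN because of a NULL; in that case the CASE form falls through to ELSE 1 and keeps a row that the AND form would drop) but which SQL Server doesn't get to mess around with: it has to do them in the order you say.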
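
A minimal sketch of the CASE trick, assuming a hypothetical temp table #Demo with made-up values: the cheap digits-only check is listed first, so the CAST in the second branch is only reached for rows that passed it.

```sql
-- Hypothetical table and values, for illustration only.
CREATE TABLE #Demo (val varchar(10) NOT NULL);
INSERT INTO #Demo (val) VALUES ('10'), ('abc'), ('25');

-- predicate1: val contains only digits
-- predicate2: CAST(val AS int) > 20
SELECT val
FROM #Demo
WHERE CASE
          WHEN val LIKE '%[^0-9]%' THEN 0      -- NOT predicate1: non-digit found
          WHEN CAST(val AS int) <= 20 THEN 0   -- NOT predicate2
          ELSE 1
      END = 1;
-- Only '25' should qualify: 'abc' stops at the first branch, '10' at the second.

DROP TABLE #Demo;
```

The trade-off is that a predicate wrapped in CASE can no longer be matched to an index seek, so this suits filters that would be evaluated row by row anyway.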
