SQL 中发现性能差异很大(1 小时到 1 分钟)。你能解释一下为什么吗?
以下查询在标准机器上分别需要 70 分钟和 1 分钟,查询 100 万条记录。可能的原因是什么?
查询 [01:10:00]
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
ELSE sys.fn_cdc_increment_lsn(0x00) END
, sys.fn_cdc_get_max_lsn()
, 'all with mask')
WHERE __$operation <> 1
修改后的查询 [00:01:10]
DECLARE @MinLSN binary(10)
DECLARE @MaxLSN binary(10)
SELECT @MaxLSN= sys.fn_cdc_get_max_lsn()
SELECT @MinLSN=CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
ELSE sys.fn_cdc_increment_lsn(0x00) END
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
@MinLSN, @MaxLSN, 'all with mask') WHERE __$operation <> 1
[已修改]
我尝试使用类似的函数重新创建场景,以查看是否对每行评估参数。
CREATE FUNCTION Fn_Test(@a decimal)RETURNS TABLE
AS
RETURN
(
SELECT @a Parameter, Getdate() Dt, PartitionTest.*
FROM PartitionTest
);
SELECT * FROM Fn_Test(RAND(DATEPART(s,GETDATE())))
但对于 38 秒内处理的一百万条记录,我在“参数”列中得到了相同的值。
The following queries are taking 70 minutes and 1 minute respectively on a standard machine for 1 million records. What could be the possible reasons?
Query [01:10:00]
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
ELSE sys.fn_cdc_increment_lsn(0x00) END
, sys.fn_cdc_get_max_lsn()
, 'all with mask')
WHERE __$operation <> 1
Modified Query [00:01:10]
DECLARE @MinLSN binary(10)
DECLARE @MaxLSN binary(10)
SELECT @MaxLSN= sys.fn_cdc_get_max_lsn()
SELECT @MinLSN=CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')
ELSE sys.fn_cdc_increment_lsn(0x00) END
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
@MinLSN, @MaxLSN, 'all with mask') WHERE __$operation <> 1
[Modified]
I tried to recreate the scenario with a similar function to see if the parameters are evaluated for each row.
CREATE FUNCTION Fn_Test(@a decimal)RETURNS TABLE
AS
RETURN
(
SELECT @a Parameter, Getdate() Dt, PartitionTest.*
FROM PartitionTest
);
SELECT * FROM Fn_Test(RAND(DATEPART(s,GETDATE())))
But I am getting the same value for the column 'Parameter' for a a million records processed in 38 seconds.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在您的第一个查询中,您的
fn_cdc_increment_lsn
和fn_cdc_get_min_lsn
会针对每一行执行。在第二个示例中,仅一次。In your first query, your
fn_cdc_increment_lsn
andfn_cdc_get_min_lsn
get executed for every row. In second example, just once.即使是确定性标量函数,每行也至少计算一次。如果相同的确定性标量函数在具有相同参数的同一“行”上多次出现,我相信只有这样它才会被评估一次 - 例如在 fn_X(a, b, c) > 的情况下0 THEN fn_X(a, b, c) ELSE 0 END 或类似的东西。
我认为你的兰德问题是因为你继续重新播种:
正如您所指出的,我已经开始缓存标量函数结果 - 甚至预先计算标量函数结果表并连接到它们。最终必须采取一些措施才能使标量函数发挥作用。不对,最好的选择是 CLR - 显然这些远远优于 SQL UDF。不幸的是,我无法在当前环境中使用它们。
Even deterministic scalar functions are evaluated at least once per row. If the same deterministic scalar function occurs multiple times on the same "row" with the same parameters, I believe only then will it be evaluated once - e.g. in a
CASE WHEN fn_X(a, b, c) > 0 THEN fn_X(a, b, c) ELSE 0 END
or something like that.I think your RAND problem is because you continue to reseed:
I have taken to caching scalar function results as you have indicated - even going so far as to precalculate tables of scalar function results and joining to them. Something has to be done eventually to make scalar functions perform. Right not, the best option is the CLR - apparently these far outperform SQL UDFs. Unfortunately, I cannot use them in my current environment.