存储过程执行计划 - 数据操作

发布于 2024-07-14 00:29:26 字数 598 浏览 7 评论 0原文

我有一个处理大量数据的存储过程(在本例中大约有 500 万行)。 性能差异很大。 我曾见过该流程运行时间短至 15 分钟,也见过它运行长达 4 小时。

为了进行维护,并且为了验证逻辑和处理是否正确,我们将 SP 分为几个部分:

  1. TRUNCATE 并填充一个工作表(已索引),我们稍后可以使用自动验证测试工具。

  2. 将多个表连接在一起(包括其中一些工作表)以生成另一个工作表

重复 1 和/或 2,直到最终完成产生输出。

我担心的是,这是一个单一的 SP,因此在第一次运行时会得到一个执行计划(甚至WITH RECOMPILE)。 但那时,工作表(工作模式中的永久表)是空的。

我担心,无论索引方案如何,执行计划都会很差。

我正在考虑分解 SP 并从其中调用单独的 SP,以便它们可以在构建工作表中的数据后利用重新评估的执行计划。 我还看到了有关使用 EXEC 运行动态 SQL 的参考,显然也可能会得到 RECOMPILE。

我仍在尝试获得 SHOWPLAN 权限,所以我非常盲目。

I have a stored proc that processes a large amount of data (about 5m rows in this example). The performance varies wildly. I've had the process running in as little as 15 minutes and seen it run for as long as 4 hours.

For maintenance, and in order to verify that the logic and processing is correct, we have the SP broken up into sections:

  1. TRUNCATE and populate a work table (indexed) we can verify later with automated testing tools.

  2. Join several tables together (including some of these work tables) to product another work table

Repeat 1 and/or 2 until a final output is produced.

My concern is that this is a single SP and so gets an execution plan when it is first run (even WITH RECOMPILE). But at that time, the work tables (permanent tables in a Work schema) are empty.

I am concerned that, regardless of the indexing scheme, the execution plan will be poor.

I am considering breaking up the SP and calling separate SPs from within it so that they could take advantage of a re-evaluated execution plan after the data in the work tables is built. I have also seen reference to using EXEC to run dynamic SQL which, obviously might get a RECOMPILE also.

I'm still trying to get SHOWPLAN permissions, so I'm flying quite blind.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

冷心人i 2024-07-21 00:29:26

您能够确定是否存在任何锁定问题吗? 您是否在足够小的事务中运行 SP?

将其分解为子过程应该没有任何好处。

有人应该关心你的生产力,在没有基本优化资源的情况下工作。 这表明可能还存在其他可能的看不见的问题。

Are you able to determine whether there are any locking problems? Are you running the SP in sufficiently small transactions?

Breaking it up into subprocedures should have no benefit.

Somebody should be concerned about your productivity, working without basic optimization resources. That suggests there may be other possible unseen issues as well.

总以为 2024-07-21 00:29:26

在下面的链接中获取“剖析执行计划”的免费副本,也许您可​​以从中获得一两个提示,这将使您了解 SP 背后到底发生了什么。

http://dbalink.wordpress .com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/

Grab the free copy of "Dissecting Execution Plan" in the link below and maybe you can pick up a tip or two from it that will give you some idea of what's really going on under the hood of your SP.

http://dbalink.wordpress.com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/

疯了 2024-07-21 00:29:26

您确定您看到的变化是由“糟糕”的执行计划引起的吗? 这可能是一个原因,但可能还有许多其他原因:

  • 使用不同数据时数据库计算机上的“其他”负载
  • ,可能存在“简单”和“困难”数据
  • 问题,必须分配更多内存/文件存储
  • ...

您是否尝试过使用相同的数据运行 SP 几次?

另外,为了找出导致运行时/可变性的原因,我会尝试进行一些详细的测量,以将问题定位到代码的特定部分。 (最简单的方法是在 sp 的各个点插入一些日志调用)。 然后尝试解释为什么该部分很慢(除了“5M 行;-))并找出一种方法来使其更快。

目前,我认为在继续“拆分 sp”之前需要回答几个问题“ 路线。

Are you sure that the variability you're seeing is caused by "bad" execution plans? This may be a cause, but there may be a number of other reasons:

  • "other" load on the db machine
  • when using different data, there may be "easy" and "hard" data
  • issues with having to allocate more memory/file storage
  • ...

Have you tried running the SP with the same data a few times?

Also, in order to figure out what is causing the runtime/variability, I'd try to do some detailed measuring to pin the problem down to a specific section of the code. (Easiest way would be to insert some log calls at various points in the sp). Then try to explain why that section is slow (other than "5M rows ;-)) and figure out a way to make that faster.

For now, I think there are a few questions to answer before going down the "splitting up the sp" route.

喜爱皱眉﹌ 2024-07-21 00:29:26

你是对的,在你可以从整个流程的多次执行中获得“实际”执行计划之前,你很难清楚地了解幕后发生的事情。

也许有一点需要考虑。 您的工作表是物理表还是临时表? 如果它们是物理的,您将通过将新数据插入到没有索引(即堆)的新表中来获得性能增益,然后您可以在插入所有数据后在其上构建索引。

另外,您的流程的目的是什么。 听起来您正在移动相当多的数据,在这种情况下您可能希望考虑使用分区。 您可以相对轻松地将数据移入和移出主表。

希望我所详细说明的内容很清楚,但请随时提出进一步的问题。

干杯,约翰

You're right it is quite difficult for you to get a clear picture of what is happening behind the scenes until you can get the "actual" execution plans from several executions of your overall process.

One point to consider perhaps. Are your work tables physical of temporary tables? If they are physical you will get a performance gain by inserting new data into a new table without an index (i.e. a heap) which you can then build an index on after all the data has been inserted.

Also, what is the purpose of your process. It sounds like you are moving quite a bit of data around, in which case you may wish to consider the use of partitioning. You can switch in and out data to your main table with relative ease.

Hope what I have detailed is clear but please feel free to pose further questions.

Cheers, John

总攻大人 2024-07-21 00:29:26

在某些情况下,我发现执行时间/查询计划的这种多样性程度可以归结为统计数据。 我建议在进程运行之前对您正在使用的表运行更新统计信息的一些测试。 这将迫使 SQL 重新评估执行计划,并且我怀疑这会为您提供更一致的结果。 此外,您最好看看执行时间的差异是否与数据库管理员的重新索引作业相关。 也许您还可以在每次运行之前收集一些索引运行状况统计信息。

如果没有,正如其他回答者所建议的那样,您更有可能遇到锁定和/或争用问题。

祝你好运。

In several cases I've seen this level of diversity of execution times / query plans comes down to statistics. I would recommend some tests running update stats against the tables you are using just before the process is run. This will both force a re-evaluation of the execution plan by SQL and, I suspect, give you more consistent results. Additionally you may do well to see if the differences in execution time correlate with re-indexing jobs by your dbas. Perhaps you could also gather some index health statistics before each run.

If not, as other answerers have suggested, you are more likely suffering from locking and/or contention issues.

Good luck with it.

臻嫒无言 2024-07-21 00:29:26

我唯一能想到的是,当没有数据时,执行计划会做错,那就是使用表扫描而不是索引,因为当整个表适合内存时,表扫描速度非常快。 您是否确实观察到或肯定正在发生其他负面影响,因为创建执行计划时没有数据?

您可以在查询中强制使用索引...

似乎我觉得你可能走错了路。

The only thing I can think that an execution plan would do wrong when there's no data is err on the side of using a table scan instead of an index, since table scans are super fast when the whole table will fit into memory. Are there other negatives you're actually observing or are sure are happening because there's no data when an execution plan is created?

You can force usage of indexes in your query...

Seems to me like you might be going down the wrong path.

恍梦境° 2024-07-21 00:29:26

这是某种进料或出料还是您正在创建报告? 如果是 feed,我建议您更改流程以使用 SSIS,它应该能够非常快地移动 500 万条记录。

Is this an infeed or outfeed of some sort or are you creating a report? If it is a feed, I would suggest that you change the process to use SSIS which should be able to move 5 million records very fast.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文