Is using table variables faster than temp tables?
Am I safe to assume that, where I have stored procedures that use tempdb to write a temporary table, I'd be better off switching these to table variables to get better performance?
4 Answers
Temp tables are better in performance. If you use a table variable and the data in the variable gets too big, SQL Server converts the variable automatically into a temp table.
It depends, like almost every database-related question, on what you are trying to do. So it is hard to answer without more information.
So my answer is: try it and have a look at the execution plan. Use the fastest way with the lowest cost.
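A quick way to follow that advice is to load the same data into both forms and compare the plans, timings and I/O. The sketch below is only an illustration of that comparison, not code from the question; `dbo.Orders` and `dbo.Customers` are hypothetical tables.

```sql
-- Compare the same query against a table variable and a temp table.
-- Run with "Include Actual Execution Plan" enabled in SSMS and compare
-- the plans, logical reads and elapsed times.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: table variable
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, CustomerID int);
INSERT INTO @Orders (OrderID, CustomerID)
SELECT OrderID, CustomerID FROM dbo.Orders;          -- hypothetical source table

SELECT c.CustomerID, COUNT(*) AS OrderCount
FROM @Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID   -- hypothetical table
GROUP BY c.CustomerID;

-- Variant 2: temp table
CREATE TABLE #Orders (OrderID int PRIMARY KEY, CustomerID int);
INSERT INTO #Orders (OrderID, CustomerID)
SELECT OrderID, CustomerID FROM dbo.Orders;

SELECT c.CustomerID, COUNT(*) AS OrderCount
FROM #Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID;

DROP TABLE #Orders;
```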
A @Table can be faster, as there is less "setup time" since the object is in memory only.
@Tables have a lot of catches, though.
You can have a primary key on a @Table, but that's about it. Other indexes, clustered or non-clustered, on combinations of columns are not possible.
Also, if your table is going to contain any real data volume (more than about 200, maybe 1000 rows), then accessing the table will be slower, especially when you will probably not have a useful index on it.
#Tables are a pain in procs, as they need to be dropped when debugging. They take longer to create, and they take longer to set up, as you need to add indexes as a second step. But if you have lots of data, then it's #Tables every time.
Even in cases where you have fewer than 100 rows of data in a table, you may still want to use #Tables, as you can create a useful index on the table.
In summary, I use @Tables most of the time for ease when doing simple procs etc., but anything that needs to perform should be a #Table.
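A minimal sketch of the indexing difference this answer describes, with made-up table and column names: the @Table only gets what can be declared inline with the variable, while the #Table can have further indexes added as a second step, and has to be dropped when re-running the batch during debugging.

```sql
-- A @Table only takes constraints declared inline with the variable,
-- such as a primary key; you cannot add further indexes to it afterwards.
DECLARE @Accounts TABLE
(
    AccountID   int PRIMARY KEY,   -- roughly the limit of indexing here
    RegionID    int,
    AccountName varchar(100)
);

-- A #Table behaves like a normal table: create it, then add whatever
-- indexes you need as a second step. Drop it first so the batch can be
-- re-run while debugging.
IF OBJECT_ID('tempdb..#Accounts') IS NOT NULL
    DROP TABLE #Accounts;

CREATE TABLE #Accounts
(
    AccountID   int PRIMARY KEY,
    RegionID    int,
    AccountName varchar(100)
);

CREATE NONCLUSTERED INDEX IX_Accounts_Region
    ON #Accounts (RegionID, AccountName);
```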
@Tables have no statistics, so the execution plan entails more guesswork. Hence the recommended upper limit of 1000-ish rows. #Tables have statistics, but these can be cached between invocations. If your cardinalities differ significantly each time the SP runs, you'd want to REBUILD and RECOMPILE each time. This is an overhead, of course, but one which must be balanced against the cost of a rubbish plan. Both types will do IO to TempDB.
So no, @Tables are not a panacea.
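One way to act on this advice, sketched with hypothetical table and column names: refresh the temp table's statistics after loading it and ask for a statement-level recompile, so each run gets a plan for its actual cardinality rather than one cached from an earlier invocation.

```sql
-- Hypothetical illustration of the trade-off described above.
CREATE TABLE #Filter (AccountID int PRIMARY KEY);

INSERT INTO #Filter (AccountID)
SELECT AccountID FROM dbo.Accounts WHERE IsActive = 1;   -- hypothetical source

UPDATE STATISTICS #Filter;    -- refresh statistics after the load

SELECT a.AccountID, a.AccountName
FROM dbo.Accounts AS a
JOIN #Filter AS f ON f.AccountID = a.AccountID
OPTION (RECOMPILE);           -- pay a compile each run, get a plan for this cardinality

DROP TABLE #Filter;
```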
Table variables can perform very poorly as the number of rows in them increases.
Why is this?
Table variables don’t have distribution statistics and don’t trigger recompiles. Because of this, SQL Server is not able to estimate the number of rows in a table variable like it does for normal tables. When the optimiser compiles code that contains a table variable, it assumes a table is empty and uses an expected row count of 1 for the cardinality estimate. Because the optimiser only thinks a table variable contains a single row, it picks operators for the execution plan that work well with a small set of records, like the NESTED LOOPS operator for a JOIN operation.
As an example, I have just fixed a stored procedure which was performing poorly. The code was populating a table variable and using it in a join to filter the rows down to the accounts which were relevant.
When run for a list of 1700 accounts, the query was taking 1m17s. Just changing the filter table definition from a table variable to a temp table brought the query time down to 800ms. Note that with 5 rows in the table there was no significant difference: both the temp table and the table variable ran in +/- 400ms.
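The original before/after definitions are not reproduced here. The following is a hypothetical reconstruction of the kind of change described, using made-up names (@FilterAccounts, #FilterAccounts, dbo.Transactions, dbo.RelevantAccounts), not the author's actual code.

```sql
-- Before: a table variable as the filter. The optimiser assumes one row,
-- so the join plan can be badly wrong for ~1700 accounts.
DECLARE @FilterAccounts TABLE (AccountID int PRIMARY KEY);
INSERT INTO @FilterAccounts (AccountID)
SELECT AccountID FROM dbo.RelevantAccounts;   -- hypothetical source of ~1700 rows

SELECT t.*
FROM dbo.Transactions AS t                    -- hypothetical fact table
JOIN @FilterAccounts AS f ON f.AccountID = t.AccountID;

-- After: the same filter as a temp table, which carries statistics,
-- so the optimiser no longer assumes a single row when choosing the join.
CREATE TABLE #FilterAccounts (AccountID int PRIMARY KEY);
INSERT INTO #FilterAccounts (AccountID)
SELECT AccountID FROM dbo.RelevantAccounts;

SELECT t.*
FROM dbo.Transactions AS t
JOIN #FilterAccounts AS f ON f.AccountID = t.AccountID;

DROP TABLE #FilterAccounts;
```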
Microsoft's recommendation is to use Table Variables if the number of rows is <100.
Note that Microsoft have made changes in SQL Server 2019 to improve this (v15.x / compatibility level 150).
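The SQL Server 2019 change referred to is table variable deferred compilation, which applies when the database runs under compatibility level 150. A minimal sketch of checking and setting that level, assuming a hypothetical database name:

```sql
-- Check the current compatibility level of a (hypothetical) database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'MyDatabase';

-- Raise it to 150 so the SQL Server 2019 behaviour applies.
ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 150;
```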