表值参数性能问题

发布于 2024-11-09 04:52:50 字数 990 浏览 6 评论 0原文

我不知道这是否是我使用它们的方式或 Microsoft 实现的问题，但 SQL 2008 表值参数非常慢。

一般来说，如果我需要使用 TVP，那是因为我有很多记录 - 目前，对于超过最少记录的任何内容，它们似乎都慢得无法使用。

我在 .Net 中这样调用它们：

// get the data
DataTable data = GetData();

com.CommandText = "sprocName"

// create the table-value parameter
var tvp = com.Parameters.AddWithValue("data", data);
tvp.SqlDbType = SqlDbType.Structured;

com.ExecuteNonQuery();

我运行探查器来查看原因，实际的 SQL 语句是这样的：

declare @data table ...

insert into @data ( ... fields ... ) values ( ... values ... )
-- for each row
insert into @data ( ... fields ... ) values ( ... values ... )

sprocName(@data)

不过，这是一种非常慢的方法。如果这样做的话会快得多：

insert into @data ( ... fields ... ) 
values ( ... values ... ),
       ( ... values ... ),
       -- for each row
       ( ... values ... )

我不确定为什么它不使用更新、更快的语法。或者甚至是它在 SqlBulkCopy 下所做的任何事情。

SQL 2008 中添加了新语法，但 TVP 也是如此（我认为）。

有什么选择可以让它做到这一点吗？或者我缺少什么？

原文

I don't know whether this is an issue with how I'm using them or Microsoft's implementation, but SQL 2008 table-value parameters are painfully slow.

Generally if I need to use a TVP it's because I've got lots of records - currently they appear to be unusably slow for anything more than the fewest records.

I'm calling them in .Net like this:

// get the data
DataTable data = GetData();

com.CommandText = "sprocName"

// create the table-value parameter
var tvp = com.Parameters.AddWithValue("data", data);
tvp.SqlDbType = SqlDbType.Structured;

com.ExecuteNonQuery();

I ran profiler to see why, and the actual SQL statement is something like this:

declare @data table ...

insert into @data ( ... fields ... ) values ( ... values ... )
-- for each row
insert into @data ( ... fields ... ) values ( ... values ... )

sprocName(@data)

That's a really slow way to do it though.
It would be much quicker if it did this instead:

insert into @data ( ... fields ... ) 
values ( ... values ... ),
       ( ... values ... ),
       -- for each row
       ( ... values ... )

I'm not sure why it isn't using the newer, much quicker syntax. Or even whatever it does under the hood with SqlBulkCopy.

The new syntax was added in SQL 2008, but then so are TVPs (I think).

Is there some option to make it do this? Or something that I'm missing?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

无声静候 2024-11-16 04:52:50

如果 TVP 比其他选项“明显慢”，那么很可能您没有正确实施它们。

您不应该使用 DataTable，除非您的应用程序除了将值发送到 TVP 之外还需要使用它。使用 IEnumerable 接口速度更快，并且使用更少的内存，因为您不必在内存中复制集合只是为了将其发送到数据库。我已将其记录在以下位置：
- 如何在最短的时间内插入 1000 万条记录？（这里还有很多额外的信息和链接））
- 将字典传递给存储过程 T-SQL
- 将数据流式传输到 SQL 中来自应用程序的 Server 2008（在 SQLServerCentral.com 上；需要免费注册）

您不应该对 SqlParameter 使用 AddWithValue，尽管这不太可能是性能问题。但仍然应该是：

SqlParameter tvp = com.Parameters.Add("data", SqlDbType.Structured);
tvp.Value = MethodThatReturnsIEnumerable(MyCollection);

TVP 是表变量，因此不维护统计数据。这意味着，它们向查询优化器报告只有 1 行。所以，在你的过程中，要么：
- 对任何使用 TVP 的查询（除了简单的 SELECT 之外的任何查询）使用语句级重新编译：OPTION (RECOMPILE)
- 创建本地临时表（即单个#）并将TVP的内容复制到临时表中
- 您可以尝试向用户定义的表类型添加聚集主键
- 如果使用 SQL Server 2014 或更高版本，您可以尝试使用内存中 OLTP/内存优化表。请参阅：使用内存优化更快的临时表和表变量

关于您所看到的原因：

insert into @data ( ... fields ... ) values ( ... values ... )
-- for each row
insert into @data ( ... fields ... ) values ( ... values ... )

而不是：

insert into @data ( ... fields ... ) 
values ( ... values ... ),
       ( ... values ... ),

如果这实际上正在发生，那么：

如果插入是在事务则没有真正的性能差异
较新的值列表语法（即 VALUES (row1), (row2), (row3)）仅限于 1000 行左右，因此不是一个可行的选项对于没有该限制的 TVP。然而，这不太可能是使用单个插入的原因，因为在执行 INSERT INTO @data (fields) SELECT tab.[col] FROM (VALUES (), (), .. 时没有限制。 .) tab([col])，我在这里记录：
最大行数表值构造函数。相反...
原因很可能是执行单独插入允许将值从应用程序代码流式传输到 SQL Server：
1. 使用迭代器（即上面 #1 中提到的 IEnumerable），应用代码会发送从方法返回的每一行，并且
2. 构造 VALUES (), (), ... 列表，即使使用 INSERT INTO ... SELECT FROM (VALUES ...) 方法 (不限于 1000 行），在将任何数据发送到 SQL Server 之前，仍然需要构建整个 VALUES 列表。如果数据量很大，构造超长字符串的时间会更长，而且会占用更多的内存。

另请参阅 SQL Server 客户咨询团队提供的白皮书：利用 TVP 最大化吞吐量

If TVPs are "noticeably slower" than the other options, then most likely you are not implementing them correctly.

You should not be using a DataTable, unless your application has use for it outside of sending the values to the TVP. Using the IEnumerable<SqlDataRecord> interface is faster and uses less memory as you are not duplicating the collection in memory only to send it to the DB. I have this documented in the following places:
- How can I insert 10 million records in the shortest time possible? (lots of extra info and links here as well)
- Pass Dictionary to Stored Procedure T-SQL
- Streaming Data Into SQL Server 2008 From an Application (on SQLServerCentral.com ; free registration required)

You should not use AddWithValue for the SqlParameter, though this is not likely a performance issue. But still, it should be:

SqlParameter tvp = com.Parameters.Add("data", SqlDbType.Structured);
tvp.Value = MethodThatReturnsIEnumerable<SqlDataRecord>(MyCollection);

TVPs are Table Variables and as such do not maintain statistics. Meaning, they report only having 1 row to the Query Optimizer. So, in your proc, either:
- Use statement-level recompile on any queries using the TVP for anything other than a simple SELECT: OPTION (RECOMPILE)
- Create a local temporary table (i.e. single #) and copy the contents of the TVP into the temp table
- You could try adding a clustered primary key to the User-Defined Table Type
- If using SQL Server 2014 or newer, you can try making use of In-Memory OLTP / memory-optimized tables. Please see: Faster temp table and table variable by using memory optimization

Regarding why you are seeing:

insert into @data ( ... fields ... ) values ( ... values ... )
-- for each row
insert into @data ( ... fields ... ) values ( ... values ... )

instead of:

insert into @data ( ... fields ... ) 
values ( ... values ... ),
       ( ... values ... ),

IF that is actually what is happening, then:

If the inserts are being done within a Transaction then there is no real performance difference
The newer value-list syntax (i.e. VALUES (row1), (row2), (row3)) is limited to something like 1000 rows and hence not a viable option for TVPs that do not have that limit. HOWEVER, this is not likely the reason that individual inserts are being used, given that there is no limit when doing INSERT INTO @data (fields) SELECT tab.[col] FROM (VALUES (), (), ...) tab([col]), which I documented here:
Maximum Number of Rows for the Table Value Constructor. Instead...
The reason is most likely that doing individual inserts allows for streaming the values from the app code into SQL Server:
1. using an iterator (i.e. the IEnumerable<SqlDataRecord> noted in #1 above), the app code sends each row as it is returned from the method, and
2. constructing the VALUES (), (), ... list, even if doing the INSERT INTO ... SELECT FROM (VALUES ...) approach (which is not limited to 1000 rows), that would still require building the entire VALUES list before sending any of the data into SQL Server. If there is a lot of data, that would take longer to construct the super-long string, and it would take up a lot more memory while doing it.