C# code and SQL Server performance

Posted on 2024-09-27 10:12:48

I have a SQL Server database designed like this:

TableParameter
  Id    (int, PRIMARY KEY, IDENTITY)
  Name1 (string)
  Name2 (string, can be null)
  Name3 (string, can be null)
  Name4 (string, can be null)

TableValue
  Iteration         (int)
  IdTableParameter  (int, FOREIGN KEY)
  Type              (string)
  Value             (decimal)

So, as you can see, TableValue is linked to TableParameter.
TableParameter is like a multidimensional dictionary.

TableParameter is expected to have a lot of rows (more than 300,000).

From my C# client program, I have to fill this database after each Compute() call:

for (int iteration = 0; iteration < 5000; iteration++)
{
    Compute();
    FillResultsInDatabase();
}

In the FillResultsInDatabase() method, I have to:

  1. Check whether the label of my parameter already exists in TableParameter. If it doesn't, I have to insert a new row.
  2. Insert the value into TableValue.

Step 1 takes a long time! I load the whole TableParameter table into an IEnumerable property and then, for each parameter, I call

.FirstOrDefault( x => x.Name1 == item.Name1 &&
                      x.Name2 == item.Name2 &&
                      x.Name3 == item.Name3 &&
                      x.Name4 == item.Name4 );

in order to detect whether it already exists (and then to get its Id).

Performance is very bad like this!

I've tried selecting with a WHERE clause to avoid loading every row of TableParameter, but performance is even worse!

How can I improve the performance of step 1?

For step 2, performance is still bad with classic INSERT statements. I am going to try SqlBulkCopy.

How can I improve the performance of step 2?

EDITED

I've tried with stored procedures:

CREATE PROCEDURE GetIdParameter
    @Id     int OUTPUT,
    @Name1  nvarchar(50) = null,
    @Name2  nvarchar(50) = null,
    @Name3  nvarchar(50) = null
AS
SELECT TOP 1 @Id = Id
FROM TableParameter
WHERE TableParameter.Name1 = @Name1
  AND (@Name2 IS NULL OR TableParameter.Name2 = @Name2)
  AND (@Name3 IS NULL OR TableParameter.Name3 = @Name3)
GO

CREATE PROCEDURE CreateValue
    @Iteration int,
    @Type   nvarchar(50),
    @Value  decimal(32, 18),
    @Name1  nvarchar(50) = null,
    @Name2  nvarchar(50) = null,
    @Name3  nvarchar(50) = null
AS
DECLARE @IdParameter int
-- Look up the Id of an existing parameter row, if any
EXEC GetIdParameter @IdParameter OUTPUT,
                    @Name1, @Name2, @Name3
IF @IdParameter IS NULL
BEGIN
    -- No matching row: create it and keep its new identity value
    INSERT TableParameter (Name1, Name2, Name3)
    VALUES (@Name1, @Name2, @Name3)

    SELECT @IdParameter = SCOPE_IDENTITY()
END
INSERT TableValue (Iteration, IdTableParameter, Type, Value)
VALUES (@Iteration, @IdParameter, @Type, @Value)
GO

I still get the same performance, which is not acceptable... :-(

Comments (4)

慈悲佛祖 2024-10-04 10:12:48

If I understand what's happening, you're querying the database in step 1 to see whether the data is there. I'd call a stored procedure that inserts the data if it's not there, so you just compute the results and pass them to the SP.

Can you compute the results first, and then insert in batches?
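One way to picture the batching idea is to buffer computed values in memory and hand them to the server in chunks. The sketch below is only an illustration: `ValueBatch`, `CreateTable`, `Append` and `FlushIfFull` are hypothetical names, the column names are taken from the question's TableValue schema, and the `flush` delegate is where a call such as `SqlBulkCopy.WriteToServer(DataTable)` could go.

```csharp
using System;
using System.Data;

// Hypothetical helper (not from the thread): buffers TableValue rows
// in a DataTable so they can be sent to the server in batches.
public static class ValueBatch
{
    // Creates an empty in-memory table whose columns mirror TableValue.
    public static DataTable CreateTable()
    {
        var table = new DataTable("TableValue");
        table.Columns.Add("Iteration", typeof(int));
        table.Columns.Add("IdTableParameter", typeof(int));
        table.Columns.Add("Type", typeof(string));
        table.Columns.Add("Value", typeof(decimal));
        return table;
    }

    // Buffers one computed value; nothing is sent to the server here.
    public static void Append(DataTable table, int iteration,
                              int idTableParameter, string type, decimal value)
    {
        table.Rows.Add(iteration, idTableParameter, type, value);
    }

    // Once the buffer holds at least batchSize rows, passes it to flush
    // (assumed to write the rows to the database) and empties it.
    // Returns true when a flush happened.
    public static bool FlushIfFull(DataTable table, int batchSize,
                                   Action<DataTable> flush)
    {
        if (table.Rows.Count < batchSize) return false;
        flush(table);
        table.Clear();
        return true;
    }
}
```

Any rows left in the buffer after the last iteration still need one final flush.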

Does the Compute function take its data from the database? If so, can you turn the operation into a set-based one and perform it, or maybe part of it, on the server itself?

Remember that SQL Server is designed for operations on large datasets.

Edit: reflecting comments
Since the code is slow on the data inserts, and you suspect that is because each insert has to search first, I'd suggest placing SQL indexes on the columns you search on to speed up the lookup.

However, I have another idea.

Why don't you just insert the data without the check, and remove the duplicates later in the query that reads the data?

以可爱出名 2024-10-04 10:12:48

Given the fact that name2 - name3 can be null, would it be possible to restructure the parameter table:

TableParameter
  Id    (int, PRIMARY KEY, IDENTITY)
  Name  (string)
  Dimension (int)

Now you can index it and simplify the query (WHERE Name = 'TheNameIWant' AND Dimension = 2).

(And speaking of indexes, you do have an index on the name columns in the parameter table?)

Where do you commit your inserts? If you commit after every statement, group multiple inserts into one transaction.

If you are the only one inserting values and speed is really of the essence, load all the values from the database into memory and check there.
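A minimal sketch of that in-memory check, assuming a single writer: instead of scanning a list with FirstOrDefault for every parameter, the rows are indexed once in a dictionary keyed by the four name columns, so each lookup is a hash probe. `ParameterCache` and its members are hypothetical names, and the `insertRow` delegate stands in for whatever code inserts the row and returns its new Id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the thread): an in-memory index over
// TableParameter rows, keyed by the four name columns.
public sealed class ParameterCache
{
    // Keys compare with ordinal string equality, like == in the question's
    // lambda; null names are valid key components.
    private readonly Dictionary<(string, string, string, string), int> _ids =
        new Dictionary<(string, string, string, string), int>();

    // Registers a row that already exists in TableParameter
    // (for example while loading the table once at startup).
    public void Add(int id, string name1, string name2, string name3, string name4)
    {
        _ids[(name1, name2, name3, name4)] = id;
    }

    // Returns the cached Id, or calls insertRow (assumed to insert the row
    // in the database and return its new Id) and remembers the result.
    public int GetOrAdd(string name1, string name2, string name3, string name4,
                        Func<int> insertRow)
    {
        var key = (name1, name2, name3, name4);
        if (!_ids.TryGetValue(key, out int id))
        {
            id = insertRow();
            _ids[key] = id;
        }
        return id;
    }
}
```

With more than 300,000 parameters this replaces a linear scan per lookup with a constant-time one on average, which only stays correct as long as no other process inserts into TableParameter.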

Just some ideas.

hth

Mario

朱染 2024-10-04 10:12:48

I must admit that I'm struggling to grasp the business process that you are trying to achieve here.

On initial review, it appears that you are performing a data comparison within your application tier. I would advise against this and suggest that you let the Database Engine do what it is designed to do: manage and implement your data access.

As another poster has mentioned, I concur that you should look to create a Stored Procedure to handle your record insertion logic. The procedure can perform a simple check to see if your records already exist.

You should also consider:

  • Enforcing the insertion logic/rule by creating a Unique Constraint across the four name columns.
  • Creating a covering non-clustered index incorporating the four name columns.

With regard to performance of your inserts, perhaps you can provide some metrics to qualify what it is that you are seeing and how you are measuring it?

To give you a yardstick, the current ETL insertion record for SQL Server is approximately 16 million rows per second. What sort of numbers are you expecting and wanting to see?

丶视觉 2024-10-04 10:12:48

The fastest way (that I know of so far) is bulk insert, but not just separate INSERT lines. Try INSERT + SELECT + UNION; it works pretty fast.

insert into myTable
select a1, b1, c1, ...
union select a2, b2, c2, ...
union select a3, b3, c3, ...
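From C#, a statement of this shape can be generated rather than typed by hand. The sketch below is an assumption-laden illustration, not the poster's code: `UnionInsertBuilder` and the `@p{row}_{col}` parameter naming are made up, and it emits UNION ALL instead of UNION so that identical rows are not silently collapsed. Table and column names are concatenated into the text, so they must come from trusted constants; the values travel as parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the thread): builds the text of a
// single INSERT ... SELECT ... UNION ALL SELECT ... statement.
public static class UnionInsertBuilder
{
    // Returns SQL that inserts rowCount rows, using one parameter
    // placeholder per value, named @p{row}_{col}.
    public static string Build(string table, IReadOnlyList<string> columns, int rowCount)
    {
        if (rowCount <= 0) throw new ArgumentOutOfRangeException(nameof(rowCount));

        var sql = new StringBuilder();
        sql.Append("INSERT INTO ").Append(table)
           .Append(" (").Append(string.Join(", ", columns)).Append(")");

        for (int row = 0; row < rowCount; row++)
        {
            // The first row starts the SELECT; later rows are unioned on.
            sql.Append(row == 0 ? "\nSELECT " : "\nUNION ALL SELECT ");
            for (int col = 0; col < columns.Count; col++)
            {
                if (col > 0) sql.Append(", ");
                sql.Append("@p").Append(row).Append('_').Append(col);
            }
        }
        return sql.ToString();
    }
}
```

SQL Server limits how many parameters a single command can carry, so the number of rows per statement has to stay modest.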