C# optimisation: Inserting 200 million rows into a database
I have the following (simplified) code which I'd like to optimise for speed:
long inputLen = 50000000; // 50 million
DataTable dataTable = new DataTable();
DataRow dataRow;
object[] objectRow = new object[4];
while (inputLen-- > 0)
{
    objectRow[0] = ...
    objectRow[1] = ...
    objectRow[2] = ...

    // Generate output for this input
    output = ...

    for (int i = 0; i < outputLen; i++) // outputLen can range from 1 to 20,000
    {
        objectRow[3] = output[i];
        dataRow = dataTable.NewRow();
        dataRow.ItemArray = objectRow;
        dataTable.Rows.Add(dataRow);
    }
}

// Bulk copy
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "newTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();
I'm already using SqlBulkCopy in an attempt to speed things up, but it appears that assigning values to the DataTable itself is the slow part.
I don't know how DataTables work internally, so I'm wondering whether I'm creating unnecessary overhead by first creating a reusable array, then assigning it to a DataRow, and then adding the DataRow to the DataTable. Or is using a DataTable not optimal in the first place? The input comes from a database.
I don't care much about LOC, just speed. Can anyone give some advice on this?
3 Answers
For such a big table, you should instead use the SqlBulkCopy.WriteToServer(IDataReader) method.
It may mean you'll have to implement a "fake" IDataReader yourself (if you don't get the data from an existing IDataReader), but this way you'll get "streaming" from end to end, and will avoid a 200-million-row loop whose only job is to fill a DataTable.
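A minimal sketch of such a "fake" reader, assuming rows arrive as object[] arrays from an iterator. The class name RowStreamReader, the constructor shape, and the choice of which members to leave unimplemented are my assumptions, untested against SqlBulkCopy; they are not from the answer:

using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical streaming reader: hands rows to SqlBulkCopy one at a time,
// so only the current row lives in memory.
class RowStreamReader : IDataReader
{
    private readonly IEnumerator<object[]> _rows;
    private readonly int _fieldCount;
    private object[] _current;
    private bool _closed;

    public RowStreamReader(IEnumerable<object[]> rows, int fieldCount)
    {
        _rows = rows.GetEnumerator();
        _fieldCount = fieldCount;
    }

    // The members an ordinal-mapped bulk copy relies on:
    public int FieldCount => _fieldCount;
    public bool Read()
    {
        if (_closed || !_rows.MoveNext()) return false;
        _current = _rows.Current;
        return true;
    }
    public object GetValue(int i) => _current[i];
    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, _fieldCount);
        Array.Copy(_current, values, n);
        return n;
    }
    public bool IsDBNull(int i) => _current[i] == null || _current[i] is DBNull;

    // Housekeeping:
    public void Close() { _closed = true; _rows.Dispose(); }
    public void Dispose() => Close();
    public bool IsClosed => _closed;
    public int Depth => 0;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => null;

    // Assumed unused for a plain ordinal-mapped copy; implement GetOrdinal
    // and GetName if you add ColumnMappings by name.
    public object this[int i] => GetValue(i);
    public object this[string name] => throw new NotSupportedException();
    public string GetName(int i) => throw new NotSupportedException();
    public int GetOrdinal(string name) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public Type GetFieldType(int i) => throw new NotSupportedException();
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
    public string GetString(int i) => throw new NotSupportedException();
}

Usage sketch, assuming the question's per-input work is moved into a hypothetical GenerateRows iterator (needs using System.Data.SqlClient;):

static IEnumerable<object[]> GenerateRows()
{
    // The question's while loop goes here, ending in something like:
    // yield return new object[] { a, b, c, output[i] };
    yield break; // placeholder for the real generator
}

using (SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    bulkTask.DestinationTableName = "newTable";
    bulkTask.WriteToServer(new RowStreamReader(GenerateRows(), 4));
}

With this shape the generator and SqlBulkCopy run in lockstep, so memory stays flat no matter how many rows you insert.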
Instead of holding a huge data table in memory, I would suggest implementing an IDataReader which serves up the data as the bulk copy goes. This will reduce the need to keep everything in memory up front, and should thus serve to improve performance.
You should not construct the entire DataTable in memory. Use the overload of WriteToServer that takes an array of DataRow, and just split your data into chunks.
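A rough, untested sketch of that chunked approach. The 100,000-row chunk size, the placeholder column names, the BulkCopyInChunks method, and the reuse of a GenerateRows-style iterator are illustrative assumptions, not from the answer:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical chunked writer: at most chunkSize rows are alive at once
// instead of the full 200 million.
static void BulkCopyInChunks(SqlConnection connection, IEnumerable<object[]> rows)
{
    const int chunkSize = 100000; // assumed value; tune against your memory budget

    DataTable dataTable = new DataTable();
    dataTable.Columns.Add("col0"); // placeholder column names
    dataTable.Columns.Add("col1");
    dataTable.Columns.Add("col2");
    dataTable.Columns.Add("col3");

    using (SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkTask.DestinationTableName = "newTable";
        foreach (object[] values in rows)
        {
            DataRow dataRow = dataTable.NewRow();
            dataRow.ItemArray = values;
            dataTable.Rows.Add(dataRow);

            if (dataTable.Rows.Count == chunkSize)
            {
                DataRow[] chunk = new DataRow[dataTable.Rows.Count];
                dataTable.Rows.CopyTo(chunk, 0);
                bulkTask.WriteToServer(chunk); // the DataRow[] overload
                dataTable.Clear();             // release rows already sent
            }
        }
        if (dataTable.Rows.Count > 0)
        {
            DataRow[] tail = new DataRow[dataTable.Rows.Count];
            dataTable.Rows.CopyTo(tail, 0);
            bulkTask.WriteToServer(tail); // final partial chunk
        }
    }
}

Clearing the DataTable after each WriteToServer call keeps the working set bounded by the chunk size rather than by the total row count.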