C# optimisation: Inserting 200 million rows into a database
I have the following (simplified) code which I'd like to optimise for speed:
long inputLen = 50000000; // 50 million
DataTable dataTable = new DataTable();
DataRow dataRow;
object[] objectRow = new object[4];
while (inputLen-- > 0)
{
    objectRow[0] = ...
    objectRow[1] = ...
    objectRow[2] = ...

    // Generate output for this input
    output = ...

    for (int i = 0; i < outputLen; i++) // outputLen can range from 1 to 20,000
    {
        objectRow[3] = output[i];
        dataRow = dataTable.NewRow();
        dataRow.ItemArray = objectRow;
        dataTable.Rows.Add(dataRow);
    }
}

// Bulk copy
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "newTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();
I'm already using SqlBulkCopy in an attempt to speed things up, but it appears that assigning values to the DataTable itself is the slow part.
I don't know how DataTables work internally, so I'm wondering whether I'm creating unnecessary overhead by first creating a reusable array, then assigning it to a DataRow, and then adding the DataRow to the DataTable. Or is using a DataTable not optimal in the first place? The input comes from a database.
I don't care much about LOC, just speed. Can anyone give some advice on this?
3 Answers
For such a big table, you should instead use the SqlBulkCopy.WriteToServer(IDataReader) method.
It may mean you'll have to implement a "fake" IDataReader yourself (if you don't get the data from an existing IDataReader), but this way you'll get "streaming" from end to end, and will avoid a 200-million-row loop whose only job is to fill a DataTable.
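A minimal sketch of such a "fake" reader, assuming rows arrive as object[] arrays from an iterator. The class name RowStreamReader, the constructor shape, and the choice of which members to leave unimplemented are my assumptions, untested against SqlBulkCopy; they are not from the answer:

using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical streaming reader: hands rows to SqlBulkCopy one at a time,
// so only the current row lives in memory.
class RowStreamReader : IDataReader
{
    private readonly IEnumerator<object[]> _rows;
    private readonly int _fieldCount;
    private object[] _current;
    private bool _closed;

    public RowStreamReader(IEnumerable<object[]> rows, int fieldCount)
    {
        _rows = rows.GetEnumerator();
        _fieldCount = fieldCount;
    }

    // The members an ordinal-mapped bulk copy relies on:
    public int FieldCount => _fieldCount;
    public bool Read()
    {
        if (_closed || !_rows.MoveNext()) return false;
        _current = _rows.Current;
        return true;
    }
    public object GetValue(int i) => _current[i];
    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, _fieldCount);
        Array.Copy(_current, values, n);
        return n;
    }
    public bool IsDBNull(int i) => _current[i] == null || _current[i] is DBNull;

    // Housekeeping:
    public void Close() { _closed = true; _rows.Dispose(); }
    public void Dispose() => Close();
    public bool IsClosed => _closed;
    public int Depth => 0;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => null;

    // Assumed unused for a plain ordinal-mapped copy; implement GetOrdinal
    // and GetName if you add ColumnMappings by name.
    public object this[int i] => GetValue(i);
    public object this[string name] => throw new NotSupportedException();
    public string GetName(int i) => throw new NotSupportedException();
    public int GetOrdinal(string name) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public Type GetFieldType(int i) => throw new NotSupportedException();
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
    public string GetString(int i) => throw new NotSupportedException();
}

Usage sketch, assuming the question's per-input work is moved into a hypothetical GenerateRows iterator (needs using System.Data.SqlClient;):

static IEnumerable<object[]> GenerateRows()
{
    // The question's while loop goes here, ending in something like:
    // yield return new object[] { a, b, c, output[i] };
    yield break; // placeholder for the real generator
}

using (SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    bulkTask.DestinationTableName = "newTable";
    bulkTask.WriteToServer(new RowStreamReader(GenerateRows(), 4));
}

With this shape the generator and SqlBulkCopy run in lockstep, so memory stays flat no matter how many rows you insert.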
Instead of holding a huge data table in memory, I would suggest implementing an IDataReader which serves up the data as the bulk copy goes. This will reduce the need to keep everything in memory up front, and should thus serve to improve performance.
You should not construct the entire DataTable in memory. Use the overload of WriteToServer that takes an array of DataRow, and just split your data into chunks.
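A rough, untested sketch of that chunked approach. The 100,000-row chunk size, the placeholder column names, the BulkCopyInChunks method, and the reuse of a GenerateRows-style iterator are illustrative assumptions, not from the answer:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical chunked writer: at most chunkSize rows are alive at once
// instead of the full 200 million.
static void BulkCopyInChunks(SqlConnection connection, IEnumerable<object[]> rows)
{
    const int chunkSize = 100000; // assumed value; tune against your memory budget

    DataTable dataTable = new DataTable();
    dataTable.Columns.Add("col0"); // placeholder column names
    dataTable.Columns.Add("col1");
    dataTable.Columns.Add("col2");
    dataTable.Columns.Add("col3");

    using (SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkTask.DestinationTableName = "newTable";
        foreach (object[] values in rows)
        {
            DataRow dataRow = dataTable.NewRow();
            dataRow.ItemArray = values;
            dataTable.Rows.Add(dataRow);

            if (dataTable.Rows.Count == chunkSize)
            {
                DataRow[] chunk = new DataRow[dataTable.Rows.Count];
                dataTable.Rows.CopyTo(chunk, 0);
                bulkTask.WriteToServer(chunk); // the DataRow[] overload
                dataTable.Clear();             // release rows already sent
            }
        }
        if (dataTable.Rows.Count > 0)
        {
            DataRow[] tail = new DataRow[dataTable.Rows.Count];
            dataTable.Rows.CopyTo(tail, 0);
            bulkTask.WriteToServer(tail); // final partial chunk
        }
    }
}

Clearing the DataTable after each WriteToServer call keeps the working set bounded by the chunk size rather than by the total row count.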