Processing 46,000 lines of a document in batches of 1,000 with C# and LINQ
I have the code below. The text file it reads has 46,000 records that I need to process and insert into the database, and it takes forever if I just call it directly and loop through one record at a time.
I was trying to use LINQ to pull every 1,000 rows or so and hand each batch to a thread, so I could process 3,000 rows at once and cut the processing time. I can't figure it out though, so I need some help.
Any suggestions would be welcome. Thank you in advance.
var reader = ReadAsLines(tbxExtended.Text);

var ds = new DataSet();
var dt = new DataTable();

// Build the columns from the pipe-delimited header list.
string headerNames = "Long|list|of|strings|";
var headers = headerNames.Split('|');
foreach (var header in headers)
    dt.Columns.Add(header);

// Skip the header line, then add one row per record.
var records = reader.Skip(1);
foreach (var record in records)
    dt.Rows.Add(record.Split('|'));

ds.Tables.Add(dt);
ds.AcceptChanges();

ProcessSmallList(ds);
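Roughly what I had in mind for the batching, though I haven't been able to get it working (ProcessBatch is just a placeholder for whatever each thread would do with its 1,000 rows):

// Group the records into batches of 1,000 by row index,
// then hand each batch to its own task.
var batches = records
    .Select((line, index) => new { line, index })
    .GroupBy(x => x.index / 1000, x => x.line);

var tasks = batches
    .Select(batch => Task.Run(() => ProcessBatch(batch.ToList())))
    .ToArray();

Task.WaitAll(tasks); // requires using System.Threading.Tasks;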
2 Answers
If you are looking for high performance and you are using SQL Server, look at SqlBulkCopy. Its performance is significantly better than inserting row by row.
Here is an example using a custom CSVDataReader that I used for a project, but any IDataReader-compatible reader, a DataRow[], or a DataTable can be used as the parameter to WriteToServer: a SqlDataReader, an OleDbDataReader, etc.
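A minimal sketch of the idea, building a DataTable from the pipe-delimited file rather than the custom CSVDataReader (the destination table dbo.MyImportTable and the connection string are assumptions):

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

static void BulkInsert(string path, string connectionString)
{
    // Build a DataTable from the pipe-delimited file,
    // using the first line as the column headers.
    var dt = new DataTable();
    foreach (var header in File.ReadLines(path).First().Split('|'))
        dt.Columns.Add(header);
    foreach (var line in File.ReadLines(path).Skip(1))
        dt.Rows.Add(line.Split('|'));

    // Stream the whole table to SQL Server in one bulk operation.
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.MyImportTable"; // assumed table name
        bulk.BatchSize = 1000;   // rows sent to the server per batch
        bulk.WriteToServer(dt);  // also accepts IDataReader or DataRow[]
    }
}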
There are quite a number of options available. (See the link in the item.)
To bulk insert data into a database, you probably should be using that database engine's bulk-insert utility (e.g. bcp in SQL Server). You might want to first do the processing, write out the processed data into a separate text file, then bulk-insert into your database of concern.
If you really want to do the processing and the inserts on-line, memory is also a (small) factor, since the code above holds all 46,000 rows in a single in-memory DataTable before anything is written.
I believe the slowness you observed actually comes from the DataSet. DataSets issue one INSERT statement per new record, which means that you won't save anything by calling Update() 1,000 rows at a time rather than one row at a time: you still have 46,000 INSERT statements going to the database, and that is what makes it slow.
To improve performance, I'm afraid LINQ can't help you here, since the bottleneck is the 46,000 INSERT statements. You should use the bulk-insert route described above instead.
If you insist on using DataSets, you don't have to do it with LINQ -- LINQ solves a different type of problem. Do something like the sketch below:
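A minimal sketch using SqlDataAdapter.UpdateBatchSize so that rows are sent in batches rather than one INSERT per round trip (the table dbo.MyImportTable and its two columns are assumptions; in practice you would add one parameter per column):

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter())
{
    // Hypothetical INSERT mapped to two example columns of the DataTable.
    var insert = new SqlCommand(
        "INSERT INTO dbo.MyImportTable (Col1, Col2) VALUES (@Col1, @Col2)",
        conn);
    insert.Parameters.Add("@Col1", SqlDbType.VarChar, 50, "Col1");
    insert.Parameters.Add("@Col2", SqlDbType.VarChar, 50, "Col2");
    insert.UpdatedRowSource = UpdateRowSource.None; // required for batching

    adapter.InsertCommand = insert;
    adapter.UpdateBatchSize = 1000; // 1,000 rows per round trip instead of 1

    adapter.Update(ds.Tables[0]);   // ds is the DataSet from the question
}

Note that Update() only sends rows whose RowState is Added, so the AcceptChanges() call in the question's code would need to be dropped (or moved to after the Update).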
This will improve performance somewhat, but you're never going to approach the performance you obtain by using dedicated bulk-insert utilities (or function calls) for your database engine.
Unfortunately, using those bulk-insert facilities will make your code less portable to another database engine. This is the trade-off you'll need to make.