Improving bulk insert performance in Entity Framework
I want to insert 20,000 records into a table with Entity Framework, and it takes about 2 minutes. Is there any way, other than using a stored procedure, to improve its performance? This is my code:

foreach (Employees item in sequence)
{
    Employees t = new Employees();
    t.Text = item.Text;
    dataContext.Employees.AddObject(t);
}
dataContext.SaveChanges();
11 Answers
There is opportunity for several improvements (if you are using DbContext):

Set:

Do SaveChanges() in packages of 100 inserts... or you can try with packages of 1000 items and see the changes in performance. Since during all these inserts the context is the same and it is getting bigger, you can rebuild your context object every 1000 inserts:

var yourContext = new YourContext();

I think this is the big gain. Doing these improvements in an importing data process of mine took it from 7 minutes to 6 seconds.

The actual numbers... could not be 100 or 1000 in your case... try it and tweak it.
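The configuration lines that belong under "Set:" above, along with the batching loop, were lost on this page. Below is a minimal sketch of the pattern this answer describes, assuming (my assumption) that the settings in question are the AutoDetectChangesEnabled and ValidateOnSaveEnabled flags on DbContext.Configuration; YourContext is the answer's own placeholder and Employees/sequence come from the question.

// Sketch: batched SaveChanges plus a rebuilt context every 1000 inserts.
var yourContext = new YourContext();
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

int count = 0;
foreach (var item in sequence)
{
    yourContext.Employees.Add(new Employees { Text = item.Text });

    if (++count % 1000 == 0)
    {
        yourContext.SaveChanges();
        yourContext.Dispose();

        // Rebuild the context so its change tracker does not keep growing.
        yourContext = new YourContext();
        yourContext.Configuration.AutoDetectChangesEnabled = false;
        yourContext.Configuration.ValidateOnSaveEnabled = false;
    }
}
yourContext.SaveChanges();
yourContext.Dispose();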
There is no way to force EF to improve performance when doing it this way. The problem is that EF executes each insert in a separate round trip to the database. Awesome, isn't it? Even DataSets supported batch processing. Check this article for some workarounds. Another workaround can be using a custom stored procedure accepting a table-valued parameter, but you need raw ADO.NET for that.
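A minimal sketch of the table-valued-parameter idea mentioned above. dbo.EmployeeTableType and dbo.InsertEmployees are hypothetical objects that would have to exist on the server; connectionString and sequence come from the question's context.

// Sketch: a single round trip via a table-valued parameter and raw ADO.NET.
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("Text", typeof(string));
foreach (var item in sequence)
{
    table.Rows.Add(item.Text);
}

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.InsertEmployees", connection))
{
    command.CommandType = CommandType.StoredProcedure;

    var parameter = command.Parameters.AddWithValue("@Employees", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.EmployeeTableType";

    connection.Open();
    command.ExecuteNonQuery(); // all rows inserted in one call
}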
Using the code below you can extend the partial context class with a method that will take a collection of entity objects and bulk copy them to the database. Simply replace the class name MyEntities with whatever your entity context class is named and add it to your project, in the correct namespace. After that, all you need to do is call the BulkInsertAll method, handing over the entity objects you want to insert. Do not reuse the context class; instead, create a new instance every time you use it. This is required, at least in some versions of EF, since the authentication data associated with the SqlConnection used here gets lost after the class has been used once. I don't know why.

This version is for EF 5

This version is for EF 6

And finally, a little something for you Linq-To-Sql lovers.
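The original EF 5 and EF 6 code blocks did not survive on this page. As a rough illustration only (not the author's original implementation), a BulkInsertAll extension built on SqlBulkCopy could look something like the sketch below; it assumes the destination table is named after the entity type and that columns match scalar property names one-to-one, which only holds for simple mappings.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

// Sketch only: a simplified BulkInsertAll on a DbContext-based partial context class.
public partial class MyEntities
{
    public void BulkInsertAll<T>(IEnumerable<T> entities) where T : class
    {
        var connection = (SqlConnection)Database.Connection;
        if (connection.State != ConnectionState.Open)
        {
            connection.Open();
        }

        // Only scalar properties; no complex types or inheritance mapping.
        var properties = typeof(T).GetProperties()
            .Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string))
            .ToArray();

        var table = new DataTable();
        foreach (var property in properties)
        {
            var columnType = Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType;
            table.Columns.Add(property.Name, columnType);
        }

        foreach (var entity in entities)
        {
            table.Rows.Add(properties.Select(p => p.GetValue(entity, null) ?? DBNull.Value).ToArray());
        }

        using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = typeof(T).Name })
        {
            foreach (DataColumn column in table.Columns)
            {
                bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }
            bulkCopy.WriteToServer(table);
        }
    }
}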
Maybe this answer here will help you. It seems that you want to dispose of the context periodically. This is because the context gets bigger and bigger as the number of attached entities grows.
There are two major performance issues with your code:

Using the Add method

The Add method becomes slower and slower with each entity you add.

See: http://entityframework.net/improve-ef-add-performance

For example, adding 10,000 entities via Add vs. AddRange (see the sketch below):

Note: the entities have not been saved in the database yet!

The problem is that the Add method tries to DetectChanges for every entity added, while AddRange does it once, after all entities have been added to the context.
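A minimal sketch of that comparison, reusing the Employees entity from the question; MyContext is a placeholder for your DbContext type, and the original answer's exact entities and timings are not reproduced here.

using System.Linq;

// Sketch: Add triggers DetectChanges per call; AddRange triggers it once.
var entities = Enumerable.Range(0, 10000)
    .Select(i => new Employees { Text = "Item " + i })
    .ToList();

using (var context = new MyContext())
{
    // Slow: change detection runs on each of the 10,000 Add calls.
    foreach (var entity in entities)
    {
        context.Employees.Add(entity);
    }
}

using (var context = new MyContext())
{
    // Much faster: change detection runs once, after all entities are added.
    context.Employees.AddRange(entities);
}

// Note: nothing has been saved yet; SaveChanges() still has to be called.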
Common solutions are to use AddRange instead of Add, or to disable automatic change detection while inserting.

Using SaveChanges

Entity Framework has not been created for bulk operations. For every entity you save, a database round trip is performed.

So, if you want to insert 20,000 records, you will perform 20,000 database round trips, which is INSANE!

There are some third-party libraries supporting bulk insert available:

See: Entity Framework Bulk Insert library

Be careful when choosing a bulk insert library. Only Entity Framework Extensions supports all kinds of associations and inheritance, and it is the only one still supported.

Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

Example
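The example block is missing on this page. Based on the extension methods documented for Entity Framework Extensions (not reproduced from the answer itself), usage looks roughly like the sketch below; MyContext and employees are placeholders, and the calls are listed as a menu of operations rather than a realistic workflow.

// Sketch: the bulk extension methods this answer advertises.
using (var context = new MyContext())
{
    context.BulkInsert(employees);   // bulk INSERT
    context.BulkUpdate(employees);   // bulk UPDATE
    context.BulkDelete(employees);   // bulk DELETE
    context.BulkSaveChanges();       // drop-in replacement for SaveChanges
}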
EDIT: Answering questions from the comments

Not too high, not too low. There isn't a particular value that fits all scenarios, since it depends on multiple factors such as row size, indexes, triggers, etc.

It's normally recommended to be around 4000.

You can use an Entity Framework transaction. Our library uses the transaction if one is started. But be careful: a transaction that takes too much time also comes with problems such as row/index/table locks.
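A small sketch of that transaction point, assuming EF6's Database.BeginTransaction and the BulkInsert extension method named above; MyContext and employees are placeholders.

// Sketch: an explicit EF transaction; per the answer above, the bulk
// operation enlists in a transaction that is already started.
using (var context = new MyContext())
using (var transaction = context.Database.BeginTransaction())
{
    context.BulkInsert(employees);
    transaction.Commit(); // keep the transaction short to limit lock time
}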
Currently there is no better way; however, there may be a marginal improvement by moving SaveChanges inside the for loop, for example every 10 items.

You can adjust the 10 to get closer to better performance. It will not greatly improve speed, but it will allow you to show some progress to the user and make the process more user friendly (see the sketch below).
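A minimal sketch of that idea, reusing the names from the question; ReportProgress and totalCount are hypothetical stand-ins for whatever progress reporting you have.

// Sketch: save every N items and report progress to the user.
const int batchSize = 10; // tune this value
int processed = 0;

foreach (Employees item in sequence)
{
    dataContext.Employees.AddObject(new Employees { Text = item.Text });
    processed++;

    if (processed % batchSize == 0)
    {
        dataContext.SaveChanges();
        ReportProgress(processed, totalCount); // hypothetical progress callback
    }
}
dataContext.SaveChanges(); // flush the remaining items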
In an Azure environment, with a Basic website that has 1 instance, I tried to insert a batch of 1,000 records at a time out of 25,000 records using a for loop; it took 11.5 minutes, but with parallel execution it took less than a minute. So I recommend using the TPL (Task Parallel Library).
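A minimal sketch of the parallel approach, assuming each batch gets its own context instance (a DbContext is not thread-safe); MyContext, the Employees entity and the batch size of 1000 are placeholders taken from the question and this answer.

using System.Linq;
using System.Threading.Tasks;

// Sketch: insert 1000-record batches in parallel, one context per batch.
var batches = sequence
    .Select((item, index) => new { item, index })
    .GroupBy(x => x.index / 1000, x => x.item)
    .ToList();

Parallel.ForEach(batches, batch =>
{
    // Every batch uses its own context instance.
    using (var context = new MyContext())
    {
        foreach (var item in batch)
        {
            context.Employees.Add(new Employees { Text = item.Text });
        }
        context.SaveChanges();
    }
});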
A better way is to skip Entity Framework entirely for this operation and rely on the SqlBulkCopy class. Other operations can continue using EF as before.

That increases the maintenance cost of the solution, but it helps reduce the time required to insert large collections of objects into the database by one to two orders of magnitude compared to using EF.

Here is an article that compares the SqlBulkCopy class with EF for objects with a parent-child relationship (it also describes the design changes required to implement bulk insert): How to Bulk Insert Complex Objects into SQL Server Database
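As a rough illustration of the approach (not the article's code), a direct SqlBulkCopy insert for the rows from the question might look like this; the destination table name dbo.Employees, the Text column and connectionString are assumptions.

using System.Data;
using System.Data.SqlClient;

// Sketch: bypass EF for the insert and stream the rows with SqlBulkCopy.
var table = new DataTable();
table.Columns.Add("Text", typeof(string));
foreach (var item in sequence)
{
    table.Rows.Add(item.Text);
}

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.Employees"; // assumed table name
    bulkCopy.BatchSize = 5000;
    bulkCopy.WriteToServer(table);
}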
Try using Bulk Insert...

http://code.msdn.microsoft.com/LinqEntityDataReader

If you have a collection of entities, e.g. storeEntities, you can store them using SqlBulkCopy as follows:

There is one gotcha with this code. Make sure that the Entity Framework definition for the entity correlates exactly with the table definition; ensure that the entity's properties are in the same order in the entity model as the columns in the SQL Server table. Failure to do this will result in an exception.
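The sample code from the link is not reproduced on this page. One way to avoid the ordering gotcha described above, whatever source feeds SqlBulkCopy, is to declare explicit name-based column mappings instead of relying on ordinal positions; dbo.Employees, the Text column and the table variable below are placeholders.

// Sketch: explicit column mappings make SqlBulkCopy independent of the
// order of the entity's properties. "table" is a DataTable or IDataReader
// built from the entities.
using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.Employees";
    bulkCopy.ColumnMappings.Add("Text", "Text"); // source column -> destination column
    // ...one mapping per column...

    bulkCopy.WriteToServer(table);
}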
Although this is a late reply, I'm posting the answer because I suffered the same pain.

I've created a new GitHub project just for that; as of now, it supports bulk insert/update/delete for SQL Server transparently, using SqlBulkCopy.

https://github.com/MHanafy/EntityExtensions

There are other goodies as well, and hopefully it will be extended to do more down the track.

Using it is as simple as:
Hope it helps!