创建 1000 个实体框架对象时,何时应该调用 SaveChanges()? (就像导入期间一样)
我正在运行一次导入,每次运行都会有 1000 条记录。只是寻找一些对我的假设的确认:
以下哪一个最有意义:
- 每次
AddToClassName()
调用运行SaveChanges()
。 - 每 n 次
AddToClassName()
调用运行SaveChanges()
。 - 在
所有次
AddToClassName()
调用之后运行SaveChanges()
。
第一个选项可能很慢,对吧?因为它需要分析内存中的 EF 对象、生成 SQL 等。
我认为第二个选项是两全其美的,因为我们可以在 SaveChanges()
调用周围包装一个 try catch ,并且如果其中一条记录失败,一次只会丢失 n 条记录。也许将每个批次存储在 List<> 中。如果 SaveChanges()
调用成功,则删除该列表。如果失败,请记录项目。
最后一个选项可能也会变得非常慢,因为在调用 SaveChanges()
之前,每个 EF 对象都必须位于内存中。如果保存失败,则不会提交任何内容,对吗?
I am running an import that will have 1000's of records on each run. Just looking for some confirmation on my assumptions:
Which of these makes the most sense:
- Run
SaveChanges()
everyAddToClassName()
call. - Run
SaveChanges()
every n number ofAddToClassName()
calls. - Run
SaveChanges()
after all of theAddToClassName()
calls.
The first option is probably slow right? Since it will need to analyze the EF objects in memory, generate SQL, etc.
I assume that the second option is the best of both worlds, since we can wrap a try catch around that SaveChanges()
call, and only lose n number of records at a time, if one of them fails. Maybe store each batch in an List<>. If the SaveChanges()
call succeeds, get rid of the list. If it fails, log the items.
The last option would probably end up being very slow as well, since every single EF object would have to be in memory until SaveChanges()
is called. And if the save failed nothing would be committed, right?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我构建了一个包含需要在表中创建的所有记录的数组。
其次,我在for循环中使用for循环来保存记录。
一个。 for 循环一次获取 100 条记录,直到达到数组的 count (1000)。
a.1.打开上下文使用
a.2.内部 for 循环一次获取一项并将该项添加到表中
a.3.上下文.SaveChanges();
a.4.关闭使用
b.循环到a。
I build an array of all the records that I need to create in a table.
Second, I use for loop in a for loop to save records.
a. for Loop to get 100 records at a time until it reaches count (1000) of array.
a.1. open using for context
a.2. inner for Loop to take one item at a time and to add item to table then
a.3. context.SaveChanges();
a.4. close using
b. Loop to a.
我会先测试一下以确定。性能不一定那么差。
如果需要在一个事务中输入所有行,请在 AddToClassName 类之后调用它。如果可以单独输入行,请在每行后保存更改。数据库一致性很重要。
我不喜欢第二个选择。如果我导入系统并且它会拒绝 1000 行中的 10 行,仅仅因为 1 行是坏的,这会让我感到困惑(从最终用户的角度来看)。可以尝试导入10个,如果失败,则逐个尝试,然后记录。
测试一下是否需要很长时间。不要写“可能”。你还不知道。只有当它确实是一个问题时,才考虑其他解决方案(marc_s)。
编辑
我做了一些测试(时间以毫秒为单位):
10000行:
1行后的SaveChanges():18510,534
100行后的SaveChanges():4350,3075< br>SaveChanges() 10000 行后:5233,0635
50000 行:
SaveChanges() 1 行后:78496,929
500 行后 SaveChanges():22302,2835
50000 行后的 SaveChanges():24022,8765
因此,在 n 行后提交实际上比最终更快。
我的建议是:
测试类别:
表:
类别:
I would test it first to be sure. Performance doesn't have to be that bad.
If you need to enter all rows in one transaction, call it after all of AddToClassName class. If rows can be entered independently, save changes after every row. Database consistence is important.
Second option I don't like. It would be confusing for me (from final user perspective) if I made import to system and it would decline 10 rows out of 1000, just because 1 is bad. You can try to import 10 and if it fails, try one by one and then log.
Test if it takes long time. Don't write 'propably'. You don't know it yet. Only when it is actually a problem, think about other solution (marc_s).
EDIT
I've done some tests (time in miliseconds):
10000 rows:
SaveChanges() after 1 row:18510,534
SaveChanges() after 100 rows:4350,3075
SaveChanges() after 10000 rows:5233,0635
50000 rows:
SaveChanges() after 1 row:78496,929
SaveChanges() after 500 rows:22302,2835
SaveChanges() after 50000 rows:24022,8765
So it is actually faster to commit after n rows than after all.
My recommendation is to:
Test classes:
TABLE:
Class:
我刚刚在自己的代码中优化了一个非常相似的问题,并想指出一个对我有用的优化。
我发现处理 SaveChanges 的大部分时间,无论是一次处理 100 条记录还是 1000 条记录,都受 CPU 限制。因此,通过使用生产者/消费者模式(使用 BlockingCollection 实现)处理上下文,我能够更好地利用 CPU 核心,并从每秒 4000 次更改(如 SaveChanges 的返回值报告)中获得每秒超过 14,000 次更改。 CPU 利用率从大约 13%(我有 8 个核心)上升到大约 60%。即使使用多个消费者线程,我也几乎不会对(非常快的)磁盘 IO 系统造成负担,并且 SQL Server 的 CPU 利用率也不高于 15%。
通过将保存卸载到多个线程,您可以调整提交之前的记录数和执行提交操作的线程数。
我发现创建 1 个生产者线程和(CPU 核心数)-1 个消费者线程允许我调整每批提交的记录数,以便 BlockingCollection 中的项目数在 0 和 1 之间波动(在消费者线程获取 1 个数据之后)物品)。这样,就有足够的工作让消耗线程以最佳方式工作。
当然,这种场景需要为每个批次创建一个新的上下文,我发现即使在我的用例的单线程场景中,这也会更快。
I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of CPU cores and got from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13 % (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system and CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.
I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.
This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
如果您需要导入数千条记录,我会使用 SqlBulkCopy 之类的东西,而不是实体框架。
If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
使用存储过程。
我相信这将是最简单、最快的方法。
Use a stored procedure.
I believe this would be the easiest and fastest way to do this.
抱歉,我知道这个帖子已经过时了,但我认为这可以帮助其他人解决这个问题。
我遇到了同样的问题,但是可以在提交更改之前验证更改。我的代码看起来像这样并且工作正常。使用
chUser.LastUpdated
我检查它是一个新条目还是只是一个更改。因为不可能重新加载尚未存在于数据库中的条目。Sorry, I know this thread is old, but I think this could help other people with this problem.
I had the same problem, but there is a possibility to validate the changes before you commit them. My code looks like this and it is working fine. With the
chUser.LastUpdated
I check if it is a new entry or only a change. Because it is not possible to reload an Entry that is not in the database yet.