为什么重用 DataContext 会对性能产生负面影响?
经过公平 < a href="https://weblog.west-wind.com/posts/2008/Feb/05/Linq-to-SQL-DataContext-Lifetime-Management" rel="nofollow noreferrer">金额 研究和一些错误,我修改了代码,以便在每次查询数据库或插入数据时创建一个新的 DataContext。并且数据库会被频繁查询 - 对于每一个处理的 250k 事务,在插入事务之前都会查询数据库以获取客户 ID、部门 ID 和类别。
所以现在我正在尝试优化代码,因为它每秒仅处理大约 15 个事务。我删除了一些无关的查询并添加了一些索引,并将其提高到 30/秒。然后我想到,尽管每个人都说 DataContext 是轻量级的,但每次事务创建 4 次新的 DataContext 都需要付出一些代价,所以我尝试重用 DataContext。令我惊讶的是,我发现重用上下文会导致性能下降到每秒 10 个事务!
为什么会出现这样的情况呢?是因为 DataContext 将实体缓存在内存中,并在查询数据库之前首先搜索其内存列表吗?因此,例如,如果我正在查找名为“MCS”的客户的客户 ID(主键),并且客户名称列上有聚集索引,以便数据库查询速度很快,那么内存中的查找会变慢吗?
创建/处置如此多的数据库连接是否真的会减慢速度,或者这只是另一种过早的优化?如果这是真的,是否有办法重用 DataContext,但让它为每个 linq-to-sql 查询执行实际的数据库查询?
After a fair amount of research and some errors, I modified my code so that it creates a new DataContext each time the database is queried or data is inserted. And the database is queried frequently - for each of 250k transactions that are processed, the database is queried to obtain a customer id, department id, and category before the transaction is inserted.
So now I'm trying to optimize the code as it was only processing around 15 transactions a second. I removed some extraneous queries and added some indexes and got it up to 30/sec. I then figured that even though everyone says a DataContext is lightweight, it's got to cost something to create a new one 4 times per transaction, so I tried reusing the DataContext. I found, much to my surprise, that reusing the context caused performance to degrade to 10 transactions a second!
Why would this be the case? Is it because the DataContext caches the entities in memory and first searches through its in-memory list before querying the database? So that if, for example, I'm looking for the customer id (primary key) for the customer with name 'MCS' and the customer name column has a clustered index on it so that the database query is fast, the in-memory lookup will be slower?
And is it true that creating/disposing so many db connections could slow things down, or is this just another premature optimization? And if it is true, is there a way to reuse a DataContext but have it perform an actual database query for each linq-to-sql query?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
以下是为什么重用 DataContext 不是最佳实践,来自 MSDN DataContext 文档:
如果您重复使用 DataContext 进行大量查询,则性能会因以下几个可能的原因而降低:
如果 DataContext 的内存中身份缓存变得如此之大,以至于必须开始写入页面文件,则您的性能将取决于 HD 的读取头速度,实际上根本没有理由使用缓存。
内存中的身份对象越多,每次保存操作花费的时间就越长。
内存
本质上,您所做的事情违反了 DataContext 类的 UoW 原则。
打开数据库连接确实会产生一些与之相关的开销,但是长时间保持连接打开(这通常也意味着锁定表)不如快速打开和关闭连接更可取。
MSDN 中的另一个链接可能对您有帮助,也可能没有帮助:
如何:重用 ADO.NET 命令和 DataContext (LINQ to SQL) 之间的连接
Here's why re-using a DataContext is not a best practice, from the MSDN DataContext documentation:
If you're re-using a DataContext for a large number of queries, your performance will degrade for a couple of possible reasons:
If DataContext's in-memory identity cache becomes so large that it has to start writing to the pagefile then your performance will be bound to the HD's read-head speed and effectively there won't be a reason to use a cache at all.
The more identity objects there are in memory, the longer each save operation takes.
Essentially what you're doing is violating the UoW principle for the DataContext class.
Opening database connections does have some overhead associated with it, but keeping a connection open for a long period of time (which often also means locking a table) is less preferable than opening and closing them quickly.
Another link which may or may not help you from MSDN:
How to: Reuse a Connection Between an ADO.NET Command and a DataContext (LINQ to SQL)
即使使用聚集索引,内存中查找也始终比数据库查询更快(除了边缘情况,例如 386 与 Cray),即使您排除了与网络相关的延迟。
我猜测这种降级与 DataContext 对它跟踪的实体的处理有关:重用上下文将不断增加跟踪实体的数量,并且对 SaveChanges 的调用最终可能需要更多时间。
再说一次,这只是一个猜测——但这就是我要开始寻找的地方。
Even with a clustered index, in-memory lookup will always be faster than a database query--except in edge cases, like a 386 vs. a Cray--even if you factor out network-related delays.
I would guess the degradation has to do with the DataContext's handling of entities that it tracks: reusing a context will continually increase the number of tracked entities, and the call to
SaveChanges
may end up requiring more time.Again, that's a guess--but it's where I'd start looking.
这里不完全正确,但是您是否考虑过某种应用程序级缓存来查找客户 ID、部门 ID 和类别?从您的帖子中不清楚您的系统中存在多少这些实体,或者查询获取它们涉及哪些内容。
然而,举个例子,如果你的系统中有一百万个类别,并且你需要通过类别名称查找它们的 Id,那么在内存中始终保留一个名称/Id 字典以供查找将节省你访问数据库的次数。您处理的交易。这可以极大地提高性能(这假设了一些事情,比如没有定期添加新的护理)。一般来说,与内存中的操作相比,数据库的往返成本较高。
Not exactly on point here, but have you considered some sort of application-level cache to look up the customer id, department id, and category? It's not clear from your post how many of these entities exist in your system, or what is involved in querying to obtain them.
However, as an example, if you have one million categories in your system, and you need to look up their Id by category name, keeping an name/Id dictionary in memory for lookup at all times will save you a trip to the database for transaction you process. This could massively improve performance (this assumes a few things, like new caregories aren't being added regularly). As a general rule, round trips to the database are expensive relative to in-memory operations.
您必须端到端地分析所有内容,看看您的时间真正花在了哪里。
如果行很宽,聚集索引不一定是最快的。最快的可能是覆盖非聚集索引,但这确实不是重点。
我希望为了获得更高的性能,如果您没有真正使用这些功能,您可能必须放弃一些框架。如果您正在使用这些功能 - 好吧,这就是您所付出的代价......
You would have to profile everything end-to-end and see where your time is really being spent.
A clustered index is not necessarily the fastest if a row is wide. The fastest would probably be a covering non-clustered index, but that's really beside the point.
I would expect that to get more performance, you're probably going to have to jettison some of the framework, if you aren't really using the capabilities. If you are using the capabilities - well, that's what you are paying for...