使用 linq、外键关系或本地列表哪种模型最快?
一些基础知识
我有两张表,一张保存用户,一张保存登录日志。 用户表包含大约 15000 多个用户,登录表正在增长并达到 150000 多个帖子。 该数据库基于 SQL Server(非 Express)构建。
为了管理用户,我获得了一个从 ObjectDatasource 填充的 gridview(来自 Devexpress 的 ASPxGridView)。
在总结用户登录次数时,我应该了解哪些一般注意事项?
事情变得异常缓慢。
这是一张显示涉及的表的图片。
我尝试了一些方法。
DbDataContext db = new DbDataContext();
// Using foregin key relationship
foreach (var proUser in db.tblPROUsers)
{
var count = proUser.tblPROUserLogins.Count;
//...
}
执行时间:01:29.316(1分29秒)
// By storing a list in a local variable (I removed the FK relation)
var userLogins = db.tblPROUserLogins.ToList();
foreach (var proUser in db.tblPROUsers)
{
var count = userLogins.Where(x => x.UserId.Equals(proUser.UserId)).Count();
//...
}
执行时间:01:18.410(1分18秒)
// By storing a dictionary in a local variable (I removed the FK relation)
var userLogins = db.tblPROUserLogins.ToDictionary(x => x.UserLoginId, x => x.UserId);
foreach (var proUser in db.tblPROUsers)
{
var count = userLogins.Where(x => x.Value.Equals(proUser.UserId)).Count();
//...
}
执行时间:01:15.821( 1分15秒)
表现最好的模型实际上是字典。不过,我想知道我想听听的任何选项,以及在处理如此大量的数据时这种编码是否存在“不好”的地方。
谢谢
================================================== =======
更新了根据 BrokenGlass 示例的模型
// By storing a dictionary in a local variable (I removed the FK relation)
foreach (var proUser in db.tblPROUsers)
{
var userId = proUser.UserId;
var count = db.tblPROUserLogins.Count(x => x.UserId.Equals(userId));
//...
}
执行时间:02:01.135(2 分 1 秒)
除此之外,我创建了一个存储简单类的列表
public class LoginCount
{
public int UserId { get; set; }
public int Count { get; set; }
}
,并且在总结方法
var loginCount = new List<LoginCount>();
// This foreach loop takes approx 30 secs
foreach (var login in db.tblPROUserLogins)
{
var userId = login.UserId;
// Check if available
var existing = loginCount.Where(x => x.UserId.Equals(userId)).FirstOrDefault();
if (existing != null)
existing.Count++;
else
loginCount.Add(new LoginCount{UserId = userId, Count = 1});
}
// Calling it
foreach (var proUser in tblProUser)
{
var user = proUser;
var userId = user.UserId;
// Count logins
var count = 0;
var loginCounter = loginCount.Where(x => x.UserId.Equals(userId)).FirstOrDefault();
if(loginCounter != null)
count = loginCounter.Count;
//...
}
执行时间:00:36.841(36秒)
结论如此到目前为止,用 linq 进行总结很慢,但我已经到了!
Some basics
I have two tables, one holding the users and one holding a log with logins.
The user table holds something like 15000+ users, the login table is growing and is reaching 150000+ posts.
The database is built upon SQL Server (not express).
To administer the users I got a gridview (ASPxGridView from Devexpress) that I populate from an ObjectDatasource.
Is there any general do’s and donts I should know about when summarizing the number of logins a user made.
Things are getting strangely slow.
Here is a picture showing the involved tables.
I’ve tried a few things.
DbDataContext db = new DbDataContext();
// Using foregin key relationship
foreach (var proUser in db.tblPROUsers)
{
var count = proUser.tblPROUserLogins.Count;
//...
}
Execution time: 01:29.316 (1 minute and 29 seconds)
// By storing a list in a local variable (I removed the FK relation)
var userLogins = db.tblPROUserLogins.ToList();
foreach (var proUser in db.tblPROUsers)
{
var count = userLogins.Where(x => x.UserId.Equals(proUser.UserId)).Count();
//...
}
Execution time: 01:18.410 (1 minute and 18 seconds)
// By storing a dictionary in a local variable (I removed the FK relation)
var userLogins = db.tblPROUserLogins.ToDictionary(x => x.UserLoginId, x => x.UserId);
foreach (var proUser in db.tblPROUsers)
{
var count = userLogins.Where(x => x.Value.Equals(proUser.UserId)).Count();
//...
}
Execution time: 01:15.821 (1 minute and 15 seconds)
The model giving the best performance is actually the dictionary. However I you know of any options I'd like to hear about it, also if there's something "bad" with this kind of coding when handling such large amounts of data.
Thanks
========================================================
UPDATED With a model according to BrokenGlass example
// By storing a dictionary in a local variable (I removed the FK relation)
foreach (var proUser in db.tblPROUsers)
{
var userId = proUser.UserId;
var count = db.tblPROUserLogins.Count(x => x.UserId.Equals(userId));
//...
}
Execution time: 02:01.135 (2 minutes and 1 second)
In addition to this I created a list storing a simple class
public class LoginCount
{
public int UserId { get; set; }
public int Count { get; set; }
}
And in the summarizing method
var loginCount = new List<LoginCount>();
// This foreach loop takes approx 30 secs
foreach (var login in db.tblPROUserLogins)
{
var userId = login.UserId;
// Check if available
var existing = loginCount.Where(x => x.UserId.Equals(userId)).FirstOrDefault();
if (existing != null)
existing.Count++;
else
loginCount.Add(new LoginCount{UserId = userId, Count = 1});
}
// Calling it
foreach (var proUser in tblProUser)
{
var user = proUser;
var userId = user.UserId;
// Count logins
var count = 0;
var loginCounter = loginCount.Where(x => x.UserId.Equals(userId)).FirstOrDefault();
if(loginCounter != null)
count = loginCounter.Count;
//...
}
Execution time: 00:36.841 (36 seconds)
Conclusion so far, summarizing with linq is slow, but Im getting there!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果您尝试构建一个执行相同操作的 SQL 查询并独立于您的应用程序(在 SQL Server Management Studio 中)执行它,也许这会很有用。类似于:
(注意:这仅选择
UserId
。如果您想要来自tblPROUser
的其他字段,则需要在这个基本的“顶部”进行简单的 JOIN查询。)确保 {UserId, UserLoginId} 上有一个复合索引并且查询计划正在使用它。在索引中并按该顺序包含两个字段可确保您的查询可以在不触及
tblPROUserLogin
表的情况下运行:然后进行基准测试,看看是否可以获得比 LINQ 代码明显更好的时间:
--- EDIT ---
以下 LINQ 代码段相当于上面的查询:
在控制台中打印以下文本:
--- EDIT 2 ---
To拥有“整个”用户而不仅仅是
UserId
,您可以执行以下操作:控制台输出显示以下查询...
...如果您的索引正确,这应该非常有效:
Perhaps it would be useful if you tried to construct an SQL query that does the same thing and executing it independently of your application (in SQL Server Management Studio). Something like:
(NOTE: This just selects
UserId
. If you want other fields fromtblPROUser
, you'll need a simple JOIN "on top" of this basic query.)Ensure there is a composite index on {UserId, UserLoginId} and it is being used by the query plan. Having both fields in the index and in that order ensures your query can run without touching the
tblPROUserLogin
table:Then benchmark and see if you can get a significantly better time than your LINQ code:
--- EDIT ---
The follwing LINQ snippet is equivalent to the query above:
Which prints the following text in the console:
--- EDIT 2 ---
To have the "whole" user and not just
UserId
, you can do this:And the console output shows the following query...
...which should be very efficient provided your indexes are correct:
第二种情况应该是迄今为止最快的,只要您删除
ToList()
,这样计数就可以在数据库端完成,而不是在内存中:此外,您还必须将由于 EF 无法处理对象的映射属性,因此首先将用户 id 转换为“普通”原始变量。
The second case should always be the fastest by far provided you drop the
ToList()
so counting can be done on the database side, not in memory:Also you have to put the user id into a "plain" primitive variable first since EF can't deal with mapping properties of an object.
抱歉,因为我不在普通计算机上,所以盲目地这样做。只是几个问题,
这样的事情对你有用吗?
Sorry, doing this blind since I'm not on my normal computer. Just a couple of questions
Does something like this work for you?