O/R mapping: a single complex query vs. multiple simple queries
I'm curious about how the result set of a SQL query is transported from the server to the client.
Most O/R mappers support both eager and lazy loading, and each has its pros and cons.
For example, Entity Framework 4 (.NET) has excellent eager-loading support.
However, let's assume we have a model like this:
public class BlogPost
{
    public int Id { get; set; }          // key used by the query below
    public string Body { get; set; }
    public ICollection<Comment> Comments { get; set; }
}
...
and a query like this:
var posts = context
    .Posts
    .Include(post => post.Comments)
    .Where(post => post.Id == 1)
    .First();
This will result in a single SQL query in which all the data for the post is repeated on the row of every comment.
Let's say we have 100 comments on a specific post and Post.Body is a massive piece of text. That can't be good, can it?
Or is the data somehow compressed when it is sent to the client, minimizing the overhead of repeating data on each row?
What is the best way to determine if one such query is more efficient than just two simple queries (one for getting the post and one for getting its comments)?
Benchmarking this on a dev environment is pretty pointless, because several factors are involved:
CPU load on the SQL server
Network load
CPU load on the app server (materializing objects)
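For what it's worth, the client-side part of this can at least be timed with a small harness. The following is only a sketch: the `LoadBenchmark` class and its method name are made up for illustration, and it measures wall-clock time as seen by the caller, so it says nothing about CPU load on the SQL server itself.

```csharp
using System;
using System.Diagnostics;

public static class LoadBenchmark
{
    // Calls `action` once as a warm-up, then `iterations` more times, and
    // returns the median wall-clock time in milliseconds (for an even
    // number of samples this is the upper of the two middle values).
    public static double MedianMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations < 1) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up: JIT compilation, connection pool, cached query plans

        var samples = new double[iterations];
        for (int i = 0; i < iterations; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[iterations / 2];
    }
}
```

It could be called as `LoadBenchmark.MedianMilliseconds(() => LoadWithInclude(1), 50)` and again with a two-query variant, where `LoadWithInclude` is a hypothetical method wrapping the eager-load query above. The median is used instead of the mean so that a single slow outlier does not dominate the result.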
Any ideas on this?
[Edit]
Clarification:
Two queries would be something like this:
sql
select * from post where id = 123
result
id, topic, body, etc...
sql
select * from comment where postid = 123
result
id, postid, commenttext, etc...
The first query would yield one row, and the second query would yield as many rows as there are comments.
With a single query there would be as many rows as there are comments for the specific post, but with all the post data repeated on each row.
result
p.id , p.topic, __p.body__, c.id, c.postid, c.commenttext
p.body would be repeated on each row, making the result set extremely large
(assuming that p.body contains a lot of data, that is ;-)
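To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes no compression on the wire, ignores per-row protocol overhead, and assumes the join is an outer join so that a post without comments still produces one row. The `PayloadEstimate` class and all sizes are assumptions for illustration, not measurements.

```csharp
using System;

public static class PayloadEstimate
{
    // Bytes in the joined result: one row per comment, and every row
    // carries the post columns (including the body) again.
    public static long SingleQueryBytes(long postBytes, long commentBytes, int commentCount)
    {
        // Assumption: an outer join returns the post once when it has no comments.
        if (commentCount == 0) return postBytes;
        return commentCount * (postBytes + commentBytes);
    }

    // Bytes in two separate result sets: the post once, then one row per comment.
    public static long TwoQueryBytes(long postBytes, long commentBytes, int commentCount)
    {
        return postBytes + commentCount * commentBytes;
    }
}
```

With an assumed 50,000-byte body, 200-byte comments and 100 comments, `SingleQueryBytes(50000, 200, 100)` gives 5,020,000 bytes while `TwoQueryBytes(50000, 200, 100)` gives 70,000 bytes, so the duplicated body accounts for nearly all of the joined result.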
Comments (2)
I think it really comes down to the following:
If you are loading several million posts, a single query is the better choice even when each post has several comments, because the accumulated round-trip time of the extra queries will far exceed the time needed to transfer the duplicated data.
So, I think you need to keep a sharp eye on it ;-)
Also, I don't think benchmarking in the dev environment is pointless, because it can at least show how the two approaches compare relative to each other.
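The trade-off between round trips and extra data can be sketched with a very simple model: total time is roughly the number of round trips times the latency, plus the payload divided by the bandwidth. This ignores server-side CPU and object materialization, and the `TransferModel` class and every number below are assumptions for illustration only.

```csharp
using System;

public static class TransferModel
{
    // Estimated seconds = round trips * latency + payload bytes / bandwidth.
    public static double EstimateSeconds(int roundTrips, double latencySeconds,
                                         long payloadBytes, double bytesPerSecond)
    {
        if (bytesPerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(bytesPerSecond));
        return roundTrips * latencySeconds + payloadBytes / bytesPerSecond;
    }

    // Amount of duplicated data (in bytes) whose transfer takes as long as
    // one additional round trip. Below this, saving the round trip wins.
    public static double BreakEvenExtraBytes(double latencySeconds, double bytesPerSecond)
    {
        return latencySeconds * bytesPerSecond;
    }
}
```

Assuming 1 ms of latency and 100,000,000 bytes per second, `BreakEvenExtraBytes(0.001, 100000000)` is about 100,000 bytes: if the join duplicates less than that, the single query should come out ahead in this model, and if it duplicates far more (a huge body repeated 100 times), two queries should. Across millions of posts fetched one by one, the round-trip term dominates, which is the point made above.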
A single query that returns many rows is almost always faster than many queries that each return a single row.
In your case, though, retrieving the post first and then all of its comments (with a single query) is probably more efficient than getting everything in one query.