EF 4.1:为什么将常量转换为变量会导致额外的子查询?
今天我发现实体框架向它生成的 SQL 添加了不必要的子查询。我开始挖掘我的代码,试图缩小它可能来自的范围。过了(很长)一段时间后,我查明了造成这种情况的原因。但现在我比开始时更困惑,因为我不知道为什么会导致这种情况。
基本上我发现,在某些情况下,只需将常量转换为变量就可以改变实体框架生成的 SQL。我已将所有内容缩小到最低限度,并将其打包到一个小控制台应用程序中:
using System;
using System.Data.Entity;
using System.Linq;
class Program
{
private static readonly BlogContext _db = new BlogContext();
static void Main(string[] args)
{
const string email = "[email protected]";
var comments = from c in _db.Comments
where c.Email == email
select c;
var result = (from p in _db.Posts
join c in comments on p.PostId equals c.PostId
orderby p.Title
select new { p.Title, c.Content });
Console.WriteLine(result);
}
}
public class BlogContext : DbContext
{
public DbSet<Post> Posts { get; set; }
public DbSet<Comment> Comments { get; set; }
}
public class Post
{
public int PostId { get; set; }
public string Title { get; set; }
}
public class Comment
{
public int CommentId { get; set; }
public int PostId { get; set; }
public string Email { get; set; }
public string Content { get; set; }
}
这显示了以下输出,这是完美的:
SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE N'[email protected]' = [Extent2].[Email]
ORDER BY [Extent1].[Title] ASC
现在,如果我将 email
设为变量:
/*const*/ string email = "[email protected]";
输出将发生根本性变化
SELECT
[Project1].[PostId] AS [PostId],
[Project1].[Title] AS [Title],
[Project1].[Content] AS [Content]
FROM ( SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE [Extent2].[Email] = @p__linq__0
) AS [Project1]
ORDER BY [Project1].[Title] ASC
:附注,LINQ to SQL 似乎没有这样做。我知道忽略这一点可能是可以的,因为这两个命令返回相同的数据。但我非常好奇为什么会发生这种情况。直到今天,我一直有这样的印象(也许是错误的?):只要值保持不变(在本例中确实如此),将常量转换为变量总是安全的。所以我必须问...
为什么看似微不足道的更改会导致生成的 SQL 出现如此大的差异?
更新:
需要明确的是,我的问题不是关于email
的值在第一个查询中是一个硬编码值,在第二个查询中是一个变量(这在世界上是有意义的)。我的问题是为什么变量版本会导致额外的子查询。
谢谢!
Today I discovered that Entity Framework was adding an unnecessary sub query to the SQL it generates. I started digging my code trying to narrow down where it might come from. A (long) while later I pin-pointed what's causing it. But now I'm more confused than when I started, as I have no clue why it causes it.
Basically what I discovered is that on certain scenarios, simply converting a constant into a variable can alter the SQL that Entity Framework generates. I've shrunk everything to the bare minimum and packed it in a little console app:
using System;
using System.Data.Entity;
using System.Linq;
class Program
{
private static readonly BlogContext _db = new BlogContext();
static void Main(string[] args)
{
const string email = "[email protected]";
var comments = from c in _db.Comments
where c.Email == email
select c;
var result = (from p in _db.Posts
join c in comments on p.PostId equals c.PostId
orderby p.Title
select new { p.Title, c.Content });
Console.WriteLine(result);
}
}
public class BlogContext : DbContext
{
public DbSet<Post> Posts { get; set; }
public DbSet<Comment> Comments { get; set; }
}
public class Post
{
public int PostId { get; set; }
public string Title { get; set; }
}
public class Comment
{
public int CommentId { get; set; }
public int PostId { get; set; }
public string Email { get; set; }
public string Content { get; set; }
}
This shows the following output, which is perfect:
SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE N'[email protected]' = [Extent2].[Email]
ORDER BY [Extent1].[Title] ASC
Now if I make email
a variable:
/*const*/ string email = "[email protected]";
The output changes radically:
SELECT
[Project1].[PostId] AS [PostId],
[Project1].[Title] AS [Title],
[Project1].[Content] AS [Content]
FROM ( SELECT
[Extent1].[PostId] AS [PostId],
[Extent1].[Title] AS [Title],
[Extent2].[Content] AS [Content]
FROM [dbo].[Posts] AS [Extent1]
INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
WHERE [Extent2].[Email] = @p__linq__0
) AS [Project1]
ORDER BY [Project1].[Title] ASC
As a side note, LINQ to SQL does not seem to do this. I know it's probably okay to ignore this, as both commands return the same data. But I'm extremely curious as to why this happens. Up until today I always had the (perhaps false?) impression that it is always safe to turn a constant into a variable, providing that the value remains the same (which in this case does). So I have to ask...
Why a seemingly insignificant change causes such a big difference in the generated SQL?
Update:
Just to be clear, my question isn't about the value of email
being a hard-coded value in the first query and a variable in the second (which makes all the sense in the world). My question is about why the variable version results in the extra sub query.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
答案相当简单。您的 LINQ 查询是用表达式树表达的。 const 变量与非 const 变量的区别在于 常量表达式 和
参数表达式。
当您使用 const 时,您的 LINQ 查询将使用
ConstExpression
作为此变量,而当您使用非 const 时,它会使用ParameterExpression
,EF 运行时对它们的解释会有所不同。常量实际上意味着该值永远不会改变并且该值可以内联到查询中。
The answer is fairly simple. Your LINQ query is expressed with expression trees. The difference with the const variable vs non const one are lies in ConstantExpression and
ParameterExpression.
When you use const your LINQ query uses
ConstExpression
for this variable, and when you use non const it usesParameterExpression
which are interpreted differently by the EF Runtime.Constant actually means that the value will never change and the value can be inlined into the query.
不是问题的答案 - 只是使用参数的上下文。
这与创建查询有关,以便它将重新使用现有的查询计划。
如果将变量(而不是对参数的引用)注入到生成 SQL 中,则当变量更改时,SQL Server(可能还有其他数据库引擎)将无法重新使用相同的计划。
对于常量,这不是问题,因为您知道值始终相同,但对于每次执行查询时的变量,SQL 和查询计划都会略有不同。
这听起来可能不多,但 SQL 只为查询计划分配了一定量的空间,因此在缓存中拥有数百/数千个微小的变化可以说是真正的“空间浪费”!
NOT an answer to the question - just context on using Parameters.
This is to do with creating a query such that it will re-use existing query plans.
If you inject the variable (as opposed to a reference to a parameter) into the generate SQL, then SQL Server (and probably other database engines) will not be able to re-use the same plan when the variable changes.
For constants this is not an issue, because you know the value is always the same, but for variables each time the query is executed the SQL and thus query plan would be slightly different.
This might not sound like much, but SQL only has a certain amount of space assigned for query plans, so having hundreds/thousands of minor variations in the cache is a real 'waste of space' so to speak!
这实际上是 SQL 中的一个很大的区别吗?内部查询与原始查询相同,外部查询只是内部查询的包装,不会更改结果集。
除非这会引起问题,否则我个人不会担心。两种查询风格的查询计划是否不同?我的猜测是它们是相同的。
Is this actually a big difference in the SQL? The inner query is the same as the original query, and the outer query is just a wrapper over the inner that doesn't change the result set.
Unless this is causing problems, I personally wouldn't worry about it. Do the query plans differ between the two flavours of query? My guess is that they're identical.
正如人们所说。两个查询之间的差异很小。
原因是,当您使用变量和常量时,创建 LINQ 时创建的表达式是不同的。 EF 将捕捉到这一点并相应地生成 SQL。它知道它永远不会改变,因此可以将其硬编码到查询中以获得(可能的)性能增益。
编辑:
我认为这个问题没有答案,除了“EF 就是这么做的”。但众所周知,EF 喜欢创建许多子选择。对于更复杂的查询,它可能会产生许多子选择。有些人甚至因为这个事实而拒绝使用 EF。但这只是使用 EF 这样的工具的代价。您失去了对某些事物的细粒度控制,这可以带来巨大的性能提升。当您可以使用 C 并获得更高性能时,为什么还要使用 .NET?当可以使用汇编来获得更多性能提升时,为什么还要使用 C?
确保安全并仍然能够使用高抽象层 EF 的唯一方法是经常使用 SQL 分析器并检查是否存在对真实数据花费太长时间的查询。如果您找到一些,则将它们转换为直接 SQL 或存储过程。
Like people said. Difference between both queries is minimal.
The reason is that the expression, that is created when you create your LINQ is different when you use variable and when constant. And EF will catch this and will generate your SQL acordingly. It knows it will never change, so it can be hard-coded into the query for (possible) performance gain.
Edit:
I dont think there is answer to this question except "Thats how EF does it." But it is very well known that EF likes to creates many subselects. It can result for many subselects for more complex queries. Some even dismis even using EF for this fact. But this is simply price for using tool like EF. You loose fine-grained control over something, that can have big performance gain. Why do you use .NET, when you can use C and gain more performance? Why use C when you can use assembly, to gain more performance gain?
Only way to be safe and still be able to use high-abstraction layer EF is to use SQL profiller often and check if there are not queries that take too long on real data. And if your find some then either conver them to direct SQL or stored procedures.