EF: Lazy loading, eager loading, and "enumerating the enumerable"
I find I'm confused about lazy loading, etc.
First, are these two statements equivalent:
(1) Lazy loading:
_flaggedDates = context.FlaggedDates.Include("scheduledSchools")
    .Include("interviews").Include("partialDayAvailableBlocks")
    .Include("visit").Include("events");
(2) Eager loading:
_flaggedDates = context.FlaggedDates;
In other words, in (1) the "Includes" cause the navigation collections/properties to be loaded along with the specific collection requested, regardless of the fact that you are using lazy loading ... right?
And in (2), the statement will load all the navigation entities even though you do not specifically request them, because you are using eager loading ... right?
Second: even if you are using eager loading, the data will not actually be downloaded from the database until you "enumerate the enumerable", as in the following code:
var dates = from d in _flaggedDates
            where d.dateID == 2
            select d;
foreach (FlaggedDate date in dates)
{
    ... etc.
}
The data will not actually be downloaded ("enumerated") until the foreach loop ... right? In other words, the "var dates" line defines the query, but the query is not executed until the foreach loop.
Given that (if my assumptions are correct), what's the real difference between eager loading and lazy loading? It seems that in either case, the data does not appear until the enumeration. Am I missing something?
(My specific experience is with code-first, POCO development, by the way ... though the questions may apply more generally.)
Comments (2)
Your description of (1) is correct, but it is an example of Eager Loading rather than Lazy Loading.
Your description of (2) is incorrect. (2) is technically using no loading at all, but will use Lazy Loading if you try to access any non-scalar values on your FlaggedDates.
In either case, you are correct that no data will be loaded from your data store until you attempt to "do something" with the _flaggedDates. However, what happens is different in each case.
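The deferred part is a property of LINQ queries in general, not only of EF. A minimal sketch using plain LINQ to Objects (no database involved; the DeferredExecutionDemo class and its Evaluations counter are made up for illustration) shows that defining a query does no work until something enumerates it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many items the filter has actually examined so far.
    public static int Evaluations;

    // Builds the query only; the predicate is not run here.
    public static IEnumerable<int> BuildQuery(IEnumerable<int> dateIds)
    {
        return from id in dateIds
               where IsMatch(id)
               select id;
    }

    private static bool IsMatch(int id)
    {
        Evaluations++;
        return id == 2;
    }

    public static void Main()
    {
        var query = BuildQuery(new[] { 1, 2, 3 });
        Console.WriteLine("After defining the query: " + Evaluations + " evaluations");

        // Enumeration is what triggers the filtering.
        foreach (var id in query)
        {
            Console.WriteLine("Matched dateID " + id);
        }
        Console.WriteLine("After the foreach: " + Evaluations + " evaluations");
    }
}
```

The counter stays at 0 until the foreach starts and reaches 3 once it has finished. With an EF query the same timing applies to the SQL statement, which is only sent when enumeration begins.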
(1): Eager loading: as soon as you begin your foreach loop, every one of the objects that you have specified will get pulled from the database and built into a gigantic in-memory data structure. This will be a very expensive operation, pulling an enormous amount of data from your database. However, it will all happen in one database round trip, with a single SQL query getting executed.
(2): Lazy loading: When your foreach loop begins, it will only load the FlaggedDates objects. However, if you access related objects inside your foreach loop, it will not have those objects loaded into memory yet. The first attempt to retrieve the scheduledSchools for a given FlaggedDate will result in either a new database round trip to retrieve the schools, or an Exception being thrown because your context has already been disposed. Since you'd be accessing the scheduledSchools collection inside a foreach loop, you would have a new database round trip for every FlaggedDate that you initially loaded at the beginning of the foreach loop.
Response to Comments
Disabling Lazy Loading is not the same as Enabling Eager Loading. In this example:
The schools variable will contain an empty EntityCollection, because I didn't Include them in the original query (FlaggedDates.First()), and I disabled lazy loading so that they couldn't be loaded after the initial query had been executed.
You are correct that the where d.dateID == 2 would mean that only the objects related to that specific FlaggedDate object would be pulled in. However, depending on how many objects are related to that FlaggedDate, you could still end up with a lot of data going over that wire. This is due to the way the Entity Framework builds out its SQL query. SQL query results are always in a tabular format, meaning you must have the same number of columns for every row. For every scheduledSchool object, there needs to be at least one row in the result set, and since every row has to contain at least some value for every column, you end up with every scalar value on your FlaggedDate object being repeated. So if you have 10 scheduledSchools and 10 interviews associated with your FlaggedDate, you'll end up with 20 rows that each contain every scalar value on FlaggedDate. Half of the rows will have null values for all the ScheduledSchool columns, and the other half will have null values for all of the Interviews columns.
Where this gets really bad, though, is if you go "deep" in the data you're including. For example, if each ScheduledSchool had a students property, which you included as well, then suddenly you would have a row for each Student in each ScheduledSchool, and on each of those rows, every scalar value for the Student's ScheduledSchool would be included (even though only the first row's values end up getting used), along with every scalar value on the original FlaggedDate object. It can add up quickly.
It's difficult to explain in writing, but if you look at the actual data coming back from a query with multiple Includes, you will see that there is a lot of duplicate data. You can use LINQPad to see the SQL queries generated by your EF code.
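The row counts described above can be put into a small back-of-the-envelope calculation. This is only a sketch of the arithmetic in this answer, not a measurement of the SQL that EF actually generates: the IncludeRowEstimate class, its method names, the figure of 30 students per school, and the assumption that a school without students still occupies one row are all made up for illustration.

```csharp
using System;
using System.Linq;

public static class IncludeRowEstimate
{
    // Sibling collections on one FlaggedDate: each scheduledSchool and
    // each interview gets its own row in the result set.
    public static int SiblingRows(int schools, int interviews)
    {
        return schools + interviews;
    }

    // Going one level deeper: one row per student in each school
    // (assumed: a school with no students still takes one row),
    // plus one row per interview.
    public static int NestedRows(int[] studentsPerSchool, int interviews)
    {
        return studentsPerSchool.Sum(s => Math.Max(1, s)) + interviews;
    }

    public static void Main()
    {
        // The 10 schools + 10 interviews case from the text.
        Console.WriteLine(SiblingRows(10, 10)); // 20

        // Hypothetical: the same 10 schools, each with 30 students included.
        int[] thirtyEach = Enumerable.Repeat(30, 10).ToArray();
        Console.WriteLine(NestedRows(thirtyEach, 10)); // 310
    }
}
```

Every one of those rows would carry the scalar columns of the FlaggedDate again, which is where the duplicated data comes from.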
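To make the round-trip difference between cases (1) and (2) concrete, here is a small simulation. It does not use EF at all: FakeStore is a hypothetical stand-in for a database that only counts how many queries it receives, and the school names are invented. The lazy path issues one query for the dates plus one per date for its schools, while the eager path fetches everything in a single query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database; it only counts round trips.
public class FakeStore
{
    public int RoundTrips { get; private set; }

    private readonly Dictionary<int, string[]> schoolsByDate =
        new Dictionary<int, string[]>
        {
            { 1, new[] { "North High" } },
            { 2, new[] { "East High", "West High" } },
            { 3, new string[0] },
        };

    // One query returning only the date IDs.
    public List<int> QueryDates()
    {
        RoundTrips++;
        return schoolsByDate.Keys.ToList();
    }

    // One query returning the schools of a single date.
    public string[] QuerySchools(int dateId)
    {
        RoundTrips++;
        return schoolsByDate[dateId];
    }

    // One query returning every date together with its schools.
    public Dictionary<int, string[]> QueryDatesWithSchools()
    {
        RoundTrips++;
        return new Dictionary<int, string[]>(schoolsByDate);
    }
}

public static class LoadingDemo
{
    // Mimics lazy loading: related data is fetched on first access, per date.
    public static int CountSchoolsLazily(FakeStore store)
    {
        int total = 0;
        foreach (int dateId in store.QueryDates())
        {
            total += store.QuerySchools(dateId).Length;
        }
        return total;
    }

    // Mimics eager loading: related data arrives with the initial query.
    public static int CountSchoolsEagerly(FakeStore store)
    {
        int total = 0;
        foreach (var pair in store.QueryDatesWithSchools())
        {
            total += pair.Value.Length;
        }
        return total;
    }

    public static void Main()
    {
        var lazyStore = new FakeStore();
        int lazyTotal = CountSchoolsLazily(lazyStore);
        Console.WriteLine("Lazy: " + lazyTotal + " schools, " + lazyStore.RoundTrips + " round trips");

        var eagerStore = new FakeStore();
        int eagerTotal = CountSchoolsEagerly(eagerStore);
        Console.WriteLine("Eager: " + eagerTotal + " schools, " + eagerStore.RoundTrips + " round trips");
    }
}
```

With three dates, the lazy path makes four round trips and the eager path makes one, for the same three schools. Which is cheaper in practice depends on how much related data each approach drags along, as discussed in the first answer.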
No difference. This was not true in EF 1.0, which didn't support eager loading (at least not automatically). In 1.0, you had to either modify the property to load automatically, or call the Load() method on the property reference.
One thing to keep in mind is that those Includes can go up in smoke if you query across multiple objects like so:
ObjectDate.MyObjectProperty will not be automatically loaded.