实体框架包含OrderBy随机生成重复数据
当我从数据库中检索项目列表(包括一些子项目)(通过 .Include)并随机排序时,EF 给出了一个意外的结果。我创建/克隆了附加项目。
为了更好地解释自己,我创建了一个小而简单的 EF CodeFirst 项目可以重现该问题。 首先,我将为您提供该项目的代码。
项目
创建一个基本的MVC3项目并通过Nuget添加EntityFramework.SqlServerCompact包。
这将添加以下软件包的最新版本:
- EntityFramework v4.3.0
- SqlServerCompact v4.0.8482.1
- EntityFramework.SqlServerCompact v4.1.8482.2
- WebActivator v1.5
模型和 DbContext
using System.Collections.Generic;
using System.Data.Entity;
namespace RandomWithInclude.Models
{
public class PeopleContext : DbContext
{
public DbSet<Person> Persons { get; set; }
public DbSet<Address> Addresses { get; set; }
}
public class Person
{
public int ID { get; set; }
public string Name { get; set; }
public virtual ICollection<Address> Addresses { get; set; }
}
public class Address
{
public int ID { get; set; }
public string AdressLine { get; set; }
public virtual Person Person { get; set; }
}
}
数据库设置和种子数据:EF.SqlServerCompact.cs
using System.Collections.Generic;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using RandomWithInclude.Models;
[assembly: WebActivator.PreApplicationStartMethod(typeof(RandomWithInclude.App_Start.EF), "Start")]
namespace RandomWithInclude.App_Start
{
public static class EF
{
public static void Start()
{
Database.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0");
Database.SetInitializer(new DbInitializer());
}
}
public class DbInitializer : DropCreateDatabaseAlways<PeopleContext>
{
protected override void Seed(PeopleContext context)
{
var address1 = new Address {AdressLine = "Street 1, City 1"};
var address2 = new Address {AdressLine = "Street 2, City 2"};
var address3 = new Address {AdressLine = "Street 3, City 3"};
var address4 = new Address {AdressLine = "Street 4, City 4"};
var address5 = new Address {AdressLine = "Street 5, City 5"};
context.Addresses.Add(address1);
context.Addresses.Add(address2);
context.Addresses.Add(address3);
context.Addresses.Add(address4);
context.Addresses.Add(address5);
var person1 = new Person {Name = "Person 1", Addresses = new List<Address> {address1, address2}};
var person2 = new Person {Name = "Person 2", Addresses = new List<Address> {address3}};
var person3 = new Person {Name = "Person 3", Addresses = new List<Address> {address4, address5}};
context.Persons.Add(person1);
context.Persons.Add(person2);
context.Persons.Add(person3);
}
}
}
控制器: HomeController.cs
using System;
using System.Data.Entity;
using System.Linq;
using System.Web.Mvc;
using RandomWithInclude.Models;
namespace RandomWithInclude.Controllers
{
public class HomeController : Controller
{
public ActionResult Index()
{
var db = new PeopleContext();
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => Guid.NewGuid());
return View(persons.ToList());
}
}
}
视图:Index.cshtml
@using RandomWithInclude.Models
@model IList<Person>
<ul>
@foreach (var person in Model)
{
<li>
@person.Name
</li>
}
</ul>
这应该是全部,您的应用程序应该编译:)
问题
如您所见,我们有 2 个简单的模型(人员和地址)和一个人可以有多个地址。
我们为生成的数据库添加种子3 个人和 5 个地址。
如果我们从数据库中获取所有人员(包括地址)并将结果随机化,然后仅打印出这些人员的姓名,这就是一切出错的地方。
结果,我有时会得到 4 个人,有时 5 个,有时 3 个,我期望 3 个。总是。
例如:
- 人 1
- 人 3
- 人 1
- 人 3
- 人 2
所以..它正在复制/克隆数据!这并不酷..
似乎 EF 无法跟踪哪些地址是哪个人的子地址。
生成的 SQL 查询是这样的:
SELECT
[Project1].[ID] AS [ID],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C1],
[Project1].[ID1] AS [ID1],
[Project1].[AdressLine] AS [AdressLine],
[Project1].[Person_ID] AS [Person_ID]
FROM ( SELECT
NEWID() AS [C1],
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent2].[ID] AS [ID1],
[Extent2].[AdressLine] AS [AdressLine],
[Extent2].[Person_ID] AS [Person_ID],
CASE WHEN ([Extent2].[ID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [People] AS [Extent1]
LEFT OUTER JOIN [Addresses] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Person_ID]
) AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[ID] ASC, [Project1].[C2] ASC
解决方法
- 如果我从查询,一切顺利。但当然,地址不会被加载,并且每次访问该集合都会对数据库进行新的调用。
- 我可以首先从数据库获取数据,然后通过在 .OrderBy.. 之前添加 .ToList() 进行随机化,如下所示:
var people = db.Persons.Include(p => p.Addresses)。 ToList().OrderBy(p => Guid.NewGuid());
有人知道为什么会这样吗?
这可能是 SQL 生成中的一个错误吗?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
tl;dr:这里有一个漏洞抽象。对我们来说,
Include
是一个简单的指令,用于将事物的集合粘贴到每个返回的Person
行上。但 EF 的Include
实现是通过为每个Person-Address
组合返回整行并在客户端重新组装来完成的。按易失性值排序会导致这些行被打乱,从而分解 EF 所依赖的Person
组。当我们查看此 LINQ 的
ToTraceString()
时:我们看到
因此,我们为每个
A
获取n
行,加上1每个
行,没有任何P
的A
。但是,添加
OrderBy
子句会将 thing-to-order-by 放在有序列的 start 处:在您的情况下给出
So,其中ordered-by- thing 不是
P
的属性,而是易失性的,因此对于相同的不同PA
记录可能不同P
,整个事情崩溃了。我不确定这种行为在
按预期工作~~~铸铁错误
连续体中的哪个位置。但至少现在我们知道了。tl;dr: There's a leaky abstraction here. To us,
Include
is a simple instruction to stick a collection of things onto each single returnedPerson
row. But EF's implementation ofInclude
is done by returning a whole row for eachPerson-Address
combo, and reassembling at the client. Ordering by a volatile value causes those rows to become shuffled, breaking apart thePerson
groups that EF is relying on.When we have a look at
ToTraceString()
for this LINQ:we see
So we get
n
rows for eachA
, plus1
row for eachP
without anyA
s.Adding an
OrderBy
clause, however, puts the thing-to-order-by at the start of the ordered columns:gives
So in your case, where the ordered-by-thing is not a property of a
P
, but is instead volatile, and therefore can be different for differentP-A
records of the sameP
, the whole thing falls apart.I'm not sure where on the
working-as-intended ~~~ cast-iron bug
continuum this behaviour falls. But at least now we know about it.我不认为查询生成存在问题,但是当 EF 尝试将行转换为对象时肯定存在问题。
看起来这里有一个固有的假设,即连接语句中同一个人的数据将按或不按顺序分组在一起返回。
例如,
即使您按其他列排序,连接查询的结果也将始终是,同一个人总是会一个接一个地出现。
对于任何连接查询来说,这个假设大多成立。
但我认为这里有一个更深层次的问题。 OrderBy 适用于当您想要按特定顺序排列数据(与随机相反)时,因此该假设似乎确实合理。
我认为你真的应该取出数据,然后根据代码中的其他一些方式将其随机化
I dont think there is an issue in query generation, but there is definately an issue when EF tries to convert rows into object.
It looks like there is an inherent assumption here that data for the same person in a joined statement will be returned grouped together order by or not.
for example the result of a joined query will always be
even if you order by some other column, same person would always appear one after the other.
this assumption is mostly true for any joined query.
But there is a deeper issue here i think. OrderBy is for when you want data in certain order ( as opposite to random), so that assumption does seem reasonable.
i think you should really get data out and then randomize it according to some other means in your code
从理论来看:
要对项目列表进行排序,比较函数相对于项目应该是稳定的;这意味着对于任何 2 个项目 x, y,x< 的结果是y 应该与查询(调用)的次数相同。
我认为该问题与对 OrderBy 方法:
keySelector - 从元素中提取键的函数。
EF 没有明确提及所提供的函数是否应返回相同的值多次调用对象(在您的情况下返回不同/随机值),但我认为他们在文档中使用的“关键”术语隐含地暗示了这一点。
From theory:
To sort a list of items, the compare function should be stable relative to items; this means that for any 2 items x, y the result of x< y should be the same as many time is queried(called).
I think the issue is related to misunderstanding of specification(documentation) of OrderBy method:
keySelector - A function to extract a key from an element.
EF didn't mention explicitly if the provided function should return the same value for same object as many times is called (in your case returns different/random values), but I think the "key" term that they used in documentation implicitly suggested this.
当您定义查询路径来定义查询结果时(使用包含),查询路径仅对返回的 ObjectQuery 实例有效。 ObjectQuery 的其他实例和对象上下文本身不受影响。此功能允许您链接多个“包含”以进行急切加载。
因此,你的陈述转化为
你想要的而不是你想要的。
因此,您的第二个解决方法工作正常:)
参考:在实体中查询概念模型时加载相关对象
框架 - http://msdn.microsoft.com/en-us/library/bb896272 .aspx
When you define a query path to define the query results, (use Include), the query path is only valid on the returned instance of ObjectQuery. Other instances of ObjectQuery and the object context itself are not affected. This functionality lets you chain multiple "Includes" for eager loading.
Therefor, Your statement translates into
instead of what you intended.
Hence your second workaround works fine :)
Reference: Loading Related Objects While Querying A Conceptual Model in Entity
Framework - http://msdn.microsoft.com/en-us/library/bb896272.aspx
我也遇到了这个问题,并通过向我正在获取的主类添加 Randomizer Guid 属性来解决它。然后,我将列的默认值设置为 NEWID() ,如下所示(使用 EF Core 2)
获取时,它会变得更加复杂。我创建了两个随机整数作为我的排序索引,然后像这样运行查询
这似乎工作得很好,并且应该提供足够的熵,即使是一个大型数据集也可以相当随机。
I also ran into this problem, and solved it by adding a Randomizer Guid property to the main class I was fetching. I then set the column's default value to NEWID() like this (using EF Core 2)
When fetching, it gets a bit more complicated. I created two random integers to function as my order-by indexes, then ran the query like this
This seems to be working well enough, and should provide enough entropy for even a large dataset to be fairly randomized.
人们可以通过阅读 AakashM 答案 和 Nicolae Dascalu 回答,看来 Linq
OrderBy
需要一个稳定的排名函数,这NewID/Guid.NewGuid
不是。因此,我们必须使用另一个在单个查询中稳定的随机生成器。
为了实现这一点,在每次查询之前,使用 .Net 随机生成器来获取随机数。然后将此随机数与实体的独特属性相结合以进行随机排序。为了对结果进行“随机化”,对其进行
校验和
。 (checksum
是一个计算哈希值的 SQL Server 函数;最初的想法基于 此博客。)Id 是
int
,您可以这样编写查询:假设
Person
NewGuid
hack,这很可能不是一个具有良好分布等的良好随机生成器。但它不会导致实体在结果中重复。当心:
如果您的查询排序不能保证实体排名的唯一性,则必须对其进行补充以保证它。例如,如果您使用实体的非唯一属性进行校验和调用,则在
OrderBy
.ThenBy(p => p.Id) 的内容>.如果您的排名对于查询的根实体来说不是唯一的,则其包含的子实体可能会与具有相同排名的其他实体的子实体混合。然后 bug 就会留在这里。
注意:
我更喜欢使用
.Next()
方法获取int
,然后通过异或 (^
) 将其组合到实体int
唯一属性,而不是使用double
并将其相乘。但是SqlFunctions.Checksum< /code>
不幸的是,它没有提供
int
数据类型的重载,尽管 SQL Server 函数应该支持它。您可以使用强制转换来克服这个问题,但为了简单起见,我最终选择了乘法。As one can sort it out by reading AakashM answer and Nicolae Dascalu answer, it strongly seems Linq
OrderBy
requires a stable ranking function, whichNewID/Guid.NewGuid
is not.So we have to use another random generator that would be stable inside a single query.
To achieve this, before each querying, use a .Net Random generator to get a random number. Then combine this random number with a unique property of the entity to get randomly sorted. And to 'randomize' a bit the result,
checksum
it. (checksum
is a SQL Server function that compute a hash; original idea founded on this blog.)Assuming
Person
Id
is anint
, you could write your query this way :Like the
NewGuid
hack, this is very probably not a good random generator with a good distribution and so on. But it does not cause entities to get duplicated in results.Beware:
If your query ordering does not guarantees uniqueness of your entities ranking, you must complement it for guarantying it. By example, if you use a non-unique property of your entities for the checksum call, then add something like
.ThenBy(p => p.Id)
after theOrderBy
.If your ranking is not unique for your queried root entity, its included children may get mixed with children of other entities having the same ranking. And then the bug will stay here.
Note:
I would prefer use
.Next()
method to get anint
then combine it through a xor (^
) to an entityint
unique property, rather than using adouble
and multiply it. ButSqlFunctions.Checksum
unfortunately does not provide an overload forint
data type, though the SQL server function is supposed to support it. You may use a cast to overcome this, but for keeping it simple I finally had chosen to go with the multiply.