LINQ / Entity Framework: select the latest record of each group
I am trying to figure out how to write a LINQ / Entity Framework query to return the latest data available for each symbol in a table.
My database table looks like this:
ID  symbol  price_date  price
----------------------------------
1   AAPL    2022-02-28  174.50
2   MSFT    2022-02-28  307.20
3   AAPL    2022-03-01  172.23
4   MSFT    2022-03-01  304.15
Not every symbol has a record for every day, though. The ID key is sequential and safe to rely on: the highest ID for a given symbol always holds the latest data.
If I were writing a SQL query, the following would return what I'm looking for:
select prices.*
from prices
where id in (select max(id) from prices group by symbol)
In LINQ, I'm having trouble expressing this as a single query.
So far, I have split it into two queries:
var maxIds = from pp in ctx.Prices
             group pp by pp.Symbol
             into maxIdBySymbol
             select maxIdBySymbol.Max(pp => pp.Id);

var latestPrices = ctx.Prices.Where(it => maxIds.Contains(it.Id)).ToList();
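To try the two-step query without a database, it can be run against an in-memory list. This is a minimal sketch, assuming a hypothetical tuple-typed list prices that stands in for ctx.Prices; the rows follow the table above, with IDs 3 and 4 dated 2022-03-01 so that the highest ID per symbol is also the latest date, as the question states. Because maxIds is only a query definition until ToList() runs, Entity Framework would typically fold it into the second query as an IN subquery rather than executing it separately; the sketch itself only shows LINQ-to-objects behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for ctx.Prices (not an EF DbSet).
var prices = new List<(int Id, string Symbol, DateTime PriceDate, decimal Price)>
{
    (1, "AAPL", new DateTime(2022, 2, 28), 174.50m),
    (2, "MSFT", new DateTime(2022, 2, 28), 307.20m),
    (3, "AAPL", new DateTime(2022, 3, 1), 172.23m),
    (4, "MSFT", new DateTime(2022, 3, 1), 304.15m),
};

// Step 1: highest Id per symbol. This only defines the query (deferred execution).
var maxIds = from pp in prices
             group pp by pp.Symbol
             into maxIdBySymbol
             select maxIdBySymbol.Max(pp => pp.Id);

// Step 2: rows whose Id is one of those maxima. ToList() runs both steps;
// in memory, maxIds is re-evaluated for each row checked by Where.
var latestPrices = prices.Where(it => maxIds.Contains(it.Id)).ToList();

foreach (var p in latestPrices)
    Console.WriteLine($"{p.Id} {p.Symbol}");
```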
Is there a way to make this a single query in LINQ?
Thanks
Answers (2)
Addition: the proposed solution works, but it is less efficient.
For more information, see the addition at the end.
Original Solution
So first you make groups of records, where every group contains only the records for one specific symbol: one group with the records for symbol AAPL, one group with the records for symbol MSFT, etc.
Once you've got the groups, you select one element from each group. According to your requirement, you select the newest element, which is the element with the highest value for PriceDate. As you said, you could also take the element with the highest value for property Id. Personally, I wouldn't do that: in the far future your IDs might no longer be in ascending date order, for instance because you add a feature to edit PriceDate after an input error. For this, I would use the overload of Queryable.GroupBy that has a parameter resultSelector. Use the resultSelector to select the one element of each group that you want.
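The GroupBy call with a resultSelector can be sketched as follows. This is a minimal sketch run against a hypothetical in-memory list priceRecords (mirroring the question's table, with IDs 3 and 4 dated 2022-03-01) rather than dbContext.PriceRecords. Against Entity Framework the same expression would start from the DbSet, and whether this GroupBy overload is translated to SQL depends on the EF version and provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for dbContext.PriceRecords.
var priceRecords = new List<(int Id, string Symbol, DateTime PriceDate, decimal Price)>
{
    (1, "AAPL", new DateTime(2022, 2, 28), 174.50m),
    (2, "MSFT", new DateTime(2022, 2, 28), 307.20m),
    (3, "AAPL", new DateTime(2022, 3, 1), 172.23m),
    (4, "MSFT", new DateTime(2022, 3, 1), 304.15m),
};

var latestPrices = priceRecords
    .GroupBy(
        // keySelector: one group per Symbol
        priceRecord => priceRecord.Symbol,
        // resultSelector: from a Symbol and all records with that Symbol,
        // keep only the record with the latest PriceDate
        (symbol, priceRecordsWithThisSymbol) => priceRecordsWithThisSymbol
            .OrderByDescending(priceRecord => priceRecord.PriceDate)
            .FirstOrDefault())
    .ToList();

foreach (var p in latestPrices)
    Console.WriteLine($"{p.Symbol}: {p.PriceDate:yyyy-MM-dd} (Id {p.Id})");
```

In LINQ to objects, GroupBy yields groups in the order their keys first appear, so AAPL comes before MSFT here; a database gives no such ordering guarantee unless you add an OrderBy.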
In words: from the table of PriceRecords, make groups of PriceRecords that have the same value for property Symbol. For every combination of a Symbol and the PriceRecords that have this Symbol, order those PriceRecords by descending PriceDate and keep only the first one.
Every group has at least one element, so you could use First as well as FirstOrDefault. Some versions of Entity Framework or the DBMS have problems with First; if you encounter this, use FirstOrDefault. If you still want to take the one with the highest ID:
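A sketch of the highest-ID variant, again over a hypothetical in-memory list. It adds a hypothetical row 5, a late correction for AAPL with an older PriceDate, to show the concern raised earlier in this answer: ordering by Id and ordering by PriceDate then pick different AAPL records. Because this runs in memory, First is safe here; under Entity Framework you may need FirstOrDefault, as noted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var priceRecords = new List<(int Id, string Symbol, DateTime PriceDate, decimal Price)>
{
    (1, "AAPL", new DateTime(2022, 2, 28), 174.50m),
    (2, "MSFT", new DateTime(2022, 2, 28), 307.20m),
    (3, "AAPL", new DateTime(2022, 3, 1), 172.23m),
    (4, "MSFT", new DateTime(2022, 3, 1), 304.15m),
    // Hypothetical correction entered later: highest Id, but an older PriceDate.
    (5, "AAPL", new DateTime(2022, 2, 25), 170.00m),
};

// Newest record per Symbol by Id: relies on Ids being assigned in date order.
var latestById = priceRecords
    .GroupBy(
        priceRecord => priceRecord.Symbol,
        (symbol, priceRecordsWithThisSymbol) => priceRecordsWithThisSymbol
            .OrderByDescending(priceRecord => priceRecord.Id)
            .First()) // every group has at least one element
    .ToList();

// Newest record per Symbol by PriceDate, for comparison.
var latestByDate = priceRecords
    .GroupBy(
        priceRecord => priceRecord.Symbol,
        (symbol, priceRecordsWithThisSymbol) => priceRecordsWithThisSymbol
            .OrderByDescending(priceRecord => priceRecord.PriceDate)
            .First())
    .ToList();

// prints: AAPL by Id: 5, by PriceDate: 3
Console.WriteLine($"AAPL by Id: {latestById[0].Id}, by PriceDate: {latestByDate[0].Id}");
```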
Why is this solution less efficient?
In the original solution, all records in a group are sorted, and only the first one is taken. It is a bit of a waste to sort the second, third, etc. elements if you will only take the first one.
The SQL in the question, by contrast, uses MAX: select max(id) from prices group by symbol.
So the elements are not sorted: each sequence is enumerated only once, and the largest element is returned. This is far more efficient than sorting elements that you won't use anyway.
To create code like this, we need to change the resultSelector parameter of the GroupBy. Let's use a method like Max(propertySelector), or one of the overloads of Queryable.Aggregate. Something like this:
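A sketch of the single-pass idea, using Enumerable.Aggregate in the resultSelector. As the next paragraph explains, LINQ to Entities does not translate Aggregate, so this only works on in-memory data such as the hypothetical list below; it illustrates the technique, not an EF query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory data mirroring the question's table.
var priceRecords = new List<(int Id, string Symbol, DateTime PriceDate, decimal Price)>
{
    (1, "AAPL", new DateTime(2022, 2, 28), 174.50m),
    (2, "MSFT", new DateTime(2022, 2, 28), 307.20m),
    (3, "AAPL", new DateTime(2022, 3, 1), 172.23m),
    (4, "MSFT", new DateTime(2022, 3, 1), 304.15m),
};

var latestPrices = priceRecords
    .GroupBy(
        priceRecord => priceRecord.Symbol,
        // Aggregate without a seed starts from the group's first record and visits
        // each remaining record once, keeping whichever has the later PriceDate.
        (symbol, priceRecordsWithThisSymbol) => priceRecordsWithThisSymbol
            .Aggregate((newest, next) => next.PriceDate > newest.PriceDate ? next : newest))
    .ToList();

foreach (var p in latestPrices)
    Console.WriteLine($"{p.Symbol}: Id {p.Id}");
```

On .NET 6 or later, Enumerable.MaxBy(priceRecord => priceRecord.PriceDate) expresses the same one-pass selection more directly for in-memory sequences.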
Alas, although the Entity Framework team did a tremendous job, this overload of the Max method is not supported, and neither are any of the Aggregate methods. See the list of Supported and Unsupported LINQ Methods.
You can combine Where with a negated Any:
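A sketch of that idea: keep a row only if no other row with the same symbol has a higher Id. It runs against a hypothetical in-memory list prices; against Entity Framework you would use ctx.Prices in both places, which EF typically translates into a single query with a NOT EXISTS subquery. In memory the nested Any makes it quadratic, so treat it as an illustration of the query shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for ctx.Prices.
var prices = new List<(int Id, string Symbol, DateTime PriceDate, decimal Price)>
{
    (1, "AAPL", new DateTime(2022, 2, 28), 174.50m),
    (2, "MSFT", new DateTime(2022, 2, 28), 307.20m),
    (3, "AAPL", new DateTime(2022, 3, 1), 172.23m),
    (4, "MSFT", new DateTime(2022, 3, 1), 304.15m),
};

// A row is the latest for its symbol if no row with the same symbol has a higher Id.
var latestPrices = prices
    .Where(p => !prices.Any(other => other.Symbol == p.Symbol && other.Id > p.Id))
    .ToList();

foreach (var p in latestPrices)
    Console.WriteLine($"{p.Id} {p.Symbol}");
```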