JPA: Select the latest instance for each item
Let's say I have a Meeting entity. Each meeting has a single attendee and a meeting date. Within my meeting table I may have multiple meetings for each attendee, with different dates for each. I need a JPA query that will select only the latest meeting for each attendee. For instance, if my table looks like this:
Meeting ID | Attendee ID | Meeting Date
1 | 1 | 6/1/2011
2 | 2 | 6/1/2011
3 | 1 | 6/6/2011
4 | 3 | 6/6/2011
My result should be
Meeting ID | Attendee ID | Meeting Date
2 | 2 | 6/1/2011
3 | 1 | 6/6/2011
4 | 3 | 6/6/2011
Using JPA 2 against PostgreSQL. Each Meeting has exactly one attendee and a simple timestamp date. I suspect I'm going to need a GROUP BY with max(...) and maybe a self-join, but I'm not sure of the best way to approach this.
Update:
After spending the evening playing with this, I still do not have an acceptable JPQL solution to this. Here is what I have so far:
select m from Meeting m
where m.meetingDate in
( select max(meet.meetingDate)
from Meeting meet group by meet.attendee )
I have various other conditions that are not relevant to this question, like filtering by attendee department and whatnot. The only reason this works is that we track the meeting date to the second (or finer), so the chance of two meetings at exactly the same time is minimal. We wrap some Java code around it to keep only the last meeting for each attendee, just in case we do get two at the same time, but that's a pretty crappy solution. It really shouldn't be too difficult to get it all into one query, but I have yet to figure it out.
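To see why the IN (max dates) query above depends on timestamps never colliding, here is a small in-memory Java sketch that applies the same logic to the sample table from the question, at day precision. The Meeting record and the DateInMaxDates class are hypothetical stand-ins, not JPA code. The point it shows: the subquery keeps only the dates, so any meeting matches if its date happens to be some attendee's latest date.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for one row of the Meeting table.
record Meeting(long id, long attendeeId, LocalDate date) {}

final class DateInMaxDates {
    private DateInMaxDates() {}

    // Mirrors: select m from Meeting m where m.meetingDate in
    //          (select max(meet.meetingDate) from Meeting meet group by meet.attendee)
    static List<Meeting> filter(List<Meeting> meetings) {
        // Subquery: the latest date per attendee...
        Map<Long, LocalDate> latestByAttendee = meetings.stream()
                .collect(Collectors.toMap(Meeting::attendeeId, Meeting::date,
                        (a, b) -> a.isAfter(b) ? a : b));
        // ...but only the dates survive; which attendee each belonged to is lost.
        Set<LocalDate> maxDates = new HashSet<>(latestByAttendee.values());
        // Outer query: any meeting on one of those dates matches, whoever attends it.
        return meetings.stream()
                .filter(m -> maxDates.contains(m.date()))
                .toList();
    }
}
```

With the sample rows this returns meetings 1, 2, 3 and 4: meeting 1 (attendee 1, 6/1/2011) slips through because 6/1/2011 is attendee 2's latest date.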
Update 2: Adding the sql tag, because if I need to use SQL to create a view and map a JPA entity to that view, I'm OK with that.
In SQL the solution is very simple. Join the table with a subquery that gives you the most recent meeting for each attendee:
This works, and works fast!
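The SQL for this join is not preserved here. As a rough illustration of the logic only (not the original query), this Java sketch builds the GROUP BY result as an attendee-to-max-date map and then keeps the rows that match it on both attendee and date. Meeting and LatestMeetingJoin are hypothetical names.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for one row of the Meeting table.
record Meeting(long id, long attendeeId, LocalDate date) {}

final class LatestMeetingJoin {
    private LatestMeetingJoin() {}

    static List<Meeting> latestPerAttendee(List<Meeting> meetings) {
        // Derived table: attendee_id -> max(meeting_date), i.e. the GROUP BY subquery.
        Map<Long, LocalDate> latest = meetings.stream()
                .collect(Collectors.toMap(Meeting::attendeeId, Meeting::date,
                        (a, b) -> a.isAfter(b) ? a : b));
        // Join condition: same attendee AND same date as that attendee's maximum.
        return meetings.stream()
                .filter(m -> m.date().equals(latest.get(m.attendeeId())))
                .toList();
    }
}
```

On the sample table from the question this keeps meetings 2, 3 and 4. Like the SQL join, it returns more than one row for an attendee who has two meetings at the same latest date.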
The problem with JPA is that it (or at least most implementations) won't let you join on a subquery. After spending several hours first figuring out what will even compile, and then how slow it is, I decided that I hate JPA. Solutions like the ones above, using EXISTS (SELECT ...) or IN (SELECT ...), take ages to execute, orders of magnitude slower than they should.
Once I had a working SQL solution, I just needed to access it from JPA. There are two magic words in SQL that help you do just that:
and life becomes so much simpler... Just define an entity for it and use it.
Caution: it's read-only.
Of course, any JPA purists will look down on you when you do that, so if anyone has a pure JPA solution, please let us both know!
I think I've got it with this query.
Well, in SQL that would be quite simple, I think, so I assume it can be mapped to JPA:
Edit: If you need the meeting ID itself as well, you can get it with a simple subquery that returns the ID of the meeting where the other two values are equal. Just make sure you handle the case where there are several meeting IDs for the same attendee and date (e.g. pick the first result, since they should all be equally good, although I doubt such data even makes sense for meetings).
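A minimal sketch of the tie-breaking idea in Java, assuming an in-memory list of rows (Meeting and OneMeetingPerAttendee are hypothetical names). It picks the latest date per attendee and, when several meetings share that date, keeps the one with the lowest ID; that rule is just one arbitrary but deterministic reading of "pick the first result".

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for one row of the Meeting table.
record Meeting(long id, long attendeeId, LocalDate date) {}

final class OneMeetingPerAttendee {
    private OneMeetingPerAttendee() {}

    // Later date wins; on equal dates the LOWER id counts as "greater" (reversed id order).
    private static final Comparator<Meeting> LATEST_THEN_LOWEST_ID =
            Comparator.comparing(Meeting::date)
                    .thenComparing(Meeting::id, Comparator.reverseOrder());

    // Exactly one meeting per attendee, keyed by attendee id.
    static Map<Long, Meeting> pick(List<Meeting> meetings) {
        return meetings.stream()
                .collect(Collectors.toMap(Meeting::attendeeId, m -> m,
                        BinaryOperator.maxBy(LATEST_THEN_LOWEST_ID), TreeMap::new));
    }
}
```

Choosing the lowest ID makes the result repeatable across runs, which "whatever row the database returns first" does not guarantee.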
Plain SQL
As Bulba has said, the appropriate way is to join a subquery with GROUP BY.
JPA, JPQL
The problem is that you can't join a subquery.
Here is a workaround.
Let's see what you get in the subquery with GROUP BY: a list of pairs (attendee_id, max(meeting_date)). Each pair is like a new unique ID for the row with the max date that you want to join on.
Then note that each row in the table forms a pair (attendee_id, meeting_date), so every row has an ID in the form of a pair (attendee_id, meeting_date). Let's take a row only if it forms an ID that belongs to the list returned by the subquery.
For simplicity, let's represent this ID pair as a concatenation of attendee_id and meeting_date: concat(attendee_id, meeting_date). Then the query in SQL (and similarly in JPQL and with the JPA CriteriaBuilder) would be as follows:
Note that there is only one subquery per query, not one subquery for each row like in some answers.
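The concatenation trick can be sketched in Java as follows, assuming an in-memory list of rows (Meeting and ConcatKeyFilter are hypothetical names). The subquery's (attendee_id, max date) pairs are turned into string keys once, and each row is kept only if its own key is in that set. The '|' separator is an addition in this sketch, not part of the concat shown in the answer.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for one row of the Meeting table.
record Meeting(long id, long attendeeId, LocalDate date) {}

final class ConcatKeyFilter {
    private ConcatKeyFilter() {}

    // Hypothetical key format: attendee id, a separator, then the ISO date (e.g. "1|2011-06-06").
    static String key(long attendeeId, LocalDate date) {
        return attendeeId + "|" + date;
    }

    static List<Meeting> latestPerAttendee(List<Meeting> meetings) {
        // Subquery, evaluated once: attendee_id -> max(meeting_date).
        Map<Long, LocalDate> latest = meetings.stream()
                .collect(Collectors.toMap(Meeting::attendeeId, Meeting::date,
                        (a, b) -> a.isAfter(b) ? a : b));
        // ...turned into the set of concatenated keys.
        Set<String> latestKeys = latest.entrySet().stream()
                .map(e -> key(e.getKey(), e.getValue()))
                .collect(Collectors.toSet());
        // Outer query: WHERE concat(attendee_id, meeting_date) IN (subquery)
        return meetings.stream()
                .filter(m -> latestKeys.contains(key(m.attendeeId(), m.date())))
                .toList();
    }
}
```

A separator (or fixed-width fields) keeps two different pairs from producing the same string, however the date happens to be formatted.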
Afraid of comparing strings?
We have a special offer for you!
Let's encode that ID pair as a number.
It will be a sum of attendee_id and meeting_date, but with modifications to ensure the code is unique. We can take Unix time as the numeric representation of the date. We will fix the maximum date our code can capture, because the final code has a maximum value (e.g. bigint (int8) < 2^63). For convenience, let's take the max date as 2149-06-07 00:00:00 UTC. It equals 5662310400 in seconds and 65536 in days.
I will assume here that we only need day precision for the date (so we ignore hours and below).
To construct a unique code, we can interpret it as a number in a base-65536 numeral system. The last digit of our code in this system (a number from 0 to 2^16-1) is the number of days. The other digits capture attendee_id. In this interpretation the code looks like XXXX, where each X is in the range [0, 2^16-1] (to be more accurate, the first X is in the range [0, 2^15-1] because one bit is used for the sign); the first three X's represent attendee_id and the last X represents meeting_date. So the max value of attendee_id our code can capture is 2^47-1. The code can be computed as attendee_id*65536 + "date in days". In PostgreSQL it will be:
where date_part returns the date in seconds, which we convert to days by dividing by a constant. And the final query to get the latest meetings for all attendees:
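The PostgreSQL expression and the final query are not preserved here, so this is a Java sketch of the same arithmetic under the assumptions stated in the answer: day precision, and a day number since 1970-01-01 that fits in 16 bits (0 to 65535). PairCode and Meeting are hypothetical names. The filtering method assumes the final query has the shape code IN (SELECT max(code) ... GROUP BY attendee_id), which works because, for a fixed attendee, the code grows with the date.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for one Meeting row (day precision only).
record Meeting(long id, long attendeeId, LocalDate date) {}

final class PairCode {
    private PairCode() {}

    static final long BASE = 65_536L;                 // 2^16: the last "digit" holds the day number
    static final long MAX_ATTENDEE = (1L << 47) - 1;  // three upper digits, minus the sign bit

    // code = attendee_id * 65536 + days since 1970-01-01
    static long encode(long attendeeId, LocalDate date) {
        long days = date.toEpochDay();
        if (attendeeId < 0 || attendeeId > MAX_ATTENDEE || days < 0 || days >= BASE) {
            throw new IllegalArgumentException("pair outside the encodable range");
        }
        return attendeeId * BASE + days;
    }

    // Integer division and remainder undo the encoding.
    static long attendeeOf(long code) {
        return code / BASE;
    }

    static LocalDate dateOf(long code) {
        return LocalDate.ofEpochDay(code % BASE);
    }

    // Assumed shape of the final query: code IN (SELECT max(code) ... GROUP BY attendee_id).
    static List<Meeting> latestPerAttendee(List<Meeting> meetings) {
        Map<Long, Long> maxCodeByAttendee = meetings.stream()
                .collect(Collectors.toMap(Meeting::attendeeId,
                        m -> encode(m.attendeeId(), m.date()), Long::max));
        Set<Long> latestCodes = new HashSet<>(maxCodeByAttendee.values());
        return meetings.stream()
                .filter(m -> latestCodes.contains(encode(m.attendeeId(), m.date())))
                .toList();
    }
}
```

Decoding with attendeeOf and dateOf is a quick way to check that every pair in range maps to its own code, which is exactly the uniqueness the base-65536 argument relies on.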
Benchmarking
I created a table with the structure from the question and populated it with 100000 rows, picking attendee_id randomly from [1, 10000] and a random date from the range [1970-01-01, 2017-09-16]. I benchmarked (with EXPLAIN ANALYZE) queries using the following techniques (execution times):
- Correlated subquery: 873260.878 ms
- Join a subquery with GROUP BY: 103.427 ms
- Use the pair (attendee_id, date) as a key:
  - Concatenate attendee_id and meeting_date as strings: 207.720 ms
  - Encode attendee_id and meeting_date into a single number (code): 127.595 ms
Here is a git repository with the table schema, the table data (as CSV), the code for populating the table, and the queries.
Try this