JPA: select the latest instance for each item

Posted on 2024-11-14 23:45:21

Let's say I have a Meeting entity. Each meeting has a single attendee and a meeting date. Within my meeting table I may have multiple meetings for each attendee, with different dates for each. I need a JPA query that will select only the latest meeting for all attendees. For instance, if my table looks like this

Meeting ID | Attendee ID | Meeting Date
1          | 1           |  6/1/2011
2          | 2           |  6/1/2011
3          | 1           |  6/6/2011
4          | 3           |  6/6/2011

My result should be

Meeting ID | Attendee ID | Meeting Date
2          | 2           |  6/1/2011
3          | 1           |  6/6/2011
4          | 3           |  6/6/2011

I'm using JPA 2 against PostgreSQL. Meeting has a 1-1 relationship to attendee and a simple timestamp date. I suspect I'm going to need a group by with max(blah) and maybe a self-join, but I'm not sure of the best way to approach this.

Update:
After spending the evening playing with this, I still do not have an acceptable JPQL solution to this. Here is what I have so far:

select m from Meeting m 
where m.meetingDate in 
    ( select max(meet.meetingDate) 
      from Meeting meet group by meet.attendee )

I have various other conditions that are not relevant to this question, like filtering by attendee department and whatnot. The only reason this works is that we track the meeting date to the second (or finer), and the chance that two meetings take place at exactly the same time is minimal. We are putting some Java code around it to keep only the last meeting for each attendee, just in case we do get two at the same time, but that's a pretty crappy solution. It really shouldn't be too difficult to get it all in a query, but I have yet to figure it out.
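
The Java-side safety net described here might look roughly like the following. It is only a sketch: the `Meeting` class is a hypothetical stand-in for the real entity, its field names are assumptions, and breaking ties by the higher meeting id is an arbitrary choice made for illustration.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestMeetings {

    /** Hypothetical stand-in for the Meeting entity; field names are assumptions. */
    public static final class Meeting {
        public final long id;
        public final long attendeeId;
        public final LocalDateTime meetingDate;

        public Meeting(long id, long attendeeId, LocalDateTime meetingDate) {
            this.id = id;
            this.attendeeId = attendeeId;
            this.meetingDate = meetingDate;
        }
    }

    // Later date wins; on equal dates the higher id wins (arbitrary but deterministic).
    private static final Comparator<Meeting> NEWEST =
            Comparator.comparing((Meeting m) -> m.meetingDate)
                      .thenComparingLong(m -> m.id);

    /** Keeps exactly one meeting per attendee, in order of first appearance of the attendee. */
    public static List<Meeting> latestPerAttendee(List<Meeting> meetings) {
        Map<Long, Meeting> latest = new LinkedHashMap<>();
        for (Meeting m : meetings) {
            // merge() keeps whichever of the stored and the new meeting is newer.
            latest.merge(m.attendeeId, m, (a, b) -> NEWEST.compare(a, b) >= 0 ? a : b);
        }
        return new ArrayList<>(latest.values());
    }
}
```

Calling `LatestMeetings.latestPerAttendee(queryResult)` on the result of the query above would then drop any extra rows for an attendee, including exact ties on the timestamp.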

Update 2: Adding the sql tag because, if I need to use SQL to create a view and create a JPA object to map to that view, I'm OK with that.

Comments (5)

若无相欠,怎会相见 2024-11-21 23:45:21

In SQL the solution is very simple - join the table with a subquery, which gives you the most recent meeting for each attendee:

-- ALL is a reserved word in SQL, so plain aliases are used instead
select m.* from Meeting m
join ( select attendee, max(meetingDate) as newest
       from Meeting group by attendee ) latest
  on m.meetingDate = latest.newest and m.attendee = latest.attendee

This works, and works fast!

The problem with JPA is that it (or most implementations) won't allow a subquery in a join. After spending several hours first finding out what would even compile, and then seeing how slow the result was, I decided that I hate JPA. Solutions like the ones above, using EXISTS (SELECT ..) or IN (SELECT ..), take ages to execute, orders of magnitude slower than they should.

Having a solution that works meant that I just needed to access that solution from JPA. There are two magic words in SQL that help you do just that:

CREATE VIEW

and life becomes so much simpler... Just define an entity mapped to that view and use it.
Caution: it's read-only.

Of course, any JPA purists will look down on you when you do that, so if anyone has a pure JPA solution, please let us both know!

謸气贵蔟 2024-11-21 23:45:21

I think I've got it with this query.

select m from Meeting m 
    where m.meetingDate = 
        (select max(m1.meetingDate) 
            from Meeting m1 
            where m1.attendee = m.attendee )
    and not exists 
        (select m2 from Meeting m2 
            where m2.attendee = m.attendee 
            and m2.meetingDate > m.meetingDate)
晒暮凉 2024-11-21 23:45:21

Well, in SQL that would be quite simple, I think, so I assume it can be mapped to JPA:

SELECT m.AttendeeId, MAX(m.MeetingDate) from Meeting m GROUP BY m.AttendeeId

Edit: If you need the meeting ID itself as well, you can get it with a simple subquery that returns the ID of the meeting for which the other two values are equal. Just make sure you handle the case where there are several meeting IDs for the same attendee and date (e.g. pick the first result, since they should all be equally good, although I doubt such data even makes sense for meetings).

小瓶盖 2024-11-21 23:45:21

Plain SQL

As Bulba has said, the appropriate way is to join a subquery with group by.

JPA, JPQL

The problem is that you can't join a subquery.

Here is a workaround.

Let's see what you get in the subquery with group by. You get a list of pairs (attendee_id, max(meeting_date)).
Such a pair acts like a new unique id for the row with the max date that you want to join on.
Then note that each row in the table also forms a pair (attendee_id, meeting_date).
So every row has an id in the form of a pair (attendee_id, meeting_date).
Let's take a row only if its id belongs to the list returned by the subquery.

For simplicity, let's represent this id pair as a concatenation of attendee_id and meeting_date: concat(attendee_id, meeting_date).

Then the query in SQL (and similarly in JPQL and JPA CriteriaBuilder) would be as follows:

SELECT * FROM meetings 
WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) FROM meetings GROUP BY attendee_id)

Note that there is only one subquery per query, not one subquery for each row like in some answers.
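
To make the pair-as-key idea concrete, here is a rough in-memory analogue in plain Java, not JPA code. The `Row` class and its field names are hypothetical, and the `|` delimiter in the key is an addition that the `concat()` call above does not have; it is there only to keep the two parts of the key visibly separate.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PairKeyFilter {

    /** Hypothetical row of the meetings table. */
    public static final class Row {
        public final long meetingId;
        public final long attendeeId;
        public final LocalDate meetingDate;

        public Row(long meetingId, long attendeeId, LocalDate meetingDate) {
            this.meetingId = meetingId;
            this.attendeeId = attendeeId;
            this.meetingDate = meetingDate;
        }
    }

    // String key for the (attendee_id, meeting_date) pair; the delimiter is an assumption.
    static String key(long attendeeId, LocalDate date) {
        return attendeeId + "|" + date;
    }

    public static List<Row> latest(List<Row> rows) {
        // Analogue of: SELECT attendee_id, max(meeting_date) ... GROUP BY attendee_id
        Map<Long, LocalDate> maxDate = new HashMap<>();
        for (Row r : rows) {
            maxDate.merge(r.attendeeId, r.meetingDate, (a, b) -> a.isAfter(b) ? a : b);
        }
        Set<String> keys = new HashSet<>();
        for (Map.Entry<Long, LocalDate> e : maxDate.entrySet()) {
            keys.add(key(e.getKey(), e.getValue()));
        }
        // Analogue of: WHERE concat(attendee_id, meeting_date) IN (subquery)
        List<Row> result = new ArrayList<>();
        for (Row r : rows) {
            if (keys.contains(key(r.attendeeId, r.meetingDate))) {
                result.add(r);
            }
        }
        return result;
    }
}
```

Like the SQL query, this keeps every row that shares an attendee's max date, so two meetings for the same attendee on the same date both survive.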

Afraid of comparing strings?

We have a special offer for you!

Let's encode that id pair as a number.
It will be a sum of attendee_id and meeting_date, modified to ensure that the code is unique. We can take Unix time as the numeric representation of the date.
We will fix the max date that our code can capture, because the final code has a max value limit (e.g. bigint (int8) < 2^63). For convenience, let's take the max date to be 2149-06-07 03:00:00. It equals 5662310400 in seconds and 65536 in days.
I will assume here that we need date precision in days (so we ignore hours and below).
To construct a unique code we can interpret it as a number in a numeral system with base 65536. The last digit (a number from 0 to 2^16-1) of our code in that numeral system is the number of days. The other digits capture attendee_id. In this interpretation the code looks like XXXX, where each X is in the range [0, 2^16-1] (more precisely, the first X is in the range [0, 2^15-1] because 1 bit is used for the sign); the first three X represent attendee_id and the last X represents meeting_date.
So the max value of attendee_id our code can capture is 2^47-1.
The code can be computed as attendee_id*65536 + "date in days".

In postgresql it will be:

attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)

Here date_part returns the date in seconds, which we convert to days by dividing by a constant.

And final query to get the latest meetings for all attendees:

SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch', max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
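
The arithmetic behind the code can be checked outside the database. The sketch below mirrors `attendee_id*65536 + days` in plain Java, assuming whole-day precision: `LocalDate.toEpochDay()` plays the role of `date_part('epoch', ...)/(60*60*24)`. The class name, method names and range checks are illustrative additions, not part of the original answer.

```java
import java.time.LocalDate;

public class MeetingKeyCodec {

    // Base of the numeral system: 2^16 possible day values (0 .. 65535).
    static final long BASE = 65536L;
    // attendee_id must stay below 2^47 so that the code fits in a signed 64-bit value.
    static final long MAX_ATTENDEE_ID = (1L << 47) - 1;

    /** Packs (attendeeId, meetingDate) into a single long: attendeeId * 65536 + days since 1970-01-01. */
    public static long encode(long attendeeId, LocalDate meetingDate) {
        long day = meetingDate.toEpochDay();
        if (day < 0 || day >= BASE) {
            throw new IllegalArgumentException("date outside the encodable range: " + meetingDate);
        }
        if (attendeeId < 0 || attendeeId > MAX_ATTENDEE_ID) {
            throw new IllegalArgumentException("attendee id outside the encodable range: " + attendeeId);
        }
        return attendeeId * BASE + day;
    }

    /** Recovers the attendee id (the leading base-65536 digits). */
    public static long attendeeIdOf(long code) {
        return code / BASE;
    }

    /** Recovers the date (the last base-65536 digit). */
    public static LocalDate dateOf(long code) {
        return LocalDate.ofEpochDay(code % BASE);
    }
}
```

Because the day always occupies the lowest base-65536 digit, two different (attendee, day) pairs within these ranges can never produce the same code, which is what makes the IN comparison safe.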

Benchmarking

I have created a table with the structure from the question and populated it with 100000 rows, randomly selecting attendee_id from [1, 10000] and a random date from the range [1970-01-01, 2017-09-16]. I have benchmarked (with EXPLAIN ANALYZE) queries using the following techniques:

  1. Correlated subquery

    SELECT * FROM meetings m1 WHERE m1.meeting_date=
    (SELECT max(m2.meeting_date) FROM meetings m2 WHERE m2.attendee_id=m1.attendee_id);
    

    Execution time: 873260.878 ms

  2. Join subquery with group by

    SELECT * FROM meetings m
    JOIN (SELECT attendee_id, max(meeting_date) from meetings GROUP BY attendee_id) attendee_max_date
    ON attendee_max_date.attendee_id = m.attendee_id;
    

    Execution time: 103.427 ms

  3. Use pair (attendee_id, date) as a key

    • Concat attendee_id and meeting_date as strings

      SELECT * FROM meetings WHERE concat(attendee_id, meeting_date) IN
      (SELECT concat(attendee_id, max(meeting_date)) from meetings GROUP BY attendee_id);
      

      Execution time: 207.720 ms

    • Encode attendee_id and meeting_date to a single number(code)

      SELECT * FROM meetings
      WHERE attendee_id*65536 + date_part('epoch',meeting_date)/(60*60*24)
      IN (SELECT attendee_id*65536 + date_part('epoch',max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
      

      Execution time: 127.595 ms

Here is a git repository with the table schema, the table data (as CSV), the code for populating the table, and the queries.

我不吻晚风 2024-11-21 23:45:21

Try this

SELECT MAX(m.MeetingDate) FROM Meeting m