使用Postgresql高效查询最新记录

发布于 2024-08-10 02:49:29 字数 351 浏览 5 评论 0原文

我需要做一个大查询,但我只想要最新的记录。

对于单个条目,我可能会执行类似的操作

SELECT * FROM table WHERE id = ? ORDER BY date DESC LIMIT 1;

,但我需要提取大量(数千个条目)记录的最新记录,但仅限于最新条目。

这就是我所拥有的。这不是很有效率。我想知道是否有更好的方法。

SELECT * FROM table a WHERE ID IN $LIST AND date = (SELECT max(date) FROM table b WHERE b.id = a.id);

I need to do a big query, but I only want the latest records.

For a single entry I would probably do something like

SELECT * FROM table WHERE id = ? ORDER BY date DESC LIMIT 1;

But I need to pull the latest records for a large (thousands of entries) number of records, but only the latest entry.

Here's what I have. It's not very efficient. I was wondering if there's a better way.

SELECT * FROM table a WHERE ID IN $LIST AND date = (SELECT max(date) FROM table b WHERE b.id = a.id);

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

盛夏尉蓝 2024-08-17 02:49:29

如果您不想更改数据模型,可以使用 DISTINCT ON 从表“b”中获取“a”中每个条目的最新记录:

SELECT DISTINCT ON (a.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY a.id, b.date DESC

如果您想避免“排序”, “在查询中,添加这样的索引可能可以帮助您,但我不确定:

CREATE INDEX b_id_date ON b (id, date DESC)

SELECT DISTINCT ON (b.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY b.id, b.date DESC

或者,如果您想以某种方式对表“a”中的记录进行排序:

SELECT DISTINCT ON (sort_column, a.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY sort_column, a.id, b.date DESC

替代方法

但是,所有上面的查询仍然需要从表“b”读取所有引用的行,因此如果您有大量数据,它可能仍然太慢。

您可以创建一个新表,其中仅保存每个 a.id 的最新“b”记录,甚至可以将这些列移动到“a”表本身中。

If you don't want to change your data model, you can use DISTINCT ON to fetch the newest record from table "b" for each entry in "a":

SELECT DISTINCT ON (a.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY a.id, b.date DESC

If you want to avoid a "sort" in the query, adding an index like this might help you, but I am not sure:

CREATE INDEX b_id_date ON b (id, date DESC)

SELECT DISTINCT ON (b.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY b.id, b.date DESC

Alternatively, if you want to sort records from table "a" some way:

SELECT DISTINCT ON (sort_column, a.id) *
FROM a
INNER JOIN b ON a.id=b.id
ORDER BY sort_column, a.id, b.date DESC

Alternative approaches

However, all of the above queries still need to read all referenced rows from table "b", so if you have lots of data, it might still just be too slow.

You could create a new table, which only holds the newest "b" record for each a.id -- or even move those columns into the "a" table itself.

梦魇绽荼蘼 2024-08-17 02:49:29

这可能会更有效率。区别:表 b 的查询仅执行 1 次,相关子查询针对每一行执行:

SELECT * 
FROM table a 
JOIN (SELECT ID, max(date) maxDate
        FROM table
      GROUP BY ID) b
ON a.ID = b.ID AND a.date = b.maxDate
WHERE ID IN $LIST 

this could be more eficient. Difference: query for table b is executed only 1 time, your correlated subquery is executed for every row:

SELECT * 
FROM table a 
JOIN (SELECT ID, max(date) maxDate
        FROM table
      GROUP BY ID) b
ON a.ID = b.ID AND a.date = b.maxDate
WHERE ID IN $LIST 
二货你真萌 2024-08-17 02:49:29

你对此有何看法?

select * from (
   SELECT a.*, row_number() over (partition by a.id order by date desc) r 
   FROM table a where ID IN $LIST 
)
WHERE r=1

我过去经常使用它

what do you think about this?

select * from (
   SELECT a.*, row_number() over (partition by a.id order by date desc) r 
   FROM table a where ID IN $LIST 
)
WHERE r=1

i used it a lot on the past

谷夏 2024-08-17 02:49:29

在方法上 - 创建一个包含表 a 上最近更新/插入时间的小型衍生表 - 将此表称为 a_latest。表 a_latest 将需要足够的粒度来满足您的特定查询要求。 就足够了

CREATE TABLE 
a_latest 
( id INTEGER NOT NULL, 
  date TSTAMP NOT NULL, 
  PRIMARY KEY (id, max_time) );

在你的情况下,使用then 使用类似于 najmeddine 建议的查询

SELECT a.* 
FROM TABLE a, TABLE a_latest 
USING ( id, date );

:然后的技巧是保持 a_latest 是最新的。使用插入和更新触发器来执行此操作。用plppgsql 编写的触发器相当容易编写。如果您愿意,我很乐意提供一个例子。

这里的要点是,最新更新时间的计算是在更新本身期间进行的。这将更多的负载从查询中转移出来。

On method - create a small derivative table containing the most recent update / insertion times on table a - call this table a_latest. Table a_latest will need sufficient granularity to meet your specific query requirements. In your case it should be sufficient to use

CREATE TABLE 
a_latest 
( id INTEGER NOT NULL, 
  date TSTAMP NOT NULL, 
  PRIMARY KEY (id, max_time) );

Then use a query similar to that suggested by najmeddine :

SELECT a.* 
FROM TABLE a, TABLE a_latest 
USING ( id, date );

The trick then is keeping a_latest up to date. Do this using a trigger on insertions and updates. A trigger written in plppgsql is fairly easy to write. I am happy to provide an example if you wish.

The point here is that computation of the latest update time is taken care of during the updates themselves. This shifts more of the load away from the query.

恬淡成诗 2024-08-17 02:49:29

如果每个 id 有很多行,那么您肯定需要一个相关子查询。
它将为每个 id 进行 1 次索引查找,但这比对整个表进行排序要快。

类似于:

SELECT a.id,
(SELECT max(t.date) FROM table t WHERE t.id = a.id) AS lastdate
FROM table2;

您将使用的“table2”不是您在上面的查询中提到的表,因为这里您需要一个不同 id 的列表才能获得良好的性能。由于您的 ID 可能是另一表的外键,因此请使用此表。

If you have many rows per id's you definitely want a correlated subquery.
It will make 1 index lookup per id, but this is faster than sorting the whole table.

Something like :

SELECT a.id,
(SELECT max(t.date) FROM table t WHERE t.id = a.id) AS lastdate
FROM table2;

The 'table2' you will use is not the table you mention in your query above, because here you need a list of distinct id's for good performance. Since your ids are probably FKs into another table, use this one.

倾城花音 2024-08-17 02:49:29

您也可以使用 NOT EXISTS 子查询来回答这个问题。本质上你是在说“选择记录...不存在的地方(选择更新的记录)”:

SELECT t.id FROM table t
WHERE NOT EXISTS
    (SELECT * FROM table n WHERE t.id = n.id AND n.date > t.date)

You can use a NOT EXISTS subquery to answer this also. Essentially you're saying "SELECT record... WHERE NOT EXISTS(SELECT newer record)":

SELECT t.id FROM table t
WHERE NOT EXISTS
    (SELECT * FROM table n WHERE t.id = n.id AND n.date > t.date)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文