Efficiently querying the latest records with PostgreSQL
I need to do a big query, but I only want the latest records.
For a single entry I would probably do something like
SELECT * FROM table WHERE id = ? ORDER BY date DESC LIMIT 1;
But I need to pull the latest record for each of a large number of ids (thousands of them), and only the latest entry per id.
Here's what I have. It's not very efficient. I was wondering if there's a better way.
SELECT * FROM table a WHERE ID IN $LIST AND date = (SELECT max(date) FROM table b WHERE b.id = a.id);
If you don't want to change your data model, you can use DISTINCT ON to fetch the newest record from table "b" for each entry in "a".
If you want to avoid a sort in the query, adding an index might help you, but I am not sure.
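A minimal sketch of the DISTINCT ON approach, assuming "a" has a primary key id and "b" has a matching id column plus a date column (column names borrowed from the question; the index name b_id_date_idx is made up for illustration):

```sql
-- One row per a.id: DISTINCT ON keeps the first row of each group,
-- and the ORDER BY puts the newest "b" row first within each a.id.
SELECT DISTINCT ON (a.id) a.*, b.date
FROM a
JOIN b ON b.id = a.id
ORDER BY a.id, b.date DESC;

-- Hypothetical index matching the ORDER BY above. Whether the planner
-- actually uses it to skip the sort depends on the data; check with EXPLAIN.
CREATE INDEX b_id_date_idx ON b (id, date DESC);
```

The leftmost ORDER BY expressions have to match the DISTINCT ON expressions, which is why a.id comes first in the sort.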
Alternatively, if you want to sort records from table "a" some way:
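A sketch of how that could look, assuming a hypothetical column a.sort_column that you want the result ordered by. Because DISTINCT ON must match the leading ORDER BY expressions, the sort column is listed in both places:

```sql
-- sort_column is an assumed name, not a column from the question.
-- Each a.id has exactly one sort_column value, so grouping on
-- (sort_column, id) still yields one row per a.id.
SELECT DISTINCT ON (a.sort_column, a.id) a.*, b.date
FROM a
JOIN b ON b.id = a.id
ORDER BY a.sort_column, a.id, b.date DESC;
```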
Alternative approaches
However, all of the above queries still need to read all referenced rows from table "b", so if you have lots of data, it might still just be too slow.
You could create a new table that holds only the newest "b" record for each a.id, or even move those columns into the "a" table itself.
This could be more efficient. The difference: the query on table b is executed only once, whereas your correlated subquery is executed for every row:
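A minimal sketch of that idea, joining against a grouped subquery. The table name records stands in for the question's placeholder table (TABLE is a reserved word in PostgreSQL), and the literal id list stands in for $LIST; both are assumptions for illustration:

```sql
-- The derived table "latest" is computed once: one (id, max_date) pair per id.
SELECT r.*
FROM records r
JOIN (
    SELECT id, max(date) AS max_date
    FROM records
    WHERE id IN (1, 2, 3)
    GROUP BY id
) latest ON latest.id = r.id
        AND latest.max_date = r.date;
```

If two rows for the same id share the maximum date, both are returned.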
What do you think about this?
I have used it a lot in the past.
One method: create a small derived table containing the most recent update/insertion times on table a, and call this table a_latest. Table a_latest will need sufficient granularity to meet your specific query requirements. In your case it should be sufficient to store the latest date for each id.
Then use a query similar to that suggested by najmeddine:
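A rough sketch of what the table and query might look like. The column types are assumptions (the question does not give them), and the literal id list stands in for $LIST:

```sql
-- Assumed shape: one row per id holding that id's newest date.
CREATE TABLE a_latest (
    id   integer   PRIMARY KEY,
    date timestamp NOT NULL
);

-- Join against the small pre-computed table instead of aggregating at query time.
SELECT a.*
FROM a
JOIN a_latest l ON l.id = a.id
               AND l.date = a.date
WHERE a.id IN (1, 2, 3);
```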
The trick then is keeping a_latest up to date. Do this using a trigger on insertions and updates. A trigger written in PL/pgSQL is fairly easy to write. I am happy to provide an example if you wish.
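As an illustration only, such a trigger could look roughly like this. It assumes tables a(id, date) and a_latest(id, date) with the types shown, uses made-up function and trigger names, and relies on ON CONFLICT (PostgreSQL 9.5+) and EXECUTE FUNCTION (PostgreSQL 11+; older versions spell it EXECUTE PROCEDURE):

```sql
-- Assumed table shapes, created only if they do not already exist.
CREATE TABLE IF NOT EXISTS a (id integer NOT NULL, date timestamp NOT NULL);
CREATE TABLE IF NOT EXISTS a_latest (id integer PRIMARY KEY, date timestamp NOT NULL);

CREATE OR REPLACE FUNCTION a_latest_sync() RETURNS trigger AS $$
BEGIN
    -- Insert the first date seen for this id, or move it forward
    -- when the new row is newer than the stored one.
    INSERT INTO a_latest (id, date)
    VALUES (NEW.id, NEW.date)
    ON CONFLICT (id) DO UPDATE
        SET date = EXCLUDED.date
        WHERE a_latest.date < EXCLUDED.date;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER a_latest_sync_trg
AFTER INSERT OR UPDATE ON a
FOR EACH ROW EXECUTE FUNCTION a_latest_sync();
```

This sketch only ever moves the stored date forward. Deletes, or updates that move a row's date backwards, would need extra handling.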
The point here is that computation of the latest update time is taken care of during the updates themselves. This shifts more of the load away from the query.
If you have many rows per id, you definitely want a correlated subquery.
It will make one index lookup per id, but this is faster than sorting the whole table.
Something like:
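A minimal sketch of such a query, assuming the data lives in a table called records (a stand-in for the question's placeholder table), that it has an index on (id, date), and that table2 holds one row per distinct id:

```sql
-- For each id in table2, the subquery does a single lookup for the newest date.
SELECT t2.id,
       (SELECT max(r.date)
        FROM records r
        WHERE r.id = t2.id) AS last_date
FROM table2 t2
WHERE t2.id IN (1, 2, 3);  -- literal list stands in for $LIST
```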
The 'table2' you use is not the table you mention in your query above, because here you need a list of distinct ids for good performance. Since your ids are probably foreign keys into another table, use that one.
You can also answer this with a NOT EXISTS subquery. Essentially you're saying "SELECT record... WHERE NOT EXISTS (SELECT newer record)":
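A sketch of that pattern, again using records as a stand-in for the question's placeholder table name and a literal id list in place of $LIST:

```sql
-- Keep a row only if no other row with the same id has a later date.
SELECT r.*
FROM records r
WHERE r.id IN (1, 2, 3)
  AND NOT EXISTS (
      SELECT 1
      FROM records newer
      WHERE newer.id = r.id
        AND newer.date > r.date
  );
```

As with the max(date) variants, rows that tie on the latest date for an id are all returned.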