SQL LIMIT to get the latest records

Posted on 2024-12-27 07:13:43

I am writing a script that lists 25 items for each of 12 categories. The database structure looks like this:

tbl_items
---------------------------------------------
item_id | item_name | item_value | timestamp 
---------------------------------------------

tbl_categories
-----------------------------
cat_id | item_id | timestamp
-----------------------------

There are around 600,000 rows in the table tbl_items. I am using this SQL query:

SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
  LIMIT 25

I run the same query in a loop for cat_id from 6000 to 6012. But I want the latest records of every category. If I use something like:

SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
  ORDER BY e.timestamp
  LIMIT 25

...the query runs for approximately 10 minutes, which is not acceptable. Is there a better way to use LIMIT to get the latest 25 records for each category?

Can anyone help me achieve this without ORDER BY? Any ideas or help would be highly appreciated.

EDIT

tbl_items

+---------------------+--------------+------+-----+---------+-------+
| Field               | Type         | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| item_id             | int(11)      | NO   | PRI | 0       |       |
| item_name           | longtext     | YES  |     | NULL    |       |
| item_value          | longtext     | YES  |     | NULL    |       |
| timestamp           | datetime     | YES  |     | NULL    |       |
+---------------------+--------------+------+-----+---------+-------+

tbl_categories

+----------------+------------+------+-----+---------+-------+
| Field          | Type       | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+-------+
| cat_id         | int(11)    | NO   | PRI | 0       |       |
| item_id        | int(11)    | NO   | PRI | 0       |       |
| timestamp      | datetime   | YES  |     | NULL    |       |
+----------------+------------+------+-----+---------+-------+

Comments (3)

爱要勇敢去追 2025-01-03 07:13:43

Can you add indices? If you add an index on the timestamp and other appropriate columns the ORDER BY won't take 10 minutes.
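
As a rough illustration of this suggestion, the sketch below rebuilds the two tables from the question in an in-memory SQLite database, adds an index on tbl_items.timestamp, and runs the ordered query. SQLite is only a stand-in for MySQL so that the example is runnable; the index name idx_items_timestamp and the sample rows are made up, and the composite primary key order (cat_id, item_id) is an assumption.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_items (
        item_id    INTEGER PRIMARY KEY,
        item_name  TEXT,
        item_value TEXT,
        timestamp  TEXT
    );
    CREATE TABLE tbl_categories (
        cat_id    INTEGER NOT NULL,
        item_id   INTEGER NOT NULL,
        timestamp TEXT,
        PRIMARY KEY (cat_id, item_id)
    );
    -- Hypothetical index name; the indexed column comes from the question.
    CREATE INDEX idx_items_timestamp ON tbl_items (timestamp);
""")

# Made-up sample data: 50 items in category 6001, one per minute.
for i in range(1, 51):
    ts = f"2024-01-01 10:{i:02d}:00"
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)",
                 (i, f"name {i}", f"value {i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6001, i, ts))

query = """
    SELECT e.item_id, e.item_value
    FROM tbl_items AS e
    JOIN tbl_categories AS cat ON e.item_id = cat.item_id
    WHERE cat.cat_id = ?
    ORDER BY e.timestamp DESC   -- newest first
    LIMIT 25
"""
latest = conn.execute(query, (6001,)).fetchall()
print([item_id for item_id, _ in latest])

# Show how SQLite plans the query; MySQL would use plain EXPLAIN instead.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, (6001,)):
    print(step)
```

In MySQL the equivalent check would be EXPLAIN on the same SELECT. Whether the optimizer actually picks the timestamp index depends on the data distribution, so the plan is worth inspecting before and after adding it.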

兲鉂ぱ嘚淚 2025-01-03 07:13:43

First of all:

There seems to be an N:M relation between items and categories: an item may be in several categories. I say this because categories has an item_id foreign key.

If it is not an N:M relationship, then you should consider changing the design. If it is a 1:N relationship, where a category has several items, then items must contain a category_id foreign key.

Working with N:M:

I have rewritten your query to use an inner join instead of a cross join:

  SELECT e.item_id, e.item_value
  FROM 
     tbl_items AS e
  JOIN 
     tbl_categories AS cat 
        on e.item_id = cat.item_id
  WHERE  
     cat.cat_id = 6001
  ORDER BY 
     e.timestamp DESC   -- DESC so that LIMIT keeps the latest rows
  LIMIT 25

To optimize performance, the required index is:

create index idx_1 on tbl_categories( cat_id, item_id)

An index on items is not mandatory, because the primary key is already indexed.
An index that contains timestamp does not help as much. To be sure, you can try an index on items with item_id and timestamp, so that those values can be read from the index without accessing the table:

create index idx_2 on tbl_items( item_id, timestamp)

To increase performance, you can replace your loop over categories with a single query:

  select T.cat_id, T.item_id, T.item_value from 
  (SELECT cat.cat_id, e.item_id, e.item_value
   FROM 
     tbl_items AS e
   JOIN 
     tbl_categories AS cat 
        on e.item_id = cat.item_id
   ORDER BY 
     e.timestamp DESC
   LIMIT 25   -- note: this caps the whole derived table at 25 rows, not 25 per category
  ) T
  WHERE  
     T.cat_id between 6001 and 6012
  ORDER BY
     T.cat_id, T.item_id
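
A LIMIT inside a derived table applies to the whole joined result, so getting 25 rows for each category in one statement needs a per-group ranking. The sketch below shows one way to do that with ROW_NUMBER(), which is a swapped-in technique and not part of the query above. It assumes window-function support (MySQL 8.0 or later; SQLite 3.25 or later for this runnable demo), uses in-memory SQLite as a stand-in for MySQL, and works on made-up sample rows.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_items (
        item_id    INTEGER PRIMARY KEY,
        item_name  TEXT,
        item_value TEXT,
        timestamp  TEXT
    );
    CREATE TABLE tbl_categories (
        cat_id    INTEGER NOT NULL,
        item_id   INTEGER NOT NULL,
        timestamp TEXT,
        PRIMARY KEY (cat_id, item_id)
    );
""")

# Made-up sample data: three categories with 30 items each.
next_id = 0
for cat_id in (6000, 6001, 6002):
    for minute in range(30):
        next_id += 1
        ts = f"2024-01-01 10:{minute:02d}:00"
        conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)",
                     (next_id, f"name {next_id}", f"value {next_id}", ts))
        conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)",
                     (cat_id, next_id, ts))

# Rank items inside each category by recency, then keep ranks 1..25.
query = """
    SELECT cat_id, item_id, item_value
    FROM (
        SELECT cat.cat_id, e.item_id, e.item_value,
               ROW_NUMBER() OVER (
                   PARTITION BY cat.cat_id
                   ORDER BY e.timestamp DESC
               ) AS rn
        FROM tbl_items AS e
        JOIN tbl_categories AS cat ON e.item_id = cat.item_id
        WHERE cat.cat_id BETWEEN 6000 AND 6012
    ) AS ranked
    WHERE rn <= 25
    ORDER BY cat_id, rn
"""
rows = conn.execute(query).fetchall()

# Group the result by category for display.
per_cat = {}
for cat_id, item_id, _value in rows:
    per_cat.setdefault(cat_id, []).append(item_id)

for cat_id, item_ids in per_cat.items():
    print(cat_id, len(item_ids), item_ids[:3])
```

On a MySQL version without window functions this form is not available, and the per-category loop from the question remains a workable alternative.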

Please try these queries and come back with your comments so we can refine them if necessary.

不乱于心 2025-01-03 07:13:43

Leaving aside all other factors, I can tell you that the main reason the query is so slow is that the result involves longtext columns.

BLOB and TEXT fields in MySQL are mostly meant to store complete files, textual or binary. They are stored separately from the row data for InnoDB tables. Each time a query involves sorting (explicitly or for a GROUP BY), MySQL is sure to use disk for the sorting (because it cannot be sure in advance how large any file is).

As a rule of thumb: if you need to return more than a single row of a column in a query, the type of the field should almost never be TEXT or BLOB; use VARCHAR or VARBINARY instead.
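
If the column types cannot be changed, one possible workaround is to keep the long columns out of the sorted set: pick the 25 newest item_id values in a narrow derived table first, then join back to tbl_items to fetch item_value. This is a sketch of that idea and not something proposed above; it uses in-memory SQLite as a stand-in for MySQL and made-up sample rows, and whether it actually helps in MySQL would need to be measured.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_items (
        item_id    INTEGER PRIMARY KEY,
        item_name  TEXT,
        item_value TEXT,
        timestamp  TEXT
    );
    CREATE TABLE tbl_categories (
        cat_id    INTEGER NOT NULL,
        item_id   INTEGER NOT NULL,
        timestamp TEXT,
        PRIMARY KEY (cat_id, item_id)
    );
""")

# Made-up sample data: 40 items in category 6001, one per minute.
for i in range(1, 41):
    ts = f"2024-01-01 10:{i:02d}:00"
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)",
                 (i, f"name {i}", f"long value {i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6001, i, ts))

# Step 1 (derived table): sort only the narrow columns item_id and timestamp.
# Step 2 (outer join): fetch the long item_value for just those 25 rows.
query = """
    SELECT e.item_id, e.item_value
    FROM (
        SELECT i.item_id, i.timestamp
        FROM tbl_items AS i
        JOIN tbl_categories AS cat ON i.item_id = cat.item_id
        WHERE cat.cat_id = ?
        ORDER BY i.timestamp DESC
        LIMIT 25
    ) AS latest
    JOIN tbl_items AS e ON e.item_id = latest.item_id
    ORDER BY latest.timestamp DESC
"""
rows = conn.execute(query, (6001,)).fetchall()
print(len(rows), rows[0], rows[-1])
```

The outer ORDER BY only reorders the 25 rows that survived the inner LIMIT, so the bulk of the sorting works on item_id and timestamp alone.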

UPD

If you cannot alter the table, the query will hardly be fast with the current indexes and column types. But, anyway, here is a similar question and a popular solution to your problem: How to SELECT the newest four items per category?
