SQL LIMIT to get the latest records
I am writing a script that will list 25 items from each of 12 categories. The database structure is:
tbl_items
---------------------------------------------
item_id | item_name | item_value | timestamp
---------------------------------------------
tbl_categories
-----------------------------
cat_id | item_id | timestamp
-----------------------------
There are around 600,000 rows in the table tbl_items. I am using this SQL query:
SELECT e.item_id, e.item_value
FROM tbl_items AS e
JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
LIMIT 25
I use the same query in a loop over cat_id from 6000 to 6012. But I want the latest records of every category. If I use something like:
SELECT e.item_id, e.item_value
FROM tbl_items AS e
JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
ORDER BY e.timestamp DESC
LIMIT 25
...the query runs for approximately 10 minutes, which is not acceptable. Can I use LIMIT more cleverly to get the latest 25 records for each category?
Can anyone help me achieve this without ORDER BY? Any ideas or help would be highly appreciated.
EDIT
tbl_items
+------------+----------+------+-----+---------+-------+
| Field      | Type     | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| item_id    | int(11)  | NO   | PRI | 0       |       |
| item_name  | longtext | YES  |     | NULL    |       |
| item_value | longtext | YES  |     | NULL    |       |
| timestamp  | datetime | YES  |     | NULL    |       |
+------------+----------+------+-----+---------+-------+
tbl_categories
+-----------+----------+------+-----+---------+-------+
| Field     | Type     | Null | Key | Default | Extra |
+-----------+----------+------+-----+---------+-------+
| cat_id    | int(11)  | NO   | PRI | 0       |       |
| item_id   | int(11)  | NO   | PRI | 0       |       |
| timestamp | datetime | YES  |     | NULL    |       |
+-----------+----------+------+-----+---------+-------+
Comments (3)
Can you add indices? If you add an index on timestamp and other appropriate columns, the ORDER BY won't take 10 minutes.
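Here is a minimal sketch of this suggestion. It uses Python's built-in sqlite3 module with an in-memory database so it runs on its own. The discussion itself is about MySQL, and the index name idx_items_timestamp and the 100 sample rows are hypothetical. An index on tbl_items(timestamp) lets the engine read items newest-first and stop once LIMIT 25 is satisfied, instead of sorting every joined row. Whether it actually does so is up to the optimizer.

```python
import sqlite3

# In-memory stand-in for the question's MySQL tables; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                        item_value TEXT, timestamp TEXT);
CREATE TABLE tbl_categories (cat_id INTEGER NOT NULL, item_id INTEGER NOT NULL,
                             timestamp TEXT, PRIMARY KEY (cat_id, item_id));
""")
# Item i gets a timestamp that grows with i and belongs to category 6000 + i % 3.
for i in range(1, 101):
    ts = f"2020-01-01 00:{i // 60:02d}:{i % 60:02d}"
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)", (i, f"name{i}", f"value{i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6000 + i % 3, i, ts))

# The suggested index on the sort column (hypothetical name).
conn.execute("CREATE INDEX idx_items_timestamp ON tbl_items (timestamp)")

# Latest 25 items of one category; DESC puts the newest rows first.
LATEST_25 = """
SELECT e.item_id, e.item_value
FROM tbl_items AS e
JOIN tbl_categories AS cat ON e.item_id = cat.item_id
WHERE cat.cat_id = ?
ORDER BY e.timestamp DESC
LIMIT 25
"""
rows = conn.execute(LATEST_25, (6001,)).fetchall()
print(rows[:3])  # [(100, 'value100'), (97, 'value97'), (94, 'value94')]
```

Without DESC, the same query returns the 25 oldest rows of the category rather than the latest ones.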
First of all:
It seems to be an N:M relationship between items and categories: an item may be in several categories. I say this because categories has an item_id foreign key. If it is not an N:M relationship, then you should consider changing the design. If it is a 1:N relationship, where a category has several items, then item must contain a category_id foreign key.
Working with N:M:
I have rewritten your query to use an inner join instead of a cross join:
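The sketch below shows what moving the join condition into ON looks like. It runs on the same kind of hypothetical in-memory SQLite stand-in for the question's tables. LIMIT is left out so the two result sets can be compared in full. In MySQL, JOIN, CROSS JOIN and INNER JOIN are syntactic equivalents. For an inner join, the ON form therefore mainly makes the intent explicit; both forms return the same rows.

```python
import sqlite3

# In-memory stand-in for the question's MySQL tables; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                        item_value TEXT, timestamp TEXT);
CREATE TABLE tbl_categories (cat_id INTEGER NOT NULL, item_id INTEGER NOT NULL,
                             timestamp TEXT, PRIMARY KEY (cat_id, item_id));
""")
for i in range(1, 101):
    ts = f"2020-01-01 00:{i // 60:02d}:{i % 60:02d}"
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)", (i, f"name{i}", f"value{i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6000 + i % 3, i, ts))

# Question's form: JOIN with no ON clause (a cross join), filtered in WHERE.
CROSS_FORM = """
SELECT e.item_id, e.item_value
FROM tbl_items AS e
JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = ?
"""
# Rewritten form: the join condition lives in ON, the category filter in WHERE.
INNER_FORM = """
SELECT e.item_id, e.item_value
FROM tbl_items AS e
INNER JOIN tbl_categories AS cat ON cat.item_id = e.item_id
WHERE cat.cat_id = ?
"""
# Sort in Python because neither query has an ORDER BY.
cross_rows = sorted(conn.execute(CROSS_FORM, (6001,)).fetchall())
inner_rows = sorted(conn.execute(INNER_FORM, (6001,)).fetchall())
print(cross_rows == inner_rows, len(inner_rows))  # True 34
```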
To optimize performance, the required indexes are:
An index on items is not mandatory, because its primary key is already indexed.
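Below is a hedged sketch of one possible index for this query, again on an in-memory SQLite stand-in. It assumes that tbl_categories.timestamp records the same moment as the item's timestamp. That holds in the hypothetical sample data but is unknown for the real tables, and it is what allows the query to sort on cat.timestamp. The hypothetical index idx_cat_latest puts the equality column cat_id first and the sort column second. The matching rows can then be read from the index already in timestamp order. tbl_items is reached through its primary key, consistent with the point that items needs no extra index.

```python
import sqlite3

# In-memory stand-in for the question's MySQL tables; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                        item_value TEXT, timestamp TEXT);
CREATE TABLE tbl_categories (cat_id INTEGER NOT NULL, item_id INTEGER NOT NULL,
                             timestamp TEXT, PRIMARY KEY (cat_id, item_id));
""")
# Assumption: tbl_categories.timestamp equals the item's timestamp.
for i in range(1, 101):
    ts = f"2020-01-01 00:{i // 60:02d}:{i % 60:02d}"
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)", (i, f"name{i}", f"value{i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6000 + i % 3, i, ts))

# Hypothetical index: equality column, then sort column, then the join column.
conn.execute("CREATE INDEX idx_cat_latest ON tbl_categories (cat_id, timestamp, item_id)")

LATEST_25 = """
SELECT e.item_id, e.item_value
FROM tbl_categories AS cat
JOIN tbl_items AS e ON e.item_id = cat.item_id
WHERE cat.cat_id = ?
ORDER BY cat.timestamp DESC
LIMIT 25
"""
rows = conn.execute(LATEST_25, (6002,)).fetchall()
# Show how SQLite plans the query; the detail text varies by SQLite version.
for row in conn.execute("EXPLAIN QUERY PLAN " + LATEST_25, (6002,)):
    print(row[-1])
print(rows[0], len(rows))  # (98, 'value98') 25
```

In MySQL, the corresponding check is EXPLAIN on the same SELECT. "Using filesort" in the Extra column would indicate that the ORDER BY is not being served by the index.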
An index that contains timestamp doesn't help much. To be sure, you can try an index on items with item_id and timestamp, to avoid accessing the table and take the values from the index:
To increase performance, you can replace your loop over categories with a single query:
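Here is a minimal sketch of such a single query. It assumes MySQL 8.0 or later, because window functions such as ROW_NUMBER() are not available in MySQL 5.7. The sketch runs on SQLite, which supports the same syntax, and the sample data only fills categories 6000 to 6002. ROW_NUMBER() numbers the rows of each category newest-first, and the outer WHERE keeps the first 25 per category. One round trip therefore replaces the per-category loop.

```python
import sqlite3

# In-memory stand-in for the question's MySQL tables; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                        item_value TEXT, timestamp TEXT);
CREATE TABLE tbl_categories (cat_id INTEGER NOT NULL, item_id INTEGER NOT NULL,
                             timestamp TEXT, PRIMARY KEY (cat_id, item_id));
""")
for i in range(1, 101):
    ts = f"2020-01-01 00:{i // 60:02d}:{i % 60:02d}"
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)", (i, f"name{i}", f"value{i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6000 + i % 3, i, ts))

# Rank items within each category, newest first, then keep the top 25 per category.
SINGLE_QUERY = """
SELECT cat_id, item_id, item_value
FROM (
    SELECT cat.cat_id, e.item_id, e.item_value,
           ROW_NUMBER() OVER (PARTITION BY cat.cat_id
                              ORDER BY e.timestamp DESC) AS rn
    FROM tbl_items AS e
    JOIN tbl_categories AS cat ON e.item_id = cat.item_id
    WHERE cat.cat_id BETWEEN 6000 AND 6012
) AS ranked
WHERE rn <= 25
ORDER BY cat_id, rn
"""
rows = conn.execute(SINGLE_QUERY).fetchall()

# Group the flat result by category for the script's output.
per_cat = {}
for cat_id, item_id, _ in rows:
    per_cat.setdefault(cat_id, []).append(item_id)
print({c: (len(ids), ids[0]) for c, ids in per_cat.items()})
# {6000: (25, 99), 6001: (25, 100), 6002: (25, 98)}
```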
Please try these queries and come back with your comments so I can refine them if necessary.
Leaving aside all other factors, I can tell you that the main reason the query is so slow is that the result involves longtext columns. BLOB and TEXT fields in MySQL are mostly meant to store complete files, textual or binary. For InnoDB tables they are stored separately from the row data. Each time a query involves sorting (explicitly, or for a GROUP BY), MySQL is sure to use disk for the sort, because it cannot know in advance how large any of the files is. As a rule of thumb: if a query needs to return a column for more than a single row, that column's type should almost never be TEXT or BLOB; use VARCHAR or VARBINARY instead.
Update:
If you cannot change the table, the query will hardly be fast with the current indexes and column types. Anyway, here is a similar question with a popular solution to your problem: How to SELECT the newest four items per category?
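One workaround that needs no schema change is often called a deferred join. It is a different technique from the solution linked above. The idea is to sort and limit only the narrow item_id and timestamp columns, then join back to fetch the longtext values for just those 25 rows. Whether MySQL's sort then stays in memory depends on version and settings, so this is something to measure with EXPLAIN rather than a guaranteed fix. The sketch runs on an in-memory SQLite stand-in with hypothetical sample rows. It checks that the deferred form returns the same rows as the direct query.

```python
import sqlite3

# In-memory stand-in for the question's MySQL tables; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                        item_value TEXT, timestamp TEXT);
CREATE TABLE tbl_categories (cat_id INTEGER NOT NULL, item_id INTEGER NOT NULL,
                             timestamp TEXT, PRIMARY KEY (cat_id, item_id));
""")
for i in range(1, 101):
    ts = f"2020-01-01 00:{i // 60:02d}:{i % 60:02d}"
    # item_value stands in for the wide longtext column.
    conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)", (i, f"name{i}", f"value{i}", ts))
    conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (6000 + i % 3, i, ts))

# Direct form: the sort carries item_value along with every candidate row.
DIRECT = """
SELECT e.item_id, e.item_value
FROM tbl_items AS e
JOIN tbl_categories AS cat ON e.item_id = cat.item_id
WHERE cat.cat_id = ?
ORDER BY e.timestamp DESC
LIMIT 25
"""
# Deferred join: sort and limit narrow columns first, then fetch item_value for 25 rows.
DEFERRED = """
SELECT e.item_id, e.item_value
FROM (
    SELECT i.item_id, i.timestamp
    FROM tbl_items AS i
    JOIN tbl_categories AS cat ON cat.item_id = i.item_id
    WHERE cat.cat_id = ?
    ORDER BY i.timestamp DESC
    LIMIT 25
) AS latest
JOIN tbl_items AS e ON e.item_id = latest.item_id
ORDER BY latest.timestamp DESC
"""
direct = conn.execute(DIRECT, (6000,)).fetchall()
deferred = conn.execute(DEFERRED, (6000,)).fetchall()
print(deferred == direct, deferred[0][0])  # True 99
```

The outer ORDER BY is needed because the join back to tbl_items does not preserve the subquery's order.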