MySQL tagging question: how do I select items that have been tagged as X, Y, and Z?
I'm dealing with a database where items are "tagged" a certain number of times.
item (100k rows)
- id
- name
- other stuff
tag (10k rows)
- id
- name
item2tag (1,000,000 rows)
- item_id
- tag_id
- count
I'm looking for the fastest solution to:
Select the items that have been tagged as X, Y, and Z (where X, Y, and Z are tag names, or possibly tag ids).
Here's what I have so far... I'd just like to make sure I'm doing it in the best way possible:
First get the tag_ids from the names:
SELECT id FROM tag WHERE name IN ('X', 'Y', 'Z');
Then I filter item2tag by those tag_ids, group by item, and use HAVING to keep only the items that matched all three:
SELECT item_id, COUNT(tag_id)
FROM item2tag
WHERE tag_id IN (1, 2, 3)
GROUP BY item_id
HAVING COUNT(tag_id) = 3;
Then I can just select from item with those ids.
SELECT * FROM item WHERE id IN ([results from prior query])
I have millions of rows in item2tag, with an index on (item_id, tag_id). Is this going to be the fastest solution?
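
For reference, the three steps above can also be folded into a single statement. This is only a sketch and not part of the original post: the scratch database name `tag_demo` and the sample rows are made up for illustration, and it assumes tag names are unique and that each (item_id, tag_id) pair appears at most once in item2tag.

```sql
-- Scratch schema for illustration only; tag_demo is a hypothetical name.
CREATE DATABASE IF NOT EXISTS tag_demo;
USE tag_demo;
DROP TABLE IF EXISTS item2tag, tag, item;
CREATE TABLE item (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE tag  (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE item2tag (
  item_id INT NOT NULL,
  tag_id  INT NOT NULL,
  `count` INT NOT NULL DEFAULT 1,
  PRIMARY KEY (item_id, tag_id)
);
INSERT INTO item VALUES (1, 'first'), (2, 'second');
INSERT INTO tag  VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
INSERT INTO item2tag (item_id, tag_id) VALUES
  (1, 1), (1, 2), (1, 3),  -- item 1 carries X, Y and Z
  (2, 1), (2, 2);          -- item 2 lacks Z

-- Resolve the tag names, keep items matching all three tags, fetch the items.
SELECT i.*
FROM item AS i
JOIN (
  SELECT it.item_id
  FROM item2tag AS it
  JOIN tag AS t ON t.id = it.tag_id
  WHERE t.name IN ('X', 'Y', 'Z')
  GROUP BY it.item_id
  HAVING COUNT(DISTINCT it.tag_id) = 3
) AS matched ON matched.item_id = i.id;
-- With the sample rows above, only item 1 is returned.
```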
Comments (3)
The method you have suggested is probably the most common way to perform the query, but it might not be the fastest. Using joins can be faster:
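
The query that followed this sentence was not preserved in this copy. A join-based version might look like the sketch below, with one self-join of item2tag per required tag. The tag ids 1, 2 and 3 come from the question; the scratch database `tag_demo` and the sample rows are hypothetical.

```sql
-- Scratch schema for illustration only; tag_demo is a hypothetical name.
CREATE DATABASE IF NOT EXISTS tag_demo;
USE tag_demo;
DROP TABLE IF EXISTS item2tag, item;
CREATE TABLE item (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE item2tag (
  item_id INT NOT NULL,
  tag_id  INT NOT NULL,
  `count` INT NOT NULL DEFAULT 1,
  PRIMARY KEY (item_id, tag_id)
);
INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO item2tag (item_id, tag_id) VALUES
  (1, 1), (1, 2), (1, 3),  -- item 1 has all three tags
  (2, 1), (2, 2),          -- item 2 lacks tag 3
  (3, 3);                  -- item 3 has only tag 3

-- Each join keeps only the items that also carry that tag.
SELECT i.*
FROM item AS i
JOIN item2tag AS t1 ON t1.item_id = i.id AND t1.tag_id = 1
JOIN item2tag AS t2 ON t2.item_id = i.id AND t2.tag_id = 2
JOIN item2tag AS t3 ON t3.item_id = i.id AND t3.tag_id = 3;
-- With the sample rows above, only item 1 is returned.
```

Unlike the GROUP BY / HAVING form, this needs no aggregation step, but the statement has to be built with one join per tag.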
You should ensure that you have the following indexes:
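
The index list was not preserved in this copy either. Going by the question (which already has an index on (item_id, tag_id)) and by the other comments (which ask for tag_id as the leading column), a plausible pair would be the one sketched below. The index names and the `tag_demo` scratch database are hypothetical, and the UNIQUE constraint assumes each item/tag pair is stored only once.

```sql
-- Scratch schema for illustration only; tag_demo is a hypothetical name.
CREATE DATABASE IF NOT EXISTS tag_demo;
USE tag_demo;
DROP TABLE IF EXISTS item2tag;
CREATE TABLE item2tag (
  item_id INT NOT NULL,
  tag_id  INT NOT NULL,
  `count` INT NOT NULL DEFAULT 1
);

-- Lookup by item first (the index the question already mentions).
CREATE UNIQUE INDEX ix_item2tag_item_tag ON item2tag (item_id, tag_id);
-- Lookup by tag first, so "all items with tag N" can use an index.
CREATE INDEX ix_item2tag_tag_item ON item2tag (tag_id, item_id);

-- List the indexes to confirm both exist.
SHOW INDEX FROM item2tag;
```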
I tested the performance of this query against the original in a few different scenarios.
The SQL I used for the performance test is pasted below. You can run this test yourself, or modify it slightly to test other queries or different scenarios.
Warning: Don't run this script on your production database, as it modifies the contents of the item2tag table. Running the script can take a few minutes as it creates a lot of data.
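
The test script itself did not survive in this copy. As a rough stand-in, and explicitly not the script this comment refers to, a comparison could be set up along the lines below. It assumes MySQL 8.0.18 or later (recursive CTEs and EXPLAIN ANALYZE); the `tag_demo` database, the index name, and the row counts are arbitrary choices for illustration. No timings are shown because they depend on the data and the server.

```sql
-- Scratch schema for illustration only; never point this at real data.
CREATE DATABASE IF NOT EXISTS tag_demo;
USE tag_demo;
DROP TABLE IF EXISTS item2tag;
CREATE TABLE item2tag (
  item_id INT NOT NULL,
  tag_id  INT NOT NULL,
  `count` INT NOT NULL DEFAULT 1,
  PRIMARY KEY (item_id, tag_id),
  KEY ix_tag_item (tag_id, item_id)
);

-- Allow the number generator below to recurse far enough.
SET SESSION cte_max_recursion_depth = 1000000;

-- Generate random item/tag pairs; INSERT IGNORE drops duplicate pairs.
INSERT IGNORE INTO item2tag (item_id, tag_id)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 200000
)
SELECT FLOOR(1 + RAND() * 20000), FLOOR(1 + RAND() * 100)
FROM seq;

-- Original approach: filter, group, count.
EXPLAIN ANALYZE
SELECT item_id
FROM item2tag
WHERE tag_id IN (1, 2, 3)
GROUP BY item_id
HAVING COUNT(*) = 3;

-- Join approach: one self-join per tag.
EXPLAIN ANALYZE
SELECT t1.item_id
FROM item2tag AS t1
JOIN item2tag AS t2 ON t2.item_id = t1.item_id AND t2.tag_id = 2
JOIN item2tag AS t3 ON t3.item_id = t1.item_id AND t3.tag_id = 3
WHERE t1.tag_id = 1;
```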
You would be better off with an index that has tag_id as its first column. Otherwise, finding all items with tag_id 1 requires a full table scan (and the same goes for any other tag_id, of course).
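
One way to check this on your own data is to compare the EXPLAIN output before and after adding such an index. The sketch below assumes a scratch database named `tag_demo` and an index named `ix_tag_item`, both hypothetical. The plan MySQL actually picks depends on the table size and statistics, so treat the comments as what you would typically expect rather than guaranteed output.

```sql
-- Scratch schema for illustration only; tag_demo is a hypothetical name.
CREATE DATABASE IF NOT EXISTS tag_demo;
USE tag_demo;
DROP TABLE IF EXISTS item2tag;
CREATE TABLE item2tag (
  item_id INT NOT NULL,
  tag_id  INT NOT NULL,
  `count` INT NOT NULL DEFAULT 1,
  PRIMARY KEY (item_id, tag_id)   -- tag_id is only the second column here
);
INSERT INTO item2tag (item_id, tag_id) VALUES
  (1, 1), (1, 2), (2, 1), (3, 3);

-- Before: tag_id is not a leftmost prefix of any index, so the whole
-- table (or the whole index) typically has to be scanned.
EXPLAIN SELECT item_id FROM item2tag WHERE tag_id = 1;

-- Add an index led by tag_id.
ALTER TABLE item2tag ADD INDEX ix_tag_item (tag_id, item_id);

-- After: the "key" column would typically show ix_tag_item.
EXPLAIN SELECT item_id FROM item2tag WHERE tag_id = 1;
```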
Depending on how many items are tagged with each individual tag, you might do it by getting the list of items tagged with one tag and then filtering it for occurrences of the other tags, like this:
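
The query that followed was not preserved in this copy. One way to write the idea is sketched below, using EXISTS for the extra tags. Driving the outer query from tag 3 is an assumption made for the example (in practice you would presumably start from whichever tag has the fewest items), and the `tag_demo` database and sample rows are hypothetical.

```sql
-- Scratch schema for illustration only; tag_demo is a hypothetical name.
CREATE DATABASE IF NOT EXISTS tag_demo;
USE tag_demo;
DROP TABLE IF EXISTS item2tag;
CREATE TABLE item2tag (
  item_id INT NOT NULL,
  tag_id  INT NOT NULL,
  `count` INT NOT NULL DEFAULT 1,
  PRIMARY KEY (item_id, tag_id),
  KEY ix_tag_item (tag_id, item_id)
);
INSERT INTO item2tag (item_id, tag_id) VALUES
  (1, 1), (1, 2), (1, 3),  -- item 1 has all three tags
  (2, 1), (2, 2),          -- item 2 lacks tag 3
  (3, 3);                  -- item 3 has only tag 3

-- Start from the items carrying one tag, then keep those that
-- also carry each of the other tags.
SELECT t.item_id
FROM item2tag AS t
WHERE t.tag_id = 3
  AND EXISTS (SELECT 1 FROM item2tag AS a
              WHERE a.item_id = t.item_id AND a.tag_id = 1)
  AND EXISTS (SELECT 1 FROM item2tag AS b
              WHERE b.item_id = t.item_id AND b.tag_id = 2);
-- With the sample rows above, only item_id 1 is returned.
```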