对按标签过滤的项目进行排序

发布于 2024-10-04 09:56:31 字数 1092 浏览 1 评论 0原文

我想实现一个非常常见的功能 - 按标签过滤一些项目。互联网上有很多教程,其中包含如何操作的示例。查询非常简单且快速(假设存在适当的索引)。
但通常情况下,过滤后的项目需要按某个字段进行排序。例如,当您按 SO 标签过滤问题时,您会对结果进行排序。

为了完成这项任务(假设我们需要按评级排序),可以这样写:

SELECT item.id FROM item
    INNER JOIN taggeditem ON taggeditem.item_id = item.id
WHERE
    taggeditem.tag_id = 1234
ORDER BY item.rating DESC

我们有索引 (taggeditem.tag_id), (item.id), (项目.评级) 此查询的问题是 mysql 无法在 item. rating 上使用索引,因为用于获取行的键与 ORDER BY 中使用的键不同 (MySQL:ORDER BY 优化)。这导致使用临时表和文件排序,进而导致执行时间变慢。

我想出的解决方案是将排序字段非规范化到 taggeditem 表,以便我可以在 taggeditem 上创建索引 (tag_id, item_ rating)

我在SO搜索了类似的问题,只发现了这个:Mysql Slow query: INNER JOIN + ORDER BY Causes filesort。解决方案是一样的。

所以我想问一下,这是解决这个问题的通用方法吗?将一堆排序字段非规范化为 taggeditem(例如创建的、评级的)是一个好习惯吗?在 SO 中,您可以使用 4 个不同的参数(最新、热门、投票、活动)进行排序 - 这是否意味着它们对用于对结果进行排序的字段进行了非规范化? 该解决方案还有其他选择吗?

I want to implement a very common feature - filtering some items by tag. There are many tutorials on the internet with examples of how to do it. The query is quite simple and fast (assuming proper indexes exist).
But usually the filtered items need to be sorted by some field. For example, when you filter questions by tag on SO, you get your results sorted.

To accomplish this task (assuming we need to sort by rating), one could write:

SELECT item.id FROM item
    INNER JOIN taggeditem ON taggeditem.item_id = item.id
WHERE
    taggeditem.tag_id = 1234
ORDER BY item.rating DESC

We have indexes (taggeditem.tag_id), (item.id), (item.rating)
The problem with this query is that mysql can't use index on item.rating, because the key used to fetch the rows is not the same as the one used in the ORDER BY (MySQL: ORDER BY Optimization). This leads to using a temporary table and filesort, which in turn leads to slow execution time.

The solution I came up with is to denormalize sort field to the taggeditem table, so that I could create index (tag_id, item_rating) on taggeditem.

I've searched for similar questions at SO, and found only this one: Mysql slow query: INNER JOIN + ORDER BY causes filesort. The solution was the same.

So, I want to ask, is this a common solution to this problem? Is it a good practice to denormalize a bunch of sort fields to taggeditem, such as created, rating? At SO you can sort using 4 different parameters (newest, hot, votes, active) - does it mean that they denormalized fields which are used to sort results?
Are there any alternatives to this solution?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

云归处 2024-10-11 09:56:31

有一个标准的替代方案 - 更改服务器系统变量。
例如,您可以尝试使用 sort_buffer_size 值(默认 2MB)。
更多信息

There is a standard alternative - change server system variables.
For example, you can experiment with sort_buffer_size value (default 2MB).
More about it.

魂牵梦绕锁你心扉 2024-10-11 09:56:31

一旦您使用 JOIN 并在连接表上进行过滤,您就会陷入糟糕的性能。

正如您所说,避免这种情况的唯一方法是创建非规范化表。

对于SO的排序,我认为他们没有这样的问题:他们只需要按答案表的列对答案进行排序(类似于SELECT * FROM Answers WHERE Question_id = 1234 SORT BY answer_date,带有question_id,answer_date 上的索引)

我也在寻找这样的解决方案,具有多值列,这确实很困难(非规范化数据会很大,因为它需要跨越多值列中的所有值)有价值的列)

As soon as you're using a JOIN, and filter out on the joined table, you're stuck with bad performance.

As you said, the only way to avoid this is to create a denormalized table.

For SO's sorts, I think they have no such issue: they just have to sort answers by a column of the answers' table (something like SELECT * FROM answers WHERE question_id = 1234 SORT BY answer_date, with an index on question_id, answer_date)

I'm also looking for such solutions, with multi-valued columns, and that's really difficult (denormalized data would be huge, as it needs to cross all values in the multi-valued columns)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文