Most efficient way to select "flagged posts" using WHERE and LIMIT in MySQL
I have the following table structure:
+----------+------------------------+------+-----+------------------------------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------------+------+-----+------------------------------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| body | varchar(200) | NO | | Hey now! | |
| flags | int(10) unsigned | NO | | 0 | |
| views | int(10) unsigned | NO | | 1 | |
+----------+------------------------+------+-----+------------------------------------------+----------------+
I want to select "only rows that have not been flagged on more than 5% of their total views, and have been viewed at least 5 times."
Here is my query:
SELECT id, body
FROM tablename
WHERE id NOT IN (
    SELECT id
    FROM tablename
    WHERE flags/views * 100 > 5.0
      AND views > 5
    ORDER BY id DESC
)
ORDER BY id DESC
LIMIT 6
I'm thinking that selecting "every single row that has been flagged more than 30% of total views" is going to be a ton of overhead, especially when the table grows to a very large number of rows. Could someone please help me optimize this?
I was also thinking of creating a "flag_score" column and just updating that each time something is flagged, that way I could just select on the flag_score column instead of doing the math within the select (and save myself the extra select query). Does that sound like a good approach? Thanks a bunch.
Edit: The other problem I was having is that if I simply do something like:
SELECT *
FROM tablename
WHERE flags/views * 100 > 5.0
  AND views > 5
ORDER BY id DESC
LIMIT 5
... if 4 out of the 5 posts have been flagged, it will only return 1 row! And I'd like the statement to return 5 rows.
Comments (2)
The math is not very expensive, so as long as you have fewer than, say, 100,000 rows, you can do it this way without worry.
As you suggested yourself, you can always cache the condition, which has the advantage that you can put an index on *is_over_30_percent*, so the query costs very little.
Remember to put a combined index on *id_of_updated_entry* and date and id so the index can be used for selecting and ordering (ORDER BY is expensive).
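A minimal sketch of the cached-condition idea, assuming MySQL and the `tablename` table from the question. The column name `is_over_threshold` (the answer calls it *is_over_30_percent*), the index name `idx_threshold_id`, and the id `42` are placeholders for illustration, and the threshold here is the 5% used in the question's query.

```sql
-- Hypothetical cached column: 1 when the row exceeds the flag threshold.
ALTER TABLE tablename
    ADD COLUMN is_over_threshold TINYINT(1) NOT NULL DEFAULT 0;

-- Composite index so both the filter and ORDER BY id DESC can use it.
ALTER TABLE tablename
    ADD INDEX idx_threshold_id (is_over_threshold, id);

-- Run whenever a post is flagged (42 is a placeholder id).
-- MySQL evaluates single-table UPDATE assignments left to right,
-- so the second assignment already sees the incremented flags value.
-- flags * 20 > views is the same test as flags / views * 100 > 5.
UPDATE tablename
SET flags = flags + 1,
    is_over_threshold = (flags * 20 > views AND views > 5)
WHERE id = 42;

-- The read query then filters on the cached column only.
SELECT id, body
FROM tablename
WHERE is_over_threshold = 0
ORDER BY id DESC
LIMIT 5;
```

The cached value also goes stale when `views` changes, so the same `is_over_threshold` expression would need to be recomputed in whatever statement increments `views`.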
I would say caching is a reasonable approach, but it all depends on what the read/writes are like in your system. If people are flagging all of the time, then the flag_score value will need to be updated each time, so your expensive operation will be happening a lot.
Separately, I am not sure you will get a performance boost, but you could write flags/views > .05 to save the multiplication in each query, although I wouldn't expect that part to be expensive.
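For comparison, here is a sketch of the question's query collapsed into a single pass with no subquery and no cached column. It assumes the goal as stated in the question (flagged on at most 5% of views, viewed at least 5 times) and rewrites the ratio test as an integer comparison, which avoids both the division and the multiplication by 100.

```sql
SELECT id, body
FROM tablename
WHERE views >= 5
  AND flags * 20 <= views   -- same test as flags / views <= 0.05
ORDER BY id DESC
LIMIT 5;
```

Because the filter is applied before `LIMIT`, this should return up to 5 qualifying rows rather than 5 rows minus the flagged ones. Note that it excludes posts with fewer than 5 views, which the original `NOT IN` version kept; to reproduce that behavior exactly, the condition would instead be `NOT (views > 5 AND flags * 20 > views)`.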