MySQL - 为什么 COUNT 带有“大于”?快但“小于”需要永远吗?

发布于 2024-09-14 19:32:11 字数 1368 浏览 11 评论 0原文

SELECT count(*) c FROM full_view WHERE verified > ( DATE (NOW()) - INTERVAL 30 DAY)

如果我运行该查询,则需要一瞬间的时间,但如果我切换比较运算符,则需要很长时间。现在,第一种方式计数 = 0,第二种方式计数 = 120000,但如果我只计算整个表,也需要微秒。

但有一些奇怪的事情正在发生,因为如果查询确实完成,那么此后它会运行得非常快。 MySQL 正在缓存查询或者其他正确的东西吗?好吧,我不想依赖缓存来确保网站不会挂起。

这看起来很荒谬:如果它可以快速计算出大于某个日期的所有内容,为什么还要花更长的时间来计算相反的日期呢?无论哪种方式,它都必须查看整个表,对吗?它需要返回的只是一个数字,因此带宽不应该成为问题。

解释一下查询:

1, 'SIMPLE', 'b', 'range', 'updated,verified_index', 'updated', '3', '', 28, 'Using where'`    
1, 'SIMPLE', 'l', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'xyz_main.b.loc_id', 1, 'Using index'
1, 'SIMPLE', 'f', 'ALL', '', '', '', '', 2214, ''

编辑:

这可能有点有趣,我在运行查询时发现了这个信息:

Handler_read_rnd_next:

  • 254436689(当小于)
  • 2(大于)

Key_read_requests: 314393 vs 33(33 是使用大于时所有统计数据的最大数字)

Handler_read_key: 104303 vs 1

绕过视图并直接在主表上运行查询可以消除缓慢的情况。那么我需要做什么来加快速度呢?该视图本质上是这样的:

SELECT x, y, z, verified FROM table1 LEFT JOIN table2 on tab2_ID = table2.ID LEFT JOIN table3 on tab3_ID = table3.ID

已解决: 弗兰基带领我走向正确的方向。第二个连接表(公司表)是通过公司的全文名称连接的。我最近才决定向该表添加一个整数键。名称列应该被索引,但我可能搞砸了。无论如何,我重新组织了一切。我转换了主表中的外键以匹配公司表的整数 ID,而不是完整的公司名称。我重新索引了每个表中的这些列,然后更新了视图以反映新的连接点。现在它可以立即双向运行。 :) 所以我猜整数键是关键。问题已经解决了,但我仍然觉得我原来的问题没有真正解决。

谢谢你们的帮助。

SELECT count(*) c FROM full_view WHERE verified > ( DATE (NOW()) - INTERVAL 30 DAY)

If I run that query it takes a split second but if I switch the comparison operator around it takes eons. Now the first way the count = 0 and the second way the count = 120000, but if I just count the whole table that also takes microseconds.

But there is something something funky going on because if the query ever does finish it runs super quick thereafter. MySQL is caching the query or something right? Well, I don't want to depend on caches to make sure the website doesn't hang.

This seems nonsensical: if it can count everything greater than a certain date quickly, why should it take any longer to count the opposite? Either way it has to look through the whole table right? And all it needs to return is a number so bandwidth shouldn't be an issue.

Explain on the query:

1, 'SIMPLE', 'b', 'range', 'updated,verified_index', 'updated', '3', '', 28, 'Using where'`    
1, 'SIMPLE', 'l', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'xyz_main.b.loc_id', 1, 'Using index'
1, 'SIMPLE', 'f', 'ALL', '', '', '', '', 2214, ''

EDIT:

This may be of some interest, I found this info when I run the query:

Handler_read_rnd_next:

  • 254436689 (when doing less than)
  • 2 (for greater than)

Key_read_requests:
314393 vs 33 (33 is the biggest number for all of the stats when using greater than)

Handler_read_key:
104303 vs 1

Bypassing the view and running the query directly on the main table eliminates the slowness. So what do I need to do to speed it up? The view is essentially like this:

SELECT x, y, z, verified FROM table1 LEFT JOIN table2 on tab2_ID = table2.ID LEFT JOIN table3 on tab3_ID = table3.ID

SOLVED:
Frankie led my in the right direction. The second joined table (the company table) was joined via the full text name of the companies. I only recently decided to add a integer key to that table. The name column was supposed to be indexed but I may have botched that. Anyway I re-organized everything. I converted the foreign key in the main table to match the integer ID of the company table rather than the full company name. I re-indexed those columns in each table, then I updated the view to reflect the new join point. Now it runs instantly in both directions. :) So I guess integer keys were the key. The problem is gone but still, I don't feel like my original question was really solved.

Thanks for your help guys.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

郁金香雨 2024-09-21 19:32:11

请运行以下查询并发布结果。

EXPLAIN SELECT count(*) c 
FROM full_view 
WHERE verified > ( DATE (NOW()) - INTERVAL 30 DAY)

长期被遗忘的解释 几乎总是带一些东西进来! ;)


编辑1:
这可能是令人反感的一行:

1, 'SIMPLE', 'f', 'ALL', '', '', '', '', 2214, ''

其中的 ALL 表明存在全表扫描。

您可以进一步深入了解解释< /code> 此图上的语法

请尝试看看差异在哪里...


编辑 2:
该文档肯定会让 Explain 上的事情变得更加清晰输出。请检查一下。


编辑3:
逐步分析explain命令。

1, 'SIMPLE', 'b', 'range', 'updated,verified_index', 'updated', '3', '', 28, 'Using where'`    
1 - id
SIMPLE - simple select, not using sub-queries
b - table name
range - only rows that are in a given range are retrieved, using an index
updated,verified_index - are both possible keys
updated - was the key eventually used
3 - key lenght
'' - this is the ref column and would show which columns or constants are compared to the index name in the key column to select rows from the table.
28 - number of rows mysql believes it must examine to execute the query
Using where - self explanatory

Please run the bellow query and post the results.

EXPLAIN SELECT count(*) c 
FROM full_view 
WHERE verified > ( DATE (NOW()) - INTERVAL 30 DAY)

The long forgotten EXPLAIN almost always brings something in! ;)


Edit 1:
This is probably the offensive line:

1, 'SIMPLE', 'f', 'ALL', '', '', '', '', 2214, ''

The ALL there states that there is a FULL table scan.

You can dig further into the Explain syntax on this diagram.

Do try to see where the differences go...


Edit 2:
This doc will sure make things much clearer on the Explain output. Please check it out.


Edit 3:
Step-by-step analys of the explain command.

1, 'SIMPLE', 'b', 'range', 'updated,verified_index', 'updated', '3', '', 28, 'Using where'`    
1 - id
SIMPLE - simple select, not using sub-queries
b - table name
range - only rows that are in a given range are retrieved, using an index
updated,verified_index - are both possible keys
updated - was the key eventually used
3 - key lenght
'' - this is the ref column and would show which columns or constants are compared to the index name in the key column to select rows from the table.
28 - number of rows mysql believes it must examine to execute the query
Using where - self explanatory
因为看清所以看轻 2024-09-21 19:32:11

我的猜测是,从 Date(Now()) 中减去的内容需要很长时间才能处理。对于已经小于 Date(Now())verified 值,评估可能会被短路,因为此时它必须为 false(当比较“更大比”)。

在与“小于”进行比较的情况下,无论当前值如何,在每种情况下都必须减去日期时间,因为在评估日期时间减法之前无法从逻辑上得出表达式为真或假的结论

不过,这只是一个猜测——对此持保留态度。

My guess is that the subtraction from the Date(Now()) is what's taking a long time to process. For values of verified that are already less than Date(Now()), the evaluation can be short-circuited because at that point it MUST be false (when comparing "greater than").

In the situation where you're comparing with "less than," the date time must be subtracted in every case, regardless of current value, since it has no way of logically concluding the expression to be true or false prior to evaluating the datetime subtraction

That's just a guess, though - take it with a grain of salt.

虚拟世界 2024-09-21 19:32:11

可能有这样的情况:有统计数据告诉数据库引擎没有经过验证的记录> > 。 30天前。在这种情况下,它甚至根本不需要读取表格,而是从统计直方图中获取信息。

It may be the case that there are statistics telling the database engine that there are no records for verified > 30 days ago. In that case, it doesn't even have to read the table at all, but rather get info from the stats histogram.

你不是我要的菜∠ 2024-09-21 19:32:11

如果表中有关于 verified 的索引,则限制性更强的 COUNT(> 1)会更快。不带 WHERE 子句的 COUNT(*) 可以快速返回,因为可以从表/索引统计信息中收集计数。

If you've an index on verified in the table then the more restrictive COUNT (the > one) will be quicker. The COUNT(*) without a WHERE clause can return quickly as the count can be gleaned from table/index statistics.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文