Does the order of WHERE clauses matter in SQL?
I have a table in my DB that looks something like this:
----------------------------------------------------------
| event_id | date | start_time | end_time | duration |
----------------------------------------------------------
| 1 | 2011-05-13 | 01:00:00 | 04:00:00 | 10800 |
| 2 | 2011-05-12 | 17:00:00 | 01:00:00 | 28800 |
| 3 | 2011-05-11 | 11:00:00 | 14:00:00 | 10800 |
----------------------------------------------------------
This sample data doesn't give a totally accurate picture; there are typically events covering every hour of every day.
The date always refers to the start_time, because the end_time can sometimes fall on the following day.
The duration is in seconds.
SELECT *
FROM event_schedules
WHERE (
    date = CURDATE()                               -- today
    OR date = DATE_SUB(CURDATE(), INTERVAL 1 DAY)  -- yesterday
)
-- and ended before NOW()
AND DATE_ADD(CONCAT(date, ' ', start_time), INTERVAL duration SECOND) < NOW()
ORDER BY CONCAT(date, ' ', start_time) DESC
LIMIT 1
The OR'ed clause in brackets is logically unnecessary. I added it hoping it would improve the query time by first filtering out any events that did not start today or yesterday. The only way to find the most recent event is to order the records and take the first one. By adding this extra clause, am I actually reducing the set of records that needs to be ordered? If so, I can't imagine the optimizer being able to make this optimization on its own, yet most other questions similar to this one talk about the optimizer.
Comments (3)
Be careful when adding filters to your WHERE clause for performance. While a filter can reduce the overall number of rows that need to be searched, evaluating the filter itself can cost more if it has to check a large number of records without using an index. In your case, if the date column is indexed, you'll probably get better performance, because the engine can use the index for the OR part, whereas it can't for the other condition, where the columns are passed through function calls. Also, can you have future dates? If not, why don't you change the OR to
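The point about indexes and function calls can be checked on a toy table. The following is a minimal sketch, not the asker's setup: it uses SQLite through Python's sqlite3 module only because that is self-contained, the index name idx_events_date is made up for the example, and the datetime() expression is a SQLite stand-in for the MySQL DATE_ADD/CONCAT condition. MySQL's EXPLAIN output and planner choices will differ.

```python
import sqlite3

# In-memory SQLite database; a stand-in for the MySQL table in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_schedules ("
    " event_id INTEGER PRIMARY KEY, date TEXT, start_time TEXT,"
    " end_time TEXT, duration INTEGER)"
)
# Hypothetical index on the date column (the name is an assumption).
conn.execute("CREATE INDEX idx_events_date ON event_schedules (date)")


def plan(sql, params=()):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Predicate on the bare column: the planner can look values up in the index.
plain_plan = plan(
    "SELECT * FROM event_schedules WHERE date = ? OR date = ?",
    ("2011-05-13", "2011-05-12"),
)

# Predicate on an expression built from the columns: no index applies here.
wrapped_plan = plan(
    "SELECT * FROM event_schedules"
    " WHERE datetime(date || ' ' || start_time, '+' || duration || ' seconds')"
    " < datetime('now')"
)

print(plain_plan)
print(wrapped_plan)
```

The exact wording of the plan steps varies between SQLite versions, so compare which plan mentions the index rather than the literal text.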
The order of the WHERE clause does affect the way the SQL engine gets the results.
Many engines have a way to show what they do with a query. If you're using SQL Server, look for "Show Estimated Execution Plan" in your client tool. Others have a statement such as EXPLAIN that shows how the engine treats a query.
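One way to see what a particular engine does with predicate order is to ask it for the plan of the same query written both ways. The sketch below does this in SQLite via Python's sqlite3 module, chosen only because it runs without a server; the index name, the fixed timestamps, and the datetime() expression are assumptions made for the example. Whatever it shows for SQLite says nothing definitive about MySQL or SQL Server, where the same comparison can be made with their own plan tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_schedules ("
    " event_id INTEGER PRIMARY KEY, date TEXT, start_time TEXT,"
    " end_time TEXT, duration INTEGER)"
)
# Hypothetical index; the name is an assumption for this sketch.
conn.execute("CREATE INDEX idx_events_date ON event_schedules (date)")

# Two conditions that are AND'ed together, written in either order below.
DATE_FILTER = "date = '2011-05-13'"
ENDED_FILTER = (
    "datetime(date || ' ' || start_time, '+' || duration || ' seconds')"
    " < '2011-05-13 12:00:00'"
)


def plan(where_sql):
    """Return the plan steps SQLite reports for the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM event_schedules WHERE " + where_sql
    return [row[3] for row in conn.execute(sql)]


plan_date_first = plan(DATE_FILTER + " AND " + ENDED_FILTER)
plan_date_last = plan(ENDED_FILTER + " AND " + DATE_FILTER)

print(plan_date_first)
print(plan_date_last)
print("same plan:", plan_date_first == plan_date_last)
```

Running the equivalent comparison against the real table and data is the only reliable way to answer the question for a specific engine and version.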
Well, the optimizer in the query engine is a big part of any query's performance, and of the relative performance of two equivalent statements.
You didn't tell us whether you ran the query both with and without the extra WHERE condition. There may be a performance difference, and there may not.
My guess is that the LIMIT has a lot to do with it. The engine knows this is a "one and done" operation. Without the WHERE, sorting is an O(N log N) operation, which in this special case can be made linear with a simple scan of the dates to find the most recent one.
With the WHERE, you're actually increasing the number of steps the engine has to perform. Either it has to fully order the table (O(N log N)) and then scan that list for the first record that matches the WHERE clause (linear in the worst case, constant in the best case), or it has to filter by the WHERE (linear) and then scan those records again to find the max date (linear again). Whichever one turns out faster, both are slower than one linear scan of the list for the most recent date.
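The complexity argument can be illustrated outside the database. The following is only a sketch in plain Python over the three sample rows from the question, with a made-up "current time"; it shows the two strategies giving the same row and is not a description of how any engine actually executes the query.

```python
from datetime import datetime, timedelta

# Rows from the sample table: (event_id, date, start_time, duration in seconds).
EVENTS = [
    (1, "2011-05-13", "01:00:00", 10800),
    (2, "2011-05-12", "17:00:00", 28800),
    (3, "2011-05-11", "11:00:00", 10800),
]


def start_of(event):
    """Combine the date and start_time columns into one datetime."""
    _, date, start_time, _ = event
    return datetime.strptime(f"{date} {start_time}", "%Y-%m-%d %H:%M:%S")


def has_ended(event, now):
    """True if start + duration lies before the given moment."""
    return start_of(event) + timedelta(seconds=event[3]) < now


# Hypothetical "current time" so the example is reproducible.
now = datetime(2011, 5, 13, 12, 0, 0)

# Strategy 1: sort every ended event by start, newest first, take the first row.
# The sort makes this O(N log N).
by_sort = sorted(
    (e for e in EVENTS if has_ended(e, now)), key=start_of, reverse=True
)[0]

# Strategy 2: filter and track the latest start in a single pass, O(N).
by_scan = max((e for e in EVENTS if has_ended(e, now)), key=start_of)

print(by_sort[0], by_scan[0])
```

In this sketch the filter and the maximum are computed in the same pass, so the filter adds per-row work rather than a second pass; whether a real engine does the same depends on the plan it chooses.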