SQLite:ANALYZE 的缺点
ANALYZE 命令有任何缺点吗(除了稍大的数据库)?如果没有,为什么默认不执行?
Does the ANALYZE command have any downsides (except a slighty larger db)? If not, why is not executed by default?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
还有另一个缺点。 ANALYZE 结果可能会导致查询计划程序忽略您真正想要使用的索引。
例如,假设您有一个带有布尔列“isSpecial”的表。大多数行的 isSpecial = 0,但也有一些行的 isSpecial = 1。
当您执行查询 SELECT * FROM MyTable WHERE isSpecial = 1 时,在没有 ANALYZE 数据的情况下,查询规划器将假设isSpecial 上的索引很好并且会使用它。在这种情况下,它恰好是正确的。如果您要执行 isSpecial = 0 那么它仍然会使用索引,这将是低效的,所以不要这样做。
运行 ANALYZE 后,查询计划器将知道 isSpecial 只有两个值,因此索引的选择性很差。所以它不会使用它,即使在上面的 isSpecial = 1 情况下也是如此。为了知道 isSpecial 值的分布非常不均匀,它需要仅在使用 SQLITE_ENABLE_STAT4 选项编译时收集的数据。该选项默认情况下未启用,并且它有一个很大的缺点:它使准备好的语句的查询计划取决于其绑定值,因此 sqlite 将更频繁地重新准备语句。 (可能每次执行时,我都不知道细节)
tl;dr:运行 ANALYZE 几乎不可能在布尔字段上使用索引,即使您知道它们会有所帮助。
There is another downside. The ANALYZE results may cause the query planner to ignore indexes that you really want to use.
For example suppose you have a table with a boolean column "isSpecial". Most of the rows have isSpecial = 0 but there are a few with isSpecial = 1.
When you do a query
SELECT * FROM MyTable WHERE isSpecial = 1
, in the absence of ANALYZE data the query planner will assume the index on isSpecial is good and will use it. In this case it will happen to be right. If you were to do isSpecial = 0 then it would still use the index, which would be inefficient, so don't do that.After you have run ANALYZE, the query planner will know that isSpecial has only two values, so the selectivity of the index is bad. So it won't use it, even in the isSpecial = 1 case above. For it to know that the isSpecial values are very unevenly distributed it would need data that it only gathers when compiled with the SQLITE_ENABLE_STAT4 option. That option is not enabled by default and it has a big downside of its own: it makes the query plan for a prepared statement depend on its bound values, so sqlite will re-prepare the statement much more often. (Possibly every time it's executed, I don't know the details)
tl;dr: running ANALYZE makes it almost impossible to use indexes on boolean fields, even when you know they would be helpful.
简短的回答:计算所花费的时间可能比节省的时间还要多。
与索引不同,添加或更新数据时,ANALYZE 统计数据不会自动保持最新状态。每当添加或更新大量数据时,您都应该重新运行 ANALYZE。
Short answer: it may take more time to calculate than time saved.
Unlike indices the ANALYZE-statistics are not kept up-to-date automatically when data is added or updated. You should rerun ANALYZE any time a significant amount of data has been added of updated.