Should LIKE 'searchstr%' use an index?
I have a database with several fields:
word_id — INTEGER PRIMARY_KEY
word — TEXT
...
There are about 150k rows. Since this is a dictionary, I'm searching for words with the mask 'search_string%' using LIKE. It used to work, taking 15 ms to find matching rows. The table has an index on the field 'word'.
I modified the table (some fields that are out of scope here) and now the query takes 400 ms to execute, so I assume it is no longer using the index. A straightforward query with = instead of LIKE returns in 10 ms. What's happening here?
An index cannot safely be used in this case. A naive implementation would transform this:
... WHERE word LIKE 'search_string%'
into
... WHERE word >= 'search_string' AND word < 'search_strinh'
by incrementing the last character of the search string. The greater-than and less-than operators can use an index, where LIKE cannot.
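To make the naive transformation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `words`, the sample rows, and the helper `prefix_upper_bound` are assumptions made for illustration; they are not taken from the question. It compares a prefix LIKE with the equivalent range query and shows where the two diverge.

```python
import sqlite3


def prefix_upper_bound(prefix: str) -> str:
    """Increment the last character of the prefix (hypothetical helper).

    Ignores edge cases such as an empty prefix or a last character
    that cannot be incremented.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


conn = sqlite3.connect(":memory:")
# Assumed schema, loosely modelled on the question's table.
conn.execute("CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT)")
conn.execute("CREATE INDEX idx_words_word ON words(word)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("apple",), ("application",), ("apq",), ("banana",), ("Apple",)],
)

prefix = "app"
upper = prefix_upper_bound(prefix)  # "apq"

# Prefix search with LIKE (case-insensitive by default).
like_rows = [r[0] for r in conn.execute(
    "SELECT word FROM words WHERE word LIKE ? ORDER BY word", (prefix + "%",))]

# The naive range rewrite (case-sensitive comparison on a BINARY column).
range_rows = [r[0] for r in conn.execute(
    "SELECT word FROM words WHERE word >= ? AND word < ? ORDER BY word",
    (prefix, upper))]

print(like_rows)   # includes 'Apple'
print(range_rows)  # misses 'Apple'
```

On the lowercase rows the two queries agree; the capitalized row is matched by LIKE only, which is the mismatch that makes the rewrite unsafe in general.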
Unfortunately, that won't work in the general case. The LIKE operator is case-insensitive by default, which means that 'a' LIKE 'A' is true. The above transformation would break any search string with capitalized letters.
In some cases, however, you know that case sensitivity is irrelevant for a particular column, and the above transformation is safe. In this case, you have two options:
1. Use the NOCASE collating sequence on the index that covers this particular field.
2. Change the behavior of the LIKE operator program-wide by running PRAGMA case_sensitive_like = ON;
Either of these options will enable SQLite to transparently do the above transformation for you; you just keep using LIKE as always, and SQLite will rewrite the underlying query to use the index.
You can read more about "The LIKE Optimization" on the SQLite Query Optimizer Overview page.
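One way to check which setup lets the planner use the index is EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module; the table and index names and the helpers `make_db` and `plan` are hypothetical, and it assumes a SQLite build in which the LIKE optimization and the case_sensitive_like pragma are available. A plan line starting with SEARCH means an index range lookup; SCAN means every row (or every index entry) is visited.

```python
import sqlite3

QUERY = "SELECT word FROM words WHERE word LIKE 'app%'"


def make_db(index_sql: str) -> sqlite3.Connection:
    """Create an in-memory database with the assumed schema and one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT)")
    conn.execute(index_sql)
    return conn


def plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Baseline: ordinary (BINARY) index with the default case-insensitive LIKE.
default_plan = plan(make_db("CREATE INDEX idx_word ON words(word)"), QUERY)

# Option 1: index the column with the NOCASE collating sequence.
nocase_plan = plan(
    make_db("CREATE INDEX idx_word_nocase ON words(word COLLATE NOCASE)"),
    QUERY,
)

# Option 2: keep the ordinary index but make LIKE case-sensitive.
conn = make_db("CREATE INDEX idx_word ON words(word)")
conn.execute("PRAGMA case_sensitive_like = ON")
pragma_plan = plan(conn, QUERY)

for name, lines in [("default", default_plan),
                    ("nocase index", nocase_plan),
                    ("case_sensitive_like", pragma_plan)]:
    print(name, "->", lines)
```

The pattern is written as a literal here rather than a bound parameter, to keep the example within the simplest case the optimizer handles.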
GLOB prefix* workaround
This method does use the index, and so it is a good workaround for LIKE prefix% that doesn't require modifying the table collation:
From the docs:
See also: Can an index on a text column speed up prefix based LIKE queries?
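As a rough illustration of the GLOB workaround, the sketch below (Python's sqlite3 module, with a hypothetical `words` table and sample rows) runs a prefix GLOB against an ordinary index and prints the query plan. Keep in mind that GLOB is case-sensitive and uses * and ? as wildcards, so it is not a drop-in replacement when case-insensitive matching is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema with a regular index, no special collation.
conn.execute("CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT)")
conn.execute("CREATE INDEX idx_words_word ON words(word)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("apple",), ("Apple",), ("application",), ("banana",)],
)

# Prefix search with GLOB; matching is case-sensitive.
glob_rows = [r[0] for r in conn.execute(
    "SELECT word FROM words WHERE word GLOB 'app*' ORDER BY word")]

# Ask the planner how it would run the same query without the ORDER BY.
glob_plan = [r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT word FROM words WHERE word GLOB 'app*'")]

print(glob_rows)
print(glob_plan)  # a line starting with SEARCH indicates an index range lookup
```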
Ask SQLite to suggest an index for us
We can also ask SQLite to automatically suggest the missing index that each query would need, using the options mentioned at SQLite identify missing index. First we set up a test table:
Then, e.g. with .expert, we can check which index would be needed to accelerate a LIKE prefix with:
Output:
so we see that it recommends a COLLATE NOCASE index in this case. And for GLOB:
gives:
so it just recommended a regular index in that case.
Tested on SQLite 3.40.1, Ubuntu 23.04, on a TEXT column of a large-ish database I have lying around with 750 M rows. Query results are almost instantaneous.