Should LIKE 'searchstr%' use an index?

Posted on 2024-12-22 08:57:01

I have a database table with several fields:

word_id — INTEGER PRIMARY_KEY
word — TEXT
...

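It has about 150k rows. Since this is a dictionary, I search for words using LIKE with the mask 'search_string%'. This used to work, taking 15 ms to find the matching rows. The table has an index on the 'word' field.

I modified the table (changes to some fields that are outside the scope of this question), and the query now takes 400 ms, which I take to mean that it no longer uses the index. A straightforward query with = instead of LIKE takes 10 ms. What's happening here?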

快乐很简单 2024-12-29 08:57:01

An index cannot safely be used in this case. A naive implementation would transform

... WHERE word LIKE 'search_string%'

into

... WHERE word >= 'search_string' AND word < 'search_strinh'

by incrementing the last character of the search string. The greater-than and less-than operators can use an index, whereas LIKE cannot.
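The rewrite described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the words table, its sample rows, and the helper prefix_upper_bound are made up for the example, and the prefix 'app' is used instead of 'search_string' because an underscore is itself a single-character wildcard in LIKE. The helper assumes a non-empty prefix whose last character can be incremented.

```python
import sqlite3


def prefix_upper_bound(prefix: str) -> str:
    """Return the prefix with its last character bumped by one code point.

    Hypothetical helper: assumes a non-empty prefix whose last character
    is not already the largest possible code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("ap",), ("apple",), ("application",), ("apply",), ("apq",), ("banana",)],
)

prefix = "app"
upper = prefix_upper_bound(prefix)  # 'app' becomes 'apq'

# The prefix search written with LIKE.
like_rows = conn.execute(
    "SELECT word FROM words WHERE word LIKE 'app%' ORDER BY word"
).fetchall()

# The same search written as a half-open range [prefix, upper).
range_rows = conn.execute(
    "SELECT word FROM words WHERE word >= ? AND word < ? ORDER BY word",
    (prefix, upper),
).fetchall()

print(upper)
print(like_rows)
print(range_rows)
```

On this all-lowercase sample both queries select the same rows, which is the situation in which the range form is a safe substitute.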

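Unfortunately, this doesn't work in the general case. The LIKE operator is case-insensitive by default, which means that 'a' LIKE 'A' is true. The range comparison above is case-sensitive under the default BINARY collation, so the transformation would silently drop rows that differ from the search string only in letter case.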
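A small sketch of that mismatch, again using Python's sqlite3 module with a made-up words table and sample rows. It assumes the connection's default settings (case_sensitive_like off, BINARY collation on the column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("apple",), ("Apple",), ("APPLY",), ("banana",)],
)

# LIKE ignores ASCII letter case by default, so this evaluates to 1 (true).
a_like_A = conn.execute("SELECT 'a' LIKE 'A'").fetchone()[0]

# LIKE matches every spelling of the prefix.
like_words = {
    row[0] for row in conn.execute("SELECT word FROM words WHERE word LIKE 'app%'")
}

# The naive range rewrite compares bytes, so only the lowercase row survives.
range_words = {
    row[0]
    for row in conn.execute(
        "SELECT word FROM words WHERE word >= 'app' AND word < 'apq'"
    )
}

print(a_like_A)
print(sorted(like_words))
print(sorted(range_words))
```

The two result sets differ, which is why the rewrite cannot be applied blindly.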

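In some cases, however, you know that case sensitivity is irrelevant for a particular column, and the above transformation is safe. In that case, you have two options: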

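Use the NOCASE collating sequence on the index that covers this particular field.
Change the behavior of the LIKE operator for the whole database connection by running PRAGMA case_sensitive_like = ON;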

Either of these lets SQLite apply the above transformation for you transparently; you keep using LIKE as usual, and SQLite rewrites the underlying query to use the index.
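One way to check which variant lets the planner use the index is to compare EXPLAIN QUERY PLAN output. The sketch below uses Python's sqlite3 module; the words table, the index name idx_word, and the helper like_plan are assumptions made for the example. The exact plan text varies between SQLite versions, so only its first word (SCAN or SEARCH) is worth relying on.

```python
import sqlite3

# A literal pattern is used so the planner can see the prefix at prepare time.
QUERY = "EXPLAIN QUERY PLAN SELECT word FROM words WHERE word LIKE 'app%'"


def like_plan(index_sql: str, case_sensitive_like: bool = False) -> str:
    """Create a fresh table with one index and return the plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT)")
        conn.execute(index_sql)
        if case_sensitive_like:
            conn.execute("PRAGMA case_sensitive_like = ON")
        # The fourth column of each plan row holds the human-readable detail.
        return " | ".join(row[3] for row in conn.execute(QUERY))
    finally:
        conn.close()


# Default BINARY index, default case-insensitive LIKE: no range search.
plain_plan = like_plan("CREATE INDEX idx_word ON words(word)")

# Option 1: index the column with the NOCASE collating sequence.
nocase_plan = like_plan("CREATE INDEX idx_word ON words(word COLLATE NOCASE)")

# Option 2: keep the BINARY index but make LIKE case-sensitive.
pragma_plan = like_plan("CREATE INDEX idx_word ON words(word)", case_sensitive_like=True)

print("plain :", plain_plan)
print("nocase:", nocase_plan)
print("pragma:", pragma_plan)
```

A plan that starts with SEARCH and shows a range such as (word>? AND word<?) means the rewrite was applied; a plan that starts with SCAN means every row is visited.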

You can read more about "the LIKE optimization" on the SQLite Query Optimizer Overview page.


空宴 2024-12-29 08:57:01

GLOB prefix* workaround

This method does use the index, so it is a good workaround for LIKE prefix% that doesn't require modifying the table collation:

SELECT * FROM mytable WHERE mycolumn GLOB 'myprefix*'

From the docs:

The GLOB operator is similar to LIKE but uses the Unix file globbing syntax for its wildcards. Also, GLOB is case sensitive, unlike LIKE.
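As a rough check of both points (index use and case sensitivity), here is a sketch with Python's sqlite3 module. The sample rows and the index name mytable_idx are invented for the example; the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycolumn TEXT)")
# A regular index with the default BINARY collation.
conn.execute("CREATE INDEX mytable_idx ON mytable(mycolumn)")
conn.executemany(
    "INSERT INTO mytable (mycolumn) VALUES (?)",
    [("myprefix1",), ("MYPREFIX2",), ("other",)],
)

# Plan for the GLOB prefix query; the detail text is in the fourth column.
glob_plan = " | ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM mytable WHERE mycolumn GLOB 'myprefix*'"
    )
)

# GLOB is case-sensitive, so only the lowercase row matches.
glob_rows = conn.execute(
    "SELECT mycolumn FROM mytable WHERE mycolumn GLOB 'myprefix*'"
).fetchall()

# LIKE with default settings also matches the uppercase row.
like_rows = conn.execute(
    "SELECT mycolumn FROM mytable WHERE mycolumn LIKE 'myprefix%' ORDER BY mycolumn"
).fetchall()

print(glob_plan)
print(glob_rows)
print(like_rows)
```

The trade-off is visible in the results: GLOB gets the index search with a plain index, but it returns fewer rows than LIKE whenever the stored data mixes letter case.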

See also: Is it possible to build an index on a text column to speed up prefix-based LIKE queries?

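Ask SQLite to suggest an index for us

We can also ask SQLite to automatically suggest the missing index that each query would need, using the options mentioned at SQLite identify missing index. First we set up a test table: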

sqlite3 tmp.sqlite 'CREATE TABLE mytable (mycolumn TEXT)'

Then, e.g. with .expert, we can check which index would be needed to accelerate a LIKE prefix query:

sqlite3 tmp.sqlite -cmd .expert \
  -cmd "SELECT * FROM mytable WHERE mycolumn LIKE 'myprefix%'" ''

Output:

CREATE INDEX mytable_idx_79a75761 ON mytable(mycolumn COLLATE NOCASE);

SEARCH mytable USING COVERING INDEX mytable_idx_79a75761 (mycolumn>? AND mycolumn<?)

So we see that it recommends a COLLATE NOCASE index in this case. And for GLOB:

sqlite3 tmp.sqlite -cmd .expert \
  -cmd "SELECT * FROM mytable WHERE mycolumn GLOB 'myprefix*'" ''

gives:

CREATE INDEX mytable_idx_234df46c ON mytable(mycolumn);

SEARCH mytable USING COVERING INDEX mytable_idx_234df46c (mycolumn>? AND mycolumn<?)

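So it just recommends a regular index in that case.

Tested on SQLite 3.40.1, Ubuntu 23.04, on a TEXT column of a large-ish database I have lying around with 750 M rows. Query results are almost instantaneous.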

Related:

Case sensitive and insensitive LIKE in SQLite
