SQLite 中的按位掩码与 IN() 效率?

发布于 2024-10-20 10:29:39 字数 456 浏览 4 评论 0原文

我有两种方法从数据库中选择一组条目:

  SELECT ... WHERE `level` IN (1,2,4,8) LIMIT ...;  

或者

  SELECT ... WHERE `level` & mask LIMIT ...;

总共有 4 个“级别”,编号为 1、2、4、8(因为能够在其他地方也使用相同的掩码)。 IN() 的大括号或 mask 都可以包含 4 个级别中的一个或多个级别中的任意一组。该列已建立索引。查询所花费的时间仍然比舒适的要长,我们正在尝试优化它以提高速度。

昨天,有人说决定使用朴素的 IN() 会导致最多四次比较,而我应该使用位掩码来代替。今天我听说位掩码将完全阻碍列上索引的优势,并且速度会慢得多。

你能告诉我哪种方法会更快吗?

I have two ways to select a set of entries from the database:

  SELECT ... WHERE `level` IN (1,2,4,8) LIMIT ...;  

or

  SELECT ... WHERE `level` & mask LIMIT ...;

There are 4 'levels' total, numbered 1,2,4,8 (for reasons of ability to use the same mask elsewhere too). Both the braces of IN() or the mask can contain any set of one or more of the 4 levels. The column is indexed. The query is still taking way longer than comfortable, and we're trying to optimize it for speed.

Yesterday one person said decided using naive IN() results in up to four comparisons and that I should be using a bit mask instead. Today I heard the bit mask will completely thwart advantages from index on the column, and will be much slower.

Can you tell me which approach will be faster?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

夜光 2024-10-27 10:29:39

你的问题很老了,但我仍然会回答它。

位掩码很可能会更慢,因为它必须计算按位 AND 的计算,而 IN 将使用 level索引值来计算在括号内的参数中查找它(我相信,这应该是一个 O(log(n)) 操作)。

现在,您可能会错过的一件事是,他们不做同样的事情。

您的第一个查询将简单地检查 level 是否为 1、2、4 或 8。

您的第二个查询,或者实际上类似于:

SELECT ... WHERE (`level` & mask) = mask LIMIT ...;

能够查找包含以下内容的 levels您想要的掩码以及可能更多的掩码,在您的情况下,它可能会检查 1 到 15 之间的所有值组合。因此,性能会受到影响。


至于 @AlanFoster 建议的暴力基准测试,我不同意他的观点。

最好在查询中添加以下前缀:

  • EXPLAIN
  • EXPLAIN QUERY PLAN

并检查 SQLite 匹配的行数。


更新

EXPLAIN QUERY PLAN SELECT * FROM ... WHERE level IN (2, 3);

SEARCH TABLE ... USING INDEX ..._level (level=?) (~20 rows)

EXPLAIN QUERY PLAN SELECT * FROM ... WHERE (level & 2) = 2; code>

SCAN TABLE ... (~500000 rows)

如您所见,按位AND运算符需要进行全表扫描。

Your question is quite old but I'm still gonna answer it nonetheless.

The bitmask will most probably be slower as it has to work out the computation of the bitwise AND whereas IN will use the indexed value of level to look it up in the arguments enclosed within the parentheses (which, I believe, should be a single O(log(n)) operation).

Now, the thing that you may be missing, is that they don't do the same thing.

Your first query will simply check if level is either 1, 2, 4 or 8.

Your second query, or actually something like:

SELECT ... WHERE (`level` & mask) = mask LIMIT ...;

Has the ability to lookup levels that contain the mask you want and potentially some more, in your case it could be checking all value combinations between 1 and 15. Hence the performance hit.


As for the bruteforced benchmark @AlanFoster suggested, I don't agree with him.

It's far better to prefix the query with either:

  • EXPLAIN, or
  • EXPLAIN QUERY PLAN

And inspect how many rows SQLite is matching.


Update

EXPLAIN QUERY PLAN SELECT * FROM ... WHERE level IN (2, 3);

SEARCH TABLE ... USING INDEX ..._level (level=?) (~20 rows)

EXPLAIN QUERY PLAN SELECT * FROM ... WHERE (level & 2) = 2;

SCAN TABLE ... (~500000 rows)

As you can see, the bitwise AND operator needs a full-table scan.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文