How do bitmap indexes help?
Wikipedia gives this example:
Identifier  Gender       Bitmaps
                         F  M
1           Female       1  0
2           Male         0  1
3           Male         0  1
4           Unspecified  0  0
5           Female       1  0
But I do not understand this.
- First of all, how is this an index? Isn't an index supposed to point to rows (using rowids), given the key?
- What would be the typical queries where such indexes are useful? How are they better than B-tree indexes? I know that if we use a B-tree index on Gender here, we will get a lot of results if, for example, we look for Gender = Male, and those results need to be filtered further (so it is not very useful). How does a bitmap improve the situation?
Comments (3)
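A better way to picture a bitmap index, given the sample above, is as one bit string per distinct value, with one bit per row. A bitmap index on the gender column would (conceptually) hold Female = 10001, Male = 01100, and Unspecified = 00010, reading rows 1 to 5 from left to right.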
Bitmap indexes are used when the number of distinct values in a column is relatively low. Consider the opposite case, where all values are unique: the bitmap index would have as many bitmaps as there are rows, each as long as the table, making it something like one big identity matrix.
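To make the cardinality point concrete, here is a minimal Python sketch. The helper `build_bitmap_index` and the list-of-bits layout are assumptions made for illustration only; a real database stores compressed bitmaps, not Python lists.

```python
def build_bitmap_index(values):
    """Map each distinct value to a list of 0/1 bits, one bit per row."""
    index = {}
    for row, value in enumerate(values):
        bits = index.setdefault(value, [0] * len(values))
        bits[row] = 1
    return index

# Low cardinality: 5 rows, 3 distinct values, so only 3 bitmaps.
low = build_bitmap_index(["Female", "Male", "Male", "Unspecified", "Female"])

# All values unique: 5 rows give 5 bitmaps of 5 bits each,
# and each bitmap has exactly one bit set (an identity matrix).
unique = build_bitmap_index(["a", "b", "c", "d", "e"])

for value, bits in unique.items():
    print(value, bits)
```

With unique values the index grows with the square of the row count, which is why bitmap indexes are usually reserved for columns with few distinct values.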
So with this index in place, for a query that selects the rows with a single gender value, the database looks for a match among the gender values in the index, finds all the rowids where the bit is set to 1, and then goes and gets the table rows.
A query that matches either Male or Unspecified would get the 1 bits for Male and the 1 bits for Unspecified, do a bitwise OR, and then go get the rows where the resulting bits are 1.
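The two lookups described above can be sketched in Python, using the sample table from the question. This is only an illustration under stated assumptions: the names `ROWS`, `build_index`, and `rowids` are hypothetical, and each bitmap is held as a Python integer whose bit at position p stands for the p-th row.

```python
# (rowid, gender) pairs from the sample table in the question.
ROWS = [(1, "Female"), (2, "Male"), (3, "Male"), (4, "Unspecified"), (5, "Female")]

def build_index(rows):
    """Return {gender: bitmap}, where bit p is set if row p has that gender."""
    index = {}
    for position, (_rowid, gender) in enumerate(rows):
        index[gender] = index.get(gender, 0) | (1 << position)
    return index

def rowids(bitmap, rows):
    """Resolve a bitmap back to the rowids whose bit is 1."""
    return [rowid for position, (rowid, _gender) in enumerate(rows)
            if (bitmap >> position) & 1]

gender_index = build_index(ROWS)

# Single-value lookup: read one bitmap, then resolve rowids.
male_rows = rowids(gender_index["Male"], ROWS)

# Two-value lookup: OR the two bitmaps first, resolve rowids once.
combined = gender_index["Male"] | gender_index["Unspecified"]
male_or_unspecified = rowids(combined, ROWS)

print(male_rows, male_or_unspecified)
```

The point of the design is that the OR happens on the compact bitmaps, and rows are only touched after the final bitmap is known.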
So, the advantages of using a bitmap index over a B-tree index are storage (with low cardinality, bitmap indexes are quite compact) and the ability to do bitwise operations before resolving the actual rowids, which can be very quick.
Note that bitmap indexes can have performance implications for inserts and deletes (conceptually, you add or remove a column in the bitmap and rejig it accordingly). They can also create a lot of contention: an update to one row can lock the entire corresponding bitmap entry, so you can't update a different row with the same bitmap value until the first update is committed or rolled back.
The benefit comes when filtering on multiple columns: the corresponding indexes can be merged with bitwise operations before any data is actually selected.
If you have bitmap indexes on gender, eye_colour, and hair_colour, then a query for males who have blue eyes or blonde hair would first do a bitwise OR between the eye_colour['blue'] index and the hair_colour['blonde'] index, and finally a bitwise AND between that result and the gender['male'] index. This operation is very fast in terms of both computation and I/O.
The resulting bit stream would be used for picking the actual rows.
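A rough Python sketch of that merge is shown here. The table contents in `PEOPLE`, the `COLUMNS` mapping, and the helper `build_indexes` are all made up for illustration; only the column names and the OR-then-AND order come from the answer above.

```python
# Hypothetical rows: (rowid, gender, eye_colour, hair_colour).
PEOPLE = [
    (1, "male", "blue", "brown"),
    (2, "female", "blue", "blonde"),
    (3, "male", "brown", "blonde"),
    (4, "male", "green", "black"),
    (5, "female", "brown", "black"),
]
# Column name -> position of that column inside each row tuple.
COLUMNS = {"gender": 1, "eye_colour": 2, "hair_colour": 3}

def build_indexes(rows):
    """Build one bitmap index per column: {column: {value: bitmap}}."""
    indexes = {name: {} for name in COLUMNS}
    for position, row in enumerate(rows):
        for name, field in COLUMNS.items():
            value = row[field]
            indexes[name][value] = indexes[name].get(value, 0) | (1 << position)
    return indexes

idx = build_indexes(PEOPLE)

# (eye_colour = 'blue' OR hair_colour = 'blonde') AND gender = 'male'
combined = (idx["eye_colour"]["blue"] | idx["hair_colour"]["blonde"]) & idx["gender"]["male"]

# Only now are actual rows picked, using the resulting bits.
matching_rowids = [row[0] for position, row in enumerate(PEOPLE)
                   if (combined >> position) & 1]
print(matching_rowids)
```

No row data is read while the three bitmaps are combined; the table is consulted only for the positions that survive the final AND.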
Bitmap indexes are typically used in "star joins" in data warehouse applications.
As indicated in the Wikipedia article, they use bitwise operations, which can perform better than comparing data types such as integers, so the short answer is increased speed of queries.
Theoretically, it should take fewer computations and less time to select all males or all females from your example.
Just thinking about how this works under the hood should make it obvious why this is faster. A bit is logically either true or false. If you run a query with a WHERE clause, the clause will eventually evaluate to either true or false for each record in order to determine whether to include it in your results.
Preface - the rest of this is meant to be in layman's terms and non-technical
So the next question is what does it take to evaluate to true? Even comparing numeric values means that the computer has to...
Repeat if you're using a multi-part WHERE clause such as WHERE this = this AND that = that.
But using bitwise logic, you're just looking at 0 (false) and 1 (true) values. 90% of the overhead for the comparison work is eliminated.