索引空值以便在 DB2 上快速搜索
据我了解,空值在 DB2 中是不可索引的,因此假设我们有一个巨大的表(Sales),其中包含一个日期列(sold_on),该列通常是日期,但偶尔(10% 的时间)为空。
此外,我们假设它是一个我们无法更改的遗留应用程序,因此这些空值将保留在那里并有意义(假设返回的销售额)。
我们可以通过在 sell_on 和 Total 列上放置索引来加快以下查询的速度,
Select * from Sales
where
Sales.sold_on between date1 and date2
and Sales.total = 9.99
但是索引不会使该查询更快:
Select * from Sales
where
Sales.sold_on is null
and Sales.total = 9.99
因为索引是在值上完成的。
我可以为空值建立索引吗? 也许通过改变索引类型? 索引指标列?
It's my understanding that nulls are not indexable in DB2, so assuming we have a huge table (Sales) with a date column (sold_on) which is normally a date, but is occasionally (10% of the time) null.
Furthermore, let's assume that it's a legacy application that we can't change, so those nulls are staying there and mean something (let's say sales that were returned).
We can make the following query fast by putting an index on the sold_on and total columns
Select * from Sales
where
Sales.sold_on between date1 and date2
and Sales.total = 9.99
But an index won't make this query any faster:
Select * from Sales
where
Sales.sold_on is null
and Sales.total = 9.99
Because the indexing is done on the value.
Can I index nulls? Maybe by changing the index type? Indexing the indicator column?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
特罗尔斯是正确的; 即使 SOLD_ON 值为 NULL 的行也将受益于该列上的索引。 如果您要对 SOLD_ON 进行范围搜索,则通过创建以 SOLD_ON 开头的聚集索引可能会受益更多。 在此特定示例中,可能不需要太多额外开销来维护基于 SOLD_ON 的聚类顺序,因为添加的较新行很可能具有较新的 SOLD_ON 日期。
Troels is correct; even rows with a SOLD_ON value of NULL will benefit from an index on that column. If you're doing ranged searches on SOLD_ON, you may benefit even more by creating a clustered index that begins with SOLD_ON. In this particular example, it may not require much additional overhead to maintain the clustering order based on SOLD_ON, since newer rows added will most likely have a newer SOLD_ON date.
您从哪里得到 DB2 不索引 NULL 的印象? 我在文档或文章中找不到任何支持该主张的内容。 我刚刚在一个大表中使用 IS NULL 限制执行了一个查询,该限制涉及包含一小部分 NULL 的索引列; 在这种情况下,DB2 当然使用了索引(通过 EXPLAIN 进行验证,并通过观察数据库立即响应而不是花费时间来执行表扫描)。
所以:我声称 DB2 对于非主键索引中的 NULL 没有问题。
但正如其他人所写的:您的数据可能以 DB2 认为使用索引不会更快的方式组成。 或者所涉及的表的数据库统计信息不是最新的。
From where did you get the impression that DB2 doesn't index NULLs? I can't find anything in documentation or articles supporting the claim. And I just performed a query in a large table using a IS NULL restriction involving an indexed column containing a small fraction of NULLs; in this case, DB2 certainly used the index (verified by an EXPLAIN, and by observing that the database responded instantly instead of spending time to perform a table scan).
So: I claim that DB2 has no problem with NULLs in non-primary key indexes.
But as others have written: Your data may be composed in a way where DB2 thinks that using an index will not be quicker. Or the database's statistics aren't up-to-date for the involved table(s).
我不是 DB2 专家,但如果您的值中有 10% 为空,我认为仅该列上的索引不会对您的查询有帮助。 10% 太多了,不用费心使用索引——它只会进行表扫描。 如果你说的是 2-3%,我认为它实际上会使用你的指数。
想想一个页面/块上有多少条记录 - 比如说 20。使用索引的原因是为了避免获取不需要的页面。 给定页面包含 0 条空记录的几率是 (90%)^20,即 12%。 这些可能性并不大——无论如何,您都需要获取 88% 的页面,使用索引并不是很有帮助。
但是,如果您的 select 子句仅包含几列(而不是 *)——仅说 salesid,您可能可以让它使用 (sold_on,salesid) 上的索引,因为数据页的读取不会需要——所有数据都在索引中。
I'm no DB2 expert, but if 10% of your values are null, I don't think an index on that column alone will ever help your query. 10% is too many to bother using an index for -- it'll just do a table scan. If you were talking about 2-3%, I think it would actually use your index.
Think about how many records are on a page/block -- say 20. The reason to use an index is to avoid fetching pages you don't need. The odds that a given page will contain 0 records that are null is (90%)^20, or 12%. Those aren't good odds -- you're going to need 88% of your pages to be fetched anyway, using the index isn't very helpful.
If, however, your select clause only included a few columns (and not *) -- say just salesid, you could probably get it to use an index on (sold_on,salesid), as the read of the data page wouldn't be needed -- all the data would be in the index.
根据经验,索引对于最多 15% 的记录值很有用。 ...所以索引在这里可能有用。
如果 DB2 不会索引空值,那么我建议添加一个布尔字段 IsSold,并在设置 sell_on 日期时将其设置为 true(这可以在触发器中完成)。
这不是最好的解决方案,但可能正是您所需要的。
The rule of thumb is that an index is useful for values up on to 15% of the records. ... so an index might be useful here.
If DB2 won't index nulls, then I would suggest adding a boolean field, IsSold, and set it to true whenever the sold_on date gets set (this could be done in a trigger).
That's not the nicest solution, but it might be what you need.