Informix DB 上索引(索引?)的使用
我的 Informix 数据库中有一个包含超过 500 万条记录的表。
这不是实际的表格,但会显示我遇到的问题。
表:销售额
列:sale_id、sale_confirmed、vendor_id、purchase_id
索引: idx1(sale_id)、idx2(sale_confirmed)、idx3(vendor_id)、idx4(purchaser_id)
如果我执行这样的查询:
select *
from sales
where sale_confirmed IS NULL
or sale_confirmed = ''
那么查询将在大约 4 或 5 秒内完成。
如果我执行这样的查询:
select *
from sales
where vendor_id = 12345
or purchaser_id = 12345
那么查询将在大约 4 或 5 秒内完成。
但是,如果我运行此查询(前 2 个查询的组合):
select *
from sales
where (sale_confirmed IS NULL
or sale_confirmed = '' )
and (vendor_id = 12345
or purchaser_id = 12345)
则该查询运行了 15 分钟,然后我取消了它。
数据库似乎不够智能,无法串联使用不同的索引,即它似乎无法使用 idx2 查找 X 行并在该 X 行中使用 idx3 和 idx4 - 是这样吗,我本以为这样做会足够聪明吗?
有没有办法在处理 WHERE 子句的第二部分时强制数据库使用 idx3 和 idx4?
除了创建新索引之外还有其他解决方案吗?
谢谢。
I have a table with 5 million+ records on an Informix database.
This isn't the actual table but will show the problem I'm having.
Table: sales
Columns: sale_id, sale_confirmed, vendor_id, purchaser_id
Indexes: idx1(sale_id), idx2(sale_confirmed), idx3(vendor_id), idx4(purchaser_id)
If I do a query like this:
select *
from sales
where sale_confirmed IS NULL
or sale_confirmed = ''
then the query runs to completion in about 4 or 5 seconds.
If I do a query like this:
select *
from sales
where vendor_id = 12345
or purchaser_id = 12345
then the query runs to completion in about 4 or 5 seconds.
However, if I run this query (a combination of the 2 previous queries):
select *
from sales
where (sale_confirmed IS NULL
or sale_confirmed = '' )
and (vendor_id = 12345
or purchaser_id = 12345)
then the query ran for 15 mins before I cancelled it.
The database doesn't seem to be smart enough to use the different indexes in tandem i.e. it doesn't seem to be able to use idx2 to find X number of rows AND use idx3 and idx4 within that X number of rows - is that right, I would have thought it would be smart enough to do this?
Is there a way of forcing the database to use idx3 and idx4 when processing the second part of the WHERE clause?
Any other solutions short of creating new indexes?
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
尝试使用 UNION,其中每个部分只需选择两个索引:
如果 Informix 支持内联视图,则根据供应商/购买者获取行集,然后从该集中排除未经确认的销售。
最后,我认为您可能希望删除 sale_confirmed 上的低基数索引。
PS 我的数据库中通常不会有允许空字符串、NULL 和其他值的列。如果你有的话,我会将该字段限制为 BIT 类型,即 1 和 0,默认值为 0。
Try it with a UNION where only two indices have to be chosen from in each part:
And if Informix supports inline views, get the set of rows based on vendor/purchaser and then from that set exclude the unconfirmed sales.
Finally, I think you might want to drop the low-cardinality index on sale_confirmed.
P.S. I would normally not have a column in my database that allowed the empty string, NULL, and other values. I'd constrain the field probably to a BIT type if you have it, with 1 and 0, with 0 default.
“sale_confirmed”上的索引不太可能有用,因为“sale_confirmed”的基数似乎很低(NULL,是,否?)。更好的模式设计会在“sale_confirmed”上强制执行 NOT NULL,并且 CHECK 约束会强制执行“Y”或“N”,默认情况下可能会给您“N”,除非您另外指定。这将避免必须对“sale_confirmed”进行 OR 运算,这很混乱。
Tim 建议的 UNION 技术可能是一个不错的解决方法。
The index on 'sale_confirmed' is unlikely to be useful because the cardinality of 'sale_confirmed' seems to be low (NULL, yes, no?). A better schema design would enforce NOT NULL on 'sale_confirmed' and a CHECK constraint would enforce 'Y' or 'N' and a default could give you 'N' unless you specified otherwise. That would avoid having to do OR operations on 'sale_confirmed', which are messy.
The UNION technique suggested by Tim is likely to be a decent workaround.