Oracle Anti-Join 执行计划问题
我们有两个表,如下所示:
Event
id
type
... a bunch of other columns
ProcessedEvent
event_id
process
定义了索引。
- 为Event(id) (PK)
- ProcessedEvent (event_id, process)
第一个表示应用程序中的事件。
第二个表示某个事件由某个进程得到处理的事实。有许多进程需要处理某个事件,因此第一个表中的每个条目在第二个表中都有多个条目。
为了找到所有需要处理的事件,我们执行以下查询:
select * // of course we do name the columns in the production code
from Event
where type in ( 'typeA', 'typeB', 'typeC')
and id not in (
select event_id
from ProcessedEvent
where process = :1
)
统计信息是最新的
由于大多数事件都已处理,我认为最好的执行计划应该类似于
- ProcessedEvent Index 上的完整索引
- 扫描 事件索引
- 访问两个表之间的
- 使用其余
- 过滤器
执行以下
- 反连接 相反,Oracle对 ProcessedEvent
- 完整索引扫描 事件表过滤器上的索引全表扫描
- 事件表
- 两个集合之间的
反连接使用索引提示,我让 Oracle执行以下操作:
- 对 ProcessedEvent 进行完整索引扫描 对事件索引
- 进行完整索引扫描 事件
- 表上的索引表访问
- 过滤事件表
- 两个集合之间的反连接
,恕我直言,这确实很愚蠢。
所以我的问题是:oracle坚持早期表访问的原因可能是什么?
添加: 表现很差。我们通过仅选择 Event.ID,然后“手动”获取所需的行来解决性能问题。但当然这只是一种解决方法。
We have two tables like so:
Event
id
type
... a bunch of other columns
ProcessedEvent
event_id
process
There are indexes defined for
- Event(id) (PK)
- ProcessedEvent (event_id, process)
The first represents events in an application.
The second represents the fact that a certain event got processes by a certain process. There are many processes that need to process a certain event, so there are multiple entries in the second table for each entry in the first.
In order to find all the events that need processing we execute the following query:
select * // of course we do name the columns in the production code
from Event
where type in ( 'typeA', 'typeB', 'typeC')
and id not in (
select event_id
from ProcessedEvent
where process = :1
)
Statistics are up to date
Since most events are processed, I think the best execution plan should look something like this
- full index scan on the ProcessedEvent Index
- full index scan on the Event Index
- anti join between the two
- table access with the rest
- filter
Instead Oracle does the following
- full index scan on the ProcessedEvent Index
- full table scan on the Event table
- filter the Event table
- anti join between the two sets
With an index hint I get Oracle to do the following:
- full index scan on the ProcessedEvent Index
- full index scan on the Event Index
- table acces on the Event table
- filter the Event table
- anti join between the two sets
which is really stupid IMHO.
So my question is: what might be the reason for oracle to insist on the early table access?
Addition:
The performance is bad. We are fixing the performance problem by selecting just the Event.IDs and then fetching the needed rows 'manually'. But of course that is just a work around.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您的完整索引扫描可能会比完整表扫描更快,因为索引可能比表“更薄”。不过,全索引扫描是完整的段读取,其成本与全表扫描大致相同。
但是,您还添加了 TABLE ACCESS BY ROWID 步骤。这是一个昂贵的步骤:ROWID 访问每行一个逻辑 IO,而每个多块获得一个逻辑 IO(取决于您的
db_file_multiblock_read_count 参数) 用于全表扫描。
总之,优化器计算出:
更新:FULL TABLE SCAN 还比 FULL INDEX SCAN 路径更快地启用类型过滤器(因为 INDEX 不知道事件是什么类型),因此减少了将被反连接的集合的大小(全表扫描的另一个优点)。
your FULL INDEX SCAN will probably be faster than a FULL TABLE SCAN since the index is likely "thinner" than the table. Still, the FULL INDEX SCAN is a complete segment reading and it will be about the same cost as the FULL TABLE SCAN.
However, you're also adding a TABLE ACCESS BY ROWID step. It is an expensive step: one logical IO per row for the ROWID access whereas you get one logical IO per multi blocks (depending upon your
db_file_multiblock_read_count parameter
) for the FULL TABLE SCAN.In conclusion, the optimizer computes that:
Update: The FULL TABLE SCAN also enables the filter on type sooner than in the FULL INDEX SCAN path (since the INDEX doesn't know what type an event is), therefore reducing the size of the set that will be anti-joined (yet another advantage of the FULL TABLE SCAN).
优化器做了很多一开始没有意义的事情,但它有它的理由。它们可能并不总是正确,但它们是可以理解的。
由于事件表的大小,它可能比通过 rowid 访问更容易进行全面扫描。顺序读取整个表所涉及的 IO 操作可能比读取位和片段少得多。
性能是否很差,或者您只是想问优化器为什么这样做?
The optimizer does many things which do not make sense at first, but it has it's reasons. They may not always be right, but they're understandable.
The Event table may be easier to full-scan rather than by rowid access because of its size. It could be that there are significantly fewer IO operations involved to read the entire table sequentially than to read bits and pieces.
Is the performance bad, or are you just asking why the optimizer did that?
我无法解释优化器的行为,但我的经验是不惜一切代价避免“NOT IN”,而是用 MINUS 替换它,如下所示:
我已经看到类似转换的查询性能提高了几个数量级。
I can't explain the behavior of the optimizer, but my experience has been to avoid "NOT IN" at all costs, replacing it instead with MINUS, like so:
I've seen orders of magnitude in query performance with similar transformations.
像这样的东西:
可以足够快地工作(至少比
NOT IN
快得多)。Something like:
could work fast enough (at least much faster than
NOT IN
).