Will inserting 500,000 entries with the same date value be slowed down by the non-unique index on that date column?
I have a cursor that selects all rows in a table, a little over 500,000 rows. Read a row from the cursor, INSERT into the other table, which has two indexes, neither unique, one numeric, one DATE type. COMMIT. Read the next row from the cursor, INSERT... until the cursor is empty.
All my DATE column's values are the same, from a timestamp initialized at the start of the script.
This thing's been running for 24 hours, only posted 464K rows, a little less than 20K rows / hr.
Oracle 11g, 10 processors(!?)
Something has to be wrong. I think it's that DATE index trying to process all these entries with exactly the same value for that column.
Comments (6)
Why don't you just do a single insert into ... select from statement?

This slow-by-slow processing is doing far more damage to performance than an index that may not make any sense.
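Something along these lines; a minimal sketch, assuming hypothetical names source_table and target_table with a numeric column num_col and a date column date_col:

    -- One set-based statement instead of 500K single-row INSERT/COMMIT pairs
    INSERT INTO target_table (num_col, date_col)
    SELECT num_col, date_col
    FROM   source_table;
    COMMIT;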
Indexes slow down inserts but speed up queries. This is normal.
If it is a problem you can remove the index, insert the rows, then add the index again. This can be faster if you are doing many inserts at once.
The way you are copying the data using cursors seems to be inefficient. You could try a set-based approach instead.
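A sketch combining both ideas, with hypothetical table and index names; marking the indexes UNUSABLE avoids physically dropping and recreating them:

    -- Let DML ignore the unusable (non-unique) indexes during the load
    ALTER SESSION SET skip_unusable_indexes = TRUE;
    ALTER INDEX target_num_ix  UNUSABLE;
    ALTER INDEX target_date_ix UNUSABLE;

    -- One set-based copy instead of a row-by-row loop
    INSERT INTO target_table (num_col, date_col)
    SELECT num_col, date_col
    FROM   source_table;
    COMMIT;

    -- Rebuild the indexes once, after the load
    ALTER INDEX target_num_ix  REBUILD;
    ALTER INDEX target_date_ix REBUILD;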
Committing after every inserted row doesn't make much sense. If you're worried about exceeding undo capacity, for example, you can keep a count of the inserts and issue a commit after every thousand rows.
Updating the indexes will have some impact, but that's unavoidable if you can't drop (or disable) them while the inserts are performed. I'd expect the commits to have a bigger impact, though I suspect that's a topic with varied opinions.
This assumes you have a good reason for inserting from a cursor rather than as a direct
insert into ... select from
model.
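If the cursor loop has to stay, the batched commit might look like this sketch (same hypothetical table and column names as above):

    DECLARE
      l_count PLS_INTEGER := 0;
    BEGIN
      FOR r IN (SELECT num_col, date_col FROM source_table) LOOP
        INSERT INTO target_table (num_col, date_col)
        VALUES (r.num_col, r.date_col);
        l_count := l_count + 1;
        -- Commit once per thousand rows instead of once per row
        IF MOD(l_count, 1000) = 0 THEN
          COMMIT;
        END IF;
      END LOOP;
      COMMIT;  -- pick up the final partial batch
    END;
    /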
In general, it's often a good idea to delete the indexes before doing a massive insert and then add them back afterwards, so that the DB doesn't have to try to update the indexes with each insert. It's been a long while since I've used Oracle, but have you tried putting more than one insert statement in a transaction? That should also speed it up.
For operations like this you should look at Oracle bulk operations, using FORALL and BULK COLLECT. They will considerably reduce the number of round trips (context switches between the PL/SQL and SQL engines) needed to move the data into the underlying tables.
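A sketch of that pattern, again with hypothetical names, fetching and inserting in batches of 1,000 (this assumes target_table has exactly the two columns selected):

    DECLARE
      CURSOR c_src IS
        SELECT num_col, date_col FROM source_table;
      TYPE t_rows IS TABLE OF c_src%ROWTYPE;
      l_rows t_rows;
    BEGIN
      OPEN c_src;
      LOOP
        -- Fetch 1,000 rows at a time into a PL/SQL collection
        FETCH c_src BULK COLLECT INTO l_rows LIMIT 1000;
        EXIT WHEN l_rows.COUNT = 0;
        -- A single context switch inserts the whole batch
        FORALL i IN 1 .. l_rows.COUNT
          INSERT INTO target_table VALUES l_rows(i);
        COMMIT;
      END LOOP;
      CLOSE c_src;
    END;
    /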
Agreed with the comment that what is killing your time is the 'slow by slow' processing. Copying 500,000 rows should be a matter of minutes.

The single INSERT ... SELECT FROM ... approach would be the best one, provided you have big enough rollback segments. The database may even automatically apply parallel techniques to a plain SQL statement that it would not apply to PL/SQL.
In addition you could look at using the /*+ APPEND */ hint - read up on it and see if it may apply to the situation with your target table.
To use all 10 cores you will need to either use plain parallel SQL, or run 10 copies of your PL/SQL block, splitting the source table across the 10 copies.
In Oracle 10 this is a manual task (roll your own parallelism) but Oracle 11.2 introduces DBMS_PARALLEL_EXECUTE.
Failing that, bulking up your fetch/insert using BULK COLLECT and a bulk insert would be the next best option - process in chunks of 1,000 or so rows (or larger). Again, take a look at whether DBMS_PARALLEL_EXECUTE may help you, or whether you could submit the job in chunks via DBMS_JOB.
(Caveat : I don't have access to anything later than Oracle 10)
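For reference, a sketch of the direct-path, parallel variant described above (hypothetical names; verify that the APPEND and PARALLEL hints actually suit your target table and its indexes):

    -- Allow this session to run the INSERT itself in parallel
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+ APPEND PARALLEL(t, 10) */ INTO target_table t (num_col, date_col)
    SELECT /*+ PARALLEL(s, 10) */ num_col, date_col
    FROM   source_table s;

    -- A direct-path insert must be committed before the table can be read again
    COMMIT;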