Oracle 外部表。优化选择查询

发布于 2024-08-08 13:35:41 字数 1035 浏览 3 评论 0原文

我必须从 Oracle 外部表执行许多选择。

我有 10 个看起来很像这样的游标(ext_temp 是外部表)

CURSOR F_CURSOR (day IN varchar,code Number,orig Number)
    IS
    select NVL(sum(table_4.f),0) 
     from ext_temp table_4
    where
      --couple of conditions here, irrelevant for the question at hand.
      AND TO_CHAR(table_4.day,'YYYYMMDD') = day
      AND table_4.CODE = code
      AND table_4.ORIG = orig;

并且外部表有大约 22659 个寄存器。

我的脚本主循环看起来像这样

   for each register in some_query: --22659 registers
       open F_cursor(register.day,register.code,register.orig);
       --open 9 more cursors

       fetch F_cursor into some_var;  
       --fetch 9 more cursors, with the same structure

查询占用太多。我从这里知道我不能有任何索引或DML。

那么有没有办法让它跑得更快呢?我可以重写我的 plsql 脚本,但我认为我没有时间了。

更新:遗漏了一个重要细节。

我不是数据库的所有者或 DBA。那家伙不希望他的数据库中有任何额外的信息(大约 3GB 的数据),而外部表是我们能从他那里得到的全部信息。他不允许我们创建临时表。我不会假装质疑他的理由,但外部表并不是解决这个问题的方法。所以,我们被他们困住了。

I have to perform many selects from an Oracle external table.

I have 10 cursors that look a lot like this (ext_temp is the external table)

CURSOR F_CURSOR (day IN varchar,code Number,orig Number)
    IS
    select NVL(sum(table_4.f),0) 
     from ext_temp table_4
    where
      --couple of conditions here, irrelevant for the question at hand.
      AND TO_CHAR(table_4.day,'YYYYMMDD') = day
      AND table_4.CODE = code
      AND table_4.ORIG = orig;

And the external table has about 22659 registers.

My script main loop looks like this

   for each register in some_query: --22659 registers
       open F_cursor(register.day,register.code,register.orig);
       --open 9 more cursors

       fetch F_cursor into some_var;  
       --fetch 9 more cursors, with the same structure

Queries are taking way to much. And I know from here that i can't have any indexes or DML.

So, is there a way of getting it to run faster? I can rewrite my plsql script, but i don't think I have time left.

Update: missed an important detail.

I'm not the owner or DBA of the database. That guy doesn't want any extra info (its about 3gb of data) in his database, and external tables is all we could get out of him. He doesnt allow us to create temporary tables. I don't pretend to question his reasons, but external tables is not the solution for this. So, we are stuck with them.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

梦里南柯 2024-08-15 13:35:41

将它们设为 Oracle 表。

外部表是用来替换SQL*LOADER的,而不是每天使用它们。

每当您的底层文件发生更改时,只需运行导入脚本即可将外部表的内容加载到 Oracle 表中。

这是您的同名者对此的看法(窃取自 此处):

您正在使用外部表而不是sqlldr

使用外部表,您可以

  • 在一个语句中将平面文件与现有表合并。
  • 将平面文件排序到您想要很好压缩的表格中。
  • 进行并行直接路径加载——无需分割输入文件,写入
    无数的脚本等等
  • 从存储过程或触发器中有效运行 sqlldr(插入不是 sqlldr
  • 进行多表插入
  • 通过管道 plsql 函数传输数据以进行清理/转换

等等。它们而不是sqlldr——将数据获取到数据库,而无需首先使用sqlldr

您通常不会在操作系统中日常查询它们,而是使用它们来加载数据。

更新:

使用 3GB 表永远无法获得良好的性能,因为 Oracle 必须使用 3GB > 对每个查询进行全扫描,这将是一流的磁盘读取主轴移动全扫描,而不是您可以在计划中看到但在实际执行时间中几乎无法注意到的廉价缓存模仿。

尝试说服那个人为您创建一个临时表,您可以使用它来处理数据,并在会话开始时从外部表加载数据。

这不是最好的解决方案,因为它需要在临时表空间中为每个会话保留单独的表副本,但它在性能方面要好得多。

Make them Oracle tables.

External tables are there to replace SQL*LOADER, not to work with them on a daily basis.

Just run an importing script whenever you underlying file changes which would load the contents of an external table into an Oracle table.

Here's what your namesake thinks of it (stolen from here):

you are using external tables instead of sqlldr.

with external tables you can

  • merge a flat file with an existing table in one statement.
  • sort a flat file on the way into a table you want compressed nicely.
  • do a parallel direct path load -- without splitting up the input file, writing
    umpteen scripts and so on
  • run sqlldr in effect from a stored procedure or trigger (insert is not sqlldr)
  • do multi-table inserts
  • flow the data through a pipelined plsql function for cleansing/transformation

and so on. they are instead of sqlldr -- to get data into the database without having to use sqlldr in the first place.

You would not normally query them day to day in an operational system, you use them to load data.

Update:

You won't ever get decent performance with a 3GB table, since Oracle will have to do a 3GB fullscan on each query, and it will be a first-class disk-reading spindle-moving fullscan, not a cheap cached imitation which you can see in the plan but can barely notice in the actual execution time.

Try to convince the guy to create a temporary table for you which you could use to work with the data and just load the data from the external table whenever your session begins.

This is not the best solution since it will need to keep the separate copy of the table for each session in the temporary tablespace but it's much better performance-wise.

悲欢浪云 2024-08-15 13:35:41

如果您必须解决那些没有意义但又无法更改的限制,那真的很困难……

您最好读取一次外部表,然后在类似索引的数据中构建所需的数据代码中的结构(基本上是一个数组,其中每个元素对应您要查找的每个寄存器)。

因此,您的光标将如下所示:

CURSOR F_CURSOR (day IN varchar, orig IN Number)
    IS
    select NVL(sum(table_4.f),0) value, table_4.CODE register
     from ext_temp table_4
    where
      --couple of conditions here, irrelevant for the question at hand.
      AND TO_CHAR(table_4.day,'YYYYMMDD') = day
      -- AND table_4.CODE = code -- don't use this condition!
      AND table_4.ORIG = orig;

您的寄存器循环将变成光标循环:

open F_cursor(register.day,register.orig);
LOOP
    fetch F_cursor into some_var;
    EXIT WHEN F_cursor%NOT_FOUND
    result (some_var.register) := some_var.value;
END LOOP;

因此,您只需要对所有寄存器进行一个循环,而不是为每个寄存器循环遍历外部表。

这可以扩展到您提到的十个光标。

It's really tough if you have to work around restrictions that don't make sense but that you can't change ...

You should be better off reading through the external table once, and then build the required data in an index-like data structure in your code (basically an array with one element for each register you are looking for).

So your cursor would look like this:

CURSOR F_CURSOR (day IN varchar, orig IN Number)
    IS
    select NVL(sum(table_4.f),0) value, table_4.CODE register
     from ext_temp table_4
    where
      --couple of conditions here, irrelevant for the question at hand.
      AND TO_CHAR(table_4.day,'YYYYMMDD') = day
      -- AND table_4.CODE = code -- don't use this condition!
      AND table_4.ORIG = orig;

And your register-loop would turn into a cursor-loop:

open F_cursor(register.day,register.orig);
LOOP
    fetch F_cursor into some_var;
    EXIT WHEN F_cursor%NOT_FOUND
    result (some_var.register) := some_var.value;
END LOOP;

As a result, instead of a loop through the external table for each register, you just need one loop for all registers.

This can be extended for the ten cursors you mentioned.

白首有我共你 2024-08-15 13:35:41

您可以将外部表数据写入临时索引(如果需要)表,然后对其执行多个查询。

create your_temp_table as select * from ext_temp;
create index your_desired_index on your_temp_table(indexed_field);

然后直接使用 your_temp_table 执行所有查询。

You could write your external table data to a temporary indexed (if you want) table, and then perform your multiple queries against it.

create your_temp_table as select * from ext_temp;
create index your_desired_index on your_temp_table(indexed_field);

Then do all your queries directly using your_temp_table.

枕头说它不想醒 2024-08-15 13:35:41

虽然完全同意 Quassnoi 的建议,即外部表似乎不是正确的解决方案,以及 DCookie 的类比,即您被束缚并被扔到海里并被要求游泳,但至少可能有一种方法来构建您的程序,以便外部表只能读取一次。根据您的描述,我认为所有 10 个游标都在从外部表读取,这意味着您强制 Oracle 扫描外部表 10 次。

假设这个推论是正确的,最简单的答案可能是使外部表成为驱动光标,类似于 IronGoofy 的建议。根据下面代码片段中的 some_query 正在执行的操作,

for each register in some_query

并假设查询返回与外部表中相同数量的行这一事实并非巧合,最简单的选择是做类似

FOR register in (select * from ext_temp)
LOOP
  -- Figure out if the row should have been part of cursor 1
  IF( <<set of conditions>> ) 
  THEN
    <<do something>>
  -- Figure out if the row should have been part of cursor 2
  ELSIF( ... )
  ...
END LOOP;

or

FOR register in (select * 
                   from ext_temp a, 
                        (<<some query>>) b 
                  where a.column_name = b.column_name )
LOOP
  -- Figure out if the row should have been part of cursor 1
  IF( <<set of conditions>> ) 
  THEN
    <<do something>>
  -- Figure out if the row should have been part of cursor 2
  ELSIF( ... )
  ...
END LOOP;

的事情 更进一步将逻辑从游标(和 IF 语句)移到驱动游标中应该会更有效。使用上面更简单的代码片段(当然,您可以将 some_query 加入到这些示例中。

FOR register in (select a.*,
                        NVL(sum( (case when condition1 and condition2
                                       then table_4.f
                                       else 0
                                       end) ),
                             0) f_cursor_sum
                  from ext_temp table_4)
LOOP
  <<do something>>
END LOOP;

即使在执行此操作之后,您仍然发现正在执行一些逐行处理,那么您甚至可以更进一步,从驱动游标执行 BULK COLLECT 到本地声明的集合中,并对该集合进行操作。您几乎肯定不希望将 3 GB 的数据提取到本地集合中(尽管可能会破坏 PGA)。让 DBA 得出结论,临时表并不是一件坏事,我不建议这样做),使用 LIMIT 子句一次获取几百行应该会使事情变得更有效率。

While totally agreeing with Quassnoi's suggestion that external tables do not appear to be the proper solution here, as well as DCookie's analogy that you're being bound and tossed overboard and asked to swim, there may at least be a way to structure your program so that the external table is only read once. My belief from your description is that all 10 cursors are reading from the external table, meaning that you are forcing Oracle to scan the external table 10 times.

Assuming this inference is correct, the simplest answer is likely to make the external table the driving cursor, similar to what IronGoofy suggested. Depending on what some_query in the code snippet below is doing,

for each register in some_query

and assuming that the fact that the query returns the same number of rows that are in the external table is not a coincidence, the simplest option would be to do something like

FOR register in (select * from ext_temp)
LOOP
  -- Figure out if the row should have been part of cursor 1
  IF( <<set of conditions>> ) 
  THEN
    <<do something>>
  -- Figure out if the row should have been part of cursor 2
  ELSIF( ... )
  ...
END LOOP;

or

FOR register in (select * 
                   from ext_temp a, 
                        (<<some query>>) b 
                  where a.column_name = b.column_name )
LOOP
  -- Figure out if the row should have been part of cursor 1
  IF( <<set of conditions>> ) 
  THEN
    <<do something>>
  -- Figure out if the row should have been part of cursor 2
  ELSIF( ... )
  ...
END LOOP;

It should be more efficient to take things a step further and move logic out of the cursors (and IF statements) and into the driving cursor. Using the simpler of the code snippets above (you could, of course, join some_query to these examples

FOR register in (select a.*,
                        NVL(sum( (case when condition1 and condition2
                                       then table_4.f
                                       else 0
                                       end) ),
                             0) f_cursor_sum
                  from ext_temp table_4)
LOOP
  <<do something>>
END LOOP;

If, even after doing this, you still find that you are doing some row-by-row processing, you could even go one more step forward and do a BULK COLLECT from the driving cursor into a locally declared collection and operate on that collection. You almost certainly don't want to fetch 3 GB of data into a local collection (though crushing the PGA might lead the DBA to conclude that temporary tables aren't such a bad thing, it's not something I would advise), fetching a few hundred rows at a time using the LIMIT clause should make things a bit more efficient.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文