处理大表 - 如何逐页选择记录?

发布于 2024-10-08 13:24:25 字数 308 浏览 1 评论 0原文

我需要对表中的所有记录进行处理。该表可能非常大,因此我宁愿逐页处理记录。我需要记住已经处理过的记录,因此它们不会包含在我的第二个 SELECT 结果中。

像这样:

对于第一次运行, [从 MyTable 中选择 100 条记录]

对于第二次运行, [从 MyTable 中选择另外 100 条记录]

等等..

我希望你能明白。我的问题是如何编写这样的选择语句?

顺便说一句,我正在使用 oracle,但如果我也可以在任何其他数据库上运行,那就太好了。 我也不想使用存储过程。

非常感谢!

I need to do a process on all the records in a table. The table could be very big so I rather process the records page by page. I need to remember the records that have already been processed so there are not included in my second SELECT result.

Like this:

For first run,
[SELECT 100 records FROM MyTable]

For second run,
[SELECT another 100 records FROM MyTable]

and so on..

I hope you get the picture. My question is how do I write such select statement?

I'm using oracle btw, but would be nice if I can run on any other db too.
I also don't want to use store procedure.

Thank you very much!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

赠意 2024-10-15 13:24:25

您想出的将表分成更小块的任何解决方案最终都会比一次性处理所有内容花费更多的时间。除非表已分区并且您一次只能处理一个分区。

如果全表扫描需要 1 分钟,那么将表分解为 10 个部分将需要 10 分钟。如果表行按您可以使用的索引列的值进行物理排序,则由于集群因素,这会发生一些变化。但无论如何,它比一次性处理它需要更长的时间。

当然,这一切都取决于处理表中的一行需要多长时间。您可以选择通过处理数据块来减少服务器上的负载,但从性能角度来看,您无法击败全表扫描。

Any solution you come up with to break the table into smaller chunks, will end up taking more time than just processing everything in one go. Unless the table is partitioned and you can process exactly one partition at a time.

If a full table scan takes 1 minute, it will take you 10 minutes to break up the table into 10 pieces. If the table rows are physically ordered by the values of an indexed column that you can use, this will change a bit due to clustering factor. But it will anyway take longer than just processing it in one go.

This all depends on how long it takes to process one row from the table of course. You could chose to reduce the load on the server by processing chunks of data, but from a performance perspective, you cannot beat a full table scan.

自演自醉 2024-10-15 13:24:25

您很可能想要利用 Oracle 的 stopkey 优化,因此当您不需要时,您不会最终得到全表扫描。有几种方法可以做到这一点。第一种方法写起来有点长,但是让 Oracle 自动计算出涉及的行数:

select *
from
(
  select rownum rn, v1.*
  from (
    select *
    from table t
    where filter_columns = 'where clause'
    order by columns_to_order_by
  ) v1
  where rownum <= 200
)
where rn >= 101;

您也可以使用 FIRST_ROWS 提示实现相同的效果:

  select /*+ FIRST_ROWS(200) */ *
  from (
    select rownum rn, t.*
    from table t
    where filter_columns = 'where clause'
    order by columns_to_order_by
  ) v1
  where rn between 101 and 200;

我更喜欢 rownum 方法,因此您不必不断更改提示中的值(需要表示最终值,而不是实际返回到页面的行数才能准确)。您可以通过这种方式将开始值和结束值设置为绑定变量,从而避免硬解析。

欲了解更多详情,您可以查看这篇文章

You are most likely going to want to take advantage of Oracle's stopkey optimization, so you don't end up with a full tablescan when you don't want one. There are a couple ways to do this. The first way is a little longer to write, but let's Oracle automatically figure out the number of rows involved:

select *
from
(
  select rownum rn, v1.*
  from (
    select *
    from table t
    where filter_columns = 'where clause'
    order by columns_to_order_by
  ) v1
  where rownum <= 200
)
where rn >= 101;

You could also achieve the same thing with the FIRST_ROWS hint:

  select /*+ FIRST_ROWS(200) */ *
  from (
    select rownum rn, t.*
    from table t
    where filter_columns = 'where clause'
    order by columns_to_order_by
  ) v1
  where rn between 101 and 200;

I much prefer the rownum method, so you don't have to keep changing the value in the hint (which would need to represent the end value and not the number of rows actually returned to the page to be accurate). You can set up the start and end values as bind variables that way, so you avoid hard parsing.

For more details, you can check out this post

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文