一次仅选择一组特定的行

发布于 2024-08-25 05:05:10 字数 266 浏览 10 评论 0原文

我需要从一个表中选择数据并将其插入到另一个表中。目前 SQL 看起来像这样:

   INSERT INTO A (x, y, z)
   SELECT x, y, z
   FROM B b
   WHERE ...

然而,SELECT 很大,导致超过 200 万行,我们认为它占用了太多内存。 Informix(本例中的数据库)在运行查询时耗尽虚拟内存。

我将如何选择并插入一组行(比如 2000)?鉴于我不认为有任何行 ID 等。

I need to select data from one table and insert it into another table. Currently the SQL looks something like this:

   INSERT INTO A (x, y, z)
   SELECT x, y, z
   FROM B b
   WHERE ...

However, the SELECT is huge, resulting in over 2 millions rows and we think it is taking up too much memory. Informix, the db in this case, runs out of virtual memory when the query is run.

How would I go about selecting and inserting a set of rows (say 2000)? Given that I don't think there are any row ids etc.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

双手揣兜 2024-09-01 05:05:11

我假设您有一些执行此脚本的脚本?只要对从嵌套选择返回的值进行排序,您就可以循环和限制。这是一些伪代码。

total = SELECT COUNT(x) FROM B WHERE ...
while (total > 0) 
  INSERT INTO A (x, y, z) SELECT x, y, z FROM B b WHERE ... ORDER BY x LIMIT 2000
  total = total - 2000
end

I assume that you have some script that this is executed from? You can just loop and limit as long as you order the values returned from the nested select. Here is some pseudo code.

total = SELECT COUNT(x) FROM B WHERE ...
while (total > 0) 
  INSERT INTO A (x, y, z) SELECT x, y, z FROM B b WHERE ... ORDER BY x LIMIT 2000
  total = total - 2000
end
御弟哥哥 2024-09-01 05:05:11

我几乎可以肯定,ID​​S 只允许您使用将数据返回到客户端1 的 FIRST 子句,这是您希望尽可能避免的情况。

您说您遇到了内存不足错误(而不是长事务中止错误)?您是否检查过服务器的配置以确保其具有合理的内存量?

这部分取决于您的数据集有多大,以及约束是什么 - 为什么您要跨表进行加载。但我通常的目标是确定一种将数据分区为可加载子集并在循环中按顺序运行这些子集的方法。例如,如果序列号在 1 到 10,000,000 之间,我可能会运行循环十次,条件为 AND seqnum >= 0 AND seqnum <; 1000000',然后AND seqnum >= 1000000 AND seqnum < 2000000'等。最好使用能够通过变量替换范围的语言。

这有点麻烦,并且您希望在范围大小方面采取保守的做法(更多更小的分区而不是更少的更大分区 - 以降低内存不足的风险)。


1 稍微过度简化。例如,存储过程必须算作“客户端”,并且存储过程中的通信成本比访问真正客户端的成本要低(很多)。

I'm almost certain that IDS only lets you use the FIRST clause where the data is returned to the client1, and that is something you want to avoid if at all possible.

You say you get an out of memory error (rather than, say, a long transaction aborted error)? Have you looked at the configuration of your server to ensure it has a reasonable amount of memory?

It depends in part on how big your data set is, and what the constraints are - why you are doing the load across tables. But I would normally aim to determine a way of partitioning the data into loadable subsets and run those sequentially in a loop. For example, if the sequence numbers are between 1 and 10,000,000, I might run the loop ten times, with condition on the sequence number for AND seqnum >= 0 AND seqnum < 1000000' and thenAND seqnum >= 1000000 AND seqnum < 2000000', etc. Preferably in a language with the ability to substitute the range via variables.

This is a bit nuisancy, and you want to err on the conservative side in terms of range size (more smaller partitions rather than fewer bigger ones - to reduce the risk of running out of memory).


1 Over-simplifying slightly. A stored procedure would have to count as 'the client', for example, and the communication cost in a stored procedure is (a lot) less than the cost of going to the genuine client.

攒一口袋星星 2024-09-01 05:05:10

您可以从表中执行 SELECT FIRST n * 。其中 n 是您想要的行数,例如 2000。此外,在 WHERE 子句中执行嵌入式选择,检查您要插入的表中是否存在已存在的行。这样下次运行该语句时,它将不包含已插入的数据。

You can do SELECT FIRST n * from Table. Where n is the amount of rows you want, say 2000. Also, in the WHERE clause do an embedded select that checks the table you are inserting in to for rows already existing. So that the next time the statement is ran, it will not include already inserted data.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文