Caught: java.lang.OutOfMemoryError: Java heap space - using -Xmx is not applicable

Posted 2024-11-27 09:39:27

I have written a very complex database migration script in Groovy that runs just fine on my workstation but produces "Caught: java.lang.OutOfMemoryError: Java heap space" when run on the server's JVM. The JVM settings are stuck as they are (limited resources as an intern), so I need to figure out another way to fix this besides increasing the available memory.

The error strikes when some of the largest tables are accessed: a particularly large, but simple, join (200,000+ rows to 50,000+ rows). Is there another way I can approach such a join that will save me from the error?

Example of query:

target.query("""
    SELECT
        a.*, b.neededColumn
    FROM
        bigTable a JOIN mediumTable b ON a.stuff = b.stuff
    ORDER BY stuff DESC
""") { ResultSet rs ->
    ...
}


Comments (2)

会傲 2024-12-04 09:39:27

Can you run the join in SQL on the database server?

If not, you're probably stuck with iterating through each of your 200,000 results, joining it to the 50,000 rows, and writing out the result (so you aren't holding more than the 50,000 rows in memory at any one time)
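
A rough sketch of that idea, assuming `target` is a groovy.sql.Sql instance and borrowing the placeholder column names (a, b, c, stuff, neededColumn) from the examples in this thread; the 50,000-row side is held as a lookup map (it still has to fit in the heap), while the 200,000-row side is streamed one row at a time:

// Load only the 50,000-row side into memory, keyed by the join column
def lookup = [:]
target.eachRow( 'SELECT stuff, neededColumn FROM mediumTable' ) { row ->
  lookup[row.stuff] = row.neededColumn
}

// Stream the 200,000-row side; only the current row (plus the map) is in memory
new File( 'joined.csv' ).withWriter { w ->
  target.eachRow( 'SELECT a, b, c, stuff FROM bigTable ORDER BY stuff DESC' ) { row ->
    def needed = lookup[row.stuff]
    if (needed != null) {
      // write the joined row straight to disk instead of collecting it
      w.write "$row.a,$row.b,$row.c,$needed\n"
    }
  }
}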

Or, if you have access to multiple machines, you could divide your 200,000 items into blocks and do one block per machine?
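
One hypothetical way to hand each machine its own block is to page the query, assuming the database accepts LIMIT/OFFSET (syntax varies by vendor) and that blockIndex is supplied differently to each machine:

// blockIndex would differ per machine (e.g. passed in as a script argument)
int blockSize  = 20000
int blockIndex = 0
target.eachRow( '''SELECT a.*, b.neededColumn
  FROM bigTable a JOIN mediumTable b ON a.stuff = b.stuff
  ORDER BY a.stuff DESC
  LIMIT ? OFFSET ?''', [ blockSize, blockIndex * blockSize ] ) { row ->
  // handle this block's rows one at a time
}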

Edit

Taking your example code, you should be able to do:

new File( 'output.csv' ).withWriter { w ->
  // eachRow streams the result set, handing the closure one row at a time,
  // so the whole join never has to sit in the heap
  target.eachRow( '''SELECT a.a, a.b, a.c, b.neededColumn FROM
    bigTable a
    JOIN mediumTable b ON a.stuff = b.stuff
    ORDER BY stuff DESC''' ) { row ->
    // append a newline so every row becomes its own CSV line
    w.write "$row.a,$row.b,$row.c,$row.neededColumn\n"
  }
}

That will write each row out to the file output.csv.

海夕 2024-12-04 09:39:27

You have to change your code so that the rows are not loaded all into memory at the same time (i.e. stream the data, work on each row one at a time). As far as I know, Groovy still doesn't do this when you use things like collect, so rewrite it to use a for loop.
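
To illustrate the difference (a sketch, assuming `target` is a groovy.sql.Sql instance and `process` is a hypothetical per-row handler): collecting the result of target.rows(...) builds the whole list in memory first, whereas iterating row by row - e.g. with eachRow, as in the answer above - keeps only the current row in the heap:

// Loads every joined row into one List before any work is done - this is
// the pattern that exhausts the heap:
// def out = target.rows( 'SELECT a.*, b.neededColumn FROM bigTable a JOIN mediumTable b ON a.stuff = b.stuff' )
//                 .collect { process(it) }

// Row-at-a-time instead: the closure is invoked once per row, so only the
// current row is held in memory
target.eachRow( 'SELECT a.*, b.neededColumn FROM bigTable a JOIN mediumTable b ON a.stuff = b.stuff' ) { row ->
  process( row )   // hypothetical per-row handler
}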
