Is G1GC stable for large Spark applications?

Posted on 2025-01-27 10:18:40

Our data pipeline currently runs on Spark 2.4 and Java 1.8, and it takes about 10 hours to perform all the ETL steps.

We have noticed that the driver's heap usage is elevated and causes a lot of full GCs toward the end of the pipeline (the heap is given 70G). Even after all those full GCs, the heap stays at its highest level. Please keep in mind that this is the Spark driver.

We are currently using -XX:+UseParallelGC. After testing with -XX:+UseG1GC, we noticed that the number of full GCs came down a lot, so we are thinking of switching to G1GC. However, I heard from a colleague that G1GC was not stable a few years back. Is G1GC now a stable collector for a large Spark application (in our case, running 10+ hours with a 50G+ heap)?
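To compare the two collectors on the same workload, it can help to read the collection counters from inside the JVM instead of relying on impressions from the logs. The following is a minimal Java sketch, assuming it runs in (or is adapted into) the driver JVM; the class name `GcReport` is made up for illustration and is not part of Spark. On Java 8 the beans are typically named "PS Scavenge" and "PS MarkSweep" under -XX:+UseParallelGC, and "G1 Young Generation" and "G1 Old Generation" under -XX:+UseG1GC, so the count of the old-generation bean is a rough proxy for the number of full GCs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): summarizes the collectors active in this JVM.
public class GcReport {

    // One line per collector: name, number of collections, accumulated time in ms.
    public static List<String> summary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined.
            lines.add(String.format("%s: count=%d, timeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        summary().forEach(System.out::println);
    }
}
```

Logging this summary at the end of each ETL stage would show whether the old-generation count climbs only near the end of the run, which is what the description above suggests.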

Comments (1)

惜醉颜 2025-02-03 10:18:40

(this was a comment that grew)

First off, it sounds like you have a memory leak. Just from your description, it sounds like you're not freeing old gen. Moving over to G1 will not fix that.
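One way to check whether the old generation is really not being freed is to look at its occupancy right after a collection: if that number stays near the maximum even after full GCs, the data is still reachable and no collector will reclaim it. The following is a minimal Java sketch under stated assumptions: the class `OldGenCheck` is hypothetical, and old-gen pools are matched by name ("PS Old Gen", "G1 Old Gen", "Tenured Gen"), which may differ on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how full the old generation is after its last collection.
public class OldGenCheck {

    // Fraction of the pool still occupied after its most recent collection,
    // or -1.0 when the pool does not support this metric or has no defined max.
    public static double occupancyAfterGc(MemoryPoolMXBean pool) {
        MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
        if (afterGc == null || afterGc.getMax() <= 0) {
            return -1.0;
        }
        return (double) afterGc.getUsed() / afterGc.getMax();
    }

    // Assumption: old-gen pools are identified by their name.
    public static boolean isOldGen(MemoryPoolMXBean pool) {
        String name = pool.getName();
        return pool.getType() == MemoryType.HEAP
                && (name.contains("Old Gen") || name.contains("Tenured"));
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (isOldGen(pool)) {
                System.out.printf("%s: occupancy after last GC = %.2f%n",
                        pool.getName(), occupancyAfterGc(pool));
            }
        }
    }
}
```

If the reported occupancy keeps rising across stages, a heap dump of the driver is the usual next step to find what is holding the references.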

Note that I'm not familiar with your particular use case, nor do I know whether there are any latency requirements.
Having said that, we have instances running J8 with G1 and 800 GB heaps (sometimes larger; the record is 1.5 TB) without any major issues.
Note that at these kinds of heap sizes, weird edge cases can occur and the configuration can be fragile. But it is used in production.
One takeaway we have is that the number of HW threads is vitally important. Even if they don't do anything most of the time, once the GC moves into mixed cycles, the number of HW threads becomes exceedingly important.
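To get a feel for how the hardware thread count feeds into GC worker counts, here is a rough Java sketch. The two formulas are the heuristics commonly described for HotSpot on Java 8 when -XX:ParallelGCThreads and -XX:ConcGCThreads are left unset; treat them as assumptions rather than a specification, and confirm the real values on your JVM with -XX:+PrintFlagsFinal. The class name `GcThreadEstimate` is made up for illustration.

```java
// Hypothetical helper: estimates default GC worker counts from the CPU count.
public class GcThreadEstimate {

    // Assumed default for -XX:ParallelGCThreads: all CPUs up to 8, then 5/8 of the rest.
    public static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    // Assumed G1 default for -XX:ConcGCThreads: roughly a quarter of the parallel workers.
    public static int concGcThreads(int parallelThreads) {
        return Math.max((parallelThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int par = parallelGcThreads(cpus);
        System.out.printf("cpus=%d ParallelGCThreads~%d ConcGCThreads~%d%n",
                cpus, par, concGcThreads(par));
    }
}
```

Under these assumptions a 16-thread machine would get about 13 parallel workers and 3 concurrent ones, so a driver placed on a small host has far less GC capacity than the heap size alone suggests.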

Generally speaking, if this is a batch-style application without latency requirements, stay with the parallel collector and fix the memory leak. G1 consumes CPU concurrently and the parallel collector does not.
