Driver memory not getting cleared in Spark Structured Streaming
I am using a Spark Structured Streaming job to perform my ETL on the AWS platform, and the driver memory is not getting cleared. The job reads events from Kinesis and writes to S3.
Below are my Spark configurations. I am also attaching a screenshot of the driver JVM heap usage graph for reference (1 means 100%):
spark.cleaner.periodicGC.interval=1min
spark.driver.extraJavaOptions=-XX:+UseG1GC
spark.cleaner.referenceTracking.blocking=false
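
For context, these settings would typically be passed at submit time. A minimal sketch, assuming the job is launched with spark-submit (the application JAR name below is hypothetical):

spark-submit \
  --conf spark.cleaner.periodicGC.interval=1min \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC" \
  --conf spark.cleaner.referenceTracking.blocking=false \
  my-streaming-etl.jar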
1 Answer
We had seen this when the Spark UI in YARN had a large number of jobs listed. We limited the number of jobs (e.g. to 100), tasks, and stages shown in the UI, which helped decrease driver memory usage. You can give it a try.
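
If it helps, the limits described above correspond to standard Spark configuration properties. A minimal sketch, assuming the stock spark.ui.retained* settings apply to your Spark version (the values are illustrative, following the 100 mentioned above):

spark.ui.retainedJobs=100
spark.ui.retainedStages=100
spark.ui.retainedTasks=10000
spark.sql.ui.retainedExecutions=100
spark.streaming.ui.retainedBatches=100

These default to much higher values (spark.ui.retainedJobs and spark.ui.retainedStages default to 1000, spark.ui.retainedTasks to 100000), so the driver retains metadata for completed jobs, stages, and tasks in memory until those thresholds are reached.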