How do I prevent executors from disassociating prematurely when running Apache Spark in "local-cluster" mode?


I have a Spark application that should be tested in both local mode & local-cluster mode, using scalatest.

The local-cluster mode is submitted using this method:

How to scala-test a Spark program under "local-cluster" mode?
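For context, a minimal sketch of what such a scalatest suite might look like; the class name, master string, and test body below are illustrative placeholders, not the actual code from the question:

import org.apache.spark.sql.SparkSession
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

// Illustrative suite: a standalone-like cluster with 2 workers,
// 1 core and 1024 MB of memory per worker.
class LocalClusterSuite extends AnyFunSuite with BeforeAndAfterAll {

  @transient private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    spark = SparkSession.builder()
      .appName("Test")
      .master("local-cluster[2,1,1024]")
      .getOrCreate()
  }

  override def afterAll(): Unit = {
    // Stop the SparkContext explicitly before the JVM begins shutting down;
    // otherwise the standalone master may still be replacing "lost" executors
    // while shutdown hooks are already running.
    if (spark != null) spark.stop()
  }

  test("a simple job runs on the local cluster") {
    assert(spark.sparkContext.parallelize(1 to 100).sum() == 5050)
  }
}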

The test runs successfully, but when the test terminates I get the following errors in the log:


22/05/16 17:45:25 ERROR TaskSchedulerImpl: Lost executor 0 on 172.16.224.18: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
22/05/16 17:45:25 ERROR Worker: Failed to launch executor app-20220516174449-0000/2 for Test.
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:195)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:142)
    at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:77)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:547)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:215)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
22/05/16 17:45:25 ERROR Worker: Failed to launch executor app-20220516174449-0000/3 for Test.
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:195)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:142)
    at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:77)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:547)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:215)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
22/05/16 17:45:25 ERROR Worker: Failed to launch executor app-20220516174449-0000/4 for Test.
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:195)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:142)
    at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:77)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:547)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:215)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
22/05/16 17:45:25 ERROR Worker: Failed to launch executor app-20220516174449-0000/5 for Test.
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:195)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:142)
    at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:77)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:547)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:215)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dis

...

It turns out executor 0 was dropped before the SparkContext was stopped. This triggered an aggressive self-healing reaction from the Spark master, which repeatedly tried to launch new executors to compensate for the loss. How do I prevent this from happening?

Comments (1)

那请放手 2025-02-05 15:25:24


Spark attempts to recover from failed tasks by running them again. To avoid this, you can set the following properties to 1:

  • spark.task.maxFailures (default is 4)
  • spark.stage.maxConsecutiveAttempts (default is 4)

These properties can be set in $SPARK_HOME/conf/spark-defaults.conf or given as options to spark-submit:

spark-submit --conf spark.task.maxFailures=1 --conf spark.stage.maxConsecutiveAttempts=1

or in the Spark context/session configuration before starting the session.
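For example, a minimal sketch of setting them on the session builder before the session is created (the app name and master string are placeholders):

import org.apache.spark.sql.SparkSession

// Sketch: disable task and stage retries so that a lost executor fails the
// test immediately instead of triggering repeated relaunch attempts.
val spark = SparkSession.builder()
  .appName("Test")
  .master("local-cluster[2,1,1024]")  // or "local[*]" for local mode
  .config("spark.task.maxFailures", "1")
  .config("spark.stage.maxConsecutiveAttempts", "1")
  .getOrCreate()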

EDIT:

It looks like your executors are lost due to insufficient memory. You could try to increase:

  • spark.executor.memory
  • spark.executor.memoryOverhead
  • spark.memory.offHeap.size (together with spark.memory.offHeap.enabled=true)

(see Spark configuration)

The maximum memory size of the container running an executor is determined by the sum of spark.executor.memoryOverhead, spark.executor.memory, spark.memory.offHeap.size and spark.executor.pyspark.memory.
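As a sketch, these could also be set on the session builder before the tests start; the sizes below are arbitrary examples, not recommendations, and in local-cluster mode the per-worker memory in the master string local-cluster[workers,cores,memoryMB] also has to be large enough to accommodate the executor:

import org.apache.spark.sql.SparkSession

// Sketch with arbitrary example sizes; tune them to the workload.
val spark = SparkSession.builder()
  .appName("Test")
  .config("spark.executor.memory", "2g")
  .config("spark.executor.memoryOverhead", "512m")  // honored mainly on YARN/Kubernetes
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "512m")
  .getOrCreate()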
