如何在独立的多节点多折叠设置上运行火花群集
编辑1:使用network_mode尝试:在工作节点上的主机,相同的结果
我正在设置一个多节点的多个偶数SPARK,在独立配置中:
1个带有1个火花主和X工人的节点
docker-compose for Master+Worker节点:
version: '2'
services:
spark:
image: bitnami/spark:latest
environment:
- SPARK_MODE=master
ports:
- '8080:8080'
- '4040:4040'
- '7077:7077'
spark-worker:
image: bitnami/spark:latest
environment:
- SPARK_MODE=worker
- SPARK_MASTER_URL=spark://spark:7077
deploy:
mode: replicated
replicas: 4
n节点1 ... M工人
docker-compose for Worker节点:
version: '2'
services:
spark-worker:
image: bitnami/spark:latest
environment:
- SPARK_MODE=worker
- SPARK_MASTER_URL=spark://1.1.1.1:7077
network-mode: host
deploy:
mode: replicated
replicas: 4
我可以在Spark Master Web UI上看到正确的已注册工人数量。 但是,当我在主人上提交一份工作时,主日志充满了:
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/499 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/530 on worker worker-20220701130135-172.18.0.4-35337
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/501 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/531 on worker worker-20220701132457-172.18.0.5-39517
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/502 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/532 on worker worker-20220701132457-172.18.0.2-43527
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/505 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/533 on worker worker-20220701130134-172.18.0.3-35961
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/504 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/534 on worker worker-20220701132453-172.18.0.5-40345
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/506 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/535 on worker worker-20220701132454-172.18.0.2-42907
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/514 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/536 on worker worker-20220701132442-172.18.0.2-41669
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/503 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/537 on worker worker-20220701132454-172.18.0.3-37011
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/509 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/538 on worker worker-20220701132455-172.18.0.4-42013
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/507 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/539 on worker worker-20220701132510-172.18.0.3-39097
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/508 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/540 on worker worker-20220701132510-172.18.0.2-40827
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/513 because it is EXITED
示例远程工人日志:
spark-worker_1 | 22/07/01 13:32:32 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "561" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://[email protected]:35337"
spark-worker_1 | 22/07/01 13:32:38 INFO Worker: Executor app-20220701133058-0002/561 finished with state EXITED message Command exited with code 1 exitStatus 1
spark-worker_1 | 22/07/01 13:32:38 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 561
spark-worker_1 | 22/07/01 13:32:38 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220701133058-0002, execId=561)
spark-worker_1 | 22/07/01 13:32:38 INFO Worker: Asked to launch executor app-20220701133058-0002/595 for API Bruteforce
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing view acls to: spark
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing modify acls to: spark
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing view acls groups to:
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing modify acls groups to:
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
spark-worker_1 | 22/07/01 13:32:38 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "595" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://[email protected]:35337"
spark-worker_1 | 22/07/01 13:32:43 INFO Worker: Executor app-20220701133058-0002/595 finished with state EXITED message Command exited with code 1 exitStatus 1
spark-worker_1 | 22/07/01 13:32:43 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 595
spark-worker_1 | 22/07/01 13:32:43 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220701133058-0002, execId=595)
spark-worker_1 | 22/07/01 13:32:43 INFO Worker: Asked to launch executor app-20220701133058-0002/629 for API Bruteforce
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing view acls to: spark
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing modify acls to: spark
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing view acls groups to:
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing modify acls groups to:
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
spark-worker_1 | 22/07/01 13:32:43 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "629" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://[email protected]:35337"
吞吐量非常低,并且在工人节点上使用的CPU使用率达到100%,
我相信它与工人的Docker端口映射有关节点,但我不知道我需要在工作容器上公开哪些端口?如果它们是同一端口,我该如何为同一台计算机上的多个容器配置它们?
EDIT 1: Tried with network_mode: host on the worker nodes, same result
I am setting up a multi-node multi-docker cluster of spark, in standalone configuration:
1 node with 1 spark master and X workers
docker-compose for master+worker node:
version: '2'
services:
spark:
image: bitnami/spark:latest
environment:
- SPARK_MODE=master
ports:
- '8080:8080'
- '4040:4040'
- '7077:7077'
spark-worker:
image: bitnami/spark:latest
environment:
- SPARK_MODE=worker
- SPARK_MASTER_URL=spark://spark:7077
deploy:
mode: replicated
replicas: 4
N nodes with 1...M workers
docker-compose for worker nodes:
version: '2'
services:
spark-worker:
image: bitnami/spark:latest
environment:
- SPARK_MODE=worker
- SPARK_MASTER_URL=spark://1.1.1.1:7077
network-mode: host
deploy:
mode: replicated
replicas: 4
I can see on the Spark Master web UI the correct number of workers registered.
But when I submit a job on master, the master logs are filled with:
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/499 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/530 on worker worker-20220701130135-172.18.0.4-35337
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/501 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/531 on worker worker-20220701132457-172.18.0.5-39517
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/502 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/532 on worker worker-20220701132457-172.18.0.2-43527
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/505 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/533 on worker worker-20220701130134-172.18.0.3-35961
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/504 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/534 on worker worker-20220701132453-172.18.0.5-40345
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/506 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/535 on worker worker-20220701132454-172.18.0.2-42907
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/514 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/536 on worker worker-20220701132442-172.18.0.2-41669
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/503 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/537 on worker worker-20220701132454-172.18.0.3-37011
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/509 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/538 on worker worker-20220701132455-172.18.0.4-42013
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/507 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/539 on worker worker-20220701132510-172.18.0.3-39097
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/508 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/540 on worker worker-20220701132510-172.18.0.2-40827
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/513 because it is EXITED
Sample remote worker logs:
spark-worker_1 | 22/07/01 13:32:32 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "561" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://[email protected]:35337"
spark-worker_1 | 22/07/01 13:32:38 INFO Worker: Executor app-20220701133058-0002/561 finished with state EXITED message Command exited with code 1 exitStatus 1
spark-worker_1 | 22/07/01 13:32:38 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 561
spark-worker_1 | 22/07/01 13:32:38 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220701133058-0002, execId=561)
spark-worker_1 | 22/07/01 13:32:38 INFO Worker: Asked to launch executor app-20220701133058-0002/595 for API Bruteforce
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing view acls to: spark
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing modify acls to: spark
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing view acls groups to:
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing modify acls groups to:
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
spark-worker_1 | 22/07/01 13:32:38 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "595" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://[email protected]:35337"
spark-worker_1 | 22/07/01 13:32:43 INFO Worker: Executor app-20220701133058-0002/595 finished with state EXITED message Command exited with code 1 exitStatus 1
spark-worker_1 | 22/07/01 13:32:43 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 595
spark-worker_1 | 22/07/01 13:32:43 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220701133058-0002, execId=595)
spark-worker_1 | 22/07/01 13:32:43 INFO Worker: Asked to launch executor app-20220701133058-0002/629 for API Bruteforce
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing view acls to: spark
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing modify acls to: spark
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing view acls groups to:
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing modify acls groups to:
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
spark-worker_1 | 22/07/01 13:32:43 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "629" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://[email protected]:35337"
Throughput is very low, and CPU usage on worker nodes is reaching 100%
I believe it has something to do with docker port mapping on the worker nodes, but I can't figure out which ports I need to expose on the worker containers? And if they're the same port, how would I configure them for multiple containers on the same machine?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我认为您应该加入工人节点的码头组合
I think you should add in the docker-compose of worker nodes
我不知道您是否曾经解决过这个问题,但是我看到的一件事是Spark Master试图与正在管理的工人建立(出站)连接。
如指定,您的工人的Docker-compose.yml不会暴露任何端口。因此,我的猜测是工人连接到主人,主人记录了这一点,这就是为什么它看到工人的原因。但是,当主人去与工人建立连接时,没有端口暴露,并且会出来。
因此,我也会在您的docker-compose.yml中公开工人的端口。
I don't know if you ever solved this, but one thing I've seen is that Spark master tries to make (outbound) connections to the workers it is managing.
As specified, the docker-compose.yml for your workers does not expose any ports. So my guess is that the workers connect to the master, and the master records this, which is why it sees the workers. however, when the master goes to make connections back to the workers there are no ports exposed and it times out.
So I'd expose the workers' ports in your docker-compose.yml as well.