How to programmatically get the number of available nodes on a Spark cluster
I am trying to remove a manual step from my spark-submit by having my Java Spark application automatically calculate the number of available cores to partition on. The hope was to find a way to do this programmatically.

I did look at this solution ([SO question] Spark: get number of cluster cores programmatically), but I'm not sure how to implement the "EncapsulationViolator" workaround needed to make blockManager.master.getStorageStatus.length - 1 work in Java. I have also tried sc.getExecutorStorageStatus.length - 1, to no avail. I was able to get the number of cores via java.lang.Runtime.getRuntime().availableProcessors(), but the number of nodes/workers/executors still eludes me.

Hoping someone has a suggestion on how to get the number of executors beyond what has already been suggested. I'm on Spark 3.0 and writing in Java.
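For reference, one public-API way to ask the driver about its executors from Java is the status tracker, which avoids the private BlockManager internals mentioned above. A minimal sketch, assuming a standard cluster deployment (the executor list reported by the status tracker typically includes the driver itself, hence the -1; executors also register asynchronously at startup, so this may under-count if called immediately):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ClusterSize {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("cluster-size");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Executors currently known to the driver; subtract 1 for the
        // driver's own entry when running against a real cluster.
        int executors = jsc.statusTracker().getExecutorInfos().length - 1;

        // Total cores Spark will use for default partitioning
        // (often the more directly useful number for sizing partitions).
        int defaultParallelism = jsc.defaultParallelism();

        System.out.println("executors=" + executors
                + " defaultParallelism=" + defaultParallelism);
        jsc.stop();
    }
}
```

If all you need is a partition count rather than a node count, `defaultParallelism()` alone may be enough, since it already reflects total executor cores on most cluster managers.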
1 Answer
I ended up using an AWS CLI shell script to query the cluster for this information. First I had to get the cluster ID, then the number of running instances:
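The exact commands aren't shown above; on EMR the two lookups might look roughly like this (a sketch, assuming the AWS CLI is configured and exactly one active cluster, not necessarily the script the author used):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Grab the id of the first active EMR cluster
# (assumes exactly one cluster is running).
CLUSTER_ID=$(aws emr list-clusters --active \
    --query 'Clusters[0].Id' --output text)

# Count the instances currently in the RUNNING state on that cluster.
NUM_INSTANCES=$(aws emr list-instances --cluster-id "$CLUSTER_ID" \
    --instance-states RUNNING \
    --query 'length(Instances)' --output text)

echo "$NUM_INSTANCES"
```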
I then called this from within the master node of the cluster and multiplied it by the per-node core count (from availableProcessors). This got me the total number of partitions programmatically.
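The final multiplication can be sketched like this, with `NUM_INSTANCES` hard-coded as a stand-in for the CLI query result, and `nproc` standing in for what `Runtime.getRuntime().availableProcessors()` reports on each node:

```shell
NUM_INSTANCES=4                 # stand-in for the value returned by the AWS CLI query
CORES_PER_NODE=$(nproc)         # per-node cores, analogous to availableProcessors()
TOTAL_PARTITIONS=$((NUM_INSTANCES * CORES_PER_NODE))
echo "total partitions: $TOTAL_PARTITIONS"
```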