Is it possible for a spark-submit job to succeed without Python modules on the executors?
I have a cluster of 7 nodes:
1: master
2: nn1
3: nn2
4: dn1
5: dn2
6: dn3
7: gn
I installed the Python modules only on the gateway node, and I ran a Python script through spark-submit.
Below is my code:
from pyspark import SparkConf, SparkContext
import pandas as pd

# Spark configuration
conf = SparkConf().setAppName("uber-date-trips")
sc = SparkContext(conf=conf)

# load the data and drop the header row
lines = sc.textFile("/user/tester/trips_2020-03.csv")
header = lines.first()
filtered_lines = lines.filter(lambda row: row != header)

# take the date part of the third comma-separated field and count rows per date
dates = filtered_lines.map(lambda x: x.split(",")[2].split(" ")[0])
result = dates.countByValue()

# save the result as CSV (this line runs in the driver process, not on the executors)
pd.Series(result, name="trips").to_csv("trips_date.csv")
I thought this job would fail, because the Python modules weren't installed on the executors (dn1, dn2, dn3).
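To show what I mean, here is a hypothetical variant of the map step (the function name parse_with_pandas and the use of pd.to_datetime are illustrative, not part of my actual job). Because the function shipped to the workers imports pandas itself, this version really would need pandas on dn1, dn2 and dn3:

def parse_with_pandas(row):
    # this import executes inside the executor's Python worker,
    # not in the driver on the gateway node
    import pandas as pd
    return pd.to_datetime(row.split(",")[2]).date()

# swapping this into the script above should raise an ImportError
# on the executors if pandas is not installed there
dates = filtered_lines.map(parse_with_pandas)
result = dates.countByValue()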
But the job succeeded. Can anybody explain this to me?
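In case it is useful, here is a minimal probe I could run to check whether the executors can import pandas at all (a sketch reusing the sc from the script above; the partition count 12 is arbitrary, and tasks are not guaranteed to land on every worker node):

def has_pandas(_):
    try:
        import pandas
        return "pandas " + pandas.__version__
    except ImportError:
        return "no pandas"

# run a handful of tasks on the executors and collect the distinct answers
print(sc.parallelize(range(12), 12).map(has_pandas).distinct().collect())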
Thanks