Is it possible to run a job through spark-submit without the Python modules on the executors?

Posted 2025-02-04 07:25:52 · 797 characters · 2 views · 0 comments

I have a cluster of 7 nodes.
1: master
2: nn1
3: nn2
4: dn1
5: dn2
6: dn3
7: gn

I installed the Python modules only on the gateway node (gn),
and I ran a Python script through spark-submit.

Below is my code:

from pyspark import SparkConf, SparkContext
import pandas as pd

# Spark configuration
conf = SparkConf().setAppName("uber-date-trips")
sc = SparkContext(conf=conf)

# data parsing
lines = sc.textFile("/user/tester/trips_2020-03.csv")
header = lines.first()
filtered_lines = lines.filter(lambda row: row != header)

# data filtering and counting same date data
dates = filtered_lines.map(lambda x: x.split(",")[2].split(" ")[0])
result = dates.countByValue()

# save the result to csv type
pd.Series(result, name="trips").to_csv("trips_date.csv")

I expected this job to fail,
because the Python modules are not installed on the executors (dn1, dn2, dn3).

But the job succeeded.

Can anybody explain why?

Thanks
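
A likely explanation, sketched below under some assumptions: the only code that runs on the executors is the two lambdas, and they use nothing beyond built-in str methods. countByValue() returns a plain dictionary to the driver, so pandas is only touched in the driver process, which runs on the gateway node if the job was submitted in client deploy mode (an assumption, since the question does not state the deploy mode). The executors still need a base Python interpreter for PySpark to run the lambdas, so presumably dn1, dn2 and dn3 have one. The sample rows and the extract_date helper are hypothetical, and Counter only stands in for countByValue() so the sketch runs without a cluster.

```python
from collections import Counter


def extract_date(row):
    # Same logic as the map() lambda in the question:
    # built-in str methods only, no third-party import needed.
    return row.split(",")[2].split(" ")[0]


# Hypothetical rows standing in for trips_2020-03.csv;
# the real column layout is not shown in the question.
sample_rows = [
    "id,vendor,pickup_datetime",
    "1,A,2020-03-01 00:10:00",
    "2,B,2020-03-01 08:30:00",
    "3,A,2020-03-02 12:00:00",
]

# "Executor side": drop the header, then extract the date from each row.
header = sample_rows[0]
dates = [extract_date(row) for row in sample_rows if row != header]

# Stand-in for countByValue(), which hands a dict of counts back to the driver.
result = Counter(dates)

# "Driver side": only from here on would pandas be needed,
# e.g. pd.Series(result, name="trips").to_csv("trips_date.csv").
print(dict(result))  # {'2020-03-01': 2, '2020-03-02': 1}
```

If pandas were called inside the function passed to map() or filter() instead, the import would have to resolve on the executors, and the job would be expected to fail there.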
