PySpark application cannot find Python when using a UDF
When I try to use a UDF in my local PySpark application, an error is produced saying Python cannot be found.
Sample UDF I am using:
from pyspark.sql.types import StringType
import pyspark.sql.functions as sql_func

test_udf = sql_func.udf(lambda x: x.lower(), StringType())
df.withColumn("Sample", test_udf("SampleCol")).show(5)
Error message:
Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
22/06/30 10:49:45 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
22/06/30 10:49:46 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 7)
org.apache.spark.SparkException: Python worker failed to connect back.
However, this error only happens when a Spark UDF is actually evaluated. I can convert the Spark DataFrame to a pandas DataFrame and apply the same function without any issue, but as soon as I force evaluation with df.show() after the UDF is applied, it fails.
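For comparison, this is roughly the pandas equivalent that works fine, presumably because it runs entirely in the driver process and never has to spawn a Python worker:

# Collect to the driver and apply the same lowercasing with pandas
pandas_df = df.toPandas()
pandas_df["Sample"] = pandas_df["SampleCol"].str.lower()
print(pandas_df.head(5))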
Notes about my setup:
- Running on Windows 10
- Using a virtual environment with Python 3.9.6
- Spark 3.3
Things I have tried that did not work:
- Disabling the short cut from Settings > Manage App Execution Aliases
- Adding PYSPARK_PYTHON to my system environment variables (a sketch of what I mean is below)
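The in-code equivalent of what I mean by setting PYSPARK_PYTHON would be something like this, set before the SparkSession is created (just a sketch; sys.executable points at the venv's python.exe when the script is launched from the venv):

import os
import sys

# Tell Spark which interpreter the driver and the Python workers should use
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable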
Thank you for your help!