"arrow is not supported when using file-based collect" when converting from pandas to Spark and vice versa

I am trying to use Arrow by enabling spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true"), but I am getting the following error:

    /databricks/spark/python/pyspark/sql/pandas/conversion.py:340: UserWarning: createDataFrame
    attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to
    true; however, failed by the reason below:
      [Errno 13] Permission denied: '/local_disk0/spark-0419ce26-a5d1-4c8a-b985-55ca5737a123/pyspark-f272e212-2760-40d2-9e6c-891f858a9a48/tmp92jv6g71'
    Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
    warnings.warn(msg)
    /databricks/spark/python/pyspark/sql/pandas/conversion.py:161: UserWarning: toPandas
    attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to
    true, but has reached the error below and can not continue. Note that
    'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect on failures in
    the middle of computation.
      arrow is not supported when using file-based collect
    warnings.warn(msg)
    Exception: arrow is not supported when using file-based collect
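
Both warnings point at the same pair of configuration keys, so a quick first step is to confirm what the session actually has set. A minimal check, assuming the live spark session of a Databricks notebook:

    # Inspect the two Arrow-related settings named in the warnings.
    # spark.conf.get raises for unset keys, so pass an explicit default.
    print(spark.conf.get("spark.sql.execution.arrow.pyspark.enabled", "not set"))
    print(spark.conf.get("spark.sql.execution.arrow.pyspark.fallback.enabled", "not set"))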

Our cluster runs Databricks Runtime 10.3 (includes Apache Spark 3.2.1, Scala 2.12), and the driver type is Standard_E32_V3.

Below is the code I tried, taken from the documentation (documentation link):

    import numpy as np
    import pandas as pd

    # Enable Arrow-based columnar data transfers
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

    # Generate a Pandas DataFrame
    pdf = pd.DataFrame(np.random.rand(100, 3))

    # Create a Spark DataFrame from a Pandas DataFrame using Arrow
    df = spark.createDataFrame(pdf)

    # Convert the Spark DataFrame back to a Pandas DataFrame using Arrow
    result_pdf = df.select("*").toPandas()
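
As a sanity check (not a fix), the same round trip can be forced down the non-Arrow path, which is what the fallback warning already attempts for createDataFrame. A minimal sketch, assuming the same notebook session:

    # Sanity check: disable Arrow so both conversions take the plain
    # (non-Arrow) serialization path instead of the one that failed above.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")

    pdf = pd.DataFrame(np.random.rand(100, 3))
    df = spark.createDataFrame(pdf)   # no Arrow optimization attempted
    result_pdf = df.toPandas()        # no Arrow optimization attempted

    # Re-enable Arrow afterwards if desired.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")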
