py4jexception:constructor org.apache.spark.sql.sparksession([[class org.apache.spark.spark.sparkcontext,class java.util.hashmap])

发布于 2025-02-13 09:11:25 字数 11314 浏览 3 评论 0原文

我试图通过Visual Studio代码在EC2 Linux机器上的Jupyter笔记本电脑中运行Spark会话。我的代码看起来如下:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark_app").getOrCreate()

错误是:

{
    "name": "Py4JError",
    "message": "An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n",
    "stack": "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mPy4JError\u001b[0m                                 Traceback (most recent call last)\n\u001b[1;32mc:\\Users\\IrinaKaerkkaenen\\Projekte\\ZugPortal\\test.ipynb Cell 3'\u001b[0m in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mpyspark\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39msql\u001b[39;00m \u001b[39mimport\u001b[39;00m SparkSession\n\u001b[0;32m----> <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=1'>2</a>\u001b[0m spark \u001b[39m=\u001b[39m SparkSession\u001b[39m.\u001b[39;49mbuilder\u001b[39m.\u001b[39;49mappName(\u001b[39m\"\u001b[39;49m\u001b[39mspark_app\u001b[39;49m\u001b[39m\"\u001b[39;49m)\u001b[39m.\u001b[39;49mgetOrCreate()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:272\u001b[0m, in \u001b[0;36mSparkSession.Builder.getOrCreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    269\u001b[0m     sc \u001b[39m=\u001b[39m SparkContext\u001b[39m.\u001b[39mgetOrCreate(sparkConf)\n\u001b[1;32m    270\u001b[0m     \u001b[39m# Do not update `SparkConf` for existing `SparkContext`, as it's shared\u001b[39;00m\n\u001b[1;32m    271\u001b[0m     \u001b[39m# by all sessions.\u001b[39;00m\n\u001b[0;32m--> 272\u001b[0m     session \u001b[39m=\u001b[39m SparkSession(sc, options\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_options)\n\u001b[1;32m    273\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    274\u001b[0m     \u001b[39mgetattr\u001b[39m(\n\u001b[1;32m    275\u001b[0m         \u001b[39mgetattr\u001b[39m(session\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m    276\u001b[0m     )\u001b[39m.\u001b[39mapplyModifiableSettings(session\u001b[39m.\u001b[39m_jsparkSession, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_options)\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:307\u001b[0m, in \u001b[0;36mSparkSession.__init__\u001b[0;34m(self, sparkContext, jsparkSession, options)\u001b[0m\n\u001b[1;32m    303\u001b[0m         \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    304\u001b[0m             jsparkSession, options\n\u001b[1;32m    305\u001b[0m         )\n\u001b[1;32m    306\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 307\u001b[0m         jsparkSession \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jvm\u001b[39m.\u001b[39;49mSparkSession(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jsc\u001b[39m.\u001b[39;49msc(), options)\n\u001b[1;32m    308\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    309\u001b[0m     \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    310\u001b[0m         jsparkSession, options\n\u001b[1;32m    311\u001b[0m     )\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/java_gateway.py:1585\u001b[0m, in \u001b[0;36mJavaClass.__call__\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m   1579\u001b[0m command \u001b[39m=\u001b[39m proto\u001b[39m.\u001b[39mCONSTRUCTOR_COMMAND_NAME \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1580\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_command_header \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1581\u001b[0m     args_command \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1582\u001b[0m     proto\u001b[39m.\u001b[39mEND_COMMAND_PART\n\u001b[1;32m   1584\u001b[0m answer \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_gateway_client\u001b[39m.\u001b[39msend_command(command)\n\u001b[0;32m-> 1585\u001b[0m return_value \u001b[39m=\u001b[39m get_return_value(\n\u001b[1;32m   1586\u001b[0m     answer, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_gateway_client, \u001b[39mNone\u001b[39;49;00m, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_fqn)\n\u001b[1;32m   1588\u001b[0m \u001b[39mfor\u001b[39;00m temp_arg \u001b[39min\u001b[39;00m temp_args:\n\u001b[1;32m   1589\u001b[0m     temp_arg\u001b[39m.\u001b[39m_detach()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/protocol.py:330\u001b[0m, in \u001b[0;36mget_return_value\u001b[0;34m(answer, gateway_client, target_id, name)\u001b[0m\n\u001b[1;32m    326\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JJavaError(\n\u001b[1;32m    327\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    328\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name), value)\n\u001b[1;32m    329\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 330\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    331\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m. Trace:\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m{3}\u001b[39;00m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    332\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name, value))\n\u001b[1;32m    333\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    334\u001b[0m     \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    335\u001b[0m         \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    336\u001b[0m         \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name))\n\n\u001b[0;31mPy4JError\u001b[0m: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n"
}

在阅读文本编辑器中的完整错误之前,运行单元格的输出是

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
/tmp/ipykernel_5260/8684085.py in <module>
      1 from pyspark.sql import SparkSession
----> 2 spark = SparkSession.builder.appName("spark_app").getOrCreate()

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
    270                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    271                     # by all sessions.
--> 272                     session = SparkSession(sc, options=self._options)
    273                 else:
    274                     getattr(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in __init__(self, sparkContext, jsparkSession, options)
    305                 )
    306             else:
--> 307                 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
    308         else:
    309             getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1584         answer = self._gateway_client.send_command(command)
   1585         return_value = get_return_value(
-> 1586             answer, self._gateway_client, None, self._fqn)
   1587 
...
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)

我在没有成功的情况下进行了很多谷歌搜索。有人知道怎么了吗?

我将IPython内核安装了3.9 python。

错误发生之前的警告:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ec2-user/spark/spark-3.1.2-bin-hadoop2.7/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/07/05 21:06:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

I am trying to run a spark session in the Jupyter Notebook on a EC2 Linux machine via Visual Studio Code. My code looks as following:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark_app").getOrCreate()

the error is:

{
    "name": "Py4JError",
    "message": "An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n",
    "stack": "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mPy4JError\u001b[0m                                 Traceback (most recent call last)\n\u001b[1;32mc:\\Users\\IrinaKaerkkaenen\\Projekte\\ZugPortal\\test.ipynb Cell 3'\u001b[0m in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mpyspark\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39msql\u001b[39;00m \u001b[39mimport\u001b[39;00m SparkSession\n\u001b[0;32m----> <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=1'>2</a>\u001b[0m spark \u001b[39m=\u001b[39m SparkSession\u001b[39m.\u001b[39;49mbuilder\u001b[39m.\u001b[39;49mappName(\u001b[39m\"\u001b[39;49m\u001b[39mspark_app\u001b[39;49m\u001b[39m\"\u001b[39;49m)\u001b[39m.\u001b[39;49mgetOrCreate()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:272\u001b[0m, in \u001b[0;36mSparkSession.Builder.getOrCreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    269\u001b[0m     sc \u001b[39m=\u001b[39m SparkContext\u001b[39m.\u001b[39mgetOrCreate(sparkConf)\n\u001b[1;32m    270\u001b[0m     \u001b[39m# Do not update `SparkConf` for existing `SparkContext`, as it's shared\u001b[39;00m\n\u001b[1;32m    271\u001b[0m     \u001b[39m# by all sessions.\u001b[39;00m\n\u001b[0;32m--> 272\u001b[0m     session \u001b[39m=\u001b[39m SparkSession(sc, options\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_options)\n\u001b[1;32m    273\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    274\u001b[0m     \u001b[39mgetattr\u001b[39m(\n\u001b[1;32m    275\u001b[0m         \u001b[39mgetattr\u001b[39m(session\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m    276\u001b[0m     )\u001b[39m.\u001b[39mapplyModifiableSettings(session\u001b[39m.\u001b[39m_jsparkSession, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_options)\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:307\u001b[0m, in \u001b[0;36mSparkSession.__init__\u001b[0;34m(self, sparkContext, jsparkSession, options)\u001b[0m\n\u001b[1;32m    303\u001b[0m         \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    304\u001b[0m             jsparkSession, options\n\u001b[1;32m    305\u001b[0m         )\n\u001b[1;32m    306\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 307\u001b[0m         jsparkSession \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jvm\u001b[39m.\u001b[39;49mSparkSession(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jsc\u001b[39m.\u001b[39;49msc(), options)\n\u001b[1;32m    308\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    309\u001b[0m     \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    310\u001b[0m         jsparkSession, options\n\u001b[1;32m    311\u001b[0m     )\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/java_gateway.py:1585\u001b[0m, in \u001b[0;36mJavaClass.__call__\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m   1579\u001b[0m command \u001b[39m=\u001b[39m proto\u001b[39m.\u001b[39mCONSTRUCTOR_COMMAND_NAME \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1580\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_command_header \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1581\u001b[0m     args_command \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1582\u001b[0m     proto\u001b[39m.\u001b[39mEND_COMMAND_PART\n\u001b[1;32m   1584\u001b[0m answer \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_gateway_client\u001b[39m.\u001b[39msend_command(command)\n\u001b[0;32m-> 1585\u001b[0m return_value \u001b[39m=\u001b[39m get_return_value(\n\u001b[1;32m   1586\u001b[0m     answer, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_gateway_client, \u001b[39mNone\u001b[39;49;00m, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_fqn)\n\u001b[1;32m   1588\u001b[0m \u001b[39mfor\u001b[39;00m temp_arg \u001b[39min\u001b[39;00m temp_args:\n\u001b[1;32m   1589\u001b[0m     temp_arg\u001b[39m.\u001b[39m_detach()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/protocol.py:330\u001b[0m, in \u001b[0;36mget_return_value\u001b[0;34m(answer, gateway_client, target_id, name)\u001b[0m\n\u001b[1;32m    326\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JJavaError(\n\u001b[1;32m    327\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    328\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name), value)\n\u001b[1;32m    329\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 330\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    331\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m. Trace:\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m{3}\u001b[39;00m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    332\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name, value))\n\u001b[1;32m    333\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    334\u001b[0m     \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    335\u001b[0m         \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    336\u001b[0m         \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name))\n\n\u001b[0;31mPy4JError\u001b[0m: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n"
}

The output of running the cell before I read the full error in text editor is the following

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
/tmp/ipykernel_5260/8684085.py in <module>
      1 from pyspark.sql import SparkSession
----> 2 spark = SparkSession.builder.appName("spark_app").getOrCreate()

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
    270                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    271                     # by all sessions.
--> 272                     session = SparkSession(sc, options=self._options)
    273                 else:
    274                     getattr(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in __init__(self, sparkContext, jsparkSession, options)
    305                 )
    306             else:
--> 307                 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
    308         else:
    309             getattr(getattr(self._jvm, "SparkSession
quot;), "MODULE
quot;).applyModifiableSettings(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1584         answer = self._gateway_client.send_command(command)
   1585         return_value = get_return_value(
-> 1586             answer, self._gateway_client, None, self._fqn)
   1587 
...
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)

I have googled a lot without a success. Does anybody has an idea what is wrong?

I use IPython Kernel with 3.9 Python installed.

The warnings before the error comes:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ec2-user/spark/spark-3.1.2-bin-hadoop2.7/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/07/05 21:06:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

一百个冬季 2025-02-20 09:11:25

我遇到了同样的问题,我已经修复了从PIP和SPARK中安装相同版本的Pyspark。您应该检查安装版本是否相同。

I had the same problem, I've fixed it installing the same version of pyspark from pip and spark. you should check if your installed versions are the same.

行雁书 2025-02-20 09:11:25

看来,计算机中安装的spark的版本与pyspark的版本不匹配。

使用以下命令检查Spark的版本:

<path-to-spark-bin>/spark-submit --version
# example output
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.3
      /_/

现在正如示例输出所显示的那样,已安装的Spark版本为3.1.3,因此您需要通过执行同一版本安装Spark(Pyspark)的Python库(Pyspark)以下命令:

pip install pyspark==<the-version-of-your-spark>
# Example
pip install pyspark==3.1.3

It appears that the version of spark installed in your machine does not match the version of pyspark.

Check the version of your spark using the following command:

<path-to-spark-bin>/spark-submit --version
# example output
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.3
      /_/

Now as it appears from the sample output, the version of installed Spark version is 3.1.3, so you need to install the python library of spark (pyspark) with the same version by executing the following command:

pip install pyspark==<the-version-of-your-spark>
# Example
pip install pyspark==3.1.3
韶华倾负 2025-02-20 09:11:25

pip install pyspark == 3.2.1

#它也对我有用

pip install pyspark==3.2.1

#It worked for me also

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文