pyspark - No such file or directory

I installed Anaconda. When working with PySpark I get this error: "No such file or directory". What file or directory is it referring to? I also see an ERROR from TaskSetManager.
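
The file the exception names is the Python worker executable that Spark tries to launch: /home/anaconda3/bin/python3. A minimal sketch for checking whether that path exists and where the interpreter actually lives (the ~/anaconda3 location is an assumption about a typical per-user Anaconda install):

    import os
    import shutil

    # Path taken verbatim from the exception. Note it has no user segment:
    # a default per-user Anaconda install lives at /home/<user>/anaconda3,
    # not /home/anaconda3, so this is expected to print False.
    print(os.path.exists("/home/anaconda3/bin/python3"))

    # Where a per-user install would typically put the interpreter:
    print(os.path.exists(os.path.expanduser("~/anaconda3/bin/python3")))

    # Whatever python3 is actually on PATH (None if nothing is found):
    print(shutil.which("python3"))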


22/07/03 13:26:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
  at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
  at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
  at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
  at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
  at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
  at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
  at java.lang.UNIXProcess.forkAndExec(Native Method)
  at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
  at java.lang.ProcessImpl.start(ProcessImpl.java:134)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
  ... 31 more
22/07/03 13:26:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (mypc executor driver): java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 31 more

22/07/03 13:26:32 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
[Stage 0:>                                                          (0 + 0) / 1]
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
/tmp/ipykernel_5618/3726558592.py in <module>
----> 1 df.show()

~/anaconda3/lib/python3.8/site-packages/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
    482         """
    483         if isinstance(truncate, bool) and truncate:
--> 484             print(self._jdf.showString(n, 20, vertical))
    485         else:
    486             print(self._jdf.showString(n, int(truncate), vertical))

~/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1307 
   1308         answer = self.gateway_client.send_command(command)
-> 1309         return_value = get_return_value(
   1310             answer, self.gateway_client, self.target_id, self.name)
   1311 

~/anaconda3/lib/python3.8/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
    109     def deco(*a, **kw):
    110         try:
--> 111             return f(*a, **kw)
    112         except py4j.protocol.Py4JJavaError as e:
    113             converted = convert_exception(e.java_exception)

~/anaconda3/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (mypc executor driver): java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 31 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
  at scala.Option.foreach(Option.scala:407)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:472)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:425)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3696)
  at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2722)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2722)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2929)
  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:301)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:338)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
  at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
  at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
  at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
  at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
  at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
  at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  ... 1 more
Caused by: java.io.IOException: error=2, No such file or directory
  at java.lang.UNIXProcess.forkAndExec(Native Method)
  at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
  at java.lang.ProcessImpl.start(ProcessImpl.java:134)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
  ... 31 more
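
If the configured path is indeed stale, one common remedy is to point Spark at an interpreter that actually exists before the SparkSession is created, via the PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON environment variables or the equivalent spark.pyspark.python conf. A minimal sketch, assuming the interpreter running the notebook is the one that should serve the workers (the app name is just a placeholder):

    import os
    import sys

    # Make driver and workers use the interpreter running this notebook/script,
    # which is guaranteed to exist. Must be set before the session starts.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("python-path-check")  # placeholder name
        .config("spark.pyspark.python", sys.executable)  # equivalent Spark conf
        .getOrCreate()
    )

    spark.range(5).show()  # should now launch Python workers without the IOException

In local mode the environment variables alone are usually enough; the conf form also carries over to spark-submit.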
