PySpark - No such file or directory
I installed Anaconda, and I get this error when working with PySpark: "No such file or directory". Which file or directory is it referring to? I also see an ERROR TaskSetManager in the output.
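For context, the failure is raised from a Jupyter notebook the moment an action runs on a DataFrame. Below is a minimal sketch of the kind of session that triggers it; the sample data is hypothetical, and only the `df.show()` call is taken from the traceback further down:

```python
from pyspark.sql import SparkSession

# Start a local SparkSession. When an action runs, Spark forks a separate
# Python worker process using whatever interpreter path it was configured
# with -- that launch is what fails in the traceback below.
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# Hypothetical sample data; any action (show, collect, count) forces the
# worker launch and so reproduces the IOException.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
```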
22/07/03 13:26:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 31 more
22/07/03 13:26:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (mypc executor driver): java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 31 more
22/07/03 13:26:32 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
[Stage 0:>                                                          (0 + 0) / 1]
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
/tmp/ipykernel_5618/3726558592.py in <module>
----> 1 df.show()

~/anaconda3/lib/python3.8/site-packages/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
    482         """
    483         if isinstance(truncate, bool) and truncate:
--> 484             print(self._jdf.showString(n, 20, vertical))
    485         else:
    486             print(self._jdf.showString(n, int(truncate), vertical))

~/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1307
   1308         answer = self.gateway_client.send_command(command)
-> 1309         return_value = get_return_value(
   1310             answer, self.gateway_client, self.target_id, self.name)
   1311

~/anaconda3/lib/python3.8/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
    109     def deco(*a, **kw):
    110         try:
--> 111             return f(*a, **kw)
    112         except py4j.protocol.Py4JJavaError as e:
    113             converted = convert_exception(e.java_exception)

~/anaconda3/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (mypc executor driver): java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 31 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:472)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:425)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3696)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2722)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2722)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2929)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:301)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:338)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot run program "/home/anaconda3/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 31 more
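The path Spark cannot run, /home/anaconda3/bin/python3, is the interpreter it was told to use for its Python workers; error=2 is ENOENT, meaning that file does not exist. Not part of the original report, but a sketch of how one might check, from the same notebook, which interpreter is configured versus which one the driver is actually running:

```python
import os
import sys

# Interpreter running the driver (this notebook kernel).
print("driver python: ", sys.executable)

# Interpreter Spark execs for worker processes, if explicitly set.
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))

# Does the path from the error message exist? error=2 (ENOENT) says no.
print(os.path.exists("/home/anaconda3/bin/python3"))
```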