Running Python on Hadoop
I am trying to run a very simple python script via hive and hadoop.
This is my script:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.strip()
    nums = line.split()
    i = nums[0]
    print i
And I want to run it on the following table:
hive> select * from test;
OK
1 3
2 2
3 1
Time taken: 0.071 seconds
hive> desc test;
OK
col1 int
col2 string
Time taken: 0.215 seconds
I am running:
hive> select transform (col1, col2) using './proba.py' from test;
But I always get something like:
...
2011-11-18 12:23:32,646 Stage-1 map = 0%, reduce = 0%
2011-11-18 12:23:58,792 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201110270917_20215 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I have tried many different modifications of this procedure but I constantly fail. :(
Am I doing something wrong, or is there a problem with my hive/hadoop installation?
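One frequent cause of a TRANSFORM script dying (which Hive then surfaces as the generic "return code 2" error) is an unhandled exception inside the script itself: here `nums[0]` raises IndexError on any blank or trailing empty line. A defensive sketch of the same logic, using `print()` so it runs under Python 2 or 3 (this is an illustration of the failure mode, not necessarily the asker's actual bug):

```python
import sys

def first_fields(lines):
    """Return the first whitespace-separated field of each non-empty line."""
    result = []
    for line in lines:
        fields = line.strip().split()
        # A blank line yields an empty list, so nums[0] in the original
        # script would raise IndexError and kill the whole task.
        if fields:
            result.append(fields[0])
    return result

if __name__ == "__main__":
    # Hive's TRANSFORM feeds rows to stdin tab-separated, one row per line.
    for value in first_fields(sys.stdin):
        print(value)
```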
Comments (3)
A few things I'd check if I were debugging this:
1) Is the python file set to be executable (chmod +x file.py)?
2) Make sure the python file is in the same place on all machines. Probably better: put the file in HDFS, then you can use "using 'hdfs://path/to/file.py'" instead of a local path.
3) Take a look at your job on the hadoop dashboard (http://master-node:9100); if you click on a failed task it will give you the actual java error and stack trace, so you can see what actually went wrong with the execution.
4) Make sure python is installed on all the slave nodes! (I always overlook this one.)
Hope that helps...
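The distribution problem in point 2 is also commonly handled with Hive's ADD FILE, which ships a local file to each task node's working directory before the job runs. A sketch, assuming the script lives at /home/user/proba.py (adjust the path to yours):

```sql
-- Ship the script to every task node's working directory
ADD FILE /home/user/proba.py;

-- Invoke the interpreter explicitly so the execute bit is not required
SELECT TRANSFORM (col1, col2)
USING 'python proba.py'
FROM test;
```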
Check hive.log and/or the log from the hadoop job (job_201110270917_20215 in your example) for a more detailed error message.
"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask" is a generic error that hive returns when something goes wong in the underlying map/reduce task. You need to go to hive log files(located on the HiveServer2 machine) and find the actual exception stack trace.