Running Python on Hadoop
I am trying to run a very simple python script via hive and hadoop.
This is my script:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.strip()
    nums = line.split()
    i = nums[0]
    print i
And I want to run it on the following table:
hive> select * from test;
OK
1 3
2 2
3 1
Time taken: 0.071 seconds
hive> desc test;
OK
col1 int
col2 string
Time taken: 0.215 seconds
I am running:
hive> select transform (col1, col2) using './proba.py' from test;
But I always get something like:
...
2011-11-18 12:23:32,646 Stage-1 map = 0%, reduce = 0%
2011-11-18 12:23:58,792 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201110270917_20215 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I have tried many different modifications of this procedure but I constantly fail. :(
Am I doing something wrong, or is there a problem with my hive/hadoop installation?
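One frequent cause of a TRANSFORM script dying (which Hive then surfaces as the generic "return code 2" error) is an unhandled exception inside the script itself: here `nums[0]` raises IndexError on any blank or trailing empty line. A defensive sketch of the same logic, using `print()` so it runs under Python 2 or 3 (this is an illustration of the failure mode, not necessarily the asker's actual bug):

```python
import sys

def first_fields(lines):
    """Return the first whitespace-separated field of each non-empty line."""
    result = []
    for line in lines:
        fields = line.strip().split()
        # A blank line yields an empty list, so nums[0] in the original
        # script would raise IndexError and kill the whole task.
        if fields:
            result.append(fields[0])
    return result

if __name__ == "__main__":
    # Hive's TRANSFORM feeds rows to stdin tab-separated, one row per line.
    for value in first_fields(sys.stdin):
        print(value)
```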
Comments (3)
A few things I'd check if I were debugging this:
1) Is the python file set to be executable (chmod +x file.py)?
2) Make sure the python file is in the same place on all machines. Probably better: put the file in HDFS, then you can use "using 'hdfs://path/to/file.py'" instead of a local path.
3) Take a look at your job on the hadoop dashboard (http://master-node:9100); if you click on a failed task it will give you the actual java error and stack trace, so you can see what actually went wrong with the execution.
4) Make sure python is installed on all the slave nodes! (I always overlook this one.)
Hope that helps...
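The distribution problem in point 2 is also commonly handled with Hive's ADD FILE, which ships a local file to each task node's working directory before the job runs. A sketch, assuming the script lives at /home/user/proba.py (adjust the path to yours):

```sql
-- Ship the script to every task node's working directory
ADD FILE /home/user/proba.py;

-- Invoke the interpreter explicitly so the execute bit is not required
SELECT TRANSFORM (col1, col2)
USING 'python proba.py'
FROM test;
```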
Check hive.log and/or the log from the hadoop job (job_201110270917_20215 in your example) for a more detailed error message.
"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask" is a generic error that hive returns when something goes wong in the underlying map/reduce task. You need to go to hive log files(located on the HiveServer2 machine) and find the actual exception stack trace.