Pig Hadoop Streaming Help

Published 2024-11-26 19:56:45

I am having issues running Pig streaming. When I start up an interactive Pig instance with only one machine (FYI, I am doing this on the master node of an interactive Pig AWS EMR instance via SSH/PuTTY), my Pig streaming works perfectly (it also works on my Windows Cloudera VM image). However, when I switch to using more than one computer, it simply stops working and gives various errors.

Note that:

  • I am able to run Pig scripts that don't have any stream commands on a multi-computer instance with no problem.
  • All my Pig work is being done in Pig MapReduce mode rather than -x local mode.
  • My Python script (stream1.py) has this at the top: #!/usr/bin/env python

Below is a small sample of the options I have tried so far (all of the commands below are run in the grunt shell on the master/main node, which I am accessing via SSH/PuTTY):

This is how I get the Python file onto the master node so it can be used:

cp s3n://darin.emr-logs/stream1.py stream1.py
copyToLocal stream1.py /home/hadoop/stream1.py
chmod 755 stream1.py
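The actual contents of stream1.py are not shown in the question, but any script used with Pig's STREAM operator must follow the same contract: read records from stdin, write records to stdout. A hypothetical skeleton (the upper-casing logic is purely illustrative) looks like this:

```python
#!/usr/bin/env python
# Hypothetical stand-in for stream1.py: Pig streaming feeds each
# tuple to the script as a tab-delimited line on stdin, and reads
# tab-delimited output lines back from stdout.
import sys

def process(line):
    # Illustrative transformation only: upper-case every field.
    fields = line.rstrip("\n").split("\t")
    return "\t".join(f.upper() for f in fields)

if __name__ == "__main__":
    for line in sys.stdin:
        sys.stdout.write(process(line) + "\n")
```

If a script like this runs fine when fed a file on the master node but fails inside the job, the problem is usually that the script (or its interpreter) is missing on the task nodes, not the script itself.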

These are my various stream attempts:

cooc = stream ct_pag_ph through `stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127

cooc = stream ct_pag_ph through `python stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python stream1.py ' failed with exit status: 2

DEFINE X `stream1.py`; 
cooc = stream ct_bag_ph through X;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127

DEFINE X `stream1.py`; 
cooc = stream ct_bag_ph through `python X`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python X ' failed with exit status: 2

DEFINE X `stream1.py` SHIP('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
ERROR 2017: Internal error creating job configuration.

DEFINE X `stream1.py` SHIP('/stream1.p');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;

DEFINE X `stream1.py` SHIP('stream1.py') CACHE('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
ERROR 2017: Internal error creating job configuration.

define X 'python /home/hadoop/stream1.py' SHIP('/home/hadoop/stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
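For reference, the two exit statuses in the errors above have conventional shell meanings: 127 is "command not found" (the task node has no stream1.py on its PATH), and 2 here is Python starting but failing to open the script file, since it was only copied to the master node. A quick local sketch reproducing both codes (assuming python3 is installed):

```python
import subprocess

# Exit status 127: the shell cannot find the command at all --
# what a task node sees when `stream1.py` was never shipped to it.
rc_missing_cmd = subprocess.call("definitely_not_a_real_command_xyz", shell=True)

# Exit status 2: the interpreter starts but cannot open the script --
# what `python stream1.py` yields on a node that lacks the file.
rc_missing_script = subprocess.call("python3 no_such_script_xyz.py", shell=True)
```

This is why the fixes below revolve around SHIP: it is the mechanism that distributes the script to every task node.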

Comments (1)

一生独一 2024-12-03 19:56:45

DEFINE X `stream1.py` SHIP('stream1.py');

Appears valid to me according to your preconditions, provided stream1.py is in your current local directory.

A way to be sure of this:

DEFINE X `python stream1.py` SHIP('/local/path/stream1.py');

The goal of SHIP is to copy the command into the working directory of all the tasks.
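Putting the answer's advice together with the aliases from the question, the form that typically works on a multi-node cluster is a sketch like this (assuming /home/hadoop/stream1.py is the local path on the node where grunt runs; SHIP then places a copy named stream1.py in each task's working directory):

```pig
DEFINE X `python stream1.py` SHIP('/home/hadoop/stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
DUMP cooc;
```

Invoking the script via `python stream1.py` rather than `stream1.py` also sidesteps any shebang or execute-permission issues on the task nodes.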
