Hadoop Pig Latin unable to stream through a Python script
I have a simple Python script (moo.py) that I am trying to stream through:

import sys

# emit a 1 for every line read from stdin
for line in sys.stdin:
    print(1)
I then try to run this Pig script:
DEFINE CMD `python moo.py` SHIP('moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data THROUGH CMD;
DUMP res;
When I run this Pig script locally (pig -x local), everything is fine, but when I run it without -x local, it prints this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
[Log file]
Caused by: java.io.FileNotFoundException: File moo.py does not exist.
Any ideas?
2 Answers
It's most likely an issue with a relative path. Try giving SHIP() an absolute local path, as sketched below. It can also be an issue of read/write/execute permissions.
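A minimal sketch of that fix, assuming moo.py lives at a hypothetical local path on the machine submitting the job; the shipped file lands in each task's working directory, so the command still refers to it by its base name:

-- /home/hadoop/scripts is a hypothetical local path
DEFINE CMD `python moo.py` SHIP('/home/hadoop/scripts/moo.py');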
The problem was that I used the ship() function instead of cache(). While ship() works fine for passing local files from the master to the slaves, cache() is used by the slaves to fetch files from an accessible place, such as S3 on Amazon.

Hope that helps anyone :]
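A minimal sketch of the corrected script under that approach; CACHE takes a 'remote_path#local_name' pair, and the command refers to the local name (the s3://my-bucket/scripts path is a hypothetical upload location):

-- each slave fetches moo.py from S3 into the task's working directory
DEFINE CMD `python moo.py` CACHE('s3://my-bucket/scripts/moo.py#moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data THROUGH CMD;
DUMP res;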