Apache Pig permission problem
I'm attempting to get Apache Pig up and running on my Hadoop cluster, and am encountering a permissions problem. Pig itself launches and connects to the cluster just fine; from within the Pig shell, I can ls through and around my HDFS directories. However, when I try to actually load data and run Pig commands, I run into permissions-related errors:
grunt> A = load 'all_annotated.txt' USING PigStorage() AS (id:long, text:chararray, lang:chararray);
grunt> DUMP A;
2011-08-24 18:11:40,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=steven, access=WRITE, inode="":hadoop:supergroup:r-xr-xr-x
2011-08-24 18:11:40,977 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
Details at logfile: /Users/steven/Desktop/Hacking/hadoop/pig/pig-0.9.0/pig_1314230681326.log
grunt>
In this case, all_annotated.txt is a file in my HDFS home directory that I created and most definitely have permissions to; the same problem occurs no matter what file I try to load. However, I don't think that's the problem, as the error itself indicates Pig is trying to write somewhere. Googling around, I found a few mailing list posts suggesting that certain Pig Latin statements (order, etc.) need write access to a temporary directory on the HDFS file system, whose location is controlled by the hadoop.tmp.dir property in hdfs-site.xml. I don't think load falls into that category, but just to be sure, I changed hadoop.tmp.dir to point to a directory within my HDFS home directory, and the problem persisted.
So, anybody out there have any ideas as to what might be going on?
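For what it's worth, the empty inode="" in that AccessControlException appears to refer to the root of HDFS (/), not to the file being loaded, which suggests something is trying to create a directory directly under /. A quick way to check which top-level directories exist and who owns them is a plain listing from the shell; a minimal diagnostic sketch, assuming the hadoop client is on your PATH and /user/steven is your (hypothetical) home directory:

$ hadoop fs -ls /              # does /tmp exist, and what are its owner and mode?
$ hadoop fs -ls /user          # ownership and mode of each user's home directory
$ hadoop fs -ls /user/steven   # confirm the home directory itself is writable by you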
2 Answers
Probably your pig.temp.dir setting. It defaults to /tmp on HDFS, and Pig writes temporary results there. If you don't have permission to /tmp, Pig will complain. Try overriding it with -Dpig.temp.dir.
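A minimal sketch of that workaround, assuming /user/steven is your HDFS home directory (a hypothetical path; substitute your own):

$ hadoop fs -mkdir /user/steven/pig_temp          # a scratch directory you own on HDFS
$ pig -Dpig.temp.dir=/user/steven/pig_temp        # launch grunt with the override in place

Alternatively, if you administer the cluster, making the default location world-writable (for example hadoop fs -chmod 777 /tmp, run as the HDFS superuser) avoids having to set a per-user override.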
A problem might be that hadoop.tmp.dir is a directory on your local filesystem, not HDFS. Try setting that property to a local directory you know you have write access to. I've run into the same error using regular MapReduce in Hadoop.
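For reference, that property lives in the Hadoop configuration files rather than anything Pig-specific; a sketch of what the entry might look like, with /home/steven/hadoop-tmp as a purely illustrative local path:

<!-- core-site.xml (the question mentions hdfs-site.xml; per this answer, the value should be a local path the user can write to) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/steven/hadoop-tmp</value>
</property>

After changing it, the Hadoop daemons generally need a restart for the new value to take effect.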