How to save a Pig bag in JSON format
I'm running Pig
example$ pig --version
Apache Pig version 0.8.1-cdh3u1 (rexported)
compiled Jul 18 2011, 08:29:40
on a very simple dataset
example$ hadoop fs -cat /user/pavel/trivial.log
1 one
2 two
3 three
I'm trying to store a bag as JSON using the following script:
REGISTER ./pig.jar;
A = LOAD 'trivial.log' USING PigStorage('\t') AS (mynum: int, mynumstr: chararray);
B = GROUP A BY mynum;
DUMP B;
STORE B into 'trivial_json.out' USING JsonStorage();
and I get an error:
Backend error message
---------------------
java.lang.NullPointerException
at org.apache.pig.ResourceSchema.<init>(ResourceSchema.java:239)
at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:129)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:85)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: java.lang.NullPointerException
org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:154)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
at org.apache.pig.PigServer.execute(PigServer.java:1201)
at org.apache.pig.PigServer.access$100(PigServer.java:129)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1528)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:373)
at org.apache.pig.PigServer.executeBatch(PigServer.java:340)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:396)
at org.apache.pig.Main.main(Main.java:107)
================================================================================
I'm not strong enough in Java to debug this quickly. Can somebody suggest what might be going on?
Thanks much!
-Pavel
Comments (1)
Got some questions regarding this by email, thought this might be useful for others:
It turned out that the JsonStorage class wasn't included in my Pig installation. In fact, it isn't in a stable branch yet; you won't find it in 0.9.2. But if you get the latest trunk
http://svn.apache.org/repos/asf/pig/trunk/
then trunk/test/org/apache/pig/test/TestJsonLoaderStorage.java shows how it works. If you are not averse to unreleased versions, you could try it. If you go this way, you should also take a look at Avro (a binary format with JSON metadata). It's not officially out either.
I was running a streaming Python job and wanted to avoid manually specifying the schema. I ended up passing Pig bags and parsing them by hand.
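For anyone taking the same route, here is a minimal sketch of what parsing a bag by hand can look like. It is not the exact code I used. It assumes each line arrives as the group key, a tab, and then the bag in Pig's default text form, e.g. 1<TAB>{(1,one)}, and that fields never contain commas, parentheses, or nested bags. The function names parse_bag and group_line_to_json are made up for this example; the field names mynum and mynumstr come from the script in the question.

```python
import json
import re

# Matches one flat tuple such as "(1,one)".
# Assumption: fields contain no parentheses and no nested tuples or bags.
TUPLE_RE = re.compile(r"\(([^()]*)\)")


def parse_bag(text):
    """Parse a flat Pig bag like '{(1,one),(1,uno)}' into a list of field lists."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not a bag: %r" % text)
    # Assumption: fields contain no commas, so a plain split is enough.
    return [m.group(1).split(",") for m in TUPLE_RE.finditer(text[1:-1])]


def group_line_to_json(line):
    """Turn one 'key<TAB>bag' line of the grouped relation into a JSON string."""
    key, bag = line.rstrip("\n").split("\t", 1)
    records = [{"mynum": int(f[0]), "mynumstr": f[1]} for f in parse_bag(bag)]
    return json.dumps({"group": int(key), "A": records}, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample lines shaped like the grouped relation B.
    for sample in ["1\t{(1,one)}", "2\t{(2,two)}", "3\t{(3,three)}"]:
        print(group_line_to_json(sample))
```

In a real streaming job you would loop over sys.stdin instead of the sample list. Anything with nested or quoted fields needs a proper parser rather than a regular expression.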
Hope this helps.
-Pavel