对 Hive 中存储的数据运行 Pig 查询

发布于 2024-11-02 22:42:34 字数 452 浏览 3 评论 0原文

我想知道如何运行以 Hive 格式存储的 Pig 查询。我已将 Hive 配置为存储压缩数据(使用本教程 http://wiki.apache.org/ hadoop/Hive/压缩存储)。

在此之前,我曾经只使用带有 Hive 分隔符 (^A) 的普通 Pig 加载函数。但现在 Hive 将数据存储在压缩的序列文件中。使用哪个加载函数?

请注意,不需要像这里提到的那样紧密集成:Using Hive with Pig,只需加载即可函数用于读取 Hive 生成​​的压缩序列文件。

感谢所有的答案。

I would like to know how to run Pig queries stored in Hive format. I have configured Hive to store compressed data (using this tutorial http://wiki.apache.org/hadoop/Hive/CompressedStorage).

Before that I used to just use normal Pig load function with Hive's delimiter (^A). But now Hive stores data in sequence files with compression. Which load function to use?

Note that don't need close integration like mentioned here: Using Hive with Pig, just what load function to use to read compressed sequence files generated by Hive.

Thanks for all the answers.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

沩ん囻菔务 2024-11-09 22:42:34

这是我发现的:
如果您将数据存储为 RCFile,则使用 HiveColumnarLoader 很有意义。要使用此加载表,您需要首先注册一些 jar:

register /srv/pigs/piggybank.jar
register /usr/lib/hive/lib/hive-exec-0.5.0.jar
register /usr/lib/hive/lib/hive-common-0.5.0.jar

a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('ts int, user_id int, url string');

要从序列文件加载数据,您必须使用 PiggyBank(如前面的示例所示)。 Piggybank 的 SequenceFile 加载器应该处理压缩文件:

register /srv/pigs/piggybank.jar
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
a = LOAD '/user/hive/warehouse/table' USING SequenceFileLoader AS (int, int);

这不适用于 Pig 0.7,因为它无法读取 BytesWritable 类型并将其转换为 Pig 类型,并且您会得到此异常:

2011-07-01 10:30:08,589 WARN org.apache.pig.piggybank.storage.SequenceFileLoader: Unable to translate key class org.apache.hadoop.io.BytesWritable to a Pig datatype
2011-07-01 10:30:08,625 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class org.apache.hadoop.io.BytesWritable to a Pig datatype
    at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
    at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:132)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)

此处描述了如何编译 Piggybank:无法建立存钱罐-> /home/build/ivy/lib 不存在

Here's what I found out:
Using HiveColumnarLoader makes sense if you store data as a RCFile. To load table using this you need to register some jars first:

register /srv/pigs/piggybank.jar
register /usr/lib/hive/lib/hive-exec-0.5.0.jar
register /usr/lib/hive/lib/hive-common-0.5.0.jar

a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('ts int, user_id int, url string');

To load data from Sequence file you have to use PiggyBank (as in previous example). SequenceFile loader from Piggybank should handle compressed files:

register /srv/pigs/piggybank.jar
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
a = LOAD '/user/hive/warehouse/table' USING SequenceFileLoader AS (int, int);

This doesn't work with Pig 0.7 because it's unable to read BytesWritable type and cast it to Pig type and you get this exception:

2011-07-01 10:30:08,589 WARN org.apache.pig.piggybank.storage.SequenceFileLoader: Unable to translate key class org.apache.hadoop.io.BytesWritable to a Pig datatype
2011-07-01 10:30:08,625 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class org.apache.hadoop.io.BytesWritable to a Pig datatype
    at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
    at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:132)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)

How to compile piggybank is described here: Unable to build piggybank -> /home/build/ivy/lib does not exist

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文