Storing data from Apache Pig to a SequenceFile
Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:
REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...);
Is there also a library out there that would allow writing to Hadoop sequence files from Pig?
It's just a matter of implementing a StoreFunc to do so.
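In Pig 0.7+, a StoreFunc is a Java class extending `org.apache.pig.StoreFunc` that wraps a Hadoop OutputFormat (for sequence files, `SequenceFileOutputFormat`). Once such a class is packaged into a jar, using it from a script mirrors the loader above. The jar path and the class name `com.example.SequenceFileStorage` below are hypothetical placeholders for your own implementation, not a shipped library:

```pig
-- Hypothetical: assumes you have written and packaged your own StoreFunc
REGISTER /home/hadoop/pig/my-storefuncs.jar;
DEFINE SequenceFileStorage com.example.SequenceFileStorage();

log = LOAD '/data/logs'
      USING org.apache.pig.piggybank.storage.SequenceFileLoader()
      AS (key:chararray, value:chararray);
STORE log INTO '/data/logs-seq' USING SequenceFileStorage;
```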
This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
The "Hadoop expansion pack" Twitter
is about to open sourceopen-sourced at github, includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for same -- you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward though.这似乎对我有用。 https://github.com/kevinweil/elephant-bird/pull/73
This seemed to work for me. https://github.com/kevinweil/elephant-bird/pull/73
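The pull request above added sequence file support to Elephant Bird. A sketch of what using it might look like, assuming Elephant Bird's `com.twitter.elephantbird.pig.store.SequenceFileStorage` with `Text` keys and values; the jar path is a placeholder, and the `-c` converter arguments may differ between versions, so check the project's README for the exact options:

```pig
REGISTER /path/to/elephant-bird.jar;  -- path is an assumption

logs = LOAD '/data/logs'
       USING org.apache.pig.piggybank.storage.SequenceFileLoader()
       AS (key:chararray, value:chararray);

-- Each '-c' option names a converter between Pig types and Writables
STORE logs INTO '/data/logs-out'
      USING com.twitter.elephantbird.pig.store.SequenceFileStorage(
          '-c com.twitter.elephantbird.pig.util.TextConverter',
          '-c com.twitter.elephantbird.pig.util.TextConverter');
```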