Any ideas on how to write a Hadoop InputFormat/OutputFormat for Hbase?
Does anyone have experience writing a Hadoop InputFormat/OutputFormat that gets its data from Hbase?
I'd like something more specific than the HbaseTableInputFormat, because my idea is to return my business objects directly to the mapred program. That means being able to build an object that may be spread across several rows.
Thanks for your help
Ech
Comments (2)
You might be able to extend RecordReader and/or FileInputFormat and implement what you need inside those. You could also extend HbaseTableInputFormat and override the methods where you need different behavior. (I haven't worked with HbaseTableInputFormat, so I'm not sure exactly what you'd change; it's just an idea to look at.)
In a project I worked on, we had to extend RecordReader and FileInputFormat to be able to process W3C log files. The reason was to make sure each mapper had access to the headers, which appear only at the top of the file and not with each chunk.
I haven't worked with extending those, and I'm not sure about your exact situation, so extending RecordReader and/or FileInputFormat to implement the different behavior might or might not work.
Unfortunately, I'm not as familiar with these systems as I'd like to be, so I can't elaborate with further advice.
Hopefully what I've said points you in the right direction. :)
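To make the log-file case above more concrete, here is a minimal sketch in plain Java of the logic such a custom RecordReader would wrap: every chunk re-reads the header from the top of the file before parsing its own lines. It deliberately uses no Hadoop classes. The class name HeaderAwareChunkReader, its methods, the line-index "chunks", and the assumption that the header is a "#Fields:" line are all hypothetical illustrations, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): the core logic a custom
// RecordReader could delegate to when headers live only at the file top.
final class HeaderAwareChunkReader {

    // Scans the leading '#' directive lines for "#Fields:" and returns the column names.
    static List<String> readHeader(List<String> fileLines) {
        for (String line : fileLines) {
            if (line.startsWith("#Fields:")) {
                String names = line.substring("#Fields:".length()).trim();
                return Arrays.asList(names.split("\\s+"));
            }
            if (!line.startsWith("#")) {
                break; // assumption: directives appear only before the first data line
            }
        }
        throw new IllegalArgumentException("no #Fields header found");
    }

    // Parses lines [start, end) into field-name -> value maps, using the
    // header from the top of the file even when the chunk starts mid-file.
    static List<Map<String, String>> readChunk(List<String> fileLines, int start, int end) {
        List<String> fields = readHeader(fileLines);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = start; i < end; i++) {
            String line = fileLines.get(i);
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and directive lines
            }
            String[] values = line.split("\\s+");
            Map<String, String> record = new LinkedHashMap<>();
            for (int f = 0; f < fields.size() && f < values.length; f++) {
                record.put(fields.get(f), values[f]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#Fields: date time cs-uri",
                "2024-01-01 00:00:01 /a",
                "2024-01-01 00:00:02 /b");
        // A "chunk" that starts after the header still gets named fields.
        System.out.println(readChunk(lines, 2, 3));
    }
}
```

In a real job the chunk boundaries would come from the input split rather than from line indexes, and the header would be read from the start of the underlying file when the reader is initialized.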
I don't think that's possible without gross hacks involving the Partitioner. Just reduce your Hbase tables first to collapse multiple rows into one row, which you can later use to construct your business objects.
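As a rough illustration of the collapsing step suggested above, the following plain-Java sketch groups rows that belong to the same object into a single entry. It is not HBase or Hadoop code: the RowCollapser class, the Row type, and the row-key convention "objectId#part" are assumptions made only for this example. In an actual job this grouping would typically happen in a reduce phase keyed by the object id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse several rows per object into one entry.
final class RowCollapser {

    // Stand-in for one table row: a row key plus a single value.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Assumption: the object id is the part of the row key before the first '#'.
    static String objectId(String rowKey) {
        int sep = rowKey.indexOf('#');
        return sep < 0 ? rowKey : rowKey.substring(0, sep);
    }

    // Groups row values by object id, preserving the order in which rows arrive.
    static Map<String, List<String>> collapse(List<Row> rows) {
        Map<String, List<String>> collapsed = new LinkedHashMap<>();
        for (Row row : rows) {
            collapsed.computeIfAbsent(objectId(row.key), id -> new ArrayList<>())
                     .add(row.value);
        }
        return collapsed;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("order1#header", "customer=42"),
                new Row("order1#line1", "sku=abc"),
                new Row("order2#header", "customer=7"));
        // Each map entry now holds everything needed to build one business object.
        System.out.println(collapse(rows));
    }
}
```

Once each object's data sits in a single collapsed row, a standard one-row-per-record input format is enough, and the mapper can build the business object from that one row.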