Any ideas on how to write a Hadoop InputFormat/OutputFormat for HBase?

Posted 2024-10-18 09:41:17

Does anyone have experience writing a Hadoop InputFormat/OutputFormat that gets its data from HBase?

I'd like something more specific than HbaseTableInputFormat, because my idea is to return my business objects directly to the mapred program. That means being able to build an object that can be spread across several rows.

Thanks for your help,
Ech


Comments (2)

记忆之渊 2024-10-25 09:41:17

You might be able to extend RecordReader and/or FileInputFormat and implement what you need inside those. Alternatively, extend HbaseTableInputFormat and override the functions where you need different behavior. (I haven't worked with HbaseTableInputFormat, so I'm not sure exactly what you'd do; it's just an idea to look at.)

In a project I've worked on, we had to extend RecordReader and FileInputFormat to be able to process W3C log files. The reason was to make sure each mapper had access to the headers, which appear only at the top of the file and not with each chunk.

I haven't worked with extending those, and I'm not sure about your exact situation, so it may or may not work to implement the different functionality by extending RecordReader and/or FileInputFormat.

Unfortunately, I'm not as familiar with these systems as I'd like to be, so I can't elaborate with further advice.
Hopefully what I've said points you in the right direction. :)
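To make the "do the grouping inside the reader" idea a bit more concrete, here is a minimal sketch in plain Java rather than against the real Hadoop/HBase classes, so it runs on its own. The class name `GroupingRowReader` and the `<objectId>#<part>` row-key layout are assumptions made up for illustration; only the method names `nextKeyValue`, `getCurrentKey` and `getCurrentValue` are borrowed from Hadoop's `RecordReader`. It assumes rows arrive sorted by row key, as an HBase scan returns them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java stand-in for a custom RecordReader: it consumes rows sorted by
 * row key and emits one record per logical object.
 * Hypothetical assumption: row keys look like "<objectId>#<part>".
 */
class GroupingRowReader {
    private final Iterator<Map.Entry<String, String>> rows;
    private Map.Entry<String, String> pending; // row read ahead but not yet consumed
    private String currentKey;
    private List<String> currentValue;

    GroupingRowReader(Iterator<Map.Entry<String, String>> sortedRows) {
        this.rows = sortedRows;
        this.pending = sortedRows.hasNext() ? sortedRows.next() : null;
    }

    /** Extracts the object id from a row key (everything before the first '#'). */
    private static String objectId(String rowKey) {
        int sep = rowKey.indexOf('#');
        return sep < 0 ? rowKey : rowKey.substring(0, sep);
    }

    /** Advances to the next object; returns false when the input is exhausted. */
    boolean nextKeyValue() {
        if (pending == null) {
            return false;
        }
        currentKey = objectId(pending.getKey());
        currentValue = new ArrayList<>();
        // Collect every consecutive row that belongs to the same object.
        while (pending != null && objectId(pending.getKey()).equals(currentKey)) {
            currentValue.add(pending.getValue());
            pending = rows.hasNext() ? rows.next() : null;
        }
        return true;
    }

    String getCurrentKey() {
        return currentKey;
    }

    List<String> getCurrentValue() {
        return currentValue;
    }
}
```

One thing this sketch does not handle: if the rows of a single object end up in two different input splits, each reader would only see part of the object, so the split boundaries would also have to respect the object grouping.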

风尘浪孓 2024-10-25 09:41:17

I don't think that's possible without gross hacks involving the Partitioner. Instead, reduce your HBase tables first to collapse multiple rows into one row, which you can later use to construct your business objects.
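As a rough illustration of that collapsing step, here is a plain-Java sketch with no Hadoop or HBase dependency. The class name `RowCollapser` and the `<objectId>#<part>` row-key layout are hypothetical; in a real job this logic would presumably live in a reducer that writes the wide rows back to a table.

```java
import java.util.Map;
import java.util.TreeMap;

class RowCollapser {
    /**
     * Merges narrow rows keyed "<objectId>#<part>" (hypothetical layout) into
     * one wide row per objectId, with one column per part.
     */
    static Map<String, Map<String, String>> collapse(Map<String, String> narrowRows) {
        Map<String, Map<String, String>> wide = new TreeMap<>();
        for (Map.Entry<String, String> row : narrowRows.entrySet()) {
            String key = row.getKey();
            int sep = key.indexOf('#');
            // Rows without a separator are treated as a single unnamed part.
            String objectId = sep < 0 ? key : key.substring(0, sep);
            String part = sep < 0 ? "" : key.substring(sep + 1);
            wide.computeIfAbsent(objectId, id -> new TreeMap<>()).put(part, row.getValue());
        }
        return wide;
    }
}
```

Once each object lives in a single row, a standard one-row-per-record input format is enough, and the business object can be built from that row's columns inside the mapper.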
