Can Hadoop read binary files with arbitrarily placed keys?
It looks like Hadoop MapReduce requires a key-value pair structure in text or binary files.
In reality, our files have to be split into chunks for processing, but the keys may be
spread across the file, and there is no clear-cut boundary where one key is followed by one value. Is there an InputFormat that can read this type of binary file? I don't want to chain one MapReduce job after another; that would slow things down and defeat the purpose of using MapReduce.
Any suggestions? Thanks.
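One relevant piece of context: Hadoop ships a splittable binary container, SequenceFile, which embeds periodic sync markers so that a reader starting in the middle of a file can find the next record boundary on its own. If the data can be produced as (or converted to) a SequenceFile, no custom format is needed. Below is a minimal job-wiring sketch, assuming the file stores BytesWritable key/value pairs; the class name, mapper, and paths are illustrative only.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BinaryRecordsJob {
    // Placeholder mapper: receives each binary record as a key/value pair.
    // The real key-extraction logic would go here.
    public static class PassThroughMapper
            extends Mapper<BytesWritable, BytesWritable, BytesWritable, BytesWritable> {
        @Override
        protected void map(BytesWritable key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "binary-records");
        job.setJarByClass(BinaryRecordsJob.class);
        // SequenceFiles are splittable: periodic sync markers let each split's
        // reader resynchronize on the next record boundary by itself.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```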
1 Answer
According to Hadoop: The Definitive Guide:
If HDFS splits the file across record boundaries, the Hadoop framework takes care of it: each split's record reader skips the partial record at the start of its split and reads past the end of its split to finish the final record. But if you split the file manually, you have to take the boundaries into consideration yourself.
What exactly is your scenario? Then we can look at a workaround.
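To make the boundary handling concrete, here is a minimal sketch of a custom InputFormat for one assumed layout: records written back-to-back, each starting with a 4-byte sync marker (0xCAFEBABE below, purely illustrative) followed by a 4-byte big-endian payload length. Each split's reader scans forward to the first marker at or after its split start and keeps reading until it crosses the split end, the same convention LineRecordReader uses for text lines. This only works if the marker bytes never appear inside a payload; it is a sketch under those assumptions, not a drop-in solution.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SyncMarkerInputFormat extends FileInputFormat<LongWritable, BytesWritable> {
    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new SyncMarkerRecordReader();
    }

    public static class SyncMarkerRecordReader extends RecordReader<LongWritable, BytesWritable> {
        // Illustrative sync marker; the real value depends on the file format.
        private static final byte[] MARKER =
                {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        private FSDataInputStream in;
        private long start, end, pos;
        private final LongWritable key = new LongWritable();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit genericSplit, TaskAttemptContext context)
                throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(context.getConfiguration());
            in = fs.open(file);
            start = split.getStart();
            end = start + split.getLength();
            // A record belongs to the split where its marker starts: a reader
            // not at the file head skips to the first marker at or after its
            // split start; the previous reader finishes any straddling record.
            pos = (start == 0) ? 0 : seekToMarker(start);
        }

        // Scan forward for the marker; return the offset where it starts, or
        // Long.MAX_VALUE at end of stream. Assumes the marker bytes never
        // occur inside a record payload (the usual sync-marker contract).
        private long seekToMarker(long from) throws IOException {
            in.seek(from);
            long offset = from;
            int matched = 0, b;
            while ((b = in.read()) != -1) {
                offset++;
                if ((byte) b == MARKER[matched]) {
                    if (++matched == MARKER.length) return offset - MARKER.length;
                } else {
                    matched = ((byte) b == MARKER[0]) ? 1 : 0;
                }
            }
            return Long.MAX_VALUE;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (pos >= end) return false;  // the rest belongs to the next split
            in.seek(pos);
            byte[] header = new byte[MARKER.length + 4];
            if (!readFully(header)) return false;  // end of file
            // 4-byte big-endian payload length follows the marker (assumed
            // layout; a robust reader would also verify the marker bytes).
            int len = ((header[4] & 0xFF) << 24) | ((header[5] & 0xFF) << 16)
                    | ((header[6] & 0xFF) << 8) | (header[7] & 0xFF);
            byte[] payload = new byte[len];
            if (!readFully(payload)) return false;  // truncated final record
            key.set(pos);
            value.set(payload, 0, len);
            pos += header.length + len;  // may step past 'end': that is fine
            return true;
        }

        private boolean readFully(byte[] buf) throws IOException {
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) return false;
                n += r;
            }
            return true;
        }

        @Override public LongWritable getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() {
            return end == start ? 0f : Math.min(1f, (pos - start) / (float) (end - start));
        }
        @Override public void close() throws IOException { if (in != null) in.close(); }
    }
}
```

The same idea generalizes: any self-describing record header (magic bytes, a length prefix, a checksum) gives a reader enough information to resynchronize at an arbitrary split offset, so the whole job stays a single MapReduce pass.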