How do I read a Hadoop SequenceFile as the input to a Hadoop job?
I have a SequenceFile whose keys and values are of type "org.apache.hadoop.typedbytes.TypedBytesWritable". I have to provide this file as the input to a Hadoop job and process it in the map phase only. In other words, I don't need a reduce step.
1) Which FileInputFormat do I specify so that the job reads a SequenceFile?
2) What will the signature of the map function be?
3) How do I get the output from the map instead of from a reduce?
Answer:
Set SequenceFileAsBinaryInputFormat as the input format. Here is the code for the SequenceFileAsBinaryInputFormat class.
Here is the code.
The map function is then invoked with BytesWritable as both the key type and the value type; each BytesWritable holds the record's raw (serialized) bytes.
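A minimal sketch of the map signature, assuming the newer `org.apache.hadoop.mapreduce` API (Hadoop 2.x or later) and SequenceFileAsBinaryInputFormat as the input format. The class name `RawRecordMapper` and its output types are hypothetical, and the "processing" is only a placeholder that emits each raw key together with the size of its raw value.

```java
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical map-only mapper. With SequenceFileAsBinaryInputFormat, both the
// input key and the input value arrive as BytesWritable holding raw bytes.
public class RawRecordMapper
        extends Mapper<BytesWritable, BytesWritable, BytesWritable, IntWritable> {

    // Reused output object, as is common in Hadoop mappers.
    private final IntWritable valueSize = new IntWritable();

    @Override
    protected void map(BytesWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Placeholder processing: emit the raw key and the size of the raw value.
        // getLength() is the number of valid bytes; getBytes() may be padded.
        valueSize.set(value.getLength());
        context.write(key, valueSize);
    }
}
```

With the older `org.apache.hadoop.mapred` API, the equivalent method is `map(BytesWritable key, BytesWritable value, OutputCollector<K, V> output, Reporter reporter)` in a class implementing `Mapper<BytesWritable, BytesWritable, K, V>`.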
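Set the mapred.reduce.tasks property to 0 (in newer Hadoop releases the property is named mapreduce.job.reduces; calling setNumReduceTasks(0) on the job has the same effect). The output of the map will then be the final output of the job.

Also, take a look at SequenceFileAsTextInputFormat. With it, the map is invoked with Text as both the key type and the value type.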
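A sketch of a driver for the map-only job, again assuming the newer `org.apache.hadoop.mapreduce` API. `MapOnlySequenceFileJob` is a hypothetical name, and the base `Mapper` class (an identity mapper that passes each record through unchanged) stands in for your own mapper. `setNumReduceTasks(0)` is the programmatic way of setting the reduce-task count to 0.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver for a map-only job over a SequenceFile.
public class MapOnlySequenceFileJob {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MapOnlySequenceFileJob <input> <output>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "map-only SequenceFile job");
        job.setJarByClass(MapOnlySequenceFileJob.class);

        // Read the SequenceFile as raw bytes: keys and values are BytesWritable.
        // Alternative: SequenceFileAsTextInputFormat, which delivers Text keys and
        // values (then set the output classes below to Text as well).
        job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // The base Mapper class is an identity mapper; replace it with your own.
        job.setMapperClass(Mapper.class);

        // Zero reducers: map output goes straight to the output format.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);
        // Default output format is TextOutputFormat (BytesWritable prints as hex).
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run it with something like `hadoop jar yourjob.jar MapOnlySequenceFileJob <input> <output>` (the jar name is a placeholder). Because there is no reduce phase, each map task writes its own `part-m-NNNNN` file to the output directory.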