How do I use the Hadoop API copyMerge function? What is the addString parameter?
Has anyone used the copyMerge function in FileUtil from the Hadoop API, or does anyone know how it works?
copyMerge(FileSystem srcFS, Path srcDir, FileSystem dstFS, Path dstFile, boolean deleteSource, Configuration conf, String addString);
What is the addString parameter in this function, and how do I control the order in which the files are merged? For example, I have part files numbered 1, 2, 3, 4, 5, ... and I want to combine them into one file in ascending order. How can I do that?
Detail about the API: http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/api/org/apache/hadoop/fs/FileUtil.html
Thanks!
Comments (1)
It looks like addString is simply written to the OutputStream in the FileUtil class, once after each source file's contents have been copied.
When there is no documentation, the source code is the true and best source for details. I have written a few articles on how to set up Git here and here. Git makes access to the code faster and easier.
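
To make the behaviour described above concrete, here is a minimal plain-Java sketch that imitates it on the local file system. It is not Hadoop's code and does not use the Hadoop API: the class name MergeSketch and the method mergeDir are hypothetical, and the explicit sort by file name is an assumption added to answer the "ascending order" part of the question. Whether the real copyMerge sorts its input depends on the Hadoop version and file system, so check the source for your release.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; this is not part of Hadoop.
public class MergeSketch {

    // Concatenates every regular file in srcDir into dstFile, in ascending
    // file-name order, and writes addString (if non-null) after each file.
    static void mergeDir(Path srcDir, Path dstFile, String addString) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(srcDir)) {
            parts = entries
                    .filter(Files::isRegularFile)
                    // Assumption: lexicographic order by file name.
                    .sorted((a, b) -> a.getFileName().toString()
                            .compareTo(b.getFileName().toString()))
                    .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path part : parts) {
                Files.copy(part, out);          // copy the file's bytes
                if (addString != null) {        // then append the separator
                    out.write(addString.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("parts");
        Files.write(dir.resolve("part-00001"), "b".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00000"), "a".getBytes(StandardCharsets.UTF_8));
        Path merged = Files.createTempFile("merged", ".txt");
        mergeDir(dir, merged, "\n");
        // Prints "a" and "b" on separate lines.
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

Note that the sort is lexicographic, so zero-padded names such as part-00000, part-00001 come out in numeric order, while unpadded names such as 1, 2, 10 would not (10 sorts before 2). Passing "\n" as addString is useful when the part files do not end with a newline; passing null adds nothing between files.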