Permutations with MapReduce



Is there a way to generate permutations with MapReduce?

input file:

1  title1
2  title2
3  title3

my goal:

1,2  title1,title2
1,3  title1,title3
2,3  title2,title3


Comments (1)

海之角 2024-11-25 09:50:47


Since the file has n inputs, the full cross product has n^2 outputs, so it makes sense to have n tasks each perform n of those operations. I believe you could do this (assuming only one file):

Put your input file into the DistributedCache so it is accessible read-only to your Mappers/Reducers. Make an input split on each line of the file (as in WordCount). Each mapper will thus receive one line (e.g. title1 in your example). Then read the lines of the file out of the DistributedCache and emit your key/value pairs: the key is the mapper's input line and the values are each line from the DistributedCache file. (Note that this emits all n^2 ordered pairs; to match the goal output, which lists each unordered pair only once, you would additionally skip a pair unless the cached line's id is greater than the input line's id.)

In this model, you should only need a Map step.

Something like:

public static class PermuteMapper
    extends Mapper<Object, Text, Text, Text> {

  // Uses org.apache.hadoop.filecache.DistributedCache, org.apache.hadoop.fs.Path,
  // org.apache.hadoop.io.Text and org.apache.hadoop.mapreduce.Mapper.
  private static final String IN_FILENAME = "file.txt";

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {

    // The current split's line, e.g. "1  title1".
    String inputLine = value.toString();

    // Set the property mapred.cache.files in your configuration
    // (or call DistributedCache.addCacheFile) so the file is available.
    Path[] cachedPaths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (cachedPaths[0].getName().equals(IN_FILENAME)) {
      // getLinesFromPath is a helper defined elsewhere that reads
      // every line from the local cached copy of the file.
      String[] cachedLines = getLinesFromPath(cachedPaths[0]);
      for (String line : cachedLines) {
        // Pair the current input line with every cached line.
        context.write(new Text(inputLine), new Text(line));
      }
    }
  }
}
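
For completeness, here is a rough sketch of the driver side. It assumes the old org.apache.hadoop.filecache.DistributedCache API used above; the PermuteJob class name and the HDFS paths are illustrative, not part of the original answer:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PermuteJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Register the input file in the DistributedCache so every mapper
    // can read a local copy of it (path is illustrative).
    DistributedCache.addCacheFile(new URI("/user/hadoop/file.txt"), conf);

    Job job = new Job(conf, "permute");
    job.setJarByClass(PermuteJob.class);
    job.setMapperClass(PermuteMapper.class);
    job.setNumReduceTasks(0);            // map-only job, no reduce step
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // The same file is also the job input, so each of its lines
    // becomes one map call.
    FileInputFormat.addInputPath(job, new Path("/user/hadoop/file.txt"));
    FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/permute-out"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With no reducers configured, each mapper's key/value pairs are written straight to the output files, one line per pair.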