How to iterate when computing PageRank with MapReduce

Posted 2024-12-27 20:04:59

I ran into a problem while trying to implement PageRank with MapReduce.
I'll quote the code from https://stackoverflow.com/a/5029780/1117436 to describe it.

map((url, PR), out_links):  // PR = random at start
    for link in out_links:
        emit(link, (PR / size(out_links), url))

reduce(url, List[(weight, url)]):
    PR = 0
    for v in weights:
        PR = PR + v
    Set urls = all urls from list
    emit((url, PR), urls)

In the process above, the second parameter of the map input is clearly the out-links of url, but the second parameter of the reduce output seems to be the in-links of url. So how can this code work iteratively?

What I want to ask, then, is how to write the code so that the PageRank algorithm works properly.

UPDATE: I think this answer solves my problem.
https://stackoverflow.com/a/13568286/1117436
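
For readers hitting the same mismatch, here is a minimal sketch of one common way around it: the mapper emits the page's own out-link list alongside the rank shares, so the reducer can write out-links (not in-links) back out and its output has the same shape as the map input. This is not taken from either linked answer. The names `map_page`, `reduce_page` and `run_iteration`, the tagged tuples, and the damping factor of 0.85 are all assumptions for illustration, and `run_iteration` is only an in-memory stand-in for a real map/shuffle/reduce job. It also assumes every page appears as a key in the input; rank held by pages without out-links is simply dropped here.

```python
from collections import defaultdict

def map_page(url, rank, out_links):
    """Emit the page's own out-link list plus one rank share per out-link."""
    # Pass the graph structure through so the reducer can re-emit it.
    yield url, ("links", out_links)
    for link in out_links:
        yield link, ("rank", rank / len(out_links))

def reduce_page(url, values, damping=0.85, num_pages=1):
    """Sum the incoming rank shares and recover the page's out-link list."""
    out_links = []
    total = 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload      # structure record from the page itself
        else:
            total += payload         # rank share from an in-linking page
    # Damping is an assumption; the quoted pseudocode just sums the shares.
    rank = (1 - damping) / num_pages + damping * total
    return url, rank, out_links

def run_iteration(pages, damping=0.85):
    """In-memory stand-in for one MapReduce job (map, shuffle, reduce)."""
    grouped = defaultdict(list)
    for url, (rank, out_links) in pages.items():
        for key, value in map_page(url, rank, out_links):
            grouped[key].append(value)   # the "shuffle": group values by key
    result = {}
    for url, values in grouped.items():
        _, rank, out_links = reduce_page(url, values, damping, len(pages))
        result[url] = (rank, out_links)  # same shape as the map input
    return result

if __name__ == "__main__":
    pages = {"A": (1 / 3, ["B", "C"]), "B": (1 / 3, ["C"]), "C": (1 / 3, ["A"])}
    for _ in range(3):
        pages = run_iteration(pages)     # output feeds straight back in
    print(pages)
```

Because the reducer returns `(rank, out_links)` per page, the result of one pass can be fed directly into the next, which is what the quoted pseudocode cannot do.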

Comments (2)

野侃 2025-01-03 20:04:59

You can implement iterative algorithms using MapReduce, but it might not be the best or most efficient way, because every iteration moves its data to HDFS/disk.
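
To make the iteration pattern concrete, here is a rough sketch of the driver loop, assuming each pass is a separate job whose output becomes the next job's input. The function names, the damping factor, the tolerance and the iteration cap are all hypothetical, and `pagerank_job` is a plain in-memory stand-in for a whole MapReduce job. In a real Hadoop setup, each call would be the point where data is written to and read back from HDFS. The sketch assumes every link target is also a key in `links`.

```python
def pagerank_job(ranks, links, damping=0.85):
    """Stand-in for one full MapReduce pass; returns the new rank table."""
    n = len(ranks)
    incoming = {url: 0.0 for url in ranks}
    for url, out_links in links.items():
        for target in out_links:
            # Each page splits its current rank evenly over its out-links.
            incoming[target] += ranks[url] / len(out_links)
    return {url: (1 - damping) / n + damping * s for url, s in incoming.items()}

def iterate_until_converged(links, tol=1e-6, max_iters=100):
    """Chain passes: the output of pass i is the input of pass i + 1."""
    ranks = {url: 1.0 / len(links) for url in links}
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_ranks = pagerank_job(ranks, links)
        # Total absolute change in rank is used as the stopping criterion.
        delta = sum(abs(new_ranks[u] - ranks[u]) for u in ranks)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks, iteration

if __name__ == "__main__":
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    ranks, passes = iterate_until_converged(links)
    print(passes, ranks)
```

The number of passes depends on the graph and the tolerance, so with one job per pass the per-iteration disk traffic adds up quickly.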

Having said that, if you are interested in how one might implement something like PageRank using MapReduce, have a look here:

Start from the run() method in PageRank.java

If you are interested, you can have a look at a bunch of old (i.e. 2009) slides here:

Nowadays, you can have much more fun implementing/running PageRank with a Pregel clone such as Apache Giraph, as Praveen already suggested.

姐不稀罕 2025-01-03 20:04:59

There are already a couple of graph processing frameworks.

Look at Apache Giraph, which can be used for graph processing. Giraph is based on MR. GoldenOrb is at a very early stage. Also, take a look at Apache Hama, an implementation of BSP; it has its own computation engine and is not MR-based, but it uses HDFS for storage. Hama can also be used for graph processing.
