旅行推销员和 Map/Reduce:放弃通道

发布于 2024-11-09 07:53:06 字数 702 浏览 2 评论 0原文

这是一个学术问题而不是实际问题。在旅行商问题或任何其他涉及寻找最小优化的问题中……如果使用映射/归约方法,那么采用某种方法将当前最小结果广播给所有的人似乎是有价值的。以某种方式控制计算节点,允许它们放弃超过该值的计算。

换句话说,如果我们将问题映射出来,我们希望每个节点知道何时在给定的部分结果完成之前、但当它已经超出了其他解决方案时放弃它。

立即想到的一种方法是减速器是否有办法向映射器提供反馈。考虑一下我们是否有 100 个节点,以及映射器向它们提供数百万条路径。如果减速器将最佳结果提供给映射器,则该值可以作为参数包含在每个新路径(问题子集)中。在这种方法中,粒度相当粗糙……100 个节点将各自不断地完成问题的分区,直到完成,并且仅根据映射器的下一个请求获得新的最小值。 (对于少量节点和大量问题分区/子集来说,在这种粒度上工作是无关紧要的;而且,人们可能可以将启发式应用于将可能的路线或问题子集馈送到节点的顺序,以快速收敛到最优值,从而最大限度地减少节点执行的“浪费”计算量)。

我想到的另一种方法是让节点主动订阅某种频道、多播甚至广播,它们可以从计算循环中收集新的最小值。在这种情况下,当(由他们的同行之一)通知更好的解决方案时,他们可以立即放弃错误的计算。

所以,我的问题是:

  • 这个概念是否被任何与现有映射/归约讨论相关的技术术语所涵盖?
  • 当前的地图/归约框架是否提供支持这种动态反馈的功能?
  • 这个想法有什么缺陷吗……为什么它很愚蠢?

This is an academic rather than practical question. In the Traveling Salesman Problem, or any other which involves finding a minimum optimization ... if one were using a map/reduce approach it seems like there would be some value to having some means for the current minimum result to be broadcast to all of the computational nodes in some manner that allows them to abandon computations which exceed that.

In other words if we map the problem out we'd like each node to know when to give up on a given partial result before it's complete but when it's already exceeded some other solution.

One approach that comes immediately to mind would be if the reducer had a means to provide feedback to the mapper. Consider if we had 100 nodes, and millions of paths being fed to them by the mapper. If the reducer feeds the best result to the mapper than that value could be including as an argument along with each new path (problem subset). In this approach the granularity is fairly rough ... the 100 nodes will each keep grinding away on their partition of the problem to completion and only get the new minimum with their next request from the mapper. (For a small number of nodes and a huge number of problem partitions/subsets to work across this granularity would be inconsequential; also it's likely that one could apply heuristics to the sequence in which the possible routes or problem subsets are fed to the nodes to get a rapid convergence towards the optimum and thus minimize the amount of "wasted" computation performed by the nodes).

Another approach that comes to mind would be for the nodes to be actively subscribed to some sort of channel, or multicast or even broadcast from which they could glean new minimums from their computational loop. In that case they could immediately abandon a bad computation when notified of a better solution (by one of their peers).

So, my questions are:

  • Is this concept covered by any terms of art in relation to existing map/reduce discussions
  • Do any of the current map/reduce frameworks provide features to support this sort of dynamic feedback?
  • Is there some flaw with this idea ... some reason why it's stupid?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

醉生梦死 2024-11-16 07:53:06

这是一个很酷的主题,没有那么多文献,以前就已经完成了。因此,这几乎是一篇集思广益的文章,而不是所有问题的答案;)

因此,每个 TSP 都可以表示为一张图表,看起来可能像这样:(取自德语维基百科)

TSP graph

现在您可以在其上运行图形算法。 MapReduce 可以很好地用于图形处理,尽管它有很多开销。
您需要一个称为“消息传递”的范例。本文对此进行了描述:论文
我在博客中以图探索的方式讲述了它,它非常简单地讲述了它是如何工作的。 我的博客文章

这就是您的方式可以告诉映射器当前的最小结果是什么(也许只是顶点本身)。

有了所有的知识,想到 应该是相当标准的分支定界算法(您所描述的)来达到目标​​。就像有一个随机起始顶点并分支到每个相邻顶点一样。这会导致向每个邻接点发送一条消息,其中包含从起始顶点到达该邻接点的成本(映射步骤)。顶点本身仅在其成本低于当前存储的成本(减少步骤)时更新其成本。最初应将其设置为无穷大。
您会一遍又一遍地执行此操作,直到再次到达起始顶点(显然是在您访问了所有其他顶点之后)。因此,您必须以某种方式跟踪当前到达顶点的最佳方式,这也可以存储在顶点本身中。有时你必须绑定这个分支并切断成本太高的分支,这可以在读取消息后在reduce步骤中完成。
基本上,这只是 MapReduce 中的图形算法和一种最短路径的混合。
请注意,这不会产生节点之间的最佳方式,它仍然是一个启发式的事情。而你只是将 NP 难题简化了。

但再次自我宣传一下,也许您已经在我链接的博客文章中读过它,MapReduce 存在一个抽象,它在这种图形处理中的开销要少得多。它称为BSP(批量同步并行)。它在通信和计算模型上更加自由。所以我确信使用 BSP 可以比 MapReduce 更好地实现这一点。您可以通过它更好地了解您谈到的这些渠道。

我目前正在参与一个 Summer of Code 项目,该项目针对 BSP 的这些 SSSP 问题。也许您想感兴趣的话访问。这可能是一个部分解决方案,我的博客中也对此进行了很好的描述。 SSSP 在我的博客中

我'我很高兴听到一些反馈;)

that's a cool theme, that doesn't have that much literature, that was done on it before. So this is pretty much a brainstorming post, rather than an answer to all your problems ;)

So every TSP can be expressed as a graph, that looks possibly like this one: (taken it from the german Wikipedia)

TSP graph

Now you can run a graph algorithm on it. MapReduce can be used for graph processing quite well, although it has much overhead.
You need a paradigm that is called "Message Passing". It was described in this paper here: Paper.
And I blog'd about it in terms of graph exploration, it tells quite simple how it works. My Blogpost

This is the way how you can tell the mapper what is the current minimum result (maybe just for the vertex itself).

With all the knowledge in the back of the mind, it should be pretty standard to think of a branch and bound algorithm (that you described) to get to the goal. Like having a random start vertex and branching to every adjacent vertex. This causes a message to be send to each of this adjacents with the cost it can be reached from the start vertex (Map Step). The vertex itself only updates its cost if it is lower than the currently stored cost (Reduce Step). Initially this should be set to infinity.
You're doing this over and over again until you've reached the start vertex again (obviously after you visited every other one). So you have to somehow keep track of the currently best way to reach a vertex, this can be stored in the vertex itself, too. And every now and then you have to bound this branching and cut off branches that are too costly, this can be done in the reduce step after reading the messages.
Basically this is just a mix of graph algorithms in MapReduce and a kind of shortest paths.
Note that this won't yield to the optimal way between the nodes, it is still a heuristic thing. And you're just parallizing the NP-hard problem.

BUT a little self-advertising again, maybe you've read it already in the blog post I've linked, there exists an abstraction to MapReduce, that has way less overhead in this kind of graph processing. It is called BSP (Bulk synchonous parallel). It is more freely in the communication and it's computing model. So I'm sure that this can be a lot better implemented with BSP than MapReduce. You can realize these channels you've spoken about better with it.

I'm currently involved in an Summer of Code project which targets these SSSP problems with BSP. Maybe you want to visit if you're interested. This could then be a part solution, it is described very well in my blog, too. SSSP's in my blog

I'm excited to hear some feedback ;)

ゞ花落谁相伴 2024-11-16 07:53:06

看来 Storm 实现了我的想法。它本质上是一个计算拓扑(想想每个计算节点如何根据键/散列函数将结果路由到特定的减速器)。

这并不完全是我所描述的,但如果有一种足够低延迟的方式来传播当前边界(即局部最优信息),拓扑中的每个节点都可以更新/接收该边界,以便知道要丢弃哪些结果,则可能会很有用。

It seems that Storm implements what I was thinking of. It's essentially a computational topology (think of how each compute node might be routing results based on a key/hashing function to the specific reducers).

This is not exactly what I described, but might be useful if one had a sufficiently low-latency way to propagate current bounding (i.e. local optimum information) which each node in the topology could update/receive in order to know which results to discard.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文