Distributed system design

Posted 2024-10-08 15:58:24

In a distributed system, a certain node distributes 'X' units of work equally across 'N' nodes (via socket message passing).

As we increase the number of worker nodes, each node completes its share of the work faster, but we have to set up more connections.

In a real situation, it would be similar to changing a Hadoop-like system from 10 nodes, each processing 100GB, to 1,000,000 nodes, each processing 1MB.

  • What's the impact of setting up more connections in this case? Does it add significant overhead in the poll() function?
  • What's the best approach?

Comments (3)

我乃一代侩神 2024-10-15 15:58:24

Sounds like you will need to consult Amdahl's Law.

At least, that is how I worked out how many machines on a high-speed switch were optimal for my parallel computations.
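As a rough illustration of that kind of calculation, here is a minimal Python sketch. The first function is the standard form of Amdahl's Law. The second adds a per-worker overhead term to stand in for connection setup and dispatch cost; that term, and the names `per_worker_overhead` and `best_worker_count`, are assumptions made for illustration and are not part of the law itself.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup when the parallel part runs on `workers` nodes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def speedup_with_overhead(parallel_fraction: float, workers: int,
                          per_worker_overhead: float) -> float:
    """Hypothetical extension: every worker adds a fixed serial cost
    (e.g. connection setup), given as a fraction of the single-node runtime."""
    serial = 1.0 - parallel_fraction
    cost = serial + parallel_fraction / workers + per_worker_overhead * workers
    return 1.0 / cost


def best_worker_count(parallel_fraction: float, per_worker_overhead: float,
                      max_workers: int) -> int:
    """Brute-force search for the worker count with the highest modelled speedup."""
    return max(
        range(1, max_workers + 1),
        key=lambda n: speedup_with_overhead(parallel_fraction, n, per_worker_overhead),
    )
```

Under this toy model the speedup rises, peaks, and then falls as workers are added, so the overhead value you measure on your own network decides where the peak sits.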

你的往事 2024-10-15 15:58:24

Does it have to use sockets and message passing between Supervisor and Worker?

You can use some type of queuing to avoid putting load onto the Supervisor, or a distributed file system similar to HDFS to distribute the tasks and collect the results.

It also depends on the number of nodes you are planning to deploy the Workers on. 1,000,000 nodes is a very big number, so in that case you'll have to distribute the tasks across multiple queues.

The thing to be careful about is what will happen if all the nodes finish their tasks at the same time. It would be worth putting some variability into when they can request a new task. ZooKeeper (http://hadoop.apache.org/zookeeper/) is potentially something you can also use to synchronise the jobs.
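One way to picture the multiple-queue idea is the small Python sketch below. It uses in-memory deques as stand-ins for real queues, and the names `build_queues` and `queue_index` are hypothetical; a real deployment would use an actual message queue service rather than this toy.

```python
import zlib
from collections import deque


def build_queues(tasks, num_queues: int):
    """Spread tasks round-robin over `num_queues` in-memory queues."""
    queues = [deque() for _ in range(num_queues)]
    for i, task in enumerate(tasks):
        queues[i % num_queues].append(task)
    return queues


def queue_index(worker_id: str, num_queues: int) -> int:
    """Map a worker to one queue with a stable hash, so each queue
    serves only a fraction of the workers."""
    return zlib.crc32(worker_id.encode("utf-8")) % num_queues
```

Each Worker would then poll only the queue returned by `queue_index`, so no single queue has to serve every node.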
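The variability mentioned above is often implemented as random jitter. The following is a minimal Python sketch under that assumption; `request_delay`, `base_delay` and `max_jitter` are illustrative names, not part of ZooKeeper or Hadoop.

```python
import random
from typing import Optional


def request_delay(base_delay: float, max_jitter: float,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds a worker waits before asking for its next task.

    A random offset in [0, max_jitter] is added so that workers which
    finish together do not all hit the Supervisor at the same instant.
    """
    rng = rng if rng is not None else random.Random()
    return base_delay + rng.uniform(0.0, max_jitter)
```

A Worker would call something like `time.sleep(request_delay(0.1, 2.0))` before requesting more work. The right jitter window depends on how many Workers share a Supervisor or queue.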

梦晓ヶ微光ヅ倾城 2024-10-15 15:58:24

Can you measure your network cost? The time spent computing on the worker machine is only part of the total cost; passing and receiving the messages accounts for the rest.

Also, can you describe the big-O cost of merging each worker result into the master result?

Does your master round-robin over the expected responses?

By the way, if your worker nodes finish more quickly but underutilize their CPU resources, you may be missing a design trade-off.

Of course, you could be the rule or the exception to any law (argument/out-of-date research). ;-)
