What is the best way to implement weighted random selection based on two types of variables (in PHP)?

Posted 2024-07-24 14:24:17

Basically my dilemma is this. I have a list of x servers that host files. There is another server that hosts the site's MySQL db and application. When a file is uploaded (to the frontend server), the application checks to see which server has the most free space on it and moves the file there. This works fine if you started with 2+ empty servers with identical amounts of free space. If you introduce another server into the mix at a later point, it will have more free space than the current servers, and this method isn't so effective: all the new files will be uploaded exclusively to the new server, which would overload it, since it will be handling most of the new traffic till it catches up with the rest of the boxes in terms of free space.

So I thought to introduce a weighting system as well, which would help normalize the distribution of files. So if 3 servers are set at 33% each and one of them has significantly more free space, it would still get more uploads than the others (even though it has the same weight), but the load would be spread out over all the servers.

Can anyone suggest a good php-only implementation of this?
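For illustration, here is a rough sketch of the sort of thing I have in mind, where each server's chance of being picked is its configured weight multiplied by its current free space (the function name and data layout are placeholders, not working code from my app):

    // Rough sketch only: pick a server with probability proportional to
    // (configured weight) * (current free space), so a new, emptier box
    // gets more uploads without receiving all of them.
    function pickUploadServer(array $servers)
    {
        $scores = array();
        foreach ($servers as $name => $s) {
            $scores[$name] = $s['weight'] * $s['free'];
        }
        // Draw a point in [0, sum of scores] and walk the cumulative ranges.
        $rand = mt_rand() / mt_getrandmax() * array_sum($scores);
        foreach ($scores as $name => $score) {
            if (($rand -= $score) <= 0) {
                return $name;
            }
        }
        return $name; // floating-point safety net
    }

    $servers = array(
        'a' => array('weight' => 33, 'free' => 2048),
        'b' => array('weight' => 33, 'free' => 51400),
        'c' => array('weight' => 33, 'free' => 140555),
    );
    echo pickUploadServer($servers); // usually 'c', but not always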

Comments (4)

jJeQQOZ5 2024-07-31 14:24:17

One approach would be to sum all available space on all of the servers that have the space to hold the file (so a server with available space but not enough to hold the file would obviously be excluded). Then determine the percentage of that space that each server accounts for (so a new server would account for a proportionally larger percentage). Use a random number and align it with the percentages to determine which server to pick.

For instance, consider having four servers with the following free space levels:

Server 1:   2048MB
Server 2:  51400MB
Server 3:   1134MB
Server 4: 140555MB

You need to store a 1500MB file. That knocks Server 3 out of the running, leaving us with 194003MB total free space.

Server 1:  1.0%
Server 2: 26.5%
Server 4: 72.5%

You then choose a random number between 0 and 100: 40

Numbers between 0 and 1 (inclusive) would go to Server 1
Numbers > 1 and <= 26.5 would go to Server 2
Numbers > 26.5 and <= 100 would go to Server 4

So in this case, 40 means it gets stored on Server 4.
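A minimal PHP sketch of this approach could look like the following; the function name and input format are illustrative rather than part of the answer:

    // Pick a server with probability proportional to its free space.
    // $freeSpace maps server id => free MB; $fileSize is the upload size in MB.
    function pickByFreeSpace(array $freeSpace, $fileSize)
    {
        // Exclude servers that cannot hold the file (Server 3 above).
        $candidates = array_filter($freeSpace, function ($free) use ($fileSize) {
            return $free >= $fileSize;
        });
        if (empty($candidates)) {
            return null; // nowhere to put the file
        }
        // Equivalent to picking a number between 0 and 100 and walking
        // the cumulative percentage ranges.
        $rand = mt_rand() / mt_getrandmax() * array_sum($candidates);
        foreach ($candidates as $server => $free) {
            if (($rand -= $free) <= 0) {
                return $server;
            }
        }
        return $server; // floating-point safety net
    }

    $free = array(1 => 2048, 2 => 51400, 3 => 1134, 4 => 140555);
    echo pickByFreeSpace($free, 1500); // prints 4 roughly 72.5% of the time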

九命猫 2024-07-31 14:24:17

Traffic balancing is usually crucial. You can add some sort of weighting system to balance it (although, as you say, the new server will still end up more loaded than the others), or use some other alternating method where one server never gets hit twice in a row, just as an example.

But I think I would probably artificially balance out the servers' data by moving content from one to the other until they're almost equal, and then let the original or weighted/alternating algorithm do its job normally.

That's not a php-only implementation, but just some ideas to consider.
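As a toy sketch of the "never twice in a row" idea (all names hypothetical):

    // Toy sketch of the alternating idea: remember the last server used
    // and exclude it from the next pick.
    function pickAvoidingRepeat(array $servers, $last)
    {
        if (count($servers) > 1) {
            $servers = array_diff($servers, array($last));
        }
        return $servers[array_rand($servers)];
    }

    echo pickAvoidingRepeat(array('s1', 's2', 's3'), 's3'); // never 's3'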

π浅易 2024-07-31 14:24:17

A way to implement it is the following:

  1. Create an array of all the empty space, as a fraction, in your case { 0.5, 0.5, 1.0 }
  2. Create a second array of weights - the amount of space in the server divided by the total amount of space, as it is represented in the first array - { 0.25, 0.25, 0.5 }
  3. Get a random number, normalised to [0.0, 1.0], by calling 1.0 * mt_rand() / mt_getrandmax()
  4. Run the following loop:

    $total_weight = 0.0;
    for ($i = 0; $i < count($weights); $i++) {
        $total_weight += $weights[$i];
        if ($rand <= $total_weight) {
            return $i; // index of the chosen server
        }
    }

The returned value is the index of the server.
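Putting the four steps together, a self-contained helper might look like this (the function name is illustrative):

    // Illustrative combination of steps 1-4: given each server's free
    // space, return the index of a server chosen with probability equal
    // to its share of the total free space.
    function weightedPick(array $freeSpace)
    {
        $total = array_sum($freeSpace);                // steps 1-2
        $rand = mt_rand() / mt_getrandmax();           // step 3
        $total_weight = 0.0;
        foreach (array_values($freeSpace) as $i => $free) {
            $total_weight += $free / $total;           // step 4
            if ($rand <= $total_weight) {
                return $i;
            }
        }
        return count($freeSpace) - 1; // guard against rounding drift
    }

    echo weightedPick(array(100, 100, 200)); // prints 2 about half the time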

╰◇生如夏花灿烂 2024-07-31 14:24:17

You've entered the world of distributed filesystems -- a problem space larger than you likely anticipated.

A lot of work/research has been done in this field. You should consider using an available solution like MogileFS, or, at the very least, doing some research on how they solved the problems you've encountered (as well as the problems you haven't encountered yet).

For an example of what I mean by "problems you haven't encountered yet": shouldn't you actually be storing at least 2 copies of every file, so that if you lose one server, you haven't lost all the files on it? Of course, once you start doing that, shouldn't you be able to read parts of a single file from multiple servers simultaneously, for a performance gain? And of course, now you have to figure out how files are distributed, how they get redistributed when a server fails, when a new server comes online, etc.

Doing this right is complicated. Don't reinvent the wheel if you can avoid it. And if you have to reinvent the wheel, at least spend some time looking at how other people built theirs.
