Algorithm to find optimal groups
A device contains an array of locations, some of which contain values that we want to read periodically.
Our list of locations that we want to read periodically also specifies how often we want to read them. It is permitted to read a value more frequently than specified, but not less frequently.
A single read operation can read a contiguous sequence of locations from the array, so it is possible to return a group of multiple values from one read operation. The maximum number of contiguous locations that can be read in a single operation is M.
The goal is to group locations so as to minimize the time-averaged number of read operations. In the event that there is more than one way to do this, the tie-breaker is to minimize the time-averaged number of locations read.
(Bonus points are awarded if the algorithm to do this allows incremental changes to the list of locations - i.e. adding or removing one location to/from the list doesn't require the groupings to be recalculated from scratch!)
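The cost model above can be sketched in a few lines of Python (a sketch under my own naming assumptions — `periods`, `grouping`, and `grouping_cost` are illustrative names, not from any real API). A grouping is a list of inclusive `(start, end)` intervals; each group is read at the period of its fastest member, and the primary cost is the time-averaged number of group reads, with locations read per second as the tie-breaker:

```python
# Sketch of the cost model (all names are illustrative assumptions).
# `periods` maps array index -> required read period in seconds; indices
# absent from the map hold no value we need.
# `grouping` is a list of (start, end) inclusive intervals, each spanning
# at most M contiguous locations.

def grouping_cost(periods, grouping, M=6):
    covered = set()
    reads_per_sec = 0.0       # primary objective
    locations_per_sec = 0.0   # tie-breaker objective
    for start, end in grouping:
        width = end - start + 1
        assert width <= M, "a single read covers at most M locations"
        members = [p for i, p in periods.items() if start <= i <= end]
        assert members, "a group must cover at least one required location"
        covered.update(i for i in periods if start <= i <= end)
        period = min(members)  # group is read at its fastest member's rate
        reads_per_sec += 1.0 / period
        locations_per_sec += width / period
    assert covered == set(periods), "every required location must be covered"
    return reads_per_sec, locations_per_sec
```

On the three-group example later in the question this scores 1/5 + 1 + 1/5 = 1.4 group reads per second, matching the hand computation.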
I'll try to clarify this with some examples where M=6.
The following diagram shows the array of locations. The numbers represent the desired read period for that location.
| 1 | 1 | | | 1 | | | | | | 5 | | 2 |
\-------------------/ \-----------/
group A group B
In this first example group A is read every second and group B every 2 seconds. Note that the location that should be read every 5 seconds is actually read every 2 seconds - which is fine.
| 1 | | | | | 1 | 1 | | 1 |
\-----------------------/\----------/
group A group B (non-optimal!)
This example shows a failure of my initial simple-minded algorithm, which was to fill the first group until it was full and then start another. The following grouping is better because, although the number of group reads per second is the same, the number of locations read in those groups is smaller:
| 1 | | | | | 1 | 1 | | 1 |
\---/ \---------------/
group A group B (optimal)
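The simple-minded first-fit approach described above can be sketched as follows (a sketch, not my actual device code; `naive_grouping` is an illustrative name). It scans the required locations left to right, extending the current group until the next location would push its span past M:

```python
# Sketch of the naive first-fit grouping: extend the current group until
# the next required location no longer fits within a span of M, then
# start a new group. `sorted_locations` is an ascending list of indices
# that must be read.

def naive_grouping(sorted_locations, M=6):
    groups = []
    start = prev = sorted_locations[0]
    for loc in sorted_locations[1:]:
        if loc - start + 1 > M:   # next location would exceed the span limit
            groups.append((start, prev))
            start = loc
        prev = loc
    groups.append((start, prev))
    return groups
```

On the example above (required locations 0, 5, 6 and 8) this produces the non-optimal grouping (0-5)(6-8) rather than the optimal (0-0)(5-8).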
Finally, an example where three groups is better than two:
| 5 | | | | | 1 | 1 | | | | | 5 |
\-----------------------/\----------------------/
group A group B (non-optimal)
This solution requires two group reads per second. A better solution is as follows:
| 5 | | | | | 1 | 1 | | | | | 5 |
\---/ \-------/ \---/
group A group B group C
This requires two reads every 5 seconds (groups A and C) plus one every second (group B): 1.4 group reads per second.
Edit: (There is an even better solution to this example if you allow reads to be non-periodic. On the 1st second, read both groups of the first solution. On seconds 2, 3, 4 and 5 read group B of the second solution. Repeat. This results in 1.2 group reads per second. But I'm going to disallow this because it would make the code responsible for scheduling the reads much more complicated.)
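The 1.2 figure for the non-periodic schedule can be checked with trivial arithmetic (the 5-second cycle is as described above: second 1 issues the two reads of the first solution, seconds 2-5 each issue one read of group B of the second solution):

```python
# One 5-second cycle of the non-periodic schedule described above:
# second 1: both groups of the first solution (2 reads),
# seconds 2-5: group B of the second solution (1 read each).
reads_per_cycle = 2 + 4 * 1
cycle_seconds = 5
average_reads_per_sec = reads_per_cycle / cycle_seconds  # 1.2
```

Note that every requirement is still met: the period-5 locations are read once per cycle and the period-1 locations every second.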
I looked up clustering algorithms, but this isn't a clustering problem. I also found "Algorithm to allocate a list of numbers to N groups under certain condition", which pointed to the bin-packing problem, but I don't think this is it either.
By the way, sorry for the vague title. I can't think of a concise description, or even relevant search keywords!
New examples added 28 September 2010:
This is like the previous example, but with all items updating at the same rate. Now two groups are better than three:
| 1 | | | | | 1 | 1 | | | | | 1 |
\-----------------------/\----------------------/
group A group B (optimal)
I've started trying to see how iterative improvements might be implemented. Suppose a grouping algorithm came up with:
| 1 | | | | | 1 | 1 | | | | | 1 | 1 | | | | | 1 |
\---/ \-------/ \-------/ \---/
group A group B group C group D (non-optimal)
\-----------------------/\----------------------/\----------------------/
group A group B group C (optimal)
This can be improved to three adjacent groups of 6 locations each. Rex suggested (comment below) that I could try combining triplets into pairs. But in this case I would have to combine a quartet into a triplet, because there is no legal intermediate arrangement in which A+B+C (or B+C+D) can be rearranged into a pair while leaving D as it is.
I originally thought that this was an indication that, in the general case, there is no guarantee that a new valid solution can be created from an existing valid solution by making a local modification. This would have meant that algorithms such as simulated annealing, genetic algorithms, etc., could not reliably be used to refine a suboptimal solution.
But Rex pointed out (comment below) that you can always split an existing group into two. Despite the fact that this always increases the cost function, all that means is that the solution needs to get out of its local minimum in order to reach the global minimum.
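The local moves discussed above can be sketched as a neighbourhood generator (a sketch under my own naming assumptions, not Rex's code): a neighbour is obtained either by merging two adjacent groups whose combined span still fits in M, or by splitting one group between two of its required locations. Rex's observation is that a split is always legal, so the search space stays connected even though a split never decreases the cost:

```python
# Sketch of the merge/split neighbourhood for local search.
# `grouping` is a list of (start, end) intervals in array order;
# `locations` is the collection of required indices.

def neighbours(grouping, locations, M=6):
    locs = sorted(locations)
    result = []
    # merge moves: combine adjacent groups if the joint span fits one read
    for i in range(len(grouping) - 1):
        (s1, e1), (s2, e2) = grouping[i], grouping[i + 1]
        if e2 - s1 + 1 <= M:
            result.append(grouping[:i] + [(s1, e2)] + grouping[i + 2:])
    # split moves: cut one group between consecutive required locations
    for i, (s, e) in enumerate(grouping):
        members = [l for l in locs if s <= l <= e]
        for a, b in zip(members, members[1:]):
            result.append(grouping[:i] + [(s, a), (b, e)] + grouping[i + 1:])
    return result
```

For instance, on the earlier non-optimal grouping (0-5)(6-8), one split followed by one merge reaches the optimal (0-0)(5-8), illustrating why the cost must be allowed to rise temporarily.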
Answer:
This problem has the same property of instability on addition of new items that similar NP-complete problems do, so I assume it is one also. Since I suspect that you want something that works reasonably well instead of a proof of why it's hard, I'll focus on an algorithm to give an approximate solution.
I would solve this problem by converting this into a graph where bins were valued at 1/N if they had to be read N times per second, and blur the graph with a width of M (e.g. 6), peaked at the original. (For 6, I might use weighting (1/6 1/5 1/4 1/3 1/2 1 1/2 1/3 1/4 1/5 1/6).) Then throw bins at all the local maxima (sort pairs by distance apart and cover close pairs of maxima first if you can). Now you'll have most of your most important values covered. Then catch any missing groups by extending the existing reads, or by adding new reads if necessary. Depending on the structure you may want to add some refinement by shifting locations between reads, but if you're lucky that won't even be necessary.
Since this is essentially a local algorithm, if you keep track of the blurred graph, you can fairly easily add new items and re-do the peak-covering locally (and the refinement locally).
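The blurring step can be sketched as follows (a sketch of my reading of the answer, not the answerer's code; `blurred_graph` and `periods` are illustrative names). Each required location contributes 1/period, spread over neighbours within M-1 cells with weight 1/(d+1) at distance d, matching the (1/6 1/5 1/4 1/3 1/2 1 1/2 1/3 1/4 1/5 1/6) table for M=6:

```python
# Sketch of the blur step: each required location contributes 1/period,
# smeared over neighbours within M-1 cells with weight 1/(d+1) at
# distance d. `periods` maps index -> required read period; `length` is
# the size of the location array.

def blurred_graph(periods, length, M=6):
    graph = [0.0] * length
    for loc, period in periods.items():
        value = 1.0 / period
        for d in range(-(M - 1), M):
            i = loc + d
            if 0 <= i < length:
                graph[i] += value / (abs(d) + 1)
    return graph
```

Reads would then be thrown at the local maxima of this graph, as described above, before patching any uncovered locations.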
Just to see how this would work on your data, the two-group case would look like (multiplying by 60 so I don't have to keep track of fractional weights)
So we're done, and the solution is optimal.
For the three group example, weighting "5" as "1/5" and multiplying everything by 300 so again there are no fractions,