将列表划分为子集
我有一个项目列表,我想将其划分为子集。为了便于讨论,我们假设它们是文件。我希望每个子集最多包含 5 个文件,并且如果可能的话,子集中文件的总大小小于 1 MB。如果单个文件超过 1MB,则它本身应该属于一个子集。
我以稍微更通用的形式编写了这篇文章,使用通用的“项目指标”而不是文件大小。但我怀疑有一种更简单和/或更好的方法可以做到这一点。有什么建议吗?
这是我所得到的:
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int maxItemsPerSet, int maxMetricPerSet, Func<T, int> getMetric)
{
int currentMetricSum = 0;
List<T> currentSet = new List<T>();
foreach (T listItem in source)
{
int itemMetric = getMetric(listItem);
if (currentSet.Count > 0 &&
(currentSet.Count >= maxItemsPerSet || (currentMetricSum + itemMetric) > maxMetricPerSet))
{
yield return currentSet;
//Start a new subset
currentSet = new List<T>();
currentMetricSum = 0;
}
currentSet.Add(listItem);
currentMetricSum += itemMetric;
}
//Return the last set
yield return currentSet;
}
I have a list of items which I would like to partition into subsets. For the sake of discussion lets say they're files. I would like each subset to contain at most 5 files, and for the total size of the files in the subset to be less than 1 MB if possible. If a single file exceeds 1MB it should be in a subset by itself.
I wrote this up in a slightly more generic form, using a generic "item metric" instead of file size. But I suspect there's a simpler and/or better way to do this. Any suggestions?
Here's what I've got:
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int maxItemsPerSet, int maxMetricPerSet, Func<T, int> getMetric)
{
int currentMetricSum = 0;
List<T> currentSet = new List<T>();
foreach (T listItem in source)
{
int itemMetric = getMetric(listItem);
if (currentSet.Count > 0 &&
(currentSet.Count >= maxItemsPerSet || (currentMetricSum + itemMetric) > maxMetricPerSet))
{
yield return currentSet;
//Start a new subset
currentSet = new List<T>();
currentMetricSum = 0;
}
currentSet.Add(listItem);
currentMetricSum += itemMetric;
}
//Return the last set
yield return currentSet;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
装箱问题是一个 NP 难题。获得最佳解决方案的唯一方法是测试所有组合。如果有固定数量的不同大小,则可以使用动态规划系统地完成(有一个关于SO的答案本例的示例代码),但是这种算法的运行时间非常糟糕。
这意味着您应该寻找一种启发式方法,让您在合理的时间内接近最佳解决方案。您的算法(首次拟合)是一个很好的起点。只需付出很少的努力,就可以通过减小尺寸对物品进行预分类来稍微改进。然而,还有其他几种或多或少复杂的启发式方法可以提高速度和结果。
Google 搜索返回以下结果之一:装箱启发式的基本分析(有一个 论文 分析结果)。显然,带有 bin 查找表的最佳拟合算法可以在合理的运行时间内提供良好的结果。
Bin packing is a NP-hard problem. The only way to get an optimal solution is to test all combinations. If there is a fixed number of distinct sizes, it can be systematically done using dynamic programming (there is an answer on SO with sample code for this case), but the running time for such algorithm is terrible.
This means you should be looking for a heuristic which would get you close to the optimal solution in reasonable time. Your algorithm (first-fit) is a good starting point. With little effort, it can be slightly improved by presorting the items by decreasing size. There are, however, several other more-or-less complex heuristics which improve both speed and the results.
A Google search returned this as one of the results: Basic analysis of bin-packing heuristics (there is a paper which analyses results). Apparently, a best fit algorithm with a bin lookup table provides good results with a reasonable running time.
缺少 1MB 测试,但除此之外你的代码对我来说看起来还不错。我认为没有更好的方法可以做到这一点。
The 1MB test is missing, but otherwise your code looks OK to me. I do not think there is a significantly better way of doing it.