用于从字典中计算附加组合的 Python 脚本

发布于 2024-08-30 08:34:01 字数 1581 浏览 5 评论 0原文

我正在尝试编写一个脚本,该脚本将采用一个项目字典,每个项目包含 0 - 10 之间的值的属性,并添加各种元素来选择哪些项目组合可以达到所需的总数。我还需要脚本来执行此操作,仅使用具有相同“槽”的公共项目。

例如:

item_list = {
 'item_1': {'slot': 'top', 'prop_a': 2, 'prop_b': 0, 'prop_c': 2, 'prop_d': 1 },
 'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 },
 'item_3': {'slot': 'top', 'prop_a': 2, 'prop_b': 5, 'prop_c': 2, 'prop_d':-2 },
 'item_4': {'slot': 'mid', 'prop_a': 5, 'prop_b': 5, 'prop_c':-5, 'prop_d': 0 },
 'item_5': {'slot': 'mid', 'prop_a':10, 'prop_b': 0, 'prop_c':-5, 'prop_d': 0 },
 'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 },
 'item_7': {'slot': 'bot', 'prop_a': 1, 'prop_b': 3, 'prop_c':-4, 'prop_d': 4 },
 'item_8': {'slot': 'bot', 'prop_a': 2, 'prop_b': 2, 'prop_c': 0, 'prop_d': 0 },
 'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 },
}

然后,脚本需要从“item_list”字典中选择哪些组合,每个“槽”使用 1 个项目,以便在添加时实现所需的结果。

例如,如果所需结果为:“prop_a”:3、“prop_b”:3、“prop_c”:8、“prop_d”:0,则脚本将选择“item_2”、“item_6”和“item_9”,以及任何其他有效的组合。

'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 }
'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 }
'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 }
 'total':                 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0

有什么想法如何实现这一点?它不需要用Python编写,甚至不需要一个完整的脚本,但只要解释一下理论上如何做到这一点对我来说就足够了。我尝试过对每个组合进行循环,但这似乎很快就会失去控制并且难以管理。实际脚本需要使用 20 个不同的“槽”(每个槽有 8 个属性)对大约 1,000 个项目执行此操作。

感谢您的帮助!

I am trying to write a script that will take a dictionary of items, each containing properties of values from 0 - 10, and add the various elements to select which combination of items achieve the desired totals. I also need the script to do this, using only items that have the same "slot" in common.

For example:

item_list = {
 'item_1': {'slot': 'top', 'prop_a': 2, 'prop_b': 0, 'prop_c': 2, 'prop_d': 1 },
 'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 },
 'item_3': {'slot': 'top', 'prop_a': 2, 'prop_b': 5, 'prop_c': 2, 'prop_d':-2 },
 'item_4': {'slot': 'mid', 'prop_a': 5, 'prop_b': 5, 'prop_c':-5, 'prop_d': 0 },
 'item_5': {'slot': 'mid', 'prop_a':10, 'prop_b': 0, 'prop_c':-5, 'prop_d': 0 },
 'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 },
 'item_7': {'slot': 'bot', 'prop_a': 1, 'prop_b': 3, 'prop_c':-4, 'prop_d': 4 },
 'item_8': {'slot': 'bot', 'prop_a': 2, 'prop_b': 2, 'prop_c': 0, 'prop_d': 0 },
 'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 },
}

The script would then need to select which combinations from the "item_list" dict that using 1 item per "slot" that would achieve a desired result when added.

For example, if the desired result was: 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0, the script would select 'item_2', 'item_6', and 'item_9', along with any other combination that worked.

'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 }
'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 }
'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 }
 'total':                 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0

Any ideas how to accomplish this? It does not need to be in python, or even a thorough script, but just an explanation on how to do this in theory would be enough for me. I have tried working out looping through every combination, but that seems to very quickly get our of hand and unmanageable. The actual script will need to do this for about 1,000 items using 20 different "slots", each with 8 properties.

Thanks for the help!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

段念尘 2024-09-06 08:34:01

由于属性可以同时具有正值和负值,并且您需要所有令人满意的组合,因此我相信不存在可能的“基本”优化——也就是说,没有多项式时间解(假设 P != NP...;-)。所有解决方案都将归结为枚举所有每槽位组合并检查最终结果,可能进行非常小的调整,这可能会为您节省一些百分比的精力,但没什么大不了的。

如果 20 个可能的槽位中有 1000 个项目,假设每个槽位约 50 个项目均匀分布,则总体大约有 50**20 种可能性,即 9536743164062500000000000000000000 -- 大约10**34(无数亿亿亿......)。一般来说,您不能从“所有解决方案搜索”中“修剪”任何子树,因为当您对第一个 20-p 槽进行假设选择时,无论 prop 值如何,仍然存在 < em>可能是从可以满足约束(或者多个)的剩余p个槽中选择的。

如果你能找到一个精确的多项式时间解决方案来解决这个 NP 完全问题,那么你基本上已经彻底改变了现代数学和计算机科学——图灵奖和菲尔德奖只是随后荣誉的开始。这不太可能。

为了解决一个可行的问题,您必须以某些方式放宽您的要求(接受仅找到解决方案子集的可能性,接受概率而不是确定性方法,接受近似解决方案,...)。

一旦你这样做了,一些小的优化可能是有意义的——例如,从对所有属性值和目标求和常量(等于每个属性的最小负值多一)开始,以便每个属性值>。 0 - 现在您可以按(例如)某些属性的值或所有属性的总和对插槽进行排序,并根据以下知识进行一些修剪:向部分假设解决方案添加一个插槽将使每个累积道具值增加至少 X 并且总计至少 Y(因此,如果任一条件使运行总计超过目标,您可以修剪该分支)。一般来说,这种启发式近似不需要使大 O 行为变得更好,但它可以将预期乘数值减少到足以使问题更接近计算上可行。

但是,如果没有可能的需求放宽,那么甚至不值得寻找如此聪明的小技巧:在这种情况下,问题将在计算上保持不可行,因此寻找聪明的小优化实际上不会有成效。

Since the properties can have both positive and negative values, and you need all satisfactory combinations, I believe there is no "essential" optimization possible -- that is, no polynomial-time solution (assuming P != NP...;-). All solutions will come down to enumerating all the one-per-slot combinations and checking the final results, with very minor tweaks possible that may save you some percent effort here or there, but nothing really big.

If you have 1000 items in 20 possible slots, say equally distributed at about 50 items per slot, there are around 50**20 possibilities overall, i.e, 9536743164062500000000000000000000 -- about 10**34 (a myriad billions of billions of billions...). You cannot, in general, "prune" any subtree from the "all-solutions search", because no matter the prop values when you have a hypothetical pick for the first 20-p slots, there still might be a pick of the remaining p slots that could satisfy the constraint (or, more than one).

If you could find an exact polynomial-time solution for this, a NP-complete problem, you'd basically have revolutionized modern mathematics and computer science -- Turing prizes and Field medals would only be the start of the consequent accolades. This is not very likely.

To get down to a feasible problem, you'll have to relax your requirements in some ways (accept the possibility of finding just a subset of the solutions, accept a probabilistic rather than deterministic approach, accept approximate solutions, ...).

Once you do, some small optimizations may make sense -- for example, start with summing constants (equal to one more than the smallest negative value of each propriety) to all the property values and targets, so that every prop value is > 0 -- now you can sort the slots by (e.g.) value for some property, or the sum of all properties, and do some pruning based on the knowledge that adding one more slot to a partial hypothetical solution will increase each cumulative prop value by at least X and the total by at least Y (so you can prune that branch if either condition makes the running totals exceed the target). This kind of heuristic approximation need not make the big-O behavior any better, in general, but it may reduce the expected multiplier value by enough to get the problem closer to being computationally feasible.

But it's not even worth looking for such clever little tricks if there's no requirement relaxation possible: in that case, the problem will stay computationally unfeasible, so looking for clever little optimizations would not be practically productive anyway.

压抑⊿情绪 2024-09-06 08:34:01

这个问题本质上是子集和问题的概括(这是NP完全的,是的) 到多个维度。重述问题(以确保我们解决同一问题):您有 1000 个项目,分为 20 个类(您称之为槽)。对于 8 个属性中的每一个,每个项目都有一个 [-10,10] 范围内的整数值;因此,每个项目可以被认为具有一个 8 维向量的值。您想要从每个槽中挑选一个项目,以便总价值(将这些 8 维向量相加)是给定向量。

在您给出的示例中,您有 4 个维度,3 个类中的 9 个项目具有值 (2,0,2,1)、(5,0,1,-1)、... 等,并且您想要从每个类别中选择一项来得出总和 (3,3,8,0)。正确的?

暴力搜索

首先,是枚举所有可能性的暴力搜索。假设您的 1000 个项目平均分为 20 个类别(因此每个类别有 50 个),则每个类别有 50 个选择,这意味着您必须检查 5020=9536743164062500000000000000000000 个选择(并且对于每个坐标,您需要将 8 个坐标中的每个坐标的 20 个元素相加并进行检查,因此运行时间将为 ∝ 5020·20·8):这是不可行的。

动态编程,单次

然后是动态编程解决方案,它是不同的,并且在实践中通常在暴力破解不可行的情况下起作用,但不幸的是在这种情况下似乎也不可行。 (如果您对“属性值”有更好的限制,您的性能就会呈指数级提高。)这里的想法是跟踪达到每个可能的总和的一种方法。 [-10,10] 中的 20 个数字的总和位于 [-200,200] 内,因此 8 维向量“仅”有 4008=655360000000000000000 个可能的和。 (这只是其他搜索空间的一小部分,但这对您来说并没有什么安慰。对于每个“属性”,您还可以计算[每个类别中的最大项目]和[每个类别中的最小项目]之和之间的差异] 用较小的数字替换 400。)动态规划算法的思想如下。

  • 让last[(a,b,c,d,e,f,g,h)][k]表示您可以从第k个类别中获取的一项(以及前k-1个类别中的每一项) ) 使总和精确为 (a,b,c,d,e,f,g,h)。
    然后,伪代码:

    对于 k=1 到 20:
        对于 k 类中的每个项目 i:
            对于每个向量 v 其中last[v][k-1]不为空:
                最后[v + 值(i)][k] = i
    

然后,如果您想要的最终总和是 s,则从第 k 个类中选择项目last[s][k],从 (k-1 中选择项目last[s-value(i)][k-1] )第 类,依此类推。在最坏的情况下,这需要时间 ∝ 20·50·4008·8(仅是松散的上限,而不是严格的分析)。

动态编程,单独

“完美”解决方案就到此为止了。但是,如果您允许启发式解决方案和那些“最有可能在实践中发挥作用”的解决方案,您可以做得更好(即使是准确地解决问题)。例如,您可以针对 8 个维度中的每一个维度单独解决问题。这个实现起来就更容易了,最坏的情况下只需要时间∝20·50·400·8=3200000,而且很容易就能做到。如果将 last[][] 保留为列表,而不是单个元素,那么最后您(实际上)拥有一个子集列表,这些子集实现了坐标的给定总和(在“乘积”中)形式”)。在实践中,没有多少子集加起来可以精确地达到您想要的总和,因此您可以从子集数量最少的坐标开始,然后尝试每个子集的其他 7 个坐标。这一步的复杂性取决于问题中的数据,但我怀疑(或者人们可以希望)要么(1)具有相等总和的集合非常少,在这种情况下,这个交集会将集合的数量减少到检查,或者(2)将有许多具有给定总和的集合,在这种情况下,您会很早就找到一个。

无论如何,首先对每个坐标单独进行动态编程肯定会让您在第二阶段搜索更小的空间。

近似算法

如果您不需要总和完全相等并且愿意接受在所需总和的某个因子内的总和,则有一个众所周知的想法用于获得 FPTAS(完全子集和问题的多项式时间近似方案),该问题以(项目数等)和 1/ε 的时间多项式运行。我已经花了很多时间来解释这一点,但你可以查一下 - 基本上,它只是将 4008 空间替换为较小的空间,例如将数字四舍五入到最接近的倍数5,或者其他什么。

This problem is essentially a generalization of the subset-sum problem (which is NP-complete, yes) to multiple dimensions. To restate the problem (to make sure we're solving the same problem): you have 1000 items divided into 20 classes (which you call slots). Each item has an integer value in [-10,10] for each of 8 properties; thus each item can be considered to have a value which is an 8-dimensional vector. You want to pick one item from each slot, so that the total value (adding these 8-dimensional vectors) is a given vector.

In the example you gave, you have 4 dimensions, and the 9 items in 3 classes have values (2,0,2,1), (5,0,1,-1), … etc., and you want to pick one item from each class to make the sum (3,3,8,0). Right?

Brute-force

First, there is the brute-force search that enumerates all possibilities. Assuming your 1000 items are divided equally into the 20 classes (so you have 50 in each), you have 50 choices for each class, which means you'd have to check 5020=9536743164062500000000000000000000 choices (and for each of them you need to add up the 20 elements along each of the 8 coordinates and check, so the running time would be ∝ 5020·20·8): this is not feasible.

Dynamic-programming, single-shot

Then there is the dynamic-programming solution, which is different, and in practice often works where brute-force is infeasible, but in this case unfortunately seems infeasible as well. (You'd improve it exponentially if you got better bounds on your "property values".) The idea here is to keep track of one way of reaching each possible sum. The sum of 20 numbers from [-10,10] lies in [-200,200], so there are "only" 4008=655360000000000000000 possible sums for your 8-dimensional vector. (This is a tiny fraction of the other search space, but it's no consolation to you. You can also take, for each "property", the difference between the sums of [largest item in each class] and [smallest item in each class] to replace the 400 with a smaller number.) The idea of the dynamic-programming algorithm is the following.

  • Let last[(a,b,c,d,e,f,g,h)][k] denote one item you can take from the kth class (along with one item each from the first k-1 classes) to make the sum exactly (a,b,c,d,e,f,g,h).
    Then, pseudocode:

    for k=1 to 20:
        for each item i in class k:
            for each vector v for which last[v][k-1] is not null:
                last[v + value(i)][k] = i
    

Then, if your desired final sum is s, you pick item last[s][k] from the kth class, item last[s-value(i)][k-1] from the (k-1)th class, and so on. This takes time ∝ 20·50·4008·8 in the worst case (only a loose upper bound, not a tight analysis).

Dynamic-programming, separately

So much for "perfect" solutions. However, if you allow heuristic solutions and those that "will most likely work in practice", you can do better (even for solving the problem exactly). For instance, you can solve the problem separately for each of the 8 dimensions. This is even easier to implement, takes only time ∝ 20·50·400·8=3200000 in the worst case, and you can do it quite easily. If you keep last[][] as a list, instead of a single element, then at the end you have (effectively) a list of subsets which achieve the given sum for that coordinate (in "product form"). In practice, not many subsets may add up exactly to the sum you want, so you can start with the coordinate for which the number of subsets is smallest, then try each of those subsets for the other 7 coordinates. The complexity of this step depends on the data in the problem, but I suspect (or one can hope) that either (1) there will be very few sets with equal sums, in which case this intersection will whittle down the number of sets to check, or (2) there will be many sets with a given sum, in which case you'll find one quite early.

In any case, doing the dynamic-programming separately for each coordinate first is definitely going to allow you to search over a much smaller space in the second stage.

Approximate algorithms

If you don't need the sums to be exactly equal and will accept sums that are within a certain factor of your required sum, there is a well-known idea used to get an FPTAS (fully polynomial-time approximation scheme) for the subset-sum problem, which runs in time polynomial in (number of items, etc.) and 1/ε. I've exhausted my time to explain this, but you can look it up — basically, it just replaces the 4008 space by a smaller one, by e.g. rounding numbers up to the nearest multiple of 5, or whatever.

葬シ愛 2024-09-06 08:34:01

这听起来像是 背包问题 的变体,通常通过 动态编程

但是,您可能可以使用递归编写一个相当简单的解决方案(但速度较慢):

def GetItemsForSlot(item_list, slot):
  return [ (k,v) for (k,v) in item_list.items() if v['slot'] == slot]

def SubtractWeights(current_weights, item_weights):
  remaining_weights = {}
  for (k,v) in current_weights.items():
      remaining_weights[k] = current_weights[k] - item_weights[k]
  return remaining_weights

def AllWeightsAreZero(remaining_weights):
  return not [v for v in remaining_weights.values() if v != 0]

def choose_items(item_list, remaining_weights, available_slots,
                 accumulated_items=[ ]):
    print "choose_items: ", remaining_weights, available_slots, \
      accumulated_items
    # Base case: we have no more available slots.
    if not available_slots:
      if AllWeightsAreZero(remaining_weights):
          # This is a solution.
          print "SOLUTION FOUND: ", accumulated_items
          return
      else:
          # This had remaining weight, not a solution.
          return

    # Pick the next available_slot
    slot = available_slots[0]
    # Iterate over each item for this slot, checking to see if they're in a
    # solution.
    for name, properties in GetItemsForSlot(item_list, slot):
        choose_items(item_list, # pass the items recursively
                     SubtractWeights(remaining_weights, properties),
                     available_slots[1:], # pass remaining slots
                     accumulated_items + [name]) # Add this item




if __name__ == "__main__":
    total_weights = {
       'prop_a': 3,
       'prop_b': 3,
       'prop_c': 8,
       'prop_d': 0
    }

    choose_items(item_list, total_weights, ["top", "mid", "bot"])

这已经过测试,并且似乎有效。但没有承诺:)

保留插槽和prop_a 作为同一对象的属性使其使用起来有点困难。我建议使用类而不是字典来使代码更容易理解。

This sounds like a variation of the Knapsack problem, which is commonly solved with dynamic programming.

But, you could probably write a fairly simple solution (but slower) using recursion:

def GetItemsForSlot(item_list, slot):
  return [ (k,v) for (k,v) in item_list.items() if v['slot'] == slot]

def SubtractWeights(current_weights, item_weights):
  remaining_weights = {}
  for (k,v) in current_weights.items():
      remaining_weights[k] = current_weights[k] - item_weights[k]
  return remaining_weights

def AllWeightsAreZero(remaining_weights):
  return not [v for v in remaining_weights.values() if v != 0]

def choose_items(item_list, remaining_weights, available_slots,
                 accumulated_items=[ ]):
    print "choose_items: ", remaining_weights, available_slots, \
      accumulated_items
    # Base case: we have no more available slots.
    if not available_slots:
      if AllWeightsAreZero(remaining_weights):
          # This is a solution.
          print "SOLUTION FOUND: ", accumulated_items
          return
      else:
          # This had remaining weight, not a solution.
          return

    # Pick the next available_slot
    slot = available_slots[0]
    # Iterate over each item for this slot, checking to see if they're in a
    # solution.
    for name, properties in GetItemsForSlot(item_list, slot):
        choose_items(item_list, # pass the items recursively
                     SubtractWeights(remaining_weights, properties),
                     available_slots[1:], # pass remaining slots
                     accumulated_items + [name]) # Add this item




if __name__ == "__main__":
    total_weights = {
       'prop_a': 3,
       'prop_b': 3,
       'prop_c': 8,
       'prop_d': 0
    }

    choose_items(item_list, total_weights, ["top", "mid", "bot"])

This was tested, and seemed to work. No promises though :)

Keeping slot & prop_a as properties of the same object made it a little harder to work with. I'd suggest using classes instead of a dictionary to make the code easier to understand.

许仙没带伞 2024-09-06 08:34:01

我尝试过对每个组合进行循环,但这似乎很快就失去了我们的控制力并且难以管理。实际的脚本需要使用 20 个不同的“槽”(每个槽有 8 个属性)对大约 1,000 个项目执行此操作。

首先将结构加载到一个好的对象层次结构中,然后分段解决它可能会帮助您思考。

例子:

class Items(dict):
    def find(self, **clauses):
        # TODO!

class Slots(dict):
    # TODO!

items = Items()
for item, slots in item_list.items():
    items[item] = Slots(slots)
    # consider abstracting out slot based on location (top, mid, bot) too

print items.find(prop_a=3, prop_b=3, prop_c=8, prop_d=0)

I have tried working out looping through every combination, but that seems to very quickly get our of hand and unmanageable. The actual script will need to do this for about 1,000 items using 20 different "slots", each with 8 properties.

It might help your thinking to load the structure in a nice object hierarchy first and then solve it piecewise.

Example:

class Items(dict):
    def find(self, **clauses):
        # TODO!

class Slots(dict):
    # TODO!

items = Items()
for item, slots in item_list.items():
    items[item] = Slots(slots)
    # consider abstracting out slot based on location (top, mid, bot) too

print items.find(prop_a=3, prop_b=3, prop_c=8, prop_d=0)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文