如何处理猜数字游戏(有一些变化)算法?

发布于 2024-12-08 18:55:26 字数 4797 浏览 0 评论 0原文

更新(2020 年 7 月):问题已经有 9 年历史了,但仍然是我非常感兴趣的问题。从那时起,机器学习(RNN、CNN、GANS 等)、新方法和廉价 GPU 的兴起使得启用新方法。我认为重新审视这个问题看看是否有新的方法会很有趣。

我正在学习编程(Python 和算法),并正在尝试从事一个我觉得有趣的项目。我已经创建了一些基本的 Python 脚本,但我不确定如何为我正在尝试构建的游戏提供解决方案。

游戏的运作方式如下:

用户将获得有价值的物品。例如,

Apple = 1
Pears = 2
Oranges  = 3

他们将有机会选择他们喜欢的任何组合(即 100 个苹果、20 个梨和一个橙子)。计算机获得的唯一输出是总价值(在本例中,当前为 143 美元)。计算机将尝试猜测他们拥有什么。显然它在第一回合就无法正确进行。

         Value    quantity(day1)    value(day1)
Apple      1        100                100
Pears      2         20                 40
Orange     3          1                  3
Total               121                143

下一回合,用户可以修改他们的数量,但不能超过总量的 5%(或者我们可以选择的其他百分比。例如,我将使用 5%)。水果的价格可能会发生变化(随机),因此总价值也可能会随之变化(为简单起见,在本例中我没有改变水果价格)。使用上面的示例,在游戏的第 2 天,用户在第 3 天返回值 152 美元和 164 美元。这是一个示例:

Quantity (day2)   %change (day2)    Value (day2)   Quantity (day3)   %change (day3)   Value(day3)
 104                                 104            106                                106
  21                                  42             23                                 46
   2                                   6              4                                 12
 127               4.96%             152            133               4.72%            164

*(我希望表格显示正确,我必须手动将它们隔开,所以希望这不仅仅是在我的屏幕上执行此操作,如果它不起作用,请告诉我,我会尝试上传屏幕截图。)

我试图看看是否可以计算出随着时间的推移数量是多少(假设用户有耐心)继续输入数字)。我知道现在我唯一的限制是总值不能超过 5%,所以我现在的准确度不能在 5% 以内,这样用户将永远输入它。

到目前为止我所做的事情

这是到目前为止我的解决方案(不多)。基本上,我获取所有值并找出它们的所有可能组合(我已完成这部分)。然后我将所有可能的组合放入数据库中作为字典(例如,对于 143 美元,可能有一个字典条目 {apple:143, Pears:0, Oranges :0}..一直到 {apple :0,梨:1,橙子:47}。每次我得到一个新数字时,我都会这样做,这样我就得到了所有可能性的列表,

这就是我在使用上述规则时遇到的问题。最好的解决方案?我想我需要一个适应度函数来自动比较两天的数据并消除前几天数据方差超过 5% 的任何可能性

问题:

所以我的问题是 。用户更改总数并且我有一个所有概率的列表,我应该如何处理这个问题?我需要学习什么?有什么算法或我可以使用的适用理论吗?错误,你能建议我可以添加哪些规则吗这个目标是否可行(如果不是当前状态。我正在考虑添加更多水果并说他们必须至少采摘 3 个,等等)?另外,我对遗传算法只有一个模糊的了解,但我想我可以在这里使用它们,有什么我可以使用的吗?

我非常非常渴望学习,因此任何建议或提示将不胜感激(只是请不要告诉我这个游戏是不可能的)。

更新:得到的反馈表明这很难解决。所以我想我应该在游戏中添加另一个条件,该条件不会干扰玩家正在做的事情(游戏对他们来说保持不变),但每天水果的价值都会改变价格(随机)。这样是不是就更容易解决了呢?因为在 5% 的变动和某些水果价值变化的范围内,随着时间的推移,只有少数组合是可能的。

第一天,一切皆有可能,获得足够接近的范围几乎是不可能的,但随着水果价格的变化,用户只能选择 5% 的变化,那么(随着时间的推移)范围不应该越来越窄。在上面的例子中,如果价格波动足够大,我想我可以暴力破解一个解决方案,给我一个猜测范围,但我试图弄清楚是否有一个更优雅的解决方案或其他解决方案来不断缩小这个范围时间。

UPDATE2:经过阅读和询问,我相信这是一个隐藏的马尔可夫/维特比问题,它跟踪水果价格的变化以及总和(最后一个数据点的权重最重)。但我不确定如何应用这种关系。我认为情况确实如此,并且可能是错误的,但至少我开始怀疑这是某种类型的机器学习问题。

更新 3:我创建了一个测试用例(数字较小)和一个生成器来帮助自动化用户生成的数据,并且我正在尝试从中创建一个图表以查看更有可能发生的情况。

这是代码,以及总价值和对用户实际水果数量的评论。

#!/usr/bin/env python
import itertools

# Fruit price data
fruitPriceDay1 = {'Apple':1, 'Pears':2, 'Oranges':3}
fruitPriceDay2 = {'Apple':2, 'Pears':3, 'Oranges':4}
fruitPriceDay3 = {'Apple':2, 'Pears':4, 'Oranges':5}

# Generate possibilities for testing (warning...will not scale with large numbers)
def possibilityGenerator(target_sum, apple, pears, oranges):
    allDayPossible = {}
    counter = 1
    apple_range = range(0, target_sum + 1, apple)
    pears_range = range(0, target_sum + 1, pears)
    oranges_range = range(0, target_sum + 1, oranges)
    for i, j, k in itertools.product(apple_range, pears_range, oranges_range):
        if i + j + k == target_sum:
            currentPossible = {}

            #print counter
            #print 'Apple', ':', i/apple, ',', 'Pears', ':', j/pears, ',', 'Oranges', ':', k/oranges
            currentPossible['apple'] = i/apple
            currentPossible['pears'] = j/pears
            currentPossible['oranges'] = k/oranges

            #print currentPossible
            allDayPossible[counter] = currentPossible
            counter = counter +1
    return allDayPossible

# Total sum being returned by user for value of fruits
totalSumDay1=26 # Computer does not know this but users quantities are apple: 20, pears 3, oranges 0 at the current prices of the day
totalSumDay2=51 # Computer does not know this but users quantities are apple: 21, pears 3, oranges 0 at the current prices of the day
totalSumDay3=61 # Computer does not know this but users quantities are apple: 20, pears 4, oranges 1 at the current prices of the day
graph = {}
graph['day1'] = possibilityGenerator(totalSumDay1, fruitPriceDay1['Apple'], fruitPriceDay1['Pears'], fruitPriceDay1['Oranges'] )
graph['day2'] = possibilityGenerator(totalSumDay2, fruitPriceDay2['Apple'], fruitPriceDay2['Pears'], fruitPriceDay2['Oranges'] )
graph['day3'] = possibilityGenerator(totalSumDay3, fruitPriceDay3['Apple'], fruitPriceDay3['Pears'], fruitPriceDay3['Oranges'] )

# Sample of dict = 1 : {'oranges': 0, 'apple': 0, 'pears': 0}..70 : {'oranges': 8, 'apple': 26, 'pears': 13}
print graph

Update(July 2020): Question is 9 years old but still one that I'm deeply interested in. In the time since, machine learning(RNN's, CNN's, GANS,etc), new approaches and cheap GPU's have risen that enable new approaches. I thought it would be fun to revisit this question to see if there are new approaches.

I am learning programming (Python and algorithms) and was trying to work on a project that I find interesting. I have created a few basic Python scripts, but I’m not sure how to approach a solution to a game I am trying to build.

Here’s how the game will work:

Users will be given items with a value. For example,

Apple = 1
Pears = 2
Oranges  = 3

They will then get a chance to choose any combo of them they like (i.e. 100 apples, 20 pears, and one orange). The only output the computer gets is the total value (in this example, it's currently $143). The computer will try to guess what they have. Which obviously it won’t be able to get correctly the first turn.

         Value    quantity(day1)    value(day1)
Apple      1        100                100
Pears      2         20                 40
Orange     3          1                  3
Total               121                143

The next turn the user can modify their numbers but no more than 5% of the total quantity (or some other percent we may chose. I’ll use 5% for example.). The prices of fruit can change(at random) so the total value may change based on that also (for simplicity I am not changing fruit prices in this example). Using the above example, on day 2 of the game, the user returns a value of $152 and $164 on day 3. Here's an example:

Quantity (day2)   %change (day2)    Value (day2)   Quantity (day3)   %change (day3)   Value(day3)
 104                                 104            106                                106
  21                                  42             23                                 46
   2                                   6              4                                 12
 127               4.96%             152            133               4.72%            164

*(I hope the tables show up right, I had to manually space them so hopefully it's not just doing it on my screen, if it doesn't work let me know and I'll try to upload a screenshot.)

I am trying to see if I can figure out what the quantities are over time (assuming the user will have the patience to keep entering numbers). I know right now my only restriction is the total value cannot be more than 5% so I cannot be within 5% accuracy right now so the user will be entering it forever.

What I have done so far

Here’s my solution so far (not much). Basically, I take all the values and figure out all the possible combinations of them (I am done this part). Then I take all the possible combos and put them in a database as a dictionary (so for example for $143, there could be a dictionary entry {apple:143, Pears:0, Oranges :0}..all the way to {apple:0, Pears:1, Oranges :47}. I do this each time I get a new number so I have a list of all possibilities.

Here’s where I’m stuck. In using the rules above, how can I figure out the best possible solution? I think I’ll need a fitness function that automatically compares the two days data and removes any possibilities that have more than 5% variance of the previous days data.

Questions:

So my question with user changing the total and me having a list of all the probabilities, how should I approach this? What do I need to learn? Is there any algorithms out there or theories that I can use that are applicable? Or, to help me understand my mistake, can you suggest what rules I can add to make this goal feasible (if it's not in its current state. I was thinking adding more fruits and saying they must pick at least 3, etc..)? Also, I only have a vague understanding of genetic algorithms, but I thought I could use them here, if is there something I can use?

I'm very very eager to learn so any advice or tips would be greatly appreciated (just please don't tell me this game is impossible).

UPDATE: Getting feedback that this is hard to solve. So I thought I'd add another condition to the game that won't interfere with what the player is doing (game stays the same for them) but everyday the value of the fruits change price (randomly). Would that make it easier to solve? Because within a 5% movement and certain fruit value changes, only a few combinations are probable over time.

Day 1, anything is possible and getting a close enough range is almost impossible, but as the prices of fruits change and the user can only choose a 5% change, then shouldn't (over time) the range be narrow and narrow. In the above example, if prices are volatile enough I think I could brute force a solution that gave me a range to guess in, but I'm trying to figure out if there's a more elegant solution or other solutions to keep narrowing this range over time.

UPDATE2: After reading and asking around, I believe this is a hidden Markov/Viterbi problem that tracks the changes in fruit prices as well as total sum (weighting the last data point the heaviest). I'm not sure how to apply the relationship though. I think this is the case and could be wrong but at the least I'm starting to suspect this is a some type of machine learning problem.

Update 3: I am created a test case (with smaller numbers) and a generator to help automate the user generated data and I am trying to create a graph from it to see what's more likely.

Here's the code, along with the total values and comments on what the users actually fruit quantities are.

#!/usr/bin/env python
import itertools

# Fruit price data
fruitPriceDay1 = {'Apple':1, 'Pears':2, 'Oranges':3}
fruitPriceDay2 = {'Apple':2, 'Pears':3, 'Oranges':4}
fruitPriceDay3 = {'Apple':2, 'Pears':4, 'Oranges':5}

# Generate possibilities for testing (warning...will not scale with large numbers)
def possibilityGenerator(target_sum, apple, pears, oranges):
    allDayPossible = {}
    counter = 1
    apple_range = range(0, target_sum + 1, apple)
    pears_range = range(0, target_sum + 1, pears)
    oranges_range = range(0, target_sum + 1, oranges)
    for i, j, k in itertools.product(apple_range, pears_range, oranges_range):
        if i + j + k == target_sum:
            currentPossible = {}

            #print counter
            #print 'Apple', ':', i/apple, ',', 'Pears', ':', j/pears, ',', 'Oranges', ':', k/oranges
            currentPossible['apple'] = i/apple
            currentPossible['pears'] = j/pears
            currentPossible['oranges'] = k/oranges

            #print currentPossible
            allDayPossible[counter] = currentPossible
            counter = counter +1
    return allDayPossible

# Total sum being returned by user for value of fruits
totalSumDay1=26 # Computer does not know this but users quantities are apple: 20, pears 3, oranges 0 at the current prices of the day
totalSumDay2=51 # Computer does not know this but users quantities are apple: 21, pears 3, oranges 0 at the current prices of the day
totalSumDay3=61 # Computer does not know this but users quantities are apple: 20, pears 4, oranges 1 at the current prices of the day
graph = {}
graph['day1'] = possibilityGenerator(totalSumDay1, fruitPriceDay1['Apple'], fruitPriceDay1['Pears'], fruitPriceDay1['Oranges'] )
graph['day2'] = possibilityGenerator(totalSumDay2, fruitPriceDay2['Apple'], fruitPriceDay2['Pears'], fruitPriceDay2['Oranges'] )
graph['day3'] = possibilityGenerator(totalSumDay3, fruitPriceDay3['Apple'], fruitPriceDay3['Pears'], fruitPriceDay3['Oranges'] )

# Sample of dict = 1 : {'oranges': 0, 'apple': 0, 'pears': 0}..70 : {'oranges': 8, 'apple': 26, 'pears': 13}
print graph

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

迟月 2024-12-15 18:55:26

我们将结合图论和概率:

在第一天,构建一组所有可行的解决方案。让我们将解集表示为 A1={a1(1), a1(2),...,a1(n)}。

第二天,您可以再次构建解决方案集 A2。

现在,对于 A2 中的每个元素,您需要检查是否可以从 A1 的每个元素到达它(给定 x% 容差)。如果是这样 - 将 A2(n) 连接到 A1(m)。如果无法从 A1(m) 中的任何节点到达 - 您可以删除该节点。

基本上我们正在构建一个连接的有向无环图。

图中的所有路径都有相同的可能性。只有当从Am到Am+1(从Am中的节点到Am+1中的节点)存在单边时,才能找到精确解。

当然,某些节点比其他节点出现在更多路径中。每个节点的概率可以根据包含该节点的路径数量直接推导出来。

通过为每个节点分配一个权重,该权重等于通往该节点的路径数,无需保留所有历史记录,而只需保留前一天的历史记录。

另外,看看非负-值线性二番图方程 - 我不久前问过的一个问题。接受的答案是枚举每一步中所有组合的好方法。

We'll combine graph-theory and probability:

On the 1st day, build a set of all feasible solutions. Lets denote the solutions set as A1={a1(1), a1(2),...,a1(n)}.

On the second day you can again build the solutions set A2.

Now, for each element in A2, you'll need to check if it can be reached from each element of A1 (given x% tolerance). If so - connect A2(n) to A1(m). If it can't be reached from any node in A1(m) - you can delete this node.

Basically we are building a connected directed acyclic graph.

All paths in the graph are equally likely. You can find an exact solution only when there is a single edge from Am to Am+1 (from a node in Am to a node in Am+1).

Sure, some nodes appear in more paths than other nodes. The probability for each node can be directly deduced based on the number of paths that contains this node.

By assigning a weight to each node, which equals to the number of paths that leads to this node, there is no need to keep all history, but only the previous day.

Also, have a look at non-negative-values linear diphantine equations - A question I asked a while ago. The accepted answer is a great way to enumarte all combos in each step.

无名指的心愿 2024-12-15 18:55:26

免责声明:由于我误读了问题的一些关键部分,因此在暂时删除我的答案并仔细重新阅读问题后,我极大地改变了我的答案。虽然仍然引用类似的主题和算法,但在我尝试自己解决 C# 中的一些问题后,答案得到了很大改善。

好莱坞版本

  • 问题是 动态约束满足问题 (DCSP),约束满足问题(CSP。)
  • 使用 蒙特卡罗 来查找给定日期的潜在解决方案(如果值和数量范围不小)。否则,请使用暴力来找到所有可能的解决方案。
  • 使用约束记录(与 DCSP 相关),级联应用于前几天以限制潜在的解决方案集。
  • 根据概率交叉手指,瞄准并射击(猜测)。
  • (可选)布鲁斯·威利斯获胜。

原始版本

首先,我想说明我在这里看到的两个主要问题:

  1. 可能的解决方案的数量之多。仅知道物品的数量和总价值(例如 3 和 143)将产生很多可能的解决方案。另外,让算法选择有效的解决方案而不不可避免地尝试无效的解决方案(总数不等于 143)并不容易。

  2. 当在给定日期 Di 找到可能的解决方案时,必须找到一种方法,通过 { Di+1 .. Di+n }. 给出的附加信息来消除潜在的解决方案。

让我们为即将到来的示例奠定一些基础:

  • 让我们在整个游戏中保持相同的项目值。它可以是随机的,也可以由用户选择。
  • 可能的项目值被限制在 [1-10] 的非常有限的范围内,其中没有两个项目可以具有相同的值。
  • 任何项目的数量都不能超过 100。这意味着:[0-100]。

为了更容易地解决这个问题我冒昧地改变了一个约束,这使得算法收敛得更快:

  • “总量”规则被这个规则覆盖:你可以添加或删除任意数量的一天内总计 [1-10] 范围内的项目。但是,添加或删除相同数量的项目(总计)不能超过两次。这也使得游戏的最长生命周期为 20 天。

这条规则使我们能够更容易地排除解决方案。并且,在非微小范围内,渲染回溯算法仍然无用,就像你原来的问题和规则一样。

以我的拙见,这条规则并不是游戏的本质,而只是一个促进者,使计算机能够解决问题。

问题 1:寻找潜在的解决方案

对于初学者来说,问题 1 可以使用 Monte 来解决Carlo 算法 寻找一组潜在的解决方案。该技术很简单:为项目值和数量生成随机数(在各自可接受的范围内)。对所需数量的项目重复此过程。验证解决方案是否可接受。这意味着验证项目是否具有不同的值并且总数是否等于我们的目标总数(例如 143)。

虽然此技术具有易于实现的优点,但也有一些缺点:

  • 不能保证用户的解决方案出现在我们的结果中。
  • 还有很多“错过”。例如,考虑到我们的限制,需要大约 3,000,000 次尝试才能找到 1,000 个潜在的解决方案。
  • 这需要很多时间:在我懒惰的笔记本电脑上大约需要 4 到 5 秒。

如何克服这些缺点?好吧...

  • 将范围限制为较小的值并
  • 找到足够数量的潜在解决方案,以便用户的解决方案很有可能出现在您的解决方案集中。
  • 使用启发式方法更容易地找到解决方案(稍后会详细介绍)。

请注意,限制范围越多,蒙特卡罗算法的用处就越小,因为在合理的时间内几乎没有足够的有效解决方案来迭代所有这些解决方案。对于约束 { 3, [1-10], [0-100] } ,大约有 741,000,000 个有效解(不受目标总值的限制)。蒙特卡罗可以在那里使用。对于 { 3, [1-5], [0-10] },只有大约 80,000 个。无需使用蒙特卡罗;暴力 for 循环就可以了。

我相信问题1就是所谓的约束满足问题 (或 CSP。)

问题 2:限制潜在解决方案集

鉴于问题 1 是一个 CSP,我会继续调用 问题2,以及一般问题,动态 CSP(或 DCSP. )

当原始配方
问题以某种方式改变,通常是因为
需要考虑的限制因素会因环境而变化。 DCSP
被视为一系列静态 CSP,每个 CSP 都是
前一个可以添加变量和约束
(限制)或取消(放宽)。

与 CSP 一起使用的一种可能对解决此问题有用的技术称为“约束记录”:

  • 随着环境的每次变化(用户输入 Di+1 的值),查找信息关于新约束:添加-删除约束可能“使用”的数量是多少。
  • 将约束应用到级联中的每一天。连锁反应可能会大大减少可能的解决方案。

为此,您需要每天获得一组新的可能解决方案;使用蛮力或蒙特卡罗。然后,将 Di 的解决方案与 Di-1 进行比较,并仅保留能够继承前几天的解决方案且不违反约束的解决方案。

您可能必须保留哪些解决方案导致其他解决方案的历史记录(可能在有向图中)。约束记录使您能够记住可能的添加-删除数量,并据此拒绝解决方案。

还可以采取许多其他步骤来进一步改进您的解决方案。以下是一些想法:

  • 记录前几天解决方案中发现的项目值组合的约束。立即拒绝其他解决方案(因为项目值不得更改。)您甚至可以使用特定于解决方案的约束为每个现有解决方案找到较小的解决方案集,以尽早拒绝无效的解决方案。
  • 每天生成一些“突变”的完整历史解决方案,以“修复”D1 解决方案集不包含用户解决方案的情况。您可以使用遗传算法根据现有的解决方案集找到突变群体。)
  • 使用启发式方法可以轻松找到解决方案(例如,当找到有效的解决方案时,尝试通过替换周围的数量来找到该解决方案的变体。)
  • 使用行为启发式方法以预测某些用户操作(例如,每个项目的数量相同、极端模式等)。
  • 在用户输入新数量时继续进行一些计算。

考虑到所有这些,尝试找出一个基于解决方案的出现和启发式的排名系统,以确定候选解决方案。

Disclaimer: I changed my answer dramatically after temporarily deleting my answer and re-reading the question carefully as I misread some critical parts of the question. While still referencing similar topics and algorithms, the answer was greatly improved after I attempted to solve some of the problem in C# myself.

Hollywood version

  • The problem is a Dynamic constraint satisfaction problem (DCSP), a variation on Constraint satisfaction problems (CSP.)
  • Use Monte Carlo to find potential solutions for a given day if values and quantity ranges are not tiny. Otherwise, use brute force to find every potential solutions.
  • Use Constraint Recording (related to DCSP), applied in cascade to previous days to restrict the potential solution set.
  • Cross your fingers, aim and shoot (Guess), based on probability.
  • (Optional) Bruce Willis wins.

Original version

First, I would like to state what I see two main problems here:

  1. The sheer number of possible solutions. Knowing only the number of items and the total value, lets say 3 and 143 for example, will yield a lot of possible solutions. Plus, it is not easy to have an algorithm picking valid solution without inevitably trying invalid solutions (total not equal to 143.)

  2. When possible solutions are found for a given day Di, one must find a way to eliminate potential solutions with the added information given by { Di+1 .. Di+n }.

Let's lay down some bases for the upcoming examples:

  • Lets keep the same item values, the whole game. It can either be random or chosen by the user.
  • The possible item values is bound to the very limited range of [1-10], where no two items can have the same value.
  • No item can have a quantity greater than 100. That means: [0-100].

In order to solve this more easily I took the liberty to change one constraint, which makes the algorithm converge faster:

  • The "total quantity" rule is overridden by this rule: You can add or remove any number of items within the [1-10] range, total, in one day. However, you cannot add or remove the same number of items, total, more than twice. This also gives the game a maximum lifecycle of 20 days.

This rule enables us to rule out solutions more easily. And, with non-tiny ranges, renders Backtracking algorithms still useless, just like your original problem and rules.

In my humble opinion, this rule is not the essence of the game but only a facilitator, enabling the computer to solve the problem.

Problem 1: Finding potential solutions

For starters, problem 1. can be solved using a Monte Carlo algorithm to find a set of potential solutions. The technique is simple: Generate random numbers for item values and quantities (within their respective accepted range). Repeat the process for the required number of items. Verify whether or not the solution is acceptable. That means verifying if items have distinct values and the total is equal to our target total (say, 143.)

While this technique has the advantage of being easy to implement it has some drawbacks:

  • The user's solution is not guaranteed to appear in our results.
  • There is a lot of "misses". For instance, it takes more or less 3,000,000 tries to find 1,000 potential solutions given our constraints.
  • It takes a lot of time: around 4 to 5 seconds on my lazy laptop.

How to get around these drawback? Well...

  • Limit the range to smaller values and
  • Find an adequate number of potential solutions so there is a good chance the user's solution appears in your solution set.
  • Use heuristics to find solutions more easily (more on that later.)

Note that the more you restrict the ranges, the less useful while be the Monte Carlo algorithm is, since there will be few enough valid solutions to iterate on them all in reasonable time. For constraints { 3, [1-10], [0-100] } there is around 741,000,000 valid solutions (not constrained to a target total value.) Monte Carlo is usable there. For { 3, [1-5], [0-10] }, there is only around 80,000. No need to use Monte Carlo; brute force for loops will do just fine.

I believe the problem 1 is what you would call a Constraint satisfaction problem (or CSP.)

Problem 2: Restrict the set of potential solutions

Given the fact that problem 1 is a CSP, I would go ahead and call problem 2, and the problem in general, a Dynamic CSP (or DCSP.)

[DCSPs] are useful when the original formulation of a
problem is altered in some way, typically because the set of
constraints to consider evolves because of the environment. DCSPs
are viewed as a sequence of static CSPs, each one a transformation of
the previous one in which variables and constraints can be added
(restriction) or removed (relaxation).

One technique used with CSPs that might be useful to this problem is called Constraint Recording:

  • With each change in the environment (user entered values for Di+1), find information about the new constraint: What are the possibly "used" quantities for the add-remove constraint.
  • Apply the constraint to every preceding day in cascade. Rippling effects might significantly reduce possible solutions.

For this to work, you need to get a new set of possible solutions every day; Use either brute force or Monte Carlo. Then, compare solutions of Di to Di-1 and keep only solutions that can succeed to previous days' solutions without violating constraints.

You will probably have to keep an history of what solutions lead to what other solutions (probably in a directed graph.) Constraint recording enables you to remember possible add-remove quantities and rejects solutions based on that.

There is a lot of other steps that could be taken to further improve your solution. Here are some ideas:

  • Record constraints for item-value combinations found in previous days solutions. Reject other solutions immediately (as item values must not change.) You could even find a smaller solution sets for each existing solution using solution-specific constraints to reject invalid solutions earlier.
  • Generate some "mutant", full-history, solutions each day in order to "repair" the case where the D1 solution set doesn't contain the user's solution. You could use a genetic algorithm to find a mutant population based on an existing solution set.)
  • Use heuristics in order find solutions easily (e.g. when a valid solution is found, try and find variations of this solution by substituting quantities around.)
  • Use behavioral heuristics in order to predict some user actions (e.g. same quantity for every item, extreme patterns, etc.)
  • Keep making some computations while the user is entering new quantities.

Given all of this, try and figure out a ranking system based on occurrence of solutions and heuristics to determine a candidate solution.

冷清清 2024-12-15 18:55:26

这个问题是不可能解决的。

假设您确切地知道项目数量增加的比率,而不仅仅是最大比率是多少。

一个用户有 N 个水果,你有 D 天的猜测时间。

每天你都会得到 N 个新变量,那么总共就有 D*N 个变量。

每天您只能生成两个方程。一个方程是 n_item*price 的总和,另一个方程基于已知的比率。总共最多有 2*D 方程(如果它们都是独立的)。

2*D< N*D 对于所有N> 2

This problem is impossible to solve.

Let's say that you know exactly for what ratio number of items was increased, not just what is the maximum ratio for this.

A user has N fruits and you have D days of guessing.

In each day you get N new variables and then you have in total D*N variables.

For each day you can generate only two equations. One equation is the sum of n_item*price and other is based on a known ratio. In total you have at most 2*D equations if they are all independent.

2*D < N*D for all N > 2

も让我眼熟你 2024-12-15 18:55:26

我写了一个程序来玩这个游戏。当然,我必须使人类方面自动化,但我相信我以这样的方式完成了这一切,当与真人比赛时,我不应该使我的方法无效。

我从机器学习的角度来处理这个问题,并将问题视为隐藏的马尔可夫模型,其中总价格是观察值。我的解决方案是使用粒子过滤器。该解决方案是使用 NumPy 和 SciPy 在 Python 2.7 中编写的。

我在注释中明确地或在代码中隐含地陈述了我所做的任何假设。为了让代码以自动化方式运行,我还设置了一些额外的约束。它并没有特别优化,因为我试图在可理解性而不是速度上犯错误。

每次迭代都会输出当前的真实数量和猜测值。我只是将输出通过管道传输到一个文件,以便我可以轻松查看它。一个有趣的扩展是将输出绘制在 2D(2 个水果)或 3D(3 个水果)图表上。然后您将能够看到粒子过滤器对解决方案的磨练。

更新:

编辑代码以包含调整后更新的参数。包括使用 matplotlib 的绘图调用(通过 pylab)。绘图可以在 Linux-Gnome 上运行,您的情况可能会有所不同。默认 NUM_FRUITS 为 2 以支持绘图。只需注释掉所有 pylab 调用即可删除绘图,并能够将 NUM_FRUITS 更改为任何内容。

很好地估计了由 UnknownQuantities X 价格 = TotalPrice 表示的当前 fxn。在 2D(2 个水果)中,这是一条线,在 3D(3 个水果)中,它是一个平面。对于粒子过滤器来说,数据似乎太少,无法可靠地磨练出正确的数量。在粒子过滤器之上需要更多的智能才能真正汇集历史信息。您可以尝试将粒子滤波器转换为二阶或三阶。

更新 2:

我经常修改我的代码。我尝试了很多事情,现在展示我将要制作的最终程序(开始对这个想法感到厌倦)。

更改:

粒子现在使用浮点数而不是整数。不确定这是否有任何有意义的影响,但这是一个更通用的解决方案。仅在进行猜测时才四舍五入为整数。

绘图将真实数量显示为绿色方块,将当前猜测显示为红色方块。目前相信的粒子显示为蓝点(大小取决于我们相信它们的程度)。这使得很容易看出算法的运行情况。 (绘图也在 Win 7 64 位上进行了测试和工作)。

添加了关闭/打开数量更改和价格更改的参数。当然,两者“关”并不有趣。

它做得非常好,但是,正如已经指出的那样,这是一个非常棘手的问题,因此很难得到确切的答案。关闭 CHANGE_QUANTITIES 会产生最简单的情况。通过在 CHANGE_QUANTITIES 关闭的情况下运行 2 个水果,您可以了解问题的难度。看看它磨练出正确答案的速度有多快,然后看看当你增加水果的数量时它有多难。

您还可以通过保持 CHANGE_QUANTITIES 开启,但将 MAX_QUANTITY_CHANGE 从非常小的值 (.001) 调整为“大”值 (.05) 来了解难度。

它陷入困境的一种情况是维度(一种水果的数量)接近于零。因为它使用粒子的平均值来猜测它总是会偏离零这样的硬边界。

总的来说,这是一个很棒的粒子过滤器教程。


from __future__ import division
import random
import numpy
import scipy.stats
import pylab

# Assume Guesser knows prices and total
# Guesser must determine the quantities

# All of pylab is just for graphing, comment out if undesired
#   Graphing only graphs first 2 FRUITS (first 2 dimensions)

NUM_FRUITS = 3
MAX_QUANTITY_CHANGE = .01 # Maximum percentage change that total quantity of fruit can change per iteration
MAX_QUANTITY = 100 # Bound for the sake of instantiating variables
MIN_QUANTITY_TOTAL = 10 # Prevent degenerate conditions where quantities all hit 0
MAX_FRUIT_PRICE = 1000 # Bound for the sake of instantiating variables
NUM_PARTICLES = 5000
NEW_PARTICLES = 500 # Num new particles to introduce each iteration after guessing
NUM_ITERATIONS = 20 # Max iterations to run
CHANGE_QUANTITIES = True
CHANGE_PRICES = True

'''
  Change individual fruit quantities for a random amount of time
  Never exceed changing fruit quantity by more than MAX_QUANTITY_CHANGE
'''
def updateQuantities(quantities):
  old_total = max(sum(quantities), MIN_QUANTITY_TOTAL)
  new_total = old_total
  max_change = int(old_total * MAX_QUANTITY_CHANGE)

  while random.random() > .005: # Stop Randomly    
    change_index = random.randint(0, len(quantities)-1)
    change_val = random.randint(-1*max_change,max_change)

    if quantities[change_index] + change_val >= 0: # Prevent negative quantities
      quantities[change_index] += change_val
      new_total += change_val

      if abs((new_total / old_total) - 1) > MAX_QUANTITY_CHANGE:
        quantities[change_index] -= change_val # Reverse the change

def totalPrice(prices, quantities):
  return sum(prices*quantities)

def sampleParticleSet(particles, fruit_prices, current_total, num_to_sample):
  # Assign weight to each particle using observation (observation is current_total)
  # Weight is the probability of that particle (guess) given the current observation
  # Determined by looking up the distance from the hyperplane (line, plane, hyperplane) in a
  #   probability density fxn for a normal distribution centered at 0 
  variance = 2
  distances_to_current_hyperplane = [abs(numpy.dot(particle, fruit_prices)-current_total)/numpy.linalg.norm(fruit_prices) for particle in particles]
  weights = numpy.array([scipy.stats.norm.pdf(distances_to_current_hyperplane[p], 0, variance) for p in range(0,NUM_PARTICLES)])

  weight_sum = sum(weights) # No need to normalize, as relative weights are fine, so just sample un-normalized

  # Create new particle set weighted by weights
  belief_particles = []
  belief_weights = []
  for p in range(0, num_to_sample):
    sample = random.uniform(0, weight_sum)
    # sum across weights until we exceed our sample, the weight we just summed is the index of the particle we'll use
    p_sum = 0
    p_i = -1
    while p_sum < sample:
      p_i += 1
      p_sum += weights[p_i]
    belief_particles.append(particles[p_i])
    belief_weights.append(weights[p_i])

  return belief_particles, numpy.array(belief_weights)

'''
  Generates new particles around the equation of the current prices and total (better particle generation than uniformly random)
'''
def generateNewParticles(current_total, fruit_prices, num_to_generate):
  new_particles = []
  max_values = [int(current_total/fruit_prices[n]) for n in range(0,NUM_FRUITS)]
  for p in range(0, num_to_generate):
    new_particle = numpy.array([random.uniform(1,max_values[n]) for n in range(0,NUM_FRUITS)])
    new_particle[-1] = (current_total - sum([new_particle[i]*fruit_prices[i] for i in range(0, NUM_FRUITS-1)])) / fruit_prices[-1]
    new_particles.append(new_particle)
  return new_particles


# Initialize our data structures:
# Represents users first round of quantity selection
fruit_prices = numpy.array([random.randint(1,MAX_FRUIT_PRICE) for n in range(0,NUM_FRUITS)])
fruit_quantities = numpy.array([random.randint(1,MAX_QUANTITY) for n in range(0,NUM_FRUITS)])
current_total = totalPrice(fruit_prices, fruit_quantities)
success = False

particles = generateNewParticles(current_total, fruit_prices, NUM_PARTICLES) #[numpy.array([random.randint(1,MAX_QUANTITY) for n in range(0,NUM_FRUITS)]) for p in range(0,NUM_PARTICLES)]
guess = numpy.average(particles, axis=0)
guess = numpy.array([int(round(guess[n])) for n in range(0,NUM_FRUITS)])

print "Truth:", str(fruit_quantities)
print "Guess:", str(guess)

pylab.ion()
pylab.draw()
pylab.scatter([p[0] for p in particles], [p[1] for p in particles])
pylab.scatter([fruit_quantities[0]], [fruit_quantities[1]], s=150, c='g', marker='s')
pylab.scatter([guess[0]], [guess[1]], s=150, c='r', marker='s')
pylab.xlim(0, MAX_QUANTITY)
pylab.ylim(0, MAX_QUANTITY)
pylab.draw()

if not (guess == fruit_quantities).all():
  for i in range(0,NUM_ITERATIONS):
    print "------------------------", i

    if CHANGE_PRICES:
      fruit_prices = numpy.array([random.randint(1,MAX_FRUIT_PRICE) for n in range(0,NUM_FRUITS)])

    if CHANGE_QUANTITIES:
      updateQuantities(fruit_quantities)
      map(updateQuantities, particles) # Particle Filter Prediction

    print "Truth:", str(fruit_quantities)
    current_total = totalPrice(fruit_prices, fruit_quantities)

    # Guesser's Turn - Particle Filter:
    # Prediction done above if CHANGE_QUANTITIES is True

    # Update
    belief_particles, belief_weights = sampleParticleSet(particles, fruit_prices, current_total, NUM_PARTICLES-NEW_PARTICLES)
    new_particles = generateNewParticles(current_total, fruit_prices, NEW_PARTICLES)

    # Make a guess:
    guess = numpy.average(belief_particles, axis=0, weights=belief_weights) # Could optimize here by removing outliers or try using median
    guess = numpy.array([int(round(guess[n])) for n in range(0,NUM_FRUITS)]) # convert to integers
    print "Guess:", str(guess)

    pylab.cla()
    #pylab.scatter([p[0] for p in new_particles], [p[1] for p in new_particles], c='y') # Plot new particles
    pylab.scatter([p[0] for p in belief_particles], [p[1] for p in belief_particles], s=belief_weights*50) # Plot current particles
    pylab.scatter([fruit_quantities[0]], [fruit_quantities[1]], s=150, c='g', marker='s') # Plot truth
    pylab.scatter([guess[0]], [guess[1]], s=150, c='r', marker='s') # Plot current guess
    pylab.xlim(0, MAX_QUANTITY)
    pylab.ylim(0, MAX_QUANTITY)
    pylab.draw()

    if (guess == fruit_quantities).all():
      success = True
      break

    # Attach new particles to existing particles for next run:
    belief_particles.extend(new_particles)
    particles = belief_particles
else:
  success = True

if success:
  print "Correct Quantities guessed"
else:
  print "Unable to get correct answer within", NUM_ITERATIONS, "iterations"

pylab.ioff()
pylab.show()

I wrote a program to play the game. Of course, I had to automate the human side, but I believe I did it all in such a way that I shouldn't invalidate my approach when played against a real human.

I approached this from a machine learning perspective and treated the problem as a hidden markov model where the total price was the observation. My solution is to use a particle filter. This solution is written in Python 2.7 using NumPy and SciPy.

I stated any assumptions I made either explicitly in the comments or implicitly in the code. I also set some additional constraints for the sake of getting code to run in an automated fashion. It's not particularly optimized as I tried to err on the side comprehensibility rather than speed.

Each iteration outputs the current true quantities and the guess. I just pipe the output to a file so I can review it easily. An interesting extension would be to plot the output on a graph either 2D (for 2 fruits) or 3D (for 3 fruits). Then you would be able to see the particle filter hone in on the solution.

Update:

Edited the code to include updated parameters after tweaking. Included plotting calls using matplotlib (via pylab). Plotting works on Linux-Gnome, your mileage may vary. Defaulted NUM_FRUITS to 2 for plotting support. Just comment out all the pylab calls to remove plotting and be able to change NUM_FRUITS to anything.

Does a good job estimating the current fxn represented by UnknownQuantities X Prices = TotalPrice. In 2D (2 Fruits) this is a line, in 3D (3 Fruits) it'd be a plane. Seems to be too little data for the particle filter to reliably hone in on the correct quantities. Need a little more smarts on top of the particle filter to really bring together the historical information. You could try converting the particle filter to 2nd- or 3rd-order.

Update 2:

I've been playing around with my code, a lot. I tried a bunch of things and now present the final program that I'll be making (starting to burn out on this idea).

Changes:

The particles now use floating points rather than integers. Not sure if this had any meaningful effect, but it is a more general solution. Rounding to integers is done only when making a guess.

Plotting shows true quantities as green square and current guess as red square. Currently believed particles shown as blue dots (sized by how much we believe them). This makes it really easy to see how well the algorithm is working. (Plotting also tested and working on Win 7 64-bit).

Added parameters for turning off/on quantity changing and price changing. Of course, both 'off' is not interesting.

It does a pretty dang good job, but, as has been noted, it's a really tough problem, so getting the exact answer is hard. Turning off CHANGE_QUANTITIES produces the simplest case. You can get an appreciation for the difficulty of the problem by running with 2 fruits with CHANGE_QUANTITIES off. See how quickly it hones in on the correct answer then see how harder it is as you increase the number of fruit.

You can also get a perspective on the difficulty by keeping CHANGE_QUANTITIES on, but adjusting the MAX_QUANTITY_CHANGE from very small values (.001) to "large" values (.05).

One situation where it struggles is if on dimension (one fruit quantity) gets close to zero. Because it's using an average of particles to guess it will always skew away from a hard boundary like zero.

In general this makes a great particle filter tutorial.


from __future__ import division
import random
import numpy
import scipy.stats
import pylab

# Assume Guesser knows prices and total
# Guesser must determine the quantities

# All of pylab is just for graphing, comment out if undesired
#   Graphing only graphs first 2 FRUITS (first 2 dimensions)

NUM_FRUITS = 3
MAX_QUANTITY_CHANGE = .01 # Maximum percentage change that total quantity of fruit can change per iteration
MAX_QUANTITY = 100 # Bound for the sake of instantiating variables
MIN_QUANTITY_TOTAL = 10 # Prevent degenerate conditions where quantities all hit 0
MAX_FRUIT_PRICE = 1000 # Bound for the sake of instantiating variables
NUM_PARTICLES = 5000
NEW_PARTICLES = 500 # Num new particles to introduce each iteration after guessing
NUM_ITERATIONS = 20 # Max iterations to run
CHANGE_QUANTITIES = True
CHANGE_PRICES = True

'''
  Change individual fruit quantities for a random amount of time
  Never exceed changing fruit quantity by more than MAX_QUANTITY_CHANGE
'''
def updateQuantities(quantities):
  old_total = max(sum(quantities), MIN_QUANTITY_TOTAL)
  new_total = old_total
  max_change = int(old_total * MAX_QUANTITY_CHANGE)

  while random.random() > .005: # Stop Randomly    
    change_index = random.randint(0, len(quantities)-1)
    change_val = random.randint(-1*max_change,max_change)

    if quantities[change_index] + change_val >= 0: # Prevent negative quantities
      quantities[change_index] += change_val
      new_total += change_val

      if abs((new_total / old_total) - 1) > MAX_QUANTITY_CHANGE:
        quantities[change_index] -= change_val # Reverse the change

def totalPrice(prices, quantities):
  return sum(prices*quantities)

def sampleParticleSet(particles, fruit_prices, current_total, num_to_sample):
  # Assign weight to each particle using observation (observation is current_total)
  # Weight is the probability of that particle (guess) given the current observation
  # Determined by looking up the distance from the hyperplane (line, plane, hyperplane) in a
  #   probability density fxn for a normal distribution centered at 0 
  variance = 2
  distances_to_current_hyperplane = [abs(numpy.dot(particle, fruit_prices)-current_total)/numpy.linalg.norm(fruit_prices) for particle in particles]
  weights = numpy.array([scipy.stats.norm.pdf(distances_to_current_hyperplane[p], 0, variance) for p in range(0,NUM_PARTICLES)])

  weight_sum = sum(weights) # No need to normalize, as relative weights are fine, so just sample un-normalized

  # Create new particle set weighted by weights
  belief_particles = []
  belief_weights = []
  for p in range(0, num_to_sample):
    sample = random.uniform(0, weight_sum)
    # sum across weights until we exceed our sample, the weight we just summed is the index of the particle we'll use
    p_sum = 0
    p_i = -1
    while p_sum < sample:
      p_i += 1
      p_sum += weights[p_i]
    belief_particles.append(particles[p_i])
    belief_weights.append(weights[p_i])

  return belief_particles, numpy.array(belief_weights)

'''
  Generates new particles around the equation of the current prices and total (better particle generation than uniformly random)
'''
def generateNewParticles(current_total, fruit_prices, num_to_generate):
  new_particles = []
  max_values = [int(current_total/fruit_prices[n]) for n in range(0,NUM_FRUITS)]
  for p in range(0, num_to_generate):
    new_particle = numpy.array([random.uniform(1,max_values[n]) for n in range(0,NUM_FRUITS)])
    new_particle[-1] = (current_total - sum([new_particle[i]*fruit_prices[i] for i in range(0, NUM_FRUITS-1)])) / fruit_prices[-1]
    new_particles.append(new_particle)
  return new_particles


# Initialize our data structures:
# Represents users first round of quantity selection
fruit_prices = numpy.array([random.randint(1,MAX_FRUIT_PRICE) for n in range(0,NUM_FRUITS)])
fruit_quantities = numpy.array([random.randint(1,MAX_QUANTITY) for n in range(0,NUM_FRUITS)])
current_total = totalPrice(fruit_prices, fruit_quantities)
success = False

particles = generateNewParticles(current_total, fruit_prices, NUM_PARTICLES) #[numpy.array([random.randint(1,MAX_QUANTITY) for n in range(0,NUM_FRUITS)]) for p in range(0,NUM_PARTICLES)]
guess = numpy.average(particles, axis=0)
guess = numpy.array([int(round(guess[n])) for n in range(0,NUM_FRUITS)])

print "Truth:", str(fruit_quantities)
print "Guess:", str(guess)

pylab.ion()
pylab.draw()
pylab.scatter([p[0] for p in particles], [p[1] for p in particles])
pylab.scatter([fruit_quantities[0]], [fruit_quantities[1]], s=150, c='g', marker='s')
pylab.scatter([guess[0]], [guess[1]], s=150, c='r', marker='s')
pylab.xlim(0, MAX_QUANTITY)
pylab.ylim(0, MAX_QUANTITY)
pylab.draw()

if not (guess == fruit_quantities).all():
  for i in range(0,NUM_ITERATIONS):
    print "------------------------", i

    if CHANGE_PRICES:
      fruit_prices = numpy.array([random.randint(1,MAX_FRUIT_PRICE) for n in range(0,NUM_FRUITS)])

    if CHANGE_QUANTITIES:
      updateQuantities(fruit_quantities)
      map(updateQuantities, particles) # Particle Filter Prediction

    print "Truth:", str(fruit_quantities)
    current_total = totalPrice(fruit_prices, fruit_quantities)

    # Guesser's Turn - Particle Filter:
    # Prediction done above if CHANGE_QUANTITIES is True

    # Update
    belief_particles, belief_weights = sampleParticleSet(particles, fruit_prices, current_total, NUM_PARTICLES-NEW_PARTICLES)
    new_particles = generateNewParticles(current_total, fruit_prices, NEW_PARTICLES)

    # Make a guess:
    guess = numpy.average(belief_particles, axis=0, weights=belief_weights) # Could optimize here by removing outliers or try using median
    guess = numpy.array([int(round(guess[n])) for n in range(0,NUM_FRUITS)]) # convert to integers
    print "Guess:", str(guess)

    pylab.cla()
    #pylab.scatter([p[0] for p in new_particles], [p[1] for p in new_particles], c='y') # Plot new particles
    pylab.scatter([p[0] for p in belief_particles], [p[1] for p in belief_particles], s=belief_weights*50) # Plot current particles
    pylab.scatter([fruit_quantities[0]], [fruit_quantities[1]], s=150, c='g', marker='s') # Plot truth
    pylab.scatter([guess[0]], [guess[1]], s=150, c='r', marker='s') # Plot current guess
    pylab.xlim(0, MAX_QUANTITY)
    pylab.ylim(0, MAX_QUANTITY)
    pylab.draw()

    if (guess == fruit_quantities).all():
      success = True
      break

    # Attach new particles to existing particles for next run:
    belief_particles.extend(new_particles)
    particles = belief_particles
else:
  success = True

if success:
  print "Correct Quantities guessed"
else:
  print "Unable to get correct answer within", NUM_ITERATIONS, "iterations"

pylab.ioff()
pylab.show()
烟燃烟灭 2024-12-15 18:55:26

对于你的初始规则:

从我的学生时代开始,我会说,如果我们对 5% 的变化进行抽象,我们每天都会有一个包含三个未知值的方程(抱歉,我不懂数学)英语词汇),与前一天的值相同。
在第 3 天,您有三个方程、三个未知值,并且解应该是直接的。

我想如果这三个元素的值足够不同,那么每天 5% 的变化可能会被忘记,因为正如您所说,我们将使用近似值并对数字进行四舍五入。

对于您的适应规则:

在这种情况下,有太多的未知数和变化的值,因此我知道没有直接的解决方案。在这一点上我相信利奥尔;他的做法看起来不错! (如果您的价格和数量范围有限。)

For your initial rules:

From my school years, I would say that if we make an abstraction of the 5% changes, we have everyday an equation with three unknown values (sorry I don't know the maths vocabulary in English), which are the same values as previous day.
At day 3, you have three equations, three unknown values, and the solution should be direct.

I guess the 5% change each day may be forgotten if the values of the three elements are different enough, because, as you said, we will use approximations and round the numbers.

For your adapted rules:

Too many unknowns - and changing - values in this case, so there is no direct solution I know of. I would trust Lior on this; his approach looks fine! (If you have a limited range for prices and quantities.)

流年已逝 2024-12-15 18:55:26

我意识到我的答案变得相当冗长,所以我将代码移到了顶部(这可能是大多数人感兴趣的)。下面有两件事:

  1. 解释为什么(深度)神经网络不是解决这个问题的好方法,以及
  2. 解释为什么我们不能根据给定的信息唯一地确定人类的选择。

对于对任一主题感兴趣的人,请参阅下文。对于其他人,这里是代码。


找到所有可能解决方案的代码

正如我在答案中进一步解释的那样,您的问题尚未确定。一般情况下,有多种可能的解决方案,并且随着天数的增加,这个数字至少呈指数增长。对于原始问题和扩展问题都是如此。尽管如此,我们可以(某种程度上)有效地找到所有解决方案(这是 NP 难的,所以不要期望太多)。

回溯(来自 20 世纪 60 年代,所以不完全是现代的)是这里选择的算法。在Python中,我们可以将其编写为递归生成器,这实际上非常优雅:

def backtrack(pos, daily_total, daily_item_value, allowed_change, iterator_bounds, history=None):
    if pos == len(daily_total):
        yield np.array(history)
        return

    it = [range(start, stop, step) for start, stop, step in iterator_bounds[pos][:-1]]
    for partial_basket in product(*it):
        if history is None:
            history = [partial_basket]
        else:
            history.append(partial_basket)

        # ensure we only check items that match the total basket value
        # for that day
        partial_value = np.sum(np.array(partial_basket) * daily_item_value[pos, :-1])
        if (daily_total[pos] - partial_value) % daily_item_value[pos, -1] != 0:
            history.pop()
            continue

        last_item = (daily_total[pos] - partial_value) // daily_item_value[pos, -1]
        if last_item < 0:
            history.pop()
            continue

        basket = np.array([*partial_basket] + [int(last_item)])
        basket_value = np.sum(basket * daily_item_value[pos])
        history[-1] = basket
        if len(history) > 1:
            # ensure that today's basket stays within yesterday's range
            previous_basket = history[-2]
            previous_basket_count = np.sum(previous_basket)
            current_basket_count = np.sum(basket)
            if (np.abs(current_basket_count - previous_basket_count) > allowed_change * previous_basket_count):
                history.pop()
                continue

        yield from backtrack(pos + 1, daily_total, daily_item_value, allowed_change, iterator_bounds, history)
        history.pop()

这种方法本质上将所有可能的候选者构造成一棵大树,然后在违反约束时通过剪枝执行深度优先搜索。每当遇到叶节点时,我们都会产生结果。

树搜索(通常)可以并行化,但这超出了本文的范围。如果没有太多额外的洞察力,它将使解决方案的可读性降低。这同样适用于减少代码的常量开销,例如,将约束 if ...: continue 放入 iterator_bounds 变量中并进行更少的检查。

我将完整的代码示例(包括游戏人性方面的模拟器)放在这个答案的底部。


解决这个问题的现代机器学习

这个问题已有 9 年历史,但仍然是我非常感兴趣的问题。从那时起,机器学习(RNN、CNN、GANS 等)、新方法和廉价 GPU 的兴起,使新方法成为可能。我认为重新审视这个问题看看是否有新的方法会很有趣。

我真的很喜欢你对深度神经网络世界的热情;不幸的是,由于以下几个原因,它们在这里不适用:(

  1. 精确性)如果您需要精确解决方案,例如您的游戏,神经网络无法提供。
  2. 整数约束)目前占主导地位的神经网络训练方法是基于梯度下降的,因此问题必须是可微的,或者您需要能够以可微的方式重新表述它;将自己限制在整数上会将 GD 方法扼杀在摇篮中。您可以尝试进化算法来搜索参数化。这确实存在,但这些方法目前还不太成熟。
  3. 非凸性)在典型的公式中,训练神经网络是一种局部方法,这意味着如果您的算法收敛,您将恰好找到 1 个(局部最优)解决方案。在一般情况下,您的游戏对于原始版本和扩展版本都有许多可能的解决方案。这不仅意味着——平均而言——你无法弄清楚人类的选择(篮子),而且还意味着你无法控制神经网络会找到众多解决方案中的哪一个。当前的 NN 成功案例也遭受着同样的命运,但他们往往并不真正关心,因为他们只想要一些解决方案而不是特定的解决方案。一些还可以的解决方案胜过完全没有解决方案。
  4. 专家领域知识)对于这款游戏,您拥有大量可以用来改进优化/学习的领域知识。充分利用神经网络中的任意领域知识并非易事,对于这个游戏来说,构建自定义机器学习模型(而不是神经网络)会更容易、更高效。

为什么游戏无法唯一解决 - 第 1 部分

让我们首先考虑替代问题并提升整数要求,即篮子(人类为给定的 N 个水果选择)天)可以有分数水果(0.3 个橙子)。

总价值约束np.dot(basket, daily_price) ==total_value限制了篮子的可能解决方案;它把问题减少了一维。自由选择N-1个水果的数量,你总能找到第N个水果的值来满足约束。所以虽然看起来一天有 N 个选择,但实际上我们可以自由做出的只有 N-1 个,最后一个就完全确定了根据我们之前的选择。因此,对于游戏进行的每一天,我们都需要估计额外的 N-1 个选择/变量。

我们可能想要强制所有选择都大于 0,但这只会减少我们可以选择数字的间隔;任何实数的开区间都包含无限多个数字,因此我们永远不会因此而耗尽选择。还有 N-1 个选择要做。

两天之间,篮子总成交量 np.sum(basket) 最多只变化前一天的 some_percent,即 np.abs(np.sum (previous_basket) - np.sum(basket)) <= some_percent * np.sum(previous_basket)。我们在某一天可以做出的一些选择将使购物篮的变化超过前一天的 some_percent。为了确保我们永远不会违反这一点,我们可以自由地做出 N-2 个选择,然后必须选择第 N-1 个变量,以便添加它并添加 < code>N - 变量(根据我们之前的选择固定)保持在 some_percent 范围内。 (注意:这是一个不等式约束,因此,如果我们具有相等性,则只会减少选择的数量,即,篮子恰好改变some_percent。在优化理论中,这称为主动约束.)

我们可以再次考虑所有选择都应该大于 0 的约束,但争论仍然是这只是改变了我们现在可以自由选择 N-2 变量的区间。

因此,在 D 天之后,我们只剩下 N-1 个选择来从第一天开始进行估算(无变化约束)和 (D-1)*(N- 2) 选择对接下来的每一天进行估计。不幸的是,我们超出了进一步减少这个数字的限制,并且未知数的数量每天至少增长 N-2。这本质上就是 Luka Rahne 的“2*D < N*D for all N > 2”的含义。我们可能会找到许多可能性相同的候选者。

每天确切的食品价格对此并不重要。只要它们具有一定的价值,它们就会限制其中一种选择。因此,如果您按照指定的方式扩展游戏,那么总是有机会产生无限多个解决方案;无论天数。


为什么游戏仍然无法唯一解决 - 第 2 部分

我们没有考虑一个可能有助于解决此问题的约束:仅允许选择整数解。整数约束的问题在于处理起来非常复杂。然而,我们这里主要关心的是添加这个约束是否能让我们在足够的时间内唯一地解决问题。对于这一点,有一个相当直观的反例。假设您有连续 3 天,并且对于第 1 天和第 3 天,总值限制只允许购买一个篮子。换句话说,我们知道第一天和第三天的篮子,但不知道第二天的篮子。在这里,我们只知道它的总价值,它在some_percent之内第 1 天和第 3 天在第 2 天的 some_percent 之内。这些信息是否足以计算出第 2 天购物篮中的物品?

some_percent = 0.05
Day 1: basket: [3 2]  prices: [10 7]  total_value: 44
Day 2: basket: [x y]  prices: [5  5]  total_value: 25
Day 3: basket: [2 3]  prices: [9  5]  total_value: 33

Possible Solutions Day 2: [2 3], [3 2]

上面是一个例子,由于总价值的限制,我们知道两天的价值,但这仍然不允许我们计算出第 2 天篮子的确切组成。因此,虽然在某些情况下可以解决这个问题,但一般来说这是不可能的。在第 3 天之后添加更多天数根本无助于计算第 2 天。它可能有助于缩小第 3 天的选项范围(这将缩小第 2 天的选项范围),但我们已经只剩下第 3 天的 1 个选择,所以它没有用。


完整代码

import numpy as np
from itertools import product
import tqdm


def sample_uniform(n, r):
    # check out: http://compneuro.uwaterloo.ca/files/publications/voelker.2017.pdf
    sample = np.random.rand(n + 2)
    sample_norm = np.linalg.norm(sample)
    unit_sample = (sample / sample_norm)
    change = np.floor(r * unit_sample[:-2]).astype(np.int)
    return change


def human(num_fruits, allowed_change=0.05, current_distribution=None):
    allowed_change = 0.05
    if current_distribution is None:
        current_distribution = np.random.randint(1, 50, size=num_fruits)
    yield current_distribution.copy()

    # rejection sample a suitable change
    while True:
        current_total = np.sum(current_distribution)
        maximum_change = np.floor(allowed_change * current_total)

        change = sample_uniform(num_fruits, maximum_change)
        while np.sum(change) > maximum_change:
            change = sample_uniform(num_fruits, maximum_change)

        current_distribution += change
        yield current_distribution.copy()


def prices(num_fruits, alter_prices=False):
    current_prices = np.random.randint(1, 10, size=num_fruits)
    while True:
        yield current_prices.copy()
        if alter_prices:
            current_prices = np.random.randint(1, 10, size=num_fruits)


def play_game(num_days, num_fruits=3, alter_prices=False):
    human_choice = human(num_fruits)
    price_development = prices(num_fruits, alter_prices=alter_prices)

    history = {
        "basket": list(),
        "prices": list(),
        "total": list()
    }
    for day in range(num_days):
        choice = next(human_choice)
        price = next(price_development)
        total_price = np.sum(choice * price)

        history["basket"].append(choice)
        history["prices"].append(price)
        history["total"].append(total_price)

    return history


def backtrack(pos, daily_total, daily_item_value, allowed_change, iterator_bounds, history=None):
    if pos == len(daily_total):
        yield np.array(history)
        return

    it = [range(start, stop, step) for start, stop, step in iterator_bounds[pos][:-1]]
    for partial_basket in product(*it):
        if history is None:
            history = [partial_basket]
        else:
            history.append(partial_basket)

        # ensure we only check items that match the total basket value
        # for that day
        partial_value = np.sum(np.array(partial_basket) * daily_item_value[pos, :-1])
        if (daily_total[pos] - partial_value) % daily_item_value[pos, -1] != 0:
            history.pop()
            continue

        last_item = (daily_total[pos] - partial_value) // daily_item_value[pos, -1]
        if last_item < 0:
            history.pop()
            continue

        basket = np.array([*partial_basket] + [int(last_item)])
        basket_value = np.sum(basket * daily_item_value[pos])
        history[-1] = basket
        if len(history) > 1:
            # ensure that today's basket stays within relative tolerance
            previous_basket = history[-2]
            previous_basket_count = np.sum(previous_basket)
            current_basket_count = np.sum(basket)
            if (np.abs(current_basket_count - previous_basket_count) > allowed_change * previous_basket_count):
                history.pop()
                continue

        yield from backtrack(pos + 1, daily_total, daily_item_value, allowed_change, iterator_bounds, history)
        history.pop()


if __name__ == "__main__":
    np.random.seed(1337)
    num_fruits = 3
    allowed_change = 0.05
    alter_prices = False
    history = play_game(15, num_fruits=num_fruits, alter_prices=alter_prices)

    total_price = np.stack(history["total"]).astype(np.int)
    daily_price = np.stack(history["prices"]).astype(np.int)
    basket = np.stack(history["basket"]).astype(np.int)

    maximum_fruits = np.floor(total_price[:, np.newaxis] / daily_price).astype(np.int)
    iterator_bounds = [[[0, maximum_fruits[pos, fruit], 1] for fruit in range(num_fruits)] for pos in range(len(basket))]
    # iterator_bounds = np.array(iterator_bounds)
    # import pdb; pdb.set_trace()

    pbar = tqdm.tqdm(backtrack(0, total_price,
                               daily_price, allowed_change, iterator_bounds), desc="Found Solutions")
    for solution in pbar:
        # test price guess
        calculated_price = np.sum(np.stack(solution) * daily_price, axis=1)
        assert np.all(calculated_price == total_price)

        # test basket change constraint
        change = np.sum(np.diff(solution, axis=0), axis=1)
        max_change = np.sum(solution[:-1, ...], axis=1) * allowed_change
        assert np.all(change <= max_change)

        # indicate that we found the original solution
        if not np.any(solution - basket):
            pbar.set_description("Found Solutions (includes original)")

I realized that my answer was getting quite lengthy, so I moved the code to the top (which is probably what most people are interested in). Below it there are two things:

  1. an explanation why (deep) neural networks are not a good approach to this problem, and
  2. an explanation why we can't uniquely determine the human's choices with the given information.

For those of you interested in either topic, please see below. For the rest of you, here is the code.


Code that finds all possible solutions

As I explain further down in the answer, your problem is under-determined. In the average case, there are many possible solutions, and this number grows at least exponentially as the number of days increases. This is true for both, the original and the extended problem. Nevertheless, we can (sort of) efficiently find all solutions (it's NP hard, so don't expect too much).

Backtracking (from the 1960s, so not exactly modern) is the algorithm of choice here. In python, we can write it as a recursive generator, which is actually quite elegant:

def backtrack(pos, daily_total, daily_item_value, allowed_change, iterator_bounds, history=None):
    if pos == len(daily_total):
        yield np.array(history)
        return

    it = [range(start, stop, step) for start, stop, step in iterator_bounds[pos][:-1]]
    for partial_basket in product(*it):
        if history is None:
            history = [partial_basket]
        else:
            history.append(partial_basket)

        # ensure we only check items that match the total basket value
        # for that day
        partial_value = np.sum(np.array(partial_basket) * daily_item_value[pos, :-1])
        if (daily_total[pos] - partial_value) % daily_item_value[pos, -1] != 0:
            history.pop()
            continue

        last_item = (daily_total[pos] - partial_value) // daily_item_value[pos, -1]
        if last_item < 0:
            history.pop()
            continue

        basket = np.array([*partial_basket] + [int(last_item)])
        basket_value = np.sum(basket * daily_item_value[pos])
        history[-1] = basket
        if len(history) > 1:
            # ensure that today's basket stays within yesterday's range
            previous_basket = history[-2]
            previous_basket_count = np.sum(previous_basket)
            current_basket_count = np.sum(basket)
            if (np.abs(current_basket_count - previous_basket_count) > allowed_change * previous_basket_count):
                history.pop()
                continue

        yield from backtrack(pos + 1, daily_total, daily_item_value, allowed_change, iterator_bounds, history)
        history.pop()

This approach essentially structures all possible candidates into a large tree and then performs depth first search with pruning whenever a constraint is violated. Whenever a leaf node is encountered, we yield the result.

Tree search (in general) can be parallelized, but that is out of scope here. It will make the solution less readable without much additional insight. The same goes for reducing constant overhead of the code, e.g., working the constraints if ...: continue into the iterator_bounds variable and do less checks.

I put the full code example (including a simulator for the human side of the game) at the bottom of this answer.


Modern Machine Learning for this problem

Question is 9 years old but still one that I'm deeply interested in. In the time since, machine learning(RNN's, CNN's, GANS,etc), new approaches and cheap GPU's have risen that enable new approaches. I thought it would be fun to revisit this question to see if there are new approaches.

I really like your enthusiasm for the world of deep neural networks; unfortunately they simply do not apply here for a few reasons:

  1. (Exactness) If you need an exact solution, like for your game, NNs can't provide that.
  2. (Integer Constraint) The currently dominant NN training methods are gradient descent based, so the problem has to be differentiable or you need to be able to reformulate it in such a way that it becomes differentiable; constraining yourself to integers kills GD methods in the cradle. You could try evolutionary algorithms to search for a parameterization. This does exist, but those methods are currently a lot less established.
  3. (Non-Convexity) In the typical formulation, training a NN is a local method, which means you will find exactly 1 (locally optimal) solution if your algorithm is converging. In the average case, your game has many possible solutions for both the original and extended version. This not only means that - on average - you can't figure out the human's choice (basket), but also that you have no control over which of the many solutions the NN will find. Current NN success stories suffer the same fate, but they tend to don't really care, because they only want some solution instead of a specific one. Some okay-ish solution beats the hell out of no solution at all.
  4. (Expert Domain Knowledge) For this game, you have a lot of domain knowledge that can be exploited to improve the optimization/learning. Taking full advantage of arbitrary domain knowledge in NNs is not trivial and for this game building a custom ML model (not a neural network) would be easier and more efficient.

Why the game can not be uniquely solved - Part 1

Let's consider a substitute problem first and lift the integer requirement, i.e., the basket (human choice of N fruits for a given day) can have fractional fruits (0.3 oranges).

The total value constraint np.dot(basket, daily_price) == total_value limits the possible solutions for the basket; it reduces the problem by one dimension. Freely pick amounts for N-1 fruits, and you can always find a value for the N-th fruit to satisfy the constraint. So while it seems that there are N choices to make for a day, there are actually only N-1 that we can make freely, and the last one will be fully determined by our previous choices. So for each day the game goes on, we need to estimate an additional N-1 choices/variables.

We might want to enforce that all the choices are greater than 0, but that only reduces the interval from which we can choose a number; any open interval of real numbers has infinitely many numbers in it, so we will never run out of options because of this. Still N-1 choices to make.

Between two days, the total basket volume np.sum(basket) only changes by at most some_percent of the previous day, i.e. np.abs(np.sum(previous_basket) - np.sum(basket)) <= some_percent * np.sum(previous_basket). Some of the choices we could make at a given day will change the basket by more than some_percent of the previous day. To make sure we never violate this, we can freely make N-2 choices and then have to pick the N-1-th variable so that adding it and adding the N-the variable (which is fixed from our previous choices) stays within some_percent. (Note: This is an inequality constraint, so it will only reduce the number of choices if we have equality, i.e., the basket changes by exactly some_percent. In optimization theory this is known as the constraint being active.)

We can again think about the constraint that all choices should be greater 0, but the argument remains that this simply changes the interval from which we can now freely choose N-2 variables.

So after D days we are left with N-1 choices to estimate from the first day (no change constraint) and (D-1)*(N-2) choices to estimate for each following day. Unfortunately, we ran out of constraints to further reduce this number and the number of unknowns grows by at least N-2 each day. This is essentially what what Luka Rahne meant with "2*D < N*D for all N > 2". We will likely find many candidates which are all equally probable.

The exact food prices each day don't matter for this. As long as they are of some value, they will constrain one of the choices. Hence, if you extend your game in the way you specify, there is always a chance for infinitely many solutions; regardless of the number of days.


Why the game can still not be uniquely solved - Part 2

There is one constraint we didn't look at which might help fix this: only allow integer solutions for choices. The problem with integer constraints is that they are very complex to deal with. However, our main concern here is if adding this constraint will allow us to uniquely solve the problem given enough days. For this, there is a rather intuitive counter-example. Suppose you have 3 consecutive days, and for the 1st and 3d day, the total value constraint only allows one basket. In other words, we know the basket for day 1 and day 3, but not for day 2. Here, we only know it's total value, that it is within some_percent of day 1 and that day 3 is within some_percent of day 2. Is this enough information to always work out what is in the basket on day 2?

some_percent = 0.05
Day 1: basket: [3 2]  prices: [10 7]  total_value: 44
Day 2: basket: [x y]  prices: [5  5]  total_value: 25
Day 3: basket: [2 3]  prices: [9  5]  total_value: 33

Possible Solutions Day 2: [2 3], [3 2]

Above is one example, where we know the values for two days thanks to the total value constraint, but that still won't allow us to work out the exact composition of the basket at day 2. Thus, while it may be possible to work it out in some cases, it is not possible in general. Adding more days after day 3 doesn't help figuring out day 2 at all. It might help in narrowing the options for day 3 (which will then narrow the options for day 2), but we already have just 1 choice left for day 3, so it's no use.


Full Code

import numpy as np
from itertools import product
import tqdm


def sample_uniform(n, r):
    # check out: http://compneuro.uwaterloo.ca/files/publications/voelker.2017.pdf
    sample = np.random.rand(n + 2)
    sample_norm = np.linalg.norm(sample)
    unit_sample = (sample / sample_norm)
    change = np.floor(r * unit_sample[:-2]).astype(np.int)
    return change


def human(num_fruits, allowed_change=0.05, current_distribution=None):
    allowed_change = 0.05
    if current_distribution is None:
        current_distribution = np.random.randint(1, 50, size=num_fruits)
    yield current_distribution.copy()

    # rejection sample a suitable change
    while True:
        current_total = np.sum(current_distribution)
        maximum_change = np.floor(allowed_change * current_total)

        change = sample_uniform(num_fruits, maximum_change)
        while np.sum(change) > maximum_change:
            change = sample_uniform(num_fruits, maximum_change)

        current_distribution += change
        yield current_distribution.copy()


def prices(num_fruits, alter_prices=False):
    current_prices = np.random.randint(1, 10, size=num_fruits)
    while True:
        yield current_prices.copy()
        if alter_prices:
            current_prices = np.random.randint(1, 10, size=num_fruits)


def play_game(num_days, num_fruits=3, alter_prices=False):
    human_choice = human(num_fruits)
    price_development = prices(num_fruits, alter_prices=alter_prices)

    history = {
        "basket": list(),
        "prices": list(),
        "total": list()
    }
    for day in range(num_days):
        choice = next(human_choice)
        price = next(price_development)
        total_price = np.sum(choice * price)

        history["basket"].append(choice)
        history["prices"].append(price)
        history["total"].append(total_price)

    return history


def backtrack(pos, daily_total, daily_item_value, allowed_change, iterator_bounds, history=None):
    if pos == len(daily_total):
        yield np.array(history)
        return

    it = [range(start, stop, step) for start, stop, step in iterator_bounds[pos][:-1]]
    for partial_basket in product(*it):
        if history is None:
            history = [partial_basket]
        else:
            history.append(partial_basket)

        # ensure we only check items that match the total basket value
        # for that day
        partial_value = np.sum(np.array(partial_basket) * daily_item_value[pos, :-1])
        if (daily_total[pos] - partial_value) % daily_item_value[pos, -1] != 0:
            history.pop()
            continue

        last_item = (daily_total[pos] - partial_value) // daily_item_value[pos, -1]
        if last_item < 0:
            history.pop()
            continue

        basket = np.array([*partial_basket] + [int(last_item)])
        basket_value = np.sum(basket * daily_item_value[pos])
        history[-1] = basket
        if len(history) > 1:
            # ensure that today's basket stays within relative tolerance
            previous_basket = history[-2]
            previous_basket_count = np.sum(previous_basket)
            current_basket_count = np.sum(basket)
            if (np.abs(current_basket_count - previous_basket_count) > allowed_change * previous_basket_count):
                history.pop()
                continue

        yield from backtrack(pos + 1, daily_total, daily_item_value, allowed_change, iterator_bounds, history)
        history.pop()


if __name__ == "__main__":
    np.random.seed(1337)
    num_fruits = 3
    allowed_change = 0.05
    alter_prices = False
    history = play_game(15, num_fruits=num_fruits, alter_prices=alter_prices)

    total_price = np.stack(history["total"]).astype(np.int)
    daily_price = np.stack(history["prices"]).astype(np.int)
    basket = np.stack(history["basket"]).astype(np.int)

    maximum_fruits = np.floor(total_price[:, np.newaxis] / daily_price).astype(np.int)
    iterator_bounds = [[[0, maximum_fruits[pos, fruit], 1] for fruit in range(num_fruits)] for pos in range(len(basket))]
    # iterator_bounds = np.array(iterator_bounds)
    # import pdb; pdb.set_trace()

    pbar = tqdm.tqdm(backtrack(0, total_price,
                               daily_price, allowed_change, iterator_bounds), desc="Found Solutions")
    for solution in pbar:
        # test price guess
        calculated_price = np.sum(np.stack(solution) * daily_price, axis=1)
        assert np.all(calculated_price == total_price)

        # test basket change constraint
        change = np.sum(np.diff(solution, axis=0), axis=1)
        max_change = np.sum(solution[:-1, ...], axis=1) * allowed_change
        assert np.all(change <= max_change)

        # indicate that we found the original solution
        if not np.any(solution - basket):
            pbar.set_description("Found Solutions (includes original)")

倒带 2024-12-15 18:55:26

当玩家选择的组合将可能性的数量减少到 1 时,计算机将获胜。否则,玩家可以选择一个组合,并将总数限制在一定百分比内,该计算机可能永远不会获胜。

import itertools
import numpy as np


def gen_possible_combination(total, prices):
    """
    Generates all possible combinations of numbers of items for
    given prices constraint by total
    """
    nitems = [range(total//p + 1) for p in prices]
    prices_arr = np.array(prices)
    combo = [x for x in itertools.product(
        *nitems) if np.dot(np.array(x), prices_arr) == total]

    return combo


def reduce(combo1, combo2, pct):
    """
    Filters impossible transitions which are greater than pct
    """
    combo = {}
    for x in combo1:
        for y in combo2:
            if abs(sum(x) - sum(y))/sum(x) <= pct:
                combo[y] = 1

    return list(combo.keys())


def gen_items(n, total):
    """
    Generates a list of items
    """
    nums = [0] * n
    t = 0
    i = 0
    while t < total:
        if i < n - 1:
            n1 = np.random.randint(0, total-t)
            nums[i] = n1
            t += n1
            i += 1
        else:
            nums[i] = total - t
            t = total

    return nums


def main():
    pct = 0.05
    i = 0
    done = False
    n = 3
    total_items = 26  # np.random.randint(26)
    combo = None
    while not done:
        prices = [np.random.randint(1, 10) for _ in range(n)]
        items = gen_items(n, total_items)

        total = np.dot(np.array(prices),  np.array(items))
        combo1 = gen_possible_combination(total, prices)

        if combo:
            combo = reduce(combo, combo1, pct)
        else:
            combo = combo1
        i += 1
        print(i, 'Items:', items, 'Prices:', prices, 'Total:',
              total, 'No. Possibilities:', len(combo))

        if len(combo) == 1:
            print('Solution', combo)
            break
        if np.random.random() < 0.5:
            total_items = int(total_items * (1 + np.random.random()*pct))
        else:
            total_items = int(
                np.ceil(total_items * (1 - np.random.random()*pct)))


if __name__ == "__main__":
    main()

When the player selects a combination which will reduce the number of possibilities to 1, computer will win. Otherwise, the player can pick a combination with the constraint of the total varying within a certain percentage, that computer may never win.

import itertools
import numpy as np


def gen_possible_combination(total, prices):
    """
    Generates all possible combinations of numbers of items for
    given prices constraint by total
    """
    nitems = [range(total//p + 1) for p in prices]
    prices_arr = np.array(prices)
    combo = [x for x in itertools.product(
        *nitems) if np.dot(np.array(x), prices_arr) == total]

    return combo


def reduce(combo1, combo2, pct):
    """
    Filters impossible transitions which are greater than pct
    """
    combo = {}
    for x in combo1:
        for y in combo2:
            if abs(sum(x) - sum(y))/sum(x) <= pct:
                combo[y] = 1

    return list(combo.keys())


def gen_items(n, total):
    """
    Generates a list of items
    """
    nums = [0] * n
    t = 0
    i = 0
    while t < total:
        if i < n - 1:
            n1 = np.random.randint(0, total-t)
            nums[i] = n1
            t += n1
            i += 1
        else:
            nums[i] = total - t
            t = total

    return nums


def main():
    pct = 0.05
    i = 0
    done = False
    n = 3
    total_items = 26  # np.random.randint(26)
    combo = None
    while not done:
        prices = [np.random.randint(1, 10) for _ in range(n)]
        items = gen_items(n, total_items)

        total = np.dot(np.array(prices),  np.array(items))
        combo1 = gen_possible_combination(total, prices)

        if combo:
            combo = reduce(combo, combo1, pct)
        else:
            combo = combo1
        i += 1
        print(i, 'Items:', items, 'Prices:', prices, 'Total:',
              total, 'No. Possibilities:', len(combo))

        if len(combo) == 1:
            print('Solution', combo)
            break
        if np.random.random() < 0.5:
            total_items = int(total_items * (1 + np.random.random()*pct))
        else:
            total_items = int(
                np.ceil(total_items * (1 - np.random.random()*pct)))


if __name__ == "__main__":
    main()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文