如何处理猜数字游戏(有一些变化)算法?
更新(2020 年 7 月):问题已经有 9 年历史了,但仍然是我非常感兴趣的问题。从那时起,机器学习(RNN、CNN、GANS 等)、新方法和廉价 GPU 的兴起使得启用新方法。我认为重新审视这个问题看看是否有新的方法会很有趣。
我正在学习编程(Python 和算法),并正在尝试从事一个我觉得有趣的项目。我已经创建了一些基本的 Python 脚本,但我不确定如何为我正在尝试构建的游戏提供解决方案。
游戏的运作方式如下:
用户将获得有价值的物品。例如,
Apple = 1
Pears = 2
Oranges = 3
他们将有机会选择他们喜欢的任何组合(即 100 个苹果、20 个梨和一个橙子)。计算机获得的唯一输出是总价值(在本例中,当前为 143 美元)。计算机将尝试猜测他们拥有什么。显然它在第一回合就无法正确进行。
Value quantity(day1) value(day1)
Apple 1 100 100
Pears 2 20 40
Orange 3 1 3
Total 121 143
下一回合,用户可以修改他们的数量,但不能超过总量的 5%(或者我们可以选择的其他百分比。例如,我将使用 5%)。水果的价格可能会发生变化(随机),因此总价值也可能会随之变化(为简单起见,在本例中我没有改变水果价格)。使用上面的示例,在游戏的第 2 天,用户在第 3 天返回值 152 美元和 164 美元。这是一个示例:
Quantity (day2) %change (day2) Value (day2) Quantity (day3) %change (day3) Value(day3)
104 104 106 106
21 42 23 46
2 6 4 12
127 4.96% 152 133 4.72% 164
*(我希望表格显示正确,我必须手动将它们隔开,所以希望这不仅仅是在我的屏幕上执行此操作,如果它不起作用,请告诉我,我会尝试上传屏幕截图。)
我试图看看是否可以计算出随着时间的推移数量是多少(假设用户有耐心)继续输入数字)。我知道现在我唯一的限制是总值不能超过 5%,所以我现在的准确度不能在 5% 以内,这样用户将永远输入它。
到目前为止我所做的事情
这是到目前为止我的解决方案(不多)。基本上,我获取所有值并找出它们的所有可能组合(我已完成这部分)。然后我将所有可能的组合放入数据库中作为字典(例如,对于 143 美元,可能有一个字典条目 {apple:143, Pears:0, Oranges :0}..一直到 {apple :0,梨:1,橙子:47}。每次我得到一个新数字时,我都会这样做,这样我就得到了所有可能性的列表,
这就是我在使用上述规则时遇到的问题。最好的解决方案?我想我需要一个适应度函数来自动比较两天的数据并消除前几天数据方差超过 5% 的任何可能性
问题:
所以我的问题是 。用户更改总数并且我有一个所有概率的列表,我应该如何处理这个问题?我需要学习什么?有什么算法或我可以使用的适用理论吗?错误,你能建议我可以添加哪些规则吗这个目标是否可行(如果不是当前状态。我正在考虑添加更多水果并说他们必须至少采摘 3 个,等等)?另外,我对遗传算法只有一个模糊的了解,但我想我可以在这里使用它们,有什么我可以使用的吗?
我非常非常渴望学习,因此任何建议或提示将不胜感激(只是请不要告诉我这个游戏是不可能的)。
更新:得到的反馈表明这很难解决。所以我想我应该在游戏中添加另一个条件,该条件不会干扰玩家正在做的事情(游戏对他们来说保持不变),但每天水果的价值都会改变价格(随机)。这样是不是就更容易解决了呢?因为在 5% 的变动和某些水果价值变化的范围内,随着时间的推移,只有少数组合是可能的。
第一天,一切皆有可能,获得足够接近的范围几乎是不可能的,但随着水果价格的变化,用户只能选择 5% 的变化,那么(随着时间的推移)范围不应该越来越窄。在上面的例子中,如果价格波动足够大,我想我可以暴力破解一个解决方案,给我一个猜测范围,但我试图弄清楚是否有一个更优雅的解决方案或其他解决方案来不断缩小这个范围时间。
UPDATE2:经过阅读和询问,我相信这是一个隐藏的马尔可夫/维特比问题,它跟踪水果价格的变化以及总和(最后一个数据点的权重最重)。但我不确定如何应用这种关系。我认为情况确实如此,并且可能是错误的,但至少我开始怀疑这是某种类型的机器学习问题。
更新 3:我创建了一个测试用例(数字较小)和一个生成器来帮助自动化用户生成的数据,并且我正在尝试从中创建一个图表以查看更有可能发生的情况。
这是代码,以及总价值和对用户实际水果数量的评论。
#!/usr/bin/env python
import itertools
# Fruit price data
fruitPriceDay1 = {'Apple':1, 'Pears':2, 'Oranges':3}
fruitPriceDay2 = {'Apple':2, 'Pears':3, 'Oranges':4}
fruitPriceDay3 = {'Apple':2, 'Pears':4, 'Oranges':5}
# Generate possibilities for testing (warning...will not scale with large numbers)
def possibilityGenerator(target_sum, apple, pears, oranges):
allDayPossible = {}
counter = 1
apple_range = range(0, target_sum + 1, apple)
pears_range = range(0, target_sum + 1, pears)
oranges_range = range(0, target_sum + 1, oranges)
for i, j, k in itertools.product(apple_range, pears_range, oranges_range):
if i + j + k == target_sum:
currentPossible = {}
#print counter
#print 'Apple', ':', i/apple, ',', 'Pears', ':', j/pears, ',', 'Oranges', ':', k/oranges
currentPossible['apple'] = i/apple
currentPossible['pears'] = j/pears
currentPossible['oranges'] = k/oranges
#print currentPossible
allDayPossible[counter] = currentPossible
counter = counter +1
return allDayPossible
# Total sum being returned by user for value of fruits
totalSumDay1=26 # Computer does not know this but users quantities are apple: 20, pears 3, oranges 0 at the current prices of the day
totalSumDay2=51 # Computer does not know this but users quantities are apple: 21, pears 3, oranges 0 at the current prices of the day
totalSumDay3=61 # Computer does not know this but users quantities are apple: 20, pears 4, oranges 1 at the current prices of the day
graph = {}
graph['day1'] = possibilityGenerator(totalSumDay1, fruitPriceDay1['Apple'], fruitPriceDay1['Pears'], fruitPriceDay1['Oranges'] )
graph['day2'] = possibilityGenerator(totalSumDay2, fruitPriceDay2['Apple'], fruitPriceDay2['Pears'], fruitPriceDay2['Oranges'] )
graph['day3'] = possibilityGenerator(totalSumDay3, fruitPriceDay3['Apple'], fruitPriceDay3['Pears'], fruitPriceDay3['Oranges'] )
# Sample of dict = 1 : {'oranges': 0, 'apple': 0, 'pears': 0}..70 : {'oranges': 8, 'apple': 26, 'pears': 13}
print graph
Update(July 2020): Question is 9 years old but still one that I'm deeply interested in. In the time since, machine learning(RNN's, CNN's, GANS,etc), new approaches and cheap GPU's have risen that enable new approaches. I thought it would be fun to revisit this question to see if there are new approaches.
I am learning programming (Python and algorithms) and was trying to work on a project that I find interesting. I have created a few basic Python scripts, but I’m not sure how to approach a solution to a game I am trying to build.
Here’s how the game will work:
Users will be given items with a value. For example,
Apple = 1
Pears = 2
Oranges = 3
They will then get a chance to choose any combo of them they like (i.e. 100 apples, 20 pears, and one orange). The only output the computer gets is the total value (in this example, it's currently $143). The computer will try to guess what they have. Which obviously it won’t be able to get correctly the first turn.
Value quantity(day1) value(day1)
Apple 1 100 100
Pears 2 20 40
Orange 3 1 3
Total 121 143
The next turn the user can modify their numbers but no more than 5% of the total quantity (or some other percent we may chose. I’ll use 5% for example.). The prices of fruit can change(at random) so the total value may change based on that also (for simplicity I am not changing fruit prices in this example). Using the above example, on day 2 of the game, the user returns a value of $152 and $164 on day 3. Here's an example:
Quantity (day2) %change (day2) Value (day2) Quantity (day3) %change (day3) Value(day3)
104 104 106 106
21 42 23 46
2 6 4 12
127 4.96% 152 133 4.72% 164
*(I hope the tables show up right, I had to manually space them so hopefully it's not just doing it on my screen, if it doesn't work let me know and I'll try to upload a screenshot.)
I am trying to see if I can figure out what the quantities are over time (assuming the user will have the patience to keep entering numbers). I know right now my only restriction is the total value cannot be more than 5% so I cannot be within 5% accuracy right now so the user will be entering it forever.
What I have done so far
Here’s my solution so far (not much). Basically, I take all the values and figure out all the possible combinations of them (I am done this part). Then I take all the possible combos and put them in a database as a dictionary (so for example for $143, there could be a dictionary entry {apple:143, Pears:0, Oranges :0}..all the way to {apple:0, Pears:1, Oranges :47}. I do this each time I get a new number so I have a list of all possibilities.
Here’s where I’m stuck. In using the rules above, how can I figure out the best possible solution? I think I’ll need a fitness function that automatically compares the two days data and removes any possibilities that have more than 5% variance of the previous days data.
Questions:
So my question with user changing the total and me having a list of all the probabilities, how should I approach this? What do I need to learn? Is there any algorithms out there or theories that I can use that are applicable? Or, to help me understand my mistake, can you suggest what rules I can add to make this goal feasible (if it's not in its current state. I was thinking adding more fruits and saying they must pick at least 3, etc..)? Also, I only have a vague understanding of genetic algorithms, but I thought I could use them here, if is there something I can use?
I'm very very eager to learn so any advice or tips would be greatly appreciated (just please don't tell me this game is impossible).
UPDATE: Getting feedback that this is hard to solve. So I thought I'd add another condition to the game that won't interfere with what the player is doing (game stays the same for them) but everyday the value of the fruits change price (randomly). Would that make it easier to solve? Because within a 5% movement and certain fruit value changes, only a few combinations are probable over time.
Day 1, anything is possible and getting a close enough range is almost impossible, but as the prices of fruits change and the user can only choose a 5% change, then shouldn't (over time) the range be narrow and narrow. In the above example, if prices are volatile enough I think I could brute force a solution that gave me a range to guess in, but I'm trying to figure out if there's a more elegant solution or other solutions to keep narrowing this range over time.
UPDATE2: After reading and asking around, I believe this is a hidden Markov/Viterbi problem that tracks the changes in fruit prices as well as total sum (weighting the last data point the heaviest). I'm not sure how to apply the relationship though. I think this is the case and could be wrong but at the least I'm starting to suspect this is a some type of machine learning problem.
Update 3: I am created a test case (with smaller numbers) and a generator to help automate the user generated data and I am trying to create a graph from it to see what's more likely.
Here's the code, along with the total values and comments on what the users actually fruit quantities are.
#!/usr/bin/env python
import itertools
# Fruit price data
fruitPriceDay1 = {'Apple':1, 'Pears':2, 'Oranges':3}
fruitPriceDay2 = {'Apple':2, 'Pears':3, 'Oranges':4}
fruitPriceDay3 = {'Apple':2, 'Pears':4, 'Oranges':5}
# Generate possibilities for testing (warning...will not scale with large numbers)
def possibilityGenerator(target_sum, apple, pears, oranges):
allDayPossible = {}
counter = 1
apple_range = range(0, target_sum + 1, apple)
pears_range = range(0, target_sum + 1, pears)
oranges_range = range(0, target_sum + 1, oranges)
for i, j, k in itertools.product(apple_range, pears_range, oranges_range):
if i + j + k == target_sum:
currentPossible = {}
#print counter
#print 'Apple', ':', i/apple, ',', 'Pears', ':', j/pears, ',', 'Oranges', ':', k/oranges
currentPossible['apple'] = i/apple
currentPossible['pears'] = j/pears
currentPossible['oranges'] = k/oranges
#print currentPossible
allDayPossible[counter] = currentPossible
counter = counter +1
return allDayPossible
# Total sum being returned by user for value of fruits
totalSumDay1=26 # Computer does not know this but users quantities are apple: 20, pears 3, oranges 0 at the current prices of the day
totalSumDay2=51 # Computer does not know this but users quantities are apple: 21, pears 3, oranges 0 at the current prices of the day
totalSumDay3=61 # Computer does not know this but users quantities are apple: 20, pears 4, oranges 1 at the current prices of the day
graph = {}
graph['day1'] = possibilityGenerator(totalSumDay1, fruitPriceDay1['Apple'], fruitPriceDay1['Pears'], fruitPriceDay1['Oranges'] )
graph['day2'] = possibilityGenerator(totalSumDay2, fruitPriceDay2['Apple'], fruitPriceDay2['Pears'], fruitPriceDay2['Oranges'] )
graph['day3'] = possibilityGenerator(totalSumDay3, fruitPriceDay3['Apple'], fruitPriceDay3['Pears'], fruitPriceDay3['Oranges'] )
# Sample of dict = 1 : {'oranges': 0, 'apple': 0, 'pears': 0}..70 : {'oranges': 8, 'apple': 26, 'pears': 13}
print graph
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
我们将结合图论和概率:
在第一天,构建一组所有可行的解决方案。让我们将解集表示为 A1={a1(1), a1(2),...,a1(n)}。
第二天,您可以再次构建解决方案集 A2。
现在,对于 A2 中的每个元素,您需要检查是否可以从 A1 的每个元素到达它(给定 x% 容差)。如果是这样 - 将 A2(n) 连接到 A1(m)。如果无法从 A1(m) 中的任何节点到达 - 您可以删除该节点。
基本上我们正在构建一个连接的有向无环图。
图中的所有路径都有相同的可能性。只有当从Am到Am+1(从Am中的节点到Am+1中的节点)存在单边时,才能找到精确解。
当然,某些节点比其他节点出现在更多路径中。每个节点的概率可以根据包含该节点的路径数量直接推导出来。
通过为每个节点分配一个权重,该权重等于通往该节点的路径数,无需保留所有历史记录,而只需保留前一天的历史记录。
另外,看看非负-值线性二番图方程 - 我不久前问过的一个问题。接受的答案是枚举每一步中所有组合的好方法。
We'll combine graph-theory and probability:
On the 1st day, build a set of all feasible solutions. Lets denote the solutions set as A1={a1(1), a1(2),...,a1(n)}.
On the second day you can again build the solutions set A2.
Now, for each element in A2, you'll need to check if it can be reached from each element of A1 (given x% tolerance). If so - connect A2(n) to A1(m). If it can't be reached from any node in A1(m) - you can delete this node.
Basically we are building a connected directed acyclic graph.
All paths in the graph are equally likely. You can find an exact solution only when there is a single edge from Am to Am+1 (from a node in Am to a node in Am+1).
Sure, some nodes appear in more paths than other nodes. The probability for each node can be directly deduced based on the number of paths that contains this node.
By assigning a weight to each node, which equals to the number of paths that leads to this node, there is no need to keep all history, but only the previous day.
Also, have a look at non-negative-values linear diphantine equations - A question I asked a while ago. The accepted answer is a great way to enumarte all combos in each step.
免责声明:由于我误读了问题的一些关键部分,因此在暂时删除我的答案并仔细重新阅读问题后,我极大地改变了我的答案。虽然仍然引用类似的主题和算法,但在我尝试自己解决 C# 中的一些问题后,答案得到了很大改善。
好莱坞版本
原始版本
首先,我想说明我在这里看到的两个主要问题:
可能的解决方案的数量之多。仅知道物品的数量和总价值(例如 3 和 143)将产生很多可能的解决方案。另外,让算法选择有效的解决方案而不不可避免地尝试无效的解决方案(总数不等于 143)并不容易。
当在给定日期 Di 找到可能的解决方案时,必须找到一种方法,通过 { Di+1 .. Di+n }. 给出的附加信息来消除潜在的解决方案。
让我们为即将到来的示例奠定一些基础:
为了更容易地解决这个问题我冒昧地改变了一个约束,这使得算法收敛得更快:
这条规则使我们能够更容易地排除解决方案。并且,在非微小范围内,渲染回溯算法仍然无用,就像你原来的问题和规则一样。
以我的拙见,这条规则并不是游戏的本质,而只是一个促进者,使计算机能够解决问题。
问题 1:寻找潜在的解决方案
对于初学者来说,问题 1 可以使用 Monte 来解决Carlo 算法 寻找一组潜在的解决方案。该技术很简单:为项目值和数量生成随机数(在各自可接受的范围内)。对所需数量的项目重复此过程。验证解决方案是否可接受。这意味着验证项目是否具有不同的值并且总数是否等于我们的目标总数(例如 143)。
虽然此技术具有易于实现的优点,但也有一些缺点:
如何克服这些缺点?好吧...
请注意,限制范围越多,蒙特卡罗算法的用处就越小,因为在合理的时间内几乎没有足够的有效解决方案来迭代所有这些解决方案。对于约束 { 3, [1-10], [0-100] } ,大约有 741,000,000 个有效解(不受目标总值的限制)。蒙特卡罗可以在那里使用。对于 { 3, [1-5], [0-10] },只有大约 80,000 个。无需使用蒙特卡罗;暴力
for
循环就可以了。我相信问题1就是所谓的约束满足问题 (或 CSP。)
问题 2:限制潜在解决方案集
鉴于问题 1 是一个 CSP,我会继续调用 问题2,以及一般问题,动态 CSP(或 DCSP. )
与 CSP 一起使用的一种可能对解决此问题有用的技术称为“约束记录”:
为此,您需要每天获得一组新的可能解决方案;使用蛮力或蒙特卡罗。然后,将 Di 的解决方案与 Di-1 进行比较,并仅保留能够继承前几天的解决方案且不违反约束的解决方案。
您可能必须保留哪些解决方案导致其他解决方案的历史记录(可能在有向图中)。约束记录使您能够记住可能的添加-删除数量,并据此拒绝解决方案。
还可以采取许多其他步骤来进一步改进您的解决方案。以下是一些想法:
考虑到所有这些,尝试找出一个基于解决方案的出现和启发式的排名系统,以确定候选解决方案。
Disclaimer: I changed my answer dramatically after temporarily deleting my answer and re-reading the question carefully as I misread some critical parts of the question. While still referencing similar topics and algorithms, the answer was greatly improved after I attempted to solve some of the problem in C# myself.
Hollywood version
Original version
First, I would like to state what I see two main problems here:
The sheer number of possible solutions. Knowing only the number of items and the total value, lets say 3 and 143 for example, will yield a lot of possible solutions. Plus, it is not easy to have an algorithm picking valid solution without inevitably trying invalid solutions (total not equal to 143.)
When possible solutions are found for a given day Di, one must find a way to eliminate potential solutions with the added information given by { Di+1 .. Di+n }.
Let's lay down some bases for the upcoming examples:
In order to solve this more easily I took the liberty to change one constraint, which makes the algorithm converge faster:
This rule enables us to rule out solutions more easily. And, with non-tiny ranges, renders Backtracking algorithms still useless, just like your original problem and rules.
In my humble opinion, this rule is not the essence of the game but only a facilitator, enabling the computer to solve the problem.
Problem 1: Finding potential solutions
For starters, problem 1. can be solved using a Monte Carlo algorithm to find a set of potential solutions. The technique is simple: Generate random numbers for item values and quantities (within their respective accepted range). Repeat the process for the required number of items. Verify whether or not the solution is acceptable. That means verifying if items have distinct values and the total is equal to our target total (say, 143.)
While this technique has the advantage of being easy to implement it has some drawbacks:
How to get around these drawback? Well...
Note that the more you restrict the ranges, the less useful while be the Monte Carlo algorithm is, since there will be few enough valid solutions to iterate on them all in reasonable time. For constraints { 3, [1-10], [0-100] } there is around 741,000,000 valid solutions (not constrained to a target total value.) Monte Carlo is usable there. For { 3, [1-5], [0-10] }, there is only around 80,000. No need to use Monte Carlo; brute force
for
loops will do just fine.I believe the problem 1 is what you would call a Constraint satisfaction problem (or CSP.)
Problem 2: Restrict the set of potential solutions
Given the fact that problem 1 is a CSP, I would go ahead and call problem 2, and the problem in general, a Dynamic CSP (or DCSP.)
One technique used with CSPs that might be useful to this problem is called Constraint Recording:
For this to work, you need to get a new set of possible solutions every day; Use either brute force or Monte Carlo. Then, compare solutions of Di to Di-1 and keep only solutions that can succeed to previous days' solutions without violating constraints.
You will probably have to keep an history of what solutions lead to what other solutions (probably in a directed graph.) Constraint recording enables you to remember possible add-remove quantities and rejects solutions based on that.
There is a lot of other steps that could be taken to further improve your solution. Here are some ideas:
Given all of this, try and figure out a ranking system based on occurrence of solutions and heuristics to determine a candidate solution.
这个问题是不可能解决的。
假设您确切地知道项目数量增加的比率,而不仅仅是最大比率是多少。
一个用户有 N 个水果,你有 D 天的猜测时间。
每天你都会得到 N 个新变量,那么总共就有 D*N 个变量。
每天您只能生成两个方程。一个方程是 n_item*price 的总和,另一个方程基于已知的比率。总共最多有 2*D 方程(如果它们都是独立的)。
2*D< N*D 对于所有N> 2
This problem is impossible to solve.
Let's say that you know exactly for what ratio number of items was increased, not just what is the maximum ratio for this.
A user has N fruits and you have D days of guessing.
In each day you get N new variables and then you have in total D*N variables.
For each day you can generate only two equations. One equation is the sum of n_item*price and other is based on a known ratio. In total you have at most 2*D equations if they are all independent.
2*D < N*D for all N > 2
我写了一个程序来玩这个游戏。当然,我必须使人类方面自动化,但我相信我以这样的方式完成了这一切,当与真人比赛时,我不应该使我的方法无效。
我从机器学习的角度来处理这个问题,并将问题视为隐藏的马尔可夫模型,其中总价格是观察值。我的解决方案是使用粒子过滤器。该解决方案是使用 NumPy 和 SciPy 在 Python 2.7 中编写的。
我在注释中明确地或在代码中隐含地陈述了我所做的任何假设。为了让代码以自动化方式运行,我还设置了一些额外的约束。它并没有特别优化,因为我试图在可理解性而不是速度上犯错误。
每次迭代都会输出当前的真实数量和猜测值。我只是将输出通过管道传输到一个文件,以便我可以轻松查看它。一个有趣的扩展是将输出绘制在 2D(2 个水果)或 3D(3 个水果)图表上。然后您将能够看到粒子过滤器对解决方案的磨练。
更新:
编辑代码以包含调整后更新的参数。包括使用 matplotlib 的绘图调用(通过 pylab)。绘图可以在 Linux-Gnome 上运行,您的情况可能会有所不同。默认 NUM_FRUITS 为 2 以支持绘图。只需注释掉所有 pylab 调用即可删除绘图,并能够将 NUM_FRUITS 更改为任何内容。
很好地估计了由 UnknownQuantities X 价格 = TotalPrice 表示的当前 fxn。在 2D(2 个水果)中,这是一条线,在 3D(3 个水果)中,它是一个平面。对于粒子过滤器来说,数据似乎太少,无法可靠地磨练出正确的数量。在粒子过滤器之上需要更多的智能才能真正汇集历史信息。您可以尝试将粒子滤波器转换为二阶或三阶。
更新 2:
我经常修改我的代码。我尝试了很多事情,现在展示我将要制作的最终程序(开始对这个想法感到厌倦)。
更改:
粒子现在使用浮点数而不是整数。不确定这是否有任何有意义的影响,但这是一个更通用的解决方案。仅在进行猜测时才四舍五入为整数。
绘图将真实数量显示为绿色方块,将当前猜测显示为红色方块。目前相信的粒子显示为蓝点(大小取决于我们相信它们的程度)。这使得很容易看出算法的运行情况。 (绘图也在 Win 7 64 位上进行了测试和工作)。
添加了关闭/打开数量更改和价格更改的参数。当然,两者“关”并不有趣。
它做得非常好,但是,正如已经指出的那样,这是一个非常棘手的问题,因此很难得到确切的答案。关闭 CHANGE_QUANTITIES 会产生最简单的情况。通过在 CHANGE_QUANTITIES 关闭的情况下运行 2 个水果,您可以了解问题的难度。看看它磨练出正确答案的速度有多快,然后看看当你增加水果的数量时它有多难。
您还可以通过保持 CHANGE_QUANTITIES 开启,但将 MAX_QUANTITY_CHANGE 从非常小的值 (.001) 调整为“大”值 (.05) 来了解难度。
它陷入困境的一种情况是维度(一种水果的数量)接近于零。因为它使用粒子的平均值来猜测它总是会偏离零这样的硬边界。
总的来说,这是一个很棒的粒子过滤器教程。
I wrote a program to play the game. Of course, I had to automate the human side, but I believe I did it all in such a way that I shouldn't invalidate my approach when played against a real human.
I approached this from a machine learning perspective and treated the problem as a hidden markov model where the total price was the observation. My solution is to use a particle filter. This solution is written in Python 2.7 using NumPy and SciPy.
I stated any assumptions I made either explicitly in the comments or implicitly in the code. I also set some additional constraints for the sake of getting code to run in an automated fashion. It's not particularly optimized as I tried to err on the side comprehensibility rather than speed.
Each iteration outputs the current true quantities and the guess. I just pipe the output to a file so I can review it easily. An interesting extension would be to plot the output on a graph either 2D (for 2 fruits) or 3D (for 3 fruits). Then you would be able to see the particle filter hone in on the solution.
Update:
Edited the code to include updated parameters after tweaking. Included plotting calls using matplotlib (via pylab). Plotting works on Linux-Gnome, your mileage may vary. Defaulted NUM_FRUITS to 2 for plotting support. Just comment out all the pylab calls to remove plotting and be able to change NUM_FRUITS to anything.
Does a good job estimating the current fxn represented by UnknownQuantities X Prices = TotalPrice. In 2D (2 Fruits) this is a line, in 3D (3 Fruits) it'd be a plane. Seems to be too little data for the particle filter to reliably hone in on the correct quantities. Need a little more smarts on top of the particle filter to really bring together the historical information. You could try converting the particle filter to 2nd- or 3rd-order.
Update 2:
I've been playing around with my code, a lot. I tried a bunch of things and now present the final program that I'll be making (starting to burn out on this idea).
Changes:
The particles now use floating points rather than integers. Not sure if this had any meaningful effect, but it is a more general solution. Rounding to integers is done only when making a guess.
Plotting shows true quantities as green square and current guess as red square. Currently believed particles shown as blue dots (sized by how much we believe them). This makes it really easy to see how well the algorithm is working. (Plotting also tested and working on Win 7 64-bit).
Added parameters for turning off/on quantity changing and price changing. Of course, both 'off' is not interesting.
It does a pretty dang good job, but, as has been noted, it's a really tough problem, so getting the exact answer is hard. Turning off CHANGE_QUANTITIES produces the simplest case. You can get an appreciation for the difficulty of the problem by running with 2 fruits with CHANGE_QUANTITIES off. See how quickly it hones in on the correct answer then see how harder it is as you increase the number of fruit.
You can also get a perspective on the difficulty by keeping CHANGE_QUANTITIES on, but adjusting the MAX_QUANTITY_CHANGE from very small values (.001) to "large" values (.05).
One situation where it struggles is if on dimension (one fruit quantity) gets close to zero. Because it's using an average of particles to guess it will always skew away from a hard boundary like zero.
In general this makes a great particle filter tutorial.
对于你的初始规则:
从我的学生时代开始,我会说,如果我们对 5% 的变化进行抽象,我们每天都会有一个包含三个未知值的方程(抱歉,我不懂数学)英语词汇),与前一天的值相同。
在第 3 天,您有三个方程、三个未知值,并且解应该是直接的。
我想如果这三个元素的值足够不同,那么每天 5% 的变化可能会被忘记,因为正如您所说,我们将使用近似值并对数字进行四舍五入。
对于您的适应规则:
在这种情况下,有太多的未知数和变化的值,因此我知道没有直接的解决方案。在这一点上我相信利奥尔;他的做法看起来不错! (如果您的价格和数量范围有限。)
For your initial rules:
From my school years, I would say that if we make an abstraction of the 5% changes, we have everyday an equation with three unknown values (sorry I don't know the maths vocabulary in English), which are the same values as previous day.
At day 3, you have three equations, three unknown values, and the solution should be direct.
I guess the 5% change each day may be forgotten if the values of the three elements are different enough, because, as you said, we will use approximations and round the numbers.
For your adapted rules:
Too many unknowns - and changing - values in this case, so there is no direct solution I know of. I would trust Lior on this; his approach looks fine! (If you have a limited range for prices and quantities.)
我意识到我的答案变得相当冗长,所以我将代码移到了顶部(这可能是大多数人感兴趣的)。下面有两件事:
对于对任一主题感兴趣的人,请参阅下文。对于其他人,这里是代码。
找到所有可能解决方案的代码
正如我在答案中进一步解释的那样,您的问题尚未确定。一般情况下,有多种可能的解决方案,并且随着天数的增加,这个数字至少呈指数增长。对于原始问题和扩展问题都是如此。尽管如此,我们可以(某种程度上)有效地找到所有解决方案(这是 NP 难的,所以不要期望太多)。
回溯(来自 20 世纪 60 年代,所以不完全是现代的)是这里选择的算法。在Python中,我们可以将其编写为递归生成器,这实际上非常优雅:
这种方法本质上将所有可能的候选者构造成一棵大树,然后在违反约束时通过剪枝执行深度优先搜索。每当遇到叶节点时,我们都会产生结果。
树搜索(通常)可以并行化,但这超出了本文的范围。如果没有太多额外的洞察力,它将使解决方案的可读性降低。这同样适用于减少代码的常量开销,例如,将约束
if ...: continue
放入iterator_bounds
变量中并进行更少的检查。我将完整的代码示例(包括游戏人性方面的模拟器)放在这个答案的底部。
解决这个问题的现代机器学习
我真的很喜欢你对深度神经网络世界的热情;不幸的是,由于以下几个原因,它们在这里不适用:(
为什么游戏无法唯一解决 - 第 1 部分
让我们首先考虑替代问题并提升整数要求,即篮子(人类为给定的
N
个水果选择)天)可以有分数水果(0.3 个橙子)。总价值约束
np.dot(basket, daily_price) ==total_value
限制了篮子的可能解决方案;它把问题减少了一维。自由选择N-1
个水果的数量,你总能找到第N个水果的值来满足约束。所以虽然看起来一天有N
个选择,但实际上我们可以自由做出的只有N-1
个,最后一个就完全确定了根据我们之前的选择。因此,对于游戏进行的每一天,我们都需要估计额外的N-1
个选择/变量。我们可能想要强制所有选择都大于 0,但这只会减少我们可以选择数字的间隔;任何实数的开区间都包含无限多个数字,因此我们永远不会因此而耗尽选择。还有
N-1
个选择要做。两天之间,篮子总成交量
np.sum(basket)
最多只变化前一天的some_percent
,即np.abs(np.sum (previous_basket) - np.sum(basket)) <= some_percent * np.sum(previous_basket)
。我们在某一天可以做出的一些选择将使购物篮的变化超过前一天的some_percent
。为了确保我们永远不会违反这一点,我们可以自由地做出N-2
个选择,然后必须选择第N-1
个变量,以便添加它并添加 < code>N - 变量(根据我们之前的选择固定)保持在some_percent
范围内。 (注意:这是一个不等式约束,因此,如果我们具有相等性,则只会减少选择的数量,即,篮子恰好改变some_percent
。在优化理论中,这称为主动约束.)我们可以再次考虑所有选择都应该大于 0 的约束,但争论仍然是这只是改变了我们现在可以自由选择
N-2
变量的区间。因此,在
D
天之后,我们只剩下N-1
个选择来从第一天开始进行估算(无变化约束)和(D-1)*(N- 2)
选择对接下来的每一天进行估计。不幸的是,我们超出了进一步减少这个数字的限制,并且未知数的数量每天至少增长N-2
。这本质上就是 Luka Rahne 的“2*D < N*D for all N > 2
”的含义。我们可能会找到许多可能性相同的候选者。每天确切的食品价格对此并不重要。只要它们具有一定的价值,它们就会限制其中一种选择。因此,如果您按照指定的方式扩展游戏,那么总是有机会产生无限多个解决方案;无论天数。
为什么游戏仍然无法唯一解决 - 第 2 部分
我们没有考虑一个可能有助于解决此问题的约束:仅允许选择整数解。整数约束的问题在于处理起来非常复杂。然而,我们这里主要关心的是添加这个约束是否能让我们在足够的时间内唯一地解决问题。对于这一点,有一个相当直观的反例。假设您有连续 3 天,并且对于第 1 天和第 3 天,总值限制只允许购买一个篮子。换句话说,我们知道第一天和第三天的篮子,但不知道第二天的篮子。在这里,我们只知道它的总价值,它在
some_percent
之内第 1 天和第 3 天在第 2 天的some_percent
之内。这些信息是否足以计算出第 2 天购物篮中的物品?上面是一个例子,由于总价值的限制,我们知道两天的价值,但这仍然不允许我们计算出第 2 天篮子的确切组成。因此,虽然在某些情况下可以解决这个问题,但一般来说这是不可能的。在第 3 天之后添加更多天数根本无助于计算第 2 天。它可能有助于缩小第 3 天的选项范围(这将缩小第 2 天的选项范围),但我们已经只剩下第 3 天的 1 个选择,所以它没有用。
完整代码
I realized that my answer was getting quite lengthy, so I moved the code to the top (which is probably what most people are interested in). Below it there are two things:
For those of you interested in either topic, please see below. For the rest of you, here is the code.
Code that finds all possible solutions
As I explain further down in the answer, your problem is under-determined. In the average case, there are many possible solutions, and this number grows at least exponentially as the number of days increases. This is true for both, the original and the extended problem. Nevertheless, we can (sort of) efficiently find all solutions (it's NP hard, so don't expect too much).
Backtracking (from the 1960s, so not exactly modern) is the algorithm of choice here. In python, we can write it as a recursive generator, which is actually quite elegant:
This approach essentially structures all possible candidates into a large tree and then performs depth first search with pruning whenever a constraint is violated. Whenever a leaf node is encountered, we yield the result.
Tree search (in general) can be parallelized, but that is out of scope here. It will make the solution less readable without much additional insight. The same goes for reducing constant overhead of the code, e.g., working the constraints
if ...: continue
into theiterator_bounds
variable and do less checks.I put the full code example (including a simulator for the human side of the game) at the bottom of this answer.
Modern Machine Learning for this problem
I really like your enthusiasm for the world of deep neural networks; unfortunately they simply do not apply here for a few reasons:
Why the game can not be uniquely solved - Part 1
Let's consider a substitute problem first and lift the integer requirement, i.e., the basket (human choice of
N
fruits for a given day) can have fractional fruits (0.3 oranges).The total value constraint
np.dot(basket, daily_price) == total_value
limits the possible solutions for the basket; it reduces the problem by one dimension. Freely pick amounts forN-1
fruits, and you can always find a value for the N-th fruit to satisfy the constraint. So while it seems that there areN
choices to make for a day, there are actually onlyN-1
that we can make freely, and the last one will be fully determined by our previous choices. So for each day the game goes on, we need to estimate an additionalN-1
choices/variables.We might want to enforce that all the choices are greater than 0, but that only reduces the interval from which we can choose a number; any open interval of real numbers has infinitely many numbers in it, so we will never run out of options because of this. Still
N-1
choices to make.Between two days, the total basket volume
np.sum(basket)
only changes by at mostsome_percent
of the previous day, i.e.np.abs(np.sum(previous_basket) - np.sum(basket)) <= some_percent * np.sum(previous_basket)
. Some of the choices we could make at a given day will change the basket by more thansome_percent
of the previous day. To make sure we never violate this, we can freely makeN-2
choices and then have to pick theN-1
-th variable so that adding it and adding theN
-the variable (which is fixed from our previous choices) stays withinsome_percent
. (Note: This is an inequality constraint, so it will only reduce the number of choices if we have equality, i.e., the basket changes by exactlysome_percent
. In optimization theory this is known as the constraint being active.)We can again think about the constraint that all choices should be greater 0, but the argument remains that this simply changes the interval from which we can now freely choose
N-2
variables.So after
D
days we are left withN-1
choices to estimate from the first day (no change constraint) and(D-1)*(N-2)
choices to estimate for each following day. Unfortunately, we ran out of constraints to further reduce this number and the number of unknowns grows by at leastN-2
each day. This is essentially what what Luka Rahne meant with "2*D < N*D for all N > 2
". We will likely find many candidates which are all equally probable.The exact food prices each day don't matter for this. As long as they are of some value, they will constrain one of the choices. Hence, if you extend your game in the way you specify, there is always a chance for infinitely many solutions; regardless of the number of days.
Why the game can still not be uniquely solved - Part 2
There is one constraint we didn't look at which might help fix this: only allow integer solutions for choices. The problem with integer constraints is that they are very complex to deal with. However, our main concern here is if adding this constraint will allow us to uniquely solve the problem given enough days. For this, there is a rather intuitive counter-example. Suppose you have 3 consecutive days, and for the 1st and 3d day, the total value constraint only allows one basket. In other words, we know the basket for day 1 and day 3, but not for day 2. Here, we only know it's total value, that it is within
some_percent
of day 1 and that day 3 is withinsome_percent
of day 2. Is this enough information to always work out what is in the basket on day 2?Above is one example, where we know the values for two days thanks to the total value constraint, but that still won't allow us to work out the exact composition of the basket at day 2. Thus, while it may be possible to work it out in some cases, it is not possible in general. Adding more days after day 3 doesn't help figuring out day 2 at all. It might help in narrowing the options for day 3 (which will then narrow the options for day 2), but we already have just 1 choice left for day 3, so it's no use.
Full Code
当玩家选择的组合将可能性的数量减少到 1 时,计算机将获胜。否则,玩家可以选择一个组合,并将总数限制在一定百分比内,该计算机可能永远不会获胜。
When the player selects a combination which will reduce the number of possibilities to 1, computer will win. Otherwise, the player can pick a combination with the constraint of the total varying within a certain percentage, that computer may never win.