创建项目的独特随机串联的算法
我正在考虑一种算法,该算法将创建 Y 部分的 X 个最独特的串联,其中每个部分可以是多个项目之一。例如 3 部分:
part #1: 0,1,2 part #2: a,b,c part #3: x,y,z
5 个串联的(随机的,某些可能性的一种情况)结果:
0ax 1by 2cz 0bz (note that '0by' would be "less unique " than '0bz' because 'by' already was) 2ay (note that 'a' didn't after '2' jet, and 'y' didn't after 'a' jet)
下一个串联的简单坏结果:
1cy ('c' wasn't after 1, 'y' wasn't after 'c', BUT '1'-'y' already was as first-last
简单好下一个结果将是:
0cy ('c' wasn't after '0', 'y' wasn't after 'c', and '0'-'y' wasn't as first-last part)
1az
1cx
我知道这个解决方案限制了可能的结果,但是当所有完全独特的可能性将会消失,算法应该继续并尝试保持最可用的唯一性(尽可能少地重复)。
考虑真实的例子:
Boy/Girl/Martin
bought/stole/get
bottle/milk/water
我想要这样的结果:
Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')
也许从所有组合的树开始,但如何首先选择最独特的树?
编辑:根据这个示例数据,我们可以看到,创建完全独特的4 个单词 * 3 种可能性的结果,仅向我们提供 3 个结果:
Martin stole a bootle
Boy bought an milk
He get hard water
但是,可以请求更多结果。所以,4.结果应该是最可用的唯一性,就像马丁买了硬牛奶
,而不是马丁偷了水
编辑:有人开始寻找解决方案? 将每个部分想象成一个可以旋转的桶,向下旋转时最后一个项目作为第一个,向上旋转时第一个作为最后一个。现在,像这样设置巴雷尔:
Martin|stole |a |bootle
Boy |bought|an |milk
He |get |hard|water
现在,按照我们看到的那样写句子,然后将第一个巴雷尔向上旋转一次,第二个巴雷尔旋转两次,第三个巴雷尔旋转三次,依此类推。我们得到句子(注意第三巴雷尔做了一个完整的旋转):
Boy |get |a |milk
He |stole |an |water
Martin|bought|hard|bootle
我们得到下一个解决方案。我们可以再进行一次处理以获得更多的解决方案:
He |bought|a |water
Martin|get |an |bootle
Boy |stole |hard|milk
问题是第一个桶将与最后一个桶连接,因为平行旋转。 我想知道如果我在最后一个解决方案中再旋转最后一个桶一次,这是否会更独特(但我提供了其他连接,例如水 - 但这只会重复2次,而不是像现在这样3次)。不知道“桶”是这里的好思考方式。
我认为我们应该首先找到唯一性的定义
例如,改变唯一性要丢弃什么?如果我们使用已经使用过的单词?重复两个彼此靠近的单词是否比在其他单词的某个间隙中重复一个单词更独特?所以,这个问题可能是主观的。
但我认为在很多序列中,每个单词应该使用相似的次数(例如随机选择单词并从集合中删除,并在获取所有单词后刷新下次可以获得的所有选项) - 这很容易做到。
但是,即使我们每个单词的次数相似,我们也应该采取一些措施来避免单词之间的重复连接。我认为,更独特的是重复远离彼此的单词,而不是彼此相邻的单词。
I'm thinking about an algorithm that will create X most unique concatenations of Y parts, where each part can be one of several items. For example 3 parts:
part #1: 0,1,2 part #2: a,b,c part #3: x,y,z
And the (random, one case of some possibilities) result of 5 concatenations:
0ax 1by 2cz 0bz (note that '0by' would be "less unique " than '0bz' because 'by' already was) 2ay (note that 'a' didn't after '2' jet, and 'y' didn't after 'a' jet)
Simple BAD results for next concatenation:
1cy ('c' wasn't after 1, 'y' wasn't after 'c', BUT '1'-'y' already was as first-last
Simple GOOD next result would be:
0cy ('c' wasn't after '0', 'y' wasn't after 'c', and '0'-'y' wasn't as first-last part)
1az
1cx
I know that this solution limit possible results, but when all full unique possibilities will gone, algorithm should continue and try to keep most avaible uniqueness (repeating as few as possible).
Consider real example:
Boy/Girl/Martin
bought/stole/get
bottle/milk/water
And I want results like:
Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')
Maybe start with a tree of all combinations, but how to select most unique trees first?
Edit: According to this sample data, we can see, that creation of fully unique results for 4 words * 3 possibilities, provide us only 3 results:
Martin stole a bootle
Boy bought an milk
He get hard water
But, there can be more results requested. So, 4. result should be most-available-uniqueness like Martin bought hard milk
, not Martin stole a water
Edit: Some start for a solution ?
Imagine each part as a barrel, wich can be rotated, and last item goes as first when rotates down, first goes as last when rotating up. Now, set barells like this:
Martin|stole |a |bootle
Boy |bought|an |milk
He |get |hard|water
Now, write sentences as We see, and rotate first barell UP once, second twice, third three and so on. We get sentences (note that third barell did one full rotation):
Boy |get |a |milk
He |stole |an |water
Martin|bought|hard|bootle
And we get next solutions. We can do process one more time to get more solutions:
He |bought|a |water
Martin|get |an |bootle
Boy |stole |hard|milk
The problem is that first barrel will be connected with last, because rotating parallel.
I'm wondering if that will be more uniqe if i rotate last barrel one more time in last solution (but the i provide other connections like an-water - but this will be repeated only 2 times, not 3 times like now). Don't know that "barrels" are good way ofthinking here.
I think that we should first found a definition for uniqueness
For example, what is changing uniqueness to drop ? If we use word that was already used ? Do repeating 2 words close to each other is less uniqe that repeating a word in some gap of other words ? So, this problem can be subjective.
But I think that in lot of sequences, each word should be used similar times (like selecting word randomly and removing from a set, and after getting all words refresh all options that they can be obtained next time) - this is easy to do.
But, even if we get each words similar number od times, we should do something to do-not-repeat-connections between words. I think, that more uniqe is repeating words far from each other, not next to each other.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
任何时候您需要一个新的串联,只需生成一个完全随机的串联,计算它的适合度,然后接受该串联或拒绝它(即概率上)。
CalculateFitness 永远不应该返回零。如果是这样,您可能会发现自己陷入无限循环。
随着 C 的增加,不太理想的串联更容易被接受。
当您减少 C 时,每次调用 CreateGoodConcatenation 时您都会面临增加的迭代(加上结果中的熵减少)
Anytime you need a new concatenation, just generate a completely random one, calculate it's fitness, and then either accept that concatenation or reject it (probabilistically, that is).
CalculateFitness should never return zero. If it does, you might find yourself in an infinite loop.
As you increase C, less ideal concatenations are accepted more readily.
As you decrease C, you face increased iterations for each call to CreateGoodConcatenation (plus less entropy in the result)