Interesting string algorithm
Given two finite sequences of strings, A and B, each of length n, for example:
A1: "kk", A2: "ka", A3: "kkk", A4: "a"
B1: "ka", B2: "kakk", B3: "ak", B4: "k"
Find a finite sequence of indices such that concatenating the corresponding strings of A and of B gives the same string. Repetitions are allowed.
In this example I can't find a solution, but if, for instance, the list (1,2,2,4) were a solution, then A1 + A2 + A2 + A4 = B1 + B2 + B2 + B4. This example uses only two characters, but it's already very difficult. Actually, it's not even trivial to find the shortest solution with one character!
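To make the condition concrete, here is a minimal Python sketch of a checker for a candidate index list. The name `is_solution` is hypothetical and the indices are taken as 1-based, as in the question.

```python
def is_solution(A, B, indices):
    """Return True if the 1-based indices give equal concatenations for A and B."""
    if not indices:
        return False  # the empty sequence is not counted as a solution
    top = "".join(A[i - 1] for i in indices)
    bottom = "".join(B[i - 1] for i in indices)
    return top == bottom


A = ["kk", "ka", "kkk", "a"]
B = ["ka", "kakk", "ak", "k"]
print(is_solution(A, B, [1, 2, 2, 4]))  # False: "kkkakaa" vs "kakakkkakkk"
```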
I tried to think of constraints. For example, the total lengths of the chosen strings must be equal, and the first and last pairs of strings must have matching characters at the start and end respectively. But nothing else. I suppose that for some sets of strings it's simply impossible. Can anyone think of a good algorithm?
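These observations can be turned into cheap necessary (but not sufficient) checks. The sketch below is one way to phrase them, with a hypothetical function name: the first chosen pair needs one string to be a prefix of the other, the last pair needs one to be a suffix of the other, and the length differences cannot all have the same strict sign.

```python
def may_have_solution(A, B):
    """Necessary conditions only: False means no solution, True means 'unknown'."""
    pairs = list(zip(A, B))
    # Some pair must be usable first: one string is a prefix of the other.
    can_start = any(a.startswith(b) or b.startswith(a) for a, b in pairs)
    # Some pair must be usable last: one string is a suffix of the other.
    can_end = any(a.endswith(b) or b.endswith(a) for a, b in pairs)
    # Total lengths must match, so A cannot be strictly longer (or shorter) everywhere.
    diffs = [len(a) - len(b) for a, b in pairs]
    can_balance = any(d >= 0 for d in diffs) and any(d <= 0 for d in diffs)
    return can_start and can_end and can_balance


A = ["kk", "ka", "kkk", "a"]
B = ["ka", "kakk", "ak", "k"]
print(may_have_solution(A, B))  # False
```

For the example in the question the suffix check fails for all four pairs, so no pair can come last and that particular instance has no solution.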
EDIT: Apparently, this is the Post Correspondence Problem
There is no general algorithm that can decide whether such an instance has a solution or not. If there were, the halting problem could be solved. Dirty trick...
Comments (4)
Very tough question, but I'll give it a shot. This is more of a stream of consciousness than an answer, apologies in advance.
If I understand this correctly, you're given 2 equal-sized sequences of strings, A and B, indexed from 1..n, say. You then have to find a sequence of indices i1, ..., im such that the concatenation A(i1)..A(im) equals the concatenation B(i1)..B(im), where m is the length of the sequence of indices.
The first thing I would observe is that there could be an infinite number of solutions. For example, given:
Possible solutions are:
So how would you know when to stop? As soon as you had one solution? As soon as one of the solutions is a superset of another solution?
One place you could start would be by taking all the strings of minimum common length from both sets (in my example above, you would take the "x" from both) and searching for 2 equal strings that share a common index. You can then repeat this for strings of the next size up. For example, if the first set has 3 strings of length 1, 2 and 3 respectively, and the second set has strings of length 1, 3 and 3 respectively, you would take the strings of length 3. You would do this until you have no more strings. If you find any equal pair, then you have a solution to the problem.
It then gets harder when you have to start combining several strings as in my example above. The naive, brute force approach would be to start permuting all strings from both sets that, when concatenated, result in strings of the same length, then compare them. So in the below example:
You would start with sequences of length 2:
Comparing these gives "gaac" vs "baag" and "baga" vs "ggac". Neither pair is equal, so there are no solutions there. Next, we would go for sequences of length 3:
Again, no solutions, so we move on to sequences of length 4, which also give no solutions.
Now it gets even trickier, as we have to start thinking about perhaps repeating some indices, and now my brain is melting.
I'm thinking looking for common subsequences in the strings might be helpful, and then using the remaining parts in the strings that were not matched. But I don't quite know how.
A very simple way is to just use something like a breadth-first search. This also has the advantage that the first solution found will have minimal size.
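A minimal sketch of such a breadth-first search in Python follows. The function name, the `max_len` cap and the test instance are assumptions added for illustration. The cap is needed because the search has no way of knowing when to give up on an instance without a solution. Each state records only the unmatched overhang of whichever side is ahead, so repeated states can be skipped.

```python
from collections import deque


def shortest_pcp_solution(A, B, max_len=12):
    """Return a shortest list of 1-based indices, or None if none has <= max_len indices."""
    # State: (overhang of A, overhang of B); at most one of the two is non-empty.
    queue = deque([(("", ""), [])])
    seen = set()
    while queue:
        (over_a, over_b), path = queue.popleft()
        if len(path) == max_len:
            continue
        for i, (a, b) in enumerate(zip(A, B), start=1):
            x, y = over_a + a, over_b + b
            if x.startswith(y):
                state = (x[len(y):], "")
            elif y.startswith(x):
                state = ("", y[len(x):])
            else:
                continue  # mismatch: this branch can never become a solution
            new_path = path + [i]
            if state == ("", ""):
                return new_path  # first hit in BFS order has minimal length
            if state not in seen:
                seen.add(state)
                queue.append((state, new_path))
    return None


# A well-known textbook instance (not from this thread):
print(shortest_pcp_solution(["a", "ab", "bba"], ["baa", "aa", "bb"]))  # [3, 2, 3, 1]
```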
It is not clear what 'solution' you are looking for: the longest solution? The shortest? All solutions?
Since you allow repetition, there will be an infinite number of solutions for some inputs, so I will work on:
Find all sequences under a fixed length.
Written as pseudocode, but in a manner very similar to F# sequence expressions
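As a rough stand-in, the same idea can be sketched with a Python generator, which plays a similar role to an F# sequence expression. This is not the answer's pseudocode; the function name and the sample instance are hypothetical. It yields every index sequence up to `max_len` and abandons a branch as soon as the two concatenations disagree.

```python
def solutions_up_to(A, B, max_len):
    """Yield every 1-based index tuple of length <= max_len whose concatenations agree."""

    def extend(path, x, y):
        # x and y are the unmatched tails of the A side and the B side.
        if path and not x and not y:
            yield tuple(path)
        if len(path) == max_len:
            return
        for i, (a, b) in enumerate(zip(A, B), start=1):
            nx, ny = x + a, y + b
            k = min(len(nx), len(ny))
            if nx[:k] == ny[:k]:  # still consistent on the overlap
                yield from extend(path + [i], nx[k:], ny[k:])

    yield from extend([], "", "")


# Sample instance chosen for illustration, not taken from the thread:
for solution in solutions_up_to(["a", "ab", "bba"], ["baa", "aa", "bb"], 8):
    print(solution)
```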
Some trivial constraints to reduce the problem:
Based on this we can quickly eliminate many inputs with no solution
Here's a suggestion for a brute-force search. First generate number sequences whose entries are bounded by the length of your list:
[0,0,..]
[1,0,..]
[2,0,..]
[3,0,..]
[0,1,..]
...
The number sequence length determines how many strings are going to be in any solution found.
Then generate A and B strings by using the numbers as indexes into your string lists:
This algorithm assumes equal lengths for A and B.
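A minimal Python sketch of this brute-force idea, assuming 0-based indices as in the number sequences above. The function name and the `max_len` bound are hypothetical, and `itertools.product` enumerates the sequences in a different order from the listing, which does not affect the result set.

```python
from itertools import product


def brute_force(A, B, max_len):
    """Return the first matching 0-based index list of length 1..max_len, or None."""
    if len(A) != len(B):
        raise ValueError("A and B must have the same number of strings")
    for length in range(1, max_len + 1):
        # Every sequence of `length` indices, each in range(len(A)).
        for idx in product(range(len(A)), repeat=length):
            if "".join(A[i] for i in idx) == "".join(B[i] for i in idx):
                return list(idx)
    return None


# Illustrative instance, not from the thread:
print(brute_force(["a", "ab", "bba"], ["baa", "aa", "bb"], 4))  # [2, 1, 2, 0]
```

The number of candidates grows as n^length, so this is only practical for short sequences.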
I tested your example with
but could not find any solutions, though it seemed to work for simple tests.