二进制字符串排列
我在 http://www.interviewstreet.com 上遇到问题。
Bob 收到了 Alice 发送的长度为 N 的二进制字符串。他知道由于传输错误,多达 K 位可能已被损坏(并因此被翻转)。然而,他也知道爱丽丝想要传输的字符串不是周期性的。如果一个字符串不能表示为连接一定次数的较小字符串,则该字符串不是周期性的。例如,“0001”、“0110”不是周期性的,而“00000”、“010101”是周期性字符串。 现在他想知道 Alice 可以传输多少个可能的字符串。
因此,首先我用二项式定理做了一些测试,通过使用它,我能够找到给定一个字符串和许多损坏的位可以表示一个字符串有多少种不同的方式。我的第二步是找到一种方法来查找周期字符串的数量。我发现使用具有素数长度的字符串可以轻松完成此操作。这是通过检查是否有足够的 0 或 1 来仅用 0 或 1 填充字符串来完成的。
1111111 或 0000000
现在我使用纯粹的强力算法,当涉及到任何类型的大字符串时,它不会削减它。有人可以向我指出任何类型的组合技术可以帮助解决这个问题吗?谢谢。
I came across a problem on http://www.interviewstreet.com .
Bob has received a binary string of length N transmitted by Alice. He knows that due to errors in transmission, up to K bits might have been corrupted (and hence flipped). However, he also knows that the string Alice had intended to transmit was not periodic. A string is not periodic if it cannot be represented as a smaller string concatenated some number of times. For example, "0001", "0110" are not periodic while "00000", "010101" are periodic strings.
Now he wonders how many possible strings could Alice have transmitted.
So first off I did some tests with the Binomial theorem and through the use of that I am able to find how many different ways a String can be represented given a String and a number of corrupted bits. My second step was to find a way with which to find the number of periodic Strings. I see that this can easily be done with Strings with a prime numbered length. This is done by checking if there are enough 0's or 1's to fill the String up with only 0's or 1's.
1111111 or 0000000
Right now I use a pure brute force algorithm which just wont cut it when it comes to any sort of large string. Is there any sort of Combinatorics techniques that somebody could point me to that would help solve this problem? Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
利奥尔走在正确的道路上。
长度为
N
的字符串总数为2^N
。其中一些是周期性的。其他人则不然。我们将周期字符串的数量称为A(N)
,将非周期字符串的数量称为B(N)
。那么如果我们将长度为 1 的字符串定义为非周期性的,那么
现在让我们假设 N > N。 1..那么长度为
N
的周期串集合包括周期短于N
的周期串。然而,对于长度为 N 的非周期字符串集来说,情况并非如此。长度为
N
的周期性字符串集合由长度为n
的约数的非周期性字符串的重复组成,包括长度为 1 的字符串。换句话说:例如:
现在我们有一个关于长度
N
的周期和非周期字符串数量的递推方程。不幸的是,这对我们回答实际问题没有多大帮助。
这个问题意味着 Bob 收到了一个特定的字符串,他想知道有多少个非周期字符串与该字符串相差最多
K
位。接收到的字符串可能有C(N,K)
种可能的突变,而这些突变可能是传输的字符串。我们需要从中减去该集合中周期串的数量。我们该怎么做呢?首先,我们可以观察到任何周期串都是非周期串的重复。因此,对于每个潜在周期
k
(N
的除数),我们查看长度k
的子串。如果所有字符串与公共字符串的差异加起来不超过K
位,则该公共字符串是周期字符串的基础,我们应该将计数减一。如果最小距离为d
,并且K - d > N/k
,那么我们可以翻转每个子字符串中的各个位并且仍然有匹配项,并且我们必须相应地减少计数。Lior was on the right track.
The total number of strings of length
N
is2^N
. Some of these are periodic. Others are not. Let's call the number of periodic stringsA(N)
, and the number of non-periodic stringsB(N)
. ThenIf we define strings of length 1 to be non-periodic, then
Let's assume now that
N > 1
. Then the set of periodic strings of lengthN
includes strings that are periodic with a period shorter thanN
. However, this is not the case for the set of non-periodic strings of lengthN
.The set of periodic strings of length
N
is made up of repetitions of non-periodic strings of lengths that are divisors ofn
, including those of length 1. In other words:For example:
So we now have a recurrence equation for the number of periodic and non-periodic strings of length
N
.Unfortunately, this doesn't help us that much to answer the actual question.
The question implies that Bob has received a specific string, and he wants to know how many non-periodic strings differ by at most
K
bits from this string. There areC(N,K)
possible mutations of the received string that could be the transmitted string. We need to subtract from this the number of periodic strings in this set. How can we go about this?First, we can use the observation that any periodic string is a repetition of non-periodic strings. So, for each potential period
k
(divisor ofN
), we look at the substrings of lengthk
. If all strings are different from a common string by no more thanK
bits combined, then this common string is the basis for a periodic string and we should decrease the count by one. If the minimum distance isd
, andK - d > N/k
, then we can flip individual bits in each substring and still have a match, and we have to decrease our count accordingly.计算长度为n的非周期字符串的数量:
计算周期字符串的数量关于长度n:
找到n的所有约数,除了n本身。例如:如果n=6 - 除数为1,2,3。
(此处讨论了此方法)< /p>
每个除数 m 可用于表示 2^m 个周期字符串。例如
m=3: {000,...111 } - 2^3 周期字符串
因此,对于 n=6,有 2+4+8 个周期串
正如 Jeffery Sax 和 ANeves 指出的,其中一些周期字符串是相同的(例如 0* = 00* = 000*),因此我们必须消除这些。
一个简单的方法是将所有这些字符串添加到存储唯一元素的关联容器中(例如 在 C++ 中设置),并计算该容器中的元素数量。
更好的优化是:对于 m=m1,找到 m1 的所有除数,并避免添加这些集合中已有字符串的周期性字符串。
下一步是计算任何这些周期字符串与接收到的字符串之间的汉明距离 。如果小于K-则计数。
编辑:大 N 和小 K 的更好解决方案
检查字符串是否周期性的算法:
这可以通过将字符串与其自身的移位版本进行比较来完成。如果一个字符串与其 p 位循环移位相同,那么它的循环周期为 p。
因此,每次循环移位字符串一位 - 我们可以在最多楼层(N/2)字符串比较中检测它是否是周期性的。
计算可能传输的字符串
如果没有非周期性传输要求,并且我们收到了 N 位消息 - 可能已传输的消息数为 C(N, 0) + C(N, 1) + C(N, 2) + ... + C(N, K)
对于 N=1000 和 K=3: C(1000,0)+C(1000,1)+C(1000,2)+C(1000,3)= 166,667,501
(这是原字符串中切换0/1/2/3位的组合数)。
根据这个数字,我们需要减少无法传输的周期字符串的数量。
例如:如果接收到的字符串是000000并且K=2,我们可以确定发送的字符串不在{000000,001001,010010,100100}中。这些都是周期性的,与接收到的字符串的汉明距离最大为 K。
C(6,0)+C(6,1)+C(6,2)=1+6+15=22
其中,有 4 个组合是周期性的。
算法:
我们将从接收到的字符串开始,并生成上述所有组合。对于每个组合,我们将检查它是否是周期性的。如果是这样 - 我们会将计数减 1。
To count the number of non-periodic strings on length n:
To count the number of periodic strings on length n:
Find all divisors of n, except n itself. For example: if n=6 - the divisors are 1,2,3.
(The method for this had been discussed here)
Each divisor m can be used to represent 2^m periodic strings. For example
m=3: {000,...111} - 2^3 periodic strings
So for n=6, there are 2+4+8 periodic strings
As Jeffery Sax and ANeves pointed out, some of these periodic strings are identical {for example 0* = 00* = 000*), so we have to eliminate these.
A naive method would be to add all these strings to an associative container that stores unique elements (such as set in C++), and count the number of elements in the that container.
A better optimization would be: for m=m1, find all divisors of m1, and avoid adding strings that are periodic of strings already in these sets.
The next step would be to calculate the Hamming distance between any of these periodic strings and the received string. If it is less than K- count it.
Edit: A better solution for large N and small K
Algorithm for checking if a string is periodic:
This can be accomplished by comparing the string with a shifted-version of itself. If a string is identical to it's p-bit circular-shift - then it has a cycle of p.
So circularly-shifting the string one bit at a time - we can detect if it is periodic in up to floor(N/2) string comparisons.
Counting possible transmitted strings
If there would be no non-periodic transmission requirement, and we received an N bits message - the number of possible messages that could have been transmitted is
C(N, 0) + C(N, 1) + C(N, 2) + ... + C(N, K)
For N=1000 and K=3: C(1000,0)+C(1000,1)+C(1000,2)+C(1000,3)= 166,667,501
(This is the number of combinations of switching 0/1/2/3 bits in the original string).
From this number, we need to decrease the number of periodic strings - which couldn't have been transmitted.
For example: if the received string was 000000 and K=2, we can be sure that the transmitted string was not in {000000,001001,010010,100100}. These are all periodic, with hamming distance of up to K from the received string.
C(6,0)+C(6,1)+C(6,2)=1+6+15=22
Out of these, 4 combinations are periodic.
Algorithm:
We'll start with the received string, and generate all combinations stated above. For each combination we will check if it is periodic. If so - we'll decrease our count by 1.
Lior 和 Jeffrey 的答案构成了解决该问题的基础,但本文仍有待解决的一个有趣问题是,如何有效地计算给定
[input string, N, K]
。我的回答主要集中在这一点上。正如 Lior 和 Jeffrey 所指出的,在检查周期性字符串时,我们只需要关心长度等于 n 的约数的子字符串。让我们看一个例子,看看实现这一目标的效率如何。
周期为 m 的周期串的数量
假设输入字符串为
,让我们尝试找到周期为 m=4 的周期串的数量
第一位
如果我们比较每个子串的第一位,我们会发现它们全部都是是
0
。如果我们假设所有子串中的所有后续位都相同,则在执行 0 位翻转或 4 位翻转时,可以使输入字符串成为周期性的(周期为 4)。现在我们知道存在 2 个周期性字符串,一个用于 k=0,另一个用于 k=4(假设所有子串中的后续位都相同)。
第二位
现在让我们进入第二位。
但是等等,上面的陈述是正确的,前提是子串中当前位之前的所有位也有助于使字符串具有周期性。我们知道,只有
k=0
和k=4
各自使字符串周期到第 1 位。因此,当考虑到第 2 位之前的所有位时,我们可以在以下 4 种情况下获得周期字符串:
第三位
转到第三位,我们将看到:
第四位
对于第四位也是最后一位:
我们现在完成了加上子串的最后一位以及最后一步中各种 k 值的周期串的总数,得出 16。
递归关系和伪代码
让我们表示任何变化的
k
的周期串的当前计数从使用 R[k] 将1
转换为K
。对于每次迭代,我们需要查找前一次迭代的R[]
值。我们在每次迭代中最终要做的事情是:
如果我们通过从最低到最高迭代所有周期来执行上述过程,我们将获得所有周期字符串的计数。将选择的周期将是小于
N
的N
的约数。从最低到最高的顺序查看它们会有优势,请阅读下一节以了解详细信息。考虑可被较小周期整除的周期
仔细观察,您会发现我们还会多次计算某些周期字符串。
例如。以下周期字符串:
最终将被计为 m = 4 和 m = 8 的一部分
让
C[m]
表示在长度为的周期内获得的周期字符串的总数m
使用上面的伪代码获得。令C[m']
表示使用长度为m
的周期获得的周期串的实际计数,但不计算使用periods
可以形成的周期串。 m
更具体地说,如果当前周期
m
具有除数t
、u
和v
,它们是小于m
,那么我们将计算周期t
、u
和v
周期字符串的数量同样,即当计算所有 m 值的周期串总数时,我们需要注意排除
C[t]
、C[u]
和C[v]
并仅考虑C[m']
。因为当我们计算
C[m]
时,我们已经计算了C[t']
、C[u']
的值> 和C[v']
,我们只需查找它们并用C[m]
减去它们即可得到C[m']
。我将把这个简单的部分作为读者的练习。Lior and Jeffrey's answers form the foundation for solving the problem, but one interesting problem that remains to be solved in this post is, how can you efficiently compute the number of strings which are periodic for a given
[input string, N, K]
. My answer will primarily focus on that.As pointed out by Lior and Jeffrey, we need only to care about the substrings of length equal to the divisors of n when checking for strings which are periodic. Let's look at an example to see how efficient we can achieve this.
Number of periodic strings with period m
Let the input string be
and let us try to find the number of periodic strings with period m=4
First Bit
If we compare the first bit of each of the substrings, we'll see that all of them are
0
s. If we assume all the subsequent bits are the same, amongst all the substrings, then the input string can be made periodic (with period 4), when performing either 0 bitflips or 4 bitflips.So now we know there exists 2 strings which are periodic, one for k=0 and other for k=4 (assuming that the subsequent bits are the same in all the substrings).
Second Bit
Now let's progress to the second bit.
But wait, the above statement is true IFF all the bits prior to the current bit in the substring also contribute in making the string periodic. We know that only
k=0
andk=4
, each make the string periodic up to 1st bit.So when accounting for all bits upto the 2nd bit, we can get a periodic string in the following 4 cases:
Third Bit
Moving on to third bit, we'll see this:
Fourth Bit
For our fourth and final bit:
We are now done with the last bit of the substring and the total of the periodic strings for various k values in the last step gives 16.
Recursive relation and Pseudocode
Let us denote the current count of periodic strings for any
k
which varies from1
toK
using R[k]. For each iteration, we need to look up the previous iteration'sR[]
value.What we essentially end up doing in each iteration is:
If we perform the above process by iterating over all periods from the lowest to the highest we'll get the count of all the periodic strings. The periods which will be chosen will be the divisors of
N
which are less thanN
. Going over them from lowest to highest will have an advantage, read the next section for the details.Accounting for periods which are divisible by smaller periods
On close observation you'll notice that we're also counting certain periodic strings more than once.
eg. the following periodic string:
will end up being counted as part of both m = 4, and m = 8
Let
C[m]
denote the total count of periodic strings obtained for a period of lengthm
obtained using the above pseudocode. LetC[m']
denote the actual count of periodic strings obtained using period of lengthm
but not counting the periodic strings which can be formed usingperiods < m
More specifically, if the current period
m
has divisorst
,u
andv
which are less thanm
, then we'll be counting the number of periodic strings of periodst
,u
andv
as well, i.e.When counting the total number of periodic strings for all values of m, we need to take care of excluding
C[t]
,C[u]
, andC[v]
and only account forC[m']
.Since when we're computing
C[m]
, we already would have computed the values forC[t']
,C[u']
andC[v']
, we just need to look them up and subtract them fromC[m]
to obtainC[m']
. I'll leave this simple portion as an exercise for the reader.复杂度 N^1/2 * N * (除数之和)
complexity N^1/2 * N * (sum of divisors)
扩展 Lior Kogan 的答案 二进制字符串排列
设 F(n) 为非周期数可以形成给定长度n的字符串。
为了找到 n 的非周期字符串 F(n) 的数量,我们需要从 n 的所有排列中消除所有 F(m),其中 m 是 n 的约数。
简单的 python 实现
输出:
[0, 2, 2, 6, 12, 30, 54, 126, 240, 504]
Extending the Lior Kogan's answer Binary String permutations
Let F(n) be the number of aperiodic strings that can be formed for a given length n.
To find number of aperiodic string for n, F(n), we need to eliminate all the F(m)'s where m is a divisor of n from all the permutations of n.
naive python implementation
output :
[0, 2, 2, 6, 12, 30, 54, 126, 240, 504]
我用下面的算法解决了这个答案,这是非常基本的。希望我解决的问题是您所问问题的答案。
如果字符串中有 8 个二进制字符,并且我想获得它的所有可能的排列,那么以下算法将正确地给出这些值。在其中,我特意跳过了排列“00000000”,因为它对我来说没有价值:)。
下面的代码是用 Ruby 编写的:
I solved this answer with the following algorithm, which is extremely basic. Hopefully the problem I solved is the answer to what you are asking about.
If I have 8 binary characters in a string, and I want to get all possible permutations for it, then the following algorithm will correctly give you those values. In it, I specifically skip the permutation "00000000" as it has no value to me :).
The below code is in Ruby: