计算排列中有效块数量的算法
可能的重复:
查找排列中的排序子序列
给定一个数组 A,其中包含1,2,...,n 的排列。子块 A[i..j]
如果数组 A 的所有数字都出现在 A[i..j]
中,则称为有效块
是连续的数字(可能不按顺序)。
给定一个数组 A= [ 7 3 4 1 2 6 5 8] 有效块是 [3 4], [1,2], [6,5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], [7 3 4 1 2 6 5 8]
因此上述排列的计数为 7。
给出一个 O( n log n) 算法来计算有效块的数量。
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
好的,我的代表人数减少到 1,因为我在相关问题上悬赏 200:查找排列中已排序的子序列
所以我暂时不能发表评论。
我有一个想法:
1) 找到所有排列组。它们是:(78)、(34)、(12)、(65)。与群论不同,它们的顺序和位置以及它们是否相邻都很重要。因此,群 (78) 可以表示为结构
(7, 8, false)
,而 (34) 将表示为(3,4,true)
。我使用 Python 的元组表示法,但实际上使用整个类来表示组可能会更好。这里的真或假意味着连续或不连续。如果(max(gp1) == min(gp2) + 1 或 max(gp2) == min(gp1) + 1) 且连续(gp1) 和连续(gp2),则两个组“相邻”
。这不是union(gp1, gp2)
连续的唯一条件,因为(14)
和(23)
组合成 <代码>(14) 很好。对于算法课作业来说,这是一个很好的问题,但对于面试来说,这是一个糟糕的问题。我怀疑这是家庭作业。Ok, I am down to 1 rep because I put 200 bounty on a related question: Finding sorted sub-sequences in a permutation
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent matters. So, a group (78) can be represented as a structure
(7, 8, false)
, while (34) would be(3,4,true)
. I am using Python's notation for tuples, but it is actually might be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if(max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contigous(gp1) and contiguos(gp2)
. This is not the only condition, forunion(gp1, gp2)
to be contiguous, because(14)
and(23)
combine into(14)
nicely. This is a great question for algo class homework, but a terrible one for interview. I suspect this is homework.只是一些想法:
乍一看,这听起来不可能:完全排序的数组将具有 O(n2) 个有效子块。
因此,您需要一次计算多个有效子块。检查子块的有效性是O(n)。检查子块是否完全排序也是O(n)。完全排序的子块包含 n·(n - 1)/2 个有效子块,您可以对这些子块进行计数,而无需进一步分解该子块。
现在,整个数组显然始终有效。对于分而治之的方法,您需要将其分解。有两个可能的断点:最高元素的位置和最低元素的位置。如果在这些点之一将数组分成两部分,包括包含第二个极值元素的部分中的极值,则不可能有有效的子块跨越此断点。
通过始终选择产生更均匀分割的极值,这对于“随机”数组应该效果很好(平均 O(n log n))。但是,当您的输入类似于
(1 5 2 6 3 7 4 8)
时,我会发现问题,这似乎会产生 O(n2)< /em> 行为。(1 4 7 2 5 8 3 6 9)
类似(我希望你能看到这个模式)。我目前没有看到任何技巧来捕捉这种更糟糕的情况,但似乎它需要其他分裂技术。Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n2) valid sub-blocks.
So, you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like
(1 5 2 6 3 7 4 8)
, which seems to produce O(n2) behaviour.(1 4 7 2 5 8 3 6 9)
would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worse case, but it seems that it requires other splitting techniques.这个问题确实涉及到一些“数学技巧”,但一旦你明白了,它就相当简单了。但是,我的解决方案的其余部分不符合 O(n log n) 标准。
数学部分:
对于任何两个连续的数字,它们的总和为
2k+1
,其中k
是最小元素。对于 3,它是3k+3
,4:4k+6
,对于N
这样的数字,它是Nk + sum(1, N-1)
。因此,您需要两个可以同时完成的步骤:动态编程部分
使用前一行条目的结果构建两个表,以构建每个连续行的条目。不幸的是,我完全错了,因为这仍然需要 n^2 子数组检查。啊!
This question does involve a bit of a "math trick" but it's fairly straight forward once you get it. However, the rest of my solution won't fit the O(n log n) criteria.
The math portion:
For any two consecutive numbers their sum is
2k+1
wherek
is the smallest element. For three it is3k+3
, 4 :4k+6
and forN
such numbers it isNk + sum(1,N-1)
. Hence, you need two steps which can be done simultaneously:The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
我的命题
STEP = 2 // 考试数量
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A ,B) - 如果移动一 B 无效
[0,1,1,0,0,0,0,0]
VALID(A,B) - 如果移动一和步骤
B 有效 [0,0,0,1, 1,0,0,0]
有效 (A,B)
B [0,0,0,0,0,1,1,0]
步骤 = 3
B [1,1,1,0,0,0,0 ,0] 不行
B [0,1,1,1,0,0,0,0] 可以
B [0,0,0,0,1,1,1,0] 不行
STEP = 4
B [1 ,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
valid 方法检查所有元素是否连续
从未测试过,但可能没问题
My proposition
STEP = 2 // amount of examed number
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
The valid method check that all elements are consecutive
Never tested but, might be ok
原始数组不包含重复项,因此本身必须是连续的块。我们称这个块为(1 ~ n)。我们可以通过检查第一个元素是否为 1 或 n 来测试块 (2 ~ n) 是否连续,时间复杂度为 O(1)。同样,我们可以通过检查最后一个元素是 1 还是 n 来测试块 (1 ~ n-1)。
我无法将其完全塑造成一个有效的解决方案,但也许它会帮助某人......
The original array doesn't contain duplicates so must itself be a consecutive block. Lets call this block (1 ~ n). We can test to see whether block (2 ~ n) is consecutive by checking if the first element is 1 or n which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
和其他人一样,我只是把它扔掉......它适用于下面的单个示例,但是 YMMV!
这个想法是计算非法子块的数量,并从可能的总数中减去它。我们通过依次检查每个数组元素并排除包含该元素但不包含其前驱或后继的子块来计算非法元素。
对于 [1,N] 中的 i,计算 B[A[i]] = i。
设 Count = 长度>1 的子块总数,即 N-choose-2(起始索引和结束索引的每种可能组合各一个)。
对于每个 i,考虑 A[i]。忽略边缘情况,设 x=A[i]-1,设 y=A[i]+1。 A[i] 不能参与任何不包含 x 或 y 的子块。设 iX=B[x] 且 iY=B[y]。这里有几种情况需要独立对待。一般情况是
iX。在这种情况下,我们可以消除子块 A[iX+1 .. iY-1] 以及所有包含 i 的中间块。有 (i - iX + 1) * (iY - i + 1) 个这样的子块,所以称这个数为已消除数。 (其他情况留给读者作为练习,这些边缘情况也是如此。)设置计数 = 计数 - 消除。
返回计数。
总成本似乎为 N *(步骤 2 的成本)= O(N)。
WRINKLE:在步骤 2 中,我们必须小心不要多次消除每个子区间。我们可以通过仅消除完全或部分位于位置 i 右侧的子区间来实现此目的。
示例:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
初始计数 = (4*3)/2 = 6
i=1: A[i]=1,因此需要包含 2 的子块。我们可以排除[1,3]。消除=1,计数-> 5.
i=2:A[i]=3,因此需要包含2或4的子块。这排除了 [1,3],但我们在从 i=1 向右看时已经考虑到了它。消除= 0。i
=3:A[i]=2,因此需要其中包含[1]或[3]的子块。我们可以排除[2,4]。消除=1,计数-> 4.
i=4:A[i] = 4,所以我们需要其中包含[3]的子块。这排除了 [2,4],但我们在从 i=3 向右看时已经考虑到了它。已消除 = 0。
最终计数 = 4,对应于子块 [1,3,2,4]、[1,3,2]、[3,2,4] 和 [3,2]。
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that
iX<i<iY<i
. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases left as an exercise for the reader, as are those edge cases.) Set Count = Count - Eliminated.Return Count.
The total cost appears to be N * (cost of step 2) = O(N).
WRINKLE: In step 2, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(这是尝试执行 N.log(N) 最坏情况。不幸的是,这是错误的——有时会低估。它错误地假设您可以通过仅查看相邻的较小有效块对来找到所有块。事实上,您有查看三元组、四元组等,以获得所有较大的块。)
您可以使用表示子块和子块队列的结构来完成此操作。
分配一个与原始数组大小相同的子块数组,并将每个子块初始化为其中只有一个项目。随时将它们添加到队列中。如果你从数组 [ 7 3 4 1 2 6 5 8 ] 开始,你最终会得到一个像这样的队列:
queue: ( [7,7] [3,3] [4,4] [1,1] [2 ,2] [6,6] [5,5] [8,8] )
subbblock [7,7] 的 {index, width, lo_value, p_above } 值将为 { 0, 1, 7, null }。
现在很容易了。请原谅 c-ish 伪代码。
这会找到我认为的所有内容。通常为 O(N log(N)),但对于完全排序或反排序列表,则为 O(N^2)。我认为这个问题有一个答案——当您构建原始子块数组时,您会查找排序和反排序序列并将它们添加为基础级子块。如果您要保留计数,请将其增加 (width * (width + 1))/2 作为基础级别。这将为您提供包括所有 1 长度子块在内的计数。
之后,只需使用上面的循环,弹出并推送队列即可。如果您正在计数,则必须在左侧和右侧子块上都有一个乘数,并将它们相乘以计算增量。乘数是最左边(对于 p_left)或最右边(对于 p_right)基本级子块的宽度。
希望这是清楚的并且没有太多问题。我只是把它敲出来,所以它甚至可能是错误的。
[稍后注意。这毕竟是行不通的。请参阅下面的注释。]
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subbblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See note below.]