Given an unsorted binary array, count the number of 1s, when the only allowed check is whether an entire subarray is all zeros
Given an unsorted binary array, a, the only allowed operation is all_zeros(a), which returns True iff all of the array's elements are 0.
The complexity of all_zeros(a) is O(len(a)) + a large overhead constant.
I would like to find all the indices that contain 1s, using as few calls to all_zeros as possible.
A reasonable sub-problem is to assume that the number of 1s is much smaller (say, 100 to 1000 times smaller) than the number of 0s.
In theory, this is solved simply by iterating over the array element by element and testing all_zeros([element]).
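For reference, the element-wise baseline can be sketched as follows. The `Oracle` class, its call counter, and the index-range signature `all_zeros(lo, hi)` are assumptions made for illustration only; the question defines `all_zeros` on an array and says nothing about how it is implemented.

```python
class Oracle:
    """Hypothetical wrapper: hides the array and counts all_zeros calls."""

    def __init__(self, bits):
        self._bits = list(bits)
        self.calls = 0  # number of queries made so far

    def __len__(self):
        return len(self._bits)

    def all_zeros(self, lo, hi):
        """Return True iff every element in the half-open range [lo, hi) is 0."""
        self.calls += 1
        return not any(self._bits[lo:hi])


def find_ones_elementwise(oracle):
    """Baseline: one query per element, so len(oracle) queries in total."""
    return [i for i in range(len(oracle)) if not oracle.all_zeros(i, i + 1)]
```

This always pays the overhead constant once per element, which is the cost the batched approaches try to avoid.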
In practice, the overhead constant forces us to work in batches that are as large as possible. We cannot assume that the ratio of 1s in the array is known, but if some algorithm requires that knowledge, please do share it.
I am looking for a conceptual solution, so I am not specifying the ratio between the overhead constant and the computation time of all_zeros.
Please note that I am looking for an average-case solution, not a worst-case solution.
This requires defining a probability distribution over the 1s and 0s, but I am trying to keep this at a high level and will not go into much detail, while still keeping the question answerable.
There may be a best-case solution that always achieves the minimum overhead. If there is one, it will be accepted.
Comments (2)
I would check big chunks and only try smaller ones if they are not all zero.
Depending on the ratio of 1s and the 'large overhead constant', I would choose a suitable starting size.
Here is the idea of how to check (by example):
The data (spaces only for readability):
I hope the idea is clear. But I highly recommend profiling to decide which block size to start with and when to switch to single-element blocks.
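A minimal sketch of the chunk-then-subdivide idea, assuming `all_zeros` is passed in as a callable on a slice. The function name `find_ones_chunked`, the default `block_size=64`, and the choice to split non-zero chunks in half are illustrative assumptions, not details given in this answer.

```python
def find_ones_chunked(all_zeros, a, block_size=64):
    """Return the indices of all 1s in a, querying large blocks first.

    all_zeros: callable taking a sequence, True iff all elements are 0.
    block_size: assumed starting chunk size; tune it by profiling.
    """
    ones = []

    def search(lo, hi):
        # One query decides whether the half-open range [lo, hi) can be skipped.
        if all_zeros(a[lo:hi]):
            return
        if hi - lo == 1:
            ones.append(lo)  # a non-zero single element is a 1
            return
        mid = (lo + hi) // 2
        search(lo, mid)  # left half first, so results stay in index order
        search(mid, hi)

    for start in range(0, len(a), block_size):
        search(start, min(start + block_size, len(a)))
    return ones
```

For example, `find_ones_chunked(lambda s: not any(s), data, block_size=8)` scans `data` in blocks of 8 and only descends into blocks that contain a 1. A cutoff below which the code falls back to single-element queries could be added once profiling shows where the overhead constant stops dominating.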
If all_zeros(a) returns false for some subarray, then you can binary search within that subarray to find the position of the first 1. This process tells you nothing about any elements following that 1, so you would start again after it.
The question is what size to make your initial queries. You will make the fewest total queries if the probability of each query returning true is 50%. If your initial query has a 50% chance of finding a 1, then all the queries in the binary search will also have a 50% chance, and the total cost per 1 is log2(L) + 1 queries, if the 1s are L slots apart on average.
If L is twice as long as it should be, or half as long as it should be, then the cost goes up by about 1 query per 1, which is a pretty small price to pay when 1s are far apart.
So a pretty good algorithm that doesn't require knowing the frequency of 1s to start with would be:
The total cost will be log2(L) + some_small_number queries per 1 if the 1s are randomly distributed, and I think that's the worst case.
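A minimal sketch of one adaptive scheme consistent with the reasoning above: query a window, binary search for the first 1 when the window is not all zeros, then restart just after that 1. The doubling and halving of the window length, the starting value of 128, and the name `find_ones_adaptive` are assumptions made for illustration; they are one way to steer each query toward a 50% hit rate without knowing the frequency of 1s in advance.

```python
def find_ones_adaptive(all_zeros, a, initial_len=128):
    """Return the indices of all 1s in a, adapting the query length.

    all_zeros: callable taking a sequence, True iff all elements are 0.
    initial_len: assumed first guess for the spacing between 1s.
    """
    n = len(a)
    ones = []
    length = max(1, initial_len)
    pos = 0
    while pos < n:
        end = min(pos + length, n)
        if all_zeros(a[pos:end]):
            # Window was empty: 1s look sparser than guessed, so grow it.
            pos = end
            length *= 2
            continue
        # Invariant: the first 1 at or after pos lies in [lo, hi).
        lo, hi = pos, end
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if all_zeros(a[lo:mid]):
                lo = mid  # first 1 is in the right part
            else:
                hi = mid  # first 1 is in the left part
        ones.append(lo)
        # Nothing is known about elements after lo, so restart there
        # with a shorter window, since 1s look denser than guessed.
        pos = lo + 1
        length = max(1, length // 2)
    return ones
```

Each 1 found costs one window query plus about log2 of the window length for the binary search, which is where the log2(L) + 1 estimate comes from when the window length tracks L.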