Given an unsorted binary array, find the 1s when the only allowed check is whether an entire subarray is all zeros

Posted on 2025-02-11 18:25:30

Given an unsorted binary array a, the only allowed operation is all_zeros(a), which returns True iff all of the array's elements are 0.
The cost of all_zeros(a) is O(len(a)) plus a large constant overhead.

I would like to find all the indices that contain 1s, using as few calls to all_zeros as possible.
A reasonable subproblem is to assume the number of 1s is much smaller (say, 100x to 1000x) than the number of 0s.


Theoretically, this is simply solved by iterating over the array element-wise and testing all_zeros([element]).
In practice, the overhead constant forces us to work in batches that are as large as possible. We can't assume we know the ratio of 1s in the array, but if some algorithm requires that knowledge, please do share it.
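For reference, the element-wise baseline can be sketched as below. This is only an illustration: `CountingOracle` is a hypothetical stand-in for the real all_zeros (it just counts calls), and `find_ones_elementwise` is a name made up for this sketch.

```python
class CountingOracle:
    """Hypothetical stand-in for the expensive all_zeros; it counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a):
        self.calls += 1
        # True iff every element of the (sub)array is 0.
        return not any(a)


def find_ones_elementwise(a, all_zeros):
    """Baseline: one all_zeros query per element, so len(a) queries in total."""
    return [i for i in range(len(a)) if not all_zeros(a[i:i + 1])]


oracle = CountingOracle()
sample = [0, 0, 1, 0, 1]
sample_ones = find_ones_elementwise(sample, oracle)
print(sample_ones, "found with", oracle.calls, "queries")
```

The baseline always pays the constant overhead len(a) times, which is the cost any batched approach is trying to beat.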

I am looking for a conceptual solution, thus I am not specifying the ratio between the overhead constant and the computation time of all_zeros.

Please note that I am looking for an average-case solution, not a worst-case solution.
This requires defining a probability distribution over 1s and 0s, but I am trying to keep this at a high level and will not go deep into the details, while still keeping the question answerable.
There may be a best-case solution that always gets the minimum overhead. If there is one, it will be accepted.

Comments (2)

葵雨 2025-02-18 18:25:30

I would check big chunks and only try smaller ones if they are not all zero.
Depending on the ratio of 1s and the 'large overhead constant', I would choose a suitable start size.

Here is the idea of how to check, by example.

The data: (spaces only for readability)

   00001110 00100001 00100000 01000000 00000000 00000000 00000101 01010000
1. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   -> both checked intervals are non-zero -> halve them

2. xxxxxxxxxxxxxxxxx XXXXXXXXXXXXXXXXX xxxxxxxxxxxxxxxxx XXXXXXXXXXXXXXXXX
      non-zero           non-zero          zero              non-zero

3. xxxxxxxx XXXXXXXX xxxxxxxx XXXXXXXX                   xxxxxxxx XXXXXXXX
     n-z      n-z      n-z      n-z                        n-z      n-z   

4. xxxxXXXX xxxxXXXX xxxxXXXX xxxxXXXX                   xxxxXXXX xxxxXXXX
   zero n-z n-z n-z  n-z zero n-z zero                   zero n-z n-z zero

5.     xxXX xxXXxxXX xxXX     xxXX                           xxXX xxXX 

...

I hope the idea is clear. But I highly recommend profiling to decide which block size to start with and when to switch to single-element blocks.
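A minimal sketch of the halving idea, under some assumptions: `CountingOracle` is a hypothetical stand-in for the real all_zeros, `find_ones_by_halving` and its `start_size` parameter are names invented here, and the chunks are visited depth-first rather than level by level as in the diagram (the set of queries is the same).

```python
class CountingOracle:
    """Hypothetical stand-in for the expensive all_zeros; it counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a):
        self.calls += 1
        return not any(a)


def find_ones_by_halving(a, all_zeros, start_size=None):
    """Query chunks of start_size; halve every non-zero chunk down to single elements."""
    n = len(a)
    if n == 0:
        return []
    if start_size is None:
        start_size = n  # assumption: start with the whole array as one chunk
    # Half-open intervals still to check; reversed so the leftmost is popped first.
    stack = [(lo, min(lo + start_size, n)) for lo in range(0, n, start_size)]
    stack.reverse()
    ones = []
    while stack:
        lo, hi = stack.pop()
        if all_zeros(a[lo:hi]):
            continue  # the whole chunk is zero, nothing to refine
        if hi - lo == 1:
            ones.append(lo)  # a non-zero single element is a 1
            continue
        mid = (lo + hi) // 2
        stack.append((mid, hi))  # right half, visited after the left half
        stack.append((lo, mid))
    return ones


bits = "00001110 00100001 00100000 01000000 00000000 00000000 00000101 01010000"
data = [int(c) for c in bits.replace(" ", "")]
oracle = CountingOracle()
ones = find_ones_by_halving(data, oracle, start_size=32)
print(ones, "found with", oracle.calls, "queries")
```

The data is the example array from above, and start_size=32 mirrors step 1 of the diagram. Comparing oracle.calls across different start sizes is one simple way to do the profiling recommended above.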

第七度阳光i 2025-02-18 18:25:30

If all_zeros(a) returns false for some subarray, you can binary search within that subarray to find the position of the first 1. This process tells you nothing about the elements following that 1, so you would start again right after it.

The question is what size to make your initial queries. You will make the fewest total queries if each query has a 50% probability of returning true. If your initial query has a 50% chance of finding a 1, then all the queries in the binary search will also have a 50% chance, and the total cost per 1 is log2 L + 1 queries (one initial query plus log2 L binary-search steps), if 1s are L slots apart on average.

If L is twice as long as it should be, or half as long as it should be, the cost goes up by about 1 query per 1, which is a pretty small price to pay when 1s are far apart.

So a pretty good algorithm that doesn't require knowing the frequency of 1s to start with would be:

  1. Set L=128, say. This is an a priori estimate of the typical spacing between 1s.
  2. Check the first L elements. If it's all zero, then multiply L by 2 and continue with the rest of the array.
  3. Otherwise, divide L by 2 if it's > 1, binary search to find the position of the first 1, and continue with the rest of the array after that first 1.

The total cost will be log2 L + some_small_number of queries per 1, if the 1s are randomly distributed, and I think that's the worst case.
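The three steps can be sketched roughly as follows. This is an illustration rather than a tuned implementation: `CountingOracle` is a hypothetical stand-in for the real all_zeros, and `find_ones_adaptive` and `initial_len` are names invented for this sketch.

```python
import random


class CountingOracle:
    """Hypothetical stand-in for the expensive all_zeros; it counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a):
        self.calls += 1
        return not any(a)


def find_ones_adaptive(a, all_zeros, initial_len=128):
    """Adaptive window length L plus binary search for the first 1 in a non-zero window."""
    n = len(a)
    ones = []
    pos = 0
    L = initial_len  # step 1: a priori guess
    while pos < n:
        end = min(pos + L, n)
        if all_zeros(a[pos:end]):
            # Step 2: window is all zero, so skip it and double L.
            pos = end
            L *= 2
            continue
        # Step 3: window contains a 1, so halve L (if > 1) and locate the first 1.
        if L > 1:
            L //= 2
        lo, hi = pos, end  # invariant: the first 1 of the window lies in a[lo:hi]
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if all_zeros(a[lo:mid]):
                lo = mid  # left part is zero, first 1 is on the right
            else:
                hi = mid  # first 1 is in the left part
        ones.append(lo)
        pos = lo + 1  # nothing is known about what follows, so restart here
    return ones


rng = random.Random(0)
data = [1 if rng.random() < 0.01 else 0 for _ in range(10_000)]
oracle = CountingOracle()
ones = find_ones_adaptive(data, oracle)
print(len(ones), "ones found with", oracle.calls, "queries")
```

The random test array (about 1% ones) is only there to exercise the function. Dividing oracle.calls by len(ones) gives an empirical queries-per-1 figure to compare against the log2 L estimate.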
