A data structure for storing data and computing averages

Published 2024-12-29 03:06:25


In this problem, we are interested in a data structure that maintains an unbounded number of vectors parallel to the Y axis.

Each node contains location (X axis value) and height (Y axis value). We can assume there are no two vectors in the same location.

Please suggest an efficient data structure that supports:

  1. init((x1,y1)(x2,y2)(x3,y3)...(xn,yn)) - the DS will contain all n vectors, where VECTOR#i's location is xi and VECTOR#i's height is yi.
    We also know that x1 < x2 < x3 < ... < xn (nothing is known about the y) - complexity = O(n) on average

  2. insert(x,y) - add vector with location x and height y. - complexity = O(logn) amortized on average.

  3. update(x,y) - update vector#x's height to y. - complexity = O(logn) worst case

  4. average_around(x) - return the heights average of logn neighbors of x - complexity = O(1) on average

Space Complexity: O(n)


Comments (1)

扛刀软妹 2025-01-05 03:06:25


I can't provide a full answer, but this may be a hint in the right direction.

Basic ideas:

  • Let's assume you've calculated the average of n numbers a_1,...,a_n, then this average is avg=(a_1+...+a_n)/n. If we now replace a_n by b, we can recalculate the new average as follows: avg'=(a_1+...+a_(n-1)+b)/n, or - simpler - avg'=((avg*n)-a_n+b)/n. That means, if we exchange one element, we can recompute the average using the original average value by simple, fast operations, and don't need to re-iterate over all elements participating in the average.
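This replacement identity can be sketched in Python (names here are illustrative):

```python
def replace_in_average(avg, n, old, new):
    """Recompute the average of n numbers after one of them changes
    from `old` to `new`, without revisiting all n numbers."""
    return (avg * n - old + new) / n

vals = [3.0, 5.0, 7.0, 9.0]
avg = sum(vals) / len(vals)                            # 6.0
# Replace the last element (9.0) by 1.0 in O(1):
new_avg = replace_in_average(avg, len(vals), 9.0, 1.0)
assert new_avg == sum([3.0, 5.0, 7.0, 1.0]) / 4        # same as recomputing
```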

Note: I assume that you want to have log n neighbours on each side, i.e. in total we have 2 log(n) neighbours. You can simply adapt it if you want to have log(n) neighbours in total. Moreover, since log n in most cases won't be a natural number, I assume that you are talking about floor(log n), but I'll just write log n for simplicity.

The main thing I'm considering is the fact that you have to report the average around element x in O(1). Thus, I suppose you have to somehow precompute this average and store it. So, I would store the following in a node:

  • x value
  • y value
  • average around
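A minimal node layout along these lines, as a hypothetical sketch (the field names are my own):

```python
from dataclasses import dataclass

@dataclass
class Node:
    x: float          # location on the X axis
    y: float          # height on the Y axis
    avg: float = 0.0  # precomputed average of the 2*log(n) neighbours
```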

Note that update(x,y) runs strictly in O(log n) if you have this structure: If you update element x to height y, you have to consider the 2log(n) neighbours whose average is affected by this change. You can recalculate each of these averages in O(1):

Let's assume update(x,y) affects an element b, whose average is to be updated as well. Then, you simply multiply average(b) by the number of neighbours (2log(n) as stated above). Then, we subtract the old y-value of element x, and add the new (updated) y-value of x. After that, we divide by 2log(n). This ensures that we now have the updated average for element b. This involves only a few arithmetic operations and can thus be done in O(1). Since we have 2log n neighbours, update runs in O(2log n)=O(log n).
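A sketch of this update step, using a plain Python list sorted by x in place of the skip list, and ignoring border nodes whose windows are incomplete (the lookup structure is not the point here, only the average maintenance; all names are illustrative):

```python
import math

def update(nodes, i, new_y):
    """Set the height of nodes[i] to new_y and patch the precomputed
    averages of its 2k neighbours in O(k) = O(log n) total.
    Each node is a dict {'x': ..., 'y': ..., 'avg': ...}."""
    n = len(nodes)
    k = max(1, int(math.log2(n)))
    old_y = nodes[i]['y']
    nodes[i]['y'] = new_y
    # Every node b within distance k of i has i inside its window, so its
    # stored average changes by (new_y - old_y) / (2k). Border nodes with
    # incomplete windows would need the special handling noted at the end.
    for b in range(max(0, i - k), min(n, i + k + 1)):
        if b != i:
            nodes[b]['avg'] = (nodes[b]['avg'] * 2 * k - old_y + new_y) / (2 * k)

# Demo: 8 nodes, k = log2(8) = 3; we precompute only node 3's average,
# since it has a full window of 6 neighbours.
nodes = [{'x': i, 'y': float(y), 'avg': 0.0}
         for i, y in enumerate([1, 2, 3, 4, 5, 6, 7, 8])]
nodes[3]['avg'] = sum(nodes[j]['y'] for j in range(7) if j != 3) / 6  # 4.0
update(nodes, 4, 11.0)
assert nodes[3]['avg'] == 5.0  # matches (1+2+3+11+6+7)/6 recomputed directly
```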

When you insert a new element e, you have to update the average of all elements affected by this new element e. This is essentially done like in the update routine. However, you have to be careful when log n (or precisely floor(log n)) changes its value. If floor(log n) stays the same (which it will, in most cases), then you can just do the analogue things described in update, however you will have to "remove" the height of one element, and "add" the height of the newly added element. In these "good" cases, run time is again strictly O(log n).

Now, when floor(log n) is changing (incrementing by 1), you have to perform an update for all elements. That is, you have to do an O(1) operation for n elements, resulting in a running time of O(n). However, it is very seldom the case that floor(log n) increments by 1 (you need to double the value of n to increment floor(log n) by 1 - assuming we are talking about log to base 2, which is not uncommon in computer science). We denote this time by c*n or simply cn.

Thus, let's consider a sequence of inserts: The first insert needs an update: c*1, the second insert needs an update: 2*c. The next time an expensive insert occurs is the fourth insert: 4*c, then the eighth insert: 8*c, the sixteenth insert: 16*c. The distance between two expensive inserts doubles each time:

insert # 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18 ..
cost     1c 2c 1  4c 1  1  1  8c 1  1   1   1   1   1   1   16c 1   1  ..

Since no remove is required, we can continue with our analysis without any "special cases" and consider only a sequence of inserts. You see that most inserts cost 1, while few are expensive (1,2,4,8,16,32,...). So, if we have m inserts in total, we have roughly log m expensive inserts, and roughly m-log m cheap inserts. For simplicity, we assume simply m cheap inserts.

Then, we can compute the cost for m inserts:

         log m
         ----
         \      i
m*1 +    /     2   
         ----
          i=0

m*1 counts the cheap operations, the sum counts the expensive ones. It can be shown that the whole thing is at most 4m (in fact, you can show better estimates quite easily, but this suffices for us).

Thus, we see that the occasional full rebuilds over m insert operations cost at most 4m in total. So a single insert pays at most 4m/m=4 extra, i.e. the rebuild cost is O(1) amortized; together with the O(log n) neighbour updates in the common case, insert is O(log n) amortized, as required.
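The bound is easy to sanity-check numerically (a quick check, not a proof):

```python
import math

def total_cost(m):
    """m cheap inserts at cost 1 each, plus expensive rebuilds of cost
    2^i at inserts 1, 2, 4, 8, ..., following the table above."""
    return m + sum(2 ** i for i in range(int(math.log2(m)) + 1))

for m in (8, 100, 10_000):
    assert total_cost(m) <= 4 * m  # at most 4m in total, O(1) amortized each
```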

So, there are 2 things left:

  • How to store all the entries?
  • How to initialize the data structure in O(n)?

I suggest storing all entries in a skip-list, or some tree that guarantees logarithmic search-operations (otherwise, insert and update require more than O(log n) for finding the correct position). Note that the data structure must be buildable in O(n) - which should be no big problem assuming the elements are sorted according to their x-coordinate.

To initialize the data structure in O(n), I suggest beginning at the element at index log n and computing its average the simple way (sum up the 2log n neighbours, divide by 2log n).

Then you move the index one step further and compute average(index) using average(index-1): average(index) = (average(index-1)*2log(n) - y(index-1-log(n)) + y(index+log(n))) / (2log(n)). (If an element is excluded from its own window, the terms y(index-1) and y(index) additionally enter and leave the window, respectively.)
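A sketch of this O(n) initialization pass over the sorted heights, computing only the interior averages (the borders are the open detail mentioned at the end). Here the element itself is excluded from its own window, so sliding the window also swaps y(index-1) in and y(index) out; all names are hypothetical:

```python
import math

def init_averages(ys):
    """Given heights sorted by x-coordinate, precompute for each interior
    index i the average of its 2k neighbours (k on each side, i itself
    excluded), k = floor(log2 n). Runs in O(n): one full sum, then O(1)
    sliding steps."""
    n = len(ys)
    k = max(1, int(math.log2(n)))
    avgs = [None] * n
    if n < 2 * k + 1:
        return avgs
    # Window for the first interior index i = k: neighbours 0..2k minus ys[k].
    s = sum(ys[:2 * k + 1]) - ys[k]
    avgs[k] = s / (2 * k)
    for i in range(k + 1, n - k):
        # Slide one step: the leftmost neighbour leaves, a new rightmost
        # neighbour enters, and the excluded centre moves from i-1 to i.
        s += -ys[i - 1 - k] + ys[i + k] + ys[i - 1] - ys[i]
        avgs[i] = s / (2 * k)
    return avgs

# Cross-check against the brute-force definition:
ys = [float(v) for v in [5, 1, 4, 2, 8, 7, 3, 6, 9, 0]]
fast = init_averages(ys)                       # n = 10, k = 3
for i in range(3, 7):
    brute = (sum(ys[i - 3:i + 4]) - ys[i]) / 6
    assert abs(fast[i] - brute) < 1e-9
```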

That is, we follow a similar approach as in update. This means that computing the averages costs O(log n + n*1)=O(n). Thus, we can compute the averages in O(n).

Note that you have to take some details into account which I haven't described here (e.g. border cases: element at index 1 does not have log(n) neighbours on both sides - how do you proceed with this?).
