A data structure for storing data and computing averages
In this problem, we are interested in a data structure that supports keeping an unbounded number of vectors parallel to the Y axis.
Each node contains a location (X axis value) and a height (Y axis value). We can assume that no two vectors share the same location.
Please suggest an efficient data structure that supports:
init((x1,y1)(x2,y2)(x3,y3)...(xn,yn)) - the DS will contain all n vectors, where VECTOR#i's location is xi and VECTOR#i's height is yi. We also know that x1 < x2 < x3 < ... < xn (nothing is known about the y values). Complexity: O(n) on average.
insert(x,y) - add a vector with location x and height y. Complexity: O(log n) amortized on average.
update(x,y) - update the height of the vector at location x to y. Complexity: O(log n) worst case.
average_around(x) - return the average of the heights of the log n neighbors of x. Complexity: O(1) on average.
Space complexity: O(n)
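To pin down the requested semantics before worrying about efficiency, here is a naive reference implementation. This is my own sketch, not part of the question: it honours the interface above (with log n neighbours taken on each side) but meets none of the stated complexity bounds.

```python
from math import floor, log2

class NaiveVectors:
    """Naive reference for the required operations (assumed semantics;
    correct interface, but not the required complexities)."""

    def __init__(self, pairs):                 # init: pairs sorted by x
        self.items = list(pairs)

    def insert(self, x, y):
        self.items.append((x, y))
        self.items.sort()                      # O(n log n), not O(log n)

    def update(self, x, y):
        i = next(i for i, (px, _) in enumerate(self.items) if px == x)
        self.items[i] = (x, y)

    def average_around(self, x):
        # Average the heights of up to k = floor(log2 n) neighbours
        # on each side of x, excluding x itself.
        n = len(self.items)
        k = max(1, floor(log2(n)))
        i = next(i for i, (px, _) in enumerate(self.items) if px == x)
        neigh = self.items[max(0, i - k):i] + self.items[i + 1:i + k + 1]
        return sum(y for _, y in neigh) / len(neigh)
```

For example, with vectors (1,2), (2,4), (3,6), (4,8), `average_around(2)` averages the heights 2, 6 and 8 of the neighbours of location 2.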
I can't provide a full answer, but it might be a hint in the right direction.
Basic ideas:

If we have the average of n numbers a_1, ..., a_n, then this average is avg = (a_1 + ... + a_n)/n. If we now replace a_n by b, we can recalculate the new average as avg' = (a_1 + ... + a_(n-1) + b)/n, or - simpler - avg' = ((avg*n) - a_n + b)/n. That means, if we exchange one element, we can recompute the average from the original average value by simple, fast operations, and don't need to re-iterate over all elements participating in the average.

Note: I assume that you want log n neighbours on each side, i.e. in total we have 2 log(n) neighbours. You can simply adapt the approach if you want log(n) neighbours in total. Moreover, since log n in most cases won't be a natural number, I assume you are talking about floor(log n), but I'll just write log n for simplicity.

The main thing I'm considering is the fact that you have to report the average around element x in O(1). Thus, I suppose you have to somehow precompute this average and store it, so I would store this precomputed average in each node, alongside its location and height. Note that update(x,y) then runs strictly in O(log n): if you update element x to height y, you have to consider the 2 log(n) neighbours whose averages are affected by this change. You can recalculate each of these averages in O(1):
Let's assume update(x,y) affects an element b whose average is to be updated as well. Then you simply multiply average(b) by the number of neighbours (2 log(n), as stated above), subtract the old y-value of element x, add the new (updated) y-value of x, and divide by 2 log(n) again. This ensures that we now have the updated average for element b. This involves only a few calculations and can thus be done in O(1). Since we have 2 log n neighbours, update runs in O(2 log n) = O(log n).
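The O(1) recomputation of a single stored average can be sketched as follows. The function name and calling convention are my own, not from the original answer; k plays the role of 2 log(n).

```python
def updated_average(old_avg, old_y, new_y, k):
    """Recompute a stored average of k neighbour heights in O(1)
    after one participating height changes from old_y to new_y:
    restore the sum (avg * k), swap the value, divide again."""
    return (old_avg * k - old_y + new_y) / k

# Example: the average of [3, 5, 7, 9] is 6.0; replacing 9 by 1
# yields the average of [3, 5, 7, 1].
avg = sum([3, 5, 7, 9]) / 4
print(updated_average(avg, 9, 1, 4))  # 4.0
```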
When you insert a new element e, you have to update the average of all elements affected by this new element e. This is essentially done as in the update routine. However, you have to be careful when log n (or, precisely, floor(log n)) changes its value. If floor(log n) stays the same (which it will in most cases), then you can do the analogous thing described in update, except that you "remove" the height of one element and "add" the height of the newly added element. In these "good" cases, the running time is again strictly O(log n).
Now, when floor(log n) changes (increments by 1), you have to perform an update for all elements. That is, you have to do an O(1) operation for n elements, resulting in a running time of O(n). However, it is very seldom the case that floor(log n) increments by 1: you need to double the value of n to increment floor(log n) by 1, assuming we are talking about log to base 2, which is not uncommon in computer science. We denote this time by c*n, or simply cn.
Thus, let's consider a sequence of inserts: the first insert needs an update costing c*1, the second needs an update costing 2*c. The next time an expensive insert occurs is the fourth insert, costing 4*c, then the eighth insert, costing 8*c, then the sixteenth insert, costing 16*c. The distance between two expensive inserts doubles each time. Since no remove is required, we can continue our analysis without any "special cases" and consider only a sequence of inserts. You see that most inserts cost 1, while few are expensive (costing 1, 2, 4, 8, 16, 32, ...). So, if we have m inserts in total, we have roughly log m expensive inserts and roughly m - log m cheap inserts. For simplicity, we assume simply m cheap inserts.
Then we can compute the cost of m inserts: m*1 counts the cheap operations, and the sum 1 + 2 + 4 + ... + m counts the expensive ones. It can be shown that the whole thing is at most 4m (in fact, you can even show better estimates quite easily, but for us this suffices). Thus, we see that m insert operations cost at most 4m in total, so a single insert operation costs at most 4m/m = 4 and is therefore O(1) amortized.

So, there are 2 things left:
I suggest storing all entries in a skip list, or in some tree that guarantees logarithmic search operations (otherwise, insert and update require more than O(log n) just to find the correct position). Note that the data structure must be buildable in O(n), which should be no big problem assuming the elements are sorted according to their x-coordinate.
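A sketch of that layout, with illustrative names of my own choosing: since the Python standard library has no skip list, bisect on a sorted list stands in for the O(log n) search. Note that list insertion itself is O(n), so a real skip list or balanced tree would be needed to actually meet the claimed bounds.

```python
import bisect

class VectorStore:
    """Sorted x-locations with parallel arrays for heights and
    precomputed neighbour averages (sketch, not the full scheme)."""

    def __init__(self, pairs):               # pairs already sorted by x
        self.xs = [x for x, _ in pairs]
        self.ys = [y for _, y in pairs]
        self.avg = [None] * len(pairs)       # filled by the O(n) init pass

    def index_of(self, x):                   # O(log n) search
        i = bisect.bisect_left(self.xs, x)
        if i == len(self.xs) or self.xs[i] != x:
            raise KeyError(x)
        return i

    def update(self, x, y):
        """Set the height at location x; a full implementation would
        also fix up the 2 log(n) affected averages as described above."""
        self.ys[self.index_of(x)] = y

ds = VectorStore([(1, 10), (3, 30), (7, 70)])
ds.update(3, 42)
print(ds.ys)  # [10, 42, 70]
```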
To initialize the data structure in O(n), I suggest beginning at the element at index log n and computing its average the simple way (sum up the 2 log n neighbours, divide by 2 log n). Then you move the index one further and compute average(index) using average(index-1): average(index) = (average(index-1)*2 log(n) - y(index-1-log(n)) + y(index+log(n))) / (2 log(n)). That is, we follow a similar approach as in update. This means that computing the averages costs O(log n + n*1) = O(n). Thus, we can compute all the averages in O(n).

Note that you have to take some details into account which I haven't described here (e.g. border cases: the element at index 1 does not have log(n) neighbours on both sides; how do you proceed with this?).
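The O(n) initialization pass might look like the following sketch (my own naming). Border indices with fewer than log n neighbours on a side are simply left unset here, which is one possible answer to the open border-case question.

```python
from math import floor, log2

def init_averages(ys):
    """Compute, for every interior index, the average of the k
    neighbours on each side (k = floor(log2 n), excluding the index
    itself) in O(n) total, by sliding a running window sum instead
    of re-summing each window."""
    n = len(ys)
    k = max(1, floor(log2(n)))
    averages = [None] * n
    if n < 2 * k + 1:
        return averages
    # Window around index i: ys[i-k : i] + ys[i+1 : i+k+1], 2k values.
    s = sum(ys[0:k]) + sum(ys[k + 1:2 * k + 1])   # window around index k
    averages[k] = s / (2 * k)
    for i in range(k + 1, n - k):
        # Slide one step: the old centre enters the window, the new
        # centre leaves it, the far-left neighbour leaves, the new
        # far-right neighbour enters.
        s += ys[i - 1] - ys[i] + ys[i + k] - ys[i - k - 1]
        averages[i] = s / (2 * k)
    return averages
```

For example, for heights [1, 2, 3, 4, 5, 6, 7, 8] (so k = 3), the average around index 3 is (1+2+3+5+6+7)/6 = 4.0.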