Random binary search trees: proving the height is O(log n) with high probability
I have now been stuck for two days on this exercise from my professor:
"Consider the ordered binary tree over a set S ⊆ ℕ, built by repeated insertion, where the elements of S are inserted in an order given by a permutation picked uniformly at random. Prove that the height of the tree is O(log n) with high probability."
My work so far has been to study the probabilistic analysis of random algorithms. For example, the CLRS book has a chapter "12.4 Randomly built binary search trees" where it is proven that the expected height of a binary tree built by repeated insertion over a random permutation is O(log n). Many other books prove this bound. But this is not what we are looking for. We want to prove a much stronger bound: that the height is O(log n) with high probability. I've studied the classic paper "A Note on the Height of Binary Search Trees" (Luc Devroye, 1986), where he proves that the height is ~ 4.31107... log n with high probability. But the analysis is way out of my league; I couldn't follow the logic of the key points in the paper.
Every book and article I've seen cites Devroye's paper and says "it can also be proven that with high probability the height is O(log n)".
How should I proceed further?
Thanks in advance.
I will outline my best idea based on well-known probability results. You will need to add details and make it rigorous.
First, let's consider the process of descending through pivots to a random node in a binary search tree. Suppose that your random node is known to be somewhere between i and i+m-1, a range containing m values. At the next step, which adds one to the length of the path, a pivot j is picked uniformly from that range. With probability (j-i)/m our random node is below j and is now in a range of length j-i. With probability 1/m it was j itself, and the descent stops there. And with probability (m-(j-i)-1)/m it was above j and is now in a range of length m-(j-i)-1. Within those ranges, the unknown node is uniformly distributed.

The obvious continuous approximation is to go from discrete to continuous. We pick a random number x from 0 to m to be the next pivot. With probability x/m we are in a range of size x, and with probability (m-x)/m we are in a range of size m-x. We have therefore shrunk the range by a random factor X that is either x/m or (m-x)/m. The distribution of X is known, and the samples we take from the continuous approximation are independent from step to step.
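As a concrete illustration of the descent process above (entirely my own sketch; the function name `descend_depth` is hypothetical), here is the discrete version in Python: keep a range of m values, pick a uniform pivot, and follow the side containing a hidden uniform target until a pivot hits it. The number of steps taken is the target's depth in the corresponding random BST.

```python
import math
import random
import statistics

def descend_depth(n: int) -> int:
    """Number of pivot steps needed to isolate a uniform random target
    among n values; this equals the target's depth in the random BST."""
    lo, hi = 0, n - 1
    target = random.randint(lo, hi)   # the random node we descend toward
    depth = 0
    while True:
        j = random.randint(lo, hi)    # uniform pivot in the current range
        if j == target:               # with probability 1/m the pivot is it
            return depth
        if target < j:                # below the pivot: range of length j-lo
            hi = j - 1
        else:                         # above the pivot: range of length hi-j
            lo = j + 1
        depth += 1

n = 10**5
samples = [descend_depth(n) for _ in range(2000)]
print("mean depth / ln(n):", statistics.mean(samples) / math.log(n))
# Should come out near 2, i.e. average depth ~ 2 ln(n).
```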
One more note: log(X) has both an expected value E and a variance V that can be calculated. Since X is always between 0 and 1, its expected value E is negative.
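As a side calculation of my own (not part of the exercise): the process gives X the density 2t on (0,1), if I've worked it out right, so E = ∫₀¹ 2t·ln t dt should be -1/2 in natural log. A quick Monte Carlo sketch to check E and V:

```python
import math
import random
import statistics

def sample_X() -> float:
    """One sample of the shrink factor X from the continuous approximation:
    pick a pivot x uniform on (0, 1); with probability x the node falls in
    the left part (size x), otherwise in the right part (size 1 - x)."""
    x = random.random()
    return x if random.random() < x else 1.0 - x

logs = [math.log(sample_X()) for _ in range(200_000)]
print(f"E ~ {statistics.mean(logs):.3f}   (analytically this appears to be -1/2)")
print(f"V ~ {statistics.variance(logs):.3f}")
```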
Now pick ε with 0 < ε. The outline of the proof is as follows.

1. At each step, the error of the discrete process relative to the continuous approximation increases by at most O(1).
2. The probability that the sum of (ε - 1/E) log(n) samples of log(X) fails to be below -log(n) is O(1/n).
3. The probability that a random node is at depth more than (2ε - 1/E) log(n) is O(1/n).
4. The probability that a random permutation has ANY node at depth at least (3ε - 1/E) log(n) is O(1/log(n)).

Let's go.
1. Show that at each step, the error from the discrete to the continuous approximation increases by at most O(1).

Any error carried over from the previous step shrinks by the same random factor as the range; it does not grow. The two roundoffs in the current step (one at each end of the surviving range) are each at most 1. So the error increases by at most 2 per step.
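In symbols (my notation, not part of the original answer): if e_t denotes the gap between the discrete range length and the continuous one after t steps, the argument gives

```latex
% X_t \in (0,1) is the shrink factor at step t; each roundoff is at most 1.
e_{t+1} \;\le\; X_t\, e_t + 2 \;\le\; e_t + 2
\quad\Longrightarrow\quad
e_T \;\le\; 2T = O(T).
```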
2. Show that the probability that the sum of (ε - 1/E) log(n) samples of log(X) fails to be below -log(n) is O(1/n).

The expected value of the sum of (ε - 1/E) log(n) samples of log(X) is (εE - 1) log(n). Since E is negative, this is below our target of -log(n). Now we can use a Bernstein inequality to bound the probability of being that far above the mean. The bound turns out to be exponentially small in the number of variables; since we have Θ(log(n)) of them, it is proportional to 1/n for a suitable choice of constants.
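For concreteness, here is the shape of the bound I believe is intended (my own sketch, using the one-sided Bernstein inequality; log(X) - E ≤ -E = |E| is bounded above, which is the side we need):

```latex
% k = (\varepsilon - 1/E)\log n independent samples \log X_1,\dots,\log X_k,
% S_k = \sum_i \log X_i, \quad \mathbb{E}[S_k] = kE = (\varepsilon E - 1)\log n.
% Failing means S_k \ge -\log n, a deviation of t = -\varepsilon E \log n > 0.
\Pr\bigl[S_k \ge -\log n\bigr]
  = \Pr\bigl[S_k - \mathbb{E}[S_k] \ge t\bigr]
  \le \exp\!\left(-\frac{t^2/2}{\,kV + |E|\,t/3\,}\right)
  = \exp\bigl(-\Theta(\log n)\bigr),
% and choosing \varepsilon appropriately makes the exponent \ge \log n, i.e. O(1/n).
```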
3. Show that the probability that a random node is at depth more than (2ε - 1/E) log(n) is O(1/n).

With probability 1 - O(1/n), within (ε - 1/E) log(n) steps the continuous approximation has converged to within 1. There were O(log(n)) steps, and therefore the error between continuous and discrete is at most O(log(n)). (Look back at step 1 to see that.) So we just have to show that the odds of failing to go from O(log(n)) possibilities to 1 in ε log(n) further steps is at most O(1/n).

This would be implied if we could show that for any given constant a, the odds of failing to go from a·k possibilities to 1 in at most k steps is a negative exponential in k. (Here k is ε log(n).)

For that, let's record a 1 every time we cut the space at least in half with the next pivot, and a 0 otherwise. In k steps there are 2^k such sequences. But for any given a, if k is large enough, you can't cut the space in half more than k/4 times without reducing the search space to 1; and note that at each step, with odds at least 1/2, you cut the search space by at least 1/2. A little playing around with the binomial formula and Stirling's approximation will get you an upper bound of the form O(k (3/4)^k) on the likelihood of failing to halve enough times, which is sufficient for our purposes. (A quick simulation of this halving process is sketched below.)
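Here is an empirical sketch of that convergence argument (my own illustration; the name `fails_to_converge` and the choice a = 4 are mine): simulate the descent from a·k possibilities and count how often it fails to reach a single possibility within k steps.

```python
import random

def fails_to_converge(a: int, k: int) -> bool:
    """Run k pivot steps starting from a*k possibilities; return True if
    more than one possibility is still alive afterward."""
    m = a * k                          # initial number of possibilities
    target = random.randrange(m)       # hidden node, uniform in the range
    lo, hi = 0, m - 1
    for _ in range(k):
        if lo >= hi:                   # already down to one possibility
            return False
        j = random.randint(lo, hi)     # uniform pivot
        if j == target:
            return False               # found the node exactly
        if target < j:
            hi = j - 1
        else:
            lo = j + 1
    return lo < hi

a, trials = 4, 20_000
for k in (10, 20, 40, 80):
    fails = sum(fails_to_converge(a, k) for _ in range(trials))
    print(f"k={k:3d}: failure rate ~ {fails / trials:.4f}")
# The failure rate should drop off exponentially in k.
```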
4. Show that the probability that a random permutation has ANY node at depth at least (3ε - 1/E) log(n) is O(1/log(n)).

By step 3, the proportion of random nodes in random binary trees that are at depth at least (2ε - 1/E) log(n) is at most p/n for some constant p. Any random tree with a node at depth (3ε - 1/E) log(n) has n nodes, and the last ε log(n) ancestors of that deep node all sit at depth at least (2ε - 1/E) log(n). So if the odds of having a node at depth (3ε - 1/E) log(n) exceeded p/(ε log(n)), those trees alone would contain too many nodes of depth at least (2ε - 1/E) log(n). And therefore, by the pigeonhole principle, we have our upper bound on the likelihood.
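If you want to convince yourself numerically before making all of this rigorous, here is a short end-to-end check (entirely my own illustration): build random BSTs by inserting a uniformly random permutation and look at height / ln(n). Convergence to Devroye's constant ≈ 4.31 is slow, so at these sizes you should see ratios in the 3-4 range, already clearly O(log n).

```python
import math
import random

def random_bst_height(n: int) -> int:
    """Insert a uniformly random permutation of 0..n-1 into an (unbalanced)
    BST, iteratively, and return the height (max node depth, root = 0)."""
    keys = list(range(n))
    random.shuffle(keys)
    children = {keys[0]: [None, None]}  # key -> [left child, right child]
    height = 0
    for key in keys[1:]:
        node, depth = keys[0], 0
        while True:
            side = 0 if key < node else 1
            nxt = children[node][side]
            if nxt is None:             # attach the new key here
                children[node][side] = key
                children[key] = [None, None]
                height = max(height, depth + 1)
                break
            node, depth = nxt, depth + 1
    return height

for n in (10**3, 10**4, 10**5):
    h = random_bst_height(n)
    print(f"n={n:>6}: height={h:3d}, height/ln(n)={h / math.log(n):.2f}")
```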