Asymptotic notation - does n (log n) (log n) simplify?
If I have an algorithm that takes n log n steps (e.g. heapsort), where the steps take log n time (e.g. comparison/exchange of "big" integers in the range 0 to n-1), what is the asymptotic bound for the whole process?
Obviously we can say "n (log n) (log n)", but I'm having a hard time trying to convince myself that I cannot simplify this to "n log n". At the same time, I'm having a hard time justifying the instinct that insists that I can.
Is my instinct just plain wrong on this?
EDIT
It seems my simple-example-to-avoid-complicating-the-issue has complicated the issue. Oh well.
The real reason for the question is that I often have a standard algorithm with a known complexity, but implemented using different underlying containers, so that the individual steps are O(log n) rather than O(1). For example, Hopcroft's automaton minimisation algorithm is O(n log n), but if you start using binary tree containers for the states, transitions and intermediate results, the steps themselves become O(log n), and the O(n log n) bound is no longer valid because the assumption of O(1) accesses is invalid.
Still, people will claim that there are n states and m transitions, but n and m tend to be linearly related for automata, assuming that the number of transition annotations is constant and that the automata are more-or-less deterministic.
I haven't worried too much about this in the past - the cases I work with aren't very big. But, well, I'm doing a major refactoring of my automata code, and I thought I might as well do the math properly for some key algorithms as I go.
EDIT
I'm also increasingly convinced that the "n (log n) (log n)" is wrong.
If a is O(b log b) where b is O(log c), what is a in terms of c?
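As a quick numeric sanity check on the intuition (a sketch, not a proof): the ratio between n (log n)^2 and n log n is exactly log n, which grows without bound, so the extra log n factor cannot be absorbed into a big-O constant.

```python
import math

# The ratio n*(log n)^2 / (n*log n) is just log n, which is unbounded,
# so the extra factor cannot be hidden inside a big-O constant.
for n in [2**10, 2**20, 2**40]:
    ratio = (n * math.log2(n) ** 2) / (n * math.log2(n))
    assert math.isclose(ratio, math.log2(n))
```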
In general, you can't multiply complexities like this: for heap sort, N indicates the number of items in the heap, whereas for the big integers, N probably indicates the upper bound of possible values. In general, these don't have to be related, so that it's rather N log N log M (where M is the range that the items may take).
In a specific application, the large integers most likely follow some specific distribution. For example, it may be known that they are all below 10^20. If so, the comparison operations take constant time (determined by the upper bound 10^20). Then log M is also constant, and the entire complexity is in O(N log N).
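The point that the two logarithmic factors vary independently can be illustrated with a tiny cost model (a sketch, not a measurement; `heapsort_work` is a made-up name for the model):

```python
import math

def heapsort_work(N, M):
    """Cost model: N log2 N comparisons, each costing log2 M bit operations."""
    return N * math.log2(N) * math.log2(M)

# Squaring the value range M doubles log M, and hence the total cost,
# without touching N at all: the two parameters need not be related.
base = heapsort_work(10**6, 2**64)
wider = heapsort_work(10**6, 2**128)
assert math.isclose(wider / base, 2.0)  # log2(2**128) / log2(2**64) == 2
```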
Here's a proof by contradiction:

Say we have a function f(n) = n (log n)(log n), and assume it is also O(n log n); in other words, there is a constant a for which f(n) <= a * n log n holds for all large n. Now consider n = 2^(a+1) (using base-2 logarithms):

f(2^(a+1)) = 2^(a+1) * log(2^(a+1)) * log(2^(a+1)) = 2^(a+1) * log(2^(a+1)) * (a+1)

which is clearly larger than a * 2^(a+1) * log(2^(a+1)), and the bound keeps failing for every n >= 2^(a+1), since log n >= a+1 > a from that point on. So we have a contradiction, and f(n) is not O(n log n).

You won't be able to reduce n (log n)(log n) to n (log n), simply because that's not a reduction by a constant factor. What's wrong with n (log n)^2?
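The inequality in the proof above can be spot-checked numerically (a quick sketch, assuming base-2 logarithms):

```python
import math

def f(n):
    # f(n) = n * (log2 n)^2
    return n * math.log2(n) ** 2

# For any proposed constant a, the bound f(n) <= a * n * log2(n)
# already fails at n = 2^(a+1), as the proof claims.
for a in range(1, 10):
    n = 2 ** (a + 1)
    assert f(n) > a * n * math.log2(n)
```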
Okay, the general complexity measure of a program is the number of steps it takes on a Turing machine, as a function of the input length. In this definition everything is taken into account, because the integers may be arbitrarily big, and arithmetic operations on them also increase the function under O(n).

But if we were programming Turing machines directly, your question wouldn't have arisen. In the real world we prefer to abstract. While we still count the "steps" needed to run the program, we assume that certain operations take one step. We make such assumptions for various reasons.

In this case (your first EDIT), if you want to make your complexity measure concrete, you should just multiply the functions under O. If what you thought took one step is now considered to take O(log N) steps, then the number of concrete steps increases by a factor of O(log N). Therefore the total complexity is O(N log N log N).

As to your second EDIT, it's a different situation. Suppose your algorithm has complexity O(N log N), but you know that the input consists of M numbers, each of log K digits, so N = O(M log K) (we need to account for separators, etc.). It is then mathematically correct to write the overall complexity as O(M * log K * (log M + log log K)), so there's no problem here. But usually we abstract away unnecessary details in order to find a common basis on which different algorithms can be compared.
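The substitution in the second case can be checked numerically (a sketch with made-up sizes M and K): expanding log(M * log K) as log M + log log K turns N log N into exactly M * log K * (log M + log log K).

```python
import math

# Hypothetical input: M numbers, each about log2(K) digits long.
M, K = 10_000, 10**20

N = M * math.log2(K)          # total input length (separators ignored)
n_log_n = N * math.log2(N)

# Expanding log2(M * log2 K) = log2 M + log2 log2 K gives the same value:
expanded = M * math.log2(K) * (math.log2(M) + math.log2(math.log2(K)))

assert math.isclose(n_log_n, expanded)
```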