Python big_o seems to return completely incorrect results - what am I doing wrong?

Posted 2025-02-04 19:16:57


I am comparing runtimes of different ways of flattening a list of lists using the big_o module, and for the following methods big_o does not return the expected results, namely:

  1. This one:

    from itertools import chain

    def itertools_chain_from_iterable(arr):
        return list(chain.from_iterable(arr))

returns "constant", which can't possibly be true.

  2. This one:

    def merge_extend(self,arr):
        output = []
        for l in arr:
            output.extend(l)
        return output

returns "cubic" (shouldn't it be quadratic at most?), while...

  3. ...this one:

    def merge_w_sum(self,arr): 
        return sum(arr,[])

returns "linear" (I'm quite sure it should be quadratic; see the proof here).

  4. Furthermore, the list comprehension one:

    def merge_bucket(self,bucket):
        return [number for n in bucket for number in n]

returns "polynomial", which seems terrifying (I would expect linear here as well).

Code used to calculate the complexities:

print('<function name>:', big_o.big_o(<function name>, 
       lambda n:[big_o.datagen.integers(9900,1,9999999) for n in range(50)],
       n_measures=20)[0])

Output:

complexity of itertools_chain_from_iterable: Constant: time = 0.0013 (sec)
complexity of merge_w_sum: Linear: time = 0.46 + 6.2E-07*n (sec)
complexity of merge_extend: Cubic: time = 0.048 + -2.3E-18*n^3 (sec)
complexity of merge_bucket: Polynomial: time = 0.2 * x^-0.019 (sec)

What is it that I'm doing (or understanding) wrong? Many thanks in advance for useful tips!


君勿笑 2025-02-11 19:16:57


Your lambda ignores its argument n and instead always produces the same constant-size input. Produce input of size n instead.

(A note: originally the question had 8 functions and 7 of them were judged "constant time". It was then edited to use a larger constant-size input and got different judgements. I guess your computer's speed is somewhat unstable, since constant-size input should still lead to constant runtimes and thus "constant time" judgements. In any case, the input-generating function should be fixed as described above.)

Given that the input is two-dimensional, you could for example produce n lists of size 1, one list of size n, or sqrt(n) lists of size sqrt(n). Presumably n lists of size 1 is what you want if your goal is to demonstrate that sum(arr, []) is bad. For example:

lambda n: [[i] for i in range(n)]
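For illustration, all three shapes could be generated like this (the helper names below are my own, not part of big_o):

```python
import math

# Three ways to build a list of lists with roughly n elements in total:
n_lists_of_size_1 = lambda n: [[i] for i in range(n)]   # n lists of size 1
one_list_of_size_n = lambda n: [list(range(n))]         # one list of size n
sqrt_by_sqrt = lambda n: [list(range(math.isqrt(n)))    # ~sqrt(n) lists
                          for _ in range(math.isqrt(n))]  # of size ~sqrt(n)
```

Which shape you pick matters, because each one stresses the flattening functions differently.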

A full program:

import big_o
from itertools import chain

def chained(arr):
    return list(chain.from_iterable(arr))

def merge_extend(arr):
    output = []
    for l in arr:
        output.extend(l)
    return output

def merge_w_sum(arr): 
    return sum(arr,[])

def merge_bucket(bucket):
    return [number for n in bucket for number in n]

funcs = [
    (chained, 10**5),
    (merge_extend, 10**5),
    (merge_w_sum, 10**4),
    (merge_bucket, 10**5),
]

for _ in range(3):
    for func, max_n in funcs:
        complexity = big_o.big_o(
            func,
            lambda n: [[0]] * n,
            max_n=max_n,
            n_timings=10,
        )[0]
        print(
            f'{func.__name__:13}',
            complexity
        )
    print()

Sample results:

chained       Linear: time = 8.2E-05 + 5.8E-08*n (sec)
merge_extend  Linear: time = -4.2E-06 + 8.4E-08*n (sec)
merge_w_sum   Quadratic: time = 0.00013 + 2.4E-09*n^2 (sec)
merge_bucket  Linear: time = 0.00046 + 8E-08*n (sec)

chained       Logarithmic: time = -0.0075 + 0.0014*log(n) (sec)
merge_extend  Linear: time = -2E-05 + 8.5E-08*n (sec)
merge_w_sum   Quadratic: time = 0.00011 + 2.4E-09*n^2 (sec)
merge_bucket  Linear: time = -4.2E-05 + 2.6E-07*n (sec)

chained       Linear: time = -1.8E-05 + 1.6E-07*n (sec)
merge_extend  Logarithmic: time = -0.01 + 0.0019*log(n) (sec)
merge_w_sum   Quadratic: time = 8.3E-05 + 2.4E-09*n^2 (sec)
merge_bucket  Linear: time = 7.1E-05 + 8.3E-08*n (sec)

You can see that it gets it right most of the time, but it still sometimes guesses logarithmic instead of linear. On a more stable system it might work better. Larger max_n values should help too, but when I tried that, big_o crashed with a known internal error.

Also note that I used different max_n values for the different functions. Using the same value for all of them is no good: if it's large, the quadratic-time solution takes unbearably long, and if it's small, the linear-time solutions finish so quickly that big_o has trouble distinguishing them from logarithmic or linearithmic. There doesn't seem to be a middle value that works for all of them. Ideally, big_o would adjust max_n appropriately by itself, but I don't think it supports that.
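One hypothetical workaround (pick_max_n below is my own sketch, not a big_o API) is to double n until a single call exceeds a small time budget, and use the result as that function's max_n:

```python
import timeit

def pick_max_n(func, datagen, budget=0.05, start=1000, cap=10**7):
    """Double n until one call to func(datagen(n)) exceeds `budget` seconds.

    Rough heuristic, not part of big_o: returns the first n at which the
    function is slow enough to measure reliably (or `cap`, whichever is less).
    """
    n = start
    while n < cap:
        if timeit.timeit(lambda: func(datagen(n)), number=1) > budget:
            break
        n *= 2
    return n
```

The result could then be passed along, e.g. `big_o.big_o(func, datagen, max_n=pick_max_n(func, datagen))`.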

野鹿林 2025-02-11 19:16:57


First, it's important to understand that Big O notation is about estimating the asymptotic growth of functions.
I.e. how they "behave" when input size grows to infinity.

E.g. if an algorithm is said to run in O(n^2), it means (roughly) that as the input size (n) grows to infinity, the execution time will be bounded by a constant multiplied by n^2 (for a more accurate definition see the link above).

Therefore the "real" big-O has nothing to do with concrete time measurements (in particular because we cannot measure infinite input) - at least in the theoretical sense.

Libraries like pypi/big-O try to estimate how the processing time grows as the input size grows.
But this is only an estimation, and it depends heavily on the properties of the timed function and on the size of the inputs you actually give it.

In your case, it seems you gave it inputs that were too small.
This resulted in very short execution times, which caused the estimation method to fail.
You could try to increase the input size significantly.
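As a rough sanity check outside of big_o (the sizes below are arbitrary), timing sum(arr, []) at a few growing sizes already hints at the quadratic behaviour, since doubling n roughly quadruples the runtime:

```python
import timeit

def merge_w_sum(arr):
    # sum with a list start value repeatedly copies the accumulated
    # list, which is what makes this approach quadratic
    return sum(arr, [])

for n in (2000, 4000, 8000):
    arr = [[i] for i in range(n)]
    t = timeit.timeit(lambda: merge_w_sum(arr), number=5)
    print(f'n={n:5d}: {t:.4f} s')
```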


Update: the last paragraph (regarding the input being too small) was a guess that turned out to be wrong - see @KellyBundy's answer above.
