Profiled performance of len(set) vs. set.__len__() in Python 3

Posted 12-25 16:37

While profiling my Python application, I discovered that len() seems to be very expensive when called on sets. See the code below:

import cProfile

def lenA(s):
    for i in range(1000000):
        len(s)          # built-in function call

def lenB(s):
    for i in range(1000000):
        s.__len__()     # direct call to the special method

def main():
    s = set()
    lenA(s)
    lenB(s)

if __name__ == "__main__":
    cProfile.run("main()", "stats")  # writes the profile to a file named "stats"

According to the profiler stats below, lenA() appears to be about 14 times slower than lenB() (cumulative time 3.830 s vs. 0.273 s):

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    1.986    1.986    3.830    3.830  .../lentest.py:5(lenA)
1000000    1.845    0.000    1.845    0.000  {built-in method len}
      1    0.273    0.273    0.273    0.273  .../lentest.py:9(lenB)

Am I missing something? Currently I use __len__() instead of len(), but the code looks dirty :(


Comments (3)

混吃等死 2025-01-01 16:37:11

Obviously, len has some overhead, since it does a function call and translates AttributeError to TypeError. Also, set.__len__ is such a simple operation that it's bound to be very fast in comparison to just about anything. Still, I don't see anything like a 14x difference when using timeit; in the run below, len(s) is actually the faster of the two:

In [1]: s = set()

In [2]: %timeit s.__len__()
1000000 loops, best of 3: 197 ns per loop

In [3]: %timeit len(s)
10000000 loops, best of 3: 130 ns per loop
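
For readers without IPython, roughly the same measurement can be taken with the standard-library timeit module. This is a minimal sketch, not the setup used for the numbers above; the helper name ns_per_call and the iteration counts are assumptions chosen for illustration.

```python
import timeit


def ns_per_call(stmt, s, number=1_000_000, repeat=3):
    """Return the best-of-`repeat` time for one execution of `stmt`, in nanoseconds.

    `stmt` is evaluated with the name `s` bound to the given set.
    (Hypothetical helper, not part of timeit itself.)
    """
    totals = timeit.repeat(stmt, globals={"s": s}, number=number, repeat=repeat)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    s = set()
    print("len(s):      %.1f ns per call" % ns_per_call("len(s)", s))
    print("s.__len__(): %.1f ns per call" % ns_per_call("s.__len__()", s))
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports above; absolute numbers will differ by machine and Python version.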

You should always just call len, not __len__. If the call to len is the bottleneck in your program, you should rethink its design, e.g. cache sizes somewhere or calculate them without calling len.
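
As a sketch of the "cache sizes somewhere" advice: when a collection does not change inside a hot loop, its length can be read once and reused. The function names and data shapes below are hypothetical, invented only for illustration.

```python
def shares_naive(batches, universe):
    """Fraction of `universe` covered by each batch; calls len(universe) every time."""
    return [len(batch) / len(universe) for batch in batches]


def shares_cached(batches, universe):
    """Same result, but len(universe) is looked up once, outside the loop."""
    size = len(universe)  # valid only while `universe` is not modified
    return [len(batch) / size for batch in batches]


if __name__ == "__main__":
    universe = set(range(4))
    batches = [{0}, {1, 2}]
    print(shares_naive(batches, universe))
    print(shares_cached(batches, universe))
```

Whether this is worth doing should be decided by measurement; going by the timings above, a single len call costs on the order of 100 ns.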

洋洋洒洒 2025-01-01 16:37:11

This is an interesting observation about the profiler, which has nothing to do with the actual performance of the len function. You see, in the profiler stats, there are two lines concerning lenA:

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    1.986    1.986    3.830    3.830  .../lentest.py:5(lenA)
1000000    1.845    0.000    1.845    0.000  {built-in method len}

...while there is only one line concerning lenB:

      1    0.273    0.273    0.273    0.273  .../lentest.py:9(lenB)

The profiler has timed each single call from lenA to len, but timed lenB as a whole. Timing a call always adds some overhead; in the case of lenA you see this overhead multiplied a million times.
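
The effect described here can be checked by timing each loop with and without cProfile and listing which functions the profiler recorded. This is a rough sketch; the helper names (wall_time, profiled) and the iteration count are assumptions, and absolute numbers will vary by machine and Python version.

```python
import cProfile
import pstats
import time


def len_loop(s, n):
    for _ in range(n):
        len(s)


def dunder_loop(s, n):
    for _ in range(n):
        s.__len__()


def wall_time(func, *args):
    """Run func(*args) without a profiler and return elapsed seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def profiled(func, *args):
    """Run func(*args) under cProfile.

    Returns (elapsed seconds, {function name: total call count}).
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(func, *args)
    elapsed = time.perf_counter() - start
    stats = pstats.Stats(profiler)
    counts = {}
    # Each stats entry maps (file, line, name) -> (primitive calls, total calls, ...)
    for (_file, _line, name), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        counts[name] = counts.get(name, 0) + ncalls
    return elapsed, counts


if __name__ == "__main__":
    s, n = set(), 1000000
    for loop in (len_loop, dunder_loop):
        plain = wall_time(loop, s, n)
        prof, counts = profiled(loop, s, n)
        print("%-12s plain %.3fs  profiled %.3fs" % (loop.__name__, plain, prof))
        print("    recorded:", sorted(counts))
```

If the explanation above holds, the profiled len_loop run should list a built-in len entry with n calls and slow down noticeably compared with its plain run, while dunder_loop should change much less, consistent with the single lenB line in the question's output.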

嘿咻 2025-01-01 16:37:11

This was going to be a comment, but given larsman's comment on his controversial results and the result I got, I think it is interesting to add my data to the thread.

Trying more or less the same setup, I got the opposite of what the OP got, which points in the same direction as larsman's comment:

12.1964105975   <- __len__
6.22144670823   <- len()

C:\Python26\programas>

The test:

def lenA(s):
    for i in range(100):
        len(s)

def lenB(s):
    for i in range(100):
        s.__len__()

s = set()

if __name__ == "__main__":
    from timeit import timeit
    # print(...) with a single argument behaves the same on Python 2 and 3
    print(timeit("lenB(s)", setup="from __main__ import lenB, s"))
    print(timeit("lenA(s)", setup="from __main__ import lenA, s"))

This is ActivePython 2.6.7 64-bit on Windows 7.
