列表查找比元组更快？

发布于 2024-10-31 12:30:01 字数 1307 浏览 11 评论 0原文

过去，当我在紧密循环中需要类似数组的索引查找时，我通常使用元组，因为它们通常看起来性能非常好（接近于仅使用 n 个变量）。然而，今天我决定质疑这个假设，并得出了一些令人惊讶的结果：

In [102]: l = range(1000)
In [103]: t = tuple(range(1000))
In [107]: timeit(lambda : l[500], number = 10000000)
Out[107]: 2.465047836303711
In [108]: timeit(lambda : t[500], number = 10000000)
Out[108]: 2.8896381855010986

元组查找似乎比列表查找花费的时间长 17%！重复实验得到了相似的结果。拆解它们，我发现它们都是：

In [101]: dis.dis(lambda : l[5])
  1           0 LOAD_GLOBAL              0 (l)
              3 LOAD_CONST               1 (5)
              6 BINARY_SUBSCR       
              7 RETURN_VALUE

作为参考，典型的 10,000,000 个全局变量查找/返回需要 2.2 秒。另外，我在没有 lambda 的情况下运行了它，你知道，以防万一（请注意，数字 = 100,000,000 而不是 10,000,000）。

In [126]: timeit('t[500]', 't=range(1000)', number=100000000)
Out[126]: 6.972800970077515
In [127]: timeit('t[500]', 't=tuple(range(1000))', number=100000000)
Out[127]: 9.411366939544678

这里，元组查找花费的时间延长了 35%。这是怎么回事？对于非常紧密的循环，这实际上看起来是一个很大的差异。可能是什么原因造成的？

请注意，对于分解为变量（例如 x,y=t），元组稍微快一些（在我的几次测试中，时间减少了大约 6%），并且对于从固定数量的参数进行构造，元组速度更快（大约减少了 83% 的时间））。不要将这些结果视为一般规则；我只是进行了一些小型测试，这对于大多数项目来说毫无意义。

In [169]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) 
[GCC 4.0.1 (Apple Inc. build 5494)]

原文

In the past, when I've needed array-like indexical lookups in a tight loop, I usually use tuples, since they seem to be generally extremely performant (close to using just n-number of variables). However, I decided to question that assumption today and came up with some surprising results:

In [102]: l = range(1000)
In [103]: t = tuple(range(1000))
In [107]: timeit(lambda : l[500], number = 10000000)
Out[107]: 2.465047836303711
In [108]: timeit(lambda : t[500], number = 10000000)
Out[108]: 2.8896381855010986

Tuple lookups appear to take 17% longer than list lookups! Repeated experimentation gave similar results. Disassembling each, I found them to both be:

In [101]: dis.dis(lambda : l[5])
  1           0 LOAD_GLOBAL              0 (l)
              3 LOAD_CONST               1 (5)
              6 BINARY_SUBSCR       
              7 RETURN_VALUE

For reference, a typical 10,000,000 global variable lookup/returns take 2.2s. Also, I ran it without the lambdas, y'know, just in case (note that number=100,000,000 rather than 10,000,000).

In [126]: timeit('t[500]', 't=range(1000)', number=100000000)
Out[126]: 6.972800970077515
In [127]: timeit('t[500]', 't=tuple(range(1000))', number=100000000)
Out[127]: 9.411366939544678

Here, the tuple lookup take 35% longer. What's going on here? For very tight loops, this actually seems like a significant discrepancy. What could be causing this?

Note that for decomposition into variable (e.g. x,y=t), tuples are slightly faster (~6% in my few tests less time) and for construction from a fixed number of arguments, tuples are crazy faster(~83% less time). Don't take these results as general rules; I just performed a few minitests that are going to be meaningless for most projects.

In [169]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) 
[GCC 4.0.1 (Apple Inc. build 5494)]

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

献世佛 2024-11-07 12:30:01

元组主要是构建列表更快，而不是访问列表更快。

元组的访问速度应该稍快一些：它们需要少一个间接寻址。然而，我认为主要的好处是它们在构建列表时不需要第二次分配。

列表的查找速度稍快的原因是 Python 引擎对其进行了特殊的优化：

case BINARY_SUBSCR:
    w = POP();
    v = TOP();
    if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
        /* INLINE: list[int] */
        Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
            i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
            x = PyList_GET_ITEM(v, i);
            Py_INCREF(x);
        }

将此优化注释掉后，元组比列表稍微快一点（大约快 4%）。

请注意，在这里为元组添加单独的特殊情况优化并不是一个好主意。 VM 循环主体中的每个特殊情况都会增加代码大小，从而降低缓存一致性，这意味着每种其他类型的查找都需要一个额外的分支。

Tuples are primarily faster for constructing lists, not for accessing them.

Tuples should be slightly faster to access: they require one less indirection. However, I believe the main benefit is that they don't require a second allocation when constructing the list.

The reason lists are slightly faster for lookups is because the Python engine has a special optimization for it:

case BINARY_SUBSCR:
    w = POP();
    v = TOP();
    if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
        /* INLINE: list[int] */
        Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
            i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
            x = PyList_GET_ITEM(v, i);
            Py_INCREF(x);
        }

With this optimization commented out, tuples are very slightly faster than lists (by about 4%).

Note that adding a separate special-case optimization for tuples here isn't necessary a good idea. Every special case like this in the main body of the VM loop increases the code size, which decreases cache consistency, and it means every other type of lookup requires an extra branch.

回复收藏 0 原文