An unused variable makes my code 20% faster? Why?

Posted on 2025-01-17 02:48:24

I'm doing a lot of benchmarks. I've never seen something like this. I'm stumped. Creating an extra global variable, not used at all, makes part of my code about 20% faster. Why?

I'm benchmarking a function that produces iterables, measuring how long it takes to consume (iterate) them. I have two ways of consuming them. These are typical times when I get a high CPU share:

       | Without the variable | With the variable 
-------+----------------------+-------------------
Output |   0.74 s  consume_1  |  0.72 s  consume_1 
       |   0.96 s  consume_2  |  0.77 s  consume_2 
       |                      |                    
       |   0.74 s  consume_1  |  0.75 s  consume_1 
       |   0.96 s  consume_2  |  0.78 s  consume_2 
       |                      |                    
       |   0.73 s  consume_1  |  0.73 s  consume_1 
       |   0.95 s  consume_2  |  0.78 s  consume_2 
-------+----------------------+-------------------
Debug  |  Real time: 5.110 s  | Real time: 4.560 s
       |  User time: 4.546 s  | User time: 4.386 s
       |  Sys. time: 0.535 s  | Sys. time: 0.150 s
       |  CPU share: 99.43 %  | CPU share: 99.47 %

Creating the pointless variable makes consumption with consume_2 about 0.2 seconds faster (from about 0.96 to about 0.77 in the runs above). The "Debug" statistics also differ significantly. The most drastic difference is "Sys. time": without the variable it's consistently around 0.5 seconds, and with the variable it's consistently around 0.14 seconds.
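The "Debug" numbers above are totals for the whole process, so they don't show which consumer the system time belongs to. One way to split user and system CPU time per call is os.times(). This is a minimal sketch, not part of the original benchmark: cpu_split is a hypothetical helper name, and churn is a stand-in workload chosen only so the block runs on its own.

```python
import os


def cpu_split(func):
    """Call func() once and return (user_seconds, system_seconds) of CPU
    time that this process spent during the call."""
    before = os.times()
    func()
    after = os.times()
    return after.user - before.user, after.system - before.system


def churn():
    # Stand-in workload (an assumption, not the question's code): build and
    # immediately drop many mid-sized tuples.
    xs = tuple(range(2_000))
    for i in range(len(xs)):
        xs[:i] + xs[i + 1:]


user, system = cpu_split(churn)
print(f'{user:.2f} s user  {system:.2f} s sys  churn')
```

In the benchmark loop, passing lambda: consume(solver(lst)) to cpu_split instead of timeit would give one user/sys pair per consume function. Note that os.times() may have a coarser resolution than timeit's timer, so very short calls can show up as 0.00.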

I'm doing this on TIO, and you can reproduce it there yourself:
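Without the variable / With the variable: https://tio.run/##dVLbbsMgDH3PV/ilKlmjKmlfpkj9hf1AVUU0cVokAgzcLf36DEIuWqfxBD7HPsc25kl3rY7vxg5Da3UHJDoUBKIz2tL0SkZEG7SctJ0xQdjdkAhtxP3bktbSzQSLBvmUXGspsSah1QI3@PnAJGmwBeT1vfoWdK803dG6qmChGr9KTMsE/OkdnIAeRuKKjEAbDGXQg1CA6tEFj8h6N@WF8xQoG@g9yZ1LcYFduIhdUV6SpPaGfFKV@/IfWk1@5mjBXp1NZYNqgDKI4VH9hboaGDtlMZpBnv4WOfwnEh@h8Y4bto6bFWn2Ry1OIyoF@nib7XnVuAyWp2l0IB35wlI4YparG7Ii9yf1SOjNafmFNnR1/rubC7zBcZ3D1EcgL3PL1u7WMQTB@KGY5N214eXMYlGPeVPBn1/jFe2pSJdUY4Uitt3sDy042MIGaJHYV5XiXqmK9EhNh@EH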

Here's the code. I called the extra variable foobar. Also note that consume_1 loads the global deque 10000 times, while consume_2 has just a handful of global loads, so if anything, I'd expect consume_1 to be the affected one.

from timeit import timeit
from operator import itemgetter
from itertools import repeat
from collections import deque

def each_with_others_1(iterable):
    xs = tuple(iterable)
    for i, x in enumerate(xs):
        yield x, xs[:i] + xs[i+1:]

consume_0 = None

def consume_1(each_with_others):
    for each, others in each_with_others:
        deque(others, 0)

def consume_2(each_with_others):
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)

lst = list(range(10000))
foobar = None  # the extra, never-used global; delete this line for the "without" run
for solver in [each_with_others_1] * 3:
    for consume in consume_1, consume_2:
        t = timeit(lambda: consume(solver(lst)), number=1)
        print('%.2f s ' % t, consume.__name__)
    print()
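
The remark that consume_1 loads the global deque on every iteration, while consume_2 performs only a handful of global loads, can be checked by inspecting the bytecode with the standard dis module. A minimal sketch; global_loads is a hypothetical helper name, not part of the benchmark:

```python
import dis
from collections import deque
from itertools import repeat
from operator import itemgetter


def consume_1(each_with_others):
    for each, others in each_with_others:
        deque(others, 0)


def consume_2(each_with_others):
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)


def global_loads(func):
    """Names fetched by the LOAD_GLOBAL instructions in func's bytecode."""
    return [ins.argval
            for ins in dis.get_instructions(func)
            if ins.opname == 'LOAD_GLOBAL']


for func in consume_1, consume_2:
    print(func.__name__, global_loads(func))
```

This is a static listing: each entry is one LOAD_GLOBAL instruction, not one execution. In consume_1 the single load of deque sits inside the for loop, so it runs once per item (10000 times for the list above), whereas each load in consume_2 runs once per call.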

Update: I also reproduced this on a Google Compute Engine instance after installing Python 3.8.2. Creating the variable made consume_2 about 15% faster:

       | Without the variable | With the variable 
-------+----------------------+-------------------
Output |   0.64 s  consume_1  |  0.65 s  consume_1 
       |   0.80 s  consume_2  |  0.68 s  consume_2 
       |                      |                    
       |   0.64 s  consume_1  |  0.65 s  consume_1 
       |   0.80 s  consume_2  |  0.68 s  consume_2 
       |                      |                    
       |   0.64 s  consume_1  |  0.64 s  consume_1 
       |   0.78 s  consume_2  |  0.68 s  consume_2 
-------+----------------------+-------------------
Debug  |   real   0m 4.327s   |  real   0m 3.987s
       |   user   0m 3.987s   |  user   0m 3.902s
       |   sys    0m 0.340s   |  sys    0m 0.084s

The "Debug" numbers come from running the script as time python test.py. Without the variable, sys is consistently around 0.32 seconds. With the variable, it's consistently around 0.09 seconds.
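
For anyone who wants to repeat the with/without comparison without editing the file by hand, the following sketch writes both variants to a temporary test.py, runs each in a fresh interpreter, and reads the child's user and system CPU time from os.times(). The names TEMPLATE, run_variant and compare, and the __N__ and __EXTRA__ placeholders, are made up for this illustration. It assumes a POSIX system, since the children_* fields of os.times() are zero on Windows, and it is not claimed to reproduce the effect on every machine.

```python
import os
import subprocess
import sys
import tempfile
import time

# The question's script with two placeholders: __N__ (list size) and
# __EXTRA__ (the line creating the unused global, or an empty line).
TEMPLATE = """\
from timeit import timeit
from operator import itemgetter
from itertools import repeat
from collections import deque

def each_with_others_1(iterable):
    xs = tuple(iterable)
    for i, x in enumerate(xs):
        yield x, xs[:i] + xs[i+1:]

consume_0 = None

def consume_1(each_with_others):
    for each, others in each_with_others:
        deque(others, 0)

def consume_2(each_with_others):
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)

lst = list(range(__N__))
__EXTRA__
for solver in [each_with_others_1] * 3:
    for consume in consume_1, consume_2:
        t = timeit(lambda: consume(solver(lst)), number=1)
        print('%.2f s ' % t, consume.__name__)
    print()
"""


def run_variant(with_variable, n=10_000):
    """Run one variant in a fresh interpreter; return output and CPU times."""
    source = TEMPLATE.replace('__N__', str(n))
    source = source.replace('__EXTRA__',
                            'foobar = None' if with_variable else '')
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'test.py')
        with open(path, 'w') as f:
            f.write(source)
        before, start = os.times(), time.perf_counter()
        done = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, check=True)
        real = time.perf_counter() - start
        after = os.times()
    return {
        'output': done.stdout,
        'real': real,
        # Child CPU times are accumulated once the child has been waited for.
        'user': after.children_user - before.children_user,
        'sys': after.children_system - before.children_system,
    }


def compare(n=10_000):
    """Print the timings of both variants, one after the other."""
    for with_variable in False, True:
        r = run_variant(with_variable, n)
        print('With' if with_variable else 'Without', 'the variable')
        print(r['output'], end='')
        print('real %.3f s  user %.3f s  sys %.3f s'
              % (r['real'], r['user'], r['sys']))
        print()
```

Calling compare() runs both variants with the list size used above and takes several seconds. Because the measured difference is small, it is worth calling it a few times and checking that the numbers are stable before reading anything into them.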
