An unused variable makes code 20% faster? Why?
I'm doing a lot of benchmarks. I've never seen something like this. I'm stumped. Creating an extra global variable, not used at all, makes part of my code about 20% faster. Why?
I'm benchmarking a function that produces iterables, measuring how long it takes to consume (iterate) them. I have two ways of consuming them. These are typical times when I get a high CPU share:
       | Without the variable | With the variable
-------+----------------------+-------------------
Output | 0.74 s  consume_1    | 0.72 s  consume_1
       | 0.96 s  consume_2    | 0.77 s  consume_2
       |                      |
       | 0.74 s  consume_1    | 0.75 s  consume_1
       | 0.96 s  consume_2    | 0.78 s  consume_2
       |                      |
       | 0.73 s  consume_1    | 0.73 s  consume_1
       | 0.95 s  consume_2    | 0.78 s  consume_2
-------+----------------------+-------------------
Debug  | Real time: 5.110 s   | Real time: 4.560 s
       | User time: 4.546 s   | User time: 4.386 s
       | Sys. time: 0.535 s   | Sys. time: 0.150 s
       | CPU share: 99.43 %   | CPU share: 99.47 %
Creating the pointless variable makes the consumption with consume_2 about 0.2 seconds faster (from about 0.96 to about 0.77). The "Debug" statistics also differ significantly. The most drastic difference is "Sys. time": for "without" it's consistently around 0.5 seconds and for "with" it's consistently around 0.14 seconds.
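To put numbers on "about 20%", here is a small sketch of the arithmetic, using typical values from the table above (0.96 s and 0.77 s for consume_2, 0.535 s and 0.150 s for system time). The two helper names are only illustrative.

```python
def percent_less_time(before, after):
    """Percentage by which the elapsed time shrank."""
    return (before - after) / before * 100


def percent_more_throughput(before, after):
    """Percentage increase in work done per unit of time."""
    return (before / after - 1) * 100


# Typical values copied from the table above.
print('consume_2 time reduction:  %.1f %%' % percent_less_time(0.96, 0.77))
print('consume_2 throughput gain: %.1f %%' % percent_more_throughput(0.96, 0.77))
print('Sys. time reduction:       %.1f %%' % percent_less_time(0.535, 0.150))
```

Measured as a reduction in elapsed time, the consume_2 change is just under 20%. The relative drop in system time is much larger than the drop in total time.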
I'm doing this on TIO, and you can reproduce it there yourself:
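Without the variable (https://tio.run/##dVLbbsMgDH3PV/ilKlmjKmlfpkj9hf1AVUU0cVokAgzcLf36DEIuWqfxBD7HPsc25kl3rY7vxg5Da3UHJDoUBKIz2tL0SkZEG7SctJ0xQdjdkAhtxP3bktbSzQSLBvmUXGspsSah1QI3@PnAJGmwBeT1vfoWdK803dG6qmChGr9KTMsE/OkdnIAeRuKKjEAbDGXQg1CA6tEFj8h6N@WF8xQoG@g9yZ1LcYFduIhdUV6SpPaGfFKV@/IfWk1@5mjBXp1NZYNqgDKI4VH9hboaGDtlMZpBnv4WOfwnEh@h8Y4bto6bFWn2Ry1OIyoF@nib7XnVuAyWp2l0IB35wlI4YparG7Ii9yf1SOjNafmFNnR1/rubC7zBcZ3D1EcgL3PL1u7WMQTB@KGY5N214eXMYlGPeVPBn1/jFe2pSJdUY4Uitt3sDy042MIGaJHYV5XiXqmK9EhNh@EH) / With the variable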
Here's the code; I called the extra variable foobar. Also note that consume_1 loads the global deque 10000 times while consume_2 has just a handful of loads of globals, so if anything, I'd think that consume_1 would be the affected one.
from timeit import timeit
from operator import itemgetter
from itertools import repeat
from collections import deque

def each_with_others_1(iterable):
    # Yield each element together with a tuple of all the other elements.
    xs = tuple(iterable)
    for i, x in enumerate(xs):
        yield x, xs[:i] + xs[i+1:]

consume_0 = None

def consume_1(each_with_others):
    # Python-level loop: looks up the global deque once per pair.
    for each, others in each_with_others:
        deque(others, 0)

def consume_2(each_with_others):
    # Everything runs inside map/deque; only a few global lookups in total.
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)

lst = list(range(10000))
foobar = None  # the extra, never-used global; remove this line for "without"
for solver in [each_with_others_1] * 3:
    for consume in consume_1, consume_2:
        t = timeit(lambda: consume(solver(lst)), number=1)
        print('%.2f s ' % t, consume.__name__)
    print()
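The point about global loads can be checked statically. The following is a minimal sketch that lists the LOAD_GLOBAL instructions CPython's dis module reports for the two consumers. The helper global_load_sites is a name introduced here for illustration and is not part of the benchmark. It counts sites in the bytecode, not executions: the single site in consume_1 sits inside the for loop and therefore runs once per element, while every site in consume_2 runs once per call.

```python
import dis
from collections import deque
from itertools import repeat
from operator import itemgetter


def consume_1(each_with_others):
    for each, others in each_with_others:
        deque(others, 0)


def consume_2(each_with_others):
    otherss = map(itemgetter(1), each_with_others)
    deque(map(deque, otherss, repeat(0)), 0)


def global_load_sites(func):
    """Names read by LOAD_GLOBAL instructions in func's bytecode (hypothetical helper)."""
    return [ins.argval
            for ins in dis.get_instructions(func)
            if ins.opname == 'LOAD_GLOBAL']


for func in consume_1, consume_2:
    print(func.__name__, global_load_sites(func))
```

This only confirms where the global lookups are. It does not explain the timing difference.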
Update: Also reproduced on a Google Compute Engine instance after installing Python 3.8.2; creating the variable made consume_2 about 15% faster:
       | Without the variable | With the variable
-------+----------------------+-------------------
Output | 0.64 s  consume_1    | 0.65 s  consume_1
       | 0.80 s  consume_2    | 0.68 s  consume_2
       |                      |
       | 0.64 s  consume_1    | 0.65 s  consume_1
       | 0.80 s  consume_2    | 0.68 s  consume_2
       |                      |
       | 0.64 s  consume_1    | 0.64 s  consume_1
       | 0.78 s  consume_2    | 0.68 s  consume_2
-------+----------------------+-------------------
Debug  | real    0m 4.327s    | real    0m 3.987s
       | user    0m 3.987s    | user    0m 3.902s
       | sys     0m 0.340s    | sys     0m 0.084s
The "Debug" came from calling it as time python test.py. For "without", sys is consistently around 0.32 seconds. For "with" it's consistently around 0.09 seconds.
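For anyone who wants to compare both variants in one go, here is a rough sketch of a harness that runs the benchmark in a fresh interpreter with and without the extra global, then reports the child's user and system CPU time, similar to what time python test.py prints. The names TEMPLATE, run_variant and compare are made up for this sketch. It assumes a Unix-like system (os.times() reports zero child times on Windows), and passing the source via -c instead of a file is itself a change to the setup, so the effect may or may not show up this way.

```python
import os
import subprocess
import sys
import textwrap

# The benchmark from the question, with a slot for the optional extra global.
TEMPLATE = textwrap.dedent("""
    from timeit import timeit
    from operator import itemgetter
    from itertools import repeat
    from collections import deque

    def each_with_others_1(iterable):
        xs = tuple(iterable)
        for i, x in enumerate(xs):
            yield x, xs[:i] + xs[i+1:]

    consume_0 = None

    def consume_1(each_with_others):
        for each, others in each_with_others:
            deque(others, 0)

    def consume_2(each_with_others):
        otherss = map(itemgetter(1), each_with_others)
        deque(map(deque, otherss, repeat(0)), 0)

    lst = list(range({n}))
    {extra}
    for solver in [each_with_others_1] * 3:
        for consume in consume_1, consume_2:
            t = timeit(lambda: consume(solver(lst)), number=1)
            print('%.2f s ' % t, consume.__name__)
        print()
""")


def run_variant(extra_line, n=10000):
    """Run one variant in a child interpreter; return its output and CPU times."""
    source = TEMPLATE.format(n=n, extra=extra_line)
    before = os.times()
    result = subprocess.run([sys.executable, '-c', source],
                            capture_output=True, text=True, check=True)
    after = os.times()
    return {
        'output': result.stdout,
        # Child CPU times are added to these counters once the child is waited for.
        'user': after.children_user - before.children_user,
        'sys': after.children_system - before.children_system,
    }


def compare(n=10000):
    """Print timings for the variant without and with the unused global."""
    for label, extra in (('without', ''), ('with', 'foobar = None')):
        stats = run_variant(extra, n)
        print('%-8s user %.3f s  sys %.3f s' % (label, stats['user'], stats['sys']))
        print(stats['output'])
```

Calling compare() runs both variants back to back. Repeating it a few times helps tell a consistent difference from noise.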