Writing a CPU-bound script to roughly measure CPU performance

Posted on 2024-11-01 21:38:00


I have written a script and am running it on different machines. The script looks like this:

def f(n):
    x = None
    while n:
        x = simple_math(n)
        n -= 1
    return x

start = now()
f(BIGNUM)
print(now() - start)

At the end, the script prints how long it took to finish. Is this good enough to compare the practical CPU speed of different machines for simple Python scripts?

By "simple" I mean that it does not use the multiprocessing module or any other technique to take advantage of multi-core machines.

This question is not about:

making Python programs run faster
the multiprocessing module
the GIL, I/O efficiency, etc.
non-CPython implementations

I just want to make sure that my approach to comparing CPU performance across machines is reasonably sound.


甜妞爱困 2024-11-08 21:38:00

What's wrong with the countless existing benchmarks? The more sophisticated ones are probably a bit more robust. The major problems I can spot with your naive approach (and I'm not an expert on this topic, mind you) are:

Modern CPUs are highly complex and employ very clever optimizations. The speed of a purely CPU-bound program can vary widely depending on how often the cache can help, how often the program causes pipeline stalls, how often branch prediction is correct, and probably many more factors (these were just off the top of my head). Many of these shouldn't make a difference when you use the same build of the same executable to run the same script doing the same pure calculations, but they can matter, to a degree none of us can predict, once you change any of these parameters (e.g. using a different build because of a different OS or architecture).
Multitasking operating systems will never let a program occupy the CPU exclusively. There will always be other programs running at the same time and stealing time, and you can't really know how many of the x seconds were spent running your program and how many were spent on other programs. At the very least, you should run the program many times and take the minimum, since that is the run with the least interference from other programs. Even then, the system load must be about the same on both machines for the numbers to be somewhat meaningful.
CPython, at least, won't run this script on multiple threads, so you only measure the speed of one core.
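
The "run it many times and take the minimum" advice from the second point could look roughly like the sketch below. It is only an illustration, not the asker's actual script: `simple_math` and `BIGNUM` are hypothetical stand-ins (the question defines neither), `best_of` is a name made up for this example, and `time.perf_counter` is assumed in place of the unspecified `now()`.

```python
import time

BIGNUM = 200_000  # hypothetical loop count; the question gives no value


def simple_math(n):
    # Hypothetical stand-in for the asker's undefined simple_math().
    return n * n + 1


def f(n):
    # Same loop as in the question.
    x = None
    while n:
        x = simple_math(n)
        n -= 1
    return x


def best_of(repeats=5):
    """Time f(BIGNUM) `repeats` times; return (minimum, all timings)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        f(BIGNUM)
        timings.append(time.perf_counter() - start)
    return min(timings), timings


if __name__ == "__main__":
    best, timings = best_of()
    print("all runs:", ["%.4f" % t for t in timings])
    print("best of %d: %.4f s" % (len(timings), best))
```

If the individual runs differ a lot from one another, that spread is itself a hint that other processes were competing for the CPU during the measurement.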

But since your requirement seems to be "a very rough estimate of CPU speed only, in full awareness that these numbers can't be used for anything except putting CPU speed into orders of magnitude, must be taken with a grain of salt even then, and say nothing about the actual performance of any real application", it might be okay - just don't consider it anywhere close to accurate. Still, why not use a hardened benchmark suite that has already put some effort into mitigating (not removing - nobody can do that) these problems?

Also note that the timeit stdlib module is easier to use than manually wielding a stopwatch, and it tries (not too hard, but it's a start) to fix the second point using the method I mentioned.
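
As a rough illustration of the timeit suggestion, the sketch below uses `timeit.repeat` and keeps the minimum of the results. The workload is an assumption: `simple_math` and the loop count of 200,000 are hypothetical placeholders, since the question does not define them.

```python
import timeit


def simple_math(n):
    # Hypothetical stand-in for the asker's undefined simple_math().
    return n * n + 1


def f(n):
    # Same loop as in the question.
    x = None
    while n:
        x = simple_math(n)
        n -= 1
    return x


# Five independent measurements, each calling f(200_000) exactly once.
# The loop count is a placeholder, not a value from the question.
timings = timeit.repeat(lambda: f(200_000), repeat=5, number=1)

# The minimum is the run least disturbed by other processes.
print("best of %d: %.4f s" % (len(timings), min(timings)))
```

By default timeit also turns off garbage collection while timing, which removes one more source of run-to-run noise.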
