测量算法的实际运行时间

发布于 2024-10-28 15:45:44 字数 135 浏览 5 评论 0原文

一个抽象算法操作大约分摊到多少条MIPS物理指令?至于抽象算法操作,我指的是基本操作,例如等。

我看看这不是一个严格的测量技术:-)

Kejia

Approximately, how many physical instructions of MIPS does an abstract algorithm operation amortize to? As for an abstract algorithm operation, I means a basic operation, such as add, divide, etc.

I see this is not a strict measuring technique :-)

Kejia

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

冬天旳寂寞 2024-11-04 15:45:45

这取决于CPU架构。一些处理器需要多个周期来执行单个指令(例如除法),而其他处理器则设法在每个周期中执行所有机器代码指令。

有时,衡量算法需要多少浮点运算是相关的。然而这并没有考虑I/O(例如读取内存)。

CPU 的速度有时以 FLOPS(每秒浮点运算数)为单位,这可以帮助您估算时间。再次强调,没有考虑 I/O,也没有考虑多线程问题(也是一个非常重要的衡量因素)。

It depends on the CPU architecture. Some processors requires several cycles for a single instruction such as divivide, while others manage to execute all machine code instructions in a single cycle each.

It is sometimes relevant to measure an algorithm in how many floating point operations it requires. However this does not take I/O (such as reading memory) into consideration.

The speed of a CPU is sometimes provided in FLOPS (Floating Point OPerations per Second) which could help to give you a time estimate. Again, not taking I/O into consideration - and not multi-threading issues (also a very important measuring factor).

拿命拼未来 2024-11-04 15:45:45

Donald Knuth 在撰写《计算机编程的艺术》第一卷时解决了这个问题。
在序言中,他给出了在虚拟机器的汇编代码中呈现算法的冗长理由 -

...为了避免这种困境,我有
试图设计一个“理想”
名为“MIX,”的计算机,具有非常
简单的操作规则...

这样,人们就可以明智地讨论算法需要多少个“周期”,而不必关心机器之间的差异、缓存、延迟、管道或计算机优化的任何其他方式为了节省时间,但不知道需要多长时间。

Donald Knuth addressed this very problem in writing Volume 1 of "The Art of Computer Programming".
In the preface he gives a lengthy justification for presenting algorithms in the assembly code for an imaginary machine -

... To avoid this dilemma, I have
attempted to design an "ideal"
computer called "MIX," with very
simple rules of operation ...

That way, one can talk sensibly about how many "cycles" an algorithm would take, without having to care about differences between machines, caching, latency, pipelines, or any of the other ways computers have been optimized to save time, at the expense of knowing how long they will take.

撩动你心 2024-11-04 15:45:44

此处提供了基本 MIPS 指令的列表。您提到的大多数“基本操作”都是一条或两条 MIPS 指令,这可能适用于大多数当前的 CPU 系列。

然而,这根本没有考虑任何现代 CPU 的架构和性能特征。不同的指令通常具有不同的完成时间。当前的 CPU 通常会实现分支预测、指令管道、内存缓存、并行化以及一系列其他技术,以使代码执行速度更快。

因此,仅仅拥有算法的汇编代码实现并不能说明其执行速度。您必须在实际硬件上测量和分析代码才能获得可比较的结果。事实上,某些算法在某些 CPU 上可能更有效,即使在同一 CPU 系列中也是如此。

一个常见且易于理解的例子是指令缓存的影响。展开循环将消除许多分支操作,这直观地使代码更快。但是,如果您在指令高速缓存非常少的同系列 CPU 上运行该代码,则增加的对主内存的访问可能会使其比简单的基于分支的循环慢得多。

There is a list of the basic MIPS instructions here. Most of the "basic operations" that you mentioned are a single MIPS instruction or perhaps two, which probably holds true on most current CPU families.

However this does not take into account at all the architecture and performance characteristics of any of the modern CPUs. Different instructions often have diffrent completion times. Current CPUs usually implement branch prediction, instruction pipelines, memory caching, parallelisation and a whole list of other techniques to make the code execution faster.

Therefore just having the assembly code implementation of an algorithm says nothing about its execution speed. You would have to measure and profile the code on the actual hardware to obtain comparable results. In fact, some algorithms may be far more effective on certain CPUs, even within the same CPU family.

A common and rather understandable example is the effect of the instruction cache. Unrolling a loop will eliminate a number of branch operations, which intuitively makes code faster. If you run that code on a CPU of the same family with very little instruction cache memory, though, the added accesses to the main memory can make it far slower than the simple branch-based loop.

腹黑女流氓 2024-11-04 15:45:44

计算机很复杂。如果你想达到这个水平,你需要开始考虑你使用的是哪种CPU,你的编译器如何使用这个CPU的指令集,哪些变量保存在哪些寄存器中,它们的位级表示是什么,即便如此,指令数量并不总是能轻松映射到实际运行时间。不同的指令可能需要不同数量的时钟周期来执行,这甚至没有考虑操作系统线程和程序的缓存未命中率。

最后,我们首先使用 big-O notatoin 是有充分理由的:)


顺便说一句,整数上最简单的操作(加、减)应该映射到单个机器指令,以防万一您担心。

Computers are complicated. If you want to get down to this level you need to start considering what kind of CPU you are using, how well your compiler can use this CPU's instruction set, what variables are being kept in what registers, what are their bit-level representations, etc. Even then, the number of instructions not always easily maps to the actual running time. Different instructions can take different ammounts of clock cycles to execute and this is not even thinking about OS threading and your program's cache miss rate.

In the end, there is a good reason we use big-O notatoin in the first place :)


BTW, most simple operations (add, subtract) on integers should map to a single machine instruction, in case you are worried.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文