Do single-cycle instructions take one cycle, even if RAM is slow?

Posted on 2024-08-08 04:48:04

I am using an embedded RISC processor. There is one basic thing I cannot figure out.

The CPU manual clearly states that the instruction ld r1, [p1] (in C: r1 = *p1) takes one cycle. Register r1 is 32 bits wide. However, the memory bus is only 16 bits wide. So how can it fetch all the data in one cycle?

Comments (2)

尹雨沫 2024-08-15 04:48:04

The cycle counts in the manual assume full-width, zero-wait-state memory. The time it takes the core to execute that instruction is one clock cycle.
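To make that assumption concrete, here is a minimal sketch of how the bus side of a load could be counted once the bus is narrower than the register or the memory adds wait states. The function name `load_bus_cycles` and the model itself (one bus cycle per transfer, plus a fixed number of wait states per transfer) are assumptions for illustration, not figures from any particular CPU manual.

```python
def load_bus_cycles(data_bits: int, bus_bits: int, wait_states: int = 0) -> int:
    """Estimate bus cycles for one load under a simple, hypothetical model."""
    if data_bits <= 0 or bus_bits <= 0 or wait_states < 0:
        raise ValueError("sizes must be positive and wait_states non-negative")
    # Number of bus transfers needed to move the whole value (ceiling division).
    transfers = (data_bits + bus_bits - 1) // bus_bits
    # Assumed cost per transfer: one bus cycle plus the memory's wait states.
    return transfers * (1 + wait_states)


if __name__ == "__main__":
    print(load_bus_cycles(32, 32))     # the case the manual's figure assumes
    print(load_bus_cycles(32, 16))     # 32-bit load over a 16-bit bus
    print(load_bus_cycles(32, 16, 2))  # same, with two wait states per transfer
```

Under this model a 32-bit load over a 16-bit bus needs two transfers, so the "one cycle" figure can only describe the core, not the whole memory access.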

There was a time when each instruction took a different number of clock cycles. Memory was relatively fast then too, usually zero wait state. There was also a time before pipelines, when you had to burn a clock cycle fetching, then a clock cycle decoding, then a clock cycle executing, plus extra clock cycles for variable-length instructions and extra clock cycles if the instruction had a memory operation.

Today clock speeds are high and chip real estate is relatively cheap, so a one-clock-cycle add or multiply is the norm, as are pipelines and caches. Processor clock speed is no longer the determining factor for performance. Memory is relatively expensive and slow. So caches (configuration, number, and size), bus size, memory speed, and peripheral speed determine the overall performance of a system. Increasing the processor clock speed without speeding up the memory or peripherals will normally show minimal performance gain, if any; in some cases it can even make the system slower.
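The effect of raising only the core clock can be sketched with a toy model. Everything here is an assumption for illustration: the name `run_time_ns`, the idea that every non-memory instruction costs one core cycle, and the rule that a memory access with a fixed latency in nanoseconds is rounded up to a whole number of core cycles.

```python
def run_time_ns(alu_instrs: int, mem_instrs: int,
                clock_mhz: int, mem_latency_ns: int) -> float:
    """Toy run-time estimate: memory latency is fixed in ns, not in cycles."""
    # Core cycles needed to cover one memory access, rounded up (at least 1).
    mem_cycles = max(1, (mem_latency_ns * clock_mhz + 999) // 1000)
    total_cycles = alu_instrs + mem_instrs * mem_cycles
    # Convert cycles to nanoseconds: one cycle lasts 1000 / clock_mhz ns.
    return total_cycles * 1000 / clock_mhz


if __name__ == "__main__":
    # Doubling the clock with 60 ns memory gives only a small gain.
    print(run_time_ns(1000, 1000, 100, 60))
    print(run_time_ns(1000, 1000, 200, 60))
    # A slightly faster clock can be slower once rounding adds a wait cycle.
    print(run_time_ns(1000, 1000, 100, 50))
    print(run_time_ns(1000, 1000, 105, 50))
```

In this model the memory-bound part of the run time barely changes with the clock, and the rounding to whole cycles is what can make a slightly faster clock come out slower.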

Memory size and wait states are not part of the clock execution spec in the reference manual; the manual only states what the core itself costs you, in clocks, for each instruction. If it is a Harvard architecture, where the instruction and data buses are separate, then one clock is possible even with the memory cycle. The instruction fetch happens at least one clock cycle earlier, if not before that, so the instruction is ready at the beginning of the clock cycle. Decode and execute (the memory read cycle) happen during that one clock, and at the end of it the result of the read is latched into the register. If the instruction and data buses are shared, you could argue that the load still finishes in one clock cycle, but you cannot fetch the next instruction during that cycle, so there is a bit of a stall; they might cheat and still call that one clock cycle.
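The difference between separate and shared buses can be sketched as a simple cycle count. This is a toy model under stated assumptions: the function name `total_cycles` and the mnemonics in the example program are hypothetical, each instruction executes in one cycle, and on a shared bus every load is assumed to block exactly one instruction fetch.

```python
def total_cycles(instructions: list[str], harvard: bool) -> int:
    """Count cycles for a straight-line program in a toy bus model."""
    cycles = 0
    for op in instructions:
        cycles += 1  # the cycle in which the instruction itself executes
        if op == "ld" and not harvard:
            # Shared bus: the data read occupies the bus, so the next
            # instruction fetch is delayed by one cycle (assumed stall).
            cycles += 1
    return cycles


if __name__ == "__main__":
    program = ["ld", "add", "ld", "add"]
    print(total_cycles(program, harvard=True))   # fetch and data overlap
    print(total_cycles(program, harvard=False))  # each load stalls a fetch
```

The load is charged one cycle in both cases; the extra cost on a shared bus shows up as a stall on the following fetch, which is why a manual could still list the load as a one-cycle instruction.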

萌逼全场 2024-08-15 04:48:04

My understanding is this: when the manual says an instruction takes one cycle, it does not mean the instruction finishes in one cycle. We should take the instruction pipeline into account. Suppose your CPU has a 5-stage pipeline: that instruction would take 5 cycles if it were executed sequentially.
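The distinction between latency and throughput can be put in numbers with a minimal sketch. It assumes an ideal pipeline with no stalls or hazards, and the function names are made up for illustration.

```python
def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Cycles to finish n instructions in an ideal pipeline (no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to pass through the pipe;
    # after that, one instruction completes every cycle.
    return stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Cycles if each instruction runs all stages before the next one starts."""
    return n_instructions * stages


if __name__ == "__main__":
    print(pipelined_cycles(1, 5))      # latency of a single instruction
    print(pipelined_cycles(100, 5))    # close to one cycle per instruction
    print(unpipelined_cycles(100, 5))  # strictly sequential execution
```

With 100 instructions and 5 stages the ideal pipeline needs 104 cycles against 500 sequentially, so the average cost approaches one cycle per instruction even though each instruction still spends 5 cycles in flight.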
