Controlling read and write access width to memory-mapped registers in C
I'm using an x86-based core to manipulate a 32-bit memory-mapped register. My hardware behaves correctly only if the CPU generates 32-bit-wide reads and writes to this register. The register is aligned on a 32-bit boundary and is not addressable at byte granularity.
What can I do to guarantee that my C (or C99) compiler will only generate full 32-bit wide reads and writes in all cases?
For example, if I do a read-modify-write operation like this:
volatile uint32_t *p_reg = (volatile uint32_t *)0xCAFE0000;
*p_reg |= 0x01;
I don't want the compiler to get smart about the fact that only the bottom byte changes and generate 8-bit-wide reads or writes. Since the machine code is often denser for 8-bit operations on x86, I'm afraid of unwanted optimizations. Disabling optimizations in general is not an option.
----- EDIT -------
An interesting and very relevant paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
Comments (5)
Your concerns are covered by the volatile qualifier. 6.7.3/6 "Type qualifiers" says:
"An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3."
5.1.2.3 "Program execution" says (among other things):
"In the abstract machine, all expressions are evaluated as specified by the semantics."
This is followed by a sentence that is commonly referred to as the 'as-if' rule, which allows an implementation to not follow the abstract machine semantics if the end result is the same:
"An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."
But 6.7.3/6 essentially says that volatile-qualified types used in an expression cannot have the 'as-if' rule applied: the actual abstract machine semantics must be followed. Therefore, if a pointer to a volatile 32-bit type is dereferenced, the full 32-bit value must be read or written (depending on the operation).
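A minimal sketch of what this answer describes, assuming a hosted toolchain with <stdint.h>. The names REG32, P_REG_ADDR and reg_or32 are hypothetical and not from the thread; the address is the one from the question and is only meaningful on the real hardware. Every access goes through a volatile-qualified 32-bit lvalue, which is what the quoted wording relies on.

```c
#include <stdint.h>

/* Hypothetical macro: view an address as a volatile 32-bit register. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* Address from the question; valid only on the target hardware. */
#define P_REG_ADDR 0xCAFE0000u

/* Read-modify-write through the volatile lvalue:
 * one volatile read, an OR, then one volatile write. */
static inline void reg_or32(uintptr_t addr, uint32_t mask)
{
    REG32(addr) |= mask;
}
```

On the target this would be called as reg_or32(P_REG_ADDR, 0x01u). Off-target, the address of an ordinary uint32_t variable can stand in for the register.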
The ONLY way to GUARANTEE that the compiler will do the right thing is to write your load and store routines in assembler and call them from C. 100% of the compilers I have used over the years can and will get it wrong (GCC included).
Sometimes the optimizer gets you. For example, say you want to store a constant that looks to the compiler like a small number, 0x10 for instance, into a 32-bit register. That is exactly the case you asked about, and it is what I have watched otherwise good compilers get wrong: some decide that an 8-bit write is cheaper than a 32-bit write and change the instruction. Variable-instruction-length targets make this worse, because the compiler is trying to save program space and not just memory cycles on whatever it assumes the bus to be (xor eax,eax instead of mov eax,0, for example).
And with something that is constantly evolving like gcc, code that works today has no guarantee of working tomorrow (you can't even compile some versions of gcc with the current version of gcc). Likewise, code that works with the compiler at your desk may not work universally for others.
Take the guessing and the experimenting out of it, and create load and store functions.
The side benefit is that you create a nice abstraction layer. If and when you want to simulate your code in some fashion, or have it run in application space instead of on the metal (or vice versa), the assembler functions can be replaced with a simulated target, with code that crosses a network to a target with the device on it, and so on.
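One way the load and store functions might look, as a sketch only. It uses GCC-style inline assembly on x86 instead of a separate assembler file, which is a substitution for what the answer describes, and the names hw_load32 and hw_store32 are hypothetical. The fallback branch stands in for the "simulated target" idea and uses plain volatile accesses.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Load through an explicit 32-bit mov (AT&T syntax: source, destination). */
static inline uint32_t hw_load32(const volatile uint32_t *addr)
{
    uint32_t value;
    __asm__ volatile ("movl %1, %0" : "=r"(value) : "m"(*addr) : "memory");
    return value;
}

/* Store through an explicit 32-bit mov from a register to memory. */
static inline void hw_store32(volatile uint32_t *addr, uint32_t value)
{
    __asm__ volatile ("movl %1, %0" : "=m"(*addr) : "r"(value) : "memory");
}

#else

/* Fallback for other compilers or targets: plain volatile accesses,
 * which could equally be swapped for a simulator hook. */
static inline uint32_t hw_load32(const volatile uint32_t *addr)
{
    return *addr;
}

static inline void hw_store32(volatile uint32_t *addr, uint32_t value)
{
    *addr = value;
}

#endif
```

Because callers only see hw_load32 and hw_store32, the bodies can be replaced without touching the driver code.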
Well, generally speaking, I wouldn't expect it to optimize out the high-order bytes if you have the register typed as a 32-bit volatile. Because of the volatile keyword, the compiler cannot assume that the values in the high-order bytes are 0x00, so it must write the full 32 bits even if you are only using an 8-bit literal value. I've never experienced an issue with this on x86 or TI processors, or on other embedded processors. Generally the volatile keyword is enough. The only time things get a little weird is when the processor does not natively support the word size you're trying to write, but that shouldn't be an issue on x86 for a 32-bit number.
While it would be possible for the compiler to generate an instruction stream that used four 8-bit writes, that would not be an optimization in either processor time or instruction space over a single 32-bit write.
If you don't use byte (unsigned char) types when accessing the hardware, there is a better chance that the compiler will not generate 8-bit data transfer instructions.
You would have to read the port as a 32-bit value, modify the value, then write it back.
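The code that originally followed this answer is missing from the page. A sketch of the three steps it describes could look like this; the function name set_bit0 and the choice of bit 0 (taken from the question's example) are assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: sets bit 0 with an explicit 32-bit read,
 * a modification of the local copy, and a 32-bit write back. */
static void set_bit0(volatile uint32_t *port)
{
    uint32_t value = *port;   /* read the port as a 32-bit value */
    value |= 0x01u;           /* modify the local copy only */
    *port = value;            /* write the full 32-bit value back */
}
```

Keeping the intermediate value in a uint32_t local means no byte-sized type ever appears in the expression.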
Because performing a read-modify-write against hardware in several instructions is always risky, most processors offer a way to manipulate a register or memory location with one single instruction that can't be interrupted.
Depending on what type of register you are manipulating, it could change during your modify phase, and then you would write back a wrong value.
If this is critical, I would recommend, as dwelch suggests, writing your own read-modify-write function in assembly.
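As an illustration of a single-instruction read-modify-write, here is a sketch using GCC-style inline assembly on x86. The name hw_or32 is hypothetical. The x86 branch issues one 32-bit or against memory, so an interrupt on the same core cannot land between the read and the write; it has no lock prefix, so it makes no claim about other cores or about the device changing the register itself. The fallback branch is an ordinary volatile read-modify-write and gives no single-instruction guarantee.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* One 32-bit "or" directly on the memory operand (AT&T syntax). */
static inline void hw_or32(volatile uint32_t *addr, uint32_t mask)
{
    __asm__ volatile ("orl %1, %0"
                      : "+m"(*addr)      /* memory operand is read and written */
                      : "ir"(mask)       /* immediate or register source */
                      : "memory", "cc"); /* flags are modified by "or" */
}

#else

/* Fallback: plain volatile read-modify-write, possibly several instructions. */
static inline void hw_or32(volatile uint32_t *addr, uint32_t mask)
{
    *addr |= mask;
}

#endif
```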
I have never heard of a compiler that optimizes a type (that is, performs a type conversion purely as an optimization). If it is declared as an int32, it is always an int32 and will always be correctly aligned in memory. Check your compiler documentation to see how the various optimizations work.
I think I know where your concern comes from: structures. Structures are usually padded to the optimal alignment. This is why you need to wrap a #pragma pack() around them to get them byte-aligned.
You can single-step through the assembly to see how the compiler translated your code. I'm pretty sure it has not changed your type.
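The padding behaviour mentioned here can be checked at compile time. The sketch below assumes a C11 compiler (for _Static_assert), support for #pragma pack(push, 1), and a target where uint32_t has 4-byte alignment, as on x86. The struct and field names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block with default (padded) layout. */
struct dev_regs {
    volatile uint32_t control;
    volatile uint32_t status;
    volatile uint8_t  flags;   /* narrower member: padding follows it */
    volatile uint32_t data;
};

/* Same members, byte-packed. */
#pragma pack(push, 1)
struct dev_regs_packed {
    volatile uint32_t control;
    volatile uint32_t status;
    volatile uint8_t  flags;
    volatile uint32_t data;
};
#pragma pack(pop)

/* The declared width never changes; only member placement differs. */
_Static_assert(sizeof(uint32_t) == 4, "uint32_t is 4 bytes");
_Static_assert(offsetof(struct dev_regs, data) == 12, "3 padding bytes after flags");
_Static_assert(offsetof(struct dev_regs_packed, data) == 9, "no padding when packed");
```

If a layout assumption is wrong for a given compiler or target, the build fails instead of the hardware misbehaving at run time.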