I'm trying to learn assembly (so bear with me) and I'm getting a compile error on this line:
mov byte [t_last], [t_cur]
The error is
error: invalid combination of opcode and operands
I suspect the cause of this error is simply that it's not possible for a mov instruction to move between two memory addresses, but after half an hour of googling I haven't been able to confirm this. Is this the case?
Also, assuming I'm right, that means I need to use a register as an intermediate point for copying memory:
mov cl, [t_cur]
mov [t_last], cl
What's the recommended register to use (or should I use the stack instead)?
Your suspicion is correct, you can't move from memory to memory.
Any general-purpose register will do. Remember to PUSH the register if you are not sure what's inside it, and to POP it back once done.
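A minimal sketch of the question's own two-instruction fix (load into CL, store from CL), assuming x86 or x86-64 and GCC or Clang extended inline asm. GCC's default AT&T syntax puts the source operand first, the reverse of NASM. The helper name `copy_byte_via_cl` is hypothetical. In inline asm, listing ECX as clobbered is what tells the compiler to preserve anything it keeps there; in hand-written NASM you would PUSH/POP instead, as suggested above.

```c
#include <stdint.h>

/* Hypothetical helper: copies one byte through CL, like the question's
 *     mov cl, [t_cur]
 *     mov [t_last], cl
 * Assumes x86/x86-64 with GCC or Clang (AT&T syntax: source, destination). */
static void copy_byte_via_cl(uint8_t *dst, const uint8_t *src)
{
    __asm__ volatile(
        "movb (%1), %%cl\n\t"   /* memory -> register */
        "movb %%cl, (%0)"       /* register -> memory */
        :                       /* no register outputs */
        : "r"(dst), "r"(src)    /* addresses in any general-purpose registers */
        : "ecx", "memory");     /* CL is overwritten; memory is written */
}
```

Two instructions and one scratch register: there is no single MOV form that takes two memory operands.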
It's really simple in 16-bit code; just do the following:
Note: the pushes & pops are necessary if you need to save the contents of the registers.
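The same save/copy/restore pattern can be sketched for x86-64, assuming GCC or Clang inline asm in AT&T syntax and the System V ABI. That ABI lets leaf functions keep locals in the 128-byte "red zone" below RSP, so this sketch steps over it before PUSH and back after POP; a bare PUSH inside inline asm could otherwise overwrite the compiler's data. The helper name `copy_word_saving_cx` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical x86-64 sketch of "push, copy, pop" (GCC/Clang, AT&T syntax).
 * NASM-style equivalent of the core:  push rcx / movzx ecx, word [rsi]
 *                                     mov [rdi], cx / pop rcx            */
static void copy_word_saving_cx(uint16_t *dst, const uint16_t *src)
{
    __asm__ volatile(
        "sub $128, %%rsp\n\t"        /* skip the System V red zone */
        "push %%rcx\n\t"             /* save the caller's RCX */
        "movzwl (%%rsi), %%ecx\n\t"  /* load 16 bits from [src], zero-extended */
        "movw %%cx, (%%rdi)\n\t"     /* store 16 bits to [dst] */
        "pop %%rcx\n\t"              /* restore RCX */
        "add $128, %%rsp"            /* undo the red-zone adjustment */
        :
        : "D"(dst), "S"(src)         /* force dst into RDI, src into RSI */
        : "cc", "memory");           /* SUB/ADD change flags; memory is written */
}
```

In real inline asm you would normally just declare the register as clobbered; the explicit PUSH/POP here only mirrors what hand-written code does.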
That's correct: x86 machine code can't encode an instruction with two explicit memory operands (arbitrary addresses specified in []).
As for which register to use: any register you don't need to save/restore.
In all the mainstream 32-bit and 64-bit calling conventions, EAX, ECX, and EDX are call-clobbered, so AL, CL, and DL are good choices. For a byte or word copy, you typically want a movzx load into a 32-bit register, then an 8-bit or 16-bit store. This avoids a false dependency on the old value of the register. Only use a narrow 16-bit or 8-bit mov load if you actively want to merge into the low bits of another value. x86's movzx is the analogue of instructions like ARM ldrb.
In 64-bit mode, SIL, DIL, r8b, r9b and so on are also fine choices, but they require a REX prefix in the machine code for the store, so there's a minor code-size reason to avoid them.
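A minimal sketch of the movzx-load-then-narrow-store pattern, assuming x86 or x86-64 and GCC or Clang inline asm (AT&T syntax, so `movzbl` is NASM's `movzx r32, byte [mem]`). Instead of hard-coding EAX, the compiler picks the scratch register; the `q` constraint keeps it to one with an 8-bit form on 32-bit targets too. The helper name `copy_byte_movzx` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: byte copy with a zero-extending load and an 8-bit store.
 * NASM equivalent with EAX as the scratch register:
 *     movzx eax, byte [src]   ; writes all of EAX, so no false dependency
 *     mov   [dst], al         ; 8-bit store
 * Assumes x86/x86-64 with GCC or Clang (AT&T syntax). */
static void copy_byte_movzx(uint8_t *dst, const uint8_t *src)
{
    uint32_t tmp;                 /* scratch register chosen by the compiler */
    __asm__ volatile(
        "movzbl (%2), %0\n\t"     /* zero-extending byte load into 32 bits */
        "movb %b0, (%1)"          /* store the low 8 bits of tmp */
        : "=&q"(tmp)              /* early-clobber: written before %1 is read */
        : "r"(dst), "r"(src)
        : "memory");
}
```

The behaviour is identical to a plain 8-bit load; the difference is only in the dependency on the register's previous contents.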
Generally avoid writing AH, BH, CH, or DH for performance reasons, unless you've read and understood the following links and any false dependencies or partial-register merging stalls aren't going to be a problem or happen at all in your code.
As for using the stack instead: first of all, you can't push a single byte at all, so there's no way you could do a byte load / byte store from the stack. For a word, dword, or qword (depending on CPU mode), you could push [src] / pop [dst], but that's a lot slower than copying via a register. It introduces an extra store/reload store-forwarding latency before the data can be read from the final destination, and it takes more uops.
(The exception is when somewhere on the stack is the desired destination and you can't optimize that local variable into a register; in that case, push [src] is just fine for copying it there and allocating stack space for it.)
See https://agner.org/optimize/ and the other x86 performance links in the x86 tag wiki.
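For completeness, a sketch of the qword push [src] / pop [dst] copy, assuming x86-64, the System V ABI, and GCC or Clang inline asm in AT&T syntax. It produces the right result, but as the answer explains it is slower than a register copy because the value makes a store/reload round trip through the stack. The red zone is skipped first for the same reason as any PUSH in inline asm. The helper name `copy_qword_via_stack` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical x86-64 sketch of  push qword [src] / pop qword [dst]
 * (GCC/Clang, AT&T syntax). Correct but slower than going through a register. */
static void copy_qword_via_stack(uint64_t *dst, const uint64_t *src)
{
    __asm__ volatile(
        "sub $128, %%rsp\n\t"   /* skip the System V red zone */
        "pushq (%%rsi)\n\t"     /* memory -> stack */
        "popq (%%rdi)\n\t"      /* stack -> memory */
        "add $128, %%rsp"       /* undo the red-zone adjustment */
        :
        : "D"(dst), "S"(src)    /* force dst into RDI, src into RSI */
        : "cc", "memory");      /* SUB/ADD change flags; memory is written */
}
```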
It is technically possible to move from memory to memory.
Try using MOVS (move string): point [E]SI at the source and [E]DI at the destination, and pick MOVSB, MOVSW, or MOVSD (or MOVSQ in 64-bit mode) depending on whether you want to transfer bytes, words, etc.
Note, however, that this is less efficient than executing MOV twice, but it does perform the copy in a single instruction.
Here's how MOVS should be used, and how it works:
https://www.felixcloutier.com/x86/movs:movsb:movsw:movsd:movsq
The instruction MOVS is almost never used on its own, and is for the most part used in conjunction with a REP prefix.
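A minimal sketch of a single MOVSB, assuming x86 or x86-64 and GCC or Clang inline asm. The source and destination must be in [E/R]SI and [E/R]DI, which the `S` and `D` constraints arrange; after the copy both pointers advance by one byte (forward, since the ABI guarantees the direction flag is clear). The helper name `movsb_once` is hypothetical; it returns the advanced destination pointer to make that side effect visible.

```c
#include <stdint.h>

/* Hypothetical helper: one MOVSB copies the byte at [RSI] to [RDI] and then
 * increments both registers (DF = 0, as the System V ABI guarantees).
 * Assumes x86/x86-64 with GCC or Clang inline asm. */
static uint8_t *movsb_once(uint8_t *dst, const uint8_t *src)
{
    __asm__ volatile(
        "movsb"
        : "+D"(dst), "+S"(src)  /* RDI/RSI are read and updated by MOVSB */
        :
        : "memory");            /* memory at the old dst is written */
    return dst;                 /* now points one byte past the copied byte */
}
```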
Modern CPUs have fairly efficient implementations of rep movs that are close to the speed of a loop using AVX vector load/store instructions. Logically, the copy happens as 48 copies of 4-byte dword chunks, but really modern CPUs (fast strings / ERMSB) will use 16- or 32-byte chunks for efficiency.
This manual explains how REP should be used, and how it works:
https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz
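A minimal sketch of REP MOVSB as a block copy, assuming x86 or x86-64 and GCC or Clang inline asm. The byte count goes in [E/R]CX (the `c` constraint), the pointers in [E/R]SI and [E/R]DI; REP repeats MOVSB, decrementing the count until it reaches zero. The helper name `rep_movsb_copy` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: REP MOVSB copies n bytes from [RSI] to [RDI],
 * counting RCX down to 0 and advancing both pointers (DF = 0 assumed,
 * as the ABI guarantees). Assumes x86/x86-64 with GCC or Clang. */
static void rep_movsb_copy(void *dst, const void *src, size_t n)
{
    __asm__ volatile(
        "rep movsb"
        : "+D"(dst), "+S"(src), "+c"(n)  /* all three are consumed/updated */
        :
        : "memory");                     /* n bytes at dst are written */
}
```

With n set to 1 this reduces to the single MOVSB case; larger counts are where the fast-strings microcode can move wider chunks per step.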
There's also a MOVS instruction for moving data from memory to memory:
Just want to discuss "memory barrier" with you.
In C code,
would be assembled to
The system cannot guarantee the atomicity of the assignment. That's why we need an rmb (read barrier).