Is it safe to overwrite an unaligned immediate operand in machine code while that code is executing?
Let's say I have x86-64 code that looks like this (though this question applies more generally to all code):
mov rbx,7F0140E5247Dh
jmp rbx
Is it safe to overwrite the target constant while that code could be executing, if the constant is not aligned? In other words, could I observe a partially updated jump target, resulting in a jump to a nonexistent address? Additionally, is this safe if the target constant crosses a page or cache-line boundary?
Edit:
I'm only interested in changing single instructions and not changing the instruction boundary locations.
Comments (1)
Only if the write is atomic. On Intel, an unaligned qword store is guaranteed atomic as long as it doesn't span a cache-line boundary, but that is not guaranteed on AMD. The lowest-common-denominator atomicity guarantee is that 8-byte aligned stores are atomic, no more than that.
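As a rough illustration of those cases, the sketch below classifies where an 8-byte immediate sits. The names `fits_in_block`, `classify_imm64`, and the `patch_kind` values are hypothetical, and the 64-byte cache-line size is an assumption that holds on current mainstream x86-64 parts but should be checked for your target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/* True if the n-byte field starting at addr lies entirely inside one
 * naturally aligned block of `block` bytes (block must be a power of two). */
bool fits_in_block(uintptr_t addr, unsigned n, unsigned block)
{
    assert(n > 0 && block != 0 && (block & (block - 1)) == 0);
    uintptr_t mask = ~(uintptr_t)(block - 1);
    return (addr & mask) == ((addr + n - 1) & mask);
}

/* Hypothetical labels for the three situations described in the answer. */
enum patch_kind {
    PATCH_ALIGNED_STORE, /* 8-byte aligned: a plain store is atomic everywhere */
    PATCH_SAME_LINE,     /* misaligned, one cache line: atomic on Intel only */
    PATCH_SPLIT_LINE     /* crosses a cache line: a plain store may tear */
};

enum patch_kind classify_imm64(uintptr_t imm_addr)
{
    if ((imm_addr & 7u) == 0)
        return PATCH_ALIGNED_STORE;
    if (fits_in_block(imm_addr, 8, CACHE_LINE))
        return PATCH_SAME_LINE;
    return PATCH_SPLIT_LINE;
}
```

Calling `fits_in_block(addr, 8, 16)` with the same helper tells you whether the immediate stays inside a single 16-byte aligned block.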
Use an xchg to do a guaranteed-atomic RMW. That will be very slow if the constant itself crosses a cache-line boundary, but correct I believe. (Bus lock, not just a cache lock; so slow there's a perf counter even just for split-lock, and even a CPU feature to make that fault, at least in kernel code, so you can find instances of it in a VM.) And if the constant doesn't span a problematic boundary for whatever CPU this is, it should be as fast as an aligned atomic operation.
Or if your CPU supports AVX, 16-byte aligned SSE/AVX stores are guaranteed atomic on CPUs with AVX. (Only recently documented after years of this being known to be basically safe in practice, but fortunately it's retroactive to all AVX CPUs, no new feature-bit.) So if you can get your constant to line up to not span a 16-byte boundary, you can update it that way. (Overwriting the surrounding bytes with themselves can't cause a problem, unless another thread is also doing updates of another constant very nearby.)
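A minimal sketch of the xchg approach, assuming GCC or Clang and assuming the code page is already mapped writable. `patch_imm64` is a hypothetical helper name, not something from the answer. On x86-64 it uses xchg with a memory operand, which is implicitly locked; on other targets it falls back to a compiler builtin and insists on an aligned slot, since the misalignment guarantee discussed here is x86-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Atomically swap a new 8-byte immediate into place and return the old one.
 * Hypothetical helper: `imm` points at the imm64 field of the instruction. */
uint64_t patch_imm64(void *imm, uint64_t new_target)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* xchg with a memory operand is always locked, so the swap is atomic
     * even if `imm` is misaligned; an operand that crosses a cache line
     * takes the slow split-lock path. */
    uint64_t val = new_target;
    __asm__ volatile("xchgq %0, %1"
                     : "+r"(val), "+m"(*(volatile uint64_t *)imm)
                     :
                     : "memory");
    return val;
#else
    /* Assumption: elsewhere, only an aligned slot is swapped atomically. */
    assert(((uintptr_t)imm & 7u) == 0);
    return __atomic_exchange_n((uint64_t *)imm, new_target, __ATOMIC_SEQ_CST);
#endif
}
```

For the instruction in the question, `imm` would be the address of the mov plus 2, since the immediate follows the REX prefix and opcode byte. The 16-byte AVX store alternative is not shown here.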
If performance matters for this (e.g. doing it more than once a minute or so), probably worthwhile to use some padding or a NOP to get the constant 8-byte aligned, especially if you can just lengthen earlier instructions to not need an actual NOP, or even the mov r64,imm64 itself. (Although it's 10 bytes and the max length for an instruction is 15.)
This does not fully generalize to replacing multiple instructions
In other cases, where you might be replacing a sequence of instructions with one whose instruction boundaries fall in different places, that would be a different story. You say the question applies "more generally", but the answer only generalizes to replacing an immediate, or replacing a whole 4-byte or 8-byte instruction with one of the same length. If another thread could be sleeping or running with RIP inside the region you're writing, you have to consider the case of code-fetch from any possible RIP from the old sequence, after the update. So as I said, changing instruction boundaries is problematic.
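One way to stay inside that limitation is to check, before patching, that the bytes at the patch site really are a mov r64,imm64 and to touch only its 8 immediate bytes. The sketch below is an illustration with a hypothetical helper name, `movabs_imm64`. It assumes the usual assembler encoding: a REX.W prefix (0x48, or 0x49 for r8 to r15) followed by opcode B8+rd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If `insn` starts a 10-byte `mov r64, imm64` (REX.W prefix, opcode B8+rd),
 * return a pointer to its 8-byte immediate; otherwise return NULL.
 * Hypothetical helper: it recognizes only the 0x48/0x49 prefixes that
 * assemblers normally emit for this instruction. */
uint8_t *movabs_imm64(uint8_t *insn)
{
    assert(insn != NULL);
    int rex_w  = (insn[0] & 0xFE) == 0x48; /* 0x48, or 0x49 for r8..r15 */
    int opcode = (insn[1] & 0xF8) == 0xB8; /* 0xB8..0xBF selects the register */
    return (rex_w && opcode) ? insn + 2 : NULL;
}
```

Writing only through the returned pointer leaves the prefix and opcode bytes, and therefore every instruction boundary, exactly where they were.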
But if you respect that limitation, cross-modifying code is AFAIK safe. I think Windows hot-patching quiesces other threads that might be running code, but I don't know why, since it already makes sure there's a single large-enough instruction for it to overwrite. Either they're over-cautious, or there's some risk I'm not aware of with code-fetch not respecting store atomicity. Maybe it's just that they don't want to depend on 2-byte store atomicity in case of unaligned functions, even though aligned functions are the default, for separate reasons, with normal compiler settings.