Is this inline asm approach for stack switching OK?

Posted 2024-12-26 07:04:09

For some functions, I need to switch the stack so that the original stack remains unmodified. For that purpose, I have written two macros as shown below.

#define SAVE_STACK()    __asm__ __volatile__ ( "mov %%rsp, %0; mov %1, %%rsp" : \
"=m" (saved_sp) : "m" (temp_sp) );
#define RESTORE_STACK() __asm__ __volatile__ ( "mov %0, %%rsp" : \
: "m" (saved_sp) );

Here temp_sp and saved_sp are thread-local variables. temp_sp points to the makeshift stack that we use. For a function whose original stack I want to leave unmodified, I place SAVE_STACK at the beginning and RESTORE_STACK at the bottom. For example:

int some_func(int param1, int param2)
{
 int a, b, r;
 SAVE_STACK();
 // Function Body here
 .....................
 RESTORE_STACK();
 return r;
}

Now my question is whether this approach is fine. On x86 (64-bit), local variables and parameters are accessed through the rbp register; rsp is decremented in the function prologue and not touched again until the function epilogue, where it is incremented back to its original value. Therefore, I see no problem here.

I am not sure whether this is correct in the presence of context switches and signals, though (on Linux). I'm also not sure whether it is correct if the function is inlined or if tail call optimization (where jmp is used instead of call) is applied. Do you see any problems or side effects with this approach?

苦妄 2025-01-02 07:04:09

With the code that you've shown above, I can think of the following breakage:

On x86/x64, GCC will decorate your function with a prologue/epilogue if it sees fit, and you can't stop it from doing that (unlike on ARM, where __attribute__((__naked__)) forces code generation without a prologue/epilogue, i.e. without stack frame setup).
That might end up allocating stack space or creating references to stack memory locations before you switch the stack. Even worse, if the compiler chooses to put such an address into a nonvolatile register before you switch the stack, the same variable may then resolve to two different locations: the stack-pointer-relative one, which moved when you changed rsp, and the other-register-relative one, which still points into the old stack.
Again, on x64, the System V ABI permits an optimization for leaf functions (the "red zone"): no stack frame is allocated, yet the 128 bytes below the stack pointer are usable by the function. Unless your memory buffer takes this into account, overruns might occur that you're not expecting.
Signals can be handled on alternate stacks (see sigaltstack()), and doing your own stack switching might make your code uncallable from within signal handlers. It will definitely make it non-reentrant, and, depending on where/how you retrieve the "stack location", it will also be non-thread-safe.
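
To illustrate the sigaltstack() point above, here is a minimal sketch (not the asker's code) of a handler that runs on a separately allocated stack. The names alt_base, on_sigusr1 and run_handler_on_alt_stack, and the 64 KiB size, are assumptions made up for this example; it assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: bounds of the alternate stack we allocate. */
static char *alt_base;
static size_t alt_size;
static volatile sig_atomic_t handler_on_alt_stack;

static void on_sigusr1(int signo)
{
    (void)signo;
    volatile char marker = 0;   /* lives on whichever stack the handler runs on */
    uintptr_t p = (uintptr_t)&marker;
    handler_on_alt_stack =
        (p >= (uintptr_t)alt_base && p < (uintptr_t)alt_base + alt_size);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if it did not,
 * and -1 if the setup failed. The stack stays registered (and allocated)
 * afterwards; a real program would manage its lifetime explicitly. */
int run_handler_on_alt_stack(void)
{
    alt_size = 64 * 1024;       /* assumed to be comfortably above MINSIGSTKSZ */
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    stack_t ss = { .ss_sp = alt_base, .ss_size = alt_size, .ss_flags = 0 };
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa = {0};
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = SA_ONSTACK;   /* without this flag the normal stack is used */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handler_on_alt_stack = 0;
    raise(SIGUSR1);             /* the handler has run by the time raise() returns */
    return handler_on_alt_stack;
}
```

Note that the alternate stack is only used for handlers installed with SA_ONSTACK, so code that switches rsp by hand and is then interrupted by such a handler ends up with three stacks in play.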

In general, if you want to run a specific piece of code on a different stack, why not either:

run it in a different thread (every thread gets a different stack)?
trigger e.g. SIGUSR1 and run your code in a signal handler (which you can configure to use a different stack)?
run it via makecontext() / swapcontext() (see the example in the manpage)?
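
As a rough sketch of the last option, the following runs a function on a heap-allocated stack via makecontext() / swapcontext() and checks that one of its locals really lives there. The names worker, worker_stack and run_on_separate_stack, and the 64 KiB size, are hypothetical choices for this illustration, not part of the question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, worker_ctx;
static char *worker_stack;          /* hypothetical: the stack we hand out */
static size_t worker_stack_size;
static int ran_on_worker_stack;

static void worker(void)
{
    volatile char marker = 0;       /* a local, so it sits on the current stack */
    uintptr_t p = (uintptr_t)&marker;
    ran_on_worker_stack =
        (p >= (uintptr_t)worker_stack &&
         p < (uintptr_t)worker_stack + worker_stack_size);
    /* Returning from here resumes uc_link, i.e. caller_ctx. */
}

/* Returns 1 if worker() ran on the separate stack, 0 if it did not,
 * and -1 if the setup failed. */
int run_on_separate_stack(void)
{
    worker_stack_size = 64 * 1024;
    worker_stack = malloc(worker_stack_size);
    if (worker_stack == NULL)
        return -1;

    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = worker_stack_size;
    worker_ctx.uc_link = &caller_ctx;       /* where to go when worker() returns */
    makecontext(&worker_ctx, worker, 0);

    ran_on_worker_stack = 0;
    if (swapcontext(&caller_ctx, &worker_ctx) != 0)
        return -1;

    free(worker_stack);                     /* worker() has finished by now */
    return ran_on_worker_stack;
}
```

Unlike the hand-written macros, the library saves and restores the full register context here, so the compiler's own use of rsp and rbp inside worker() stays consistent.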

Edit:

Since you say "you want to compare the memory of two processes": again, there are different methods for that, in particular external process tracing. Attach a "debugger" (this can be a process you write yourself that uses ptrace() to control what you want to monitor) and have it handle e.g. breakpoints / checkpoints on behalf of the processes you trace, to perform the validations you need. That would be more flexible as well, because it doesn't require changing the code you inspect.
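
A very small sketch of that idea, assuming Linux and an environment where ptrace() is permitted: a parent reads one word from a stopped child with PTRACE_PEEKDATA. The variable checkpoint_value and the function peek_child_value are invented for this example; a real checker would attach to existing processes and compare larger regions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical datum that the tracer wants to inspect in the tracee. */
static volatile long checkpoint_value = 1;

/* Forks a child that modifies checkpoint_value and stops; the parent then
 * reads the child's copy through ptrace(). Returns the value seen in the
 * child, or -1 if tracing is not possible. */
long peek_child_value(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(1);               /* tracing is not permitted here */
        checkpoint_value = 42;      /* changes only the child's copy */
        raise(SIGSTOP);             /* "checkpoint": hand control to the tracer */
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* After fork() the variable has the same address in both processes. */
    errno = 0;
    long seen = ptrace(PTRACE_PEEKDATA, child, (void *)&checkpoint_value, NULL);
    int peek_failed = (seen == -1 && errno != 0);

    kill(child, SIGKILL);           /* done inspecting; tear the child down */
    waitpid(child, NULL, 0);
    return peek_failed ? -1 : seen;
}
```

The inspected code needs no stack tricks at all in this setup; the only cooperation shown here is the PTRACE_TRACEME call, which a tracer using PTRACE_ATTACH would not need either.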

念﹏祤嫣 2025-01-02 07:04:09

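-fomit-frame-pointer is on by default. Unless you plan to compile with optimization disabled, the assumption that functions don't touch RSP outside the prologue/epilogue is very wrong.

Even if you did use -O3 -fno-omit-frame-pointer, compilers will still move RSP in some cases, although they won't use it to access arguments and local variables. For example, alloca / C99 VLAs, or even calling a function with more than 6 arguments (or more precisely, one whose arguments don't all fit in registers), will move RSP. (Calling such a function might just use mov stores instead, depending on the strategy the compiler chooses.)

Also, the "shrink-wrapping" optimization (where a function delays saving call-preserved registers until after a possible early exit) could mean that your stack switch happens before the compiler is ready to save/restore, and your restore could happen too late or too early. (This was mentioned in a comment by ams.)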
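
The alloca point can be observed directly. The sketch below, which assumes GCC or Clang on x86-64 Linux (downward-growing stack), allocates two alloca() blocks in one function and returns how far the second lies below the first; the function name alloca_stack_drop is made up for this example.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the distance in bytes between two alloca() blocks taken in the
 * same function. Both blocks stay allocated until the function returns,
 * so the stack pointer has to move in the middle of the function body. */
__attribute__((noinline))
intptr_t alloca_stack_drop(size_t n)
{
    volatile size_t runtime_n = n;  /* keep the size opaque to the optimizer */
    size_t len = runtime_n;

    char *first = alloca(len);
    memset(first, 0xAA, len);
    __asm__ __volatile__("" : : "r"(first) : "memory");   /* block must exist */

    char *second = alloca(len);
    memset(second, 0x55, len);
    __asm__ __volatile__("" : : "r"(second) : "memory");

    return (intptr_t)((uintptr_t)first - (uintptr_t)second);
}
```

With the assumptions above, alloca_stack_drop(64) returns at least 64. If rsp had been redirected to a small makeshift buffer before these allocations, they would come out of that buffer instead of the original stack.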