Writing a Linux int 80h system-call wrapper in GNU C inline assembly

Posted on 2024-10-19 13:13:51

I'm trying to use inline assembly.
I read this page http://www.codeproject.com/KB/cpp/edujini_inline_asm.aspx but I can't understand how parameters are passed to my function.

I'm writing a C write example. This is my function header:

write2(char *str, int len){
}

And this is my assembly code:

global write2
write2:
    push ebp
    mov ebp, esp
    mov eax, 4      ;sys_write
    mov ebx, 1      ;stdout
    mov ecx, [ebp+8]    ;string pointer
    mov edx, [ebp+12]   ;string size
    int 0x80        ;syscall
    leave
    ret

What do I have to do to pass that code to the C function? I'm doing something like this:

write2(char *str, int len){
    asm ( "movl 4, %%eax;"
          "movl 1, %%ebx;"
          "mov %1, %%ecx;"
          //"mov %2, %%edx;"
          "int 0x80;"
           :
           : "a" (str), "b" (len)
    );
}

That's because I don't have an output variable, so how do I handle that?
Also, with this code:

global main
main:
    mov ebx, 5866       ;PID
    mov ecx, 9      ;SIGKILL
    mov eax, 37     ;sys_kill
    int 0x80        ;interruption
    ret

How can I put that code inline in my C code, so that I can ask the user for the pid?
This is my attempt so far:

void killp(int pid){
    asm ( "mov %1, %%ebx;"
          "mov 9, %%ecx;"
          "mov 37, %%eax;"
           :
           : "a" (pid)         /* optional */
    );
}

壹場煙雨 2024-10-26 13:13:51

Well, you don't say specifically, but by your post, it appears like you're using gcc and its inline asm with constraints syntax (other C compilers have very different inline syntax). That said, you probably need to use AT&T assembler syntax rather than Intel, as that's what gets used with gcc.

So with the above said, let's look at your write2 function. First, you don't want to create a stack frame, as gcc will create one, so if you create one in the asm code, you'll end up with two frames, and things will probably get very confused. Second, since gcc is laying out the stack frame, you can't access vars with "[ebp + offset]" as you don't know how it's being laid out.

That's what the constraints are for -- you say what kind of place you want gcc to put the value (any register, memory, specific register) and then use "%0", "%1", and so on in the asm code to refer to the operands in order. Finally, if you use explicit registers in the asm code, you need to list them in the 3rd section (the clobber list, after the input constraints) so gcc knows you are using them. Otherwise it might put some important value in one of those registers, and you'd clobber that value.
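To illustrate constraints on their own, outside the system-call setting, here is a minimal sketch that adds two integers. The name add_ints is made up for this example. The asm path assumes an x86 target (32-bit or 64-bit) and falls back to plain C elsewhere so that it still builds. No volatile is used here because the statement only computes a value.

```c
/* Hypothetical helper, not from the question: adds b to a using operands
   chosen by the compiler instead of hard-coded registers. */
int add_ints(int a, int b) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"   /* AT&T order: source first, destination last */
             : "+r" (a)      /* %0: read-write, any general-purpose register */
             : "g" (b)       /* %1: register, memory, or immediate */
             : "cc");        /* addl modifies the condition flags */
#else
    a += b;                  /* non-x86 fallback so the sketch still builds */
#endif
    return a;
}
```

Because no register is named in the template, nothing except the flags has to be listed as clobbered, and the compiler is free to pick whatever register already holds a.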

You also need to tell the compiler that inline asm will or might read from or write to memory pointed-to by the input operands; that is not implied.

So with all that, your write2 function looks like:

void write2(char *str, int len) {
    __asm__ volatile (
        "movl $4, %%eax;"      // SYS_write
        "movl $1, %%ebx;"      // file descriptor = stdout_fd
        "movl %0, %%ecx;"
        "movl %1, %%edx;"
        "int $0x80"
        :: "g" (str), "g" (len)       // input values we MOV from
        : "eax", "ebx", "ecx", "edx", // registers we destroy
          "memory"                    // memory has to be in sync so we can read it
     );
}

Note the AT&T syntax -- src, dest rather than dest, src and % before the register name.

Now this will work, but it's inefficient as it will contain lots of extra movs. In general, you should NEVER use mov instructions or explicit registers in asm code, as you're much better off using constraints to say where you want things and letting the compiler ensure that they're there. That way, the optimizer can probably get rid of most of the movs, particularly if it inlines the function (which it will usually do with optimization enabled, e.g. -O2 or -O3). Conveniently, the i386 machine model has constraints for specific registers, so you can instead do:

void write2(char *str, int len) {
    __asm__ volatile (
        "movl $4, %%eax;"
        "movl $1, %%ebx;"
        "int $0x80"
        :: "c" (str), /* c constraint tells the compiler to put str in ecx */
           "d" (len)  /* d constraint tells the compiler to put len in edx */
        : "eax", "ebx", "memory");
}

or even better

// UNSAFE: the kernel overwrites EAX with the return value, but EAX is only declared as an input, so the compiler may assume it still holds 4
void write2(char *str, int len) {
    __asm__ volatile ("int $0x80"
        :: "a" (4), "b" (1), "c" (str), "d" (len)
        : "memory");
}

Note also the use of volatile which is needed to tell the compiler that this can't be eliminated as dead even though its outputs (of which there are none) are not used. (asm with no output operands is already implicitly volatile, but making it explicit doesn't hurt when the real purpose isn't to calculate something; it's for a side effect like a system call.)

edit

One final note -- this function is doing a write system call, which does return a value in eax -- either the number of bytes written or an error code. So you can get that with an output constraint:

int write2(const char *str, int len) {
    __asm__ volatile ("int $0x80" 
     : "=a" (len)
     : "a" (4), "b" (1), "c" (str), "d" (len),
       "m"( *(const char (*)[])str )       // "dummy" input instead of memory clobber
     );
    return len;
}

All system calls return their result in EAX. Values from -4095 to -1 (inclusive) are negative errno codes; all other values are non-errors. (This applies to all Linux system calls.)
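The return-value convention just described can be decoded in plain C. The sketch below uses a made-up helper name, syscall_ret, and maps a raw kernel result onto the usual libc convention (-1 with errno set on failure). It is only an illustration of the -4095..-1 range check, not how any particular libc implements it.

```c
#include <errno.h>

/* Hypothetical helper: converts a raw kernel return value into the libc
   convention (-1 with errno set on failure, the value itself otherwise). */
long syscall_ret(long raw) {
    if (raw >= -4095 && raw <= -1) {  /* the range reserved for -errno */
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

With the write2 above, a caller could write something like syscall_ret(write2("hi\n", 3)) and then inspect errno if the result is -1.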

If you're writing a generic system-call wrapper, you probably need a "memory" clobber because different system calls have different pointer operands, and might be inputs or outputs. See https://godbolt.org/z/GOXBue for an example that breaks if you leave it out, and this answer for more details about dummy memory inputs/outputs.
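A generic three-argument wrapper along those lines might look like the following sketch. The name syscall3 is an assumption for illustration. The int $0x80 path is only compiled for i386 targets; on anything else the sketch falls back to libc's syscall() and converts its errno convention back to a raw negative code, so that the function behaves the same everywhere.

```c
#define _GNU_SOURCE          /* for syscall() in the fallback path */
#include <errno.h>
#include <sys/syscall.h>     /* SYS_* numbers for the target architecture */
#include <unistd.h>

/* Hypothetical generic wrapper: returns the raw kernel result
   (a value >= 0 on success, or a negative errno code). */
long syscall3(long nr, long a1, long a2, long a3) {
#if defined(__i386__)
    long ret;
    __asm__ volatile ("int $0x80"
        : "=a" (ret)                              /* result comes back in EAX */
        : "a" (nr), "b" (a1), "c" (a2), "d" (a3)
        : "memory");  /* any argument may point to memory the kernel reads or writes */
    return ret;
#else
    /* Not i386: go through libc and undo its errno convention. */
    long ret = syscall(nr, a1, a2, a3);
    return ret == -1 ? -(long)errno : ret;
#endif
}
```

For example, syscall3(SYS_write, 1, (long)"hi\n", 3) would write three bytes to stdout. The "memory" clobber is the price of being generic: the wrapper cannot know which arguments are pointers.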

With this output operand, you need the explicit volatile, because you want exactly one write system call each time the asm statement "runs" in the source. Otherwise the compiler is allowed to assume that the statement exists only to compute its return value, and can eliminate repeated calls with the same input instead of writing multiple lines. (Or remove it entirely if you didn't check the return value.)
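The killp function from the question can follow the same pattern. Here is a sketch, with the name kill2 and the sig parameter chosen for this example. SYS_kill from <sys/syscall.h> is 37 on i386, matching the question's assembly. The int $0x80 path is only compiled for i386 targets, with a libc fallback elsewhere. kill takes no pointer arguments, but the "memory" clobber is kept to be conservative.

```c
#define _GNU_SOURCE          /* for syscall() in the fallback path */
#include <errno.h>
#include <sys/syscall.h>     /* SYS_kill */
#include <unistd.h>

/* Hypothetical wrapper: sends signal sig to process pid.
   Returns 0 on success or a negative errno code. */
int kill2(int pid, int sig) {
#if defined(__i386__)
    int ret;
    __asm__ volatile ("int $0x80"
        : "=a" (ret)                               /* result comes back in EAX */
        : "a" (SYS_kill), "b" (pid), "c" (sig)     /* EBX = pid, ECX = signal */
        : "memory");
    return ret;
#else
    /* Not i386: go through libc and return a raw-style negative errno. */
    return syscall(SYS_kill, pid, sig) == -1 ? -errno : 0;
#endif
}
```

To ask the user for the pid, read it in C (for instance with scanf("%d", &pid)) and call kill2(pid, 9) to send SIGKILL. Passing signal 0 only checks that the process exists and that you may signal it.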
