What is the relation between the carry flag and syscall in assembly (x64 Intel syntax on Mac OS)?

Posted on 2025-01-13 23:49:48


I am new to assembly language, and I have to implement the read function in x64 assembly on macOS.
So far, this is what I have:

;;;;;;ft_read.s;;;;;;

global _ft_read:
section .text
extern ___error

_ft_read:
    mov rax, 0x2000003 ; store syscall value of read on rax 
    syscall            ; call read and pass to it rdi , rsi, rdx  ==> rax read(rdi, rsi, rdx)
    cmp rax, 103       ; compare rax with 103 by subtracting 103 from rax ==> rax - 103
    jl _ft_read_error  ; if the result of cmp is less than 0 then jump to _ft_read_error
    ret                ; else return the rax value which is btw the return value of syscall

_ft_read_error:
    push rax
    call ___error
    pop rcx
    mov [rax], rcx
    mov rax, -1
    ret

As you can see above, I invoke read with syscall and then compare the value it returns in rax with 103. I will explain why I compare against 103, but first let me explain something else: errno. This is what the macOS manual page says about errno:

When a system call detects an error, it returns an integer value indicating
failure (usually -1) and sets the variable errno accordingly. <This
allows interpretation of the failure on receiving a -1 and to take action
accordingly.> Successful calls never set errno; once set, it remains
until another error occurs. It should only be examined after an error.
Note that a number of system calls overload the meanings of these error
numbers, and that the meanings must be interpreted according to the type
and circumstances of the call.
The following is a complete list of the errors and their names as given
in <sys/errno.h>.
0 Error 0. Not used.
1 EPERM Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the
owner of a file or other resources.
2 ENOENT No such file or directory. A component of a specified pathname did not exist, or the pathname was an empty string.
..................................................I'll skip this part (I wrote this line btw)..................................................
101 ETIME STREAM ioctl() timeout. This error is reserved for future use.
102 EOPNOTSUPP Operation not supported on socket. The attempted operation is not supported for the type of socket referenced; for example, trying to accept a connection on a datagram socket.

As I understand it, and after a lot of debugging with lldb, I noticed that syscall returns one of the numbers listed in the errno man page. For example, when I pass a bad file descriptor to my ft_read function, using the main.c code below like this:

int bad_file_des = -1337; // a file descriptor that of course doesn't exist; you can replace it with -42 if you like
ft_read(bad_file_des, buff, 300);

our syscall returns 9, which is stored in rax, so I check whether rax < 103 (because errno values range from 0 to 102) and, if so, jump to ft_read_error, because that's what it should do.

Everything works as I planned, but a problem came out of nowhere: when I open an existing file and pass its file descriptor to my ft_read function, as shown in the main.c below, the read syscall returns the number of bytes read, which is what the manual describes:

On success, the number of bytes read is returned (zero indicates end
of file), and the file position is advanced by this number. It is
not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to end-of-
file, or because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal. See also NOTES.
On error, -1 is returned, and errno is set appropriately. In this
case, it is left unspecified whether the file position (if any)
changes.

In my main, I pass my ft_read function a good file descriptor, a buffer to store the data, and 50 as the number of bytes to read, so syscall returns 50 in rax. The comparison then does its job: rax = 50 < 103, so it jumps to ft_read_error even though there is no error, just because 50 also happens to be one of the errno error numbers, which it is not in this case.
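The false positive described here can be modeled in plain C. This is only a sketch under stated assumptions: `value_check_flags_error` is a hypothetical helper that mirrors the `cmp rax, 103` / `jl` pair (a signed comparison); it is not part of the original program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "cmp rax, 103 ; jl error":
 * jl is a signed "less than", so rax is treated as a signed 64-bit value. */
bool value_check_flags_error(int64_t rax)
{
    return rax < 103;
}
```

In this model, 9 (a real error code for a bad descriptor) and 50 (a successful 50-byte read) are both flagged as errors, while a successful read of 300 bytes is not. The value in rax alone cannot tell a small byte count from an error number.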

Someone suggested using jc (jump if the carry flag is set) rather than jl (jump if less), as shown in the code below:

;;;;;;ft_read.s;;;;;;

global _ft_read:
section .text
extern ___error

_ft_read:
    mov rax, 0x2000003 ; store syscall value of read on rax 
    syscall            ; call read and pass to it rdi , rsi, rdx  ==> rax read(rdi, rsi, rdx)
                       ; deleted the cmp
    jc _ft_read_error  ; if carry flag is set then jump to _ft_read_error
    ret                ; else return the rax value which is btw the return value of syscall

_ft_read_error:
    push rax
    call ___error
    pop rcx
    mov [rax], rcx
    mov rax, -1
    ret

And guess what: it works perfectly. With my ft_read, errno is 0 when there is no error, and it holds the appropriate error number when there is one.

The problem is that I don't know why the carry flag gets set when there is no cmp. Does syscall set the carry flag when an error occurs during the call, or is something else happening in the background? I would like a detailed explanation of the relation between syscall and the carry flag. I am still new to assembly and I badly want to learn it. Thanks in advance.

What is the relation between syscall and the carry flag, and how does syscall set it?

This is the main.c that I compile together with the assembly code above:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>

ssize_t ft_read(int fildes, void *buf, size_t nbyte);

int     main()
{
    /*-----------------------------------------------------------------------*/
    ///////////////////////////////////////////////////////////////////////////
    /********************************ft_read**********************************/
    int     fd = open("./main.c", O_RDONLY);
    char    *buff = calloc(sizeof(char), 50 + 1);
    int     ret = ft_read(fd, buff, 50);

    printf("ret value = %d,  error value = %d : %s\n", ret, errno, strerror(errno));
    //don't forget to free ur buffer bro, this is just a test main don't be like me.
    return (0);
}

牵你的手，一向走下去 2025-01-20 23:49:48

Part of the confusion is that the term "system call" is used for two things that are really different:

1. The actual request to the kernel to read from a file, as invoked by executing the syscall instruction.
2. The C function read(), provided by the userspace C library as a way for C programs to conveniently access the functionality of #1.

The man page documents how to use #2, but in assembly you are working with #1. The overall semantics are the same, but the details of how you access them are different.

In particular, the C function (#2) follows the convention that errors are indicated by returning -1 from the function and setting the variable errno. However, this is not a convenient way for #1 to indicate errors. errno is a global (or thread-local) variable located somewhere in the program's memory; the kernel doesn't know where, and it would be awkward to tell it, so the kernel can't easily write this variable directly. It's simpler for the kernel to return error codes some other way, and leave it up to the C library to set the errno variable.
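The C-level convention (#2) can be observed directly from a C program. The following is a minimal sketch; `errno_after_bad_read` is a hypothetical helper name, and it assumes a POSIX system where -1 is never a valid file descriptor.

```c
#include <errno.h>
#include <unistd.h>

/* Calls the C library's read() on an invalid descriptor and
 * returns the errno value the library wrapper left behind. */
int errno_after_bad_read(void)
{
    char buf[16];

    errno = 0;
    ssize_t n = read(-1, buf, sizeof buf); /* -1 is never a valid descriptor */
    if (n == -1)
        return errno; /* set by the library wrapper, not written by the kernel */
    return 0;
}
```

On a POSIX system this should return EBADF: the function result is -1, and the actual error code is delivered separately through errno.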

The convention that BSD-based operating systems generally follow is that the kernel system call (#1) will set or clear the carry flag according to whether an error occurred. If no error occurred, rax contains the system call's return value (here, the number of bytes read); if an error did occur, eax contains the error code (it's normally a 32-bit value, since errno is an int). So if you are writing in assembly, that is what you should expect to see.
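As a rough model of this convention, the pair (carry flag, rax) can be thought of as a small struct that the C library translates into the "-1 plus errno" form. The names `raw_syscall_result` and `to_libc_convention` below are hypothetical and only illustrate the idea; real library code does this in assembly right after the syscall instruction.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical snapshot of what user space sees right after syscall. */
struct raw_syscall_result {
    uint64_t rax;   /* return value on success, error code on failure */
    int      carry; /* nonzero if the kernel set the carry flag */
};

/* Translate the kernel convention into the C library convention. */
ssize_t to_libc_convention(struct raw_syscall_result r)
{
    if (r.carry) {
        errno = (int)(uint32_t)r.rax; /* only eax carries the error code */
        return -1;
    }
    return (ssize_t)r.rax;
}
```

With this model, a result of rax = 50 with carry clear becomes a return value of 50 and leaves errno untouched, while rax = 9 with carry set becomes -1 with errno set to 9.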

As to how the kernel manages to set/clear the carry flag, when the system call is complete, the kernel executes the sysret instruction to transfer control back to user space. One of the functions of this instruction is to restore the rflags register from r11. The kernel will have saved your process's original rflags when the system call began, so it merely has to set or clear the low-order bit (that's where the carry flag is) in this 64-bit value before or after loading it into r11 in preparation for sysret. Then when your process continues execution with the instruction following your syscall, the carry flag will be in the corresponding state.
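The bit manipulation involved is small. As a sketch only, and not actual kernel source, the helper below (a hypothetical name, `rflags_for_return`) shows how a saved 64-bit rflags value could be adjusted before being handed back to user space; the only hardware fact it relies on is that the carry flag is bit 0 of rflags.

```c
#include <stdint.h>

#define RFLAGS_CF UINT64_C(0x1) /* carry flag is bit 0 of rflags */

/* Return the rflags value to restore on return to user space:
 * carry set if the system call failed, carry clear otherwise. */
uint64_t rflags_for_return(uint64_t saved_rflags, int error)
{
    if (error)
        return saved_rflags | RFLAGS_CF;
    return saved_rflags & ~RFLAGS_CF;
}
```

For example, a saved value of 0x202 becomes 0x203 when an error is reported, and 0x203 becomes 0x202 when the call succeeds; all other flag bits are left as they were.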

The cmp instruction is certainly one of the ways that an x86 CPU can set the carry flag, but it's by no means the only way. And even if it were, it shouldn't surprise you not to see that code in the userspace program, since it's the kernel that determines how it is set.

In order to implement #2, the C library's read() function needs to interface between the kernel's convention (#1) and what the C programmer is expecting (#2), so they have to write some code to check the carry flag and populate errno if needed. Their code for this function could look something like the following:

    global read
read:
    mov rax, 0x2000003
    ; fd, buf, count are in rdi, rsi, rdx respectively
    syscall
    jc read_error
    ; no error, return value is in rax which is where the C caller expects it
    ret
read_error:
    ; error occurred, eax contains error code
    mov [errno], eax
    ; C caller expects return value of -1
    mov rax, -1 
    ret

There is some more info in the question "64-bit syscall documentation for MacOS assembly". I wish I could cite some more authoritative documentation, but I don't know where to find it. What's here seems to be "common knowledge".
