Reading a file descriptor from two threads simultaneously

Posted on 2024-10-18 20:33:19

  1. My question: in Linux (and in FreeBSD, and in UNIX generally), is it possible/legal to read a single file descriptor simultaneously from two threads?

  2. I searched but found nothing, although many people ask a similar question about reading from and writing to a socket fd at the same time (meaning reading while another thread is writing, not reading while another thread is reading). I have also read some man pages and found no clear answer to my question.

  3. Why I ask. I tried to implement a simple program that counts lines in stdin, like wc -l. I was actually testing my home-made C++ I/O engine for overhead and discovered that wc is 1.7 times faster. I trimmed down some of the C++ and came closer to wc's speed but didn't reach it. Then I experimented with the input buffer size and optimized it, but wc was still clearly a bit faster. Finally I created 2 threads that read the same STDIN_FILENO in parallel, and this at last was faster than wc! But the line count became incorrect, so I suppose the reads return some unexpected junk. Doesn't the kernel care what a process reads?

Edit: I did some research and found only that calling read directly via syscall does not change anything. The kernel code seems to do some synchronization, but I didn't understand much of it (read_write.c).

Comments (3)

惜醉颜 2024-10-25 20:33:19

The behavior is unspecified. POSIX says:

The read() function shall attempt to read nbyte bytes from the file
associated with the open file descriptor, fildes, into the buffer
pointed to by buf. The behavior of multiple concurrent reads on the
same pipe, FIFO, or terminal device is unspecified.

彩扇题诗 2024-10-25 20:33:19

About accessing a single file descriptor concurrently (i.e. from multiple threads or even processes), I'm going to cite POSIX.1-2008 (IEEE Std 1003.1-2008), Subsection 2.9.7 Thread Interactions with Regular File Operations:

2.9.7 Thread Interactions with Regular File Operations

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links:

[…] read() […]

If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. […]

At first glance, this looks quite good. However, I hope you did not miss the restriction "when they operate on regular files or symbolic links".

@jarero cites:

The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.

So I assume we implicitly agree: it depends on the type of file you are reading. You said you read from STDIN. If your STDIN is a regular file, you can access it concurrently. Otherwise you shouldn't.
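Since the answer hinges on the file type, a program can check it before deciding whether to read concurrently. The following is a minimal sketch using fstat() and the S_ISREG macro; the helper names fd_is_regular_file and stdin_is_regular_file are made up for illustration and are not part of any library.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd refers to a regular file,
 * 0 if it refers to anything else (pipe, FIFO, terminal, socket, ...),
 * and -1 if fstat() fails (for example, on an invalid descriptor). */
int fd_is_regular_file(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    return S_ISREG(st.st_mode) ? 1 : 0;
}

/* Hypothetical convenience wrapper for the case discussed here. */
int stdin_is_regular_file(void)
{
    return fd_is_regular_file(STDIN_FILENO);
}
```

With `./prog < file.txt` the check would be expected to report a regular file, while `cat file.txt | ./prog` hands the program a pipe, which is the case POSIX leaves unspecified for concurrent reads.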

伤感在游骋 2024-10-25 20:33:19

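When used with a descriptor (fd), read() and write() rely on the internal state of the fd to know the "current offset" at which the read or write will occur. As a result, concurrent calls on the same descriptor from several threads compete for that one offset, and in that sense they aren't thread-safe.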

To allow a single descriptor to be used by multiple threads simultaneously, pread() and pwrite() are provided. With those interfaces, the descriptor and the desired offset are specified, so the "current offset" in the descriptor isn't used.
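As an illustration of the pread() approach applied to the line-counting problem from the question, here is a rough sketch, not a tuned implementation. It assumes the descriptor refers to a regular file (so its size is known and it can be split into byte ranges) and counts '\n' bytes the way wc -l does. The names count_lines_parallel, count_range, range_job and CHUNK are invented for this example.

```c
#define _XOPEN_SOURCE 700   /* ask for the pread() declaration */
#include <pthread.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 65536         /* per-thread buffer size, chosen arbitrarily */

struct range_job {
    int fd;
    off_t start;            /* first byte of this thread's range */
    off_t end;              /* one past the last byte */
    long long newlines;     /* result, or -1 on read error */
};

/* Thread body: count '\n' bytes in [start, end) using explicit offsets. */
static void *count_range(void *arg)
{
    struct range_job *job = arg;
    char buf[CHUNK];
    off_t pos = job->start;
    long long count = 0;

    while (pos < job->end) {
        size_t want = sizeof buf;
        if ((off_t)want > job->end - pos)
            want = (size_t)(job->end - pos);
        ssize_t got = pread(job->fd, buf, want, pos);
        if (got < 0) {
            job->newlines = -1;
            return NULL;
        }
        if (got == 0)       /* file is shorter than expected */
            break;
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == '\n')
                count++;
        pos += got;
    }
    job->newlines = count;
    return NULL;
}

/* Returns the number of '\n' bytes in the file, or -1 on any error
 * (including fd not being a regular file). */
long long count_lines_parallel(int fd, int nthreads)
{
    struct stat st;

    if (nthreads < 1)
        return -1;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode))
        return -1;

    struct range_job *jobs = malloc(sizeof *jobs * (size_t)nthreads);
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (jobs == NULL || tids == NULL) {
        free(jobs);
        free(tids);
        return -1;
    }

    off_t slice = st.st_size / nthreads;
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        jobs[i].fd = fd;
        jobs[i].start = slice * i;
        /* The last thread also takes the remainder of the division. */
        jobs[i].end = (i == nthreads - 1) ? st.st_size : slice * (i + 1);
        jobs[i].newlines = -1;
        if (pthread_create(&tids[i], NULL, count_range, &jobs[i]) != 0)
            break;
        started++;
    }

    long long total = 0;
    int failed = (started != nthreads);
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (jobs[i].newlines < 0)
            failed = 1;
        else
            total += jobs[i].newlines;
    }

    free(jobs);
    free(tids);
    return failed ? -1 : total;
}
```

Because every thread passes its own offset to pread(), no thread depends on or moves the descriptor's shared "current offset", so the ranges cannot overlap or leave gaps. A caller might invoke it as `count_lines_parallel(STDIN_FILENO, 2)` when stdin is redirected from a file; for a pipe or terminal the function returns -1 and a single-threaded read() loop would be the fallback. Build with the compiler's pthread option (for example `-pthread` with GCC or Clang).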
