Non-blocking file end-of-file

Posted 2024-11-01 00:55:41, 26 words, 4 views, 0 comments

How is end of file detected for a file in nonblocking mode?


Comments (5)

百思不得你姐 2024-11-08 00:55:41

At least on POSIX (including Linux), the obvious answer is that nonblocking regular files don't exist. Regular files ALWAYS block, and O_NONBLOCK is silently ignored.

Similarly, poll()/select() et al. will always tell you that an fd pointing to a regular file is ready for I/O, regardless of whether the data is already in the page cache or still on disk (mostly relevant for reading).

EDIT: Since O_NONBLOCK is a no-op for regular files, a read() on a regular file will never set errno to EAGAIN, contrary to what another answer to this question claims.
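To make that concrete, here is a minimal Python sketch, assuming a POSIX system where os.O_NONBLOCK is available. The helper names read_all_nonblocking and demo are made up for this illustration. It reads a regular file through an O_NONBLOCK descriptor and relies only on a zero-length read to detect end of file; if the flag had any effect, os.read would raise BlockingIOError (EAGAIN) at some point.

```python
import os
import tempfile

def read_all_nonblocking(path, chunk=4096):
    """Read a regular file through an O_NONBLOCK descriptor until EOF."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    chunks = []
    try:
        while True:
            # On a pipe or socket this could raise BlockingIOError (EAGAIN);
            # on a regular file the flag is expected to have no effect.
            data = os.read(fd, chunk)
            if not data:
                # A zero-length read is the end-of-file indication.
                break
            chunks.append(data)
    finally:
        os.close(fd)
    return b"".join(chunks)

def demo():
    # Hypothetical demo: write a small temp file, then read it back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello, regular file\n")
        path = tmp.name
    try:
        return read_all_nonblocking(path)
    finally:
        os.unlink(path)
```

In other words, EOF detection on a regular file looks the same with or without O_NONBLOCK: read until the call returns nothing.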

EDIT2 References:

From the POSIX (p)select() specification: "File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions."

From the POSIX poll() specification: "Regular files shall always poll TRUE for reading and writing."
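The quoted behaviour can be observed with a short Python sketch, assuming a POSIX system (select.select only accepts arbitrary file descriptors there). The function name regular_file_readiness is hypothetical. It positions a regular file at its end, asks select() about it with a zero timeout, and then reads.

```python
import os
import select
import tempfile

def regular_file_readiness():
    """Return (readable, writable, data_read) for a regular file at EOF."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x")
        f.flush()
        fd = f.fileno()
        # Move to the end so that there is nothing left to read.
        os.lseek(fd, 0, os.SEEK_END)
        # Zero timeout: just ask for the current readiness state.
        readable, writable, _ = select.select([fd], [fd], [], 0)
        data = os.read(fd, 1)
        return fd in readable, fd in writable, data
```

On a conforming system the descriptor is reported as both readable and writable even though the following read returns no data, so select() cannot be used to wait for a regular file to grow.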

The above suffices to imply that, while perhaps not strictly prohibited, non-blocking regular files don't make sense, as there would be no way to poll them except by busy-waiting.

Beyond the above, there is at least some circumstantial evidence:

From the POSIX open() specification: The behavior for file descriptors referring to pipes, block special files, and character special files is defined. "Otherwise, the behavior of O_NONBLOCK is unspecified."

Some related links:

http://tinyclouds.org/iocp-links.html

http://www.remlab.net/op/nonblock.shtml

http://davmac.org/davpage/linux/async-io.html

And, even here on stackoverflow:

Can regular file reading benefited from nonblocking-IO?

As the answer by R. points out, due to how page caching works, non-blocking for regular files is not very easily defined. E.g. what if by some mechanism you find out that data is ready for reading in the page cache, and then before you read it the kernel decides to kick that page out of cache due to memory pressure? It's different for things like sockets and pipes, because correctness requires that data is not discarded just like that.

Also, how would you select/poll for a seekable file descriptor? You'd need some new API that supported specifying which byte range in the file you're interested in. And the kernel implementation of that API would tie into the VM system, as it would need to prevent the pages you're interested in from being kicked out. That would imply that those pages count against the process's locked-pages limit (see ulimit -l) in order to prevent a DoS. And when would those pages be unlocked? And so on.

夜深人未静 2024-11-08 00:55:41

This is a really good question. A non-blocking socket signals EOF by returning an empty string from recv(); when there is simply no data available yet, it raises socket.error instead. For files, though, there doesn't seem to be any direct indicator that's available to Python.

The only mechanism I can think of for detecting EOF is to compare the current position of the file to the overall file size after receiving an empty string:

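import os

def read_nonblock(fd):
    # os.read returns an empty result at EOF ('' on Python 2, b'' on Python 3)
    t = os.read(fd, 4096)
    if not t:
        # Confirm EOF: the current offset equals the file size
        if os.fstat(fd).st_size == os.lseek(fd, 0, os.SEEK_CUR):
            raise Exception("EOF reached")
    return t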

This, of course, assumes that regular files in non-blocking mode will actually return immediately rather than wait for data to be read from the disk. I'm not sure if that's true on Windows or Linux. It'd be worth testing but I wouldn't be surprised if reading of regular files even in non-blocking mode only returns an empty string when the actual EOF is encountered.

万水千山粽是情ミ 2024-11-08 00:55:41

A nice trick that works well in C++ (YMMV): if the amount of data returned is less than the size of the buffer (i.e. the buffer is not full), you can safely assume that the transaction has completed. There is then a 1/buffersize probability that the last part of the file completely fills the buffer, so for a large buffer size you can be reasonably sure that the transaction will end with a non-filled buffer. So if you test the quantity of data returned against the buffer size and they are not equal, you know that either an error occurred or the transaction is complete. Not sure if this will translate to Python, but that is my method for spotting EOFs.
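For comparison, here is a hedged Python sketch of a read loop that does not depend on the last chunk being short. This is a swapped-in alternative, not the trick described above: it keeps reading until a read returns nothing, which also covers the case where the file length is an exact multiple of the buffer size. The name copy_until_eof is hypothetical, and the sketch assumes src is a blocking binary stream.

```python
import io

def copy_until_eof(src, dst, bufsize=4096):
    """Copy src to dst, treating only a zero-length read as end of file."""
    total = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            # Empty result: nothing more to read, regardless of how full
            # the previous chunk was.
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# Usage: 8 bytes with a 4-byte buffer, so the last chunk is a full one.
src = io.BytesIO(b"a" * 8)
dst = io.BytesIO()
copied = copy_until_eof(src, dst, bufsize=4)
```

The cost is one extra read call at the end of the data; in exchange, the full-buffer edge case needs no special handling.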

浅笑轻吟梦一曲 2024-11-08 00:55:41

Doesn't select tell you there is something to read even if it's just the EOF? If it tells you there is something to read and you don't get anything back, then it must be EOF. I believe this to be the case for sockets.
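A small Python sketch of that behaviour, using a pipe rather than a socket so that it is self-contained. It assumes a POSIX system and Python 3.5 or later for os.set_blocking; both function names are made up for this illustration. The first function shows select() reporting the read end as ready once the writer has closed, with the final read returning nothing; the second shows the contrasting "no data yet" case on an open pipe.

```python
import os
import select

def pipe_reads_until_eof():
    """Return every result of os.read on a pipe whose writer has closed."""
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)
        os.write(w, b"hi")
        os.close(w)
        w = None
        seen = []
        while True:
            ready, _, _ = select.select([r], [], [], 1.0)
            if r not in ready:
                raise TimeoutError("pipe never became readable")
            data = os.read(r, 4096)
            seen.append(data)
            if not data:
                # Readable but empty: the writer is gone, this is EOF.
                break
        return seen
    finally:
        os.close(r)
        if w is not None:
            os.close(w)

def empty_pipe_would_block():
    """Return True if reading an empty, still-open pipe raises EAGAIN."""
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)
        try:
            os.read(r, 1)
        except BlockingIOError:
            return True
        return False
    finally:
        os.close(r)
        os.close(w)
```

For pipes and sockets the two situations are therefore distinguishable: an exception means "try again later", an empty result means EOF.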

骄兵必败 2024-11-08 00:55:41

For files, setting the file descriptor as non-blocking does nothing - all IO is done blocking anyway.

If you really need non-blocking file IO, you need to look into aio_read and friends, which are the asynchronous IO facility for file access. These are pretty non-portable and work somewhat flakily at times - so most projects have actually decided to use a separate process (or thread) for IO and just use blocking IO there.
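The separate-thread approach might look like the following Python sketch. It uses the standard concurrent.futures module; read_file and demo are hypothetical names, and the single-worker pool is just an assumption for the illustration. The blocking read happens on a worker thread while the calling thread stays free until it asks for the result.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    """Plain blocking read; meant to run on a worker thread."""
    with open(path, "rb") as f:
        return f.read()

def demo():
    # Hypothetical demo: create a temp file and read it in the background.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"payload")
        path = tmp.name
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(read_file, path)
            # The calling thread could do other work here.
            return future.result(timeout=5)
    finally:
        os.unlink(path)
```

From the caller's point of view the read is asynchronous, yet the code doing the IO only ever sees the ordinary blocking semantics, including the usual empty read at EOF.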

Then again, maybe you are interested in somehow "select"-ing a file so that you get notified when the file grows. As you've probably realized, select, poll, etc. do not work for this. Most software does this simply by polling the file every second or so - for example, "tail -f" does its magic by polling. However, you can also get the kernel to notify you when the file is written to - this is done with inotify and friends. There are some handy libraries wrapping all this up for you so you don't have to muck around with the specifics yourself: for Python, inotifyx and pyinotify.
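The polling variant can be sketched in a few lines of Python. This is only an illustration of the idea, not how tail is actually implemented; poll_for_growth, its parameters and demo are all made-up names. It compares the file size against the offset already consumed and sleeps between checks.

```python
import os
import tempfile
import time

def poll_for_growth(path, offset, interval=1.0, attempts=5):
    """Poll until the file grows past offset; return (new_bytes, new_offset).

    Returns (b"", offset) if nothing new shows up within the given attempts.
    """
    for _ in range(attempts):
        if os.stat(path).st_size > offset:
            with open(path, "rb") as f:
                f.seek(offset)
                data = f.read()
            return data, offset + len(data)
        time.sleep(interval)
    return b"", offset

def demo():
    # Hypothetical demo: read the initial content, append, then read again.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"first\n")
        path = tmp.name
    try:
        first, offset = poll_for_growth(path, 0, interval=0.01)
        with open(path, "ab") as f:
            f.write(b"second\n")
        second, offset = poll_for_growth(path, offset, interval=0.01)
        idle, offset = poll_for_growth(path, offset, interval=0.01, attempts=2)
        return first, second, idle
    finally:
        os.unlink(path)
```

Here reaching the end of the file is not treated as final: an empty result only means nothing new has been written yet. The sketch ignores truncation and file replacement, which a real follower would need to handle.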
