The read() system call copies data instead of passing a reference
The read() system call causes the kernel to copy the data instead of passing the buffer by reference. I was asked the reason for this in an interview. The best answers I could come up with were:
- To avoid concurrent writes on the same buffer across multiple processes.
- If the user-level process tries to access a buffer mapped into the kernel's virtual memory area, it will result in a segfault.
As it turns out, the interviewer was not entirely satisfied with either of these answers. I would greatly appreciate it if anybody could elaborate on the above.
A zero-copy implementation would mean that the user-level process would have to be given access to the buffers used internally by the kernel/driver for reading. The user would then have to make an explicit call to the kernel to free each buffer after they were done with it.
Depending on the type of device being read from, the buffers could be more than just an area of memory. (For example, some devices could require the buffers to be in a specific area of memory. Or they could only support writing to a fixed area of memory given to them at startup.) In this case, failure of the user program to "free" those buffers (so that the device could write more data to them) could cause the device and/or its driver to stop functioning properly, something a user program should never be able to do.
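To make this failure mode concrete, here is a toy Python model. It is entirely hypothetical: FixedBufferDevice, device_receive, zero_copy_read, release and copying_read are made-up names, not a real kernel or driver API. The "device" can only write into a fixed set of buffers handed to it at startup. With the zero-copy path, a user who never releases a buffer starves the device; with the read()-style path, the data is copied out and the slot is recycled immediately.

```python
from __future__ import annotations

from collections import deque


class FixedBufferDevice:
    """Toy model (hypothetical, not a real API) of a device that can only
    write into a fixed set of buffers handed to it at startup."""

    def __init__(self, nbufs: int, size: int) -> None:
        self.buffers = [bytearray(size) for _ in range(nbufs)]
        self.free = deque(range(nbufs))  # slots the device may fill
        self.ready = deque()             # (slot, length) pairs awaiting the user

    def device_receive(self, data: bytes) -> bool:
        """The device writes incoming data; False means no free slot (data dropped)."""
        if not self.free:
            return False
        slot = self.free.popleft()
        n = min(len(data), len(self.buffers[slot]))
        self.buffers[slot][:n] = data[:n]
        self.ready.append((slot, n))
        return True

    def zero_copy_read(self) -> tuple[int, memoryview]:
        """Hand the user a view of the device's own buffer; nothing is copied."""
        slot, n = self.ready.popleft()
        return slot, memoryview(self.buffers[slot])[:n]

    def release(self, slot: int) -> None:
        """The explicit 'free' call a zero-copy design would require."""
        self.free.append(slot)

    def copying_read(self, out: bytearray) -> int:
        """read()-style: copy into caller-owned memory, recycle the slot at once."""
        slot, n = self.ready.popleft()
        n = min(n, len(out))
        out[:n] = self.buffers[slot][:n]
        self.free.append(slot)
        return n


if __name__ == "__main__":
    dev = FixedBufferDevice(nbufs=2, size=16)
    dev.device_receive(b"packet-1")
    dev.device_receive(b"packet-2")
    slot, view = dev.zero_copy_read()
    print(bytes(view))                      # b'packet-1'
    print(dev.device_receive(b"packet-3"))  # False: stalled until release(slot)
```

In this model the copying path never depends on the user's cooperation, which is the property the answer argues a kernel must preserve.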
The buffer is specified by the caller, so the only way to get the data there is to copy it. The API is defined this way for historical reasons.
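A minimal Python sketch of the caller-supplied buffer, assuming a POSIX system and that CPython's unbuffered FileIO.readinto() passes the caller's memory to a single read(2) call. The function name read_then_scribble and the sample data are hypothetical. The kernel copies into whatever memory the caller names, even at an arbitrary, unaligned offset, and afterwards that copy belongs to the caller alone.

```python
from __future__ import annotations

import os
import tempfile


def read_then_scribble(data: bytes, offset: int) -> tuple[bytes, bytes, bytes]:
    """Return (what read() delivered, the buffer after we overwrite it,
    the file contents afterwards)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        with open(path, "wb") as w:
            w.write(data)

        # Caller-owned memory; the target region starts at an arbitrary offset.
        buf = bytearray(offset + len(data))
        with open(path, "rb", buffering=0) as raw:      # unbuffered FileIO
            n = raw.readinto(memoryview(buf)[offset:])  # read(2) copies into buf[offset:]
        if n != len(data):
            raise OSError(f"short read: {n} of {len(data)} bytes")
        delivered = bytes(buf[offset:offset + n])

        # The copy is ours: overwriting it does not change the file.
        buf[offset:offset + n] = bytes(n)
        with open(path, "rb") as f:
            on_disk = f.read()
    return delivered, bytes(buf[offset:]), on_disk


if __name__ == "__main__":
    print(read_then_scribble(b"hello, kernel", offset=3))
```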
Note that your two points above are not a problem for the alternative, mmap, which does pass the buffer by reference (and, with a shared mapping, writing to the buffer then writes to the file, so you can't process the data in place, whereas many users of read() do exactly that).
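The difference between the two can be observed from user space. This Python sketch assumes a POSIX system (Linux or macOS) and uses the standard mmap module; compare_read_and_mmap is a hypothetical helper name. Editing a read() copy leaves the file alone, storing into a MAP_SHARED mapping changes the file, and a private copy-on-write mapping (ACCESS_COPY, i.e. MAP_PRIVATE) keeps its changes in the process.

```python
from __future__ import annotations

import mmap
import os
import tempfile


def compare_read_and_mmap(data: bytes) -> tuple[bytes, bytes, bytes]:
    """File contents after editing (1) a read() copy, (2) a MAP_SHARED
    mapping, (3) a MAP_PRIVATE (copy-on-write) mapping."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        with open(path, "wb") as w:
            w.write(data)

        def on_disk() -> bytes:
            with open(path, "rb") as f:
                return f.read()

        # 1. read(): the kernel copies the bytes into a buffer we own.
        copy = bytearray(len(data))
        with open(path, "rb", buffering=0) as raw:
            raw.readinto(copy)
        copy[0:1] = b"X"  # only our private copy changes
        after_copy = on_disk()

        # 2. Shared mapping: stores go to the file's pages.
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
                m[0:1] = b"Y"
                m.flush()  # msync() the change back to the file
        after_shared = on_disk()

        # 3. Private mapping: stores stay in this process.
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[1:2] = b"Z"
        after_private = on_disk()

    return after_copy, after_shared, after_private


if __name__ == "__main__":
    print(compare_read_and_mmap(b"abcdef"))
```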
I might have been prepared to dispute the interviewer's assertion. The buffer in a read() call is supplied by the user process and therefore comes from the user address space. It is also not guaranteed to be aligned in any particular way with respect to page frames. That makes it tricky to do what is necessary to perform I/O directly into the buffer, i.e. map the buffer into the device driver's address space or wire it for DMA. However, in limited circumstances, this may be possible.

I seem to remember that the BSD subsystem Mac OS X used for copying data between address spaces had an optimisation in this respect, although I may be completely mistaken.
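The alignment problem can be made concrete with a little arithmetic. This sketch computes which page frames an arbitrary user buffer touches and the page-aligned range that would have to be mapped or wired to cover it. The helper names pages_spanned, pinned_range and is_page_aligned are hypothetical, and the page size is taken from Python's mmap.PAGESIZE.

```python
from __future__ import annotations

import mmap

PAGE = mmap.PAGESIZE  # this machine's page size (commonly 4096)


def pages_spanned(addr: int, length: int, page: int = PAGE) -> int:
    """How many page frames the byte range [addr, addr + length) touches."""
    if length <= 0:
        return 0
    first = addr // page
    last = (addr + length - 1) // page
    return last - first + 1


def pinned_range(addr: int, length: int, page: int = PAGE) -> tuple[int, int]:
    """Page-aligned [start, end) covering the buffer; the partial pages
    at either edge may also hold unrelated data of the process."""
    start = addr - addr % page
    end = -(-(addr + length) // page) * page  # round up to a page boundary
    return start, end


def is_page_aligned(addr: int, length: int, page: int = PAGE) -> bool:
    """True only if the buffer starts and ends exactly on page boundaries."""
    return addr % page == 0 and length % page == 0


if __name__ == "__main__":
    for addr, length in [(0, 4096), (100, 4096), (4095, 2)]:
        print(addr, length, pages_spanned(addr, length, 4096),
              pinned_range(addr, length, 4096),
              is_page_aligned(addr, length, 4096))
```

With 4096-byte pages, a 4096-byte buffer starting at address 100 straddles two pages (range 0 to 8192), and even a 2-byte buffer at address 4095 crosses a page boundary.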