What is the difference between writing to a file and writing to mapped memory?
I have the following questions about handling files and mapping them with mmap:
- We know that if we create a file and write to it, we are writing to memory either way. Then why map the file into memory with mmap and write through the mapping?
- If the reason is the protection we get from mmap (PROT_NONE, PROT_READ, PROT_WRITE), the same level of protection can also be achieved with file open flags such as O_RDONLY and O_RDWR. Then why mmap?
- Is there any special advantage to mapping a file into memory and then using it, rather than just creating a file and writing to it?
- Finally, suppose we mmap a file into memory. If we write to the memory location returned by mmap, does it simultaneously write to the file as well?
Edit: sharing files between threads
As far as I know, if we share a file between two threads (not processes), it is advisable to mmap it into memory and use the mapping rather than using the file directly.
But we know that using a file means it is surely in main memory, so why does it need to be mapped again for the threads?
A memory-mapped file is partially or wholly mapped into memory (RAM), whereas data you write to a file is first written to memory and then flushed to disk. A memory-mapped file is brought in from disk and placed into memory explicitly for reading and/or writing, and the mapping stays in place until you unmap it.
Access to disk is slow. Once data written to a file has been flushed to disk, it is not guaranteed to stay in RAM, so the next time you need the file you might have to fetch it from disk (slow). With a memory-mapped file, the pages you are working on are in RAM, and you can access them faster than when they are on disk.
Also, memory-mapped files are often used as an IPC mechanism, so two or more processes can easily share the same file and read from or write to it (using the necessary synchronization mechanisms).
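As a rough illustration of the sharing described above, here is a minimal Python sketch. It assumes a Unix-like system (the MAP_SHARED and PROT_READ constants are not available on Windows), and it uses two separate shared mappings inside one process as a stand-in for two cooperating processes. The function name share_through_mappings and the file name are made up for the example.

```python
import mmap
import os
import tempfile


def share_through_mappings(path):
    """Write through one shared mapping and observe the bytes through another."""
    # A mapping cannot grow a file, so give the file one page of zeros first.
    with open(path, "wb") as f:
        f.truncate(mmap.PAGESIZE)

    with open(path, "r+b") as fa, open(path, "r+b") as fb:
        # Two independent mappings of the same file, as two processes would create.
        writer = mmap.mmap(fa.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        reader = mmap.mmap(fb.fileno(), mmap.PAGESIZE,
                           flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
        try:
            writer[0:4] = b"ping"   # a plain memory store through the first mapping
            return reader[0:4]      # read the same file offset through the second
        finally:
            reader.close()
            writer.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "shared.bin")
    print(share_through_mappings(demo_path))
```

With real processes, both sides would still need some synchronization (a lock, a semaphore, or a message) to agree on when the data is ready; the sketch leaves that out.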
When you need to read a large file often, it can be advantageous to map it into memory so that you have faster access to it than if you had to open it and fetch it from disk each time.
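To make the "map once, read many times" idea concrete, here is a small Python sketch. The fixed 16-byte record layout, the helper names, and the file name are assumptions invented for the example; the point is only that, once the file is mapped, each lookup is a slice rather than a seek plus a read.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record size, chosen for illustration


def build_sample_file(path, count):
    """Write `count` records of exactly RECORD_SIZE bytes each."""
    with open(path, "wb") as f:
        for i in range(count):
            # "record-" (7 bytes) + 8 digits + newline = 16 bytes
            f.write(f"record-{i:08d}\n".encode("ascii"))


def read_records(path, indexes):
    """Map the file once, then fetch each requested record with a slice."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [mm[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] for i in indexes]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "records.bin")
    build_sample_file(demo_path, 100)
    print(read_records(demo_path, [0, 99, 42]))
```

Whether this is actually faster than repeated reads depends on the access pattern and on how much of the file the OS already has cached, so it is worth measuring before relying on it.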
EDIT:
That depends on your needs. When a file needs to be accessed very frequently by different threads, I'm not sure that memory-mapping it is necessarily a good idea: you will need to synchronize access to the mmap'ed file if you want to write to the same places from different threads. If that happens very often, it could become a point of resource contention.
If you are only reading from the file, this might be a good solution, because you don't need to synchronize access when multiple threads only read. The moment you start writing, you do have to use synchronization mechanisms.
If you have to write to the file, I suggest that each thread do its own file access in a thread-local way, just as you would with any other file. This reduces the need for thread synchronization and the likelihood of bugs that are hard to find and debug.
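For the read-only case, a sketch along these lines may help. It is written in Python, the function name count_byte_parallel and the chunking scheme are invented for the example, and each thread writes only to its own result slot, so no lock is involved.

```python
import mmap
import os
import tempfile
import threading


def count_byte_parallel(path, needle, workers=4):
    """Count a single byte value using several reader threads over one mapping."""
    size = os.path.getsize(path)            # the file must not be empty
    chunk = (size + workers - 1) // workers
    totals = [0] * workers                  # one slot per thread, never shared

    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:

        def scan(slot):
            start = slot * chunk
            end = min(start + chunk, size)
            # Read-only access to the mapping; nothing here needs a lock.
            totals[slot] = mm[start:end].count(needle)

        threads = [threading.Thread(target=scan, args=(i,)) for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    return sum(totals)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "letters.bin")
    with open(demo_path, "wb") as f:
        f.write(b"abcabcabca" * 1000)
    print(count_byte_parallel(demo_path, b"a"))
```

If the threads also had to modify overlapping parts of the mapping, each such region would need its own lock, which is the contention the answer warns about.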
1) You misunderstand the write(2) system call. write() does not write to disk; it just copies the buffer contents into the OS buffer chain and marks them as dirty. One of the OS threads (bdflush, IIRC) will later pick up these buffers, write them to disk, and fiddle with some flags.
With mmap, you access the OS buffer directly (but if you alter its contents, it will be marked dirty, too).
2) This is not about protection; it is about setting flags in the page table entries.
3) You avoid double buffering. You can also address the file in terms of characters instead of blocks, which is sometimes more practical.
4) It is the system buffers (hooked into your address space) that you have been using. The system may or may not have written parts of them to disk.
5) If the threads belong to the same process and share the page tables and address space, yes.
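Points 1 and 4 can be seen in a small experiment, sketched here in Python. It assumes a system with a unified page cache, such as Linux, where write(), read(), and shared mappings all operate on the same cached pages; POSIX itself is less strict about when mapped stores become visible to read(). The function name is made up for the example.

```python
import mmap
import os
import tempfile


def mapped_write_seen_by_read(path):
    """Store bytes through a shared mapping, then read them back with pread()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"hello world\n")      # copied into the OS buffers
        mm = mmap.mmap(fd, 0, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
        try:
            mm[0:5] = b"HELLO"              # a memory store, no write() call
            # No msync here: on a unified page cache the read path looks at
            # the same page the store just modified.
            return os.pread(fd, 12, 0)
        finally:
            mm.close()
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "cache.txt")
    print(mapped_write_seen_by_read(demo_path))
```

Nothing in this sketch says when the bytes reach the disk; that is up to the OS unless you ask for it explicitly.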
1) One reason may be that you have (legacy) code that is set up to write into a data buffer, and this buffer is then written to the file in one go at the end. In this case, using mmap will save at least one copy of the data, as the OS can write the buffer directly to disk. As long as it is about file writing only, I cannot (yet) imagine any other reason why you'd want to use mmap.
2) No, I'd say protection is not relevant here.
3) It might save one or two copies of the data, e.g. from app buffer to libc buffer to OS buffer; see point 1. This might make a performance difference when writing large amounts of data.
4) No. As far as I know, the OS is free to write the data whenever it likes, as long as the data has been written to disk after a call to msync or munmap on that memory region. (For most files it will likely not write anything in between most of the time, for performance reasons: writing a whole block to disk because one byte changed is rather expensive, particularly if many more modifications to the block are expected in the near future.)
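A minimal Python sketch of the explicit flush mentioned in the last point follows. Python's mmap.flush() is the wrapper around msync on Unix; the function name write_and_sync and the file name are invented for the example, and the payload is assumed to be non-empty because an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def write_and_sync(path, payload):
    """Fill a file through a mapping, flush it explicitly, and read it back."""
    size = len(payload)                      # assumed to be greater than zero
    # A mapping cannot extend a file, so size the file before mapping it.
    with open(path, "wb") as f:
        f.truncate(size)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as mm:   # shared, read/write by default
            mm[:] = payload                  # modify the mapped pages in memory
            mm.flush()                       # ask the OS to write them back now

    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "synced.bin")
    print(write_and_sync(demo_path, b"durable data"))
```

Without the flush() call the read-back would normally return the same bytes, since the data is still in the OS buffers; the call only matters for when the data is pushed to the disk.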
In most cases you should think of a memory-mapped file as memory that you work with. You only need to care about special cases such as syncing with the disk. It is the same kind of storage as memory, but it can be initialized from a file and stored to a file whenever you need.
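One way to picture "memory that is initialized from a file" is a counter that survives between runs. The Python sketch below assumes a made-up layout (a single little-endian 64-bit integer) and a made-up function name bump_counter; it treats the mapping like any other writable buffer.

```python
import mmap
import os
import struct
import tempfile

COUNTER = struct.Struct("<Q")  # hypothetical layout: one 64-bit little-endian counter


def bump_counter(path):
    """Increment a counter stored in a file by working on the mapping as memory."""
    # Create and zero the counter on first use.
    if not os.path.exists(path) or os.path.getsize(path) < COUNTER.size:
        with open(path, "wb") as f:
            f.write(COUNTER.pack(0))

    with open(path, "r+b") as f, mmap.mmap(f.fileno(), COUNTER.size) as mm:
        (value,) = COUNTER.unpack_from(mm, 0)   # memory initialized from the file
        COUNTER.pack_into(mm, 0, value + 1)     # in-place update of the mapping
        return value + 1


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "counter.bin")
    for _ in range(3):
        print(bump_counter(demo_path))
```

The sketch is not safe against concurrent callers; two processes bumping the counter at once would need a lock around the read-modify-write.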