Memory-mapped files: pros and cons?

Posted 2024-12-21 05:56:28

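I need to share data between two Java applications running on the same machine (two different JVMs). To be precise, the data to be shared is large (about 7 GB). The applications must access the data very quickly, because they have to answer incoming queries at a very high rate. I don't want each application to hold its own copy of the data.

One option I have found is to use memory-mapped files. Application A gets the data from somewhere (say, a database) and stores it in files. Application B can then access these files using java.nio. I don't know exactly how memory-mapped files work; I only know that the data is stored in a file and that this file (or part of it) is mapped to a region of memory (virtual memory?). So both applications can read and write the data in memory, and the changes are automatically (I guess?) committed to the file. I also don't know whether there is a maximum size for a file to be fully mapped into memory.

My first question is: what are the different possibilities for two applications to share data in this scenario (that is, given that the amount of data is very large and access to it must be very fast)? To be clear, this question is not about memory-mapped I/O; it only asks what other ways there are to solve the same problem.

My second question is: what are the pros and cons of using memory-mapped files?

Thanks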

归属感 2024-12-28 05:56:28

My first question is what are the different possibilities for two applications to share data?

As S.Lott points out, there are many mechanisms:

OS-level message queues
OS-level POSIX shared memory segments (persist after process death)
OS-level memory mappings (could be anonymous or file-backed)
OS-level anonymous pipes (unidirectional)
OS-level named pipes (unidirectional)
OS-level sockets (bidirectional) -- whether AF_UNIX or AF_INET or AF_INET6
OS-level shared global memory -- suitable for multi-threaded programs
Storing data in files
Application-level message queues
Application-level blackboard-style tuplespaces
Application-level key/value stores
Application-level remote procedure call frameworks -- many are available
Application-level web-based frameworks

My second question is what are the pros and cons of using memory-mapped files?

Pros:

Very fast -- depending on how you access the data, zero-copy mechanisms can potentially operate directly on the data with no speed penalty. Care must be taken to update objects in a consistent manner.
Should be very portable -- mmap has been available on Unix systems for probably 25 years (give or take), Windows offers an equivalent through CreateFileMapping and MapViewOfFile, and Java exposes both through FileChannel.map in java.nio.
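
To make the file-backed mapping concrete, here is a minimal Java sketch of one process writing values through a mapping and another reading them back. The class name, the method names, and the flat array-of-longs layout are assumptions made up for illustration; they are not from the question. Note that a single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes (about 2 GB), so a 7 GB file would need several mappings over different regions.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedMappingSketch {
    // Hypothetical layout: the file is a flat array of 8-byte longs.
    static final int SLOT_BYTES = Long.BYTES;

    /** Writer side (application A): map the file read-write and store the values. */
    static void writeValues(Path file, long[] values) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current end of the file extends the file.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0,
                    (long) values.length * SLOT_BYTES);
            for (int i = 0; i < values.length; i++) {
                buf.putLong(i * SLOT_BYTES, values[i]); // absolute put, no copy into a heap array
            }
            buf.force(); // ask the OS to write dirty pages back to the file
        }
    }

    /** Reader side (application B): map the same file read-only and read one slot. */
    static long readValue(Path file, int index) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumes the file is smaller than 2 GB; larger files need one mapping per region.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getLong(index * SLOT_BYTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared", ".dat");
        file.toFile().deleteOnExit();
        writeValues(file, new long[] {10L, 20L, 30L});
        System.out.println(readValue(file, 1)); // prints 20
    }
}
```

In a real deployment the two methods would run in different JVMs against an agreed file path. The sketch does no locking, so it only illustrates the mapping itself, not consistent concurrent updates.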

Cons:

Single-system sharing. If you want to distribute your application over multiple machines, shared memory isn't a great option. Distributed shared memory systems are available, but to my way of thinking they feel very much like the wrong interface.
Even on a single system, if the memory is located on a single NUMA node but needs to be accessed by processors from multiple nodes, the inter-node requests may significantly slow processing compared to giving each node its own segment of the memory.
You can't just store pointers -- everything must be stored as offsets to base addresses, because the memory may be mapped at different locations in different processes. I have no idea what this means for Java objects, though presumably someone smart did their best to make it transparent to Java programmers. If you're not using their provided mechanisms, then you probably must do the work yourself. (Without actual pointers in Java, perhaps this is not very onerous.)
Updating objects consistently has proven to be very difficult. Passing immutable objects in message-passing systems instead generally results in programs with fewer concurrency bugs. (Concurrent programming in Erlang feels very natural and straightforward. Concurrent programming in more imperative languages tends to introduce a huge pile of new concurrency controls: semaphores, mutexes, spinlocks, monitors.)
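
The point about offsets can be illustrated with a small Java sketch: a linked list laid out inside a ByteBuffer, where each node's "next" field is an offset from the start of the buffer rather than a memory address. The node layout and all names here are hypothetical. The sketch uses a heap buffer so it runs standalone, but since MappedByteBuffer is a ByteBuffer, the same methods would work on a mapped region regardless of where each process maps it.

```java
import java.nio.ByteBuffer;

class OffsetListSketch {
    static final int NIL = -1; // marks the end of the list
    // Hypothetical node layout: 8-byte value followed by a 4-byte offset of the next node.
    static final int NODE_BYTES = Long.BYTES + Integer.BYTES;

    /** Writes a node at 'offset'; 'next' is an offset from the buffer start, never an address. */
    static void putNode(ByteBuffer buf, int offset, long value, int next) {
        buf.putLong(offset, value);
        buf.putInt(offset + Long.BYTES, next);
    }

    /** Follows the offset links starting at 'head' and sums the values. */
    static long sum(ByteBuffer buf, int head) {
        long total = 0;
        for (int off = head; off != NIL; off = buf.getInt(off + Long.BYTES)) {
            total += buf.getLong(off);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * NODE_BYTES);
        // Nodes are deliberately linked out of physical order: 0 -> 2 -> 1.
        putNode(buf, 0, 5L, 2 * NODE_BYTES);
        putNode(buf, 2 * NODE_BYTES, 7L, NODE_BYTES);
        putNode(buf, NODE_BYTES, 11L, NIL);
        System.out.println(sum(buf, 0)); // prints 23
    }
}
```

Because every link is relative to the buffer, two processes that see the region at different virtual addresses still traverse the same list.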