Memory-mapped files: pros and cons?

Posted on 2024-12-21 05:56:28

I need to share data between two Java applications running on the same machine (two different JVMs). The data to be shared is large (about 7 GB). The applications must access the data very quickly because they have to answer incoming queries at a very high rate. I don't want each application to hold its own copy of the data.

I've seen that one option is to use memory-mapped files. Application A gets the data from somewhere (say, a database) and stores it in files; application B can then access these files using java.nio. I don't know exactly how memory-mapped files work. I only know that the data is stored in a file and that this file (or part of it) is mapped to a region of memory (virtual memory?), so the two applications can read and write the data in memory and the changes are automatically (I guess?) committed to the file. I also don't know whether there is a maximum size for a file that is to be mapped entirely into memory.
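For readers unfamiliar with the API, here is a minimal sketch of the idea described above, using `FileChannel.map` from java.nio. The class and method names (`MappedFileSketch`, `writeValue`, `readValue`) are made up for illustration, and both sides run in one JVM here only to keep the example self-contained; in the scenario above they would live in two separate processes. Note that whether and when another process sees a change made through a mapping is operating-system dependent according to the Java documentation, and that `force()` only asks the OS to write the changes to the storage device.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileSketch {

    // Writer side (hypothetical "application A"): map the file read-write,
    // store one long at offset 0, and ask the OS to flush it to the file.
    public static void writeValue(Path file, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping grows the file if it is shorter than the mapped region.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            buf.putLong(0, value);
            buf.force();
        }
    }

    // Reader side (hypothetical "application B"): map the same file read-only.
    public static long readValue(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, Long.BYTES);
            return buf.getLong(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared", ".dat");
        file.toFile().deleteOnExit();
        writeValue(file, 42L);
        System.out.println(readValue(file));
    }
}
```

The mapping itself remains valid after the channel is closed, which is why the sketch can close the channel in a try-with-resources block.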

My first question: what are the different ways for two applications to share data in this scenario (taking into account that the amount of data is very large and that access to it must be very fast)? Note that this question is not about memory-mapped I/O; I just want to know what other ways there are to solve the same problem.

My second question is what are the pros and cons of using memory-mapped files?

Thanks

Comments (2)

归属感 2024-12-28 05:56:28

My first question is what are the different possibilities for two applications to share data?

As S.Lott points out, there are a lot of mechanisms.

My second question is what are the pros and cons of using memory-mapped files?

Pros:

  • Very fast -- depending on how you access the data, zero-copy mechanisms may let you operate directly on the data with no speed penalty. Care must be taken to update objects in a consistent manner.
  • Should be very portable -- memory mapping has been available on Unix systems for probably 25 years (give or take), and apparently Windows has equivalent mechanisms too.
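One Java-specific detail is relevant to the roughly 7 GB data set in the question: a single `FileChannel.map` call can map at most `Integer.MAX_VALUE` bytes (about 2 GiB), so a file that large has to be covered by several mappings. The following is only a sketch of one way to do that; the class name `ChunkedMappingSketch` and its `get` method are hypothetical, and a real implementation would also need multi-byte reads that handle values straddling a chunk boundary.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedMappingSketch {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;

    // Maps the whole file read-only as consecutive chunks of at most chunkSize bytes.
    public ChunkedMappingSketch(Path file, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        this.chunkSize = chunkSize;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            int count = (int) ((size + chunkSize - 1) / chunkSize); // round up
            chunks = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * chunkSize;
                long length = Math.min(chunkSize, size - start);
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, length);
            }
        }
    }

    // Reads one byte at an absolute file position by picking the right chunk.
    public byte get(long position) {
        int chunk = (int) (position / chunkSize);
        int offset = (int) (position % chunkSize);
        return chunks[chunk].get(offset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked", ".dat");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        // A tiny chunk size, purely to exercise the chunk arithmetic.
        ChunkedMappingSketch mapped = new ChunkedMappingSketch(file, 4);
        System.out.println(mapped.get(9));
    }
}
```

In practice the chunk size would be something large, for example 1 GiB, and the tiny value in `main` is there only so the example runs on a ten-byte file.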

Cons:

  • Single-system sharing. If you want to distribute your application over multiple machines, shared memory isn't a great option. Distributed shared memory systems are available, but to my way of thinking they feel very much like the wrong interface.
  • Even on a single system, if the memory is located on a single NUMA node but needs to be accessed by processors on multiple nodes, the inter-node requests may slow processing significantly compared to giving each node its own segment of the memory.
  • You can't just store pointers -- everything must be stored as an offset from a base address, because the memory may be mapped at different locations in different processes. I have no idea what this means for Java objects, though presumably someone smart did their best to make it transparent to Java programmers. If you're not using their provided mechanisms, then you probably must do the work yourself. (Without actual pointers in Java, perhaps this is not very onerous.)
  • Updating objects consistently has proven to be very difficult. Passing immutable objects in message-passing systems instead generally results in programs with fewer concurrency bugs. (Concurrent programming in Erlang feels very natural and straightforward. Concurrent programming in more imperative languages tends to introduce a huge pile of new concurrency controls: semaphores, mutexes, spinlocks, monitors.)
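To make the "offsets instead of pointers" point concrete for Java, here is a small sketch, not taken from any library, of a linked list whose "next" links are stored as integer offsets from the start of a `ByteBuffer`. The record layout, the `NIL` sentinel, and the class name `OffsetListSketch` are all assumptions made up for this example. It uses a heap buffer so that it runs anywhere, but the same absolute `getLong`/`getInt` calls work on a `MappedByteBuffer`, and the offsets stay valid wherever each process happens to map the file.

```java
import java.nio.ByteBuffer;

public class OffsetListSketch {
    // Sentinel playing the role of a null pointer.
    public static final int NIL = -1;
    // Hypothetical record layout: an 8-byte value followed by a 4-byte "next" offset.
    public static final int RECORD_SIZE = Long.BYTES + Integer.BYTES;

    // Writes one record at the given offset and returns the offset just past it.
    public static int putRecord(ByteBuffer buf, int offset, long value, int nextOffset) {
        buf.putLong(offset, value);
        buf.putInt(offset + Long.BYTES, nextOffset);
        return offset + RECORD_SIZE;
    }

    // Follows the chain of offsets from head and sums the values along the way.
    public static long sum(ByteBuffer buf, int head) {
        long total = 0;
        for (int off = head; off != NIL; off = buf.getInt(off + Long.BYTES)) {
            total += buf.getLong(off);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * RECORD_SIZE);
        putRecord(buf, 0, 10L, RECORD_SIZE);
        putRecord(buf, RECORD_SIZE, 20L, 2 * RECORD_SIZE);
        putRecord(buf, 2 * RECORD_SIZE, 30L, NIL);
        System.out.println(sum(buf, 0)); // prints 60
    }
}
```

This sketch does nothing about the consistency problem raised in the last bullet; a reader running concurrently with a writer would still need some agreed protocol to avoid seeing half-written records.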

奈何桥上唱咆哮 2024-12-28 05:56:28

Memory-mapped files sound like a headache. A simpler and less error-prone option would be to use a shared database with a cluster-aware cache. That way only writes go down to the database, and reads can be served from the cache.

For an example of how to do this in Hibernate, see http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
