Fastest (lowest latency) method for inter-process communication between Java and C/C++

I have a Java app connecting through a TCP socket to a "server" developed in C/C++.

Both app and server run on the same machine, a Solaris box (but we're considering migrating to Linux eventually).
The data exchanged consists of simple messages (login, login ACK, then the client asks for something and the server replies). Each message is around 300 bytes long.

Currently we're using TCP sockets and all is OK; however, I'm looking for a faster way (lower latency) to exchange data, using IPC methods.

I've been researching the net and came up with references to the following technologies:

  • shared memory
  • pipes
  • queues
  • as well as what's referred to as DMA (Direct Memory Access)

but I couldn't find a proper analysis of their respective performance, nor how to implement them in both Java and C/C++ (so that they can talk to each other), except maybe pipes, which I can imagine how to do.

Can anyone comment on the performance and feasibility of each method in this context?
Any pointers/links to useful implementation information?


EDIT / UPDATE

Following the comments and answers I got here, I found info about Unix domain sockets, which seem to be built just over pipes and would save me the whole TCP stack.
They're platform-specific, so I plan on testing them with JNI, or with either juds or junixsocket.
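
As a minimal sketch of what the Java client side could look like: recent JDKs (16+) support Unix domain sockets directly through NIO, which makes a handy baseline before trying juds/junixsocket. The socket path and message below are hypothetical, and the C/C++ server is assumed to be listening on that path:

import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class UdsClient {
    public static void main(String[] args) throws Exception {
        // connect to the C/C++ server's Unix domain socket (hypothetical path)
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of("/tmp/server.sock");
        try (SocketChannel ch = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            ch.connect(addr);
            ch.write(ByteBuffer.wrap("LOGIN".getBytes(StandardCharsets.US_ASCII)));
            ByteBuffer reply = ByteBuffer.allocate(300); // messages are ~300 bytes
            ch.read(reply);
        }
    }
}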

Next possible steps would be a direct implementation of pipes, then shared memory, although I've been warned about the extra level of complexity...


thanks for your help

新人笑 2024-09-05 00:11:52

Just tested latency from Java on my Core i5 2.8 GHz, sending/receiving only a single byte,
with the 2 Java processes just spawned, without assigning specific CPU cores with taskset:

TCP         - 25 microseconds
Named pipes - 15 microseconds

Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:

TCP, same cores:                      30 microseconds
TCP, explicit different cores:        22 microseconds
Named pipes, same core:               4-5 microseconds !!!!
Named pipes, taskset different cores: 7-8 microseconds !!!!

So:

  • TCP overhead is visible
  • scheduling overhead (or core caches?) is also the culprit

At the same time, Thread.sleep(0) (which, as strace shows, causes a single sched_yield() Linux kernel call) takes 0.3 microseconds - so named pipes scheduled onto a single core still carry a lot of overhead.

Some shared memory measurement:
September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport.
http://solacesystems.com/news/fastest-ipc-messaging/

P.S. - I tried shared memory the next day, in the form of memory-mapped files.
If busy waiting is acceptable, we can reduce the latency to 0.3 microseconds
for passing a single byte with code like this:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// server side of the ping-pong, inside a method declared "throws Exception"
MappedByteBuffer mem =
  new RandomAccessFile("/tmp/mapped.txt", "rw").getChannel()
  .map(FileChannel.MapMode.READ_WRITE, 0, 1);

while (true) {
  while (mem.get(0) != 5) Thread.sleep(0); // wait for the client's request byte
  mem.put(0, (byte) 10);                   // send the reply byte
}

Note: Thread.sleep(0) is needed so the 2 processes can see each other's changes
(I don't know of another way yet). If the 2 processes are forced onto the same core with taskset,
the latency becomes 1.5 microseconds - that's the context-switch delay.

P.P.S. - and 0.3 microseconds is a good number! The following code takes exactly 0.1 microseconds, while doing only a primitive string concatenation:

int j = 123456789;
String ret = "my-record-key-" + j + "-in-db";

P.P.P.S. - I hope this is not too far off-topic, but finally I tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the JVM happens to flush CPU caches when doing so) and obtained - a record! - 72 nanoseconds latency for Java-to-Java process communication!

When forced onto the same CPU core, however, the volatile-incrementing JVMs never yield control to each other, producing exactly 10 milliseconds of latency - the Linux time quantum seems to be 5 ms... So this should only be used when there is a spare core - otherwise sleep(0) is safer.
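
For reference, a minimal sketch of that busy-spin variant of the wait loop above (the field name is purely illustrative; as noted, it only pays off when each process has a core to itself):

// class-level field; bumping a volatile keeps the spin from being optimized away
static volatile int spin;

// replacement for the inner wait loop: pure busy-spin instead of Thread.sleep(0)
while (mem.get(0) != 5) spin++; // spin until the client's request byte appears
mem.put(0, (byte) 10);          // reply as before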

瞳孔里扚悲伤 2024-09-05 00:11:52

The question was asked some time ago, but you might be interested in https://github.com/peter-lawrey/Java-Chronicle, which supports typical latencies of 200 ns and throughputs of 20 M messages/second. It uses memory-mapped files shared between processes (it also persists the data, which makes it the fastest way to persist data).

软的没边 2024-09-05 00:11:52

DMA is a method by which hardware devices can access physical RAM without interrupting the CPU. A common example is a hard disk controller that can copy bytes straight from disk to RAM. As such, it's not applicable to IPC.

Shared memory and pipes are both supported directly by modern OSes. As such, they're quite fast. Queues are typically abstractions, e.g. implemented on top of sockets, pipes and/or shared memory. This may look like a slower mechanism, but the alternative is that you create such an abstraction.
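
To illustrate how directly those OS primitives can be reached from Java, here is a minimal sketch that writes one message into a named pipe (FIFO). It assumes the C/C++ side has already created /tmp/ipc_fifo with mkfifo and is blocking on a read; the path and message are hypothetical:

import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class FifoWriter {
    public static void main(String[] args) throws Exception {
        // opening a FIFO for writing blocks until the reader (the C/C++ server) opens its end
        try (FileOutputStream out = new FileOutputStream("/tmp/ipc_fifo")) {
            out.write("login\n".getBytes(StandardCharsets.US_ASCII));
        }
    }
}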

挽梦忆笙歌 2024-09-05 00:11:52

Here's a project containing performance tests for various IPC transports:

http://github.com/rigtorp/ipc-bench

眼中杀气 2024-09-05 00:11:52

A late arrival, but wanted to point out an open source project dedicated to measuring ping latency using Java NIO.

It is further explored/explained in this blog post. The results are (RTT in nanoseconds):

Implementation, Min,   50%,   90%,   99%,   99.9%, 99.99%,Max
IPC busy-spin,  89,    127,   168,   3326,  6501,  11555, 25131
UDP busy-spin,  4597,  5224,  5391,  5958,  8466,  10918, 18396
TCP busy-spin,  6244,  6784,  7475,  8697,  11070, 16791, 27265
TCP select-now, 8858,  9617,  9845,  12173, 13845, 19417, 26171
TCP block,      10696, 13103, 13299, 14428, 15629, 20373, 32149
TCP select,     13425, 15426, 15743, 18035, 20719, 24793, 37877

This is in line with the accepted answer. The System.nanoTime() error (estimated by measuring nothing) comes out at around 40 nanoseconds, so for the IPC case the actual result might be even lower. Enjoy.

你的笑 2024-09-05 00:11:52

If you ever consider using native access (since both your application and the "server" are on the same machine), consider JNA; it has less boilerplate code for you to deal with.
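
A minimal sketch of what the JNA approach looks like, binding straight to a libc function (the interface name and the choice of getpid() are purely illustrative, and it assumes JNA 5's Native.load entry point):

import com.sun.jna.Library;
import com.sun.jna.Native;

public class JnaExample {
    // each interface method is bound to the native symbol of the same name in libc
    public interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);
        int getpid();
    }

    public static void main(String[] args) {
        System.out.println("native pid: " + CLib.INSTANCE.getpid());
    }
}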

咽泪装欢 2024-09-05 00:11:52

I don't know much about native inter-process communication, but I would guess that you need to communicate using native code, which you can access using JNI mechanisms. So, from Java you would call a native function that talks to the other process.

心病无药医 2024-09-05 00:11:52

In my former company we used to work with this project, http://remotetea.sourceforge.net/, which was very easy to understand and integrate.

老旧海报 2024-09-05 00:11:52

Have you considered keeping the sockets open, so the connections can be reused?

北座城市 2024-09-05 00:11:52

Oracle bug report on JNI performance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4096069

JNI is a slow interface, so Java TCP sockets are the fastest method for notification between applications; however, that doesn't mean you have to send the payload over a socket. Use LDMA to transfer the payload, but as previous questions have pointed out, Java support for memory mapping is not ideal, so you will want to implement a JNI library to run mmap.
