Fastest (lowest latency) method for inter-process communication between Java and C/C++

I have a Java app connecting through a TCP socket to a "server" developed in C/C++.

Both app and server run on the same machine, a Solaris box (but we're considering migrating to Linux eventually).
The data exchanged consists of simple messages (login, login ACK, then the client asks for something and the server replies). Each message is around 300 bytes long.

Currently we're using TCP sockets and all is OK; however, I'm looking for a faster way (lower latency) to exchange data, using IPC methods.

I've been researching the net and came up with references to the following technologies:

  • shared memory
  • pipes
  • queues
  • as well as what's referred to as DMA (Direct Memory Access)

but I couldn't find a proper analysis of their respective performance, nor how to implement them in both Java and C/C++ (so that they can talk to each other), except maybe pipes, which I can imagine how to do.

Can anyone comment on the performance and feasibility of each method in this context?
Any pointers/links to useful implementation information?


EDIT / UPDATE

Following the comments and answers I got here, I found info about Unix domain sockets, which seem to be built just over pipes and would save me the whole TCP stack.
They're platform-specific, so I plan on testing them with JNI, or with either juds or junixsocket.
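
As a minimal sketch of what the Java client side could look like: recent JDKs (16+) support Unix domain sockets directly through NIO, which makes a handy baseline before trying juds/junixsocket. The socket path and message below are hypothetical, and the C/C++ server is assumed to be listening on that path:

import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class UdsClient {
    public static void main(String[] args) throws Exception {
        // connect to the C/C++ server's Unix domain socket (hypothetical path)
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of("/tmp/server.sock");
        try (SocketChannel ch = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            ch.connect(addr);
            ch.write(ByteBuffer.wrap("LOGIN".getBytes(StandardCharsets.US_ASCII)));
            ByteBuffer reply = ByteBuffer.allocate(300); // messages are ~300 bytes
            ch.read(reply);
        }
    }
}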

Next possible steps would be a direct implementation of pipes, then shared memory, although I've been warned about the extra level of complexity...


thanks for your help

新人笑 2024-09-05 00:11:52

Just tested latency from Java on my Core i5 2.8 GHz, sending/receiving only a single byte,
with the 2 Java processes just spawned, without assigning specific CPU cores with taskset:

TCP         - 25 microseconds
Named pipes - 15 microseconds

Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:

TCP, same cores:                      30 microseconds
TCP, explicit different cores:        22 microseconds
Named pipes, same core:               4-5 microseconds !!!!
Named pipes, taskset different cores: 7-8 microseconds !!!!

So:

  • TCP overhead is visible
  • scheduling overhead (or core caches?) is also the culprit

At the same time, Thread.sleep(0) (which, as strace shows, causes a single sched_yield() Linux kernel call) takes 0.3 microseconds - so named pipes scheduled onto a single core still carry a lot of overhead.

Some shared memory measurement:
September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport.
http://solacesystems.com/news/fastest-ipc-messaging/

P.S. - I tried shared memory the next day, in the form of memory-mapped files.
If busy waiting is acceptable, we can reduce the latency to 0.3 microseconds
for passing a single byte with code like this:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// server side of the ping-pong, inside a method declared "throws Exception"
MappedByteBuffer mem =
  new RandomAccessFile("/tmp/mapped.txt", "rw").getChannel()
  .map(FileChannel.MapMode.READ_WRITE, 0, 1);

while (true) {
  while (mem.get(0) != 5) Thread.sleep(0); // wait for the client's request byte
  mem.put(0, (byte) 10);                   // send the reply byte
}

Note: Thread.sleep(0) is needed so the 2 processes can see each other's changes
(I don't know of another way yet). If the 2 processes are forced onto the same core with taskset,
the latency becomes 1.5 microseconds - that's the context-switch delay.

P.P.S. - and 0.3 microseconds is a good number! The following code takes exactly 0.1 microseconds, while doing only a primitive string concatenation:

int j = 123456789;
String ret = "my-record-key-" + j + "-in-db";

P.P.P.S. - I hope this is not too far off-topic, but finally I tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the JVM happens to flush CPU caches when doing so) and obtained - a record! - 72 nanoseconds latency for Java-to-Java process communication!

When forced onto the same CPU core, however, the volatile-incrementing JVMs never yield control to each other, producing exactly 10 milliseconds of latency - the Linux time quantum seems to be 5 ms... So this should only be used when there is a spare core - otherwise sleep(0) is safer.
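
For reference, a minimal sketch of that busy-spin variant of the wait loop above (the field name is purely illustrative; as noted, it only pays off when each process has a core to itself):

// class-level field; bumping a volatile keeps the spin from being optimized away
static volatile int spin;

// replacement for the inner wait loop: pure busy-spin instead of Thread.sleep(0)
while (mem.get(0) != 5) spin++; // spin until the client's request byte appears
mem.put(0, (byte) 10);          // reply as before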

瞳孔里扚悲伤 2024-09-05 00:11:52

The question was asked some time ago, but you might be interested in https://github.com/peter-lawrey/Java-Chronicle, which supports typical latencies of 200 ns and throughputs of 20 M messages/second. It uses memory-mapped files shared between processes (it also persists the data, which makes it the fastest way to persist data).

软的没边 2024-09-05 00:11:52

DMA is a method by which hardware devices can access physical RAM without interrupting the CPU. A common example is a hard disk controller that can copy bytes straight from disk to RAM. As such, it's not applicable to IPC.

Shared memory and pipes are both supported directly by modern OSes. As such, they're quite fast. Queues are typically abstractions, e.g. implemented on top of sockets, pipes and/or shared memory. This may look like a slower mechanism, but the alternative is that you create such an abstraction.
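
To illustrate how directly those OS primitives can be reached from Java, here is a minimal sketch that writes one message into a named pipe (FIFO). It assumes the C/C++ side has already created /tmp/ipc_fifo with mkfifo and is blocking on a read; the path and message are hypothetical:

import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class FifoWriter {
    public static void main(String[] args) throws Exception {
        // opening a FIFO for writing blocks until the reader (the C/C++ server) opens its end
        try (FileOutputStream out = new FileOutputStream("/tmp/ipc_fifo")) {
            out.write("login\n".getBytes(StandardCharsets.US_ASCII));
        }
    }
}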

挽梦忆笙歌 2024-09-05 00:11:52

Here's a project containing performance tests for various IPC transports:

http://github.com/rigtorp/ipc-bench

眼中杀气 2024-09-05 00:11:52

A late arrival, but wanted to point out an open source project dedicated to measuring ping latency using Java NIO.

It is further explored/explained in this blog post. The results are (RTT in nanoseconds):

Implementation, Min,   50%,   90%,   99%,   99.9%, 99.99%,Max
IPC busy-spin,  89,    127,   168,   3326,  6501,  11555, 25131
UDP busy-spin,  4597,  5224,  5391,  5958,  8466,  10918, 18396
TCP busy-spin,  6244,  6784,  7475,  8697,  11070, 16791, 27265
TCP select-now, 8858,  9617,  9845,  12173, 13845, 19417, 26171
TCP block,      10696, 13103, 13299, 14428, 15629, 20373, 32149
TCP select,     13425, 15426, 15743, 18035, 20719, 24793, 37877

This is in line with the accepted answer. The System.nanoTime() error (estimated by measuring nothing) comes out at around 40 nanoseconds, so for the IPC case the actual result might be even lower. Enjoy.

你的笑 2024-09-05 00:11:52

If you ever consider using native access (since both your application and the "server" are on the same machine), consider JNA; it has less boilerplate code for you to deal with.
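
A minimal sketch of what the JNA approach looks like, binding straight to a libc function (the interface name and the choice of getpid() are purely illustrative, and it assumes JNA 5's Native.load entry point):

import com.sun.jna.Library;
import com.sun.jna.Native;

public class JnaExample {
    // each interface method is bound to the native symbol of the same name in libc
    public interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);
        int getpid();
    }

    public static void main(String[] args) {
        System.out.println("native pid: " + CLib.INSTANCE.getpid());
    }
}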

咽泪装欢 2024-09-05 00:11:52

I don't know much about native inter-process communication, but I would guess that you need to communicate using native code, which you can access using JNI mechanisms. So, from Java you would call a native function that talks to the other process.

心病无药医 2024-09-05 00:11:52

In my former company we used to work with this project, http://remotetea.sourceforge.net/, which was very easy to understand and integrate.

老旧海报 2024-09-05 00:11:52

Have you considered keeping the sockets open, so the connections can be reused?

北座城市 2024-09-05 00:11:52

Oracle bug report on JNI performance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4096069

JNI is a slow interface, so Java TCP sockets are the fastest method for notification between applications; however, that doesn't mean you have to send the payload over a socket. Use LDMA to transfer the payload, but as previous questions have pointed out, Java support for memory mapping is not ideal, so you will want to implement a JNI library to run mmap.
