Packets through named pipes? Single-byte buffer or pre-determined size?

Posted 2024-08-18 14:31:14

I want to send 'packets' of data (i.e. discrete messages) between two programs through named pipes. Given that I have to supply a buffer and a buffer size to read, and given that the read command is blocking (I believe), I either have to have a buffer size that guarantees I never get an under-run, or to know the size of the message up-front. I don't want the sending program to have to know the size of the buffer and pad it out.

As I see it, there are three ways to do this.

  1. Prepend each packet with the size of the message being sent so the listening program can read that many bytes.
  2. Read from the pipe a byte at a time and listen for a special end-of-stream value.
  3. A better way

In the first case I would be able to create a buffer of known size and read into it at once. In the second case I would have to read with a one-byte buffer. This might either be perfectly OK or a massively inefficient travesty.
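For comparison, the second approach does not have to mean one read() system call per byte. The sketch below makes some assumptions: messages never contain the delimiter byte (no escaping), the names struct reader and next_message() are hypothetical rather than anything from the question, and EINTR is not retried. It reads up to 4 KB at a time and cuts messages out of that buffer with memchr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buffered reader: one read() fetches up to 4 KB, and
   delimiter-terminated messages are cut out of that buffer. */
struct reader {
    int fd;
    size_t start, end;          /* unconsumed bytes are buf[start..end) */
    char buf[4096];
};

/* Copies the next message (without its delimiter) into out.
   Returns its length, or -1 on EOF, read error, a message longer
   than cap, or a message that does not fit in the 4 KB buffer. */
long next_message(struct reader *r, char delim, char *out, size_t cap)
{
    for (;;) {
        char *hit = memchr(r->buf + r->start, delim, r->end - r->start);
        if (hit != NULL) {
            size_t len = (size_t)(hit - (r->buf + r->start));
            if (len > cap)
                return -1;
            memcpy(out, r->buf + r->start, len);
            r->start += len + 1;            /* consume the delimiter as well */
            return (long)len;
        }
        /* No complete message yet: move leftovers to the front, read more. */
        memmove(r->buf, r->buf + r->start, r->end - r->start);
        r->end -= r->start;
        r->start = 0;
        if (r->end == sizeof r->buf)
            return -1;
        ssize_t n = read(r->fd, r->buf + r->end, sizeof r->buf - r->end);
        if (n <= 0)
            return -1;
        r->end += (size_t)n;
    }
}
```

A reader would be set up with something like `struct reader r = { .fd = fd };` and then call next_message() in a loop. Typed input from a terminal, where each line ends in a newline, still works with this.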

The only reason I would go for the second approach would be for more flexible input (for example, manual interaction if I wanted it).

Which is the best way to go?

书信已泛黄 2024-08-25 14:31:14

With named pipes, reads and writes are (or can be) atomic. Within limits (POSIX guarantees atomicity for writes of up to PIPE_BUF bytes, which is at least 512 and is 4096 on Linux), if you write, say, 1024 bytes to the pipe, a read call on the other end that asks for 1024 bytes will receive all 1024 bytes, even if there is more data in the pipe at the time of the read. Furthermore, and in all cases, if there are just 1024 bytes in the named pipe and a read requests 4096 bytes, it will get the 1024 bytes on the first attempt and only block on a subsequent attempt.
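Here is a minimal sketch of the second point. The function name short_read_demo is hypothetical. It uses an anonymous pipe() in place of the named pipe, because POSIX gives FIFOs and pipes the same read/write semantics once they are open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes n bytes (n <= 1024) into a fresh pipe, then makes a single
   read() call asking for up to 4096 bytes and returns what it delivered. */
ssize_t short_read_demo(size_t n)
{
    int fds[2];
    char out[1024], in[4096];
    ssize_t got = -1;

    if (n > sizeof out || pipe(fds) == -1)
        return -1;
    memset(out, 'x', n);
    /* On Linux PIPE_BUF is 4096, so this write of <= 1024 bytes is atomic. */
    if (write(fds[1], out, n) == (ssize_t)n)
        got = read(fds[0], in, sizeof in);   /* returns n at once; no wait for 4096 */
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

short_read_demo(1024) returns 1024, not 4096. A reader that needs exactly N bytes therefore has to loop until it has them.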

You say:

Given that I have to supply a buffer and a buffer size to read,

You do...

and given that the read command is blocking (I believe),

It is, unless you set O_NONBLOCK on the file descriptor...
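Here is a short sketch of switching an already-open descriptor to non-blocking mode with fcntl. The helper names are hypothetical, and an anonymous pipe again stands in for the FIFO. With O_NONBLOCK set, a read on an empty pipe fails with EAGAIN instead of waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK to fd while keeping its other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if reading an empty non-blocking pipe fails with EAGAIN,
   0 if it does something else, -1 if the pipe could not be set up. */
int empty_read_would_block(void)
{
    int fds[2];
    char buf[64];
    if (pipe(fds) == -1)
        return -1;
    if (set_nonblocking(fds[0]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    ssize_t n = read(fds[0], buf, sizeof buf);   /* empty, writer still open */
    int result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

For a named pipe, O_NONBLOCK can also be passed straight to open(). Opening a FIFO read-only with O_NONBLOCK returns immediately even if no writer has opened it yet.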

I either have to have a buffer size that guarantees I never get an under-run,

What sort of messages are you sending? What size are you dealing with? Kilobytes, megabytes, bigger?

or to know the size of the message up-front.

There is no particular problem with having, say, a 4KB buffer in the reader, and reading the message in chunks. The issue is knowing when you reach the end of the message. By far the majority of protocols require the length up front, because it makes it easy to write the reader code reliably.
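Here is a minimal sketch of the length-up-front approach. send_msg and recv_msg are hypothetical helpers, and the 4-byte big-endian header is an assumption (any fixed-width header would do). The read_all/write_all loops are needed because a single read or write can transfer fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* write() and read() may transfer fewer bytes than asked; loop until done. */
static int write_all(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w == -1) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r == 0) return -1;              /* writer closed mid-message */
        if (r == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* One frame = 4-byte big-endian payload length, then the payload. */
int send_msg(int fd, const void *data, uint32_t len)
{
    unsigned char hdr[4] = {
        (unsigned char)(len >> 24), (unsigned char)(len >> 16),
        (unsigned char)(len >> 8),  (unsigned char)len
    };
    if (write_all(fd, hdr, sizeof hdr) == -1) return -1;
    return write_all(fd, data, len);
}

/* Reads one frame into buf (capacity cap) and stores the payload length
   in *len. Returns 0 on success, -1 on error, EOF or an oversized frame. */
int recv_msg(int fd, void *buf, uint32_t cap, uint32_t *len)
{
    unsigned char hdr[4];
    if (read_all(fd, hdr, sizeof hdr) == -1) return -1;
    *len = (uint32_t)hdr[0] << 24 | (uint32_t)hdr[1] << 16
         | (uint32_t)hdr[2] << 8  | (uint32_t)hdr[3];
    if (*len > cap) return -1;              /* sketch: refuse rather than truncate */
    return read_all(fd, buf, *len);
}
```

If a payload is larger than the reader's buffer (for example the 4 KB buffer mentioned above), the same header still works. The reader calls read_all in pieces of at most 4096 bytes until it has consumed the announced length. The byte order is fixed explicitly, so the framing stays the same if the pipe is later replaced by a socket.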

If you are going to do an 'end of stream' (EOS) marker, you are doing 'in-band signalling'. And that causes trouble. What character are you going to use? What happens when that character appears in the data? You need an escape mechanism, such as a character that means 'the next character is not the EOS marker'. For example, in text related to programming, the backslash is used for this. At a terminal, control-V often serves the purpose.
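Here is a sketch of the escape mechanism this requires, using byte stuffing. The choices of newline as the EOS marker and backslash as the escape byte are assumptions, as are the function names. The encoder puts an escape byte in front of every EOS or escape byte in the data. The decoder treats the byte after an escape as literal data.

```c
#include <assert.h>
#include <stddef.h>

#define EOS_BYTE '\n'   /* hypothetical end-of-message marker */
#define ESC_BYTE '\\'   /* hypothetical escape: "the next byte is data" */

/* Escapes every EOS/ESC byte in src and appends an unescaped EOS.
   dst needs room for 2*n + 1 bytes in the worst case.
   Returns the number of bytes written to dst. */
size_t stuff(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == EOS_BYTE || src[i] == ESC_BYTE)
            dst[j++] = ESC_BYTE;
        dst[j++] = src[i];
    }
    dst[j++] = EOS_BYTE;
    return j;
}

/* Decodes one message from src (n bytes available) into dst.
   On success returns the decoded length and sets *used to the number
   of input bytes consumed, EOS included. Returns (size_t)-1 if no
   unescaped EOS is present yet, i.e. more input is needed. */
size_t unstuff(const unsigned char *src, size_t n, unsigned char *dst, size_t *used)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == ESC_BYTE) {
            if (i + 1 >= n)
                break;                  /* escape split across reads */
            dst[j++] = src[++i];        /* copy the escaped byte literally */
        } else if (src[i] == EOS_BYTE) {
            *used = i + 1;
            return j;
        } else {
            dst[j++] = src[i];
        }
    }
    return (size_t)-1;
}
```

The receiver has to inspect every byte. In the worst case the data doubles in size. Both are part of why a length prefix is usually easier.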

I don't want the sending program to have to know the size of the buffer and pad it out.

Why is it hard for the sender to know the size of the buffer? And why would it need to 'pad it out'?

If you are dealing with large amounts of data (from say kilobytes upwards), the single-character solution is unlikely to yield acceptable performance. I think you would be best off having the sender able to determine the size of packet and telling the reader, or designing the protocol so that there are limits on the size of a packet. If you need to convey arbitrary amounts of data, have a protocol which says:

  • Large quantity of data of unknown total size coming.
  • For each sub-packet, the message says 'this is a sub-packet of size NN KB'.
  • For the last sub-packet, the size might be shorter - that's OK and could indicate 'end of large quantity of data'.
  • If the last sub-packet is 'full size', you might send an empty last packet to indicate the EOS.
  • Alternatively, if the sub-packets can be of variable size, you can always send an explicit EOS packet.
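The sub-packet protocol in the list above can be sketched with stdio, since fread and fwrite already loop over short transfers. Several details are assumptions rather than part of the answer: a 2-byte big-endian length header, a hypothetical CHUNK limit of 4096 bytes, and always sending an empty sub-packet as the explicit end marker.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 4096   /* hypothetical maximum sub-packet payload */

/* Copies all of in to out as sub-packets (2-byte length, then data).
   A final zero-length sub-packet marks the end of the stream, so the
   total size never has to be known in advance. Returns 0 or -1. */
int send_stream(FILE *in, FILE *out)
{
    unsigned char data[CHUNK];
    size_t n;
    do {
        n = fread(data, 1, sizeof data, in);
        unsigned char hdr[2] = { (unsigned char)(n >> 8), (unsigned char)n };
        if (fwrite(hdr, 1, 2, out) != 2 || fwrite(data, 1, n, out) != n)
            return -1;
    } while (n > 0);
    return ferror(in) ? -1 : 0;
}

/* Reassembles the stream into out. Returns the total payload bytes,
   or -1 if the stream is truncated or a header is out of range. */
long recv_stream(FILE *in, FILE *out)
{
    unsigned char hdr[2], data[CHUNK];
    long total = 0;
    for (;;) {
        if (fread(hdr, 1, 2, in) != 2)
            return -1;                          /* cut off before the end marker */
        size_t n = (size_t)hdr[0] << 8 | hdr[1];
        if (n == 0)
            return total;                       /* explicit end of stream */
        if (n > CHUNK || fread(data, 1, n, in) != n || fwrite(data, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
}
```

With a FIFO, the writer's out could come from fdopen(fd, "w"). The writer should then call fflush (or fclose) so the last sub-packets are not left sitting in the stdio buffer.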

Also consider what will happen in future if, instead of using named pipes, you want to upgrade your system to work over a socket connection to another machine.

I think you should design your system with packets where the packet headers include the size of the data (the way most networking protocols, such as TCP/IP, do things). And if there's a higher level flow of data of unknown size, handle it along the lines outlined above. But even there, it is better if you can tell the overall size ahead of time.

千里故人稀 2024-08-25 14:31:14

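One simple way would be to send a small, fixed-size packet through the named pipe that identifies a shared memory segment holding the null-terminated string. The segment is created with a key obtained from ftok (based on the named pipe's path). The packet carries the segment's identifier rather than a raw pointer, since a pointer from one process is meaningless in another. All other discrete information can be passed within the packet struct.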

sender:

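key_t key = ftok("./mynamedpipe", 'P');   /* ftok takes a path and a project id; 'P' is arbitrary */
packet.ident = shmget(key, strlen(message) + 1, IPC_CREAT | IPC_EXCL | 0600);   /* returns an id, not a pointer */
char *shm = shmat(packet.ident, NULL, 0);   /* the sender must attach before writing */
strcpy(shm, message);
shmdt(shm);
/* then write(fd, &packet, sizeof packet) on the named pipe */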

receiver:

/* packet has just been read from the named pipe */
message = shmat(packet.ident, NULL, 0);   /* flags are an int: 0, not NULL */

Note that the address argument to shmat is NULL, so the system chooses where to attach the segment. This avoids overlapping memory that is already mapped in the receiver process.
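Here is a self-contained sketch of the same idea. It swaps in IPC_PRIVATE for the ftok key: the packet already carries the segment id, so the receiver only needs that id for shmat, and a private segment cannot collide with one left over from an earlier run. The struct packet layout and the function names are hypothetical. The last reader is assumed to remove the segment with shmctl(..., IPC_RMID, ...), because System V segments otherwise persist after both processes exit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Hypothetical fixed-size packet sent through the pipe. */
struct packet {
    int shmid;          /* System V shared memory id holding the message */
    size_t size;        /* bytes in the segment, including the NUL */
};

/* Copies msg into a new shared memory segment and sends its id over fd.
   Returns the shmid, or -1 on failure (the segment is removed then). */
int shm_send(int fd, const char *msg)
{
    struct packet p = { .size = strlen(msg) + 1 };
    p.shmid = shmget(IPC_PRIVATE, p.size, IPC_CREAT | 0600);
    if (p.shmid == -1)
        return -1;
    char *mem = shmat(p.shmid, NULL, 0);
    if (mem == (void *)-1) {
        shmctl(p.shmid, IPC_RMID, NULL);
        return -1;
    }
    memcpy(mem, msg, p.size);            /* copies the terminating NUL too */
    shmdt(mem);
    if (write(fd, &p, sizeof p) != (ssize_t)sizeof p) {
        shmctl(p.shmid, IPC_RMID, NULL);
        return -1;
    }
    return p.shmid;
}

/* Reads one packet from fd and attaches the segment it names.
   Returns the message, or NULL on failure. The caller must shmdt() it. */
char *shm_recv(int fd, struct packet *p)
{
    if (read(fd, p, sizeof *p) != (ssize_t)sizeof *p)
        return NULL;
    char *mem = shmat(p->shmid, NULL, 0);   /* NULL: the system picks the address */
    return mem == (void *)-1 ? NULL : mem;
}
```

The packet is only a few bytes, well under PIPE_BUF, so each write of it is atomic. Sending the raw struct is acceptable here only because shared memory already requires both processes to be on the same machine.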
