Should I manually embed data size information in a TCP transfer?
Imagine that you and I are sending a fairly long sentence (say, 1,024,000 bytes) over TCP.
If you write a 1,024,000-byte sentence to me, you use a NetworkStream to write those bytes.
When I receive it, should I know in advance the size of the sentence you sent?
If not, how do I know when to stop calling stream.Read?
If so, should the program embed the data size at the head of the data, so that I receive 4 bytes first to see how many bytes I should read in total?
Does .NET have anything that automatically embeds the data size in the transfer?

Comments (9)
Neither .NET nor the TCP protocol has anything built in to define the size of a message in advance. The TCP protocol only specifies that all data will be transferred to the receiving endpoint (or at least that a best effort will be made to do so).
You are solely responsible for defining a way to let the receiver know how much data to read. The details depend, as others have pointed out, on the nature of what you're transferring: you could send the length first as you mentioned, you could encode special sequences called terminators, you could use predefined data chunks so that all messages have the same size, and so on.
EDIT
This started out as a comment but there's more to it than fits that limit.
To add NULL to the stream simply means appending a character that has the binary value 0 (not to be confused with the character '0'). Depending on the encoding you're using for your transfer (e.g. ASCII, UTF-8, UTF-16), that may translate into sending one or more 0 bytes, but if you're using the appropriate conversion you simply need to put something like \0 in your string.
Of course, all of the above assumes that the rest of the data you're sending does not contain any other NULLs. That means it's text, not arbitrary binary data (such as the contents of a file). That's very important! Otherwise you can't use NULL as a message terminator and you have to come up with another scheme.
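A minimal sketch of the NULL-terminated approach described above, assuming UTF-8 text (where a 0 byte only ever encodes the NULL character) and any Stream, such as the NetworkStream from the question. The class and method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class NulFraming
{
    // Writes one text message followed by a single NULL terminator.
    // Text that already contains '\0' is rejected, since it would be
    // mistaken for the end of the message on the receiving side.
    public static void WriteMessage(Stream stream, string text)
    {
        if (text.IndexOf('\0') >= 0)
            throw new ArgumentException("Message must not contain NULL.", nameof(text));
        byte[] bytes = Encoding.UTF8.GetBytes(text + "\0");
        stream.Write(bytes, 0, bytes.Length);
    }

    // Reads bytes until a 0 byte is seen. Returns false if the stream
    // ends before a complete message has arrived.
    public static bool TryReadMessage(Stream stream, out string text)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == 0)
            {
                text = Encoding.UTF8.GetString(bytes.ToArray());
                return true;
            }
            bytes.Add((byte)b);
        }
        text = string.Empty;
        return false;
    }
}
```

Reading one byte at a time keeps the sketch short; a real receiver would probably read into a buffer and scan it for the terminator.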
Generally speaking, it's better to use a header with the data size than a terminating character. The terminating-character method is susceptible to a denial-of-service attack: I can just keep sending data to your service, and as long as I don't include the terminator, you have to keep processing (and possibly allocating memory) until you crash.
With a header that contains the total size, if a transmission is too big for you to handle, you can ignore it or send back an error. If a malicious party tries to send more data than is declared in the header, you'll notice a corrupt header at the start of the next message and can ignore it.
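Here is a rough sketch of the size-header idea with an upper bound on what the receiver will accept. The 4-byte big-endian header, the 16 MB limit and all names are assumptions chosen for illustration, not something .NET prescribes:

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

static class LengthPrefixFraming
{
    // Hypothetical limit; pick whatever your service can afford to buffer.
    public const int MaxMessageSize = 16 * 1024 * 1024;

    // Writes a 4-byte big-endian length followed by the payload.
    public static void WriteMessage(Stream stream, byte[] payload)
    {
        byte[] header = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads the header, rejects unacceptable sizes, then reads the payload.
    public static byte[] ReadMessage(Stream stream)
    {
        byte[] header = ReadExactly(stream, 4);
        int length = BinaryPrimitives.ReadInt32BigEndian(header);
        if (length < 0 || length > MaxMessageSize)
            throw new InvalidDataException($"Declared length {length} is not acceptable.");
        return ReadExactly(stream, length);
    }

    // A single Read may return fewer bytes than requested, so loop
    // until the requested count has arrived or the stream ends.
    static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Connection closed mid-message.");
            offset += read;
        }
        return buffer;
    }
}
```

Because the size check happens before any payload buffer is allocated, an oversized or garbage header costs the receiver only four bytes of reading.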
That can be helpful (for things like rendering progress bars), but it's not necessarily required.
The contents of your stream define this. For example, many messages encode some information that tells you the message is over (e.g., a null byte to represent the end of a string, or </html> to represent the end of an HTML document).
If you know or can easily find out the total length of the message, I'd suggest transmitting it in advance. If it is impossible or very expensive to determine, you could use something similar to chunked transfer encoding in HTTP.
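As a sketch of what "something similar to chunked transfer encoding" could look like: each chunk carries its own length, and a zero-length chunk marks the end of the message. This is not HTTP's actual wire format (HTTP uses hexadecimal text sizes); the binary layout, the chunk size limit and the names below are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedFraming
{
    // Sends a message whose total size is not known up front. Each chunk is
    // prefixed with its own 4-byte length; a length of 0 ends the message.
    public static void WriteChunked(Stream stream, IEnumerable<byte[]> chunks)
    {
        var writer = new BinaryWriter(stream);
        foreach (byte[] chunk in chunks)
        {
            if (chunk.Length == 0) continue; // 0 is reserved for the end marker
            writer.Write(chunk.Length);      // Int32, little-endian
            writer.Write(chunk);
        }
        writer.Write(0);
        writer.Flush();
    }

    // Reads chunks until the zero-length end marker and returns the whole message.
    public static byte[] ReadChunked(Stream stream, int maxChunkSize = 64 * 1024)
    {
        var reader = new BinaryReader(stream);
        var message = new MemoryStream();
        while (true)
        {
            int length = reader.ReadInt32();
            if (length == 0)
                return message.ToArray();
            if (length < 0 || length > maxChunkSize)
                throw new InvalidDataException($"Chunk length {length} is not acceptable.");
            byte[] chunk = reader.ReadBytes(length);
            if (chunk.Length != length)
                throw new EndOfStreamException("Connection closed mid-chunk.");
            message.Write(chunk, 0, chunk.Length);
        }
    }
}
```

The sender can start writing as soon as the first chunk is ready, while the receiver still knows exactly how many bytes to expect at every step.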
The main point is that with TCP there is no correspondence between the number and size of socket writes on the sending side and the number and size of socket reads on the receiving side.
If the stream of data has some kind of structure to it, you'll have to add some kind of meta/wrapper data around the payload.
Any time I have had to solve this problem I have used some combination of:
a) use a magic number to indicate the start or end of your data message (or both)
b) use a checksum at the end of the message to verify that the contents are correct (I know that TCP performs error checking and retransmission, but the checksum is useful in the case where the receiver picks up an incidental occurrence of the start/end magic number/sequence in the stream)
c) use a length field after the initial magic number (provided the transmitting side knows the length of the data before transmission begins)
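Combining a), b) and c) could look roughly like the sketch below: magic number, length, payload, checksum. The magic value, the 1 MiB limit, the names and especially the toy checksum are assumptions for illustration; a real protocol would more likely use something like CRC-32:

```csharp
using System;
using System.IO;

static class MagicFraming
{
    const uint Magic = 0x4D534731; // hypothetical start-of-message marker

    // Toy rolling checksum, used here only to keep the sketch dependency-free.
    static uint Checksum(byte[] data)
    {
        uint sum = 0;
        foreach (byte b in data)
            sum = unchecked(sum * 31 + b);
        return sum;
    }

    // Frame layout: magic (4 bytes) | length (4 bytes) | payload | checksum (4 bytes)
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        var writer = new BinaryWriter(stream);
        writer.Write(Magic);
        writer.Write(payload.Length);
        writer.Write(payload);
        writer.Write(Checksum(payload));
        writer.Flush();
    }

    // Validates magic number, length and checksum before returning the payload.
    public static byte[] ReadFrame(Stream stream, int maxLength = 1 << 20)
    {
        var reader = new BinaryReader(stream);
        if (reader.ReadUInt32() != Magic)
            throw new InvalidDataException("Bad magic number.");
        int length = reader.ReadInt32();
        if (length < 0 || length > maxLength)
            throw new InvalidDataException($"Length {length} is not acceptable.");
        byte[] payload = reader.ReadBytes(length);
        if (payload.Length != length)
            throw new EndOfStreamException("Connection closed mid-frame.");
        if (reader.ReadUInt32() != Checksum(payload))
            throw new InvalidDataException("Checksum mismatch.");
        return payload;
    }
}
```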
However, before going DIY, have a good look at what higher-level protocol libraries are implemented for the language/platform you are using. NetworkStream? Is that Windows API/MFC or something?
For instance, I recently had to set up a client/server system. The client and server functionality was already written in Python, so simply using Python's xmlrpclib/server made it easy to join the two programs together: I literally copied the example and was done in 30 minutes. If I'd coded some made-up protocol myself directly on TCP, it would've taken 5 days!
There are two ways you could do this. One is the way you described, placing the size of the message in the header; the other is to put some sort of terminating marker on the stream. For example, if your message is guaranteed not to contain embedded NUL characters, you could terminate it with a NUL.
Since TCP is a reliable protocol, you could either structure your protocol to indicate the number of bytes coming or use some sort of terminator to indicate the end of the transmission. If you were using UDP, which is not guaranteed to be reliable, it would be much more important either to build a protocol that withstands dropped bytes or to indicate how many bytes are expected (and have a retransmission mechanism), since the packet containing the terminator may be lost. Maximum data transmission times and timeouts may also be useful, but only if you can determine a reasonable maximum.
My answer would be no, especially for large data sets. The reason is that sending the size first adds latency to your system.
If you want to send the size first, you need to compute the whole answer before you start sending it.
On the other hand, if you use a termination marker, you can start sending the first bits of data as soon as they are ready, while computing the data that follows.
You may also want to investigate the BinaryReader/BinaryWriter classes, which can be wrapped around any stream, TCP or otherwise.
Among other things, these support reading and writing strings (in an encoding of your choice) and take care of including the length of the string too.
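A short sketch of that, assuming UTF-8 and a stream you want to keep open afterwards (for a TCP connection it would typically come from TcpClient.GetStream()). The wrapper class and method names are made up; BinaryWriter.Write(string) and BinaryReader.ReadString() are the calls doing the work:

```csharp
using System;
using System.IO;
using System.Text;

static class BinaryStringDemo
{
    // BinaryWriter.Write(string) emits the encoded byte count first
    // (as a 7-bit-encoded integer) and then the encoded bytes.
    public static void Send(Stream stream, string text)
    {
        // leaveOpen: true keeps the underlying stream usable after disposal.
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(text);
        }
    }

    // BinaryReader.ReadString() reads that length prefix and then
    // exactly that many bytes, so the caller never sees partial strings.
    public static string Receive(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            return reader.ReadString();
        }
    }
}
```

Both ends have to agree on the encoding, and this only frames strings written this way; arbitrary binary payloads would still need a length field of your own.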