Sending binary-safe data over the network in Perl
I'm implementing a network client that sends messages to a server. The messages are streams of bytes, and the protocol requires that I send the length of each stream beforehand.
If the message that I am given (by the code using my module) is a byte string, then the length is given easily enough by length $string. But if it's a string of characters, I'll need to massage it to get the raw bytes. What I'm doing now is basically this:
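    use Encode ();

    my $msg = shift;    # some message from calling code
    my $bytes;
    if ( utf8::is_utf8( $msg ) ) {
        # internal UTF8 flag is set: treat as characters and encode
        $bytes = Encode::encode( 'utf-8', $msg );
    } else {
        # flag not set: assume the string already holds raw bytes
        $bytes = $msg;
    }
    my $length = length $bytes;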
Is this the correct way to handle this? It seems to work so far, but I haven't done any serious testing yet. What potential pitfalls are there with this approach?
Thanks
Comments (3)
You shouldn't really be guessing at what your input is. Define your code to accept either byte strings or Unicode character strings, and leave it to the caller to convert the input to the proper format (or provide some way for the caller to specify which kind of strings they're providing).
If you define your code to accept byte strings, then any characters above \xFF are an error.
If you define your code to accept Unicode character strings, then you can convert them to bytes with Encode::encode_utf8() (and should do so regardless of how they're internally represented by Perl).
In any case, calling utf8::is_utf8() is usually a mistake: your program should not care about the internal representation of strings, only about the actual data (a sequence of characters) they contain. Whether some of those characters (in particular, those in the range \x80 to \xFF) are internally represented by one or two bytes should not matter.
P.S. Reading perldoc Encode may help to clarify issues with bytes and characters in Perl.
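As a rough illustration of the two contracts described above, here is a minimal sketch. The helper names frame_text and frame_bytes are hypothetical (they are not from the question or from any module), and the sketch only computes the byte string and its length; how the length is actually written on the wire depends on the protocol, which the question does not specify.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: the caller promises a character string. It is always
# encoded, whatever Perl's internal representation happens to be.
sub frame_text {
    my ($chars) = @_;
    my $bytes = Encode::encode_utf8($chars);
    return ( length($bytes), $bytes );
}

# Hypothetical helper: the caller promises a byte string. Any character
# above \xFF cannot be a byte, so it is treated as an error.
sub frame_bytes {
    my ($bytes) = @_;
    my $copy = $bytes;
    utf8::downgrade( $copy, 1 )
        or die "byte string contains a character above \\xFF\n";
    return ( length($copy), $copy );
}

my $plain    = "\xE9";       # one character, U+00E9
my $upgraded = $plain;
utf8::upgrade($upgraded);    # same character, different internal storage

# Both strings are equal as far as Perl code is concerned, so both
# produce the same two-byte UTF-8 encoding.
my ($len_plain)    = frame_text($plain);
my ($len_upgraded) = frame_text($upgraded);
print "text lengths: $len_plain $len_upgraded\n";

# Treated as bytes, the same data is a single octet either way.
my ($len_bytes) = frame_bytes($upgraded);
print "byte length: $len_bytes\n";
```

With the utf8::is_utf8() check from the question, $plain and $upgraded would be sent with different lengths even though they compare equal, which is the pitfall this answer is warning about.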
The sender:
The receiver:
perldoc -f length used to say, back in v5.8,
The modern docs for length don't mention bytes:
but I don't think that deprecates the do { use bytes; ... } solution.
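To make the comparison concrete, the following sketch (variable names are illustrative only) contrasts length, length under use bytes, and the length of an explicitly encoded copy. It suggests that the use bytes result follows Perl's internal storage of the string, so it may differ for two strings that compare equal.

```perl
use strict;
use warnings;
use Encode ();

my $s = "\xE9";       # one character, U+00E9, stored internally as one byte
my $u = $s;
utf8::upgrade($u);    # equal string, stored internally as UTF-8

my $char_count     = length $s;                          # characters
my $bytes_plain    = do { use bytes; length $s };        # internal bytes of $s
my $bytes_upgraded = do { use bytes; length $u };        # internal bytes of $u
my $encoded_len    = length Encode::encode_utf8($s);     # explicit UTF-8 octets

print "equal strings: ", ( $s eq $u ? "yes" : "no" ), "\n";
print "length:                $char_count\n";
print "use bytes (plain):     $bytes_plain\n";
print "use bytes (upgraded):  $bytes_upgraded\n";
print "encode_utf8 length:    $encoded_len\n";
```

Encoding explicitly gives the same count for both strings, whereas the use bytes count depends on how the string happens to be stored.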