When would you use unpack('h' ...) or pack('h' ...)?

Posted 2024-09-26 00:57:31

In Perl, pack and unpack have two templates for converting bytes to/from hex:

h A hex string (low nybble first).
H A hex string (high nybble first).

This is best clarified with an example:

use 5.010; # so I can use say
my $buf = "\x12\x34\x56\x78";

say unpack('H*', $buf); # prints 12345678
say unpack('h*', $buf); # prints 21436587
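
To spell out what "low nybble first" means, here is a minimal sketch that builds both hex strings by hand. The helper name hex_dump is made up for illustration; it is not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: hex-dump a byte string, choosing the nybble order by hand.
sub hex_dump {
    my ($bytes, $low_first) = @_;
    my $out = '';
    for my $byte (unpack 'C*', $bytes) {
        my ($hi, $lo) = ($byte >> 4, $byte & 0x0f);
        $out .= $low_first ? sprintf('%x%x', $lo, $hi)
                           : sprintf('%x%x', $hi, $lo);
    }
    return $out;
}

my $buf = "\x12\x34\x56\x78";
print hex_dump($buf, 0), "\n";   # same digits as unpack('H*', $buf)
print hex_dump($buf, 1), "\n";   # same digits as unpack('h*', $buf)

# pack with the matching template reverses the transformation.
print pack('h*', '21436587') eq $buf ? "round trip ok\n" : "mismatch\n";
```

The byte order is the same in both cases; only the two hex digits inside each byte swap places.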

As you can see, H is what people generally mean when they think about converting bytes to/from hexadecimal. So what's the purpose of h? Larry must have thought someone might use it, or he wouldn't have bothered to include it.

Can you give a real-world example where you'd actually want to use h instead of H with pack or unpack? I'm looking for a specific example; if you know of a machine that organized its bytes like that, what was it, and can you link to some documentation on it?

I can think of examples where you could use h, such as serializing some data when you don't really care what the format is, as long as you can read it back, but H would be just as useful for that. I'm looking for an example where h is more useful than H.

伪心 2024-10-03 00:57:31

Recall that in the bad ol' days of MS-DOS, certain OS functions were controlled by setting the high and low nibbles of a register and performing an interrupt (INT xx). For example, INT 21 accessed many file functions. You would set the high nibble to the drive number (who would ever have more than 15 drives?) and the low nibble to the requested function on that drive, and so on.

Here is some old CPAN code that uses pack as you describe to set the registers to perform an MS-DOS system call.

Blech!!! I don't miss MS-DOS at all...

--Edit

Here is specific source code: download Perl 5.00402 for DOS and unzip it.

In the files Opcode.pm and Opcode.pl you can see unpack("h*",$_[0]); used here:

sub opset_to_hex ($) {
    return "(invalid opset)" unless verify_opset($_[0]);
    unpack("h*",$_[0]);
}

I did not follow the code all the way through, but my suspicion is this is to recover info from an MS-DOS system call...
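
Whatever the data's origin, one property of h* may be relevant here, assuming the opset is a bit vector of the kind vec() builds: vec() numbers its elements from the low end of each byte, so the i-th digit of an h* dump is the i-th 4-bit element of the vector. The sketch below uses a made-up three-bit vector, not a real opset.

```perl
use strict;
use warnings;

# Build a small bit vector the way vec() does: bit 0 is the low bit of byte 0.
my $bits = '';
vec($bits, $_, 1) = 1 for (0, 5, 9);

my $hex = unpack('h*', $bits);
print "h*: $hex\n";                        # 1220
print "H*: ", unpack('H*', $bits), "\n";   # 2102

# With h*, digit i of the dump is element i of the vector in 4-bit units.
for my $i (0 .. length($hex) - 1) {
    printf "digit %d = %s, vec(\$bits, %d, 4) = %x\n",
        $i, substr($hex, $i, 1), $i, vec($bits, $i, 4);
}
```

With H* the digits within each byte come out in the opposite order, so the dump no longer reads in the same direction as the vec() indices.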

In perlport for Perl 5.8.8, you have these suggested tests for the endianness of the target:

Different CPUs store integers and floating point numbers in different
orders (called endianness) and widths (32-bit and 64-bit being the
most common today). This affects your programs when they attempt to transfer
numbers in binary format from one CPU architecture to another,
usually either “live” via network connection, or by storing the
numbers to secondary storage such as a disk file or tape.

Conflicting storage orders make utter mess out of the numbers. If a
little-endian host (Intel, VAX) stores 0x12345678 (305419896 in
decimal), a big-endian host (Motorola, Sparc, PA) reads it as
0x78563412 (2018915346 in decimal). Alpha and MIPS can be either:
Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses
them in big-endian mode. To avoid this problem in network (socket)
connections use the pack and unpack formats n and N, the
“network” orders. These are guaranteed to be portable.

As of perl 5.8.5, you can also use the > and < modifiers
to force big- or little-endian byte-order. This is useful if you want
to store signed integers or 64-bit integers, for example.

You can explore the endianness of your platform by unpacking a
data structure packed in native format such as:
   print unpack("h*", pack("s2", 1, 2)), "\n";
   # '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
   # '00100020' on e.g. Motorola 68040
If you need to distinguish between endian architectures you could use
either of the variables set like so:
   $is_big_endian    = unpack("h*", pack("s", 1)) =~ /01/;
   $is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
Differing widths can cause truncation even between platforms of equal
endianness. The platform of shorter width loses the upper parts of the
number. There is no good solution for this problem except to avoid
transferring or storing raw binary numbers.

One can circumnavigate both these problems in two ways. Either
transfer and store numbers always in text format, instead of raw
binary, or else consider using modules like Data::Dumper (included in
the standard distribution as of Perl 5.005) and Storable (included as
of perl 5.8). Keeping all data as text significantly simplifies matters.

The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's
how far EBCDIC, or more precisely UTF-EBCDIC will go.
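
The perlport test quoted above can be tried out with a short script. This is only a sketch: the variable names are mine, and the last two lines assume a perl recent enough to accept the < and > modifiers on the s format.

```perl
use strict;
use warnings;
use Config;

# Hex digits of a native 16-bit short holding 1, low nybble of each byte first.
my $probe = unpack('h*', pack('s', 1));

my $is_little_endian = $probe =~ /^1/ ? 1 : 0;   # "1000": low byte stored first
my $is_big_endian    = $probe =~ /01/ ? 1 : 0;   # "0010": low byte stored last

print "native short 1 as h*: $probe\n";
print "little-endian: $is_little_endian, big-endian: $is_big_endian\n";
print "Config byteorder: $Config{byteorder}\n";  # perl's own record, for comparison

# Forcing the byte order reproduces both patterns on any host
# (assumes a perl that supports the < and > modifiers).
my $le = unpack('h*', pack('s<', 1));   # 1000
my $be = unpack('h*', pack('s>', 1));   # 0010
print "forced little-endian: $le, forced big-endian: $be\n";
```

Because h puts the low nybble first, the value 1 shows up as the very first digit on a little-endian host, which is what makes the /^1/ test so simple.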

It seems that unpack("h*",...) is used more often than pack("h*",...). I did note that return qq'unpack("F", pack("h*", "$hex"))'; is used in Deparse.pm, and that IO-Compress uses pack("*h",...) in Perl 5.12.
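
The Deparse.pm expression suggests a round trip: dump a floating-point value's native bytes as hex, then rebuild the value from that text. Below is a minimal sketch of the idea, not Deparse's actual code; the variable names are mine.

```perl
use strict;
use warnings;

my $x = 0.1;   # not exactly representable in binary floating point

# Dump the native floating-point (NV) bytes as hex, low nybble first.
my $hex = unpack('h*', pack('F', $x));

# Rebuild the value from the hex text, as the Deparse expression does.
my $y = unpack('F', pack('h*', $hex));

print "hex image: $hex\n";
print $x == $y ? "restored exactly\n" : "values differ\n";
```

H* would round-trip just as well as long as both directions use the same template, so this shows h in use rather than a case where h is required.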

If you want further examples, here is a Google Code Search list. You can see that pack|unpack("h*"...) is fairly rare and mostly has to do with determining platform endianness...