还有什么比这更好的方法将 4 个字节打包成 3 个字节呢?
我有一个值范围在 0 - 63 之间的数组,并决定将每 4 个字节打包为 3 个字节,因为这些值只需要 6 位,并且我可以使用额外的 2 位来存储下一个值的前 2 位,很快。
在我使用 switch 语句和 nextbit 变量(类似设备的状态机)来进行打包并跟踪起始位之前,我从未这样做过。但我坚信,一定有更好的方法。
请提供建议/线索,但不要破坏我的乐趣;-)
有关大/小端的任何可移植性问题吗?
顺便说一句:我已经通过再次解压并与输入进行比较来验证此代码是否有效。不,这不是作业,只是我自己设定的练习。
/* build with gcc -std=c99 -Wconversion */
#define ASZ 400
typedef unsigned char uc_;
uc_ data[ASZ];
int i;
for (i = 0; i < ASZ; ++i) {
data[i] = (uc_)(i % 0x40);
}
size_t dl = sizeof(data);
printf("sizeof(data):%z\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%z\n",pl);
for (i = 0; i < dl; ++i)
printf("%02d ", data[i]);
printf("\n");
uc_ * packeddata = calloc(pl, sizeof(uc_));
uc_ * byte = packeddata;
uc_ nextbit = 1;
for (int i = 0; i < dl; ++i) {
uc_ m = (uc_)(data[i] & 0x3f);
switch(nextbit) {
case 1:
/* all 6 bits of m into first 6 bits of byte: */
*byte = m;
nextbit = 7;
break;
case 3:
/* all 6 bits of m into last 6 bits of byte: */
*byte++ = (uc_)(*byte | (m << 2));
nextbit = 1;
break;
case 5:
/* 1st 4 bits of m into last 4 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x0f) << 4));
/* 5th and 6th bits of m into 1st and 2nd bits of byte: */
*byte = (uc_)(*byte | ((m & 0x30) >> 4));
nextbit = 3;
break;
case 7:
/* 1st 2 bits of m into last 2 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x03) << 6));
/* next (last) 4 bits of m into 1st 4 bits of byte: */
*byte = (uc_)((m & 0x3c) >> 2);
nextbit = 5;
break;
}
}
I have an array of values all well within the range 0 - 63, and decided I could pack every 4 bytes into 3 because the values only require 6 bits and I could use the extra 2bits to store the first 2 bits of the next value and so on.
Having never done this before I used the switch
statement and a nextbit
variable (a state machine like device) to do the packing and keep track of the starting bit. I'm convinced however, there must be a better way.
Suggestions/clues please, but don't ruin my fun ;-)
Any portability problems regarding big/little endian?
btw: I have verified this code is working, by unpacking it again and comparing with the input. And no it ain't homework, just an exercise I've set myself.
/* build with gcc -std=c99 -Wconversion */
#define ASZ 400
typedef unsigned char uc_;
uc_ data[ASZ];
int i;
for (i = 0; i < ASZ; ++i) {
data[i] = (uc_)(i % 0x40);
}
size_t dl = sizeof(data);
printf("sizeof(data):%z\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%z\n",pl);
for (i = 0; i < dl; ++i)
printf("%02d ", data[i]);
printf("\n");
uc_ * packeddata = calloc(pl, sizeof(uc_));
uc_ * byte = packeddata;
uc_ nextbit = 1;
for (int i = 0; i < dl; ++i) {
uc_ m = (uc_)(data[i] & 0x3f);
switch(nextbit) {
case 1:
/* all 6 bits of m into first 6 bits of byte: */
*byte = m;
nextbit = 7;
break;
case 3:
/* all 6 bits of m into last 6 bits of byte: */
*byte++ = (uc_)(*byte | (m << 2));
nextbit = 1;
break;
case 5:
/* 1st 4 bits of m into last 4 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x0f) << 4));
/* 5th and 6th bits of m into 1st and 2nd bits of byte: */
*byte = (uc_)(*byte | ((m & 0x30) >> 4));
nextbit = 3;
break;
case 7:
/* 1st 2 bits of m into last 2 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x03) << 6));
/* next (last) 4 bits of m into 1st 4 bits of byte: */
*byte = (uc_)((m & 0x3c) >> 2);
nextbit = 5;
break;
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
所以,这有点像 code-golf,对吗?
So, this is kinda like code-golf, right?
查看 IETF RFC 4648 的“Base16、Base32 和 Base64 数据编码”。
部分代码批评:
不要使用浮点数据——只使用整数。并使用“%z”打印“size_t”值 - 假设您有一个 C99 库。
我认为您的循环可以通过处理 3 字节输入单元来简化,直到剩下部分单元,然后作为特殊情况处理剩余的 1 或 2 字节。我注意到引用的标准说您使用一两个“=”符号在末尾进行填充。
我有一个 Base64 编码器和解码器,可以完成其中的一些工作。您正在描述 Base64 的“解码”部分(其中 Base64 代码有 4 个字节的数据,应该仅存储在 3 个字节中)作为您的打包代码。 Base64 编码器对应于您需要的解包器。
Base-64 解码器
注意:base_64_inv 是一个包含 256 个值的数组,每个值对应一个可能的输入字节值;它为每个编码字节定义了正确的解码值。在 Base64 编码中,这是一个稀疏数组 - 3/4 个零。类似地,base_64_map是值0..63和对应存储值之间的映射。
Base-64 编码器
我必须处理使用变体字母表进行 Base64 编码的产品,并且还设法不填充数据,因此使生活变得复杂 - 因此“pad”参数(对于“空填充”或“空填充”可以为零=' 用于标准填充。 'base_64_map' 数组包含用于 0..63 范围内的 6 位值的字母表。
Check out the IETF RFC 4648 for 'The Base16, Base32 and Base64 Data Encodings'.
Partial code critique:
Don't use the floating point stuff - just use integers. And use '%z' to print 'size_t' values - assuming you've got a C99 library.
I think your loop could be simplified by dealing with 3-byte input units until you've got a partial unit left over, and then dealing with a remainder of 1 or 2 bytes as special cases. I note that the standard referenced says that you use one or two '=' signs to pad at the end.
I have a Base64 encoder and decode which does some of that. You are describing the 'decode' part of Base64 -- where the Base64 code has 4 bytes of data that should be stored in just 3 - as your packing code. The Base64 encoder corresponds to the unpacker you will need.
Base-64 Decoder
Note: base_64_inv is an array of 256 values, one for each possible input byte value; it defines the correct decoded value for each encoded byte. In the Base64 encoding, this is a sparse array - 3/4 zeroes. Similarly, base_64_map is the mapping between a value 0..63 and the corresponding storage value.
Base-64 Encoder
I complicate life by having to deal with a product that uses a variant alphabet for the Base64 encoding, and also manages not to pad data - hence the 'pad' argument (which can be zero for 'null padding' or '=' for standard padding. The 'base_64_map' array contains the alphabet to use for 6-bit values in the range 0..63.
另一种更简单的方法是使用位字段。 C
struct
语法中鲜为人知的角落之一是 big 字段。假设您有以下结构:这声明
chunk1
、chunk2
、chunk3
和chunk4
具有以下类型byte
但只占用结构中的 6 位。结果是sizeof(struct Packed_bytes) == 3
。现在您需要的只是一个小函数来获取数组并将其转储到结构中,如下所示:好了,您现在有了一个 struct returned_bytes 数组,您可以使用上面的结构轻松读取它。
Another simpler way to do it would be to use bit fields. One of the lesser known corners of C
struct
syntax is the big field. Let's say you have the following structure:This declares
chunk1
,chunk2
,chunk3
, andchunk4
to have the typebyte
but to only take up 6 bits in the structure. The result is thatsizeof(struct packed_bytes) == 3
. Now all you need is a little function to take your array and dump it into the structure like so:There you go, you now have an array of
struct packed_bytes
that you can easily read by using the above struct.您可以简单地使用计数器来计算当前字节中已使用了多少位,而不是使用状态机,从中您可以直接导出移位偏移量以及是否溢出到下一个字节。
关于字节顺序:只要您仅使用单一数据类型(即您不重新解释指向不同大小类型的指针(例如
int* a =...;short* b=(short*) a;
)在大多数情况下你不应该遇到字节序问题Instead of using a statemachine you can simply use a counter for how many bits are already used in the current byte, from which you can directly derive the shift-offsets and whether or not you overflow into the next byte.
Regarding the endianess: As long as you use only a single datatype (that is you don't reinterpret pointer to types of different size (e.g.
int* a =...;short* b=(short*) a;
) you shouldn't get problems with endianess in most cases结合 DigitalRoss 的紧凑代码、Grizzly 的建议和我自己的代码,我终于写出了自己的答案。虽然 DigitalRoss 提供了一个可用的工作答案,但我在不理解的情况下使用它,不会提供与学习某些东西相同的满足感。因此,我选择根据我的原始代码来回答。
我还选择忽略 Jonathon Leffler 给出的建议,以避免使用浮点运算来计算打包数据长度。给出的推荐方法(DigitalRoss 也使用相同的方法)将打包数据的长度增加了三个字节之多。当然,这并不多,但通过使用浮点数学也是可以避免的。
代码如下,欢迎批评指正:
Taking elements of DigitalRoss's compact code, Grizzly's suggestion, and my own code, I have written my own answer at last. Although DigitalRoss provides a usable working answer, my usage of it without understanding, would not have provided the same satisfaction as to learning something. For this reason I have chosen to base my answer on my original code.
I have also chosen to ignore the advice Jonathon Leffler gives to avoid using floating point arithmetic for the calculation of the packed data length. Both the recommended method given - the same DigitalRoss also uses, increases the length of the packed data by as much as three bytes. Granted this is not much, but is also avoidable by the use of floating point math.
Here is the code, criticisms welcome: