How to convert between a bit set and a byte array in C

Posted on 2024-12-09 10:35:21

Given an array,
unsigned char q[32]="1100111...",

how can I generate a 4-byte bit set, unsigned char p[4], such that each bit of the bit set equals the corresponding value in the array, e.g., the first byte p[0] = "q[0] ... q[7]", the second byte p[1] = "q[8] ... q[15]", etc.?

And how do I do the opposite, i.e., given the bit set, generate the array?

My own attempt at the first part:

unsigned char p[4]={0};
for (int j=0; j<N; j++) 
{
    if (q[j] == '1')
    {
        p [j / 8] |= 1 << (7-(j % 8)); 
    }            
}

Is the above right? Are there any conditions to check? Is there a better way?

EDIT - 1

I wonder whether the above is efficient, as the array size could be up to 4096 or even more.

遗心遗梦遗幸福 2024-12-16 10:35:21

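First, use strtoul to get a 32-bit value. Then convert the byte order to big-endian with htonl. Finally, store the result in your array:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ... */
char q[33] = "1100111...";  /* 32 digits plus the terminating '\0' that strtoul needs */
unsigned char result[4] = {0};
uint32_t v = htonl((uint32_t)strtoul(q, NULL, 2));
memcpy(result, &v, sizeof v);  /* copy instead of casting result to a wider pointer type */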

There are other ways as well.

But I lack `<arpa/inet.h>`!

Then you need to know what byte order your platform is. If it's big endian, then htonl does nothing and can be omitted. If it's little-endian, then htonl is just:

unsigned long htonl(unsigned long x)
{
    x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
    return x;
}

If you're lucky, your optimizer might see what you're doing and make it into efficient code. If not, well, at least it's all implementable in registers and O(log N).

If you don't know what byte order your platform is, then you need to detect it:

typedef union {
    char c[sizeof(int) / sizeof(char)];
    int i;
} OrderTest;

unsigned long htonl(unsigned long x)
{
    OrderTest test;
    test.i = 1;
    if(!test.c[0])
        return x;

    x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
    return x;
}
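
Detection can also be sidestepped. The sketch below is not part of the approach above: it extracts each byte with a shift, which works on values rather than memory and so gives the same big-endian layout on any host. The names store_be32 and bits32_from_string are hypothetical, and the code assumes 8-bit bytes and a NUL-terminated input of at most 32 binary digits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Store a 32-bit value into out[0..3], most significant byte first.
 * Shifts operate on values, not memory, so host byte order is irrelevant. */
static void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Parse a NUL-terminated string of up to 32 binary digits into 4 bytes. */
static void bits32_from_string(const char *s, unsigned char out[4])
{
    store_be32((uint32_t)strtoul(s, NULL, 2), out);
}
```

With this, neither htonl nor the union test is needed, at the cost of four byte stores instead of one word store.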

Maybe `long` is 8 bytes!

Well, the OP implied 4-byte inputs with their array size, but 8-byte long is doable:

#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define kCharsPerLong (sizeof(long) / sizeof(char))

unsigned long htonl(unsigned long x)
{
/* sizeof cannot be evaluated in #if, so test ULONG_MAX instead. */
#if ULONG_MAX == 0xFFFFFFFFUL
    x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
    x = ((x & 0xFF00FF00FF00FF00UL) >> 8) | ((x & 0x00FF00FF00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000FFFF0000UL) >> 16) | ((x & 0x0000FFFF0000FFFFUL) << 16);
    x = ((x & 0xFFFFFFFF00000000UL) >> 32) | ((x & 0x00000000FFFFFFFFUL) << 32);
#else
#error Unsupported word size.
#endif
    return x;
}

/* ... */
char q[8 * kCharsPerLong + 1] = "1100111...";  /* room for the terminating '\0' */
unsigned char result[kCharsPerLong] = {0};
unsigned long v = htonl(strtoul(q, NULL, 2));
memcpy(result, &v, sizeof v);

For char that isn't 8 bits (DSPs like to do this), you're on your own. (This is why it was a Big Deal when the SHARC series of DSPs had 8-bit bytes; it made it a LOT easier to port existing code because, face it, C does a horrible job of portability support.)

What about arbitrary length buffers? No funny pointer typecasts, please.

The main thing that can be improved with the OP's version is to rethink the loop's internals. Instead of thinking of the output bytes as a fixed data register, think of it as a shift register, where each successive bit is shifted into the right (LSB) end. This will save you from all those divisions and mods (which, hopefully, are optimized away to bit shifts).

For sanity, I'm ditching unsigned char for uint8_t.

#include <stddef.h>
#include <stdint.h>

unsigned StringToBits(const char* inChars, uint8_t* outBytes, size_t numBytes,
    size_t* bytesRead)
/* Converts the string of '1' and '0' characters in `inChars` to a buffer of
 * bytes in `outBytes`. `numBytes` is the number of available bytes in the
 * `outBytes` buffer. On exit, if `bytesRead` is not NULL, the value it points
 * to is set to the number of bytes read (rounding up to the nearest full
 * byte). If a multiple of 8 bits is not read, the last byte written will be
 * padded with 0 bits to reach a multiple of 8 bits. This function returns the
 * number of padding bits that were added. For example, an input of 11 bits
 * will result in `bytesRead` being set to 2 and the function will return 5. This
 * means that if a nonzero value is returned, then a partial byte was read,
 * which may be an error.
 */
{   size_t bytes = 0;
    unsigned bits = 0;
    uint8_t x = 0;

    while(bytes < numBytes)
    {   /* Parse a character. */
        switch(*inChars++)
        {   case '0': x <<= 1; ++bits; break;
            case '1': x = (x << 1) | 1; ++bits; break;
            default: numBytes = 0;
        }

        /* See if we filled a byte. */
        if(bits == 8)
        {   outBytes[bytes++] = x;
            x = 0;
            bits = 0;
        }
    }

    /* Padding, if needed. */
    if(bits)
    {   bits = 8 - bits;
        outBytes[bytes++] = x << bits;
    }

    /* Finish up. */
    if(bytesRead)
        *bytesRead = bytes;
    return bits;
}

It's your responsibility to make sure inChars is null-terminated. The function will return on the first non-'0' or '1' character it sees or if it runs out of output buffer. Some example usage:

char q[33] = "1100111...";  /* char to match const char*; one extra byte for the '\0' */
uint8_t buf[4];
size_t bytesRead = 5;
if(StringToBits(q, buf, 4, &bytesRead) || bytesRead != 4)
{
    /* Partial read; handle error here. */
}

This just reads 4 bytes, and traps the error if it can't.

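char q[4096 + 1] = "1100111...";
uint8_t buf[512] = {0};  /* zeroed up front: the function never touches bytes it does not fill */
StringToBits(q, buf, 512, NULL);

This just converts what it can; because buf was zero-initialized, the rest stays at 0 bits.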

This function could be done better if C had the ability to break out of more than one level of loop or switch; as it stands, I'd have to add a flag value to get the same effect, which is clutter, or I'd have to add a goto, which I simply refuse.
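
The question also asked for the reverse direction, which the function above does not cover. A minimal sketch in the same shift-register style follows; BitsToString is a hypothetical name, and the caller is assumed to supply an output buffer of at least numBytes * 8 + 1 characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes numBytes * 8 characters ('0' or '1', MSB of each byte first) to
 * outChars, followed by a terminating '\0'. outChars must have room for
 * numBytes * 8 + 1 characters. */
void BitsToString(const uint8_t* inBytes, size_t numBytes, char* outChars)
{
    for (size_t i = 0; i < numBytes; ++i)
    {
        uint8_t x = inBytes[i];
        for (unsigned bit = 0; bit < 8; ++bit)
        {
            *outChars++ = (x & 0x80) ? '1' : '0';  /* emit the current MSB */
            x = (uint8_t)(x << 1);                 /* move the next bit into the MSB */
        }
    }
    *outChars = '\0';
}
```

As with the forward direction, shifting the byte left each step avoids the per-bit division and modulo.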

放血 2024-12-16 10:35:21

I don't think that will quite work. You are comparing each "bit" to 1 when it should really be '1'. You can also make it a bit more efficient by getting rid of the if:

unsigned char p[4]={0};
for (int j=0; j<32; j++) 
{
    p [j / 8] |= (q[j] == '1') << (7-(j % 8));
}

Going in reverse is pretty simple too. Just mask for each "bit" that you set earlier.

unsigned char q[32]={0};
for (int j=0; j<32; j++) {
  q[j] = ((p[j / 8] >> (7-(j % 8))) & 1) + '0';  /* shift the bit down to 0/1 before adding '0' */
}

You'll notice the creative use of (boolean) + '0' to convert between 1/0 and '1'/'0'.

终遇你 2024-12-16 10:35:21

Judging by your example, you are not going for readability. After a (late) refresh, my solution looks very similar to Chriszuma's, except that it omits parentheses where operator precedence makes them unnecessary and adds !! to force a 0 or 1.

enum { N = 32 }; /* N must be a multiple of 8; a constant expression, so the arrays below can have initializers in C */
unsigned char q[N+1] = "11011101001001101001111110000111";
unsigned char p[N/8] = {0};
unsigned char r[N+1] = {0}; //reversed

for(size_t i = 0; i < N; ++i)
    p[i / 8] |= (q[i] == '1') << 7 - i % 8;

for(size_t i = 0; i < N; ++i)
    r[i] = '0' + !!(p[i / 8] & 1 << 7 - i % 8);

printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
printf("%s\n%s\n", q,r);

山田美奈子 2024-12-16 10:35:21

If you are looking for extreme efficiency, try to use the following techniques:

Replace the if with a subtraction of '0' (it seems you can assume the input symbols are only '0' or '1').
Also process the input from lower indices to higher ones.

for (int c = 0; c < N; c += 8)
{
    int y = 0;
    for (int b = 0; b < 8; ++b)
        y = y * 2 + q[c + b] - '0';
    p[c / 8] = y;
}

Replace array indices by auto-incrementing pointers:

const unsigned char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
    int y = 0;
    for (int b = 0; b < 8; ++b)
        y = y * 2 + *qptr++ - '0';
    *pptr++ = y;
}

Unroll the inner loop:

const unsigned char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
    *pptr++ =
        qptr[0] - '0' << 7 |
        qptr[1] - '0' << 6 |
        qptr[2] - '0' << 5 |
        qptr[3] - '0' << 4 |
        qptr[4] - '0' << 3 |
        qptr[5] - '0' << 2 |
        qptr[6] - '0' << 1 |
        qptr[7] - '0' << 0;
    qptr += 8;
}

Process several input characters simultaneously (using bit twiddling hacks or MMX instructions) - this has great speedup potential!
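
One possible bit-twiddling version of that last step, as a sketch rather than a tuned implementation: load 8 characters into a 64-bit word, keep the low bit of each, and gather the 8 bits into the top byte with a single multiplication. The names load_le64, pack8 and pack_bits are hypothetical. The code assumes 8-bit bytes, input containing only '0' and '1', and a character set where '0' has an even code (true for ASCII). Whether it beats the unrolled loop depends on the compiler and target, so measure.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 8 bytes as a little-endian 64-bit value, independent of host order.
 * s[k] ends up in bits 8k..8k+7. */
static uint64_t load_le64(const unsigned char *s)
{
    uint64_t x = 0;
    for (int i = 7; i >= 0; --i)
        x = (x << 8) | s[i];
    return x;
}

/* Pack 8 characters, each '0' or '1', into one byte; s[0] becomes the MSB. */
static unsigned char pack8(const unsigned char *s)
{
    /* Keep the low bit of each character: bit 8k now holds the value of s[k]. */
    uint64_t x = load_le64(s) & 0x0101010101010101ULL;
    /* The multiplier has bits 63, 54, 45, ..., 9, 0 set, so the bit for s[k]
     * (at position 8k) is copied to position 63 - k. All partial products
     * land on distinct bit positions, so no carries disturb the top byte. */
    return (unsigned char)((x * 0x8040201008040201ULL) >> 56);
}

/* n must be a multiple of 8; out must hold n / 8 bytes. */
static void pack_bits(const unsigned char *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i += 8)
        *out++ = pack8(in + i);
}
```

On little-endian targets compilers commonly reduce load_le64 to a single 8-byte load, leaving one AND, one multiply and one shift per output byte.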
