C/C++位数组或位向量

发布于 2024-10-10 14:37:31 字数 422 浏览 3 评论 0原文

我正在学习 C/C++ 编程 &遇到过“位数组”或“位向量”的使用。无法理解他们的目的吗?这是我的疑问 -

  1. 它们是否用作布尔标志?
  2. 可以使用 int 数组代替吗? (当然需要更多内存,但是..)
  3. 位屏蔽的概念是什么?
  4. 如果位掩码是简单的位操作以获得适当的标志,那么如何为它们编程?在 head 中执行此操作以查看与十进制数字相对应的标志是什么,是不是很困难?

我正在寻找应用程序,以便我可以更好地理解。例如 -

Q. 您将获得一个包含范围(1 到 100 万)整数的文件。有一些重复,因此缺少一些数字。寻找寻找失踪者的最快方法 数字?

对于上述问题,我已阅读告诉我使用位数组的解决方案。如何将每个整数存储在一位中?

I am learning C/C++ programming & have encountered the usage of 'Bit arrays' or 'Bit Vectors'. Am not able to understand their purpose? here are my doubts -

  1. Are they used as boolean flags?
  2. Can one use int arrays instead? (more memory of course, but..)
  3. What's this concept of Bit-Masking?
  4. If bit-masking is simple bit operations to get an appropriate flag, how do one program for them? is it not difficult to do this operation in head to see what the flag would be, as apposed to decimal numbers?

I am looking for applications, so that I can understand better. for Eg -

Q. You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding missing
numbers?

For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

把昨日还给我 2024-10-17 14:37:31

我认为您对数组和数字感到困惑,特别是操作二进制数意味着什么。

我将通过例子来讨论这个问题。假设您有许多错误消息,并且希望将它们作为函数的返回值返回。现在,您可能会将错误标记为 1、2、3、4……这对您来说是有意义的,但是如果只给出一个数字,您如何计算出发生了哪些错误呢?

现在,尝试将错误标记为 1,2,4,8,16...基本上是 2 的递增幂。为什么这有效?好吧,当您以 2 为底时,您正在操作一个像 00000000 这样的数字,其中每个数字对应于 2 的幂乘以它从右侧开始的位置。假设发生了错误 1、4 和 8。那么这可以表示为 00001101。相反,第一个数字 = 1*2^0,第三个数字 1*2^2,第四个数字 1*2^3。将它们全部加起来得到 13。

现在,我们可以通过应用位掩码来测试是否发生了此类错误。例如,如果您想确定是否发生了错误 8,请使用 8 = 00001000 的位表示形式。现在,为了提取是否发生了该错误,请使用二进制文件,如下所示:

  00001101
& 00001000
= 00001000

我确信您知道 and 是如何工作的,或者可以从上面推断出它 - 按数字工作,如果任何两个数字都是1,结果为 1,否则为 0。

现在,在 C 中:

int func(...)
{
    int retval = 0;

    if ( sometestthatmeans an error )
    {
        retval += 1;
    }


    if ( sometestthatmeans an error )
    {
        retval += 2;
    }
    return retval
}

int anotherfunc(...)
{
    uint8_t x = func(...)

    /* binary and with 8 and shift 3 plaes to the right
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}

现在,开始实用。当内存稀疏并且协议没有详细的 xml 等的奢侈时,通常将字段定界为很多位宽。在该字段中,您将各种位(标志、2 的幂)分配给特定含义,并应用二进制运算来推断它们是否已设置,然后对它们进行操作。

我还应该补充一点,二进制运算在思想上与计算机的底层电子器件很接近。想象一下,如果位字段对应于各种电路的输出(是否承载电流)。通过使用足够多的上述电路的组合,你就可以制造出……一台计算机。

I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.

I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occured?

Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work base 2 you are manipulating a number like 00000000 where each digit corresponds to a power of 2 multiplied by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. In reverse, the first digit = 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.

Now, we are able to test if such an error has occured by applying a bitmask. By example, if you wanted to work out if error 8 has occured, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occured, use a binary and like so:

  00001101
& 00001000
= 00001000

I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.

Now, in C:

int func(...)
{
    int retval = 0;

    if ( sometestthatmeans an error )
    {
        retval += 1;
    }


    if ( sometestthatmeans an error )
    {
        retval += 2;
    }
    return retval
}

int anotherfunc(...)
{
    uint8_t x = func(...)

    /* binary and with 8 and shift 3 plaes to the right
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}

Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose xml etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.

I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.

以为你会在 2024-10-17 14:37:31

关于位数组的用法:

如果您知道“只有”100 万个数字 - 您可以使用 100 万位的数组。一开始,所有位都为零,每次读取数字时 - 使用该数字作为索引,并将该索引中的位更改为 1(如果它还不是 1)。

读取所有数字后 - 缺失的数字是数组中零的索引。

例如,如果我们只有 0 - 4 之间的数字,则数组的开头将如下所示:0 0 0 0 0。
如果我们读数字:3,2,2
数组看起来像这样:read 3 --> 0 0 0 1 0. 读 3(再次) --> 0 0 0 1 0.读取2 --> 0 0 1 1 0.检查零的索引:0,1,4 - 这些是缺失的数字

顺便说一句,当然你可以使用整数而不是位,但它可能需要(取决于系统)32倍内存

Sivan

regarding the usage the bits array :

if you know there are "only" 1 million numbers - you use an array of 1 million bits. in the beginning all bits will be zero and every time you read a number - use this number as index and change the bit in this index to be one (if it's not one already).

after reading all numbers - the missing numbers are the indices of the zeros in the array.

for example, if we had only numbers between 0 - 4 the array would look like this in the beginning: 0 0 0 0 0.
if we read the numbers : 3, 2, 2
the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. check the indices of the zeroes: 0,1,4 - those are the missing numbers

BTW, of course you can use integers instead of bits but it may take (depends on the system) 32 times memory

Sivan

背叛残局 2024-10-17 14:37:31

位数组或位向量可以作为布尔值数组。通常,布尔变量至少需要一个字节存储,但在位数组/向量中只需要一位。
如果您有大量此类数据,这会很方便,因此可以节省大量内存。

另一种用法是如果您有数字不完全适合标准变量,其大小为 8、16、32 或 64 位。您可以通过这种方式将一个由 4 位、一个 2 位和一个 10 位组成的数字存储到 16 位的位向量中。通常您必须使用 3 个大小为 8,8 和 16 位的变量,因此您只浪费了 50% 的存储空间。

但所有这些用途在业务应用程序中很少使用,在通过pinvoke/interop函数连接驱动程序以及进行低级编程时经常使用。

Bit Arrays or Bit Vectors can be though as an array of boolean values. Normally a boolean variable needs at least one byte storage, but in a bit array/vector only one bit is needed.
This gets handy if you have lots of such data so you save memory at large.

Another usage is if you have numbers which do not exactly fit in standard variables which are 8,16,32 or 64 bit in size. You could this way store into a bit vector of 16 bit a number which consists of 4 bit, one that is 2 bit and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8,8 and 16 bit, so you only have 50% of storage wasted.

But all these uses are very rarely used in business aplications, the come to use often when interfacing drivers through pinvoke/interop functions and doing low level programming.

清泪尽 2024-10-17 14:37:31

位向量的位数组用作从位置到某个位值的映射。是的,它与 Bool 数组基本相同,但典型的 Bool 实现是一到四个字节长,并且使用了太多空间。

通过使用字数组和二进制掩码操作以及移位来存储和检索它们,我们可以更有效地存储相同数量的数据(更少的总体内存使用,更少的内存访问,更少的缓存未命中,更少的内存页面交换)。访问各个位的代码仍然非常简单。

C 语言中还内置了一些位字段支持(您可以编写诸如 int i:1; 之类的内容来表示“仅消耗一位”),但它不适用于数组,并且您的控制权较少总体结果(实现的细节取决于编译器和对齐问题)。

以下是回答“搜索缺失号码”问题的可能方法。为了简单起见,我将 int 大小固定为 32 位,但可以使用 sizeof(int) 编写它以使其可移植。并且(取决于编译器和目标处理器)只能使用 >> 才能使代码更快。 5 而不是 / 32& 31 而不是 % 32,但这只是为了给出想法。

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

int main(){
    /* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
    {
        printf("writing test file\n");
        int x = 0;
        FILE * f = fopen("testfile.txt", "w");
        for (x=0; x < 1000000; ++x){
            if (x == 765 || x == 777760 || x == 777791){
                continue;
            }
            fprintf(f, "%d\n", x);
        }
        fprintf(f, "%d\n", 57768); /* this one is a duplicate */
        fclose(f);
    }

    uint32_t bitarray[1000000 / 32];

    /* read file containing integers in the range [1,1000000] */
    /* any non number is considered as separator */
    /* the goal is to find missing numbers */
    printf("Reading test file\n");
    {
        unsigned int x = 0;
        FILE * f = fopen("testfile.txt", "r");
        while (1 == fscanf(f, " %u",&x)){
            bitarray[x / 32] |= 1 << (x % 32);
        }
        fclose(f);
    }
    /* find missing number in bitarray */
    {
        int x = 0;
        for (x=0; x < (1000000 / 32) ; ++x){
            int n = bitarray[x];
            if (n != (uint32_t)-1){
                printf("Missing number(s) between %d and %d [%x]\n",
                    x * 32, (x+1) * 32, bitarray[x]);
                int b;
                for (b = 0 ; b < 32 ; ++b){
                    if (0 == (n & (1 << b))){
                        printf("missing number is %d\n", x*32+b);
                    }
                }
            }
        }
    }
}

Bit Arrays of Bit Vectors are used as a mapping from position to some bit value. Yes it's basically the same thing as an array of Bool, but typical Bool implementation is one to four bytes long and it uses too much space.

We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.

There is also some bit field support builtin in C language (you write things like int i:1; to say "only consume one bit") , but it is not available for arrays and you have less control of the overall result (details of implementation depends on compiler and alignment issues).

Below is a possible way to answer to your "search missing numbers" question. I fixed int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could only be made faster using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

int main(){
    /* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
    {
        printf("writing test file\n");
        int x = 0;
        FILE * f = fopen("testfile.txt", "w");
        for (x=0; x < 1000000; ++x){
            if (x == 765 || x == 777760 || x == 777791){
                continue;
            }
            fprintf(f, "%d\n", x);
        }
        fprintf(f, "%d\n", 57768); /* this one is a duplicate */
        fclose(f);
    }

    uint32_t bitarray[1000000 / 32];

    /* read file containing integers in the range [1,1000000] */
    /* any non number is considered as separator */
    /* the goal is to find missing numbers */
    printf("Reading test file\n");
    {
        unsigned int x = 0;
        FILE * f = fopen("testfile.txt", "r");
        while (1 == fscanf(f, " %u",&x)){
            bitarray[x / 32] |= 1 << (x % 32);
        }
        fclose(f);
    }
    /* find missing number in bitarray */
    {
        int x = 0;
        for (x=0; x < (1000000 / 32) ; ++x){
            int n = bitarray[x];
            if (n != (uint32_t)-1){
                printf("Missing number(s) between %d and %d [%x]\n",
                    x * 32, (x+1) * 32, bitarray[x]);
                int b;
                for (b = 0 ; b < 32 ; ++b){
                    if (0 == (n & (1 << b))){
                        printf("missing number is %d\n", x*32+b);
                    }
                }
            }
        }
    }
}
同尘 2024-10-17 14:37:31

它用于位标志存储,以及解析不同的二进制协议字段,其中 1 个字节被划分为多个位字段。它广泛用于 TCP/IP、ASN.1 编码、OpenPGP 数据包等协议中。

That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文