位掩码的大小是否有实际限制?

发布于 2024-07-06 10:23:24 字数 419 浏览 14 评论 0原文

有一种常见的方法可以使用位掩码在一个变量中存储多个值。 例如,如果用户对某个项目具有读取、写入和执行权限,则可以通过以下方式将其转换为单个数字:读取 = 4 (2^2)、写入 = 2 (2^1)、执行 = 1 (2^0) 然后将它们加在一起得到 7。

我在几个 Web 应用程序中使用这种技术,我通常将变量存储到一个字段中,并为其指定 MEDIUMINT 类型或其他类型,具体取决于关于不同值的数量。

我感兴趣的是,您可以像这样存储的值的数量是否存在实际限制? 例如,如果数字超过 64,则不能再使用(64 位)整数。 如果是这种情况,你会用什么? 它会如何影响您的程序逻辑(即:您仍然可以使用按位比较)吗?

我知道,一旦您开始获得非常大的值集,不同的方法将是最佳解决方案,但我对方法的边界感兴趣。

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.

I use this technique in several web applications, where I'd usually store the variable into a field and give it a type of MEDIUMINT or whatever, depending on the number of different values.

What I'm interested in, is whether or not there is a practical limit to the number of values you can store like this? For example, if the number was over 64, you couldn't use (64 bit) integers any more. If this was the case, what would you use? How would it affect your program logic (ie: could you still use bitwise comparisons)?

I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

东北女汉子 2024-07-13 10:23:24

我突然想到,我会编写一个 set_bitget_bit 函数,它们可以采用字节数组和数组中的位偏移量,并使用一些位-摆弄以设置/获取数组中的适当位。 像这样的东西(用C语言,但希望你能明白):

// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
  // make sure offset is valid
  if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }

  //set the right bit
  bytes[offset >> 3] |= (1 << (offset & 0x7));

  return 0; //success 
}

//gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
  // make sure offset is valid
  if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }

  //get the right bit
  return (bytes[offset >> 3] & (1 << (offset & 0x7));
}

Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):

// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
  // make sure offset is valid
  if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }

  //set the right bit
  bytes[offset >> 3] |= (1 << (offset & 0x7));

  return 0; //success 
}

//gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
  // make sure offset is valid
  if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }

  //get the right bit
  return (bytes[offset >> 3] & (1 << (offset & 0x7));
}
关于从前 2024-07-13 10:23:24

我在文件系统代码中使用了位掩码,其中位掩码比机器字大很多倍。 把它想象成一个“布尔数组”;

(如果您想知道,请在闪存中记录掩码)

许多编译器都知道如何为您执行此操作。 添加一点面向对象的代码,使类型能够合理地运行,然后你的代码开始看起来像它的意图,而不是一些位敲击。

我的2分钱。

I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. think of it like an "array of booleans";

(journalling masks in flash memory if you want to know)

many compilers know how to do this for you. Adda bit of OO code to have types that operate senibly and then your code starts looking like it's intent, not some bit-banging.

My 2 cents.

另类 2024-07-13 10:23:24

对于 64 位整数,最多可以存储 2^64-1 的值,64 只是 2^6。 所以是的,有一个限制,但如果您需要超过 64 个标志,我很想知道它们都在做什么:)

您需要考虑多少个状态? 如果有 64 个潜在状态,它们可以存在的组合数量就是 64 位整数的完整大小。

如果您需要担心 128 个标志,那么一对位向量就足够了 (2^64 * 2)。

补充:在Programming Pearls中,有一个关于使用长度为10^7的位数组的扩展讨论,以整数实现(用于保存使用的800个数字) - 它非常快,并且非常适合该任务在那一章中描述。

With a 64-bit integer, you can store values up to 2^64-1, 64 is only 2^6. So yes, there is a limit, but if you need more than 64-its worth of flags, I'd be very interested to know what they were all doing :)

How many states so you need to potentially think about? If you have 64 potential states, the number of combinations they can exist in is the full size of a 64-bit integer.

If you need to worry about 128 flags, then a pair of bit vectors would suffice (2^64 * 2).

Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for holding used 800 numbers) - it's very fast, and very appropriate for the task described in that chapter.

高跟鞋的旋律 2024-07-13 10:23:24

有些语言(我相信 perl 确实如此,不确定)允许对字符串进行按位算术。 为您提供更大的有效范围。 ((strlen * 8bit chars)组合)

但是,我不会使用单个值来叠加多个/类型/数据。 3 位整数的基本 r/w/x 三元组可能是“实际”上限,不是出于空间效率原因,而是出于实际开发原因。

( PHP 使用这个系统来控制它的错误消息,我已经发现,当你必须定义 php 常量不常驻的值并且你必须手动生成整数时,它有点过分了,并且老实说,如果 chmod 不支持 'ugo+rwx' 样式语法,我永远不想使用它,因为我永远记不起幻数)

当你必须破解打开常量表来调试代码时,你知道你已经走得太远了。

Some languages ( I believe perl does, not sure ) permit bitwise arithmetic on strings. Giving you a much greater effective range. ( (strlen * 8bit chars ) combinations )

However, I wouldn't use a single value for superimposition of more than one /type/ of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space efficiency reasons, but for practical development reasons.

( Php uses this system to control its error-messages, and I have already found that its a bit over-the-top when you have to define values where php's constants are not resident and you have to generate the integer by hand, and to be honest, if chmod didn't support the 'ugo+rwx' style syntax I'd never want to use it because i can never remember the magic numbers )

The instant you have to crack open a constants table to debug code you know you've gone too far.

瞳孔里扚悲伤 2024-07-13 10:23:24

老线程,但值得一提的是,有些情况需要臃肿的位掩码,例如分子指纹,它们通常生成为 1024 位数组,我们将其打包在 32 个 bigint 字段中(SQL Server 不支持 UInt32)。 按位操作工作得很好 - 直到您的表开始增长并且您意识到单独的函数调用的缓慢性。 如果不是 T-SQL 禁止使用具有两个二进制操作数的按位运算符,二进制数据类型将会起作用。

Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g., molecular fingerprints, which are often generated as 1024-bit arrays which we have packed in 32 bigint fields (SQL Server not supporting UInt32). Bit wise operations work fine - until your table starts to grow and you realize the sluggishness of separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.

雨夜星沙 2024-07-13 10:23:24

例如,.NET 使用整数数组作为 BitArray 类的内部存储。
实际上没有其他办法。

话虽这么说,在 SQL 中,您将需要多个列(或使用 BLOBS)来存储所有状态。

For example .NET uses array of integers as an internal storage for their BitArray class.
Practically there's no other way around.

That being said, in SQL you will need more than one column (or use the BLOBS) to store all the states.

拥抱没勇气 2024-07-13 10:23:24

您标记了这个问题SQL,所以我认为您需要查阅数据库的文档来查找整数的大小。 然后为安全起见减去一位符号。

编辑:您的评论表明您正在使用 MySQL。 MySQL 5.0 数字类型 的文档指出NUMERIC 的最大长度为 64 或 65 位数字。 即 64 位数字为 212 位。

请记住,您选择的语言必须能够处理这些数字,因此您可能只能使用 64 位整数。

You tagged this question SQL, so I think you need to consult with the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.

Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.

Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文