为什么 std::bitset<8> 4字节大?

发布于 2024-12-05 18:40:34 字数 202 浏览 4 评论 0 原文

看来对于 std::bitset<1 to 32>,大小设置为 4 字节。对于大小 33 到 64,它直接跳到 8 字节。不会有任何开销,因为 std::bitset<32>是偶数4个字节。

我可以看到在处理位时与字节长度对齐,但是为什么位集需要与字长对齐,特别是对于最有可能在内存预算紧张的情况下使用的容器?

这是VS2010下的。

It seems for std::bitset<1 to 32>, the size is set to 4 bytes. For sizes 33 to 64, it jumps straight up to 8 bytes. There can't be any overhead because std::bitset<32> is an even 4 bytes.

I can see aligning to byte length when dealing with bits, but why would a bitset need to align to word length, especially for a container most likely to be used in situations with a tight memory budget?

This is under VS2010.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

美人如玉 2024-12-12 18:40:34

最可能的解释是 bitset 使用全部机器字来存储数组。

这样做可能是出于内存带宽的原因:读/写在字边界对齐的字通常相对便宜。另一方面,在某些架构上读取(尤其是写入!)任意对齐的字节可能会很昂贵。

由于我们讨论的是每个位集几个字节的固定大小惩罚,这听起来对于通用库来说是一个合理的权衡。

The most likely explanation is that bitset is using a whole number of machine words to store the array.

This is probably done for memory bandwidth reasons: it is typically relatively cheap to read/write a word that's aligned at a word boundary. On the other hand, reading (and especially writing!) an arbitrarily-aligned byte can be expensive on some architectures.

Since we're talking about a fixed-sized penalty of a few bytes per bitset, this sounds like a reasonable tradeoff for a general-purpose library.

俏︾媚 2024-12-12 18:40:34

我假设对位集的索引是通过获取 32 位值然后隔离相关位来完成的,因为这在处理器指令方面是最快的(在 x86 上处理较小尺寸的值速度较慢)。为此所需的两个索引也可以非常快速地计算出来:

int wordIndex = (index & 0xfffffff8) >> 3;
int bitIndex = index & 0x7;

然后您可以执行此操作,这也非常快:

int word = m_pStorage[wordIndex];
bool bit = ((word & (1 << bitIndex)) >> bitIndex) == 1;

此外,恕我直言,每个位集最大浪费 3 个字节并不完全是内存问题。考虑到位集已经是存储此类信息的最有效的数据结构,因此您必须以总结构大小的百分比来评估浪费。

对于 1025 位,此方法使用 132 个字节而不是 129 个字节,开销为 2.3%(随着位集站点的增加,开销会减少)。考虑到可能的性能优势,听起来​​很合理。

I assume that indexing into the bitset is done by grabbing a 32-bit value and then isolating the relevant bit because this is fastest in terms of processor instructions (working with smaller-sized values is slower on x86). The two indexes needed for this can also be calculated very quickly:

int wordIndex = (index & 0xfffffff8) >> 3;
int bitIndex = index & 0x7;

And then you can do this, which is also very fast:

int word = m_pStorage[wordIndex];
bool bit = ((word & (1 << bitIndex)) >> bitIndex) == 1;

Also, a maximum waste of 3 bytes per bitset is not exactly a memory concern IMHO. Consider that a bitset is already the most efficient data structure to store this type of information, so you would have to evaluate the waste as a percentage of the total structure size.

For 1025 bits this approach uses up 132 bytes instead of 129, for 2.3% overhead (and this goes down as the bitset site goes up). Sounds reasonable considering the likely performance benefits.

雪落纷纷 2024-12-12 18:40:34

现代机器上的内存系统除了从内存中获取单词之外无法获取任何其他内容,除了一些提取所需位的遗留功能之外。因此,将位集与单词对齐可以使处理速度更快,因为在访问它时不需要屏蔽掉不需要的位。如果你不屏蔽,做类似的事情

bitset<4> foo = 0;
if (foo) {
    // ...
}

很可能会失败。除此之外,我记得不久前读到有一种方法可以将几个位集压缩在一起,但我记不太清楚了。我认为当您在一个结构中拥有多个位集时,它们可以占用“共享”内存,这不适用于位域的大多数用例。

The memory system on modern machines cannot fetch anything else but words from memory, apart from some legacy functions that extract the desired bits. Hence, having the bitsets aligned to words makes them a lot faster to handle, because you do not need to mask out the bits you don't need when accessing it. If you do not mask, doing something like

bitset<4> foo = 0;
if (foo) {
    // ...
}

will most likely fail. Apart from that, I remember reading some time ago that there was a way to cramp several bitsets together, but I don't remember exactly. I think it was when you have several bitsets together in a structure that they can take up "shared" memory, which is not applicable to most use cases of bitfields.

月光色 2024-12-12 18:40:34

我在 Aix 和 Linux 实现中具有相同的功能。在 Aix 中,内部位集存储是基于字符的:

typedef unsigned char _Ty;
....
_Ty _A[_Nw + 1];

在 Linux 中,内部存储是基于长的:

typedef unsigned long _WordT;
....
_WordT            _M_w[_Nw];

出于兼容性原因,我们使用基于字符的存储修改了 Linux 版本

检查您在 bitset.h 中使用的是哪种实现

I had the same feature in Aix and Linux implementations. In Aix, internal bitset storage is char based:

typedef unsigned char _Ty;
....
_Ty _A[_Nw + 1];

In Linux, internal storage is long based:

typedef unsigned long _WordT;
....
_WordT            _M_w[_Nw];

For compatibility reasons, we modified Linux version with char based storage

Check which implementation are you using inside bitset.h

抽个烟儿 2024-12-12 18:40:34

因为 32 位 Intel 兼容处理器无法单独访问字节(或者更好,它可以通过隐式应用一些位掩码和移位来访问),但一次只能访问 32 位字。

如果您声明

bitset<4> a,b,c;

即使库将其实现为 char,abc 也将是 32 位对齐,因此存在相同的浪费空间。但是,在让位集代码执行自己的掩码之前,处理器将被迫预先掩码字节。

因此,MS 使用 int[1+(N-1)/32] 作为位的容器。

Because a 32 bit Intel-compatible processor cannot access bytes individually (or better, it can by applying implicitly some bit mask and shifts) but only 32bit words at time.

if you declare

bitset<4> a,b,c;

even if the library implements it as char, a,b and c will be 32 bit aligned, so the same wasted space exist. But the processor will be forced to premask the bytes before letting bitset code to do its own mask.

For this reason MS used a int[1+(N-1)/32] as a container for the bits.

墨落成白 2024-12-12 18:40:34

也许是因为它默认使用 int ,如果溢出则切换到 long long ? (只是猜测...)

Maybe because it's using int by default, and switches to long long if it overflows? (Just a guess...)

萌面超妹 2024-12-12 18:40:34

如果你的 std::bitset< 8> 是结构体的成员,则可能会出现以下情况:

struct A
{
  std::bitset< 8 > mask;
  void * pointerToSomething;
}

如果 bitset<8> 存储在一个字节中(并且结构打包在 1 字节边界上),那么结构中跟随它的指针将是未对齐的,这将是一件坏事。拥有 bitset<8> 是安全且有用的唯一一次。如果它是在打包结构中,则存储在一个字节中,并且后面跟着一些其他可以将其打包在一起的单字节字段。我想这个用例太窄了,不值得提供库实现。

基本上,在八叉树中,单字节位集只有在打包结构中后面跟着另外一到三个单字节成员时才有用。否则,无论如何(在 32 位机器上)都必须将其填充到四个字节,以确保后面的变量是字对齐的。

If your std::bitset< 8 > was a member of a structure, you might have this:

struct A
{
  std::bitset< 8 > mask;
  void * pointerToSomething;
}

If bitset<8> was stored in one byte (and the structure packed on 1-byte boundaries) then the pointer following it in the structure would be unaligned, which would be A Bad Thing. The only time when it would be safe and useful to have a bitset<8> stored in one byte would be if it was in a packed structure and followed by some other one-byte fields with which it could be packed together. I guess this is too narrow a use case for it to be worthwhile providing a library implementation.

Basically, in your octree, a single byte bitset would only be useful if it was followed in a packed structure by another one to three single-byte members. Otherwise, it would have to be padded to four bytes anyway (on a 32-bit machine) to ensure that the following variable was word-aligned.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文