将不相关的数据存储在指针的最低有效位中可以可靠地工作吗?

发布于 2024-11-15 03:43:09 字数 1360 浏览 2 评论 0原文

让我先说一下,我知道我将要提出的建议是一个致命的罪过,而且我可能会因为考虑它而在编程地狱中被烧死。

也就是说,我仍然有兴趣知道是否有任何原因导致这行不通。

情况是:我有一个引用计数智能指针类,我在任何地方都使用它。它目前看起来像这样(注意:不完整/简化的伪代码):

class IRefCountable
{
public:
    IRefCountable() : _refCount(0) {}
    virtual ~IRefCountable() {}

    void Ref() {_refCount++;}
    bool Unref() {return (--_refCount==0);}

private:
    unsigned int _refCount;
};

class Ref
{
public:
   Ref(IRefCountable * ptr, bool isObjectOnHeap) : _ptr(ptr), _isObjectOnHeap(isObjectOnHeap) 
   { 
      _ptr->Ref();
   }

   ~Ref() 
   {
      if ((_ptr->Unref())&&(_isObjectOnHeap)) delete _ptr;
   }

private:
   IRefCountable * _ptr;
   bool _isObjectOnHeap;
};

今天我注意到 sizeof(Ref)=16。但是,如果我删除布尔成员变量 _isObjectOnHeapsizeof(Ref) 将减少到 8。这意味着对于我的程序中的每个 Ref ,浪费了 7.875 字节的 RAM...并且我的程序中有很多很多 Refs

嗯,这似乎浪费了一些内存。但我真的需要额外的信息(好吧,幽默一下,为了讨论的目的,假设我确实需要)。我注意到,由于 IRefCountable 是一个非 POD 类,因此(大概)它将始终分配在字对齐的内存地址上。因此,(_ptr) 的最低有效位应始终为零。

这让我想知道......有什么原因我不能将我的一位布尔数据或到指针的最低有效位,从而将 sizeof(Ref) 减少一半而不需要牺牲任何功能吗?当然,在取消引用指针之前,我必须小心地对该位进行“与”操作,这会降低指针取消引用的效率,但这可能可以通过引用现在更小,因此引用更多的事实来弥补可以立即放入处理器的缓存中,等等。

这是合理的做法吗?还是我让自己陷入了一个受伤的世界?如果是后者,我到底会受到怎样的伤害呢? (请注意,这是需要在所有相当现代的桌面环境中正确运行的代码,但它不需要在嵌入式计算机或超级计算机或任何类似的奇特环境中运行)

Let me just say up front that what I'm aware that what I'm about to propose is a mortal sin, and that I will probably burn in Programming Hell for even considering it.

That said, I'm still interested in knowing if there's any reason why this wouldn't work.

The situation is: I have a reference-counting smart-pointer class that I use everywhere. It currently looks something like this (note: incomplete/simplified pseudocode):

class IRefCountable
{
public:
    IRefCountable() : _refCount(0) {}
    virtual ~IRefCountable() {}

    void Ref() {_refCount++;}
    bool Unref() {return (--_refCount==0);}

private:
    unsigned int _refCount;
};

class Ref
{
public:
   Ref(IRefCountable * ptr, bool isObjectOnHeap) : _ptr(ptr), _isObjectOnHeap(isObjectOnHeap) 
   { 
      _ptr->Ref();
   }

   ~Ref() 
   {
      if ((_ptr->Unref())&&(_isObjectOnHeap)) delete _ptr;
   }

private:
   IRefCountable * _ptr;
   bool _isObjectOnHeap;
};

Today I noticed that sizeof(Ref)=16. However, if I remove the boolean member variable _isObjectOnHeap, sizeof(Ref) is reduced to 8. That means that for every Ref in my program, there are 7.875 wasted bytes of RAM... and there are many, many Refs in my program.

Well, that seems like a waste of some RAM. But I really need that extra bit of information (okay, humor me and assume for the sake of the discussion that I really do). And I notice that since IRefCountable is a non-POD class, it will (presumably) always be allocated on a word-aligned memory address. Therefore, the least significant bit of (_ptr) should always be zero.

Which makes me wonder... is there any reason why I can't OR my one bit of boolean data into the least-significant bit of the pointer, and thus reduce sizeof(Ref) by half without sacrificing any functionality? I'd have to be careful to AND out that bit before dereferencing the pointer, of course, which would make pointer dereferences less efficient, but that might be made up for by the fact that the Refs are now smaller, and thus more of them can fit into the processor's cache at once, and so on.

Is this a reasonable thing to do? Or am I setting myself up for a world of hurt? And if the latter, how exactly would that hurt be visited upon me? (Note that this is code that needs to run correctly in all reasonably modern desktop environments, but it doesn't need to run in embedded machines or supercomputers or anything exotic like that)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

暖风昔人 2024-11-22 03:43:09

如果您只想使用标准设施而不依赖于任何实现,那么使用 C++0x 有多种方法可以表达对齐(这里是 最近的问题我回答了)。还有 std::uintptr_t 来可靠地获取足够大以容纳指针的无符号整数类型。现在可以保证的一件事是从指针类型到 std::[u]intptr_t 的转换并返回到相同类型会产生原始指针。

我想你可能会争辩说,如果你能取回原始的 std::intptr_t (带有掩码),那么你就可以获得原始指针。我不知道这个推理有多可靠。

[编辑:考虑一下,不能保证对齐指针在转换为整数类型时采用任何特定形式,例如未设置某些位的指针。这里可能有点牵强]

If you want to use only the standard facilities and not rely on any implementation then with C++0x there are ways to express alignment (here is a recent question I answered). There's also std::uintptr_t to reliably get an unsigned integral type large enough to hold a pointer. Now the one thing guaranteed is that a conversion from the pointer type to std::[u]intptr_t and back to that same type yields the original pointer.

I suppose you could argue that if you can get back the original std::intptr_t (with masking), then you can get the original pointer. I don't know how solid this reasoning would be.

[edit: thinking about it there's no guarantee that an aligned pointer takes any particular form when converted to an integral type, e.g. one with some bits unset. probably too much of a stretch here]

迷路的信 2024-11-22 03:43:09

这里的问题是它完全依赖于机器。这在 C 或 C++ 代码中并不常见,但在汇编中肯定已经做过很多次了。旧的 Lisp 解释器几乎总是使用这个技巧来将类型信息存储在低位中。 (我在 C 代码中看到过它,但在为特定目标平台实现的项目中看到过。)

就我个人而言,如果我尝试编写可移植代码,我可能不会这样做。事实上,它几乎肯定可以在“所有相当现代的桌面环境”上运行。 (当然,它适用于我能想到的每一种。)

很大程度上取决于代码的性质。如果你正在维护它,并且没有其他人需要处理“受伤的世界”,那么它可能没问题。您必须为以后可能需要支持的任何奇怪的架构添加 #ifdef 指令。另一方面,如果您将其作为“可移植”代码发布给全世界,那就会引起人们的关注。

处理这个问题的另一种方法是编写两个版本的智能指针,一个用于可以工作的机器,一个用于不可以工作的机器。这样,只要您维护这两个版本,更改配置文件以使用 16 字节版本就不是什么大问题了。

不用说,您必须避免编写任何其他假设 sizeof(Ref) 为 8 而不是 16 的代码。如果您使用单元测试,请使用两个版本运行它们。

The problem here is that it is entirely machine-dependent. It isn't something one often sees in C or C++ code, but it has certainly been done many times in assembly. Old Lisp interpreters almost always used this trick to store type information in the low bit(s). (I have seen it in C code, but in projects that were being implemented for a specific target platform.)

Personally, if I were trying to write portable code, I probably wouldn't do this. The fact is that it will almost certainly work on "all reasonably modern desktop environments". (Certainly, it will work on every one I can think of.)

A lot depends on the nature of your code. If you are maintaining it, and nobody else will ever have to deal with the "world of hurt", then it might be OK. You will have to add #ifdef directives for any odd architecture that you might need to support later on. On the other hand, if you are releasing it to the world as "portable" code, that would be cause for concern.

Another way to handle this is to write two versions of your smart pointer, one for machines on which this will work and one for machines where it won't. That way, as long as you maintain both versions, it won't be that big a deal to change a config file to use the 16-byte version.

It goes without saying that you would have to avoid writing any other code that assumes sizeof(Ref) is 8 rather than 16. If you are using unit tests, run them with both versions.

倾城°AllureLove 2024-11-22 03:43:09

有什么原因吗?除非最近标准发生了变化,否则指针的值表示是实现定义的。当然,某些地方的某些实现可能会采用相同的技巧,为自己的目的定义这些原本未使用的低位。更有可能的是,某些实现可能使用字指针而不是字节指针,因此两个相邻的字不是位于“地址”0x8640 和 0x8642,而是位于“地址”0x4320 和 0x4321。

解决该问题的一个棘手方法是使 Ref 成为(事实上的)抽象类,并且所有实例实际上都是 RefOnHeap 和 RefNotOnHeap 的实例。如果周围有那么多 Ref,则用于存储三个类(而不是一个类)的代码和元数据的额外空间将由每个 Ref 大小减半所节省的空间来弥补。(不起作用)太好了,如果没有虚方法,编译器可以省略 vtable 指针,并且引入虚方法会将 4 或 8 字节添加回类中)。

Any reason? Unless things have changed in the standard lately, the value representation of a pointer is implementation-defined. It is certainly possible that some implementation somewhere may pull the same trick, defining these otherwise-unused low bits for its own purposes. It's even more possible that some implementation might use word-pointers rather than byte-pointers, so instead of two adjacent words being at "addresses" 0x8640 and 0x8642, they would be at "addresses" 0x4320 and 0x4321.

One tricky way around the problem would be to make Ref a (de facto) abstract class, and all instances would actually be instances of RefOnHeap and RefNotOnHeap. If there are that many Refs around, the extra space used to store the code and metadata for three classes rather than one would be made up by the space savings in having each Ref being half the size. (Won't work too well, the compiler can omit the vtable pointer if there are no virtual methods and introducing virtual methods will add the 4-or-8 bytes back to the class).

薄荷梦 2024-11-22 03:43:09

只要

  1. 您不指向对齐为 1 的结构体或数组内的任意位置,或者
  2. 平台为您提供了一个空闲位

,那么您始终至少有一个空闲位可以在指针中使用,因为 IRefCountable 的对齐方式为 4,您将在 IRefCountable* 中拥有 2 个可用的底部位来使用。


关于第一点,将数据存储在最低有效位中始终是可靠的< em>如果指针是对齐到大于 1 的 2 次方。这意味着它适用于除 char*/bool* 或指向包含以下内容的结构的指针之外的所有内容所有 char/bool 成员,显然它适用于您的情况下的 IRefCountable* 。在 C++11 中,您可以使用 alignofstd::alignment_of 以确保您有 即使您有一些仅 1 字节对齐

static_assert(alignof(Ref) > 1);
static_assert(alignof(IRefCountable) > 1);

// This check for power of 2 is likely redundant
static_assert((alignof(Ref) & (alignof(Ref) - 1)) == 0);

// Now IRefCountable* is always aligned,
// so its least significant bit can be used freely

的对象,例如,如果您将 IRefCountable 中的 _refCount 更改为 uint8_t ,那么您仍然可以使用 alignas,或与其他旧版 C++ 中的扩展,例如 __declspec(对齐)。动态分配的内存已经与 max_align_t 对齐,或者您可以使用 aligned_alloc() 用于更高级别的对齐


我的第二个要点意味着,如果您确实需要存储指向具有绝对 1 字节对齐的对象的任意指针,那么大多数时候您仍然可以利用平台的功能

  • 在许多 32 位平台上,地址空间分为用户进程和内核进程的两半。用户指针的最高有效位始终未设置,因此您可以使用它来存储数据。当然,它不能在用户地址空间超过 2GB 的平台上工作,例如当分割为 3/1 或 4/4 时
  • 在 64 位平台上,目前大多数只有 48 位虚拟地址,还有一些更新的高虚拟地址-end CPU 可能有 57 位虚拟地址,这与总的 64 位相差甚远。因此,您将有很多空闲空间。实际上,这在个人计算中始终有效,因为您永远无法填充巨大的地址空间,

这称为标记指针

如果数据始终是堆分配的,那么您可以告诉操作系统限制用于获取更多位的地址空间范围

有关更多信息,请阅读 在 64 位指针中使用额外的 16 位

You always have at least a free bit to use in the pointer as long as

  1. you're not pointing to arbitrary positions inside a struct or array with alignment of 1, or
  2. the platform gives you a free bit

Since IRefCountable has an alignment of 4, you'll have 2 free bottom bits in IRefCountable* to use


Regarding the first point, storing data in the least significant bit is always reliable if the pointer is aligned to a power of 2 larger than 1. That means it'll work for everything apart from char*/bool* or a pointer to a struct containing all char/bool members, and obviously it'll work for IRefCountable* in your case. In C++11 you can use alignof or std::alignment_of to ensure that you have the required alignment like this

static_assert(alignof(Ref) > 1);
static_assert(alignof(IRefCountable) > 1);

// This check for power of 2 is likely redundant
static_assert((alignof(Ref) & (alignof(Ref) - 1)) == 0);

// Now IRefCountable* is always aligned,
// so its least significant bit can be used freely

Even if you have some object with only 1-byte alignment, for example if you change the _refCount in IRefCountable to uint8_t, then you can still enforce alignment requirement with alignas, or with other extensions in older C++ like __declspec(align). Dynamically allocated memory is already aligned to max_align_t, or you can use aligned_alloc() for a higher level alignment


My second bullet point means in case you really need to store arbitrary pointers to objects with absolute 1-byte alignment then most of the time you can still utilize the feature from the platform

  • On many 32-bit platforms the address space is split in half for user and kernel processes. User pointers will always have the most significant bit unset so you can use that to store data. Of course it won't work on platforms with more than 2GB of user address space, like when the split is 3/1 or 4/4
  • On 64-bit platforms currently most have only 48-bit virtual address, and a few newer high-end CPUs may have 57-bit virtual address which is far from the total 64 bits. Therefore you'll have lots of bits to spare. And in reality this always work in personal computing since you'll never be able to fill that vast address space

This is called tagged pointer

If the data is always heap-allocated then you can tell the OS to limit the range of address space to use to get more bits

For more information read Using the extra 16 bits in 64-bit pointers

一笔一画续写前缘 2024-11-22 03:43:09

是的,这可以可靠地工作。事实上,Linux 内核将其用作其红黑树实现的一部分。内核没有存储额外的布尔值来指示节点是红色还是黑色(这可能会占用相当多的额外空间),而是使用父节点地址的低位。

来自 rbtree_types.h

struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

__rb_parent_color 字段存储节点父节点的地址和节点的颜色(在最低有效位中)。

获取指针

要从此字段检索父地址,您只需清除低位(这会清除最低 2 位)。

来自rbtree.h

#define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))

获取布尔值

要检索颜色,您只需提取较低位并将其视为布尔值即可。

来自rbtree_augmented.h

#define __rb_color(pc)     ((pc) & 1)
#define __rb_is_black(pc)  __rb_color(pc)
#define __rb_is_red(pc)    (!__rb_color(pc))
#define rb_color(rb)       __rb_color((rb)->__rb_parent_color)
#define rb_is_red(rb)      __rb_is_red((rb)->__rb_parent_color)
#define rb_is_black(rb)    __rb_is_black((rb)->__rb_parent_color)

设置指针和布尔值

您可以使用标准位操作操作设置指针和布尔值(确保保留最终值的每个部分)。

来自 rbtree_augmented.h

static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
    rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
}

static inline void rb_set_parent_color(struct rb_node *rb,
                       struct rb_node *p, int color)
{
    rb->__rb_parent_color = (unsigned long)p | color;
}

您还可以通过 (unsigned long)p & 清除布尔值,将其设置为 false ~1

Yes, this can work reliably. This is, in fact, used by the Linux kernel as part of its red-black tree implementation. Instead of storing an extra boolean to indicate whether a node is red or black (which can take up quite a bit of additional space), the kernel uses the low-order bit of the parent node address.

From rbtree_types.h:

struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

The __rb_parent_color field stores both the address of the nodes parent and the color of the node (in the least-significant bit).

Getting The Pointer

To retrieve the parent address from this field you just clear the lower order bits (this clears the lowest 2-bits).

From rbtree.h:

#define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))

Getting The Boolean

To retrieve the color you just extract the lower bit and treat it like a boolean.

From rbtree_augmented.h:

#define __rb_color(pc)     ((pc) & 1)
#define __rb_is_black(pc)  __rb_color(pc)
#define __rb_is_red(pc)    (!__rb_color(pc))
#define rb_color(rb)       __rb_color((rb)->__rb_parent_color)
#define rb_is_red(rb)      __rb_is_red((rb)->__rb_parent_color)
#define rb_is_black(rb)    __rb_is_black((rb)->__rb_parent_color)

Setting The Pointer And Boolean

You set the pointer and boolean value using standard bit manipulation operations (making sure to preserve each part of the final value).

From rbtree_augmented.h:

static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
    rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
}

static inline void rb_set_parent_color(struct rb_node *rb,
                       struct rb_node *p, int color)
{
    rb->__rb_parent_color = (unsigned long)p | color;
}

You can also clear the boolean value setting it to false via (unsigned long)p & ~1.

满身野味 2024-11-22 03:43:09

即使这种方法有效,心中总会有一种不确定性感,因为最终您正在使用的内部架构可能是可移植的,也可能不是可移植的。

另一方面,为了解决这个问题,如果你想避免 bool 变量,我会建议一个简单的构造函数,

Ref(IRefCountable * ptr) : _ptr(ptr) 
{
  if(ptr != 0) 
    _ptr->Ref();
}

从代码中,我闻到引用计数是仅当对象位于上时才需要。对于自动对象,您只需将 0 传递给 class Ref 并在构造函数/析构函数中进行适当的 null 检查。

There will be always a sense of uncertainty in mind even if this method is working, because ultimately you are playing with the internal architecture which may or may not be portable.

On the other hand to solve this problem, if you want to avoid bool variable, I would suggest a simple constructor as,

Ref(IRefCountable * ptr) : _ptr(ptr) 
{
  if(ptr != 0) 
    _ptr->Ref();
}

From the code, I smell that the reference counting is needed only when the object is on heap. For automatic objects, you can simply pass 0 to the class Ref and put appropriate null checks in constructor/destructor.

来日方长 2024-11-22 03:43:09

您是否考虑过一流的存储?

根据您是否需要担心多线程并控制 new/delete/malloc/free 的实现,这可能值得一试。

要点是,您将维护一个“计数器”映射地址,而不是递增本地计数器(对象本地)。计数会傲慢地忽略分配区域之外的地址(例如堆栈)。

这可能看起来很愚蠢(MT 中存在争用的空间),但它在只读方面也表现得相当好,因为对象不是仅为了计数而“修改”的。

当然,我不知道您希望通过此实现的性能:p

Have you thought about an out of class storage ?

Depending on whether you have (or not) to worry about multi-threading and control the implementation of new/delete/malloc/free, it might be worth a try.

The point would be that instead of incrementing a local counter (local to the object), you would maintain a "counter" map address --> count that would haughtily ignore addresses passed that are outside the allocated area (stack for example).

It may seem silly (there is room for contention in MT), but it also plays rather nice with read-only since the object is not "modified" only for counting.

Of course, I have no idea of the performance you might hope to achieve with this :p

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文