std::string allocation strategy

Posted on 2024-12-26 04:28:27

I am a bit confused by parts of the basic string implementation. I have been reading the source to understand its inner workings and learn new things, but I can't entirely grasp how the memory is managed.

Here are some tidbits from the basic string implementation:

  • The raw allocator is rebound to char:

    typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;
    
  • ...then, when a _Rep is allocated, it is placed at the start of the allocated buffer, and __size is calculated so that the buffer also fits the characters:

    size_type __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
    void* __place = _Raw_bytes_alloc(__alloc).allocate(__size);
    _Rep *__p = new (__place) _Rep;
    
  • This is how the character data is fetched from the _Rep buffer:

    _CharT* _M_refdata() throw()
    {
        return reinterpret_cast<_CharT*>(this + 1);
    }
    
  • Setting up the characters (one of several ways):

    _M_assign(__p->_M_refdata(), __n, __c);
    

What bothers me is that the raw allocator is for type char, but the allocated memory holds a _Rep object plus the character data (which does not have to be of type char).

Also, why (or rather how) does the call to _M_refdata know where the character data starts (or ends) within the buffer (i.e. this + 1)?

Edit: does this + 1 just advance the pointer to the next position after the _Rep object?

I have a basic understanding of memory alignment and casting, but this seems to go beyond anything I have read up on.

Can anybody help, or point me to more informative reading material?

Comments (2)

灼痛 2025-01-02 04:28:27

You're missing the placement new. The line

_Rep *__p = new (__place) _Rep;

initializes a new _Rep object at __place. The space for it was already allocated beforehand; placement new does not allocate anything by itself, it only runs the constructor at the given address.
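
To make the point about placement new concrete, here is a minimal sketch. Header is a hypothetical stand-in for _Rep, and construct_header and header_lives_in_buffer are names invented for this illustration; none of them come from the library source.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // placement new

// Hypothetical stand-in for _Rep: a small header with a constructor.
struct Header {
    std::size_t length;
    std::size_t capacity;
    Header() : length(0), capacity(0) {}
};

// Constructs a Header in caller-provided storage and returns its address.
// No memory is allocated here; only the constructor runs.
Header* construct_header(void* place) {
    return new (place) Header;
}

// Returns true when the constructed object sits exactly at the start of
// the buffer that was handed in.
bool header_lives_in_buffer() {
    alignas(Header) unsigned char buffer[sizeof(Header)];
    Header* h = construct_header(buffer);
    bool same_address = static_cast<void*>(h) == static_cast<void*>(buffer);
    h->~Header();  // placement new is paired with an explicit destructor call
    return same_address;
}
```

The returned pointer has the same address as the buffer, which is why the string code can allocate first and construct the _Rep afterwards.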

Pointer arithmetic in C and C++ tells you that this + 1 is a pointer to the address sizeof(*this) bytes past this. Since (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep) bytes were allocated beforehand, the space after the _Rep object is used for the character data. The layout therefore looks like this:

| _Rep |  (__capacity + 1) * _CharT  |
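
The layout above can be sketched in a few lines. This is a simplified illustration, not the library's code: Rep, refdata, create_rep and destroy_rep are hypothetical names, the character type is fixed to char, and it assumes that std::allocator<char> hands out memory aligned suitably for Rep (true for the common implementations that forward to operator new).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical stand-in for _Rep.
struct Rep {
    std::size_t length;
    std::size_t capacity;

    // this + 1 advances by sizeof(Rep) bytes, i.e. to the first byte
    // after the header, where the characters are stored.
    char* refdata() { return reinterpret_cast<char*>(this + 1); }
};

// Allocates one block laid out as | Rep | (capacity + 1) chars |
// and fills the first n characters with c.
Rep* create_rep(std::size_t capacity, std::size_t n, char c,
                std::allocator<char>& alloc) {
    assert(n <= capacity);
    std::size_t size = (capacity + 1) * sizeof(char) + sizeof(Rep);
    void* place = alloc.allocate(size);  // raw bytes only
    Rep* p = new (place) Rep;            // header constructed at the front
    p->length = n;
    p->capacity = capacity;
    std::memset(p->refdata(), c, n);
    p->refdata()[n] = '\0';              // the "+ 1" slot holds the terminator
    return p;
}

// Destroys the header and returns the whole block to the allocator.
void destroy_rep(Rep* p, std::allocator<char>& alloc) {
    std::size_t size = (p->capacity + 1) * sizeof(char) + sizeof(Rep);
    p->~Rep();
    alloc.deallocate(reinterpret_cast<char*>(p), size);
}
```

A single allocation keeps the header and the characters adjacent, so the header never needs to store a pointer to the data.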

因为看清所以看轻 2025-01-02 04:28:27

Allocators, like C's malloc, return pointers to bytes, not objects. So, the return type is either char * or void *.
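
As a rough illustration of asking an allocator for plain bytes, the sketch below rebinds an arbitrary allocator to char using std::allocator_traits, the modern spelling of the rebind typedef quoted in the question. RawBytesAlloc and raw_bytes_roundtrip are names made up for this example, and the code assumes the rebound allocator's pointer type is a plain char *.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

// Modern equivalent of `_Alloc::template rebind<char>::other`.
template <typename Alloc>
using RawBytesAlloc =
    typename std::allocator_traits<Alloc>::template rebind_alloc<char>;

// Whatever type the user's allocator was written for, the rebound
// allocator hands out chars, so sizes are counted in bytes.
static_assert(std::is_same<RawBytesAlloc<std::allocator<wchar_t>>,
                           std::allocator<char>>::value,
              "rebinding std::allocator<wchar_t> to char gives std::allocator<char>");

// Allocates `bytes` bytes (bytes >= 1) through the rebound allocator,
// writes to all of them, and releases the block again.
template <typename Alloc>
bool raw_bytes_roundtrip(const Alloc& user_alloc, std::size_t bytes) {
    RawBytesAlloc<Alloc> raw(user_alloc);  // converting constructor
    char* p = raw.allocate(bytes);         // assumes pointer type is char*
    std::memset(p, 0x5A, bytes);
    bool ok = (p[0] == 0x5A && p[bytes - 1] == 0x5A);
    raw.deallocate(p, bytes);
    return ok;
}
```

Calling raw_bytes_roundtrip(std::allocator<wchar_t>(), 64) requests 64 bytes, not 64 wide characters, which is why the string code multiplies by sizeof(_CharT) itself.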

The C and C++ standards contain an aliasing rule that explicitly allows any object's storage to be accessed through pointers to char or unsigned char. This exists because C often needs to treat objects as byte arrays (as when writing to disk or a network socket) and to turn byte arrays into objects (as when allocating a range of memory or reading from disk).
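
A small sketch of both directions, assuming a trivially copyable type. Record, to_bytes, from_bytes and first_byte are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical plain-data type for the illustration.
struct Record {
    std::uint32_t id;
    std::uint16_t flags;
};
static_assert(std::is_trivially_copyable<Record>::value,
              "byte-wise copying is only valid for trivially copyable types");

// Object -> bytes, as when writing to disk or a socket.
void to_bytes(const Record& r, unsigned char (&out)[sizeof(Record)]) {
    std::memcpy(out, &r, sizeof(Record));
}

// Bytes -> object, as when reading back from disk.
Record from_bytes(const unsigned char (&in)[sizeof(Record)]) {
    Record r;
    std::memcpy(&r, in, sizeof(Record));
    return r;
}

// Inspecting an object's storage through unsigned char* is permitted.
unsigned char first_byte(const Record& r) {
    return *reinterpret_cast<const unsigned char*>(&r);
}
```

Copying with std::memcpy sidesteps the question of whether a casted pointer may be dereferenced, because the destination is always a real object of the right type.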

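To protect against aliasing and optimization problems, the reverse direction is restricted: a char * may only be cast to and used as a pointer to some object type if an object of that type actually lives at that address (for example, one created there with placement new, as _Rep is) and the address is suitably aligned for it. You may not treat the same bytes as objects of several unrelated types at once.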
