std::string allocation strategy
I am a bit confused by parts of the basic_string implementation. I have been going through the source to understand its inner workings and learn new things, but I can't entirely grasp how the memory is managed.
Here are some tidbits from the basic_string implementation.
The raw allocator is for type char:
    typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;
Then, when allocating, the _Rep is placed within the allocated buffer; __size is calculated to also fit the characters:
    size_type __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
    void* __place = _Raw_bytes_alloc(__alloc).allocate(__size);
    _Rep *__p = new (__place) _Rep;
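For readers on newer compilers, here is a minimal sketch of the same allocation step written with std::allocator_traits, since the nested rebind member template was deprecated in C++17. The names Header, block_size, create_block and destroy_block are hypothetical stand-ins invented for this illustration and are not part of libstdc++. The sketch also assumes the rebound allocator returns memory aligned well enough for Header, which holds for std::allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical header type standing in for libstdc++'s _Rep.
struct Header {
    std::size_t length;
    std::size_t capacity;
};

// Bytes needed for one Header plus (capacity + 1) characters.
template <typename CharT>
std::size_t block_size(std::size_t capacity) {
    return (capacity + 1) * sizeof(CharT) + sizeof(Header);
}

// Allocate one raw block through the caller's allocator rebound to char,
// then construct the Header at the front of it.
template <typename CharT, typename Alloc>
Header* create_block(const Alloc& alloc, std::size_t capacity) {
    using RawBytesAlloc =
        typename std::allocator_traits<Alloc>::template rebind_alloc<char>;
    RawBytesAlloc raw(alloc);  // converting constructor of the allocator
    void* place = std::allocator_traits<RawBytesAlloc>::allocate(
        raw, block_size<CharT>(capacity));
    // Placement new: runs the constructor only, allocates nothing.
    return ::new (place) Header{0, capacity};
}

// Destroy the Header and hand the whole block back to the same allocator.
template <typename CharT, typename Alloc>
void destroy_block(const Alloc& alloc, Header* h) {
    assert(h != nullptr);
    using RawBytesAlloc =
        typename std::allocator_traits<Alloc>::template rebind_alloc<char>;
    RawBytesAlloc raw(alloc);
    const std::size_t size = block_size<CharT>(h->capacity);
    h->~Header();
    std::allocator_traits<RawBytesAlloc>::deallocate(
        raw, reinterpret_cast<char*>(h), size);
}
```

The allocator is rebound to char because the block is measured in bytes: it has to hold a header and characters of a possibly different type, so no single element type describes it.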
This is how the character data is fetched from the _Rep buffer:
    _CharT* _M_refdata() throw() { return reinterpret_cast<_CharT*>(this + 1); }
Setting up the characters (one of several ways):
    _M_assign(__p->_M_refdata(), __n, __c);
What is bothering me is that the raw allocator is for type char, but the allocated memory may hold a _Rep object plus the character data (which does not have to be of type char).
Also, why (or rather how) does the call to _M_refdata know where the start (or end) of the character data is within the buffer (i.e. this + 1)?
Edit: does this + 1 just advance the pointer to the next position after the _Rep object?
I have a basic understanding of memory alignment and casting, but this seems to go beyond anything I have read up on.
Can anybody help, or point me to more informative reading material?
Comments (2)
You're missing the placement new. The line
    _Rep *__p = new (__place) _Rep;
initializes a new _Rep object at __place. The space for it has already been allocated beforehand, so the placement new does not allocate anything by itself; it is really only a constructor call.
Pointer arithmetic in C and C++ tells you that this + 1 is a pointer that points sizeof(*this) bytes past this. Since (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep) bytes were allocated before, the space after the _Rep object is used for the character data. The layout is thus the _Rep object followed immediately by the characters:
    | _Rep | (__capacity + 1) * _CharT |
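The header-plus-trailing-characters layout can be tried out in isolation. The following is a minimal sketch, not the libstdc++ code: Rep, refdata, make_rep and free_rep are hypothetical names, the character type is fixed to char for brevity, and the raw block comes from ::operator new instead of a rebound allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical stand-in for _Rep: bookkeeping stored in front of the characters.
struct Rep {
    std::size_t length;
    std::size_t capacity;

    // this + 1 points sizeof(Rep) bytes past this, i.e. at the first byte
    // after the header, which is where the character data begins.
    char* refdata() { return reinterpret_cast<char*>(this + 1); }
};

// Build a block laid out as | Rep | n + 1 chars | and fill it with n copies of c.
Rep* make_rep(std::size_t n, char c) {
    const std::size_t size = (n + 1) * sizeof(char) + sizeof(Rep);
    void* place = ::operator new(size);  // raw bytes, no objects yet
    Rep* p = ::new (place) Rep{n, n};    // placement new: constructor call only
    std::memset(p->refdata(), c, n);     // characters live right after the header
    p->refdata()[n] = '\0';              // the "+ 1" slot holds the terminator
    return p;
}

// Destroy the header and release the whole block.
void free_rep(Rep* p) {
    assert(p != nullptr);
    p->~Rep();
    ::operator delete(p);
}
```

Calling make_rep(3, 'x') gives a block whose refdata() sits exactly sizeof(Rep) bytes after the start of the allocation, so the header never needs to store a pointer to its own characters.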
Allocators, like C's malloc, return pointers to raw bytes, not objects. So the return type is either char * or void *.
The C and C++ standards contain a clause (part of the aliasing rules) that explicitly allows any object to be accessed as an array of bytes through a pointer to a character type such as char * or unsigned char *. This is because C often needs to treat objects as byte arrays (as when writing to disk or a network socket), and it needs to turn byte arrays into objects (as when allocating a range of memory or reading from disk).
To protect against aliasing and optimization problems, the reverse direction is restricted: you are not allowed to cast the same char * to different, unrelated object types and access the memory through both. Once an object of some type lives in that storage, it should be read and written through that type, or copied as raw bytes (for example with memcpy), rather than through a pointer to another type.
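As a small illustration of the object-to-bytes and bytes-to-object directions, here is a hedged sketch. Record, to_bytes, from_bytes and first_byte are hypothetical names made up for the example, and the type is assumed to be trivially copyable, which the static_assert checks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record type used only for this illustration.
struct Record {
    std::uint32_t id;
    std::uint32_t value;
};
static_assert(std::is_trivially_copyable<Record>::value,
              "byte-wise copying is only valid for trivially copyable types");

// Object -> bytes: copy the object representation into a byte buffer,
// as one would before writing it to disk or a socket.
void to_bytes(const Record& r, unsigned char* out) {
    assert(out != nullptr);
    std::memcpy(out, &r, sizeof(Record));
}

// Bytes -> object: copy the bytes into a real Record instead of casting
// the buffer pointer to Record*.
Record from_bytes(const unsigned char* in) {
    assert(in != nullptr);
    Record r;
    std::memcpy(&r, in, sizeof(Record));
    return r;
}

// Reading an object's bytes through unsigned char* is allowed by the aliasing rules.
unsigned char first_byte(const Record& r) {
    return *reinterpret_cast<const unsigned char*>(&r);
}
```

Copying with memcpy sidesteps both the aliasing question and any alignment problem with the buffer, which is why it is usually preferred over casting a byte pointer to an object pointer.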