Why are internal Lua strings stored the way they are?
I wanted a simple string table that would store a bunch of constants, and I thought "Hey! Lua does that, let me use some of their functions!"
This is mainly in the lstring.h/lstring.c files (I am using 5.2).
I will show the code I am curious about first. It's from lobject.h:
/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
  L_Umaxalign dummy;  /* ensures maximum alignment for strings */
  struct {
    CommonHeader;
    lu_byte reserved;
    unsigned int hash;
    size_t len;  /* number of characters in string */
  } tsv;
} TString;

/* get the actual string (array of bytes) from a TString */
#define getstr(ts) cast(const char *, (ts) + 1)

/* get the actual string (array of bytes) from a Lua value */
#define svalue(o) getstr(rawtsvalue(o))
As you see, the data is stored outside of the structure. To get the byte stream, you take the TString pointer and add 1, which steps past the header by sizeof(TString) bytes, and you've got the char* pointer.
Isn't this bad coding though? It's been DRILLED into me in my C classes to make clearly defined structures. I know I might be stirring up a hornet's nest here, but do you really lose that much speed/space by having the structure hold a pointer to its data rather than using the structure as a header placed directly in front of that data?
Comments (3)
The idea is probably that you allocate the header and the data in one big chunk of memory instead of two.
In addition to having just one call to malloc/free, you also reduce memory fragmentation and increase memory localization.
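A minimal sketch of that single-allocation layout in C. The names StrHeader, str_bytes, str_new, and str_free are hypothetical and only loosely modeled on TString; this is not Lua's actual code, and hashing is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header, loosely modeled on TString; not Lua's actual code. */
typedef struct StrHeader {
    unsigned int hash;  /* left at 0 here; hashing is omitted in this sketch */
    size_t len;         /* number of bytes, excluding the terminating '\0' */
} StrHeader;

/* The bytes start immediately after the header, in the spirit of getstr(). */
#define str_bytes(h) ((char *)((h) + 1))

/* One malloc covers the header, the bytes, and a terminating '\0'. */
StrHeader *str_new(const char *src, size_t len)
{
    StrHeader *h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->hash = 0;
    h->len = len;
    memcpy(str_bytes(h), src, len);
    str_bytes(h)[len] = '\0';
    return h;
}

/* One free releases the header and the bytes together. */
void str_free(StrHeader *h)
{
    free(h);
}
```

With a separate char * member instead, str_new would need a second malloc and str_free a second free, and the header and its bytes could end up far apart in memory.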
But to answer your question: yes, this kind of hack is usually bad practice and should be done with extreme care. And if you do use it, you'll probably want to hide it under a layer of macros/inline functions.
As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard "struct hack", where the structure ends in a one-element array that is deliberately over-allocated,
but C99 added flexible array members, so it can be done in a standard-compliant way.
If Lua's strings were done in this way, the header would end in a flexible array member that holds the bytes.
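As a rough illustration of the two forms, assuming hypothetical types HackStr and FlexStr and a hypothetical constructor flexstr_new (none of these are Lua's definitions, and the CommonHeader fields are left out):

```c
#include <stdlib.h>
#include <string.h>

/* Pre-C99 "struct hack": a one-element array that is over-allocated.
** Indexing past data[0] is not sanctioned by the standard. */
typedef struct HackStr {
    size_t len;
    char data[1];
} HackStr;

/* C99 flexible array member: data[] has no size of its own, and
** sizeof(FlexStr) covers only the members that precede it. */
typedef struct FlexStr {
    unsigned int hash;  /* left at 0 in this sketch */
    size_t len;         /* number of bytes, excluding the terminating '\0' */
    char data[];        /* string bytes live here, in the same allocation */
} FlexStr;

/* Allocate the header and len + 1 bytes of string storage in one block. */
FlexStr *flexstr_new(const char *src, size_t len)
{
    FlexStr *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->hash = 0;
    s->len = len;
    memcpy(s->data, src, len);
    s->data[len] = '\0';
    return s;
}
```

With the flexible array member, the bytes are reached through s->data, so no pointer-arithmetic macro such as getstr is needed.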
It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass, and by using a virtual destructor, both the TString and its accompanying const char * block would be freed properly.
When it comes to writing the same kind of functionality in C, it's a bit more difficult, as classes and virtual inheritance do not exist.
What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.
Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.
If your TString pointed to another block of memory where the character array was, then it would require the garbage collector to determine the object's type, then delve into its structure to also free the string buffer.
The pseudocode for this kind of garbage collection would be something like this:
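As a minimal sketch in C of a sweep over header-prefixed blocks: GCHeader, GCString, gc_new_string, and gc_sweep are hypothetical names, and this is a simplification for illustration rather than Lua's actual collector.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical common header placed at the front of every collectable block. */
typedef struct GCHeader {
    struct GCHeader *next;  /* next block in the list of all objects */
    unsigned char marked;   /* nonzero if the block is still reachable */
} GCHeader;

/* Hypothetical string object: header first, then length, then the bytes. */
typedef struct GCString {
    GCHeader gc;
    size_t len;
    char data[];
} GCString;

/* Allocate header and bytes in one block and link it into the list. */
GCString *gc_new_string(GCHeader **all, const char *src, size_t len)
{
    GCString *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->gc.next = *all;
    s->gc.marked = 0;
    *all = &s->gc;
    s->len = len;
    memcpy(s->data, src, len);
    s->data[len] = '\0';
    return s;
}

/* Free every unmarked block and clear the mark on survivors.
** Returns the number of blocks freed. */
size_t gc_sweep(GCHeader **all)
{
    size_t freed = 0;
    GCHeader **p = all;
    while (*p != NULL) {
        GCHeader *cur = *p;
        if (cur->marked) {
            cur->marked = 0;   /* survivor: reset for the next cycle */
            p = &cur->next;
        } else {
            *p = cur->next;    /* unlink the block from the list */
            free(cur);         /* header is the first member, so this is the block's address */
            freed++;
        }
    }
    return freed;
}
```

Because the bytes live in the same block as the header, gc_sweep never has to look at the object's type. If the string held a separate char * buffer, the sweep would need a per-type branch to free that buffer first.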